In this blog, I will talk about LambdaRank and its TensorFlow implementation. Note that LambdaRank was introduced in the paper Learning to Rank with Nonsmooth Cost Functions.

RankNet Revisited

Since LambdaRank builds upon RankNet, let us first revisit the cost function and gradients of RankNet.

RankNet’s Cost Function

Note that in Learning To Rank, there are a few typical ranking losses

But usually these ranking losses are non-differentiable, hence it is difficult to apply gradient descent directly. …

In this blog, I will talk about how to speed up the training of RankNet, and I will refer to this sped-up version as Factorised RankNet. Note that this technique was published in the paper Learning to Rank with Nonsmooth Cost Functions.

RankNet Training Process Examined

Typically, to train an ML model, we need to do two things:

And recall that in part I, we have the following equations:

Equation 1. RankNet’s cost function
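As a recap, the RankNet cost from the original papers is a pairwise cross-entropy. Writing $s_i$ for the model score of document $i$, $\bar{P}_{ij}$ for the known (target) probability that document $i$ should rank above document $j$, and $\sigma$ for a shape parameter:

```latex
P_{ij} = \frac{1}{1 + e^{-\sigma(s_i - s_j)}}, \qquad
C = -\bar{P}_{ij}\log P_{ij} - \left(1 - \bar{P}_{ij}\right)\log\left(1 - P_{ij}\right)
```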

In part I, I went through RankNet, which was published by Microsoft in 2005. Two years later, Microsoft published another paper, Learning to Rank with Nonsmooth Cost Functions, which introduced a sped-up version of RankNet (which I call “Factorised RankNet”) and LambdaRank.

However, before I jump to Factorised RankNet and LambdaRank, I’d like to show you how to implement RankNet using a custom training loop in TensorFlow 2.
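Before getting to the TensorFlow code, the pairwise cost itself is easy to sketch in plain NumPy. This is only a toy illustration of the loss the training loop minimises, not the actual TF 2 implementation from the post, and the function name is my own:

```python
import numpy as np

def ranknet_cost(s_i, s_j, p_bar, sigma=1.0):
    """Pairwise RankNet cross-entropy cost.

    s_i, s_j : model scores for documents i and j
    p_bar    : target probability that i should rank above j
               (1.0, 0.5 or 0.0 in the original paper)
    """
    # Predicted probability that document i ranks above document j
    p_ij = 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))
    # Binary cross-entropy between the target and predicted probabilities
    return -p_bar * np.log(p_ij) - (1.0 - p_bar) * np.log(1.0 - p_ij)

# When the model ranks i above j and the label agrees, the cost is small;
# when the model disagrees with the label, the cost is large.
print(ranknet_cost(2.0, 0.0, 1.0))  # small
print(ranknet_cost(0.0, 2.0, 1.0))  # large
```

In a custom training loop, this cost would be computed on pairs of scores inside a gradient tape and differentiated with respect to the model weights.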

This is important because Factorised RankNet and LambdaRank cannot be implemented with the Keras API alone; it is necessary to use lower-level APIs such as TensorFlow or PyTorch, as we will…

I came across the field of Learning to Rank (LTR) and RankNet when I was working on a recommendation project. However, it is a bit tricky to implement the model in TensorFlow, and I could not find any detailed explanation on the web at all. Hence, in this series of blog posts, I’ll go through the papers of both RankNet and LambdaRank in detail and implement the models in TF 2.0.

For this post, I will go through the following

This seems to match the title “StarSpace”

Recently I came across a tool called StarSpace, which was created by Facebook. StarSpace caught my eye because it learns to represent different objects in the same embedding space. In other words, it learns embeddings and uses these embeddings to complete a lot of different tasks; below are some examples

StarSpace could achieve many more tasks, as long as…

UC Berkeley organised a great bootcamp on reinforcement learning back in 2017, and exercise 3.6 of lab 4 asked candidates to compute the gradient of a softmax policy.

Lab 4 Exercise 3.6

And I found that the formulation of the policy in lab 4 is a little bit different from the formulation given in lectures (e.g. David Silver’s course, and Chapter 13 of the Sutton & Barto book).
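For reference, the standard lecture formulation (my notation here, which the lab’s version differs from slightly) uses a linear softmax policy over action features $\phi(s,a)$, whose score function has a simple closed form:

```latex
\pi_\theta(a \mid s) = \frac{e^{\theta^\top \phi(s,a)}}{\sum_{b} e^{\theta^\top \phi(s,b)}},
\qquad
\nabla_\theta \log \pi_\theta(a \mid s) = \phi(s,a) - \sum_{b} \pi_\theta(b \mid s)\, \phi(s,b)
```

That is, the feature of the action taken minus the expected feature under the policy.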

In this blog post, we will examine how to compute document similarity for news articles using a Denoising Autoencoder (DAE) combined with a triplet loss function. This approach is presented in Article De-duplication Using Distributed Representations, published by Yahoo! JAPAN, and my TensorFlow implementation can be found here.
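As a quick refresher on the triplet part, the loss pushes an anchor article’s embedding closer to a similar (positive) article than to a dissimilar (negative) one, by at least a margin. Here is a minimal NumPy sketch of the standard margin-based triplet loss, with hypothetical names and not the paper’s actual implementation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on embedding vectors."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    # Loss is zero once the positive is closer than the negative by `margin`
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # near the anchor
n = np.array([3.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))  # → 0.0 (this triplet is already satisfied)
```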


The most common way of computing document similarity is to transform documents into TF-IDF vectors and then apply a similarity measure, e.g. cosine similarity, to these vectors.
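This baseline can be sketched in a few lines of plain Python. The helper names and the toy documents below are my own, for illustration only:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (dicts of term -> weight) for tokenised docs."""
    n = len(docs)
    # document frequency of each term
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

docs = [
    "stocks fall on weak earnings".split(),
    "stocks rise on strong earnings".split(),
    "local team wins championship game".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # overlapping vocabulary -> higher score
print(cosine(vecs[0], vecs[2]))  # disjoint vocabulary -> 0.0
```

Note how the two earnings articles only match because they share surface terms; this is exactly the kind of limitation discussed next.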

However, this approach has two disadvantages/limitations:

Louis Kit Lung Law

Data Scientist
