In this blog, I will talk about LambdaRank and its TensorFlow implementation. Note that LambdaRank was introduced in the paper **Learning to Rank with Nonsmooth Cost Functions**.

Since LambdaRank builds upon RankNet, let us first revisit the cost function and gradients of RankNet.

Note that in Learning to Rank, there are a few typical ranking metrics:

- Mean Reciprocal Rank (MRR)
- Mean Average Precision (MAP)
- Normalised Discounted Cumulative Gain (NDCG)

But usually **these ranking metrics are non-differentiable**, hence it is difficult to apply gradient descent directly. …
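To make the non-differentiability concrete, here is a minimal NDCG sketch (function names are mine, and the `2^rel - 1` gain follows the convention used in the LambdaRank literature). The metric depends only on the sorted order of documents, and sorting is a discontinuous function of the model's scores, so the gradient is zero almost everywhere:

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain of a ranked list of relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """Normalise DCG by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

An infinitesimal change in scores either leaves the ranking (and NDCG) unchanged, or swaps two documents and changes NDCG by a jump, which is why a differentiable surrogate such as RankNet's pairwise loss is needed.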

In this blog, I will talk about how to speed up the training of RankNet, and I will refer to this sped-up version as Factorised RankNet. Note that this is published in the paper **Learning to Rank with Nonsmooth Cost Functions**.

Typically, to train an ML model, we need to do two things:

- compute cost function
- compute gradient (derivative of cost with respect to model’s weights)

And recall that in part I, we derived the cost function and gradient equations of RankNet.

In part I, I went through **RankNet**, which was published by Microsoft in 2005. Two years later, Microsoft published another paper, **Learning to Rank with Nonsmooth Cost Functions**, which introduced a sped-up version of RankNet (which I call **“Factorised RankNet”**) and **LambdaRank**.

However, before I jump to Factorised RankNet and LambdaRank, I’d like to show you how to implement RankNet using a custom training loop in TensorFlow 2.

This is important because Factorised RankNet and LambdaRank cannot be implemented with the Keras API alone; it is necessary to use a lower-level API such as TensorFlow or PyTorch, as we will…
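At the heart of any such custom loop is a differentiable pairwise loss. As a preview, here is a NumPy sketch of the RankNet pairwise cross-entropy (the function name is mine; `sigma` and the target probability follow the paper's notation, with the loss rewritten in a numerically stabler equivalent form):

```python
import numpy as np

def ranknet_loss(s_i, s_j, S_bar, sigma=1.0):
    """RankNet pairwise cross-entropy.

    s_i, s_j : model scores for documents i and j
    S_bar    : target probability that i should rank above j
               (1 if i is more relevant, 0 if less, 0.5 if tied)
    """
    s_diff = sigma * (s_i - s_j)
    # Equivalent to -S_bar*log(P_ij) - (1 - S_bar)*log(1 - P_ij)
    # with P_ij = sigmoid(s_diff).
    return (1 - S_bar) * s_diff + np.log1p(np.exp(-s_diff))
```

Because this loss is smooth in the scores, it can be dropped into a gradient-descent loop, unlike the ranking metrics above.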

I came across the field of Learning to Rank (LTR) and RankNet when I was working on a recommendation project. However, it is a bit tricky to implement the model in TensorFlow, and I could not find any detailed explanation on the web at all. Hence, in this series of blog posts, I’ll go through the papers of both RankNet and LambdaRank in detail and implement the models in TF 2.0.

For this post, I will go through the following:

- the paper which first proposed RankNet (Learning to Rank using Gradient Descent)
- the paper that summarised RankNet and LambdaRank (From RankNet to LambdaRank…

Recently I came across a tool called StarSpace, created by Facebook. StarSpace caught my eye because it learns to represent different objects in the same embedding space. In other words, it learns embeddings and uses these embeddings to complete many different tasks; below are some examples:

- Word / Tag embeddings (map from a short text to relevant hashtags)
- Document recommendation (embed and recommend documents for users based on their historical likes/click data)
- Sentence Embeddings (given the embedding of one sentence, one can find semantically similar/relevant sentences)

UC Berkeley organised a great bootcamp on reinforcement learning back in 2017, and exercise 3.6 of lab 4 asked candidates to compute the gradient of a softmax policy.

And I found that the formulation of the policy in lab 4 is a little different from the formulation given in lectures (e.g. David Silver’s course, and Chapter 13 of the Sutton & Barto book).
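For reference, the standard lecture formulation with linear features (as in Chapter 13 of Sutton & Barto, where $\phi(s,a)$ denotes the feature vector for a state–action pair) is:

```latex
\pi_\theta(a \mid s) = \frac{e^{\theta^\top \phi(s,a)}}{\sum_b e^{\theta^\top \phi(s,b)}},
\qquad
\nabla_\theta \log \pi_\theta(a \mid s) = \phi(s,a) - \sum_b \pi_\theta(b \mid s)\,\phi(s,b).
```

The score function is simply the feature vector of the chosen action minus the policy-weighted average feature vector over all actions.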

In this blog post, we will examine how to **compute document similarity for news articles** using a **Denoising Autoencoder** (DAE) combined with a **Triplet Loss** function. This approach is presented in Article De-duplication Using Distributed Representations, published by Yahoo! JAPAN, and my TensorFlow implementation can be found here.

The most common way of computing document similarity is to transform documents into TFIDF vectors and then apply any similarity measure e.g. cosine similarity to these vectors.
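A minimal sketch of that baseline in plain Python (the tiny corpus, tokenisation, and helper names are mine for illustration, using raw term frequency and a simple `log(N/df)` inverse document frequency):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each whitespace-tokenised document to a dict of term -> tf-idf weight."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Documents sharing informative terms score higher; documents with no terms in common score exactly zero, which already hints at the vocabulary-overlap limitation discussed next.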

However, this approach has two disadvantages/limitations:

- This requires transforming every document into a vector of large dimensions (~10k) and it may not be ideal…
