RankNet, LambdaRank TensorFlow Implementation — part IV

Louis Kit Lung Law
3 min read · Feb 28, 2021


In this blog, I will talk about LambdaRank and its TensorFlow implementation. Note that LambdaRank was introduced in the paper Learning to Rank with Nonsmooth Cost Functions.

RankNet Revisited

Since LambdaRank builds upon RankNet, let us first revisit RankNet’s cost function and gradients.

RankNet’s Cost Function

Note that in Learning To Rank, there are a few typical ranking metrics:

  1. Mean Reciprocal Rank (MRR)
  2. Mean Average Precision (MAP)
  3. Normalised Discounted Cumulative Gain (NDCG)

But these ranking metrics are usually non-differentiable, so it is difficult to apply gradient descent to them directly. To work around this problem, RankNet uses a cross-entropy cost function, shown in Equation 1 below.

Equation 0. Definition of Pij
Equation 1. RankNet’s cost function
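The equation images are not reproduced here; reconstructed from the RankNet paper’s notation (with oi the model’s score for document i, oij = oi − oj, and P̄ij the target probability that i should be ranked above j), they read:

```latex
% Equation 0: modelled probability that document i ranks above document j
P_{ij} \equiv \frac{e^{o_{ij}}}{1 + e^{o_{ij}}}, \qquad o_{ij} \equiv o_i - o_j

% Equation 1: cross-entropy cost on that probability
C_{ij} = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log (1 - P_{ij})
       = -\bar{P}_{ij}\, o_{ij} + \log\!\left(1 + e^{o_{ij}}\right)
```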

RankNet’s Gradients

Recall that in the previous blog, we derived the following equations:

Equation 2. Derivative of cost with respect to RankNet’s weights
Equation 3. Derivative of cost with respect to oij=oi-oj
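Reconstructed in the same notation as above (a sketch of what Equations 2 and 3 express, not the original images), these derivatives are:

```latex
% Equation 2: chain rule through o_ij = o_i - o_j
\frac{\partial C_{ij}}{\partial w_k}
  = \frac{\partial C_{ij}}{\partial o_{ij}}
    \left( \frac{\partial o_i}{\partial w_k} - \frac{\partial o_j}{\partial w_k} \right)

% Equation 3: derivative of the cross-entropy cost with respect to o_ij
\frac{\partial C_{ij}}{\partial o_{ij}} = P_{ij} - \bar{P}_{ij}
```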

RankNet is trained with gradient descent, hence gradients with respect to the model’s weights (dCij/dwk) need to be computed throughout the training process.

From equations (2) and (3), it is clear that we can compute these gradients directly, without computing the cost at all.

LambdaRank

From the above, we have two main observations:

  1. RankNet does not consider any ranking loss in the optimisation process
  2. Gradients could be computed without computing the cross entropy loss

To improve upon RankNet, LambdaRank defines the gradient directly (without defining a corresponding loss function) and takes the ranking metric into account: it scales RankNet’s gradient by the size of the change in NDCG obtained when document i is swapped with document j.

Equation 5. LambdaRank’s gradient
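The equation image is not reproduced here; in the notation used above it can be written as the RankNet derivative of Equation 3 scaled by the change in NDCG (a reconstruction, not the paper’s exact typesetting):

```latex
% Equation 5: LambdaRank's gradient ("lambda") for the pair (i, j)
\lambda_{ij} \equiv \frac{\partial C_{ij}}{\partial o_{ij}} \cdot \left| \Delta \mathrm{NDCG}_{ij} \right|
```

Here |ΔNDCGij| denotes the absolute change in NDCG obtained by swapping the rank positions of documents i and j.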

The idea is quite straightforward: if swapping i and j would produce a large change in NDCG, we want the gradient to be large as well so that the swap is more likely to happen, in the hope of maximising NDCG. And indeed, the paper shows that this works empirically.

TensorFlow Implementation

To implement LambdaRank on top of RankNet, we need to calculate the change in NDCG when i and j are swapped and apply it during gradient descent.
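The embedded code snippet is not reproduced here, so below is a minimal sketch of the idea for a single query, assuming a Keras scoring model and graded relevance labels (the function and variable names are mine, not the original notebook’s). The |ΔNDCG| factor is held constant with tf.stop_gradient, so the gradient of this loss is RankNet’s gradient scaled by |ΔNDCG|, i.e. the LambdaRank update.

```python
import tensorflow as tf

def delta_ndcg(labels, scores):
    """|ΔNDCG| for every pair (i, j): the absolute change in NDCG obtained
    by swapping documents i and j in the ranking induced by the scores."""
    gains = tf.pow(2.0, labels) - 1.0                          # gain of each document
    positions = tf.cast(tf.range(tf.shape(labels)[0]), tf.float32) + 1.0
    discounts = tf.math.log(2.0) / tf.math.log1p(positions)    # 1 / log2(1 + position)

    # Ideal DCG (documents sorted by label), used for normalisation
    ideal_gains = tf.gather(gains, tf.argsort(labels, direction='DESCENDING'))
    ideal_dcg = tf.reduce_sum(ideal_gains * discounts)

    # Discount of each document at its current rank under the model's scores
    rank = tf.argsort(tf.argsort(scores, direction='DESCENDING'))  # 0-based rank
    current_discount = tf.gather(discounts, rank)

    # Swapping i and j only changes their own two terms of the DCG
    gain_diff = gains[:, None] - gains[None, :]
    discount_diff = current_discount[:, None] - current_discount[None, :]
    return tf.abs(gain_diff * discount_diff) / ideal_dcg

def lambdarank_loss(labels, scores, sigma=1.0):
    """RankNet's pairwise cross-entropy weighted by |ΔNDCG|.

    Because the |ΔNDCG| factor is held constant via tf.stop_gradient, the
    gradient of this loss is RankNet's gradient scaled by |ΔNDCG|."""
    o_ij = scores[:, None] - scores[None, :]                   # o_ij = o_i - o_j
    # Only keep pairs where document i is strictly more relevant than j
    pair_mask = tf.cast(labels[:, None] > labels[None, :], tf.float32)
    # RankNet cost for such pairs (P̄_ij = 1): log(1 + exp(-σ · o_ij))
    ranknet_cost = tf.math.softplus(-sigma * o_ij)
    swap_weight = tf.stop_gradient(delta_ndcg(labels, scores))
    return tf.reduce_sum(pair_mask * swap_weight * ranknet_cost)

# One training step for a single query (toy data, hypothetical shapes)
model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation='relu'),
                             tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(1e-3)

features = tf.random.normal([50, 10])                          # 50 documents, 10 features
labels = tf.cast(tf.random.uniform([50], 0, 3, dtype=tf.int32), tf.float32)

with tf.GradientTape() as tape:
    scores = tf.squeeze(model(features), axis=-1)              # one score per document
    loss = lambdarank_loss(labels, scores)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```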

The key step is where delta NDCG is calculated and applied to the pairwise gradients (lines 10–11 in the original code snippet; the stop-gradient weight in the sketch above plays the same role).

Performance

To show the difference in performance between LambdaRank and RankNet, I performed experiments on an artificial dataset with 200 queries and 50 documents per query on average; 50% of the data is used as the training set and 50% as the validation set. NDCG@k on the validation set is measured and shown below.

Validation NDCG@k of RankNet vs LambdaRank

From the graph, we can see that LambdaRank gives better NDCG than RankNet. You may find the Jupyter notebook here, which generates the graph above.
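For reference, NDCG@k for a single query can be computed along the following lines (a minimal sketch reusing the notation from the snippet above, not the notebook’s exact code):

```python
def ndcg_at_k(labels, scores, k=10):
    """NDCG@k for one query, assuming graded relevance labels."""
    gains = tf.pow(2.0, labels) - 1.0
    top_k = tf.argsort(scores, direction='DESCENDING')[:k]      # predicted order
    ideal_k = tf.argsort(labels, direction='DESCENDING')[:k]    # best possible order
    positions = tf.cast(tf.range(tf.shape(top_k)[0]), tf.float32) + 1.0
    discounts = tf.math.log(2.0) / tf.math.log1p(positions)
    dcg = tf.reduce_sum(tf.gather(gains, top_k) * discounts)
    ideal_dcg = tf.reduce_sum(tf.gather(gains, ideal_k) * discounts)
    return dcg / ideal_dcg
```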
