RankNet, LambdaRank TensorFlow Implementation — part IV
In this blog, I will talk about LambdaRank and its TensorFlow implementation. Note that LambdaRank was introduced in the paper Learning to Rank with Nonsmooth Cost Functions.
RankNet Revisited
Since LambdaRank builds upon RankNet, let us first revisit the cost function and gradients of RankNet.
RankNet’s Cost Function
Note that in Learning To Rank, there are a few typical ranking metrics:
- Mean Reciprocal Rank (MRR)
- Mean Average Precision (MAP)
- Normalised Discounted Cumulative Gain (NDCG)
But these ranking metrics are usually non-differentiable with respect to the model scores, so it is difficult to apply gradient descent to them directly. To work around this problem, RankNet uses a cross-entropy cost function, as shown in equation I below.
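For reference, a sketch of that cost in the notation of the RankNet paper (Burges et al.; the numbering here may not match the original post's images): with model scores $s_i$, $s_j$ and $\bar{P}_{ij}$ the target probability that document $i$ should rank above document $j$,

```latex
P_{ij} = \frac{1}{1 + e^{-\sigma (s_i - s_j)}}, \qquad
C_{ij} = -\bar{P}_{ij} \log P_{ij} - \left(1 - \bar{P}_{ij}\right) \log \left(1 - P_{ij}\right)
```

This cross-entropy cost is smooth in the scores, which is exactly what the ranking metrics above are not.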
RankNet’s Gradients
Recall that in the previous blog, we derived the following equations.
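Restated here as a sketch, following the notation of Burges's RankNet/LambdaRank overview (so the equation numbering may differ from the earlier post): with $S_{ij} \in \{-1, 0, 1\}$ encoding whether document $i$ is less, equally, or more relevant than document $j$,

```latex
\lambda_{ij} \equiv \frac{\partial C_{ij}}{\partial s_i}
  = \sigma \left( \frac{1}{2}\left(1 - S_{ij}\right) - \frac{1}{1 + e^{\sigma (s_i - s_j)}} \right)
  = -\frac{\partial C_{ij}}{\partial s_j}
```

```latex
\frac{\partial C_{ij}}{\partial w_k}
  = \lambda_{ij} \left( \frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k} \right)
```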
RankNet is trained with gradient descent, so the gradients with respect to the model's weights, ∂Cij/∂wk, need to be computed throughout the training process.
From equations (2) and (3), it is clear that we can compute these gradients directly, without computing the cost at all.
LambdaRank
From the above, we have two main observations:
- RankNet does not take any ranking metric into account in the optimisation process
- Gradients could be computed without computing the cross entropy loss
To improve upon RankNet, LambdaRank defines the gradient directly (without defining a corresponding loss function) and takes the ranking metric into account: it scales RankNet's gradient by the size of the change in NDCG obtained by swapping documents i and j.
The idea is quite straightforward: if swapping i and j produces a large change in NDCG, we want the gradient for that pair to be large as well, in the hope of maximising NDCG. And indeed, the paper shows empirically that this works.
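Concretely, for a pair where document $i$ is more relevant than document $j$ (i.e. $S_{ij} = 1$), the LambdaRank paper defines the lambda as the RankNet gradient scaled by the NDCG change:

```latex
\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}} \, \left| \Delta \mathrm{NDCG}_{ij} \right|
```

where $\left| \Delta \mathrm{NDCG}_{ij} \right|$ is the size of the change in NDCG when the rank positions of $i$ and $j$ are swapped.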
TensorFlow Implementation
In order to implement LambdaRank based on RankNet, we need to calculate the change in NDCG when i and j are swapped and apply it in gradient descent.
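As a plain-Python sketch of the quantity involved (the function names here are mine, not from the notebook), the change in NDCG from swapping two documents can be computed as:

```python
import math

def dcg(relevances):
    # DCG with gain 2^rel - 1 and a log2(position + 1) discount,
    # where position is 1-based
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def delta_ndcg(relevances, i, j):
    # Change in NDCG if the documents at rank positions i and j are
    # swapped, normalised by the ideal DCG of the list
    ideal = dcg(sorted(relevances, reverse=True))
    swapped = list(relevances)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return (dcg(swapped) - dcg(relevances)) / ideal
```

In the TensorFlow implementation, the absolute value of this quantity is the factor that scales the pairwise RankNet gradients.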
Note that in lines 10–11 of the code snippet above, the delta NDCG is calculated and applied.
Performance
To show the difference in performance between LambdaRank and RankNet, I ran experiments on an artificial dataset with 200 queries and 50 documents per query on average; 50% of the data is used as the training set and 50% as the validation set. NDCG@k on the validation set is measured and shown below.
From the graph, we can see that LambdaRank gives better NDCG than RankNet. You may find the Jupyter notebook that generates the graph above here.