Discounted cumulative gain

Discounted cumulative gain (DCG) is a measure of ranking quality in information retrieval. It is often normalized so that it is comparable across queries, giving Normalized DCG (nDCG or NDCG). NDCG is often used to measure the effectiveness of search engine algorithms and related applications. Using a graded relevance scale of documents in a search-engine result set, DCG sums the usefulness, or gain, of the results discounted by their position in the result list.[1] NDCG is DCG normalized by the maximum possible DCG of the result set when ranked from highest to lowest gain, thus adjusting for the different numbers of relevant results for different queries.

Overview

Two assumptions are made in using DCG and its related measures:

1. Highly relevant documents are more useful when appearing earlier in a search engine result list (i.e., when they have higher ranks).
2. Highly relevant documents are more useful than marginally relevant documents, which are in turn more useful than non-relevant documents.
Cumulative Gain

DCG is a refinement of a simpler measure, Cumulative Gain (CG).[2] Cumulative Gain is the sum of the graded relevance values of all results in a search result list. CG does not take into account the rank (position) of a result in the result list. The CG at a particular rank position p is defined as:

\mathrm{CG}_p = \sum_{i=1}^{p} rel_i

where rel_i is the graded relevance of the result at position i. The value computed with the CG function is unaffected by changes in the ordering of search results. That is, moving a highly relevant document above a higher-ranked, less relevant document does not change the computed value for CG (assuming both documents occur within the first p results). Based on the two assumptions made above about the usefulness of search results, (N)DCG is usually preferred over CG. Cumulative Gain is sometimes called Graded Precision.

Discounted Cumulative Gain

The premise of DCG is that highly relevant documents appearing lower in a search result list should be penalized, as the graded relevance value is reduced logarithmically proportionally to the position of the result. The usual formula of DCG accumulated at a particular rank position p is defined as:[1]

\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}

Until 2013, there was no theoretically sound justification for using a logarithmic reduction factor[3] other than the fact that it produces a smooth reduction. But Wang et al. (2013)[2] gave a theoretical guarantee for using the logarithmic reduction factor in Normalized DCG (NDCG). The authors show that for every pair of substantially different ranking functions, NDCG can decide which one is better in a consistent manner.

An alternative formulation of DCG[4] places stronger emphasis on retrieving relevant documents:

\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)}

The latter formula is commonly used in industrial applications, including major web search companies[5] and data science competition platforms such as Kaggle.[6] These two formulations of DCG are the same when the relevance values of documents are binary,[3]: 320  that is, rel_i \in \{0, 1\}.

Note that Croft et al. (2010) and Burges et al. (2005) present the second DCG with a log of base e, while both versions of DCG above use a log of base 2. When computing NDCG with the first formulation of DCG, the base of the log does not matter, but the base of the log does affect the value of NDCG for the second formulation. Clearly, the base of the log affects the value of DCG in both formulations.

Convex and smooth approximations to DCG have also been developed, for use as an objective function in gradient-based learning methods.[7]
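The definitions above can be written as a short Python sketch. The helper names (cumulative_gain, dcg) and the exponential flag are illustrative choices for this article, not an established API.

```python
import math

def cumulative_gain(relevances):
    """CG_p: the plain sum of the graded relevances of the top-p results."""
    return sum(relevances)

def dcg(relevances, exponential=False):
    """DCG_p for a list of graded relevances given in ranked order.

    exponential=False uses the standard gain rel_i; exponential=True uses the
    alternative gain 2**rel_i - 1. Both discount by log2(i + 1), where i is
    the 1-based rank position.
    """
    total = 0.0
    for i, rel in enumerate(relevances, start=1):
        gain = 2 ** rel - 1 if exponential else rel
        total += gain / math.log2(i + 1)
    return total
```

For binary relevance grades the two variants coincide, since 2^1 - 1 = 1 and 2^0 - 1 = 0.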
Normalized DCG

Search result lists vary in length depending on the query. Comparing a search engine's performance from one query to the next cannot be consistently achieved using DCG alone, so the cumulative gain at each position for a chosen value of p should be normalized across queries. This is done by sorting all relevant documents in the corpus by their relative relevance, producing the maximum possible DCG through position p, also called Ideal DCG (IDCG) through that position. For a query, the normalized discounted cumulative gain, or nDCG, is computed as:

\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}
where IDCG is the ideal discounted cumulative gain,

\mathrm{IDCG}_p = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i+1)}

and REL_p represents the list of relevant documents (ordered by their relevance) in the corpus up to position p. The nDCG values for all queries can be averaged to obtain a measure of the average performance of a search engine's ranking algorithm. Note that for a perfect ranking algorithm, \mathrm{DCG}_p will be the same as \mathrm{IDCG}_p, producing an nDCG of 1.0. All nDCG calculations are then relative values on the interval 0.0 to 1.0 and so are cross-query comparable. The main difficulty encountered in using nDCG is the unavailability of an ideal ordering of results when only partial relevance feedback is available.

Example

Presented with a list of documents in response to a search query, an experiment participant is asked to judge the relevance of each document to the query. Each document is to be judged on a scale of 0-3, with 0 meaning not relevant, 3 meaning highly relevant, and 1 and 2 meaning "somewhere in between". For the documents ordered by the ranking algorithm as

D_1, D_2, D_3, D_4, D_5, D_6

the user provides the following relevance scores:

3, 2, 3, 0, 1, 2

That is: document 1 has a relevance of 3, document 2 has a relevance of 2, etc. The Cumulative Gain of this search result listing is:

\mathrm{CG}_6 = \sum_{i=1}^{6} rel_i = 3 + 2 + 3 + 0 + 1 + 2 = 11

Changing the order of any two documents does not affect the CG measure. If D_3 and D_4 are switched, the CG remains the same, 11. DCG is used to emphasize highly relevant documents appearing early in the result list. Using the logarithmic scale for reduction, the DCG for each result in order is:

i | rel_i | log_2(i+1) | rel_i / log_2(i+1)
1 |   3   |   1.000    |   3.000
2 |   2   |   1.585    |   1.262
3 |   3   |   2.000    |   1.500
4 |   0   |   2.322    |   0.000
5 |   1   |   2.585    |   0.387
6 |   2   |   2.807    |   0.712
So the \mathrm{DCG}_6 of this ranking is:

\mathrm{DCG}_6 = \sum_{i=1}^{6} \frac{rel_i}{\log_2(i+1)} = 3 + 1.262 + 1.5 + 0 + 0.387 + 0.712 = 6.861

Now a switch of D_3 and D_4 results in a reduced DCG, because a less relevant document is placed higher in the ranking; that is, a more relevant document is discounted more by being placed in a lower rank. The performance of this query compared to another is incomparable in this form, since the other query may have more results, resulting in a larger overall DCG which may not necessarily be better. In order to compare, the DCG values must be normalized.

To normalize DCG values, an ideal ordering for the given query is needed. For this example, that ordering would be the monotonically decreasing sort of all known relevance judgments. In addition to the six from this experiment, suppose we also know there is a document with relevance grade 3 to the same query and a document with relevance grade 2 to that query. Then the ideal ordering is:

3, 3, 3, 2, 2, 2, 1, 0

The ideal ranking is cut again to length 6 to match the depth of analysis of the ranking:

3, 3, 3, 2, 2, 2

The DCG of this ideal ordering, or IDCG (Ideal DCG), is computed to rank 6:

\mathrm{IDCG}_6 = 8.740

And so the nDCG for this query is given as:

\mathrm{nDCG}_6 = \frac{\mathrm{DCG}_6}{\mathrm{IDCG}_6} = \frac{6.861}{8.740} \approx 0.785
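The worked example can be reproduced with a small nDCG sketch built on the dcg helper from the earlier sketch. The ndcg function and its ideal_relevances parameter are illustrative assumptions for this article, and the ideal ordering here uses the first (linear-gain) DCG formulation, matching the example.

```python
def ndcg(relevances, ideal_relevances=None):
    """nDCG_p = DCG_p / IDCG_p, using the standard (linear) gain.

    If all known relevance judgments are supplied via ideal_relevances, the
    ideal ranking is their descending sort truncated to the length of the
    ranked list; otherwise the ranked list's own judgments are sorted.
    """
    pool = ideal_relevances if ideal_relevances is not None else relevances
    ideal = sorted(pool, reverse=True)[:len(relevances)]
    idcg = dcg(ideal)
    return dcg(relevances) / idcg if idcg > 0 else 0.0

ranked = [3, 2, 3, 0, 1, 2]   # judged relevance of the six returned documents
judged = ranked + [3, 2]      # plus two judged documents that were not returned

print(f"DCG_6  = {dcg(ranked):.3f}")                            # DCG_6  = 6.861
print(f"IDCG_6 = {dcg(sorted(judged, reverse=True)[:6]):.3f}")  # IDCG_6 = 8.740
print(f"nDCG_6 = {ndcg(ranked, judged):.3f}")                   # nDCG_6 = 0.785
```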
Limitations

nDCG does not penalize the retrieval of irrelevant documents: a result with relevance grade 0 contributes nothing to DCG, so appending irrelevant documents to the end of a result list leaves the score unchanged. Nor does it penalize relevant documents that are missing from the returned list, although enforcing a fixed result-set size and assigning minimum grades to the missing documents can mitigate this. Finally, as noted above, computing the ideal ordering requires complete relevance judgments, which are often unavailable in practice.
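A two-line check with the same hypothetical ndcg helper from the example above illustrates the first limitation: appending a document with relevance 0 leaves the score unchanged.

```python
print(ndcg([1, 1, 1]))     # 1.0
print(ndcg([1, 1, 1, 0]))  # 1.0 as well: the trailing irrelevant document costs nothing
```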