We now define the learning problem and the user-interaction model more generally. At each round $t$, our algorithm presents a ranking $y_t$ from a corpus $x_t \in \mathcal{X}$ of candidate documents¹. We assume that the user acts (approximately) rationally according to an unknown utility function $U(x_t, y_t)$ that models both the relevance of the documents and their dependencies (e.g., redundancy). In the context of such a utility function, we can interpret the user feedback as a preference between rankings. This type of preference feedback over multiple rounds $t$ is the input to our learning model. Given the set of candidate documents $x_t$, the optimal ranking is denoted by
$$y^*_t := \arg\max_{y \in \mathcal{Y}} U(x_t, y). \tag{1}$$
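To make definition (1) concrete, the sketch below computes $y^*_t$ by exhaustive search when the utility is known. It is only an illustration: in the actual learning setting $U$ is unknown, so this computation is not available to the algorithm. The corpus `CORPUS`, the helper names `toy_utility` and `optimal_ranking`, and the coverage-based utility (a document earns credit only for topics not already covered higher in the ranking, which is one simple way to model redundancy) are all hypothetical choices for this example, not the utility used in this work.

```python
from itertools import permutations

# Hypothetical toy corpus x_t: each candidate document covers a set of topics.
CORPUS = {
    "d1": {"jaguar-car"},
    "d2": {"jaguar-car"},
    "d3": {"jaguar-cat"},
    "d4": {"jaguar-car", "jaguar-os"},
}


def toy_utility(ranking, corpus):
    """Hypothetical U(x_t, y): position-discounted count of newly covered topics.

    A document contributes only the topics not already covered by documents
    ranked above it, so redundant documents add nothing.
    """
    seen = set()
    total = 0.0
    for pos, doc in enumerate(ranking):
        new_topics = corpus[doc] - seen
        total += len(new_topics) / (pos + 1)  # discount by rank position
        seen |= corpus[doc]
    return total


def optimal_ranking(corpus, k, utility):
    """y*_t = argmax over all length-k rankings y of U(x_t, y), by brute force.

    Exponential in k; feasible only for tiny corpora like this one.
    """
    return max(permutations(corpus, k), key=lambda y: utility(y, corpus))


best = optimal_ranking(CORPUS, 2, toy_utility)
print(best, toy_utility(best, CORPUS))  # ('d4', 'd3') 2.5
```

Here the diverse ranking (`d4`, `d3`) beats rankings that repeat the "jaguar-car" topic, which is the kind of dependency between documents that $U$ is meant to capture.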
Since the user's utility function $U(x_t, y)$ is unknown, this optimal ranking $y^*_t$ cannot be computed. The goal of the learning algorithm is to predict rankings whose utility is close to that of $y^*_t$. Note, however, that the user feedback does not even reveal the optimal ranking $y^*_t$ to the algorithm (as it would in traditional supervised learning); only the user's feedback ranking $\bar{y}_t$ is observed. To analyze the learning algorithms in the subsequent sections, we refer to any feedback that satisfies the following inequality as strictly $\alpha$-informative feedback:
$$U(x_t, \bar{y}_t) - U(x_t, y_t) \ge \alpha \left( U(x_t, y^*_t) - U(x_t, y_t) \right). \tag{2}$$
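Inequality (2) says that the feedback ranking $\bar{y}_t$ closes at least a fraction $\alpha$ of the utility gap between the presented ranking $y_t$ and the optimal ranking $y^*_t$. The sketch below checks this condition from the three utility values, assuming $\alpha > 0$; the function names `is_alpha_informative` and `largest_alpha` are hypothetical helpers for illustration. In practice the utilities are unknown to the learner, so such a check is only meaningful in a simulation where $U$ is given.

```python
def is_alpha_informative(u_feedback, u_presented, u_optimal, alpha):
    """Check inequality (2):
    U(x_t, ybar_t) - U(x_t, y_t) >= alpha * (U(x_t, y*_t) - U(x_t, y_t)).
    """
    return u_feedback - u_presented >= alpha * (u_optimal - u_presented)


def largest_alpha(u_feedback, u_presented, u_optimal):
    """Largest alpha for which (2) holds, i.e. the fraction of the gap closed.

    Returns None when the presented ranking is already optimal (gap of 0):
    then the right-hand side of (2) is 0 for every alpha, and (2) holds
    exactly when the feedback does not lower the utility.
    """
    gap = u_optimal - u_presented  # never negative, since y*_t maximizes U
    if gap == 0:
        return None
    return (u_feedback - u_presented) / gap


# Presented ranking has utility 2.0, the optimum is 2.5, the feedback ranking 2.25:
# the feedback closes half of the gap, so it is 0.5-informative but not 0.6-informative.
print(largest_alpha(2.25, 2.0, 2.5))              # 0.5
print(is_alpha_informative(2.25, 2.0, 2.5, 0.5))  # True
print(is_alpha_informative(2.25, 2.0, 2.5, 0.6))  # False
```

Feedback that is worse than the presented ranking yields a negative fraction, so it is not $\alpha$-informative for any $\alpha > 0$.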