Creation of Reliable Relevance Judgments in Information Retrieval Systems Evaluation Experimentation through Crowdsourcing: A Review
Table 1
User-based evaluation methods.
User-based methods and descriptions:

Human in the lab: Human participants interact with the system in a controlled laboratory experiment so that the user-system interaction can be evaluated directly.

Side-by-side panels: The top-ranked results returned by two IR systems for the same query are presented side by side. A human assessor then makes a simple preference judgment as to which side retrieves the better results.

A/B testing: A preselected group of a website's users is exposed to a specific modification, and their reactions are analysed to determine whether the change is positive or negative.

Using clickthrough data: Clickthrough data record how frequently users click on the documents retrieved for a given query.

Crowdsourcing: Tasks formerly performed in-house by the employees of a company or institution are outsourced to a large, heterogeneous pool of potential workers through an open call over the Internet.
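The clickthrough-based method above amounts to aggregating a click log into per-(query, document) click rates, which serve as an implicit relevance signal. A minimal sketch follows; the log format and the function name `clickthrough_rates` are illustrative assumptions, not part of the review:

```python
from collections import Counter

def clickthrough_rates(impressions, clicks):
    """Estimate an implicit relevance signal: the fraction of
    impressions of each (query, document) pair that received a click."""
    shown = Counter(impressions)   # (query, doc) -> times shown
    clicked = Counter(clicks)      # (query, doc) -> times clicked
    return {pair: clicked[pair] / n for pair, n in shown.items()}

# Hypothetical log: each entry is a (query, document) pair.
impressions = [("ir eval", "d1"), ("ir eval", "d1"),
               ("ir eval", "d2"), ("ir eval", "d2")]
clicks = [("ir eval", "d1"), ("ir eval", "d1"), ("ir eval", "d2")]

rates = clickthrough_rates(impressions, clicks)
print(rates[("ir eval", "d1")])  # 1.0
print(rates[("ir eval", "d2")])  # 0.5
```

In practice such rates are biased by result position (users click top-ranked documents more often regardless of relevance), so deployed systems typically apply a position-bias correction before treating clicks as relevance judgments.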