Evaluation in the Wild
This work has been created in the PROMISE Network of Excellence (contract n. 258191) as a part of the 7th Framework Programme of the European Commission (FP7/2007-2013).
Stefan Rietberger, Melanie Imhof, Martin Braschler, Zurich University of Applied Sciences
Richard Berendsen, University of Amsterdam
Anni Järvelin, Preben Hansen, Swedish Institute of Computer Science / University of Gothenburg
Alba García Seco de Herrera, Theodora Tsikrika, University of Applied Sciences and Arts Western Switzerland
Mihai Lupu, Vienna University of Technology
Vivien Petras, Maria Gäde, Michael Kleineberg, Humboldt University
Khalid Choukri, Evaluations and Language Resources Distribution Agency
This report describes a methodology for performing an application-centric evaluation of operational information access systems. The application is treated as a black box. Moreover, the presented methodology can evaluate aspects of functionality that typical users can actually access and experience. The methodology estimates user perception based on a wide range of criteria covering four categories: indexing, document matching, the quality of the search results, and the user interface of the system. The criteria reflect established best practices in the information retrieval domain as well as advances in the user search experience. For each criterion, a test script has been defined that contains step-by-step instructions, a scoring schema, and adaptations for the three PROMISE use case domains. The proposed methodology can be used to monitor a single application over time, to compare a few applications directly, or to organize an evaluation campaign. The evaluation requires the tested search application to offer text search functionality. We also recommend comparing only applications within the same use case domain. To validate the methodology, an evaluation campaign was conducted in which each participating PROMISE partner evaluated sites either from the PROMISE use case domains or from the enterprise search domain. The results and insights from this campaign served as a basis for further improving and refining the proposed criteria. The report closes with a practical step-by-step tutorial for conducting an evaluation using the methodology as described.