Tasks
Task definition: This is a standard ad-hoc retrieval task, which measures information retrieval effectiveness with respect to user input in the form of queries. No further user-system interaction is assumed, although automatic blind feedback or query expansion mechanisms are allowed to improve the system ranking. The ad-hoc setting is the standard setting for an information retrieval system: without prior knowledge about the user need or context, the system is required to produce a relevance-ranked list of documents based entirely on the query and the features of the collection documents. For CHiC, this task will also serve to establish a baseline for system performance. We will test monolingual, bilingual and multilingual retrieval in three major European languages: English, French and German.
Topics: Topics are taken from real-life Europeana queries and consist of a mixture of topical and named-entity queries. Navigational queries are rarely seen in Europeana, whereas queries for people, places and works (named entities) occur very often. The 50 short topics, given in title format only (e.g. "Eiffel tower"), reflect real expressed user needs and are distributed according to the query category statistics (mostly named entities, some topical queries, etc.) observed in a previous study of a cultural heritage digital library.
Expected results: Participants are expected to submit relevance-ranked result lists for all 50 topics in TREC-style format. More specifications on the result formatting will be released later.
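As an illustration only (the official formatting specification is still to come), the common TREC run format has one line per retrieved document, giving the topic identifier, the constant Q0, the document identifier, the rank, the retrieval score and a run tag. A minimal Python sketch that writes such a file, with purely hypothetical topic and document identifiers:

    # Minimal sketch of writing a TREC-style run file; the topic and document
    # identifiers and the run tag are hypothetical placeholders, and the official
    # CHiC formatting specification may differ.
    results = {
        "CHIC-001": [("europeana/09404/ABC123", 14.7), ("europeana/09404/DEF456", 13.9)],
    }
    run_tag = "mygroup-mono-en-1"
    with open(run_tag + ".txt", "w") as out:
        for topic_id, ranked_docs in results.items():
            for rank, (doc_id, score) in enumerate(ranked_docs, start=1):
                out.write(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")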
Relevance assessments: Relevance assessments will be done manually: for each query, an assumed information need is first generated and described collaboratively (this description will also be used in later editions), and the pooled documents are then assessed for relevance with respect to the query and this information need. The assessments take the perspective of an average user (we assume that the majority of users typing that particular query would have that particular information need).
Evaluation metrics: The evaluation metrics for the ad-hoc task will be the standard information retrieval measures of precision and recall, in particular mean average precision (MAP) and precision@k.
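For reference, a minimal sketch of how these two measures can be computed from a ranked result list and a set of relevant document identifiers (the function and variable names are illustrative; official scoring would typically rely on standard tooling such as trec_eval):

    # Illustrative computation of precision@k and MAP for one run;
    # `runs` maps topic ids to ranked document id lists, `qrels` maps
    # topic ids to sets of relevant document ids.
    def precision_at_k(ranking, relevant, k):
        """Fraction of the top-k retrieved documents that are relevant."""
        return sum(1 for doc in ranking[:k] if doc in relevant) / k

    def average_precision(ranking, relevant):
        """Mean of the precision values at the ranks of the relevant documents."""
        hits, precisions = 0, []
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                precisions.append(hits / rank)
        return sum(precisions) / len(relevant) if relevant else 0.0

    def mean_average_precision(runs, qrels):
        """MAP: the mean of the per-topic average precision values."""
        return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)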
Task definition: This task requires systems to present a list of 12 objects (representing the first Europeana results page) which are relevant to the query and provide a particularly good overview of the different object types and categories. It is targeted towards a casual user who would like to see the "best" documents, possibly sorted into "must sees" and "other possibilities." The task is about returning diverse objects and resembles the diversity tasks of the TREC Interactive track or the ImageCLEF photo tasks. For CHiC, this task reflects a typical user of a cultural heritage information system who would like to get an overview of what the collection offers with respect to a certain concept, or what the best alternatives are. It is also a pilot task for this type of data collection.
Documents returned should be as diverse as possible with respect to the following dimensions (a simple selection sketch is given after the list):
- media type of object (text, image, audio, video)
- content provider
- query category
- field match (which metadata field contains a query term)
- other features to be described / suggested by participants, e.g. other query categories.
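The sketch below shows one simple way such a result page could be assembled (purely illustrative, not a prescribed method): a greedy re-ranking that walks down a relevance-ordered candidate list and prefers objects covering facet values not yet present among the 12 selected objects. The facet field names mirror the list above; the candidate records are an assumed structure.

    # Illustrative greedy selection of 12 diverse objects from a relevance-ranked
    # candidate list; each candidate is assumed to be a dict carrying the facet
    # fields named above. This is a sketch, not a prescribed algorithm.
    FACETS = ("media_type", "provider", "query_category", "field_match")

    def select_diverse(candidates, k=12):
        seen = {facet: set() for facet in FACETS}
        selected, remaining = [], list(candidates)   # assumed sorted by relevance
        while remaining and len(selected) < k:
            # Pick the highest-ranked candidate that adds the most facet values
            # not yet represented in the selection.
            gains = [sum(doc[f] not in seen[f] for f in FACETS) for doc in remaining]
            best_idx = max(range(len(remaining)), key=lambda i: (gains[i], -i))
            best = remaining.pop(best_idx)
            selected.append(best)
            for facet in FACETS:
                seen[facet].add(best[facet])
        return selected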
We will test monolingual, bilingual and multilingual retrieval in three major European languages: English, French and German.
Topics: Topics are taken from real-life Europeana queries and consist of a mixture of topical and named-entity queries. The 25 topics reflect real expressed user needs and are distributed according to query category statistics (mostly named entities, some topical queries, etc.), but will be enhanced with suggested query categories that capture the different ambiguous aspects of a topic (e.g. topic = "Chardonne", categories: person, place). Participants may suggest further query categories.
Expected results: Participants are expected to submit 12 relevant results for all 25 topics in TREC-style format. More specifications on the result formatting will be released later.
Relevance assessments: Relevance assessments will be done manually: for each query, an assumed information need is first generated and described collaboratively (this description will also be used in later editions), and the pooled documents are then assessed for their relevance with respect to the query, the information need and their variability / diversity.
If possible, we will collect assessments from two groups, cultural heritage experts and "naive" users of cultural heritage information systems, in order to compare their judgements of relevance and variability.
Evaluation metrics: The evaluation metrics for the variability task will be the standard information retrieval measures of precision, in particular mean average precision (MAP) and precision@k, as well as diversity measures used in the TREC Interactive track, such as cluster recall and intent-aware precision, which may be adapted to the diversity requirements set forth in this task.
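As an illustration of these diversity measures (a sketch that assumes relevance assessments labelling each relevant document with the subtopics or clusters it covers; the exact adaptation to this task is still open, and intent weights are taken as uniform here):

    # Sketch of cluster recall@k and (uniformly weighted) intent-aware precision@k.
    def cluster_recall_at_k(ranking, doc_subtopics, all_subtopics, k=12):
        """Fraction of the known subtopics covered by the top-k documents."""
        covered = set()
        for doc in ranking[:k]:
            covered.update(doc_subtopics.get(doc, set()))
        return len(covered) / len(all_subtopics) if all_subtopics else 0.0

    def precision_ia_at_k(ranking, relevant_per_subtopic, k=12):
        """Precision@k averaged over subtopics, all subtopics weighted equally."""
        per_intent = [sum(1 for doc in ranking[:k] if doc in relevant) / k
                      for relevant in relevant_per_subtopic.values()]
        return sum(per_intent) / len(per_intent) if per_intent else 0.0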
Task definition: The task requires systems to present a ranked list of at most 10 related concepts for a query, in order to semantically enrich the query and/or to guess the user's information need or original query intent. Related concepts can be extracted from Europeana data (internal information), from other resources in the LOD cloud, or from other external resources (e.g. Wikipedia).
Europeana already enriches about 30% of its metadata objects with concept and place names (included in the test collection). It uses the following vocabularies for these semantic enrichments, which can be explored further as well:
- GeoNames
- GEMET
- DBPedia
Semantic enrichment is an important task in information systems such as Europeana, where queries are short and therefore often ambiguous; it can support the information retrieval process either interactively (the user is asked for clarification, e.g. "Did you mean?") or automatically (the query is automatically expanded with semantically related concepts to increase the likelihood of search success). For CHiC, this task resembles a typical user interaction in which the system reacts to an ambiguous query with a clarification request (or with a result output as required in the variability task). We will offer the task and topics in three major European languages: English, French and German.
Additional collections: For semantic enrichment, the Europeana Linked Open Data collections can also be used: in a pilot project, Europeana released metadata on 2.5 million objects as linked open data. The data is represented in the Europeana Data Model (RDF) and encompasses collections from ca. 300 content providers. Other external resources are allowed but need to be specified in the participants' descriptions. The objects described in the LOD dataset are included in the Europeana test collection, but the RDF format might be more convenient for accessing object enrichments.
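As one possible way of gathering candidate enrichments from the LOD cloud (a minimal sketch against the public DBpedia SPARQL endpoint; the endpoint choice, the query pattern and the example resource Eiffel_Tower are assumptions, not requirements of the task):

    import requests

    # Sketch: fetch labels of DBpedia resources sharing a category with an example
    # query resource, as candidate semantic enrichments. Endpoint and query pattern
    # are illustrative choices only.
    SPARQL_ENDPOINT = "http://dbpedia.org/sparql"
    QUERY = """
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?label WHERE {
      <http://dbpedia.org/resource/Eiffel_Tower> dct:subject ?category .
      ?related dct:subject ?category ;
               rdfs:label ?label .
      FILTER (lang(?label) = "en")
    } LIMIT 10
    """

    response = requests.get(SPARQL_ENDPOINT,
                            params={"query": QUERY,
                                    "format": "application/sparql-results+json"})
    for binding in response.json()["results"]["bindings"]:
        print(binding["label"]["value"])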
Topics: Topics are taken from real-life Europeana queries and consist of a mixture of topical and named-entity queries. The 25 topics reflect real expressed user needs and are distributed according to query category statistics (mostly named entities, some topical queries, etc.).
Expected results: Participants are expected to submit a ranked list of 10 distinct terms or phrases for each of the 25 topics, which express semantic enrichments of the query in the respective language and could be used for query expansion. More specifications on the result formatting will be released later.
Relevance assessments: Relevance will be assessed in two phases:
(1) First, all submitted enrichments will be assessed manually with respect to their use in an interactive query expansion environment (e.g. "Does this suggestion make sense with respect to the original query?").
The summary of the manual assessment on a 3-point scale (definitely relevant, maybe relevant, not relevant) is now available: Summary SE-Task. Precision (strong) is the proportion of "definitely relevant" suggestions among all suggestions, averaged over the 25 queries. Precision (weak) is the proportion of "definitely relevant" and "maybe relevant" suggestions among all suggestions, averaged over the 25 queries.
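In other words (a small sketch of the two scores, with hypothetical assessment data keyed by query):

    # Sketch of the strong / weak precision scores over the 3-point scale;
    # `assessments` maps each query to the labels of its submitted suggestions
    # (the example data is hypothetical).
    assessments = {
        "CHIC-001": ["definitely", "maybe", "not", "definitely"],
        "CHIC-002": ["maybe", "not", "not"],
    }

    def mean_precision(assessments, accepted_labels):
        per_query = [sum(label in accepted_labels for label in labels) / len(labels)
                     for labels in assessments.values() if labels]
        return sum(per_query) / len(per_query)

    precision_strong = mean_precision(assessments, {"definitely"})
    precision_weak = mean_precision(assessments, {"definitely", "maybe"})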
(2) The submitted terms and phrases will be used in a query expansion experiment with a standard IR system, i.e. the enrichments will be individually added to the query and submitted to the system. The results will be assessed according to ad-hoc retrieval standards.
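Schematically, this second phase can be pictured as follows (a sketch; `search` stands in for whatever standard IR system is used and is not a real API):

    # Sketch of phase (2): each submitted enrichment is individually appended to
    # the original query and run against a standard IR system; `search` is a
    # placeholder function, not a real API.
    def expansion_runs(original_query, enrichments, search, k=1000):
        runs = {}
        for enrichment in enrichments:
            expanded_query = f"{original_query} {enrichment}"
            runs[enrichment] = search(expanded_query, k)   # ranked document list
        return runs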
Evaluation metrics: The evaluation metrics for the semantic enrichment task will be the standard information retrieval measure of precision (plus precision@1 and precision@3) for the first phase, which assesses only the submitted enrichments, and the standard ad-hoc information retrieval measures for the second phase, which assesses the submitted enrichments as query expansion variants.