Log Data

The European Library (TEL) dataset

The TEL search/action logs are stored in a relational table and record the different types of actions and choices made by users. Each record represents one user action; the most significant fields are:

  • A numeric id identifying registered users, or "guest" otherwise;
  • User's IP address;
  • An automatically generated alphanumeric id, identifying sequential actions of the same user (sessions);
  • Query contents;
  • Name of the action that a user performed;
  • The corresponding collection's alphanumeric id;
  • Date and time of the action's occurrence.
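
As a minimal sketch of how such a table could be queried, the Python snippet below groups the actions of each session in chronological order. The table name tel_log and all column names are illustrative assumptions, not the actual TEL schema, and SQLite merely stands in for whatever relational database holds the logs.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: table and column names are assumptions made for
# illustration; they mirror the fields listed above, not the real TEL layout.
SCHEMA = """
CREATE TABLE tel_log (
    user_id       TEXT,  -- numeric id of a registered user, or 'guest'
    ip            TEXT,  -- user's IP address
    session_id    TEXT,  -- automatically generated alphanumeric session id
    query         TEXT,  -- query contents
    action        TEXT,  -- name of the action performed
    collection_id TEXT,  -- alphanumeric id of the collection
    ts            TEXT   -- date and time of the action (ISO 8601 assumed)
)
"""


def actions_by_session(conn):
    """Return {session_id: [(ts, action, query), ...]}, ordered by time."""
    sessions = defaultdict(list)
    rows = conn.execute(
        "SELECT session_id, ts, action, query "
        "FROM tel_log ORDER BY session_id, ts"
    )
    for session_id, ts, action, query in rows:
        sessions[session_id].append((ts, action, query))
    return dict(sessions)
```

Calling actions_by_session on a connection to a database that contains such a table yields one time-ordered list of actions per session, which is a convenient starting point for session-level analysis. Storing timestamps as ISO 8601 strings is assumed here so that plain string ordering matches chronological ordering.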

Three and a half years of log data will be released:

  • January 2007-June 2008, 1,900,000 records (distributed at LogCLEF 2009)
  • January 2009-December 2009, 760,000 records (distributed at LogCLEF 2010)
  • January 2010-December 2010, 950,000 records (to be distributed at LogCLEF 2011)

Sogou dataset

The Sogou query logs contain queries to the Chinese Sogou search engine. The data contains:

  • a user ID
  • the query terms
  • URL in the result ranking
  • user click information

We are currently negotiating a separate license agreement to give LogCLEF participants access to the Sogou log data.

Deutscher Bildungsserver (DBS) dataset

The quality-controlled "Deutscher Bildungsserver" is a clearinghouse for educational resources on the Web. It also contains content provided by the DIPF as well as descriptions and reviews of Web sites on education. The Internet resources (web sites) are described, checked for their quality, manually indexed and classified.

The logs were collected between September and November 2009. They are server logs in standard format, in which the searches and the results viewed can be observed. An excerpt is shown in table 2. The logs have been anonymized by partially obscuring the IP addresses of users: the two upper levels of server names or IP addresses have been hashed. This still allows sessions to be reconstructed from the data. Note that accesses by search engine bots have not been removed from the logs.

The logs make it possible to observe two types of user queries:

  • queries in search engines (in the referrer when DBS files were found using a search engine)
  • queries within the DBS (see query parameters in metasuche/qsuche)
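
The two query types above could be pulled out of individual log entries roughly as follows. This is a sketch under stated assumptions: the referrer parameter name q is a common convention of search engines rather than something the logs guarantee, and the helper names are hypothetical. Only the path markers metasuche and qsuche come from the description above.

```python
from urllib.parse import parse_qs, urlsplit

# Path fragments that mark a DBS-internal search request (taken from the
# description above); how they appear in real log lines is an assumption.
DBS_SEARCH_MARKERS = ("metasuche", "qsuche")


def referrer_query(referrer, param="q"):
    """Return the search terms carried in a referrer URL, or None.

    `param` is an assumption: many search engines use `q`, but the
    parameter name differs between engines.
    """
    values = parse_qs(urlsplit(referrer).query).get(param)
    return values[0] if values else None


def dbs_search_params(request_url):
    """Return the query parameters of a DBS-internal search request.

    Returns None when the request path contains none of the search markers.
    """
    parts = urlsplit(request_url)
    if not any(marker in parts.path for marker in DBS_SEARCH_MARKERS):
        return None
    return parse_qs(parts.query)
```

Because the names of the DBS search parameters are not specified here, dbs_search_params returns all parameters of a matching request and leaves it to the caller to decide which one holds the query terms.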