The overall organization will follow a CLEF-style evaluation process with a shared dataset composed of a collection of documents, a set of topics, and a set of relevance assessments.
The languages of the collections for the task are (ISO 639-1 codes in parentheses):
- English (en);
- French (fr);
- German (de);
- Greek (el);
- Italian (it);
- Spanish (es);
- Swedish (sv);
- Ukrainian (uk).
Topics will be available in the above languages and, in addition:
- Chinese (zh);
- Japanese (ja).
In the first round, we plan to evaluate the systems in a classic multilingual lexical search fashion. In the subsequent rounds, the multilingual semantic search aspects will become more prominent: for example, systems may be asked to highlight technical concepts in both the topics and the documents, and to show the contextual meaning of each concept, in order to improve the readability of the documents and the effectiveness of the system.
For each of the two subtasks described in the following, we welcome two types of submissions:
- monolingual runs, where the language of the collection and the language of the topics are the same;
- bilingual runs, where the language of the collection and the language of the topics are different.
Subtask 1 - High Precision:
In this subtask, participants are required to build systems that will help the general public to efficiently retrieve the most relevant documents on the Web concerning COVID-19. The main focus of this subtask is on the top-ranked documents; evaluation measures such as Precision at 5 and 10 documents, as well as Normalized Discounted Cumulative Gain, will be used to compare systems.
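For reference, the top-ranked-focused measures mentioned above can be sketched as follows. This is an illustrative implementation, not the official evaluation script; the function names and inputs are assumptions.

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / k

def ndcg_at_k(ranked_ids, relevance, k):
    """Normalized Discounted Cumulative Gain over the top-k documents.
    `relevance` maps document id -> graded relevance (0 = not relevant)."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```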
Subtask 2 - High Recall:
In this subtask, the focus is on the problem of finding as many relevant documents as possible with the least effort. Given a limited amount of resources, such as a time limit and scarce expert availability in times of crisis, there will be a limit on the maximum number of documents that can be retrieved in order to build a set of relevant documents to be delivered to the general public. Evaluation measures like Recall@k and Area Under the ROC Curve will be used to compare the systems.
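Recall@k can be sketched as follows; again, this is an illustrative example under assumed inputs, not the official evaluation script.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k retrieved."""
    if not relevant_ids:
        return 0.0
    found = sum(1 for d in ranked_ids[:k] if d in relevant_ids)
    return found / len(relevant_ids)
```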
In the first round, the systems will work without relevance information. From the second round onwards, participants can use the relevance assessments from previous rounds to optimize their systems.
The topics have been created by selecting 1) a subset of the queries created for the TREC-COVID Task (courtesy of the TREC-COVID Task organizers) and 2) a selection of queries made available in the Bing search dataset for Coronavirus Intent, which includes queries from all over the world with an explicit or implicit intent related to Coronavirus or Covid-19.
Topics are structured in the following way:
<topic number="topic identifier" xml:lang="ISO 639-1 code">
<keyword>keyword based query</keyword>
<conversational>the query as a question posed by the user</conversational>
<explanation>a more detailed explanation of what the set of retrieved documents should look like</explanation>
</topic>
The keyword field represents the “traditional” way a user performs a search on a Web search engine. It is basically a set of keywords, e.g. "surgical mask protection".
The conversational field expresses the same information need in a verbal way, e.g. "does a surgical mask protect from covid-19?".
The explanation field is used to provide information to the assessors when performing relevance assessments, e.g. "The documents retrieved should contain information about …".
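Topics in this structure can be read with standard XML tooling. The following sketch uses Python's ElementTree with a hypothetical topic matching the structure above; the actual topic files from the downloads may differ in wrapping and encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic following the structure above (illustrative only).
topic_xml = """
<topic number="1" xml:lang="en">
  <keyword>surgical mask protection</keyword>
  <conversational>does a surgical mask protect from covid-19?</conversational>
  <explanation>The documents retrieved should contain information
  about the protection offered by surgical masks.</explanation>
</topic>
"""

topic = ET.fromstring(topic_xml)
number = topic.get("number")
# xml:lang is namespace-qualified under the built-in XML namespace.
lang = topic.get("{http://www.w3.org/XML/1998/namespace}lang")
keyword = topic.findtext("keyword")
```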
Please find below the links to download the topics for each round.
After participants submit their runs, a subset of documents from each run will be pooled for each topic in order to obtain a sample of documents to judge.
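Pooling can be sketched as taking the union of the top-ranked documents from every submitted run, topic by topic. The function below is an illustrative example under an assumed run format, not the organizers' actual pooling procedure.

```python
def pool_documents(runs, depth):
    """Build a judgment pool per topic: the union of the top-`depth`
    documents from each submitted run.
    `runs` is a list of dicts mapping topic id -> ranked list of doc ids."""
    pool = {}
    for run in runs:
        for topic_id, ranked in run.items():
            pool.setdefault(topic_id, set()).update(ranked[:depth])
    return pool
```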
Please find below the links to download the relevance judgements for each round.
Participants are provided with a single repository for all the tasks they take part in.
The repository contains the runs, resources, code, and report of each participant.
The repository is organised as follows:
submission: this folder contains the runs submitted for the different tasks in the different evaluation rounds.
score: this folder contains the performance scores of the submitted runs.
code: this folder contains the source code of the developed system.
resource: this folder contains (language) resources created during the participation.
report: this folder contains the rolling technical report describing the techniques applied and insights gained during participation, round after round.
Covid-19 MLIA Eval consists of three tasks run in three rounds.
The submission and score folders are organized into sub-folders for each task and round as follows:
submission/task1/round1: for the runs submitted to the first round of the first task. The structure is similar for the other tasks and rounds.
score/task1/round1: for the performance scores of the runs submitted to the first round of the first task. The structure is similar for the other tasks and rounds.
Participants who do not take part in a given task or round can simply delete the corresponding sub-folders.
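Putting the folder names above together, the repository skeleton can be created as in the following sketch (only the first task and round are shown; the deeper sub-folder naming for the other tasks and rounds follows the same pattern):

```shell
# Create the participant repository layout described above
# (illustrative; add further taskN/roundN sub-folders as needed).
mkdir -p submission/task1/round1 \
         score/task1/round1 \
         code \
         resource \
         report
```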
The goal of Covid-19 MLIA Eval is to speed up the creation of multilingual information access systems and (language) resources for Covid-19, as well as to openly share these systems and resources as much as possible. Therefore, participants are more than encouraged to share their code and any additional (language) resources they have used or created.
All the contents of these repositories are released under the Creative Commons Attribution-ShareAlike 4.0 International License.
Organizers share contents common to all participants through the Multilingual Semantic Search task repository.
The repository is organised as follows:
topics: this folder contains the topics to be used for the task.
ground-truth: this folder contains the ground-truth, i.e. the qrels, for the task.
report: this folder contains the rolling technical report describing the overall outcomes of the task, round after round.
Covid-19 MLIA Eval runs in three rounds.
ground-truth folders are organized into sub-folders for each round (e.g. ground-truth/round1 for the first round).
All the contents of this repository are released under the Creative Commons Attribution-ShareAlike 4.0 International License.
Rolling Technical Report:
The rolling technical report should be formatted according to the Springer LNCS format, using either the LaTeX template or the Word template. LaTeX is the preferred format.