SMART Task
SeMantic AnsweR Type prediction task
NOTICE: The final call for submissions has been posted to the Semantic Web mailing list, WikiCFP, and the DBpedia Forum.
NOTICE: Due to Covid-19, ISWC 2020 has decided to go virtual. The SMART Task Challenge will also take place virtually.

Task Description

The SMART task provides a dataset for answer type prediction. Question Answering is a popular task in Natural Language Processing and Information Retrieval, in which the goal is to answer a natural language question directly, going beyond document retrieval. Question and answer type classification plays a key role in question answering. Questions can be broadly classified by their Wh-terms (Who, What, When, Where, Which, Whom, Whose, Why). Answer type classification, in turn, determines the type of the expected answer based on the question. In the literature, such answer type classification is typically performed as a short-text classification task over a small set of coarse-grained types, for instance the 6 coarse or 50 fine types of the TREC QA task. A more granular answer type classification becomes possible with popular Semantic Web ontologies such as DBpedia (~760 classes) and Wikidata (~50K classes).

In this challenge, given a question in natural language, the task is to predict the type of the answer using a set of candidate classes from a target ontology.
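
As a rough illustration of this interface, a participating system can be viewed as a function that maps a question string to an answer category and a ranked list of candidate types. The sketch below is hypothetical: the function name, the keyword heuristics, and the fallback type list are illustrative assumptions, not part of any released challenge code.

  # Hypothetical sketch of the task interface; heuristics and names are illustrative only.
  from typing import List, Tuple

  def predict_answer_type(question: str) -> Tuple[str, List[str]]:
      """Return an answer category ('resource', 'literal', or 'boolean')
      and a ranked list of answer types from the target ontology."""
      q = question.lower()
      if q.startswith(("is ", "are ", "does ", "did ")):
          return "boolean", ["boolean"]
      if q.startswith("how many"):
          return "literal", ["number"]
      if q.startswith("when"):
          return "literal", ["date"]
      # A real system would rank ontology classes (e.g. dbo:* or wd:Q*) here.
      return "resource", ["dbo:Person", "dbo:Agent"]

  print(predict_answer_type("How many employees does IBM have?"))  # ('literal', ['number'])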

Usage

If you use the dataset in your work, please cite the following paper:

@article{smarttask2020,
title={{SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge}},
author={Nandana Mihindukulasooriya and Mohnish Dubey and Alfio Gliozzo and Jens Lehmann and Axel-Cyrille Ngonga Ngomo and Ricardo Usbeck},
journal={CoRR},
year={2020},
volume={abs/2012.00555},
url={https://arxiv.org/abs/2012.00555}
}
  • The train and test splits are available in our GitHub repo.
  • We are in the process of setting up one-click benchmarking. For the time being, please contact us to report your results.
Every data item in the dataset consists of the following fields: a question id, the question text in natural language, an answer category, and a list of answer types (see the Dataset section below for details).

Leaderboard

SMART - DBpedia dataset

System Accuracy NDCG@5 NDCG@10
Setty et al. 0.98 0.80 0.79
Nikas et al. 0.96 0.78 0.76
Perevalov et al. 0.98 0.76 0.73
Kertkeidkachorn et al. 0.96 0.75 0.72
Ammar et al. 0.94 0.62 0.61
Vallurupalli et al. 0.88 0.54 0.52
Steinmetz et al. 0.74 0.54 0.52
Bill et al. 0.79 0.31 0.30

SMART - Wikidata dataset

System Accuracy MRR
Setty et al. 0.97 0.68
Kertkeidkachorn et al. 0.96 0.59
Vallurupalli et al. 0.85 0.40

References

[2] Question Embeddings for Semantic Answer Type Prediction
Eleanor Bill and Ernesto Jiménez-Ruiz
[3] Hierarchical Contextualized Representation Models for Answer Type Prediction
Natthawut Kertkeidkachorn, Rungsiman Nararatwong, Phuc Nguyen, Ikuya Yamada, Hideaki Takeda, and Ryutaro Ichise
[7] COALA - A Rule-Based Approach to Answer Type Prediction
Nadine Steinmetz and Kai-Uwe Sattler
[8] Fine and Ultra-Fine Entity Type Embeddings for Question Answering
Sai Vallurupalli, Jennifer Sleeman, and Tim Finin

Example Questions and Answer Types

The following table illustrates some example questions and expected answer types from DBpedia ontology and Wikidata ontology.
Question  DBpedia Answer Type  Wikidata Answer Type
Who is the heaviest player of the Chicago Bulls? dbo:BasketballPlayer wd:Q3665646
Which languages were influenced by Perl? dbo:ProgrammingLanguage wd:Q9143
Give me all actors starring in movies directed by and starring William Shatner. dbo:Actor wd:Q33999
How many employees does IBM have? number number

Dataset

We provide two datasets for this task, one using the DBpedia ontology and the other using the Wikidata ontology. Both follow the structure shown below.

Each question has (a) a question id, (b) the question text in natural language, (c) an answer category ("resource"/"literal"/"boolean"), and (d) one or more answer types.

If the category is "resource", the answer types are ontology classes from either the DBpedia ontology or the Wikidata ontology. If the category is "literal", the answer types are either "number", "date", or "string". If the category is "boolean", the answer type is always "boolean".


  [
    {
      "id": "dbpedia_1",
      "question": "Who are the gymnasts coached by Amanda Reddin?",
      "category": "resource",
      "type": ["dbo:Gymnast", "dbo:Athlete", "dbo:Person", "dbo:Agent"]
    },
    {
      "id": "dbpedia_2",
      "question": "How many superpowers does wonder woman have?",
      "category": "literal",
      "type": ["number"]
    },
    {
      "id": "dbpedia_3",
      "question": "When did Margaret Mead marry Gregory Bateson?",
      "category": "literal",
      "type": ["date"]
    },
    {
      "id": "dbpedia_4",
      "question": "Is Azerbaijan a member of European Go Federation?",
      "category": "boolean",
      "type": ["boolean"]
    }
  ]
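
For example, a split can be loaded and its answer categories tallied with a short script such as the one below; the file name smarttask_dbpedia_train.json is an assumption about how the downloaded training file is named.

  # Minimal sketch for loading a SMART split and tallying categories.
  # The file name below is an assumption, not an official path.
  import json
  from collections import Counter

  with open("smarttask_dbpedia_train.json", encoding="utf-8") as f:
      questions = json.load(f)

  print(len(questions), "questions:", dict(Counter(q["category"] for q in questions)))

  # Each item exposes the fields shown above: id, question, category, type.
  first = questions[0]
  print(first["id"], "->", first["category"], first["type"])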
        

Dataset statistics:

The DBpedia dataset contains 21,964 (train - 17,571, test - 4,393) questions and the Wikidata dataset contains 22,822 (train - 18,251, test - 4,571) questions.

DBpedia training set consists of 9,584 resource questions, 2,799 boolean questions, and 5,188 literal (number - 1,634, date - 1,486, string - 2,068) questions.

Wikidata training set consists of 11,683 resource questions, 2,139 boolean questions, and 4,429 literal questions.

Evaluation Metrics

For each natural language question in the test set, the participating systems are expected to provide two predictions: answer category and answer type. Answer category can be either "resource", "literal" or "boolean".

If answer category is "resource", the answer type should be an ontology class (DBpedia or Wikidata, depending on the dataset). The systems could predict a ranked list of classes from the corresponding ontology. If answer category is "literal", the answer type can be either "number", "date" or "string".

Category prediction will be treated as a multi-class classification problem, and the accuracy score will be used as the metric. For type prediction, we will use lenient NDCG@k with linear decay, following the formulation of Balog and Neumayer.

We will provide more details on evaluation and the evaluation scripts soon.
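
Until then, the sketch below is only a rough, unofficial illustration of how the two metrics can be computed. The gain function, the hierarchy distance function dist, the decay parameter h, and the construction of the ideal ranking are assumptions based on Balog and Neumayer's formulation, not the official evaluation code.

  # Unofficial sketch of category accuracy and lenient NDCG@k with linear decay.
  # Assumption: the gain of a predicted type decays linearly with its distance
  # (e.g. number of edges in the ontology hierarchy) from the closest gold type.
  import math
  from typing import Callable, List

  def category_accuracy(pred: List[str], gold: List[str]) -> float:
      return sum(p == g for p, g in zip(pred, gold)) / len(gold)

  def linear_gain(pred_type: str, gold_types: List[str],
                  dist: Callable[[str, str], int], h: int = 5) -> float:
      """Gain in [0, 1]: 1 for an exact match, decreasing linearly with
      hierarchy distance and reaching 0 at distance h."""
      d = min(dist(pred_type, g) for g in gold_types)
      return max(1.0 - d / h, 0.0)

  def ndcg_at_k(pred_types: List[str], gold_types: List[str],
                dist: Callable[[str, str], int], k: int = 5, h: int = 5) -> float:
      gains = [linear_gain(t, gold_types, dist, h) for t in pred_types[:k]]
      dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
      # Ideal DCG: gold types (gain 1.0) ranked at the top positions.
      ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(gold_types))))
      return dcg / ideal if ideal > 0 else 0.0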

Submission Details

Participants are requested to submit the system output for the test data in the same format as the training data. In addition, participants are requested to submit a system description, which will be included in a joint ISWC challenge proceedings volume in CEUR. System descriptions must be in English, either in PDF or HTML, formatted in the LNCS style, and no longer than 12 pages. Submissions can be sent via email or through the Slack workspace. The accepted systems will have the opportunity to present their results during the ISWC 2020 poster and demo session.

Important Dates

Date Description
06 May 2020 Release of the training set.
07 September 2020 Release of the test set.
28 September 2020 Submission of system output and system description.
1 October 2020 Publication of results and notification of acceptance for presentation.
07 October 2020 Camera-ready submission.
2-6 November 2020 ISWC 2020 conference (virtual).