SMART Task
SeMantic AnsweR Type prediction task
NOTICE: Due to Covid-19, ISWC 2020 has decided to go virtual. The SMART Task Challenge will also take place virtually.

Task Description

The SMART task provides a dataset for answer type prediction. Question Answering is a popular task in Natural Language Processing and Information Retrieval, in which the goal is to answer a natural language question directly, going beyond document retrieval. Question and answer type classification plays a key role in question answering. Questions can generally be classified based on Wh-terms (Who, What, When, Where, Which, Whom, Whose, Why). Similarly, answer type classification determines the type of the expected answer based on the query. In the literature, answer type classification is performed as a short-text classification task using a set of coarse-grained types, for instance the 6 or 50 types of the TREC QA task. A more granular answer type classification is possible with popular Semantic Web ontologies such as DBpedia (~760 classes) and Wikidata (~50K classes).

In this challenge, given a question in natural language, the task is to predict the type of the answer using a set of candidate types from a target ontology.

Leaderboard

SMART - DBpedia dataset

System                   Accuracy   NDCG@5   NDCG@10
Setty et al.             0.98       0.80     0.79
Nikas et al.             0.96       0.78     0.76
Perevalov et al.         0.98       0.76     0.73
Kertkeidkachorn et al.   0.96       0.75     0.72
Ammar et al.             0.94       0.62     0.61
Vallurupalli et al.      0.88       0.54     0.52
Steinmetz et al.         0.74       0.54     0.52
Bill et al.              0.79       0.31     0.30

SMART - Wikidata dataset

System                   Accuracy   MRR
Setty et al.             0.97       0.68
Kertkeidkachorn et al.   0.96       0.59
Vallurupalli et al.      0.85       0.40

References

[1] "A Methodology for Hierarchical Classification of Semantic Answer Types of Questions".
Ammar Ammar, Shervin Mehryar, and Remzi Celebi
[2] "Question Embeddings for Semantic Answer Type Prediction".
Eleanor Bill and Ernesto Jiménez-Ruiz
[3] "Hierarchical Contextualized Representation Models for Answer Type Prediction".
Natthawut Kertkeidkachorn, Rungsiman Nararatwong, Phuc Nguyen, Ikuya Yamada, Hideaki Takeda, and Ryutaro Ichise
[4] "Two-stage Semantic Answer Type Prediction for QA using BERT and Class-Specificity Rewarding".
Christos Nikas, Pavlos Fafalios, and Yannis Tzitzikas
[5] "Augmentation-based Answer Type Classification of the SMART dataset".
Aleksandr Perevalov and Andreas Both
[6] "Semantic Answer Type Prediction using BERT".
Vinay Setty and Krisztian Balog
[7] "COALA: A Rule-Based Approach to Answer Type Prediction".
Nadine Steinmetz and Kai-Uwe Sattler
[8] "Fine and Ultra-Fine Entity Type Embeddings for Question Answering".
Sai Vallurupalli and Jennifer Sleeman

Example Questions and Answer Types

The following table illustrates some example questions and their expected answer types in the DBpedia and Wikidata ontologies.

Question                                                                          DBpedia                   Wikidata
Who is the heaviest player of the Chicago Bulls?                                  dbo:BasketballPlayer      wd:Q3665646
Which languages were influenced by Perl?                                          dbo:ProgrammingLanguage   wd:Q9143
Give me all actors starring in movies directed by and starring William Shatner.   dbo:Actor                 wd:Q33999
How many employees does IBM have?                                                 number                    number

Dataset

We provide two datasets for this task, one using the DBpedia ontology and the other using the Wikidata ontology. Both follow the structure shown below.

Each question has (a) a question id, (b) the question text in natural language, (c) an answer category ("resource"/"literal"/"boolean"), and (d) one or more answer types.

If the category is "resource", the answer types are ontology classes from either the DBpedia or the Wikidata ontology. If the category is "literal", the answer type is either "number", "date", or "string". If the category is "boolean", the answer type is always "boolean".


  [
    {
      "id": "dbpedia_1",
      "question": "Who are the gymnasts coached by Amanda Reddin?",
      "category": "resource",
      "type": ["dbo:Gymnast", "dbo:Athlete", "dbo:Person", "dbo:Agent"]
    },
    {
      "id": "dbpedia_2",
      "question": "How many superpowers does wonder woman have?",
      "category": "literal",
      "type": ["number"]
    },
    {
      "id": "dbpedia_3",
      "question": "When did Margaret Mead marry Gregory Bateson?",
      "category": "literal",
      "type": ["date"]
    },
    {
      "id": "dbpedia_4",
      "question": "Is Azerbaijan a member of European Go Federation?",
      "category": "boolean",
      "type": ["boolean"]
    }
  ]
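Under the schema above, a split can be loaded and sanity-checked in a few lines. This is a minimal sketch, assuming the file is a JSON array of records as shown; the consistency rules it enforces are exactly the ones stated above.

```python
import json

VALID_CATEGORIES = {"resource", "literal", "boolean"}
LITERAL_TYPES = {"number", "date", "string"}

def validate(questions):
    """Check each record against the category/type rules described above."""
    for q in questions:
        assert q["category"] in VALID_CATEGORIES, q["id"]
        if q["category"] == "literal":
            assert all(t in LITERAL_TYPES for t in q["type"]), q["id"]
        elif q["category"] == "boolean":
            assert q["type"] == ["boolean"], q["id"]
        # For "resource", the types are ontology classes such as dbo:Gymnast.

def load_split(path):
    """Load one dataset file (a JSON array of question records)."""
    with open(path, encoding="utf-8") as f:
        questions = json.load(f)
    validate(questions)
    return questions
```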
        

Dataset statistics:

The DBpedia dataset contains 21,964 (train - 17,571, test - 4,393) questions and the Wikidata dataset contains 22,822 (train - 18,251, test - 4,571) questions.

The DBpedia training set consists of 9,584 resource questions, 2,799 boolean questions, and 5,188 literal questions (number - 1,634, date - 1,486, string - 2,068).

The Wikidata training set consists of 11,683 resource questions, 2,139 boolean questions, and 4,429 literal questions.

Evaluation Metrics

For each natural language question in the test set, participating systems are expected to provide two predictions: an answer category and an answer type. The answer category can be "resource", "literal", or "boolean".

If the answer category is "resource", the answer type should be an ontology class (DBpedia or Wikidata, depending on the dataset); systems may predict a ranked list of classes from the corresponding ontology. If the answer category is "literal", the answer type can be "number", "date", or "string".
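The official submission format will be announced with the evaluation scripts; a natural shape for one system-output record, mirroring the dataset schema (this is an assumption, not the confirmed format), would be:

```python
# Hypothetical system-output record, mirroring the dataset schema above;
# the official submission format will be announced with the scripts.
prediction = {
    "id": "dbpedia_1",                       # id of the test question
    "category": "resource",                  # predicted answer category
    "type": ["dbo:Gymnast", "dbo:Athlete"],  # ranked list of types, best first
}
```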

Category prediction will be treated as a multi-class classification problem, and accuracy will be used as the metric. For type prediction, we will use lenient NDCG@k with linear decay, as defined in the paper by Balog and Neumayer.
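Lenient NDCG@k can be sketched as follows. The exact gain function will be fixed by the evaluation scripts; the sketch below assumes, following Balog and Neumayer, that the gain of a predicted type decays linearly with its hierarchy distance to the closest gold type. The `distance` function and `max_depth` parameter are placeholders for the ontology-specific implementation.

```python
import math

def linear_gain(pred_type, gold_types, distance, max_depth):
    """Gain in [0, 1] that decays linearly with the hierarchy distance
    between the predicted type and the closest gold type (assumption)."""
    d = min(distance(pred_type, g) for g in gold_types)
    return max(0.0, 1.0 - d / (max_depth + 1))

def lenient_ndcg_at_k(predicted, gold_types, distance, max_depth, k=5):
    """NDCG@k over a ranked list of predicted types."""
    gains = [linear_gain(t, gold_types, distance, max_depth)
             for t in predicted[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal ranking: the gold types themselves at the top, each with gain 1.
    idcg = sum(1.0 / math.log2(i + 2)
               for i in range(min(len(gold_types), k)))
    return dcg / idcg if idcg > 0 else 0.0
```

An exact match in the first rank position scores 1.0, while near-misses in the type hierarchy still earn partial credit, which is what makes the metric "lenient".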

We will provide more details on evaluation and the evaluation scripts soon.

Important Dates

Date Description
06 May 2020 Release of the training set.
07 September 2020 Release of the test set.
28 September 2020 Submission of system output and system description.
1 October 2020 Publication of results and notification of Acceptance for Presentation.
07 October 2020 Camera-ready submission.
2-6 November 2020 ISWC Conference

Usage

  • The train and test splits are available in our GitHub repository.
  • We are in the process of setting up a one-click benchmarking pipeline. For the time being, please contact us to report your results.