SMART Task
SeMantic AnsweR Type prediction task
NOTICE: The final call for submissions has been posted to the Semantic Web mailing list, WikiCFP, and the DBpedia Forum.
NOTICE: Due to Covid-19, ISWC 2020 has decided to go virtual. The SMART Task Challenge will also take place virtually.

Task Description

The SMART task provides a dataset for answer type prediction. Question Answering is a popular task in Natural Language Processing and Information Retrieval, in which the goal is to answer a natural language question directly, going beyond document retrieval. Question and answer type classification plays a key role in question answering. Questions can be broadly classified by their Wh-terms (Who, What, When, Where, Which, Whom, Whose, Why). Answer type classification, in turn, determines the type of the expected answer based on the question. In the literature, such answer type classification is typically performed as a short-text classification task over a small set of coarse-grained types, for instance the 6 coarse or 50 fine types of the TREC QA task. A more granular answer type classification becomes possible with popular Semantic Web ontologies such as DBpedia (~760 classes) and Wikidata (~50K classes).

In this challenge, given a question in natural language, the task is to predict the type of the answer using a set of candidate classes from a target ontology.
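
As a rough illustration of this interface, a participating system can be viewed as a function that maps a question string to an answer category and a ranked list of candidate types. The sketch below is hypothetical: the function name, the keyword heuristics, and the fallback type list are illustrative assumptions, not part of any released challenge code.

  # Hypothetical sketch of the task interface; heuristics and names are illustrative only.
  from typing import List, Tuple

  def predict_answer_type(question: str) -> Tuple[str, List[str]]:
      """Return an answer category ('resource', 'literal', or 'boolean')
      and a ranked list of answer types from the target ontology."""
      q = question.lower()
      if q.startswith(("is ", "are ", "does ", "did ")):
          return "boolean", ["boolean"]
      if q.startswith("how many"):
          return "literal", ["number"]
      if q.startswith("when"):
          return "literal", ["date"]
      # A real system would rank ontology classes (e.g. dbo:* or wd:Q*) here.
      return "resource", ["dbo:Person", "dbo:Agent"]

  print(predict_answer_type("How many employees does IBM have?"))  # ('literal', ['number'])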

Usage

If you use the dataset in your work, please cite the following paper:

@article{smarttask2020,
title={{SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge}},
author={Nandana Mihindukulasooriya and Mohnish Dubey and Alfio Gliozzo and Jens Lehmann and Axel-Cyrille Ngonga Ngomo and Ricardo Usbeck},
journal={CoRR},
year={2020},
volume={abs/2012.00555},
url={https://arxiv.org/abs/2012.00555}
}
  • The train and test splits are available in our GitHub repo.
  • We are in the process of setting up one-click benchmarking. For the time being, please contact us to report your results.
Every data item in the dataset consists of the following fields: a question id, the question text in natural language, an answer category, and a list of answer types (see the Dataset section below for details).

Leaderboard

SMART - DBpedia dataset

System Accuracy NDCG@5 NDCG@10
Setty et al. 0.98 0.80 0.79
Nikas et al. 0.96 0.78 0.76
Perevalov et al. 0.98 0.76 0.73
Kertkeidkachorn et al. 0.96 0.75 0.72
Ammar et al. 0.94 0.62 0.61
Vallurupalli et al. 0.88 0.54 0.52
Steinmetz et al. 0.74 0.54 0.52
Bill et al. 0.79 0.31 0.30

SMART - Wikidata dataset

System Accuracy MRR
Setty et al. 0.97 0.68
Kertkeidkachorn et al. 0.96 0.59
Vallurupalli et al. 0.85 0.40

References

[2] Question Embeddings for Semantic Answer Type Prediction
Eleanor Bill and Ernesto Jiménez-Ruiz
[3] Hierarchical Contextualized Representation Models for Answer Type Prediction
Natthawut Kertkeidkachorn, Rungsiman Nararatwong, Phuc Nguyen, Ikuya Yamada, Hideaki Takeda, and Ryutaro Ichise
[7] COALA - A Rule-Based Approach to Answer Type Prediction
Nadine Steinmetz and Kai-Uwe Sattler
[8] Fine and Ultra-Fine Entity Type Embeddings for Question Answering
Sai Vallurupalli, Jennifer Sleeman, and Tim Finin

Example Questions and Answer Types

The following table illustrates some example questions and expected answer types from DBpedia ontology and Wikidata ontology.
Question  DBpedia Answer Type  Wikidata Answer Type
Who is the heaviest player of the Chicago Bulls? dbo:BasketballPlayer wd:Q3665646
Which languages were influenced by Perl? dbo:ProgrammingLanguage wd:Q9143
Give me all actors starring in movies directed by and starring William Shatner. dbo:Actor wd:Q33999
How many employees does IBM have? number number

Dataset

We provide two datasets for this task, one using the DBpedia ontology and the other using the Wikidata ontology. Both follow the structure shown below.

Each question has (a) a question id, (b) the question text in natural language, (c) an answer category ("resource"/"literal"/"boolean"), and (d) one or more answer types.

If the category is "resource", the answer types are ontology classes from either the DBpedia ontology or the Wikidata ontology. If the category is "literal", the answer types are either "number", "date", or "string". If the category is "boolean", the answer type is always "boolean".


  [
    {
      "id": "dbpedia_1",
      "question": "Who are the gymnasts coached by Amanda Reddin?",
      "category": "resource",
      "type": ["dbo:Gymnast", "dbo:Athlete", "dbo:Person", "dbo:Agent"]
    },
    {
      "id": "dbpedia_2",
      "question": "How many superpowers does wonder woman have?",
      "category": "literal",
      "type": ["number"]
    },
    {
      "id": "dbpedia_3",
      "question": "When did Margaret Mead marry Gregory Bateson?",
      "category": "literal",
      "type": ["date"]
    },
    {
      "id": "dbpedia_4",
      "question": "Is Azerbaijan a member of European Go Federation?",
      "category": "boolean",
      "type": ["boolean"]
    }
  ]
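
For example, a split can be loaded and its answer categories tallied with a short script such as the one below; the file name smarttask_dbpedia_train.json is an assumption about how the downloaded training file is named.

  # Minimal sketch for loading a SMART split and tallying categories.
  # The file name below is an assumption, not an official path.
  import json
  from collections import Counter

  with open("smarttask_dbpedia_train.json", encoding="utf-8") as f:
      questions = json.load(f)

  print(len(questions), "questions:", dict(Counter(q["category"] for q in questions)))

  # Each item exposes the fields shown above: id, question, category, type.
  first = questions[0]
  print(first["id"], "->", first["category"], first["type"])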
        

Dataset statistics:

The DBpedia dataset contains 21,964 (train - 17,571, test - 4,393) questions and the Wikidata dataset contains 22,822 (train - 18,251, test - 4,571) questions.

DBpedia training set consists of 9,584 resource questions, 2,799 boolean questions, and 5,188 literal (number - 1,634, date - 1,486, string - 2,068) questions.

Wikidata training set consists of 11,683 resource questions, 2,139 boolean questions, and 4,429 literal questions.

Evaluation Metrics

For each natural language question in the test set, the participating systems are expected to provide two predictions: answer category and answer type. Answer category can be either "resource", "literal" or "boolean".

If answer category is "resource", the answer type should be an ontology class (DBpedia or Wikidata, depending on the dataset). The systems could predict a ranked list of classes from the corresponding ontology. If answer category is "literal", the answer type can be either "number", "date" or "string".

Category prediction will be treated as a multi-class classification problem, and the accuracy score will be used as the metric. For type prediction, we will use lenient NDCG@k with linear decay, following the formulation of Balog and Neumayer.

We will provide more details on evaluation and the evaluation scripts soon.
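
Until then, the sketch below is only a rough, unofficial illustration of how the two metrics can be computed. The gain function, the hierarchy distance function dist, the decay parameter h, and the construction of the ideal ranking are assumptions based on Balog and Neumayer's formulation, not the official evaluation code.

  # Unofficial sketch of category accuracy and lenient NDCG@k with linear decay.
  # Assumption: the gain of a predicted type decays linearly with its distance
  # (e.g. number of edges in the ontology hierarchy) from the closest gold type.
  import math
  from typing import Callable, List

  def category_accuracy(pred: List[str], gold: List[str]) -> float:
      return sum(p == g for p, g in zip(pred, gold)) / len(gold)

  def linear_gain(pred_type: str, gold_types: List[str],
                  dist: Callable[[str, str], int], h: int = 5) -> float:
      """Gain in [0, 1]: 1 for an exact match, decreasing linearly with
      hierarchy distance and reaching 0 at distance h."""
      d = min(dist(pred_type, g) for g in gold_types)
      return max(1.0 - d / h, 0.0)

  def ndcg_at_k(pred_types: List[str], gold_types: List[str],
                dist: Callable[[str, str], int], k: int = 5, h: int = 5) -> float:
      gains = [linear_gain(t, gold_types, dist, h) for t in pred_types[:k]]
      dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
      # Ideal DCG: gold types (gain 1.0) ranked at the top positions.
      ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(gold_types))))
      return dcg / ideal if ideal > 0 else 0.0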

Submission Details

Participants are requested to submit the system output for the test data in the same format as the training data. In addition, participants are requested to submit a system description, which will be included in a joint ISWC challenge proceedings volume in CEUR. System descriptions must be in English, either in PDF or HTML, formatted in the LNCS style, and no longer than 12 pages. Submissions can be sent via email or through the Slack workspace. The accepted systems will have the opportunity to present their results during the ISWC 2020 poster and demo session.

Important Dates

Date Description
06 May 2020 Release of the training set.
07 September 2020 Release of the test set.
28 September 2020 Submission of system output and system description.
1 October 2020 Publication of results and notification of acceptance for presentation.
07 October 2020 Camera-ready submission.
2-6 November 2020 ISWC 2020 conference (virtual).