Datasets

Find us on Hugging Face
AI2's latest open-source models and datasets can be found on our Hugging Face page.

Viewing 1-10 of 86 datasets

IfQA Counterfactual Reasoning Benchmark
3,800 open-domain questions designed to assess counterfactual reasoning abilities of NLP models
Aristo • 2023
Counterfactual reasoning benchmark introduced in the EMNLP 2023 paper titled "IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions".
Digital Socrates
DS Critique Bank contains annotated critiques of answers and explanations from "student" models.
Aristo • 2023
DS Critique Bank (DSCB) is a dataset of multiple-choice questions with associated answers and explanations provided by "student models", along with "critiques" of the explanations provided by "critique models". Many of the instances have human annotations.
Satlas Explorer
Satlas Explorer applies ML on satellite imagery to derive a wide range of geospatial data.
2023
Satlas Explorer is a demonstration of the use of AI to extract a variety of interesting data from satellite imagery, which can provide a near-real-time understanding of how our planet is changing. The current release contains predictions for: (1) the…
ParRoT (Parts and Relations of Things)
11,720 “X relation Y?” True/False questions on parts of everyday things and relational information about these parts
Aristo • 2023
This is the dataset in "Do language models have coherent mental models of everyday things?", ACL 2023.
Belief and Reasoning Dataset
BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
Aristo • 2023
BaRDa is a new belief and reasoning dataset for evaluating the factual correctness ("truth") and reasoning accuracy ("rationality", or "honesty") of new language models. It was created in collaboration with, and with the support of, the Open Philanthropy…
Lila
A math reasoning benchmark of over 140K natural language questions annotated with Python programs
Aristo • 2022
A comprehensive benchmark for mathematical reasoning with over 140K natural language questions annotated with Python programs and natural language instructions. The dataset comes with multiple splits: Lila-IID (train, dev, test), Lila-OOD (train, dev, test…
WANLI: Worker-and-AI NLI
An NLI dataset created via a collaborative approach between language models and crowdworkers
2022
WANLI is an NLI dataset of 108K examples created through a novel approach for dataset creation based on worker and AI collaboration, which brings together the generative strength of language models and the evaluative strength of humans. Models trained on…
Entailer
Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022
Aristo • 2022
TeachMe
Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022
Aristo • 2022
Natural Instructions
A large benchmark of tasks and their language instructions
2022
The goal of the Natural-Instructions project is to provide a good-quality benchmark for measuring generalization to unseen tasks. This generalization hinges upon (and benefits from) understanding and reasoning with natural language instructions that plainly and…


