| Title | Data | Other Resources | Venue | Topic | Code |
|---|---|---|---|---|---|
| Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? | - | - | NAACL 2025 | - | - |
| GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration | - | - | ACL 2025 | calibration benchmark | yysung/advcalibration |
| No Questions are Stupid, but some are Poorly Posed: Understanding Poorly-Posed Information-Seeking Questions | - | - | ACL 2025 | question quality | nehasrikn/poorly-posed-questions |
| Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | - | - | ACL 2025 | evaluation methods | - |
| ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks | - | - | NAACL 2025 | adversarialness metric | - |
| Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA | - | Quizbowl collection | EMNLP 2024 | complementarity | maharshi95/neural-irt |
| You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions | - | - | EMNLP 2024 | naturalized QA | Pinafore/qb2nq |
| PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Use Evaluation Metrics Wisely—Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering | - | - | EMNLP Findings 2024 | evaluation | zli12321/PEDANTS-LLM-Evaluation |
| Automatic Explicitation to Bridge the Background Knowledge Gap in Translation and its Evaluation with Multilingual QA | - | - | EMNLP 2023 | translation and QA | - |
| Learning to Explain Selectively: A Case Study on Question Answering | Data | - | EMNLP 2022 | explanations | - |
| SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering | - | - | EMNLP 2022 | simultaneous MT QA | SimQA code |
| Cheater’s Bowl: Human vs. Computer Search Strategies for Open-Domain QA | Data | - | EMNLP Findings 2022 | - | Code |
| Re-Examining Calibration: The Case of Question Answering | - | - | EMNLP Findings 2022 | calibration | NoviScl/calibrateQA |
| Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? | Data | - | ACL 2021 | leaderboard analysis | leaderboard.pedro.ai |
| Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation | - | - | EMNLP 2021 | dense retrieval | henryzhao5852/DistDR |
| Evaluation Paradigms in Question Answering | - | - | EMNLP 2021 | paradigm framing | - |
| Toward Deconfounding the Influence of Subject’s Demographic Characteristics in Question Answering | - | - | EMNLP 2021 | fairness | - |
| What’s in a Name? Answer Equivalence For Open-Domain Question Answering | - | - | EMNLP 2021 | answer equivalence | - |
| Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval | - | - | NAACL 2021 | multistep retrieval | - |
| Complex Factoid Question Answering with a Free-Text Knowledge Graph | - | - | WWW 2020 | free-text KG QA | henryzhao5852/DELFT |
| Meta Answering for Machine Reading | - | - | arXiv 2020 | machine reading | - |
| Quizbowl: The Case for Incremental Question Answering | - | - | arXiv 2020 | incremental QA | QANTA site |
| What Question Answering can Learn from Trivia Nerds | - | - | ACL 2020 | perspective paper | - |
| Mitigating Noisy Inputs for Question Answering | - | - | Interspeech 2019 | noisy QA inputs | - |
| Can You Unpack That? Learning to Rewrite Questions-in-Context | Data | - | EMNLP 2019 | question rewriting | aagohary/canard |
| What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play | - | - | IUI 2019 | interpretability in play | - |
| Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples | Data | - | TACL 2019 | adversarial QA | Eric-Wallace/trickme-interface |