Data & Code

A single place for starter code, core QANTA datasets, and paper-specific resources.


Competition / Tutorial Code

The fastest way to get started with this data and its format is to run the CodaLab starter system locally; a minimal client sketch follows the table below.

| Resource | Link | Description |
| --- | --- | --- |
| Competition baseline | Pinafore/qanta-codalab | Simplified system for quick setup, inspection, and leaderboard submission |
| Full research codebase | Pinafore/qb | Main QA system used in exhibition matches and research prototypes |
| Leaderboard | CodaLab | Submit and compare systems |
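
Once the baseline is running locally (the starter repository's README covers setup), a quick smoke test is to stream a question to it word by word, the way a moderator reads a tossup. The sketch below is a hypothetical client, not the official one: the port, the route, and the payload and response fields (`text`, `buzz`, `guess`) are assumptions to verify against the qanta-codalab README.

```python
import requests

# Hypothetical local endpoint: verify the port and route against the
# qanta-codalab README before relying on them.
URL = "http://localhost:4861/api/1.0/quizbowl/act"

question = (
    "This physicist's theory of special relativity replaced the "
    "luminiferous ether with the postulate that the speed of light "
    "is constant for all observers."
)

# Reveal the question word by word, as a moderator would, and stop
# as soon as the system decides to buzz.
words = question.split()
for i in range(1, len(words) + 1):
    reply = requests.post(URL, json={"text": " ".join(words[:i])}).json()
    if reply.get("buzz"):
        print(f"Buzzed after {i}/{len(words)} words: {reply.get('guess')}")
        break
else:
    print("Never buzzed; final guess:", reply.get("guess"))
```

Streaming prefixes rather than posting the whole question matters because quiz bowl systems are judged on when they buzz, not only on their final answer.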

QANTA Data

Machine-readable data derived directly from quiz bowl questions; a loading sketch follows the table below:

  • Normal questions
  • Human responses
  • Naturalized questions
  • Adversarial questions (in the same format as the normal questions)
| Data | Direct Download | Huggingface Link | Description | Code |
| --- | --- | --- | --- | --- |
| QANTA main datasets | - | - | Canonical QANTA question data and related dataset docs | Pinafore/qb |
| Human responses | - | Quizbowl human responses | Human answer traces and response behavior data | maharshi95/neural-irt |
| Naturalized questions | - | - | Naturalized questions derived from trivia-style QA | Pinafore/qb2nq |
| Adversarial questions | JSON | - | Adversarial examples in compatible QA format | Eric-Wallace/trickme-interface |
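
To inspect the main question files directly, a short loading script is enough. The sketch below assumes the JSON layout of the 2018-style QANTA releases (a top-level `questions` list whose entries include `text`, `page`, and `category` fields); the path is a placeholder, and the field names should be checked against whichever file you download.

```python
import json
from collections import Counter

# Placeholder path: point this at whichever split you downloaded.
with open("qanta.train.json") as f:
    dataset = json.load(f)

# Assumed layout (based on the 2018 release): a top-level "questions"
# list, where "page" is the canonical Wikipedia answer page.
questions = dataset["questions"]
print(f"{len(questions)} questions loaded")

print(Counter(q.get("category", "Unknown") for q in questions).most_common(5))

example = questions[0]
print(example["page"])        # canonical answer
print(example["text"][:200])  # opening (hardest) clues
```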

Full Dataset Catalog

The 2021 tossup release is the main benchmark dataset for modern QANTA work:

QANTA Tossup Dataset (2021)

~100k pyramid-style quiz bowl tossup questions with full text, answers, and metadata (category, tournament, and year); a buzz-position evaluation sketch follows the downloads below.

| Split | Download |
| --- | --- |
| Train | Download |
| Dev | Download |

Code: github.com/Pinafore/qb

Historical releases: http://cs.umd.edu/~miyyer/qblearn/
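
Because tossups are pyramidal, with clues running from obscure to giveaway, systems on this data are usually compared by how early in a question they answer correctly, not only by end-of-question accuracy. The sketch below illustrates that buzz-position style of evaluation; the `guesser` callable, the `toy_guesser`, and the example question are hypothetical stand-ins rather than part of the dataset tooling.

```python
from typing import Callable

def buzz_position(
    question_text: str,
    answer: str,
    guesser: Callable[[str], str],
) -> float:
    """Fraction of the question revealed before the first correct guess
    (1.0 if the guesser is never correct); lower is better."""
    words = question_text.split()
    for i in range(1, len(words) + 1):
        if guesser(" ".join(words[:i])) == answer:
            return i / len(words)
    return 1.0

# Toy guesser: only answers correctly once a giveaway clue appears.
def toy_guesser(prefix: str) -> str:
    return "Albert_Einstein" if "relativity" in prefix else "Isaac_Newton"

text = (
    "This physicist failed his first ETH entrance exam, and he later "
    "developed general relativity while working as a professor in Berlin."
)
print(buzz_position(text, "Albert_Einstein", toy_guesser))  # 14/21, ~0.67
```

Averaging this fraction over a test set rewards systems that can answer from the early, hard clues instead of waiting for the giveaway.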


Code / Data from Papers

| Paper | Direct Download | Huggingface Link | Venue | Description | Code |
| --- | --- | --- | --- | --- | --- |
| Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? | - | - | NAACL 2025 | - | - |
| GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration | - | - | ACL 2025 | Calibration benchmark | yysung/advcalibration |
| No Questions are Stupid but Some are Poorly Posed: Understanding Poorly-Posed Information-Seeking Questions | - | - | ACL 2025 | Question quality | nehasrikn/poorly-posed-questions |
| Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | - | - | ACL 2025 | Evaluation methods | - |
| ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks | - | - | NAACL 2025 | Adversarialness metric | - |
| Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA | - | Quizbowl collection | EMNLP 2024 | Complementarity | maharshi95/neural-irt |
| You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions | - | - | EMNLP 2024 | Naturalized QA | Pinafore/qb2nq |
| PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Use Evaluation Metrics Wisely—Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering | - | - | EMNLP Findings 2024 | Evaluation | zli12321/PEDANTS-LLM-Evaluation |
| Automatic Explicitation to Bridge the Background Knowledge Gap in Translation and its Evaluation with Multilingual QA | - | - | EMNLP 2023 | Translation and QA | - |
| Learning to Explain Selectively: A Case Study on Question Answering | Data | - | EMNLP 2022 | Explanations | - |
| SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering | - | - | EMNLP 2022 | Simultaneous MT QA | SimQA code |
| Cheater’s Bowl: Human vs. Computer Search Strategies for Open-Domain QA | Data | - | EMNLP Findings 2022 | - | Code |
| Re-Examining Calibration: The Case of Question Answering | - | - | EMNLP Findings 2022 | Calibration | NoviScl/calibrateQA |
| Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? | Data | - | ACL 2021 | Leaderboard analysis | leaderboard.pedro.ai |
| Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation | - | - | EMNLP 2021 | Dense retrieval | henryzhao5852/DistDR |
| Evaluation Paradigms in Question Answering | - | - | EMNLP 2021 | Paradigm framing | - |
| Toward Deconfounding the Influence of Subject’s Demographic Characteristics in Question Answering | - | - | EMNLP 2021 | Fairness | - |
| What’s in a Name? Answer Equivalence For Open-Domain Question Answering | - | - | EMNLP 2021 | Answer equivalence | - |
| Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval | - | - | NAACL 2021 | Multi-step retrieval | - |
| Complex Factoid Question Answering with a Free-Text Knowledge Graph | - | - | WWW 2020 | Free-text KG QA | henryzhao5852/DELFT |
| Meta Answering for Machine Reading | - | - | arXiv 2020 | Machine reading | - |
| Quizbowl: The Case for Incremental Question Answering | - | - | arXiv 2020 | Incremental QA | QANTA site |
| What Question Answering can Learn from Trivia Nerds | - | - | ACL 2020 | Perspective paper | - |
| Mitigating Noisy Inputs for Question Answering | - | - | Interspeech 2019 | Noisy QA inputs | - |
| Can You Unpack That? Learning to Rewrite Questions-in-Context | Data | - | EMNLP 2019 | Question rewriting | aagohary/canard |
| What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play | - | - | IUI 2019 | Interpretability in play | - |
| Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples | Data | - | TACL 2019 | Adversarial QA | Eric-Wallace/trickme-interface |

Contact

For dataset access or questions: qanta@googlegroups.com