QBLink: Sequential Open-Domain Question Answering

QBLink is a dataset for sequential open-domain question answering: each sequence poses several related questions about the same topic, to be answered in order. It is designed to evaluate how well QA systems use context from earlier questions and answers.

18,644 sequences · about 56,000 question–answer pairs

Dataset Structure

Each sequence contains:

| Field | Description |
| --- | --- |
| `id` | Sequence identifier |
| `tournament` | Quiz bowl tournament source |
| `lead-in` | Introductory sentence defining the topic |
| `category` | Subject area (History, Literature, Philosophy, etc.) |
| `sub-category` | More specific classification |
| Questions 1–3 | Each with `question_text`, `raw_answer`, `wiki_page` |
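
To make the layout above concrete, here is a minimal sketch of one sequence as typed Python objects. The top-level field names and the per-question keys come from the table; the `questions` list key, the class names, and the helper `parse_sequence` are assumptions made for illustration, and the released files may nest the three questions differently. All values in the sample record are placeholders, not real dataset content.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Question:
    question_text: str
    raw_answer: str
    wiki_page: Optional[str]  # treated as optional in this sketch (assumption)


@dataclass
class QBLinkSequence:
    id: str
    tournament: str
    lead_in: str        # stored under the key "lead-in"
    category: str
    sub_category: str   # stored under the key "sub-category"
    questions: List[Question]


def parse_sequence(record: Dict[str, Any]) -> QBLinkSequence:
    """Convert one raw record (a dict) into a QBLinkSequence.

    Assumes the questions are stored in order under a hypothetical
    "questions" key; adjust this to the actual file layout.
    """
    questions = [
        Question(
            question_text=q["question_text"],
            raw_answer=q["raw_answer"],
            wiki_page=q.get("wiki_page"),
        )
        for q in record["questions"]
    ]
    return QBLinkSequence(
        id=str(record["id"]),
        tournament=record["tournament"],
        lead_in=record["lead-in"],
        category=record["category"],
        sub_category=record["sub-category"],
        questions=questions,
    )


# Placeholder record used only to exercise the parser.
sample = {
    "id": "placeholder-id",
    "tournament": "placeholder tournament",
    "lead-in": "placeholder lead-in",
    "category": "History",
    "sub-category": "placeholder sub-category",
    "questions": [
        {"question_text": f"placeholder question {i}",
         "raw_answer": f"placeholder answer {i}",
         "wiki_page": None}
        for i in (1, 2, 3)
    ],
}
sequence = parse_sequence(sample)
```

Keeping the questions in an ordered list, rather than as three separate attributes, makes it straightforward to iterate over them in the order they are meant to be answered.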

Example sequences cover topics such as Bitcoin’s inventor or Ronald Reagan’s presidency, where later questions reference earlier answers to test contextual reasoning.
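
One simple way a system could use earlier turns is to prepend the lead-in and the previous question–answer pairs to the current question before retrieval or reading. The sketch below illustrates that idea only; it is not the baseline from the paper, and the function name `build_contextual_query` and its parameters are hypothetical.

```python
from typing import List, Tuple


def build_contextual_query(
    lead_in: str,
    history: List[Tuple[str, str]],
    current_question: str,
) -> str:
    """Concatenate the lead-in, earlier (question, answer) pairs, and the
    current question into a single query string.

    `history` holds the earlier turns in order. Whether the answers are
    gold answers or the system's own predictions is left to the caller.
    """
    parts = [lead_in]
    for question, answer in history:
        parts.append(f"{question} {answer}")
    parts.append(current_question)
    # Drop empty pieces and normalise surrounding whitespace.
    return " ".join(p.strip() for p in parts if p.strip())


# Placeholder strings, not real dataset content.
query = build_contextual_query(
    "Lead-in sentence.",
    [("First question?", "First answer")],
    "Second question?",
)
```

Passing the system's own earlier predictions as `history` instead of the gold answers gives a stricter evaluation, since mistakes on early questions then carry over to later ones.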

Citation

Ahmed Elgohary, Chen Zhao, Jordan Boyd-Graber. A Dataset and Baselines for Sequential Open-Domain Question Answering. EMNLP 2018.