Efficient Multimodal Question Answering (EMM-QA)

Efficient multimodal question answering in the era of large language models.
EMM-QA is an ICML 2026 workshop focused on question answering systems that must balance accuracy, efficiency, and adaptability across multiple input modalities. The workshop brings together researchers from academia and industry working on knowledge-intensive multimodal systems that operate under practical resource constraints.
Rather than focusing only on larger models, the workshop emphasizes methods that make multimodal question answering usable in real settings, including retrieval-augmented systems, compact models, efficient inference, and human-in-the-loop evaluation.
Join the community on Discord.
Scope
The workshop is centered on efficient multimodal question answering. It also welcomes closely related work on multimodal retrieval, reasoning, evaluation, benchmarking, and efficient inference when those contributions are clearly connected to question answering or other knowledge-intensive multimodal tasks.
As in the previous iteration of EfficientQA, which focused on text-only question answering, we will host a human-computer question answering competition. If you'd like to take part (it should be fun!), you can either play on a team or write questions.
Workshop Format
The workshop is planned as a one-day event combining:
- Contributed papers
- Poster presentations
- Invited keynotes
- Shared-task highlights
- A live human-computer question answering event
- A panel discussion
The workshop will also serve as the venue where we announce the winning systems from the QANTA 2026 computer competition.
Schedule
- The workshop takes place on July 11th.
- All talk sessions (invited talks, spotlights, challenge talks, awards, etc.) will take place in the main workshop room at COEX.
- All poster sessions will take place separately in Hall A outside the workshop room area at COEX.
| Time | Activity | Duration |
|---|---|---|
| 08:00–08:10 | Welcome & Workshop Overview | 10 min |
| 08:10–08:50 | 🟦 Robin Jia: TBA | 40 min |
| 08:50–09:00 | Q&A | 10 min |
| 09:00–09:15 | Coffee Break | 15 min |
| 09:15–09:55 | 🟦 Sewon Min: TBA | 40 min |
| 09:55–10:05 | Q&A | 10 min |
| 10:05–10:50 | 🟨 Contributed Paper Spotlights | 45 min |
| 10:50–11:50 | 🟧 Workshop Posters | 60 min |
| 11:50–12:50 | Lunch | 60 min |
| 12:50–13:20 | Live AI QA Competition | 30 min |
| 13:20–14:00 | 🟦 Mrinmaya Sachan: TBA | 40 min |
| 14:00–14:10 | Q&A | 10 min |
| 14:10–14:50 | 🟦 Naman Goyal & Jenny Ni: Multimodal Robustness Under Distribution Shift | 40 min |
| 14:50–15:00 | Q&A | 10 min |
| 15:00–15:15 | Coffee Break | 15 min |
| 15:15–15:35 | Shared Challenge Introduction & Results Overview | 20 min |
| 15:35–15:55 | 🟨 Best Challenge Team Talks | 20 min |
| 15:55–16:05 | Challenge Awards | 10 min |
| 16:05–16:10 | Closing Remarks | 5 min |
| 16:10–17:00 | 🟧 Shared Challenge Posters | 50 min |
Legend
- 🟦 Invited Talks
- 🟨 Contributed Paper Spotlights / Best Challenge Team Talk
- 🟧 Poster Sessions
Confirmed Keynote Speakers
- Sewon Min (UC Berkeley EECS & Allen Institute for AI)
- Mrinmaya Sachan (ETH Zürich)
- Robin Jia (University of Southern California)
- Naman Goyal (Google DeepMind) & Jenny Ni (Google)
Organizers
- Jordan Boyd-Graber, University of Maryland
- Martin Fajčík, Brno University of Technology
- George Jojo Boateng, ETH Zurich / Kwame AI
- Ikuya Yamada, Studio Ousia / Tohoku University / Nagoya University / RIKEN
- Chen Zhao, NYU Shanghai
Contact
Questions about the workshop can be sent to emm-qa-organizers@googlegroups.com, or you can join the Discord.
Sponsors/Acknowledgements
- This workshop is partially supported by the Horizon EU programme through project ELOQUENCE, grant no. 101135916.