PUBLICATION
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue
Association for Computational Linguistics (ACL)
October 1, 2021
By: Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur
Abstract
A video-grounded dialogue system is required to understand both dialogue, which contains semantic dependencies from turn to turn, and video, which contains visual cues of spatial and temporal scene variations. Building such dialogue systems is a challenging problem, involving various reasoning types on both visual and language inputs. Existing benchmarks do not have enough annotations to thoroughly analyze dialogue systems and understand their capabilities and limitations in isolation. These benchmarks are also not explicitly designed to minimise biases that models can exploit without actual reasoning. To address these limitations, in this paper, we present DVD, a Diagnostic Dataset for Video-grounded Dialogues. The dataset is designed to contain minimal biases and has detailed annotations for the different types of reasoning over the spatio-temporal space of video. Dialogues are synthesized over multiple question turns, each of which is injected with a set of cross-turn semantic relationships. We use DVD to analyze existing approaches, providing insights into their abilities and limitations. In total, DVD is built from 11k CATER synthetic videos and contains 10 instances of 10-round dialogues for each video, resulting in more than 100k dialogues and 1M question-answer pairs. Our code and dataset are publicly available on GitHub.
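The dataset totals quoted in the abstract follow from simple multiplication: videos × dialogues per video gives the dialogue count, and dialogues × rounds per dialogue gives the question-answer count. The sketch below assumes exactly 11,000 videos (the abstract only says "11k"), and the names `NUM_VIDEOS`, `dataset_size`, and so on are hypothetical illustrations, not part of the released DVD code.

```python
# Hypothetical back-of-the-envelope check of the DVD size figures from the abstract.
# Assumption: exactly 11,000 videos; the abstract reports the count only as "11k".
NUM_VIDEOS = 11_000          # approximate number of CATER synthetic videos
DIALOGUES_PER_VIDEO = 10     # dialogue instances generated per video
TURNS_PER_DIALOGUE = 10      # question-answer rounds per dialogue


def dataset_size(num_videos: int, dialogues_per_video: int,
                 turns_per_dialogue: int) -> tuple[int, int]:
    """Return (number of dialogues, number of question-answer pairs)."""
    num_dialogues = num_videos * dialogues_per_video
    # Each dialogue round contributes one question-answer pair.
    num_qa_pairs = num_dialogues * turns_per_dialogue
    return num_dialogues, num_qa_pairs


dialogues, qa_pairs = dataset_size(NUM_VIDEOS, DIALOGUES_PER_VIDEO, TURNS_PER_DIALOGUE)
print(f"{dialogues:,} dialogues, {qa_pairs:,} QA pairs")
# prints "110,000 dialogues, 1,100,000 QA pairs"
```

Under this assumption the totals land just above the "more than 100k dialogues and 1M question-answer pairs" stated in the abstract.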
Areas
AR/VR
ARTIFICIAL INTELLIGENCE