Multi-view Multi-modality Data
Events are captured by a network of cameras with overlapping fields of view.
Data are captured by a variety of devices, including regular stationary cameras, cameras mounted on moving vehicles, and infrared cameras.
QA in Two Forms
Formal Language Queries
Formal language queries are composed as conjunctions of predicates, similar to first-order logic sentences. The answer to each query is either true or false.
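To make the idea of a conjunctive query concrete, here is a minimal sketch in Python. It is an illustration only, not the dataset's actual query format or evaluation code: the predicate names (`person`, `vehicle`, `entering`), the object identifiers (`p1`, `v1`), and the `answer` helper are all hypothetical. It assumes the facts about a scene are available as a set of ground predicates and that a query is true only when every predicate in it holds.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Predicate(NamedTuple):
    """A ground predicate such as entering(p1, v1). Names are hypothetical."""
    name: str
    args: Tuple[str, ...]


# Hypothetical ground-truth facts for one scene (not taken from the dataset).
FACTS: Set[Predicate] = {
    Predicate("person", ("p1",)),
    Predicate("vehicle", ("v1",)),
    Predicate("entering", ("p1", "v1")),
}


def answer(query: Iterable[Predicate], facts: Set[Predicate]) -> bool:
    """Evaluate a conjunction: true only if every predicate is a known fact."""
    return all(pred in facts for pred in query)


# "Is there a person p1 who is entering vehicle v1?"
query_true = [
    Predicate("person", ("p1",)),
    Predicate("vehicle", ("v1",)),
    Predicate("entering", ("p1", "v1")),
]

# Same predicates, but with the arguments of entering swapped.
query_false = [
    Predicate("person", ("p1",)),
    Predicate("entering", ("v1", "p1")),
]

print(answer(query_true, FACTS))
print(answer(query_false, FACTS))
```

Under these assumptions the first query evaluates to true and the second to false, because `entering(v1, p1)` is not among the facts. A real query language would likely also need variables and quantifiers, which this sketch leaves out.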
Natural Language Questions
Open-ended natural language questions and answers are composed by crowdsourced human annotators.