Learning to Reason with Relational Video Representation for Question Answering
How does a machine learn to reason about the content of a video in order to answer a question? A Video QA system must simultaneously understand language, represent visual content over space-time, and iteratively transform these representations in response to the linguistic content of the query, finally arriving at a sensible answer. While recent advances in textual and visual question answering have produced sophisticated visual representations and neural reasoning mechanisms, major challenges in Video QA remain in the dynamic grounding of concepts, relations and actions to support the reasoning process. We present a new end-to-end layered architecture for Video QA, composed of a question-guided video representation layer and a generic reasoning layer that produces the answer. The video is represented using a hierarchical model that encodes visual information about objects, actions and relations in space-time, given textual cues from the question. The encoded representation is then passed to a reasoning module, which in this paper is implemented as a MAC net. The system is evaluated on the SVQA (synthetic) and TGIF-QA (real) datasets, demonstrating state-of-the-art results, by a large margin in the case of multi-step reasoning.
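The two-layer design the abstract describes (a question-guided video representation feeding an iterative reasoning module) can be caricatured in a few lines. The sketch below is my own toy illustration in numpy, not the authors' implementation: the dimensions, the attention pooling, and the per-step gated read are all invented stand-ins for the paper's hierarchical encoder and MAC cells.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): T frames, d-dim features, S reasoning steps.
T, d, S = 8, 16, 3

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy inputs standing in for CNN frame features and an encoded question.
frames = rng.standard_normal((T, d))
question = rng.standard_normal(d)

# 1) Question-guided video representation: attend over frames with the
#    question as the query, producing a question-conditioned video summary.
attn = softmax(frames @ question)   # (T,) attention weights over frames
video_repr = attn @ frames          # (d,) pooled representation

# 2) MAC-style iterative reasoning: a question-derived control gates repeated
#    reads of the video representation into a running memory state.
memory = np.zeros(d)
for step in range(S):
    control = question * rng.standard_normal(d)  # toy per-step control signal
    read = video_repr * control                  # gated read of the video
    memory = 0.5 * memory + 0.5 * read           # memory update

# After S steps, `memory` would feed an answer classifier.
print(memory.shape)
```

In the actual model the control and memory updates are learned (as in MAC networks) and the video encoding is hierarchical over clips, not a single attention pool; the sketch only shows the layered question-then-reason flow.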
Authors

Thao Minh Le
Vuong Le
Svetha Venkatesh
Truyen Tran

Other
Inserted: 07/10/19 06:05PM
Words Total: 7,556
Words Unique: 2,522