Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering
Abstract

Text-based Question Generation (QG) aims to generate natural, relevant questions that can be answered by a given answer in a given context. Existing QG models suffer from a "semantic drift" problem: the semantics of the model-generated question drift away from the given context and answer. In this paper, we first propose two semantics-enhanced rewards, obtained from the downstream question-paraphrasing and question-answering tasks, to regularize the QG model to generate semantically valid questions. Second, since traditional evaluation metrics (e.g., BLEU) often fall short in evaluating the quality of generated questions, we propose a QA-based evaluation method that measures the QG model's ability to mimic human annotators in generating QA training data. Experiments show that our method achieves new state-of-the-art performance on traditional metrics and also performs best on our QA-based evaluation. Further, we investigate how to use our QG model to augment QA datasets and enable semi-supervised QA. We propose two ways to generate synthetic QA pairs: generating new questions from existing articles, or collecting QA pairs from new articles. We also propose two empirically effective strategies for properly using QG-generated data for QA: a data filter and mixing mini-batch training. Experiments show that our method improves over both BiDAF and BERT QA baselines, even without introducing new articles.
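As a rough illustration of the two data-usage strategies mentioned in the abstract (a data filter over synthetic QA pairs, plus mixing mini-batch training over gold and synthetic data), here is a minimal sketch. The function names, the fixed per-batch gold/synthetic ratio, and the score threshold are illustrative assumptions, not the paper's exact recipe.

```python
import random


def filter_by_score(pairs, score, threshold):
    """Data filter: keep only synthetic QA pairs whose quality score
    (e.g., a QA model's answer probability) clears a threshold.
    `score` is any callable mapping a pair to a float; the threshold
    value is an assumption for illustration."""
    return [p for p in pairs if score(p) >= threshold]


def mixed_minibatches(gold, synthetic, batch_size=32, gold_ratio=0.5, seed=0):
    """Mixing mini-batch training: each batch draws a fixed fraction of
    examples from the gold set and the rest from the (filtered) synthetic
    set, so synthetic data never crowds out human-annotated data within
    a batch. Yields roughly one epoch over the gold data."""
    rng = random.Random(seed)
    n_gold = int(batch_size * gold_ratio)
    n_syn = batch_size - n_gold
    steps = len(gold) // n_gold
    for _ in range(steps):
        batch = rng.sample(gold, n_gold) + rng.sample(synthetic, n_syn)
        rng.shuffle(batch)  # avoid a fixed gold/synthetic ordering in the batch
        yield batch
```

In practice the filter score would come from a pretrained QA model scored on each generated question, and the batches would feed a standard QA training loop; both are sketched abstractly here.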
Authors

Shiyue Zhang
Mohit Bansal