To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how best to adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation: feature extraction (where the pretrained weights are frozen) and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.
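The two adaptation strategies compared in the abstract can be sketched concretely. The snippet below is a minimal illustration only, not the authors' experimental setup: it assumes PyTorch with the Hugging Face Transformers library, an illustrative "bert-base-uncased" checkpoint, a toy binary classification head, and placeholder hyperparameters.

# Minimal sketch (not the paper's code) of the two adaptation settings:
# feature extraction (pretrained weights frozen) vs. full fine-tuning.
# Checkpoint, head size, and learning rate are illustrative placeholders.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

pretrained = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
head = nn.Linear(pretrained.config.hidden_size, 2)  # toy binary task head

FEATURE_EXTRACTION = True  # set to False for fine-tuning

if FEATURE_EXTRACTION:
    # Frozen setting: no gradients flow into the pretrained encoder;
    # only the task head is trained on top of the extracted features.
    for param in pretrained.parameters():
        param.requires_grad = False
    trainable_params = list(head.parameters())
else:
    # Fine-tuning: the encoder and the head are updated jointly.
    trainable_params = list(pretrained.parameters()) + list(head.parameters())

optimizer = torch.optim.Adam(trainable_params, lr=2e-5)  # placeholder LR

# One toy training step on a dummy example to show the mechanics.
batch = tokenizer(["an example sentence"], return_tensors="pt")
labels = torch.tensor([1])
cls_vector = pretrained(**batch).last_hidden_state[:, 0]  # [CLS] token
loss = nn.functional.cross_entropy(head(cls_vector), labels)
loss.backward()
optimizer.step()

A practical note from the paper: when extracting features, exposing the internal encoder layers (rather than only the top layer used in this sketch) typically yields more transferable representations.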

Authors

Matthew Peters
Sebastian Ruder
Noah A. Smith

Other
Inserted: 03/14/19 06:02PM
Words Total: 4,712
Words Unique: 1,796
Tweets
AUEBNLPGroup: Next meeting, Tue 2 April, 17:15-19:00: @alexandraxron presenting "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models" (NAACL 2019) + Discussion of Peters et al. "To Tune or Not to Tune? Adapting..." (https://t.co/yZWkMe491O). Room A36.
ElectronNest: "To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks" https://t.co/4hhQsWpDu8
ceshine_en: "When extracting features, it is important to expose the internal layers as they typically encode the most transferable representations." https://t.co/zLiUln5Ufw
mayhewsw: I missed a crucial difference (thanks to @nlpmattg): the BERT paper fine-tuned over all params. Interesting to compare to the fire and ice emoji paper, where they say that fine-tuning BERT on sequence labeling tasks is not that important. https://t.co/6OjjcAkCso https://t.co/hwDHyJc6Jj
KiddoThe2B: @ryandcotterell @tallinzen @riedelcastro @ryandcotterell I believe that the training data size perspective is also relevant in "To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks"(https://t.co/kAF99E7LFI).
dtrtrtm: Really like πŸ‘ the "fine-tuning" πŸ”₯ and "feature extraction" ❄️ emojis in "To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks" by Peters, @seb_ruder and @nlpnoah https://t.co/yUYgwrNeK0
zaibacu: @AlgitaLfc @rozickas https://t.co/QumbRxcxqR
DrSophiaGberg: Emojis?! This is my kind of paper βœ… To Tune or Not to Tune? https://t.co/MCEhXOPaFc @mattthemathman @seb_ruder @nlpnoah πŸ‘ https://t.co/3NLZR0KzxX
udmrzn: RT @arxiv_cscl: To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks https://t.co/fK5fVSSVoX
arxiv_cscl: To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks https://t.co/fK5fVSSVoX
himakotsu: To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. (arXiv:1903.05987v1 [https://t.co/Elc9rIUsHa]) https://t.co/80jFtnnAzn
seb_ruder: New paper with @mattthemathman & @nlpnoah on adapting pretrained representations: We compare feature extraction & fine-tuning with ELMo and BERT and try to give several guidelines for adapting pretrained representations in practice. https://t.co/yXcoLk6MkG https://t.co/na3pu8UpCR
dwhitena: New paper from @allen_ai researchers provides guidelines for #AI practitioners as they work with language models like BERT and ELMo: https://t.co/szZLMU0vTB @mattthemathman @seb_ruder @nlpnoah https://t.co/xBucDZvISQ
yasubeitwi: Pretraining also seems effective. BERT is strong. https://t.co/XlSWxlqVbZ
nlpnoah: New @ai2_allennlp paper by @mattthemathman , @seb_ruder , @nlpnoah -- to tune or not to tune? Includes guidelines for NLP researchers/practitioners that might be useful. https://t.co/9IIjhPzm0H
lgr3d: ❄️ πŸ”₯ https://t.co/ZmCi96JJEz
_stefan_munich: To Tune or Not to Tune? Nice paper from @seb_ruder, @mattthemathman and @nlpnoah is now available on arxiv: https://t.co/Z79fZoEuKZ
arxivml: "To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks", Matthew Peters, Sebastian Ruder, No… https://t.co/q8XJxlkP25
arxiv_in_review: #NAACL2019 To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. (arXiv:1903.05987v1 [cs\.CL]) https://t.co/u6NkuNPzNd
arxiv_cs_LG: To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. Matthew Peters, Sebastian Ruder, and Noah A. Smith https://t.co/PptBiYOJmr
penzant: To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks https://t.co/kyYFK02YYq neat emoji notation ❄ πŸ”₯
Memoirs: To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. https://t.co/NMNecNZ9iX
tuvuumass: New paper on adapting pretrained language models to downstream tasks by @mattthemathman, @seb_ruder, and @nlpnoah, showing that the effectiveness of fine-tuning depends on the language model architecture and the similarity of the pretraining/target tasks. https://t.co/DPjSksxJxE
Miles_Brundage: Very useful looking paper! "To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks," Peters and @seb_ruder et al.: https://t.co/NHc9S31cgg https://t.co/wad6981TKt