On the Validity of Self-Attention as Explanation in Transformer Models
Explainability of deep learning systems is a vital requirement for many applications, yet it remains an unsolved problem. Recent self-attention based models for natural language processing, such as the Transformer or BERT, offer hope of greater explainability because their attention maps can be inspected directly. However, looking only at the attention maps overlooks the fact that attention is computed not over words but over hidden embeddings, which may themselves be mixed representations of several input embeddings. We investigate to what extent the implicit assumption made in many recent papers, namely that hidden embeddings at all layers still correspond to their underlying words, is justified. We quantify how strongly embeddings are mixed using a gradient-based attribution method and find that already after the first layer less than 50% of an embedding is attributed to its underlying word, declining to a median contribution of 7.5% in the last layer. Although the underlying word remains the largest single contributor to its embedding throughout the layers, we argue that attention visualizations are misleading and should be treated with care when used to explain the underlying deep learning system.
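To make the gradient-based attribution concrete, the sketch below (our illustration, not the authors' exact formulation) estimates how much the hidden embedding at one layer and token position of a pretrained BERT model is influenced by each input token embedding, using the HuggingFace transformers library. The model name, example sentence, layer index, and token position are arbitrary choices for demonstration.

# Minimal sketch of a gradient-times-input attribution for one hidden
# embedding of BERT. This is an illustration under stated assumptions,
# not the paper's exact method.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "Attention maps are not the whole story."
inputs = tokenizer(sentence, return_tensors="pt")

# Input token embeddings, detached so gradients accumulate on them directly.
input_embeds = model.embeddings.word_embeddings(inputs["input_ids"]).detach()
input_embeds.requires_grad_(True)

outputs = model(
    inputs_embeds=input_embeds,
    attention_mask=inputs["attention_mask"],
    output_hidden_states=True,
)

layer = 6       # hidden layer to inspect (illustrative choice)
position = 3    # token position whose embedding we attribute (illustrative choice)
hidden = outputs.hidden_states[layer][0, position]  # shape: (hidden_dim,)

# Back-propagate a scalar summary of the chosen hidden embedding to the inputs.
hidden.sum().backward()

# Gradient-times-input magnitude per input token, normalized to fractions.
contrib = (input_embeds.grad[0] * input_embeds[0]).norm(dim=-1)
contrib = contrib / contrib.sum()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, c in zip(tokens, contrib.tolist()):
    print(f"{tok:>12s}  {c:.3f}")

Normalizing the per-token attribution magnitudes gives a rough contribution share for each input word; it is under this kind of measure that the paper reports the share attributed to a token's own word dropping below 50% after the first layer and to a median of 7.5% in the last layer.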
Authors

Gino Brunner
Yang Liu
Damián Pascual
Oliver Richter
Roger Wattenhofer
Tweets
arxiv_cs_LG: On the Validity of Self-Attention as Explanation in Transformer Models. Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, and Roger Wattenhofer https://t.co/IwtxxfPiMn
dgolano: @byron_c_wallace @jacobeisenstein @mark_riedl @BChoud1 @yuvalpi @sarahwiegreffe Agreed as well for those explaining, or not, attention.. I'm not not not kidding either. Looking forward to reading it and this one https://t.co/brH3J2QTeY
byron_c_wallace: @mark_riedl @BChoud1 @yuvalpi @sarahwiegreffe Yeah self-attention an interesting case and not extensively explored until just recently by Brunner et al: https://t.co/oj0YoJ6PDK -- "we argue that attention visualizations are misleading and should be treated with care"
RexDouglass: "ON THE VALIDITY OF SELF-ATTENTION AS EXPLANATION IN TRANSFORMER MODELS" https://t.co/B6dVvuhArv
arxiv_cscl: On the Validity of Self-Attention as Explanation in Transformer Models https://t.co/hVG3dbKlSt
Memoirs: On the Validity of Self-Attention as Explanation in Transformer Models. https://t.co/5zE8e7Yg3F
sleepinyourhat: Striking results on BERT, which help explain why attention-based analyses tend not to yield very satisfying results. (By @ginozkz et al., https://t.co/zVEYX7CIW6) https://t.co/gbzKwpKMzl
arxivml: "On the Validity of Self-Attention as Explanation in Transformer Models", Gino Brunner, Yang Liu, Damián Pascual, O… https://t.co/ITEdJEt0Ff
BrundageBot: On the Validity of Self-Attention as Explanation in Transformer Models. Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, and Roger Wattenhofer https://t.co/q7zCwsi7vK