Direct speech-to-speech translation with a sequence-to-sequence model
We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.
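The abstract says the model is attention-based: as the decoder emits each target spectrogram frame, it weights the encoded source frames to decide which parts of the input to draw on. The sketch below shows one such attention step. It assumes generic scaled dot-product attention over encoder states stored as plain Python lists. The abstract does not say which attention variant the model actually uses, and the names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """One decoder step of (assumed) scaled dot-product attention.

    query: the decoder's current state (hypothetical representation).
    encoder_states: one encoded vector per source spectrogram frame.
    Returns the attention weights over source frames and the context vector,
    i.e. the weighted average of the encoder states.
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    # Similarity between the query and each encoded source frame.
    scores = [scale * sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states, per dimension.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy input: three encoded source frames with 2-dimensional features (hypothetical sizes).
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = attend([1.0, 0.0], frames)
    print("weights:", weights)
    print("context:", context)
```

In a full sequence-to-sequence decoder, the query would come from the decoder's recurrent state. The context vector would then be combined with that state to predict the next target spectrogram frame.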
Authors

Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhifeng Chen
Yonghui Wu
Other
Inserted: 04/14/19 06:03PM
Words Total: 5,105
Words Unique: 1,820
Tweets
arxiv_cscl: Direct speech-to-speech translation with a sequence-to-sequence model https://t.co/po9vxP5cg4
arxiv_cscl: Direct speech-to-speech translation with a sequence-to-sequence model https://t.co/po9vxPmNEE
arxivml: "Direct speech-to-speech translation with a sequence-to-sequence model", Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgan… https://t.co/zwZVysQwnh
arxiv_cs_LG: Direct speech-to-speech translation with a sequence-to-sequence model. Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu https://t.co/6zd7tZ4pMd
BrundageBot: Direct speech-to-speech translation with a sequence-to-sequence model. Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu https://t.co/55wTSHydJV