Direct speech-to-speech translation with a sequence-to-sequence model
We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.
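
As a rough illustration of the attention mechanism the abstract refers to, the sketch below computes one attention step: a decoder query is compared against every encoded source frame, and the resulting weights produce a context vector. This is not the authors' implementation. The abstract does not say which attention variant the model uses, so dot-product scoring is an assumption here, and the name `attention_context` and the toy two-dimensional frames are hypothetical.

```python
import math

def attention_context(query, encoder_frames):
    """One attention step (assumed dot-product scoring, not the paper's exact variant)."""
    # Similarity between the decoder query and each encoded source frame.
    scores = [sum(q * h for q, h in zip(query, frame)) for frame in encoder_frames]
    # Numerically stable softmax turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the weighted average of the encoder frames.
    dim = len(encoder_frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, encoder_frames))
        for d in range(dim)
    ]
    return weights, context

# Toy stand-ins for encoded source spectrogram frames (hypothetical values).
encoder_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attention_context(query, encoder_frames)
print(weights, context)
```

In a full model of this kind, the decoder would repeat such a step for every output frame, so each predicted target spectrogram frame can draw on different parts of the source utterance.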

Authors

Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhifeng Chen
Yonghui Wu
Other
Sample Sizes (N=): None
Inserted: 04/14/19 06:03PM
Words Total: 5,105
Words Unique: 1,820
Tweets
zmaglh: Voice-to-voice translation! https://t.co/ayLvvVGGFQ
catchpushkar: @rctatman Thanks, Rachael for doing the paper readings. Trying to understand the Translatotron (https://t.co/708DfAEVcu) which is the same as an end to end MT but for Speech. Understanding Attention is key to it and this (https://t.co/0Y2UmIQbhE) is helping a lot
JeffDean: Nice article describing work by @GoogleAI researchers. Actual paper describing the work titled Direct speech-to-speech translation with a sequence-to-sequence model by Ye Jia, Ron Weiss, et al. is here: Arxiv: https://t.co/8fcSRqNfT5 Blog post: https://t.co/u8EMKiw3dw https://t.co/RV4GigYrqe
ExponentialMed: The “Translatotron”. Google’s #AI can now translate your speech while keeping your voice.... With neural networks trained to map audio “voiceprints” from one language to another. https://t.co/36AFTDcCTI Via @techreview https://t.co/Dp6vgtNvVO #translation
patdiaz: #Translatotron: direct speech-to-speech translation by Google. For now proof-of-concept. Very promising! https://t.co/BqMUkIPgaE https://t.co/ofpNanibA1
jakubzavrel: Now speech-to-speech translation can be done without intermediate text representations, and with voice preservation of the original speaker in the target language: hats off for the Translatotron! https://t.co/ofcCK2XMvi
FJ_Marmolejo: Can an app translate your words into another language while keeping your own voice? A Google team has achieved it with #InteligenciaArtificial applied to #traducción. Here is the @techreview article with audio clips: https://t.co/APjaKDpJsq Here is the technical paper: https://t.co/elU0SSBr9Y https://t.co/baTBKcPJtO
a_h183: Link to the scientific paper https://t.co/3C3cxEwnF2
DataScienceNIG: AI can now help you speak another language in your own voice! Welcome TRANSLATOTRON from @GoogleAI Uses seq-to-seq net for voice input, processes it as a spectrogram & generates a new spectrogram in a target language. Paper: https://t.co/50A2GhjhkH Audio:https://t.co/pmr8bLcQtz https://t.co/o9VZ0Q09om
FAldunateM: Google's AI achieves simultaneous translation of speech from one language (without going through intermediate text), which makes it possible to keep the speaker's voice. It starts with Spanish-English. Here is the paper: https://t.co/3EzdALhEZ3 Here are some samples: https://t.co/XqQC3qhZ33
bryanoloughlin: Translatotron : An End-to-End Speech-to-Speech Translation Model , GoogleAI May 15, 2019 https://t.co/fjfk8JvrBt 1st-of-its-kind,work-in-prog,mimics users own voice, speed https://t.co/MbJIuz9wV3 Direct speech-to-speech translation with a seq 2-seq model https://t.co/Kh04PiqfBS https://t.co/LCkol3YrwK
geologylady: universal translator a little bit closer https://t.co/wsDsr4cD6G
BioDecoded: Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model | Google AI Blog https://t.co/isPjFVPUkl https://t.co/T0e7YwU1jG #DeepLearning #MachineTranslation https://t.co/yIEojMr2O8
arxiv_cscl: Direct speech-to-speech translation with a sequence-to-sequence model https://t.co/po9vxP5cg4
arxiv_cscl: Direct speech-to-speech translation with a sequence-to-sequence model https://t.co/po9vxPmNEE
arxivml: "Direct speech-to-speech translation with a sequence-to-sequence model", Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgan… https://t.co/zwZVysQwnh
arxiv_cs_LG: Direct speech-to-speech translation with a sequence-to-sequence model. Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu https://t.co/6zd7tZ4pMd
BrundageBot: Direct speech-to-speech translation with a sequence-to-sequence model. Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu https://t.co/55wTSHydJV