I work at ValueFirst Digital Media Private Ltd. I am a Product Marketer in the Surbo Team. Surbo is a chatbot generator platform owned by Value First.
Artificial Intelligence will soon have a voice like humans
DeepMind, the artificial intelligence (AI) lab owned by Google's parent company Alphabet, developed a synthetic speech system called WaveNet back in 2016. This system runs on an artificial neural network and is capable of generating speech samples that are far better than those produced by other technologies. The AI voice is becoming more human-like. WaveNet has improved since then and is now good enough for Google Assistant across all platforms.
According to a paper by Google that is still under peer review, WaveNet is now being paired with a text-to-speech system called Tacotron 2. This is effectively the second generation of Google's synthetic speech AI. The new system combines the deep neural networks of WaveNet and the original Tacotron.
First, Tacotron 2 translates text into a spectrogram, a visual representation of audio frequencies over time. This is then fed into WaveNet, which reads the spectrogram and generates the corresponding audio waveform.
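To make the idea of a spectrogram concrete, here is a minimal sketch in plain Python that turns a synthetic test tone into a grid of frequency magnitudes over time. It is not the Tacotron 2 or WaveNet code: the function name `spectrogram`, the frame size, the hop length, the sample rate and the 1000 Hz tone are all assumptions chosen for illustration, and the output is a plain magnitude spectrogram rather than whatever representation the model uses internally.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return a list of time frames; each frame lists one magnitude per frequency bin."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Naive discrete Fourier transform, keeping non-negative frequencies only.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Hypothetical test signal: a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
TONE_HZ = 1000
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(signal)

# For every time frame, find the loudest frequency bin and convert it to Hz.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
peak_hz = [b * SAMPLE_RATE / 64 for b in peak_bins]
```

Each row of `spec` is one slice of time and each column is one frequency band, which is what gets drawn as an image. For a steady tone, the loudest column stays the same in every row. Speech would instead produce patterns that shift from frame to frame.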
According to the study, the "model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech." Simply put, it sounds very much like a person speaking.
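A mean opinion score is simply the average of the ratings that listeners give to audio samples, commonly on a scale from 1 (bad) to 5 (excellent). The sketch below shows that arithmetic. The ratings are made up for illustration and are not data from the study, and the function name `mean_opinion_score` is an assumption.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings, assuming a 1-to-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from eight listeners (not real study data).
hypothetical_ratings = [5, 4, 5, 4, 5, 4, 4, 5]
mos = mean_opinion_score(hypothetical_ratings)
```

On this scale, the closer the synthetic voice's score is to the score for recorded human speech, the harder it is for listeners to hear a difference in quality.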
In fact, Google put recordings of a human and its new AI side by side, and it's difficult to tell which is the person and which is the machine.
AI systems keep getting better at blurring the line between human and machine. There are now AIs capable of generating images of human beings who aren't real but look as if they are. Another AI can even make fake videos. One also can't discount the fact that some AIs are getting better at storytelling and at making art.
Mimicking human speech has always been a challenge for AI networks. Now, DeepMind's WaveNet and Tacotron 2 seem to be changing that, and at quite an impressive rate. Not only does the AI pronounce words clearly, but it also seems able to handle difficult-to-pronounce words and names, and to put emphasis on the appropriate words based on punctuation.
This does not mean that the new AI system is perfect. For one, the current iteration has only been trained on a single voice, which Google recorded from a woman it hired. For the system to work with other voices, it will have to be trained again.
Besides its immediate applications for Google Assistant, the technology could be applied to other areas once Tacotron 2 is perfected. It may also take over certain jobs, as other applications of AI are already doing.