Session Details: SMPTE 2020

Name

“AI News Anchor” with Deep Learning-based Speech Synthesis

Date & Time

Wednesday, November 11, 2020, 8:00 PM - 8:30 PM

Timezone

UTC

Speakers

Kiyoshi Kurihara - NHK (Japan Broadcasting Corporation)

Description

Deep learning-based text-to-speech (DL-TTS) is now in general use in a variety of settings, and its sound quality is approaching that of human speech. We developed an “AI news anchor” that uses news-specific DL-TTS and launched it for live broadcast programs and automatic news speech distribution services. We report on that work here. To support a range of program productions, we developed DL-TTS that can control speaking style, speech speed, pitch, and intonation. Specifically, the system can switch between speaking styles such as “news style,” which mimics the delivery of news reporters, and “conversation style.” The aim of the system was to eliminate the discomfort caused by differences in speech and speaking style. Speaking style control is an important factor in news speech because a mismatched speaking style does not convey news articles correctly. We conducted an evaluation experiment on how well news articles are conveyed under speaking style control and identified appropriate speaking styles for automatically generated news speech. We report the latest research on speaking style control. In 2018, our TTS system began to be used in an automatic news speech distribution system for smart speakers and the official home pages of broadcast stations. Automatic distribution delivers the latest news around the clock while reducing the labor required to produce speech content. Smart speaker users benefit by getting the latest news without having to read it; the key feature is that they can take in the latest news by ear in a short time. We also provide slow-paced news speech for easy-to-understand Japanese news articles on our website “News Web Easy.” This service helps elementary school students and learners of Japanese overseas understand difficult news articles. An “AI News Anchor” also appears on TV.
For live broadcasts, we developed an anime CG character production device connected to our TTS system. The CG character performs in the studio through virtual production and talks with the cast. The system can be applied to editing machines for unmanned commentary and can also be used for videos posted on Twitter. This study found that the news-specific TTS system has enabled new services for broadcast stations. In the future, we will consider ways to support flexible production styles using the cloud, and we will explore new production systems, such as a cloud-based system for news speech automation.

Technical Depth of Presentation

Intermediate

Engineers, Researchers, and Program Directors

Deep learning-based news-specific text-to-speech; AI news anchor; Automatically generated news speech delivery for Internet and live broadcasting