Microsoft's latest speech generator is so good they're scared to release it to the public

This thing we made is so brilliant, we can't risk releasing it to the general public. That, basically, is what Microsoft says about its latest speech generator, VALL-E 2. So, does that reflect genuine concerns? Or is it a clever marketing ruse designed to get some viral traction and set online chins wagging?

If it is all completely genuine, what does it say about Microsoft that it's knowingly creating AI tools too dangerous to release? It's a conundrum, to be sure.

Anyway, here are the basic facts of the situation. Microsoft says in a recent blog post (via ExtremeTech) that its latest neural codec language model for speech synthesis, known as VALL-E 2, achieves "human parity for the first time".

More specifically, "VALL-E 2 can generate accurate, natural speech in the exact voice of the original speaker, comparable to human performance." Now, to some extent, this is nothing new. What's remarkable is the incredible speed with which VALL-E 2 can achieve this or, to put it another way, the incredibly limited sample or prompt it needs to pull off the feat.

VALL-E 2 can accurately mimic a specific person's voice based on a sample just a few seconds long. It pulls that trick off by using a huge training library that maps variations in pronunciation, intonation and cadence in the model to the sample, then spits out what appears to be totally convincing synthesised speech.

Microsoft's blog post has a range of example audio clips demonstrating how well VALL-E 2 (and indeed its predecessor, VALL-E) can turn a short sample of between three and 10 seconds into convincing synthesised speech that's often indistinguishable from a real human voice.

It's a process known as zero-shot text-to-speech synthesis, or zero-shot TTS for short. Again, the approach is nothing new; it's the accuracy and the shortness of the sample audio that's novel.

Of course, the idea of weaponising such tools to create

Read more on pcgamer.com