Meta has introduced a generative AI-powered text-to-speech (TTS) tool called Voicebox that the tech giant claims can produce a synthetic voice 20 times faster than the current state of the art and with only two seconds of recording. TTS models usually require curated, relatively small, labeled data sets for training, as audio quality can degrade as the data set grows. Voicebox overcomes that limit by using what Meta described as an architecture that can handle the “in-filling” of audio information. Deepfake voices produced with Voicebox are so good, according to Meta, that the company won’t release all of the code, and it has even come up with a method for detecting AI-generated audio.