AI Minds Newsletter

OpenAI’s Hallucinations, Eleven Labs Sued by Voice Actors, and a Billion Parameter Speech Model

Cornell research calls out OpenAI's hallucinations, the world's biggest TTS model is here, and voice actors are filing lawsuits against tech companies

Jose Nicholas Francisco & Marcel Santilli
September 03, 2024

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

🎹 How AI Vocals are Revolutionizing the Music Industry
🔇 OpenAI’s Speech-to-Text Hallucination Harms: What’s really wrong with Whisper
📊 BASE TTS: The world’s largest Text-to-Speech model to date
🛠️ Virtual Workshop! Learn to Build Voice AI Agents Now
🤖 How non-tech companies use Voice AI and AI Agents
🦄 An independent developer’s take on Whisper-v3 hallucinations
🐦 Twitter: Voice Actors’ Lawsuits and 20 Questions with a Voice AI Model
📲 Three New Trending AI Apps for You!
🎤 AI Minds Podcast with Shai Unterslak, co-founder of Based Social Company
📝 (Once Again!) A Free Transcription Tool for you!
🎶 Top 10 AI Generated Songs with WatchMojo
🏛️ The FTC’s take on Voice Cloning: Governmental Thoughts
🎸 Bonus Content: Ozzy Osbourne listens to an AI Version of Himself Singing
🐑 Exposing Voice Cloning: How Synthetic Voices Shape Futures
😇 A Deep Dive into Ethical AI

Thanks for letting us crash your inbox; let’s party. 🎉

Deepgram just released a brand new medical transcription model! Check it out here. 🥳

🎥 AI Vocals Are Revolutionizing the Music Industry

This week, we’re showcasing the various capabilities of audio-based AI, from text-to-speech to music generation. In this video, Doctor Mix demonstrates what may be the most advanced AI vocal technology available today. Check it out!

🧑‍🔬 OpenAI’s Speech-to-Text Hallucinations and the World’s Largest Text-to-Speech Model

Careless Whisper: Speech-to-Text Hallucination Harms - What happens when speech-to-text goes wrong? That’s the question this paper takes on in its analysis of OpenAI’s Whisper. As the authors state: “While many of Whisper’s transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio.”
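To put the roughly 1% figure in perspective, here is a minimal back-of-the-envelope sketch. The function names are ours, not the paper's, and the second function assumes each transcription is affected independently at the same rate, which is a simplification rather than a finding from the study.

```python
def expected_hallucinated(num_transcripts: int, rate: float = 0.01) -> float:
    """Expected number of transcripts containing a hallucination.

    `rate` defaults to the roughly 1% figure quoted above.
    """
    if num_transcripts < 0 or not 0.0 <= rate <= 1.0:
        raise ValueError("num_transcripts must be >= 0 and rate must be in [0, 1]")
    return num_transcripts * rate


def chance_of_at_least_one(num_transcripts: int, rate: float = 0.01) -> float:
    """Probability that at least one transcript in a batch is affected.

    Assumption (ours): transcripts are affected independently of each other.
    """
    if num_transcripts < 0 or not 0.0 <= rate <= 1.0:
        raise ValueError("num_transcripts must be >= 0 and rate must be in [0, 1]")
    return 1.0 - (1.0 - rate) ** num_transcripts


if __name__ == "__main__":
    print(expected_hallucinated(10_000))
    print(round(chance_of_at_least_one(100), 3))
```

Under those assumptions, a small rate still adds up quickly once you transcribe audio in bulk, which is why the paper's focus on downstream harms matters.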

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data - BASE TTS is the largest TTS model to date, trained on 100K hours of public domain speech data. It uses a 1-billion-parameter autoregressive Transformer to convert raw text into discrete codes ("speechcodes"), followed by a convolution-based decoder that converts these speechcodes into waveforms in an incremental, streamable manner.
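For readers curious what "incremental, streamable" means in practice, here is a toy sketch of the two-stage shape described above. It is not BASE TTS: both stages are stand-ins we made up (character codes instead of a Transformer, sine bursts instead of a convolutional decoder), and every name and parameter is hypothetical. The point is only that audio chunks can be emitted before the full code sequence exists.

```python
import math
from typing import Iterable, Iterator, List


def text_to_speechcodes(text: str, vocab_size: int = 256) -> Iterator[int]:
    """Stand-in for stage one: emit one discrete code at a time.

    A real model would predict each code autoregressively; this toy
    simply maps each character to an integer below vocab_size.
    """
    for ch in text:
        yield ord(ch) % vocab_size


def _render(codes: List[int], samples_per_code: int) -> List[float]:
    """Toy 'decoder': turn each code into a short sine burst."""
    samples: List[float] = []
    for code in codes:
        for n in range(samples_per_code):
            samples.append(math.sin(2 * math.pi * (code + 1) * n / (samples_per_code * 256)))
    return samples


def decode_streaming(
    codes: Iterable[int], chunk_size: int = 4, samples_per_code: int = 8
) -> Iterator[List[float]]:
    """Stand-in for stage two: yield a waveform chunk as soon as
    chunk_size codes have arrived, without waiting for the rest."""
    buffer: List[int] = []
    for code in codes:
        buffer.append(code)
        if len(buffer) == chunk_size:
            yield _render(buffer, samples_per_code)
            buffer = []
    if buffer:  # flush any leftover codes at the end of the text
        yield _render(buffer, samples_per_code)


if __name__ == "__main__":
    for i, chunk in enumerate(decode_streaming(text_to_speechcodes("hello world"))):
        print(f"chunk {i}: {len(chunk)} samples")
```

Because both stages are generators, a caller could start playing the first chunk while later codes are still being produced, which is the property that makes streaming TTS feel responsive.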

💻 Virtual Workshop: Building Voice AI Agents at Scale with Deepgram & Groq

Master building voice AI agents in this practical, hands-on workshop hosted by Deepgram and Groq. Limited time: get 20% off with code AUGAI20 until 08/31. Sign up

When: Friday, September 20th | 9AM - 12PM PT

Where: Zoom

🏇 The Best and Worst of Voice Technology: Voice AI Agents and Whisper Hallucinations

How Non-Tech Companies Use Voice AI and AI Agents - From education companies to healthcare providers, voice AI agents are expanding into new territory. With examples ranging from Macy’s to Deutsche Bank, you’ll see how these voice-powered assistants add fuel to the productivity fire.

Whisper-v3 Hallucinations on Real World Data - Echoing the “Careless Whisper” paper from Cornell listed above, our very own Deepgram researchers and AI experts found that OpenAI’s speech-to-text model falls short on real-world data, hallucinations included.

Shayne and I built an insanely fast AI voice assistant in 50 LOC.
Llama 3.1 running on @CerebrasSystems.
2.5x faster inference than literally anything else.
🔥 400ms response times.
Uses:
🌐 @livekit transport
👂 @DeepgramAI STT
🧠 @CerebrasSystems LLM
🗣️ @cartesia_ai TTS
— dsa (@dsa)
5:14 PM • Aug 27, 2024

New AI lawsuit filed today:
A group of voice actors sued Eleven Labs, accusing the text-to-speech service of training its AI voice models using the actors' audiobook recordings.
— Rob Freund (@RobertFreundLaw)
1:38 AM • Aug 30, 2024

It's truly incredible how good @suno_ai_ is with vocal flow and song structure
— Nick St. Pierre (@nickfloats)
8:27 PM • Aug 29, 2024

SEOpital is an AI-powered SEO writing tool designed to help businesses and individuals create high-quality, SEO-optimized content in a fraction of the time it would take manually. With over 500 writers and SEO agencies using SEOpital daily, it's quickly becoming the go-to tool for SEO content generation and enhancement.

Retell AI is at the forefront of voice technology, offering an advanced API that enables developers to create voice agents that interact like humans, execute complex tasks, and follow instructions with unprecedented ease and efficiency. Inspired by the likes of JARVIS from Iron Man, Retell AI reduces the time to build these sophisticated agents from months to just a day.

UserCall is an innovative platform designed to streamline user interviews through AI-powered voice agents. By automating the interview process, UserCall offers businesses the ability to gather deep, qualitative insights from countless users without the typical time and resource constraints.

🎙️ AI Minds Podcast!

Shai Unterslak, co-founder of Based Social Company (Compass), shares his journey from founding a Cape Town company to developing Compass in the U.S., highlighting AI-driven audio transcription and future AI interactions.

This episode covers everything from Shai’s founding journey to his transition from South Africa to the United States in order to work on the future of technology.

📝 Free Transcription Forever! New Speech-to-Text AI Tool

Looking for a simple way to convert speech to text? Deepgram's free transcription tool is your ultimate solution. Whether it's conversations, audio files, or YouTube videos, our advanced AI transcription tool supports over 36 languages and dialects, making it the best free AI transcription tool available online. Discover how easy and efficient transcription can be with our tool.

🤖 Bonus Bits and Bytes!

If you've scrolled this far down, we've got some exciting bonus bits of content for you!

Top 10 AI Generated Songs - WatchMojo strikes again with ten songs that may make you question your ability to tell the difference between man and machine.
The FTC on Voice Cloning: Preventative Actions - Want to see exactly what the US Government thinks of Voice Cloning? Check out this official statement from the Federal Trade Commission's Office of Technology and Division of Marketing Practices.
Ozzy Osbourne listens to an AI version of himself singing - Find out what the rockstar thinks in this video.
Exposing Voice Cloning: How Synthetic Voices Shape Futures - On the note of Voice Cloning, check out this article on its predicted impact on everyday people and celebrities alike.
Ethical AI: A Deep Dive - It’s impossible to talk about voice cloning and other technologies with murky moral foundations without delving into the subject of ethical AI as a whole. Check out this article to find out what really separates good from evil in voice AI.