OpenAI’s Hallucinations, Eleven Labs Sued by Voice Actors, and a Billion Parameter Speech Model

Cornell research calls out OpenAI's hallucinations, the world's biggest TTS model is here, and voice actors are filing lawsuits against tech companies

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎹 How AI Vocals are Revolutionizing the Music Industry

  • 🔇 OpenAI’s Speech-to-Text Hallucination Harms: What’s really wrong with Whisper

  • 📊 BASE TTS: The world’s largest Text-to-Speech model to date

  • 🛠️ Virtual Workshop! Learn to Build Voice AI Agents Now

  • 🤖 How non-tech companies use Voice AI and AI Agents

  • 🦄 An independent developer’s take on Whisper-v3 hallucinations

  • 🐦 Twitter: Voice Actors’ Lawsuits and 20 Questions with a Voice AI Model

  • 📲 Three New Trending AI Apps for You!

  • 🎤 AI Minds Podcast with Shai Unterslak, co-founder of Based Social Company

  • 📝 (Once Again!) A Free Transcription Tool for you!

  • 🎶 Top 10 AI Generated Songs with WatchMojo

  • 🏛️ The FTC’s take on Voice Cloning: Governmental Thoughts

  • 🎸 Bonus Content: Ozzy Osbourne listens to an AI Version of Himself Singing

  • 🐑 Exposing Voice Cloning: How Synthetic Voices Shape Futures

  • 😇 A Deep Dive into Ethical AI

Thanks for letting us crash your inbox; let’s party. 🎉

Deepgram just released a brand new medical transcription model! Check it out here. 🥳

🎥  AI Vocals Are Revolutionizing the Music Industry

This week, we’re showcasing the range of audio-based AI, from text-to-speech to music generation. In this video, Doctor Mix demonstrates what may be the most advanced AI vocal technology available today. Check it out!

🧑‍🔬 OpenAI’s Speech-to-Text Hallucinations and the World’s Largest Text-to-Speech Model

Careless Whisper: Speech-to-Text Hallucination Harms - What happens when speech-to-text goes wrong? That’s the question this paper explores in its analysis of OpenAI’s Whisper. As the authors state: “While many of Whisper’s transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio.”

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data - BASE TTS is the largest TTS model to date, trained on 100K hours of public domain speech data. It uses a 1-billion-parameter autoregressive Transformer that converts raw text into discrete codes ("speechcodes"), followed by a convolution-based decoder that converts these speechcodes into waveforms in an incremental, streamable manner.

💻 Virtual Workshop: Building Voice AI Agents at Scale with Deepgram & Groq

Master building voice AI agents in this practical, hands-on workshop hosted by Deepgram and Groq. Limited time: get 20% off with code AUGAI20 until 08/31. Sign up

When: Friday, September 20th | 9AM - 12PM PT

Where: Zoom

🏇 The Best and Worst of Voice Technology: Voice AI Agents and Whisper Hallucinations

How Non-Tech Companies Use Voice AI and AI Agents - Voice AI agents are expanding into new territory, from education companies to healthcare providers. With examples ranging from Macy’s to Deutsche Bank, you’ll see how these voice-powered assistants boost productivity.

Whisper-v3 Hallucinations on Real World Data - Much like the “Careless Whisper” paper from Cornell covered above, our own Deepgram researchers and AI experts found that OpenAI’s speech-to-text model hallucinates on real-world data.

🐝 Social Media Buzz: Voice Actors’ Lawsuits and 20 Questions with Voice AI

📲 Three New Trending AI Apps for You!

SEOpital is an AI-powered SEO writing tool designed to help businesses and individuals create high-quality, SEO-optimized content in a fraction of the time it would take manually. With over 500 writers and SEO agencies using SEOpital daily, it's quickly becoming the go-to tool for SEO content generation and enhancement.

Retell AI is at the forefront of voice technology, offering an advanced API that enables developers to create voice agents that interact like humans, execute complex tasks, and follow instructions with unprecedented ease and efficiency. Inspired by the likes of JARVIS from Iron Man, Retell AI reduces the time to build these sophisticated agents from months to just a day.

UserCall is an innovative platform designed to streamline user interviews through AI-powered voice agents. By automating the interview process, UserCall offers businesses the ability to gather deep, qualitative insights from countless users without the typical time and resource constraints.

🎙️ AI Minds Podcast!

Shai Unterslak, co-founder of Based Social Company (Compass), shares his journey from founding a Cape Town company to developing Compass in the U.S., highlighting AI-driven audio transcription and future AI interactions.

The episode also follows Shai’s move from South Africa to the United States to work on the future of technology.

📝 Free Transcription Forever! New Speech-to-Text AI Tool

Looking for a simple way to convert speech to text? Deepgram's free transcription tool is your ultimate solution. Whether it's conversations, audio files, or YouTube videos, our advanced AI transcription tool supports over 36 languages and dialects, making it the best free AI transcription tool available online. Discover how easy and efficient transcription can be with our tool.

🤖 Bonus Bits and Bytes!

If you've scrolled this far down, we've got some exciting bonus bits of content for you!