GPT-4o’s Speech Model Says It Should “Breathe,” China Researchers Develop AI Speech Translation, and New App Detects AI Voices

Users shocked at GPT-4o's breathing, new app detects AI voices, and the impossibility of child-to-adult voice transformation

Jose Nicholas Francisco, Marcel Santilli & Demetrios
August 06, 2024

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

🔊 Speech-to-Speech models are here and require no text
㊗️ Speech-to-Speech Translation can be achieved without parallel data?
👂 AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
💸 Top AI Use Cases in Marketing, Healthcare, and Finance
🏢 How non-tech companies like Toyota and Prada Are Using AI Today
🐦 Social Media Buzz: GPT-4o insists it should “breathe”
⚒️ Why it’s Easier Than You Think to Build Voice AI Assistants
❤️ The World’s First Empathic AI Voice Interface
📲 New Trending AI Voice Apps! Detect AI Voices Instantly
🎙️ New Episode of the AI Minds Podcast with CEO and Founder at HiQ, Liz Tsai
📓 Free Transcription AI Tool for You!
🤖 Bonus Content: Musician reviews AI-generated Music
🍼 Child-to-adult Voice Style Transfer: A Case Study in Not Meeting Expectations
🌊 A Complete Deep Dive into AI Voice Agents

Thanks for letting us crash your inbox; let’s party. 🎉

Deepgram just released a brand new medical transcription model! Check it out here. 🥳

🎥 Voice AI Agents are Taking Over…

Gone are the days of speech-to-text-to-speech pipelines. Voice AI agents now convert speech to speech directly, with no transcription step in between. The latest version of ChatGPT demonstrates this in a remarkable video by YouTuber and tech reviewer Every.

🧑‍🔬 Achieving Speech-to-Speech Translation without Parallel Data and a new multimodal, audio-based model

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? - It’s normally assumed that training a speech-to-speech translation model requires parallel data. That is, if you want to train a model to translate spoken English into spoken French, you need audio recordings of the exact same sentences in both English and French. This paper challenges that assumption by introducing the new ComSpeech model!
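For the curious, here is a minimal Python sketch of what "parallel" means in practice. It is only an illustration of the data requirement, not code from the paper: the `Utterance` class, the `parallel_pairs` helper, the sentence IDs, and the file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    sentence_id: str  # hypothetical ID of the sentence being spoken
    language: str     # language code, e.g. "en" or "fr"
    audio_path: str   # placeholder path; no audio is actually loaded


def parallel_pairs(utterances, source_lang, target_lang):
    """Return (source, target) pairs that record the same sentence in both languages."""
    by_sentence = {}
    for u in utterances:
        by_sentence.setdefault(u.sentence_id, {})[u.language] = u

    pairs = []
    for langs in by_sentence.values():
        # A sentence only counts as parallel if both recordings exist.
        if source_lang in langs and target_lang in langs:
            pairs.append((langs[source_lang], langs[target_lang]))
    return pairs


# Toy corpus: only sentence "s1" was recorded in both English and French.
corpus = [
    Utterance("s1", "en", "en/s1.wav"),
    Utterance("s1", "fr", "fr/s1.wav"),
    Utterance("s2", "en", "en/s2.wav"),
    Utterance("s3", "fr", "fr/s3.wav"),
]

pairs = parallel_pairs(corpus, "en", "fr")
print(len(pairs), "parallel pair(s) out of", len(corpus), "recordings")
```

In this toy corpus, only one of the four recordings has a counterpart in the other language. A model that depends on parallel speech can use just that one pair, which is why dropping the requirement matters.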

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head - This paper showcases “a multi-modal AI system named AudioGPT, which complements LLMs with (1) foundation models to process complex audio information and solve numerous understanding/generation tasks; and (2) the input/output interface to support spoken dialogue.”

🏇 The Top AI Use Cases in Big and Small Businesses & How Non-Tech Companies Use AI

Top AI Use Cases in Marketing, Healthcare, and Finance - To show how the big players use AI, this article covers Netflix and Amazon's personalized recommendations, Google's AI-powered healthcare diagnostics, and UPS and FedEx's route optimization systems, among other case studies.

How Non-Tech Companies Use AI Right Now - According to a Forbes Advisor survey, 64% of businesses expect AI to increase their overall productivity. From Prada to Toyota, here’s how companies are achieving such results.

Asking GPT 4o advanced voice is really good.
Instructing to say tongue twisters without pausing to breathe. It insists it *has* to breathe “just like anybody speaking”
Sourced from Reddit
— Rohan Paul (@rohanpaul_ai)
3:48 PM • Aug 1, 2024

Let's build a voice AI assistant using a little bit of Python.
And let's have it use the webcam to see things.
This is easier than you think.
— Santiago (@svpino)
12:17 PM • Aug 3, 2024

Voicemod is a free voice changing software that works in real-time. It allows users to transform their voice with a wide variety of effects like robot, demon, chipmunk, male, female, celebrity voices, musical effects like autotune, and much more.

AI Voice Detector is a service that analyzes audio files to detect if a voice is real or artificially generated by AI. It helps identify fake AI voices used in scams, fraud, misinformation campaigns and more.

CloneDub is a web application that allows users to dub videos and podcasts into different languages using the same voices and lip sync. It utilizes advanced AI and machine learning technology to replicate voices and generate realistic lip movements in the dubbed language.

🎙️ AI Minds Podcast!

In this episode of the AI Minds podcast, Liz Tsai, CEO and Founder at HiQ, explores AI's impact on customer support automation, from her career shift to tech to integrating AI for enhanced interactions and compliance.

Liz shares her journey from commodities trading to tech, ignited by a startup opportunity in San Francisco. Liz established HiQ to transform customer support through AI-driven automation. HiQ continues refining AI to enhance customer service with a blend of technology and human expertise.

📝 Free Transcription Forever! New Speech-to-Text AI Tool

Looking for a simple way to convert speech to text? Deepgram's free transcription tool is your ultimate solution. Whether it's conversations, audio files, or YouTube videos, our advanced AI transcription tool supports over 36 languages and dialects, making it the best free AI transcription tool available online!

🤖 Bonus Bits and Bytes!

If you've scrolled this far down, we've got some exciting bonus bits of content for you!

AI, Machine Learning, Deep Learning and Generative AI Explained - In this educational video, IBM explains the inner workings of generative AI, and the hype around it, in a clear, concise (and under-10-minute) way.
You’ve Never Heard AI Music Like This - Jonny Keeley, a musician of over 15 years, tried out generating AI music for the first time… and he tried to hate it. But the results shocked quite literally everyone involved, including the artist himself.
Child-to-Adult Voice Style Transfer: A case study in auditory AI - Former AI Researcher at Stanford and Writer on the AI Minds Team, Jose Francisco, created a model that could transform child voices into adult voices. It didn’t go as planned.
AI Voice Agents: A Complete Deep Dive - This glossary entry on AI Voice Agents delves into everything from the algorithms behind speech synthesis to the testing and refinement of such technology. Learn everything you need here!