Andrew Ng’s new program launch, Why we’re poorly equipped to recognize voice clones, and Google DeepMind on 60 Minutes

Andrew Ng announces the launch of a new 5-course program. Researchers study why we are poorly equipped to recognize when an AI is speaking versus a human. Google DeepMind gets a spotlight on 60 Minutes. And much, much more is revealed in this edition of AI Minds!

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎥 Which LLM makes the best doctor, according to a doctor

  • 🔊 Why we’re poorly equipped to recognize voice clones

  • 🏥 Healthcare AI Research: How far we are from achieving Baymax

  • ⚡ Webinar - “Voice AI in 2025: From Robotic IVRs to Human-like Voice AI Agents”

  • 🔨 Webinar: “Build Enterprise-Ready Voice Experiences with Aura-2”

  • 🐦 Social Media Buzz: Ng’s new program launch, Elon’s Colossus 2, and more!

  • 📲 Three new, trending AI apps for you!

  • 🧠 Google DeepMind on 60 Minutes

  • 💥 Exploring the TextAttack Framework: Components, Features & Applications

  • 🔊 2025 State of Voice Report (Featured last week)

  • 📚 Deep Dive: AI Lifecycle Management

Thanks for letting us crash your inbox; let’s party. 🎉

Looking for a cutting-edge AI medical transcription model? Click here. 🥳

🎥  Which LLM Makes the Best Doctor?

Dr. Mikhail "Mike" Varshavski, D.O., is an actively practicing, board-certified family medicine doctor living in NYC. In this video, he puts LLMs to the test: Doctor Mike asks ChatGPT, Llama, Gemini, and Grok a series of medical questions to “see which has the best chance of replacing [him].” Check it out!

🔍  Why we’re poorly equipped to recognize voice clones and how far we are from Baymax

People are poorly equipped to detect AI-powered voice clones - Through a series of perceptual studies, the authors of this paper assess how realistic AI-generated voices sound in terms of identity matching and naturalness. They find that human participants cannot reliably tell AI-generated voices apart from recordings of real human speakers.

A Survey of LLM-based Agents in Medicine: How far are we from Baymax? - This survey provides a comprehensive review of LLM-based agents in medicine, examining their architectures, applications, and challenges. It analyzes the key components of medical agent systems, including system profiles, clinical planning mechanisms, medical reasoning frameworks, and external capacity enhancement.

⚡ Webinar - “Voice AI in 2025: From Robotic IVRs to Human-like Voice AI Agents”

Check out the webinar here!

About this talk:

AI-powered voice agents are poised to close the satisfaction gap that has long plagued traditional voice technologies like IVR systems.

Despite this gap, forward-thinking enterprises are increasing their investments in voice technology, recognizing that next-generation voice AI agents represent a fundamental shift in customer experience and operational efficiency.

Join Opus Research and Deepgram for this live, interactive webinar as we unveil findings from the 2025 State of Voice AI Report, exploring the most compelling business reasons to implement voice AI agents and what key improvements are required to unlock even greater adoption.

🔨 Webinar: “Build Enterprise-Ready Voice Experiences with Aura-2”

See how developers are building real-time, high-performance voice applications with Aura-2, Deepgram’s newest text-to-speech model, built on the same enterprise-grade runtime that powers our STT and speech-to-speech capabilities.

TUNE IN TO LEARN

  • 🔊 Why enterprise-ready TTS needs more than just a natural voice – Hear how Aura-2 handles specialized language, tone, and pacing with clarity and consistency.

  • 📈 How the Deepgram Enterprise Runtime powers scalable voice AI – Discover automated model adaptation, built-in hot-swapping, and flexible hosting.

  • 🔎 See Aura-2 in action (live demo) and get guidance for integrating it into your apps.

Sign up here!
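
If you want to try Aura-2 before the session, a single REST call is enough to hear it. Below is a minimal sketch against Deepgram’s text-to-speech endpoint; treat the `aura-2-thalia-en` voice name and the default MP3 output as assumptions to verify against the current API docs.

```python
# Minimal sketch: synthesize a short phrase with Deepgram TTS.
# Assumes the /v1/speak REST endpoint and the "aura-2-thalia-en" voice;
# check Deepgram's docs for the current list of Aura-2 voices.
import os
import requests

api_key = os.environ["DEEPGRAM_API_KEY"]

resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    params={"model": "aura-2-thalia-en"},
    headers={
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    },
    json={"text": "Hello from Aura-2, Deepgram's newest text-to-speech model."},
    timeout=30,
)
resp.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default).
with open("aura2_sample.mp3", "wb") as f:
    f.write(resp.content)
```

The same request shape works from any HTTP client or from the Deepgram SDKs; the webinar digs into the runtime features behind it, such as automated model adaptation, hot-swapping, and flexible hosting.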

📲 Three New, Trending AI Apps for You!

Media Semantics Character API - The Character API is "animation in the cloud". It takes a TTS voice as input and produces a live, lip-synced, talking character from it, complete with gestures and emotional response. Delivering multiple character styles, it can be used for everything from videos to fully interactive applications. Together with LLMs, the technology enables you to create embodied, "social" agents like a greeter, teacher, or virtual concierge.
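
To make the “social agent” idea concrete, the chain is essentially: generate a reply with an LLM, synthesize it with a TTS voice, then hand the audio to an animation step that returns a lip-synced character. The sketch below only illustrates that data flow; the function names and the render step are hypothetical placeholders, not the actual Character API surface.

```python
# Hypothetical pipeline sketch: LLM reply -> TTS audio -> animated character.
# These stubs illustrate the data flow only; they are not real API calls.

def generate_reply(user_message: str) -> str:
    # Placeholder for an LLM call that produces the agent's next line.
    return f"Happy to help with '{user_message}'. Here's what I can tell you..."

def synthesize_speech(text: str) -> bytes:
    # Placeholder for a TTS call returning audio bytes for the reply.
    return text.encode("utf-8")  # stand-in for real audio

def render_character(audio: bytes, style: str = "greeter") -> str:
    # Placeholder for the animation step: audio in, lip-synced character out.
    return f"<animated {style}: {len(audio)} bytes of speech>"

if __name__ == "__main__":
    reply = generate_reply("store hours")
    audio = synthesize_speech(reply)
    print(render_character(audio))
```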

ChatGPT for YouTube is a free Chrome extension that uses ChatGPT to generate text summaries of YouTube videos. This allows users to quickly understand the key points of a video without watching it in full.

Draw3D AI is a revolutionary AI tool that converts hand-drawn sketches into photorealistic images. Upload a sketch and Draw3D AI will automatically transform it into a realistic image using AI technology. It works with any detailed sketch - landscapes, animals, objects, etc. Bring your imagination to life!

🤖 Bonus Bits and Bytes!