Zuckerberg & Theo Von Poke Fun at OpenAI, What a GPT-image Feedback Loop Looks Like, and Zero-Shot Singing Voice Synthesis

Meta CEO Mark Zuckerberg and comedian Theo Von briefly poke fun at OpenAI in a new podcast episode. An X user shows what a GPT-image feedback loop (shockingly?) looks like. New research yields a zero-shot singing voice synthesis model, proving that anyone can sing. And much more is revealed in this week's edition of AI Minds!

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎥 Software developer builds an AI therapist that only gives terrible advice

  • 🎨 Generating Multimodal Images with GAN

  • 🎤 Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

  • ⚡ Webinar: “Build Enterprise-Ready Voice Experiences with Aura-2”

  • 🐦 Social Media Buzz: Zuckerberg and Comedian Theo Von poke fun at OpenAI

  • 🍃 Google excited to announce two Nature publications from Project AMIE

  • 🌀 What a GPT-image feedback loop looks like

  • 📲 Three new, trending AI apps for you!

  • 📹 Meet NEO, Your Robot Butler in Training | Bernt Børnich | TED

  • 🧠 Google’s New AI System Outperforms Physicians in Complex Diagnoses

  • 📈 Optimizing Natural Language Processing Models Using Backtracking Algorithms

  • 📚 Deep Dive: Computational Creativity

Thanks for letting us crash your inbox; let’s party. 🎉

Looking for a cutting-edge AI medical transcription model? Click here. 🥳

🎥  I Built an AI Therapist That Only Gives Terrible Advice

The hilarious and clever software engineer DeveloperFilip has made another hit video! In this demo+tutorial, you can witness him build an AI therapist that only gives terrible advice — and yes, it’s exactly as chaotic as it sounds. Check it out!

Note: You can code up this project yourself using the links in the description 😃

🔍  Generating Multimodal Images with GAN and Synthesizing Singing Voices (Zero-Shot)

Generating Multimodal Images with GAN: Integrating Text, Image, and Style - The authors of this paper propose a multimodal image generation method based on Generative Adversarial Networks (GANs) that combines text descriptions, reference images, and style information to generate images meeting multimodal requirements. The method pairs a text encoder, an image feature extractor, and a style integration module so that the generated images stay high quality in both visual content and style consistency.
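The paper's pipeline (a text encoder, an image feature extractor, and a style integration module whose outputs condition a single generator) can be sketched in a few lines of toy code. Everything below, including the module names, embedding sizes, random projections, and fusion by concatenation, is our own illustrative assumption, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

EMB = 64                # per-modality embedding size (assumed)
IMG_SHAPE = (32, 32, 3)  # generated image shape (assumed)

def text_encoder(tokens):
    """Toy text encoder: mean of per-token random embeddings."""
    table = rng.normal(size=(1000, EMB))
    return table[tokens].mean(axis=0)

def image_feature_extractor(image):
    """Toy extractor: project flattened pixels down to EMB dims."""
    w = rng.normal(size=(image.size, EMB)) / np.sqrt(image.size)
    return image.reshape(-1) @ w

def style_integration(style_vec, fused):
    """AdaIN-like style module: scale/shift fused features
    by statistics of the style embedding."""
    gamma, beta = style_vec.std(), style_vec.mean()
    return gamma * fused + beta

def generator(cond):
    """Toy generator: map the conditioning vector to an image."""
    w = rng.normal(size=(cond.size, np.prod(IMG_SHAPE))) / np.sqrt(cond.size)
    return np.tanh(cond @ w).reshape(IMG_SHAPE)

# Combine the three modalities into one conditioning vector.
t = text_encoder(np.array([3, 14, 159]))          # "text description"
v = image_feature_extractor(rng.normal(size=IMG_SHAPE))  # reference image
s = rng.normal(size=EMB)                          # style embedding
cond = style_integration(s, np.concatenate([t, v]))
fake = generator(cond)
print(fake.shape)  # (32, 32, 3)
```

In the real method each of these modules is a trained network and the generator is optimized adversarially against a discriminator; the sketch only shows how the three conditioning signals flow into one generator.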

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference - The authors of this paper propose a unified framework for Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC), addressing the limitations of existing approaches: weak cross-domain SVS/SVC, poor output musicality, and the scarcity of singing data. The proposed zero-shot learning paradigm comprises one SVS model and two SVC models, built on pre-trained content embeddings and a diffusion-based generator.
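The "pre-trained content embeddings plus a diffusion-based generator" recipe can be illustrated with a toy reverse-diffusion loop: start from noise and iteratively denoise a mel-spectrogram conditioned on a content embedding and a speaker embedding taken from the speech reference. All sizes, schedules, and the stand-in denoiser below are our own assumptions for illustration, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(1)

T = 50                  # number of diffusion steps (assumed)
FRAMES, MELS = 100, 80  # mel-spectrogram shape (assumed)
betas = np.linspace(1e-4, 0.05, T)   # linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def denoiser(x_t, t, content, speaker):
    """Stand-in for the learned noise predictor; in the real model
    this is a network conditioned on content + speaker embeddings."""
    cond = content.mean() + speaker.mean()
    return 0.1 * (x_t - cond)  # pretend noise estimate

def sample(content, speaker):
    """Reverse diffusion: denoise pure noise into a spectrogram,
    step by step, conditioned on the two embeddings."""
    x = rng.normal(size=(FRAMES, MELS))
    for t in reversed(range(T)):
        eps = denoiser(x, t, content, speaker)
        coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:  # re-inject noise except at the final step
            x += np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

content = rng.normal(size=(FRAMES, 256))  # pre-trained content embedding
speaker = rng.normal(size=192)            # embedding from speech reference
mel = sample(content, speaker)
print(mel.shape)  # (100, 80)
```

The zero-shot property comes from the conditioning: because the speaker embedding is extracted from an unseen reference at inference time, no per-singer training is required.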

⚡ Webinar: “Build Enterprise-Ready Voice Experiences with Aura-2”

In this webinar, you’ll see how developers are building real-time, high-performance voice applications with Aura-2: Deepgram’s newest text-to-speech model, built on the same enterprise-grade runtime that powers our STT and speech-to-speech capabilities.

TUNE IN TO LEARN

  • 🔊 Why enterprise-ready TTS needs more than just a natural voice – Hear how Aura-2 handles specialized language, tone, and pacing with clarity and consistency.

  • 📈 How the Deepgram Enterprise Runtime powers scalable voice AI – Discover automated model adaptation, built-in hot-swapping, and flexible hosting.

  • 🔎 See Aura-2 in action (live demo) and get guidance for integrating it into your apps.

  • When: Tuesday, May 6th → 11am PDT | 1pm CDT | 2pm EDT

  • Where: Online

  • Sign up here!

📲 Three New, Trending AI Apps for You!

Lemon Slice (formerly Infinity AI) is a video foundation model that allows you to create expressive, talking characters. For the first time, any image of a character can be immediately transformed into an interactive video call supported in 10+ languages. To accomplish this, Lemon Slice trained a custom DiT model that streams at 25fps. It works across styles, from photorealistic to cartoons to paintings.

Icons8 is a comprehensive design resource that offers a wide array of products and services tailored for the creative community. From an extensive collection of PNG and SVG icons in 47 different styles to innovative AI technology products like the AI Face Generator, Icons8 is dedicated to fueling creativity and enhancing visual designs. With custom tools like the Iconizer tool for editing SVG icons without any technical skills, and a library full of high-quality graphics, Icons8 stands out by producing its own content, making it a one-stop-shop for all design needs.

Waveformer is a groundbreaking open-source web application developed by Replicate that brings a unique twist to music generation. By leveraging the power of MusicGen, Waveformer allows users to create music from text inputs, transforming written words into harmonious sounds.

🤖 Bonus Bits and Bytes!