AI Minds Newsletter
Must-see: Sora’s Response, Mistral Large, New Diffusion Models & Unexpected Reactions

New AI trends are popping up because of Sora, from entertainment to business... See the latest research and reactions here

Demetrios, Marcel Santilli & Jose Nicholas Francisco
February 27, 2024

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

🎥 Sora Reactions: A Techie’s Response vs. Layman’s Response
🤯 New Research! Any-to-Any Generation via Composable Diffusion
💻 Speech-driven video editing via an audio-conditioned diffusion model
🤖 Mistral Large and its benchmarks are out! How does it stack up?
📲 3 New Trending AI Apps!
🎙️ AI Minds Podcast, featuring CoNote’s Nisha Iyer (Episode 3)
🏢 How AI Art is used in Business today
🎨 Who wins and who loses in the generative AI art boom?
🐦 Twitter: Political Experts and Tyler Perry on Sora
📚 An index of multimodal AI Models, Diffusion Models, and the CURE algorithm

Thanks for letting us crash your inbox; let’s party. 🎉

Oh yeah, and while you may be familiar with Deepgram’s speech-to-text API, you might want to check out our upcoming text-to-speech technology as well 🥳

📽️ Sora Reactions: A Techie’s Response vs. Layman’s Response

Warning: Strong language

Marques Brownlee, also known as MKBHD, is perhaps the best-known technology-focused YouTuber, reviewing everything from the latest phones & headsets to cars & retro-tech. A YouTube titan for over a decade and a half, he shares his thoughts on Sora above.

Charles White—also known as penguinz0 or MoistCritikal—is a streamer and YouTuber who discusses news in entertainment, film, gaming, and much more. He’s especially known for his landmark feature in The Hunger Games trilogy as well as his (in)famous victory in one of the largest chess tournaments in the world. While he’s not a techie in the same way MKBHD is, his insights showcase what people outside of the industry think about AI.

*Editor’s Note: Even the titles of these two videos reveal the interesting contrast between a techie’s perspective and a non-techie’s perspective on revolutionary AI.

🧑‍🔬 Any-to-Any AI Generation & ML-assisted video editing

Any-to-Any Generation via Composable Diffusion - This paper unveils “a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities. Unlike existing generative AI systems, CoDi can generate multiple modalities in parallel and its input is not limited to a subset of modalities like text or image.”

Speech driven video editing via an audio-conditioned diffusion model - The authors of this paper “propose a method for end-to-end speech-driven video editing using a denoising diffusion model. Given a video of a talking person, and a separate auditory speech recording, the lip and jaw motions are re-synchronised without relying on intermediate structural representations such as facial landmarks or a 3D face model.”

Mistral Large came out just yesterday. It’s fluent in 5 languages, scores higher than Gemini Pro, Llama 2 70B, and GPT-3.5 on MMLU, and has a 32k token context window. Learn more here!

See the HackerNews discussion here.

Vapi AI is a Voice AI platform made by developers for developers. With Vapi, you've got all you need to easily build, test, and launch voicebots. The Vapi API, self-service dashboard, and customizations make it a breeze. Think of them as the backstage crew for Voice AI, handling text-to-speech, speech-to-text, and natural language pipelines. They connect with providers and handle the tech stuff, so you can focus on your genius.

DeepAI is an accessible suite of AI tools designed for artists, writers, and designers. Its features include an AI image generator and editor, AI chat bots, and an AI search engine, making AI creativity easily attainable. The image generator allows users to create unique AI art by entering text prompts and selecting from a diverse library of art styles. AI Chat offers enhanced functionalities for writing stories, generating code, and more, with Genius Mode providing advanced capabilities.

Transistor.fm serves as a user-friendly guide in navigating the complexities of podcasting. Recognizing the confusion that newcomers often face in the podcasting realm, Transistor simplifies the process. Users can record and upload their audio, and Transistor takes care of the submission and distribution across platforms like Apple Podcasts and Spotify.

Beyond hosting and analytics, Transistor offers assistance through live chat and guides, providing valuable support and answers to users' podcasting queries.

Transistor is also part of Deepgram’s Start-up program.

Spoiler alert: we spoke to Justin Jackson, Co-Founder at Transistor, who will be featured in the podcast section of next week's newsletter!

🎙️ AI Minds Podcast!

In this week's feature episode, we bring you Nisha Iyer, Co-founder at CoNote.

“To build a great product, I think you need to personally or know multiple people that are experiencing a huge pain point. And that's what the tech of it, the how was AI? But the what was the pain point? And that's what led me to want to build CoNote…”

— Nisha Iyer

🏇 Where art is headed in the AI-driven world

We Hired an AI Art Generator for Our Blog and I'm Not Mad at It - In this article, branding and design expert Mara Lubell offers a glimpse into how AI art is changing businesses today. In it, you can see the evolution of her blog’s art style from fair & sweet to glorious & glamorous.

In the Generative AI Art Boom, Who Wins and Who Loses? - The title question says it all. With Sora, Diffusion Models, and even AI Music in the arsenal of anyone who has internet access, what happens next to traditional artists? How are animators, musicians, and even your local craftsmen affected? Find out here!

On social media, many are revealing the other (unintended) effects of AI video generation. The tweets below offer a peek at how even celebrities are using, purchasing, and maybe even investing in these models. Meanwhile, the call for stronger regulation remains loud and clear.

Tyler Perry says he has halted the $800M expansion of his studio after seeing OpenAI’s text-to-video model Sora.
He adds that he just used AI in 2 of his films and that with Sora “I no longer would have to travel to locations. If I wanted to be in the snow in Colorado, it’s… twitter.com/i/web/status/1…
— DiscussingFilm (@DiscussingFilm)
12:15 AM • Feb 23, 2024

My latest for @Nature: AI's environmental costs are soaring. The new energy-hungry models for video, text, and image could create an energy crisis - and impact drinking water reserves. We urgently need action from industry, researchers, and legislators. nature.com/articles/d4158…
— Kate Crawford (@katecrawford)
6:30 PM • Feb 20, 2024

📚 Glossary Pages

The glossary entries below outline everything you need to know about each respective topic—from histories, to use cases, to current implementations & resources. Check them out!

Multimodal AI Models and Modalities - Want an index of the latest and greatest multimodal AI models? Then look no further than this glossary entry!
Diffusion Models - Learn all there is to know about Diffusion Models in this glossary entry. What are their advantages over other models? What are some interesting, unforeseen use cases?
The CURE algorithm - The CURE algorithm helps data professionals identify constellations within their cosmic datasets. It tackles a fundamental challenge: how to group data points into meaningful clusters without getting thrown off by outliers or non-uniform shapes.
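
Curious what that looks like in practice? Below is a minimal, pure-Python sketch of the core CURE idea, not the glossary's or the original authors' implementation. Each cluster is summarized by a few well-scattered representative points that are shrunk toward the cluster's centroid, and the two clusters with the closest representatives are merged until `k` clusters remain. The function names and the `num_reps` and `shrink` defaults are our own illustrative choices, and the sketch leaves out the random sampling, partitioning, and explicit outlier-removal steps of the full algorithm.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def pick_representatives(points, num_reps, shrink):
    """Pick well-scattered points, then pull each one toward the centroid."""
    center = centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(range(len(points)), key=lambda i: math.dist(points[i], center))]
    # Repeatedly add the point farthest from everything chosen so far.
    while len(chosen) < min(num_reps, len(points)):
        rest = [i for i in range(len(points)) if i not in chosen]
        chosen.append(
            max(rest, key=lambda i: min(math.dist(points[i], points[j]) for j in chosen))
        )
    # Shrinking dampens the pull of points that sit far from the cluster's center.
    return [
        tuple(x + shrink * (c - x) for x, c in zip(points[i], center))
        for i in chosen
    ]


def cure(points, k, num_reps=4, shrink=0.3):
    """Merge clusters bottom-up until k remain (illustrative defaults)."""
    clusters = [[p] for p in points]
    reps = [pick_representatives(c, num_reps, shrink) for c in clusters]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster distance = closest pair of representatives.
                d = min(math.dist(p, q) for p in reps[a] for q in reps[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b], reps[b]  # b > a, so index a is still valid
        clusters[a] = merged
        reps[a] = pick_representatives(merged, num_reps, shrink)
    return clusters


# Toy data: two small squares of points, far apart from each other.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
clusters = cure(points, k=2)
for cluster in clusters:
    print(sorted(cluster))
```

Using several representatives per cluster, rather than a single centroid, is what lets this style of clustering follow elongated or irregular shapes. A larger `shrink` value makes the result behave more like centroid-based clustering, while a smaller one behaves more like nearest-neighbor merging.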