Andrew Ng Announces Agentic Document Extraction, Google’s Transformer 2.0 Fixes Memory, and GPT 4.5 versus Sonnet 3.7

Andrew Ng announces agentic document extraction on Twitter. Google's Transformer 2.0 shows attention isn't all you need. GPT 4.5 tries to outsmart Sonnet 3.7 in a game.

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🎥 Attention isn’t all you need: Google’s “Transformer 2.0” and AI Memory Recall

  • 🧠 Researchers use deep learning to detect mental illness

  • 🦊 Stereotyping animals with vision-language models

  • 💻 Deepgram & Vonage Technical Webinar: How to build responsive voice agents

  • 🛣️ Meet Deepgram at HumanX & NVIDIA GTC!

  • 📲 Three new, trending AI apps for you!

  • 📄 Andrew Ng announces Agentic Document Extraction

  • 🐦 Social Media Buzz: Best code embedding model in the market

  • 🚁 Drone uses machine learning to track its subjects with a camera

  • 🎙️ AI Minds Podcast with Pablo Palafox, Co-Founder and CEO at HappyRobot

  • 🤖 Bonus Video - Two AI agents play game OVER SOUND: Sonnet 3.7 vs GPT 4.5

Thanks for letting us crash your inbox; let’s party. 🎉

We coded with the brand-new Whisper-v3 over the past week, and the results were not what we expected. Check it out here!

🎥 Attention isn’t all you need: Google’s “Transformer 2.0” and AI Memory Recall

Video description: “In this video, [Bycloud] will be sharing the research that aims to solve the problem of context window, kv-cache, and memory recall efficiency. Even though the title only mentions Google's research, [Bycloud] also included research from Meta and Sakana AI. They paved a good way to introduce the idea of AI memory.”

Papers mentioned in the video:

🔍 Detecting Mental Illness with Deep Learning and Stereotyping Animals with vision-language AI

Tutorial on Using Machine Learning and Deep Learning Models for Mental Illness Detection - This tutorial provides practical guidance to address common challenges in applying machine learning and deep learning methods for mental health detection on platforms like social media. It focuses on strategies for working with diverse datasets, improving text preprocessing, and addressing issues such as imbalanced data and model evaluation.
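To make the imbalanced-data and evaluation points concrete, here is a minimal Python sketch. It is not taken from the tutorial: the labels are hypothetical toy data, and the weighting formula is the common "balanced" heuristic (samples divided by classes times class count), chosen here as an assumption. It shows why plain accuracy can look fine while macro-averaged F1 exposes a model that ignores the rare class.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def macro_f1(y_true, y_pred):
    """Average per-class F1 so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 8 "control" posts and 2 "at_risk" posts.
y_true = ["control"] * 8 + ["at_risk"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["control"] * 10

weights = balanced_class_weights(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)

print(weights)   # the rare class receives the larger weight
print(accuracy)  # looks healthy despite missing every at-risk post
print(score)     # macro F1 penalizes the missed class
```

Weights like these are typically passed to a classifier's loss so that errors on the rare class cost more; how the tutorial itself handles imbalance may differ.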

Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models - This study investigates how animal stereotypes manifest in vision-language models during image generation. Through targeted prompts, the authors explore whether DALL-E perpetuates stereotypical representations of animals, such as "owls as wise" and "foxes as unfaithful."

⚡ Technical Deep Dive: How to Build Responsive Voice Agents with Vonage & Deepgram

Learn how to build human-like voice agents for customer support, appointment scheduling and more in our March 26th technical webinar with Vonage.

When: Wednesday 26th March 2025, 10:00 PT / 13:00 ET / 17:00 GMT

Where: Online

⭐️ Save your spot here! ⭐️ 

Hosted by:

  • Benjamin Aronov, Developer Advocate at Vonage

  • Tony Chan, Senior Solutions Engineer at Vonage

  • Damien Murphy, Applied Engineer at Deepgram

🔊 Deepgram is Hitting the Road: HumanX & NVIDIA GTC

We’re gearing up for HumanX & NVIDIA GTC — two of the biggest AI events of the year. If you’re building with voice AI, stop by to see how our APIs can power real-time, scalable speech applications with low latency and high accuracy.

📍 Find us here:

  • 🚀 HumanX – Booth 825

  • 🚀 NVIDIA GTC – Booth 1709

Let’s meet onsite—grab time with our team!

🐝 Social Media Buzz: Ng announces Agentic Document Extraction

📲 Three new, trending AI apps for you!

Snapvid AI helps save time in the video editing process by adding subtitles and emojis in seconds. Additionally, you can insert video footage, transitions, and sound effects with just one click.

Lenso is a cutting-edge application designed to enhance productivity and streamline workflows. By leveraging advanced technology, Lenso offers users a seamless experience that caters to a wide array of needs. Whether you’re a professional looking to optimize your daily tasks or a team in need of effective collaboration tools, Lenso provides a comprehensive solution.

Mix Check Studio, powered by RoEx, is a cutting-edge platform designed to provide precise feedback and enhancement for audio mixes and masters. Utilizing advanced AI technology, Mix Check Studio aims to streamline the audio production process by offering users the ability to upload their tracks and receive detailed analysis and improvements. This tool is especially beneficial for musicians, producers, and audio engineers.

🎤 The AI Minds Podcast

This episode of the AI Minds Podcast features Pablo Palafox, Co-Founder and CEO at HappyRobot. HappyRobot automates communication across channels with AI workers that integrate with your systems, manage conversations, and log data.

Palafox emphasizes the customer-centric approach the team has taken, continuously refining the platform based on feedback from the logistics sector to ensure real-world value and address genuine business pain points.

🤖 Bonus Video: Two AI agents play game OVER SOUND: Sonnet 3.7 vs GPT 4.5

Video Description: “All the communication is happening exclusively by sound, using Gibberlink (powered by ggwave protocol). The bots were not programmed to play this game. One of the bots simply had an objective "play and win tic tac toe" in system prompt.”