
Benchmark Cheating, MAMBA outperforms Transformers, and Better Bayesian Neural Nets: AI Under-the-Hood

How others cheat at benchmarking, how to build a local LLM chatbot, and why dataset curation matters more than data gathering

Welcome (back) to AI Minds, a newsletter about the brainy and sometimes zany world of AI, brought to you by the Deepgram editorial team.

In this edition:

  • 🐍 Why MAMBA algorithmically outperforms Transformers in Language Modeling

  • 📊 How we’ve made Bayesian Neural Networks more generalizable

  • 📚 New dataset curation methods lead to significantly better LLM performance

  • ♟️ Benchmark Cheating: How to do it and how to spot when others do it

  • 🦙 Tutorial: Implementing a local LLM chatbot that can run code

  • 🐦 Hot takes on datasets by WIRED and Emory University

  • 📲 An AI Chatbot for Twitch Streamers

  • 🎙️ AI Minds Podcast: Julian McCarthy, CEO of MosaicVoice

Thanks for letting us crash your inbox; let’s party. 🎉

Deepgram just released a brand new text-to-speech model called Aura! Check it out here. 🥳

🎥 Explaining MAMBA from Scratch: Neural Nets Better and Faster than Transformers

Billed as perhaps “the most exciting development in AI since 2017,” Mamba is a new neural network architecture that outperforms Transformers at language modeling!

In this video, Algorithmic Simplicity explains how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!
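To get a feel for the linear-RNN perspective, here is a toy sketch (ours, not the video's code and not Mamba itself). It assumes a scalar recurrence `h_t = a_t * h_{t-1} + x_t` with a zero initial state, and shows that the same outputs fall out of a prefix scan over an associative `combine` step. That associativity is what makes linear recurrences parallelizable in principle. The function names are made up for illustration.

```python
from itertools import accumulate

def linear_rnn_sequential(a, x, h0=0.0):
    """h_t = a_t * h_{t-1} + x_t, computed one step at a time."""
    h, out = h0, []
    for a_t, x_t in zip(a, x):
        h = a_t * h + x_t
        out.append(h)
    return out

def combine(left, right):
    """Compose two steps of the form h -> a*h + b into one step (associative)."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def linear_rnn_scan(a, x):
    """Same outputs via a prefix scan over (a_t, x_t) pairs, assuming h0 = 0."""
    return [b for _, b in accumulate(zip(a, x), combine)]
```

Here `accumulate` still runs left to right, so this only demonstrates that the two formulations agree. A real implementation would evaluate the scan in parallel on a GPU.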

🧑‍🔬 Improvements to Bayesian Neural Networks and a New Web-Only Dataset

Flat Seeking Bayesian Neural Networks - The posteriors used in existing Bayesian neural networks (BNNs) are formulated without any notion of how sharp or flat the loss landscape is around the models sampled from them.

As a result, the sampled models can land in sharp regions of the loss landscape, leading to poor generalization ability. This paper attempts to solve this generalization problem.
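For intuition about "sharp" versus "flat," here is a toy sketch. It uses one common notion of sharpness, the worst-case loss increase within a small radius of the weights, and is not the paper's Bayesian formulation. The one-dimensional losses, the radius `rho`, and the function names are all assumptions made up for illustration.

```python
def sharpness(loss, w, rho=0.1, steps=100):
    """Largest loss increase over perturbations of size at most rho around w."""
    base = loss(w)
    # Evenly spaced perturbations covering [-rho, +rho].
    offsets = [rho * (2 * i / steps - 1) for i in range(steps + 1)]
    return max(loss(w + e) for e in offsets) - base

def sharp_loss(w):
    """Toy loss with a narrow minimum at w = 0."""
    return 50.0 * w ** 2

def flat_loss(w):
    """Toy loss with a wide minimum at w = 0."""
    return 0.5 * w ** 2
```

Both toy losses reach the same minimum at `w = 0`, but a small nudge to the weights costs far more under `sharp_loss`. That fragility is the intuition behind why sharp minima tend to generalize worse.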

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only - As larger models requiring pre-training on trillions of tokens are considered, it is unclear how scalable dataset curation is, and whether we will run out of unique high-quality data soon. 

Contrary to popular belief, this paper shows that properly filtered and deduplicated web data alone can lead to powerful models, even significantly outperforming models trained on The Pile.
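As a rough illustration of what "filtered and deduplicated" means in practice, here is a minimal sketch. It assumes a simple word-count filter and exact deduplication on normalized text. The paper's actual pipeline is far more involved, and the `min_words` threshold and function names here are hypothetical.

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())

def filter_and_dedup(docs, min_words=5):
    """Drop very short documents and exact duplicates (after normalization)."""
    seen, kept = set(), []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # quality filter: too short to be useful
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of a document we already kept
        seen.add(digest)
        kept.append(doc)
    return kept
```

Hashing the normalized text keeps memory use small, but it only catches exact repeats. Near-duplicates would need a fuzzy method on top.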

🏇 How to Cheat Benchmarks and Build a Locally-Running AI Chatbot

Lies, damn lies, and benchmarks - Benchmark evaluations remain tremendously useful for developing and selecting AI models, but it’s critical to understand their limitations. It’s fairly easy to game the system to produce the highest possible benchmark score. Here are some common ways that the dubious cheat at benchmarking (and how to spot these techniques in action!).

Implementing a local LLM chatbot that can run code and searches - This tutorial walks you step by step through building a completely local LLM chatbot with some “plugin” capabilities. Specifically, the “agent” will be able to execute arbitrary Python code and have access to a basic search tool!
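To make the “plugin” idea concrete, here is a minimal sketch of how an agent might route a model-chosen action to a tool. It is not the tutorial's code. The tool names, the `dispatch` function, and the keyword search over an in-memory list are all assumptions for illustration, and `exec` on untrusted code is unsafe outside a sandbox.

```python
import contextlib
import io

def run_python(code):
    """Execute code and return whatever it printed. Unsafe outside a sandbox."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh globals so runs do not share state
    return buffer.getvalue().strip()

def search(query, corpus):
    """Stand-in search tool: return corpus entries containing every query word."""
    words = query.lower().split()
    return [doc for doc in corpus if all(w in doc.lower() for w in words)]

def dispatch(action, argument, corpus):
    """Route a model-chosen action name to the matching tool."""
    if action == "python":
        return run_python(argument)
    if action == "search":
        return "\n".join(search(argument, corpus))
    return f"unknown tool: {action}"
```

In a full agent loop, the LLM's output would be parsed into an `action` and `argument`, and the string returned by `dispatch` would be fed back to the model as an observation.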

🐝 Social media buzz

🧭 An AI Chatbot for Twitch Streamers

ai_licia is an AI chatbot designed specifically for Twitch streamers. It acts as an interactive companion in your Twitch chat to help engage viewers, moderate chat, and grow your channel. ai_licia connects directly to your Twitch channel and joins the chat under her own username.

🎙️ AI Minds Podcast! 

In this episode of AIMinds, we welcome Julian McCarthy, CEO and Co-founder of MosaicVoice, a company at the forefront of integrating AI into the call center industry. 

Julian takes us on his personal journey, from his early days in consulting for telecom operations to his pivotal role in investment banking, where he specialized in analytics companies. This background sparked his vision for MosaicVoice, aiming to tackle the inherent challenges within call centers. 

Through real-time analytics and AI-driven insights, MosaicVoice seeks to empower agents, improve customer interactions, and streamline operations.