Engineering
The End of the Orb: Building AI Agents That Feel Present
TL;DR: Most agents today are blind and not very engaging, so we teamed up with Anam and Inworld to build an agent using Vision Agents that feels personal and aware of the world around you. Give it a try here. Most voice agents today are blind. They hear words, convert them to text,
10 min read
Vision Agents v0.5.0 Release: Local Hardware I/O, Anam Avatars, and Faster Deepgram TTS
It's been a busy period since our last release, and now it's time to share Vision Agents v0.5.0, a step toward making production-grade multimodal AI agents easy to build and deploy. While previous versions laid the groundwork for real-time voice, video, and Vision Agents, v0.5.0 focuses on stability at scale and even more expressive
4 min read
Scaling Event-Driven Systems Without Compromising Mobile App Stability
Event-driven architecture is nothing new. IBM MQ shipped in 1993. JMS has been around since 1998. Kafka launched in 2011. But for most of that history, event-driven patterns were for specialized domains. Most developers never touched them. That's changed. Real-time mobile features, such as chat, activity feeds, live collaboration, or presence indicators, have pushed event-driven
17 min read
The Architecture and Best Practices for Mobile App Stability
A frozen message composer. A feed that won’t load. A draft that vanishes. None of these register as crashes, but all of them lose users. Add real-time features, like chat, activity feeds, or live streaming, and your crash rate can look pristine in Crashlytics while your app silently drops messages and bleeds memory. This guide
15 min read
How to Build a Social Media App: A Technical Guide
Building a social media app means a single user action must propagate to potentially millions of other users in real time, while staying fast, safe, and cheap. Every feature touches every other feature. And the hard problems shift as you scale. At 100K users, it's the database. At 1M users, it's the fan-out strategy. At
25 min read
Developer's Guide to Ultralytics YOLO: From Theory to Real-Time Pose Detection
In most of the world, if you're YOLO'ing, you're jumping out of a plane, asking out your future spouse, or eating gas station sushi. In vision AI, YOLO means You Only Look Once. Ultralytics' YOLO is a real-time object detection framework with a simple premise: instead of scanning an image multiple times to find and classify objects,
15 min read
Developer’s Guide to Building Vision AI Pipelines Using Grok
Grok tends to fly under the radar. While ChatGPT, Claude, and Gemini have found their footing in enterprise workflows and agentic toolchains, Grok remains mostly associated with X, which has overshadowed some genuinely strong capabilities. Chief among them is vision: Grok can understand and generate images, produce entire videos from a single prompt, and with
14 min read
How Text-to-Speech Works: Neural Models, Latency, and Deployment
Not long ago, text-to-speech (TTS) was a laughing stock. Its robotic, obviously synthetic output made customer service jokes write themselves and relegated TTS to accessibility contexts where users had no alternative. Now, you may have listened to text-to-speech today without even realizing it. AI-generated podcasts, automated customer service calls, voice assistants that actually sound like assistants.
17 min read
