Engineering: AI
How Machines See: Inside Vision Models and Visual Understanding APIs
Before we read, before we write, we see. The human brain devotes more processing power to vision than to any other sense. We navigate the world through sight first, and a single glance tells us more than paragraphs of description ever could. For decades, this kind of visual understanding eluded machines. Computer vision could detect
8 min read
Seeing Like Gemini: Building Vision Applications with Google’s Multimodal Models
Google just dropped Gemini 3. The impression is that it's impressive, and not just with words. The coolest concepts making the rounds are the ones that showcase the fundamental trait of the Gemini family of models: multimodality. From their inception, the Gemini models have been built different. Unlike GPT-4o or Claude, which bolt vision encoders onto
11 min read
Staying Competitive in a Rapid-Fire AI Landscape
Velocity is one of those words that shows up in every leadership deck and every product kickoff. But in practice, it behaves more like bubbles escaping a can of La Croix. The moment you try to hold onto it, it's gone. What remains is a backlog that looks less like a roadmap and more like
3 min read
What is MCP: The Infrastructure Powering Agentic AI
You might not be using AI agents yet, but you will soon. They'll schedule your meetings, analyze your data, write your code, and automate your workflows. However, to accomplish any of this, they need to access your calendar, data, code, and systems. Large language models can do this the same way any software talks to
10 min read
Vision Agents v0.2 Release
It's been just over a month since we released the first version of Vision Agents, our new open-source framework designed to help developers quickly build video AI applications using their favourite AI tool and Stream. Since the initial release, we've been hard at work adding new plugins, simplifying the code, and working with the community
2 min read
Why Real-Time Is the Missing Piece in Today's AI Agents
Thinking... Ruminating... Billowing... Wibbling... Cerebrating... These words invented by AI companies to mask processing are all very cute, but in reality, they're all just apologetic loading states. When ChatGPT shows "thinking" or Claude displays "ruminating," they're admitting their models aren't ready to interact with you yet. For text chat, a few seconds of delay feels
6 min read
Best 5 Frameworks To Build Multi-Agent AI Applications
This article aims to help you build AI agents powered by memory, a knowledge base, tools, and reasoning, and to chat with them using the command line and beautiful agent UIs. What is an Agent? Large language models (LLMs) can automate complex and sequential workflows and tasks. For example, you can use LLMs to build assistants that can
17 min read
DeepSeek R1 - The Best Local LLM Tools To Run Offline
Many people (especially developers) want to use the new DeepSeek R1 thinking model but are concerned about sending their data to DeepSeek. Read this article to learn how to run the DeepSeek R1 reasoning model locally without an Internet connection, or through a trusted hosting service. You run the model offline, so your
6 min read