Thoughts on AI engineering, Python, career growth, and technical leadership — organized using the Diataxis framework.
2026-03-31
5 min read
turboquant-vllm started as a Molmo2-only proof of concept. v1.3.0 validates seven model families, but getting there meant rewriting Triton kernels for non-standard head dimensions and teaching the cache about sliding window attention.
2026-03-28
4 min read
Build a container image with turboquant-vllm baked in, serve a vision-language model with 3.76x KV cache compression, and verify it works, all in under five minutes.
2026-03-27
5 min read
Google published TurboQuant at ICLR 2026 for text models. 72 hours later, turboquant-vllm was on PyPI: the first implementation validated on vision-language models and the first vLLM plugin. One flag to enable, 3.76x KV cache compression.
2026-03-26
6 min read
I implemented Google's TurboQuant algorithm for KV cache compression and validated it on Molmo2 video inference on an RTX 4090 — 3.76x compression with near-identical output at 1.78x overhead.
2026-03-23
6 min read
I cloned the most downloaded Python package twice, fixed the docstrings with docvet, and asked AI to generate architecture documentation from both. The results weren't even close.
2026-03-22
6 min read
Wrong documentation hurts AI tools more than missing documentation. docvet 1.14 introduces bidirectional verification — checking both what your docstrings fail to mention and what they wrongly claim.
2026-03-06
5 min read
Install adk-secure-sessions, swap one import, and verify your agent's session data is encrypted at rest — start to finish in under 5 minutes.
2026-03-05
8 min read
You write your agent's instructions, test them, tweak a word, test again, and hope the change helped. There's an algorithm that does this better than you do — evolutionary optimization finds prompts you'd never write yourself.
2026-03-04
4 min read
Wire docvet's MCP server into VS Code, Cursor, or Claude Code — your AI coding agent gets structured docstring quality checks without parsing CLI output.
2026-03-01
7 min read
Google ADK stores everything your agent knows — tool calls, user messages, conversation context — in plaintext SQLite. Here's why that matters and how to fix it.
2026-02-25
6 min read
Stale docstrings poison your AI coding agent's understanding of your codebase. Research shows incorrect documentation is worse than no documentation at all.
2026-02-05
7 min read
When exact matching fails, probabilistic record linkage weighs evidence the way a chef recognizes a dish: not by a single ingredient, but by the whole picture.
2026-02-04
8 min read
The simplest way to keep images, videos, model calls, and outputs tied together across retries and fan-out.
2026-02-02
6 min read
Why at-least-once delivery means your AI pipeline will process duplicates, and why idempotency is the only reliable fix.
2026-02-01
4 min read
A Python engineer's case for building a portfolio site without touching JavaScript.