Alberto.Codes


Blog

Thoughts on AI engineering, Python, career growth, and technical leadership — organized using the Diataxis framework.

explanation

2026-03-31

5 min read

From one model to seven: Making TurboQuant model-portable

turboquant-vllm started as a Molmo2-only proof of concept. v1.3.0 validates seven model families, but getting there meant rewriting Triton kernels for non-standard head dimensions and teaching the cache about sliding window attention.

how-to

2026-03-28

4 min read

Serve compressed VLM inference from a container

Build a container image with turboquant-vllm baked in, serve a vision-language model with 3.76x KV cache compression, and verify it works, all in under five minutes.

explanation

2026-03-27

5 min read

Paper to PyPI in 72 hours: Building the first TurboQuant vLLM plugin

Google published TurboQuant at ICLR 2026 for text models. 72 hours later, turboquant-vllm was on PyPI: the first implementation validated on vision-language models and the first vLLM plugin. One flag to enable, 3.76x KV cache compression.

explanation

2026-03-26

6 min read

I ran TurboQuant on a vision model. The first output was garbage.

I implemented Google's TurboQuant algorithm for KV cache compression and validated it on Molmo2 video inference on an RTX 4090 — 3.76x compression with near-identical output at 1.78x overhead.

explanation

2026-03-23

6 min read

I asked an AI to explain boto3. Then I fixed the docstrings.

I cloned the most downloaded Python package twice, fixed the docstrings with docvet, and asked AI to generate architecture documentation from both. The results weren't even close.

explanation

2026-03-22

6 min read

When docstrings lie, your AI tools pay the price

Wrong documentation hurts AI tools more than missing documentation. docvet 1.14 introduces bidirectional verification — checking both what your docstrings fail to mention and what they wrongly claim.

how-to

2026-03-06

5 min read

Encrypt ADK Sessions in 5 Minutes

Install adk-secure-sessions, swap one import, and verify your agent's session data is encrypted at rest — start to finish in under 5 minutes.

explanation

2026-03-05

8 min read

Stop Writing AI Agent Prompts by Hand

You write your agent's instructions, test them, tweak a word, test again, and hope the change helped. There's an algorithm that does this better than you do — evolutionary optimization finds prompts you'd never write yourself.

how-to

2026-03-04

4 min read

Give Your AI Agent a Docstring Quality Tool

Wire docvet's MCP server into VS Code, Cursor, or Claude Code — your AI coding agent gets structured docstring quality checks without parsing CLI output.

explanation

2026-03-01

7 min read

Your AI Agent's Memories Aren't Encrypted

Google ADK stores everything your agent knows — tool calls, user messages, conversation context — in plaintext SQLite. Here's why that matters and how to fix it.

explanation

2026-02-25

6 min read

Your AI Reads Your Docstrings. Are They Right?

Stale docstrings poison your AI coding agent's understanding of your codebase. Research shows incorrect documentation is worse than no documentation at all.

explanation

2026-02-05

7 min read

Entity Resolution is Recipe Matching

When exact matching fails, probabilistic record linkage weighs evidence like a chef recognizes a dish—not by a single ingredient, but by the whole picture.

explanation

2026-02-04

8 min read

Lineage IDs in Multimodal AI Pipelines

The simplest way to keep images, videos, model calls, and outputs tied together across retries and fan-out.

explanation

2026-02-02

6 min read

Task Queues, Idempotency, and AI Pipelines

Why at-least-once delivery means your AI pipeline will process duplicates, and why idempotency is the only reliable fix.

explanation

2026-02-01

4 min read

Why I Chose Reflex for My Portfolio Site

A Python engineer's case for building a portfolio site without touching JavaScript.

© 2026 Alberto Nieto. All rights reserved.