December 28, 2025
Notes from running speech recognition, an LLM, and TTS on a single RTX 5070 Ti. Covers memory constraints, quantization trade-offs, and architectural choices driven by 16 GB of VRAM shared across all three models.
Read more →
Coming soon
Notes on LLM Inference Optimization
Observations from tuning inference performance on consumer GPUs. Quantization experiments, memory-allocation strategies, and real-world latency measurements.
Coming soon
Experiments with RAG Systems
Lessons from building retrieval-augmented generation pipelines. What worked, what failed, and why certain architectural choices mattered more than expected.