Blog

Notes and reflections from hands-on builds. Architecture decisions, trade-offs, and lessons learned.

Building a Local AI Assistant

Notes from running speech recognition, an LLM, and TTS on a single RTX 5070 Ti. Covers memory constraints, quantization trade-offs, and architectural choices driven by 16 GB of VRAM shared among all three models.
Notes on LLM Inference Optimization

Observations from tuning inference performance on consumer GPUs. Quantization experiments, memory allocation strategies, and measuring real-world latency.

Experiments with RAG Systems

Lessons from building retrieval-augmented generation pipelines. What worked, what failed, and why certain architectural choices mattered more than expected.