LLM Articles

Android Local LLM Inference: LiteRT, MediaPipe, Quantization, and Production Trade-offs

A practical guide to Android local LLM inference across LiteRT, ONNX Runtime Mobile, MediaPipe LLM Inference, INT4 quantization, GPU delegates, KV cache memory, and device fallback.

OpenClaw Agent Deep Dive: From Prompt Container to Schedulable Execution Unit

A systematic breakdown of the OpenClaw Agent object model, runtime state machine, Session tree, scheduling budgets, tool boundaries, and failure recovery.

OpenClaw Architecture: How Node, Tool, and Skill Make AI Executable

Starting from an OpenClaw technical discussion, this post breaks down the responsibility boundaries and call chain of Node, Tool, and Skill, and explains why Node design is key to moving AI from answering to executing.

OpenClaw Tools Permissions: Why Chat Works but Exec and Web Do Not

After an OpenClaw upgrade or fresh install, chat may work while shell execution and web access fail. This post explains the Tools permission model, exec security policy, and a practical troubleshooting path.

Prompt Cost Optimization: When to Write Long and When to Write Short

Detailed prompts are not always cheaper. This post examines token pricing, context decay, and human effort to provide a measurable way to decide when prompts should be long or short.

Prompt Engineering: From Core Principles to Frontier Practice

February 10, 2026

A practical guide to prompt engineering, covering KERNEL design principles, few-shot and Chain-of-Thought prompting, promptfoo evaluation, DSPy automation, and prompt injection defense.

Android On-device RAG: From Local Vector Databases to LLM Inference

December 18, 2025

A practical walkthrough of on-device Android RAG, covering document chunking, local vector search with SQLite, MediaPipe LLM Inference, and performance trade-offs.

Android On-device LLM Context Window Engineering

December 17, 2025

A practical context management strategy for on-device LLMs on Android, covering layered prompt compression, summary caching, dialog state machines, and token budget allocation.

Android On-device LLM Streaming Output: From Tokens to Compose UI

December 16, 2025

A full-stack architecture for Android on-device LLM streaming output, covering KV cache memory pressure, Kotlin Flow backpressure, and incremental Compose rendering.