On-device Inference Articles
Android Multimodal On-device AI: Gemini Nano, Image Tokens, Streaming, and Compose
A practical Android multimodal AI walkthrough covering Gemini Nano multimodality, AICore model loading, image preprocessing, ViT tokenization, streaming inference, Compose rendering, memory, and thermals.
Designing an On-Device LLM Inference Scheduler: Priority Queues and Backpressure in Practice
This article shows how to build a scheduling layer above an on-device inference engine, using priority queues, preemption, and backpressure to avoid OOMs, unpredictable latency, and out-of-order results.
Android Local LLM Inference: LiteRT, MediaPipe, Quantization, and Production Trade-offs
A practical guide to Android local LLM inference across LiteRT, ONNX Runtime Mobile, MediaPipe LLM Inference, INT4 quantization, GPU delegates, KV cache memory, and device fallback.
Android On-device LLM Context Window Engineering
A practical context management strategy for on-device LLMs on Android, covering layered prompt compression, summary caching, dialog state machines, and token budget allocation.
Inside Android TTS: From TextToSpeech API to On-Device Vocoders
A full-stack look at Android TTS, including engine binding, synthesis callbacks, on-device HiFi-GAN vocoders, streaming playback, and time-to-first-audio (TTFA) tuning.
Android ML Kit in Practice: Vision Pipelines and CameraX Integration
A practical guide to ML Kit for on-device vision, covering detection pipelines, CameraX integration, multi-model orchestration, and inference optimization.
Android AICore and Gemini Nano: System Services, Safety Filters, and LoRA Adaptation
A deep dive into Google AICore for Gemini Nano on Android, covering APEX delivery, permission isolation, model distribution, safety filtering, session management, and LoRA adapters.