On-device AI Articles
Android On-device AI Real-time Video: CameraX Frames, GPU Preprocessing, and LiteRT Inference
A practical end-to-end Android real-time video AI pipeline, covering CameraX head-of-line blocking, GPU YUV preprocessing, LiteRT inference jitter, async staging, and frame-expiration control.
Android On-device AI Memory Management: Model Loading Peaks, Tensor Lifetimes, and KV Cache Reclaim
A practical memory-management path for Android on-device LLM deployment, covering mmap model loading, tensor lifecycle reclamation, sliding-window KV cache, layer-wise decay, and LMK survival.
Android On-device AI Prompt Engineering: Token Budgets, Few-shot Compression, and TTFT Control
A practical Android on-device LLM prompt-engineering guide showing how token budgeting, few-shot template compression, and dynamic budget switching reduced first-token latency from 8.7 seconds to under 2 seconds.
Android On-device AI Chat Compose UI Architecture: Streaming Rendering and Multi-turn Conversation State
Compose UI patterns for on-device LLM chat apps, using token buffering, state isolation, and a single source of truth to keep streaming output smooth.
Android On-device Speech Recognition: From SpeechRecognizer to Android 16 ASR
A full-stack look at Android on-device speech recognition, from AudioRecord capture and SpeechRecognizer APIs to Android 16's built-in ASR engine.
Android On-device AI Model Delivery and Version Management
A practical model delivery architecture that decouples on-device AI models from APK releases with three-layer versioning, BSDiff incremental updates, and hot rollback.
Android On-device AI Power and Thermal Management: From SoC DVFS to Thermal Throttling
A practical look at sustained on-device LLM inference, covering GPU power profiles, DVFS scheduling, thermal throttling, and a thermal-aware load scheduler that reduced P99 latency from 890 ms to 380 ms.
Android On-device AI Memory Bandwidth: GPU Shared Memory to NPU Zero-copy
A practical guide to Android on-device AI memory-bandwidth optimization, from camera-to-GPU data movement to AHardwareBuffer, ION reuse, and NPU zero-copy paths.
Android On-device AI Profiling with Perfetto: NPU Scheduling and Memory Bandwidth
A Perfetto-based profiling method for Android on-device AI inference, tracing NPU scheduling, GPU counters, DRM contention, and memory bandwidth bottlenecks.
Android On-device AI Image Preprocessing: From Bitmap Pixels to Tensor Input
A full Android on-device AI preprocessing pipeline from Bitmap pixels to tensor input, covering memory layout, pixel conversion, resize choices, normalization, and zero-copy optimization.