On-device AI Articles

Android On-device AI Real-time Video: CameraX Frames, GPU Preprocessing, and LiteRT Inference

A practical end-to-end Android real-time video AI pipeline, covering CameraX head-of-line blocking, GPU YUV preprocessing, LiteRT inference jitter, async staging, and frame-expiration control.

Android On-device AI Memory Management: Model Loading Peaks, Tensor Lifetimes, and KV Cache Reclaim

A practical memory-management path for Android on-device LLM deployment, covering mmap model loading, tensor lifecycle reclamation, sliding-window KV cache, layer-wise decay, and surviving the low-memory killer (LMK).

Android On-device AI Prompt Engineering: Token Budgets, Few-shot Compression, and TTFT Control

A practical Android on-device LLM prompt-engineering guide showing how token budgeting, few-shot template compression, and dynamic budget switching reduced first-token latency from 8.7 seconds to under 2 seconds.

Android On-device AI Chat Compose UI Architecture: Streaming Rendering and Multi-turn Conversation State

Compose UI patterns for on-device LLM chat apps, using token buffering, state isolation, and a single source of truth to keep streaming output smooth.

Android On-device Speech Recognition: From SpeechRecognizer to Android 16 ASR

A full-stack look at Android on-device speech recognition, from AudioRecord capture and SpeechRecognizer APIs to Android 16's built-in ASR engine.

Android On-device AI Model Delivery and Version Management

A practical model delivery architecture that decouples on-device AI models from APK releases with three-layer versioning, BSDiff incremental updates, and hot rollback.

Android On-device AI Power and Thermal Management: From SoC DVFS to Thermal Throttling

A practical look at sustained on-device LLM inference, covering GPU power profiles, DVFS scheduling, thermal throttling, and the thermal-aware load scheduling that reduced P99 latency from 890 ms to 380 ms.

Android On-device AI Memory Bandwidth: GPU Shared Memory to NPU Zero-copy

A practical guide to Android on-device AI memory-bandwidth optimization, from camera-to-GPU data movement to AHardwareBuffer, ION reuse, and NPU zero-copy paths.

Android On-device AI Profiling with Perfetto: NPU Scheduling and Memory Bandwidth

A Perfetto-based profiling method for Android on-device AI inference, tracing NPU scheduling, GPU counters, DRM contention, and memory bandwidth bottlenecks.

Android On-device AI Image Preprocessing: From Bitmap Pixels to Tensor Input

A full Android on-device AI preprocessing pipeline from Bitmap pixels to tensor input, covering memory layout, pixel conversion, resize choices, normalization, and zero-copy optimization.