Gemini Nano on Android

This topic focuses on Gemini Nano and AICore on Android.

Android AI engineering is moving from “what is Gemini Nano?” to “how do we ship on-device generative AI inside a real app?” This page collects notes on Gemini Nano, AICore, the ML Kit GenAI APIs, Android on-device AI, local LLM inference, retrieval-augmented generation (RAG), and multimodal interaction.

First Decide Whether On-device AI Fits

On-device AI is strongest when latency, offline use, privacy, and predictable inference cost matter. Good candidates include summarization, rewriting, image description, speech recognition, smart input, local content retrieval, and small RAG workflows.

It is a poor fit if the goal is to copy every cloud LLM capability onto a phone. Long-context reasoning, complex multi-step planning, and large-scale knowledge retrieval often still need cloud assistance or a hybrid route.
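The trade-offs above can be written down as an explicit routing rule. The Kotlin sketch below is illustrative only: `TaskProfile`, `Route`, `chooseRoute`, and the 4,000-token default budget are hypothetical names and values, not part of AICore, ML Kit, or any Gemini Nano API, and a realistic budget depends on the device and model.

```kotlin
// Illustrative routing rule; names, fields, and the default budget are
// assumptions, not values from AICore, ML Kit, or Gemini Nano documentation.
enum class Route { ON_DEVICE, HYBRID, CLOUD }

data class TaskProfile(
    val needsOffline: Boolean,           // must work without a network
    val privacySensitive: Boolean,       // raw data should stay on the device
    val estimatedInputTokens: Int,       // rough prompt plus context size
    val needsMultiStepPlanning: Boolean  // long reasoning chains favor larger models
) {
    fun fitsOnDevice(tokenBudget: Int): Boolean =
        estimatedInputTokens <= tokenBudget && !needsMultiStepPlanning
}

fun chooseRoute(task: TaskProfile, tokenBudget: Int = 4_000): Route = when {
    task.fitsOnDevice(tokenBudget) -> Route.ON_DEVICE
    // No network available: stay local and narrow the task (chunking, truncation).
    task.needsOffline -> Route.ON_DEVICE
    // Do the sensitive part locally (retrieval, redaction), send the rest to the cloud.
    task.privacySensitive -> Route.HYBRID
    else -> Route.CLOUD
}
```

With these rules, a short private note routes to `ON_DEVICE`, while a long planning task with no privacy or offline constraint routes to `CLOUD`. Keeping the rule in one pure function makes it easy to unit-test and to retune as on-device models improve.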

Technical Entry Points

  1. AICore: a system-level service for model access, updates, security, and hardware acceleration.
  2. Gemini Nano: the most compact model in the Gemini family, built for local, low-latency, privacy-first tasks.
  3. ML Kit GenAI APIs: higher-level, task-oriented APIs (for example summarization, proofreading, rewriting, and image description) that hide much of the model-version complexity.
  4. Google AI Edge, LiteRT, and the MediaPipe LLM Inference API: better suited to custom local inference pipelines.
  5. Compose UI: useful for streaming output, multi-turn conversations, multimodal input, and state feedback.
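Streaming output and multi-turn conversation mostly come down to UI state management. A minimal sketch, assuming the inference layer delivers model output as incremental text chunks: every name below (`ChatUiState`, `ChatEvent`, `reduce`) is hypothetical, and the Compose and ViewModel wiring is left out so the reducer runs as plain Kotlin.

```kotlin
// Hypothetical UI model for a streaming chat screen; in a Compose app this
// state would typically live in a ViewModel and be observed by composables.
data class ChatMessage(val fromUser: Boolean, val text: String)

data class ChatUiState(
    val messages: List<ChatMessage> = emptyList(),
    val streamingText: String = "",   // partial model output shown while generating
    val isGenerating: Boolean = false,
    val error: String? = null
)

sealed interface ChatEvent {
    data class UserSent(val text: String) : ChatEvent
    data class Chunk(val text: String) : ChatEvent   // one streamed piece of model output
    object Done : ChatEvent
    data class Failed(val reason: String) : ChatEvent
}

// Pure reducer: each event yields a new immutable state, which keeps UI
// updates predictable and lets the logic be tested without a device.
fun reduce(state: ChatUiState, event: ChatEvent): ChatUiState = when (event) {
    is ChatEvent.UserSent -> state.copy(
        messages = state.messages + ChatMessage(fromUser = true, text = event.text),
        streamingText = "",
        isGenerating = true,
        error = null
    )
    is ChatEvent.Chunk -> state.copy(streamingText = state.streamingText + event.text)
    ChatEvent.Done -> state.copy(
        messages = state.messages + ChatMessage(fromUser = false, text = state.streamingText),
        streamingText = "",
        isGenerating = false
    )
    is ChatEvent.Failed -> state.copy(isGenerating = false, error = event.reason)
}
```

Folding `UserSent`, two `Chunk`s, and `Done` over an empty state leaves two messages, with the model reply assembled from the chunks. Because a composable would only read `ChatUiState`, the same reducer works whichever local backend produced the chunks.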

Related Topics

  • Compose-first Migration: local AI chat, streaming output, and multimodal interaction usually need a solid Compose UI architecture.
  • Android Performance: local models quickly expose memory pressure, thermal throttling, battery drain, and frame-rate drops.
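One common way to keep streaming output from hurting frame rate is to batch tokens instead of updating the UI on every chunk. This is a sketch under stated assumptions: `ChunkCoalescer` is a hypothetical helper, the 50 ms default interval is an arbitrary starting point rather than a measured value, and timestamps are supplied by the caller (on Android, for example, from `SystemClock.uptimeMillis()`).

```kotlin
// Hypothetical helper: merges streamed text chunks so the UI is updated at most
// once per interval instead of once per token. Timestamps are passed in
// explicitly so the batching logic stays deterministic and testable.
class ChunkCoalescer(private val minIntervalMs: Long = 50) {
    private val pending = StringBuilder()
    private var lastEmitMs: Long? = null

    /** Buffers [chunk]; returns the batched text to render, or null if the UI should wait. */
    fun offer(chunk: String, nowMs: Long): String? {
        pending.append(chunk)
        val last = lastEmitMs
        if (last != null && nowMs - last < minIntervalMs) return null
        lastEmitMs = nowMs
        return drain()
    }

    /** Call when the stream ends so the final partial batch is not lost. */
    fun flush(): String? = if (pending.isEmpty()) null else drain()

    private fun drain(): String {
        val out = pending.toString()
        pending.setLength(0)
        return out
    }
}
```

The first chunk renders immediately and later chunks are merged until the interval has passed. Call `flush()` when the stream completes, and consider a timer-driven flush as well, because text buffered just before the model pauses would otherwise wait for the next chunk.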