Get started with Prompt API

This page describes how to do the following:

  • Configure your project to use Prompt API
  • Provide text-only input and receive a response
  • Provide an image input with related text input and receive a response

For more details about the Prompt API, see the reference documentation for Kotlin (com.google.mlkit.genai.prompt) and Java (com.google.mlkit.genai.prompt.java, com.google.mlkit.genai.prompt).

[Optional] Test prompt with the Gemma 3n model

Before implementing, consider testing your prompt in AI Studio with the Gemma 3n model. Gemma 3n's outputs are expected to be very similar to nano-v3's outputs.

Configure project

Add the ML Kit Prompt API as a dependency in your build.gradle configuration:

implementation("com.google.mlkit:genai-prompt:1.0.0-alpha1")

Implement generative model

To implement the code in your project, follow these steps:

  • Create a generativeModel object:

    Kotlin

    // Get a GenerativeModel instance
    val generativeModel = Generation.getClient()
    

    Java

    // Get a GenerativeModel instance
    GenerativeModelFutures generativeModelFutures = GenerativeModelFutures
        .from(Generation.INSTANCE.getClient());
    
  • Check whether Gemini Nano is AVAILABLE, DOWNLOADABLE, DOWNLOADING, or UNAVAILABLE. Then, download the feature if it is downloadable (a combined Kotlin sketch follows these snippets):

    Kotlin

    val status = generativeModel.checkStatus()
    when (status) {
        FeatureStatus.UNAVAILABLE -> {
            // Gemini Nano not supported on this device or device hasn't fetched the latest configuration to support it
        }
    
        FeatureStatus.DOWNLOADABLE -> {
            // Gemini Nano can be downloaded on this device, but is not currently downloaded
            generativeModel.download().collect { downloadStatus ->
                when (downloadStatus) {
                    is DownloadStatus.DownloadStarted ->
                        Log.d(TAG, "starting download for Gemini Nano")
    
                    is DownloadStatus.DownloadProgress ->
                        Log.d(TAG, "Nano ${downloadStatus.totalBytesDownloaded} bytes downloaded")
    
                    DownloadStatus.DownloadCompleted -> {
                        Log.d(TAG, "Gemini Nano download complete")
                        modelDownloaded = true
                    }
    
                    is DownloadStatus.DownloadFailed -> {
                        Log.e(TAG, "Nano download failed ${downloadStatus.e.message}")
                    }
                }
            }
        }
    
        FeatureStatus.DOWNLOADING -> {
            // Gemini Nano currently being downloaded
        }
    
        FeatureStatus.AVAILABLE -> {
            // Gemini Nano currently downloaded and available to use on this device
        }
    }
    

    Java

    ListenableFuture<Integer> status = generativeModelFutures.checkStatus();
    Futures.addCallback(status, new FutureCallback<>() {
        @Override
        public void onSuccess(Integer featureStatus) {
            switch (featureStatus) {
                case FeatureStatus.AVAILABLE -> {
                    // Gemini Nano currently downloaded and available to use on this device
                }
                case FeatureStatus.UNAVAILABLE -> {
                    // Gemini Nano not supported on this device or device hasn't fetched the latest configuration to support it
                }
                case FeatureStatus.DOWNLOADING -> {
                    // Gemini Nano currently being downloaded
                }
                case FeatureStatus.DOWNLOADABLE -> {
                    // Gemini Nano can be downloaded on this device, but is not currently downloaded
                    generativeModelFutures.download(new DownloadCallback() {
                        @Override
                        public void onDownloadStarted(long totalBytesToDownload) {
                            Log.d(TAG, "starting download for Gemini Nano");
                        }
                        @Override
                        public void onDownloadProgress(long totalBytesDownloaded) {
                            Log.d(TAG, "Nano " + totalBytesDownloaded + " bytes downloaded");
                        }
                        @Override
                        public void onDownloadCompleted() {
                            Log.d(TAG, "Gemini Nano download complete");
                        }
                        @Override
                        public void onDownloadFailed(@NonNull GenAiException e) {
                            Log.e(TAG, "Nano download failed: " + e.getMessage());
                        }
                    });
                }
            }
        }
        @Override
        public void onFailure(@NonNull Throwable t) {
            // Failed to check status
        }
    }, ContextCompat.getMainExecutor(context));
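
The snippets above show the status check and the download as separate pieces. As a minimal Kotlin sketch of how they can fit together, the following helper suspends until the model is usable; the function name ensureModelReady and the no-retry control flow are illustrative assumptions, not a prescribed pattern.

Kotlin

// Minimal sketch: returns true once Gemini Nano is ready to use, false if it
// can't be used right now. Uses only the calls shown in the snippets above.
suspend fun ensureModelReady(generativeModel: GenerativeModel): Boolean =
    when (generativeModel.checkStatus()) {
        FeatureStatus.AVAILABLE -> true
        FeatureStatus.DOWNLOADABLE -> {
            var downloaded = false
            // Collect download events until the download flow completes
            generativeModel.download().collect { downloadStatus ->
                if (downloadStatus is DownloadStatus.DownloadCompleted) downloaded = true
            }
            downloaded
        }
        FeatureStatus.DOWNLOADING -> false // a download is in flight; try again later
        else -> false // FeatureStatus.UNAVAILABLE: not supported on this device
    }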
    
    

Provide text-only input

Kotlin

val response = generativeModel.generateContent("Write a 3 sentence story about a magical dog.")

Java

GenerateContentResponse response = generativeModelFutures.generateContent(
  new GenerateContentRequest.Builder(
    new TextPart("Write a 3 sentence story about a magical dog."))
  .build())
  .get();

Alternatively, add optional parameters:

Kotlin

val response = generativeModel.generateContent(
    generateContentRequest(
        TextPart("Write a 3 sentence story about a magical dog."),
    ) {
        // Optional parameters
        temperature = 0.2f
        topK = 10
        candidateCount = 3
    },
)

Java

GenerateContentRequest.Builder requestBuilder =
        new GenerateContentRequest.Builder(
                new TextPart("Write a 3 sentence story about a magical dog."));
requestBuilder.setTemperature(.2f);
requestBuilder.setTopK(10);
requestBuilder.setCandidateCount(3);

GenerateContentResponse response =
        generativeModelFutures.generateContent(requestBuilder.build()).get();

For more information about the optional parameters, see Optional configurations.

Provide multimodal (image and text) input

Bundle an image and a text input together in the generateContentRequest() function, with the text prompt being a question or command related to the image:

Kotlin

val response = generativeModel.generateContent(
    generateContentRequest(ImagePart(bitmap), TextPart(textPrompt)) {
        // optional parameters
        ...
    },
)

Java

GenerateContentResponse response = generativeModelFutures.generateContent(
    new GenerateContentRequest.Builder(
        new ImagePart(bitmap),
        new TextPart(textPrompt))
    // optional parameters
    .build())
.get();
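
Here, bitmap is a standard android.graphics.Bitmap. As a minimal sketch of producing one (the resource name R.drawable.sample_photo is hypothetical; any Bitmap source works):

Kotlin

// Decode a Bitmap to pass to ImagePart. R.drawable.sample_photo is a
// hypothetical resource; BitmapFactory.decodeFile() or ImageDecoder work too.
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.sample_photo)

val response = generativeModel.generateContent(
    generateContentRequest(ImagePart(bitmap), TextPart("Describe this image in one sentence.")) {
        // optional parameters, as in the snippets above
    },
)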

Process inference result

  • Run the inference and retrieve the result. For both text-only and multimodal prompts, you can either wait for the complete result or stream the response as it's generated.

    • The following snippets use non-streaming inference, which retrieves the entire result from the AI model before returning it:

    Kotlin

    // Call the AI model to generate content and store the complete
    // response in a new variable named 'response' once it's finished
    val response = generativeModel.generateContent("Write a 3 sentence story about a magical dog")
    

    Java

    GenerateContentResponse response = generativeModelFutures.generateContent(
            new GenerateContentRequest.Builder(
                    new TextPart("Write a 3 sentence story about a magical dog."))
                    .build())
            .get();
    
    • The following snippets use streaming inference, which returns the result in chunks as it's generated:

    Kotlin

    // Streaming inference
    var fullResponse = ""
    generativeModel.generateContentStream("Write a 3 sentence story about a magical dog").collect { chunk ->
        val newChunkReceived = chunk.candidates[0].text
        print(newChunkReceived)
        fullResponse += newChunkReceived
    }
    

    Java

    // Streaming inference
    StringBuilder fullResponse = new StringBuilder();
    generativeModelFutures.generateContent(
        new GenerateContentRequest.Builder(
            new TextPart("Write a 3 sentence story about a magical dog")).build(),
        chunk -> {
            Log.d(TAG, chunk);
            fullResponse.append(chunk);
        });
    

For more information about streaming and non-streaming inference, see Streaming versus non-streaming.

Latency optimization

To reduce the latency of the first inference call, your app can optionally call warmup(). This loads Gemini Nano into memory and initializes runtime components.
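
A minimal sketch (this assumes warmup() is a suspend function on the same GenerativeModel instance, like the other Kotlin calls on this page):

Kotlin

// Load Gemini Nano into memory ahead of the first request, for example
// during app startup or on the screen that precedes your AI feature.
generativeModel.warmup()

// The first real inference call now avoids most of the model-loading cost.
val response = generativeModel.generateContent("Write a 3 sentence story about a magical dog.")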

Optional configurations

As part of each GenerateContentRequest, you can set the following optional parameters:

  • temperature: Controls the degree of randomness in token selection.
  • seed: Enables generating stable and deterministic results.
  • topK: Controls randomness and diversity in the results.
  • candidateCount: Sets the number of unique responses to return. The exact number of responses can be lower than candidateCount because duplicate responses are removed automatically.
  • maxOutputTokens: Defines the maximum number of tokens that can be generated in the response.
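
As a minimal Kotlin sketch, setting all five together might look like the following; temperature, topK, and candidateCount appear in the snippets above, while the seed and maxOutputTokens property names are assumptions here (check the GenerateContentRequest reference for the exact API):

Kotlin

val response = generativeModel.generateContent(
    generateContentRequest(TextPart("Write a 3 sentence story about a magical dog.")) {
        temperature = 0.2f    // lower values make token selection less random
        seed = 42             // assumed property: fixed seed for reproducible output
        topK = 10             // sample from the 10 most likely tokens
        candidateCount = 3    // up to 3 unique responses; duplicates are removed
        maxOutputTokens = 256 // assumed property: cap on generated tokens
    },
)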

For more guidance on setting optional configurations, see GenerateContentRequest.

Supported features and limitations

  • Input must be under 4,000 tokens (approximately 3,000 English words). For more information, see the countTokens reference and the sketch after this list.
  • Use cases that require long output (more than 256 tokens) should be avoided.
  • AICore enforces an inference quota per app. For more information, see Quota per application.
  • The following languages have been validated for Prompt API:
    • English
    • Korean
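
As a minimal sketch of guarding against the input limit before running inference (countTokens is assumed here to be a suspend function that returns a token count for the given text; check the countTokens reference for the exact API):

Kotlin

val prompt = "Write a 3 sentence story about a magical dog."

// Assumed API: countTokens returns the number of tokens in the input.
if (generativeModel.countTokens(prompt) < 4000) {
    val response = generativeModel.generateContent(prompt)
} else {
    // Input too large: trim or summarize the prompt before sending it
}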

Common setup issues

ML Kit GenAI APIs rely on the Android AICore app to access Gemini Nano. When a device has just been set up (or reset), or the AICore app has just been reset (for example, its data cleared, or the app uninstalled and reinstalled), the AICore app may not have had enough time to finish initialization (including downloading the latest configurations from the server). As a result, the ML Kit GenAI APIs may not function as expected. Here are the common setup error messages you might see and how to handle them:

  • Example error message: "AICore failed with error type 4-CONNECTION_ERROR and error code 601-BINDING_FAILURE: AICore service failed to bind."
    How to handle: This can happen when you install an app that uses the ML Kit GenAI APIs immediately after device setup, or when AICore is uninstalled after your app is installed. Updating the AICore app and then reinstalling your app should fix it.
  • Example error message: "AICore failed with error type 3-PREPARATION_ERROR and error code 606-FEATURE_NOT_FOUND: Feature ... is not available."
    How to handle: This can happen when AICore hasn't finished downloading the latest configurations. While the device is connected to the internet, the update usually takes a few minutes to a few hours. Restarting the device can speed up the update. Note that if the device's bootloader is unlocked, you'll also see this error; this API doesn't support devices with unlocked bootloaders.
  • Example error message: "AICore failed with error type 1-DOWNLOAD_ERROR and error code 0-UNKNOWN: Feature ... failed with failure status 0 and error esz: UNAVAILABLE: Unable to resolve host ..."
    How to handle: Keep the network connection active, wait a few minutes, and retry.