Recognize Text in Images Securely with Cloud Vision using Firebase Auth and Functions on Android

In order to call a Google Cloud API from your app, you need to create an intermediate REST API that handles authorization and protects secret values such as API keys. You then need to write code in your mobile app to authenticate to and communicate with this intermediate service.

One way to create this REST API is by using Firebase Authentication and Functions, which gives you a managed, serverless gateway to Google Cloud APIs that handles authentication and can be called from your mobile app with pre-built SDKs.

This guide demonstrates how to use this technique to call the Cloud Vision API from your app. This method will allow all authenticated users to access Cloud Vision billed services through your Cloud project, so consider whether this auth mechanism is sufficient for your use case before proceeding.

Before you begin

Configure your project

  1. If you haven't already, add Firebase to your Android project.
  2. If you haven't already enabled Cloud-based APIs for your project, do so now:

    1. Open the Firebase ML APIs page in the Firebase console.
    2. If you haven't already upgraded your project to the pay-as-you-go Blaze pricing plan, click Upgrade to do so. (You'll be prompted to upgrade only if your project isn't on the Blaze pricing plan.)

      Only projects on the Blaze pricing plan can use Cloud-based APIs.

    3. If Cloud-based APIs aren't already enabled, click Enable Cloud-based APIs.
  3. Configure your existing Firebase API keys to disallow access to the Cloud Vision API:
    1. Open the Credentials page of the Cloud console.
    2. For each API key in the list, open the editing view, and in the Key Restrictions section, add all of the available APIs except the Cloud Vision API to the list.

Deploy the callable function

Next, deploy the Cloud Function you will use to bridge your app and the Cloud Vision API. The functions-samples repository contains an example you can use.

By default, this function allows only authenticated users of your app to access the Cloud Vision API. You can modify the function for different requirements.

To deploy the function:

  1. Clone or download the functions-samples repo and change to the Node-1st-gen/vision-annotate-image directory:
    git clone https://github.com/firebase/functions-samples
    cd functions-samples/Node-1st-gen/vision-annotate-image
  2. Install dependencies:
    cd functions
    npm install
    cd ..
  3. If you don't have the Firebase CLI, install it.
  4. Initialize a Firebase project in the vision-annotate-image directory. When prompted, select your project in the list.
    firebase init
  5. Deploy the function:
    firebase deploy --only functions:annotateImage

Add Firebase Auth to your app

The callable function deployed above will reject any request from non-authenticated users of your app. If you have not already done so, you will need to add Firebase Auth to your app.

Add necessary dependencies to your app

  • Add the dependencies for the Cloud Functions for Firebase (client) and gson Android libraries to your module (app-level) Gradle file (usually <project>/<app-module>/build.gradle.kts or <project>/<app-module>/build.gradle):
    implementation("com.google.firebase:firebase-functions:21.2.1")
    implementation("com.google.code.gson:gson:2.8.6")
  • Now you are ready to start recognizing text in images.

    1. Prepare the input image

    In order to call Cloud Vision, the image must be formatted as a base64-encoded string. To process an image from a saved file URI:
    1. Get the image as a Bitmap object:

      Kotlin

      var bitmap: Bitmap = MediaStore.Images.Media.getBitmap(contentResolver, uri)

      Java

      Bitmap bitmap = MediaStore.Images.Media.getBitmap(getContentResolver(), uri);
    2. Optionally, scale down the image to save on bandwidth. See the Cloud Vision recommended image sizes.

      Kotlin

      private fun scaleBitmapDown(bitmap: Bitmap, maxDimension: Int): Bitmap {
          val originalWidth = bitmap.width
          val originalHeight = bitmap.height
          var resizedWidth = maxDimension
          var resizedHeight = maxDimension
          if (originalHeight > originalWidth) {
              resizedHeight = maxDimension
              resizedWidth =
                  (resizedHeight * originalWidth.toFloat() / originalHeight.toFloat()).toInt()
          } else if (originalWidth > originalHeight) {
              resizedWidth = maxDimension
              resizedHeight =
                  (resizedWidth * originalHeight.toFloat() / originalWidth.toFloat()).toInt()
          } else if (originalHeight == originalWidth) {
              resizedHeight = maxDimension
              resizedWidth = maxDimension
          }
          return Bitmap.createScaledBitmap(bitmap, resizedWidth, resizedHeight, false)
      }

      Java

      private Bitmap scaleBitmapDown(Bitmap bitmap, int maxDimension) {
          int originalWidth = bitmap.getWidth();
          int originalHeight = bitmap.getHeight();
          int resizedWidth = maxDimension;
          int resizedHeight = maxDimension;

          if (originalHeight > originalWidth) {
              resizedHeight = maxDimension;
              resizedWidth = (int) (resizedHeight * (float) originalWidth / (float) originalHeight);
          } else if (originalWidth > originalHeight) {
              resizedWidth = maxDimension;
              resizedHeight = (int) (resizedWidth * (float) originalHeight / (float) originalWidth);
          } else if (originalHeight == originalWidth) {
              resizedHeight = maxDimension;
              resizedWidth = maxDimension;
          }
          return Bitmap.createScaledBitmap(bitmap, resizedWidth, resizedHeight, false);
      }

      Kotlin

      // Scale down bitmap size
      bitmap = scaleBitmapDown(bitmap, 640)

      Java

      // Scale down bitmap size
      bitmap = scaleBitmapDown(bitmap, 640);
    3. Convert the bitmap object to a base64 encoded string:

      Kotlin

      // Convert bitmap to base64 encoded string
      val byteArrayOutputStream = ByteArrayOutputStream()
      bitmap.compress(Bitmap.CompressFormat.JPEG, 100, byteArrayOutputStream)
      val imageBytes: ByteArray = byteArrayOutputStream.toByteArray()
      val base64encoded = Base64.encodeToString(imageBytes, Base64.NO_WRAP)

      Java

      // Convert bitmap to base64 encoded string
      ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
      bitmap.compress(Bitmap.CompressFormat.JPEG, 100, byteArrayOutputStream);
      byte[] imageBytes = byteArrayOutputStream.toByteArray();
      String base64encoded = Base64.encodeToString(imageBytes, Base64.NO_WRAP);
    4. The image represented by the Bitmap object must be upright, with no additional rotation required.
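
    The resizing arithmetic in step 2 can be checked without an Android device. The following is a minimal sketch that reproduces only the dimension calculation from scaleBitmapDown in plain Java. The class name ScaledSize and the sample image sizes are hypothetical, and no Bitmap is created.

```java
// Illustrative only: mirrors the width/height arithmetic of scaleBitmapDown
// without using android.graphics.Bitmap. ScaledSize is a hypothetical name.
class ScaledSize {

    /** Returns {width, height} with the longer side set to maxDimension. */
    static int[] compute(int originalWidth, int originalHeight, int maxDimension) {
        int resizedWidth = maxDimension;
        int resizedHeight = maxDimension;
        if (originalHeight > originalWidth) {
            // Portrait: height becomes maxDimension, width keeps the aspect ratio.
            resizedWidth = (int) (resizedHeight * (float) originalWidth / (float) originalHeight);
        } else if (originalWidth > originalHeight) {
            // Landscape: width becomes maxDimension, height keeps the aspect ratio.
            resizedHeight = (int) (resizedWidth * (float) originalHeight / (float) originalWidth);
        }
        // Square images fall through with both sides equal to maxDimension.
        return new int[] {resizedWidth, resizedHeight};
    }

    public static void main(String[] args) {
        int[] landscape = compute(4000, 3000, 640);
        System.out.println(landscape[0] + "x" + landscape[1]);
        int[] portrait = compute(3000, 4000, 640);
        System.out.println(portrait[0] + "x" + portrait[1]);
    }
}
```

    Because the longer side is always set to maxDimension, an image that is already smaller than maxDimension would be enlarged rather than left alone. If that matters for your app, consider skipping the scaling call for small images.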

    2. Invoke the callable function to recognize text

    To recognize text in an image, invoke the callable function, passing a JSON Cloud Vision request.

    1. First, initialize an instance of Cloud Functions:

      Kotlin

      private lateinit var functions: FirebaseFunctions
      // ...
      functions = Firebase.functions

      Java

      private FirebaseFunctions mFunctions;
      // ...
      mFunctions = FirebaseFunctions.getInstance();
    2. Define a method for invoking the function:

      Kotlin

      private fun annotateImage(requestJson: String): Task<JsonElement> {
          return functions
              .getHttpsCallable("annotateImage")
              .call(requestJson)
              .continueWith { task ->
                  // This continuation runs on either success or failure, but if the task
                  // has failed then result will throw an Exception which will be
                  // propagated down.
                  val result = task.result?.data
                  JsonParser.parseString(Gson().toJson(result))
              }
      }

      Java

      private Task<JsonElement> annotateImage(String requestJson) {
          return mFunctions
                  .getHttpsCallable("annotateImage")
                  .call(requestJson)
                  .continueWith(new Continuation<HttpsCallableResult, JsonElement>() {
                      @Override
                      public JsonElement then(@NonNull Task<HttpsCallableResult> task) {
                          // This continuation runs on either success or failure, but if the task
                          // has failed then getResult() will throw an Exception which will be
                          // propagated down.
                          return JsonParser.parseString(new Gson().toJson(task.getResult().getData()));
                      }
                  });
      }
    3. Create the JSON request. The Cloud Vision API supports two types of text detection: TEXT_DETECTION and DOCUMENT_TEXT_DETECTION. See the Cloud Vision OCR docs for the difference between the two use cases.

      Kotlin

      // Create json request to cloud vision
      val request = JsonObject()
      // Add image to request
      val image = JsonObject()
      image.add("content", JsonPrimitive(base64encoded))
      request.add("image", image)
      // Add features to the request
      val feature = JsonObject()
      feature.add("type", JsonPrimitive("TEXT_DETECTION"))
      // Alternatively, for DOCUMENT_TEXT_DETECTION:
      // feature.add("type", JsonPrimitive("DOCUMENT_TEXT_DETECTION"))
      val features = JsonArray()
      features.add(feature)
      request.add("features", features)

      Java

      // Create json request to cloud vision
      JsonObject request = new JsonObject();
      // Add image to request
      JsonObject image = new JsonObject();
      image.add("content", new JsonPrimitive(base64encoded));
      request.add("image", image);
      // Add features to the request
      JsonObject feature = new JsonObject();
      feature.add("type", new JsonPrimitive("TEXT_DETECTION"));
      // Alternatively, for DOCUMENT_TEXT_DETECTION:
      // feature.add("type", new JsonPrimitive("DOCUMENT_TEXT_DETECTION"));
      JsonArray features = new JsonArray();
      features.add(feature);
      request.add("features", features);

      Optionally, provide language hints to assist with language detection (see supported languages):

      Kotlin

      val imageContext = JsonObject()
      val languageHints = JsonArray()
      languageHints.add("en")
      imageContext.add("languageHints", languageHints)
      request.add("imageContext", imageContext)

      Java

      JsonObject imageContext = new JsonObject();
      JsonArray languageHints = new JsonArray();
      languageHints.add("en");
      imageContext.add("languageHints", languageHints);
      request.add("imageContext", imageContext);
    4. Finally, invoke the function:

      Kotlin

      annotateImage(request.toString())
          .addOnCompleteListener { task ->
              if (!task.isSuccessful) {
                  // Task failed with an exception
                  // ...
              } else {
                  // Task completed successfully
                  // ...
              }
          }

      Java

      annotateImage(request.toString())
              .addOnCompleteListener(new OnCompleteListener<JsonElement>() {
                  @Override
                  public void onComplete(@NonNull Task<JsonElement> task) {
                      if (!task.isSuccessful()) {
                          // Task failed with an exception
                          // ...
                      } else {
                          // Task completed successfully
                          // ...
                      }
                  }
              });
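
    For reference, the request object built in step 3, including the optional language hint, serializes to a small JSON document. The sketch below assembles a document of the same shape with plain string concatenation so the payload can be inspected outside an Android project. The class name RequestShape and the sample image bytes are hypothetical; in the app itself, the string comes from the Gson request object, and java.util.Base64 stands in here for android.util.Base64 with NO_WRAP.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: shows the shape of the JSON passed to annotateImage.
// RequestShape is a hypothetical helper, not part of any Firebase SDK.
class RequestShape {

    /**
     * Builds the request JSON by hand. No escaping is applied, which is safe
     * here only because base64 text, feature types, and language codes contain
     * no characters that need escaping in JSON strings.
     */
    static String build(String base64Image, String featureType, String languageHint) {
        return "{"
                + "\"image\":{\"content\":\"" + base64Image + "\"},"
                + "\"features\":[{\"type\":\"" + featureType + "\"}],"
                + "\"imageContext\":{\"languageHints\":[\"" + languageHint + "\"]}"
                + "}";
    }

    public static void main(String[] args) {
        // Stand-in for real JPEG bytes; the basic encoder emits no line breaks.
        byte[] fakeImageBytes = "hello".getBytes(StandardCharsets.UTF_8);
        String base64 = Base64.getEncoder().encodeToString(fakeImageBytes);
        System.out.println(build(base64, "TEXT_DETECTION", "en"));
    }
}
```

    Swapping "TEXT_DETECTION" for "DOCUMENT_TEXT_DETECTION" changes only the type value; the rest of the payload stays the same.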

    3. Extract text from blocks of recognized text

    If the text recognition operation succeeds, a JSON response of BatchAnnotateImagesResponse will be returned in the task's result. The text annotations can be found in the fullTextAnnotation object.

    You can get the recognized text as a string in the text field. For example:

    Kotlin

    val annotation = task.result!!.asJsonArray[0].asJsonObject["fullTextAnnotation"].asJsonObject
    System.out.format("%nComplete annotation:")
    System.out.format("%n%s", annotation["text"].asString)

    Java

    JsonObject annotation = task.getResult().getAsJsonArray().get(0).getAsJsonObject().get("fullTextAnnotation").getAsJsonObject();
    System.out.format("%nComplete annotation:%n");
    System.out.format("%s%n", annotation.get("text").getAsString());

    You can also get information specific to regions of the image. For each block, paragraph, word, and symbol, you can get the text recognized in the region and the bounding coordinates of the region. For example:

    Kotlin

    for (page in annotation["pages"].asJsonArray) {
        var pageText = ""
        for (block in page.asJsonObject["blocks"].asJsonArray) {
            var blockText = ""
            for (para in block.asJsonObject["paragraphs"].asJsonArray) {
                var paraText = ""
                for (word in para.asJsonObject["words"].asJsonArray) {
                    var wordText = ""
                    for (symbol in word.asJsonObject["symbols"].asJsonArray) {
                        wordText += symbol.asJsonObject["text"].asString
                        System.out.format(
                            "Symbol text: %s (confidence: %f)%n",
                            symbol.asJsonObject["text"].asString,
                            symbol.asJsonObject["confidence"].asFloat,
                        )
                    }
                    System.out.format(
                        "Word text: %s (confidence: %f)%n%n",
                        wordText,
                        word.asJsonObject["confidence"].asFloat,
                    )
                    System.out.format("Word bounding box: %s%n", word.asJsonObject["boundingBox"])
                    paraText = String.format("%s%s ", paraText, wordText)
                }
                System.out.format("%nParagraph: %n%s%n", paraText)
                System.out.format("Paragraph bounding box: %s%n", para.asJsonObject["boundingBox"])
                System.out.format("Paragraph Confidence: %f%n", para.asJsonObject["confidence"].asFloat)
                blockText += paraText
            }
            pageText += blockText
        }
    }

    Java

    for (JsonElement page : annotation.get("pages").getAsJsonArray()) {
        StringBuilder pageText = new StringBuilder();
        for (JsonElement block : page.getAsJsonObject().get("blocks").getAsJsonArray()) {
            StringBuilder blockText = new StringBuilder();
            for (JsonElement para : block.getAsJsonObject().get("paragraphs").getAsJsonArray()) {
                StringBuilder paraText = new StringBuilder();
                for (JsonElement word : para.getAsJsonObject().get("words").getAsJsonArray()) {
                    StringBuilder wordText = new StringBuilder();
                    for (JsonElement symbol : word.getAsJsonObject().get("symbols").getAsJsonArray()) {
                        wordText.append(symbol.getAsJsonObject().get("text").getAsString());
                        System.out.format("Symbol text: %s (confidence: %f)%n",
                                symbol.getAsJsonObject().get("text").getAsString(),
                                symbol.getAsJsonObject().get("confidence").getAsFloat());
                    }
                    System.out.format("Word text: %s (confidence: %f)%n%n",
                            wordText.toString(),
                            word.getAsJsonObject().get("confidence").getAsFloat());
                    System.out.format("Word bounding box: %s%n",
                            word.getAsJsonObject().get("boundingBox"));
                    paraText.append(wordText.toString()).append(" ");
                }
                System.out.format("%nParagraph:%n%s%n", paraText);
                System.out.format("Paragraph bounding box: %s%n",
                        para.getAsJsonObject().get("boundingBox"));
                System.out.format("Paragraph Confidence: %f%n",
                        para.getAsJsonObject().get("confidence").getAsFloat());
                blockText.append(paraText);
            }
            pageText.append(blockText);
        }
    }
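
    Each boundingBox printed above is an object holding a vertices array of x/y points, and text that is tilted in the image can yield a quadrilateral that is not axis-aligned. As a rough sketch, assuming the vertices have already been read into an int array of {x, y} pairs, an enclosing rectangle for drawing an overlay could be derived as follows. The class name BoundingBoxes and the sample coordinates are hypothetical.

```java
// Illustrative only: BoundingBoxes is a hypothetical helper that works on
// vertices already extracted from a boundingBox object as {x, y} pairs.
class BoundingBoxes {

    /** Returns {left, top, right, bottom} of the smallest upright rectangle containing all vertices. */
    static int[] enclosingRect(int[][] vertices) {
        int left = Integer.MAX_VALUE;
        int top = Integer.MAX_VALUE;
        int right = Integer.MIN_VALUE;
        int bottom = Integer.MIN_VALUE;
        for (int[] vertex : vertices) {
            left = Math.min(left, vertex[0]);
            right = Math.max(right, vertex[0]);
            top = Math.min(top, vertex[1]);
            bottom = Math.max(bottom, vertex[1]);
        }
        return new int[] {left, top, right, bottom};
    }

    public static void main(String[] args) {
        // A slightly rotated word box, listed corner by corner.
        int[][] vertices = {{10, 20}, {110, 25}, {108, 60}, {8, 55}};
        int[] rect = enclosingRect(vertices);
        System.out.println(rect[0] + "," + rect[1] + "," + rect[2] + "," + rect[3]);
    }
}
```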