You can use Firebase ML to recognize text in images. Firebase ML has both a general-purpose API suitable for recognizing text in images, such as the text of a street sign, and an API optimized for recognizing the text of documents.
Before you begin
- If you have not already added Firebase to your app, do so by following the steps in the getting started guide.
- Use Swift Package Manager to install and manage Firebase dependencies:
  - In Xcode, with your app project open, navigate to File > Add Packages.
  - When prompted, add the Firebase Apple platforms SDK repository:
    https://github.com/firebase/firebase-ios-sdk.git
  - Choose the Firebase ML library.
  - Add the -ObjC flag to the Other Linker Flags section of your target's build settings.
  - When finished, Xcode will automatically begin resolving and downloading your dependencies in the background.
- Next, perform some in-app setup. In your app, import Firebase (a minimal configuration sketch follows this list):

  Swift

  import FirebaseMLModelDownloader

  Objective-C

  @import FirebaseMLModelDownloader;

- If you haven't already enabled Cloud-based APIs for your project, do so now:
  - Open the Firebase ML APIs page in the Firebase console.
  - If you haven't already upgraded your project to the pay-as-you-go Blaze pricing plan, click Upgrade to do so. (You'll be prompted to upgrade only if your project isn't on the Blaze pricing plan.) Only projects on the Blaze pricing plan can use Cloud-based APIs.
  - If Cloud-based APIs aren't already enabled, click Enable Cloud-based APIs.
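If your app does not already configure Firebase at launch, a minimal sketch of that step looks like the following. It assumes a UIKit app delegate and that the FirebaseCore module is available from the packages you added; adjust the imports to match the libraries you actually installed.

  import UIKit
  import FirebaseCore

  @main
  class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
      // Configure the default Firebase app before calling any Firebase ML APIs.
      FirebaseApp.configure()
      return true
    }
  }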
Now you are ready to start recognizing text in images.
Input image guidelines
For Firebase ML to accurately recognize text, input images must contain text that is represented by sufficient pixel data. Ideally, for Latin text, each character should be at least 16x16 pixels. For Chinese, Japanese, and Korean text, each character should be 24x24 pixels. For all languages, there is generally no accuracy benefit for characters to be larger than 24x24 pixels.
So, for example, a 640x480 image might work well to scan a business card that occupies the full width of the image. To scan a document printed on letter-sized paper, a 720x1280 pixel image might be required.
Poor image focus can hurt text recognition accuracy. If you aren't getting acceptable results, try asking the user to recapture the image.
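If you control the capture flow, you can apply these guidelines with a simple resolution check before sending an image. The helper below is a hypothetical sketch, not part of the Firebase ML API, and the minimum side length is an assumed value you would tune to your content:

  import UIKit

  /// Hypothetical helper: returns true if the image's pixel dimensions meet a
  /// caller-supplied minimum, as a rough proxy for having enough pixel data per character.
  func hasSufficientResolution(_ image: UIImage, minShortSide: CGFloat = 480) -> Bool {
      let pixelWidth = image.size.width * image.scale
      let pixelHeight = image.size.height * image.scale
      return min(pixelWidth, pixelHeight) >= minShortSide
  }

If the check fails, ask the user to recapture the image rather than submitting it as-is.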
Recognize text in images
To recognize text in an image, run the text recognizer as described below.
1. Run the text recognizer
Pass the image as a UIImage or a CMSampleBufferRef to the VisionTextRecognizer's process(_:completion:) method:

- Get an instance of VisionTextRecognizer by calling cloudTextRecognizer:

  Swift

  let vision = Vision.vision()
  let textRecognizer = vision.cloudTextRecognizer()

  // Or, to provide language hints to assist with language detection:
  // See https://cloud.google.com/vision/docs/languages for supported languages
  let options = VisionCloudTextRecognizerOptions()
  options.languageHints = ["en", "hi"]
  let textRecognizer = vision.cloudTextRecognizer(options: options)

  Objective-C

  FIRVision *vision = [FIRVision vision];
  FIRVisionTextRecognizer *textRecognizer = [vision cloudTextRecognizer];

  // Or, to provide language hints to assist with language detection:
  // See https://cloud.google.com/vision/docs/languages for supported languages
  FIRVisionCloudTextRecognizerOptions *options = [[FIRVisionCloudTextRecognizerOptions alloc] init];
  options.languageHints = @[@"en", @"hi"];
  FIRVisionTextRecognizer *textRecognizer = [vision cloudTextRecognizerWithOptions:options];
- In order to call Cloud Vision, the image must be formatted as a base64-encoded string. To process a UIImage:

  Swift

  guard let imageData = uiImage.jpegData(compressionQuality: 1.0) else { return }
  let base64encodedImage = imageData.base64EncodedString()

  Objective-C

  NSData *imageData = UIImageJPEGRepresentation(uiImage, 1.0f);
  NSString *base64encodedImage = [imageData base64EncodedStringWithOptions:NSDataBase64Encoding76CharacterLineLength];
- Then, pass the image to the process(_:completion:) method (a sketch showing how visionImage is created follows these steps):

  Swift

  textRecognizer.process(visionImage) { result, error in
      guard error == nil, let result = result else {
          // ...
          return
      }

      // Recognized text
  }

  Objective-C

  [textRecognizer processImage:image
                    completion:^(FIRVisionText *_Nullable result, NSError *_Nullable error) {
      if (error != nil || result == nil) {
          // ...
          return;
      }

      // Recognized text
  }];
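The visionImage value passed to process(_:completion:) above is a VisionImage wrapping your input. A minimal sketch of creating it, using the VisionImage class from the same SDK as the recognizer (treat the orientation value as an assumption to match to your capture setup):

  // Wrap a UIImage directly; its imageOrientation is used as-is.
  let visionImage = VisionImage(image: uiImage)

  // Or wrap a CMSampleBufferRef from the camera and describe its orientation.
  let bufferImage = VisionImage(buffer: sampleBuffer)   // sampleBuffer: a CMSampleBuffer from your capture pipeline
  let metadata = VisionImageMetadata()
  metadata.orientation = .rightTop                      // assumed value; match your device/camera orientation
  bufferImage.metadata = metadata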
2. Extract text from blocks of recognized text
If the text recognition operation succeeds, it will return a VisionText object. A VisionText object contains the full text recognized in the image and zero or more VisionTextBlock objects.

Each VisionTextBlock represents a rectangular block of text, which contains zero or more VisionTextLine objects. Each VisionTextLine object contains zero or more VisionTextElement objects, which represent words and word-like entities (dates, numbers, and so on).

For each VisionTextBlock, VisionTextLine, and VisionTextElement object, you can get the text recognized in the region and the bounding coordinates of the region.
For example:
Swift
let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockConfidence = block.confidence
    let blockLanguages = block.recognizedLanguages
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for line in block.lines {
        let lineText = line.text
        let lineConfidence = line.confidence
        let lineLanguages = line.recognizedLanguages
        let lineCornerPoints = line.cornerPoints
        let lineFrame = line.frame
        for element in line.elements {
            let elementText = element.text
            let elementConfidence = element.confidence
            let elementLanguages = element.recognizedLanguages
            let elementCornerPoints = element.cornerPoints
            let elementFrame = element.frame
        }
    }
}
Objective-C
NSString *resultText = result.text;
for (FIRVisionTextBlock *block in result.blocks) {
    NSString *blockText = block.text;
    NSNumber *blockConfidence = block.confidence;
    NSArray<FIRVisionTextRecognizedLanguage *> *blockLanguages = block.recognizedLanguages;
    NSArray<NSValue *> *blockCornerPoints = block.cornerPoints;
    CGRect blockFrame = block.frame;
    for (FIRVisionTextLine *line in block.lines) {
        NSString *lineText = line.text;
        NSNumber *lineConfidence = line.confidence;
        NSArray<FIRVisionTextRecognizedLanguage *> *lineLanguages = line.recognizedLanguages;
        NSArray<NSValue *> *lineCornerPoints = line.cornerPoints;
        CGRect lineFrame = line.frame;
        for (FIRVisionTextElement *element in line.elements) {
            NSString *elementText = element.text;
            NSNumber *elementConfidence = element.confidence;
            NSArray<FIRVisionTextRecognizedLanguage *> *elementLanguages = element.recognizedLanguages;
            NSArray<NSValue *> *elementCornerPoints = element.cornerPoints;
            CGRect elementFrame = element.frame;
        }
    }
}
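As a small usage example, the sketch below collects each recognized line's text and frame, which is a common starting point for highlighting results over the source image. It uses only the properties shown above; the tuple type is an illustrative choice:

  // Illustrative only: gather each recognized line and its frame for later highlighting.
  var recognizedLines: [(text: String, frame: CGRect)] = []
  for block in result.blocks {
      for line in block.lines {
          recognizedLines.append((text: line.text, frame: line.frame))
      }
  }
  let fullText = recognizedLines.map { $0.text }.joined(separator: "\n")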
Next steps
- Before you deploy an app that uses a Cloud API to production, take some additional steps to prevent and mitigate the effect of unauthorized API access.
Recognize text in images of documents
To recognize the text of a document, configure and run the document text recognizer as described below.
The document text recognition API, described below, provides an interface that is intended to be more convenient for working with images of documents. However, if you prefer the interface provided by the sparse text API, you can use it instead to scan documents by configuring the cloud text recognizer to use the dense text model.
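A sketch of that configuration might look like the following. It assumes the modelType option on VisionCloudTextRecognizerOptions, so verify the exact property and enum case against the current SDK reference before relying on them:

  // Sketch: run the sparse text API with the dense text model for document-style images.
  let options = VisionCloudTextRecognizerOptions()
  options.modelType = .dense   // assumed property and case; check the SDK reference
  let textRecognizer = Vision.vision().cloudTextRecognizer(options: options)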
To use the document text recognition API:
1. Run the text recognizer
Pass the image as a UIImage or a CMSampleBufferRef to the VisionDocumentTextRecognizer's process(_:completion:) method:

- Get an instance of VisionDocumentTextRecognizer by calling cloudDocumentTextRecognizer:

  Swift

  let vision = Vision.vision()
  let textRecognizer = vision.cloudDocumentTextRecognizer()

  // Or, to provide language hints to assist with language detection:
  // See https://cloud.google.com/vision/docs/languages for supported languages
  let options = VisionCloudDocumentTextRecognizerOptions()
  options.languageHints = ["en", "hi"]
  let textRecognizer = vision.cloudDocumentTextRecognizer(options: options)

  Objective-C

  FIRVision *vision = [FIRVision vision];
  FIRVisionDocumentTextRecognizer *textRecognizer = [vision cloudDocumentTextRecognizer];

  // Or, to provide language hints to assist with language detection:
  // See https://cloud.google.com/vision/docs/languages for supported languages
  FIRVisionCloudDocumentTextRecognizerOptions *options = [[FIRVisionCloudDocumentTextRecognizerOptions alloc] init];
  options.languageHints = @[@"en", @"hi"];
  FIRVisionDocumentTextRecognizer *textRecognizer = [vision cloudDocumentTextRecognizerWithOptions:options];
- In order to call Cloud Vision, the image must be formatted as a base64-encoded string. To process a UIImage:

  Swift

  guard let imageData = uiImage.jpegData(compressionQuality: 1.0) else { return }
  let base64encodedImage = imageData.base64EncodedString()

  Objective-C

  NSData *imageData = UIImageJPEGRepresentation(uiImage, 1.0f);
  NSString *base64encodedImage = [imageData base64EncodedStringWithOptions:NSDataBase64Encoding76CharacterLineLength];
- Then, pass the image to the process(_:completion:) method:

  Swift

  textRecognizer.process(visionImage) { result, error in
      guard error == nil, let result = result else {
          // ...
          return
      }

      // Recognized text
  }

  Objective-C

  [textRecognizer processImage:image
                    completion:^(FIRVisionDocumentText *_Nullable result, NSError *_Nullable error) {
      if (error != nil || result == nil) {
          // ...
          return;
      }

      // Recognized text
  }];
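If your app uses Swift concurrency, you might wrap the completion-based call in an async helper. This is a sketch, not part of the Firebase API; the DocumentTextRecognitionError type is invented here for illustration:

  // Sketch: an async wrapper around the completion-based process(_:completion:) call.
  enum DocumentTextRecognitionError: Error { case noResult }   // illustrative error type

  func recognizeDocumentText(in visionImage: VisionImage,
                             with recognizer: VisionDocumentTextRecognizer) async throws -> VisionDocumentText {
      try await withCheckedThrowingContinuation { continuation in
          recognizer.process(visionImage) { result, error in
              if let error = error {
                  continuation.resume(throwing: error)
              } else if let result = result {
                  continuation.resume(returning: result)
              } else {
                  continuation.resume(throwing: DocumentTextRecognitionError.noResult)
              }
          }
      }
  }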
2. Extract text from blocks of recognized text
If the text recognition operation succeeds, it will return a VisionDocumentText object. A VisionDocumentText object contains the full text recognized in the image and a hierarchy of objects that reflect the structure of the recognized document.

For each VisionDocumentTextBlock, VisionDocumentTextParagraph, VisionDocumentTextWord, and VisionDocumentTextSymbol object, you can get the text recognized in the region and the bounding coordinates of the region.
For example:
Swift
let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockConfidence = block.confidence
    let blockRecognizedLanguages = block.recognizedLanguages
    let blockBreak = block.recognizedBreak
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for paragraph in block.paragraphs {
        let paragraphText = paragraph.text
        let paragraphConfidence = paragraph.confidence
        let paragraphRecognizedLanguages = paragraph.recognizedLanguages
        let paragraphBreak = paragraph.recognizedBreak
        let paragraphCornerPoints = paragraph.cornerPoints
        let paragraphFrame = paragraph.frame
        for word in paragraph.words {
            let wordText = word.text
            let wordConfidence = word.confidence
            let wordRecognizedLanguages = word.recognizedLanguages
            let wordBreak = word.recognizedBreak
            let wordCornerPoints = word.cornerPoints
            let wordFrame = word.frame
            for symbol in word.symbols {
                let symbolText = symbol.text
                let symbolConfidence = symbol.confidence
                let symbolRecognizedLanguages = symbol.recognizedLanguages
                let symbolBreak = symbol.recognizedBreak
                let symbolCornerPoints = symbol.cornerPoints
                let symbolFrame = symbol.frame
            }
        }
    }
}
Objective-C
NSString *resultText = result.text;
for (FIRVisionDocumentTextBlock *block in result.blocks) {
    NSString *blockText = block.text;
    NSNumber *blockConfidence = block.confidence;
    NSArray<FIRVisionTextRecognizedLanguage *> *blockRecognizedLanguages = block.recognizedLanguages;
    FIRVisionTextRecognizedBreak *blockBreak = block.recognizedBreak;
    CGRect blockFrame = block.frame;
    for (FIRVisionDocumentTextParagraph *paragraph in block.paragraphs) {
        NSString *paragraphText = paragraph.text;
        NSNumber *paragraphConfidence = paragraph.confidence;
        NSArray<FIRVisionTextRecognizedLanguage *> *paragraphRecognizedLanguages = paragraph.recognizedLanguages;
        FIRVisionTextRecognizedBreak *paragraphBreak = paragraph.recognizedBreak;
        CGRect paragraphFrame = paragraph.frame;
        for (FIRVisionDocumentTextWord *word in paragraph.words) {
            NSString *wordText = word.text;
            NSNumber *wordConfidence = word.confidence;
            NSArray<FIRVisionTextRecognizedLanguage *> *wordRecognizedLanguages = word.recognizedLanguages;
            FIRVisionTextRecognizedBreak *wordBreak = word.recognizedBreak;
            CGRect wordFrame = word.frame;
            for (FIRVisionDocumentTextSymbol *symbol in word.symbols) {
                NSString *symbolText = symbol.text;
                NSNumber *symbolConfidence = symbol.confidence;
                NSArray<FIRVisionTextRecognizedLanguage *> *symbolRecognizedLanguages = symbol.recognizedLanguages;
                FIRVisionTextRecognizedBreak *symbolBreak = symbol.recognizedBreak;
                CGRect symbolFrame = symbol.frame;
            }
        }
    }
}
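As a small usage example, the sketch below computes a highlight rectangle for each paragraph by unioning the frames of its words. It uses only the properties shown above and is illustrative rather than part of the API:

  // Illustrative only: one highlight rectangle per recognized paragraph.
  var highlightRects: [CGRect] = []
  for block in result.blocks {
      for paragraph in block.paragraphs {
          let rect = paragraph.words.reduce(CGRect.null) { $0.union($1.frame) }
          if !rect.isNull {
              highlightRects.append(rect)
          }
      }
  }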
Next steps
- Before you deploy an app that uses a Cloud API to production, take some additional steps to prevent and mitigate the effect of unauthorized API access.