Text Recognition in Vision Framework
© 2019 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission from Apple.
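import Vision
// Text detection: locate text regions and individual character boxes (no reading)
let request = VNDetectTextRectanglesRequest()
request.reportCharacterBoxes = true
let requestHandler = VNImageRequestHandler(url: imageURL, options: [:])
try? requestHandler.perform([request])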
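The detection request above locates text but does not read it. As a sketch (the helper name `characterBoxCounts` is hypothetical), its results can be inspected as `VNTextObservation` values, whose optional `characterBoxes` hold one box per detected glyph and no transcription at all:

```swift
import Foundation
import Vision

// Hypothetical helper: for each detected text region, count its character boxes.
// Detection only: VNDetectTextRectanglesRequest returns geometry, never strings.
func characterBoxCounts(in imageURL: URL) throws -> [Int] {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = true
    let requestHandler = VNImageRequestHandler(url: imageURL, options: [:])
    try requestHandler.perform([request])
    let observations = request.results as? [VNTextObservation] ?? []
    return observations.map { $0.characterBoxes?.count ?? 0 }
}
```

This gap between finding text and reading it is what the recognition request discussed next fills.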
Our Journey Today
• Example applications
• Best practices
Two Paths to Choose From: Fast and Accurate
• Fast: Character Detection, then Character Recognition
• All on Device
Results
• Fast: "Reduce stress 1eve1s (by 68 percent!)" (the letter l misread as the digit 1)
• Accurate: "Reduce stress levels (by 68 percent!)"
Fast Versus Accurate
[Figure: the Fast and Accurate paths compared side by side]
Use Cases Drive How to Configure the Request
What is my input?
Camera capture
• Live capture at high frame rate: go fast
Post processing
• Favor accuracy over speed
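The use-case split maps onto one request property. A sketch, assuming a hypothetical `InputSource` type of our own; `recognitionLevel` with its `.fast` and `.accurate` values is the Vision property being set:

```swift
import Vision

// Hypothetical description of where the image comes from.
enum InputSource {
    case cameraCapture   // live frames at a high frame rate
    case postProcessing  // stored images processed after the fact
}

// Live capture goes fast; post processing favors accuracy over speed.
func makeTextRequest(for source: InputSource) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    switch source {
    case .cameraCapture:
        request.recognitionLevel = .fast
    case .postProcessing:
        request.recognitionLevel = .accurate
    }
    return request
}
```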
Reading Codes Versus Reading Natural Language
Language processing
• Corrects typical recognition errors
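Language processing is switched per request through `usesLanguageCorrection`. A sketch with two illustrative configurations: codes and serial numbers are not dictionary words, so correction could alter characters that were read correctly and is turned off; natural language keeps it on so typical recognition errors can be corrected.

```swift
import Vision

// Reading codes / serial numbers: no dictionary to correct against.
let codeRequest = VNRecognizeTextRequest()
codeRequest.recognitionLevel = .fast
codeRequest.usesLanguageCorrection = false

// Reading natural language: let language processing correct typical errors.
let proseRequest = VNRecognizeTextRequest()
proseRequest.recognitionLevel = .accurate
proseRequest.usesLanguageCorrection = true
```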
import Vision

// Create request
let myTextRecognitionRequest = VNRecognizeTextRequest()
// Create request handler (fileURL points at the image to read)
let myRequestHandler = VNImageRequestHandler(url: fileURL, options: [:])
// Send the request to the request handler
try? myRequestHandler.perform([myTextRecognitionRequest])
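Putting these steps together, a sketch of a synchronous helper (`recognizeLines` is a hypothetical name). `perform(_:)` returns only after its requests finish, so the request's `results` can be read right afterwards; each `VNRecognizedTextObservation` is reduced to its top candidate. Because it blocks, it is meant to be called off the main thread.

```swift
import Foundation
import Vision

// Hypothetical helper: returns the best transcription of each recognized text line.
// Blocks until recognition finishes, so call it from a background queue.
func recognizeLines(in fileURL: URL,
                    level: VNRequestTextRecognitionLevel = .accurate) throws -> [String] {
    // Create request
    let myTextRecognitionRequest = VNRecognizeTextRequest()
    myTextRecognitionRequest.recognitionLevel = level
    // Create request handler and send the request to it
    let myRequestHandler = VNImageRequestHandler(url: fileURL, options: [:])
    try myRequestHandler.perform([myTextRecognitionRequest])
    // Process results: keep the top candidate of each observation
    let observations = myTextRecognitionRequest.results as? [VNRecognizedTextObservation] ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```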
When to Use .fast
Use case
• Read codes/serial numbers just like a barcode reader
• Interactivity is key
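For a live code reader, a sketch of a per-frame setup. `regionOfInterest` (normalized, origin at the lower left) restricts recognition to a band of the frame and `minimumTextHeight` (a fraction of the image height) skips small text; both are Vision properties, but the values below are illustrative guesses, not tuned numbers. The helper names are hypothetical.

```swift
import CoreGraphics
import CoreVideo
import ImageIO
import Vision

// Hypothetical request for reading codes from camera frames.
func makeLiveCodeRequest() -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .fast           // interactivity is key
    request.usesLanguageCorrection = false     // codes are not words
    // Only look at a horizontal band across the middle of the frame.
    request.regionOfInterest = CGRect(x: 0, y: 0.4, width: 1, height: 0.2)
    request.minimumTextHeight = 0.05           // ignore text under 5% of image height
    return request
}

// Run once per frame, e.g. from an AVCaptureVideoDataOutput delegate callback.
func readCodes(in frame: CVPixelBuffer,
               orientation: CGImagePropertyOrientation,
               using request: VNRecognizeTextRequest) -> [String] {
    let handler = VNImageRequestHandler(cvPixelBuffer: frame, orientation: orientation, options: [:])
    do {
        try handler.perform([request])
    } catch {
        return []
    }
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Creating the request once and reusing it for every frame avoids per-frame setup work; only the handler changes with each new buffer.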
Demo
Phone Number Reader
Demo Recap
Document Camera
• Evenly lit
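import Vision
import VisionKit

// VNDocumentCameraViewControllerDelegate callback, called when the user saves a scan.
// textRecognitionWorkQueue and requests are properties defined elsewhere in this class.
func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
    // Run recognition off the main thread.
    textRecognitionWorkQueue.async {
        // Each scanned page is delivered as a separate image.
        for pageIndex in 0 ..< scan.pageCount {
            let image = scan.imageOfPage(at: pageIndex)
            if let cgImage = image.cgImage {
                let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
                do {
                    try requestHandler.perform(self.requests)
                } catch {
                    print(error)
                }
            }
        }
    }
}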
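The scan-handling code above uses `textRecognitionWorkQueue` and `self.requests` without showing where they come from. A minimal sketch of an owner for them, assuming a hypothetical `ScanViewController` and a completion handler that simply prints each page's top candidates; the scan callback itself would live in this same class.

```swift
import UIKit
import Vision
import VisionKit

// Hypothetical view controller that owns the queue and requests used during scanning.
final class ScanViewController: UIViewController, VNDocumentCameraViewControllerDelegate {
    // Serial queue keeps recognition work off the main thread.
    let textRecognitionWorkQueue = DispatchQueue(label: "TextRecognitionQueue",
                                                 qos: .userInitiated)
    var requests = [VNRequest]()

    override func viewDidLoad() {
        super.viewDidLoad()
        // Assumed handling: print the top candidate of each recognized line.
        let textRequest = VNRecognizeTextRequest { request, error in
            guard error == nil,
                  let observations = request.results as? [VNRecognizedTextObservation] else { return }
            let lines = observations.compactMap { $0.topCandidates(1).first?.string }
            print(lines.joined(separator: "\n"))
        }
        textRequest.recognitionLevel = .accurate   // scans are post-processed
        requests = [textRequest]
    }

    // Present the VisionKit document camera.
    func startScan() {
        guard VNDocumentCameraViewController.isSupported else { return }
        let camera = VNDocumentCameraViewController()
        camera.delegate = self
        present(camera, animated: true)
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }
}
```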
• Language knowledge
• Performance
• Processing results
Leverage Language Knowledge
Language-based correction
• Specify the language
Custom lexicon
• Use custom vocabulary for domain-specific text
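A sketch of both settings on one request. The vocabulary list is hypothetical; `recognitionLanguages` and `customWords` are the Vision properties involved. Apple's documentation notes that custom words are ignored when `usesLanguageCorrection` is false, so correction stays on here.

```swift
import Vision

// Hypothetical domain vocabulary, e.g. product names from your own catalog.
let productNames = ["Vizualizr", "QwikScan"]

let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
// Specify the language of the text.
request.recognitionLanguages = ["en-US"]
// Custom words supplement the built-in lexicon during language correction,
// so correction must stay enabled for them to have an effect.
request.usesLanguageCorrection = true
request.customWords = productNames
```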
Progress updates
myTextRecognitionRequest.progressHandler = myProgressHandler
Cancellation
myTextRecognitionRequest.cancel()
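A sketch of both hooks together, with hypothetical names (`startRecognition`, `onProgress`). The progress handler receives the fraction completed as a `Double`; the request is performed on a background queue and returned so the caller can keep it and call `cancel()`, for example when the user leaves the screen.

```swift
import Foundation
import Vision

// Hypothetical helper: starts recognition in the background and returns the request
// so the caller can cancel it later.
func startRecognition(of imageURL: URL,
                      onProgress: @escaping (Double) -> Void) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    // Progress updates: forward the fraction completed to the main queue for UI work.
    request.progressHandler = { _, fractionCompleted, error in
        guard error == nil else { return }
        DispatchQueue.main.async { onProgress(fractionCompleted) }
    }
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(url: imageURL, options: [:])
        do {
            try handler.perform([request])
        } catch {
            // Errors, possibly including cancellation, may surface here.
            print("Recognition stopped: \(error)")
        }
    }
    return request
}

// Usage: keep the request and cancel it when the result is no longer needed.
// let pending = startRecognition(of: scanURL) { print("progress", $0) }
// pending.cancel()
```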
Demo
Processing Results
Expect Ambiguity in the Input
VNRecognizedTextObservation
• Process transcription candidates
let maxCount = 3
// Up to maxCount transcriptions (VNRecognizedText), ordered best first
let candidates = currentObservation.topCandidates(maxCount)
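One way to use the extra candidates, sketched with hypothetical helper names: walk them best-first, keep the first that passes a domain check, and fall back to the top candidate otherwise. The selection logic is plain Swift so it can be tried without an image; the Vision part uses `topCandidates(_:)`, for which Apple documents a maximum of 10 candidates.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

// Picks the first candidate (ordered best-first) that passes `isValid`,
// falling back to the top candidate when none does.
func pickCandidate(_ candidates: [String], isValid: (String) -> Bool) -> String? {
    candidates.first(where: isValid) ?? candidates.first
}

#if canImport(Vision)
// Hypothetical helper: one chosen transcription per observation.
func chooseStrings(from observations: [VNRecognizedTextObservation],
                   maxCount: Int = 3,
                   isValid: (String) -> Bool) -> [String] {
    observations.compactMap { (observation) -> String? in
        let candidates = observation.topCandidates(maxCount).map { $0.string }
        return pickCandidate(candidates, isValid: isValid)
    }
}
#endif
```

Falling back to the top candidate keeps every line; returning nil instead would suit a reader that should only report validated strings.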
Use Geometry to Map Results
• Rotation
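Vision reports geometry in normalized coordinates (0 to 1) with the origin at the lower left, while UIKit draws from the upper left. A sketch of two conversions, with hypothetical helper names: scaling and flipping a bounding box, and measuring a text line's rotation from its top corners. Vision's own `VNImageRectForNormalizedRect` scales a rectangle but does not flip it. The math is plain Foundation; the Vision extension only forwards an observation's `boundingBox`, `topLeft`, and `topRight`.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

// Normalized, bottom-left-origin rect -> pixel rect with a top-left origin.
func topLeftOriginRect(fromNormalized r: CGRect, imageSize: CGSize) -> CGRect {
    CGRect(x: r.minX * imageSize.width,
           y: (1 - r.maxY) * imageSize.height,
           width: r.width * imageSize.width,
           height: r.height * imageSize.height)
}

// Angle of the line through the top corners, in radians (counterclockwise positive),
// measured in pixel space so non-square images are handled correctly.
func baselineAngle(topLeft: CGPoint, topRight: CGPoint, imageSize: CGSize) -> Double {
    let dx = Double((topRight.x - topLeft.x) * imageSize.width)
    let dy = Double((topRight.y - topLeft.y) * imageSize.height)
    return atan2(dy, dx)
}

#if canImport(Vision)
extension VNRecognizedTextObservation {
    func pixelRect(in imageSize: CGSize) -> CGRect {
        topLeftOriginRect(fromNormalized: boundingBox, imageSize: imageSize)
    }

    func rotation(in imageSize: CGSize) -> Double {
        baselineAngle(topLeft: topLeft, topRight: topRight, imageSize: imageSize)
    }
}
#endif
```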
Use Parsers to Label Results
Data Detectors
• NSDataDetector for types of interest: address, phone number, email
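A sketch of labeling recognized text with `NSDataDetector`, using a hypothetical `LabeledItem` type. Addresses and phone numbers have their own checking types; email addresses are reported as links with a `mailto:` URL, so they are picked out by scheme.

```swift
import Foundation

// Hypothetical labels for the types of interest.
enum LabeledItem: Equatable {
    case address(String)
    case phoneNumber(String)
    case email(String)
}

func labelItems(in text: String) -> [LabeledItem] {
    let types: NSTextCheckingResult.CheckingType = [.address, .phoneNumber, .link]
    guard let detector = try? NSDataDetector(types: types.rawValue) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return detector.matches(in: text, options: [], range: range).compactMap { (match) -> LabeledItem? in
        switch match.resultType {
        case .address:
            guard let r = Range(match.range, in: text) else { return nil }
            return .address(String(text[r]))
        case .phoneNumber:
            return match.phoneNumber.map { LabeledItem.phoneNumber($0) }
        case .link:
            // Plain email addresses come back as mailto: links.
            guard let url = match.url, url.scheme == "mailto" else { return nil }
            return .email(String(url.absoluteString.dropFirst("mailto:".count)))
        default:
            return nil
        }
    }
}
```

Running the detector over the joined top candidates of a whole page, rather than line by line, lets it find addresses that span several recognized lines.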
Domain-specific filters
• Your own lexicon
• Regular expressions
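A sketch of a domain-specific filter combining both ideas. The serial-number format and the lexicon contents are hypothetical; the filter returns the first candidate (best-first) that is in the lexicon or matches the pattern, and nil when nothing qualifies, so the caller can drop that observation.

```swift
import Foundation

// Hypothetical serial-number format: three uppercase letters, a dash, four digits.
let serialNumberPattern = try! NSRegularExpression(pattern: "^[A-Z]{3}-[0-9]{4}$")

func isSerialNumber(_ s: String) -> Bool {
    let range = NSRange(s.startIndex..., in: s)
    return serialNumberPattern.firstMatch(in: s, options: [], range: range) != nil
}

// First candidate accepted by your own lexicon or by the regular expression.
func filterCandidates(_ candidates: [String], lexicon: Set<String>) -> String? {
    candidates.first { lexicon.contains($0) || isSerialNumber($0) }
}
```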
Business Companion App
Document Category Picker → Document Camera → Text Recognition → Results Analysis → Visualization
Document categories: Receipt, Business Card, Other
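The Results Analysis stage of that pipeline depends on the category the user picked. A sketch of that dispatch, where the function name and every per-category rule are hypothetical assumptions chosen for illustration: a receipt total found by regular expression, a business card whose first line is taken as the name, and plain text for everything else.

```swift
import Foundation

// Categories offered by the document category picker.
enum DocumentCategory {
    case receipt, businessCard, other
}

// Hypothetical results-analysis step: turn recognized lines into labeled fields.
func analyze(_ lines: [String], as category: DocumentCategory) -> [(label: String, value: String)] {
    switch category {
    case .receipt:
        // Assumption: the total is on the last line mentioning "total",
        // ending in an amount such as 12.34.
        let amount = try! NSRegularExpression(pattern: "([0-9]+\\.[0-9]{2})\\s*$")
        for line in lines.reversed() where line.lowercased().contains("total") {
            let range = NSRange(line.startIndex..., in: line)
            if let match = amount.firstMatch(in: line, options: [], range: range),
               let r = Range(match.range(at: 1), in: line) {
                return [(label: "Total", value: String(line[r]))]
            }
        }
        return []
    case .businessCard:
        // Assumption: the first line of the card is the person's name.
        guard let name = lines.first else { return [] }
        return [(label: "Name", value: name)]
    case .other:
        return [(label: "Text", value: lines.joined(separator: "\n"))]
    }
}
```

Searching from the bottom keeps a "Subtotal" line from being mistaken for the final total on a typical receipt.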
Demo
Business Companion
Demo Recap