Exploring Apple's VISION Framework
Note: The project in this tutorial was built using Swift 5 in Xcode 12 beta, so you need Xcode 12 beta to run it. You don't need a physical device (cheers :-)) as the project was built to run in the simulator.
The app we are going to build is very simple. It allows users to select an image from the iPhone's gallery, and as soon as the image loads we detect the hand and body pose landmarks using the Vision framework.
Now, enough with the theory, let's dive into the coding.
Getting Started
First, open Xcode 12 beta and create a new project. Select the App template under iOS (previously "Single View App"), give the application a name on the next page, and make sure the interface is set to "Storyboard" and Swift is selected as the programming language.
We will keep the UI simple and give more priority to the Vision part. So let's begin! Go to Main.storyboard and pick an image view from the UI controls library. Drag and drop the image view into our view controller. Set the leading and trailing constraints to zero, center it vertically in the safe area, and set the height of the image view to 400. This makes the image view sit in the center of the view, no matter the size of the device.
Now, to access the phone's photo library we will need a button action, so add a button at the bottom of the view controller and name it "Select Photo" in the attributes inspector. Set the vertical spacing between the button and the image view to 100 and center it horizontally in the safe area.
Next, drag and drop two switch controls into the view controller, select both of them, and embed them in a stack view. These switch controls allow the user to choose between hand pose and body pose. Set the leading and trailing constraints of the stack view and set the vertical space between the stack view and the image view to 140. This pins the switch controls to the image view. Now add a label below each switch and add constraints to these labels. Finally, go to the attributes inspector of each label and set the texts to "Hand" and "Body" respectively. This completes the UI for our app. The storyboard now looks pretty much like this:
Implementing the photo library function
Now that we have set everything in the storyboard, it's time to write some code to use these elements. Go to ViewController.swift and adopt the UINavigationControllerDelegate protocol, which is required by the UIImagePickerController class.
Now, add three new IBOutlets and name them imageView, handSwitch and bodySwitch for the image view and the two switch controls respectively. Next, create the IBAction method for selecting an image from the gallery. The code in the view controller should look like this.
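Here is a minimal sketch of how the view controller could look at this point (the outlet and action names match the ones used later in this article):

import UIKit

class ViewController: UIViewController, UINavigationControllerDelegate {

    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var handSwitch: UISwitch!
    @IBOutlet weak var bodySwitch: UISwitch!

    override func viewDidLoad() {
        super.viewDidLoad()
    }

    @IBAction func selectPhotoAction(_ sender: Any) {
        let picker = UIImagePickerController()
        // The delegate needs ViewController to conform to UIImagePickerControllerDelegate
        // as well; we add that conformance in an extension next.
        picker.delegate = self
        picker.sourceType = .photoLibrary
        present(picker, animated: true, completion: nil)
    }
}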
Make sure you connect all these outlet variables and action methods to the respective controls in the storyboard. In selectPhotoAction, we create a constant of type UIImagePickerController, set its delegate to the class, and then present the UIImagePickerController to the user.
extension ViewController: UIImagePickerControllerDelegate {
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
}
The code above dismisses the picker if the user cancels image selection. The extension also makes ViewController conform to UIImagePickerControllerDelegate. After this, declare two variables, imageWidth and imageHeight, of type CGFloat to hold the image width and height respectively. Also declare a pathLayer variable of type CALayer; we use this pathLayer to hold the Vision results. Your code should now look a little like this.
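Here is a sketch of the class at this stage (the initial values of the new properties are an assumption):

class ViewController: UIViewController, UINavigationControllerDelegate {

    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var handSwitch: UISwitch!
    @IBOutlet weak var bodySwitch: UISwitch!

    // Cached image dimensions, used later when drawing CALayer paths.
    var imageWidth: CGFloat = 0
    var imageHeight: CGFloat = 0

    // Layer that will hold the Vision results drawn on top of the image.
    var pathLayer: CALayer?

    @IBAction func selectPhotoAction(_ sender: Any) {
        let picker = UIImagePickerController()
        picker.delegate = self
        picker.sourceType = .photoLibrary
        present(picker, animated: true, completion: nil)
    }
}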
extension ViewController: UIImagePickerControllerDelegate {
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
}
To access the photo library, you need to get permission from the users. So go to your Info.plist and add the entry "Privacy – Photo Library Usage Description" (the raw key is NSPhotoLibraryUsageDescription). We do this because, starting from iOS 10, you need to specify the reason why your app accesses the camera and photo library.
Now, to get the image in a standard size and orientation, we define a function that transforms the input gallery image into the required size. This function accepts a UIImage as an argument and returns the transformed image.
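A minimal sketch of such a function, assuming we scale the image to fit the screen and redraw it (redrawing also normalizes the orientation):

func scaleAndOrient(image: UIImage) -> UIImage {
    // Maximum size we want to work with (the screen, in points).
    let maxResolution = UIScreen.main.bounds.size

    let width = image.size.width
    let height = image.size.height

    // If the image already fits, just return it.
    guard width > maxResolution.width || height > maxResolution.height else {
        return image
    }

    // Scale down so that both dimensions fit the screen.
    let ratio = min(maxResolution.width / width, maxResolution.height / height)
    let newSize = CGSize(width: width * ratio, height: height * ratio)

    // Redrawing bakes in the image orientation, so the result is always .up.
    let renderer = UIGraphicsImageRenderer(size: newSize)
    return renderer.image { _ in
        image.draw(in: CGRect(origin: .zero, size: newSize))
    }
}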
Next, we define another function that prepares the pathLayer variable to hold the Vision results and displays the selected image in the UIImageView. This function takes the UIImage as its argument (here we call it show(_:)).
func show(_ image: UIImage) {
    // 1
    pathLayer?.removeFromSuperlayer()
    pathLayer = nil
    imageView.image = nil

    // 2
    let correctedImage = scaleAndOrient(image: image)

    // 3
    imageView.image = correctedImage

    guard let cgImage = correctedImage.cgImage else { return }
    let fullImageWidth = CGFloat(cgImage.width)
    let fullImageHeight = CGFloat(cgImage.height)
    let imageFrame = imageView.frame
    let widthRatio = fullImageWidth / imageFrame.width
    let heightRatio = fullImageHeight / imageFrame.height

    // 4
    let scaleDownRatio = max(widthRatio, heightRatio)

    // 5
    imageWidth = fullImageWidth / scaleDownRatio
    imageHeight = fullImageHeight / scaleDownRatio

    // 6
    let xLayer = (imageFrame.width - imageWidth) / 2
    let yLayer = imageView.frame.minY + (imageFrame.height - imageHeight) / 2
    let drawingLayer = CALayer()
    drawingLayer.bounds = CGRect(x: xLayer, y: yLayer, width: imageWidth, height: imageHeight)
    drawingLayer.anchorPoint = CGPoint.zero
    drawingLayer.position = CGPoint(x: xLayer, y: yLayer)
    drawingLayer.opacity = 0.5
    pathLayer = drawingLayer

    // 7
    self.view.layer.addSublayer(pathLayer!)
}
In the above code:
At 1, we remove the previous path and the old image from the view.
At 2, we call the previously defined func scaleAndOrient(image: UIImage) -> UIImage to transform the image.
At 3, we place the image inside the imageView.
At 4, we scale down the image according to the stricter dimension.
At 5, we cache the image dimensions to reference when drawing CALayer paths.
At 6, we prepare the pathLayer variable of type CALayer, which we defined earlier, to hold the Vision results.
At 7, we add the pathLayer to the main view.
That's it! Now we are ready to start with the actual part of this tutorial: Vision.
The first step is to create a request handler. For this, we use VNImageRequestHandler.
The next step is to create the request using VNDetectHumanHandPoseRequest. Because creating a request is computationally expensive, we create it as a lazy variable. So add the following lazy variable to the ViewController class.
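A sketch of that lazy property, assuming the completion handler is named handleDetectedHandLandmarks(request:error:) (implemented shortly); remember to add import Vision at the top of the file:

lazy var handLandmarkRequest: VNDetectHumanHandPoseRequest = {
    // The request calls back into handleDetectedHandLandmarks once results are ready.
    let request = VNDetectHumanHandPoseRequest(completionHandler: self.handleDetectedHandLandmarks)
    return request
}()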
Now we write a function that creates the requests for hand pose and body pose.
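A sketch of such a function, under the assumption that it is named createVisionRequests() (the body pose branch is left as a placeholder until we implement it):

func createVisionRequests() -> [VNRequest] {
    // Collect the requests to perform based on the switch status.
    var requests = [VNRequest]()
    if handSwitch.isOn {
        requests.append(handLandmarkRequest)
    }
    if bodySwitch.isOn {
        // A VNDetectHumanBodyPoseRequest would be appended here once body pose is implemented.
    }
    return requests
}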
In the above function, we created an array of type VNRequest. Based on the switch status, we append the corresponding request and return the requests array.
We need to send these requests to the request handler. So we write a function that fetches the above requests and passes them to the request handler.
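A minimal sketch of that function, assuming it is called performVisionRequest(image:orientation:):

func performVisionRequest(image: CGImage, orientation: CGImagePropertyOrientation) {
    // Fetch the requests based on the switch status.
    let requests = createVisionRequests()

    // Create the request handler for the selected image.
    let imageRequestHandler = VNImageRequestHandler(cgImage: image,
                                                    orientation: orientation,
                                                    options: [:])

    // Perform the requests off the main queue; the completion handlers
    // run when the results are ready.
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try imageRequestHandler.perform(requests)
        } catch {
            print("Failed to perform Vision request: \(error)")
        }
    }
}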
Once the results are fetched, they are handed over to the handleDetectedHandLandmarks() function in the completion handler of the handLandmarkRequest variable. The implementation of this function is as follows:
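A sketch of handleDetectedHandLandmarks; the parameter labels of the drawing function are assumptions:

func handleDetectedHandLandmarks(request: VNRequest?, error: Error?) {
    // 1: Handle any error produced while generating the results.
    if let nsError = error as NSError? {
        print("Hand landmark detection error: \(nsError)")
        return
    }
    // 2: Hand the results and the image bounds over to the drawing function.
    DispatchQueue.main.async {
        guard let drawLayer = self.pathLayer,
              let results = request?.results as? [VNRecognizedPointsObservation] else {
            return
        }
        self.drawFeaturesforHandPose(results: results, onImageWithBounds: drawLayer.bounds)
    }
}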
In the above function, at 1 we handle any error produced while generating the results, print it to the debugger, and return.
At 2, if there is no error, we pass the results and the image bounds to the drawFeaturesforHandPose() function.
// Inside drawFeaturesforHandPose, for each hand observation (steps 1-3,
// explained below, have already run inside this do block):
            // 4
            var indexPoints = [VNRecognizedPoint]()
            indexPoints.append(allPoints[.handLandmarkKeyIndexTIP]!)
            indexPoints.append(allPoints[.handLandmarkKeyIndexDIP]!)
            indexPoints.append(allPoints[.handLandmarkKeyIndexPIP]!)
            indexPoints.append(allPoints[.handLandmarkKeyIndexMCP]!)
            indexPoints.append(allPoints[.handLandmarkKeyWrist]!)
            // 5
            var indexCGPoints = [CGPoint]()
            var middleCGPoints = [CGPoint]()
            var littleCGPoints = [CGPoint]()
            var thumbCGPoints = [CGPoint]()
            var ringCGPoints = [CGPoint]()
            // 6
            for point in indexPoints {
                let reqPoint = CGPoint(x: point.location.x, y: point.location.y)
                indexCGPoints.append(reqPoint)
            }
            // ... the same is done for the other fingers ...
        } catch {
            print("error")
        }
    }
    CATransaction.commit()
}
At 1, we need to get a boundary for the detected observations. This helps us know where the hand is located in the image. For the Vision face detection API, we get a boundingBox property on the observations by default, but for the hand detection API there is no such property, so we write our own.
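A sketch of such a helper; the name getObservationBox(observation:) is an assumption. It simply takes the minimum and maximum of all recognized point locations, which Vision returns in normalized coordinates:

func getObservationBox(observation: VNRecognizedPointsObservation) throws -> CGRect {
    // Gather every recognized point of the hand.
    let allPoints = try observation.recognizedPoints(forGroupKey: .all)
    let locations = allPoints.values.filter { $0.confidence > 0 }.map { $0.location }

    guard let minX = locations.map({ $0.x }).min(),
          let maxX = locations.map({ $0.x }).max(),
          let minY = locations.map({ $0.y }).min(),
          let maxY = locations.map({ $0.y }).max() else {
        return .zero
    }

    // A normalized rect enclosing all landmarks, similar to Vision's boundingBox.
    return CGRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}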
At 2, we have to convert the observation box we get in step 1 to the image scale. To achieve this, we send the observationBox and the image bounds to the following function.
func boundingBox(forRegionOfInterest rect: CGRect, withinImageBounds bounds: CGRect) -> CGRect {
    var rect = rect
    let imageWidth = bounds.width, imageHeight = bounds.height
    // Reposition origin.
    rect.origin.x *= imageWidth
    rect.origin.x += bounds.origin.x
    rect.origin.y = (1 - rect.origin.y) * imageHeight + bounds.origin.y
    // Rescale normalized coordinates.
    rect.size.width *= imageWidth
    rect.size.height *= imageHeight
    return rect
}
At 3, for each hand, Vision returns 21 observation points (4 points for each finger and one for the wrist). Each point is accessed using a key associated with it. These points are also grouped per finger and can be accessed via group keys, which let us fetch all the points of a particular finger at once.
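For reference, the finger group keys on VNRecognizedPointGroupKey in the Xcode 12 beta API are:

.handLandmarkRegionKeyThumb
.handLandmarkRegionKeyIndexFinger
.handLandmarkRegionKeyMiddleFinger
.handLandmarkRegionKeyRingFinger
.handLandmarkRegionKeyLittleFinger

For example, to fetch all the index finger points at once (inside a do/catch or throwing context):

let indexFingerPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyIndexFinger)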
Now, with all these points in their respective arrays, we draw lines between the points of each finger so that we can see the landmarks returned by the Vision API. For this purpose, we create variables of type CGMutablePath that hold the path between the points, and variables of type CAShapeLayer that display these lines on the view.
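As a sketch of this idea for one finger, assuming indexCGPoints already holds the index finger points converted to the pathLayer's coordinate space:

// Build a path connecting the index finger landmarks.
let indexPath = CGMutablePath()
if let first = indexCGPoints.first {
    indexPath.move(to: first)
    for point in indexCGPoints.dropFirst() {
        indexPath.addLine(to: point)
    }
}

// Display the path as a line on top of the image.
let indexLayer = CAShapeLayer()
indexLayer.path = indexPath
indexLayer.strokeColor = UIColor.red.cgColor
indexLayer.fillColor = nil
indexLayer.lineWidth = 2
pathLayer?.addSublayer(indexLayer)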
Now, with all of this done, it is time to make use of these functions. So we implement the imagePickerController(_:didFinishPickingMediaWithInfo:) method in the ViewController to process the selected image and send it to the Vision APIs.
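A sketch of how this could look, using the helper names assumed in the earlier sketches (show(_:) and performVisionRequest(image:orientation:)); where exactly the body switch gets turned off is also an assumption:

func imagePickerController(_ picker: UIImagePickerController,
                           didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
    dismiss(animated: true, completion: nil)

    guard let originalImage = info[.originalImage] as? UIImage else { return }

    // Display the image and prepare the drawing layer.
    show(originalImage)

    // Body pose is not implemented yet, so we keep the body switch turned off.
    bodySwitch.setOn(false, animated: true)

    // The scaled copy always has an .up orientation, so we can pass .up to Vision.
    guard let cgImage = scaleAndOrient(image: originalImage).cgImage else { return }
    performVisionRequest(image: cgImage, orientation: .up)
}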
Notice that we turned off the body switch, as we are yet to implement the body pose.
Now that we are done with the hand pose implementation, I request you to try the body pose implementation on your own so that you get to understand the methods we implemented. It is almost similar to the hand pose, except for the group key names, which I will provide below.
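For reference, the body pose group keys on VNRecognizedPointGroupKey in the same beta API are:

.bodyLandmarkRegionKeyFace
.bodyLandmarkRegionKeyTorso
.bodyLandmarkRegionKeyLeftArm
.bodyLandmarkRegionKeyRightArm
.bodyLandmarkRegionKeyLeftLeg
.bodyLandmarkRegionKeyRightLeg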
If you have any doubts about the implementation, download the full project from the Git repo, where I have implemented both hand pose and body pose.
This marks the end of our tutorial on implementing human hand pose and body pose.