
Exploring Apple's VISION Framework: Building a simple iOS App to find Human hand pose and body pose landmarks
Having been introduced at Apple's WWDC 2017, the VISION framework has been updated every year since, providing new features for developers to build exciting apps. This year at WWDC 2020, Apple added a new set of tools to the VISION framework that takes on-device computer vision a step further by detecting human hand pose and body pose.

What we will be learning in this tutorial

So far, we have used VISION to detect human faces and facial landmarks, recognize text and barcodes, and much more. The framework does all the complex detection work for us and exposes APIs that can be used directly in our app. Another nice thing about VISION is that it does not require an internet connection. In this tutorial, we will learn how to use the framework to find the landmark points of human hands and the human body, which let us understand their pose in static pictures. We will build an iOS app and see how the framework detects the human pose for us.

Note: The project in this tutorial was built using Swift 5 in Xcode 12 beta, so you need Xcode 12 beta to run it. You don't need a physical device (cheers :-)) as the project also runs in the simulator.

The app we are going to build is very simple. It lets users select an image from the iPhone's photo library, and as soon as the image loads we detect the hand and body pose landmarks using the VISION framework.

Now, enough with the theory, let's dive into the coding session.

Getting Started
First, open Xcode 12 beta and create a new project. Select the App template under iOS (previously "Single View App") and, on the next page, give the application a name. Make sure the interface is set to "Storyboard" and Swift is selected as the programming language.

We will keep the UI simple and give more priority to the VISION part. So let's begin! Go to Main.storyboard and select an image view from the UI controls library. Drag and drop the image view into the view controller. Set the leading and trailing constraints to zero and center it vertically in the safe area. Set the height of the image view to 400. This makes the image view sit in the center of the view, no matter the size of the device.

To access the phone's photo library we will need a button action, so add a button at the bottom of the view controller and name it "Select Photo" in the attributes inspector. Set the vertical spacing between the button and the image view to 100 and center the button horizontally in the safe area.

Now drag and drop two switch controls into the view controller, select both of them, and embed them in a stack view. These switches let the user choose between hand pose and body pose. Set the leading and trailing constraints of the stack view and set the vertical spacing between the stack view and the image view to 140. This pins the switch controls relative to the image view. Add a label below each switch and add constraints to these labels. Finally, go to the attributes inspector of each label and set the text to "Hand" and "Body" respectively. This completes the UI for our app. The storyboard now looks pretty much like this:
Implementing the photo library function
Now that everything is set up in the storyboard, it's time to write some code to use these elements. Go to ViewController.swift and adopt the UINavigationControllerDelegate protocol, which is required by the UIImagePickerController class.

class ViewController: UIViewController, UINavigationControllerDelegate

Now, add three new IBOutlets and name them imageView, handSwitch and bodySwitch for the image view and the two switch controls respectively. Next, create the IBAction method for selecting an image from the photo library. The code in the view controller should look like this:

class ViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var handSwitch: UISwitch!
    @IBOutlet weak var bodySwitch: UISwitch!

    override func viewDidLoad() {
        super.viewDidLoad()
    }

    @IBAction func selectPhotoAction(_ sender: Any) {
        let imagePicker = UIImagePickerController()
        imagePicker.delegate = self
        imagePicker.sourceType = .photoLibrary
        self.present(imagePicker, animated: true)
    }
}

Make sure you connect all these outlet variables and the action method to the respective controls in the storyboard. In selectPhotoAction, we create a constant of type UIImagePickerController, set its delegate to the view controller, and present the picker to the user.

Since we set the UIImagePickerController's delegate to the view controller, ViewController must conform to UIImagePickerControllerDelegate (which we already declared on the class above). Add the corresponding delegate method in an extension of ViewController in ViewController.swift:

extension ViewController {
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
}

The method above dismisses the picker if the user cancels image selection. Next, declare two variables, imageWidth and imageHeight, of type CGFloat to hold the image's width and height respectively, and declare a pathLayer variable of type CALayer. We use this pathLayer to hold the Vision results. Your code should now look a little like this:

class ViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var handSwitch: UISwitch!
    @IBOutlet weak var bodySwitch: UISwitch!

    var imageWidth: CGFloat = 0
    var imageHeight: CGFloat = 0
    var pathLayer: CALayer?

    override func viewDidLoad() {
        super.viewDidLoad()
    }

    @IBAction func selectPhotoAction(_ sender: Any) {
        let imagePicker = UIImagePickerController()
        imagePicker.delegate = self
        imagePicker.sourceType = .photoLibrary
        self.present(imagePicker, animated: true)
    }
}

extension ViewController {
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
}

To access the photo library, you need the user's permission. Go to your Info.plist and add the entry "Privacy – Photo Library Usage Description" (the raw key is NSPhotoLibraryUsageDescription). Starting from iOS 10, you need to state the reason why your app accesses the camera or photo library.
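If you ever need to check or request photo library authorization explicitly (for example, when reading from the library directly rather than through the picker), a small optional sketch using the Photos framework, not part of this tutorial's flow, could look like this:

import Photos

// Optional sketch: explicitly request photo library access.
func requestPhotoAccess(completion: @escaping (Bool) -> Void) {
    PHPhotoLibrary.requestAuthorization { status in
        DispatchQueue.main.async {
            // .authorized means full library access was granted.
            completion(status == .authorized)
        }
    }
}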

Now, to get the image in a standard size and orientation, we define a function that transforms the selected photo into the required size. This function accepts a UIImage as an argument and returns the transformed image.

func scaleAndOrient(image: UIImage) -> UIImage {

    // Set a default value for limiting image size.
    let maxResolution: CGFloat = 640

    guard let cgImage = image.cgImage else {
        print("UIImage has no CGImage backing it!")
        return image
    }

    // Compute parameters for transform.
    let width = CGFloat(cgImage.width)
    let height = CGFloat(cgImage.height)
    var transform = CGAffineTransform.identity

    var bounds = CGRect(x: 0, y: 0, width: width, height: height)

    if width > maxResolution || height > maxResolution {
        let ratio = width / height
        if width > height {
            bounds.size.width = maxResolution
            bounds.size.height = round(maxResolution / ratio)
        } else {
            bounds.size.width = round(maxResolution * ratio)
            bounds.size.height = maxResolution
        }
    }

    let scaleRatio = bounds.size.width / width
    let orientation = image.imageOrientation
    switch orientation {
    case .up:
        transform = .identity
    case .down:
        transform = CGAffineTransform(translationX: width, y: height).rotated(by: .pi)
    case .left:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: 0, y: width).rotated(by: 3.0 * .pi / 2.0)
    case .right:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: height, y: 0).rotated(by: .pi / 2.0)
    case .upMirrored:
        transform = CGAffineTransform(translationX: width, y: 0).scaledBy(x: -1, y: 1)
    case .downMirrored:
        transform = CGAffineTransform(translationX: 0, y: height).scaledBy(x: 1, y: -1)
    case .leftMirrored:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: height, y: width).scaledBy(x: -1, y: 1).rotated(by: 3.0 * .pi / 2.0)
    case .rightMirrored:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(scaleX: -1, y: 1).rotated(by: .pi / 2.0)
    default:
        transform = .identity
    }

    return UIGraphicsImageRenderer(size: bounds.size).image { rendererContext in
        let context = rendererContext.cgContext

        if orientation == .right || orientation == .left {
            context.scaleBy(x: -scaleRatio, y: scaleRatio)
            context.translateBy(x: -height, y: 0)
        } else {
            context.scaleBy(x: scaleRatio, y: -scaleRatio)
            context.translateBy(x: 0, y: -height)
        }
        context.concatenate(transform)
        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    }
}

Next, we define another function that prepares the pathLayer variable to hold the Vision results and displays the selected image in the UIImageView. This function takes a UIImage as its argument.

func show(_ image: UIImage) {

    // 1
    pathLayer?.removeFromSuperlayer()
    pathLayer = nil
    imageView.image = nil

    // 2
    let correctedImage = scaleAndOrient(image: image)

    // 3
    imageView.image = correctedImage

    guard let cgImage = correctedImage.cgImage else {
        print("Trying to show an image not backed by CGImage!")
        return
    }

    let fullImageWidth = CGFloat(cgImage.width)
    let fullImageHeight = CGFloat(cgImage.height)

    let imageFrame = imageView.frame
    let widthRatio = fullImageWidth / imageFrame.width
    let heightRatio = fullImageHeight / imageFrame.height

    // 4
    let scaleDownRatio = max(widthRatio, heightRatio)

    // 5
    imageWidth = fullImageWidth / scaleDownRatio
    imageHeight = fullImageHeight / scaleDownRatio

    // 6
    let xLayer = (imageFrame.width - imageWidth) / 2
    let yLayer = imageView.frame.minY + (imageFrame.height - imageHeight) / 2
    let drawingLayer = CALayer()
    drawingLayer.bounds = CGRect(x: xLayer, y: yLayer, width: imageWidth, height: imageHeight)
    drawingLayer.anchorPoint = CGPoint.zero
    drawingLayer.position = CGPoint(x: xLayer, y: yLayer)
    drawingLayer.opacity = 0.5
    pathLayer = drawingLayer
    // 7
    self.view.layer.addSublayer(pathLayer!)
}

In the above code:
At 1, we remove the previous path and the old image from the view.
At 2, we call the previously defined scaleAndOrient(image:) function to transform the image.
At 3, we place the image inside the imageView.
At 4, we scale down the image according to the stricter dimension.
At 5, we cache the image dimensions to reference when drawing CALayer paths.
At 6, we prepare the pathLayer variable (the CALayer we declared earlier) to hold the Vision results.
At 7, we add the pathLayer to the main view.

That's it! Now we are ready to start with the actual part of this tutorial: VISION.

Integrating the VISION Hand Pose and Body Pose APIs

The Vision hand pose and body pose APIs follow the same pattern as the other Vision APIs. First, we will see how to implement the human hand pose request. The body pose request is more or less the same as the hand pose one, except for the names used in the API.

The first step is to create a request handler. For this, we use VNImageRequestHandler.

The next step is to create the request using VNDetectHumanHandPoseRequest. We create it as a lazy variable because creating a request is computationally expensive and a lazy property is only created when it is first used. So add the following lazy variable to the ViewController class:

lazy var handLandmarkRequest = VNDetectHumanHandPoseRequest(completionHandler: self.handleDetectedHandLandmarks)
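Note that the createVisionRequests() function coming up below also refers to a humanLandmarkRequest for body pose. A matching lazy variable might look like the sketch below; it assumes a handleDetectedBodyLandmarks completion handler that you write yourself when you implement body pose (only the hand pose handler is implemented in this tutorial):

// Sketch: body pose counterpart of handLandmarkRequest, assuming you add a
// handleDetectedBodyLandmarks(request:error:) method to ViewController.
lazy var humanLandmarkRequest = VNDetectHumanBodyPoseRequest(completionHandler: self.handleDetectedBodyLandmarks)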

Now we write a function that creates the requests for hand pose and body pose.

func createVisionRequests() -> [VNRequest] {

    var requests: [VNRequest] = []

    // Create & include a request if and only if its switch is ON.
    if self.handSwitch.isOn {
        requests.append(self.handLandmarkRequest)
    }
    if self.bodySwitch.isOn {
        requests.append(self.humanLandmarkRequest)
    }
    // Return grouped requests as a single array.
    return requests
}

In the above function, we create an array of type VNRequest. Based on the switch status, we append the corresponding request and return the requests array.

We need to send these requests to a request handler, so we write a function that fetches the above requests and passes them to the request handler.

func performVisionRequestforLandmarks(image: CGImage, orientation: CGImagePropertyOrientation) {
    // 1
    let requests = createVisionRequests()
    // 2
    let imageRequestHandler = VNImageRequestHandler(cgImage: image,
                                                    orientation: orientation,
                                                    options: [:])
    // 3
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try imageRequestHandler.perform(requests)
        } catch let error as NSError {
            print("Failed to perform image request: \(error)")
            return
        }
    }
}

At 1, we fetch the desired requests based on the switch status.
At 2, we create a request handler.
At 3, we send the requests to the request handler.

Once the results are ready, they are handed over to the handleDetectedHandLandmarks() function in the completion handler of the handLandmarkRequest variable. The implementation of this function is as follows:

func handleDetectedHandLandmarks(request: VNRequest?, error: Error?) {
    // 1
    if let nsError = error as NSError? {
        print("Landmark Detection Error \(nsError)")
        return
    }
    // 2
    DispatchQueue.main.async {
        guard let drawLayer = self.pathLayer,
              let results = request?.results as? [VNRecognizedPointsObservation] else {
            return
        }
        self.drawFeaturesforHandPose(onHands: results, onImageWithBounds: drawLayer.bounds)
        drawLayer.setNeedsDisplay()
    }
}

In the above function, at 1, we handle any error that occurred while generating the results, print it to the debugger, and return.
At 2, if there is no error, we pass the results and the image bounds to the drawFeaturesforHandPose() function.

Next, we look at the implementation of the drawFeaturesforHandPose() function. This is the function where we actually draw something on the view using the results from the Vision API.

func drawFeaturesforHandPose(onHands hands: [VNRecognizedPointsObservation], onImageWithBounds bounds: CGRect) {
    CATransaction.begin()

    for hand in hands {
        // 1
        let observationBox = self.humanBoundingBox(for: hand)
        // 2
        let humanBounds = boundingBox(forRegionOfInterest: observationBox, withinImageBounds: bounds)
        do {
            // 3
            let allPoints = try hand.recognizedPoints(forGroupKey: VNRecognizedPointGroupKey.all)

            // 4
            var indexPoints = [VNRecognizedPoint]()
            indexPoints.append(allPoints[.handLandmarkKeyIndexTIP]!)
            indexPoints.append(allPoints[.handLandmarkKeyIndexDIP]!)
            indexPoints.append(allPoints[.handLandmarkKeyIndexPIP]!)
            indexPoints.append(allPoints[.handLandmarkKeyIndexMCP]!)
            indexPoints.append(allPoints[.handLandmarkKeyWrist]!)

            var thumbPoints = [VNRecognizedPoint]()
            thumbPoints.append(allPoints[.handLandmarkKeyThumbTIP]!)
            thumbPoints.append(allPoints[.handLandmarkKeyThumbIP]!)
            thumbPoints.append(allPoints[.handLandmarkKeyThumbMP]!)
            thumbPoints.append(allPoints[.handLandmarkKeyThumbCMC]!)
            thumbPoints.append(allPoints[.handLandmarkKeyWrist]!)

            var ringFingerPoints = [VNRecognizedPoint]()
            ringFingerPoints.append(allPoints[.handLandmarkKeyRingTIP]!)
            ringFingerPoints.append(allPoints[.handLandmarkKeyRingDIP]!)
            ringFingerPoints.append(allPoints[.handLandmarkKeyRingPIP]!)
            ringFingerPoints.append(allPoints[.handLandmarkKeyRingMCP]!)
            ringFingerPoints.append(allPoints[.handLandmarkKeyWrist]!)

            var middleFingerPoints = [VNRecognizedPoint]()
            middleFingerPoints.append(allPoints[.handLandmarkKeyMiddleTIP]!)
            middleFingerPoints.append(allPoints[.handLandmarkKeyMiddleDIP]!)
            middleFingerPoints.append(allPoints[.handLandmarkKeyMiddlePIP]!)
            middleFingerPoints.append(allPoints[.handLandmarkKeyMiddleMCP]!)
            middleFingerPoints.append(allPoints[.handLandmarkKeyWrist]!)

            var littleFingerPoints = [VNRecognizedPoint]()
            littleFingerPoints.append(allPoints[.handLandmarkKeyLittleTIP]!)
            littleFingerPoints.append(allPoints[.handLandmarkKeyLittleDIP]!)
            littleFingerPoints.append(allPoints[.handLandmarkKeyLittlePIP]!)
            littleFingerPoints.append(allPoints[.handLandmarkKeyLittleMCP]!)
            littleFingerPoints.append(allPoints[.handLandmarkKeyWrist]!)

            // 5
            var indexCGPoints = [CGPoint]()
            var middleCGPoints = [CGPoint]()
            var littleCGPoints = [CGPoint]()
            var thumbCGPoints = [CGPoint]()
            var ringCGPoints = [CGPoint]()

            // 6
            for point in indexPoints {
                let reqPoint = CGPoint(x: point.location.x, y: point.location.y)
                indexCGPoints.append(CGPoint(x: (reqPoint.x * bounds.width) + bounds.origin.x - humanBounds.origin.x,
                                             y: ((1 - reqPoint.y) * bounds.height) + bounds.origin.y - humanBounds.origin.y))
            }

            for point in middleFingerPoints {
                let reqPoint = CGPoint(x: point.location.x, y: point.location.y)
                middleCGPoints.append(CGPoint(x: (reqPoint.x * bounds.width) + bounds.origin.x - humanBounds.origin.x,
                                              y: ((1 - reqPoint.y) * bounds.height) + bounds.origin.y - humanBounds.origin.y))
            }

            for point in littleFingerPoints {
                let reqPoint = CGPoint(x: point.location.x, y: point.location.y)
                littleCGPoints.append(CGPoint(x: (reqPoint.x * bounds.width) + bounds.origin.x - humanBounds.origin.x,
                                              y: ((1 - reqPoint.y) * bounds.height) + bounds.origin.y - humanBounds.origin.y))
            }

            for point in thumbPoints {
                let reqPoint = CGPoint(x: point.location.x, y: point.location.y)
                thumbCGPoints.append(CGPoint(x: (reqPoint.x * bounds.width) + bounds.origin.x - humanBounds.origin.x,
                                             y: ((1 - reqPoint.y) * bounds.height) + bounds.origin.y - humanBounds.origin.y))
            }

            for point in ringFingerPoints {
                let reqPoint = CGPoint(x: point.location.x, y: point.location.y)
                ringCGPoints.append(CGPoint(x: (reqPoint.x * bounds.width) + bounds.origin.x - humanBounds.origin.x,
                                            y: ((1 - reqPoint.y) * bounds.height) + bounds.origin.y - humanBounds.origin.y))
            }

            var handLayers = [CAShapeLayer]()

            // 7
            let indexLayer = CAShapeLayer()
            let indexPath = CGMutablePath()
            indexPath.move(to: indexCGPoints[0])
            for i in 1..<indexCGPoints.count {
                indexPath.addLine(to: indexCGPoints[i])
            }
            indexLayer.path = indexPath

            let thumbLayer = CAShapeLayer()
            let thumbPath = CGMutablePath()
            thumbPath.move(to: thumbCGPoints[0])
            for i in 1..<thumbCGPoints.count {
                thumbPath.addLine(to: thumbCGPoints[i])
            }
            thumbLayer.path = thumbPath

            let littleFingerLayer = CAShapeLayer()
            let littleFingerPath = CGMutablePath()
            littleFingerPath.move(to: littleCGPoints[0])
            for i in 1..<littleCGPoints.count {
                littleFingerPath.addLine(to: littleCGPoints[i])
            }
            littleFingerLayer.path = littleFingerPath

            let ringLayer = CAShapeLayer()
            let ringPath = CGMutablePath()
            ringPath.move(to: ringCGPoints[0])
            for i in 1..<ringCGPoints.count {
                ringPath.addLine(to: ringCGPoints[i])
            }
            ringLayer.path = ringPath

            let middleLayer = CAShapeLayer()
            let middlePath = CGMutablePath()
            middlePath.move(to: middleCGPoints[0])
            for i in 1..<middleCGPoints.count {
                middlePath.addLine(to: middleCGPoints[i])
            }
            middleLayer.path = middlePath
            // 8
            handLayers.append(ringLayer)
            handLayers.append(indexLayer)
            handLayers.append(middleLayer)
            handLayers.append(littleFingerLayer)
            handLayers.append(thumbLayer)
            // 9
            for index in 0..<5 {
                handLayers[index].lineWidth = 2
                handLayers[index].strokeColor = UIColor.green.cgColor
                handLayers[index].fillColor = nil
                handLayers[index].shadowOpacity = 0.75
                handLayers[index].shadowRadius = 4
                handLayers[index].anchorPoint = .zero
                handLayers[index].frame = humanBounds
                handLayers[index].transform = CATransform3DMakeScale(1, 1, 1)
                // 10
                pathLayer?.addSublayer(handLayers[index])
            }
        } catch {
            print("error")
        }
    }

    CATransaction.commit()
}

At 1, we need a bounding box for the detected observation. This helps us know where the hand is located in the image. The Vision face detection API provides a default boundingBox on its observations, but the hand detection API does not, so we write our own method.

func humanBoundingBox(for observation: VNRecognizedPointsObservation) -> CGRect {
    var box = CGRect.zero
    var normalizedBoundingBox = CGRect.null
    guard let points = try? observation.recognizedPoints(forGroupKey: .all) else {
        return box
    }
    for (_, point) in points {
        normalizedBoundingBox = normalizedBoundingBox.union(CGRect(origin: point.location, size: .zero))
    }
    if !normalizedBoundingBox.isNull {
        box = normalizedBoundingBox
    }
    return box
}

At 2, we have to convert the observation box we get in step 1 to the image scale. To achieve this, we pass the observationBox and the image bounds to the following function.

func boundingBox(forRegionOfInterest: CGRect, withinImageBounds bounds: CGRect) -> CGRect {

    let imageWidth = bounds.width
    let imageHeight = bounds.height

    // Begin with input rect.
    var rect = forRegionOfInterest

    // Reposition origin.
    rect.origin.x *= imageWidth
    rect.origin.x += bounds.origin.x
    rect.origin.y = (1 - rect.origin.y) * imageHeight + bounds.origin.y

    // Rescale normalized coordinates.
    rect.size.width *= imageWidth
    rect.size.height *= imageHeight

    return rect
}

At 3, for each hand, Vision returns 21 observation points (4 points for each finger and one for the wrist). Each point is accessed using a key associated with it. These points are also grouped per finger and can be accessed via group keys. The following are the group keys provided by Vision; they are used to access all the points of a particular finger at once.

VNRecognizedPointGroupKey.all - returns all the 21 points.
.handLandmarkRegionKeyThumb - returns the 4 points on the thumb.
.handLandmarkRegionKeyIndexFinger - returns the 4 points on the index finger.
.handLandmarkRegionKeyMiddleFinger - returns the 4 points on the middle finger.
.handLandmarkRegionKeyRingFinger - returns the 4 points on the ring finger.
.handLandmarkRegionKeyLittleFinger - returns the 4 points on the little finger.
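As a quick illustration of these group keys (using the same beta API the tutorial uses), the sketch below fetches only the thumb's points from a hand observation instead of asking for all 21 points. This helper is my own example, not part of the tutorial's code:

import Vision

// Example only: fetch just the thumb's recognized points from a hand observation
// by using its group key instead of VNRecognizedPointGroupKey.all.
func thumbPoints(for hand: VNRecognizedPointsObservation) throws -> [VNRecognizedPointKey: VNRecognizedPoint] {
    return try hand.recognizedPoints(forGroupKey: .handLandmarkRegionKeyThumb)
}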
At 4, we declare an array of type VNRecognizedPoint and append the observed points of the index finger to it. The observed points of the index finger can be accessed using the following keys:

.handLandmarkKeyIndexTIP - the tip point of the finger.
.handLandmarkKeyIndexDIP - the second point from the top of the finger.
.handLandmarkKeyIndexPIP - the third point from the top of the finger.
.handLandmarkKeyIndexMCP - the last point of the finger from the top.
.handLandmarkKeyWrist - the wrist point.

We repeat this for all five fingers.

At 5, the observations generated by the Vision framework are of type VNRecognizedPoint. In order to plot them on the view, we need to convert them to CGPoint, so we declare an array of type CGPoint for each finger.
At 6, we convert each observed point of the index finger to a CGPoint and scale it based on the image bounds. Once converted, we append the point to the indexCGPoints array defined at step 5. We repeat this step for all five fingers, as the helper sketch below also illustrates.
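Because the five conversion loops differ only in the array they fill, you could also factor the conversion into a small helper. This is just a sketch using the same formula as the loops above; the helper name is mine, not the tutorial's:

import UIKit
import Vision

// Sketch: convert Vision's normalized recognized points into CGPoints expressed in the
// drawing layer's coordinate space (same math as the per-finger loops in the tutorial).
func convertToLayerPoints(_ points: [VNRecognizedPoint],
                          layerBounds bounds: CGRect,
                          handBounds humanBounds: CGRect) -> [CGPoint] {
    return points.map { point in
        CGPoint(x: point.location.x * bounds.width + bounds.origin.x - humanBounds.origin.x,
                y: (1 - point.location.y) * bounds.height + bounds.origin.y - humanBounds.origin.y)
    }
}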

Now, with all these points in their respective arrays, we draw lines between the points of each finger so that we can see the landmarks given by the Vision API. For this purpose, we create variables of type CGMutablePath that hold the path between the points, and variables of type CAShapeLayer that display these lines on the view.

At 7, as mentioned above, we create a variable indexLayer of type CAShapeLayer and another variable indexPath of type CGMutablePath. Using the addLine method, we add lines between the points and assign this path to the indexLayer's path. We repeat this for all five fingers.

At 8, we declare an array handLayers of type CAShapeLayer. This array holds the layers for all five fingers.

At 9, we set the properties of all five layers in the handLayers array.

At 10, we add each layer in handLayers as a sublayer of the pathLayer.

With all of this in place, it is time to make use of these functions. So we implement the imagePickerController(_:didFinishPickingMediaWithInfo:) method in the ViewController to process the selected image and send it to the Vision APIs.

func imagePickerController(_ picker: UIImagePickerController,
                           didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
    // 1
    let originalImage: UIImage = info[UIImagePickerController.InfoKey.originalImage] as! UIImage
    show(originalImage)
    // 2
    let cgOrientation = CGImagePropertyOrientation(rawValue: UInt32(originalImage.imageOrientation.rawValue))
    guard let cgImage = originalImage.cgImage else {
        return
    }
    // 3
    performVisionRequestforLandmarks(image: cgImage, orientation: cgOrientation!)
    // 4
    dismiss(animated: true, completion: nil)
}

In the above code:

At 1, we extract the chosen image from the picker.
At 2, we convert the image orientation from UIImage.Orientation to CGImagePropertyOrientation.
At 3, we call the function that creates and performs the landmark requests on the selected image.
At 4, we dismiss the picker to return to the original view controller.
Finally, with this function written, we have implemented the human hand pose detection. Now it is time to see the results of our hard work. Build and run the app; as soon as it opens, select a photo from the gallery, the Vision API goes to work, and the detected landmarks appear on the screen as shown below.

Notice that we turned off the body switch, as we are yet to implement the body pose. Now that we are done with the hand pose implementation, I encourage you to try the body pose implementation on your own so that you really understand the methods we implemented. It is almost identical to the hand pose, except for the group key names, which I provide below. If you get stuck, download the full project from the git repo, where I have implemented both hand pose and body pose.

Group Keys for the Human body pose API

VNRecognizedPointGroupKey.all - returns all the body pose landmarks.
.bodyLandmarkRegionKeyFace - returns all the landmarks on the face.
.bodyLandmarkRegionKeyLeftArm - returns all the landmarks on the left arm.
.bodyLandmarkRegionKeyLeftLeg - returns all the landmarks on the left leg.
.bodyLandmarkRegionKeyRightArm - returns all the landmarks on the right arm.
.bodyLandmarkRegionKeyRightLeg - returns all the landmarks on the right leg.
.bodyLandmarkRegionKeyTorso - returns all the landmarks on the trunk of the human body.
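If you would like a starting point before opening the full project, the body pose completion handler follows the same shape as handleDetectedHandLandmarks. The sketch below is my own illustration, not the tutorial's code: the name handleDetectedBodyLandmarks is a placeholder (matching the completion handler assumed for the humanLandmarkRequest sketched earlier), it reuses the tutorial's pathLayer, humanBoundingBox and boundingBox helpers, and it draws only the left arm using one of the group keys listed above. A dictionary of recognized points is unordered, so a full implementation should connect specific joints by their individual keys, as the hand pose code does.

// A minimal sketch of a body pose handler, intended to live inside ViewController.
// It draws only the left-arm landmarks; the segment order is arbitrary because the
// dictionary returned by recognizedPoints(forGroupKey:) is unordered.
func handleDetectedBodyLandmarks(request: VNRequest?, error: Error?) {
    if let nsError = error as NSError? {
        print("Body Landmark Detection Error \(nsError)")
        return
    }
    DispatchQueue.main.async {
        guard let drawLayer = self.pathLayer,
              let results = request?.results as? [VNRecognizedPointsObservation] else {
            return
        }
        for body in results {
            let observationBox = self.humanBoundingBox(for: body)
            let bodyBounds = self.boundingBox(forRegionOfInterest: observationBox,
                                              withinImageBounds: drawLayer.bounds)
            guard let armPoints = try? body.recognizedPoints(forGroupKey: .bodyLandmarkRegionKeyLeftArm) else {
                continue
            }
            // Convert the normalized points into the drawing layer's coordinate space.
            let cgPoints = armPoints.values.map { point in
                CGPoint(x: point.location.x * drawLayer.bounds.width + drawLayer.bounds.origin.x - bodyBounds.origin.x,
                        y: (1 - point.location.y) * drawLayer.bounds.height + drawLayer.bounds.origin.y - bodyBounds.origin.y)
            }
            guard let firstPoint = cgPoints.first else { continue }
            let armPath = CGMutablePath()
            armPath.move(to: firstPoint)
            cgPoints.dropFirst().forEach { armPath.addLine(to: $0) }
            let armLayer = CAShapeLayer()
            armLayer.path = armPath
            armLayer.lineWidth = 2
            armLayer.strokeColor = UIColor.red.cgColor
            armLayer.fillColor = nil
            armLayer.anchorPoint = .zero
            armLayer.frame = bodyBounds
            self.pathLayer?.addSublayer(armLayer)
        }
        drawLayer.setNeedsDisplay()
    }
}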
Once you implement the human body pose, the result looks something like the screenshot below.

This marks the end of our tutorial on implementing human hand pose and body pose.
