
Module 10 - Learners Guide

The document discusses Edge AI and its applications, including the use of various hardware devices and optimization techniques such as pruning, clustering, and quantization. It also covers neuromorphic computing, comparing it with conventional AI, and introduces Edge Cloud, highlighting its benefits, use cases, and challenges. The content is aimed at providing insights into the architecture, implementation, and future trends of Edge AI and Edge Cloud technologies.


Edge AI / Embedded AI

20 April 2024
Agenda
• What is Edge AI?
• Why, where, when, and how we use it for different solutions
• Sample hardware devices
• Optimizations for target devices
  i. Board background knowledge – understanding the problem statement, target specifications
  ii. Porting
  iii. Optimizations for edge devices:
      i. Pruning
      ii. Clustering
      iii. Quantization
• Neuromorphic computing
  i. Von Neumann vs neuromorphic architectures
  ii. SNNs
  iii. Neuromorphic computing vs conventional AI (ANN/CNN)
  iv. Applications
  v. Example board
  vi. Demo videos
• Edge Cloud
  I. What is Edge Cloud?
  II. Why Edge Cloud?
  III. Use cases of Edge Cloud
  IV. Challenges
Edge AI
What is Edge AI ?
Conventional Edge Devices
Edge devices are categorized into two major types:
• Microcontroller (MCU)
• Microprocessor (MPU) – desktop PC, CPU, GPU

Casper system architecture of MCU and MPU

Automotive industry application


Why, where, when, and how we use it for different solutions
• Distributed systems
• Intelligence at Edge
• Cloud cost reduction
• Data security
• Processing at data generation spot
Sample edge devices

• Microcontrollers
• Microprocessors
• Raspberry Pi
• Sensors
• Smartphones
• Tablets
• Desktop PC
• GPU
• TPU
• ECU in cars
• Drones
• Connected devices
Background of Board knowledge & Porting

i. Background knowledge of the board required:

• Specifications of the board – RAM, ROM, processor
• Use-case priority – accuracy, latency, etc.
• Connectivity for the board
• IDE support for the board; what is the development environment – Python, C, C++

ii. Porting:
• Streamline the model development with the necessary optimizations specific to the target
• Check for porting support – binary / .py file / MLOps pipeline, etc.
• As it's an AI model, provision for future enhancements has to be considered
• Packages supporting the target board and the AI model
• ONNX model conversion (if needed)
iii. Optimizations

Need for optimization:

• Not all models are compatible with all hardware
• To realize the algorithm on a target with a lower memory footprint, we need to optimize it according to the use case
• It's always a trade-off between latency and accuracy when we optimize any model
• The chart below shows various optimization methods that can be implemented
• We will discuss three major methods in this session – pruning, quantization, and clustering

[Chart] AI model optimizations: Pruning, Quantization, Clustering, Matrix Decomposition, Gradient Scaling in Network Quantization
Pruning
Pruning is used to produce models with a smaller size for inference. It is implemented by removing unimportant connections or neurons. With its reduced size, the model becomes more memory- and energy-efficient and faster at inference, with minimal accuracy loss.
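As an illustration, the "remove unimportant connections" idea can be sketched as magnitude pruning in a few lines of NumPy. This is a minimal sketch of the concept, not the TensorFlow Model Optimization Toolkit API; the function name and threshold rule are our own.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until roughly `sparsity`
    fraction of the entries are zero (ties may prune slightly more)."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, 0.02]])
pw = prune_by_magnitude(w, sparsity=0.5)  # half the entries become zero
```

In practice the pruned model is then fine-tuned for a few epochs to recover any lost accuracy, and the zeroed weights compress very well.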
Clustering

• Clustering works by grouping the weights of each layer in a model into a predefined
number of clusters, then sharing the centroid values for the weights belonging to each
individual cluster. This reduces the number of unique weight values in a model, thus
reducing its complexity.
• As a result, clustered models can be compressed more effectively, providing
deployment benefits similar to pruning.
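The grouping-and-centroid-sharing step described above can be sketched with a tiny 1-D k-means over a layer's weights. This is illustrative only; TensorFlow's clustering API wraps this idea differently, and the helper below is our own.

```python
import numpy as np

def cluster_weights(weights, n_clusters=4, n_iter=20):
    """Group weights into n_clusters and replace each weight with its
    cluster centroid (weight sharing), via a simple 1-D k-means."""
    flat = weights.ravel().astype(float)
    # Initialise centroids evenly across the weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid, then recompute.
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if members.size:
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape)

w = np.random.default_rng(0).normal(size=(8, 8))
cw = cluster_weights(w, n_clusters=4)  # at most 4 unique weight values
```

Because the clustered layer holds at most `n_clusters` unique values, it can be stored as small indices plus a centroid table, which is where the compression benefit comes from.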

Development workflow
• As a starting point, check whether the models in hosted models can work for your application. If not, it's recommended to start with the post-training quantization tool, since this is broadly applicable and does not require training data.
• For cases where the accuracy and latency targets are not met, or hardware accelerator support is important, quantization-aware training is the better option. If you want to further reduce your model size, you can try pruning and/or clustering prior to quantizing your models.
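The workflow above can be condensed into a small decision helper. This is purely illustrative; the function and parameter names are our own and not part of any library.

```python
def choose_optimization(has_training_data, targets_met_with_ptq, needs_accelerator):
    """Pick an optimization route following the development workflow:
    start with post-training quantization (PTQ) when possible, and fall
    back to quantization-aware training (QAT) when accuracy/latency
    targets are missed or accelerator support matters."""
    if not has_training_data:
        # PTQ is broadly applicable and needs no training data.
        return "post-training quantization"
    if targets_met_with_ptq and not needs_accelerator:
        return "post-training quantization"
    # Targets missed, or hardware accelerator support is important.
    return "quantization-aware training (optionally after pruning/clustering)"
```

Pruning and clustering can be applied before either route to shrink the model further.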
Quantization
Quantization refers to reducing the precision of weights, parameters, biases, and activations so that they occupy less memory and the model size is reduced. Usually it replaces float32 parameters and inputs with other types, such as float16, INT32, INT16, INT8, INT4, INT1, etc.
There are two ways to perform quantization.
• Post Training Quantization
• Quantization Aware Training
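The underlying arithmetic can be sketched as standard affine (scale/zero-point) quantization to INT8. This is a minimal sketch of the concept; real frameworks apply per-tensor or per-channel variants of this with additional calibration.

```python
import numpy as np

def quantize_int8(x):
    """Affine quantization of a float32 tensor to int8, returning the
    quantized values plus the scale/zero-point needed to dequantize."""
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)      # range must include zero
    scale = (hi - lo) / 255.0 or 1.0         # map range onto 256 int8 levels
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize_int8(x)
x_hat = dequantize(q, s, z)   # recovers x to within one quantization step
```

The round-trip error is bounded by the scale (one quantization step), which is why quantization trades a small accuracy loss for a 4x size reduction from float32 to INT8.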
Types of Quantization
Post-Training Quantization (PTQ)
• Quantizing an already-trained model.
• Weights and activations are quantized for deployment.
• Significant reduction in model size
• Loss of accuracy
Quantization-Aware Training (QAT)
• Quantization is considered during the training process
• Requires modification of the training pipeline
• Better accuracy

Comparison of Quantization-Aware Training (left) and Post-Training Quantization (right)
Comparison of quantization methods

| Technique | Data requirements | Size reduction | Accuracy | Supported hardware |
| Post-training float16 quantization | No data | Up to 50% | Insignificant accuracy loss | CPU, GPU |
| Post-training dynamic range quantization | No data | Up to 75% | Smallest accuracy loss | CPU, GPU (Android) |
| Post-training integer quantization | Unlabelled representative sample | Up to 75% | Small accuracy loss | CPU, GPU (Android), EdgeTPU, Hexagon DSP |
| Quantization-aware training | Labelled training data | Up to 75% | Smallest accuracy loss | CPU, GPU (Android), EdgeTPU, Hexagon DSP |

Sample optimization methods using the TensorFlow library, shown in the decision tree
A few model quantizations done for common CNN models

| Model | Top-1 Accuracy (Original) | Top-1 Accuracy (Post-Training Quantized) | Top-1 Accuracy (Quantization-Aware Training) | Latency Original (ms) | Latency PTQ (ms) | Latency QAT (ms) | Size Original (MB) | Size Optimized (MB) |
| Mobilenet-v1-1-224 | 0.709 | 0.657 | 0.70 | 124 | 112 | 64 | 16.9 | 4.3 |
| Mobilenet-v2-1-224 | 0.719 | 0.637 | 0.709 | 89 | 98 | 54 | 14 | 3.6 |
| Inception_v3 | 0.78 | 0.772 | 0.775 | 1130 | 845 | 543 | 95.7 | 23.9 |
| Resnet_v2_101 | 0.770 | 0.768 | N/A | 3973 | 2868 | N/A | 178.3 | 44.9 |

Neuromorphic Computing
Von Neumann vs Neuromorphic Computing
Evolution of SNNs

Neuromorphic Market trends


Spiking Neural Networks(SNN)
The idea is that neurons in the SNN do not transmit information at each propagation cycle (as happens with typical multi-layer perceptron networks), but rather transmit information only when a membrane potential—an intrinsic quality of the neuron related to its membrane electrical charge—reaches a specific value, called the threshold. When the membrane potential reaches the threshold, the neuron fires and generates a signal that travels to other neurons which, in turn, increase or decrease their potentials in response to this signal. A neuron model that fires at the moment of threshold crossing is also called a spiking neuron model.
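The fire-at-threshold behavior described above can be sketched as a leaky integrate-and-fire (LIF) neuron, a common SNN neuron model. The leak, threshold, and input values below are arbitrary illustrative choices, not from any particular neuromorphic platform.

```python
def lif_neuron(input_current, threshold=1.0, leak=0.9, v_rest=0.0):
    """Leaky integrate-and-fire neuron: the membrane potential decays
    toward rest, integrates incoming current, and emits a spike (then
    resets) whenever it crosses the threshold."""
    v = v_rest
    spikes = []
    for i in input_current:
        v = leak * v + i          # leak, then integrate the input
        if v >= threshold:        # threshold crossing -> fire
            spikes.append(1)
            v = v_rest            # reset after the spike
        else:
            spikes.append(0)
    return spikes

# A steady sub-threshold input produces sparse, periodic spikes.
spikes = lif_neuron([0.3] * 10)   # -> [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Note that the output is sparse and event-driven: the neuron stays silent most of the time, which is the source of the power-efficiency argument for SNNs.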

Neuromorphic Computing
Neuromorphic computing is putting together synthetic neurons that operate
according to the same principles as the human brain.

It operates on Spiking Neural Networks (SNNs), where each "neuron" communicates with other neurons independently. It imitates the organic neural networks found in living brains. Each "neuron" in the SNN can fire independently of the others, and when it does, it sends pulsed signals to other neurons in the network that directly alter the electrical states of those neurons.
Neuromorphic vs Conventional AI

| Sl no | Neuromorphic Computing | Conventional AI |
| 1 | Designed to emulate the structure and behavior of biological neural networks. | Typically uses digital processors, such as CPUs and GPUs, following the von Neumann architecture; memory and processing units are separate. |
| 2 | Emphasizes event-driven and asynchronous processing, where computations occur when inputs or stimuli are detected, similar to how neurons fire in response to signals in the brain. | Processing involves executing software-based algorithms in a sequential or parallel manner; algorithms are designed to process data and make decisions based on predefined rules or learned patterns. |
| 3 | Neuromorphic systems often use spiking neurons and synapses to represent and transmit data. | Conventional AI relies on fixed-precision arithmetic and symbolic representations for data processing. |
| 4 | Aims to achieve high energy efficiency by minimizing data movement, taking advantage of analog and continuous computations, and leveraging the brain's efficient signal-processing mechanisms. | Processing can be energy-intensive due to high-speed data transfer and complex computations, although optimizations and advancements in hardware have improved efficiency. |
ANN vs SNN

SNN:
• Spikes as fundamental units
• Temporal coding
• Sparse and event-driven
• Integration and propagation

ANN:
• Continuous values
• Vector representations
• Fixed precision
• Feedforward processing and backpropagation

** SNN is better than ANN in terms of power efficiency.


Neuromorphic Devices for SNN
Real World Applications
Industries that can be targeted:

1. Home appliances – smart home edge intelligence
2. Automotive
3. Healthcare
4. Aerospace
5. Security
etc.

Automotive industry application


BrainChip Akida Raspberry Pi EvalKit

Akida MiniPCIe card

Demos on the Akida EvalKit:

1. Visual wake-word demo (detect a person standing in the camera's field of view)
2. Edge learning demo
3. Edge face recognition demo
4. Detection demo

Demo videos

We have tested all the demos mentioned above using the BrainChip Akida Raspberry Pi EvalKit.
Edge Cloud
Edge Cloud
What is Edge Cloud ?

• Edge computing describes the process of bringing compute and storage elements closer to the network edge. Edge cloud goes a step further and uses a cloud architecture for that same process.
• Edge cloud computing extends the convenience of the cloud to edge networks. Edge clouds are hosted by micro data centers that store, analyze, and process data faster than is possible using a connection to a data center.
• An edge cloud strategy places intelligent edge nodes closer to local resources, equipment, and devices, with software to deliver services in a way that's like using public cloud services.

Reference: Webinar Recording: How to Build a Basic Edge Cloud (youtube.com)
Why Edge Cloud

•Faster response times. Data within a cloud edge network can be processed and consumed close to where it
is generated, enabling faster response times for enhanced user experiences.

•Optimized bandwidth. By processing more workloads locally, cloud edge networks minimize the need to
transmit massive amounts of data to centralized servers, reducing network usage and lowering bandwidth
needs.

•Increased security. By processing and storing data locally, cloud edge networks limit the distance sensitive
information must travel, and minimize its exposure to threats.

•Simpler data governance. Many countries and jurisdictions have different requirements for how data such as
customer records can be collected, used, stored, protected, and retained. When data travels long distances to
reach cloud data centers, managing these data sovereignty mandates can be complex and time-consuming.
Cloud edge computing simplifies data governance by processing and storing data locally.

•More flexibility and scalability. Cloud edge networks make it easy to scale applications and to run modern apps built on containers or existing apps on virtual machines, all within a single platform.

•Real-time insight. Because cloud edge computing enables data to be processed faster and delivered with
quicker response times, solutions such as analytics platforms can deliver insight to end users with greater
speed and timeliness.
Use cases for Edge Cloud & Challenges
Areas where we can implement Edge Cloud solutions :

• Multimedia experiences
• Manufacturing
• Self-driving vehicles
• Healthcare
• Smart city solutions

Challenges of Edge Cloud :

As a complex and critical system within IT infrastructure, edge cloud presents some challenges:

•Edge device management: Managing many edge devices across multiple locations can be resource
intensive. Ensuring proper configuration, security updates, and maintenance of servers poses a challenge.

•Network connectivity and reliability: Edge cloud relies on a stable connection between the servers at the
edge of the network and a centralized cloud. In remote environments, maintaining consistent network
connectivity may be difficult, leading to unreliable access to cloud services.

•Scalability and resource management: Scaling resources at the edge to handle fluctuations in demand
can be difficult. Achieving seamless resource orchestration and load balancing in a distributed environment
requires careful coordination.
References for Edge Cloud

Links:
https://www.hpe.com/in/en/what-is/edge-to-cloud.html
https://www.ciena.com/insights/what-is/What-is-Edge-Cloud.html
https://www.intel.com/content/www/us/en/edge-computing/edge-cloud.html#articleparagraph_375506681
https://www.vmware.com/topics/glossary/content/edge-cloud.html#:~:text=Edge%20cloud%20is%20cloud%20computing,or%20private%20cloud%20for%20processing
Thank You

www.tataelxsi.com

Confidentiality Notice
This document and all information contained herein is the sole property of Tata Elxsi
Limited and shall not be reproduced or disclosed to a third party without the express
written consent of Tata Elxsi Limited.
