Module 10 - Learner's Guide
20 April 2024
Agenda
• What is Edge AI?
• Why, where, when, and how we use it for different solutions
• Sample hardware devices
• Optimizations for target devices
  i. Board background knowledge - understanding the problem statement and target specifications
  ii. Porting
  iii. Optimizations for edge devices - pruning, clustering, quantization
• Neuromorphic computing
  i. Von Neumann vs. neuromorphic architectures
  ii. SNNs
  iii. Neuromorphic computing vs. conventional AI (ANN/CNN)
  iv. Applications
  v. Example board
  vi. Demo videos
• Edge Cloud
  I. What is Edge Cloud?
  II. Why Edge Cloud?
  III. Use cases of Edge Cloud
  IV. Challenges
Edge AI
What is Edge AI?
Conventional Edge Devices
Edge devices are categorized into two major types:
• Microcontrollers (MCU)
• Microprocessors (MPU): desktop PCs, GPUs, CPUs
Examples of edge devices:
• Microcontrollers
• Microprocessors
• Raspberry Pi
• Sensors
• Smartphones
• Tablets
• Desktop PCs
• GPUs
• TPUs
• ECUs in cars
• Drones
• Connected devices
Board background knowledge & Porting
AI Model Optimizations (overview figure): Pruning, Quantization, Clustering, Gradient Quantization, Matrix Decomposition, Scaling in Network
Pruning
Pruning is used to produce smaller models for inference. It is implemented by removing unimportant connections or neurons. With its reduced size, the model becomes more memory and energy efficient and faster at inference, with minimal loss of accuracy.
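Below is a minimal sketch of magnitude-based pruning using the TensorFlow Model Optimization Toolkit; the tiny Keras model, the 50% sparsity target, and the schedule steps are illustrative assumptions rather than values from this guide.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Tiny stand-in model (assumed); replace with your own trained network.
base_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Gradually raise sparsity to 50% while fine-tuning, removing low-magnitude weights.
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000),
}
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(base_model, **pruning_params)
pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
# Fine-tune with the pruning callback, e.g.:
# pruned_model.fit(x_train, y_train, epochs=2,
#                  callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before export so the saved model stays small.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

The pruned weights are zeros, so the exported model compresses well (for example with gzip) and can run faster on hardware or kernels that exploit sparsity.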
Clustering
• Clustering works by grouping the weights of each layer in a model into a predefined
number of clusters, then sharing the centroid values for the weights belonging to each
individual cluster. This reduces the number of unique weight values in a model, thus
reducing its complexity.
• As a result, clustered models can be compressed more effectively, providing
deployment benefits similar to pruning.
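A minimal weight-clustering sketch with the TensorFlow Model Optimization Toolkit; the stand-in model and the choice of 16 clusters are assumptions made for illustration.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Tiny stand-in model (assumed); replace with your own trained network.
base_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Group each layer's weights into 16 clusters; weights in a cluster share one centroid value.
clustering_params = {
    "number_of_clusters": 16,
    "cluster_centroids_init": tfmot.clustering.keras.CentroidInitialization.LINEAR,
}
clustered_model = tfmot.clustering.keras.cluster_weights(base_model, **clustering_params)
clustered_model.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy",
                        metrics=["accuracy"])
# A short fine-tune (clustered_model.fit(...)) usually recovers most of the lost accuracy.

# Strip the clustering wrappers before export; the few unique weight values compress well.
final_model = tfmot.clustering.keras.strip_clustering(clustered_model)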
Development workflow
• As a starting point, check whether any of the hosted models can work for your application. If not, it is recommended to start with the post-training quantization tool, since it is broadly applicable and does not require training data.
• If the accuracy and latency targets are not met, or if hardware accelerator support is important, quantization-aware training is the better option. To further reduce model size, you can apply pruning and/or clustering before quantizing your models.
Quantization
Quantization refers to reducing the precision of a model's weights, biases, and activations so that they occupy less memory and the overall model size shrinks. Usually it replaces float32 parameters and inputs with other types such as float16, INT32, INT16, INT8, INT4, or INT1.
There are two ways to perform quantization:
• Post-Training Quantization
• Quantization-Aware Training
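As a rough numerical illustration of what reducing precision means, the sketch below applies standard affine (asymmetric) int8 quantization to a small float32 array; the array values are made up for the example.

import numpy as np

def quantize_int8(x):
    # Map float32 values onto the int8 range [-128, 127] with a scale and zero point.
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)        # step size between integer levels
    zero_point = int(round(qmin - x.min() / scale))    # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Reconstruct approximate float32 values from the int8 representation.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.array([-0.8, -0.1, 0.0, 0.4, 1.2], dtype=np.float32)
q, scale, zp = quantize_int8(weights)
print(q)                          # int8 values, 1 byte each instead of 4
print(dequantize(q, scale, zp))   # close to the original floats, with small rounding error

Each value now needs 1 byte instead of 4, which is where the roughly 75% size reduction quoted in the comparison table below comes from.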
Types of Quantization
Post-Training Quantization (PTQ)
• Quantizing an already-trained model.
• Weights and activations are quantized for deployment.
• Significant reduction in model size
• Some loss of accuracy
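A minimal post-training dynamic-range quantization sketch using the TensorFlow Lite converter; the tiny Keras model and the output filename are assumptions for illustration.

import tensorflow as tf

# Tiny stand-in model (assumed); in practice, load your trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Dynamic-range quantization: weights become int8, no calibration data required.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

with open("model_dynamic_range.tflite", "wb") as f:
    f.write(tflite_quant_model)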
Quantization-Aware Training
• Quantization is simulated during the training process
• Requires modifications to the training pipeline
• Better accuracy than post-training quantization
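A minimal quantization-aware training sketch with the TensorFlow Model Optimization Toolkit; the stand-in model and settings are assumptions, and the fit() call is left commented because it needs your own labelled data.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Tiny stand-in model (assumed); QAT wraps its layers with fake-quantization ops.
base_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Quantization error is simulated during training, so the weights adapt to it.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
# qat_model.fit(x_train, y_train, epochs=1)   # fine-tune on labelled training data

# After training, convert to a quantized TFLite model for deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_qat_model = converter.convert()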
Technique | Data requirements | Size reduction | Accuracy | Supported hardware
Post-training dynamic range quantization | No data | Up to 75% | Smallest accuracy loss | CPU, GPU (Android)
Post-training integer quantization | Unlabelled representative sample | Up to 75% | Small accuracy loss | CPU, GPU (Android), EdgeTPU, Hexagon DSP
Quantization-aware training | Labelled training data | Up to 75% | Smallest accuracy loss | CPU, GPU (Android), EdgeTPU, Hexagon DSP
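The post-training integer quantization row above needs a small representative dataset to calibrate activation ranges; a minimal sketch, assuming random arrays stand in for real calibration samples and reusing a tiny stand-in Keras model:

import numpy as np
import tensorflow as tf

# Tiny stand-in model (assumed).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

def representative_data_gen():
    # Assumed calibration data: ~100 unlabelled samples shaped like the model input.
    for _ in range(100):
        yield [np.random.rand(1, 28, 28).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force integer-only ops so accelerators such as the Edge TPU can run the model.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_int8_model = converter.convert()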
Sample optimization methods using the TensorFlow library, shown in the decision tree (figure).
Quantization applied to a few common CNN models
Neuromorphic Computing
Neuromorphic computing builds systems from synthetic neurons that operate according to the same principles as the human brain.
SNN | ANN
Spikes as Fundamental Units | Continuous Values
Temporal Coding | Vector Representations
Sparse and Event-Driven | Fixed Precision
Integration and Propagation | Feedforward Processing and Backpropagation
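To make the "spikes, temporal coding, integration" side of the comparison concrete, below is a minimal leaky integrate-and-fire neuron simulation; the time constant, threshold, and input current are illustrative assumptions, not parameters of any specific neuromorphic chip.

import numpy as np

def simulate_lif(input_current, v_thresh=1.0, v_reset=0.0, tau=20.0, dt=1.0):
    # Leaky integrate-and-fire: the membrane potential decays toward rest,
    # integrates incoming current, and emits a spike when it crosses the threshold.
    v = v_reset
    trace, spikes = [], []
    for i_t in input_current:
        v = v * np.exp(-dt / tau) + i_t      # leak (exponential decay) + integration
        if v >= v_thresh:                    # threshold crossing -> spike event
            spikes.append(1)
            v = v_reset                      # reset after the spike
        else:
            spikes.append(0)
        trace.append(v)
    return np.array(trace), np.array(spikes)

# A constant input current produces a regular, sparse spike train.
trace, spikes = simulate_lif(np.full(100, 0.12))
print(int(spikes.sum()), "spikes in 100 time steps")

Information is carried by when these spikes occur (temporal coding), and computation happens only on spike events, which is what makes SNN hardware sparse and event-driven.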
We have tested all the demos mentioned above using the BrainChip Akida Raspberry Pi evaluation kit.
Edge Cloud
What is Edge Cloud?
• Faster response times. Data within a cloud edge network can be processed and consumed close to where it is generated, enabling faster response times for enhanced user experiences.
• Optimized bandwidth. By processing more workloads locally, cloud edge networks minimize the need to transmit massive amounts of data to centralized servers, reducing network usage and lowering bandwidth needs.
• Increased security. By processing and storing data locally, cloud edge networks limit the distance sensitive information must travel and minimize its exposure to threats.
• Simpler data governance. Many countries and jurisdictions have different requirements for how data such as customer records can be collected, used, stored, protected, and retained. When data travels long distances to reach cloud data centers, managing these data sovereignty mandates can be complex and time-consuming. Cloud edge computing simplifies data governance by processing and storing data locally.
• More flexibility and scalability. Cloud edge networks make it easy to scale applications and to run modern apps built on containers or existing apps on virtual machines, all within a single platform.
• Real-time insight. Because cloud edge computing enables data to be processed faster and delivered with quicker response times, solutions such as analytics platforms can deliver insight to end users with greater speed and timeliness.
Use cases for Edge Cloud & Challenges
Areas where we can implement Edge Cloud solutions :
• Multimedia experiences
• Manufacturing
• Self-driving vehicles
• Healthcare
• Smart city solutions
As a complex and critical system within IT infrastructure, edge cloud presents some challenges:
• Edge device management: Managing many edge devices across multiple locations can be resource intensive. Ensuring proper configuration, security updates, and maintenance of servers poses a challenge.
• Network connectivity and reliability: Edge cloud relies on a stable connection between the servers at the edge of the network and a centralized cloud. In remote environments, maintaining consistent network connectivity may be difficult, leading to unreliable access to cloud services.
• Scalability and resource management: Scaling resources at the edge to handle fluctuations in demand can be difficult. Achieving seamless resource orchestration and load balancing in a distributed environment requires careful coordination.
References for Edge Cloud
Links:
https://www.hpe.com/in/en/what-is/edge-to-cloud.html
https://www.ciena.com/insights/what-is/What-is-Edge-Cloud.html
https://www.intel.com/content/www/us/en/edge-computing/edge-cloud.html#articleparagraph_375506681
https://www.vmware.com/topics/glossary/content/edge-cloud.html#:~:text=Edge%20cloud%20is%20cloud%20computing,or%20private%20cloud%20for%20processing
Thank You
www.tataelxsi.com
Confidentiality Notice
This document and all information contained herein are the sole property of Tata Elxsi Limited and shall not be reproduced or disclosed to a third party without the express written consent of Tata Elxsi Limited.