Gerardo Rodríguez Barba
Reading Report
Monday, February 12
From Page 1 to Page 60
True Sentences
Unit 1
John McCarthy proposed the term at the 1956 Dartmouth Conference, outlining that artificial intelligence is about
letting a machine simulate the intelligent behavior of humans as precisely as possible.
According to the theory of multiple intelligences, human intelligence can be categorized into seven types:
Linguistic, Logical-Mathematical, Spatial, Bodily-Kinesthetic, Musical, Interpersonal, and Intrapersonal
intelligence.
The study of machine learning aims to enable computers to simulate or implement human learning abilities and
acquire new knowledge and skills.
Deep learning (DL) derives from the study of artificial neural networks (ANN). As a new subfield
of machine learning, it focuses on mimicking the mechanisms of the human brain in interpreting data such as
images, sound, and text.
Types of AI
Strong artificial intelligence holds that it is possible to create intelligent machines that can truly accomplish
reasoning and problem-solving tasks.
Weak artificial intelligence holds that it is not possible to build machines that can truly
reason and solve problems. Such machines may look smart, but they do not really have
intelligence or self-awareness.
The Three Main Schools of AI
Symbolism
The basic theory of symbolism holds that human cognition is a process of inference over and manipulation
of symbols. A human being is a physical symbol system, and so is a computer.
Connectionism
The foundation of connectionism is that the basis of human thinking is the neuron, rather than a process
of symbol manipulation.
Connectionism holds that the human brain is unlike a computer, and it puts forward a connectionist
model that imitates how the brain works, to replace the computer's symbol-operation model.
Behaviorism
The fundamental theory of behaviorism is that intelligence depends on perception and action.
Behaviorism introduces a “perception-action” model of intelligent behavior, and it holds that
intelligence has nothing to do with knowledge, representation, or reasoning.
AI-Related Technologies
AI technology is multi-layered, running through technical levels such as applications, algorithms, chips,
devices, and processes.
Application Level
Algorithm Level: machine learning algorithms such as neural networks, support vector machines (SVM),
K-nearest neighbors (KNN), Bayesian algorithms, decision trees, hidden Markov models (HMM),
ensemble learning, etc. (a minimal sketch of one of these follows this list)
Chip Level
Device Level
Process Level
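As a hedged illustration of the algorithm level, here is a minimal K-nearest neighbors sketch (assuming
scikit-learn is installed; the iris dataset and the choice k = 3 are illustrative, not from the source):

    # Minimal KNN classifier sketch (illustrative; assumes scikit-learn).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)            # 150 labeled flower samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=3)    # k = 3 neighbors (illustrative)
    knn.fit(X_train, y_train)                    # "training" stores the samples
    print("test accuracy:", knn.score(X_test, y_test))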
Deep Learning Framework
The introduction of deep learning frameworks has made deep learning models easier to build. With a deep
learning framework, we no longer need to code complex neural networks and backpropagation algorithms from
scratch; we can simply configure the model's hyperparameters as required, and the model parameters are then
learned automatically from training data.
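As a minimal sketch of this configure-rather-than-code idea (assuming the TensorFlow/Keras framework; the
layer sizes and optimizer are arbitrary choices, not from the source):

    # A tiny network: we only configure hyperparameters (layers, sizes, optimizer);
    # the parameters (weights) are learned automatically by backpropagation.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=5)   # training data would go here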
Types of AI Processors
Central processing unit (CPU)
Graphics processing unit (GPU)
Application specific integrated circuit (ASIC)
Field programmable gate array (FPGA)
In terms of function, AI processors can be classified into two types: training processors and inference
processors.
In order to train a complex deep neural network model, AI training usually entails the input of a large
amount of data and may involve learning methods such as reinforcement learning.
Training is a compute-intensive process.
The popular training processors include NVIDIA
GPU, Google’s tensor processing unit (TPU), and Huawei’s neural-network processing unit (NPU).
Inference here means deriving conclusions from new data on the basis of the trained model.
For instance, a video monitor can determine whether a captured face belongs to a specific target by making
use of a back-end deep neural network model.
Although inference entails much
less computation than training, it still involves lots of matrix operations.
GPUs, FPGAs, and NPUs are commonly used as inference processors.
CPU performance can also be improved by increasing the clock frequency, but there is a limit: high
frequencies cause excessive power consumption and high temperatures.
The GPU is very competitive in matrix computing and parallel computing and serves as the engine of
heterogeneous computing. It was first introduced into the field of AI as an accelerator for deep
learning and has since formed an established ecosystem.
The main challenges of current GPUs are high cost, a poor energy-efficiency ratio, and high input/output
latency.
Google has been committed to applying the concept of the application-specific integrated circuit
(ASIC) to the study of neural networks.
In 2016, it launched the TPU, a custom-developed AI processor that supports the open-source deep learning
framework TensorFlow.
The FPGA is programmed with a hardware description language (HDL) and is flexible, reconfigurable, and
deeply customizable.
By incorporating multiple FPGAs, a DNN model can be loaded onto the chips for low-latency operation,
contributing to computing performance that can exceed that of GPUs.
The GPU is generally designed to tackle large-scale data that is highly uniform in type and mutually
independent, in a pure computing environment without interruptions. The CPU, by contrast, is designed
to be general-purpose: it processes different types of data, performs logical decisions at the same time,
and must handle a large number of branch-jump instructions and interrupts.
Huawei Ascend AI Processor
The NPU is a processor whose design is specially optimized for neural network computing; its performance
on neural network tasks is much higher than that of CPUs and GPUs.
The NPU mimics human neurons and synapses at the circuit level, and directly processes large-scale neurons
and synapses through a deep learning processor instruction set. In NPUs, the processing of a group of
neurons takes only one instruction. Typical examples of NPUs include the Huawei Ascend AI processor, the
Cambricon chip, and IBM's TrueNorth chip.
There are two kinds of Huawei Ascend AI processors: Ascend 310 and Ascend 910.
Ascend 910 is mainly applied to training scenarios and is mostly deployed in data centers,
while Ascend 310 is mainly designed for inference scenarios, with deployment covering device, edge, and
cloud scenarios.
Over the past half a century, the world has witnessed three waves of AI.
The first was in 1962, when the checkers-playing program developed by Arthur Samuel at IBM defeated
the best human checkers players in the United States.
The second time was in 1997, when IBM’s supercomputer Deep Blue defeated the human chess
world champion Garry Kasparov by 3.5:2.5.
The third wave of AI came in 2016, when the Go program AlphaGo, developed by DeepMind, a subsidiary of
Google, defeated Lee Sedol, the South Korean nine-dan Go world champion.
We need to integrate AI with cloud computing, big data, and the Internet of Things (IoT) to make the
application of AI in real life possible; this integration is the foundation of the platform architecture
for AI applications.
Currently, the development and application of AI need to deal with the following four problems.
1. High occupational standards: to get engaged in the AI industry, it is a prerequisite to have
considerable knowledge of machine learning, deep learning, statistics, linear algebra, and calculus.
2. Low efficiency: training a model takes a long working cycle, consisting of data collection, data
cleaning, model training and tuning, and optimization of the visualization experience.
3. Fragmented capabilities and experience: applying the same AI model to another scenario requires
repeating data collection, data cleaning, model training and tuning, and experience optimization; the
capabilities of the AI model cannot be carried over directly to the next scenario.
4. Difficult capability upgrading and improvement: model upgrading and effective data capturing are
difficult tasks.
The Technologies of AI
Computer vision is a science that explores how to make computers “see” things, and it is the most
established of the three major AI application technologies.
Speech processing is the study of the statistical characteristics of speech signals and voice production.
Natural language processing is a technology aiming at interpreting and utilizing natural language through
computer technologies.
The subjects of NLP include machine translation, text mining and sentiment analysis.
The Applications of AI
A smart city uses ICT technologies to sense, analyze, and integrate key information about the city's core
operating systems, so as to respond intelligently to the city's needs in people's livelihood, environmental
protection, public safety, urban services, and industrial and commercial activities.
In smart healthcare, we can enable AI to “learn” professional medical knowledge, “memorize” large numbers
of health records, and analyze medical images with computer vision, so as to provide doctors with reliable
and efficient assistance.
Smart Retail
AI will also revolutionize the retail industry. A typical case is the unmanned supermarket: Amazon's
unmanned supermarket Amazon Go adopts sensors, cameras, computer vision, and deep learning algorithms,
and eliminates the traditional checkout, so that customers can simply walk into the store, grab the
products they need, and go.
Smart Driving
The Society of Automotive Engineers (SAE) defines six levels of autonomous driving, from L0 to L5, based
on the degree to which the vehicle depends on the driving system. L0-level vehicles rely entirely on the
driver's operation; vehicles at level L3 and above allow hands-off driving under certain circumstances;
and L5-level vehicles are completely operated by the driving system in all scenarios, without a driver.
The Current Status of AI
AI development has undergone three stages, and AI is still at the stage of perceptual intelligence.
The three stages of artificial intelligence are computational intelligence, perceptual intelligence, and
cognitive intelligence.
Directions of Huawei Full-Stack AI
Huawei’s one-stop AI development platform—ModelArts, is a one-stop development platform that Huawei
designed for AI developers. It supports large-scale data preprocessing, semi-automatic labeling,
distributed training, automated model building and on-demand model deployment on end, edge and cloud, to
help developers quickly build and deploy models and manage the full AI development lifecycle.
MindSpore is Huawei's all-scenario AI computing framework. Although the application of AI services to
device, edge, and cloud scenarios is thriving in this intelligent age, AI technology still faces huge
challenges, including high technical barriers, soaring development costs, and long deployment cycles.
MindSpore supports architectures of different sizes and types, and is adaptable to all-scenario independent
deployment on the Ascend AI processor as well as other processors such as GPUs and CPUs.
Compute Architecture for Neural Networks (CANN) is a chip enablement layer Huawei built for deep neural
networks and Ascend AI processors. It consists of the following four major function modules.
Ascend AI processor: the unique Da Vinci 3D Cube architecture of the Ascend AI processors makes the series
highly competitive in computing power, energy efficiency, and scalability.
Ascend 310 is a highly efficient AI system-on-chip (SoC) designed for edge-intelligence inference
scenarios. Built on a 12 nm process, it delivers a computing power of up to 16 TOPS (tera operations per
second) while consuming only 8 W, making it highly suitable for edge intelligence scenarios that require
low power consumption.
Ascend 910 is currently the single chip with the greatest computing density, suitable for AI training.
Built on a 7 nm process, it provides a computing power of up to 512 TOPS with a maximum power consumption
of 350 W.
The Huawei Atlas AI computing solution builds on Huawei Ascend AI processors to provide an all-scenario AI
infrastructure for device, edge, and cloud through a wide range of products, including modules, circuit
boards, edge stations, servers, and clusters.
The Controversy of AI
Algorithmic Bias
Privacy Issues
The Contradiction Between Technology and Ethics
Will Everyone Be Unemployed?
The Development Trends for AI
Easier Development Framework
Algorithms and Models with Better Performance
Smaller Deep Models
All-round development of the computing power at device, edge and cloud
More Sophisticated AI Basic Data Services
Safer Data Sharing
Exercises
1. There are different interpretations of artificial intelligence in different contexts.
Please elaborate on artificial intelligence as you see it.
2. Artificial intelligence, machine learning and deep learning are three concepts often mentioned together.
What is the relationship between them? What are the similarities and differences between the three terms?
3. After reading the artificial intelligence application scenarios in this chapter, please describe in detail a field
of AI application and its scenarios in real life based on your own life experience.
4. CANN is a chip enablement layer that Huawei introduced for deep neural networks and Ascend AI
processors. Please briefly describe the four major modules of CANN.
Fusion Engine: the operator-level fusion engine performs operator fusion to reduce memory movement among
operators, improving performance by 50%.
CCE operator library: a deeply optimized common operator library from Huawei that meets most needs of
mainstream computer vision and NLP neural networks.
Tensor Boost Engine (TBE): an efficient, high-performance custom operator development tool that abstracts
hardware resources into application programming interfaces (APIs).
The last module is the compiler at the bottom, which provides deep performance optimization to support the
Ascend AI processors in all scenarios.
5. Based on your current knowledge and understanding, please elaborate on the development trends of
artificial intelligence in the future in your view.
Easier Development Framework
Algorithms and Models with Better Performance
Smaller Deep Models
All-round development of the computing power at device, edge and cloud
More Sophisticated AI Basic Data Services
Safer Data Sharing
Unit 2
Machine learning is currently a mainstream research hotspot in the AI industry, entailing multiple disciplines
such as probability theory, statistics, and convex optimization.
Machine learning (including its branch, deep learning) is the study of “learning algorithms”. “Learning”
here refers to the following: if the performance of a computer program on a certain task T, as measured by
a performance metric P, improves with experience E, then we say the program learns from experience E.
The nature of machine learning algorithms is function fitting. Let f be the objective function; the purpose
of the machine learning algorithm is to produce a hypothesis function g such that g(x) is as close as
possible to f(x) for every input x in the domain.
Therefore, the output of the learning algorithm is never perfect and cannot be completely consistent with
the objective function. However, as the training data expands, the hypothesis function g approximates the
objective function f ever more closely, and finally a satisfactory level of accuracy can be achieved.
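A minimal sketch of this fitting idea (assuming NumPy; the noisy line below stands in for an unknown
objective function f):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 100)
    f = 3.0 * x + 1.0                           # objective function f (unknown in practice)
    y = f + rng.normal(0, 0.1, size=x.size)     # observed, noisy training data

    slope, intercept = np.polyfit(x, y, deg=1)  # hypothesis g(x) = slope*x + intercept
    print(slope, intercept)                     # approaches (3.0, 1.0) as data grows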
Machine learning can deal with various types of problems, including the most typical ones such as
classification, regression, and clustering.
Classification and regression are the two major types of prediction problems, accounting for 80–90% of
all problems.
The main difference is that the output of classification is a discrete class index (generally called a
“label” in machine learning), while the output of regression is a continuous value.
The classification problem requires the program to indicate which of k classes the input belongs to. To
solve this problem, machine learning algorithms usually output a mapping from the domain D to the category
labels {1, 2, ..., k}.
The image classification task is a typical classification problem.
In regression problems, the program needs to predict the output value for a given input. The output of a
machine learning algorithm is usually a mapping from the domain D to the real number domain R.
For instance, predicting the claim amount of an insured person (used to set the insurance premium) or
predicting the future price of securities are relevant cases. In fact, classification problems can also be
reduced to regression problems: by predicting the probability of an image belonging to each class, machine
learning can obtain the classification result.
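A hedged sketch of this reduction (assuming scikit-learn): the model outputs one probability per class,
and the most probable class becomes the label:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    proba = clf.predict_proba(X[:1])   # one probability per class (rows sum to 1)
    label = np.argmax(proba, axis=1)   # pick the most probable class
    print(proba, label)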
The clustering problem needs to divide the data into multiple categories according to the inherent similarity of
the data.
Unlike the classification problem, the dataset of the clustering problem does not contain manually
annotated labels.
The clustering algorithm makes the data within a class as similar to each other as possible, while keeping
the similarity between classes relatively small, so as to implement the grouping. Clustering algorithms
can be used in scenarios such as image retrieval and user profiling.
According to whether the training dataset contains manually tagged labels, machine learning can be generally
divided into two types—supervised learning and unsupervised learning.
If some of the data in the dataset contains labels while the majority does not, then the learning algorithm
is called semi-supervised learning.
Reinforcement learning mainly focuses on multi-step decision-making problems, and automatically collects
data for learning in the interaction with the environment.
Generally speaking, supervised learning is like letting the computer check its work against standard
answers while it is trained on multiple-choice questions. The computer adjusts its model parameters so that
its inferred answers agree with the standard answers as much as possible, and in doing so it finally learns
how to solve the questions.
Compared with supervised learning, unsupervised learning is like letting the computer do multiple-choice
questions without being told the correct answers. An unsupervised learning algorithm does not require
labeled samples, but directly models the input dataset.
The clustering algorithm is a typical unsupervised learning algorithm that can be summarized by the proverb
“birds of a feather flock together”: it simply puts samples of high similarity together. For a newly input
sample, it only needs to compute the sample's similarity to the existing samples and classify it according
to the degree of similarity.
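A minimal clustering sketch (assuming scikit-learn; the toy points are illustrative): k-means groups the
unlabeled samples, and a new sample is assigned to its most similar cluster:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])  # unlabeled samples
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    print(km.labels_)                # cluster assignment of each training sample
    print(km.predict([[0.9, 1.1]]))  # new sample joins the nearest cluster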
The Overall Process of Machine Learning
A complete machine learning project often involves data collection, data cleaning, feature extraction and
selection, model training, model evaluation and testing, and model deployment and integration.
A dataset is the set of data used in a machine learning project, and each record in it is called a sample.
The items or attributes that reflect the performance or nature of the sample in a certain aspect are called
features.
The dataset used in the training process is called the training set, and each sample is called a training sample.
Learning (training) is the process of learning a model from data.
The process of using the model to make predictions is called testing, and the dataset used for testing is known
as the test set.
Each sample in the test set is called a test sample.
Data is vital to the model and determines the limit of the model’s capabilities.
Without good data, there will be no good models.
The following are some typical data-quality problems; such data is called “dirty” data.
1. Incomplete: data lacks attributes or contains missing values.
2. Noisy: data contains erroneous records or outliers.
3. Inconsistent: data contains conflicting records or discrepancies.
What a machine learning model handles are features. A feature is a numerical representation of an input
variable that can be used by the model. In most cases, the collected data can only be used by the algorithm
after preprocessing.
Preprocessing mainly includes the following procedures (a sketch follows the list).
1. Data filtering.
2. Handling missing data.
3. Handling possible errors or outliers.
4. Combining data from multiple sources.
5. Data aggregation.
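A hedged preprocessing sketch covering these procedures (assuming pandas; the file names, column names, and
thresholds are hypothetical):

    import pandas as pd

    df = pd.read_csv("raw_data.csv")                     # hypothetical source file
    df = df[df["age"].between(0, 120)]                   # 1. filter impossible records
    df["income"] = df["income"].fillna(df["income"].median())  # 2. fill missing values
    df = df[(df["income"] - df["income"].mean()).abs()
            <= 3 * df["income"].std()]                   # 3. drop 3-sigma outliers
    other = pd.read_csv("other_source.csv")              # hypothetical second source
    df = df.merge(other, on="user_id", how="left")       # 4. combine multiple sources
    summary = df.groupby("city")["income"].mean()        # 5. aggregate the data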
Feature Selection
Usually there are many different features in a dataset, some of which may be redundant or unrelated to the
target.
Through feature selection, these redundant or irrelevant features can be eliminated, so that the model is
simplified and easier for users to interpret.
At the same time, feature selection can effectively reduce model training time, avoid dimension explosion,
improve the generalization performance of the model, and prevent overfitting.
Common methods for feature selection include filter methods, wrapper methods, and embedded methods
The filter method selects features independently of the model itself. It measures the correlation between
each feature and the target attribute, applying a statistical measure to score each feature.
By sorting the features on the basis of these scores, you can decide which features to keep and which to
eliminate. Statistical measures commonly used in filtering include the Pearson correlation coefficient,
the chi-squared statistic, and mutual information.
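A minimal filter-method sketch (assuming NumPy; the synthetic data is illustrative): each feature is scored
by its absolute Pearson correlation with the target, and the top-scoring features are kept:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                       # 5 candidate features
    y = 2 * X[:, 0] - X[:, 3] + rng.normal(size=200)    # only features 0 and 3 matter

    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    keep = np.argsort(scores)[-2:]                      # keep the 2 best-scoring features
    print(scores, keep)                                 # features 0 and 3 rank highest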
The wrapper method uses a predictive model to score subsets of features, treating feature selection as a
search problem: the wrapper evaluates and compares different feature combinations, with the predictive
model serving as the evaluation tool. The higher the accuracy of the prediction model, the more likely the
feature combination is to be retained.
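A hedged wrapper-method sketch (assuming scikit-learn; recursive feature elimination is one concrete
wrapper, not necessarily the one the source has in mind): a predictive model scores feature subsets and the
best combination is retained:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
    selector.fit(X, y)                  # searches feature subsets with the model
    print(selector.support_)            # boolean mask of the retained features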
Unlike the filter and wrapper methods, a model using the embedded method learns how to perform feature
selection during its own training. The most common embedded feature selection method is regularization.
Regularization, also called the penalty method, introduces additional constraints into the optimization of
the prediction algorithm to reduce the complexity of the model, namely the number of effective features.
Common regularization methods include ridge regression and Lasso regression.
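A minimal regularization sketch (assuming scikit-learn; the alpha values are illustrative): ridge shrinks
all coefficients, while Lasso's L1 penalty can drive irrelevant coefficients exactly to zero, effectively
performing feature selection:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = 3 * X[:, 0] + rng.normal(size=100)   # only feature 0 is relevant

    print(Ridge(alpha=1.0).fit(X, y).coef_)  # all coefficients shrunk, none zero
    print(Lasso(alpha=0.1).fit(X, y).coef_)  # irrelevant coefficients driven to 0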
After finishing data cleaning and feature extraction, it is time to build the model.
The core of model construction is model training, verification and testing.
Model Evaluation
What makes a good model? The most important evaluation indicator is the model's generalization ability,
that is, its prediction accuracy on actual business data.
There are also some engineering indicators for evaluating a model: interpretability, the degree of
straightforwardness of the model's prediction results; prediction rate, the average time the model takes
to predict each sample; and plasticity, the acceptability of the model's prediction rate in the actual
business process as the business volume grows.
The goal of machine learning is to make the learned model perform well on new samples, not just on the
training samples. The ability of the learned model to apply to new samples is called generalization
ability, also referred to as robustness.
The difference between the predicted result of the learned model on the sample and the true result of the
sample is called error. The training error refers to the error of the model on the training set, and the
generalization error refers to the error of the model on the new sample (test set).
Obviously, we want to have a model with smaller generalization error.
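A minimal sketch of measuring both errors (assuming scikit-learn; a large gap between training error and
test error signals overfitting):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = DecisionTreeClassifier().fit(X_tr, y_tr)
    print("training error      :", 1 - model.score(X_tr, y_tr))
    print("generalization error:", 1 - model.score(X_te, y_te))  # on the test set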
The effective capacity of the model is limited by algorithms, parameters, and regularization methods.
Generally speaking, the generalization error can be decomposed as:
Total error = Bias² + Variance + Irreducible error
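A hedged numerical sketch of this decomposition (assuming NumPy; the sine target and linear model are
illustrative): many models are fit on resampled training sets, the average offset of their predictions from
the truth estimates the bias, and their spread estimates the variance:

    import numpy as np

    rng = np.random.default_rng(0)
    x0, f_x0 = 0.5, np.sin(2 * np.pi * 0.5)   # evaluation point and its true value
    preds = []
    for _ in range(500):                       # 500 resampled training sets
        x = rng.uniform(0, 1, 30)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)
        coeffs = np.polyfit(x, y, deg=1)       # deliberately simple (biased) model
        preds.append(np.polyval(coeffs, x0))

    preds = np.array(preds)
    print("bias^2  :", (preds.mean() - f_x0) ** 2)
    print("variance:", preds.var())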