AI Primer

The document provides a comprehensive overview of data science, big data, and data analytics, explaining their definitions, characteristics, and differences. It also discusses structured and unstructured data, their storage techniques, and the advantages and disadvantages of centralized versus distributed networks. Additionally, it highlights alternative data types and the 8 V's of big data, emphasizing the importance of understanding data in various contexts.


Data Science vs Big Data vs Data Analytics

● Data Science: deals with both unstructured and structured data. Data Science is a field that
comprises everything related to data cleansing, preparation, and analysis. It is the
combination of statistics, mathematics, programming, problem-solving, capturing data in
ingenious ways, the ability to look at things differently, and the activity of cleansing,
preparing, and aligning data. In simple terms, it is the umbrella of techniques used to
extract insights and information from data.
● Big Data: Big Data refers to humongous volumes of data that cannot be processed
effectively with the traditional applications that exist. The processing of Big Data begins
with the raw data that isn’t aggregated and is most often impossible to store in the memory
of a single computer. The definition of Big Data, given by Gartner, is, “Big data is high-
volume, and high-velocity or high-variety information assets that demand cost-effective,
innovative forms of information processing that enable enhanced insight, decision making,
and process automation.”
● Data Analytics: the science of examining raw data to draw conclusions from that information.
Data Analytics involves applying an algorithmic or mechanical process to derive insights,
for example running through several data sets to look for meaningful correlations between
them. It is used in several industries to allow organisations and companies to make
better decisions as well as to verify or disprove existing theories or models. The focus of
Data Analytics lies in inference, the process of deriving conclusions based solely on what
the researcher already knows.

Data Structure
Structured vs Unstructured Data

What is structured data?


● Structured data refers to any data that resides in a fixed field within a record or file. This
includes data contained in relational databases and spreadsheets.
● Structured data is easy to enter, store, query, and analyse, but it must be strictly defined
in terms of field name and type (numeric, currency, alphabetic, name, date, address) and
any restrictions on the data input (number of characters; restriction to certain terms such
as Male or Female).
● Structured data requires you to first create a data model, that is, a model that defines
the types of business data and how they will be stored, processed and accessed.
● Structured data examples
○ Meta-data (Time and date of creation, File size, Author etc.)
○ Library Catalogues (date, author, place, subject, etc.)
○ Census records (birth, income, employment, place etc.)
○ Economic data (GDP, PPI, ASX etc.)
○ FaceBook like button
○ Phone numbers (and the phone book)
○ Databases (structuring fields)
What is unstructured data?
● Unstructured data (or unstructured information) is the kind of information that either does
not have a predefined data model or is not organised in a pre-defined manner
● Unstructured data examples are as follows:
○ Text files (Word processing, spreadsheets, presentations etc.)
○ Email body
○ Social Media (Data from Facebook, Twitter, Linkedin)
○ Website (YouTube, Instagram, photo sharing sites)
○ Mobile data (Text messages)
○ Communications (Chat, IM, phone recordings, collaboration software)
○ Media (MP3, digital photos, audio and video files)

Computer or Machine-Generated Data Sources


Machine-generated data generally refers to the kind of data that is created by a machine without
human intervention.

Machine-Generated Structured Data Sources
● Sensor data: radio frequency ID tags, smart meters, medical devices, and Global
Positioning System data are examples of machine-generated structured data. Supply chain
management and inventory control are what get companies interested in this.
● Web log data: as systems and mechanisms such as servers, applications and networks
operate, they soak in different types of data about their operation, producing enormous
piles of data of diverse kinds. Based on this data, you can manage service-level
agreements or predict security breaches.
● Point-of-sale data: when digital transactions take place over the counter of a shopping
mall, the machine captures a lot of data. This is machine-generated structured data
related to the barcode and other relevant details of the product.
● Financial data: computer programs are used with financial data a lot more now, and
processes are automated with the help of these programs. Take the case of stock trading:
it carries structured data such as the company symbol and dollar value. Part of this data
is machine generated and some of it is human generated.

Machine-Generated Unstructured Data Sources
● Satellite images: weather data, or the satellite surveillance imagery that government
agencies procure, is machine-generated unstructured data. Google Earth and similar
mechanisms aptly illustrate the point.
● Scientific data: all scientific data, including seismic imagery, atmospheric data,
high-energy physics data and so forth, is machine-generated unstructured data.
● Photographs and video: when machines capture images and video for the purposes of
security, surveillance and traffic, the data produced is machine-generated unstructured
data.
● Radar or sonar data: this includes vehicular, meteorological, and oceanographic seismic
profiles.

Human-Generated Data Sources


This is data that humans, in interaction with computers, supply.

Human-Generated Structured Data Sources
● Input data: when a human user enters input such as name, age, income, or non-free-form
survey responses into a computer, it is human-generated structured data. Companies can
find this type of data quite useful in studying consumer behaviour.
● Clickstream data: this is the type of data generated when a user clicks a link on a
website. Businesses like this type of data because it allows them to study customer
behaviour and purchase patterns.
● Gaming-related data: when a human user makes a move in a game on a virtual platform,
it produces a piece of information. How users navigate a gaming portfolio is a source of
a lot of interesting data.

Human-Generated Unstructured Data Sources
● Text internal to your company: this is the type of data that is restricted to a given
company, such as documents, logs, survey results and emails. Such enterprise information
forms a big part of the unstructured text information in the world.
● Social media data: this kind of data is generated when human users interact with social
media platforms such as Facebook, Twitter, Flickr, YouTube and LinkedIn.
● Mobile data: this type of data includes information such as text messages and location
information.
● Website content: this type of data is derived from sites delivering unstructured content,
such as YouTube, Flickr and Instagram.

Characteristics
● Flexibility: structured data is schema dependent (rigorous schema); unstructured data has no schema and is very flexible.
● Scalability: scaling a structured database schema is difficult; unstructured data is highly scalable.
● Robustness: structured data is robust; unstructured data is not.
● Query performance: structured queries allow complex joins on structured data; only textual queries are possible on unstructured data.
● Accessibility: structured data is easy to access; unstructured data is hard to access.
● Availability: structured data makes up a lower percentage of all data; unstructured data a higher percentage.
● Association: structured data is organised; unstructured data is scattered and dispersed.
● Analysis: structured data is efficient to analyse; unstructured data needs additional preprocessing.
● Appearance: structured data is formally defined; unstructured data is free-form.

Storage Techniques
Structured Data Storage Technique
Block storage / block level storage:

● This type of data storage is used in the context of storage-area network (SAN)
environments. In such environments, data is stored in volumes, which are also referred to
as blocks.
● An arbitrary identifier is assigned to every block. It allows the block to be stored and
retrieved but there would be no metadata providing further context.
● Virtual machine file system volumes and structured database storage are the use cases
of block storage.
● When it comes to block storage, raw storage volumes are created on the device. With the
aid of a server-based system, the volumes are connected and each one is treated as an
individual hard drive.

Unstructured Data Storage Technique

● Object storage is basically a way of storing, organising and accessing data on
disk. The difference, however, is that it does so in a more scalable and cost-effective
manner.
● This kind of storage system makes it possible to retain huge volumes of unstructured data.
When it comes to storing photos on Facebook, songs on Spotify, or files in collaboration
services such as Dropbox, object storage comes into play.
● Each object incorporates data, a lot of metadata and a singularly unique identifier. This
kind of storage can be done at different levels such as device level, system level and
interface level.
● Since objects are robust, this kind of storage works well for long-term storage of data
archives, analytics data and service-provider storage with SLAs linked to data delivery;
a short usage sketch is given below.
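
As an illustration of how object storage is typically used, the sketch below stores a file as an object with metadata and a unique key in an S3-compatible store via the boto3 library. The bucket name, object key and metadata values are hypothetical, chosen only for illustration.

```python
# A minimal sketch of object storage with boto3 (AWS S3); the bucket name,
# object key and metadata below are illustrative assumptions, not real resources.
import boto3

s3 = boto3.client("s3")

# Each object bundles the data itself, descriptive metadata and a unique key.
with open("holiday-001.jpg", "rb") as f:
    s3.put_object(
        Bucket="example-media-archive",          # hypothetical bucket
        Key="photos/2024/holiday-001.jpg",       # unique identifier for the object
        Body=f,                                  # the unstructured data (an image)
        Metadata={"author": "wen", "camera": "phone", "location": "sydney"},
    )

# Retrieval uses the same unique key; the metadata comes back with the object.
obj = s3.get_object(Bucket="example-media-archive", Key="photos/2024/holiday-001.jpg")
print(obj["Metadata"])
```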

Structured Data vs Unstructured Data


● Definition: structured data refers to any data that resides in a fixed field within a record or file,
including data contained in relational databases and spreadsheets. Unstructured data (or
unstructured information) is information that either does not have a predefined data model or is
not organised in a pre-defined manner.
● Examples: structured - databases (structured fields), meta-data (time and date of creation, file
size, author etc.), census records (birth, income, employment, place etc.). Unstructured -
website data in the form of HTML pages, media (MP3, digital photos, audio and video files), text
files (word processing, spreadsheets, presentations etc.).
● Growth: structured data accounts for about 20% of the total existing data; experts estimate that
80% of the data in any organisation is unstructured.
● Characteristics: structured - schema dependent, scaling the DB schema is difficult, robust,
structured query allows complex joins, easy to access, organised, efficient to analyse.
Unstructured - absence of schema, very flexible, highly scalable, only textual query possible,
hard to access, scattered and dispersed, additional preprocessing needed.
● Storage technique: structured - block storage; unstructured - object storage.
● Storage and management tools: structured - SQL Server, Oracle Database, MySQL;
unstructured - Hadoop, Amazon Web Services S3, IBM Spectrum Scale.

8 Vital Alternative Data Types


https://prowebscraper.com/blog/why-is-alternative-data-so-important/
● App Usage: behavioural data from purchase, etc
● Credit/Debit Card: buying patterns and choices
● Geo-Location: tracking Wi-Fi or Bluetooth beacons
● Public Data: patents, government contracts, import/export data, etc
● Satellite: satellite feed and low-level drones for supply-chain, tracking agriculture yields
and oil and gas storage, etc
● Social or Sentiment: social media, news, management communications, comments,
shares, likes on social media
● Web Data: data scraped from websites for product descriptions, flight bookings, real
estate listings, etc
● Web Traffic: demographics of visitors visiting a particular website for travel bookings and
e-commerce as examples

The 8 V’s of Big Data


● Volume: can you find the information you are looking for?
● Value: can you find it when you most need it?
● Veracity: are you dealing with information or disinformation?
● Visualisation: can you make sense at a glance? Does it trigger a decision?
● Variety: is a picture worth a thousand words in 70 languages? Is your information
balanced?
● Velocity: information gains momentum, and crises and opportunities evolve in real time.
What is the outlook for today?
● Viscosity: does it stick with you? Does it call for action?
● Virality: is the “aha” ready to go? Does it convey a message that can be pasted into a
presentation or Instagrammed?

Small Data vs Big Data


● Definition: small data is data that is ‘small’ enough for human comprehension, in a volume and
format that makes it accessible, informative and actionable. Big data consists of data sets that
are so large or complex that traditional data processing applications cannot deal with them.
● Data source: small data comes from traditional enterprise systems, such as enterprise resource
planning (ERP) and customer relationship management (CRM), financial data like general ledger
data, and payment transaction data from websites. Big data comes from purchase data at
point-of-sale, clickstream data from websites, GPS stream data (mobility data sent to a server),
and social media such as Facebook and Twitter.
● Volume: small data is in most cases in the range of tens or hundreds of GB, in some cases a
few TB (1 TB = 1000 GB). Big data is more than a few terabytes (TB).
● Velocity (rate at which data appears): small data has a controlled and steady data flow, and
data accumulation is slow. Big data can arrive at very fast speeds, and enormous amounts of
data can accumulate within very short periods of time.
● Variety: small data is structured data in tabular format with a fixed schema, plus semi-structured
data in JSON or XML format. Big data is high-variety data which includes tabular data, text
files, images, video, audio, XML, JSON, logs, sensor data etc.
● Veracity (quality of data): small data contains less noise because it is collected in a controlled
manner. For big data, quality is usually not guaranteed, and rigorous data validation is required
before processing.
● Value: small data supports business intelligence, analysis and reporting. Big data supports
complex data mining for prediction, recommendation, pattern finding etc.
● Time variance: for small data, historical data is equally valid, as the data represents solid
business interactions. For big data, in some cases data gets old soon (e.g. fraud detection).
● Data location: small data sits in databases within the enterprise, local servers etc. Big data sits
mostly in distributed storage on the cloud or in external file systems.
● Infrastructure: small data needs predictable resource allocation, mostly vertically scalable
hardware. Big data needs a more agile infrastructure with a horizontally scalable architecture,
as the load on the system varies a lot.

Distributed vs Centralised Networks for Storage


● Centralised data networks are those that maintain all the data in a single computer or
location; to access the information, you must access the main computer of the system,
known as the “server”.
● On the other hand, a distributed data network works as a single logical data network,
installed across a series of computers (nodes) located in different geographic locations
that are not connected to a single processing unit but are fully connected among
themselves, providing integrity of and accessibility to the information from any point. In
this system all the nodes contain information and all the clients of the system are on an
equal footing, so distributed data networks can perform autonomous processing. The
clearest example is the blockchain, but there are others, such as Spanner, a distributed
database created by Google. → diversification of location and storage

Advantages and disadvantages of centralised, decentralised and distributed data networks


Centralised and distributed networks have different characteristics and, consequently, different
advantages and disadvantages. For example, centralised networks are the easiest to maintain,
since they have only a single point of control, which is also a single point of failure. This is not
the case for distributed networks, which in theory are more difficult to maintain.

But this is in turn the main disadvantage of centralised networks: they are very unstable, since
any problem that affects the central server can generate chaos throughout the system. Distributed
networks are more stable, because they store the totality of the system's information in a large
number of nodes that maintain equal conditions with each other.

This same feature is what gives distributed networks a higher level of security: to carry out a
malicious attack, an attacker would have to compromise a large number of nodes at the same
time, because the information is distributed among the nodes of the network. If a legitimate
change is made, it will be reflected in the rest of the nodes of the system, which will accept and
verify the new information; but if an illegitimate change is made, the rest of the nodes will detect
it and will not validate it. This consensus between nodes protects the network from deliberate
attacks or accidental changes of information.
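
A toy illustration of this consensus idea is sketched below. The three-node network, the balance-check validation rule and the majority threshold are all simplifying assumptions invented for the example, not how any particular blockchain actually works.

```python
# A toy sketch of node consensus: a change is accepted only if a majority of
# nodes independently validate it. The validation rule here (a simple balance
# check) is an illustrative assumption, not a real blockchain protocol.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    ledger: dict  # account -> balance, replicated on every node

    def validate(self, change):
        # Reject changes that would make an account balance negative.
        account, delta = change
        return self.ledger.get(account, 0) + delta >= 0

def propose_change(nodes, change):
    votes = sum(node.validate(change) for node in nodes)
    if votes > len(nodes) // 2:            # majority consensus reached
        for node in nodes:                 # replicate the change to every node
            account, delta = change
            node.ledger[account] = node.ledger.get(account, 0) + delta
        return True
    return False                           # illegitimate change is not validated

nodes = [Node(f"node{i}", {"alice": 10}) for i in range(3)]
print(propose_change(nodes, ("alice", -5)))    # legitimate: accepted and replicated
print(propose_change(nodes, ("alice", -100)))  # illegitimate: rejected by the nodes
```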

In addition, distributed systems have an advantage over centralised systems in terms of network
speed: since the information is not stored in a central location, a bottleneck is less likely, that is,
a situation in which the number of people attempting to access a server exceeds what it can
support, causing waiting times and slowing down the system.

Also, centralised systems tend to present scalability problems, since the capacity of the server is
limited and cannot support unlimited traffic. Distributed systems have greater scalability, thanks
to the large number of nodes that support the network.

Finally, in a distributed network, removing any one of the nodes does not disconnect any other
node from the network. All the nodes are connected to each other without necessarily having to
pass through one or several local centres. In this type of network the centre/periphery division
disappears, and with it the power to filter the information that flows through it, which makes it a
practical and efficient system.

Specific advantages of blockchain


● There are other types of distributed data networks besides the blockchain. In fact,
consensus and immutability of the data are not unique characteristics of the blockchain;
other distributed data networks also have these characteristics, such as Paxos, Raft,
Google HDFS, Zebra, CouchDB and Datomic, among others.
● But there are two characteristics that really differentiate the blockchain from the rest of
these data networks: access control for writing and reading data is truly decentralised,
unlike other distributed data networks where it is logically centralised, and it can secure
transactions without the need for trusted third parties in a competitive environment.
● The blockchain has unique characteristics compared with the rest of the available data
networks. However, this does not mean that the blockchain is always the best option for
every data storage case, since that really depends on the needs and requirements of a
company or organisation when using a database. → What problem are you facing? Can
blockchain help you solve it? It is not a cure-all; design thinking is needed to solve your
problem.

Comparative Summary
● Security: in a centralised network, anyone who gains access to the server holding the
information can add, modify or delete any data. In a distributed network, all data is distributed
between the nodes; if something is added, edited or deleted on any computer, it will be
reflected on all computers in the network. If a legitimate amendment is accepted, the new
information is disseminated among the other users throughout the network; otherwise, the data
is overwritten to match the other nodes. The system is therefore self-sufficient and
self-regulating, and the databases are protected against deliberate attacks or accidental
changes of information.
● Availability: a centralised server can break down and stop responding if it receives too many
requests. A distributed network can withstand significant pressure, because all the nodes hold
the data and requests are distributed among them; the load does not fall on one computer but
on the entire network, so total availability is much greater than in the centralised case.
● Accessibility: if centralised storage has problems, you cannot obtain your information until they
are solved; in addition, different users have different needs, but the processes are
standardised, which can be inconvenient for customers. In a distributed network, because the
number of computers is large, DDoS attacks are possible only if their capacity is much greater
than that of the network, which would be a very expensive attack; the response time is similar
to the centralised model in this case, so distributed networks can be considered secure in this
respect.
● Data transfer rates: in a centralised network, if the nodes are located in different countries or
continents, the connection with the server can become a problem. In a distributed network, the
client can choose the node and work with all the required information.
● Scalability: centralised networks are difficult to scale, because the capacity of the server is
limited and traffic cannot be infinite. All clients are connected to the server, which alone stores
all the data, so every request to receive, change, add or delete data goes through the main
computer; since server resources are finite, it can work effectively only for a specific number of
participants, and if the number of clients grows, the server load may exceed its limit at peak
times. Distributed models do not have this problem, since the load is shared among several
computers.

Technology Study - AI Primer


Definition of AI
● AI originated more than 50 years ago, and it is generally agreed that John McCarthy
coined the phrase “artificial intelligence” in a written proposal for a workshop in Dartmouth
in 1956. AI is now commonly understood as the study and engineering of computations
that make it possible to perceive, reason, act, learn and adapt.
● In the widely referenced book, “Artificial Intelligence: A Modern Approach”, Dr Stuart
Russell and Dr Peter Norvig define AI as: “The study of agents that receive percepts from
the environment and perform actions.”

Dimensions of AI definitions

● The various definitions of AI can be laid out along two dimensions.
● The definitions on top are concerned with thought processes and reasoning, whereas the
ones on the bottom address behaviour.
● The definitions on the left measure success in terms of fidelity to human performance,
whereas the ones on the right measure against an ideal performance measure, rationality.
History of AI
Symbolic AI
● Early in the 1940s and 1950s, a handful of scientists from a variety of fields, including
mathematics, psychology, engineering, economics, and political science, began to
discuss the possibility of creating an artificial brain. The term “Artificial Intelligence” was
coined at a Dartmouth conference and AI research was founded as an academic discipline
in 1956.
● At the early stage, teaching machines how to play chess was one of the main research
focuses of AI. Chess has well-defined playing rules, and many experts in AI believed that AI
could be achieved by having programmers handcraft a sufficiently large set of explicit rules
for manipulating knowledge; these rules are human-readable representations of problems
and logic. This is known as “Symbolic AI”, and it was the dominant paradigm in AI from
the 1950s to the late 1980s. Figure 2 illustrates how Symbolic AI works, and a minimal
rule-based sketch is given below.
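
To make the idea of handcrafted, human-readable rules concrete, here is a minimal sketch of a symbolic, rule-based classifier; the loan-approval rules and thresholds are invented purely for illustration.

```python
# A minimal sketch of Symbolic AI: the programmer handcrafts explicit,
# human-readable rules. The loan-approval rules below are purely illustrative.
def approve_loan(applicant: dict) -> bool:
    # Rule 1: income must exceed a fixed threshold.
    if applicant["income"] < 30000:
        return False
    # Rule 2: existing debt must stay below half of income.
    if applicant["debt"] > 0.5 * applicant["income"]:
        return False
    # Rule 3: applicant must be of legal age.
    if applicant["age"] < 18:
        return False
    return True

print(approve_loan({"income": 45000, "debt": 10000, "age": 30}))  # True
print(approve_loan({"income": 20000, "debt": 0, "age": 40}))      # False
```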

● Symbolic AI reached its peak popularity during the “Expert Systems” booms of the
1980s.
● Expert systems are a logical and knowledge-based approach.
● Their power came from the expert knowledge they contained, but it also limited the
further development of expert systems.
● The knowledge acquisition problem and the issues of growing and updating the knowledge
base were the major challenges for expert systems.
● A new type of AI approach, beyond rule-based technologies, became required at that time.
Machine Learning
● Machine learning, recognised as a subfield of AI, started to flourish in the 1990s.
● Unlike Symbolic AI, machine learning does not require humans to know the existing rules.
● It arises from the question: could a computer go beyond “what we know how to order it
to perform” (Symbolic AI) and learn on its own how to perform a specified task?
● With machine learning, humans input data as well as the expected answers for that data,
and the machine “learns” by itself and outputs the rules.
● These learned rules can then be applied to new data to produce new answers (a minimal
sketch is given at the end of this subsection).
● Figure 3 illustrates the simple structure of machine learning.

● Starting from the 1990s, the field shifted its goal from achieving general AI to tackling
solvable problems of a practical nature.
● It shifted focus away from the symbolic approaches it had inherited and towards
methods and models borrowed from statistics and probability theory (Langley 2011).
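
As a contrast with the handcrafted rules above, the sketch below shows the arrangement described in this subsection: we supply example data together with the expected answers, and the algorithm induces the rules itself. The tiny loan data set and the choice of a decision tree are assumptions made only for illustration.

```python
# A minimal sketch of machine learning: instead of handcrafting rules, we give
# the algorithm data plus the expected answers and let it learn the rules.
# The tiny data set and the decision-tree model are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

# Input data: [income, debt, age]; expected answers: 1 = approve, 0 = reject.
X = [
    [45000, 10000, 30],
    [20000, 0, 40],
    [60000, 40000, 25],
    [35000, 5000, 22],
]
y = [1, 0, 0, 1]

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)                      # the machine "learns" the rules from examples

# The learned rules can now be applied to new, unseen data.
print(model.predict([[50000, 8000, 28]]))
```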

Deep Learning
● AI has gone through a series of ups and downs, often referred to as “AI summers and
winters”, as interest in AI has alternately grown and diminished.
● This is illustrated in Figure 4. In this evolution roadmap, we can see that AI is a general
field which covers machine learning.
● Deep learning is a hot branch of machine learning and the symbol of the current AI boom,
which started about eight years ago.

Learning - Machine Versus Deep


● Compared to machine learning, deep learning automates the feature engineering of the
input data (the process of learning the optimal features of the data to create the best
outcome), and allows algorithms to automatically discover complex patterns and
relationships in the input data.
● Deep learning is based on Artificial Neural Networks (ANNs), which were inspired by
information processing and distributed communication nodes in biological systems, like
the human brain. Figure 5 shows, in simplified form, the information-processing
frameworks of the human brain and of ANNs. An ANN imitates the human brain's process
by using multiple layers to progressively extract different levels of features/interpretations
from the raw input data (each hidden layer represents one feature/interpretation of the
data). In essence, deep learning algorithms “learn how to learn”.

● Although AI research started in the 1950s, its effectiveness and progress have been most
significant over the last decade, driven by three mutually reinforcing factors:
○ The availability of big data: various data sources, including businesses, e-
commerce, social media, science, wearable devices, government etc.
○ Dramatic improvement of machine learning algorithms: the sheer amount of
available data accelerates algorithm innovation
○ More powerful computing ability and cloud-based services: make it possible to
realise and implement the advanced AI algorithms, like deep neural networks
● Significant progress in algorithms, hardware, and big data technology, combined with the
financial incentives to find new products, have also contributed to the AI technology
renaissance.
● Today, AI has transformed from “let the machine know what we know” to “let the machine
learn what we may not know” to “let the machine automatically learn how to learn”.
● Researchers are working on much wider applications of AI that will revolutionise the ways
in which people work, communicate, study and enjoy themselves.
● Products and services incorporating such innovation will become part of people’s day-to-
day lives in the near future.

Human Brain and Artificial Neural Networks

● Activation
○ Activation functions are mathematical equations that determine the output of a
neural network.
○ The function is attached to each neuron in the network, and determines whether it
should be activated (“fired”) or not, based on whether each neuron’s input is
relevant for the model’s prediction.
● Feed-forward and backpropagation learning

● Error, backpropagation, gradient descent


○ A cost function is then adopted to measure the “error”, that is, the difference
between the true output value and the predicted output value.
■ It basically judges how wrong or bad the learned model is in its current
form. The ideal goal is to have zero cost. Usually, a minimum cost value is
set as a stopping criterion.
○ After getting the “error”, the backpropagation process follows to reduce the
current error cost.
■ Backpropagation tweaks the weights of the previous layer, aiming to get
the value we want in the current layer.
■ We do this recursively through however many layers are in the network.
○ Gradient descent is usually used to tweak the weights. It is a first-order iterative
optimisation algorithm for finding a minimum of the cost function.
■ In general, when we adjust the current weight, we move to the left or right
of the current value, figure out which direction produces a slope with a
lower value than the current one, take a small step in that direction and
then try again (Figure 9).

● Forward and backward


○ Feed-forward and backpropagation form a cyclic learning process. We may need to
repeat it thousands or even millions of times before we find the minimum value of
the cost function.
○ Once a neural network is trained, it may be used to analyse new data. That is, the
practitioner stops the training and allows the network to run in forward-propagation
mode only.
○ The forward-propagation output is the predicted model used to interpret and make
sense of previously unseen input data (a small numerical sketch of this cycle is
given below).
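
The sketch below puts the pieces from this section together for a single neuron with a sigmoid activation: a forward pass, a squared-error cost, and gradient-descent weight updates via backpropagation. The toy data, learning rate and network size are assumptions chosen only to keep the example tiny.

```python
# A tiny sketch of the feed-forward / backpropagation cycle for a single
# sigmoid neuron, trained by gradient descent on a toy target. The data,
# learning rate and number of iterations are illustrative assumptions.
import numpy as np

def sigmoid(z):
    # Activation function: squashes any input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])  # inputs
y = np.array([1.0, 1.0, 1.0, 0.0])                              # desired outputs

w = rng.normal(size=2)   # weights start as random guesses
b = 0.0                  # bias
lr = 0.5                 # learning rate: the size of each gradient-descent step

for step in range(2000):
    # Feed-forward: combine inputs with weights, then apply the activation.
    z = X @ w + b
    pred = sigmoid(z)

    # Cost: mean squared error between predicted and true outputs.
    error = pred - y
    cost = np.mean(error ** 2)

    # Backpropagation: gradient of the cost w.r.t. weights and bias,
    # using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).
    grad_z = 2 * error * pred * (1 - pred) / len(y)
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum()

    # Gradient descent: take a small step downhill on the cost surface.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"final cost: {cost:.4f}, predictions: {np.round(sigmoid(X @ w + b), 2)}")
```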

Primer Knowledge of AI
● To further understand how current AI works, this section introduces the basics of deep
learning.
● As machine learning is the foundation of deep learning, a general introduction to some
basic machine learning knowledge is given first.

Machine Learning
● Machine learning involves the creation of algorithms which can modify/adjust themselves,
without human intervention, to produce the desired output by feeding on input data.
● Through this learning process, the machine can categorise similar people or things,
discover or identify hidden or unknown patterns and relationships, and detect anomalous
behaviours in the given data, which allows it to make predictions/estimations of possible
outcomes or actions for future data.
● Therefore, to do machine learning, we usually follow five steps, from data collection and
data preparation to modelling, understanding and delivering the results (as shown in Figure 6)

● Machine learning workflow


○ Steps 1 and 2 are data preparation work: they transform the raw data into structured
data that the machine can read.
○ For example, to do image classification (“dog” or “cat”), we need to know what kind
of image features to extract and how to extract them, such as texture, edges, and
shape
○ We call these features the input data, usually represented as a vector or matrix,
e.g. x = (x1, x2, ..., xn), where each element xi is one structured feature.
○ The output data is the corresponding label (“dog” or “cat”). Steps 4 and 5 are
straightforward and easily understandable.
○ Step 3, model building, is the key process of machine learning.
○ The processes machines use to learn are known as algorithms. Based on the
algorithm used at this step, machine learning can be further categorised into four
broad types: supervised learning, unsupervised learning, semi-supervised learning
and reinforcement learning.

Supervised Learning
● The supervised learning algorithm is, as its name suggests, trained/taught using given
examples.
● The examples are labelled, meaning the desired output for each input is known.
● For example, a credit card application can be labelled either as approved or rejected.
● The algorithm receives a set of inputs (the applicants’ information) along with the
corresponding outputs (whether the application was approved or not) to foster learning.
● The model building or the algorithm learning is a process to minimise the error between
the estimated output and the correct output.
● Learning stops when the algorithm achieves an acceptable level of performance, such as
the error is smaller than the pre-defined minimum error.
● The trained algorithm is then applied to unlabeled data to predict the possible output value,
such as whether a new credit card application should be approved or not.
● This is helpful for something we may be familiar with, called Know Your Customer (KYC),
in the banking business.
● There are multiple supervised learning algorithms: Bayesian statistics, regression
analysis, decision trees, random forests, support vector machines (SVM), ensemble
models and so on.
● Practical applications include risk assessment, fraud detection, image, speech and text
recognition etc.; a minimal sketch of the credit-card example is given below.
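
The following sketch mirrors the credit-card example in this section: a classifier is trained on labelled applications (approved or rejected) and then applied to a new, unlabelled application. The feature encoding, the tiny data set and the choice of logistic regression are assumptions made only for illustration.

```python
# A minimal supervised-learning sketch: learn from labelled credit-card
# applications, then predict the label of a new one. The features
# [income (k), debt (k), years employed] and the data are illustrative.
from sklearn.linear_model import LogisticRegression

X_train = [
    [45, 10, 5],
    [20, 15, 1],
    [60, 5, 8],
    [25, 20, 2],
]
y_train = [1, 0, 1, 0]   # 1 = approved, 0 = rejected (the known "right answers")

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learning minimises the training error

# Apply the trained model to a new, unlabelled application.
print(model.predict([[40, 8, 4]]))
```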

Unsupervised Learning
● Unlike supervised learning, in unsupervised learning the algorithm is not
trained/taught on the “right answer”. The algorithm tries to explore the given data and
detect or mine the hidden patterns and relationships within it. In this case, there is
no answer key. Learning is based on the similarity/distance among the given data points.
● Take bank customer understanding as an example: unsupervised learning can be used to
identify several groups of bank customers. The customers in a specific group have
similar demographic information or the same bank product selections. The learned
homogeneous groups can help the bank figure out the hidden relationship between
customers’ demographics and their bank product selections.
● This would provide useful insights on customer targeting when the bank would like to
promote a product to new customers. Also, unsupervised learning works well with
transactional data in that it can be used to identify a group of individuals with similar
purchase behaviour who can then be treated as a single homogeneous unit during
marketing promotions.
● Association rule mining, clustering (such as K-means), nearest-neighbour mapping, self-
organising maps, and dimensionality reduction (such as principal component analysis) are
all common and popular unsupervised learning algorithms.
● Practical applications cover market basket analysis, customer segmentation, anomaly
detection and so on; a small clustering sketch is given below.
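
To illustrate the customer-grouping example above, here is a small clustering sketch using k-means; the two features (age, account balance) and the choice of three clusters are assumptions for illustration only.

```python
# A minimal unsupervised-learning sketch: k-means groups bank customers by
# similarity, with no "right answer" provided. The features and the number
# of clusters are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

# Each row is a customer: [age, account balance in thousands].
customers = np.array([
    [22, 3], [25, 5], [24, 4],      # younger, low balance
    [45, 60], [50, 75], [48, 70],   # middle-aged, high balance
    [70, 20], [68, 25], [72, 22],   # older, medium balance
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)   # group assignment learned from similarity

print(labels)                    # homogeneous customer segments
print(kmeans.cluster_centers_)   # an average profile of each segment
```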

Semi-Supervised Learning
● Semi-supervised learning is used to address similar problems as supervised learning.
● However, in semi-supervised learning, the machine is provided both labelled and
unlabelled data.
● A small amount of labelled data is combined with a large amount of unlabelled data.
● When the cost associated with labelling is too high to allow for a fully labelled training
process, semi-supervised learning is normally utilised.
● Using the labelled data, semi-supervised learning algorithms first assign labels to the
large amount of unlabelled data.
● A new model is then further trained using the newly labelled data set.
● For example, an online news portal wants to do web pages classification or labelling.
● Let’s say the requirement is to classify web pages into different categories (i.e. Sports,
Politics, Business, Entertainment, etc.).
● In this case, it is prohibitively expensive to go through hundreds of millions of web pages
and manually label them.
● Therefore the intent of semi-supervised learning is to take as much advantage of the
unlabelled data as possible to improve the trained model.
● Image classification and text classification are good practical applications of semi-
supervised learning; a brief sketch is given below.
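
The sketch below shows the arrangement described here using scikit-learn's self-training wrapper: a few labelled examples are combined with many unlabelled ones (marked with -1), and the model pseudo-labels the rest. The toy one-dimensional data set is an assumption for illustration.

```python
# A minimal semi-supervised sketch: a small amount of labelled data plus a
# larger amount of unlabelled data (label -1). The toy 1-D data is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.arange(20, dtype=float).reshape(-1, 1)   # 20 one-feature examples
y = np.full(20, -1)                             # -1 marks "unlabelled"
y[0], y[1] = 0, 0                               # a few labelled examples, class 0
y[18], y[19] = 1, 1                             # a few labelled examples, class 1

# The base classifier is iteratively retrained on its own confident pseudo-labels.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)

print(model.predict([[3.0], [16.0]]))   # labels inferred for unlabelled regions
```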

Reinforcement Learning
● The intent of reinforcement learning is to find the best actions that lead to maximum reward
or drive the most optimal outcome.
● The machine is provided with a set of allowed actions, rules, and potential end states. In
other words, the rules of the game are defined. By applying the rules, exploring different
actions and observing resulting reactions the machine learns to exploit the rules to create
the desired outcome.
● Thus determining what series of actions, in what circumstances, will lead to an optimal or
optimised result.
● Reinforcement learning is the equivalent of teaching someone to play a game. The rules
and objectives are clearly defined.
● However, the outcome of any single game depends on the judgement of the player, who
must adjust their approach in response to the environment and the skill and actions of
a given opponent. Reinforcement learning is often utilised in gaming and robotics; a tiny
Q-learning sketch is given below.
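
To make the reward-driven idea concrete, here is a tiny tabular Q-learning sketch on an invented five-cell corridor where the agent earns a reward only at the right-hand end; the environment, reward scheme and hyperparameters are all assumptions for illustration.

```python
# A tiny reinforcement-learning sketch: tabular Q-learning on a 5-cell corridor.
# The agent may move left or right; only reaching the rightmost cell pays a
# reward. The environment and hyperparameters are illustrative assumptions.
import random

n_states, actions = 5, [-1, +1]          # positions 0..4; actions: left, right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Explore occasionally (and break ties randomly), otherwise exploit.
        if random.random() < epsilon or Q[state][0] == Q[state][1]:
            a = random.randrange(2)
        else:
            a = Q[state].index(max(Q[state]))
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: nudge the estimate towards reward + discounted future value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

# After training, the learned policy is simply "always move right".
print([["left", "right"][q.index(max(q))] for q in Q[:-1]])
```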

A Beginner’s Guide to Deep Reinforcement Learning

● https://wiki.pathmind.com/deep-reinforcement-learning
● Deep reinforcement learning combines artificial neural networks with a reinforcement
learning architecture that enables software-defined agents to learn the best actions
possible in virtual environments in order to attain their goals.
● While neural networks are responsible for recent AI breakthroughs in problems like
computer vision, machine translation and time series prediction, they can also be combined
with reinforcement learning algorithms to create something astounding like DeepMind’s
AlphaGo, an algorithm that beat the world champions of the board game Go.
● Google DeepMind’s Deep Q-learning playing Atari Breakout
○ https://www.youtube.com/watch?v=V1eYniJ0Rnk
● MarI/O - Machine Learning for Video Games
○ https://www.youtube.com/watch?v=qv6UVOQ0F44
● Reinforcement Learning - Ep 30 (Deep Learning SIMPLIFIED)
○ https://www.youtube.com/watch?v=e3Jy2vShroE
● Simulation and Automated Deep Learning
○ https://www.youtube.com/watch?v=EHP47tM6ctc

Deep Learning
● Data is to machine learning what life is to human learning. The output of a machine
learning algorithm is entirely dependent on the input data it is exposed to.
● Therefore, to train a good machine learning model, experts need to do good data
preparation beforehand. To some extent, machine learning performance depends on the
quality of the input data.
● Deep learning follows a similar workflow to machine learning, while its main advantage is
that deep learning does not necessarily need structured data as input. Imitating the way
the human brain works to solve problems, by passing queries through various hierarchies
of concepts and related questions to find an answer, deep learning uses artificial neural
networks to hierarchically define specific features via multiple layers (as shown in Figure 5).
● Deep learning weakens the dependence of machine learning on feature engineering,
which makes it more general and easier to apply to more fields. The following section
illustrates the basics of how deep learning works.
● We know deep learning does the mapping of input to output via a sequence of simple data
transformations (layers) in an Artificial Neural Network.
● Take face recognition as an example, as shown in Figure 7, data (face image) is presented
to the network via the input layer, which connects to one or more hidden layers. The hidden
layers further connect to an output layer.
● Each hidden layer represents one level of face-image features (greyscale, eye shape,
facial contours, etc.). Every node in each layer is connected to the nodes in the
neighbouring layers with a weight value.
● The actual processing of deep learning is done by adjusting the weights of each
connection to realise the input-output mapping; a small illustrative model definition
follows the figure caption below.

Example of an ANN used for face recognition
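
The following sketch defines a small layered network of the kind Figure 7 describes, with an input layer, two hidden layers and an output layer, using Keras. The layer sizes and the flattened 64x64 greyscale input are assumptions chosen only to illustrate the structure, not a real face-recognition model.

```python
# A minimal sketch of a layered ANN in Keras, mirroring the input -> hidden
# layers -> output structure described above. Layer sizes and the 64x64
# greyscale input are illustrative assumptions, not a real face recogniser.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(64 * 64,)),          # flattened greyscale image
    keras.layers.Dense(128, activation="relu"),    # hidden layer: low-level features
    keras.layers.Dense(64, activation="relu"),     # hidden layer: higher-level features
    keras.layers.Dense(10, activation="softmax"),  # output layer: one node per person
])

# Training adjusts the weights on every connection to realise the
# input-to-output mapping (here: image -> identity).
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```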


● Google
○ Google launched the deep-learning-focused Google Brain project in 2011,
introduced neural nets into its speech-recognition products in mid-2012, and
retained neural nets pioneer Geoffrey Hinton in March 2013. It now has more than
1,000 deep-learning projects underway, it says, extending across search, Android,
Gmail, photo, maps, translate, YouTube, and self-driving cars. In 2014 it bought
DeepMind, whose deep reinforcement learning project, AlphaGo, defeated the
world Go champion, Lee Sedol, in March 2016, achieving an artificial intelligence
landmark.
● Microsoft
○ Microsoft introduced deep learning into its commercial speech-recognition
products, including Bing voice search and X-Box voice commands, during the first
half of 2011. The company now uses neural nets for its search rankings, photo
search, translation systems, and more. “It’s hard to convey the pervasive impact
this has had,” says Lee. Last year it won the key image-recognition contest, and in
September it scored a record low error rate on a speech-recognition benchmark:
6.3%
● Facebook
○ In December 2013, Facebook hired French neural nets innovator Yann LeCun to
direct its new AI research lab. Facebook uses neural nets to translate about 2
billion user posts per day in more than 40 languages, and says its translations are
seen by 80 million users a day. (About half its community does not speak English.)
Facebook also uses neural nets for photo search and photo organisation, and it’s
working on a feature that would generate spoken captions for untagged photos
that could be used by the visually impaired.
● Baidu
○ In May 2014, Baidu hired Andrew Ng, who had earlier helped launch and lead the
Google Brain project, to lead its research lab. China’s leading search and web
services site, Baidu, uses neural nets for speech recognition, translation, photo
search, and a self-driving car project, among others. Speech recognition is key in
China, a mobile-first society whose main language, Mandarin, is difficult to type
into a device. The number of customers interfacing by speech has tripled in the
past 18 months, Baidu says.

Artificial Neural Networks (ANN)


● Most ANNs contain a learning scheme that modifies the connection weights based on
the input patterns and connection types presented to them. This gives rise to different
deep neural networks, such as convolutional neural networks (CNNs) and recurrent
neural networks (RNNs).
● Here we take a simple ANN to illustrate the learning process of feed-forward and
backpropagation, which is loosely analogous to how biological neural networks learn.
Human brains learn to do complex things, such as recognising objects, not by processing
exhaustive rules but through experience, feedback, adjustment and learning. Figure 8
gives an illustration of this process.
● In the beginning, all the connections are randomly assigned weight values. In the
feed-forward step, all the input nodes receive their respective values from the given input
and pass a combination of them, such as a linear transformation, to the nodes in the
hidden layers.
● Upon receiving this initial input, the hidden layers effectively make a random guess as to
what the pattern might be, using the randomly assigned weights. There are various
activation functions for the calculation at the hidden and output layers; the sigmoid or
logistic function remains the most popular.
