Class 1 Intro AI

Introduction to AI
Teresa Scantamburlo
European Centre for Living Technology (ECLT), Ca’ Foscari University of Venice
“Philosophical reflections on an AI future”
Swiss Summer School, Miglieglia (Switzerland), 7-14 July 2019
The age of AI
The dream of AI

“The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it”

(Dartmouth workshop, 1956)


Take-home message

“Computers that could simulate human intelligence were once a futuristic dream.
Now they are all around us – but not in the way their pioneers expected”

Nello Cristianini, The Road to artificial intelligence: A case of data over theory (2016)
What type of AI is thriving?

What paradigm has become dominant?

(Dartmouth workshop, 1956)


AI and intelligence
• Complicated philosophical questions:
• What does intelligence mean?
• What do we count as an intelligent task?

• There exist different models of intelligence:


• How does AI instantiate intelligence?
• What model has become the most “successful”?
Four approaches in AI
Basic definition: “The field of Artificial Intelligence deals with the study and the design of systems that…
• Thinking humanly: …think like humans” (AI systems as a model of human cognition)
• Thinking rationally: …operate according to the laws of thought” (solving problems by using logic)
• Acting humanly: …act / behave as if they were humans” (Turing test)
• Acting rationally: …act to achieve the best expected outcome” (correct inference / rational agent)

Russell S. and Norvig P., Artificial Intelligence. A Modern Approach, 2010


Four approaches in AI
[Figure: the same 2×2 grid of approaches, with the label “Early AI” placed at the thinking row (Thinking humanly / Thinking rationally)]

Russell S. and Norvig P., Artificial Intelligence. A Modern Approach, 2010


Four approaches in AI
[Figure: the same 2×2 grid of approaches, with the label “Today AI” placed at the acting row (Acting humanly / Acting rationally)]

Russell S. and Norvig P., Artificial Intelligence. A Modern Approach, 2010


AI phases
• Symbolic approach & early successes
• Connectionism – neural networks
• Early failures
• Neural network revival
• Deep learning
Symbolic approach

Allen Newell and Herbert Simon proposed:
• Logic Theorist 1955
• General Problem Solver 1957
• Physical Symbol System and Heuristic Search hypotheses 1976
Physical Symbol System
Physical Symbol System hypothesis: “A physical symbol system has the necessary and
sufficient means for general intelligent action”

(Newell A. and Simon H., Computer Science as empirical inquiry: Symbols and search, 1976)

Essential ideas:
• Physical → obey laws of physics / be engineered
• Symbol system → set of entities that designate objects and carry out processes
Search
Heuristic Search Hypothesis: “The solutions to problems are represented as symbol
structures. A physical symbol system exercises its intelligence in problem solving by
search – that is by generating and progressively modifying symbol structures until it
produces a solution structure”

Example:
AX + B = CX + D

AX + B – CX = CX + D – CX

(A – C)X + B = D

(A – C)X = D – B

X = (D – B)/(A – C)   (assuming A ≠ C)

Newell A. and Simon H., Computer Science as empirical inquiry: Symbols and search (1976)
Graph and tree search
• Reducing the problem to a graph or tree search
• Elements that must be specified:
• States (the space of symbol structures)
• Actions (the processes for modifying a symbol structure into another)
• The initial and the goal state (the test)

• The task is to select a series of actions that take us from the initial state to the goal
state
• Note that the search can be uninformed (blind) or informed (heuristics /
information to assess the progress toward the goal)
Example: states
[Figure: graph of cities drawn in three levels. Venice / Milan, Bolzano, Udine / Lugano, Innsbruck, Davos, Villach]

Example: actions
[Figure: same graph of cities]

Example: initial and goal state
[Figure: same graph of cities]

Example: breadth-first search
[Figure: same graph of cities]

Example: depth-first search
[Figure: same graph of cities]
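The two uninformed strategies named above can be sketched in a few lines of Python. This is only an illustration: the extracted slides list the cities but not which ones are connected, so the edges in `GRAPH` are a hypothetical assumption, as are the function name `search` and the choice of Davos as goal state.

```python
from collections import deque

# Hypothetical edges: the slides name the cities, not the links between them.
GRAPH = {
    "Venice": ["Milan", "Bolzano", "Udine"],
    "Milan": ["Lugano"],
    "Bolzano": ["Innsbruck", "Davos"],
    "Udine": ["Villach"],
    "Lugano": [], "Innsbruck": [], "Davos": [], "Villach": [],
}


def search(graph, start, goal, strategy="bfs"):
    """Uninformed search. Returns (path, visit_order); path is None if no route exists."""
    frontier = deque([[start]])   # each entry is a path from the initial state
    visited = set()
    order = []
    while frontier:
        # BFS takes the oldest path (queue), DFS the newest one (stack).
        path = frontier.popleft() if strategy == "bfs" else frontier.pop()
        state = path[-1]
        if state in visited:
            continue
        visited.add(state)
        order.append(state)
        if state == goal:         # goal test
            return path, order
        successors = graph[state]
        if strategy == "dfs":
            # Push in reverse so the leftmost successor is expanded first.
            successors = reversed(successors)
        for nxt in successors:    # actions: move to a neighbouring city
            if nxt not in visited:
                frontier.append(path + [nxt])
    return None, order
```

With these assumed edges, both strategies find the path Venice, Bolzano, Davos, but breadth-first search visits Venice, Milan, Bolzano, Udine, Lugano, Innsbruck, Davos (level by level), while depth-first search visits Venice, Milan, Lugano, Bolzano, Innsbruck, Davos (one branch at a time).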


General-purpose machines
“In artificial intelligence, an initial burst of activity aimed at building intelligent
programs for a wide variety of almost randomly selected tasks is giving way to
more sharply targeted research aimed at understanding the common mechanisms
of such systems”

Example of common mechanisms: schemes of representation for goals and plans, procedures for search, pattern-matching mechanisms, …

Newell A. and Simon H., Computer Science as empirical inquiry: Symbols and search (1976)
Knowledge and search
• John McCarthy defined the high-level programming language LISP
• In 1958 he proposed the idea of the “Advice Taker”

“a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows”

McCarthy J. Programs with common sense (1958)

Central idea: to embed knowledge into the system and search for solutions
Connectionism
• Warren S. McCulloch and Walter H. Pitts proposed a mathematical model for a neuron (1943)
• Frank Rosenblatt proposed a model for pattern recognition called the Perceptron (1957)
McCulloch and Pitts model
A neuron is modelled as a binary threshold unit:

The unit fires if the weighted sum of the inputs $\sum_i w_{ij} a_i$ reaches or exceeds a threshold value $\mu_j$

Russell S. and Norvig P., Artificial Intelligence. A Modern Approach, 2010
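A minimal Python sketch of such a threshold unit follows. The function name `mp_unit` and the particular weights and thresholds chosen for AND and OR are illustrative assumptions, not values from the slides.

```python
def mp_unit(inputs, weights, threshold):
    """Binary threshold unit: fires (1) if the weighted input sum reaches the threshold."""
    weighted_sum = sum(w * a for w, a in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


def logical_and(x, y):
    # Both inputs must be active to reach a threshold of 2.
    return mp_unit((x, y), weights=(1, 1), threshold=2)


def logical_or(x, y):
    # A single active input is enough to reach a threshold of 1.
    return mp_unit((x, y), weights=(1, 1), threshold=1)
```

The same unit computes different Boolean functions depending only on its weights and threshold, which is what makes it a candidate building block for networks.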


Network topologies
• Feedforward vs. Feedback loop (recurrent)
• Fully connected vs. sparsely connected
• Single layer vs. multilayer

Pelillo M. http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
The problem of classification
Given:
1) Some features (f1, f2, f3,…fn)
2) Some classes (c1, c2, c3,…cm)

Problem: to classify an object according to its features

Pelillo M. http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
The problem of classification
Philosophical principle of essentialism (distinction between essential and accidental properties, Aristotle), see e.g.:
• Pelillo M, Scantamburlo T “How Mature Is the Field of Machine Learning?” 2013.
Simple example
Given certain classes or categories:
c1 = “watermelon”
c2 = “apple”
c3 = “orange”

Classify an object based on the following features:
f1 = “weight”
f2 = “colour”
f3 = “size”

Example: weight = 80 g, colour = green, size = 10 cm³ → “apple”

Pelillo M. http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
Geometrical interpretation
Classes = {1,0}
Features = {x1, x2} ∈ [0,+∞[

Pelillo M. http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
Neural networks for classification
A neural network can be used for classification tasks
• Input = feature values
• Output = class labels
Example: 3 features, 2 labels

Pelillo M. http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
Simple perceptron
A network consisting of one layer of McCulloch and Pitts neurons connected in a
feedforward way (no lateral or feedback connections)

• Perceptrons can represent all of the primitive Boolean functions (e.g. AND, OR)
Mitchell T., Machine Learning, 1997
Limitations of Perceptrons

• In 1969 Marvin Minsky and Seymour Papert showed that perceptrons suffer from serious limitations
• Perceptrons can only solve linearly separable problems
Linear separability
X  Y  X AND Y        X  Y  X XOR Y
0  0     0           0  0     0
0  1     0           0  1     1
1  0     0           1  0     1
1  1     1           1  1     0

Perceptrons can solve this! (AND, on the left, is linearly separable; XOR, on the right, is not)

Pelillo M. http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
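The contrast can be checked with a small experiment, sketched here in Python. The slides do not spell out a learning procedure, so this sketch assumes the standard perceptron training rule (as presented in Mitchell 1997), and reads the two truth tables as AND and XOR. The names `predict` and `train_perceptron`, the learning rate and the epoch limit are illustrative assumptions.

```python
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def predict(weights, bias, x):
    """Single threshold unit: output 1 if w.x + bias >= 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


def train_perceptron(samples, epochs=25, lr=1):
    """Perceptron training rule. Returns (weights, bias, errors in the last epoch)."""
    weights, bias = [0, 0], 0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                # Move the decision line towards the misclassified example.
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:   # every example classified correctly: stop
            break
    return weights, bias, errors
```

On AND the rule stops after a few epochs with zero errors. On XOR it never reaches an error-free epoch, however many epochs are allowed, because no single line separates the two classes.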


Early failures
• Popular story about machine translation:

“The spirit is willing but the flesh is weak” (English → Russian)

“The vodka is good but the meat is rotten” (Russian → English)

• Syntactic manipulation is not sufficient to deal with machine translation


• In 1966 a report found that “there has been no machine translation of general
scientific text, and none is in immediate prospect”

Russell S. and Norvig P., Artificial Intelligence. A Modern Approach, 2010


Application-oriented AI
• Resurgence of neural networks in the 1980s:
• Back-propagation algorithm, a technique that was invented in the late
1960s and then rediscovered by several scholars
• In 1982 John Hopfield popularised a form of recurrent neural network
(Hopfield networks)
• ….

• Successes in various domains:


• Handwritten character recognition
• Speech recognition
• Image recognition
•…
Back-propagation learning algorithm
• It is used to learn the weights in a multilayer network given a set of training examples
{(x1,y1)…(xn,yn)}
• Example of a training set: {([“80 g”, “green”, “10 cm3”], “apple”)…([“500 g”, “green”, “45 cm3”],
“watermelon”)}
• It is based on the gradient descent method to compute the weights (no need for a
threshold)
• A training error function must be specified so that the output of the network
is as close as possible to the desired output. Example (with a single unit):
$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$
where $E$ is the error, $\vec{w}$ the vector of weights, $D$ the set of training examples, $t_d$ the desired/target output and $o_d$ the output of the network (with a single unit) for example $d$

Mitchell T., Machine Learning, 1997


Geometrical interpretation
Given the space of possible weights
for a single unit
Objective = to minimize the error

The direction of steepest descent is found by computing the derivative of E

Gradient of E = vector of the partial derivatives of E with respect to each component of the weight vector $\vec{w}$

Mitchell T., Machine Learning, 1997
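As a rough illustration, gradient descent for a single linear unit can be sketched in Python as below. It assumes the squared-error function from Mitchell (1997), E(w) = ½ Σ (t_d − o_d)², with o_d the weighted sum of the inputs. The toy data set, the learning rate and the function names are hypothetical choices made for this sketch.

```python
# Toy training set, generated from the (assumed) target t = 2*x1 - 1*x2.
DATA = [((1, 0), 2), ((0, 1), -1), ((1, 1), 1), ((2, 1), 3)]


def output(w, x):
    """Unthresholded linear unit: o = w . x"""
    return sum(wi * xi for wi, xi in zip(w, x))


def error(w, data):
    """E(w) = 1/2 * sum over examples of (t - o)^2"""
    return 0.5 * sum((t - output(w, x)) ** 2 for x, t in data)


def gradient(w, data):
    """dE/dw_i = -sum over examples of (t - o) * x_i"""
    return [-sum((t - output(w, x)) * x[i] for x, t in data)
            for i in range(len(w))]


def gradient_descent(data, lr=0.05, steps=500):
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        g = gradient(w, data)
        # Step against the gradient, i.e. in the direction of steepest descent.
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

Starting from zero weights, `gradient_descent(DATA)` moves towards the weights (2, -1) that generated the toy data, and the error shrinks towards zero along the way.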


Back-propagation
• The back-propagation algorithm consists of two steps:
• Forward pass: the input to the network is propagated layer after layer in the
forward direction
• Backward pass: the error made by the network is propagated backward, and the
weights are updated accordingly

• Multilayer networks trained with back-propagation are capable of expressing a rich variety of non-linear decision surfaces

Pelillo M. http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
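The two passes can be sketched in plain Python for a small network with one hidden layer. This is an illustrative sketch rather than the lecture's own implementation: it assumes sigmoid units, the squared error ½ Σ (t − o)², per-example (stochastic) weight updates as in Mitchell (1997), and XOR as training set. The number of hidden units, the learning rate, the seed and all function names are hypothetical.

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_hidden, w_out, x):
    """Forward pass: inputs -> hidden activations -> single output."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    # w_out holds one weight per hidden unit, followed by a bias term.
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, out


def total_error(w_hidden, w_out, data):
    return 0.5 * sum((t - forward(w_hidden, w_out, x)[1]) ** 2 for x, t in data)


def train(data, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in data:
            hidden, out = forward(w_hidden, w_out, x)
            # Backward pass: error term of the output unit...
            delta_out = (t - out) * out * (1 - out)
            # ...propagated back to each hidden unit through its outgoing weight.
            delta_hidden = [h * (1 - h) * w_out[j] * delta_out
                            for j, h in enumerate(hidden)]
            # Weight updates.
            for j, h in enumerate(hidden):
                w_out[j] += lr * delta_out * h
            w_out[-1] += lr * delta_out
            for j, w in enumerate(w_hidden):
                w[0] += lr * delta_hidden[j] * x[0]
                w[1] += lr * delta_hidden[j] * x[1]
                w[2] += lr * delta_hidden[j]
    return w_hidden, w_out
```

Calling `train(XOR)` and then `forward(w_hidden, w_out, x)` on each input shows how far the outputs have moved towards the XOR targets. How close they get depends on the random starting weights, so the result should be inspected rather than assumed.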
Learning machines
• Great achievements and commercialization of AI
• Terry Sejnowski presented NetTalk, an artificial neural network that learns to pronounce English
text the same way as a baby does (1985)
• IBM’s Deep Blue beats the world chess champion, Garry Kasparov (1997)
• Amazon replaces human product recommendation editors with an automated system (2002)
• Google launches Translate (2007)
• IBM’s supercomputer Watson beats two human champions at the quiz game Jeopardy! (2011)

• Creation of data repositories, e.g.
• UCI (1987) https://archive.ics.uci.edu/ml/datasets.php
• MNIST database (1998) http://yann.lecun.com/exdb/mnist/
• ImageNet (2009) http://www.image-net.org/
• …
Deep-Learning
• Representation-learning methods with multiple levels of representation
• They combine simple but non-linear modules
• Each module transforms the representation at one level into a representation at a
higher, more abstract level
• Higher layers of representation amplify features of the input data that are important
for classification
• The key idea is that these layers of features are not designed by human engineers
but are learnt from data

LeCun Y., Bengio Y. and Hinton G., Deep Learning, Nature 2015
Deep-learning

Weights are adjusted as in back-propagation

LeCun Y., Bengio Y. and Hinton G., Deep Learning, Nature 2015
Big data revolution
• In 2009 Google researchers published an influential paper celebrating “the
unreasonable effectiveness of data”:

“We should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data”

Halevy A., Norvig P., Pereira F., The Unreasonable Effectiveness of Data, 2009
Insights

“Contrary to the assumptions of 60 years ago, we don’t need to precisely describe a feature of intelligence for a machine to simulate it”
“While each of these mechanisms is simple enough that we might call it a statistical
hack, when we deploy many of them simultaneously in complex software, and feed
them with millions of examples, the result might look like highly adaptive behaviour
that feels intelligent to us”

Nello Cristianini, The Road to artificial intelligence: A case of data over theory (2016)
Concluding remarks
The dominant AI paradigm places great emphasis on
• Statistical inference (prediction) / machine learning (classification)
• Recommend products or friends
• Answer human queries
• Personalize news
• Predict risk scores (e.g. loans, credit card…)
• Optimal control
• Maximize an objective function over time
• Data collection and curation
• Integration with human life
• Data collection and curation as a by-product of online activities and micro-workers
• AI to support decision-making
References
• Russell S and Norvig P, Artificial Intelligence. A Modern Approach, 3rd Edition, Pearson, 2010
• Cristianini N, “The Road to artificial intelligence: A case of data over theory”, New Scientist, 2016
• Mitchell T, Machine Learning, McGraw Hill, 1997
• Newell A and Simon H, “Computer science as empirical inquiry: symbols and search”, Commun. ACM 19, 3, 113-126, 1976
• LeCun Y, Bengio Y and Hinton G, “Deep Learning”, Nature, 521, 436–444, 2015
• Halevy A, Norvig P and Pereira F, “The Unreasonable Effectiveness of Data”, IEEE Intelligent Systems, 8-12, 2009
• Pelillo M, Scantamburlo T, “How Mature Is the Field of Machine Learning?”, AI*IA 2013, Lecture Notes in Computer Science, vol 8249, pp. 121-132, 2013
• Pelillo M, classes on neural networks, http://www.dsi.unive.it/~pelillo/Didattica/Old%20Stuff/RetiNeurali/
