001 ML Introduction W1L2
Machines that can learn from data and their own experience?
AI Around Us
Boston Dynamics
Evolution
Sophia
Ameca
AI Around Us: No need to look so far away!
Machines as mechanical helpers
Machines as Intellectual helpers
But how is AI different from traditional Computer Science?
Traditional Problems (diagram labels: Data, Output, Program)
New Frontiers (diagram labels: Data, Output, Program)
Machine Learning
Example (figure): prices ➢ $150,000, ➢ $190,000, ➢ $350,000, ➢ $550,000, ➢ $90,000, shown with yes/no feature rows (Y Y N Y; N N Y Y; Y N N Y)
Traditional CS
Data + Program → Output
Machine Learning
Data + Output → Program
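A minimal sketch of the contrast, assuming the usual reading of the two diagrams: in traditional CS a person writes the program, while in machine learning the program is produced from data and known outputs. The temperature example, the function names and the least-squares fit are illustrative assumptions, not taken from the slides.

```python
# Traditional CS: a person writes the program; data goes in, output comes out.
def fahrenheit_program(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: data and known outputs go in; the "program"
# (here just two numbers, a slope and an intercept) comes out.
def learn_line(xs, ys):
    """Least-squares fit of y = a*x + b (an illustrative choice of learner)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

data = [0, 10, 20, 30, 40]                        # inputs
outputs = [fahrenheit_program(c) for c in data]   # known correct answers
a, b = learn_line(data, outputs)

def learned_program(celsius):
    # The learned rule, recovered from examples rather than written by hand.
    return a * celsius + b
```

With these five examples the fit recovers a slope close to 1.8 and an intercept close to 32, so the learned program behaves like the hand-written one.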
Machine Learning Pipeline
Diagram labels: Data, Output (Correct Answer); Data, Program, Output; 𝑊, 𝑃(𝑊)
○ When the classes are not predefined, we call it “Unsupervised Machine Learning”.
■ In this case the classifier simply clusters the inputs into a number of classes without naming them.
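A minimal sketch of clustering without predefined classes, assuming a two-cluster k-means on one-dimensional data. The function name, the starting centers (minimum and maximum) and the sample heights are illustrative assumptions. Note that the clusters come out numbered 0 and 1, with no names attached.

```python
def two_means(points, iterations=10):
    """Group numbers into two unnamed clusters (k-means with k = 2)."""
    # Assumption: start the two centers at the smallest and largest value.
    centers = [min(points), max(points)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers

heights = [150, 152, 155, 180, 183, 185]
labels, centers = two_means(heights)
```

Here the three shorter values end up in cluster 0 and the three taller ones in cluster 1. Deciding what each cluster means is left to a person.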
Classifiers
CAT!
NO!
cat: chosen from {cat, dog}
happy: chosen from {happy, sad, angry, surprised, neutral, other}
empty: chosen from {empty, full}
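A minimal sketch of a classifier that returns one label from a predefined set, here {empty, full}. The nearest-centroid method, the two-number feature vectors and the function names are assumptions chosen for brevity. The slides do not say which classifier or features are used.

```python
def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def classify(centroids, features):
    """Return the label whose centroid is closest to the input."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical training data: two made-up sensor readings per example.
training = [
    ([0.05, 0.10], "empty"),
    ([0.10, 0.00], "empty"),
    ([0.90, 0.95], "full"),
    ([0.85, 1.00], "full"),
]
centroids = train_centroids(training)
prediction = classify(centroids, [0.8, 0.9])
```

The output is always one of the labels seen in training, which is what makes this supervised classification rather than clustering.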
Classifiers
“Hospital”
0
x 0
x x
Return the best of {all positions one x move from current}
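A minimal sketch of move selection as classification: list every position that is one x move away from the current board, then return the best one. The board encoding (a 9-character string with "." for empty cells and "o" for the opponent) and the hand-made `score` function are assumptions. A trained model would take the place of `score`.

```python
# The eight winning lines of a 3x3 board, as indices into a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def successors(board):
    """All positions one x move from the current board ('.' is an empty cell)."""
    return [board[:i] + "x" + board[i + 1:]
            for i, cell in enumerate(board) if cell == "."]

def score(board):
    """Hypothetical hand-made evaluator standing in for a learned one."""
    total = 0
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count("x") == 3:
            total += 100                # x has completed this line
        elif "o" not in cells:
            total += cells.count("x")   # line is still open for x
    return total

def best_move(board):
    """Return the best of all positions one x move from the current one.

    Assumes the board has at least one empty cell.
    """
    return max(successors(board), key=score)

current = "xx..o.o.."
chosen = best_move(current)
```

For this board there are five candidate positions, and the one that completes the top row scores highest.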
Classifiers: Language Models (LM)
“in”
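A minimal sketch of a language model as a classifier over words: given the previous word, estimate a probability for each candidate next word and return the most likely one. The bigram counting approach, the toy corpus and the function names are assumptions for illustration. Real language models are far larger and use more context.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, prev):
    """Estimated P(W | previous word) from the counts."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return {word: c / total for word, c in following.items()}

def predict_next(counts, prev):
    """Most likely next word, or None if the previous word was never seen."""
    following = counts.get(prev)
    if not following:
        return None
    return following.most_common(1)[0][0]

# Hypothetical toy corpus.
corpus = [
    "I live in Lahore",
    "we live in Karachi",
    "they live in Islamabad",
    "people live on islands",
]
counts = train_bigrams(corpus)
guess = predict_next(counts, "live")
```

In this corpus "live" is followed by "in" three times out of four, so the model assigns it probability 0.75 and predicts it.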
Data
Data – Big, Big… Data!
How do we obtain these massive datasets to train our Machine Learning models?
From real interactions, e.g., call centers
Expert annotators, e.g., hired teams of annotators
Crowdsourcing
reCAPTCHA: Tagging:
We tag data for “free” in exchange for using “free” services
Applications
Speech Technologies
What was said?
Was it Ahmad?
Was it plagiarized?
Evaluation: Ideal and Practical
The Turing Test
• The "standard interpretation" of the Turing test:
• Player C, the interrogator, tries to determine which player – A or B – is a computer and which is a human.
• The interrogator is limited to using the responses to written questions to make the determination.
• https://en.wikipedia.org/wiki/Turing_test
• XKCD Data Science: https://towardsdatascience.com/12-xkcd-strips-that-show-the-truth-about-ai-e09fbcd00c4c
ML in Local Languages of Pakistan
Speech Recognition (Speech to Text)
Speech Synthesis (Text to Speech)
Limitations and Challenges
Challenges of ML – Explainability
• A classifier can learn to classify on the basis of features that humans would not want it to use
• All dogs wear a collar in the training data while no cat wears one – ML just learns to separate based on the collar
• All horse images have a copyright notice – ML just learns to recognize horses based on the copyright notice
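A minimal sketch of the collar problem, assuming a very simple learner that picks the single yes/no feature that best separates the training labels. The feature names, the six training rows and the function names are hypothetical. The same effect can occur in far more complex models.

```python
def best_single_feature(rows, labels, positive):
    """Pick the yes/no feature that alone best predicts the positive label."""
    best_name, best_acc = None, -1.0
    for name in rows[0]:
        correct = sum((row[name] == 1) == (label == positive)
                      for row, label in zip(rows, labels))
        acc = correct / len(rows)
        if acc > best_acc:
            best_name, best_acc = name, acc
    return best_name, best_acc

# Hypothetical training set: every dog wears a collar, no cat does.
train_rows = [
    {"has_collar": 1, "floppy_ears": 1},
    {"has_collar": 1, "floppy_ears": 0},
    {"has_collar": 1, "floppy_ears": 1},
    {"has_collar": 0, "floppy_ears": 0},
    {"has_collar": 0, "floppy_ears": 1},
    {"has_collar": 0, "floppy_ears": 0},
]
train_labels = ["dog", "dog", "dog", "cat", "cat", "cat"]
feature, accuracy = best_single_feature(train_rows, train_labels, "dog")

def predict(row):
    # The learned rule relies entirely on the chosen feature.
    return "dog" if row[feature] == 1 else "cat"

cat_with_collar = {"has_collar": 1, "floppy_ears": 0}
wrong_answer = predict(cat_with_collar)
```

The learner scores perfectly on the training data by looking only at the collar, and then labels a cat wearing a collar as a dog. Without inspecting which feature was used, the training accuracy alone would not reveal the problem.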
Challenges – Fairness in AI
• AI tends to reflect the biases of society
• Human taggers who mark a recording as misinformation based on accent or gender
• Court decisions in a country that make a rich person’s acquittal more likely
• Automated standardized testing in the US could yield unfavorable results for certain demographic groups
• AI plays a deciding role in hiring decisions, with up to 72% of resumes in the US never being viewed by a human
• Decisions on immigration, bank loans, credit history checks, criminal profiling
Machine Learning in Low-resource settings
• Problems where large data sets and tools are not available
• Natural Language Processing and Speech problems for languages of developing regions
• Pakistan has 71 languages
• We barely have speech recognition capabilities for Urdu
The Internet
• The Internet has transformed the way people participate in the information ecology and digital economy
• Social media, online discussion forums, crowdsourcing marketplaces
Oral and Offline
• 2.9 billion people worldwide are offline
• That is 37% of the world population
o Of these, 96% live in developing countries.
• 10% of the developed world, 43% of the developing world and 73% of the Least Developed Countries (LDCs) are offline*
• Offline populations
• too poor to afford Internet-enabled devices
• too remote to access the Internet
• too low-literate to navigate the mostly-text-driven Internet
References: International Telecommunication Union (ITU), Facts and Figures 2021: 2.9 billion people still offline, https://www.itu.int/en/ITU-D/Statistics/Documents/facts/FactsFigures2021.pdf, last accessed Feb 22, 2022
McKinsey (2014), WHO, World Bank, Ethnologue, The World Factbook (CIA), GSMA Mobile Economy, weforum.org
Digital Divides
• Gender Divide: More men than women use the Internet.
• The gap is smaller in developed countries and larger in developing countries and LDCs (4 out of every 5 women are offline in LDCs).
• Urban-Rural Divide: More urban than rural people use the Internet.
• Globally, people in urban areas are twice as likely to use the Internet as those in rural areas (47% vs 13% in LDCs).
References: WHO, McKinsey (2014), World Bank, Ethnologue, The World Factbook (CIA), GSMA Mobile Economy,
ITU, https://itu.foleon.com/itu/measuring-digital-development/gender-gap/
PTA, https://www.pta.gov.pk/en/telecom-indicators
Lack of access to Information and Connectivity can be a major impediment to Development
Managing Expectations
• Too optimistic/eager
o AI replacing humans as caring partners, AI replacing creative professions, super-intelligent AI
o AI invasion, robots gaining sentience and self-awareness, technological singularity, terminators!
Managing Expectations
• Too pessimistic/reluctant
o No practical applications of AI, no chances of a positive social impact, AI cannot help with anything
o No expected social harm from the mistakes of AI, no associated risks of integrating AI irresponsibly
• The gap between these extremes has been responsible for the AI winters
https://www.npr.org/2023/02/09/1155650909/google-chatbot--error-bard-shares
Managing Expectations
• Just right
o AI can aid human efforts and professions; it can transform industry, improve performance, precision and accuracy, and produce meaningful impact on society
o Real harms that must be mitigated, e.g., disruptive influence on the job market, models that are hard to explain and interpret, biases, the need for fairness and regulation, and fail-safes to mitigate potential harms
Responsible AI
Artificial Intelligence should be practiced in a manner that is:
• Explainable/Transparent/Interpretable
• Fair/unbiased
• Ethical
o Disruptive influences
o Privacy and informed consent
• Safe
o For the stakeholders
o From adversarial attacks (the cat-n-mouse game)
• Regulated
• XKCD Data Science: https://towardsdatascience.com/12-xkcd-strips-that-show-the-truth-about-ai-e09fbcd00c4c
For more details please visit
http://aghaaliraza.com
Thank you!