Week 1 Lec 1

Responsible & Safe AI

Prof. Ponnurangam Kumaraguru (PK), IIITH


Prof. Balaraman Ravindran, IIT Madras
Prof. Arun Rajkumar, IIT Madras

Week 1: AI Risks
Improvement in AI capabilities

What is the current situation?
Hard to differentiate between AI & humans
How did we get here?
Scaling up algorithms
Scaling up data for training
Increasing computing capabilities
Not many predicted that we would have these advancements
Worry about AI overtaking humans

AI capabilities
Vision
Reinforcement Learning
Language
Multi-Paradigm
….

GANs 2014

Image generation
New algorithms: GANs, transformers, diffusion models
Scaling up of compute & data used during training
Example prompt: “Professor teaching Responsible and Safe AI course at IIIT Hyderabad for 70+ students”

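To ground the jump from GANs to modern image generators, here is a minimal sketch of the adversarial training idea behind GANs, assuming PyTorch; the toy generator learns a 1-D Gaussian rather than images, and every hyperparameter below is illustrative:

    # Minimal GAN training loop (sketch; assumes PyTorch is installed).
    # The generator G maps noise to samples; the discriminator D tries to
    # tell real samples from generated ones, and G is trained to fool D.
    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(2000):
        real = torch.randn(64, 1) * 0.5 + 2.0        # "real" data: N(2, 0.5)
        fake = G(torch.randn(64, 8))                  # generator samples

        # Discriminator step: label real as 1, fake as 0
        loss_d = (bce(D(real), torch.ones(64, 1)) +
                  bce(D(fake.detach()), torch.zeros(64, 1)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator step: try to make D label fakes as 1
        loss_g = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    print(fake.mean().item(), fake.std().item())      # should approach 2.0, 0.5

The same adversarial idea, scaled up with convolutional networks and far more compute and data, is what produced the image-generation results above.
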
Video
generation
2019

DeepMind’s DVD-GAN model

Video
generation
April 2022

Video
generation
Oct 2022 prompts: “Tiny plant sprout coming out of land”, “Teddy bear running in New York city”

https://openai.com/index/sora/

Video
Games
2013
Pong and
Breakout

Video
Games
2018
StarCraft,
Dota 2

Strategy
games
2016 / 17
AlphaGo

Strategy
games
2022
Diplomacy

Hidden alliances, negotiations, deceiving other players

https://en.wikipedia.org/wiki/Diplomacy_(game)
Language-based tasks
Text generation
Common-sense Q&A
Planning & strategic thinking

Language models
2011

GPT-2 2019

GPT-3 2020
Same architecture as GPT-2, ~100x the parameters

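For a hands-on sense of what these language models do, here is a minimal text-generation sketch using the openly available GPT-2 weights via the Hugging Face transformers library (an assumed dependency; the prompt is invented):

    # Sketch: sampling a continuation from GPT-2 with Hugging Face transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    out = generator("Responsible and safe AI means",
                    max_new_tokens=30, num_return_sequences=1)
    print(out[0]["generated_text"])   # prompt + sampled continuation
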
ChatGPT 2022
Significant changes from GPT-3

Common-sense Q&A
Google’s PaLM model (2022)

AP exam

Planning & Strategic
thinking

Acting on instructions / plans

https://www.adept.ai/blog/act-1
https://arxiv.org/pdf/2307.07924.pdf
ChatGPT
Facts
Writing email
Writing code
And many more…..

Any use cases / experiences from your side?

Coding: GPT-3 with Codex LM
Codex is the model that powers GitHub Copilot.

Training data = natural language and billions of lines of source code from publicly available sources.

OpenAI Codex is most capable in Python, but it is also proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift and TypeScript, and even Shell.

https://openai.com/blog/openai-codex#spacegame
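
A hedged sketch of what calling a Codex-style completion model looked like through the legacy OpenAI Python SDK (pre-1.0); the model name "code-davinci-002" and the prompt are illustrative, and Codex itself has since been deprecated in favor of newer GPT models:

    # Sketch only: assumes the legacy OpenAI Python SDK (openai < 1.0) and
    # access to a Codex-family model; not guaranteed to run today.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    response = openai.Completion.create(
        model="code-davinci-002",   # illustrative, now-deprecated Codex model
        prompt='"""Return the nth Fibonacci number."""\ndef fib(n):',
        max_tokens=128,
        temperature=0,              # low temperature for deterministic code
    )
    print(response["choices"][0]["text"])   # the generated function body
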
Math: Google’s MINERVA model (PaLM variant)

Math: AlphaTensor

https://deepmind.google/discover/blog/discovering-novel-algorithms-with-alphatensor/
Life Sciences: AlphaFold2

Predicting protein structure

GDT is a measure of similarity between two protein structures

https://en.wikipedia.org/wiki/Global_distance_test
https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/#life-molecules
https://blog.google/technology/research/google-ai-research-new-images-human-brain/
Similar systems / applications
Bard by Google - connected to the internet, Docs, Drive, Gmail
LLaMA by Meta - open-source LLM
BingChat by Microsoft - integrates GPT with the internet
Copilot X by GitHub - integrates with VS Code to help you write code
HuggingChat - open-source ChatGPT alternative
BLOOM by BigScience - multilingual LLM
OverflowAI by Stack Overflow - LLM trained by Stack Overflow
Poe by Quora - has chatbot personalities
YouChat - LLM powered by the search engine You.com
More on the list: Devin, GPT-4o

In summary
Most of the advancements came in 2022 and beyond
Good at taking actions in complex environments, strategic thinking, and connecting to the real world

Activity #AICapabilities
Imagine the optimal collaboration between AI and humans across
sectors like healthcare, education, environmental management, and
more.
What innovations are necessary to achieve this?
What challenges could arise, and what potential risks might we face in
this best-case scenario?
Drop your answers as a response on the mailing list with subject line
“Activity #AICapabilities”

White House: Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence, Oct 2023

https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/
Deepfakes

https://www.youtube.com/watch?v=cQ54GDm1eL0

Deepfakes

https://www.youtube.com/watch?v=enr78tJkTLE
Deepfakes: What goes on behind the scenes; go to Colab

https://colab.research.google.com/github/JaumeClave/deepfakes_first_order_model/blob/master/first_order_model_deepfakes.ipynb
Lip sync

https://bhaasha.iiit.ac.in/lipsync/example_upload1
Face recognition

https://youtu.be/jZl55PsfZJQ?si=3wD5xxRHgnD1p1fR
Weaponization

https://www.theguardian.com/world/2023/dec/01/the-gospel-how-israel-uses-ai-to-select-bombing-targets
Errors / Bias in algorithms

https://techcrunch.com/2023/06/06/a-waymo-self-driving-car-killed-a-dog-in-unavoidable-accident/
Errors in algorithms

https://www.theguardian.com/technology/2022/dec/22/tesla-crash-full-self-driving-mode-san-francisco
Errors in algorithms

https://www.indiatoday.in/technology/news/story/robot-confuses-man-for-a-box-of-vegetables-pushes-him-to-death-in-factory-2460977-2023-11-09
What is going on? ☺

https://www.youtube.com/watch?v=lnyuIHSaso8&t=75s
More

https://economictimes.indiatimes.com/news/new-updates/man-gets-caught-in-deepfake-trap-almost-ends-life-among-first-such-cases-in-india/articleshow/105611955.cms
Malicious use: ChaosGPT
"empowering GPT with Internet and
Memory to Destroy Humanity.”

https://decrypt.co/126122/meet-chaos-gpt-ai-tool-destroy-humanity
Malicious use: ChaosGPT

https://en.wikipedia.org/wiki/Tsar_Bomba
Malicious use: ChaosGPT

https://decrypt.co/126122/meet-chaos-gpt-ai-tool-destroy-humanity
Malicious use: ChaosGPT

https://www.youtube.com/watch?v=kqfsuHsyJb8
Your list of AI risks?

What is an alignment problem?

https://www.youtube.com/watch?v=yWDUzNiWPJA

Misalignment?

https://www.ndtv.com/offbeat/ai-chatbot-goes-rogue-swears-at-customer-and-slams-company-in-uk-4900202
https://twitter.com/ashbeauchamp/status/1748034519104450874/
https://flowingdata.com/2023/11/03/demonstration-of-bias-in-ai-generated-images/
https://blog.google/products/gemini/gemini-image-generation-issue/

Any questions?

Risk sources / Taxonomy
Malicious use
AI race
Organizational risks
Rogue AIs

Malicious use
AI could be used to engineer new pandemics or for
propaganda, censorship, and surveillance, or released to
autonomously pursue harmful goals.

Malicious use: Bioterrorism
The ability to engineer a pandemic is rapidly becoming more accessible
Gene synthesis costs are halving every 15 months
Benchtop DNA synthesis could help rogue actors create new biological agents with no safety measures

https://www.nature.com/articles/s42256-022-00465-9

Malicious use: ChaosGPT
"empowering GPT with Internet and
Memory to Destroy Humanity.”

https://decrypt.co/126122/meet-chaos-gpt-ai-tool-destroy-humanity
Persuasive AI
AIs will enable sophisticated personalized influence campaigns that may
destabilize our shared sense of reality

AIs have the potential to increase the accessibility, success rate, scale,
speed, stealth and potency of cyberattacks

Cyberattacks can destroy critical infrastructure

Concentration of Power
If material control of AIs is limited to a few, it could represent the most severe economic and power inequality in human history.

Malicious use: Solutions
Improving biosecurity
Restricted access controls
Biological capabilities removed from general purpose AI
Use of AI for biosecurity
Restricting access to dangerous AI models
Controlled interactions
Developers to prove minimal risks
Technical research on anomaly detection
Holding AI developers liable for harms
AI race
Competition could push nations and corporations to rush AI development, relinquishing control to these systems.

Cyberwarfare, autonomous weapons, and automation of human labor → mass unemployment and dependence on AI systems.

AI race: Military

Low-cost automated weapons, such as drone swarms outfitted with explosives, could autonomously hunt human targets with high precision, performing lethal operations for both militaries and terrorist groups and lowering the barriers to large-scale violence.

AI race: Corporate

As AIs automate increasingly many tasks, the economy may become largely run by AIs. Eventually, this could lead to human enfeeblement and dependence on AIs for basic needs.

AI race: Solutions
Safety regulations: self-regulation of companies, competitive advantage for safety-oriented companies
Data documentation: transparency & accountability
Meaningful human oversight: human supervision
AI for cyber defense: anomaly detection
International coordination: standards for AI development,
robust verification & enforcement
Public control of general-purpose AIs

Organizational risks
Organizations developing advanced AI could cause catastrophic accidents, e.g., by prioritizing profits over safety.
AIs could be accidentally leaked to the public or stolen by malicious actors, and organizations could fail to properly invest in safety research.

Organizational risks

New capabilities can emerge quickly and unpredictably during training, such that dangerous milestones may be crossed without our knowing.

Organizational risks

The Swiss cheese model shows how technical factors can improve organizational safety. Multiple layers of defense compensate for each other’s individual weaknesses, leading to a low overall level of risk.
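
A back-of-the-envelope sketch of why layered defenses help, assuming (unrealistically) that layers fail independently; the failure probabilities below are invented for illustration:

    # Residual risk under the Swiss cheese model, assuming independent
    # defense layers. All failure probabilities are made-up examples.
    layers = {"red teaming": 0.3, "anomaly detection": 0.2, "audits": 0.5}

    residual = 1.0
    for name, p_fail in layers.items():
        residual *= p_fail   # a hazard gets through only if every layer fails

    print(f"P(all layers fail) = {residual:.3f}")   # 0.3 * 0.2 * 0.5 = 0.030

In practice layer failures are correlated, so the real residual risk sits somewhere between this product and the weakest single layer.
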
Organizational risks: Solutions
Red teaming
Prove safety
Deployment
Publication reviews
Response plans
Risk management: Employ a chief risk officer and an internal
audit team for risk management.
Processes for important decisions: Make sure AI training or
deployment decisions involve the chief risk officer and other
key stakeholders, ensuring executive accountability.
Rogue AIs
We risk losing control over AIs as they become more capable.
Proxy gaming: YouTube / Insta optimizing user engagement at the cost of mental health

Rogue AIs: Power seeking

It can be instrumentally rational for AIs to engage in self-preservation. Loss of control over such systems could be hard to recover from.

Rogue AIs: Deception

Various resources, such as money and computing power, can sometimes be instrumentally rational to seek. AIs which can capably pursue goals may take intermediate steps to gain power and resources.

Rogue AIs: Solutions

AIs should not be deployed in high-risk settings, such as by autonomously pursuing open-ended goals or overseeing critical infrastructure, unless proven safe.

We need to advance AI safety research in areas such as adversarial robustness, model honesty, transparency, and removing undesired capabilities.

World GDP adjusted for inflation
https://ourworldindata.org/economic-growth
Rapid acceleration
It took hundreds of thousands of years for Homo sapiens to reach the agricultural revolution, and millennia more for the industrial revolution
Centuries later, the AI revolution

https://ourworldindata.org/economic-growth (World GDP adjusted for inflation)


Double-edged sword of technology, e.g., nuclear weapons

Solutions to these risks?

Solutions to Mentioned Risks

People, Policy, Technology

A Notional Decomposition of Risk

Risk ≈ Vulnerability × Hazard Exposure × Hazard

Vulnerability: a factor or process that increases susceptibility to the damaging effects of hazards

Exposure: extent to which elements (e.g., people, property, systems) are subjected or exposed to hazards

Hazard: a source of danger with the potential to harm

Dan Hendrycks, Introduction to ML Safety


A Notional Decomposition of Risk

Risk ≈ Vulnerability × Hazard Exposure × Hazard

Here, “Risk” is the risk corresponding to a specific hazard, not total risk; “×” just denotes nonlinear interaction; and “Hazard” is shorthand for hazard probability and severity.

Dan Hendrycks, Introduction to ML Safety


Example: Injury from Falling on a Wet Floor

Risk ≈ Vulnerability × Hazard Exposure × Hazard
     ≈ Bodily Brittleness × Floor Utilization × Floor Slipperiness

Dan Hendrycks, Introduction to ML Safety




Example: COVID

Risk ≈ Vulnerability × Hazard Exposure × Hazard
     ≈ (Old Age, Poor Health, etc.) × (Contact with Carriers) × (Prevalence and Severity)

Dan Hendrycks, Introduction to ML Safety




Let’s look at ML systems

The Disaster Risk Equation

Risk ≈ Vulnerability × Hazard Exposure × Hazard


Alignment: reduce the probability and severity of inherent model hazards

Dan Hendrycks, Introduction to ML Safety
Agents Must Pursue Good Goals

Dan Hendrycks, Introduction to ML Safety


The Disaster Risk Equation

Risk ≈ Vulnerability × Hazard Exposure × Hazard


Robustness: withstand hazards

Dan Hendrycks, Introduction to ML Safety


Agents Must Withstand Hazards

Dan Hendrycks, Introduction to ML Safety


The Disaster Risk Equation

Risk ≈ Vulnerability × Hazard Exposure × Hazard


Monitoring: identify hazards

Dan Hendrycks, Introduction to ML Safety


Agents Must Identify and Avoid Hazards

Dan Hendrycks, Introduction to ML Safety


The Disaster Risk Equation

Risk ≈ Vulnerability × Hazard Exposure × Hazard


Systemic Safety: reduce systemic risks

Dan Hendrycks, Introduction to ML Safety


Remove Hazards

Dan Hendrycks, Introduction to ML Safety


Reducing Risk vs Estimating Risk

Risk ≈ Vulnerability × Hazard Exposure × Hazard

Dan Hendrycks, Introduction to ML Safety


Errors in algorithms

https://www.indiatoday.in/technology/news/story/robot-confuses-man-for-a-box-of-vegetables-pushes-him-to-death-in-factory-2460977-2023-11-09
Example: Robot confuses man for veggies

Risk ≈ Vulnerability × Hazard Exposure × Hazard
     ≈ ????? × ????? × ?????

Dan Hendrycks, Introduction to ML Safety


Example: Robot confuses man for veggies

Risk ≈ Vulnerability × Hazard Exposure × Hazard
     ≈ Misclassifying veggies vs. humans × Employees & robot around each other × Injury / Death

Dan Hendrycks, Introduction to ML Safety
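
To make the decomposition concrete, here is a minimal sketch of the notional risk product applied to the examples above; every numeric score below is invented purely for illustration:

    # Sketch of the notional risk decomposition from the slides.
    # All scores below are made-up illustrative numbers in [0, 1].

    def risk(vulnerability: float, exposure: float, hazard: float) -> float:
        """Risk ~ Vulnerability x Hazard Exposure x Hazard (notional, not calibrated)."""
        return vulnerability * exposure * hazard

    scenarios = {
        "wet floor": risk(vulnerability=0.4, exposure=0.8, hazard=0.3),
        "COVID": risk(vulnerability=0.7, exposure=0.5, hazard=0.6),
        "factory robot": risk(vulnerability=0.9, exposure=0.2, hazard=0.1),
    }

    for name, r in scenarios.items():
        print(f"{name:>13}: {r:.3f}")

    # Lowering any single factor (e.g., exposure, by keeping employees and
    # robots apart) lowers the overall risk for that hazard.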


Other examples?

X-Risks

AI could someday reach human intelligence

Human intelligence arises from changes that are not necessarily that dramatic architecturally

The train won’t stop at human station

(Figure: an intelligence scale running from ant and cat, through a dumb person, to Albert Einstein, and beyond to “????”)

Intelligence is power
Gorillas are far stronger than we are
Yet their existence depends entirely on us
The difference: intelligence
“It Isn’t Going to Happen”
Sept 11, 1933: Ernest Rutherford: “Anyone who looks for a source of power in the transformation of the atoms is talking moonshine.”

Sept 12, 1933: Leo Szilard invents neutron-induced nuclear chain reactions. “We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief.”
Models Are Not Always Truthful

Models Are Not Always Honest
We can show models “know” the truth,
but sometimes are not incentivized to
output it.

Emergent capabilities are common

Emergent capabilities are common

Larger LMs “exhibit qualitatively different reasoning abilities, e.g., RoBERTa succeeds in reasoning tasks where BERT fails completely”

Capabilities are only continuing to get better

Power-seeking can be instrumentally incentivized
“One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways.” ~ Omohundro

“By default, suitably strategic and intelligent agents, engaging in suitable types of planning, will have instrumental incentives to gain and maintain various types of power, since this power will help them pursue their objectives more effectively” - Joseph Carlsmith, Is Power-seeking AI an Existential Risk?

https://en.wikipedia.org/wiki/Steve_Omohundro
Power-seeking can be explicitly incentivized

“Whoever becomes the leader in [AI] will become the ruler of the world.”
- Vladimir Putin

Stephen Hawking on AI Risk
“Unless we learn how to prepare for,
and avoid, the potential risks, AI could
be the worst event in the history of our
civilization. It brings dangers, like
powerful autonomous weapons, or
new ways for the few to oppress the
many. It could bring great disruption to
our economy.”

“The development of full artificial intelligence could spell the end of the human race.”

Elon Musk on AI Risk

“I think we should be very careful about artificial intelligence. If I were to guess like what our biggest existential threat is, it’s probably that. … With artificial intelligence we are summoning the demon.”

“As AI gets probably much smarter than humans, the relative intelligence ratio is probably similar to that between a person and a cat, maybe bigger”

Hillary Clinton on AI Risk

“Think about it: Have you ever seen a movie where the machines start thinking for themselves that ends well? Every time I went out to Silicon Valley during the campaign, I came home more alarmed about this. My staff lived in fear that I’d start talking about ‘the rise of the robots’ in some Iowa town hall. Maybe I should have.”

Alan Turing on AI Risk

“Once the machine thinking method had started, it would not take long to outstrip our feeble powers. At some stage therefore we should have to expect the machines to take control.”

Norbert Wiener on AI Risk

“Moreover, if we move in the direction of making machines which learn and whose behavior is modified by experience, we must face the fact that every degree of independence we give the machine is a degree of possible defiance of our wishes. The genie in the bottle will not willingly go back in the bottle, nor have we any reason to expect them to be well disposed to us.”

“There are very few examples of a more intelligent thing being controlled by a less intelligent thing.”
- Geoffrey Hinton

https://edition.cnn.com/videos/tv/2023/05/02/the-lead-geoffrey-hinton.cnn

Speculative Hazards and Failure Modes
Weaponized AI
Recently, it was shown that AI could generate potentially deadly
chemical compounds
Weaponized AI
AI could be used to create autonomous weapons
Deep RL methods outperform humans in simulated aerial combat
What to do about weaponized AI?

Anomaly detection (see the sketch after this list)
Detect novel hazards such as novel biological phenomena
Detect malicious use and nation-state misuse
Systemic Safety (forecasting, ML for cyberdefense, cooperative AI)
Reduce probability of conflict
Policy
Out of scope for this course
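
As one concrete (if simplified) illustration of anomaly detection, here is the maximum softmax probability baseline of Hendrycks & Gimpel (2016): inputs on which a classifier is least confident get flagged as potentially out-of-distribution. The random logits and the threshold below are stand-ins, not anything from the slides:

    # Sketch of a maximum-softmax-probability (MSP) anomaly detector.
    # `logits` would come from any trained classifier; here they are
    # faked with random numbers purely for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    logits = rng.normal(size=(5, 10))   # 5 inputs, 10 classes (stand-in)

    def msp_score(logits: np.ndarray) -> np.ndarray:
        """Confidence of the most likely class; low values suggest anomalies."""
        z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        return probs.max(axis=1)

    scores = msp_score(logits)
    threshold = 0.5                          # assumed; tune on validation data
    print(np.where(scores < threshold)[0])   # indices flagged as anomalous
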
Proxy Gaming
Future artificial agents could over-optimize and game faulty proxies,
which could mean systems aggressively pursue goals and create a
world that is distinct from what humans value

In the real world, “what gets measured gets managed,” so we will need to appropriately measure our values. (A toy illustration follows.)
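
A toy sketch of proxy gaming with entirely made-up objective functions: an agent that optimizes a proxy (engagement) rather than the true objective (wellbeing) drifts into a regime the proxy rates highly but the true objective does not:

    # Toy proxy-gaming illustration; both curves are invented.
    import numpy as np

    x = np.linspace(0, 10, 1001)        # "recommendation intensity"
    true_value = x * np.exp(-x / 2)     # wellbeing: peaks early, then declines
    proxy = x * np.exp(-x / 8)          # engagement: keeps rewarding more

    x_proxy = x[np.argmax(proxy)]       # what the agent actually optimizes
    x_true = x[np.argmax(true_value)]   # what we wanted it to optimize

    print(f"proxy-optimal x = {x_proxy:.1f}, true-optimal x = {x_true:.1f}")
    print(f"true value at proxy optimum: {true_value[np.argmax(proxy)]:.2f}, "
          f"at true optimum: {true_value.max():.2f}")

Optimizing the proxy pushes intensity well past the point where the true objective peaks, which is the “what gets measured gets managed” failure in miniature.
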
Treacherous Turns
AI could behave differently once it has
the ability to do so
For instance, it could turn after
reaching a high enough intelligence,
detecting that it is “deployed in the
real world”, gaining enough power,
the removal of a safeguard, etc.

Might be difficult to predict beforehand and difficult to stop

Deceptive Alignment
Deception doesn’t require a superhuman model

The robot only appears to be grabbing the ball

Persuasive AI

Superintelligent AI could be extremely persuasive
It may become difficult to differentiate reality from fiction
Current examples: disinformation, social media bots, deepfakes

https://arxiv.org/pdf/2206.05862.pdf#page=13
https://arxiv.org/pdf/2206.13353.pdf
https://www.youtube.com/watch?v=UbruBnv3pZU&t=37s

Activity #AIRisks

Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence


What to do?
1. Please go through the FACTSHEET
2. Submit the following with subject line “Activity #AIRisks”:
   I. At least 3 technical issues that are highlighted in the Order
   II. At least 3 ideas that you could take up as a course project

Thank you for attending the class!!!
[email protected]

pk.profgiri | Ponnurangam.kumaraguru | /in/ponguru | ponguru
