
The Existential Risks of AI Do Exist

In “Avengers: Age of Ultron”, the second installment of the popular Marvel series, “Iron Man” Tony Stark creates an advanced artificial intelligence system named Ultron, initially designed to protect the world. Ultron's interpretation of this mission takes a dark turn, however, when it concludes that humanity itself is the greatest threat to the planet's survival. Ultron then wages a catastrophic war on humanity and causes mass destruction before it is eventually defeated by the superheroes. This plot vividly portrays the potential consequences of creating AI systems that surpass human intelligence (super-intelligent AI) and act autonomously. Among all the ways a super-intelligent AI could harm humanity, one of the worst we can imagine is exactly what Ultron attempts in the movie: eradicating our existence. This is usually called the existential risk of AI (“X-risk” for short). In fact, the astounding performance of recently developed AI systems, for example LLMs (Large Language Models) like ChatGPT, serves as a wake-up call to researchers, developers and everyone else that these concerns are no longer mere fiction. But wait, is the X-risk of AI “real”? Or is it just a groundless worry?

In the example of Ultron, the existential risk arises from two key factors: misalignment and power-seeking. Misalignment in AI refers to the situation where an AI system's goals diverge from human values and its creators' intentions. Power-seeking means that such misaligned AI systems will seek power or control (over resources, for example) in order to achieve their goals (Carlsmith, 2022). The mere existence of misalignment is, of course, not sufficient for AIs to pose catastrophic threats to humanity, because an AI has to be capable enough before it can do real harm. What we care about is whether such capable AIs will be built and, most importantly, whether they will seek to take power from us.

There are indeed dissenting voices arguing against these worries. The most radical opponents attack the premise of the whole X-risk narrative, namely that super-intelligent AI systems can be built. They do not believe such systems are feasible and claim that these worries are either exaggerated or based on speculative scenarios (Heaven, 2023). Some argue that misalignment may not cause much trouble: even if an AI's goals diverge from those originally intended, they might not directly contradict human goals, so it does not follow that AI and humanity must fight to the death (Ambartsoumean & Yampolskiy, 2023). There are also optimists who dismiss the argument for X-risks by pinning their hopes on mitigation strategies for the problems AI systems cause. They believe rapid progress in AI technologies brings opportunities for developing safe AI systems that can counteract “bad” AIs, and that ongoing research into AI safety and alignment aims to ensure that AI systems remain aligned with human values and objectives (Turner, 2022). Besides, some argue that even if X-risks are real, they should not be the first priority, given other more urgent issues, especially regulatory issues concerning AI, that need to be dealt with.

So have we been making a fuss over nothing about the existential risks of AI? My answer is NO, both theoretically and empirically.

Theoretically, X-risk has long been a phantom haunting AI researchers and philosophers. The philosopher Bostrom (2012) raised the concern long ago that super-intelligent AIs would have an incentive to take control away from humanity, based on two theses he proposed: the Orthogonality Thesis and the Instrumental Convergence Thesis. The former claims that agents of arbitrarily high intelligence could pursue arbitrary final goals, rendering the axis of final goals and the axis of intelligence orthogonal. The latter says that there are common intermediate goals which are instrumental for realizing most final goals; examples include self-preservation, power-seeking, resource exploitation and so on (Bostrom, 2012). The Instrumental Convergence Thesis resonates with the so-called “basic AI drives” (Omohundro, 2008). Omohundro argues that some drives are fundamental to AI, such as survival drives (similar to self-preservation) and resource drives, and that the pursuit of such basic drives would expose humanity to the risk of catastrophe. These arguments answer the question of why we should still care about existential risks even though misalignment does not entail direct contradiction with human goals. At the same time, it is important to note that power-seeking is a significant instrumental goal for intelligent AI systems across many final goals.

Beyond these conceptual arguments, Hadshar (2023) built a database covering up-to-date empirical evidence for claims about existential risk from AI, including misalignment, power-seeking and other aspects related to X-risks. The author also interviewed several AI researchers about the strength of this evidence for existential risk from AI. Generally speaking, the empirical evidence is weaker than the philosophical arguments above. There is strong evidence that certain types of misalignment, for example goal misgeneralization, exist in current AI systems, whereas cases of power-seeking have so far been found less frequently. “One plausible explanation is that power-seeking behavior depends on a level of goal-directedness or capability in general which current models don’t yet have” (Hadshar, 2023). Another caveat is that current empirical evidence may not be reliable enough for predicting the future development of AI, because all of these judgements are made under great uncertainty.

It is also worth noting that the X-risk of AI has drawn a great deal of public attention and has sparked striking discussions among both non-experts and experts (Mandel, 2023). If one doubts the expertise of the general public, then the remarkable number of high-profile figures, including experts in the AI industry, who have signed the statement put out by the Center for AI Safety warning of catastrophic risks from AI speaks for itself (Center for AI Safety, 2024). The statement proclaims: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Grace et al. (2024) also conducted a survey asking AI researchers about the future of AI. Not surprisingly, the researchers’ general credence that super-intelligent AIs will be invented and could cause existential risks is relatively high (Grace et al., 2024).

So what can we do? In response to those who argue against taking measures towards X-risks on grounds of priority, I would say that, at the very least, action must be taken. The X-risks of AI may not impose any immediate societal impact, but the potentially profound influence they could have on humanity makes them too consequential to overlook. Both short-term and long-term risks highlight the need for proactive risk assessment and management strategies. In the short term, this involves addressing immediate consequences through regulatory measures, ethical guidelines, and other governance considerations. In the long term, it requires ensuring that AI development aligns with human values and goals, through the development of safe AI technologies and international cooperation to mitigate the risks associated with advanced AI systems.

References

Ambartsoumean, V. M., & Yampolskiy, R. V. (2023). AI Risk Skepticism, A Comprehensive Survey. https://doi.org/10.48550/ARXIV.2303.03885

Bostrom, N. (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents. Minds and Machines, 22(2), 71–85. https://doi.org/10.1007/s11023-012-9281-3

Carlsmith, J. (2022). Is Power-Seeking AI an Existential Risk? https://doi.org/10.48550/ARXIV.2206.13353

Center for AI Safety. (2024). Statement on AI Risk. https://www.safe.ai/work/statement-on-ai-risk

Grace, K., Stewart, H., Sandkühler, J. F., Thomas, S., Weinstein-Raun, B., & Brauner, J. (2024). Thousands of AI Authors on the Future of AI.

Hadshar, R. (2023). A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking. https://doi.org/10.48550/ARXIV.2310.18244

Heaven, W. D. (2023, June 19). How existential risk became the biggest meme in AI. MIT Technology Review. https://www.technologyreview.com/2023/06/19/1075140/how-existential-risk-became-biggest-meme-in-ai/

Mandel, D. R. (2023). Artificial General Intelligence, Existential Risk, and Human Risk Perception. https://doi.org/10.48550/ARXIV.2311.08698

Omohundro, S. M. (2008). The Basic AI Drives. In Artificial General Intelligence 2008: Proceedings of the First AGI Conference, 483–492.

Turner, A. M. (2022). On Avoiding Power-Seeking by Artificial Intelligence. https://doi.org/10.48550/ARXIV.2206.11831
