
Open-Source and Science

In the Era of Foundation Models


Berkeley LLM Agents Course - November 18, 2024
Percy Liang
Capabilities skyrocket…

[Chart: model capability rising steeply, 2018 to 2023]


Access plummets…

[Chart: level of access falling, 2018 to 2023, from "paper, code, data, weights" to "paper, weights" to "paper, API" to "API"]

Why does access matter?
Access shapes research

1990s: Internet (text in digital form) ⇒ statistical NLP methods

2010s: crowdsourcing platforms ⇒ large annotated datasets

2010s: GPUs ⇒ deep learning methods


Levels of access for foundation models

API “cognitive scientist”

open-weight “neuroscientist”

open-source “computer scientist”


API access

Analogy: cognitive scientists can measure behavior

prompt → response

Opportunity: build agents to solve complex problems


Open-weight access

Analogy: neuroscientists can probe internal activations

Opportunity: understand mechanisms, create novel derivatives


Open-source access

Analogy: computer scientist building a system can control every part of it

Opportunity: question everything


Levels of access for foundation models

API “cognitive scientist”

open-weight “neuroscientist”

open-source “computer scientist”



API access
prompt → response

GPT-4, Claude, Gemini

● Think of the API as a universal function (e.g., summarize, verify, generate)


● Compose API calls together into systems (agents)
● Important: the API (i.e., the model) is the controller of execution; it is not merely called by a fixed program (see the sketch below)
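A minimal sketch of this control flow, assuming a placeholder llm() stub and two made-up tools (nothing below is a real vendor API): at each step the model chooses the next tool and its argument, and the surrounding program is just a thin loop that executes whatever the model decides.

```python
import json

# Placeholder for an API call (e.g., a chat-completions endpoint); a real
# implementation would send the prompt to a hosted model and return its reply.
def llm(prompt: str) -> str:
    return json.dumps({"tool": "finish", "argument": "stub answer"})

# Two made-up tools the model can invoke.
TOOLS = {
    "search": lambda query: f"(search results for {query!r})",
    "python": lambda code: f"(output of running {code!r})",
}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # The model, not a fixed program, decides what happens next.
        decision = json.loads(llm(
            "\n".join(history) +
            '\nRespond with JSON: {"tool": ..., "argument": ...} (tool may be "finish")'
        ))
        if decision["tool"] == "finish":
            return decision["argument"]
        observation = TOOLS[decision["tool"]](decision["argument"])
        history.append(f"Called {decision['tool']} -> {observation}")
    return "(step budget exhausted)"

print(run_agent("Summarize the latest results"))
```

The point of the sketch is the inversion of control: the fixed code is only the loop, while the planning lives behind the API.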
Agent architecture

[Diagram: past (verifiers), present (tool use), future]
Tale of two agents

Problem-solving agents | Simulation agents


ICML 2024
Results
Success rate: fraction of 8 trials in which the agent improves by at least 10% over the reference
Reflections: research agents
● Related work
○ MLE-Bench [Chan+ 2024]: benchmark with 75 Kaggle challenges
○ AIDE [Schmidt+ 2024]: agent architecture for data science competitions
○ OpenHands (OpenDevin) [Wang+ 2024]: general-purpose platform for software development
○ CORE-Bench [Siegel+ 2024]: benchmark to reproduce research results
○ Generating novel research ideas [Si+ 2024]

Self-improvement: solve task → improve model → solve task better


Results

Hardest task: 24 hours
Reflections: dual implications of cybersecurity agents

Evaluation of cyber-risk (offense) | Penetration testing tool (defense)


Tale of two agents

Problem-solving agents | Simulation agents


UIST 2023
Architecture
Retrieval
Reflection
Simulating social behavior
Let’s make it real…
Interviews are rich
Interviewer: To start, I would like to begin with a big question: tell me the story of your life. Start from the beginning -- from your
childhood, to education, to family and relationships, and to any major life events you may have had.

Joon: Right. So I was born in Korea. I lived there for the first 12 years or so of my life. And my parents and I moved to Boston. So my parents were
pursuing their postdoctoral degree. They're both medical doctors. So we moved to Boston. So we lived there for five years. And then I went to high
school in New Hampshire after that, college in Pennsylvania, called Swarthmore College. It's a liberal arts college. And towards the end of my
college years, one other co-founder and I started to run a startup. So while we were running that, as we were graduating, we moved to Palo Alto. So
we lived by somebody's garage, actually, for about a year. The typical thing that you do if you're into that. The startup itself didn't pan out. But during
that year, I got to know this group of researchers and community in the computer science department at Stanford, just by through some luck and just,
yeah, through pure luck. And I started to work on some research projects with them. So I spent about a half a year, a little bit more than maybe half a
year working with them. And then I went to get my master's in computer science at UIUC, that's Illinois Urbana-Champaign. So I spent two years of
my life there. And when I was finishing that up, I moved back to Palo Alto, started my PhD as a computer science PhD student at Stanford. And I've
been here since then. So this is my year five of doing that.

Interviewer: Thank you for sharing that overview! I'd love to learn more about your experiences. Can you tell me what it was like for you
adjusting to life in Boston after moving from Korea? What were some of the challenges you faced during that transition?

Joon: It's a good question. Well, moving to a different country and different culture for any child and their parents, it's a heavy lift. But I think likely
what was sort of the most interesting was the fact that the entire family, to some extent, was sort of at this point where they were all trying to figure
out their career and so forth…
85% normalized accuracy!
Reflections: agents and API access

● Use API to create agents


● Solve complex problems in ML engineering and cybersecurity
● Simulate people (digital twin of society) - lab for social scientists
● Next: from static agents ⇒ agents that learn from experience
● AlphaGo analogy: supervised learning ⇒ reinforcement learning
Levels of access for foundation models

API “cognitive scientist”

open-weight “neuroscientist”

open-source “computer scientist”


Open-weight access

Llama, Qwen, Mixtral, Jamba, Yi, Gemma, Phi

More accurately: dual-use foundation models with widely available weights


Reproducibility

API models get deprecated; you always have the weights…



Were θ1 and θ2 independently trained or not (e.g., θ1 fine-tuned from θ2)?


Idea 1
Compute sim(θ1, θ2) - e.g., cosine similarity of MLP weights

Problem: if sim(θ1, θ2) = 0.1, is that similar or not? Statistical guarantees?
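To make Idea 1 concrete, here is a tiny sketch that computes cosine similarity between two flattened weight matrices; load_mlp_weights is a hypothetical helper, since how you pull out matching tensors depends on the checkpoint format.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two weight tensors, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# w1 = load_mlp_weights("model_1")  # hypothetical: e.g., one MLP matrix per model
# w2 = load_mlp_weights("model_2")
# print(cosine_sim(w1, w2))         # a raw number like 0.1, with no reference point
```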


Idea 2
Train many models from scratch and form the null distribution { sim(θ1’, θ2) : θ1’ = train(random init) }

p-value = ℙ[sim(θ1’, θ2) > sim(θ1, θ2)]

Problem: we cannot actually train new θ1’ from scratch, since we only have the final weights!
Idea 3
perm(θ) = permute the hidden units defined by θ to get counterfactuals

p-value = ℙ[sim(perm(θ1), θ2) > sim(θ1, θ2)]
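Here is a self-contained sketch of Idea 3 on toy matrices, assuming (purely for illustration) that a single weight matrix stands in for θ and that permuting its rows corresponds to permuting hidden units; the actual test operates on full checkpoints.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_sim(a, b):
    a, b = a.ravel(), b.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def permutation_p_value(w1, w2, n_perm=1000):
    """Permute hidden units of w1 to build a null distribution of similarities."""
    observed = cosine_sim(w1, w2)
    null_sims = np.array([
        cosine_sim(w1[rng.permutation(w1.shape[0])], w2) for _ in range(n_perm)
    ])
    # Fraction of permuted counterfactuals at least as similar as the observed
    # pair (with the usual +1 correction).
    return (1 + np.sum(null_sims >= observed)) / (n_perm + 1)

# Toy demo: w2 is a noisy copy of w1, so the test should flag them as dependent.
w1 = rng.standard_normal((512, 256))
w2 = w1 + 0.1 * rng.standard_normal((512, 256))
print(permutation_p_value(w1, w2))  # small p-value ⇒ unlikely to be independent
```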


Empirical validation

Not independent!
StripedHyena-Nous-7B ~ Mistral-7B-v0.1
Other findings
Miqu-70B (Mistral leak) ~ Llama-2-70B

Llama-3.1-8B ~ Llama-3.2-3B
Reflections: open-weight access

● Strong open-weight models (e.g., Llama 3) have been immensely valuable


● Enables research on interpretability, fine-tuning, distillation, merging (all reproducible!)
● Question: how can weight modifications yield coherent functional changes?
● Teaches us about API models (e.g., adversarial attacks transfer)
● New problems motivated by open weights: model independence testing
● But still confined by the blueprint of existing models…
Levels of access for foundation models

API “cognitive scientist”

open-weight “neuroscientist”

open-source “computer scientist”


Open-source language model efforts

● FineWeb, SmolLM
● GPT-J, GPT-NeoX, Pythia
● OLMo, OLMoE
● RedPajama
● StarCoder
● DCLM-BASELINE
● MAP-Neo, OpenCoder
● K2
Performance gaps

Model | Access | MMLU
Claude Sonnet 3.5 | API | 87.3
Llama 3.1 Instruct (405B) | open-weight | 84.5
OLMo 1.7 (7B) | open-source | 53.8


What exactly is open-source?
Free and open-source software
Roots: hacker ethic (MIT in 1950s) + academia (for centuries)

Values: creativity, exploration, transparency, collaboration, resistance against authority

1983: Richard Stallman started GNU (bash, ls, …)

1991: Linus Torvalds started Linux

1998: Open-Source Initiative (OSI) - coined and defined “open-source”


Open-Source AI Definition - version 1.0
Requires "data information", not the data itself

Model developers don't hold the rights to (copyrighted) web data, so they cannot release it!
Need compute to (re)train in order to achieve the spirit of open source
What mixture to use?

distributionally robust optimization (DRO) [NeurIPS 2023]
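To make the DRO idea concrete, here is a toy exponentiated-gradient sketch that upweights domains where a small proxy model has the largest excess loss over a reference; the domain names, loss values, step size, and step count are all illustrative assumptions, and the published method involves more machinery.

```python
import numpy as np

domains = ["web", "code", "books", "wiki"]
weights = np.ones(len(domains)) / len(domains)    # start from a uniform mixture
eta = 0.5                                         # mixture update step size

# In practice these come from training a proxy model; here they are made-up
# numbers (per-domain loss of the proxy minus a reference model).
excess_loss = np.array([0.30, 0.10, 0.05, 0.02])

for _ in range(10):
    weights = weights * np.exp(eta * excess_loss)  # upweight the worst domains
    weights = weights / weights.sum()              # project back onto the simplex

print(dict(zip(domains, np.round(weights, 3))))    # most weight ends up on "web"
```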
diagonal Hessian with clipping [ICLR 2024]
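As a rough illustration of "diagonal Hessian with clipping" (an assumed toy form, not the published algorithm verbatim), the sketch below preconditions an EMA of gradients by an EMA of diagonal-Hessian estimates and clips the update element-wise, on a quadratic whose two coordinates have very different curvature. All hyperparameters are made up.

```python
import numpy as np

h_true = np.array([100.0, 1.0])   # per-coordinate curvature of f(θ) = 0.5 Σ h_i θ_i²
theta = np.array([1.0, 1.0])
m = np.zeros(2)                   # EMA of gradients
h = np.zeros(2)                   # EMA of diagonal Hessian estimates
lr, beta1, beta2, rho, eps = 0.3, 0.9, 0.99, 1.0, 1e-12

for _ in range(200):
    grad = h_true * theta         # ∇f of the toy quadratic
    hess_diag = h_true            # exact diagonal Hessian (estimated in practice)
    m = beta1 * m + (1 - beta1) * grad
    h = beta2 * h + (1 - beta2) * hess_diag
    update = np.clip(m / (h + eps), -rho, rho)   # precondition, then clip
    theta = theta - lr * update

print(theta)   # both coordinates ≈ 0 despite the 100x curvature gap
```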
precise model editing [ACL 2023]

MacBook := MacBook - Apple + HP
Would the results hold if we scaled up?
Where do we get the compute?
Track 1: construct scaling laws that extend down
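A sketch of what "extend down" could look like, assuming made-up loss measurements from a handful of small models and a simple loss ≈ a·N^(−α) + c form, fit by grid-searching the irreducible term c and regressing in log space:

```python
import numpy as np

N = np.array([1e7, 3e7, 1e8, 3e8])      # small-model parameter counts (made up)
loss = np.array([4.1, 3.7, 3.3, 3.0])   # corresponding eval losses (made up)

def fit_power_law(N, loss, c_grid=np.linspace(0.0, 2.5, 251)):
    """Fit loss ≈ a * N**(-alpha) + c by grid-searching c and regressing in log space."""
    best = None
    for c in c_grid:
        y = loss - c
        if np.any(y <= 0):
            continue
        slope, intercept = np.polyfit(np.log(N), np.log(y), 1)
        resid = np.sum((np.log(y) - (slope * np.log(N) + intercept)) ** 2)
        if best is None or resid < best[0]:
            best = (resid, np.exp(intercept), -slope, c)
    _, a, alpha, c = best
    return a, alpha, c

a, alpha, c = fit_power_law(N, loss)
print(f"predicted loss at 7B parameters: {a * 7e9 ** (-alpha) + c:.2f}")
```

Whether such small-scale fits extrapolate reliably is exactly the open question raised later in the deck.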
Track 2: harness idle GPUs everywhere
The problem: datacenter interconnects run at ~100 Gbps; links between geo-distributed GPUs run at ~1 Gbps

Training (1B models) is only ~2x slower than in the datacenter [NeurIPS 2022]
Track 3: fund the public good
Big Science
Levels of access for foundation models

API “cognitive scientist”

open-weight “neuroscientist”

open-source “computer scientist”


Final remarks

● Access shapes research


● Many interesting problems with API (agents) and open-weight (distillation)
● Today, most research lives within the confines of APIs and fixed weights
● Question everything: data, model architecture, training algorithm
● Goal: understand data, architecture → model behavior (hard even with full access)
● Compute: try at smaller scales + scaling laws; pool our compute
Thank you!
THE END
Train-test overlap
Upshot: can predict benchmark performance
Data information includes data processing code
Big question: will these results transfer to larger scales?
Can use surrogates to extrapolate across scales
What is the future of the open web?
