Open-Source and Science in The Era of Foundation Models
[Figure: levels of access for foundation models, capability vs. access: paper, weights; paper, API; API]
open-weight “neuroscientist”
prompt → response
present (tool use)
past (verifiers)
Tale of two agents
Hardest task: 24 hours
Reflections: dual implications of cybersecurity agents
Joon: Right. So I was born in Korea. I lived there for the first 12 years or so of my life. And my parents and I moved to Boston. So my parents were
pursuing their postdoctoral degree. They're both medical doctors. So we moved to Boston. So we lived there for five years. And then I went to high
school in New Hampshire after that, college in Pennsylvania, called Swarthmore College. It's a liberal arts college. And towards the end of my
college years, one other co-founder and I started to run a startup. So while we were running that, as we were graduating, we moved to Palo Alto. So
we lived by somebody's garage, actually, for about a year. The typical thing that you do if you're into that. The startup itself didn't pan out. But during
that year, I got to know this group of researchers and community in the computer science department at Stanford, just by through some luck and just,
yeah, through pure luck. And I started to work on some research projects with them. So I spent about a half a year, a little bit more than maybe half a
year working with them. And then I went to get my master's in computer science at UIUC, that's Illinois Urbana-Champaign. So I spent two years of
my life there. And when I was finishing that up, I moved back to Palo Alto, started my PhD as a computer science PhD student at Stanford. And I've
been here since then. So this is my year five of doing that.
Interviewer: Thank you for sharing that overview! I'd love to learn more about your experiences. Can you tell me what it was like for you
adjusting to life in Boston after moving from Korea? What were some of the challenges you faced during that transition?
Joon: It's a good question. Well, moving to a different country and different culture for any child and their parents, it's a heavy lift. But I think likely
what was sort of the most interesting was the fact that the entire family, to some extent, was sort of at this point where they were all trying to figure
out their career and so forth…
85% normalized accuracy!
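A hedged sketch of how a "normalized accuracy" of this kind can be computed: the agent's raw accuracy at predicting a participant's survey answers, divided by how consistently the participant replicates their own answers when re-asked later. The arrays and function name below are hypothetical, not the evaluation code behind this number.

```python
import numpy as np

def normalized_accuracy(agent_pred, answers_t1, answers_t2):
    # Raw accuracy: agent's predictions vs. the participant's first-pass answers.
    raw = np.mean(agent_pred == answers_t1)
    # Ceiling: how well the participant replicates their own answers later.
    ceiling = np.mean(answers_t2 == answers_t1)
    return raw / ceiling

# Hypothetical answers to five categorical survey questions.
agent_pred = np.array([1, 0, 2, 1, 3])
answers_t1 = np.array([1, 0, 2, 2, 3])
answers_t2 = np.array([1, 0, 1, 2, 3])
print(normalized_accuracy(agent_pred, answers_t1, answers_t2))  # 1.0 here
```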
Reflections: agents and API access
open-weight “neuroscientist”
Problem: impossible to retrain to get θ1′, since we only have the final weights!
Idea 3
perm(θ) = permute the hidden units defined by θ to get counterfactuals
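A minimal numpy sketch of this idea, assuming a single MLP layer with matched dimensions and a toy index-aligned cosine statistic; the function names, statistic, and test details are illustrative, not the exact procedure behind these results. The point is that permuting hidden units yields functionally equivalent counterfactual weights without any retraining, and the permutation test asks whether two models' units are more aligned than those counterfactuals allow.

```python
import numpy as np

# Permuting the hidden units of an MLP block leaves its function unchanged:
# rows of the input projection (and bias) and the matching columns of the
# output projection are shuffled together.  The permuted weights therefore
# act as counterfactual weights, what training "could have" produced,
# without ever retraining.
def perm(W_in, b, W_out, rng):
    pi = rng.permutation(W_in.shape[0])
    return W_in[pi], b[pi], W_out[:, pi]

# Toy index-aligned statistic: mean |cosine similarity| between hidden unit i
# of model A and hidden unit i of model B.  Shared lineage keeps units
# aligned (large value); independent training makes the alignment arbitrary.
def aligned_similarity(WA, WB):
    A = WA / np.linalg.norm(WA, axis=1, keepdims=True)
    B = WB / np.linalg.norm(WB, axis=1, keepdims=True)
    return np.abs((A * B).sum(axis=1)).mean()

# Permutation test: compare the observed statistic against the statistic
# computed on permuted (counterfactual) copies of model A.
def independence_p_value(WA, bA, WoutA, WB, n_perms=999, seed=0):
    rng = np.random.default_rng(seed)
    observed = aligned_similarity(WA, WB)
    null = [aligned_similarity(perm(WA, bA, WoutA, rng)[0], WB)
            for _ in range(n_perms)]
    return (1 + sum(s >= observed for s in null)) / (1 + n_perms)
```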
Not independent!
StripedHyena-Nous-7B ~ Mistral-7B-v0.1
Other findings
Miqu-70B (Mistral leak) ~ Llama-2-70B
Llama-3.1-8B ~ Llama-3.2-3B
Reflections: open-weight access
open-weight “neuroscientist”
FineWeb, SmolLM
GPT-J, GPT-NeoX, Pythia, OLMo, OLMoE
RedPajama
StarCoder
DCLM-BASELINE
MAP-Neo, OpenCoder
K2
Performance gaps
Model developers don't own the license for web data (it's copyrighted), so they can't release it!
Need compute to (re-)train to achieve the spirit of open-source
What mixture to use?
NeurIPS 2023
diagonal Hessian with clipping
ICLR 2024
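"Diagonal Hessian with clipping" reads like a Sophia-style optimizer step: a clipped update preconditioned by an EMA of a diagonal Hessian estimate. A minimal sketch under that reading, assuming the diagonal Hessian estimate is computed elsewhere (e.g., a Gauss-Newton or Hutchinson estimator refreshed every few steps); the function name, hyperparameter names, and defaults are illustrative.

```python
import numpy as np

def clipped_diag_hessian_step(theta, grad, hess_diag, state,
                              lr=1e-4, beta1=0.96, beta2=0.99,
                              rho=0.04, eps=1e-12):
    # m: EMA of gradients; h: EMA of a diagonal Hessian estimate.
    m = state.get("m", np.zeros_like(theta))
    h = state.get("h", np.zeros_like(theta))
    m = beta1 * m + (1 - beta1) * grad
    h = beta2 * h + (1 - beta2) * hess_diag
    # Precondition by the (scaled) diagonal Hessian, then clip element-wise
    # to [-1, 1]: this bounds steps where the curvature estimate is tiny or noisy.
    update = np.clip(m / np.maximum(rho * h, eps), -1.0, 1.0)
    state["m"], state["h"] = m, h
    return theta - lr * update, state
```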
precise model editing
ACL 2023
Would the results hold if we scaled up?
Where do we get the compute?
Track 1: construct scaling laws that extend down
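One reading of "extend down": fit a scaling law on cheap small-scale runs and trust it to rank design choices or extrapolate. A minimal sketch with made-up numbers, assuming the common saturating power-law form L(C) = a·C^(-b) + c and a fixed, assumed irreducible loss c.

```python
import numpy as np

# Hypothetical small-scale runs: training compute (FLOPs) vs. validation loss.
compute = np.array([1e17, 3e17, 1e18, 3e18, 1e19])
loss    = np.array([3.90, 3.62, 3.38, 3.20, 3.05])

# With an assumed irreducible loss c, log(L - c) is linear in log C,
# so the exponent b and prefactor a come from ordinary least squares.
c = 2.0
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss - c), 1)
b, a = -slope, 10 ** intercept

def predicted_loss(C):
    return a * C ** (-b) + c

# If the small-scale fit holds, it can rank data mixtures, optimizers, and
# architectures cheaply, and extrapolate to budgets we cannot afford to run.
print(predicted_loss(1e21))
```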
Track 2: harness idle GPUs everywhere
The problem
100 Gbps (in the datacenter) vs. 1 Gbps (over the internet)
Training (1B models) is only ~2x slower than in the datacenter
NeurIPS 2022
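A back-of-envelope sketch of why bandwidth is "the problem" above; the numbers (fp16 gradients, one full naive exchange per synchronization) are illustrative assumptions, not measurements from the work on this slide.

```python
# How much does link bandwidth matter for synchronizing the gradients of a
# ~1B-parameter model stored in fp16?
params = 1e9
bytes_per_value = 2                            # fp16
bits_per_sync = params * bytes_per_value * 8   # ~16 Gbit per full gradient

for name, gbps in [("datacenter link (100 Gbps)", 100),
                   ("internet link (1 Gbps)", 1)]:
    seconds = bits_per_sync / (gbps * 1e9)
    print(f"{name}: ~{seconds:.1f} s to ship one full gradient")

# Naively the slow link is ~100x worse (~16 s vs. ~0.2 s per exchange), so
# compression and communication scheduling are what bring decentralized
# training down to roughly a 2x slowdown rather than 100x.
```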
Track 3: fund the public good
BigScience
Levels of access for foundation models
open-weight “neuroscientist”