Some people claim that aesthetics don't mean anything, and are resistant to the idea that they could.  After all, aesthetic preferences are very individual. 

Sarah argues that the skeptics have a point, but they're too epistemically conservative. Colors don't have intrinsic meanings, but they do have shared connotations within a culture. There's obviously some signal being carried through aesthetic choices.

Zvi
This post kills me. Lots of great stuff, and I think this strongly makes the cut. Sarah has great insights into what is going on, then turns away from them right when following through would be most valuable. The post explains why she and an entire culture are being defrauded by aesthetics: that aesthetics are used to justify all sorts of things, including high prices and what is cool, based on things that have no underlying value, and that they contain lots of hostile subliminal messages that are driving her crazy. It's very clear. And then she... doesn't see the fnords. So close!
Wei Dai
To branch off the line of thought in this comment, it seems that for most of my adult life I've been living in the bubble-within-a-bubble that is LessWrong, where the aspect of human value or motivation that is the focus of our signaling game is careful/skeptical inquiry, and we gain status by pointing out where others haven't been careful or skeptical enough in their thinking. (To wit, my repeated accusations that Eliezer and the entire academic philosophy community tend to be overconfident in their philosophical reasoning, don't properly appreciate the difficulty of philosophy as an enterprise, etc.) I'm still extremely grateful to Eliezer for creating this community/bubble, and think that I/we have lucked into the One True Form of Moral Progress, but must acknowledge that from the outside, our game must look as absurd as any other niche status game that has spiraled out of control.
Richard_Ngo
In response to an email about what a pro-human ideology for the future looks like, I wrote up the following: The pro-human egregore I'm currently designing (which I call fractal empowerment) incorporates three key ideas:

Firstly, we can see virtue ethics as a way for less powerful agents to aggregate to form more powerful superagents that preserve the interests of those original less powerful agents. E.g. virtues like integrity, loyalty, etc. help prevent divide-and-conquer strategies. This would have been in the interests of the rest of the world when Europe was trying to colonize them, and will be in the best interests of humans when AIs try to conquer us.

Secondly, the most robust way for a more powerful agent to be altruistic towards a less powerful agent is not for it to optimize for that agent's welfare, but rather to optimize for its empowerment. This prevents predatory strategies from masquerading as altruism (e.g. agents claiming "I'll conquer you and then I'll empower you", which then somehow never get around to the second step).

Thirdly: the generational contract. From any given starting point, there are a huge number of possible coalitions which could form, and in some sense it's arbitrary which set of coalitions you choose. But one thing which is true for both humans and AIs is that each generation wants to be treated well by the next generation. And so the best intertemporal Schelling point is for coalitions to be inherently historical: that is, they balance the interests of old agents and new agents (even when the new agents could in theory form a coalition against all the old agents). From this perspective, path-dependence is a feature not a bug: there are many possible futures but only one history, meaning that this single history can be used to coordinate. In some sense this is a core idea of UDT: when coordinating with forks of yourself, you defer to your unique last common ancestor. When it's not literally a fork of yourself, there's more arb...
Daniel Tan
Research engineering tips for SWEs. Starting from a more SWE-based paradigm on writing 'good' code, I've had to unlearn some stuff in order to hyper-optimise for research engineering speed. Here's some stuff I now do that I wish I'd done starting out.

Use monorepos.

* As far as possible, put all code in the same repository. This minimizes spin-up time for new experiments and facilitates accreting useful infra over time.
* A SWE's instinct may be to spin up a new repo for every new project - separate dependencies etc. But that will not be an issue in 90+% of projects, and you pay the setup cost upfront, which is bad.

Experiment code as a journal.

* By default, code for experiments should start off in an 'experiments' folder, with each sub-folder running 1 experiment.
* I like structuring this as a journal / logbook, e.g. sub-folders can be titled YYYY-MM-DD-{experiment-name}. This facilitates subsequent lookup.
* If you present / track your work in research slides, this creates a 1-1 correspondence between your results and the code that produces your results - great for later reproducibility.
* Each sub-folder should have a single responsibility, i.e. running ONE experiment. Don't be afraid to duplicate code between sub-folders.
* Different people can have different experiment folders.
* I think this is fairly unintuitive for a typical SWE, and I would have benefited from knowing / adopting this earlier in my career.

Refactor less (or not at all).

* Stick to simple design patterns. For one-off experiments, I use functions fairly frequently, and almost never use custom classes or more advanced design patterns.
* Implement only the minimal necessary functionality. Learn to enjoy the simplicity of hardcoding things. YAGNI.
* Refactor when - and only when - you need to or can think of a clear reason.
  * Being OCD about code style / aesthetic is not a good reason.
  * Adding functionality you don't need right this moment is not a good reason.
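A minimal sketch of the "experiments as a journal" layout described above (illustrative only; the helper name and file choices are assumptions, not from the comment):

```python
# Hypothetical helper: create a dated, single-responsibility experiment folder
# inside the monorepo's experiments/ journal (YYYY-MM-DD-{experiment-name}).
from datetime import date
from pathlib import Path

def new_experiment(name: str, root: str = "experiments") -> Path:
    folder = Path(root) / f"{date.today().isoformat()}-{name}"
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "run.py").touch()    # one script = one experiment
    (folder / "notes.md").touch()  # link results / slides here for reproducibility
    return folder

if __name__ == "__main__":
    print(new_experiment("my-first-sweep"))
```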
Elizabeth
According to a friend of mine in AI, there's a substantial contingent of people who got into AI safety via Friendship is Optimal. They will only reveal this after several drinks. Until then, they will cite HPMOR. Which means we are probably overvaluing HPMOR and undervaluing FIO.
TsviBT
If there are some skilled/smart/motivated/curious ML people seeing this, who want to work on something really cool and/or that could massively help the world, I hope you'll consider reaching out to Tabula. https://www.lesswrong.com/posts/SsLkxCxmkbBudLHQr/


Recent Discussion

[Thanks to Steven Byrnes for feedback and the idea for section §3.1. Also thanks to Justis from the LW feedback team.]

Remember this?

Or this?

The images are from WaitButWhy, but the idea was voiced by many prominent alignment people, including Eliezer Yudkowsky and Nick Bostrom. The argument is that the difference in brain architecture between the dumbest and smartest human is so small that the step from subhuman to superhuman AI should go extremely quickly. This idea was very pervasive at the time. It's also wrong. I don't think most people on LessWrong have a good model of why it's wrong, and I think because of this, they don't have a good model of AI timelines going forward.

1. Why Village Idiot to Einstein is a Long Road: The Two-Component

...

Actually, even if LLMs do scale to AGI, we might find that a civilisation run by AGIs is unlikely to appear. The current state of the world's energy industry and computing technology might not allow the AGI to generate answers to the many tasks that are necessary to sustain the energy industry itself. Attempts to optimize the AGI would require making it more energy efficient, which appears to push it toward neuromorphic designs, which in turn could imply that the AGIs running the civilisation are to be split into many brains, resemble humanity a... (read more)

The usual explanation of probability theory goes like this:

There is this thing called Probability Space, which consists of three other things:

  1. Sample Space - some non-empty set
  2. Event Space - a set of subsets of the Sample Space
  3. Probability Function - a measure function over the elements of the Event Space.

And then several examples are given of how we can apply this mathematical model to real-world situations.

For instance, for a die roll the appropriate Sample Space would be {1; 2; 3; 4; 5; 6}. For an Event Space we can use the power set of the Sample Space, and the probability function has to give every elementary event equal value: P({i}) = 1/6 for each i in the Sample Space.
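As an illustrative sketch (not from the original post), here is that die-roll probability space written out in Python, assuming the Event Space is the full power set of the Sample Space:

```python
# Illustrative sketch: probability space for a fair six-sided die.
from itertools import chain, combinations
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

# Event Space: every subset of the sample space (the power set).
event_space = [
    frozenset(c)
    for c in chain.from_iterable(
        combinations(sorted(sample_space), r) for r in range(len(sample_space) + 1)
    )
]

# Probability Function: each elementary event gets weight 1/6,
# so an event's probability is just its size divided by 6.
def probability(event: frozenset) -> Fraction:
    return Fraction(len(event), len(sample_space))

assert probability(frozenset({2, 4, 6})) == Fraction(1, 2)  # P(even) = 1/2
assert len(event_space) == 2 ** 6                            # 64 events in total
```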

The point of such examples is to give students an intuitive understanding of how to apply the math of set theory towards...

cubefox
I think picking axioms is not necessary here and in any case inconsequential. "Bachelors are unmarried" is true whether or not I regard it as some kind of axiom or not. It seems the same holds for tautologies and probabilistic laws. Moreover, I think neither of them is really "entangled" with reality, in the sense that they are compatible with any possible reality. They merely describe what's possible in the first place. That bachelors can't be married is not a fact about reality but a fact about the concept of a bachelor and the concept of marriage.

Suppose you are not instrumentally exploitable "in principle", whatever that means. Then it arguably would still be epistemically irrational to believe that "Linda is a feminist and a bank teller" is more likely than "Linda is a bank teller". Moreover, it is theoretically possible that there are cases where it is instrumentally rational to be epistemically irrational. Maybe someone rewards people with (epistemically) irrational beliefs. Maybe theism has favorable psychological consequences. Maybe Pascal's Wager is instrumentally rational. So epistemic irrationality can't in general be explained with instrumental irrationality, as the latter may not even be present.

I don't think we have to appeal to reality. Suppose the concept of bachelorhood and marriage had never emerged. Or suppose humans had never come up with logic and probability theory, and not even with language at all. Or humans had never existed in the first place. Then it would still be true that all bachelors are necessarily unmarried, and that tautologies are true. Moreover, it's clear that long before the actual emergence of humanity and arithmetic, two dinosaurs plus three dinosaurs already were five dinosaurs. Or suppose the causal history had only been a little bit different, such that "blue" means "green" and "green" means "blue". Would it then be the case that grass is blue and the sky is green? Of course not. It would only mean that we say "grass is

I think picking axioms is not necessary here and in any case inconsequential.

By picking your axioms you logically pinpoint what you are talking about in the first place. Have you read Highly Advanced Epistemology 101 for Beginners? I'm noticing that our inferential distance is larger than it should be otherwise.

"Bachelors are unmarried" is true whether or not I regard it as some kind of axiom or not.

No, you are missing the point. I'm not saying that this phrase has to be an axiom itself. I'm saying that you need to somehow axiomatically define your individual words... (read more)

If probability is in the map, then what is the territory? What are we mapping when we apply probability theory?

"Our uncertainty about the world, of course."

Uncertainty, yes. And sure, every map is, in a sense, a map of the world. But can we be more specific? Say, for a fair coin toss, what particular part of the world do we map with probability theory? Surely it's not the whole world at the same time, is it?

"It is. You map the whole world. Multiple possible worlds, in fact. In some of them the coin is Heads in the others it's Tails, and you are uncertain which one is yours."

Wouldn't that mean that I need to believe in some kind of multiverse to reason about probability? That doesn't sound...

That's not how people usually use these terms. The uncertainty about the state of the coin after the toss is describable within the framework of possible worlds, just as the uncertainty about a future coin toss is, but the uncertainty about a digit of pi isn't.

Oops, that's my bad for not double-checking the definitions before I wrote that comment. I think the distinction I was getting at was more like known unknowns vs unknown unknowns, which isn't relevant in platonic-ideal probability experiments like the ones we're discussing here, but is useful in real-world situa... (read more)

The other day I discussed how high monitoring costs can explain the emergence of “aristocratic” systems of governance:

Aristocracy and Hostage Capital

Arjun Panickssery · Jan 8
There's a conventional narrative by which the pre-20th century aristocracy was the "old corruption" where civil and military positions were distributed inefficiently due to nepotism until the system was replaced by a professional civil service after more enlightened thinkers prevailed ...

An element of Douglas Allen’s argument that I didn’t expand on was the British Navy. He has a separate paper called “The British Navy Rules” that goes into more detail on why he thinks institutional incentives made them successful from 1670 to 1827 (i.e. for most of the age of fighting sail).

In the Seven Years’ War (1756–1763) the British had a 7-to-1 casualty...

Solar has an average capacity factor in the US of about 25%. Naively, you might think that to turn this into a highly-available power source, you just need to have 4x the solar panels, plus enough batteries to store 75% of a day’s worth of power. E.g., for each continuous megawatt you want to supply, you need 4 MW of solar panels, and 18 MWh of batteries. During the day, you supply 1 MW from the panels and use the other 3 MW to charge the batteries. Overnight, you discharge the batteries to supply continuous power.
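A quick sketch of that naive sizing arithmetic (the 25% capacity factor and the 1 MW target are the post's numbers; the rest is straightforward arithmetic):

```python
# Naive sizing: 1 MW of continuous supply from solar at a 25% capacity factor.
capacity_factor = 0.25   # average US solar capacity factor (from the post)
target_mw = 1.0          # continuous megawatts to supply

panels_mw = target_mw / capacity_factor                 # 4.0 MW of panels
battery_mwh = target_mw * 24 * (1 - capacity_factor)    # 18.0 MWh for the other 75% of the day

print(f"{panels_mw:.1f} MW of panels, {battery_mwh:.1f} MWh of batteries")
```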

Turns out it’s not quite that simple. First, the capacity factor varies throughout the year, as the days get shorter in winter. So you at least need to build enough that even...

(I really like how gears-y your comment is, many thanks and strong-upvoted.)

What if they released the new best LLM, and almost no one noticed?

Google seems to have pulled that off this week with Gemini 2.5 Pro.

It’s a great model, sir. I have a ton of reactions, and it’s 90%+ positive, with a majority of it extremely positive. They cooked.

But what good is cooking if no one tastes the results?

Instead, everyone got hold of the GPT-4o image generator and went Ghibli crazy.

I love that for us, but we did kind of bury the lede. We also buried everything else. Certainly no one was feeling the AGI.

Also seriously, did you know Claude now has web search? It’s kind of a big deal. This was a remarkably large quality of life improvement.

Table of Contents

  1. Google Fails Marketing Forever. Gemini Pro 2.5? Never
...

This is helpful, thanks. Bummer though...

Mis-Understandings
Google is a marketing company; it sells marketing to other people. So it actually does say something about Google as an organization that their dogfood marketing is only mediocre.
MichaelDickens
You could technically say Google is a marketing company, but Google's ability to sell search ads doesn't depend on being good at marketing in the traditional sense. It's not like Google is writing ads themselves and selling the ad copy to companies.
Mis-Understandings
Exactly. It is notable that Google hosts so much ad copy but is bad at it. You would think that they could get good by imitation, but it turns out that, no, imitating good marketing is hard.

[This is our blog post on the papers, which can be found at https://transformer-circuits.pub/2025/attribution-graphs/biology.html and https://transformer-circuits.pub/2025/attribution-graphs/methods.html.]

Language models like Claude aren't programmed directly by humans—instead, they're trained on large amounts of data. During that training process, they learn their own strategies to solve problems. These strategies are encoded in the billions of computations a model performs for every word it writes. They arrive inscrutable to us, the model's developers. This means that we don't understand how models do most of the things they do.

Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example:

  • Claude can speak dozens of languages. What language, if any, is it using "in its
...

In the poetry case study, we had set out to show that the model didn't plan ahead, and found instead that it did.

I found it shocking that they didn't think the model plans ahead. The poetry ability of LLMs since at least GPT-2 is well beyond what feels possible without anticipating a rhyme by planning at least a handful of tokens in advance.

Joseph Miller
DeepMind says boo SAEs, now Anthropic says yay SAEs![1] Reading this paper pushed me a fair amount in the yay direction.

We may still be at the unsatisfying level where we can only say "this cluster of features seems to roughly correlate with this type of thing" and "the interaction between this cluster and this cluster seems to mostly explain this loose group of behaviors". But it looks like we're actually pointing at real things in the model. And therefore we are beginning to be able to decompose the computation of LLMs in meaningful ways. The Addition Case Study is seriously cool and feels like a true insight into the model's internal algorithms. Maybe we will further decompose these explanations until we can get down to satisfying low-level descriptions like "this mathematical object is computed by this function and is used in this algorithm".

Even if we could still interpret circuits at this level of abstraction, humans probably couldn't hold in their heads all the relevant parts of a single forward pass at once. But maybe AIs could, or maybe that won't be required for useful applications. The prominent error terms and simplifying assumptions are worrying, but maybe throwing enough compute and hill-climbing research at the problem will eventually shrink them to acceptable sizes.

It's notable that this paper contains very few novel conceptual ideas and is mostly just a triumph of engineering schlep, massive compute and painstaking manual analysis.

1. ^ This is obviously a straw man of both sides. They seem to be thinking about it from pretty different perspectives. DeepMind is roughly judging them by their immediate usefulness in applications, while Anthropic is looking at them as a stepping stone towards ambitious moonshot interp.
Viliam
A few months ago I complained that automatic translation sucks when you translate between two languages which are not English, and that the result is the same as if you translated through English. When translating between two Slavic languages, even for sentences where you practically just had to transcribe Cyrillic to Latin and change a few vowels, both Google Translate and DeepL succeeded in randomizing the word order, misgendering every noun, and mistranslating concepts that happen to be translated to English as the same word.

I tried some translation today, and from my perspective, Claude is almost perfect. Google Translate sucks the same way it did before (maybe I am not being fair here, I did not try exactly the same text), and DeepL is somewhere in between. You can give Claude a book and tell it to translate it to another language, and the result is pleasant to read. (I haven't tried other LLMs.)

This seems to support the hypothesis that Claude is multilingual in some deep sense. I assume that Google Translate does not prioritize English on purpose; it's just that it has way more English texts than any other language, so if e.g. two Russian words map to the same English word, it treats that as strong evidence that those two words are the same. Claude can see that the two words are used differently, and can match them correctly to the corresponding two words in a different language. (This is just a guess; I don't really understand how these things work under the hood.)
Julian Bradshaw
Copying over a comment from Chris Olah of Anthropic (olah3) on Hacker News that I thought was good, along with the parent comment (by fpgaminer):

Apologies for the late announcement.

Come on out to the ACX (Astral Codex Ten) Montreal Meetup! This week, we're discussing Is Brain Size Morally Relevant?, by Brian Tomasik. And hopefully determine the answer to this question once and for all.

Feel free to suggest topics or readings for future meetups on this form here.

Venue: Ye Olde Orchard Pub & Grill, 20 Prince Arthur St W.
Date & Time: Saturday, March 29th, 2025, 1PM.

RSVP by clicking "Going" at the top of this post.

Send a message on our Montreal Rationalists Discord on channel #meetup-general if you have trouble finding us or any other issues.

Please also join the mailing list and our Discord server if you haven't already. We host biweekly ACX Montreal meetups, so join us if you don't want to miss any of them!

PS: Add May 10th to your calendar, our forever-exciting biyearly ACX Everywhere Montreal meetup!

Even an AGI "aligned" to a purpose which doesn't imply humanity's survival, but does require the AGI itself to achieve difficult feats like transforming the entire Solar System into something computing as many digits of pi as possible, would obviously still need to produce the computing systems and gather the energy necessary for those systems' work. As I mentioned in my previous question, all the electrical energy generated in the world cannot sustain more than  agents who interact with GPT-3 a hundred times a day while using 3 Wh per interaction. The OpenAI o3 model apparently requires more than 1 kWh per task.
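For a sense of where a bound like that comes from, here is a rough back-of-the-envelope sketch (the ~30,000 TWh/year figure for world electricity generation is my assumption; the 100 interactions/day and 3 Wh/interaction are from the comment):

```python
# Back-of-the-envelope: how many such agents could world electricity generation support?
world_generation_twh_per_year = 30_000   # assumed rough figure for world electricity generation
wh_per_interaction = 3                   # from the comment
interactions_per_day = 100               # from the comment

wh_per_agent_per_day = wh_per_interaction * interactions_per_day      # 300 Wh/day per agent
world_wh_per_day = world_generation_twh_per_year * 1e12 / 365         # ~8.2e13 Wh/day

print(f"~{world_wh_per_day / wh_per_agent_per_day:.1e} agents")       # on the order of 1e11
```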

However, the ARC-AGI task set shows the following trend: as o3 models trained under the same paradigm increased the success rate on ARC-AGI-1 tasks from 10%...
