Often you can compare your own Fermi estimates with those of other people, and that's sort of cool, but what's way more interesting is when they share the variables and models they used to get to their estimates. That lets you update your own model in a deeper way.
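To make that concrete, here's a toy sketch (all numbers invented) of the classic piano-tuner Fermi estimate with its variables exposed, so two estimators can see exactly which assumption drives their disagreement:

```python
# Toy Fermi estimate (all numbers invented): piano tuners in a big city.
# Exposing the variables makes it obvious where two estimators' models diverge.
def piano_tuners(population, people_per_household, piano_ownership_rate,
                 tunings_per_piano_per_year, tunings_per_tuner_per_year):
    households = population / people_per_household
    pianos = households * piano_ownership_rate
    return pianos * tunings_per_piano_per_year / tunings_per_tuner_per_year

print(piano_tuners(3e6, 2.5, 0.05, 1, 1000))  # my model:   ~60 tuners
print(piano_tuners(3e6, 2.5, 0.20, 1, 1000))  # your model: ~240; we differ only on ownership rate
```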
The claim is that most applications aren't internal usage of AI for AI development and thus can be made trivially safe.
Not that most applications of AI for AI development can be made trivially safe.
When I introduce people to plans like QACI, they often have objections like "How is an AI going to do all of the simulating necessary to calculate this?" or "If our technology is good enough to calculate this with any level of precision, we can probably just upload some humans." or just "That's not computable."
I think these kinds of objections are missing the point of formal goal alignment and maybe even outer alignment in general.
To formally align an ASI to human (or your) values, we do not need to actually know those values. We only need to strongly point to them.
AI will figure out our values. Whether it's aligned or not, a recursively self-improving AI will eventually get a very good model of our values, as part...
What do you think of this post by Tammy?
It seems like someone could definitely be wrong about what they want (unless normative anti-realism is true and such a sentence has no meaning). For example consider someone who thinks it's really important to be faithful to God and goes to church every Sunday to maintain their faith and would use a superintelligent religious AI assistant to help keep the faith if they could. Or maybe they're just overconfident about their philosophical abilities and would fail to take various precautions that I think are importan...
Basically all ideas/insights/research about AI are potentially exfohazardous. At least, it's pretty hard to know when some ideas/insights/research will actually make things better; especially in a world where building an aligned superintelligence (let's call this work "alignment") is considerably harder than building any superintelligence (let's call this work "capabilities"), and there are a lot more people trying to do the latter than the former, and they have a lot more material resources.
Ideas about AI, let alone insights about AI, let alone research results about AI, should be kept to private communication between trusted alignment researchers. On LessWrong, we should focus on teaching people the rationality skills which could help them figure out insights that help them build any superintelligence, but are more likely to first give them insights...
While I've softened my position on this in the last year, I want to give a big +1 to this response, especially these two points:
...
- It's genuinely hard to come up with ideas that help capabilities a lot. I think you are severely underestimating how hard it is, and how much insight is required. I think one issue here is that most papers on arXiv are garbage and don't actually make any progress, but those papers are not the ones pushing AGI forward anyway.
- [..]
- High-level ideas are generally not that valuable in and of themselves. People generally learn...
The beauty industry offers a large variety of skincare products (marketed mostly at women), differing both in alleged function and (substantially) in price. However, it's pretty hard to test for yourself how much any of these products actually help. The feedback loop for things like "getting fewer wrinkles" is very long.
So, which of these products are actually useful and which are mostly a waste of money? Are more expensive products actually better or just have better branding? How can I find out?
I would guess that sunscreen is definitely helpful, and using some moisturizers for face and body is probably helpful. But, what about night cream? Eye cream? So-called "anti-aging"? Exfoliants?
Retinoids. They're a big family of compounds, but I would go with adapalene, which has a better safety/side-effect profile than anything else. It has less scientific evidence for anti-aging than other retinoids (and is not marketed for that purpose), but I've tried it myself (bought it for acne), and it has very obvious anti-wrinkle effects within about a week. You can get generic 0.1% adapalene gel on Amazon for about $12 per 1.6 oz.
(I'm a little worried about long term effects, i.e. could the increased skin turnover mean faster aging in the long run, but can't seem to find any data or discussion about it.)
What is the mysterious impressive new ‘gpt2-chatbot’ from the Arena? Is it GPT-4.5? A refinement of GPT-4? A variation on GPT-2 somehow? A new architecture? Q-star? Someone else’s model? Could be anything. It is so weird that this is how someone chose to present that model.
There was also a lot of additional talk this week about California’s proposed SB 1047.
I wrote an additional post extensively breaking that bill down, explaining how it would work in practice, addressing misconceptions about it and suggesting fixes for its biggest problems along with other improvements. For those interested, I recommend reading at least the sections ‘What Do I Think The Law Would Actually Do?’ and ‘What are the Biggest Misconceptions?’
As usual, lots of other things happened as well.
Regarding Cortez and the Aztecs, it is of interest to note that Cortez's indigenous allies (enemies of the Aztecs) actually ended up in a fairly good position afterwards.
From https://en.wikipedia.org/wiki/Tlaxcala
For the most part, the Spanish kept their promise to the Tlaxcalans. Unlike Tenochtitlan and other cities, Tlaxcala was not destroyed after the Conquest. They also allowed many Tlaxcalans to retain their indigenous names. The Tlaxcalans were mostly able to keep their traditional form of government.
Hello, friends.
This is my first post on LW, but I have been a "lurker" here for years and have learned a lot from this community, which I value.
I hope this isn't pestilent, especially for a first-time post, but I am requesting information/advice/non-obvious strategies for coming up with emergency money.
I wouldn't ask except that I'm in a severe financial emergency and I can't seem to find a solution. I feel like every minute of the day I'm butting my head against a brick wall trying and failing to figure this out.
I live in a very small town in rural Arizona. The local economy is sustained by fast food restaurants, pawn shops, payday lenders, and some huge factories/plants that are only ever hiring engineers and other highly specialized personnel.
I...
Finding a loan to move somewhere with jobs is probably your best bet. This may devolve into asking around your social circles as well, which is a big hit to your pride, and many probably won't believe you'll wind up with the means to pay it back. Minimize the cost of the move by getting rid of non-essential belongings. It is probably somewhat easier these days to line up a faraway job via Zoom interviewing. Quantity over quality.
One thing that I think is non-obvious: if you lay out the case for the loan in detail, that demonstrates intelligence and conscientiou...
Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, Sarah, and @Guillaume Corlouer for suggestions on this writeup.
What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because...
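As a rough illustration of what "belief updating over hidden states of the data-generating process" means, here's a minimal sketch; this is a generic toy example, not the post's actual setup, and the HMM parameters are invented:

```python
# Minimal sketch: Bayesian belief updating over the hidden states of a toy
# two-state HMM. The transition/emission matrices are invented for illustration;
# the post's claim is that transformers come to represent the geometry of these
# belief states when trained on next-token prediction.
import numpy as np

T = np.array([[0.9, 0.1],    # P(next hidden state | current hidden state)
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],    # P(emitted token | hidden state); rows = states
              [0.1, 0.9]])

def update_belief(belief, token):
    """One filtering step: push the belief through the dynamics, then condition on the observed token."""
    predicted = belief @ T               # propagate belief through transitions
    posterior = predicted * E[:, token]  # weight by likelihood of the token
    return posterior / posterior.sum()   # renormalize to a distribution

belief = np.array([0.5, 0.5])    # uniform prior over hidden states
for token in [0, 1, 1, 0]:       # an example observed token sequence
    belief = update_belief(belief, token)
    print(belief)
```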
This is a great question, and one of the things I'm most excited about using this framework to study in the future! I have a few ideas but nothing to report yet.
But I will say that I think we should be able to formalize exactly what it would mean for a transformer to create/discover new knowledge, and also to apply the structure from one dataset and apply it to another, or to mix two abstract structures together, etc. I want to have an entire theory of cognitive abilities and the geometric internal structures that support them.
Announcing the first academic Mechanistic Interpretability workshop, held at ICML 2024! I think this is an exciting development that's a lagging indicator of mech interp gaining legitimacy as an academic field, and a good chance for field building and sharing recent progress!
We'd love to get papers submitted if any of you have relevant projects! Deadline May 29, max 4 or max 8 pages. We welcome anything that brings us closer to a principled understanding of model internals, even if it's not "traditional" mech interp. Check out our website for example topics! There's $1750 in best paper prizes. We also welcome less standard submissions, like open source software, models or datasets, negative results, distillations, or position pieces.
And if anyone is attending ICML, you'd be very welcome at the workshop!...
This is a thread for updates about the upcoming LessOnline festival. I (Ben) will be posting bits of news and thoughts, and you're also welcome to make suggestions or ask questions.
If you'd like to hear about new updates, you can use LessWrong's "Subscribe to comments" feature from the triple-dot menu at the top of this post.
Reminder that you can get tickets at the site for $400 minus your LW karma in cents.
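For example (a sketch of the arithmetic, assuming, though it isn't stated above, that the discount can't push the price below zero):

```python
def lessonline_ticket_price(karma):
    # $400 base price, minus one cent per point of LW karma.
    # Assumption (not stated above): the price floors at $0.
    return max(0.0, 400.0 - karma * 0.01)

print(lessonline_ticket_price(10_000))  # 10,000 karma -> $300.0
```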
Launched a few days ago, the plan is:
Happy to get feedback on this, still figuring out what exactly helps parents and how to set it up right.