Discussion about this post

Samuel Hammond:

Your analogy breaks down, as the "safety team" wasn't an internal safety bureaucracy or watchdog, but OpenAI's Superalignment research project. OpenAI has a separate "trust and safety" team that is closer to the discrete watchdog you mention.

It's true that it's normally better to fuse "safety" with core engineering teams when building anything big and important, especially when the "safety" is about making the core product safe. The Superalignment project was something different: it was doing concentrated research on how to align and interpret powerful models per se. It's similar to how Anthropic has its core Claude team and a separate team of researchers focused on interpretability. The mech interp team's role isn't to be an internal watchdog; it's an independent research effort with its own findings and purpose.

Eskild:

The modern world, and this wonderful post, have me thinking of a quote by Kurt Vonnegut:

“We have to continually be jumping off cliffs and developing our wings on the way down.”

There is no other way...

