I flinch at this, as it seems to me to dilute the phrase "AI alignment", which was coined as a term of art for the technical directing of an optimization process's objectives.
<!-- Is it possible for crypto to successfully coordinate games involving a coalition of misaligned humans and AIs?
The value of credible commitments (mechanisms) shines the most where players have the least trust, and trust originates from audibility and int
This (and a bunch of the below) feels really weird to me. Sometimes in my field we joke about how coverage of AI often goes "AI is very interesting, it might lead to job loss or redistribution of wealth or the literal destruction of everything we know and love. More on self driving cars at 7", and this doc seems to me to have some of this property.
From my perspective -- as someone who thinks that the default outcome we're steaming towards is the literal destruction of everything we know and love (while also thinking that with full understanding of AI we could attain a technological eutopia shockingly quickly) -- the question first and foremost would be something like "is all humanity facing a grave and pressing peril? and, if so, what coordination powers that humanity has scraped together so far can be brought to bear to face it, and how?"
I would start from the other end: what is the technical problem to be solved? How would we think about solving it in a case where the stakes are low? Only then should we consider the high-stakes questions, because frankly discussions of world-ending scenarios are *enormously* distracting and suck up huge amounts of attention, leaving little time to consider the real nature of the problem to be solved.
(though as i noted in telegram, i'm also happy to back off and avoid a too-many-cooks-in-the-kitchen situation, and help out with some sessions, and am only giving my commentary per a request that I do so)
these sure aren't how I'd go about introducing AI alignment! "ethics" and "assumption of commitments" seem mostly tangential to alignment, to me, and "decision theory" has a relationship but it's not a very direct one and wouldn't show up in an intro session.
Would it be correct to say that ethics and alignment are mostly independent, because while we might have several workable ethical schemes, we have no workable means of aligning artificial minds to any of them? So the problem is that we don't know how to bind the AI to those schemes, and making the schemes more elaborate or elegant doesn't help.
We can't rely on "but we can just train the AI to care", because there are good reasons to believe that our training methods will not reliably cause the AI to care, and this is the heart of the problem.
seems to me like a sideshow (albeit one that people probably want a session on); if it were me designing the agenda I'd give it much less focus
this seems to me to do a bit of the "AI: job loss, wealth redistribution, and the literal destruction of everything" thing (unless someone thinks that cryptography defends against literal superintelligent adversaries that could rapidly destroy the biosphere regardless of what cryptography you're running).
if it were me, i'd be careful to treat the following topics very separately:
1. how do our coordination mechanisms interplay with weak AIs in whatever remains of the short-term?
2. what coordination feats can we pull off before the transformative AIs show up, that can help humanity build AIs that are friendly?
where stuff like privacy concerns fall squarely in (1). and where i note again that it feels weird to me to mix them, as they carry such different levels of gravity.
i flinch at the word "conscious" here, as philosophical questions about consciousness are a great derailer of conversations about how to avoid literally everything we know and love being destroyed, for some reason
and again, this section seems to mix and match a bunch of stuff that's like "how do we help teachers know whether their kids used an LLM on their homework? and how do we avoid the literal destruction of everything we know and love?", which feels super weird and off-putting to me.
This assertion needs some evidence. I expect that AI development will quickly trend towards more and more agentic behavior (current LLMs are at best okay-ish at driving agents, but they were never trained to do so. We might soon see LLMs explicitly trained for driving agentic behavior, though.)
Agree that this should not have high priority in this context.
(AI consciousness is mostly relevant to determine if AIs can be moral patients and how they should be treated from an ethical standpoint. It has no bearing on their performance or impact otherwise, though).
I guess the question is: is this an event about 1) addressing AGI existential risk, or is it an event that 2) explores what the crypto community could do to decrease some specific harms of advanced AI?
It is easy to come up with modest proposals for (2), e.g. securing information infrastructures via verification and timestamping, or networks of trust.
But I get the impression that the event aims for (1), asking how incentive and commitment mechanisms enabled by blockchain tech can steer AGI.
Fundamental question here: why would a potentially unaligned AGI not simply choose to ignore any incentives/constraints from the crypto infrastructure?
I think it is a very hard yet crucially important question to tackle.
The first angle of attack naturally comes from the vulnerability of the infra itself. Infra is written by human beings and is naturally fallible. Given the popular definition of AGI, it should be able to exploit the vulnerabilities more efficiently than all the human hackers combined. The only answer seems to be how well we can steer a more aligned and potentially less capable version of AI to bulletproof the infrastructure.
The second angle of attack is much trickier. Since so much of human civilization runs on other infrastructure, it seems to me that a runaway AGI can simply ignore whatever infrastructure we are trying to use to put a restraint on it.
A potential solution might be making such infrastructure necessary for the AGI to achieve its world-conquering mission. For example, if the majority of GPU or future AI computing power is locked inside such an infra network, then this AGI has to participate.
love this framing, in particular the idea that we can in some way leverage collective intelligence (i.e. intelligence of existing humans within some kind of incentivized / coordinated system) to steer work on AI systems
Are there any particularly good papers showing AI developing more agentic behavior? Thinking of e.g. this paper but I've heard mixed things about the results (don't feel qualified to know which perspective is correct yet):
https://arxiv.org/abs/2211.10851
Nate had some really interesting points the other night about whether or not consciousness is really even related to or relevant for agency.
I don't know where I stand on it but I think it might be good to separate the question of consciousness into another stream (vs. explicitly including it in crypto x AI)
This is really interesting to consider, we often talk about the value of pluralism in web3 and how we can achieve more if we have a wide range of coordinated local systems that solve their own independent problems, and eventually through that evolutionary process more global problems.
What would it look like to have multiple adversarial AGIs that essentially compete with each other in order to reveal new information / better strategies? Is this even useful / necessary when thinking of true AGI?
There must be at least *some* room for the "how to defend against AI that is merely extremely annoying and not an existential threat" question though
I think we could distinguish between consciousness as a metaphysical claim, and the idea of an agent which has an internal representation of self and world.
The latter is definitely in-scope, but the former gets into philosophical territory that might be interesting but where we're unlikely to make any progress.
I think this is important in terms of the "orthogonality thesis", or the idea that superintelligent AIs would not automatically converge on ethical viewpoints simply because they're superintelligent. It should still be true that superintelligent agents would have private knowledge and different histories, and these differences in initial conditions ought to cause them to behave differently, possibly *very* differently from each other.
IMHO, just like other tools, crypto should be used as a means to the end that is AI alignment. We are not trying to align crypto and AI, but achieve AI alignment.
> Both the cryptoeconomics research community and the AI safety / new cyber-governance / existential risk community are trying
Some of the examples below (for example credible auctions, programmable privacy, and identity) will not seem directly related to AI to people unfamiliar with the crypto side of things.
> Both the cryptoeconomics research community and the AI safety / new cyber-governance / existential risk community are trying
“Both the cryptoeconomics research community and the AI safety / new cyber-governance / existential risk community are trying to tackle what is fundamentally the same problem: How can we regulate a very complex and very smart system with unpredictable emergent properties using a very simple and dumb system whose properties once created are inflexible”
The above excerpt from Vitalik's “Why Cryptoeconomics and X-Risk Researchers Should Listen to Each Other More” might be a better framing angle.
Here in Zuzalu, we attempt to explore their intersection from first principles. During a
“Both the cryptoeconomics research community and the AI safety / new cyber-governance / existential risk community are trying to tackle what is fundamentally the same problem: How can we regulate a very complex and very smart system with unpredictable emergent properties using a very simple and dumb system whose properties once created are inflexible”
Again, this might be a helpful angle
1. We should not overestimate our ability to sense our location on the exponential curve.
2. Regardless of whether the exponential curve actually takes off, this would be an important direction of research from the x-risk angle.
Seconding the point that philosophical discussion of consciousness can easily become intractable and often leads to claims that can be neither validated nor falsified.
<!-- Is it possible for crypto to successfully coordinate games involving a coalition of misaligned humans and AIs?
This is the most interesting question to me, because it is essentially the hardest problem out there and I will be much more optimistic about AI x-risk if it is theoretically possible
CryptoXai.wtf
From Asimov's Laws to Ethereum’s Protocol: [Re]searching the intersection where crypto meets AI alignment.
Both the cryptoeconomics research community and the AI safety / new cyber-governance / existential risk community are trying to tackle what is fundamentally the same problem: How can we regulate a very complex and very smart system with unpredictable emergent properties using a very simple and dumb system whose properties once created are inflexible. - Vitalik Buterin, Why Cryptoeconomics and X-Risk Researchers Should Listen to Each Other More (2016)
Here in Zuzalu, we attempt to explore their intersection from first principles. During an evening whiteboarding session at a Pi-rate Ship pop-up hackerhouse, we, a group of humans, started by brainstorming the core concepts that underpin the foundations of both fields. We arrived at a collective mindmap for Crypto X AI, taking inspiration from the MEV mindmap: undirected traveling salesman. This exploration leads us to a continuing journey into a future where crypto mechanisms become increasingly conscious, and AI plays a transformative role in prediction and alignment.
This is just the beginning of our journey exploring the convergence of crypto and AI alignment research... If you would like to contribute to our collective mindmap, ping @sxysun1 on Twitter.
Agenda (WIP)
Time: 11:00 - 19:30 (GMT+2) on Sunday, May 7, 2023
Location: The Lighthouse, Zuzalu
Livestream: zuzalu.streameth.org
Disclaimer: No finality on the agenda yet, see you in MEV-time. ;)
Pre-game: Wanna have your questions answered by the speakers? Want the event to focus on what you are interested in? Make yourself heard!
Abstract: Discussions of risks from AI are often emotionally charged, making it difficult to have a common understanding of what the concrete risks are and what capabilities will lead to these risks. In this talk, Daniel will provide an opinionated view on the history and future of these capabilities/risks. The goal will be to align the workshop attendees on concrete scenarios to discuss.
Abstract: Laying out hypothetical scenarios of dystopian and utopian outcomes of AI development on humans, and potential paths as to how to get there.
Dystopias: FOOM Doom, medium-speed descent into madness, locked in: totalitarianism, stagnation, civilizational decline, the human Moloch: what might cause the above
Utopias: Fun theory, CEV and other theories of aggregating human preferences, "Minimal AI government" ideas, Human coordination: what might cause the above
Abstract: Competing existential risks, falsifiability, near-misses, and sharing the planet with cognitively more advanced agents. With a side of LoRa, Markets and Predictions. Understanding the relative sizes of x-risks is crucial to guiding policy. Increasing AI capabilities have two effects on this: they increase the probability that AI destroys us at some point, while potentially also preventing other things that would destroy us. Credibly signalling who understands the relative size of these two counteracting forces on human welfare is crucial for our future. This talk proposes an initial mechanism towards this in the form of fine-tunings of LLMs that reflect beliefs and are judged on their ability to predict tomorrow's news given today's.
Abstract: Consciousness is a pre-scientific phenomenon similar to alchemy. What would it take to transition to a principled chemistry of phenomenology? This talk will survey “what kind of problem” consciousness is, what we might expect from a mature science of consciousness, and my solutions thus far. Implications for “what kind of thing humans are” and AI alignment will be discussed.
Abstract: What progress has the AI alignment field made over time? How have its problems been formulated, reframed, and solved over time? What are some of the fundamental obstacles? This talk presents an overview of the history of the field and current lines of inquiry.
Abstract: I'll share some thoughts about AI safety, shaped by a year's leave at OpenAI to work on the intersection of AI safety and theoretical computer science.
Abstract: Open Agency Architecture is a bold theory and proposal for AI alignment that requires a massive and wide ranging formal-modeling enterprise that integrates into a global world-model. OAA systems do not deploy the trained ML system itself, but instead aim to constrain powerful ML systems to deploy verifiably aligned, less powerful outputs. Our plan is to develop OAA by iterating on smaller, domain-specific applications that can find immediate use as institutional decision-making tools and provide OAA with feedback from different academic disciplines and expert networks in an international collaboration.
This chapter aims to build a mental model of blockchains, cryptoeconomic mechanisms, and cryptography, focusing on their ability to align and coordinate agents.
Abstract: In this talk, I will discuss the parallels between MEV alignment and AI alignment. First, I will give a brief introduction to MEV as a primitive for representing complex coordination games. I will argue that the MEV ecosystem represents a synthetic consciousness of fundamentally unaligned and often robotic actors, whose local incentives drive them to a common outcome. I posit several learnings and opportunities from the intersection of MEV and AI, including the ability to use cryptocurrencies as a hyper-realistic and ultra-adversarial sandbox to test agent modeling axioms. I will claim that privacy and decentralization are the key to an aligned future, and that we must align around these topics as humans as well.
Abstract: A major value proposition of cryptoeconomic mechanisms is that users can trustlessly collaborate by making credible commitments about their actions. We discuss ways in which crypto-enforced credible commitments may mitigate human-AI coordination failures and demonstrate the limits and tradeoffs of those commitment devices in mitigating intelligence alignment risks. We demonstrate how, surprisingly, the problem of mitigating the negative externalities of commitment devices in crypto (i.e., MEV) is the same as the problem of cooperative AI and a large part of AI alignment.
Abstract: The crypto community has thought a lot about how to build collusion-resistant mechanisms. The core idea is to make bribery impossible by making it impossible to prove that you did the thing you would be bribed for. If we combine this with proof of individuality and proof of possession of a private key, we can make it impossible for AI to bribe humans to defect.
Abstract: Zero-knowledge proofs have made amazing advances in proving arbitrary computation, but the real uses have mostly been limited to zkEVMs. In this talk, I will describe how to use zero-knowledge proofs to interact with the real world. I'll start by describing how to authenticate real people and media (videos, images, audio) without needing to trust third parties when combined with attested sensors. With open standards, we also don't need to rely on specific hardware vendors. I'll also describe how to audit ML deployments. As a case study, I'll describe how to audit the Twitter algorithm. The same technology can also be used to audit providers such as OpenAI.
Abstract: The philosopher John Rawls proposed the Veil of Ignorance (VoI) as a thought experiment to identify fair principles for governing a society. Here, we apply the VoI to an important governance domain: artificial intelligence (AI). In five incentive-compatible studies (N = 2,508), including two preregistered protocols, participants choose principles to govern an Artificial Intelligence (AI) assistant from behind the veil: that is, without knowledge of their own relative position in the group. Compared to participants who have this information, we find a consistent preference for a principle that instructs the AI assistant to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explain these choices. Instead, they appear to be driven by elevated concerns about fairness: Without prompting, participants who reason behind the VoI more frequently explain their choice in terms of fairness, compared to those in the Control condition. Moreover, we find initial support for the ability of the VoI to elicit more robust preferences: In the studies presented here, the VoI increases the likelihood of participants continuing to endorse their initial choice in a subsequent round where they know how they will be affected by the AI intervention and have a self-interested motivation to change their mind. These results emerge in both a descriptive and an immersive game. Our findings suggest that the VoI may be a suitable mechanism for selecting distributive principles to govern AI.
Abstract: Language models have ruined the ability for us to have a clean separation between man and machine — RIP Turing Test. On the other hand, other areas of computer science, such as interactive proofs and ZK have very 'clean' notions of knowledge built into their definitions. The type of 'knowledge' in zero knowledge exists in a particular sense — it can only 'exist' if a polynomial time algorithm generated it and it can only be 'stolen' if you have exponential compute resources. This dichotomy between the lack of "knowledge" in LLMs versus the formal and clear definition of "knowledge" in ZK suggests that we might be able to import some lessons about 'knowledge' from ZK to LLMs. In this talk, we'll go through the epistemological concerns related to this question and try to provide some ideas for how LLMs can display possession of knowledge to each other.
Abstract: In this talk, we will delve into the fascinating world of AI-generated images, exploring the intricate relationship between AI and artistic authorship. We'll examine whether searching for an image can truly be considered as creating it, and discuss the characteristics of a medium that influence the strength of authorship claims. Further, we'll investigate how the semantics of language as a tool for image synthesis impacts the end results and consider whether AI is inadvertently codifying "style" in its creations. Along the way, we'll ponder if AI-generated art evokes a sense of nostalgia by design, and ultimately, address the burning question: Can AI truly produce art?
Special thanks to Vitalik Buterin, Michael Johnson, Rob Knight, Nate Soares, Xinyuan Sun (Sxysun), Barry Whitehat, George Zhang and Zuzalu friends of the Pi-rate Ship. If you have any proposed topics, would like to speak or attend the .wtf unconference, please ping sxysun or T.I.N.A. via Twitter DM.
Snacks for Thoughts…
Blockchains enable trustless collaboration via cryptographic and crypto-economic primitives. These primitives allow users to delegate their decision-making to smart contracts (algorithmic agents). And consensus on commitments makes this delegation common knowledge, thus shifting equilibria.
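As a toy illustration of how a publicly known, enforceable commitment can shift an equilibrium, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the prisoner's-dilemma payoff numbers, the `with_bond` helper (a stand-in for a smart contract that slashes a posted bond when a player defects), and the bond size. It is not a model of any particular protocol.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# Hypothetical prisoner's-dilemma payoffs: (row player, column player).
BASE_PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def with_bond(payoffs, bond):
    """Return payoffs when each player has posted a bond that is slashed on defection.

    This stands in for a commitment enforced by a contract that both players
    can observe, so the adjusted payoffs are common knowledge.
    """
    adjusted = {}
    for (a, b), (pa, pb) in payoffs.items():
        adjusted[(a, b)] = (
            pa - (bond if a == "defect" else 0),
            pb - (bond if b == "defect" else 0),
        )
    return adjusted


def pure_nash_equilibria(payoffs):
    """List action profiles where neither player gains by deviating unilaterally."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        pa, pb = payoffs[(a, b)]
        a_is_best = all(payoffs[(alt, b)][0] <= pa for alt in ACTIONS)
        b_is_best = all(payoffs[(a, alt)][1] <= pb for alt in ACTIONS)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(BASE_PAYOFFS))                # [('defect', 'defect')]
print(pure_nash_equilibria(with_bond(BASE_PAYOFFS, 3)))  # [('cooperate', 'cooperate')]
```

With these made-up numbers, any bond larger than 2 makes defection unprofitable against a cooperator, so mutual cooperation becomes the only pure-strategy equilibrium. The sketch assumes enforcement is perfect, which is exactly the assumption the questions below put under pressure.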
AIs, as complex algorithmic agents that may or may not exhibit agentic behavior, can lead to undesirable equilibria for humans, and many have even predicted that they will bring horrible destruction to humans very quickly. Can existing coordination technologies like crypto help us address this risk?
CryptoXAI delves into the coordination and alignment aspects of AI and crypto. After all, crypto's potential lies in its ability to act as a coordination device through the use of credible commitments, e.g., global payments, public goods funding, democratized financial access. How does crypto, as an alignment/commitment device, compare with popular alignment approaches such as decision theories or open-sourcing AI’s source code? Does it make more sense to align AIs by combining functional decision theory with cryptographic commitments about the AI's actions instead of allowing arbitrary access to source code (which could cause programmable privacy issues)? But even if AIs can coordinate using cryptographic/crypto-economic commitments, can the exercise of those commitments be interpretable? What does the tradeoff space of those approaches look like?
Will AIs use crypto to coordinate amongst themselves to improve their equilibrium payoffs? Will the equilibria that they coordinate on align with human social values? Can crypto as a commitment device be used to align AIs and humans? After all, some argue that AIs are still far from gaining agency and will stay in the "tool of humans" range for a long time. If that's the case, will the coordination and alignment of AI just end up being a shadow of the coordination and alignment of humans? What unique properties would the projection of this shadow have? And if it does end up being human alignment, is it possible for crypto to exercise its coordination magic on humans to work on building AIs together (e.g., solving the data privacy training problem using some variant of orderflow auctions)? How about using crypto to coordinate humans in the period leading up to AGI, or to make the online world more secure against AGI?
What about AGIs? How fast will AIs gain agency? Does agency require consciousness? Does the development of AGIs lead to a world where there is one dominant Advanced AGI? Will AGIs have arbitrary preferences that make their alignment impossible? Will access to privacy technology change what AGI could do (after all, access to source code would be impossible)?
What kind of human values can crypto, as a commitment device, align AIs to that weren't possible with existing approaches like various decision theories? What are the limitations of commitment devices in their coordination of agents to reach human-valued outcomes? Can crypto learn from AI how to best coordinate and trustlessly cooperate?