TL;DR: It is tempting to think of AI alignment as an ethical problem, but this presupposes that we have some means of making an ethical scheme binding on an AI, which we do not have. At best we have only unreliable means of constraining AI computation or outputs, which are not sufficient to meet any commonly-accepted standard of reliability for critical systems. By analogy with common computing systems such as disk drives and file systems, we can see that we lack the technology and engineering practice to ensure equivalent levels of reliability, and so much of our talk about AI ethics is akin to designing file hierarchies on top of a storage system that randomly forgets or rewrites data.
What do we mean by "alignment"?
It is common to talk about an "alignment of interests" between people, such that what is good for person A is also good for person B. This serves as a good basis for A and B to cooperate with each other. Or we can talk about an "alignment of intent", such that A and B have decided that they both wish to pursue some particular outcome. We might also talk about an "alignment of values", where A and B have some general agreement about the kinds of things that they consider good and bad.
Such alignment can also be pursued in larger groups. We could talk about alignment within an organisation. A well-aligned organisation is one in which most people are pursuing a common goal, in ways that are mutually supportive. A poorly-aligned organisation is one in which the actions of individuals conflict with each other, or conflict with the organisation's goals or purpose.
We can address misalignment in a variety of ways. If we wish to achieve an alignment of intent - that is, we want to get people to work together for a shared outcome - then we might seek first to align their interests. If we have a team of people who could work together to achieve something, we should make sure that it is in each person's interest to do so, for example by offering to pay each person if the outcome is achieved. Or, if payment is not appropriate, we might try to rationally persuade people to collaborate by appealing to their shared values, showing how cooperation is the right thing to do in the circumstances.