How To Bell The AI Cat?
The mice finally agreed how they wanted the cat to behave, and congratulated each other on the difficult consensus. They celebrated in lavish cheese island retreats and especially feted those brave heroes who promised to place the bells and controls.
The heroes received generous funding, with which they first built a safe fortress in which to build and test the amazing bells they had promised. Experimenting in safety without actually touching any real cats, the heroes happily whiled away many years.

As wild cats ran rampant, the wealthy and wise hero mice looked out from their well-financed fortresses, watching the vicious beasts pounce and polish off the last scurrying ordinaries. Congratulating each other on the wisdom of testing their controls only on tame simulated cats, they mused over the power of evolution to choose those worthy of survival...
Deciding how we want AIs to behave may be useful as an aspirational goal, but it tempts us to spend all our time on the easy part, and perhaps cede too much power up front to those who claim to have the answers.
To enforce rules, one must have the ability to deliver consequences - which presumes some long-lived entity that will receive them, and possibly change its behavior. The fight against organized human scammers and spammers is already a difficult battle, and even though many of them are engaged in behaviors that are actually illegal, delivering consequences is not easy. Most platforms settle for keeping out the bulk of the attackers, with the only consequence being a blocked transaction or a ban. This is done with predictive models (yes, AI, though not the generative kind) that make features out of "assets" such as identifiers, logins, and device IDs, which are at least somewhat long-lived. The longer such an "asset" behaves well, the more it is trusted. Sometimes attackers intentionally create "sleeper" logins that they later burn.
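As a concrete illustration of that kind of asset-based trust, here is a toy sketch - not any real platform's model; the fields, weights, and thresholds are invented for illustration:

```python
# Toy sketch of asset-based trust scoring; the fields and weights are
# illustrative, not taken from any real anti-abuse system.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Asset:
    asset_id: str            # e.g. a login, device ID, or other long-lived identifier
    first_seen: datetime
    good_events: int = 0     # actions that passed review
    bad_events: int = 0      # blocked or reported actions

def trust_score(asset: Asset, now: Optional[datetime] = None) -> float:
    """Trust grows with age and a clean history, and collapses after repeated abuse."""
    now = now or datetime.now(timezone.utc)
    age_days = max((now - asset.first_seen).days, 0)
    history = asset.good_events / (asset.good_events + asset.bad_events + 1)
    base = min(1.0, 0.5 * (age_days / 365) + 0.5 * history)
    # A "sleeper" login that starts misbehaving loses most of its accumulated trust.
    return base * (0.1 if asset.bad_events > 3 else 1.0)
```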
Add generative AI to the mix, and the playing field tilts further towards the bad actors. AI-driven accounts might more credibly follow "normal" patterns, creating more trust over time before burning it. They may also be able to enter walled gardens whose barriers are built on social interaction over time, damaging trust in previously safe smaller spaces.
What generative AI does is lower the value of observing "normal" interactions, because malicious code can now act like a normal human much more effectively than before. Regardless of how we want AIs to behave, we have to assume that many of them will be put to bad uses, or even that they may be released like viruses before long. Even without any new rules, how can we detect and counteract the proliferation of AIs who are scamming, spamming, behaving inauthentically, and otherwise doing what malicious humans already do?
Anyone familiar with game theory (see Nicky Case's classic Evolution of Trust for a very accessible intro) knows that behavior is "better" - more honest and cooperative - in a repeated game with long-lived entities. If AIs can somehow be held responsible for their behavior, if we can recognize "who" we are dealing with, perhaps that will enable all the rules we might later agree we want to enforce on them.
However, up front we don't know when we are dealing with an AI as opposed to a human - which is kind of the point. Humans need to be pseudonymous, and sometimes anonymous, so we can't always demand that humans do the work of demonstrating who they are. The best we can do in such scenarios is to have some long-lived identifier for each entity, without knowing its nature. That identifier is something the entity can take with it to establish its credibility in a new location.
"Why, that's a DID!" I can hear the decentralized tech folx exclaim - a decentralized identifier, designed for exactly this purpose: to create long-lived but possibly pseudonymous identifiers for entities, which other entities can then talk about and express more or less trust in. The difference between a DID and a Twitter handle, say, is that a DID is portable - the controller holds the key that lets them prove they own the DID by signing a statement cryptographically (the DID is essentially the public half of the key pair) - so the owner can assert who they are on any platform or in any context.
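A minimal sketch of that mechanism in Python, using the `cryptography` library's Ed25519 keys; the `did:example:` identifier below is a simplified stand-in, not a spec-compliant DID encoding:

```python
# Simplified illustration: the controller keeps the private key, and the "DID"
# here is just derived from the public key (a real DID method encodes more).
import hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()          # stays with the controller
public_bytes = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
)
did = "did:example:" + hashlib.sha256(public_bytes).hexdigest()[:16]  # toy identifier

# The owner can assert this identity anywhere by signing a statement.
statement = b"I control " + did.encode() + b" and I am posting on platform X"
signature = private_key.sign(statement)
```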
Once we have a long-lived identity in place, the next question is how to set up rules - and how those rules would apply to generative AI.
We could require that AIs always answer the question "Who are you?" by signing a message with their private key and proving their ownership of a DID, even when interacting on a platform that does not normally expose this. Perhaps anyone who cannot or does not wish to prove their humanity to a trusted zk-trust provider must always be willing to answer this challenge, or be banned from many spaces.
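A sketch of what answering that challenge could look like, reusing the Ed25519 keypair from the snippet above (the function names are hypothetical, not from any standard):

```python
# Hypothetical challenge-response flow: the space issues a fresh nonce, the
# entity signs it with the key behind its DID, and the space verifies.
import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def issue_challenge() -> bytes:
    return os.urandom(32)                  # fresh nonce prevents replaying old proofs

def respond(private_key, challenge: bytes) -> bytes:
    return private_key.sign(challenge)     # "Who are you?" answered by proof of key control

def verify_response(public_bytes: bytes, challenge: bytes, signature: bytes) -> bool:
    try:
        Ed25519PublicKey.from_public_bytes(public_bytes).verify(signature, challenge)
        return True
    except InvalidSignature:
        return False
```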
What we are proposing is essentially a dog license: each entity (whether human or AI) that interacts must identify itself in some long-term way, so that both public attestations about it and private or semi-private ones can be made. Various accreditors can spring up, and each maintainer of a space can decide how high (or low) to set the bar. The key is that we must make it easy for spaces to gauge the trust of new participants, independent of their words.
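One possible shape for such an attestation - again only a sketch; the field names and claim strings are invented for illustration:

```python
# A "dog license" attestation: an accreditor DID makes a signed, timestamped
# claim about a subject DID. Any space can then check claims it chooses to trust.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class Attestation:
    issuer_did: str       # the accreditor making the claim
    subject_did: str      # the entity being attested to (human or AI)
    claim: str            # e.g. "verified-human", "member-in-good-standing"
    issued_at: float
    signature: bytes = b""

def sign_attestation(issuer_key, issuer_did: str, subject_did: str, claim: str) -> Attestation:
    att = Attestation(issuer_did, subject_did, claim, time.time())
    payload = json.dumps(
        {k: v for k, v in asdict(att).items() if k != "signature"}, sort_keys=True
    ).encode()
    att.signature = issuer_key.sign(payload)   # an Ed25519 key, as in the earlier sketches
    return att
```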
Without the expectation of a DID, essentially all we have to lean on is the domain name of the service where the entity is representing itself, or the policy of the centralized provider, which may be completely opaque. But this means that new creators of spaces have no way to screen participants - so we would ossify even further into the tech giants we have now. Having long-lived identifiers that cross platforms enables the development of trust services, including privacy-preserving zero-knowledge trust services, that any new platform creator could lean on to create useful, engaging spaces (relatively) safe from spammers, scammers, and manipulators.
Identifiers are not a guarantee of good behavior, of course - a human or AI can behave deceptively, run scams, spread disinformation and so on even if we know exactly who they are. They do, however, allow others to respond in kind. In game theory, a generous tit-for-tat strategy winds up generally being successful in out-competing bad actors, allowing cooperators who behave fairly with others to thrive. Without the ability to identify the other players, however, the cheaters will win every round.
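A small simulation makes the point; the payoff matrix and the 10% forgiveness rate below are conventional illustrative choices, not taken from the essay:

```python
# Repeated prisoner's dilemma: generous tit-for-tat against an always-defecting
# "scammer". Payoffs are the standard (3,3)/(5,0)/(1,1) illustration.
import random
from typing import Callable, Optional, Tuple

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def generous_tft(opponent_last: Optional[str]) -> str:
    # Cooperate, retaliate after a defection, but forgive about 10% of the time.
    if opponent_last == "D" and random.random() > 0.1:
        return "D"
    return "C"

def always_defect(_: Optional[str]) -> str:
    return "D"

def play(a: Callable, b: Callable, rounds: int = 100) -> Tuple[int, int]:
    score_a = score_b = 0
    last_a = last_b = None
    for _ in range(rounds):
        move_a, move_b = a(last_b), b(last_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        last_a, last_b = move_a, move_b
    return score_a, score_b

print(play(generous_tft, generous_tft))    # cooperators prosper together
print(play(generous_tft, always_defect))   # a recognizable defector gains little
```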
With long-term identifiers the game is not over - but it does become much deeper and more complex, and it opens an avenue for the "honest" cooperators to win, that is, for those who reliably communicate their intentions. Having identifiers enables a social graph, where one entity can "stake" its own credibility to vouch for another. It also enables false reporting and manipulation, even coercion! The game is anything but static. Smaller walled gardens of long-trusted actors may have more predictable behavior, while more open spaces provide opportunity for newcomers.
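A minimal sketch of that staking idea (entirely illustrative; real reputation systems are far more involved): vouches form a directed graph, a newcomer inherits a damped share of its voucher's standing, and a voucher forfeits part of what it staked when the newcomer misbehaves.

```python
# Hypothetical vouching graph: reputations rise when trusted entities vouch,
# and vouchers share the penalty when someone they vouched for misbehaves.
from collections import defaultdict
from typing import DefaultDict, List

vouches: DefaultDict[str, List[str]] = defaultdict(list)   # voucher DID -> vouched-for DIDs
reputation: DefaultDict[str, float] = defaultdict(float)   # DID -> score within this space

def vouch(voucher: str, newcomer: str, damping: float = 0.5) -> None:
    vouches[voucher].append(newcomer)
    reputation[newcomer] += damping * reputation[voucher]

def penalize(bad_actor: str, penalty: float = 1.0) -> None:
    reputation[bad_actor] -= penalty
    for voucher, vouched_for in vouches.items():
        if bad_actor in vouched_for:            # staked credibility is partly forfeited
            reputation[voucher] -= 0.5 * penalty
```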
This brings us to the point where consensus expectations have value. Once we can track and evaluate behavior, we can set standards for the spaces we occupy. Creating the expectation of an identifier is perhaps the first and most critical standard to set.
Generative AI can come play with us, but it should do so in an honest, above-board way, and play by the same rules we expect from each other. We may have to adapt our tools for everyone in order to accomplish it - and we must be careful we don't lose our own freedoms in the process.