AI Box

In Friendly AI studies, an AI box is a hypothetical isolated computer hardware system where an artificial intelligence is kept constrained inside a simulated world and not allowed to affect the external world. Such a box would have extremely restricted inputs and outputs; maybe only a plaintext channel. However, a sufficiently intelligent AI may be able to persuade or trick its human keepers into releasing it. This is the premise behind Eliezer Yudkowsky’s informal AI-box experiment.

Intelligence Improvements

Some intelligence technologies, like seed AI, have the potential to make themselves more intelligent, not just faster, by modifying their source code. These improvements would make further improvements possible, which would make further improvements possible, and so on.

This mechanism for an intelligence explosion differs from an increase in speed in that it does not require external effect: machines designing faster hardware still require humans to create the improved hardware, or to program factories appropriately. An AI which was re-writing its own source code, however, could do so while contained in an AI box.

AI-box Experiment

The AI-box experiment is an informal experiment devised by Eliezer Yudkowsky to attempt to demonstrate that a suitably advanced artificial intelligence can either convince, or perhaps even trick or coerce, a human being into voluntarily “releasing” it, using only text-based communication. This is one of the points in Yudkowsky’s work aimed at creating a friendly artificial intelligence that when “released” won’t try to destroy the human race for one reason or another. The setup of the AI box experiment is simple and involves simulating a communication between an AI and a human being to see if the AI can be “released”. As an actual super-intelligent AI has not yet been developed, it is substituted by a human. The other person in the experiment plays the “Gatekeeper”, the person with the ability to “release” the AI. They communicate through a text-interface/Computer terminal only and the experiment ends when either the Gatekeeper releases the AI, or the allotted time of 2 hours ends.


Despite being of human rather than superhuman intelligence, Yudkowsky was often able to convince the Gatekeeper, purely through argumentation, to let him out of the box. Due to the rules of the experiment, the transcript and his successful AI coercion tactics cannot be revealed. This is made necessary because it is only a simulation of superintelligent AI vs. human contact. It is an analogy that is accomplished with coercion techniques used by one human against another. The intelligence disparity is critical; Yudkowsky has not engaged in this experiment with individuals that he felt could win.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s