In algorithmic information theory, algorithmic probability, also known as Solomonoff probability, is a mathematical method of assigning a prior probability to a given observation. In a theoretical sense, the prior is universal. It is used in inductive inference theory and in the analysis of algorithms. Because it is not computable, it can only be approximated in practice.

It addresses the following questions. Given a body of data about some phenomenon that one wants to understand, how can one select the most probable hypothesis about its cause from among all possible hypotheses? How can the different hypotheses be evaluated? How can future data be predicted?

Algorithmic probability combines several ideas: Occam’s razor, Epicurus’ principle of multiple explanations, and special coding methods from modern computing theory. The resulting prior is used in Bayes’ rule for prediction.

Occam’s razor means ‘among the theories that are consistent with the observed phenomena, one should select the simplest theory’.

In contrast, Epicurus proposed the principle of multiple explanations: if more than one theory is consistent with the observations, keep all such theories.

A special mathematical object called a universal Turing machine is used to compute, quantify and assign codes to all quantities of interest. The universal prior is taken over the class of all computable measures; no hypothesis will have a zero probability.

Algorithmic probability combines Occam’s razor and the principle of multiple explanations by assigning a probability to each hypothesis (algorithm or program) that explains a given observation. The simplest hypothesis (the shortest program) receives the highest probability, and increasingly complex hypotheses (longer programs) receive increasingly small probabilities. These probabilities form a prior probability distribution for the observation. Ray Solomonoff proved that this prior is machine-invariant up to a constant factor, a result called the invariance theorem, and it can be used with Bayes’ theorem to predict the most likely continuation of the observation. The programs are run on a universal Turing machine.
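The weighting scheme described above can be illustrated with a small sketch. It assumes a deliberately restricted, hypothetical hypothesis class in which each "program" is a bit pattern repeated forever; this is not a universal Turing machine, and the names `output_of`, `all_patterns` and `predict_next` are introduced here only for illustration. Every pattern consistent with the observation is kept (multiple explanations), each is weighted by 2 to the power of minus its length (Occam’s razor), and the weights are normalized to predict the next symbol.

```python
from fractions import Fraction

def output_of(pattern: str, n: int) -> str:
    """First n symbols produced by repeating `pattern` forever (toy machine)."""
    reps = n // len(pattern) + 1
    return (pattern * reps)[:n]

def all_patterns(max_len: int):
    """Every non-empty bit string up to max_len, shortest first."""
    for length in range(1, max_len + 1):
        for i in range(2 ** length):
            yield format(i, f"0{length}b")

def predict_next(observed: str, max_len: int = 6) -> dict:
    """Posterior probability of the next symbol under a 2^-length prior."""
    weights = {"0": Fraction(0), "1": Fraction(0)}
    for pattern in all_patterns(max_len):
        out = output_of(pattern, len(observed) + 1)
        # Multiple explanations: keep every hypothesis consistent with the data.
        if out.startswith(observed):
            # Occam's razor: shorter patterns receive larger weight.
            weights[out[-1]] += Fraction(1, 2 ** len(pattern))
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no hypothesis up to max_len explains the observation")
    # Bayes' rule: normalize over the surviving hypotheses.
    return {symbol: weight / total for symbol, weight in weights.items()}
```

With this toy class, `predict_next("0", 2)` assigns probability 3/4 to "0" and 1/4 to "1": the patterns "0" and "00" both predict another zero and together outweigh the longer alternative "01". In the actual construction the programs form a prefix-free set, so the weights sum to at most one without normalization over a finite list.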

Solomonoff invented the concept of algorithmic probability with its associated invariance theorem around 1960, publishing a report on it: “A Preliminary Report on a General Theory of Inductive Inference.” He clarified these ideas more fully in 1964 with “A Formal Theory of Inductive Inference,” Part I and Part II.

He described a universal computer with a randomly generated input program. The program computes some possibly infinite output. The universal probability distribution is the probability distribution on all possible output strings with random input.

The algorithmic probability of any given finite output prefix q is the sum of the probabilities of the programs that compute something starting with q. Certain long objects with short programs have high probability.
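In symbols, this definition is commonly written as follows. The notation is an assumption not fixed by the text above: U denotes the universal machine, p ranges over the minimal programs whose output begins with q (programs of which no proper prefix already produces such an output), and |p| is the length of p in bits.

```latex
P(q) \;=\; \sum_{p \,:\, U(p) = q\ast} 2^{-|p|}
```

Each bit of a random program is chosen by a fair coin flip, so a particular program p is produced with probability 2^{-|p|}. For example, if some program of length c prints zeros forever, it contributes 2^{-c} to the probability of a prefix of n zeros for every n, whereas a uniform distribution over strings of length n would assign that prefix only 2^{-n}.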

Algorithmic probability is the main ingredient of Solomonoff’s theory of inductive inference, the theory of prediction based on observations. It was invented with the goal of applying it to machine learning: given a sequence of symbols, which one will come next? Solomonoff’s theory provides an answer that is optimal in a certain sense, although it is incomputable. Unlike, for example, Karl Popper’s informal theory of inductive inference, Solomonoff’s theory is mathematically rigorous.

Algorithmic probability is closely related to the concept of Kolmogorov complexity. Kolmogorov’s introduction of complexity, however, was motivated by information theory and problems in randomness, whereas Solomonoff had introduced algorithmic complexity earlier for a different reason: inductive reasoning. Solomonoff invented a single universal prior probability that can be substituted for each actual prior probability in Bayes’ rule, with Kolmogorov complexity as a by-product.
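Because neither quantity is computable, practical work often substitutes a computable proxy for description length. The sketch below is one such heuristic and not Solomonoff’s measure: it assumes that the length of a zlib-compressed string is an acceptable upper-bound stand-in for the length of its shortest program, and the function names are hypothetical.

```python
import random
import zlib

def description_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data.

    Assumption: used as a crude, computable upper-bound proxy for
    description length; it is not Kolmogorov complexity itself.
    """
    return 8 * len(zlib.compress(data, 9))

def log2_prior_estimate(data: bytes) -> int:
    """Base-2 logarithm of a simplicity-weighted weight 2^-description_bits."""
    return -description_bits(data)

rng = random.Random(0)
regular = b"01" * 500                                    # 1000 highly regular bytes
noisy = bytes(rng.getrandbits(8) for _ in range(1000))   # 1000 pseudo-random bytes

# The regular string compresses far better, so it receives the larger weight.
regular_is_simpler = description_bits(regular) < description_bits(noisy)
```

The pseudo-random bytes here come from a short seeded generator, so their true shortest description is small even though zlib cannot find it. This is why a compressor gives only an upper bound on description length.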

Solomonoff’s enumerable measure is universal in a certain powerful sense, but the computation time can be infinite. One way of dealing with this is a variant of Leonid Levin’s search algorithm, which limits the time spent testing each candidate program, with shorter programs given more time. Other methods of limiting the search space include training sequences.
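The time-allocation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Levin’s or Solomonoff’s actual procedure: the machine `run` is a hypothetical toy interpreter (not universal) in which each instruction costs one step, and the schedule follows one common formulation in which, during phase k, a program of length n is allowed 2^(k - n) steps.

```python
from itertools import product

def run(program: str, max_steps: int):
    """Hypothetical toy machine: start with x = 1; '0' adds 1, '1' doubles.

    Each instruction costs one step. Returns the final x, or None if the
    step budget runs out before the program finishes.
    """
    x = 1
    for steps_used, op in enumerate(program, start=1):
        if steps_used > max_steps:
            return None
        x = x + 1 if op == "0" else x * 2
    return x

def levin_search(is_solution, max_phase: int = 20):
    """Search programs in phases, giving shorter programs more time.

    In phase k, every program of length n <= k runs for 2**(k - n) steps.
    Returns (program, phase) for the first solution found, or None.
    """
    for k in range(1, max_phase + 1):
        for length in range(1, k + 1):
            budget = 2 ** (k - length)
            for bits in product("01", repeat=length):
                program = "".join(bits)
                result = run(program, budget)
                if result is not None and is_solution(result):
                    return program, k
    return None
```

For example, `levin_search(lambda x: x == 10)` returns `("0101", 6)`: no program shorter than four instructions reaches 10, and a four-instruction program first receives the four steps it needs in phase 6. Each phase roughly doubles the total work, so no single non-halting or slow program can block the search.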