What is beautiful? What is not? There clearly are no objective answers to these questions. What is considered beautiful by one observer may be regarded as ugly by another observer. Ideals of beauty are different in different cultures and subcultures, they have changed over the centuries, and they are not even stable with respect to a single individual. Therefore, any theory of beauty has to take the observer into account.

Following common sense, I assume that a typical human observer
tries internally to represent input data in terms of what is
familiar. Regarding the observer's subjectivity, I assume
that the Church-Turing thesis is true (everything that can be
computed by a human being can be computed by an appropriate
program for a general-purpose computer) and postulate the
following setting. At a given time, a human observer's current
knowledge about visual scenes
can be described as a coding algorithm. This algorithm maps
input data (such as retinal activity caused by a work of art in
the visual field) onto internal representations of the data. The
coding algorithm *C*, the data *D* and its internal representation *D*'
can be written as strings of symbols from a finite alphabet. If *D*'
conveys all information about *D*, but the length of *D*' is less than
the length of *D*, then *D* is compressible or redundant with respect
to the observer's knowledge. The observer already knew something
about *D*. Similar statements can be made in cases where *D*' allows
only for partial reconstruction of *D*.
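
The notion of compressibility relative to the observer's knowledge can be illustrated with a small sketch. It rests on strong assumptions that are not part of the argument above: a general-purpose DEFLATE coder from Python's `zlib` stands in for the coding algorithm *C*, a preset dictionary (`zdict`) stands in for the observer's prior knowledge, and the byte strings `knowledge` and `scene` are made-up examples. Nothing here is a claim about how human observers actually code visual input.

```python
import zlib


def encode(data: bytes, knowledge: bytes = b"") -> bytes:
    """Return D', a DEFLATE coding of D, optionally primed with prior knowledge."""
    if knowledge:
        coder = zlib.compressobj(level=9, zdict=knowledge)
    else:
        coder = zlib.compressobj(level=9)
    return coder.compress(data) + coder.flush()


def decode(code: bytes, knowledge: bytes = b"") -> bytes:
    """Reconstruct D from D'; the decoder needs the same knowledge as the coder."""
    if knowledge:
        decoder = zlib.decompressobj(zdict=knowledge)
    else:
        decoder = zlib.decompressobj()
    return decoder.decompress(code) + decoder.flush()


# Hypothetical data: the scene reuses familiar parts in an unfamiliar order.
knowledge = (b"the quick brown fox jumps over the lazy dog "
             b"while the cat sleeps on the warm windowsill")
scene = (b"the cat sleeps on the warm windowsill "
         b"while the quick brown fox jumps over the lazy dog")

naive = encode(scene)                 # observer without relevant knowledge
informed = encode(scene, knowledge)   # observer who already knows the parts

print(len(scene), len(naive), len(informed))
```

In this toy setting the coding primed with the dictionary should come out shorter than the unprimed one while still allowing exact reconstruction, which is the sense in which *D* is redundant with respect to what the observer already knew.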

The observer's subjectivity is embodied by the coding algorithm *C*.
One may be tempted to define the beauty of a drawing with respect
to *C*. In the following preliminary attempt to do so (inspired by
the MDL approach), I assume that ``beauty" simply corresponds to
``high conditional probability given *C*":
Given *C*, the best way of selecting a drawing *s* from a set or
class *S* of possible drawings satisfying certain specifications may
be to maximize *P*(*s* | *C*), the conditional probability of
*s*, given *C*.
Bayes' formula tells us
*P*(*s* | *C*) = *P*(*C* | *s*) *P*(*s*) / *P*(*C*),
or, equivalently,
-log *P*(*s* | *C*) = -log *P*(*C* | *s*) + log *P*(*C*) - log *P*(*s*).

Let us interpret this.
Since *C* is given, *P*(*C*) may be viewed as a normalizing constant.
It can be disregarded.
-log *P*(*C* | *s*)
can be interpreted as the
information (or length of the observer's shortest algorithm)
required to compute *C* from *s*.
*P*(*s*) is given by some *a priori* distribution on the
drawings. For simplicity, let us assume that this prior
is uniform.
Then, given *C*, *s* ∈ *S*
is optimal (most likely, most ``beautiful'')
if the information required to compute *C* from *s* is minimized.
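
As a numerical illustration of this selection rule, the following sketch picks a drawing from a tiny class *S*. The drawing names and the likelihood values *P*(*C* | *s*) are invented for the example, and the sum used for *P*(*C*) assumes that the three candidates exhaust *S*. It shows only that, under a uniform prior, minimizing -log *P*(*C* | *s*) and maximizing *P*(*s* | *C*) select the same drawing.

```python
import math

# Hypothetical likelihoods P(C | s) for three candidate drawings (made-up values).
likelihood = {"drawing_a": 0.02, "drawing_b": 0.20, "drawing_c": 0.05}

# Uniform prior P(s) over the class S, as assumed in the text.
prior = {s: 1.0 / len(likelihood) for s in likelihood}


def code_length(s: str) -> float:
    """-log2 P(C | s) - log2 P(s); the constant term +log2 P(C) is dropped."""
    return -math.log2(likelihood[s]) - math.log2(prior[s])


# P(C) is the normalizing constant (assuming the candidates exhaust S).
evidence = sum(likelihood[s] * prior[s] for s in likelihood)
posterior = {s: likelihood[s] * prior[s] / evidence for s in likelihood}

best_by_length = min(likelihood, key=code_length)
best_by_posterior = max(posterior, key=posterior.get)
print(best_by_length, best_by_posterior)
```

Because the prior term is the same for every candidate, it shifts all code lengths by one constant and does not affect which drawing is selected.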

How can this be related to human experience? The following example attempts to establish such a relationship.

**``Beautiful'' faces.**
Human beings appear to have a certain coding scheme for storing
faces in memory. This scheme is certainly different from the
circle scheme described earlier. It is probably based on previous
experiences with many different faces, and it is probably adapted
to code many faces efficiently. One way of doing so is to store a
prototype face and code new faces by coding only the deviations
from the prototype.

The principle of minimal description length suggests that
the ``ideal'' (most likely) prototype *F*_{P} maximizes
*P*(*F*_{P} | *F*),
thus minimizing
-log *P*(*F* | *F*_{P}) - log *P*(*F*_{P}),
where *F* is the given set of all faces to be coded. In other words,
the optimal prototype minimizes the sum of the description lengths
of all faces relative to the prototype, as well as of the
description length of the prototype itself (relative to the
observer's remaining knowledge about visual scenes).
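
The two-part cost above can be sketched in a few lines. Everything concrete in the sketch is an assumption made for illustration: faces are reduced to short integer feature vectors with invented values, deviations are charged with a simple prefix code (one flag bit, an Elias-gamma magnitude, one sign bit), the prototype itself is coded relative to an all-zero vector standing in for the observer's remaining knowledge, and the search is restricted to a handful of candidates. It is not a model of how faces are actually stored.

```python
from statistics import median


def int_bits(d: int) -> int:
    """Bits for a signed integer: 1 flag bit if zero, else flag + gamma code + sign."""
    if d == 0:
        return 1
    gamma = 2 * (abs(d).bit_length() - 1) + 1  # Elias-gamma length of |d|
    return 1 + gamma + 1


def face_bits(face, prototype) -> int:
    """Description length of one face, coded as deviations from the prototype."""
    return sum(int_bits(f - p) for f, p in zip(face, prototype))


def total_bits(faces, prototype) -> int:
    """Length of the prototype itself plus all faces relative to it."""
    origin = (0,) * len(prototype)
    return face_bits(prototype, origin) + sum(face_bits(f, prototype) for f in faces)


# Hypothetical feature vectors (arbitrary units), one tuple per face.
faces = [(30, 20, 25), (32, 19, 25), (31, 21, 27), (29, 20, 24), (31, 20, 25)]

# Candidate prototypes: every stored face plus the coordinate-wise median.
candidates = list(faces) + [tuple(int(median(col)) for col in zip(*faces))]

prototype = min(candidates, key=lambda p: total_bits(faces, p))
print(prototype, total_bits(faces, prototype))
```

A face that coincides with the chosen prototype costs only one flag bit per feature, which mirrors the idea that a face close to the prototype requires very little additional information.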

Assuming that all faces are equally likely to appear in the visual field, the formalism above predicts that the most beautiful face is the one that can be most easily computed from the coding scheme. It seems reasonable to assume that the information required to specify the coding scheme is dominated by the information required to specify the prototype face. If the current face looks like the prototype face, then there is very little additional information to compute. This would imply that the prototype face is perceived as the most beautiful one.

**Previous work on attractive faces.**
The statement above seems compatible with results presented by
Langlois and Roggman [9], who claim that the ``average face"
(computed by digital blending of numerous photos of real faces) is
perceived as the most attractive one. Perrett, May and Yoshikawa
[10] partly dispute this claim, however. Their test subjects also
appreciate average faces computed by blending [11] but prefer
``attractive average faces" constructed from faces perceived as
attractive. In their study, the faces rated most attractive were
caricatures obtained by digitally exaggerating the deviations between
``average" and ``attractive average."

**Critique of previous work.**
The studies above, however, do not say much about the plausibility
of the algorithms used to compute average faces. Let us assume
that the brain does indeed support face-processing by an ideal (in
the information theoretic sense) prototype face. It would be naive
to assume that the ideal face equals the one computed by blending.
There are many plausible algorithms for computing prototypes,
based on many plausible metrics for ``distances"
between faces. Therefore the studies above,
including the statement that the average face is not the most
attractive one, have to be judged with skepticism. The presented
claims depend on the definition of ``average" and the corresponding
nature of the blending algorithms, which may not be very closely
related to a hypothetical method the brain might be using for
generating the ``optimal" prototype *F*_{P}. Unfortunately, at the
present time it seems impossible to analyze the way the brain
stores representations of objects. Therefore it also seems
impossible to test the predictions made by the formalism presented
above.
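
The dependence of the ``average" on the chosen distance can be seen even in one dimension. The sketch below uses a single invented feature measured on five hypothetical faces; the two distance functions are common textbook choices, not ones taken from the cited studies. Under squared distance the best prototype is the arithmetic mean (which is what blending computes), while under absolute distance it is the median, and the two differ whenever the data are skewed.

```python
from statistics import mean, median

# Hypothetical one-dimensional feature measured on five faces (made-up values).
feature = [10, 10, 11, 12, 30]


def squared(x: float, p: float) -> float:
    return (x - p) ** 2


def absolute(x: float, p: float) -> float:
    return abs(x - p)


def total_cost(prototype: float, distance) -> float:
    """Summed distance of all faces to a candidate prototype."""
    return sum(distance(x, prototype) for x in feature)


blend = mean(feature)     # minimizes the summed squared distance
middle = median(feature)  # minimizes the summed absolute distance

print(blend, total_cost(blend, squared), total_cost(middle, squared))
print(middle, total_cost(middle, absolute), total_cost(blend, absolute))
```

Neither prototype is privileged by the data alone; which one counts as ``optimal" depends on the distance, and hence on the assumed coding scheme.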
