What is beautiful? What is not? There clearly are no objective answers to these questions. What is considered beautiful by one observer may be regarded as ugly by another observer. Ideals of beauty are different in different cultures and subcultures, they have changed over the centuries, and they are not even stable with respect to a single individual. Therefore, any theory of beauty has to take the observer into account.
Following common sense, I assume that a typical human observer tries internally to represent input data in terms of what is familiar. Regarding the observer's subjectivity, I assume that the Church-Turing thesis is true (everything that can be computed by a human being can be computed by an appropriate program for a general-purpose computer) and postulate the following setting. At a given time, a human observer's current knowledge about visual scenes can be described as a coding algorithm. This algorithm maps input data (such as retinal activity caused by a work of art in the visual field) onto internal representations of the data. The coding algorithm C, the data D and its internal representation D' can be written as strings of symbols from a finite alphabet. If D' conveys all information about D, but the length of D' is less than the length of D, then D is compressible or redundant with respect to the observer's knowledge. The observer already knew something about D. Similar statements can be made in cases where D' allows only for partial reconstruction of D.
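As a rough illustration of the compressibility criterion above, the following sketch uses the general-purpose compressor zlib as a stand-in for the coding algorithm C. This is purely an assumption made for illustration: nothing in the text specifies what an observer's coding algorithm looks like, and the names encode, decode and is_compressible are hypothetical.

```python
import random
import zlib


def encode(data: bytes) -> bytes:
    """Stand-in for C: map data D onto an internal representation D'."""
    return zlib.compress(data, 9)


def decode(code: bytes) -> bytes:
    """Reconstruct D from D' (lossless, so D' conveys all information about D)."""
    return zlib.decompress(code)


def is_compressible(data: bytes) -> bool:
    """D is compressible with respect to C if D' is shorter than D."""
    return len(encode(data)) < len(data)


# A highly regular input and a pseudo-random input of the same length.
regular = b"ab" * 500
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

if __name__ == "__main__":
    for name, data in [("regular", regular), ("noisy", noisy)]:
        assert decode(encode(data)) == data  # full reconstruction of D
        print(name, len(data), len(encode(data)), is_compressible(data))
```

In this toy setting the regular string is redundant with respect to the coder, whereas the pseudo-random string is not: the coder "knows" nothing about it.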
The observer's subjectivity is embodied by the coding algorithm C.
One may be tempted to define the beauty of a drawing with respect to C. In the following preliminary attempt to do so (inspired by the MDL approach), I assume that ``beauty'' simply corresponds to ``high conditional probability given C'':
Given C, the best way of selecting a drawing s from a set or class S of possible drawings satisfying certain specifications may be to maximize P(s | C), the conditional probability of s, given C.
Bayes' formula tells us
P(s | C) = P(C | s) P(s) / P(C),
and hence, taking negative logarithms,
-log P(s | C) = -log P(C | s) + log P(C) - log P(s).
Let us interpret this. Since C is given, P(C) may be viewed as a normalizing constant and can be disregarded. -log P(C | s) can be interpreted as the information (or length of the observer's shortest algorithm) required to compute C from s. P(s) is given by some a priori distribution on the drawings. For simplicity, let us assume that this prior is uniform, so that -log P(s) is the same for every s in S and can be disregarded as well. Then, given C, s in set S is optimal (most likely, most ``beautiful'') if the information required to compute C from s is minimized.
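The argument can be checked numerically on a toy example. The sketch below assumes three candidate drawings with made-up values of P(C | s); these numbers and the helper names posterior and bits are hypothetical and serve only to show that, under a uniform prior, maximizing P(s | C) selects the same drawing as minimizing -log P(C | s).

```python
import math


def posterior(likelihood, prior):
    """Bayes' formula: P(s | C) = P(C | s) P(s) / P(C) for each s in S."""
    evidence = sum(likelihood[s] * prior[s] for s in likelihood)  # P(C)
    return {s: likelihood[s] * prior[s] / evidence for s in likelihood}


def bits(p):
    """Information content -log2(p), measured in bits."""
    return -math.log2(p)


# Hypothetical values of P(C | s) for three candidate drawings.
likelihood = {"s1": 0.5, "s2": 0.25, "s3": 0.125}
prior = {s: 1.0 / len(likelihood) for s in likelihood}  # uniform prior

post = posterior(likelihood, prior)
most_probable = max(post, key=post.get)
shortest_code = min(likelihood, key=lambda s: bits(likelihood[s]))

if __name__ == "__main__":
    print(most_probable, shortest_code)
```

With a non-uniform prior the two selections may differ, because the term -log P(s) then varies across drawings.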
How can this be related to human experience? The following example attempts to establish such a relationship.
``Beautiful'' faces. Human beings appear to have a certain coding scheme for storing faces in memory. This scheme is certainly different from the circle scheme described earlier. It is probably based on previous experiences with many different faces, and it is probably adapted to code many faces efficiently. One way of doing so is to store a prototype face and code new faces by coding only the deviations from the prototype.
The principle of minimal description length suggests that the ``ideal'' (most likely) prototype FP maximizes P(FP | F), thus minimizing -log P(F | FP) - log P(FP), where F is a given set of all faces to be coded. In other words, the optimal prototype minimizes the sum of the description lengths of all faces relative to the prototype, as well as of the description length of the prototype itself (relative to the observer's remaining knowledge about visual scenes).
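A minimal sketch of this two-part criterion follows, under strong simplifying assumptions that are not part of the text: faces are short tuples of integer features, each deviation is encoded with an Elias-gamma-style integer code, the prototype itself is encoded relative to a fixed baseline vector standing in for the observer's remaining knowledge, and candidate prototypes are drawn from the faces themselves. All names and numbers are hypothetical.

```python
def int_code_length(d: int) -> int:
    """Bits needed for a signed integer under an Elias-gamma-style code."""
    n = 2 * d + 1 if d >= 0 else -2 * d  # map signed d to a positive integer
    return 2 * (n.bit_length() - 1) + 1


def relative_length(face, reference) -> int:
    """Description length of a face, coding only deviations from a reference."""
    return sum(int_code_length(a - b) for a, b in zip(face, reference))


def total_length(prototype, faces, baseline) -> int:
    """Length of the prototype itself plus all faces relative to it."""
    prototype_cost = relative_length(prototype, baseline)  # ~ -log P(FP)
    faces_cost = sum(relative_length(f, prototype) for f in faces)  # ~ -log P(F | FP)
    return prototype_cost + faces_cost


def best_prototype(candidates, faces, baseline):
    """Pick the candidate that minimizes the total description length."""
    return min(candidates, key=lambda p: total_length(p, faces, baseline))


# Hypothetical integer feature vectors for four faces.
faces = [(10, 20, 30), (11, 20, 29), (10, 21, 30), (14, 25, 36)]
baseline = (0, 0, 0)

if __name__ == "__main__":
    proto = best_prototype(faces, faces, baseline)
    print(proto, total_length(proto, faces, baseline))
```

In this toy data set the selected prototype lies in the cluster of three similar faces, since the outlying face is expensive to use as a reference for the others.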
Assuming that all faces are equally likely to appear in the visual field, the formalism above predicts that the most beautiful face is the one that can be most easily computed from the coding scheme. It seems reasonable to assume that the information required to specify the coding scheme is dominated by the information required to specify the prototype face. If the current face looks like the prototype face, then there is very little additional information to compute. This would imply that the prototype face is perceived as the most beautiful one.
Previous work on attractive faces. The statement above seems compatible with results presented by Langlois and Roggmann, who claim that the ``average face'' (computed by digital blending of numerous photos of real faces) is perceived as the most attractive one. Perrett, May and Yoshikawa partly dispute this claim, however. Their test subjects also appreciate average faces computed by blending but prefer ``attractive average faces'' constructed from faces perceived as attractive. Indeed, the most attractive faces are caricatures obtained by digitally exaggerating the deviations between ``average'' and ``attractive average.''
Critique of previous work. The studies above, however, do not say much about the plausibility of the algorithms used to compute average faces. Let us assume that the brain does indeed support face-processing by an ideal (in the information theoretic sense) prototype face. It would be naive to assume that the ideal face equals the one computed by blending. There are many plausible algorithms for computing prototypes, based on many plausible metrics for ``distances'' between faces. Therefore the studies above, including the statement that the average face is not the most attractive one, have to be judged with skepticism. The presented claims depend on the definition of ``average'' and the corresponding nature of the blending algorithms, which may not be very closely related to a hypothetical method the brain might be using for generating the ``optimal'' prototype FP. Unfortunately, at the present time it seems impossible to analyze the way the brain stores representations of objects. Therefore it also seems impossible to test the predictions made by the formalism presented above.
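The dependence of the ``average'' on the chosen metric can be seen even in a toy setting. The sketch below assumes faces are short numeric feature vectors (a hypothetical simplification) and compares two standard prototypes: the component-wise mean, which minimizes the summed squared Euclidean distance, and the component-wise median, which minimizes the summed absolute (L1) distance. Neither is claimed to be the method used by the brain or by the cited blending studies.

```python
from statistics import fmean, median


def mean_prototype(faces):
    """Component-wise mean: minimizes the summed squared Euclidean distance."""
    return tuple(fmean(column) for column in zip(*faces))


def median_prototype(faces):
    """Component-wise median: minimizes the summed absolute (L1) distance."""
    return tuple(median(column) for column in zip(*faces))


def l1_cost(prototype, faces):
    """Total absolute deviation of all faces from the prototype."""
    return sum(abs(a - b) for f in faces for a, b in zip(f, prototype))


def l2_cost(prototype, faces):
    """Total squared deviation of all faces from the prototype."""
    return sum((a - b) ** 2 for f in faces for a, b in zip(f, prototype))


# Hypothetical feature vectors for four faces, one of them an outlier.
faces = [(10, 20, 30), (11, 20, 29), (10, 21, 30), (14, 25, 36)]

if __name__ == "__main__":
    print("mean prototype:  ", mean_prototype(faces))
    print("median prototype:", median_prototype(faces))
```

The two prototypes differ because the outlying face pulls the mean more strongly than the median, so which face counts as ``average'' depends on the distance measure.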