Algorithmic Composition: Andrew Pascoe December 7, 2009
Andrew Pascoe
December 7, 2009
Introduction
However, they serve to provide at least some notion of why algorithmic composition
is a controversial subject. The role and intent of the composer form the basis for this
discussion.
2.1
from studying the works of Palestrina. Modern composition students still receive
instruction in these rules and aesthetics.
Already one can see how a computer might make use of these notions of good
composition versus bad composition. It is ultimately a matter of eliminating
possibilities and ranking the remaining possibilities in terms of their adherence to the
given aesthetic.
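The eliminate-then-rank idea can be sketched in a few lines of Python. This is only an illustration: the set of consonant intervals, the preference for small melodic leaps, and the function names are assumptions made for the example, not rules quoted from Fux.

```python
# Hypothetical sketch: discard candidates that break a hard rule,
# then rank the survivors by a soft aesthetic preference.

CONSONANT_INTERVALS = {0, 3, 4, 7, 8, 9}  # assumed consonances, semitones mod 12


def allowed(candidate, cantus_note):
    """Hard rule: the candidate must be consonant with the given note."""
    return abs(candidate - cantus_note) % 12 in CONSONANT_INTERVALS


def smoothness(candidate, previous_note):
    """Soft preference: smaller melodic leaps score higher."""
    return -abs(candidate - previous_note)


def rank_candidates(candidates, cantus_note, previous_note):
    survivors = [c for c in candidates if allowed(c, cantus_note)]
    return sorted(survivors, key=lambda c: smoothness(c, previous_note), reverse=True)


# MIDI note numbers: given note C4 (60), previous counterpoint note E4 (64).
ranked = rank_candidates(range(60, 73), cantus_note=60, previous_note=64)
```

The hard rule removes possibilities outright, while the scoring function only orders what is left, which mirrors the two steps described above.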
2.2
The common practice period encompasses most styles associated with the term classical music: Baroque, Classical, and Romantic period styles. The theoretical groundwork for these styles is based not only in the counterpoint and voice-leading considerations of Fux, but also in chord progressions. Kostka and Payne write:[11]
[Students] must learn which chord successions are typical of tonal harmony
and which ones are not. Why is it that some chord successions seem to
progress, to move forward toward a goal, while others tend to wander,
to leave our expectations unfulfilled?
They provide the following examples:
[Figure: notated chord-succession examples, not reproduced.]
The implication is that there exists a set of guidelines for writing good, tonal music.
These guidelines are what is typically referred to as music theory, in imprecise,
casual speech. Kostka and Payne summarize some of these guidelines with a diagram
of possible chord progressions, but are quick to point out:
. . . [B]e aware that Bach and Beethoven did not make use of diagrams
such as these. They lived and breathed the tonal harmonic style and had
no need for the information the diagrams contain. Instead, the diagrams
represent norms of harmonic practice observed by theorists over the years
in the works of a large number of tonal composers.
That is to say, these rules and guidelines have been imposed after the fact, much as
Fux's Gradus ad Parnassum did for the sixteenth-century style. Once again, this push
toward systematization continues to reduce the art of composition to an algorithmic
framework.
2.3
The twentieth century differs from previous centuries in at least one respect: in addition to theorists imposing theoretical structures on previous composers, composers
themselves were creating systems for generating music. This section provides a couple of
examples from each camp.
2.3.1
Charles Seeger
2.3.2
Arnold Schoenberg
Arnold Schoenberg created a technique that is known as serial composition, twelve-tone technique, or dodecaphony.[16] This atonal process begins with a tone row of
all twelve chromatic pitches. The motivation for this practice is to have the pitches not
relate to any particular key, but only to one another. From here, only certain musical
transformations are allowed: transposition and inversion. P is used to denote the
prime tone row. Transposition is denoted through the use of subscripts, so, for
example, a transposition of six semitones would be written as P6. Inversion is denoted
by I, and inversions can also be transposed. Thus, an inversion transposed by six
semitones would be written as I6. Taruskin gives the following examples using a tone
row from Schoenberg's Suite, Op. 25:
(a) P0
(b) I0
(c) P6
(d) I6
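The four row forms can be computed mechanically once the prime row is written as pitch-class numbers. The sketch below assumes the usual numbering (C = 0) and assumes that the row of the Suite, Op. 25 is E F G D-flat G-flat E-flat A-flat D B C A B-flat; the function names are illustrative.

```python
# Prime row P0 as pitch classes (C = 0). Assumed to be the row of
# Schoenberg's Suite, Op. 25: E F G Db Gb Eb Ab D B C A Bb.
P0 = [4, 5, 7, 1, 6, 3, 8, 2, 11, 0, 9, 10]


def transpose(row, semitones):
    """Shift every pitch class up by the same number of semitones."""
    return [(pc + semitones) % 12 for pc in row]


def invert(row):
    """Mirror every interval around the row's first pitch class."""
    first = row[0]
    return [(2 * first - pc) % 12 for pc in row]


I0 = invert(P0)
P6 = transpose(P0, 6)
I6 = transpose(I0, 6)
```

Because both operations are arithmetic modulo 12, every derived form still contains each of the twelve pitch classes exactly once.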
These processes are extremely mathematical in nature. Beyond conveying a general sense of what is good composition versus what is bad composition, the rules
laid forth by Schoenberg dictate what is possible composition. Moreover, there
is no question that Schoenberg intended this system to function as hard and fast
rules. Taruskin writes, with what is perhaps a snide comment toward algorithms as
a compositional practice:
Indeed, the use of an exhaustive twelve-note series makes the Grundgestalt
function, and the organic unity thus guaranteed, virtually automatic. And
that, of course, was the great breakthrough, the principle capable of
serving as a rule, which allowed the composition of large-scale, abstract,
and autonomous atonal music of constant and at-all-time-demonstrable
motivic coherence despite its renunciation of predefined tonal hierarchies,
and despite its frequent athematicism. But beware! As soon as any
musical characteristic becomes the automatic result of a method, it stops
being a compositional achievement.
Regardless of Taruskin's personal opinions, Schoenberg's serialism does provide
a rigorous framework that a computer can easily be made to emulate. Whether a
computer can make a "compositional achievement" is a question saved for a later
section of this paper.
2.3.3
Leonard Bernstein
Leonard Bernstein began to pose new theoretical questions related to the structure
of music, and in that sense his theory imposes interpretations on compositions from
the past.[2] His theory has its roots in the linguistics of Noam Chomsky. The Chomskian framework of transformational linguistics is applied to music. Essentially,
Bernstein equates melody with basic musical meaning, chords as adjectival modifiers,
rhythm as verbs, and mode as negation. He continues this reasoning and explores
the syntactic structures of music by suggesting an underlying deep structure to
musical content that goes through a sort of transformational grammar through not
only these modifiers, but also grammatical aspects of deletion and combination. He
freely admits that his ideas are not fully formed, but here are some basic examples
with potential linguistic equivalents:
This notion is very different from the forms of music theory we have thus far
considered. It does not comment on notes or chords in particular, but rather focuses
on the construction in form of a piece of music. Bernstein likens this to elevating
prose into poetry.
Bernstein's analysis lends a different type of insight into the process of composition. This potentially affords computers the opportunity to replicate human creativity
on a separate level. By using musical fragments and applying transformational grammars
to them, new creative structures may emerge. This question is related to natural
language processing.
2.3.4
David Lewin
David Lewin made advances in music theory with his work in generalized musical
intervals and transformations.[12] In many ways it is an extension of Schoenberg's
atonal practice, but through this extension finds relevance to tonal composers, such
as Chopin. Lewin's theorization of music relies heavily on the mathematical field of
abstract algebra. Indeed, Lewin's book reads more like a math text than a treatise on
music, containing theorems, corollaries, and proofs. In the foreword, Edward Gollin
sums it up succinctly:
The work, a methodical examination of the concept of a musical interval,
explores how the familiar notion of interval as a distance extended between pitches in a Cartesian space is merely one specific case of a more
general idea, one that can embrace different kinds of musical objects (durations, meters, Klangs, timbres, and so on), different (i.e. non-Euclidean)
geometries, and different orientational perspectives (interval as action or
gesture rather than as simply measurement of distance between things).
By far the most mathematical example covered in this paper, Lewin's work finds
a fit in the world of computation. Not just composition, but sound itself can be
analyzed rigorously within Lewin's framework. This opens the door for computers
not only to compose well, but also to construct interesting collections of frequencies
that evolve over time for synthetic instrument creation, aspects of music that are
hardly intuitive for traditional composition practitioners. In these areas, the computer may
express musical ideas more easily than a human can.
Recent Methods
This section of the paper now looks at approaches to algorithmic composition from a
few perspectives, namely those of Benjamin Carson, David Cope, Peter Elsea, and Microsoft's
MySong application.
3.1
Peter Elsea
Peter Elsea has explored algorithmic composition through the use of fuzzy logic
systems.[8, 7] Pitch classes are represented as sets that denote membership in a particular chord or scale. For example:
(1 0 1 0 1 1 0 1 0 1 0 1)
This example of the major scale does not contain any real fuzzy logic (though the
set can still function as a fuzzy logic set). The example of the minor scale, in which
the seventh can be raised, provides a fuzzier set:
(1 0 1 1 0 1 0 1 1 0 0.7 0.6)
Note that these are not probabilities, but merely demonstrate a preference for the
flatted seventh over the raised one.
This fuzzy logic system affords a host of mathematical operations that can be performed. Transposition is accomplished by circularly rotating the set. Multiplication
between sets is available. Set unions and intersections are accomplished by taking the
maximum and minimum, respectively, of two numbers with the same index. These
operations allow for simple processing in tonal harmony applications.
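These set operations are simple enough to sketch directly. The following Python uses the two scale sets given above, indexed by pitch class with C at index 0; the function names are illustrative and are not taken from Elsea's software.

```python
# Fuzzy pitch-class sets, index 0 = C. Values are degrees of membership.
C_MAJOR = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
C_MINOR = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0.7, 0.6]


def transpose(fuzzy_set, semitones):
    """Circularly rotate so that index i moves to index (i + semitones) % 12."""
    n = len(fuzzy_set)
    k = semitones % n
    return fuzzy_set[-k:] + fuzzy_set[:-k] if k else list(fuzzy_set)


def union(a, b):
    """Fuzzy union: element-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def intersection(a, b):
    """Fuzzy intersection: element-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


G_MAJOR = transpose(C_MAJOR, 7)
SHARED = intersection(C_MAJOR, C_MINOR)
EITHER = union(C_MAJOR, C_MINOR)
```

Rotating the C major set by seven places yields the G major set, and the intersection keeps only the pitch classes that the major and minor sets both admit, each at the lower of its two membership values.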
Elsea's software functions by reading in a MIDI note and outputting a harmonization for that note. The basic rules, with regard to chord inversions, are:
If root position keeps common tones, then root position.
If first inversion keeps common tones, then first inversion.
If second inversion keeps common tones, then second inversion.
If last position was root, then first inversion or second inversion.
If there have been too many firsts in a row, then root or second.
If there have been too many seconds in a row, then root or first.
If last position was not root, then root.
The first three rules ensure that the harmonization does not fly around too wildly.
The next three rules ensure that the harmonization is not too static. Of particular
note for these rules is the use of the fuzzy concept of "too many." Thus, programmed
into Elsea's code is a sense of what "too many" actually means. The last rule is given a
relatively low weight so that it does not interfere with the main
rules too much.
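One way such weighted rules might be combined is to let each rule cast a vote for the inversions it favors and then pick the inversion with the most support. The sketch below is hypothetical throughout: the ramp used for "too many," the vote sizes, and the low weight of 0.3 are assumptions made for illustration, not values from Elsea's code.

```python
# Hypothetical sketch of weighted fuzzy rule voting; not Elsea's actual code.
INVERSIONS = ("root", "first", "second")
LOW_WEIGHT = 0.3  # assumed weight for the final, deliberately weak rule


def too_many(count, start=2, full=4):
    """Fuzzy degree (0..1) to which `count` repeats is "too many" (assumed ramp)."""
    if count <= start:
        return 0.0
    if count >= full:
        return 1.0
    return (count - start) / (full - start)


def choose_inversion(common_tones, last, firsts_in_row, seconds_in_row):
    """Sum each rule's vote per inversion and return the best-supported one.

    common_tones maps each inversion to the degree (0..1) to which it
    keeps common tones with the previous chord.
    """
    votes = {inv: common_tones[inv] for inv in INVERSIONS}  # rules 1-3
    if last == "root":                                      # rule 4
        votes["first"] += 1.0
        votes["second"] += 1.0
    many_firsts = too_many(firsts_in_row)                   # rule 5
    votes["root"] += many_firsts
    votes["second"] += many_firsts
    many_seconds = too_many(seconds_in_row)                 # rule 6
    votes["root"] += many_seconds
    votes["first"] += many_seconds
    if last != "root":                                      # rule 7, low weight
        votes["root"] += LOW_WEIGHT
    return max(INVERSIONS, key=lambda inv: votes[inv])
```

For example, after a root-position chord whose first inversion keeps some common tones, the fourth rule outweighs the pull toward root position and the sketch selects the first inversion.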
The question, once we move beyond simple inversions, is how to harmonize melodic
notes given their context not only in the key, but also in the context of the preceding
chord. Elsea constructs three candidate chords by using these aforementioned fuzzy
logic sets: one with the note as the root, one with the note as the third, and one
with the note as the fifth in the chord. Then, the actual roots of these chords are
compared with more fuzzy logic sets that are based on the tonal harmony (the diagrams) provided by Kostka and Payne. The best candidate, based merely on taking
the maximum of the resultant computations, is then constructed and played. So for
example, in the key of C, if the user plays a G, there are three basic triadic chords in
the key of C that harmonize G, not including their inversions:
I
iii
V
Which chord to choose is, once again, dependent on the context. If the preceding
chord is a root position I chord, then the following is a possibility:
I to I(6)
But if the preceding chord is a ii chord, the next two examples show a good
progression and a bad progression:
ii to V(6,4) (good progression)
ii to iii (bad progression)
In short, Elsea takes a mathematical approach to the tonal theory of the common
practice period, and thus the computer is able to make informed, real-time compositional decisions based on melodic user input. This would not have been possible
without a systematization of the music of the period in the first place.
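The candidate-and-maximum procedure can be sketched as follows for the key of C. The construction of the three candidate triads follows the description above; the preference weights, however, are invented placeholders standing in for the fuzzy sets that Elsea derives from the Kostka and Payne diagrams, and the function names are illustrative.

```python
# Hypothetical sketch of choosing a harmonization by taking the maximum.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # C major as pitch classes
NUMERALS = ["I", "ii", "iii", "IV", "V", "vi", "vii°"]

# Invented weights: how well each chord follows the previous chord.
PREFERENCE = {
    "I": {"I": 0.6, "iii": 0.4, "V": 0.8},
    "ii": {"I": 0.3, "iii": 0.1, "V": 0.9},
}


def candidate_chords(melody_pc):
    """Diatonic triads with the melody note as root, third, and fifth."""
    degree = MAJOR_SCALE.index(melody_pc)
    # Stepping down 0, 2, or 4 scale degrees gives each candidate's root.
    return [NUMERALS[(degree - offset) % 7] for offset in (0, 2, 4)]


def harmonize(melody_pc, previous_chord):
    """Pick the candidate with the highest preference after previous_chord."""
    weights = PREFERENCE.get(previous_chord, {})
    return max(candidate_chords(melody_pc), key=lambda chord: weights.get(chord, 0.0))


candidates_for_g = candidate_chords(7)
after_ii = harmonize(7, "ii")
```

With these placeholder weights, a G played after a ii chord is harmonized with V rather than iii, in line with the good and bad progressions shown above.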
3.2
Benjamin Carson
4/16, 7/16, and 13/16.
the composer's input as to how complex he wishes a certain series of rhythmic events
to be.
Beyond this, the software allows a composer to view his work from a variety of different dimensions. Rhythmic complexity can be tied to pitch classes or dynamics, for
example. Thus, even though the underlying algorithmic production of these durations
is inherently random, the composer is free to choose some sort of logical framework for
how rhythmic complexity interacts with the rest of the composition's aspects. This
affords a unique mix of both the computer's input and the compositional goals and
ideas of the composer.
These syncopated rhythmic grammars can be variably juxtaposed to the underlying ordered pulse of the music. This can be accomplished by shifting the syncopation
over the underlying pulse, or by spreading out the underlying pulse in clock time.
These processes generate psychological musical effects for the listener. Once again, it
is at the composer's whim when to produce such rhythmic effects as part of the creative process. In particular, Carson describes this feature of his software as a means of
maintaining interesting rhythmic flows that do not grow stale, but still maintain some
sort of logical structure. The goal is not to be completely random, but instead to have
a certain level of ordered randomness. Thus, the piece has a concrete development
over its course without devolving into either banal repetition or pure chaos.
Carson also makes a distinction between metric time and clock time. That is, a
32nd note does not lose its 32nd-ness under tempo changes. Its musical function
does not change despite its variation in actual clock time. To this extent, Carson also
experiments with an algorithmic approach to changes in tempi.
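The distinction can be made concrete with a small calculation: the metric value of a note is fixed, while its clock duration depends on the tempo. The sketch below assumes that the tempo is given in quarter-note beats per minute; the function name is illustrative and does not come from Carson's software.

```python
from fractions import Fraction


def clock_seconds(note_value, bpm, beat=Fraction(1, 4)):
    """Seconds occupied by a note of the given metric value at a tempo in beats per minute."""
    return float(note_value / beat * 60 / bpm)


thirty_second = Fraction(1, 32)          # metric time: unaffected by tempo
slow = clock_seconds(thirty_second, 60)  # clock time at 60 beats per minute
fast = clock_seconds(thirty_second, 120) # clock time at 120 beats per minute
```

Doubling the tempo halves the clock duration, but the note remains a 32nd note in both cases.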
Carson's software is at least in part an exercise in compositional economy. By
automating certain musical processes, he finds himself free to focus on other aspects
of his composition that he finds either more interesting or more important. However, the
software allows for a large range of input by the composer, allowing him to express
his musical impulses without necessarily having the computer completely control
the output. Carson's idea of musical impulses will be covered in the next section of
this paper.
3.3
David Cope
David Cope has long been involved in the field of algorithmic composition, and much
of his focus has been on making computer models of musical creativity.[6] His program, Experiments in Musical Intelligence (EMI), employs a variety of techniques to
accomplish not only algorithmic composition in the styles of particular composers,
but also to produce new, creative output.
Cope's approach is fundamentally database driven. He compiles musical sources
from a particular composer (or particular composers), all of which go through a series
of analyses. From here, the program sets to work using models of recombinance,
allusion, learning, form and structure, and influence. This paper presents a summary
of these ideas and their functions.
3.3.1
Recombinance
Recombinance is essentially taking musical material from the database and replacing
sections of it with other material acquired from the database. Cope's book provides
a simple example of this process:
[Figure: Bach chorale excerpts and the recombined EMI output, not reproduced.]
The EMI output replicates the original Bach Chorale no. 188 on the first beat,
but diverges from there, instead continuing with a fragment from Bach's Chorale
no. 157. EMI accomplishes this by examining the database and searching for
alternative candidates that follow from the given first beat. This process is continued
throughout the algorithmically generated composition.
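A heavily simplified version of this search can be sketched as a successor table built from the database. In the sketch below, beats are reduced to opaque labels and the two "pieces" are toy data invented for the example; EMI's actual analysis of voices and destination notes is far richer than this.

```python
import random
from collections import defaultdict

# Toy database: each piece is a sequence of beat labels (invented data).
TOY_DATABASE = [
    ["a", "b", "c", "d"],
    ["a", "e", "c", "f"],
]


def build_successors(pieces):
    """Map each beat to every beat that follows it somewhere in the database."""
    successors = defaultdict(list)
    for piece in pieces:
        for current, following in zip(piece, piece[1:]):
            successors[current].append(following)
    return successors


def recombine(pieces, start, length, rng):
    """Begin on `start`, then repeatedly continue with any recorded successor."""
    successors = build_successors(pieces)
    output = [start]
    while len(output) < length and successors[output[-1]]:
        output.append(rng.choice(successors[output[-1]]))
    return output


result = recombine(TOY_DATABASE, "a", 4, random.Random(0))
```

Every adjacent pair in the result occurs somewhere in the database, yet the whole sequence need not match either source piece, which is the essence of recombinance as described above.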
However, this process will ultimately result in free-flowing melodies without any
sense of direction. Cope writes:
In order to provide some sense of this logic and larger structure, and again
wanting to avoid coding my own knowledge of musical form as rules, I
rewrote the program to inherit more of the structural aspects of the music
in its database. This inheritance involved extending the analysis process
so that the program could store other information about the music being
analyzed in each grouping along with destination notes. For example, I
had the program analyze the original music's distance to cadence, position
of groupings in relation to meter, and other context-sensitive features.
The program is also sensitive to compositional signatures, which are larger fragments of music that give a particular composer a sense of musical identity. Thus, there
is an aspect of the program that resists the complete fragmentation of raw recombinance. Due to similar concerns of over-fragmentation, EMI also tries to maintain a
cohesion in musical texture by comparing density of notes and rhythmic characteristics between groupings, and searching the database for passages of similar quality in
texture.
3.3.2
Allusion
Allusion refers to the practice of composers borrowing material from other composers.
Cope breaks down allusion into five categories: quotations, paraphrases, likenesses,
frameworks, and commonalities. He freely admits that the boundaries between these
categories are not necessarily clear.
Quotations are as one would expect: direct pitch and rhythm copying. Paraphrases use similar intervallic structures to the source material and can also incorporate rhythmic differences. Likenesses are similar to paraphrases, but allow more
freedom in the differences, as well as changes in harmonic content. Frameworks take
source material and insert notes between the pitches and rhythms of the source material. Such processes can be illuminated through textural reductions. Commonalities
are simplistic musical materials, not unlike Bernstein's concept of deep structures.
Cope developed a secondary program called Sorcerer that can sniff out these allusions. A user provides Sorcerer with a database of musical material and a work that
potentially contains allusions that are in the database. The program compares not
only pitches but intervals between pitches, allowing for variation in these parameters,
and also allows for a certain number of notes to fall within the source material (the
framework idea). Users can specify how sensitive they wish Sorcerer to be, picking out only quotations, or running the whole gamut up to commonalities. Bounds
on the length of patterns must also be set by the user.
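The core comparison, matching interval patterns while tolerating some variation, can be sketched briefly. This is not Sorcerer's algorithm: it ignores rhythm and the interpolated notes of the framework category, and the tolerance parameter is an assumed stand-in for the user's sensitivity setting.

```python
def intervals(pitches):
    """Successive melodic intervals in semitones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_allusions(source, work, tolerance=0):
    """Start indices in `work` where the interval pattern of `source` recurs,
    allowing each interval to differ by at most `tolerance` semitones."""
    src = intervals(source)
    wrk = intervals(work)
    hits = []
    for start in range(len(wrk) - len(src) + 1):
        window = wrk[start:start + len(src)]
        if all(abs(w - s) <= tolerance for w, s in zip(window, src)):
            hits.append(start)
    return hits


# MIDI note numbers (invented data).
source = [60, 62, 64, 65]
work = [55, 67, 69, 71, 72, 60, 62, 64, 66]
exact = find_allusions(source, work)
loose = find_allusions(source, work, tolerance=1)
```

Comparing intervals rather than pitches means a transposed quotation is still found, and loosening the tolerance admits the altered version that begins at index 5.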
EMI, because of its construction in accessing databases, is naturally inclined to at
least include some elements of allusion. Cope tells a story about how he recognized
in one of EMI's pieces an allusion to Vivaldi, despite the database being entirely
composed of works by Bach. Bach himself transcribed a series of Vivaldi pieces. Did
EMI reveal a hidden allusion in Bach to the music of which he himself had intimate
knowledge?
3.3.3
Learning
Cope describes another program he worked on called Gradus (named after Fux's
Gradus ad Parnassum), which attempts to learn the rules of first species counterpoint.
Other programs in the past have codified solutions to this problem, but they do not
actually learn.
A melodic line (called a cantus firmus) is given to these programs. The goal is to
write suitable counterpoint against this line. Previous programs begin with the first
note and proceed to the next. As any student of counterpoint can attest, this naive
approach can quickly run into serious problems. Some solutions that begin well end in
dead ends where there are no possible notes to write that will conform to the rules.
Therefore, such a program needs to backtrack to the last good note and attempt to
replace it with another possible solution, and then continue on from there, potentially
running into the same problem all over again.
Cope's program stores its mistakes and avoids repeating them in the future.
So the program starts by having to backtrack, but the more the program practices,
the less it must backtrack to find a solution to the given cantus firmus. While Cope's
example has limited application to sixteenth century first species counterpoint, the
underlying philosophy has large implications for algorithmic composition as a whole.
The ability of programs to learn what to avoid can not only increase efficiency, but
also produce higher quality work.
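The effect of remembering dead ends can be illustrated with a small backtracking search. The sketch below is not Cope's Gradus: the rule set (consonant intervals above the cantus, no repeated notes, no leap beyond a fourth, no parallel fifths or octaves, final octave) is a simplified assumption, and the names are illustrative.

```python
# Hypothetical sketch of backtracking that remembers its dead ends.
CONSONANCES = (3, 4, 7, 8, 9, 12)  # allowed intervals above the cantus, in semitones
PERFECT = {7, 12}


def legal(prev_pair, cantus_note, candidate, is_last):
    interval = candidate - cantus_note
    if is_last and interval != 12:
        return False                      # assumed: end on an octave
    if prev_pair is None:
        return True
    prev_cantus, prev_note = prev_pair
    if candidate == prev_note or abs(candidate - prev_note) > 5:
        return False                      # assumed: no repeats, no leap beyond a fourth
    if interval in PERFECT and prev_note - prev_cantus == interval:
        return False                      # no parallel fifths or octaves
    return True


def solve(cantus, mistakes, counter):
    """Depth-first search; `mistakes` persists between calls and grows with practice."""

    def extend(position, line):
        if position == len(cantus):
            return line
        prev_pair = (cantus[position - 1], line[-1]) if line else None
        is_last = position == len(cantus) - 1
        for interval in CONSONANCES:
            candidate = cantus[position] + interval
            # A state is the remaining cantus plus the note just chosen.
            state = (tuple(cantus[position:]), candidate)
            if state in mistakes:
                continue                  # known dead end: do not try it again
            if not legal(prev_pair, cantus[position], candidate, is_last):
                continue
            result = extend(position + 1, line + [candidate])
            if result is not None:
                return result
            mistakes.add(state)           # remember the dead end
            counter["backtracks"] += 1
        return None

    return extend(0, [])


memory = set()
first_run = {"backtracks": 0}
solution = solve([60, 62, 64, 62, 60], memory, first_run)
second_run = {"backtracks": 0}
repeat = solve([60, 62, 64, 62, 60], memory, second_run)
```

On the first attempt the search has to back out of dead ends; on the second attempt, with the same memory, it reaches the same solution without backtracking at all.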
3.3.4
Influence
compositions. Cope programmed in a metric for what a good composition is: the
kinetic energy of the music. Each program has a slightly different metric, and hands
over to the other programs music it evaluates as aesthetically acceptable. In this way,
the programs are attempting to influence each other, resulting in compositions that
demonstrate disparate sources. The author finds a similarity between this idea and
Bohm Dialogues.
Last, Cope describes influence as a form of exploration. He created a web-crawling
application called Serendipity that would download MIDI files (and text files) that
would be added to databases. At first, this process resulted in a mish-mash of styles
in one composition that was not aesthetically pleasing. So Cope produced filters
that would determine if a given MIDI file was good enough, or even appropriate,
for inclusion in EMI's database. This resulted in much more cohesive and enjoyable
compositions.
3.4
MySong
and takes all of the notes in a particular measure and compiles them into a set. Here
is an illustration:
(a) Sung line.
The database MySong employs is a collection of about three hundred lead sheets.
Lead sheets contain melodic content along with chord progressions. These lead sheets
come from a variety of popular music genres, including pop, rock, and country. As
such, these styles of music usually find changes in harmony at the bar line. Here is
an example of what a lead sheet looks like:
The method for algorithmic composition is based in Hidden Markov Models (HMM).
Essentially, a Markov model represents a stochastic process through transition probabilities. Based on an analysis of the database, we might conclude that a C major
chord will move to a G major chord 30% of the time, and move to an F major chord
20% of the time. What makes the Markov model hidden in this application is that
the chord progression is unknown: all that is available are the pitch classes derived
from the user input. Thus, there is another level of analysis involving the probability
of a chord for the measure given the pitch classes.
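A toy model makes the two levels of probability concrete. In the sketch below, only the 30% and 20% transitions out of C major come from the text; every other number, the three-chord vocabulary, and the crude chord-tone scoring are invented for illustration and are not MySong's trained values. The most likely chord sequence is recovered with a standard Viterbi decoder.

```python
# Hypothetical toy HMM: hidden chords, observed pitch classes per measure.
CHORDS = ["C", "F", "G"]
START = {"C": 0.6, "F": 0.2, "G": 0.2}           # invented
TRANSITION = {
    "C": {"C": 0.5, "F": 0.2, "G": 0.3},          # C->G and C->F from the text
    "F": {"C": 0.5, "F": 0.2, "G": 0.3},          # invented
    "G": {"C": 0.6, "F": 0.1, "G": 0.3},          # invented
}
CHORD_TONES = {"C": {0, 4, 7}, "F": {5, 9, 0}, "G": {7, 11, 2}}


def emission(chord, pitch_classes):
    """Unnormalized score for hearing these pitch classes over this chord."""
    score = 1.0
    for pc in pitch_classes:
        score *= 0.8 if pc in CHORD_TONES[chord] else 0.2
    return score


def viterbi(measures):
    """Return the most probable chord per measure under the toy model."""
    best = {c: (START[c] * emission(c, measures[0]), [c]) for c in CHORDS}
    for pcs in measures[1:]:
        new_best = {}
        for c in CHORDS:
            # Best way to arrive at chord c from any previous chord.
            prob, path = max(
                ((best[p][0] * TRANSITION[p][c], best[p][1]) for p in CHORDS),
                key=lambda item: item[0],
            )
            new_best[c] = (prob * emission(c, pcs), path + [c])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


sung = [{0, 4}, {5, 9}, {7, 11}, {0, 7}]  # pitch classes sung in each measure
chords = viterbi(sung)
```

The decoder keeps, for each chord, only the best path ending on that chord, so its work grows linearly with the number of measures rather than exponentially.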
The way to find the optimal solution given a set of observations in an HMM is to
use the Viterbi algorithm. A mathematical discussion of the Viterbi algorithm and
why it works is beyond the scope of this paper, but it is important to know that there
Musical Meaning
[Figure: notated musical excerpts, not reproduced.]
What, if anything, do these bits of music mean? Taking a cue from MySong's
use of a Happiness Factor, should we conclude that Beethoven was having a bad day,
and Shostakovich happened to be feeling jubilant when they wrote their respective
pieces? One of the author's past professors suggested that such interpretations were
dangerous, but Aldwell and Schachter suggest that our connotations of the major
mode being happy and the minor mode being sad should not be dismissed too
carelessly.[1] Of course, this raises the question, "What was EMI feeling?"
Hofstadter finds these questions, given the quality of EMI's output, disturbing.[10]
He writes:
. . . I was going to grapple with this strange program that was threatening
to upset the apple cart that held many of my oldest and most deeply
cherished beliefs about the sacredness of music, about music being the
ultimate inner sanctum of the human spirit, the last thing that would
tumble in AI's headlong rush towards thought, insight, and creativity.
Having a personal affinity for Chopin and being surprised at how well EMI emulated Chopin, Hofstadter boils his worries down to three possible outcomes, each of
which he finds unsatisfactory as a lover of music and as a human being:
(1) Chopin (for example) is a lot shallower than I had ever thought.
(2) Music is a lot shallower than I had ever thought.
(3) The human soul/mind is a lot shallower than I had ever thought.
These concerns are strange since Hofstadter claims he has long been convinced
that humans are effectively machines anyway. He claims his surprise lies in the fact
that the machines producing EMI's output are many orders of magnitude simpler than
human biological construction. Regardless, the author is not particularly surprised
by such developments in the production of machine music: the long historical trend
of the systematization of music lends itself to mechanical replication.
But Hofstadter is not alone. Cope has found resistance in music circles to having
EMI's output performed by trained human musicians, saying responses were mostly
negative.[5] Why do people have such revulsion to the idea that a computer can
compose well? Do people read that much meaning into music?
Bernstein rejects the notion of any sort of extra meaning in music.[3, 2] For him,
the meaning of music is contained entirely within the notes themselves. He declares
that music does not have the same function as language, and any meaning it has is
ineffable. Hofstadter's claim that music functions as direct soul-to-soul messages
does not find a place in Bernstein's philosophy of music. As Bernstein says:
. . . [L]anguage has a communicative function and an aesthetic function.
Music has an aesthetic function only. For that reason, musical surface
structure is not equitable with linguistic surface structure.
is the act of recording musical impulses. These musical impulses function not only
consciously, but subconsciously. Using an algorithm, or even developing an algorithm,
inherently incorporates musical impulses.
Will human composers become obsolete? Carson does not think so. Humans have
a natural tendency and demand to relate to other humans, and thus have a desire
to experience compositions produced by human composers. However, as stated, even
an algorithm expresses some element of a composer's will and aesthetic. Carson even
extends the musical impulse to a user merely pushing play on some automatic
composition device.
He also does not take a post-modern stance of all music being equal. In response
to Hofstadter, Chopin is still great whether or not his style is replicable by machines.
Yes, music is a shallow surface structure, but Chopin's greatness can be attributed
to his ability to produce surface structures of enormous musical complexity. Carson concludes that Chopin himself is essentially an algorithm of both conscious and
subconscious processes, but he is a very good algorithm.
The shallowness of music, according to Carson, is not a fundamental problem.
Music still has value. People associate with music because it relates to them on a
multitude of levels, including cultural and emotional levels. Audiences may incorporate meaning into their musical experiences, and there is nothing inherently wrong
with this. However, there is a benefit in increasing musical understanding by dispelling these myths about composition as some sort of divine, mystical, or spiritual
process. Knowledge of actual compositional processes should aid audiences in their
appreciation of music as an art form.
The question about what EMI was feeling when it output the music above is
perhaps a misguided question. The real question could easily be, "What were David
Cope's musical impulses when he wrote EMI, compiled a database of Bach lute suites,
and pressed the generate button?"
Conclusion
Algorithmic composition is still a relatively nascent field given the long course of
music history. Its full potential is yet to be explored, and it has yet to completely
break into the public consciousness. Hofstadter asks:[10]
Where will we have gotten in twenty years of hard work? In fifty? What
will be the state of the art in 2084? Who, if anyone, will be able to tell
the right stuff from rubbish? Who will know, who will care, who will
loudly protest that the last (though tiniest) circle at the center of the
style-target has still not been reached (and may never be reached)? What
will such nitpicky details matter, when new Bach and Chopin masterpieces
applauded by all come gushing out of silicon circuitry at a rate faster than
H2O pours over the edge of Niagara? Will that wondrous new golden age
of music not be truly a thing of beauty?
And he concludes:
. . . [T]he day when music is finally and irrevocably reduced to syntactic
pattern and pattern alone will be, to my old-fashioned way of looking at
things, a very dark day indeed.
The author shares a curiosity in Hofstadter's questions, but applies a separate
moral judgment on the future of music. Sharing in Carson's idea of musical impulse,[4]
the future looks bright for composers to continue to develop systems, as theoreticians
and composers alike have done throughout history, that will explore new avenues of
musical expression and insight. Algorithmic processes do not detract from our humanity in the slightest. When all is said and done, an algorithm is still a creation
founded in the human mind, and creations are inherently imbued with the wills of
their creators.
References
[1] Edward Aldwell and Carl Schachter. Harmony and Voice Leading. Schirmer,
New York, New York, third edition, 2002.
[2] Leonard Bernstein. The Unanswered Question. Harvard University Press, Cambridge, Massachusetts, 1999.
[3] Leonard Bernstein. The Joy of Music. Amadeus Press, LLC, Pompton Plains,
New Jersey, 2004.
[4] Benjamin Carson. Personal Interview, December 4, 2009.
[5] Jacqui Cheng. controversy. virtual-composer-makes-beautiful-musicand-stirs-controversy.ars,
Accessed December 6, 2009.
[6] David Cope. Computer Models of Musical Creativity. The MIT Press, Cambridge,
Massachusetts, 2005.
[7] Peter Elsea. Fuzzy Logic and Musical Decisions, 1995. ftp://arts.ucsc.edu/
pub/ems/FUZZY/Fuzzy_Logic_And_Music.pdf, Accessed December 6, 2009.
[8] Peter Elsea. Musical Applications of Fuzzy Logic, 1995. ftp://arts.ucsc.edu/
pub/ems/FUZZY/MusAppFuzzy.pdf, Accessed December 6, 2009.
[9] Robert Gauldin. A Practical Approach to Sixteenth-Century Counterpoint. Waveland Press, Inc., Long Grove, Illinois, 1995.
[10] Douglas Hofstadter. http://www.unc.edu/~mumukshu/
[11] Stefan Kostka and Dorothy Payne. Tonal Harmony, With an Introduction to
Twentieth-Century Music. McGraw-Hill, 1221 Avenue of the Americas, New
York, New York, fifth edition, 2004.
[12] David Lewin. Generalized Musical Intervals and Transformations. Oxford University Press, Inc., 198 Madison Avenue, New York, New York, 2007.
[13] Daniel Morris. MySong: Automatic Accompaniment for Vocal Melodies, 2008.