The Science and Art of Sonification, Tufte’s Visualization, and the "slippery slope" to Algorithmic Composition
(An Informal Response to Ed Childs’ Short Paper on Tufte and Sonification; with additional commentary by Ed)
Reading Ed’s wonderful precis of Tufte’s first book (The Visual Display…), I had a number of thoughts regarding sonification and composition. I was somewhat surprised by the complexity of my reaction, which suggested to me that certain assumptions I’d been making about sonification, mapping, and algorithmic composition needed rethinking. In particular, this paper raises (for me) subtle distinctions between scientific and artistic sonification (the latter I’d like to call manifestation), and the further distinction between what should clearly be called sonification and what is more appropriately called algorithmic composition. Ed’s description of Tufte’s possible extension to sonification helped to clarify two very different approaches which I’d been conflating.
I wrote these notes in response to Ed’s writing, and then asked Ed to critique them. Some of Ed’s edits/comments are integrated into the body of this text (and thanks to Ed for all of that, most of which is simply and gratefully acknowledged here). Some of Ed’s comments I have placed at the end, as a kind of appendix, along with my further responses.
A gedanken experiment: two "pieces." In the first, we use the stochastic algorithm for computing PI. This entails throwing random coordinates at a square, and testing to see if they fall inside an inscribed circle. The number of points that fall inside the inscribed circle is NC, and the total number of points thrown is NA. Since the ratio of the circle’s area to the square’s is PI/4 (via simple geometry, PI r2 = A), the ratio NC/NA gives us a successive approximation of the value of PI. The more points we throw, the closer we get to PI, not in any systematic mathematical way, but in a jagged, strange, probabilistic trajectory (it occurs to me that it might be a simple matter to determine what this approximation function is, and that it might be an interesting thing to listen to on its own).
Our estimates for PI might move from 7 to .01 and to 100 for all we know, but eventually, by the law of large numbers, we’ll get a very accurate representation. In fact, since by definition PI is related to the area of a circle, we could say that this technique is an appropriate way of measuring it. It shows us, in fact visualizes, the meaning of the value as it computes it.
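The stochastic computation described above can be sketched in a few lines of Python (a minimal sketch; the function name and the particular point counts are my own choices):

```python
import random

def estimate_pi(n_points):
    """Monte Carlo estimate of PI: throw random points at the unit
    square and count how many fall inside the inscribed quarter
    circle; NC/NA approximates (circle area)/(square area) = PI/4."""
    nc = 0  # points landing inside the circle
    for _ in range(n_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            nc += 1
    return 4.0 * nc / n_points

# The jagged, probabilistic trajectory of successive estimates:
for na in (10, 100, 1000, 10000, 100000):
    print(na, estimate_pi(na))
```

Plotting (or listening to) those successive estimates would make the approximation function itself audible, as speculated above.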
Now suppose we have two pitches, tuned some random distance apart. Since we know a priori what PI is, we could use those pitches to tell us how close we are in our stochastic algorithm. The closer we approximate it, the closer those two pitches get to a just perfect fifth (or something like that). The further we are from PI, the further apart the pitches get (and we could allow them to go above and below each other, or we could use the approximation to make them more or less consonant by some consonant measure). By listening to the way the sonority approaches an easily heard timbre/consonance, we can hear the process for computing PI.
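One hedged way to realize that two-pitch mapping in code (the scaling of one unit of error to an octave's worth of cents is my own arbitrary choice, not a prescription):

```python
import math

def error_to_interval(estimate, base_hz=220.0):
    """Map the current PI estimate to a two-pitch sonority: zero error
    yields a just perfect fifth (frequency ratio 3:2); error pushes
    the upper pitch above or below the fifth.  Scaling one unit of
    error to 1200 cents (an octave) is an arbitrary choice."""
    cents_off = 1200.0 * (estimate - math.pi)  # signed: above or below
    ratio = 1.5 * 2.0 ** (cents_off / 1200.0)
    return base_hz, base_hz * ratio

# A perfect estimate lands exactly on the fifth:
lo, hi = error_to_interval(math.pi)
print(hi / lo)  # 1.5
```

As the stochastic algorithm converges, the sonority would audibly settle into the easily recognized consonance of the fifth.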
The second piece is as follows (in fact, it’s a kind of simple version of the basic idea behind a series of my own pieces). We’re interested in probability distributions, and the way they’re heard, and what they might mean to music. Let’s say we write a little 3 minute piece for piano in which in the first minute, the pitches are uniformly distributed over the piano keyboard, in the second minute, they are distributed as a Poisson, and in the third, a Myhill. Maybe we even have little morphing interludes between the three distributions.
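Two of the piece’s three minutes might be sketched as follows (a sketch under stated assumptions: the note rate, the Poisson mean, and the MIDI pitch range are my own choices, and the Myhill minute is omitted):

```python
import math
import random

LOW, HIGH = 21, 108  # MIDI numbers spanning the 88-key piano

def uniform_pitch():
    """Minute one: pitches uniformly distributed over the keyboard."""
    return random.randint(LOW, HIGH)

def poisson_pitch(mean=20.0):
    """Minute two: pitches Poisson-distributed above the bottom key,
    sampled by Knuth's method (multiply uniforms until the product
    falls below exp(-mean)).  A Myhill distribution for minute three
    would slot in as a third function in the same way."""
    threshold, k, p = math.exp(-mean), 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return min(LOW + (k - 1), HIGH)

# Five notes per second for one minute each (the rate is an assumption):
minute_one = [uniform_pitch() for _ in range(300)]
minute_two = [poisson_pitch() for _ in range(300)]
```

The uniform minute sprays across the whole keyboard, while the Poisson minute clusters around a register, which is exactly the kind of "different, recognizable texture" discussed below.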
Now, what’s the difference between piece one and piece two (in the context of the discussion of sonification)? Clearly, they are both deterministic mappings of clear, mathematical, procedural, formal ideas. Also, they’re both attempts to hear something that is thought to be a mathematical construct.
But in the first, we have no notion of an aesthetic progression, nor is our objective that we learn something new about music and perception from listening to this process (although that could, of course, happen). The intent of the first is to sonify a process, to better understand that process.
In the second, our primary intention is not to understand any better the three probability distributions. It’s not just that we already understand them so well (in this case, it could be argued that in fact we do, or at least that they are less in need of some sort of sensory elucidation than the PI computation algorithm), or that they’d be any less interesting to hear. One could easily imagine sonifying these for a class on probability, and having that sonification be illustrative (in the same fashion as the first piece). As Ed points out, "each distribution should produce a different, recognizable texture." Note that this is a statement which suggests their sonification’s utility for either art or science.
Rather, the second piece is, in most of our conventional definitions, a piece. It is meant to be first a musical, or even aesthetic experience, and less so pedagogical. It even has a kind of temporal form which we associate with conventional musical ideas. We’re trying to use a set of natural processes, with profound and universal meanings, to instantiate a new musical form. We don’t want to hear the Gaussian distribution so much as we want to use the Gaussian distribution to allow us to hear a new music.
The first case is appropriately called, it seems to me, sonification. Its role is pedagogical, illustrative, mathematical. Sound elucidates mathematics. The second case is called, usually, algorithmic composition (or to use Ames’ more careful term, computer-aided composition), but I would prefer to think of it more as a pure example of experimental music (a music in which some more or less unknown result emanates from some experimental "hypotheses"). Mathematics inspires sound. It is worth noting, in passing, that the second piece doesn’t necessarily require the use of the computer at all (one could hand transcribe the results of a table from a mathematics book). When the intent is clearly to use a formal or mathematical process to create a new musical idea, as a form of sonification, I propose the term manifestation (but I’m open to suggestions).
The definition of sonification
Tufte’s "principals of graphical excellence"
Several points are worth considering here. First, Tufte’s notions of excellence may or may not apply to manifestation. Clearly, "precision," "greatest number of ideas in the smallest space [time]," and "clarity" only directly apply when one’s interest is translational, illustrative. Even then, they preclude the possibility of elliptical, ironic, poetic, or even satiric data visualization. I suppose for a statistician, these latter forms of expression may be outside the realm of the useful, but they exist nonetheless and the "principles of graphical excellence" certainly do not account for their possibility. Moreover, an elliptical statistical display, by definition, would verge on the artistic, and by this we can establish a continuum between the sonified and the manifested, not simply a binary opposition.
For the purposes of artistic sonification, the principles cannot be said to apply. There is no canon of art which necessitates "efficiency," "economy," or even, to play devil’s advocate, "clarity." While a great many artists would agree that these are desirable qualities, art in general has no such rules or requirements. While these notions might be (and are) useful starting points for many beginning artists, and pedagogically productive, once established as principles they must of necessity be confounded or at least manipulated by most working artists.
Another minor point, in amplification of Tufte (or more appropriately, Childs’ explanation of Tufte): the "artistry" of a presentation is perhaps as important as any of the other graphical excellence criteria. In fact, perhaps more so. One can satisfy economy, efficiency, clarity without making an excellent presentation of data. All of Tufte’s best examples make use of another quality, which I would like to call artistry: the careful choice of elements within the environment of the other criteria. These choices are most likely easily explained by more conventional design and visual art principles (and as such, no great mystery). The choice of color, juxtaposition, size, shape, figure/ground and other graphic elements distinguish, in his examples, between the dutiful and the beautiful.
"Direct translation into the auditory domain"
The point about the time it takes to comprehend something being more or less translatable to repeated listening is a good one, but it is less clear to me, in general, exactly how "space in the visual… corresponds to … time in the auditory." It’s not that it can’t, it’s just that the transform can be quite complex. It is convenient of course, and a natural assumption, but there is no real reason to believe that we perceive visual space in ways closely analogous to auditory time (and Ed is not suggesting that we do, because if we did, we would have no need of sonification at all). In some circumstances, there are clear similarities (gestalt proximity for one, or even divisions of the field (pulse, grid)). In others, it is less clear. For example, auditory time involves concepts like repetition, recapitulation, backward and forward reference, morphological similarity and transformation. In visual and auditory domains these are vastly different.
"How much ink…?"
There is, arguably, no such thing as a "measurement" of sound. If there is anything like this, it would certainly be loudness, but that is of course not the same as the density of ink on a page. The "measurement" of sound is a very interesting, and relatively unexplored, idea.
There are measures (like spectral density, or spectral saturation) which attempt to measure a similar thing. But more appropriately, ink might be translated, for the auditory, to time: how long it takes to communicate an idea (see above). Where Tufte talks about density, redundancy, think about telling a story in the shortest possible time ("ink" as a journalistic metaphor). In fact, the one measurement of sound we might undeniably have is how long it is. While it is not clear to me that this necessarily means that (even by Tufte’s ideas) "data should be sonified in the shortest possible time," that is certainly one possible interpretation. There are, however, others, for example: data should be sonified using the fewest possible variant parameters. That is, quantity of ink might be translated into variety of musical elements. Eliminating pitch as a variable, or rhythm, or timbral differentiation might be the analogy required.
Another important question here has to do with an analog of what we may call the measurement of complexity. Kolmogorov, Chaitin, and others have described complexity (or randomness) more or less by the "length of data (binary string) that it takes to send a message." In other words, how much description is needed for the thing being described. "I am coming home" is by definition a less complex message than "I am coming home at 7:00." Random messages are complex because their description more or less equals their message. Bigger numbers are more complex than smaller ones (they take more digits), repeating messages are simpler (just need to describe the pattern and the manner of its repetition), and so on.
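The Kolmogorov/Chaitin idea can be illustrated crudely with a general-purpose compressor as a stand-in for description length (an approximation only; true Kolmogorov complexity is uncomputable, and zlib is my own choice of proxy):

```python
import random
import zlib

def description_length(message):
    """Crude stand-in for Kolmogorov complexity: the byte length of
    the message after a general-purpose compressor has done its best
    (only a proxy, since true complexity is uncomputable)."""
    return len(zlib.compress(message.encode("utf-8"), 9))

# A repeating message needs little more than its pattern and a repeat
# count; a random message's description roughly equals the message.
repeating = "abc" * 200
scrambled = "".join(random.choice("abcdefghij") for _ in range(600))
print(description_length(repeating), description_length(scrambled))
```

The two 600-character strings compress to wildly different lengths, which is the sense in which the repeating one is "simpler."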
Where this is relevant to sonification (and not really so much visualization) is in real-time situations, where ongoing processes are sonified, whose unfolding in time is part of their interest. Ed very insightfully points out that "this is where the distinction between sonification and audification" is made clear: audification is frequently used to compress large data sets.
Charles Dodge’s Earth’s Magnetic Field is a good example: the data needs to exist in some compression of its own time flow, but in the time flow itself (otherwise, it would have no meaning). Another example is the standard one of sonifying computer registers for programming cues: the data must exist in real-time, and cannot be "shortened," otherwise it would not be a functional descriptor but an archival one (where it could of course be highly compressed, and taken out of time). Consider the sonification of some sort of ethological phenomenon (a herd, a flock, a school of fish). We need to listen to things happen, and it might be that the real-time, continuous aspect of this sonification is in fact what we are most truly interested in (rather than a compressed or statistical reduction, which would give us some analog of the morphology of the movement). We might not want to hear an atemporal chart of an animal’s movements, we might want to hear those movements themselves in some translation, like listening to background radiation as a safety monitor.
"The number of information-carrying dimensions…"
Just as there is no measurement of sound, or more accurately, no generally agreed upon such measure, there is no agreement and consequently tremendous variation in the graphical description of sound. From concrète-like descriptive morphologies (Wishart, Schaeffer, and others who’ve worked on the theory of musique concrète) to notational explorations (from the standard musical score, to every other conceivable descriptive/prescriptive/inscriptive/postscriptive musical notation, including the 20th century experiments of Cage, Xenakis, Feldman, Ligeti, the Warsaw school, Brown, Grainger, the futurists, Crumb, Corner, Goldstein, and so many others) to technical attempts (Lemur, waterfall plots, sonograms, melographs, sound-graphic conversions), to synaesthetic experiments (color organs and their ilk (in both directions), and software "visualizations" of our favorite music), in fact to every conceivable use and misuse of graphics to visualize sound (misuse: see Penelope Leach’s wonderful chart-junk depiction of "an infant’s cry" in her standard book on infants), there is almost complete non-concordance on what we see when we hear, or what we should see when we hear.
The Mapping Problem
Over the years I’ve seen and participated in hundreds of attempts at what I like to call the mapping problem: an idea in one domain is manifested in another. I think that in a lengthy year of discussions with my student John Puterbaugh, who, among other things, had a deep interest in the repercussions of Walter Benjamin’s work (as it related to musical notation, digital technology, and composition), that phrase became a kind of catch-all for everything from "the digits of PI" pieces to musical fractals to chaos equations pieces to the more sophisticated compositional experiments of people like Tenney, Ames, Koenig, and Xenakis.
I like that term, because it states that it clearly is a problem, in the best sense of the word (like a math problem). And it is one for which there certainly exists no current solution, and which, because of the nature of art, is not likely ever to have one.
But it is clear that most beginning attempts lack the insight to explore the problem on its own terms, and in addition, pay no heed to the many times it’s been tried. Once, invited as a guest composer to listen to the results of an assignment of a beginning electronic music class, I heard 20-30 1-3 minute pieces which mapped the standard chaos bifurcation equation to MIDI pitches. Not one of the pieces built upon the results of the previous, nor on the countless other pieces like this that have been tried over the years. None of them shed any light on what it means to take such a concept and map it into sound.
What is clearly needed is one person, or some group of people, to proceed a bit more methodically and thoughtfully, so that the effectiveness and interest of this approach (mapping) is not judged by the efforts of people who’ve not given it serious thought. It’s like evaluating the validity of the calculus based on the equations of people who’ve barely learned algebra.
The problems with many mapping attempts are in some ways embodied by the "first thoughts" given in the article: "should a factor of two data be mapped to a doubling of volume" (or "octave" if in pitch). It is clear, for example, as is said here, that ratiometric perception is fundamental, and needs to be respected and explored in some way in the mapping. But what is the sense, for instance, in the common practice of mapping architectural proportions to tuning ratios (and vice versa) in search of supranatural concordance (see the numerological/tuning work of Ernst McClain and others; Isacoff’s new book on temperament has some interesting things to say about this as well)?
Or is there a sense at all? I have argued frequently, for instance, that certain proportions will be perceived cross-parameter and cross-media. The compositions of Henry Cowell (and James Tenney, myself, Carter Scholz, Kyle Gann, and others in what might be called the "rhythmicana" school) proceed from some assumption that tuning ratios and durational ratios have some common terrain, perceptually (or at least compositionally). Composer Ben Johnston’s music argues a kind of similar point: complex tuning ratios are embodied rhythmically. Certainly, if one goes back to a kind of Chaitin-esque "complexity" measure, this holds true, at least in the most simple cases: 2:1’s are fairly simple (low bits), so are 3:1’s and 3:2’s, and so on, and this is probably the case across the board (next time you have to divide up a pie, try doing it in your head first in half, then in thirds, then in 5.5 parts!).
But it’s hard to extend that analogy too far with cognitive evidence, because past a certain simple point there seems to be a rapid decay of cognitive weight, and other factors come into play. For instance, how much less simple is a 13:11 rhythmically (or architecturally even) than a 15:13? In general, they’re both in some other category simply called "complicated." I’m not convinced that the same logic that advocates simple ratio perception extends past the basic notion that "simpler is simpler": the perceptual curve may be very steep indeed in its drop-off into indistinguishability of complexity.
But the best mappings, like those of, for instance, the golden mean, may be categorical, or more accurately described as feature-based. That is, looking at quantities across perceptual boundaries may not be as useful as using higher level features (and the golden mean may be one, because of the unique set of relationships it describes). For example, "brightness" is a feature that we may very well correlate between light and sound, as well as "loudness" and "intensity." While synaesthetes claim color/pitch correlations, this is, in general, not the case for most of us. But at even higher levels, perhaps gestalt segregative laws, ideas of repetition, predictability, even cultural association, mappings may be more effective. The stereotype of "minor/dark/sad" may not be as trite as we’ve always thought, and we might want to investigate more subtle, cognitively based and universal aspects of those kinds of associations. Consonance mappings, agogic accent mappings, timbral feature mappings may all be far more fruitful than direct acoustic ones. In fact, if one looks at the taxonomy of "similarity" measures in psychology and cognitive theory, there is already a kind of simple classification of these kinds of measures: geometric similarity, feature similarity, statistical similarity, and so on. I think it is useful to think of the mapping problem in this regard; for example, feature mapping would require some data reduction and classification to make the mapping (how would, for example, numerical data be mapped onto spectral onset curves? What about taking the autocorrelation of a data set and mapping that onto the harmonicity of a spectrum?).
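One of those suggestions, mapping the autocorrelation of a data set onto the harmonicity of a spectrum, can be sketched; the particular stretch function and its 0.05 coefficient are my own hypothetical choices, not established perceptual constants:

```python
def autocorr(data, lag=1):
    """Normalized autocorrelation of a data series at the given lag."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data)
    if var == 0.0:
        return 0.0
    cov = sum((data[i] - mean) * (data[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def partials(f0, r, n=8):
    """Map an autocorrelation r in [-1, 1] to spectral harmonicity:
    r = 1 yields exact harmonics k * f0; lower r stretches each
    partial away from harmonicity (the 0.05 stretch coefficient is a
    hypothetical choice)."""
    r = max(-1.0, min(1.0, r))
    stretch = 1.0 + 0.05 * (1.0 - r)
    return [f0 * (k ** stretch) for k in range(1, n + 1)]

# Smoothly varying data sounds nearly harmonic; noisier data would
# yield a stretched, more inharmonic spectrum:
smooth_spectrum = partials(110.0, autocorr(list(range(16))))
```

This is feature mapping in the sense above: the data is first reduced (to a single correlation statistic) and only then mapped onto a perceptually meaningful sonic feature.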
One of my favorite sonification/art examples is also one of the simplest, douglas repetto’s Sine Family Tree, in which sine waves enter at different frequencies representing the composer’s genealogy, in temporal proportion to the century’s progress. The "voice over" in this case is lovely, direct, and poetic (a voice recites the month in pulse, so that the years are marked into 12 parts). This piece sonifies the family’s history as clearly as a visual chart, but also transcends the genealogy into art. It’s a nice model for either end of the continuum.
Sonification must resist the temptation to … embellish
All depiction is embellishment; in fact, I would claim that mapping itself is embellishment. The only thing that is not embellishment is the thing itself.
That said, Tufte’s notion of well-designedness is perhaps a better guideline than the prohibition of embellishment. Things should not be redundant, nor arbitrary, nor perhaps overly obvious. In general. But in art there are always counterexamples, and it is not a good idea to try and establish a design canon where the act itself is by definition an arbitrary translation (sonification of numerical data).
One usually desirable quality is that a thing (a piece, a sonification) does not need further explanation (it is the explanation). That is not the same thing as saying that it might not benefit from more explanation (what would not?), but that in itself it is complete and coherent. It’s always nice to know more, or get other perspectives, but if the sonification actually sonifies, it should explain itself.
Area and volume
That’s an important example, where Tufte points out the misuse of area in representing linear data. It also happens a lot in the use of the linear to represent the exponential, and vice versa. His idea of using one dimension to represent one idea is a good one, and in general fine, but not always exclusive of adding a little spice or just plain fun to a sonification or a visualization. Chartjunk, to an artist, may not always be bad (that would leave only realist art).
I think of Tufte’s area example as more an example of the overlooking of aggregate features, as I talked about above. Area and volume are the result of two other dimensions, not independent of either one. So is, for example, loudness (amplitude and frequency). These kinds of higher level correspondences are what is needed. That is why it is not true that in the sound world, the "dimensions are all independent of each other." Loudness results from frequency and amplitude, and amplitude itself is a multi-featured construct (RMS? Peak? Other measures?). Timbre is so multi-dimensional that I continue to advise against even using the word. Even pitch/fundamental is an aggregate construct: pitch of what (strongest harmonic? strongest lowest harmonic? generally agreed upon fundamental?). It is a big mistake to artificially separate psychoacoustic features as if they are acoustic.
Similarly, I think it is also a mistake to assume the kind of one-to-one correspondence that suggests, for instance, a reference pitch and aboveness or belowness being the mapping. There are infinitely many possibilities for these kinds of representation, many of which are more sophisticated.
For example, in my own sonification piece (3 Studies, for live computer and musicians, written around 1988), I use three different sonic "fabrics," perturbations in which sonify the measured similarities of improvised melodies by the musicians (see my Computer Music Journal article on "Live Interactive … Music in HMSL…," from the mid-1990s). The three fabrics are: a six-part chord that starts out wide and shrinks to a unison; a six-voice pulse which is randomly perturbed around the pulse; a set of glissandi whose monotonicity reflects the degree of similarity. In all three cases, one sonic phenomenon (melodic similarity) is illustrated by another. In a sense, I am sonifying sound.
Idea: Something Needed
One of Tufte’s great contributions is that he has provided us with an exhaustive set of graphical examples and mis-examples. He says chartjunk, and he shows us chartjunk. He says economy of dimensions, and he shows us economy of dimensions.
What we need is a set of examples in sonification that illustrate the principles of sonic design in a similar way. Since there are so many fewer sonification examples (though if the idea of sonification were extended to linguistic forms, movie music, sounds, electronic alerts, etc., there might be a lot more to consider and discuss), it would probably be a useful idea to construct a great many of them.
(Ed Childs’ comments on my comments, and my further comments)
"Reading Ed’s wonderful precis…"
Ed correctly points out that the next stage in sonification, or perhaps the current, is to bypass the visual analogy entirely and think clearly about sound itself. This would seem to make Tufte’s work, in some sense, unnecessary, since it is completely devoted to the visual. My own thinking about this process, which certainly began long before I had ever read Tufte’s work, is so influenced by the latter, that I am happy to use it as a jumping off point.
"…we can hear the process for computing 1."
"Now, what’s the difference between piece one and piece two…"
"We don’t want to hear the Gaussian distribution so much as we want to use the Gaussian distribution to allow us to hear a new music."
"… I propose the term manifestation (but I’m open to suggestions)."
"…once established as principles they must of necessity be confounded or least manipulated by most working artists."
"… but there is no real reason to believe that we perceive visual time space in ways closely analogous to auditory time."
"The "measurement" of sound is a very interesting, and relatively unexplored, idea."
"It is often argued that music is some sort of by-product or even progenitor of language."
"All depiction is embellishment…"