The 111th AES Convention:
A Bit of the Technical Side

Caveat Reader

The 111th Convention of the Audio Engineering Society (AES) was held at the Javits Center in NY City 30 Nov to 3 Dec 2001. The AES conventions are perhaps best known as trade shows; coverage in the music press typically reports entirely on the commercial exhibits. But the convention is more than a trade show. Each day, activity starts well before the exhibit hall opens, and continues well after it closes. The Convention is in essence several simultaneous gatherings: a trade show spanning the spectrum of industrial participants in audio, a technical conference focusing on scientific and technological developments in audio in academia and industry, and a collection of meetings of various technical councils and standards committees. And believe it or not, there is also music!

The trade show will be adequately covered in the music and recording press. Online, you'll find excellent coverage of trade show highlights in Mike Rivers's AES report for the rec.audio.pro newsgroup. Mike has been providing such coverage of AES and NAMM shows for years now, and his reports are widely anticipated and very much worth reading. Steve Rochlin also has a collection of AES 111 photos very much worth going through.

Here I'd like to provide a glimpse of some of the rest of the AES convention. I write from the point of view of a participant in the technical sessions (I spoke on a new audio testing method, described below). But I must emphasize at the beginning that despite this participation, I'm largely an outsider to the audio engineering community. I've been building audio equipment since my teens, I engineer occasional local folk concerts, and I consult and do product evaluations for acoustic instrument pickup manufacturers through my small business, but my main career is as an astronomer (my day job, as it were!). But perhaps my perspective as an outsider will make this technical summary a bit easier to follow for others who are not themselves active in the technical side of the industry.

One of the neat things about the technical sessions is that whereas the trade show shows you what audio technology is like today, the technical sessions give you a glimpse of what it might be like tomorrow. I hope these notes provide the reader a bit of that glimpse.

Finally, I must also emphasize that I will cover here only a small fraction of what went on in the technical sessions. There were several parallel sessions, so no single person could possibly attend all the talks. And in any case, I like gawking at new gear as much as the next guy, so I only spent about 1/3 of my time in the technical sessions. I offer the vignettes below to communicate the flavor of what happened off the trade show floor, not as a thorough summary. I'll also cover some aspects of the trade show not typically covered by others. With these provisos, here we go!

Technical Sessions

Talks in the technical sessions were 20-30 min long. A lot can be covered in that time! Here I can only present a few highlights from selected talks to give you some sense of what was communicated. Most authors have written up their talks; you can request reprints from the authors. You can find abstracts for all of the talks (including the many I don't cover here) at the Convention web site.

Keynote Address: Science in Service of Art

As part of Friday's opening ceremonies for the Convention, Floyd Toole, vice president of acoustical engineering for Harman International Industries, spoke on "Science in Service of Art." Toole researches acoustics and psychoacoustics of sound reproduction, spent 25 years with Canada's National Research Council, and is a past AES President.

By way of explanation of the title, Toole stated that he sees music and movies as art, and audio as science. Audio engineers have the task of using science to serve the artistic process. Befitting his work with Harman, his talk focused on issues associated with evaluating loudspeakers. He feels that the audio industry is in a "circle of confusion" regarding loudspeakers. A loudspeaker is evaluated by listening to a recording that is made with mics, equalizers, and other signal processors that have themselves been evaluated using... loudspeakers! He also feels that the traditional distinction between equipment for creation (monitors) and appreciation (consumer speakers) is somewhat misguided. As an analogy, he pointed out that both painters and art gallery owners seek rooms that have "neutral" light (typically from a Northern exposure); this way patrons see the art as the artist created it, i.e., as the artist intended it to be seen. In the same way, tools used for audio creation should sound similar to those used by consumers to listen to audio (and vice versa). Toole argued that combined use of scientific measurements, subjective listening tests, and psychoacoustics relating the two, can break the circle of confusion and help assure that consumers hear the art the way artists intend.

Toole then turned to measurements. He first noted that we demand flat frequency response to high accuracy from audio electronics, but not from loudspeakers. Of course, it is harder to get flat response from speakers; also, the room plays a crucial role in the final response. But Toole feels the industry is far too lax in this regard. The EBU loudspeaker spec allows a wide tolerance of +/- 3dB from 20 Hz to 20 kHz (actually, it's even wider at the high end). Toole deemed the spec "rubbish," noting that "everything from junk to jewels can be found in those tolerances." Citing studies, he pointed out that the room exerts its personality predominantly at low frequencies, while the speaker controls the high end. The user can adjust his or her room to some extent, but the speaker is unalterable and sets what is heard above a few hundred Hz.

To correlate measurements with subjective evaluation, one must be able to conduct meaningful and repeatable listening tests. Harman has set up a "pneumatic shuffler" for doing quick, double-blind listening tests. The shuffler is hidden behind a screen at one end of a listening room, and in just three seconds can remove a pair of speakers from a listening position and put another pair in its place. Listeners can thus quickly compare speakers working in identical locations. Harman first carefully measured the frequency responses of several speakers (on axis I believe) in an anechoic chamber, and then brought them to the listening room to determine to what extent subjective tests correlated with measurements.

The results were very interesting! First up were 4 pairs of high-end audiophile speakers, varying in price from $8k to $11k. A couple had fairly flat responses, but the most expensive one, which has been very highly rated in the audiophile press, had a very irregular response. In the blind subjective tests, listeners unambiguously preferred the flat speakers to the colored ones. Their preference correlated with flatness, not price (nor with critical acclaim, evidently). (By the way, I don't recall what was said about who the listeners were, nor how many were used.)

At the other end of the consumer spectrum, next they measured 6 pairs of speakers in consumer "mini" stereo systems (integrated systems) ranging in price from $150 to $400. These had responses that were all over the map, often with deviations from flat exceeding 10 dB (and in one case 20 dB). Yet averaged together, the mean response was within 3 dB of flat above 50 Hz. These "low end" speakers are thus not systematically trying to produce a specific kind of response (e.g., loud bass). In Toole's words, "they are all aiming at the same target, just missing it in different ways." He saw this as further evidence that the industry should aim for flat response across the board.

He then showed measurements of monitor speakers. The good ones had both measured responses and subjective evaluations as good as audiophile speakers that cost much more. However, the bad ones were very bad. Thus the label "pro" or "monitor" on a speaker is no guarantee that it is superior to a consumer product. Through all these tests, Toole did not identify manufacturers. But for one noticeably bad monitor curve, he described the monitor as having a 6" driver and being introduced ca. 1975. He described it as "one that even Kleenex can't cure," and declared that "speakers like these are no longer relevant." Hmm, I wonder what monitors he was talking about?

Finally, Toole turned to the room, and described work presented by Makivirta & Anet at a recent AES meeting. These authors measured the responses of many speakers in many different professional studio control rooms. The median behavior was within the EBU spec, again indicating no systematic preference away from flat. But in only half of the control rooms were engineers hearing speaker-plus-room response within the EBU spec. Most of the problems were below a few hundred Hz, and due to the rooms, not the speakers. Thus Toole sees the real problem in audio today to be the loudspeaker/room/listener interface. It is here that he feels future research should be directed. We need high resolution measurements (finer than 1/3 octave, especially for low frequency modes in small rooms), and acoustic treatment and EQ innovations to improve the listener's experience, both in the engineer's chair and in homes and cars.

Virtual Microphones

Athanasios Mouchtaris, of the NSF Engineering Research Center at the Univ. of S. California, spoke on "Time-Frequency Methods for Virtual Microphone Signal Synthesis." The motivation is 5.1 and other multichannel formats, and the desire to use them to enhance the listening experience for material originally recorded only in stereo. You can think of what they're after as a fancy kind of reverb algorithm, where the input is a stereo signal and the output is a multichannel mix mimicking what would be heard in a particular hall. The particular application they have tried takes L+R from an ORTF pair and produces a 10.2 mix. To create the algorithm, they measure what is actually heard in a hall at the position of the ORTF pair, and use statistical methods to create a DSP algorithm that creates from the pair a multichannel signal that comes as close as possible (by some statistical measure) to what a "10.2 mic" would record. It is in this sense that they are creating "virtual microphones." It's like a fancier version of the sampling reverbs we are starting to see now in both hardware and as plugins. Research is currently focussing on comparing various statistical measures of similarity between synthesized and measured responses, and correlating them to listener evaluations.

An Interdisciplinary Integration of Reverb

Barry Blesser, currently of Blesser Associates, spoke on "An Interdisciplinary Integration of Reverb." Digital audio got its start when Blesser, a teaching assistant for Francis Lee at MIT who had developed a digital delay for heart monitoring, suggested to Lee that he put audio through the delay. You may recall Blesser's name on an old AES paper in the early 70s describing the result: the first commercially available digital audio delay, Lexicon's Delta T-101. Soon after, Lexicon shifted their focus to pro audio; a new technology was born. Blesser's self-described "life's passion" is reverb and reverb simulation. His talk was a condensed version of a huge paper published in the current Journal of the AES that attempts to review and integrate research on reverb across many diverse disciplines. The paper is long, but evidently not long enough—Blesser is writing a book on the subject. He had far too much material to cover in his allotted 20 minutes, so his talk was anecdotal, offering some highlights from his article. Here are some highlights of his highlights.

A theme running through the first half of his talk was the need to use statistics more explicitly in understanding and modeling reverb. Most modeling is deterministic, relying on concepts such as impulse responses. But Blesser gave physical and psychoacoustic arguments that randomness plays a significant role in creation and perception of reverb. On the physical side, he argued that complications such as thermal effects (not to mention audience presence!) mean "there is no such thing as the impulse response of a room." Due to these influences the room response changes slightly from moment to moment, especially at high frequencies. He also cited research on psychoacoustics that suggests that our hearing system processes sound, not by doing a spectral decomposition, but by analysis of the envelope of sounds. Once a room has more than two significant modes (and most rooms do!), the envelope of a sound in the room takes on a random character. For these reasons, Blesser feels that the impulse response approach to creating reverb is inherently flawed. Given the current fuss over sample based reverb algorithms (which use a "frozen" sampled impulse response), this was a fascinating claim. Blesser feels a perceptually realistic algorithm must have a random component in its response. Blesser also stated that he feels it is more sensible to describe rooms via resonances (some with random character) than via an impulse response.

Next he talked about acoustic perception of space. He argued that "humans are not ambulatory ears;" that the purpose of audio (as part of a whole, immersive sensory experience) should affect how audio is produced. He cited research indicating that the nervous system may combine audio and visual information well before high levels of awareness. As a result, audio for film must be handled very differently from concert audio or audio for listening. On the other hand, he noted that there is a huge amount of spatial information just in what we hear, and that with sufficient experience we can learn to "hear spaces" with astonishing detail. He told the story of a group of blind bicyclists in California who have learned to "visualize" their environment acoustically so well that they can actually bike in unfamiliar locations without serious accidents (I'd love to see more documentation of this; I've found only one online mention of it). "You can learn to hear the space."

Continuing this thought, he then told some anecdotes about our ability and need to learn to hear. Perceptual studies have repeatedly verified that high quality perception can indeed be reliably developed, but that it takes a long time (and thus also significant motivation). Yet listening evaluations in the audio industry typically offer listeners low exposure and low motivation. Blesser told a story of developing a reverb algorithm, adjusting it to have minimal flutter. He would tweak it to "perfection," but after listening to it for a while would start hearing imperfections that would nag him. He'd demo the algorithm to his colleagues, who could not hear the problem. Eventually they forced him to call it quits and send the algorithm to market. It met with an enthusiastic response. But six months later, heavy users were reporting that the equipment was degrading. Digital equipment does not "degrade"! What was happening was that they had learned to hear better, so that imperfections that were not noticed at first eventually stuck out like sore thumbs.

Blesser works in the Boston area, and in another anecdote spoke of his evaluation of the acoustics of the renowned BSO concert hall. Despite the overall excellent quality of the hall, 30% of BSO seats are "acoustically horrible." Yet many of those seats are in top-dollar locations, and the patrons do not complain. They have untrained ears. Blesser sees this as reflecting a "cultural hostility" to acoustic sensibilities. He told of going to a very expensive, top-rated restaurant, where the food and place settings and visual environment were highly refined. But the acoustics of the room, even just for having a conversation, were horrible. Acoustic quality is simply not valued in our society. Blesser wondered aloud why this is so—and left us all wondering ourselves.

This is just a glimpse of his far-ranging and provocative talk. See the current JAES for more!

Listening Room Simulator

Francis Rumsey of the Inst. for Sound Recording at the U. of Surrey (UK) described the ongoing thesis work of his student Amber Naqvi on "The Active Listening Room Simulator." Many acoustics departments and laboratories have a listening room or two, but such rooms are obviously expensive and take up a lot of space, so it is not feasible to have a large variety of them. Yet real-life listening rooms vary greatly in their properties. Naqvi's thesis work attempts to actively modify the acoustics of a single room to duplicate significant features of the responses of many rooms so that loudspeaker listening tests can be made under a wide variety of conditions. The way it works is that a set of absorptive panels is placed in key locations in the available room, to block low-order reflection paths from the walls, floor, and ceiling. Part of the research involves using computer simulations of the room behavior to help locate the panels. In addition, each panel is active, containing a flat array of transducers. The electrical signal sent to the speakers under test gets processed and sent to the active panels, with the processing (a series of delays and EQ implemented in DSP on a computer) designed so the sound from the panels mimics the early reflections of the room one wants to simulate. So far they have worked with 6 panels (one each on the floor and ceiling, and 4 oriented vertically), and verified that the algorithms they use to set up the array produce a measured response in the room that does indeed have the main features of the room they try to simulate (they use waterfall plots for the comparison). Research continues on more careful evaluation of the technique, and extending it to more panels. No mention was made of subjective evaluation; I wonder if the modified room really sounds like the simulated room to a trained listener.
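To give a feel for the "series of delays" involved, here is a minimal sketch of how the arrival times and levels of first-order reflections can be worked out for an idealized rectangular room, using the textbook image-source construction. This is not the method from the talk, which I did not see in detail; the room size, positions, function name, and the use of simple 1/r spreading with perfectly reflecting walls are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def first_order_reflections(room, source, listener):
    """Delay (ms after the direct sound) and relative gain of the six
    first-order reflections in a rectangular room (image-source method)."""
    direct = math.dist(source, listener)
    result = {}
    for axis, label in enumerate("xyz"):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]   # mirror the source in the wall
            path = math.dist(image, listener)
            delay_ms = 1000.0 * (path - direct) / SPEED_OF_SOUND
            gain = direct / path                      # 1/r spreading only, no absorption
            result[f"{label}={wall:g}"] = (delay_ms, gain)
    return result

# Hypothetical 5 m x 4 m x 3 m room; all positions in metres.
taps = first_order_reflections((5.0, 4.0, 3.0), (1.0, 2.0, 1.2), (4.0, 2.0, 1.2))
for wall, (delay_ms, gain) in taps.items():
    print(f"{wall:>4}: {delay_ms:5.2f} ms, gain {gain:.2f}")
```

Each (delay, gain) pair is the sort of number one would feed to a delay line driving a panel; a real system would also need the per-reflection EQ mentioned above, which this sketch leaves out.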

Ambiophonics

Ralph Glasgal from the Ambiophonics Institute across the river in NJ spoke on "Ambiophonics: Achieving Physiological Realism in Music Recording and Reproduction." Quoting from the abstract, "Ambiophonics is the logical successor to stereophonics, 5.1, 7.0, 7.1, 10.2 or Ambisonics in the periphonic recording and reproduction of frontally staged music or drama. The paper shows how only two recording media channels, driving a multiloudspeaker surround Ambiophonic system, can consistently and optimally generate a 'you are there' soundfield...." Bold claims! Glasgal is an engineering physicist who has worked in psychoacoustics. He came up with ambiophonics on his own, but found that most of the key elements of the idea have been buried in the psychoacoustics literature. As a result (and this was news to me) ambiophonics is in the public domain. He pointed to the Ambiophonics web site for more details. I had heard a bit about Ambiophonics online, and went to the talk hoping to hear enough further about it to know whether to take it seriously or not. I have to say that I left the talk feeling somewhat frustrated. I suppose I should spend some more time at the web site; there is a lot of information there.

Glasgal began his talk with a completely sensible list of problems with stereophonic audio. He made a case for there being significant spatial information in two recorded channels, information beyond what a simple 2-channel playback could tap. His main argument was the point that we can localize quite well with just one ear; there is thus significant spatial information in one "channel." He demonstrated this with a fun experiment where he had the audience stand with eyes closed, and point to a loud metronome that he carried across the front of the room. Everyone successfully pointed at the metronome. Then he had us plug one ear and repeat the experiment; again everyone (except one unfortunate audience member!) correctly located the source. It was a fun experiment, though it seems to me not to offer a compelling case that a single channel of a stereo recording contains similar spatial information to what a person's ear provides.

Glasgal then gave a description of the equipment comprising Ambiosonics (which derives its name from "ambient" and "sonics"). The main ingredients are an ambiopole and ambiovolver. The ambiopole consists of two speakers in front of the listener, only 10 degrees apart. As a result of the close spacing, the angle of incidence for forward sound is correct for both ears, and there is no "head shadow." The ambiovolver is a DSP that calculates ambience signals for surround speakers. An optional listening component is a set of surrstats, large electrostatic speakers to mimic reflections from concert walls. Finally, an optional element for recording is a 4-channel ambiophone. Roughly speaking, this consists of a stereo head mic facing the source, with an absorptive panel behind it and a second stereo head mic behind the panel for recording ambience. However, the claim is that ambiosonics works very well with most existing stereo recordings; the ambiophone is optional.

Unfortunately, though Glasgal gave some sound arguments explaining the limitations of stereo, he never really explained what ambiophonics was supposed to do: why exactly the ambiopole has the format it does, what happens in the ambiovolver and why, etc. So while successfully knocking down stereo, he never made a case for ambiophonics. Listeners were invited to the Ambiophonics lab the next day for a free demonstration. It was a generous offer, including free bus transportation and lunch. However, participants would have to leave the convention for over four hours. It was more time than I was willing to give, especially since I was given no real insight into why it should work. A better tactic would have been to have a room in the convention center for demos of the system. I hope anyone who went on the Ambiophonics excursion will post their experiences on rec.audio.pro.

Bayesian Harmonic Analysis for Audio Testing and Measurement

This was my talk; I'll try to be brief about it! My expertise in astronomy is in statistical analysis of data, particularly with time series data—samples of some process in time. The connection with digital audio is hopefully obvious! I've taken some algorithms in use in physics and astronomy for analyzing signals that contain pure tones (sine waves) at various frequencies, and applied them to tone-based testing in audio. They are based on the so-called Bayesian approach to statistics (the original approach adopted by Laplace, Gauss, etc.), hence the name Bayesian Harmonic Analysis (BHA). I described application of BHA to measurement of frequency response, total harmonic distortion, and intermodulation distortion; other types of measurement are also possible. The bottom line is that these algorithms let you make these measurements very accurately using math related to what is done in conventional FFT analysis, but using information in the FFT that is ignored in conventional methods. (For those familiar with the FFT, conventional methods use its magnitude, but BHA uses the real and imaginary parts separately, and uses values of the discrete time transform between Fourier frequencies.) As a result, some measurements that so far have required special hardware and proprietary techniques such as synchronized sampling and frequency shifting could potentially be performed with a good sound card and a home computer. So far I've applied it only to simulated data. Such data lack real-world complications, but have the important virtue that we know what the true signal is! This lets one verify that the algorithm returns results that are close to the truth. Future work will have to address some issues with treatment of complicated noise spectra and other real-world complications.
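For readers who want to see the parenthetical remark about the FFT in concrete form, here is a deliberately simplified sketch. It is not the BHA algorithm itself, only an illustration of the two ingredients named above: evaluating the discrete-time transform at frequencies between the FFT bins, and reading the real and imaginary parts separately. The function names, the grid search, and the noise-free single-tone test signal are all assumptions made for the example.

```python
import cmath
import math

def dtft(samples, freq_hz, sample_rate):
    """Discrete-time Fourier transform of the record at one arbitrary frequency."""
    step = -2j * math.pi * freq_hz / sample_rate
    return sum(x * cmath.exp(step * k) for k, x in enumerate(samples))

def estimate_tone(samples, sample_rate, f_lo, f_hi, steps=300):
    """Locate a single tone by maximizing |DTFT| on a grid finer than the FFT bins."""
    n = len(samples)
    best_f, best_z = f_lo, dtft(samples, f_lo, sample_rate)
    for i in range(1, steps + 1):
        f = f_lo + (f_hi - f_lo) * i / steps
        z = dtft(samples, f, sample_rate)
        if abs(z) > abs(best_z):
            best_f, best_z = f, z
    # With many cycles in the record, the real and imaginary parts at the peak
    # give approximate cosine and sine amplitudes separately.
    return best_f, 2.0 * best_z.real / n, -2.0 * best_z.imag / n

# Simulated, noise-free tone that falls between two FFT bins
# (the bin spacing here is 8000 / 512 = 15.625 Hz).
RATE, N, F_TRUE = 8000, 512, 1003.7
data = [0.8 * math.cos(2 * math.pi * F_TRUE * k / RATE)
        - 0.3 * math.sin(2 * math.pi * F_TRUE * k / RATE) for k in range(N)]
freq, cos_amp, sin_amp = estimate_tone(data, RATE, 990.0, 1020.0)
print(freq, cos_amp, sin_amp)
```

A plain FFT of the same record could only report the tone as lying near the 1000 Hz bin; searching between bins recovers the frequency to a fraction of the bin spacing. The real method goes further and attaches uncertainties to the estimates, which this sketch does not attempt.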

Measurement, Analysis & Visualization of Listening Room Response

I do some product evaluations for a Finnish company (EMF Acoustics/B-Band Pickups), and through them I've learned of the acoustics lab at Helsinki University, with whom they have an ongoing relationship. Well, the Helsinki group was present in force in the technical sessions, giving many papers. I was only able to hear a few. In this one, Juha Merimaa described work on "Measurement, Analysis & Visualization of Listening Room Response." This was a really cool talk. The goal is to measure as completely as possible the response of a room, and present it visually in a way that can be easily interpreted. The problem is that it is a very high dimensional problem: at each point in the room, the sound field has a direction (3 variables) and loudness (another variable) that can vary with frequency and time (two more variables!). It is hard to measure all of this, and equally hard to interpret the resulting huge pile of data.

To do the measurements, the group uses a 3-D mic probe made up of 12 omni mics in pairs along each coordinate direction (X-Y-Z) and with various spacings (from 1 cm to 1 m, if I recall correctly). This mic probe lets them measure differential directivity and intensity wherever the probe is placed, with good frequency coverage (thanks in part to the variety of spacings in the array). As a source, they use a 30 cm dodecahedron with speakers on each face, approximating an omnidirectional source. To visualize the data from a measurement, they make a time-frequency plot (time is horizontal, frequency is vertical; somewhat akin to a waterfall plot). Throughout the time-frequency plane they use color to code the (scalar) SPL measure of sound intensity, and arrows on a grid to code the (vector) direction the sound wave is traveling. For a particular measurement (placement of source and probe in a room), they show two plots, one showing two components of the directions in the meridian plane (the vertical plane cutting you in half between the eyes as you look at the source/probe) and the other showing the components in the horizontal plane. They showed plots where the frequency axis was uniform (as from an FFT analyzer), and where it was perceptual (based on 32 ERB bands calculated using a filter bank).

After a little explanation, the resulting plots were incredibly informative! One could clearly see early reflections, and not only was their frequency response apparent (from the color coding), but you could see that one was coming from the floor (arrows pointed up), the next from the ceiling (pointing down), etc. You could clearly see the reflections evolving with time from being discrete and directional, to diffuse in time and direction. It looks like a very neat tool that might prove useful for diagnosing room problems, or for helping us understand what makes some rooms and halls sound good and others not so good.

Room Equalization Using Fuzzy Logic

Sunil Bharitkar of the Univ. of S. California (again!) spoke on "New Factors in Room Equalization Using a Fuzzy Logic Approach." The problem he addressed is how to best equalize a room so that the sound is as good as possible everywhere. The bad way to EQ a room is to measure what you hear at the board with an RTA and compensate accordingly with your EQ. This improves things at the board, but the adjustment may make the sound worse elsewhere in the room. So one should measure the response in several places, and then try to adjust the EQ in some way to optimize the response everywhere. Papers on this topic have appeared in AES proceedings before, the typical approach being to average the measurements and compensate to the average. What is new in Bharitkar's approach is that he takes a more mathematically sophisticated approach to finding the optimal curve. Rather than merely average, he uses statistical techniques to find a smallish set of underlying responses (think of these as "classes" of response) to which each measurement belongs to a greater or lesser degree. He then uses fuzzy logic to take the actual measurements and determine a weight that measures how much each one resembles one class or another. Finally, he uses these weights and the class responses to find an optimal overall response correction. As a statistician myself, I thought the approach was a bit too statistical, and ignored aspects of the problem I thought should play a more prominent and explicit role. In particular, I felt there should be some kind of volume weighting involved. In the example given, the measurements were spread very unevenly through a room. Somehow, one should be taking into account the geometry of the measurements, and how much of the room one thinks sounds like one measurement or another, in finding the global optimum. Well, the proof is in the pudding, I suppose! They have done only one listening test so far, which found significant improvement over the standard approach, but also revealed a low-frequency artifact in the corrected response that they are still trying to understand.
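To make the idea of "belonging to a class to a greater or lesser degree" concrete, here is a small sketch of how such weights can be computed with the standard fuzzy c-means membership rule. I can't say that this is the exact rule used in the talk; the class responses, the measurements (in dB at three bands), and the function name are all made up for illustration.

```python
import math

def memberships(response, prototypes, fuzziness=2.0):
    """Degree to which one measured response belongs to each class prototype.

    Uses the standard fuzzy c-means membership rule; the weights sum to 1.
    """
    dists = [math.dist(response, p) for p in prototypes]
    exact = [d == 0.0 for d in dists]
    if any(exact):
        # The response coincides with a prototype: share the weight among matches.
        return [float(e) / sum(exact) for e in exact]
    power = 2.0 / (fuzziness - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]

# Hypothetical class responses and measurements, in dB at three bands.
prototypes = [[0.0, 0.0, 0.0], [6.0, -6.0, 0.0]]
measurements = [[1.0, -1.0, 0.0], [5.0, -5.0, 0.0], [3.0, -3.0, 0.0]]
weights = [memberships(m, prototypes) for m in measurements]
for m, w in zip(measurements, weights):
    print(m, [round(u, 3) for u in w])
```

A measurement close to one prototype gets a weight near 1 for that class, while one halfway between two prototypes is split evenly. Plain averaging, by contrast, treats every measurement as belonging equally to a single class.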

The EIA CD of Test Signals for Loudspeaker Power Rating

Don Keele of Harman/Becker Automotive Systems spoke on "Development of Test Signals for the EIA-426-B Loudspeaker Power-Rating Compact Disk." Keele has previously worked at EV, Crown, Klipsch, and JBL, writes speaker reviews for Audio magazine, and seemed especially proud to share an office at Harman with Dick Small(!). He only recently came to Harman, and his first big task there had to do with the EIA-426-B revised standard for loudspeaker power ratings, power compression testing, and distortion testing. Part of the revision includes the production of a CD containing the test signals for the standard. Keele thought this would be straightforward, but there were quite a few fascinating subtleties involved, including one that identified an inconsistency in the standard. His talk told the story of making the CD, and then demonstrated it. This was a fun talk!

One important revision in the standard is that the notion of speaker power has been redefined in a fundamental way. The standard no longer refers to the power of a speaker, but rather to the maximum power of the amplifier one should use with the speaker. This may be largely a semantic change, but the new language seems to make a lot of sense.

The CD includes all the EIA test tones, but there was space left and Keele used it cleverly to make the CD much more useful. The EIA section starts with a 1 kHz calibration tone for level setting, and then includes spectrally shaped noise for accelerated life testing, variable-rate sweeps for power compression testing, and pure tones for distortion testing at 1/3 octave spacing from 20 Hz to 5 kHz. The bonus tracks include more pure tones covering 6.3 kHz to 20 kHz, and shaped tone bursts at 1/3 octave spacing from 10 Hz to 20 kHz (with different burst rates in the L and R channels). As a result, with just the CD and an oscilloscope (and your ears!), you can do fairly sophisticated measurements, such as frequency response, phase response, and to some extent harmonic distortion.
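For anyone who would like to experiment with similar signals, here is a rough sketch of how a 1/3 octave frequency series and a shaped tone burst can be generated. The exact frequencies, burst lengths, and envelope shape used on the CD were not something I noted down, so the base-2 series centered on 1 kHz, the six-cycle burst, and the raised-cosine (Hann) envelope below are all assumptions.

```python
import math

def third_octave_centers(n_lo, n_hi):
    """Exact base-2 third-octave frequencies, 1000 * 2**(n/3) Hz."""
    return [1000.0 * 2.0 ** (n / 3.0) for n in range(n_lo, n_hi + 1)]

def shaped_burst(freq_hz, cycles, sample_rate=44100):
    """Sine burst lasting about `cycles` periods under a raised-cosine envelope."""
    n = round(cycles * sample_rate / freq_hz)
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
        * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]

centers = third_octave_centers(-17, 7)       # roughly 20 Hz up to roughly 5 kHz
burst = shaped_burst(centers[0], cycles=6)   # lowest tone, six cycles long
print(len(centers), round(centers[0], 2), round(centers[-1], 1), len(burst))
```

The smooth envelope matters: an abruptly gated sine has energy spread well away from its nominal frequency, which would muddy any attempt to judge distortion by ear or on a scope.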

The shaped noise was the hard part. Its spectrum is designed to mimic that of common program material; a speaker must be able to handle it for 8 hours at half the rated power with no degradation in speaker properties. But it turned out to be hard to actually create noise with the spectrum given in the spec. The swept tones play a similar role; the sweep is supposed to have the right behavior so that the time-averaged spectrum is the same as that of the shaped noise. Again, it turned out to be nontrivial to find the correct functions describing the necessary sweep rates as a function of time (they are surprisingly complicated).
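The basic idea behind matching a sweep to a noise spectrum can be sketched numerically. For a constant-amplitude sweep, the time-averaged power near a frequency is proportional to how long the sweep dwells there, so the time at which the sweep should reach frequency f is proportional to the running integral of the target spectrum up to f. The sketch below is my own illustration of that principle, not Keele's derivation, and it uses a simple pink (1/f) spectrum as a stand-in because I don't have the actual EIA spectrum to hand.

```python
def sweep_schedule(psd, f_lo, f_hi, duration, steps=11000):
    """Time at which a constant-amplitude sweep should reach each frequency
    so that its dwell time per hertz is proportional to psd(f)."""
    df = (f_hi - f_lo) / steps
    freqs = [f_lo + i * df for i in range(steps + 1)]
    cumulative = [0.0]
    for i in range(steps):
        # Trapezoid rule for the running integral of the target spectrum.
        area = 0.5 * (psd(freqs[i]) + psd(freqs[i + 1])) * df
        cumulative.append(cumulative[-1] + area)
    scale = duration / cumulative[-1]
    return freqs, [c * scale for c in cumulative]

# Sanity check with a pink (1/f) target, where the answer is known to be a
# logarithmic sweep that reaches the geometric-mean frequency at half time.
freqs, times = sweep_schedule(lambda f: 1.0 / f, 20.0, 2000.0, duration=10.0)
print(freqs[1000], times[1000])   # close to 200 Hz and 5 s
```

For a spectrum with several differently sloped segments, the integral and its inverse are piecewise and messier, which gives some sense of why the real sweep functions ended up complicated.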

One cute thing Keele demonstrated (using the sound system in the room) was how one could use the tone bursts to subjectively evaluate distortion. Since they are short bursts and not continuous tones, you can actually listen to the speaker being overdriven even at low frequencies without worrying about overheating the driver, as might happen if you used a continuous tone. Though to prevent us from going deaf, for the demonstration we listened to the board gradually being overdriven.

The CD is available from ALMA at a cost of $50 for ALMA members or $100 for nonmembers.

Neumann's Digital Mic

Mike Rivers provides a nice description of Neumann's new Solution-D digital mic in his exhibit report. Neumann presented a paper with technical details about the mic in the technical sessions (I believe the presenter was J. Wahl). This is the first commercially available mic implementing the new AES-42 digital mic standard. This standard specifies not only how the mic provides digital audio information to a receiver (e.g., a console), but also specifies a protocol by which the receiver can query the mic (to identify it or determine its status) and send information to the mic (controlling functions in the mic). The protocol is rich enough that very sophisticated functionality can be built into the mic and controlled remotely by the engineer (or even via automation).

For the Solution-D, the challenges Neumann faced were manifold: implementing significant functionality to take advantage of the standard; creating a system that recognizes that the standard is not yet widely implemented and thus remains compatible with existing consoles; and finally, creating a stellar microphone that upholds the Neumann reputation for audio quality despite current technical limitations in digital audio circuitry capability.

Regarding the audio quality, the mic has a 133 dB dynamic range, beyond the capability of standard ADC chips (yes, 24 bits corresponds to 144 dB, but off-the-shelf chips are not capable of anything near that). As a result, Neumann had to develop an innovative dual-converter topology to handle the dynamic range by processing low level and high level signals separately.
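The 144 dB figure comes from the usual rule of thumb of about 6 dB per bit. A minimal sketch of that arithmetic follows; it uses the simple 20·log10(2^N) ratio between full scale and one quantization step, not the slightly different formula for the signal-to-noise ratio of a full-scale sine wave, and the function names are just for illustration.

```python
import math


def ideal_dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20.0 * math.log10(2 ** bits)


def equivalent_bits(dynamic_range_db):
    """Number of ideal bits needed to span a given dynamic range."""
    return dynamic_range_db / (20.0 * math.log10(2))


if __name__ == "__main__":
    print(round(ideal_dynamic_range_db(24), 1), "dB for 24 bits")
    print(round(ideal_dynamic_range_db(16), 1), "dB for 16 bits")
    print(round(equivalent_bits(133), 1), "ideal bits for 133 dB")
```

By this measure the mic's 133 dB corresponds to roughly 22 ideal bits, which gives a sense of how far beyond typical converter performance Neumann had to reach.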

For compatibility with existing systems, Neumann has made the mic a two-part system: the D-01 mic implementing the AES-42 standard, and the DMI-2 digital microphone interface that communicates with the mic via AES-42, but that communicates with a computer or console via a standard interface (e.g. AES/EBU for the audio). The controlling computer runs their RCS remote control software providing a channel-strip-like graphical interface for controlling the mic.

Finally, to take advantage of the breadth of the AES-42 standard, Neumann put a huge amount of functionality in the mic itself. A DSP with significant processing power is in the mic, implementing a complete channel strip with mic-specific enhancements. From the console, the engineer can remotely switch between 15 mic patterns and control an adjustable gain stage after the ADC, an adjustable low cut filter, an adjustable pad, customizable EQ, a fast transient high frequency limiter (essentially a de-esser), mute, polarity, and red and blue LEDs on the mic to signal the talent.

There was lengthy discussion following the paper, largely about an aspect of the AES-42 spec that has proved highly controversial in the AES committee. The committee split 50-50 on whether the spec should require use of "standard" mic cable with XLR jacks, or require some other cable and connector. Half the committee was of the opinion that "we've made this mistake long enough," and wanted to require a new connector. The other half wanted to stay with XLRs so that studios and stages would not need to rewire. Presently the standard specifies "high quality" mic cable. A high bandwidth digital signal must go down the cable. For short runs, most good mic cable will do. For long runs, there could be problems. The signal requires 110 ohm balanced cable, but most real-world mic cable measures at 60-80 ohms and will distort the signal over long runs. Neumann tested the mic with 100 m runs and had no problems, but admitted that further tests were needed. They are relying on field tests with a first batch of mics to get them data on this issue.
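One way to get a feel for the impedance issue is the textbook reflection coefficient at the junction between a cable and a termination of different impedance. The sketch below is an idealized illustration (lossless line, purely resistive impedances); it is not taken from the Neumann paper and it ignores high-frequency attenuation, which also matters on long runs.

```python
def reflection_coefficient(z_cable, z_termination=110.0):
    """Fraction of the signal voltage reflected where a cable of
    characteristic impedance z_cable meets a z_termination load (ohms)."""
    return (z_termination - z_cable) / (z_termination + z_cable)


if __name__ == "__main__":
    for z_cable in (60.0, 80.0, 110.0):
        gamma = reflection_coefficient(z_cable)
        print(f"{z_cable:.0f} ohm cable into 110 ohms: {gamma:.2f} of the voltage reflected")
```

With 60 to 80 ohm cable, roughly 16% to 29% of the signal voltage is reflected at a 110 ohm termination. On a short cable the reflections arrive almost on top of the original edges, but on a long one they are delayed enough to smear the data.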

At the opening ceremonies, it was announced that AES standards, which until now one had to purchase from the AES at a high price, are now freely downloadable from the AES web site. AES-42-2001 is among the standards now available this way. Check the web site for more about this great new policy.

MPEG-21: Intellectual Property Management for Digital Media

Gabriel Spenger from the Fraunhofer Institute for Integrated Circuits spoke on "MPEG-21—What does it bring to audio?" Germany-based Fraunhofer is probably best known as the folks holding the patents for the audio compression algorithms underlying MP3. They do a variety of work on algorithms for digital media, and are active participants in many standards committees, including several MPEG committees currently finalizing new digital media standards. MPEG-21 is among these.

Spenger began by outlining the history behind MPEG-21 and its relationship to existing standards. MPEG-1, which was finalized in 1992, was the first standard for low bit rate audio and video; MP3 is part of this standard ("MPEG-1 Audio Layer III"). In 1994, MPEG-2 extended MPEG-1 with higher-quality video and additional audio modes. MPEG-4 added object-oriented features and scalability to MPEG-2. The latest standards in this chain are MPEG-7 (see below) and MPEG-21. MPEG-7 is concerned with describing digital media content, and MPEG-21 is concerned with managing intellectual property rights for such content. (See MPEG Starting Points for pointers to further info about various MPEG standards.)

Rather than describing specific algorithms, MPEG-21 instead standardizes a structure or framework for secure delivery and consumption of digital media, ensuring interoperability of many devices. I must confess I hadn't realized how complicated this could be until hearing Spenger's talk. Such a framework must be extremely flexible, and has to allow communication between many different agents.

Regarding flexibility, consider the diverse ways media can be licensed. Just a few examples using current technologies: a book may be read and resold; a CD may be played but not copied; a video tape of a movie may be rented but not copied; a public broadcast may be subscribed to for a limited period of time; some media is available for free at low quality (low bit rate) but the high-quality format must be paid for; etc. MPEG-21 allows for all of these types of licensing for digital media, as well as other types that are specifically relevant only for digital media (e.g., copy once to a portable device; rent 10 playbacks; purchase copies for 10 friends and get a copy free).

The examples above describe licensing for a simple, pre-existing "digital item" (the MPEG-21 term for a generic multimedia item). But MPEG-21 is flexible enough to allow one to create custom multimedia collections of various types of licensed content. As an example, Spenger described how one might assemble an electronic gift book with a "Harry Potter" theme. It might include text, audio clips, video clips, etc., all from different providers and with different licensing restrictions. MPEG-21 will allow one to transparently create and deliver this kind of content.

To accomplish this requires communication between content providers, consumer devices, delivery services, financial services, and technical services (e.g., authentication services). Even with a simple digital item, the network of communication that must go on is pretty complicated. MPEG-21 is essentially an e-commerce platform that allows this communication to take place securely and transparently.

A few details: An MPEG-21 digital item is a structured and hierarchical digital object composed of one or more elements of digital media. MPEG-21 standardizes a language for describing digital items: DIDL (Digital Item Description Language), an XML-based markup language. The actual representation of the digital item can vary across the elements comprising it, and can include existing formats such as MPEG-4, JPEG, ASCII text, etc.
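To make "structured and hierarchical" a bit more concrete, here is a small sketch that builds an XML description of a gift-book-style item containing text, audio, and video parts. The element and attribute names (Item, Component, Resource, mimeType, ref) and the resource identifiers are illustrative assumptions on my part, not quoted from the MPEG-21 documents; consult the standard for the actual DIDL schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts of a themed gift book: (id, MIME type, resource reference).
PARTS = [
    ("chapter-text", "text/plain", "urn:example:chapter1"),
    ("audio-clip", "audio/mpeg", "urn:example:reading1"),
    ("video-clip", "video/mp4", "urn:example:trailer1"),
]


def build_gift_book():
    """Build a nested description: one top-level item holding one sub-item per part."""
    root = ET.Element("DIDL")                      # illustrative root element name
    book = ET.SubElement(root, "Item", {"id": "gift-book"})
    for part_id, mime, ref in PARTS:
        item = ET.SubElement(book, "Item", {"id": part_id})
        component = ET.SubElement(item, "Component")
        # Each part keeps its own native format; only the wrapper is common.
        ET.SubElement(component, "Resource", {"mimeType": mime, "ref": ref})
    return root


xml_text = ET.tostring(build_gift_book(), encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

The point of the sketch is only the shape: a common XML wrapper around parts that remain in their own formats and could each carry their own licensing terms.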

The MPEG-21 framework includes hooks for a wide variety of intellectual property management schemes, including encryption and authentication, digital watermarking, and digital fingerprinting.

The framework describes protocols for networking, allowing transactions between all possible pairs of agents requiring communication (providers, licensors, financial institutions, etc.). This will allow "anywhere, anytime" access to content that the consumer holds a license to. For example, a user might hold licenses for a music collection that they access at home through their stereo or on the road through a wireless phone. The networking protocol must have fairly sophisticated event reporting capability; for example, it must be able to refund a consumer if delivery is interrupted.

There are existing e-commerce platforms in use or development from such major players as Microsoft and Real Networks. MPEG-21 does not replace these; it is a much broader framework, and these existing platforms merely provide pieces of the puzzle that MPEG-21 assembles. Along similar lines, the EIA has already developed a standard for downloadable security (OPIMA), which so far the industry has ignored. Parts of MPEG-21 were influenced by OPIMA, but MPEG-21 is much broader, and the hope is that its high level of integration will lead the industry to adopt it.

In discussion following the talk, it seemed that many listeners felt providers would like MPEG-21 because it would make obtaining compensation for use of intellectual property easier. But the real issue will be to what extent consumers will find it appealing. One discussant argued that this will depend on how the industry uses it. It could be used to make consumption easier and much more flexible; or it could be used to constrain consumer options. As this discussant saw it, if the latter approach is taken, MPEG-21 will likely go the way of OPIMA; but if the former approach is taken, "this could be the end of the CD as we know it."

MPEG-7: Advanced Audio Identification

Oliver Hellmuth, also of Fraunhofer, spoke on "Advanced Audio Identification Using MPEG-7 Content Description." MPEG-7 is a framework for description of the audio, visual, and generic properties of digital media. Descriptive information can include metadata such as the title and artist for the media, content-oriented metadata such as lyrics, and signal-derived data called "low level descriptors" (LLDs). The standard does not specify exactly how one should derive the descriptor from the signal, nor how one should use it. It merely standardizes the definitions and formats for several types of LLDs. For audio media, it standardizes an "audio fingerprint" format. But it is important to note that though it describes how to store the fingerprint, it does not describe how to use it. Straining the fingerprint metaphor a bit, it tells you how to store a photo of a fingerprint, but doesn't tell you what to do to match one fingerprint to another (what details to pick to define a match). Such algorithms will presumably be proprietary to companies that develop them.

Few details were given as to what comprises the fingerprint, and I must admit I was confused as to how much of the algorithm for deriving the fingerprint from a signal is open and in the standard, and how much is proprietary. It includes summaries of the spectral envelope of the audio with enough detail to allow matching of segments from within the media. You'll have to read the full MPEG-7 specification to learn the details.

Most of the talk described and demonstrated the proprietary algorithm Fraunhofer has developed for quickly matching a possibly corrupted or distorted audio excerpt against a database of audio fingerprints. They have demonstrated such algorithms before, but they have revised them to take advantage of the MMX instruction set to speed them up by a factor of 10. In the demo, Hellmuth played segments of intentionally distorted audio (e.g. bandwidth limited, or with low bitrate MP3 encoding) into a laptop containing a database of fingerprints of 30,000 pop tunes. Within just a few seconds, the algorithm could identify the correct tune or determine that the tune was not in the database. It is highly reliable, providing a correct match well over 99% of the time. Uses include allowing consumers to quickly identify (and purchase!) music they hear on a broadcast, or helping broadcasters log their broadcasts.
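Fraunhofer's matching algorithm is proprietary and was not described in detail, so the following is only a toy stand-in to show the general shape of the problem: each tune is reduced to a short vector summarizing its spectral envelope, and a query is matched to the nearest stored vector or rejected if nothing is close enough. The plain Euclidean nearest-neighbor search, the four-number fingerprints, and the threshold are all assumptions made for illustration.

```python
import math


def distance(fp_a, fp_b):
    """Euclidean distance between two equal-length fingerprint vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))


def identify(query, database, reject_threshold):
    """Return the title of the closest fingerprint, or None if even the
    closest one is farther away than reject_threshold."""
    best_title, best_dist = None, float("inf")
    for title, fingerprint in database.items():
        d = distance(query, fingerprint)
        if d < best_dist:
            best_title, best_dist = title, d
    return best_title if best_dist <= reject_threshold else None


# Made-up fingerprints: relative energy in four frequency bands.
database = {
    "tune_a": [0.9, 0.5, 0.2, 0.1],
    "tune_b": [0.2, 0.8, 0.7, 0.3],
    "tune_c": [0.4, 0.4, 0.4, 0.4],
}
distorted_a = [0.85, 0.55, 0.15, 0.0]   # tune_a with its top band removed
unknown = [0.0, 0.0, 1.0, 1.0]          # something not in the database

if __name__ == "__main__":
    print(identify(distorted_a, database, reject_threshold=0.3))
    print(identify(unknown, database, reject_threshold=0.3))
```

A real system has to do this against tens of thousands of tunes, at unknown time offsets, and within a few seconds, which is where the engineering effort (and the MMX optimization) goes.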

Analog Devices' Integrated DAC and DSP Chip

Bob Adams from Analog Devices described "A single-chip three-channel 112-dB audio DAC with audio DSP capability." Chips that combine some conversion capability with some DSP capability already exist. For example, Texas Instruments has had chips available for a few years that contain a CODEC and DSP implementing multiband EQ and compression. However, the chips I know of are 2-channel and, despite having 20+ bit interfaces, have a dynamic range below what 16 bits is capable of (96 dB). The new chip from AD appears to me to offer capability significantly beyond what is currently available, in terms of both audio quality and DSP capability. Its development was driven by the needs of "midrange" consumer applications: auto, PC, and "boom-box" audio. But the specs are good enough that I could imagine several pro audio applications.

The fundamental problem with such chips is that high-end DSP chips use very fine-pitch semiconductor technology that does not lend itself to design of good converters. Thus presently one cannot put a good converter and a good general-purpose DSP on the same chip. But if one targets the use of the device—in this case to audio—one can implement a specialized DSP suitable for the intended purpose, but not so sophisticated that it requires semiconductor processes that would compromise the converters. That's what AD is doing for this new Sigma-DSP family of chips. The first part in the series is the AD1954.

The chip has 3 channels of 112 dB DAC, using some of their best technology. The intent is for L-R-Subwoofer use, but they can be used in other ways. The DSP is capable of 24 MIPS and runs synchronized to the converters at 512 fs (you can do 512 DSP instructions per sample). It is optimized for FIR and IIR filtering and dynamics processing. For the latter, it includes hardware acceleration for linear-decibel conversion, for example. The data and coefficient formats are very intelligently chosen: the data path is 26 bits in 3.23 format, with two extra bits to the left of the point to prevent clipping. Coefficients are 22 bits wide in 2.20 format, allowing the +/-2 range that is most convenient for IIR filters.
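The number formats are easier to appreciate with a small example. The sketch below quantizes values to a signed fixed-point format with a given number of integer and fractional bits and saturates at the ends of the range. It is a generic illustration of two's-complement fixed point, with the sign bit counted among the integer bits; it makes no claim about how the AD1954 rounds or saturates internally, and the 48 kHz sample rate is an assumed example.

```python
def to_fixed(value, int_bits, frac_bits):
    """Quantize value to a signed fixed-point code with saturation.

    int_bits counts the bits left of the binary point, including the sign.
    """
    total_bits = int_bits + frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = round(value * (1 << frac_bits))
    return max(lowest, min(highest, code))


def from_fixed(code, frac_bits):
    """Convert a fixed-point code back to a real number."""
    return code / (1 << frac_bits)


# Instruction budget: 512 instructions per sample at an assumed 48 kHz rate.
mips_at_48k = 512 * 48000 / 1e6

if __name__ == "__main__":
    # 3.23 data: headroom up to just under +4 before clipping.
    print(from_fixed(to_fixed(1.5, 3, 23), 23))
    print(from_fixed(to_fixed(5.0, 3, 23), 23))
    # 2.20 coefficients: range from -2 up to just under +2.
    print(from_fixed(to_fixed(-2.0, 2, 20), 20))
    print(mips_at_48k, "MIPS")
```

The 3.23 data format leaves room for intermediate results up to almost four times full scale, while the 2.20 coefficient format covers the range that biquad coefficients normally need. At 48 kHz, 512 instructions per sample works out to about 24.6 MIPS, in line with the quoted figure.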

The device boots into a very flexible default system that can be easily adjusted with a graphical interface on a PC. The default L and R channel signal chains are: HPF, 7-band biquad filter, 2-band crossover filters, optional "PHAT" stereo widening, 0-2.3 ms of delay, level detection (for dynamics processing), 0-3.7 ms of additional delay, interpolation, and finally the DAC. The output of the crossovers is summed and sent to the third channel. Dynamics processing is by a look-ahead algorithm, is two-band, and can have an arbitrary compression curve. Parameters can be adjusted in real time.

That's just the default setup! The chip can be programmed more generally with a graphical compiler that is based on Orcad schematic capture. AD has defined schematic symbols for the major function blocks. The user builds a system using these symbols, filling in parameters in each symbol (e.g., describing a filter response or compression curve). One then writes a SPICE netlist that AD's compiler takes as input to generate the code to download to the chip to implement the design. The goal was to make DSP-based design appealing to engineers who are familiar with analog methods, but not DSP methods. A very clever approach!

Trade Show: The Stuff Behind the Front Panel

I own a bunch of commercial gear, but I also like to build equipment for myself. A very pleasant surprise for me was discovering that the AES trade show has displays from a significant number of vendors supplying the stuff "behind the front panel," ranging from "raw ingredients" like cables, connectors and ferrofluids (for speakers) to converter chips and pre-programmed DSPs. There are also numerous vendors supplying test and measurement equipment targeted to everyone from electrical engineers building prototypes to sound engineers assembling an audio system to consumers just needing good metering. Here is a sampling of some of the exhibits that might interest audio DIY practitioners.

Toroidal Transformers

I learned of two suppliers of toroidal transformers previously unknown to me (one of which is indeed a new vendor). Keen Ocean Industrial Limited is based in Hong Kong and accepts orders for a minimum of 100 pieces. Quotes take 48 hrs and samples take 5 days. More DIY-friendly is Plitron Manufacturing in Toronto, which will provide single transformers. They had a huge transformer on display that was almost three feet across and weighed over 1000 pounds! When one visitor asked what it was for, the Plitron rep replied, "to show at trade shows!"

LED Bargraph Displays; Metering

I've often looked for an audio-appropriate modular LED bargraph. It's easy to find ones that are all red or all green; but for audio use it would be nice to get one that has, say, 6 green segments, 3 amber ones, and a red one. Well, Prime LED has a line of audio-specific LED bargraphs that go a lot further than this! Their bargraphs have tricolor LEDs in each segment. They sell the "bare" bargraphs in various sizes and formats from 10 segments to 53 segments. They also sell cards with bargraphs and pre-programmed DSPs that implement very sophisticated metering capability. Various metering scales and ballistics are possible, including a variety of simultaneous modes made possible by the tri-color nature of the bargraphs. An example they had at the booth had a standard 3-color format peak-hold meter, where the display constantly shows the current level with the peak segment staying lit for a few seconds. But in addition, a red segment simultaneously "floated" over the peak-reading display indicating VU. Very cool!

More sophisticated metering seems to me to be an emerging trend. DK-Audio offers a line of very sophisticated self-contained oscilloscope-like meters (the display is an LCD, not a CRT) with capability ranging from straightforward level metering to X-Y scope functionality and FFT and 1/3 octave spectrum analysis. And Metric Halo had the latest version of their SpectraFoo software on display (version "Radical 3," as in "square root of 3" or 1.732...), which offers even greater flexibility if you have your computer and monitor available for metering. Metering definitely appears to be a growing market.

Cable and Connectors

There were several cable and connector manufacturers present. A few items that caught my attention: Neutrik has a new BNC connector that you don't have to grab and twist; just press it in place and it locks. Marshall Electronics (yup, the same folks behind MXL microphones) are affiliated with Mogami Cable and Tajimi audio, video, and fiber-optic connectors. Tajimi typically only deals with large companies and large quantities, but through their relationship with Marshall one can get their connectors in small quantities. They offer some connectors that aren't easy to find elsewhere. A company new to me was Vimex International. They have a nice line of connectors. In particular, I have been looking for some time for RCA connectors that do not have the shield automatically tied to the chassis. Vimex has them; Tajimi also carries a similar product. Finally, Gepco announced a major overhaul of their web site, providing more detailed, end-user-oriented information about their cabling.

THAT Corporation App Notes (Phantom Power, etc.)

THAT Corporation had a booth where visitors could talk with reps and grab copies of numerous application notes, including the very nice "Applications Notebook Vol. 1" that they published last year. Of particular interest to me was a reprint of a paper presented at the May 2001 AES Convention titled "The 48 Volt Phantom Menace" by Gary Hebert and Frank Thomas. THAT engineers received reports of equipment from various manufacturers failing when line driver outputs were connected to microphone preamp inputs that had 48 V phantom power activated. The paper describes both a simple SPICE model and measurements showing that currents of up to 3 A can flow through components in the line driver or mic preamp when a connection is made; the transient lasts about a millisecond. It occurs because the DC blocking capacitors used with solid-state mic pre input stages are large and store a lot of charge. When a connection is made, the end of the capacitor at the connector has its potential suddenly changed from +48 V to ~0.6 V (a semiconductor junction potential drop to ground) or ~14 V (a junction drop below the supply voltage of the connected device). Since the potential across the capacitor cannot instantly change, the potential at the end near the mic pre input stage suddenly changes from ground to -48 V, and large currents flow through junctions in the device. Similar problems arise in the line driver if it has AC coupling capacitors on its output. Most mic preamp chips can only stand ~1/4 A through their inputs without damage; this can be easily exceeded by connection transients. The traditional approaches to protecting against these transients involve connection of back-biased small signal diodes (e.g., 1N4148) or zener diodes to the mic pre inputs. With both simulations and experiments with a variety of circuits, the authors show these approaches often fail to protect devices. Sometimes the devices fail catastrophically. That's when you're lucky! Unfortunately and more insidiously, a common failure mode is damage to the input stage or to the protection diodes that compromises performance, raising distortion to a few tenths of 1%.
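As a sanity check on the numbers, one can treat the event as a single capacitor discharging through an effective series resistance. This is my own back-of-the-envelope simplification, not the SPICE model from the paper, and the 47 microfarad blocking capacitor is an assumed illustrative value rather than one quoted by the authors.

```python
def transient_estimate(c_farads, v_step, i_peak):
    """Crude single-RC estimate of a connection transient.

    Returns (effective resistance in ohms, time constant in seconds,
    energy in joules associated with the voltage step on the capacitor).
    """
    r_effective = v_step / i_peak          # resistance implied by the peak current
    time_constant = r_effective * c_farads
    energy = 0.5 * c_farads * v_step ** 2
    return r_effective, time_constant, energy


# Assumed values: 47 uF blocking capacitor (illustrative), potential stepping
# from +48 V to about 0.6 V, and the 3 A peak current reported in the paper.
r_eff, tau, energy = transient_estimate(47e-6, 48.0 - 0.6, 3.0)

if __name__ == "__main__":
    print(f"effective resistance ~{r_eff:.1f} ohms")
    print(f"time constant ~{tau * 1e3:.2f} ms")
    print(f"energy ~{energy * 1e3:.0f} mJ")
```

With these assumptions the time constant comes out a bit under a millisecond, consistent with the duration reported in the paper, and the energy involved is tens of millijoules, which is a lot to dump through a small-signal junction.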

Fortunately, Hebert and Thomas also provide a solution to the problem. If one replaces back-biased 1N4148 diodes with Schottky diodes (they use the SB160), much superior protection results because the Schottky forward voltage drop is much smaller than that of P/N junctions in the device, so the junctions in the protected device never fully turn on. The main limitation of the approach is that it does not protect against overvoltage. If this is a concern, better protection is possible using a Schottky diode bridge to a pair of TVS diodes (transient voltage suppressors). The TVS diodes protect against surge currents, while the Schottky bridge isolates the mic pre from the nonlinear capacitance of the TVS diodes. This adds only 4 components to the typical protection scheme (two TVS diodes and two current limiting resistors); these components can be shared by multiple input stages.

For more sophisticated protection, they also describe an input topology that lets one use much smaller input capacitors yet still have low input impedance and good noise performance (less than 0.5 dB worse than with standard topologies). It uses a DC servo to control offsets due to the larger bias resistors required by the approach.

These issues mainly concern inputs on consoles, which sometimes get connected to balanced gear other than mics. For standalone mic pres that are used only with mics, this particular failure mode is less relevant. For more information, contact THAT and ask if they have reprints of Convention Paper 5335 (110th Convention) still available.

Again on the topic of mic pres, the long-awaited THAT 1510 chip (the pin-compatible improved replacement for the discontinued Analog Devices SSM 2017 preamp chip used in mic pres by Symetrix and Rane) is still not yet available. But samples should be ready by Q1 '02, and production should start by the end of Q2 '02.

Test & Measurement Equipment

There was a large number of exhibits from companies manufacturing test & measurement equipment. These ranged from simple cable testers to devices that used laser Doppler vibrometry to measure speaker performance. Mike Rivers's AES review describes Neutrik Test Instrument's new Digilyzer DL1, a remarkably small, handheld digital audio analyzer. One innovative piece of gear that caught my eye was AudioControl Industrial's Iasys Electro-Acoustic analyzer. This product targets a specific niche: audio contractors who need to make fairly sophisticated audio measurements but who need a "high level" interpretation of the results. So rather than providing you with a breakout box and software that plots 1000 different types of measurements in 3-D multicolor displays on your PC, Iasys is completely self-contained (it comes with a measurement mic), and summarizes the results of its measurements in a large but manageable LED dot-matrix display in terms of recommended settings for your gear. As but one example of its capabilities, you can use it to measure the outputs of various drivers, and it will calculate optimal crossover points, alignment delays and levels for you. More at their web site.

Danville's DSP Function Module

Okay, Danville Signal Processing was not an exhibitor. But you can learn an awful lot just hanging out at Dan Kennedy's Great River booth! I dropped by to chat with Dan while his friend from Danville was by, and learned of their series of DSP Function Modules. Each module is a stuffed printed circuit card containing an AC-97 (the PC audio standard) stereo 16 bit codec, an Analog Devices ADSP-2100-series DSP chip (50 MIPS), and a PIC microprocessor. It has headers that provide analog I/O, digital audio I/O, and serial communications to the board. The user develops DSP code that gets sent to flash memory on the board via a serial interface (there is support for RS-232, -485, LVDS and USB). On power-up or reset, the PIC bootloads this firmware into the DSP which takes over control, using the PIC to help with I/O duties. The whole board is not much bigger than a credit card. It is targeted for users that are not DSP experts, to give them a drop-in system configured for digital audio that is straightforward to develop for. Applications include audio filtering, signal generation, modems, delay lines, and custom test instruments/analyzers.

Trade Show: An Acoustic Musician's Perspective

I'm an acoustic guitarist, and my audio work is mostly targeted to the live performance needs of acoustic "folk" musicians, particularly acoustic guitar amplification. My gear gawking was driven largely by these interests. Also, I've been keeping my eyes and ears out for good mics for recording vocals and acoustic guitar in my home studio, and I've been shopping for multitrack audio software. Here is a brief survey of gear on the exhibit floor that caught my eye.

Future Sonics Generic Ear Monitors

I didn't actually get to see this piece of gear, but it was announced in the daily newspaper the AES publishes during the show, and it's something I wanted to see. Future Sonics, a highly respected name in the ear monitor industry, announced the Ears model EM3 universal fit ("generic") ear monitor. Previously, Future Sonics specialized in ear monitors that required a custom earmold. These new monitors appear targeted to the same audience that uses monitors such as the Shure E1 and E5, the Garwood M-Pack, or low-end monitors from companies like Nady and Radio Partners. They are available alone or packaged with Sennheiser's Evolution 300 wireless system. Alone the cost is $198, putting them right between the Shure E1 and E5. Given Future Sonics' reputation, I am very curious to see how these compare to the competition.

Ear Q Hearing Test System

There were several booths addressing the most treasured possession of audio engineers and musicians alike: the sense of hearing. The House Ear Institute offered free audiologist exams in a soundproof remote exam truck during the show; exam times were completely booked early each day. Both they and H.E.A.R. gave out hearing protection advice and literature and free foam earplugs. The most intriguing exhibit concerning hearing care was from Ear Q (as opposed to "IQ"; get it?). They sell a system for $399 that includes a hand-assembled and calibrated high-quality headset and software for the PC that lets you test and measure your own hearing. Of course, you are urged to use a professional audiologist as necessary. But their system offers capabilities beyond what your audiologist offers. First, most audiologist tests stop at 8 kHz, which is plenty high enough for everyday use but not for high-fidelity audio. The Ear Q system measures up to 20 kHz. Second, their system offers suggested EQ settings to compensate for your hearing impairments. I have some questions about how well this works (I think to some degree our brains partially correct for some deficiencies, and I wonder how they handle left-right differences), but it seems like a great idea on paper. The phones are advertised to be of high enough quality to be good for critical monitoring. Finally, having the system on hand at all times lets you monitor ear fatigue during the course of a long session, so you can correct for it appropriately as the night wears on (or realize you need to take a break!). A complete reading takes about 15 minutes. Mac software is in the works and should be available in early 2002. You can access an uncalibrated demo at the web site. This product sounds like a great idea.

Vocal and Acoustic Guitar Mics

In my mic shopping, I quickly learned that the AES is probably one of the worst places to do mic shopping! Every exhibitor has a different preamp, headphone amp, and headphones. It's also noisy. And in any case, listening to one's own voice in the cans while singing is not the right way to audition a vocal mic. But I couldn't resist giving a listen to some mics anyway. Here are some quick and very questionable impressions (and not all of what I mention here is new). But to further emphasize that the reader must beware, I'll mention that a mic I was very interested in, the Neumann KMS105, sounded lifeless and dull to me in the Neumann booth. In fact, so did a U87! Yet I know from folks with ears that I consider better than mine that these are outstanding mics, and that the KMS105 has much of what I'm looking for. This just goes to show what effect all the variables have on one's impressions of a mic at AES.

That said, I was very impressed with the Earthworks SR69 mic (and their mic pre, which they used to demo it, using a Mackie 1202 as a glorified switcher/headphone amp). It's a cardioid mic that has a very natural tone, not too oppressive proximity effect, and good pop filtering. You can remove the pop filter, and it then has the geometry of the better-known Earthworks instrument and measurement mics, and a flatter top-end response. So it can play the role of both a live or studio vocal mic, and an instrument mic. The list price is around $500; considering the quality and the flexibility, it sounds like a good deal. For what I am looking for, this was certainly one of the most impressive mics of the show. I'd love to hear from anyone who has SR experience with this mic. By the way, it was announced during the show that the SR69 won a TEC Award this year.

I was also very curious about the Marshall MXL mics, a few of which have been reported to have a quality higher than their low price might suggest. I was most impressed with the MXL603 small-diaphragm condenser instrument mic, and the new MXL V69 large diaphragm tube vocal mic.

Shure was the only major mic manufacturer to have a vocal booth at the show where you could hear some mics in a good acoustic environment. And thus it's perhaps no surprise that their vocal mics made a very good impression on me. They had their KSM series of large diaphragm vocal mics in the booth. The KSM32 is the one that sounded best with my voice.

I left the show really wanting to hear the above mics again in a better environment. I also left wondering if perhaps Audio-Technica had the best policy: almost none of their mics could be demoed! This show really is not a good place to make a final mic purchasing decision.

AMT's Guitar-Mounted Mics

Internal (in the guitar) mics are popular for acoustic guitar amplification, typically in combination with some other onboard transducer. Indeed, they usually sound pretty horrible by themselves. Such a mic is "a mic in a box" and sounds pretty much like "a mic in a box!" As a result, some players have been wondering how to portably mount a mic outside the box. Scottish singer/songwriter Dougie MacLean is perhaps the best known of such players; he has a custom-made rig (bent Romex wire!) that holds a small omni lapel mic off his guitar. Well, Applied Microphone Technology of Livingston, NJ (973-992-7699) has finally come up with a commercial alternative to Dougie's Romex rig. Their S15G system is a high-quality lapel mic on a gooseneck with a clamp that holds it to the treble-side upper bout of the guitar. They also sell an S3G model that clips to the soundhole to hold the mic inside the guitar. They offer other models for other instruments (e.g., horns). On the downside, the guitar models sell for about $300, about triple the price guitarists are used to paying for internal mics. We'll have to see whether the mic quality and external mounting improve the sound enough over existing offerings to justify the price. If they were to market just the S15G mounting bracket, they would probably have a good market just for that!

PreSonus Acousti-Q Instrument Preamp

This is a piece of gear that I didn't see but wish I had; I only found out about it from reading the AES newspaper after the show. PreSonus Audio Electronics announced the Acousti-Q tube acoustic instrument preamp/EQ/blender. It has two tube preamps, one each for a pickup input and a condenser mic input. The EQ section has a tunable notch filter, brilliance and bass controls, and a sweepable mid. A footswitch can control not only mute, but also a preset gain cut/boost. There is a stereo effects loop. It appears that the EQ applies only to the blended signal, and not separately to the mic and pickup signals; if true, this is a serious limitation. No detailed info is yet available from the PreSonus web site; hopefully more will be available soon.

Mic Pres and EQs

I was not shopping for a mic pre or EQ. I'd like to have better stuff than what I own, but I've reached a certain budget ceiling where I know the next significant jump in quality will require more $$ than I can presently afford. But foolishly I tried a few units anyway, and despite the limitations of the setting, got a dangerous taste of what a great pre can sound like. Dan Kennedy had both the original Great River MP series "plain gain" mic pre on display, and his new MP-2NV "colored" mic pre. For what I am looking for, the original MP-2 sounded just awesome: incredibly natural and detailed, but at the same time warm and with a really solid low end. I'm sure the lovely Josephson mics he was using in the demo helped! The MP-2NV is an incredibly clever alternative for those looking for something a little colored. Its circuitry is based on that of the Neve 1073 module. Two pushbutton switches allow the user to adjust the input impedance and output loading (there are custom Sowter transformers on the input and output). One can also separately adjust the first stage gain and the output level, so a bit of nonlinearity can develop if one desires it. As a result, one has a broad palette of subtle colorations available. I don't have "golden ears" by any means, but even my not-highly-trained ears could hear the results of some of the settings and appreciate the flexibility. This is one of the most clever pieces of gear I saw at the show. But for my needs, I think I still want the MP-2!

Another standout for a processor at the beginning of the signal chain was Pendulum Audio's new Quartet. Acoustic guitarists know Greg Gualtieri's Pendulum gear through his now-retired HZ-10 series of preamp+parametric EQ systems for acoustic pickups, and his current SPS-1 stereo preamp/EQ. Less known to guitarists is his more extensive line of pro audio gear, much of it vacuum-tube based. The Quartet four-element tube recording channel is the latest in this line. It has a mic pre stage that can double as an active DI, a 3-band EQ, an optoelectronic compressor-limiter, and a de-esser, all in two rack spaces with an analog meter for monitoring either the output or the compression. This one, too, gets an award for clever design. Three examples of cleverness: The mic pre has a switchable low cut, and the EQ can be placed before or after the compressor/limiter. Thus, for example, one can cut the lows so that they don't pump the compressor, and then boost them back after the compressor with the low EQ. The compressor has all the standard manual controls, but also "fast," "average," and "vintage" presets. Finally, the de-esser works, not by dynamically cutting all of the high end, but rather by dynamically notching a selected band (choose from 11 frequencies from 3.4 kHz to 11.5 kHz). Despite all the stuff in the signal path, this box sounds incredibly clean. I was impressed with how transparently the compressor worked. I was especially impressed with the EQ—you could dial in fairly large amounts of EQ without getting that unnatural "it sounds like EQ" sound you find from budget EQs (and even from some expensive EQs). Greg told me that it's already become popular, not just in the studio, but in the live racks for some pretty big-name touring vocalists. Easy to believe!

Crest's Small-Format Mixers

Crest Audio, a very respected name in power amps and medium and large format consoles for live sound, now has an XR series of small-format mixers (introduced earlier this year). They have some of the features of Crest's larger boards, including individual PCBs for each channel, and cleverly designed long-throw faders (the control arm bends around so the slot to the resistive element faces sideways, so your spilt beer and potato chip crumbs won't clog the faders!). The Crest rep claimed the sonic quality is comparable to that of their larger format mixers. In size and capabilities, the series appears aimed at the Mackie/Behringer crowd, though perhaps it would be more accurate to say it's aimed at the Allen & Heath MixWizard crowd. The price point is well above Mackie, and a bit above A&H. It will be interesting to see how they are evaluated by the SR crowd. Unlike the MixWizards, the inputs are permanently located on the back; the "R" in "XR" stands for "Rack Mount," and you can't use them easily any other way. They had one set up with a John Hiatt CD going into two channels and headphones on the output; not exactly the best setup for evaluating a mixer. About the only things you could check with this were the faders and the EQ. Each channel has 4 bands of EQ, with the mids semiparametric (again inviting comparison with the MixWizards). I didn't like the EQ very much, but it wasn't a great setting for critical listening. If these boards had a movable input pod like the MixWizards, I think they'd make a much bigger impression. In any case, it's good to see so reputable a manufacturer address this market.

Multitrack Audio Editing Software

I'm shopping for multitrack audio editing software, and so I was very disappointed that Digidesign, Emagic, and others chose not to attend AES after the tragic events of September 11 (which caused the show dates to be moved). To be fair, many of the exhibitors who pulled out very likely had logistic nightmares on their hands rescheduling, and some of them have made significant donations to NYC charities or are planning major NYC events in the near future in an effort to stimulate the NYC economy (e.g., see Digidesign's press release or Mackie's ads in current magazines). Still, for folks wanting to see software in action, that left MOTU as the only contender at the show. Their booth had a large display area with a couple dozen seats and two very large LCD displays. They hosted 30-minute demos throughout the day, and were usually speaking to standing-room-only crowds. The new version of Digital Performer is certainly impressive. Much of the presentation I saw focused on audio-for-video capabilities. I'm not interested in this myself, but it was impressive to see regardless. Also impressive are the new multichannel mixing capabilities (covering many formats, not just 5.1, and expandable via plugins), and the AltiVerb sampling reverb plugin (not really a MOTU product, but currently only compatible with MOTU). Finally, the mere fact that they took the time and energy to show up left many of us with a good impression of the whole MOTU team, particularly in light of the dramatic absence of the competition.

Though they didn't have an exhibit, Emagic and Nuendo were clever and had reps at the large booth hosted by Sam Ash. Unfortunately, they had so little display area that it was hard to see anything. Despite these limitations, my impression as a newcomer to this software is that, from the user interface perspective, most of the products are very similar, but Emagic's Logic stands out in terms of ease of navigation and customizability. A local engineer acquaintance of mine familiar with both Pro Tools and Logic compares them by saying that Pro Tools is aimed toward engineers, and Logic toward musicians. What little I saw of these products supports this simple summary.

Closing Thoughts

Well, that's it: more than you wanted to read in one place, but a lot less than what went on. Yet one more thing deserves mention. By far my favorite time at the AES convention was that spent, not looking at gear or hearing talks, but meeting others in the audio world. This is an amazingly open and supportive community. It was especially enjoyable to finally put faces to some names that have become familiar to me through the years in publications and in online forums. Folks like Dan Kennedy and Greg Gualtieri are so down-to-earth, friendly, and open with their knowledge and experience that one takes special pleasure in finding out that their gear is so well-crafted: the gear reflects the quality of the people creating it. It was equally pleasurable to finally meet Scott Dorsey and Steve Dove, whose writings have taught me so much about the guts behind the faceplates of audio gear. Finally getting to thank these folks in person for their help through the years was one of the most satisfying aspects of the AES convention for me. I hope to repeat the experience in future years!