Paper regarding the subjective effects of apex listening in a variety of system models using imagery.

by Steve Deckert


As an audiophile and designer, my listening experiences have had me pondering the connection between the sound we hear and the imagery we construct in our minds compared to the imagery we see with our eyes. The fact that sound frequencies and light frequencies are both waves makes them very similar to each other even if humans didn't exist to see or hear them. The fact that our mind interprets both forms of waves via our eyes and ears to create an image in our consciousness makes our eyes and ears very similar to each other even if sound or light didn't exist.

With this in mind, I began to realize there may be a way to relate some of the subjective adjectives used to describe hi-fi systems by using images. This would be desirable for a number of reasons, not the least of which would be the reference value, and/or the possible standard set by it.

For example, fundamentals are present in both waveforms, so if we took a pure, high-quality image and introduced some phase shifting commonly associated with crossover distortion, we could visually see the colors and textures of the image change. This would be exactly the same as the effects a seasoned audiophile would detect with his ears. The image in the consciousness is the same, yet the audiophile got it from his ears, and the reader of this paper will get it from their eyes.

Combining the two forms of waves, sound and light, to create a performance received by the eyes and ears is a powerful communication. When you go to a concert you can see and hear the musicians as well as the ambiance of the music hall. There is no question how large the room is, where the musicians are located, or which particular ones are playing. Yet when we take away the sight it becomes exactly twice as hard to answer those questions. When we take away the room and the real musicians and substitute your listening room with a pair of speakers, it only stands to reason that the difficulty will increase manyfold.

Once the live music has been recorded, reconstituted and finally regurgitated out of your speakers, your room takes over, adding reflections of its own that do not match the original music hall. These reflections, besides revealing your listening room over the recording, can create a myriad of problems, making it difficult, if not almost insanely difficult, to perceive a clear image in your consciousness during playback.

It should be noted that the eyes and ears are approximately equal as senses, and for this reason using both during playback of a hi-fi system is often unwise. Since you'll see the front wall of your listening room, your brain will tell you that you're in your listening room, not at the musical performance. Yet, on a seriously good hi-fi playback system, the ears send a pretty convincing signal to the brain. Sometimes the signal is as strong as the visual coming from your eyes, and then your brain crashes, or at least tension is created.

To see the 3D image of the recording you must not use your eyes.

One of the main reasons, in my opinion, that home theater has become so popular is the market's basic failure to obtain satisfaction from audio-only playback systems. This basic inability to create a clear three dimensional image comes from a failure to comply with the needs and demands of the playback system and room. It's a serious science, not a mass marketable concept. The home theater combines both audio and visual to bring a more powerful communication to the listener, and hence a clearer image.

We have all wondered at one point in time or another how the speaker in our TV set sounds so good when we know it's just a cheap speaker. The answer is simple. The visual image received by the eyes and the sound heard by the ears are both interpreted simultaneously by the mind and are both used to create an image or feeling in the consciousness. Vision is a very strong cue. Anytime your eyes can see what is making the sound, the sound will become more clear. This makes it easy to interpret quality that is actually not there, and is perhaps the reason so many people are drawn to home theater. In a home, the act of placing a TV between your loudspeakers or placing your speakers against the wall is the ultimate hi-fi sin. It makes it absolutely impossible to create a 3D sound stage.

On the flip side, I have played high quality soundtracks in stereo with a projection screen in my treated listening room and found that the sound quality exceeded the video quality by so much that it was extremely distracting. To remove the tension, a 3D hologram would have to have been projected that was maybe 50 feet wide and thousands of feet deep to get the sound and video quality matched. My conclusion was that two dimensional sound is better suited for watching two dimensional movies on TV.

Apex listening refers to serious listening of stereo playback systems using two speakers spaced equidistant from the listening chair and in a symmetrical arrangement with the listening room. The idea is to hear the direct energy (sound) from the speakers without hearing the nasty reflections of the listening room itself. If you are doing serious listening and trying to improve the three dimensional image of your playback system's sound stage capabilities, then this paper may be of some assistance.

It is very hard to explain with words how a system sounds, and even harder to accurately describe how it images. The main reason it is so hard (for reviewers) is that most readers only think they've heard what you're talking about but in reality have no real reference.

Since I have been getting into graphics a little bit as a result of this web site, what I am going to do is create and manipulate an image to show you what various playback systems sound like, and attempt to explain why.

If you're not sure what a good three dimensional image sounds like, this paper and its illustrations should give you a solid reference point!

PART 1 - Choosing the reference images

It has taken me about 18 months of pondering to decide on an image that could be used as a reference for this paper. Obviously a good quality 3D image, but of what? The first impulse was to photograph musicians playing some music and then run the photo through various filters (algorithms) to simulate various kinds of distortion commonly found in the set up and design of hi-fi playback systems.

The photo of musicians turns out to be too specific. If the discussion is about large spaces and the photo is of a small club act, there is confusion. If the discussion is about the intimacy of a close up three piece and the photo is of an orchestra, there is confusion. The image had to be somewhat generic so as to be applicable to all people, all types of music, and most any discussion. It had to contain all the wonderful things found in your mind when listening to a perfect playback system, i.e. the depth, width, detail, layers, tonal quality, balance, etc.

After a year of looking for the perfect image, I had to give up and resort to creating one using the technology of the day - a raytracing program. After creating several objects to represent point sources in the sound stage, i.e. different instruments, I again found these powerful images to be too specific. What I realized is that representative material was needed to mean different things to different people, and this would ensure that the image as a whole was the same to everyone. Kind of ironic. The shapes that I chose are basic so as to remain representative, and are mostly floating spheres.

This paper will no doubt contain dozens of images representing various playback models. The image above will be considered a master that was recorded without a room. By that I mean, if the above image were sound, it would be part of the master tape.

Study the image closely. The black represents space or silence. Pure black is a noise floor of zero. In contrast, the floating objects represent point sources of sound. Together this is called a soundstage. The light level of these objects represents sound pressure, commonly measured in decibels. As you look at these objects in contrast with the silent background, they are at a level of around 78 dB. With a noise floor of zero (the black background) this would be interpreted as fairly loud. In your listening room, even in the middle of the night when everything is quiet, your noise floor rests between 50 and 65 dB.
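
The decibel figures above can be turned into plain ratios with a little arithmetic. The Python sketch below is only an illustration and is not part of the original image work: it assumes the figures are sound pressure levels referenced to the standard 20 micropascals, and the helper names are made up for this example.

```python
P_REF = 20e-6  # standard reference pressure for dB SPL in air, in pascals


def spl_to_pascals(db_spl):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF * 10 ** (db_spl / 20)


def pressure_ratio(db_difference):
    """Pressure ratio that corresponds to a level difference in dB."""
    return 10 ** (db_difference / 20)


signal_db = 78      # level of the objects, taken from the text
room_floor_db = 50  # quiet end of the listening-room noise floor, from the text

headroom_db = signal_db - room_floor_db
print(f"78 dB SPL is about {spl_to_pascals(signal_db):.3f} Pa")
print(f"The signal sits {headroom_db} dB above a 50 dB floor")
print(f"That is a pressure ratio of about {pressure_ratio(headroom_db):.1f} to 1")
```

Against the 65 dB end of the range, the same 78 dB signal would sit only 13 dB above the floor, a pressure ratio of roughly 4.5 to 1.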

Notice that all of the objects (sounds) are in harmony with each other. I didn't make one, for example, hot pink. This harmony accurately represents what happens when all the instruments (objects) are in tune. If you study the wavelengths of the 20,000 colors used in this image you will see they are harmonically aligned. To make one hot pink would be to create a flagrant phase and amplitude shift within that object and gross amounts of comparative odd order distortion in the image.

Mathematically I have found that there is little difference between light and sound frequencies. Each color occupies a different bandwidth like octaves on a piano. So when we create an algorithm (circuit) and process the image through it, the alarming result is a fairly accurate visual representation of what these effects would sound like.
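
As a loose illustration of processing an image through an algorithm the way a signal passes through a circuit, here is a minimal Python sketch. It is not the filter used for the images in this paper. It assumes a single row of grayscale pixel values and applies a simple moving average, which plays the role of a low-pass filter. The function name and sample values are made up for this example.

```python
def moving_average(row, width=3):
    """Smooth a row of pixel values with a centered moving average.

    Edge pixels are averaged over the neighbors that exist, so the
    output has the same length as the input.
    """
    half = width // 2
    smoothed = []
    for i in range(len(row)):
        window = row[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A sharp edge: black background (0) meeting a bright object (255).
sharp_edge = [0, 0, 0, 255, 255, 255]
soft_edge = moving_average(sharp_edge)
print(soft_edge)  # [0.0, 0.0, 85.0, 170.0, 255.0, 255.0]
```

The hard step from black to white becomes a gradual ramp, which in the spirit of the analogy would correspond to a softened transient.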

The large sphere in the center of the image represents the body or center of the recording. This could be a vocalist, drum kit, piano, or pretty much any sound that is panned to the center of the recording. The smaller sphere orbiting just to the right represents a supporting instrument such as a guitar or another singer, or perhaps a different section of an orchestra. Note that both of these objects have reflections below them. These reflections represent the floor, and there always IS one in every recording.

The objects with reflections create those reflections by being fairly close to the floor. The reflections give us a reference to the height of the objects. Remember in your listening room with your eyes closed, the only reference consistent in all recordings is the floor.

The three smaller spheres above the larger one represent sounds originating from a space in the distance. Note that in this image with the silent background, it is difficult to know exactly how far back these sounds (spheres) actually are. This is because they are too far away from the reference (floor). This is interesting to note, because when we add the room to this image, the spheres will be very well defined in space and you will have no problem judging the front to back orientation of these three spheres. The reason is that we will have added walls, or additional planes of reference.

The chalice and the pyramids represent complexity and balance. Although in the image above they are as plain as day, these objects have less illumination, meaning that if they were points of sound, they would be quieter. When we add the room to this image, we essentially will be adding a noise floor to the image (recording). This does not mean NOISE NOISE, but rather an ambient decay of the music revealing the boundaries of the space around the music (such as a room or music hall). This ambient background will represent the 50 dB of fill that is missing from the above image. When the room is revealed in the above image, it will add a complexity of its own that will make the pyramids and chalice difficult to find unless everything is perfect like on the master tape.
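
The effect of adding a noise floor to the image can be sketched numerically. The Python example below is an illustration under stated assumptions, not the method used to build the images: brightness is on a 0 to 255 scale, the ambient fill is modeled as a constant added to every pixel, and the brightness values for the objects are invented.

```python
def add_ambient_floor(brightness, floor):
    """Add a constant ambient level to a brightness value, clipping at white (255)."""
    return min(255, brightness + floor)


def contrast_ratio(obj, background):
    """Ratio of object brightness to background brightness (background must be > 0)."""
    return obj / background


# Hypothetical brightness values, chosen only for illustration.
objects = {"large sphere": 200, "pyramid": 40}
ambient = 30  # stand-in for the room's ambient fill

for name, level in objects.items():
    lit = add_ambient_floor(level, ambient)
    print(f"{name}: {contrast_ratio(lit, ambient):.2f} times the background")
```

Against pure black, any lit object stands out without limit. Once the ambient level is added, the dim pyramid in this sketch is only a little over twice as bright as its surroundings, while the large sphere remains several times brighter.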

The room image combined with the dynamic image above will startle you as the two combined create a full rich tapestry of sound (or in this case color and shape). This is important to note because a lot of effort was taken to keep these proportions accurate. The dynamic image above, with no room, is neat, but something is obviously missing. You would feel that something is obviously missing too, if you were listening to the same scenario.

The room adds as much as 80% to the music, so the change is sure to be abrupt. This is of course the original room that the musicians recorded in, not your listening room. Your listening room has effects too, which can account for as much as 40% distortion in the subjective image painted in the consciousness.

The next image will contain the background, or "room", added to the 3D model above. (If anyone knows the author of the image that I used for the background, please let me know so I can credit and compliment his work here on the site.) This image will become our reference for this paper, representing the high quality of a well recorded master. The images that follow will illustrate what happens when... and you will see in the image the exact side effect that you would hear in your playback system from things like improper room alignment, different audio cables, different amplifiers and so on.




Decware is a trademark of High Fidelity Engineering Co.
Copyright 1996 1997 1998 1999 2000 2001 2002 2003 2004  2005 2006 2007 2008 by Steve Deckert