by Steve Deckert
1998
As an audiophile and designer, I have found my listening experiences leading me to ponder the connection between the sound we hear and the imagery we construct in our minds, compared to the imagery we see with our eyes. The fact that sound and light frequencies are both waves makes them very similar to each other, even if no humans existed to see or hear them. The fact that our minds interpret both forms of waves, via our eyes and ears, to create an image in our consciousness makes our eyes and ears very similar to each other, even if sound or light didn't exist.
With this in mind, I began to realize there may be a way to relate some of the subjective adjectives used to describe hi-fi systems by using images. This would be desirable for a number of reasons, not the least of which would be the reference value, and possibly the standard, it would set.
For example, fundamentals are present in both waveforms, so if we took a pure, high-quality image and introduced some of the phase shifting commonly associated with crossover distortion, we could visually see the colors and textures of the image change. This would be exactly the same as the effect a seasoned audiophile would detect with his ears. The image in the consciousness is the same; the audiophile gets it from his ears, and the reader of this paper will get it from their eyes.
Combining the two forms of waves, sound and light, to create a performance received by both the eyes and the ears is a powerful form of communication. When you go to a concert you can see and hear the musicians as well as the ambiance of the music hall. There is no question how large the room is, where the musicians are located, or which ones are playing. Yet when we take away sight, those questions become roughly twice as hard to answer. When we then take away the hall and the real musicians and substitute your listening room and a pair of speakers, it only stands to reason that the difficulty will increase many times over.
Once the live music has been recorded, reconstituted and finally regurgitated out of your speakers, your room takes over, adding reflections of its own that do not match those of the original music hall. Besides revealing your listening room instead of the recording, these reflections can create a myriad of problems, making it difficult, if not almost insanely difficult, to perceive a clear image in your consciousness during playback.
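To put a rough number on those reflections, here is a minimal sketch that uses the image-source method (a standard acoustics technique, not something described in this paper) to estimate how much later the first side-wall reflection reaches the listening chair than the direct sound. The coordinates in metres, the single wall at x = 0, and the 343 m/s speed of sound are illustrative assumptions; a real room has many reflecting surfaces.

```python
import math

# Assumption: speed of sound in air at roughly 20 degrees C.
SPEED_OF_SOUND = 343.0  # metres per second


def distance(a, b):
    """Straight-line distance between two (x, y) points in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def side_wall_reflection_delay_ms(speaker, listener, wall_x=0.0):
    """Extra arrival time (ms) of the first side-wall reflection vs. the direct sound.

    Image-source method: mirror the speaker across the wall plane x = wall_x.
    The reflected path length equals the straight-line distance from that
    mirror image to the listener.
    """
    direct = distance(speaker, listener)
    mirror_image = (2.0 * wall_x - speaker[0], speaker[1])
    reflected = distance(mirror_image, listener)
    return (reflected - direct) / SPEED_OF_SOUND * 1000.0


if __name__ == "__main__":
    # Hypothetical layout: speaker 1 m from the side wall, chair 2.5 m further back.
    delay = side_wall_reflection_delay_ms(speaker=(1.0, 1.0), listener=(2.0, 3.5))
    print(f"first side-wall reflection arrives {delay:.2f} ms after the direct sound")
```

A delay of only a few milliseconds is generally too short to be heard as a separate echo; the ear tends to fuse it with the direct sound, which is one way to picture how a room's reflections blur the image rather than announcing themselves.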
It should be noted that the eyes and ears are approximately equal as senses, and for this reason using both during playback of a hi-fi system is often unwise. Since you'll see the front wall of your listening room, your brain will tell you that you're in your listening room, not at the musical performance. Yet on a seriously good hi-fi playback system, the ears send a pretty convincing signal to the brain. Sometimes that signal is as strong as the visual one coming from your eyes, and then your brain crashes, or at least tension is created.
To see the 3D image of the recording, you must not use your eyes.
One of the main reasons, in my opinion, that home theater has become so popular is the market's basic failure to obtain satisfaction from audio-only playback systems. This inability to create a clear three-dimensional image comes from failing to meet the needs and demands of the playback system and room. It's a serious science, not a mass-marketable concept. Home theater combines audio and visual to bring a more powerful communication to the listener, and hence a clearer image.
We have all wondered at one time or another how the speaker in our TV set can sound so good when we know it's just a cheap speaker. The answer is simple. The visual image received by the eyes and the sound heard by the ears are interpreted simultaneously by the mind, and both are used to create an image or feeling in the consciousness. Vision is a very strong cue. Any time your eyes can see what is making the sound, the sound will seem clearer. This makes it easy to perceive quality that is actually not there, and is perhaps the reason so many people are drawn to home theater. In a home, placing a TV between your loudspeakers or placing your speakers against the wall is the ultimate hi-fi sin. It makes it absolutely impossible to create a 3D soundstage.
On the flip side, I have played high-quality soundtracks in stereo with a projection screen in my treated listening room and found that the sound quality exceeded the video quality by so much that it was extremely distracting. To remove the tension, a 3D hologram maybe 50 feet wide and thousands of feet deep would have had to be projected to match the sound and video quality. My conclusion was that two-dimensional sound is better suited to watching two-dimensional movies on TV.
Apex listening refers to serious listening to stereo playback systems using two speakers placed equidistant from the listening chair and arranged symmetrically within the listening room. The idea is to hear the direct energy (sound) from the speakers without hearing the nasty reflections of the listening room itself. If you are doing serious listening and trying to improve the three-dimensional image of your playback system's soundstage, then this paper may be of some assistance.
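The paper only requires that the speakers be equidistant from the chair and symmetrical with the room. Below is a minimal sketch of one such layout, assuming the common equilateral-triangle arrangement (an assumption, not a rule stated here), distances in metres, speakers centred between the side walls, and a hypothetical `front_wall_gap` parameter that keeps the speakers off the front wall.

```python
import math


def apex_layout(speaker_spacing, room_width, front_wall_gap):
    """Return (left, right, chair) as (x, y) points in metres.

    Assumptions (illustrative only):
    - x runs across the room from the left side wall (x = 0) to the right
      side wall (x = room_width); y runs back from the front wall (y = 0).
    - Both speakers sit front_wall_gap metres from the front wall, centred so
      each is the same distance from its nearest side wall.
    - The chair sits on the room's centreline at the apex of an equilateral
      triangle, so it is speaker_spacing metres from each speaker.
    """
    if speaker_spacing <= 0 or speaker_spacing >= room_width:
        raise ValueError("speaker spacing must be positive and smaller than the room width")
    centre = room_width / 2.0
    left = (centre - speaker_spacing / 2.0, front_wall_gap)
    right = (centre + speaker_spacing / 2.0, front_wall_gap)
    chair = (centre, front_wall_gap + speaker_spacing * math.sqrt(3) / 2.0)
    return left, right, chair


if __name__ == "__main__":
    left, right, chair = apex_layout(speaker_spacing=2.4, room_width=4.0, front_wall_gap=1.0)
    print("left speaker", left, "right speaker", right, "chair", chair)
    print("left-to-chair", round(math.dist(left, chair), 3), "m")
    print("right-to-chair", round(math.dist(right, chair), 3), "m")
```

Placing the chair on the room's centreline is what guarantees both the equal speaker distances and the mirror symmetry with the side walls; the equilateral angle is just one convenient choice of depth.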
It is very hard to explain with words how a system sounds, and even harder to accurately describe how it images. The main reason it is so hard, as reviewers know, is that most readers only think they've heard what the reviewer is talking about, but in reality have no real reference. I have been getting into graphics a bit as a result of this web site.
What I am going to do is create and manipulate an image to show you what various playback systems sound like, and attempt to explain why.
If you're not sure what a good three-dimensional image sounds like, this paper and its illustrations should give you a solid reference point!
PART 1 - Choosing the reference images
It has taken me about 18 months of pondering to decide on an image that could be used as a reference for this paper. Obviously it had to be a good-quality 3D image, but of what? My first impulse was to photograph musicians playing some music and then run the photo through various filters (algorithms) to simulate the kinds of distortion commonly found in the setup and design of hi-fi playback systems.
A photo of musicians turns out to be too specific. If the discussion is about large spaces and the photo is of a small club act, there is confusion. If the discussion is about the intimacy of a close-up three-piece and the photo is of an orchestra, there is confusion. The image had to be somewhat generic so as to apply to all people, all types of music, and most any discussion. It had to contain all the wonderful things found in your mind when listening to a perfect playback system, i.e., the depth, width, detail, layers, tonal quality, balance, etc.
After a year of looking for the perfect image, I had to give up and resort to creating one using the technology of the day - a raytracing program. After creating several objects to represent point sources in the soundstage, i.e., different instruments, I again found these powerful images to be too specific. What I realized is that the material needed to be representative enough to mean different things to different people; that would ensure the image as a whole was the same to everyone. Kind of ironic. The shapes I chose are basic so as to remain representative, and are mostly floating spheres.
This paper will no doubt contain dozens of images representing various playback models. The image above will be considered a master that was recorded without a room. By that I mean, if the above image were sound, it would be part of the master tape.
Study the image closely. The black represents space, or silence; pure black is a noise floor of zero. In contrast, the floating objects represent point sources of sound. Together this is called a soundstage. The light level of these objects represents sound pressure, commonly measured in decibels. As you look at these objects against the silent background, they are at a pressure of around 78 dB. With a noise floor of zero (the black background), this would be interpreted as fairly loud. In your listening room, even in the middle of the night when everything is quiet, the noise floor rests between 50 and 65 dB.
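The decibel figures above can be checked with simple arithmetic. This sketch takes the 78 dB object level and the 0, 50 and 65 dB floors from the text and applies the standard relation that a level difference of L dB corresponds to a sound-pressure ratio of 10^(L/20); the function names are illustrative, not from the paper.

```python
def margin_db(signal_db: float, floor_db: float) -> float:
    """How far a sound sits above the noise floor, in dB."""
    return signal_db - floor_db


def pressure_ratio(level_difference_db: float) -> float:
    """Sound-pressure ratio for a level difference, from L = 20 * log10(p1 / p2)."""
    return 10 ** (level_difference_db / 20.0)


if __name__ == "__main__":
    object_level = 78.0  # level of the floating objects, from the text
    # 0 dB: the black background of the master image; 50-65 dB: the listening-room range.
    for floor in (0.0, 50.0, 65.0):
        m = margin_db(object_level, floor)
        print(f"floor {floor:4.0f} dB: margin {m:4.0f} dB, pressure ratio {pressure_ratio(m):8.1f}x")
```

Against a 65 dB floor the 78 dB objects stand only about 4.5 times above the background in sound pressure, compared with a ratio of several thousand against the zero floor of the master image, which gives a sense of how much contrast an ordinary room gives away.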
Notice that all of the objects (sounds) are in harmony with each other. I didn't make one hot pink, for example. This harmony accurately represents what happens when all the instruments (objects) are in tune. If you study the wavelengths of the 20,000 colors used in this image, you will see they are harmonically aligned. Making one hot pink would create a flagrant phase and amplitude shift within that object and gross amounts of comparative odd-order distortion in the image. Mathematically, I have found that there is little difference between light and sound frequencies. Each color occupies a different bandwidth, like octaves on a piano. So when we create an algorithm (circuit) and process the image through it, the alarming result is a fairly accurate visual representation of what these effects would sound like.
The large sphere in the center of the image represents the body, or center, of the recording. This could be a vocalist, drum kit, piano, or pretty much any sound panned to the center of the recording. The smaller sphere orbiting just to the right represents a supporting instrument such as a guitar or another singer, or perhaps a different section of an orchestra. Note that both of these objects have reflections below them. These reflections represent the floor, and there always IS a floor in every recording.
The objects with reflections create those reflections by being fairly close to the floor. The reflections give us a reference for the height of the objects. Remember, in your listening room with your eyes closed, the only reference consistent across all recordings is the floor.
The three smaller spheres above the larger one represent sounds originating from a space in the distance. Note that in this image, with its silent background, it is difficult to know exactly how far back these sounds (spheres) actually are. This is because they are too far away from the reference (the floor). This is interesting to note, because when we add the room to this image, the spheres will be very well defined in space and you will have no problem judging the front-to-back position of these three spheres. The reason is that we will have added walls, or additional planes of reference.
The chalice and the pyramids represent complexity and balance. Although in the image above they are as plain as day, they have less illumination, meaning that if they were points of sound, they would be quieter. When we add the room to this image, we will essentially be adding a noise floor to the image (recording). This does not mean NOISE NOISE, but rather an ambient decay of the music revealing the boundaries of the space around it (such as a room or music hall). This ambient background will represent the 50 dB of fill that is missing from the above image. When the room is revealed in the above image, it will add a complexity of its own that will make the pyramids and chalice difficult to find unless everything is perfect, as on the master tape.
The room image combined with the dynamic image above will startle you, as the two together create a full, rich tapestry of sound (or in this case, color and shape). This is important to note because a lot of effort went into keeping these proportions accurate. The dynamic image above, with no room, is neat, but something is obviously missing. You would feel that something was missing too if you were listening to the same scenario.
The room adds as much as 80% to the music, so the change is sure to be abrupt. This is, of course, the original room that the musicians recorded in, not your listening room. Your listening room has effects too, which can account for as much as 40% distortion of the subjective image painted in the consciousness.
The next image will contain the background, or "room", added to the 3D model above. (If anyone knows the author of the image I used for the background, please let me know so I can credit and compliment his work here on the site.) This image will become our reference for this paper, representing the high quality of a well-recorded master. The images that follow will illustrate what happens when... and you will see in the image the exact side effect you would hear in your playback system from things like improper room alignment, different audio cables, different amplifiers and so on.