The Importance of Timbre in Sound Reproduction Systems
by, 07-03-2012 at 05:14 AM (18011 Views)
Timbre definition from Wikipedia, “In music, timbre, also known as tone color, is the quality of a musical note or sound or tone that distinguishes different types of sound production, such as voices and musical instruments, string instruments, wind instruments, and percussion instruments. The physical characteristics of sound that determine the perception of timbre include spectrum and envelope. In psychoacoustics, timbre is also called tone quality and tone color.” I suggest reading the Wikipedia article as timbre contains both subjective and objective attributes, both of which are discussed in detail in this post.
From a sound reproduction perspective, if one's goal is to reproduce music as faithfully as possible, then timbre (and all of its subjective and objective attributes) is a significant factor. I consider room acoustics the worst offender for destroying timbre (i.e. tone quality). If you are into the scientific research, there are a number of references in the Audio Engineering Society's library. Here are a couple: AES E-Library » Natural Timbre in Room Correction Systems (Part II) and AES E-Library » The Influence of the Room and of Loudspeaker Position on the Timbre of Reproduced Sound in Domestic Rooms.
I made two major acoustical improvements in my listening environment recently and thought I would share not only the acoustical measurements, but the actual sound too. Literally, you will be able to hear what I hear when listening to my stereo in my listening room. You will be able to listen to the difference between an acoustically untreated room, a treated room, and using digital room correction (DRC) software.
How? With these in-ear binaural microphones. Click on the details tab, give a quick listen to the acoustic guitar demo. With these, you will be able to hear my stereo/room combo in 24/96 resolution with timbre that accurately represents what I hear. I use these to record live music and thought I could use them to record my stereo in the listening position.
I was going to call this article, “Modern Room Tuning Techniques”, as it is a continuation of my, Speaker to Room Calibration Walkthrough, but ultimately, no matter how many words I write, or graphs I post, it will not fully communicate what my speaker/room combination sounds like. You need to hear it too. Actually listening to the sound will put into perspective what the words and measurement graphs mean.
Professional acousticians and audio engineers routinely take acoustic measurements as part of their everyday job. If you have been doing it full-time for a career, then you can read an acoustic measurement graph and hear the sound in your head. Same as how a musician can read notes off a music sheet and hum the tune (some with perfect pitch) in their head.
While every acoustic space is unique, there are a couple of basic tenets that hold true for small room acoustics, the classification most of our listening rooms fall under. These tenets are controlling room resonances and overall room decay times (i.e. RT60). This is based on a large body of knowledge specifically on small room acoustics. Here is a quick overview with a few reference links.
Just like electronic engineers use circuit diagrams and parts lists (BOMs) to communicate the designs and sonic signatures of audio amplifiers, acousticians use time, energy, and frequency information to communicate the sonic signature of an acoustic environment (i.e. both speakers and room). In this article, you will be able to correlate what you see with what you hear (literally) and vice versa.
The Design Process
I am going to analyze my listening room and come up with two designs to improve the timbre of the room acoustics, one passive, and the other active. But first, I will measure my room “as is” and make a reference recording with the binaural microphones so you can hear my stereo and room as is. Here are a couple of pics of my very live, untreated room.
I try and make my analysis balanced between 50% what I hear and 50% what I measure. Based on that analysis, I will design and implement passive acoustic treatments. Then take another set of measurements and binaural recording of the speaker/room combo with the acoustic treatments in place.
Next, Digital Room Correction (DRC). I will fine tune the frequency response using a “target” or “designed” frequency response to reproduce the best effort tonal balance and fine tune the impulse response, (i.e. timing) for the best possible timbre from the speaker/room combo.
“Best effort” because the best achievable tone quality (i.e. timbre) is limited by the physical dimensions of the listening room. Given that we can digitally manipulate all three dimensions of sound (amplitude, frequency, and time), we can create any sonic signature we want, with the limitation being the physical dimensions of the room itself. Technically, this is called a transfer function. A transfer function at this level encompasses everything that makes up the sonic signature of the speaker/room combo.
Because of digital audio, we can design and implement our own transfer functions (i.e. sonic signatures) in software with distortion and noise levels far below what we can perceive and correction at a level of resolution far greater than our ears (read: brain) can discriminate.
Historically it was thought that we could only discriminate to 1/3 of an octave (hence the 1/3 octave analog equalizer). Later research has determined that we can discriminate somewhat closer to 1/6 of an octave. So when viewing acoustical frequency response graphs, 1/6 octave smoothing is the preferred resolution to view the graphs as that is the most accurate representation of how our ear hears (or more technically correct, how the brain interprets the electrical signals).
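To make the idea of 1/6 octave smoothing concrete, here is a minimal Python sketch. It assumes the measurement is simply a list of frequencies and a matching list of levels in dB. The function name and the choice to average power rather than dB values are illustrative assumptions, not how REW or any other measurement tool necessarily does it.

```python
import math

def smooth_fractional_octave(freqs_hz, levels_db, fraction=6):
    """Average each point over a window 1/fraction octave wide, centred on it.

    Averaging is done on power (not on the dB values), then converted back to dB.
    This is an illustrative sketch, not any specific tool's algorithm.
    """
    # Ratio from the centre frequency to either edge of the window.
    edge_ratio = 2.0 ** (1.0 / (2.0 * fraction))
    smoothed = []
    for centre in freqs_hz:
        low, high = centre / edge_ratio, centre * edge_ratio
        powers = [10.0 ** (db / 10.0)
                  for f, db in zip(freqs_hz, levels_db)
                  if low <= f <= high]
        smoothed.append(10.0 * math.log10(sum(powers) / len(powers)))
    return smoothed
```

A narrow peak in the raw data is pulled down towards its neighbours, while a response that is already flat is left unchanged. Setting fraction=3 gives the older 1/3 octave view.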
In the digital domain, we have digital filters that can have 65,535 “bands” (or more). Compared to a 31 band 1/3 octave analog equalizer... That's a revolution.
I chose a linear phase filter (as opposed to minimum phase) as this produces the best phase coherence and time alignment. Not only is the sound “time aligned”, but some early reflections are reduced so that the phase coherence holds together long enough to hear the depth on the recording before the 3D image is destroyed by comb filtering effects of the room. Comb filtering is the root of all evil for an audiophile.
Reducing early sound reflections (and diffusing later reflections) is critical to the realistic reproduction of any stereo recording and to achieving the best possible timbre (i.e. tone quality). You want to hear enough of the recording for long enough that the phase coherence or sound stage is heard before the room takes over and interferes, comb filtering the “location” cues, which blurs the (depth of) image and colors the sound quality with the tone of the room. You will (easily) be able to hear this in the binaural recordings when I compare “as is” with “passive acoustic treatments” and finally with “DRC”.
But first, this is what we need to listen for. It is a bit of science, hopefully presented in a fun and easy to hear manner as it is important to understand what is happening and especially what it sounds like. We all listen to it, but can we hear it?
Pretty cool, the Haas effect. “The Haas effect is a psychoacoustic effect, described in 1949 by Helmut Haas in his Ph.D. thesis. It is often equated with the underlying precedence effect (or law of the first wavefront).”
“Haas found that humans localize sound sources in the direction of the first arriving sound despite the presence of a single reflection from a different direction. A single auditory event is perceived. A reflection arriving later than 1 millisecond after the direct sound increases the perceived level and spaciousness (more precisely the perceived width of the sound source). A single reflection arriving within 5 to 30 milliseconds can be up to 10 dB louder (My note: that’s twice as loud!) than the direct sound without being perceived as a secondary auditory event (echo). This time span varies with the reflection level. If the direct sound is coming from the same direction the listener is facing, the reflection's direction has no significant effect on the results. A reflection with attenuated higher frequencies expands the time span that echo suppression is active. Increased room reverberation time also expands the time span of echo suppression.”
Key concept. It is amazing how a 5 millisecond delay can have that much width. The majority of rock and pop (and most mono multi-track) recordings use the Haas effect extensively, along with more digital delays, reverbs, stereo expanders, etc. If you listen to rock and pop, or any other mono recorded, multi-track recording, it is fake stereo. It's all an illusion and fools our brain every time (speaking as someone that spent over 10,000 hours in the recording/mixing chair doing exactly that). Personally, I don't care. When I crank up SRV's Tin Pan Alley (DR 15) on my rock and roll audiophile system and it feels like I am at Buddy Guy's Legends night club in Chicago, the illusion is complete for me.
A bit more physics, as this is directly related to speaker location and listening position. Sound travels roughly 1 foot per 1 millisecond. The wavelength of a 20 kHz tone is 0.68 of an inch. If my stereo's equilateral triangle is out even by an inch, I will already have destroyed some of the high frequency image (especially depth of field), because the equilateral triangle is misaligned and I am creating comb filtering at high frequencies.
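The arithmetic behind those figures can be sketched in a few lines of Python. The speed of sound used here (1125 feet per second, roughly room temperature) is an assumption, and the function names are illustrative only. The comb filter estimate treats the two arrivals as equal in level and ignores the effect of the head, so take it as a ballpark.

```python
# Assumed speed of sound at roughly room temperature (about 343 m/s).
SPEED_OF_SOUND_FT_PER_S = 1125.0

def wavelength_inches(frequency_hz):
    """Wavelength of a tone, in inches."""
    return SPEED_OF_SOUND_FT_PER_S * 12.0 / frequency_hz

def travel_time_ms(distance_ft):
    """Time for sound to cover a distance, in milliseconds."""
    return distance_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0

def first_comb_notch_hz(path_difference_inches):
    """Lowest cancellation frequency when two equal arrivals differ in path length.

    The first notch sits where the path difference equals half a wavelength.
    """
    speed_in_per_s = SPEED_OF_SOUND_FT_PER_S * 12.0
    return speed_in_per_s / (2.0 * path_difference_inches)
```

With these assumptions, wavelength_inches(20000) gives 0.675, in line with the 0.68 inch figure, travel_time_ms(1.0) is a little under 1 millisecond, and first_comb_notch_hz(1.0) lands at 6750 Hz, which is why a one inch error already matters in the top octaves.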
The learning from this is that time alignment of everything is critical, due to the Haas effect, and its role in reproducing proper timbre. The better aligned the equilateral triangle, the more phase coherent image can be reproduced, which is one of the key attributes of reproducing the most realistic timbre. Additionally, this is why early reflections need to be tamed, typically 15 dB below the direct signal, so we don’t get the Haas effect blurring the time alignment of the stereo image (especially depth).
My design approach to modern room tuning techniques includes using passive acoustic treatment to minimize room resonances, early reflections, and overall room decay time (RT60). I also use state of the art DRC software to trim the frequency response for best effort tonal balance, time align the signal so that the waveform (all frequencies) arrives at the same time in the listening area, and minimize early reflections to enhance the depth and overall phase coherence of the stereo image before comb filtering destroys the recorded illusion. This is captured on the binaural recordings.
Acoustic Analysis and Design
Fellow CA readers, I am the recipient of the 2nd worst possible sounding room award, only beaten by a room shaped like a cube. This is because the length of the room is almost twice the width. Additionally, my stereo is set up off center in the room. So how do I know it is the 2nd worst possible sounding room? I am using Bob Gold’s room mode calculator that will produce a nice graphic display of the room modes given the dimensions of my room.
According to the calculator, my room's Schroeder cutoff frequency is 92 Hz. This is my room’s fundamental transition frequency: below this frequency the room behaves as a resonator; above it, as a diffuser/reflector. This transition point is far from smooth and resonates below the cutoff and rings (like a filter) above the cutoff. Just like blowing air across the mouth of a near empty coke bottle, every room resonates a tone that rides on top of all low frequency notes. Depending on how bad it is (my room ratio, for example), this will produce what is sometimes called “one note” bass tone, meaning the room's resonant frequencies are so dominant (i.e. too much amplitude) that all the bass notes (and sometimes drums) sound like just one note is playing. Also called “room boom”.
You will hear the room boom in my listening room as it is captured in the binaural recording. You too can work out your room's resonant frequencies using this calculator. Here is a frequency response measurement of my room to see if it correlates with the model. Many thanks to JohnM for his most excellent REW measurement software.
This measurement correlates well with the model. Major peaks and valleys from 92 Hz to 300 Hz. That’s the ultimate challenge isn’t it, 2nd worst possible sounding room from an acoustic perspective. If I can make this room sound good… Note the blue horizontal line is mine to help delineate the problem areas. The circled mid-range area also represents a problem area. It initially looks like too much amplitude, but the real culprit for the raised amplitude is midrange room reverb build up. We need to look at another view to see it.
This brings up a story I feel is worth sharing so you can understand where I am coming from on this. As mentioned elsewhere on my blog, I had the good fortune to have been a live sound, recording/mixing engineer for 10 years. SQ was of major importance to me and I worked extra hours to ensure the artist/group got the best possible sound I could come up with. I worked in several state of the art acoustic spaces, with this one below sounding so good that I gave up on my home system.
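For a rough feel of where such numbers come from, here is a small Python sketch of the textbook formula for axial room modes and a commonly used approximation for the Schroeder frequency. Everything here is in metric units. The function names are illustrative, the dimensions in the note below are hypothetical rather than those of the room measured in this post, and Bob Gold's calculator may well compute things differently.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed, roughly room temperature

def axial_modes(dimension_m, max_hz):
    """Axial mode frequencies (Hz) along one room dimension, up to max_hz."""
    modes = []
    n = 1
    while True:
        f = n * SPEED_OF_SOUND_M_PER_S / (2.0 * dimension_m)
        if f > max_hz:
            break
        modes.append(f)
        n += 1
    return modes

def schroeder_frequency(rt60_s, volume_m3):
    """Common approximation: 2000 * sqrt(RT60 / V), with V in cubic metres."""
    return 2000.0 * math.sqrt(rt60_s / volume_m3)
```

For a hypothetical room 8 m long and 4 m wide, axial_modes(8.0, 100) and axial_modes(4.0, 100) share the modes near 43 Hz and 86 Hz. That stacking of resonances is the reason a length close to twice the width is such an unfortunate ratio.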
The studio control room facilities I worked in were designed from the ground up acoustically to be state of the art. The rooms sounded incredible. Perfect neutral timbre. If you ever get a chance to visit a properly designed studio control room and listen to some music... I got so used to state of the art sound, that no matter what I did in my home stereo it paled in comparison to the sound of the state of the art control rooms. And I am not talking about the gear.
The biggest difference between working in the control room and listening at home was the timbre (i.e. tone quality) of the rooms. The studio control room is designed so that the engineer sitting behind the console would hear the sound of the music picked up by the mic and room of the studio before the sound of the control room could be heard. Also known as a reflection free zone (RFZ). RFZ is a control room design approach based on knowledge of the Haas effect.
That meant obtaining a reflection free zone at the mix position and ensuring that any room timbre (i.e. tone quality and all of its subjective and objective attributes) was as neutral sounding as possible. I.e. no coke bottle resonance effects, no boxiness, etc. If you saw the blueprints for one of these control rooms, you would see that no surfaces are parallel and that they are designed to ensure early reflections did not enter the RFZ and later reflections were thoroughly diffused, so any room sound was perceived as a neutral sounding extension that made the room sound a bit bigger than it really was. A very neat psychoacoustic trick.
As mentioned, the point was to hear the direct sound from the mic in the studio, plus the early reflections (i.e. tonal colorations) before you could hear the sound (i.e. timbre) of the control room. That way, when you were placing mics and eq'ing, you were not making decisions based on hearing the tonal colorations of the control room, mixed in with the sound from the studio.
When I compared the acoustics of my home listening space versus the state of the art control room I was working in 8+ hours a day, the timbre gap was so great, I gave up on a traditional speaker setup at home. Mostly I listened to headphones. Sometimes, I invited the boys over to the studio when it wasn't busy and we would listen to tunes there.
While looking at some programming sites, I came across a few Digital Signal Processing (DSP) articles. One of them showed how you can use a well-known DSP technique called convolution to digitally mix (i.e. real-time convolve) the “bit-perfect” music signal with a digital filter (both in the frequency and time domain) that is the inverse (well, they really are algorithms) of the measured room response. Convolution is how the filter's transfer function gets applied to the signal.
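As a toy illustration of the idea, here is a minimal Python sketch of direct-form convolution, applied to a made-up “room” that is nothing more than the direct sound plus one echo. The room response and the truncated inverse filter are hypothetical values chosen so the arithmetic is easy to follow. Real DRC products design their correction filters in far more sophisticated ways, and real convolution engines use much faster methods.

```python
def convolve(signal, fir_taps):
    """Direct-form convolution: every input sample is spread across the filter taps."""
    out = [0.0] * (len(signal) + len(fir_taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(fir_taps):
            out[i + j] += x * h
    return out

# Hypothetical room: direct sound, then an echo at half level two samples later.
room = [1.0, 0.0, 0.5]

# Truncated inverse of that response (first terms of the series 1 / (1 + 0.5 z^-2)).
correction = [1.0, 0.0, -0.5, 0.0, 0.25, 0.0, -0.125]

corrected = convolve(room, correction)
```

Here corrected comes out as a single leading 1.0 followed by zeros and one small residual echo of -0.0625 at the very end, left over because the inverse was truncated. A longer correction filter pushes that residual lower and later.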
JRiver MC has a state of the art convolution engine to host these designed digital filters. What can be done in software far exceeds what can be done in hardware and analog domain. Every modern consumer and pro A/D D/A is performing DSP on the audio signal with digital filters (in conjunction with analog filters) already. “The precision offered by Media Center's 64bit audio engine is billions of times greater than the best hardware can utilize. In other words, it is bit-perfect on all known hardware”
A bit more searching and I found a few DRC software products that used this filter design for audio. One is called Audiolense. I downloaded the demo and ran it on my crappy Logitech G51 computer speakers. If it can make those sound good… As soon as I heard it, I knew that someone (Bernt!) had figured this out in the digital domain, which is a revolution compared to what we can do in hardware/analog audio. This is what I was waiting for.
For me, it is a new ball game and gave me the opportunity to get back into listening to music the way I heard it in those acoustically (near) perfect rooms, or at least come a lot closer than ever before. We will see if the proof is in the binaural recordings, which you can listen to and draw your own conclusions.
Back to the passive acoustic filter design. The first thing I need are bass traps that have good absorption capabilities from 92 to 300 Hz. When I was in the pro audio industry, I used ASC Tube Traps (and RPG products) extensively with good success. Unfortunately, I don’t have budgets like that anymore, but I think I have found a reasonably priced bass trap that should do the job.
It is a corner trap, and should go directly behind the speakers in the corners. Because of my room’s offset, the best I can do is directly behind the speakers in a sorta corner. The idea here is twofold; one is to dampen the low end sound coming off the back of the speaker cabinet so the reflection off the wall and back to the listening position is minimized. This would correspond to about 4 or 5 milliseconds delay. Remember the Haas effect video on what 5 milliseconds delay sounded like? That’s roughly 5 feet of distance, and in this case, after the main sound wave arrives, a secondary wave arrives off the wall from behind the speakers and confuses my brain on location. In this case, it destroys the image from front to back. Depth of field, due to early reflections (and comb filtering) is the first thing to go. It is the green circled portion in the graph below.
With the bass traps in place, it should help dampen those resonances/ringing from 92 to 300 Hz, plus dampen the impact off the back of the speaker. This should result in a tighter (i.e. more transient) bass sound with minimal 5 millisecond later reflection so it does not blur the (depth of the) image. This is captured on the binaural recording. We can also measure this with an Energy Time Curve (ETC).
Technically, we can measure the room’s early reflections with an ETC, typically from 0 to 40 or 50 milliseconds. That’s 40 to 50 feet of travel after the direct sound arrives at the listening position. That way we can inspect anywhere along the time curve and with the wavelength calculator, turn that into distance. This allows us to figure out where the early reflections are coming from and to either dampen or diffuse accordingly.
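The conversion from an ETC reading to a distance, and the check against the 15 dB rule of thumb, can be sketched in Python as follows. The speed of sound is an assumed round figure, the function names are illustrative, and the peak values in the note below are hypothetical readings rather than measurements from my room.

```python
# Assumed speed of sound: about 1.125 feet per millisecond at room temperature.
SPEED_OF_SOUND_FT_PER_MS = 1.125

def extra_path_ft(delay_ms):
    """Extra distance travelled by a reflection arriving delay_ms after the direct sound."""
    return delay_ms * SPEED_OF_SOUND_FT_PER_MS

def reflections_needing_treatment(etc_peaks, threshold_db=-15.0):
    """Pick out reflections louder than the threshold.

    etc_peaks is a list of (delay_ms, level_db) pairs, with level_db measured
    relative to the direct sound (so the direct sound itself is 0 dB).
    Returns (delay_ms, extra_path_ft, level_db) for each offending reflection.
    """
    return [(delay, extra_path_ft(delay), level)
            for delay, level in etc_peaks
            if level > threshold_db]
```

For example, with hypothetical peaks of [(4.0, -12.0), (9.0, -18.0)], only the first is flagged, at an extra path of 4.5 feet. That extra path is the difference between the reflected route and the direct route, which is the number to compare against the tape measure.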
The spikes on the graph and their corresponding millisecond time readings can be translated into feet using the wavelength calculator. Measuring that distance from the mic position to the point of reflection then identifies where passive acoustic treatments should go.
And it is mostly the same type of acoustical treatments: first, taming the room’s resonant/ringing frequencies with bass trapping in corners. Next is diffusion or absorption of the early reflections off the floor, ceiling, and side walls and, of course, the back wall and front wall (with the windows). The windows may benefit from heavy velour curtains. Ideally, the speakers would be mounted in soffits, like in recording studio control rooms, but it’s just my living room, so it’s a design tradeoff (ha ha).
Pretty easy to correlate as one can take a tape measure, or string, or a laser distance measurer, measuring from the mic, with a mirror to find the reflection points and correlate to the ETC by using the wavelength calculator.
This is an ETC measurement of my untreated room. I can label the reflections based on translating to a physical measure in the room. As it stands it is not too bad as the rule of thumb is that all early reflections should be 15 dB or more down from the main signal amplitude. I am almost there. This is simply by virtue of my listening position being as far away from any reflecting surfaces as possible, given the constraints of my room.
Check out this waterfall graph showing which range of frequencies is producing the long decay times. This means my room is very lively as the carpet is indeed the only real absorbent material in a room that is otherwise all drywall, glass, tile, and hardwood (on top of being the 2nd worst room ratio).
What you are seeing here is sound measured in 3 dimensions: the vertical scale is level or amplitude in decibels, the horizontal scale is frequency in hertz, and the z scale is time in milliseconds. In my case, the time scale is from 0 to 300 milliseconds, meaning the sound has travelled roughly 300 feet (10x the length of the room and 20x the width of the room) by the time the microphone measures 300 milliseconds after the direct sound, so we capture the sound of the room and its decay and display it in a visual 3D graph.
I have circled the two problem areas. The one on the left is showing the room resonances with peaks and valleys that I identified earlier. The one in the lower middle is showing the long midrange decay times, which build up more than other frequencies and caused me to incorrectly compensate by lowering the DRC "target" frequency response by 3 dB at 2 kHz. More on that later.
Let’s look at the shape of the decay over time. ITU, IEC, ISO, BBC, and other standards bodies specify the reverb time (spec’d as RT60) or more properly, early decay time, for critical listening environments of a minimum volume of 2500 cubic feet. The specification or preferred range is from .4 to .6 seconds decay across the frequency band, with some rise in the bottom end allowed. That’s 400 to 600 milliseconds max.
I am definitely over the .6 second mark in the midrange as circled in the graph (turns out to be .7 seconds). In this case, some broadband absorbers with good absorption in the midrange will be called for in this design. These should be mounted at the first reflection point on the ceiling and the rear wall to not only reduce early reflections, but further dampen the “brightness” and “boxiness” sound of the untreated room. If my room happened to be the opposite, i.e. dead sounding, then I would put diffuser panels on the ceiling and rear wall instead, with that .4 to .6 second decay as the target RT60.
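As a simple sketch of what the RT60 number means, here is a little Python that extrapolates a measured decay to a full 60 dB drop and checks it against the .4 to .6 second range. Measurement tools commonly fit a slope to part of the decay and extrapolate in this spirit, but the function names and the straight-line assumption here are illustrative only.

```python
def rt60_from_decay(drop_db, elapsed_s):
    """Extrapolate to the time for a 60 dB drop, assuming a constant decay rate."""
    return 60.0 * elapsed_s / drop_db

def within_target(rt60_s, low_s=0.4, high_s=0.6):
    """True if the decay time sits inside the preferred range."""
    return low_s <= rt60_s <= high_s
```

A band that drops 20 dB in 0.2 seconds extrapolates to roughly 0.6 seconds. My .7 second midrange reading fails within_target, while a .5 second reading passes.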
That’s the analysis of my room acoustics and some basic acoustic design, not only based on measurements, but extended hours listening for early reflections, room modes, and midrange comb filtering. My design is to dampen the back of that pounding 15” woofer and the room modes at the cutoff frequency and harmonic ringing. In addition, absorb broadband midrange due to bare walls, plus take care of the early reflections (floor has carpet, the ceiling gets the absorber) to get rid of that “boxy” sound. We will see if it is enough or not. As a last resort, I can hang heavy (velour) curtains over the front windows plus a good portion of the wall.
Listening to the Untreated Room
Let’s take a listen to a binaural recording of my untreated room so you can hear for yourself the “boxiness” sound and “one note” bass sound. I chose the tune "Arbantana" from Hossam Ramzy's album, "Rock the Tabla" (iTunes) for a number of reasons. The bass notes really activate the "one note" bass tone of the room, there is lots of transient percussion for timing and imaging cues, plus good artificial reverb that further exaggerates the boxy tone of my room. Besides, I like the tune and Hossam is an awesome percussionist.
Sept 15 2012 Update
Unfortunately, there are two issues with my binaural recordings. One is that I did not calibrate the binaural microphones' frequency response, and the other is that I used my head instead of a “fixed” dummy head.
This is the frequency response of the binaural microphones as best as I could record with my head not moving. The monitors producing the swept sine have already been calibrated flat to ±4 dB from 20 Hz to 22 kHz. What is being shown is the microphones' frequency response deviation from flat. An inverse digital (FIR) filter should have been applied to the mics' frequency response to make the recording flat.
The other flaw was a sample rate problem with the right channel (in blue) dropping out at 16 kHz. These issues reduce the true representation of what I was listening to versus what the mics picked up.
There is value in the comparisons as the relative differences are audible. However, the overall tonal balance (i.e. timbre) does not accurately represent the tone quality of the system due to the un-calibrated mics.
Here are the new binaural mics (with silicone ears) that can be mounted on a mic stand. Note these are not the ones I used for the recordings in this post, but I will use them in an upcoming article. Now that I have a repeatable way to make binaural recordings, and have corrected the sample rate issues, I can calibrate these new binaural mics. I have also updated my A/D D/A converter to a Lynx Hilo. I may redo the tests/recordings and update this post, but more likely I will create a new post.
In the meantime, I hope you find value in the walkthrough and comparing the binaural recordings.
Use headphones to fully realize the binaural effect.
Download MP3 320kbps 4 meg
Download hires WAV 40 meg
I recorded this while sitting in the listening position and using my Lynx L22 pro sound card’s ADC direct to Audacity at 24/96. You may want to listen to it a few times through to acclimatize your ears to binaural sound and the sound of my speaker/room combo. To me, my speakers/room combo sounds bright and boxy through the mids. And the one note bass tone is evident as well. As an aside, this is more a function of the room than the speakers as will be seen in a future post.
Here is what to listen for, keeping in mind the sound of the Haas effect from the video. Specifically, what bass there is sounds muddy and has a dominant overtone. It will be hard to distinguish the kick drum and bass guitar as the "one note" tone makes everything kind of blend together into a drone-like sound, or room boom.
With respect to the midrange, notice a set of tight drums cracking away, interplaying with the main drum kit. Notice the width and depth of field of these drums, almost slap echo off the untreated walls - heavily comb filtered/ringing and its "depth" position in the mix is wrong. It sounds too far back, but too up in level because of the reverb build up (comb filtering/ringing) of the midrange frequency range. Listen as many times as required to tune into these timbre issues.
Adding Passive Acoustical Room Treatments
Every listening room has a fundamental resonant frequency (plus harmonics) that will need some taming. It is simply a function of the physical dimensions of the room. How “live” or “dead” sounding the room is will determine the number of diffusers and/or absorbers needed for any particular sound environment to achieve the recommended RT60 decay time. The ideal design is to have all sound at all frequencies decay at the same rate and land midway in the RT60 specification of .4 to .6 seconds.
Every critical listening environment could benefit from this basic passive acoustic filter design pattern. A more encompassing design pattern may look like this:
I have used this design pattern (and portions thereof) extensively and successfully when I was in the pro audio business.
Here is what I ended up installing in my room. 6 panels, 4 clipped onto the back wall and 2 on the ceiling to take care of the early reflections. 2 corner bass traps behind the speakers:
Here are a few measurements to see how the passive acoustic treatments helped out the acoustics, even though I can hear the difference just standing in the room. These overlays are to compare before and after acoustic treatments. I have zoomed in the vertical scale to 2 dB per division to show detail, which exaggerates the "un-smoothness" of the frequency response.
The acoustic treatments are able to significantly dampen the circled areas, by almost 5 dB at 200 Hz and 3 dB through the midrange. A 3 dB drop reduces the room power by half. Said another way, the passive acoustic treatments reduce the room gain by half or more in the identified problem areas. That's significant.
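The decibel arithmetic is easy to check with a couple of lines of Python. The function names are illustrative; the formulas are the standard definitions of the decibel for power and for amplitude.

```python
def db_to_power_ratio(db):
    """Power ratio corresponding to a level change in dB."""
    return 10.0 ** (db / 10.0)

def db_to_amplitude_ratio(db):
    """Amplitude (sound pressure) ratio corresponding to a level change in dB."""
    return 10.0 ** (db / 20.0)
```

A 3 dB reduction, db_to_power_ratio(-3.0), leaves about half the power, and a 5 dB reduction leaves a little under a third. In amplitude terms the same 3 dB cut leaves roughly 71 percent of the sound pressure.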
The early reflection in the 4 to 5 millisecond range has been reduced considerably as a result of the bass traps placed behind the speakers and reducing the reflection off the wall behind the speakers. This is key to the kick drum having definition and hearing all the bass notes from the bass guitar at equal loudness, both in the frequency and time domain.
Compare the two 3D waterfall graphs above, the first one before treatments and the latter after. The mid-range decay times (the boxiness sound) have been reduced as circled. Also note, the 200 Hz peak and decay has also been reduced 5 dB. I was going to screen cast switching between the two graphs so you could get a real good sense of the passive acoustic filters at work as it is much more than just the circled points, the overall sound is further diffused.
Because of the passive acoustic treatments, my room's RT60 is now within the .4 to .6 second specification across the frequency range. If I were to add any more absorbent material, I might add a couple more ceiling absorbers right over the listening position to reduce the comb filtering effects of the couch, or add heavy velour drapes to the windows in the front of the room.
Listen to the Difference - Untreated versus Treated Room
Here is another binaural recording of the same tune, but now in the treated acoustic space. So you can hear the difference between the untreated and treated room, I level matched the before and after binaural recordings (within .1 dB). Starting with the untreated room, I switch from untreated to treated to untreated, etc., every 15 seconds. Like so in Audacity:
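Outside of Audacity, the same two steps (level matching, then alternating between the two recordings on a shared timeline) could be sketched in Python along these lines. It assumes each recording is a plain list of samples for one channel and that RMS is the level measure being matched; the function names are illustrative and this is not a description of what Audacity does internally.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_match(reference, other):
    """Scale `other` so its RMS level equals that of `reference`."""
    gain = rms(reference) / rms(other)
    return [s * gain for s in other]

def level_difference_db(a, b):
    """Level of `a` relative to `b`, in dB."""
    return 20.0 * math.log10(rms(a) / rms(b))

def alternate(first, second, segment_len):
    """Switch between two recordings every segment_len samples, starting with `first`.

    Both recordings stay on the same timeline, so the tune keeps playing
    through each switch rather than restarting.
    """
    n = min(len(first), len(second))
    out = []
    for start in range(0, n, segment_len):
        source = first if (start // segment_len) % 2 == 0 else second
        out.extend(source[start:start + segment_len])
    return out
```

For a 96 kHz recording, a 15 second segment would be segment_len = 15 * 96000 samples. Checking that level_difference_db stays within .1 dB after matching confirms that any difference you hear is timbre and not loudness.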
What to listen for? When the binaural recording switches from untreated room to treated room, you should hear a timbre change as the midrange damping will lessen the boxy sound. You should also hear a tightening of the bass and less of the "one note" bass tone. As the binaural recording progresses and more instruments are played, it will become easier to hear the switch between boxy, reverberant sounding to much more defined soundstage and tighter sounding.
Especially note the sound of the kick drum and bass guitar in the low end. Remember how reverberant the tight cracking drums sounded in the untreated room? These should sound less "slappy" sounding, having a tighter definition. In fact, you should hear a tighter definition and more focus on everything.
As noted above, as more instruments are playing, you will notice a timbre change towards the end with the soloing guitar. The slap echo or ringing effect, which adds tone color, is gone when the recording switches to the treated room towards the end. You may have to play it a few times to really key in on what is going on. You can also use a timer while listening and looking at the graph to key right in on the transition changes.
Use headphones to fully realize the binaural effect.
Download MP3 320kbps 4 meg
Download hires WAV 40 meg
Based on my listening tests, the bottom end and midrange are much more tightly defined, as is the overall stereo 3D image. There is an overall improvement in frequency response smoothness, with tighter definition, imaging, and timing. It sounds more focused. It is easier to hear the tone quality change towards the end. It seems I am right in there for the proper decay time.
The sonic improvements that I hear line right up with what I measure and vice versa. So from a timbre perspective, I am pretty happy with the end result.
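For anyone who wants to build a similar A/B comparison from their own recordings, here is a minimal Python sketch of the two steps described above: matching the RMS level of two takes, then alternating between them every 15 seconds. This is not how the comparison in this article was made (that was done by hand in Audacity). The function names are made up for illustration, the takes are assumed to be time-aligned, non-silent lists of float samples, and the switch is a hard cut with no crossfade.

```python
import math

def rms_db(samples):
    """RMS level of a sequence of float samples, in dB relative to 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def level_match(reference, other):
    """Scale `other` so that its RMS level equals that of `reference`."""
    gain_db = rms_db(reference) - rms_db(other)
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in other]

def ab_splice(take_a, take_b, sample_rate, segment_seconds=15.0):
    """Alternate between two time-aligned takes, starting with take_a."""
    segment = int(round(segment_seconds * sample_rate))
    length = min(len(take_a), len(take_b))
    out = []
    for start in range(0, length, segment):
        # Even-numbered segments come from take_a, odd-numbered from take_b.
        source = take_a if (start // segment) % 2 == 0 else take_b
        out.extend(source[start:min(start + segment, length)])
    return out
```

In practice you would level match first, for example `treated = level_match(untreated, treated)`, and then call `ab_splice(untreated, treated, 96000)` on each channel.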
Analysis and Design Part 2
After living with the acoustic treatments for a week and listening every day, I find they have made a major improvement in tone quality, damping the “one note” bass room mode and the “boxiness” comb filtering in the mids. The decay time is within specification, as evidenced by both the measurements and the binaural recordings, in which you can hear the timbre (or tone quality) improvement yourself.
What further improvements can I make to the speaker/room interface? How can I further improve the timbre? There seems to be more room to improve, especially given that the frequency response still deviates by quite a few (14) dB, when I should be in the ±3 dB range across the frequency band. Even then, 1 dB either way is audible. How do I further smooth the frequency response?
Also, what about phase coherence and timing at the listening position? Can I improve that? I remember owning Thiel CS 3.6 time aligned speakers in the consumer world, and when I was recording/mixing, I was using the Urei 813C time aligns. I can hear time alignment, and I can measure it. So how do I improve the time alignment (my speakers don’t have that feature built in; many don’t, as it is hard to do, meaning expensive), and how do I further minimize early reflections to get the best image possible at the listening position?
Basically I need both frequency and time alignment capabilities. Just as every piece of audio gear has a sonic signature, the revolution that is digital audio provides a facility to correct the sound in the digital domain at high resolution (64 bit data path) and low distortion. Given the computing power and sophisticated DSP software we have today, there is a class of software called Digital Room Correction (DRC) software.
Therefore, I can easily create any sonic signature I want, since I have more control over the frequency and time domain than my ears can discriminate using software like this. With the software, you can use default digital filters or, using a Designer, create your own. This is designing the transfer function for the speaker/room combo, in this case as a linear phase FIR filter.
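To make "linear phase FIR filter" a little more concrete, here is a minimal Python sketch of the textbook frequency-sampling method: sample a desired magnitude response at evenly spaced frequencies and turn it into a symmetric set of filter taps. It only shapes magnitude and is nothing like a full room correction design (there is no measurement, no time domain correction and no windowing). The function name and parameters are made up for illustration, and it is not a description of what any DRC product does internally.

```python
import math

def linear_phase_fir(target_db, num_taps, sample_rate):
    """Frequency-sampling design of an odd-length, symmetric FIR filter.

    target_db is a function mapping frequency in Hz to desired gain in dB.
    Symmetric taps are what give the filter its linear phase.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for this sketch")
    centre = (num_taps - 1) // 2
    # Desired linear amplitude at the DFT bin frequencies k * fs / N.
    amplitude = [10.0 ** (target_db(k * sample_rate / num_taps) / 20.0)
                 for k in range(centre + 1)]
    taps = []
    for n in range(num_taps):
        # Inverse DFT of a real, zero-phase response, delayed by `centre` samples.
        acc = amplitude[0]
        for k in range(1, centre + 1):
            angle = 2.0 * math.pi * k * (n - centre) / num_taps
            acc += 2.0 * amplitude[k] * math.cos(angle)
        taps.append(acc / num_taps)
    return taps
```

A flat 0 dB target produces a single unit tap in the middle (a pure delay), which is a handy sanity check. Any other target spreads energy symmetrically on both sides of the centre tap.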
Digital Room Correction
How do we do this? We design the digital filter using a “target” frequency response, one that we design in software. If time domain correction is enabled, which it is in my case, then the impulse values change with the target frequency response. The best impulse response can be achieved by matching the target's high frequency roll-off with the natural roll-off of the tweeters' filtered frequency response.
For me, this tunes the filter to yield the best possible timbre for the speaker/room combo. When this is tuned properly, the timbre tunes in like a guitar string being brought into tune. I have guitars, mics, A/D converter, so I can compare live and recorded timbre of the guitars, plus shakers, tambourines, triangles, etc.
Here is an example of a "designed target" frequency response using Audiolense. I draw or enter in the data points of the frequency response curve I want (red dots).
I have tried dozens of targets before I added my acoustical treatments. Here are a few examples:
Every one of these "targets" sounds different, both from a (frequency) tone and an (impulse) timing perspective. I circled the 2 kHz region. If you look back at the untreated room frequency response, near the top of the article with the 3 dB peak circled in the 1 to 2 kHz range, I used DRC to compensate for this by dropping the target by 3 dB at 2 kHz. However, this is the wrong thing to do, as DRC cannot compensate for excessive midrange RT60 decay time; only adding acoustical treatments can.
Once I treated my room acoustically, I no longer needed to drop the target frequency response by 3 dB at 2 kHz. That was a lesson for me. Actually, a re-learning, as I remember reading this in Don Davis's excellent book Sound System Engineering: "You can't effectively (digital) eq a reverberant field".
Here is another view of the target plus the uncorrected frequency response of my speakers in the main form view of Audiolense. Note how the target's frequency extremes match the speakers' natural roll-offs at the extremes.
I snuck in a little bottom end lift on the target, but given the Klipsch QB3 alignment of the ported bass bin, I can tuck in a little more low end and still have the bass sound tight and not overtax the amplifiers.
Now I can have Audiolense generate the digital FIR filter (which is almost an inverse of the uncorrected response, I say almost because there are other algorithms at play here):
Here is the resultant corrected frequency response:
The uncorrected frequency response is on top and the corrected frequency response is on the bottom (along with the target). In addition to the acoustical treatments, and short of building a state of the art critical listening room from the ground up, I know of no other way to achieve this level of timbre correction, given my awful room ratio.
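The "almost an inverse" idea mentioned above can be sketched in a few lines of Python. One common reason a correction is not an exact inverse is that the boost is capped, so deep dips are not filled in at the expense of amplifier headroom. Whether Audiolense does exactly this is not stated here. The function name, the 6 dB cap and the numbers in the usage line are all hypothetical.

```python
def correction_db(measured_db, target_db, max_boost_db=6.0):
    """Per-frequency correction gain in dB: target minus measured, boost capped.

    Both inputs are lists of dB values at the same frequency points.
    Cuts are applied in full; boosts are limited to max_boost_db.
    """
    return [min(t - m, max_boost_db) for m, t in zip(measured_db, target_db)]
```

For example, `correction_db([-10, 0, 8], [0, 0, 0])` returns `[6.0, 0, -8]`: the 8 dB peak is cut fully, while the 10 dB dip only gets the capped 6 dB of boost.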
Before we listen, let’s look at a few measurements.
Frequency response. I have zoomed way in on the amplitude scale again to show detail. The DRC is able to correct for a 14 dB swing and reduce it to a ±2 dB deviation. The spectral response is similar to preferred spectral responses as described in B&K's paper (Figure 5) and Dr. Sean Olive's paper (slide 24).
ETC looks to be in spec, as almost all early reflections are at least 15 dB below the main signal arrival. The early reflection at around 2 milliseconds is the first reflection off the floor to the listening position. Other than mounting the speakers in soffits, not much can be done there. The good news is how diffuse the later reflections are, which means the room adds little tone color to the reproduced music through the speaker/room combo. This is captured in the binaural recordings.
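If you want to check whether an early spike in your own ETC is the floor bounce, the arrival time can be estimated from simple geometry by mirroring the speaker below the floor (the image source method). The sketch below assumes a flat, reflective floor and a speed of sound of 343 m/s. The function name and the example distances are hypothetical, not measurements from my room.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature

def floor_reflection_delay_ms(distance_m, source_height_m, ear_height_m):
    """Delay of the floor reflection relative to the direct sound, in ms.

    distance_m is the horizontal distance from speaker to listener.
    The reflected path is the straight line from the mirrored source
    (same distance, at minus source_height_m) to the ear.
    """
    direct = math.hypot(distance_m, source_height_m - ear_height_m)
    reflected = math.hypot(distance_m, source_height_m + ear_height_m)
    return (reflected - direct) / SPEED_OF_SOUND * 1000.0
```

With a hypothetical 3 m listening distance and both the driver and the ears 1 m above the floor, this gives a delay of a little under 2 ms, the same ballpark as the reflection in the ETC above.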
The blue waterfall graph is as good as it is going to get given my room ratio. I can play with the decay time of the 50 to 60 Hz wave by adjusting a parameter in the time domain window in Audiolense’s Correction Procedure Designer as a next step to tune this back a bit, but I don’t notice it too much in the sound.
Listen to the Difference Part 2
Just like the previous A/B comparison of untreated room to treated room, I used the in-ear binaural mics to record both the treated room and the treated room with DRC. The recording starts with the treated room, switches to the treated room with DRC after 15 seconds, and then switches back and forth every 15 seconds after that.
Use headphones to fully realize the binaural effect.
Download MP3 320kbps 4 meg
Download hires WAV 40 meg
There is an even greater timbre change with DRC enabled. So much so, it can be rather startling at first, but as you peruse the graphs and listen, you can make the correlation. It is quite flat sounding, both figuratively and literally. If that ain't your type of sound, then you can change the target to be anything you want.
Take a moment and listen to the full 1:30 of the recording with room treatments and DRC (with time alignment) enabled.
Download MP3 320kbps 4 meg
Download hires WAV 40 meg
Listen for the kick drum and bass guitar. The kick drum now has a full, yet tightly delineated sound to it. It is nice and clear, and note its position in the mix. On the bass guitar, you can hear every note at equal loudness, with no hint of "one note" bass tone. Near the end of the tune, you can hear the bass note slide very low and rise. If you were physically in the room, you would feel the bass and drums.
Also gone is the "boxy" sound. Focus in on those tight cracking drums, no longer do they sound outside the mix. They now fit in the mix at the right level and nowhere near as reverberant.
Note the overall stereo image and depth of field. Listen to where the kick drum is slotted in the mix and that it sounds crystal clear and not buried in a muddy sound. The image now has a depth of field as deep as the width is wide.
The more you listen, the more you will notice. Our ears acclimatize after a couple of minutes and will have forgotten the original horror tone of my untreated room.
I had a lot of fun doing this. I think the binaural recordings are a bit novel. The timbre changes between the untreated room, the treated room, and the treated room with DRC are definitely audible in the binaural recordings. As mentioned earlier, it is too bad that the mics are not calibrated and that there was no way for me to position myself repeatably in the same spot. Now that I have a fix for that, along with a mic calibration procedure, my next set of binaural recordings will match much more closely the true timbre of my system.
If achieving the best possible timbre from your audiophile system is of interest to you, then this article and my previous article, Speaker to Room Calibration Walkthrough, may be of some use.
Update October 15th new frequency and impulse response measurements
I have learned a lot about digital room correction and FIR filter design over the last year. These two resources, Sound correction in the frequency and time domain and DRC: Digital Room Correction have really helped me understand what is happening and what targets to shoot for.
Also, Bob Katz assisted with the target response: flat to 1 kHz and, using 1 kHz as a hinge point, a straight line to -6 dB at 20 kHz. Here is the measured response at the listening position:
Then I measured the uncorrected and corrected impulse responses using REW. They are remarkably similar to the before and after impulse responses at Denis Sbragion's site. Here is the impulse response of Denis's system, before and after DRC is applied:
Here is the uncorrected and corrected impulse response of my system:
Remarkably similar to Denis’s impulse response, but maybe a bit less ringing. Getting close to textbook perfect impulse response.
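The target response described above (flat to 1 kHz, then a straight line down to -6 dB at 20 kHz) is easy to write down as a function if you want to generate target points yourself. The sketch below assumes the "straight line" is drawn on a logarithmic frequency axis, as on a typical frequency response plot. That is my reading of the description rather than something stated here, and the function name is made up.

```python
import math

def hinged_target_db(freq_hz, hinge_hz=1000.0, top_hz=20000.0, top_db=-6.0):
    """Target gain in dB: flat up to hinge_hz, then a straight line
    (on a log-frequency axis) reaching top_db at top_hz."""
    if freq_hz <= hinge_hz:
        return 0.0
    fraction = math.log10(freq_hz / hinge_hz) / math.log10(top_hz / hinge_hz)
    return top_db * fraction
```

Under that assumption the halfway point on the plot, roughly 4.5 kHz, sits at -3 dB. If the line were instead straight on a linear frequency axis, the curve would stay much flatter through the midrange and drop later.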
The Audiolense designed 64-bit FIR filter is hosted in JRiver’s version 18 64-bit convolution engine. “The precision offered by Media Center's 64-bit audio engine is billions of times greater than the best hardware can utilize. In other words, it is bit-perfect on all known hardware.”
Using JRiver’s loopback function, I am able to send the swept sine wave output of Audiolense into the input of JRiver, where the output of JRiver goes out the DAC, line output, and into the amp/speakers. In JRiver, I can toggle the FIR filter on/off in the Convolution engine and take measurements as the mic at the listening position picks up the signal, which passes through a mic preamp into the analog line inputs of my Lynx Hilo, through the ADC, and is routed to the digital input channel of Audiolense so it can measure the responses. Like so:
In JRiver, another feature of the 64-bit Convolution engine eliminates the FIR filter's delay. This is an awesome feature for HTPCs and/or streaming media players, as the audio will not be delayed and the picture will be in sync. In my case, I am not using a lot of filter taps, so my filter’s delay is 171 milliseconds, plus the small added latency of routing through the Lynx Hilo's digital loopback into JRiver’s audio/convolution engine.
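For reference, the inherent delay of a symmetric (linear phase) FIR filter follows directly from its length: it is half the filter length, (N - 1) / 2 samples. The Python sketch below just evaluates that formula. The tap count and sample rate in the example are hypothetical, since the actual values behind the 171 millisecond figure are not given here.

```python
def linear_phase_delay_ms(num_taps, sample_rate_hz):
    """Group delay of a symmetric FIR filter, in milliseconds.

    A linear phase filter with num_taps taps delays every frequency
    by (num_taps - 1) / 2 samples.
    """
    return (num_taps - 1) / 2.0 / sample_rate_hz * 1000.0
```

As an example, a hypothetical 16384 tap filter running at 48 kHz comes out at about 170.7 ms, which is in the same range as the delay quoted above. Doubling the taps at the same sample rate doubles the delay.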
This also means there is no delay when performing swept sine wave measurements, where the playback and recording are synchronized and a tracking filter ensures a high signal to noise ratio, typically 100 dB of dynamic range.
If there is too much delay introduced by the JRiver loopback feature, by latency through the A/D D/A device, or by buffering times that are too long, both Audiolense and REW will try to “hunt” for the signal. But because of the excessive delay, neither may be able to lock onto the signal, and the measurements are no good. This is what happened to me in JRiver 17, as the Convolution engine in that version does not have this feature. It is important to have all timing settings in the system as low latency as possible, without the audio stuttering, pops, drop outs, etc.
It takes experimenting to arrive at the optimum system settings, whether it is WASAPI, ASIO, or Kernel Streaming, plus fiddling with both the A/D D/A device buffers and JRiver buffers… But it can be done and is worth it. Why? JRiver is my predominant music player, so a valid test would be to run the test signal through the same audio engine I am listening to, either with 64-bit Convolution off (audio routed through Hilo's excellent headphone amp to my Senns), or Convolution on, routed through the amp/speaker/room playback chain.
Now that I have a repeatable way to take audio measurements through the entire signal path that I listen to music through, I think I can fine tune the system even more. Stay tuned for more updates.