SB2 really doesn't support 96kHz PCM on S/PDIF?



Richard Elen
2005-03-16, 04:17
> From: Phil Karn <karn (AT) ka9q (DOT) net>
> Kim Rochat wrote:
>> I've been waiting for a wireless-G version of the Squeezebox on the
>> assumption that it would support PCM sampling rates greater than 48kHz,
>> but reading the specs for the SB2 says that the digital outputs only
>> support 44.1 and 48. Is this really true? If so, I'll have to wait for
>> an SB3.
>
> Do you seriously claim that you can hear sounds above 24 kHz? All the
> way up to 48 kHz? Or are you trying to entertain your dog?

Sorry for this long reply, which you can simply skip if you know it all
already, but in my view there are several issues here, and not all of
them are straightforward.

First, with the vast majority of users storing their music in
lossy-compressed formats with sample rates seldom exceeding 44.1 kHz and
often lower, this may be moot for a lot of people. However with the
advent of lossless compression format availability in the system, very
likely those of us who care about audio quality will jump on them, and
if we do so, we might feel that the ability to handle sample rates above
48 kHz would be an advantage.

But as we cannot hear above about 20 kHz, honestly, at the best of
times, except perhaps when the level is above about 120 dB SPL at which
point we certainly can't tell the pitch, what's the point of higher
sample rates?

The traditional reason for proposing higher rates was analogue
filtering in the conversion process. With an audible band ending at,
say, 20kHz, and a Nyquist limit (the highest frequency you can capture
with a system, which is half the sampling frequency) at fs/2, eg
22.05kHz, this meant only a small bandwidth to get signals out of the
way before they started aliasing and imaging (frequencies above the
Nyquist limit are interpreted as frequencies below - see
http://www.apogeedigital.com/pdf/apogeeguide.pdf for more details).
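
As a rough illustration of the folding described above, the following Python sketch computes where a tone ends up after sampling. The function name `alias_frequency` and the test tones are made up for the example, and it assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone of f_hz appears after sampling at fs_hz."""
    # Reduce the tone modulo the sample rate, then fold it into 0 .. fs/2.
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


if __name__ == "__main__":
    fs = 44100
    # Illustrative tones only: two below the Nyquist limit, two above it.
    for tone in (10000, 20000, 25000, 30000):
        print(f"{tone} Hz sampled at {fs} Hz appears at {alias_frequency(tone, fs)} Hz")
```

With fs = 44.1 kHz, a 25 kHz tone folds down to 19.1 kHz, inside the audible band, which is why it has to be filtered out before conversion.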

This meant very steep filters that were initially in the analogue
domain. As a result they caused all kinds of problems such as ringing,
and phase errors all the way down to below 10kHz. This is why early
digital recordings sounded clangy and brittle (we also didn't know about
the audibility of jitter then).

The solution was to get the filters way out of the way by multiplying
the sample rate - multiplying by integers was simplest and produced the
fewest artefacts. But you don't "really" have to work at the higher
rate; you can instead 'oversample', where you interpolate imaginary
samples between the real ones, so that the Nyquist limit is way up there
and there is tons of room for filtering - which you could also do
digitally by now and avoid artefacts by and large anyway. Virtually all
converters used today oversample many times over and as a result this
problem has gone away.
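
To put rough numbers on "tons of room", here is a minimal Python sketch that measures the gap between the top of the audible band and the lowest image (at fs minus the passband edge) in octaves. The 20 kHz passband edge and the 8x oversampling factor are assumptions chosen for illustration, not figures for any particular converter, and `transition_band_octaves` is a hypothetical helper name.

```python
import math


def transition_band_octaves(fs_hz, passband_hz=20000.0):
    """Octaves between the passband edge and the lowest image at fs - passband."""
    first_image_hz = fs_hz - passband_hz
    if first_image_hz <= passband_hz:
        raise ValueError("sample rate too low for this passband")
    return math.log2(first_image_hz / passband_hz)


if __name__ == "__main__":
    base_rate = 44100.0
    for factor in (1, 2, 8):  # assumed oversampling factors
        fs = base_rate * factor
        octaves = transition_band_octaves(fs)
        print(f"{factor}x ({fs:.0f} Hz): {octaves:.2f} octaves of room for the filter")
```

At 44.1 kHz the filter has roughly a quarter of an octave to work in; at eight times that rate it has about four octaves, so a much gentler analogue filter will do.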

The other argument for high sample rates is that there may be stuff
going on up there even if we can't hear it, and we really ought to
capture everything - especially if we are archiving the material and
think that inaudible stuff might just be important one day. It's also
possible that inaudible frequencies might interfere with each other to
produce something audible - for example if you close-mic violins and
record each mic on its own track (please don't), not capturing the
ultrasonics might adversely affect the timbre when you mix, which would
not have been the case if you had used more distant mics and the mixing
of ultrasonics had happened in the air between the musicians and the mics.

The fact is, however, that the human hearing capability can be satisfied
by a PCM system with a minimum sample rate of around 52kHz and a word
length of around 20 bits if properly dithered. But dithering isn't
always done properly so 24-bit is a good idea (of course you create lots
more bits if doing DSP, but you then dither back down again) and offers
a theoretical 144dB of dynamic range. And although we can now convert
sample rates by non-integer values without an audible penalty, doubling
up 44.1 and 48 to 88.2 and 96 still makes sense, and anyway it's out
there now. So it is common to offer 24/88.2 and 24/96 capability.
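
The 144dB figure can be checked with a couple of lines of Python. This sketch uses the simple ratio between full scale and one quantisation step (about 6.02 dB per bit); it ignores dither and the small correction usually quoted for a full-scale sine wave, and the helper name `pcm_dynamic_range_db` is made up for the example.

```python
import math


def pcm_dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, in dB, for a given word length."""
    return 20.0 * math.log10(2 ** bits)


if __name__ == "__main__":
    for bits in (16, 20, 24):
        print(f"{bits}-bit PCM: about {pcm_dynamic_range_db(bits):.1f} dB")
```

This gives roughly 96.3 dB for 16 bits, 120.4 dB for 20 bits and 144.5 dB for 24 bits.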

You can hear the difference between, say, 48 and 96 (though make sure
you are doing blind testing), but this may still be down to filtering
and the quality of the analogue circuitry. It is also an interesting
fact that a good lossless compression system like MLP actually gets more
efficient at high sample rates and so the space taken up by something at
96 kHz compressed is only 1.3 times the same thing at 48 compressed (in
the case of MLP). This is partially, but not entirely, because Nothing
Is Going On to compress up at higher frequencies, so it doesn't take up
any room.

And how about marketing? Well, saying your product goes up to 96 or 192
kHz is nice because bigger numbers sell more, even if customers can't
hear the difference. And at one time, converter chips to handle higher
rates were more expensive, but now everyone's chip does 192 kHz (quite
beyond the pale in my view - there really /is/ nothing going on). You do,
however, need to pay more attention to the analogue circuitry to make an
improvement, and that may be more expensive. So it may be a wash (in
which case I would say go for it). And then if you are sending
high-sample-rate PCM over a network you will eat up bandwidth, so you
need your realtime lossless decompression system in the box which is now
getting more complicated...
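
For a sense of the bandwidth involved, this small Python sketch works out the raw (uncompressed) PCM bit rate for a few formats. It assumes plain stereo PCM with no framing or network overhead, and `pcm_bit_rate` is a hypothetical helper name.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=2):
    """Raw PCM payload rate in bits per second, ignoring any framing overhead."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    formats = [(44100, 16), (48000, 24), (88200, 24), (96000, 24)]
    for rate, bits in formats:
        mbit = pcm_bit_rate(rate, bits) / 1_000_000
        print(f"{bits}/{rate / 1000:g} stereo: {mbit:.3f} Mbit/s")
```

16/44.1 stereo is about 1.41 Mbit/s, while 24/96 stereo is about 4.61 Mbit/s, a bit more than three times as much before any lossless compression.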

Devices like the SB stand at an interesting junction between two
thriving markets: the home audio/video/theatre market (where
multichannel, high-quality audio is appreciated and a selling point) and
the 'personal' audio market where convenience and portability (in every
sense) are more important than more than two channels and audio quality
- if quality is the issue we will see everyone move over to lossless
compression and eschew MP3 and even AAC, but I doubt it. Contrary to
some popular opinion, these two markets are not mutually exclusive:
people still have home systems as well as iPods. However, they also do
use music for different purposes, and if customers are to be expected to
plug their SBs into the main home audio system - another aspect of
portability - they might expect a similar level of quality to be
available as their home system delivers, whether they are real bits or
'marketing bits'.

As we are not in the archiving or studio recording businesses, one could
argue that high sample rates are less important - an ability to handle
multichannel would be preferable for example in my case - but of course
making a box do high sample rates doesn't mean customers have to use
them. Most may not even notice, though it might encourage sales among
the smaller number of potential customers who believe they do. Is it
worthwhile? Well, if it was me and I could afford it, I would do some
market research. I bet SD have, and I am sure that cost/benefit analysis
has led to the decision to which they came.

Hope this helps...

--Richard E

Phil Karn
2005-03-21, 00:51
Richard Elen wrote:

> First, with the vast majority of users storing their music in
> lossy-compressed formats with sample rates seldom exceeding 44.1 kHz and
> often lower, this may be moot for a lot of people.

Agreed.

> However with the
> advent of lossless compression format availability in the system, very
> likely those of us who care about audio quality will jump on them, and
> if we do so, we might feel that the ability to handle sample rates above
> 48 kHz would be an advantage.

This is not obvious.

> But as we cannot hear above about 20 kHz, honestly, at the best of
> times, except perhaps when the level is above about 120 dB SPL at which
> point we certainly can't tell the pitch, what's the point of higher
> sample rates?

Good question.

> The traditional reason for proposing higher rates was analogue
> filtering in the conversion process. With an audible band ending at,
> say, 20kHz, and a Nyquist limit (the highest frequency you can capture
> with a system, which is half the sampling frequency) at fs/2, eg
> 22.05kHz, this meant only a small bandwidth to get signals out of the
> way before they started aliasing and imaging (frequencies above the
> Nyquist limit are interpreted as frequencies below - see
> http://www.apogeedigital.com/pdf/apogeeguide.pdf for more details).
>
> This meant very steep filters that were initially in the analogue
> domain. As a result they caused all kinds of problems such as ringing,
> and phase errors all the way down to below 10kHz. This is why early
> digital recordings sounded clangy and brittle (we also didn't know about
> the audibility of jitter then).

While it's true that sharp reconstruction filters were difficult to
implement in analog hardware, this problem -- if it ever *was* a problem
-- hasn't really been around for almost two decades now. Every modern
DAC uses "oversampling" techniques to implement the high-performance
part of the reconstruction filter in DSP, requiring only a simple and
cheap analog filter to complete the job.

That said, there was *never* any serious evidence that the
much-discussed phase shifts of the early analog reconstruction filters
were *ever* audible in any controlled listening environments. The effect
was comparable to tiny position changes of a tweeter vs a midrange
speaker in a cabinet. I certainly heard all those claims about
"brittle-sounding" digital audio, but never did I see one based on
solid, scientifically controlled evidence.

> The solution was to get the filters way out of the way by multiplying
> the sample rate - multiplying by integers was simplest and produced the
> fewest artefacts. But you don't "really" have to work at the higher
> rate; you can instead 'oversample', where you interpolate imaginary
> samples between the real ones, so that the Nyquist limit is way up there
> and there is tons of room for filtering - which you could also do
> digitally by now and avoid artefacts by and large anyway. Virtually all
> converters used today oversample many times over and as a result this
> problem has gone away.

Exactly. This is sometimes cited by the golden ears as a vindication of
their claims -- why would the manufacturers do it if it wasn't necessary
-- but the real reason is that it's simply cheaper and easier to
oversample and filter in DSP than to do it with a lot of expensive analog
components. We just happen to be lucky that the cheaper way is also the
"better" way in some sense. Of course, the vendors knew a marketing
windfall when they had one, so they made lots of hay over their
oversampled DACs. But they never made one whit of audible difference.
They just saved the vendors a few pennies per unit in component costs.

> The other argument for high sample rates is that there may be stuff
> going on up there even if we can't hear it, and we really ought to
> capture everything - especially if we are archiving the material and
> think that inaudible stuff might just be important one day. It's also
> possible that inaudible frequencies might interfere with each other to
> produce something audible - for example if you close-mic violins and
> record each mic on its own track (please don't), not capturing the
> ultrasonics might adversely affect the timbre when you mix, which would
> not have been the case if you had used more distant mics and the mixing
> of ultrasonics had happened in the air between the musicians and the mics.

That basically argues that intermodulation distortion is a good thing,
the effects of which we should preserve. Seems very strange to me,
especially since we already agree that the components involved in the
intermodulation are not directly audible. So if we were to eliminate
those ultrasonic components by low pass filtering before A/D conversion,
the sole effect would be the elimination of intermodulation products in
the audio range that wouldn't exist if the intermodulation distortion
weren't present.

Right. Makes perfect sense. Not.
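
To make the arithmetic behind this concrete, the Python sketch below lists the low-order intermodulation products of two tones and keeps the ones that land in the audible band. The tone frequencies and the 20 kHz limit are assumptions for illustration, `intermod_products` is a hypothetical helper, and only second- and third-order products are considered.

```python
def intermod_products(f1_hz, f2_hz, audible_limit_hz=20000):
    """Second- and third-order products of two tones that fall in the audible band."""
    candidates = {
        "f2 - f1": abs(f2_hz - f1_hz),          # second-order difference tone
        "f1 + f2": f1_hz + f2_hz,               # second-order sum tone
        "2*f1 - f2": abs(2 * f1_hz - f2_hz),    # third-order products
        "2*f2 - f1": abs(2 * f2_hz - f1_hz),
    }
    return {name: f for name, f in candidates.items() if 0 < f <= audible_limit_hz}


if __name__ == "__main__":
    # Two assumed ultrasonic tones, neither audible on its own.
    for name, freq in intermod_products(25000, 27000).items():
        print(f"{name} = {freq} Hz")
```

Two inaudible tones at 25 kHz and 27 kHz passed through a nonlinearity yield a 2 kHz difference tone. Filter the ultrasonics out before the nonlinearity and that 2 kHz product never appears.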

> The fact is, however, that the human hearing capability can be satisfied
> by a PCM system with a minimum sample rate of around 52kHz and a word
> length of around 20 bits if properly dithered. But dithering isn't
> always done properly so 24-bit is a good idea (of course you create lots
> more bits if doing DSP, but you then dither back down again) and offers
> a theoretical 144dB of dynamic range. And although we can now convert
> sample rates by non-integer values without an audible penalty, doubling
> up 44.1 and 48 to 88.2 and 96 still makes sense, and anyway it's out
> there now. So it is common to offer 24/88.2 and 24/96 capability.

Larger dynamic ranges than 16 bits are often perfectly appropriate for
intermediate steps in the mixing process. You can easily show with math
-- and every DSP programmer well knows -- that maintaining a certain
dynamic range in the final product often requires a greater dynamic
range in the intermediate steps, especially if there are many of them.
But that does not justify the use of more than 16 bits of precision at
other stages in the process, for instance on material that clearly
doesn't have the dynamic range to need it.
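
The point about intermediate steps can be sketched in Python. The example below pushes a sine wave through a chain of gain stages whose product is 1.0, once rounding to 16 bits after every stage and once rounding only at the end. The signal, the gain values and all function names are assumptions made up for the illustration, and the quantiser is plain rounding with no dither.

```python
import math


def quantize(x, bits):
    """Round a sample in the range -1.0..1.0 to the nearest step of a signed PCM word."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale


def run_chain(samples, gains, intermediate_bits=None, final_bits=16):
    """Apply each gain in turn, optionally requantising after every stage."""
    out = []
    for s in samples:
        for g in gains:
            s *= g
            if intermediate_bits is not None:
                s = quantize(s, intermediate_bits)
        out.append(quantize(s, final_bits))
    return out


def rms_error(result, reference):
    """Root-mean-square difference between two equal-length sample lists."""
    return math.sqrt(sum((r - x) ** 2 for r, x in zip(result, reference)) / len(reference))


def compare_paths():
    # Assumed test signal: a 1 kHz sine at half scale, sampled at 48 kHz.
    samples = [0.5 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
    # Assumed gain chain with a product of 1.0, so the ideal output equals the input.
    gains = [0.01, 3.0, 0.7, 1 / 0.021]
    return {
        "16-bit intermediates": rms_error(run_chain(samples, gains, 16), samples),
        "24-bit intermediates": rms_error(run_chain(samples, gains, 24), samples),
        "float intermediates": rms_error(run_chain(samples, gains, None), samples),
    }


if __name__ == "__main__":
    for name, err in compare_paths().items():
        print(f"{name}: RMS error {err:.2e}")
```

The 16-bit path picks up rounding error while the signal is attenuated and then amplifies that error along with the signal, so its final error is far larger than when the intermediate values carry more precision, even though all three paths end in a 16-bit word.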

This does not, however, justify higher sampling rates to capture
frequencies no one can hear. At best, it's a waste of bits. At worst, it
can create audible artifacts that wouldn't otherwise be audible, such as
the intermodulation distortion discussed earlier.

> You can hear the difference between, say, 48 and 96 (though make sure
> you are doing blind testing),

I simply don't believe that.

> but this may still be down to filtering
> and the quality of the analogue circuitry.

Right. That would have to be carefully excluded in any serious
scientific argument that higher sampling rates are justified. It's easy
to get a positive result from an experiment that would seem to prove a
hypothesis when in fact the positive result is an unrelated and
uncontrolled artifact.

> It is also an interesting
> fact that a good lossless compression system like MLP actually gets more
> efficient at high sample rates and so the space taken up by something at
> 96 kHz compressed is only 1.3 times the same thing at 48 compressed (in
> the case of MLP). This is partially, but not entirely, because Nothing
> Is Going On to compress up at higher frequencies, so it doesn't take up
> any room.

This is true. It simply demonstrates that you're wasting bits.

> And how about marketing? Well, saying your product goes up to 96 or 192
> kHz is nice because bigger numbers sell more, even if customers can't
> hear the difference. And at one time, converter chips to handle higher
> rates were more expensive, but now everyone's chip does 192 kHz (quite
> beyond the pale in my view - there really /is/ nothing going on). You do,
> however, need to pay more attention to the analogue circuitry to make an
> improvement, and that may be more expensive. So it may be a wash (in
> which case I would say go for it). And then if you are sending
> high-sample-rate PCM over a network you will eat up bandwidth, so you
> need your realtime lossless decompression system in the box which is now
> getting more complicated...

I simply won't sanction such a cynical marketing scheme.

I will say, however, that I'm happy that sound interfaces with higher
sampling rates are available, but not because I think they make a
difference in hi fi audio applications. I'm a radio amateur interested
in digital signal processing, much of which is now done on general
purpose computers with general purpose sound interfaces. This limits the
bandwidth of any signals we can process. Sound cards with higher
sampling rates let us generate and process wider, higher rate digital
signals.

> Devices like the SB stand at an interesting junction between two
> thriving markets: the home audio/video/theatre market (where
> multichannel, high-quality audio is appreciated and a selling point) and
> the 'personal' audio market where convenience and portability (in every
> sense) are more important than more than two channels and audio quality
> - if quality is the issue we will see everyone move over to lossless
> compression and eschew MP3 and even AAC, but I doubt it. Contrary to
> some popular opinion, these two markets are not mutually exclusive:
> people still have home systems as well as iPods. However, they also do
> use music for different purposes, and if customers are to be expected to
> plug their SBs into the main home audio system - another aspect of
> portability - they might expect a similar level of quality to be
> available as their home system delivers, whether they are real bits or
> 'marketing bits'.

It is not at all difficult to provide "hi fi" performance with portable
hardware these days. One can certainly argue about the data rate that a
given lossy algorithm requires to achieve "transparency", but the fact
is that just about every portable music player can operate at those high
data rates if desired, and if the user is willing to spend the extra
disk space that's required.

> As we are not in the archiving or studio recording businesses, one could
> argue that high sample rates are less important - an ability to handle
> multichannel would be preferable for example in my case - but of course
> making a box do high sample rates doesn't mean customers have to use
> them. Most may not even notice, though it might encourage sales among
> the smaller number of potential customers who believe they do. Is it
> worthwhile? Well, if it was me and I could afford it, I would do some
> market research. I bet SD have, and I am sure that cost/benefit analysis
> has led to the decision to which they came.

Well, yes. If the higher sampling rates really *are* free, I wouldn't
object to having them in my hardware. I might even want to use them in a
wideband amateur radio digital application. But the mere existence of
those higher sampling rates hardly constitutes evidence that they
provide a detectable benefit.