Audio Engineering Myths: “Linearity”

The claim that something is (or needs to be) linear is maybe the most misused term in audio technology – misused in the sense of “used where it’s simply the wrong word”.

So, today’s article will explain:

  • what linearity really means,
  • how it is relevant to our audio engineering (and consequently also instrument effects) applications,
  • the concept of time invariance (which, together with linearity, brings a lot of nice properties),
  • and all of that with examples.

Linearity and Time Invariance – Definitions

1. Linearity

A function (or a map¹) is called linear if (and only if – “iff” in mathematicians’ shorthand) the following equation holds:

f(a*x + b*y) = a*f(x) + b*f(y)  (1)

with f our function, x and y two variables, and a and b two constants. (Set a = b = 1 for the first property below, b = 0 for the second.)

This can be further broken down into the two properties of a linear function:

  1. Additivity: f(x+y) = f(x) + f(y)  (1a),
  2. Homogeneity of degree 1: af(x) = f(ax) (1b).
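To make (1a) and (1b) concrete, here is a minimal numerical sketch in plain Python/NumPy (the function names are mine, purely for illustration): a pure gain passes both checks, while a gain-plus-offset – which looks harmless enough – already fails them:

    import numpy as np

    def gain(x):             # f(x) = 2x: linear
        return 2.0 * x

    def gain_offset(x):      # f(x) = 2x + 1: affine, hence NOT linear
        return 2.0 * x + 1.0

    x = np.random.default_rng(0).standard_normal(1000)
    y = np.random.default_rng(1).standard_normal(1000)
    a = 1.5

    for f in (gain, gain_offset):
        print(f.__name__,
              np.allclose(f(x + y), f(x) + f(y)),   # additivity (1a)
              np.allclose(a * f(x), f(a * x)))      # homogeneity (1b)
    # gain passes both checks; gain_offset fails both.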

2. Time Invariance

A system with output y(t) = f(x(t), t) is called time-invariant iff a time shift of the input merely time-shifts the output:

f(x(t-d), t) = y(t-d)  (2)

with x our input signal, f the transfer function of our system, t the time, and d an arbitrary time shift.

3. Linear, Time Invariant

A system is called linear time-invariant (LTI) iff it is both linear and time-invariant, i.e. iff both (1) and (2) hold.

Why even bother?

There are a few useful properties of LTI systems that matter in our audio engineering world, namely:

  • LTI systems can be completely described by their impulse response, or by its Fourier transform, the (complex) frequency response.
  • LTI systems can completely be described as a network of unit delays, adders and multipliers.
  • If you combine LTI systems (e.g. by chaining them), the order is irrelevant.
  • A combination of LTI systems is always LTI.
  • An LTI system never generates new spectral content. That means if you send a 300 Hz sine wave into an LTI system, everything that will ever come out is a 300 Hz sine wave (a quick numerical illustration follows this list).
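Here is that last point as a plain Python/NumPy sketch (all names are mine): an LTI system – a crude moving-average lowpass – leaves a 300 Hz sine at 300 Hz, while a non-linear one – a hard clipper – adds harmonics:

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 300 * t)                # one second of a 300 Hz sine

    y_lti = np.convolve(x, np.ones(8) / 8, mode="same")   # LTI: moving average
    y_nl = np.clip(x, -0.5, 0.5)                          # non-linear: hard clipper

    def prominent_freqs(sig):
        spec = np.abs(np.fft.rfft(sig))
        return np.fft.rfftfreq(len(sig), 1 / fs)[spec > 0.01 * spec.max()]

    print(prominent_freqs(y_lti))   # [300.] - nothing new appears
    print(prominent_freqs(y_nl))    # [300., 900., 1500., ...] - odd harmonics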

What it means

We’ll now look at those equations in an audio engineering context. In the following, f is just an effects box (be it a stomp box, fancy rack effect, VST plugin, analog or digital – it doesn’t matter here), x and y are two audio signals, a is just a number (e.g. 1.5) and d is a time (e.g. 1 second).

Additivity means that if you have two identical effects boxes, say, two reverbs, and two signals (e.g. two guitar signals) which get mixed together 1:1, then it doesn’t matter if you mix them before you send them through one of your boxes, or if you send each signal through its own box and mix afterwards.

Homogeneity means that if you have a gain knob at the input and one at the output of the effects box, then it doesn’t matter which one you use – the result is identical.

Time invariance simply means that if you send your input signal e.g. one second later than before, then the result will be identical, only one second later.
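If you’d like to poke at a black-box effect yourself, all three properties translate directly into code. A minimal sketch (plain Python/NumPy; check_lti, effect and all parameters are my own names, not any standard API), assuming effect maps an input buffer to an equally long output buffer and starts from silent internal state:

    import numpy as np

    def check_lti(effect, n=48000, shift=4800):
        rng = np.random.default_rng(0)
        x, y, a = rng.standard_normal(n), rng.standard_normal(n), 1.5

        additive = np.allclose(effect(x + y), effect(x) + effect(y))
        homogeneous = np.allclose(a * effect(x), effect(a * x))

        # Time invariance: prepending `shift` samples of silence must simply
        # delay the output by `shift` samples, nothing else.
        delayed = effect(np.concatenate([np.zeros(shift), x]))
        time_invariant = np.allclose(delayed[shift:], effect(x))

        return additive, homogeneous, time_invariant

    # Example: a pure 100-sample delay passes all three checks.
    delay = lambda sig: np.concatenate([np.zeros(100), sig[:-100]])
    print(check_lti(delay))    # (True, True, True)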

A word of caution

No real-world implementation is truly linear, for a number of reasons, and no system save for completely non-realtime, DAW-contained ones is truly time-invariant. In the context of this article, I will treat systems that are “close enough” as linear and time-invariant, without going into the theoretical background of why they can’t be perfectly linear, or how to measure the degree of nonlinearity².

Effects Reality

In this chapter, I will be talking – unless mentioned otherwise – about ideal implementations of the effect in question. An ideal effect is one that is true to the theoretical thing the effect should do (e.g. in the case of a delay: just delay the signal, and not change the frequency response, distort, etc.). As a rule of thumb:

  • the implementations in modern environments (effects processors, DAWs, …) that carry just the name of the effect, or “digital” or “standard” in their name, are ideal,
  • everything that is treasured old vintage gear, or a digital implementation labeled “based on”, “vintage” or “tube”, is not ideal.

Delays

We know the thing: it takes an input signal and delays it by a fixed amount of time.

At the very basic level (i.e. no dry signal, no feedback, no modulation), the function f becomes:

f(x(t)) = x(t-t_d)  (3),

with t_d the delay time. If we add levers to adjust the dry and wet signal and call them g_d and g_w (with “100%=1”), this becomes

f(x(t)) = g_d*x(t) + g_w*(x(t-t_d))  (3a).

Now adding feedback g_f, we finally get a tricky sum (as the delays continue to repeat) in

f(x(t)) = g_d*x(t) + g_w*SUM[i=0,inf;(g_f^i)*x(t-(i+1)*t_d)]  (3b).

As you can see, 3a is a special case of 3b (setting g_f = 0, i.e. no repeats, all summands save for i = 0 become zero), and 3 is a special case of 3a (setting g_w = 1 and g_d = 0, 3a becomes 3). This also means that whatever I prove for 3b is also true for 3a and 3, i.e. for all delays.
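Before checking the criteria, here is (3b) as a runnable sketch (plain Python/NumPy; the names mirror the equation and are mine). Instead of evaluating the infinite sum, it uses the equivalent recursive form – a delay line with feedback – which produces the same output for |g_f| < 1:

    import numpy as np

    def delay_effect(x, fs=48000, t_d=0.25, g_d=1.0, g_w=0.5, g_f=0.4):
        n_d = int(round(t_d * fs))            # delay time in samples
        wet = np.zeros(len(x))                # the SUM[...] term of (3b)
        for n in range(len(x)):
            delayed_in = x[n - n_d] if n >= n_d else 0.0
            delayed_fb = wet[n - n_d] if n >= n_d else 0.0
            # wet[n] = x(t - t_d) + g_f * wet(t - t_d); unrolling this
            # recursion gives exactly the infinite sum in (3b).
            wet[n] = delayed_in + g_f * delayed_fb
        return g_d * x + g_w * wet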

So let’s check for our three criteria for a LTI system:

Additivity:

f(x(t)+y(t)) = {g_d*x(t) + g_w*SUM[i=0,inf;(g_f^i)*x(t-(i+1)*t_d)]} + {g_d*y(t) + g_w*SUM[i=0,inf;(g_f^i)*y(t-(i+1)*t_d)]} = f(x(t)) + f(y(t)).

Homogeneity:

af(x(t)) = a*{g_d*x(t) + g_w*SUM[i=0,inf;(g_f^i)*x(t-(i+1)*t_d)]} = g_d*(a*x(t)) + g_w*SUM[i=0,inf;(g_f^i)*(a*x(t-(i+1)*t_d))] = f(ax(t)).

Time Invariance:

I’m foregoing writing that out: there’s no explicit t in 3b except for the ones inside the x(...) terms, so (2) holds.

In consequence:

A delay effect is linear and time invariant,

as odd as that may sound, as a delay is usually called a “time-based effect”.

Modulated Delays

By this, I mean delays where the delay time is modulated, typically by an LFO. These can either be “wobbling” delay effects or, for short delay times, effects that have become so important that they got their own names: chorus, flanger and phaser. Fortunately, our proof here is identical for all of those, and builds on what we’ve seen above:

The difference from before is that this time, the delay time t_d becomes a function of time, i.e. t_d(t). We won’t go into detail about what this function looks like; we only need two observations (a short sketch follows the list):

  • For time invariance, using the same line of argument as above: this time it does not hold, because t_d now depends on t.
  • Linearity still holds, because none of the operations above depended on whether t_d was a constant or a function of time.
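Here is the idea as a bare-bones sketch (plain Python/NumPy; all parameter names are mine) – the core of a chorus/flanger: a delay line whose read position is swept by an LFO, read with linear interpolation:

    import numpy as np

    def modulated_delay(x, fs=48000, base_ms=5.0, depth_ms=3.0, rate_hz=0.5):
        n = np.arange(len(x))
        # t_d is now a function of time: an LFO sweeping around base_ms.
        t_d = (base_ms + depth_ms * np.sin(2 * np.pi * rate_hz * n / fs)) / 1000.0
        pad = int((base_ms + depth_ms) * fs / 1000.0) + 2   # longest delay
        xp = np.concatenate([np.zeros(pad), x])
        pos = n + pad - t_d * fs              # fractional read position
        i = np.floor(pos).astype(int)
        frac = pos - i
        # Linear interpolation: still a linear operation on x, but the
        # coefficients depend on t - hence linear, yet not time-invariant.
        return (1.0 - frac) * xp[i] + frac * xp[i + 1]

Fed to the check_lti sketch from earlier, this returns (True, True, False): additivity and homogeneity pass, the time shift check fails.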

So, in consequence:

A modulated delay (e.g. a chorus, flanger or phaser) is linear, but not time invariant.

One more consequence, from the consideration above:

Any effect that modulates something based on time (e.g. has an LFO, sequencer, etc., or a “period” setting) is not time invariant.

Equalizers and Filters

An equalizer has a somewhat twisted frequency curve, and a filter often cuts away whole portions of the signal, so they can’t be linear, right?

Instead of using the same approach as above for my proof, I’ll take a slightly trickier route and simply use existing results:

  1. Each passive EQ/filter (and that’s the ones we’re looking at here) can be described by a rational transfer function, i.e. a ratio of two polynomials,
  2. each such rational transfer function can, in the z-domain, be synthesized by a combination of adders (LTI), attenuators (LTI) and delays (LTI, proof above),
  3. each system in the z-domain can be transformed into other technical realizations using existing bilinear transforms, i.e. what’s true in the z-domain also holds for the other domains.

And with that, we’re already done:

An EQ or filter is linear and time invariant.

Which, again, goes against typical audio parlance, where people often call something “linear” when they mean a flat frequency response. Correctly, that is called allpass behaviour, and it is also LTI (see above), even if it’s not trivial, i.e. it may still do things to the phase.
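To make point 2 above tangible: the standard second-order EQ/filter building block, the biquad, really is nothing but two unit delays per side, five multipliers and some adders. A direct-form-I sketch (plain Python/NumPy; the coefficient values would come from the desired frequency response and are not derived here):

    import numpy as np

    def biquad(x, b0, b1, b2, a1, a2):
        # Unit delays (x1, x2, y1, y2), multipliers (the coefficients)
        # and adders - nothing else, hence LTI.
        y = np.zeros(len(x))
        x1 = x2 = y1 = y2 = 0.0
        for n in range(len(x)):
            y[n] = b0 * x[n] + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, x[n]
            y2, y1 = y1, y[n]
        return y

Run through the check_lti sketch from earlier, this returns (True, True, True) for any stable set of coefficients.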

Dynamic Effects (including Distortion)

In this section, we’ll be talking about compressors and expanders, their special cases limiters and gates, and we’ll also throw in distortion/clipper circuits, because they’re a special case of a compressor with ratio = inf and attack = release = hold = 0.

Now if you’ve been following the footnotes so far, you’ve already noticed that nonlinearity is measured by something with distortion in its name², so it stands to reason that distortion is not linear.

Which is the case: in a simplified fashion, dynamic effects are described by a so-called piecewise linear (PWL) function, more specifically a monotonic PWL function. And these functions are, despite the name, not linear in the sense of (1).

However, even with their obvious time dependency (through parameters like attack and release), they actually are time-invariant. How so? The time dependency is always based on the input signal, not on absolute time, so we have (and I’m skipping the proof this time):

A dynamic effect is non-linear, but time invariant.
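As a quick numerical sanity check of both claims, here is the simplest dynamic effect from above – the hard clipper, a two-segment PWL curve (plain Python/NumPy; names are mine):

    import numpy as np

    def clipper(x, limit=0.5):
        return np.clip(x, -limit, limit)   # monotonic, piecewise linear

    x = 0.4 * np.ones(8)

    # Additivity fails: clip(x + x) is 0.5, but clip(x) + clip(x) is 0.8.
    print(clipper(x + x)[0], (clipper(x) + clipper(x))[0])

    # Time invariance holds: the clipper looks at sample values, never at
    # the clock, so a shifted input just gives a shifted output.
    print(np.allclose(clipper(np.concatenate([np.zeros(4), x]))[4:], clipper(x)))

A real compressor additionally has memory (its envelope follower), but that state is driven purely by the input history, so the same time-shift argument goes through.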

Reverb

This is somewhat tricky because, unlike the examples above, there are various ways to build a reverb effect in an electronic effects processor, and they need to be considered independently. Also, as we’re talking about complex combinations of effects, and sometimes consciously “surreal” versions, keep in mind that those which are not modeled after the real world often add some modulation – which takes away the time invariance otherwise stated below.

Reverb based on delay networks

A lot of reverb algorithms are built on complex networks of delays, often with added equalization. As we’ve seen above, both delays and equalizers are LTI, and since a combination of LTI systems is always LTI, those reverbs are LTI.
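A minimal sketch of the idea (plain Python/NumPy; names and delay lengths are mine, the lengths chosen merely to avoid common factors): a few feedback combs – each just the delay-plus-feedback core from (3b) – summed in parallel. A sum of LTI blocks, so the whole thing is LTI; a full Schroeder-style reverb would add series allpass sections, which are equally LTI:

    import numpy as np

    def comb(x, n_d, g):
        # Feedback comb filter: y[n] = x[n] + g * y[n - n_d]
        y = np.zeros(len(x))
        for n in range(len(x)):
            y[n] = x[n] + (g * y[n - n_d] if n >= n_d else 0.0)
        return y

    def toy_reverb(x):
        wet = sum(comb(x, n_d, 0.8) for n_d in (1557, 1617, 1491, 1422))
        return x + 0.25 * wet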

Reverb based on physical models

With the advent of more abundant DSP power came the tendency to let users define a reverb by the simulated room’s parameters, e.g. size, surface properties, etc. Starting from those, typically either an impulse response (see below) is computed, or a delay network is set up. In both cases, I refer you to the sections above and below.

However, some of those models also include saturation effects – at very high sound pressure levels, or for resonating walls – and saturation automatically means nonlinearity. In sum: with these reverbs, you don’t always know. Most are LTI, and those which aren’t typically still behave linearly at sensible signal levels.

Convolution Reverb

You record (or model) what your room sounds like when you play a theoretical, infinitely high, infinitely narrow signal peak (which, for theoretical reasons, has an area of exactly 1), called a Dirac distribution (or, in street vernacular, a Dirac function) – the result is called the impulse response. The reverb is then calculated by performing a mathematical operation called Faltung (or, by more modern folks, convolution), which is essentially an integral over your sound source and the impulse response. This Faltung, like all integration, is a linear operation (and it is obviously time-invariant), so that makes a convolution reverb LTI.
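As a sketch (plain Python/NumPy; instead of a measured impulse response I fake one as exponentially decaying noise, a common toy stand-in):

    import numpy as np

    fs = 48000
    rng = np.random.default_rng(0)

    # Toy impulse response: 0.5 s of exponentially decaying noise.
    n_ir = fs // 2
    ir = rng.standard_normal(n_ir) * np.exp(-6.0 * np.arange(n_ir) / n_ir)

    def convolution_reverb(x, ir):
        # The Faltung itself. Real implementations use FFT-based
        # (partitioned) convolution for speed; the math is identical.
        return np.convolve(x, ir)[: len(x)]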

Note that I consciously evaded proving the linearity of integration, mainly because it would quickly have me proving basic things like limit calculus, the triangle inequality, etc. However, there’s an even nicer non-proof. Above, we’ve already seen that any LTI system can be described by its impulse response alone. Now I simply turn this line of argument around and state that if I can describe the system by an impulse response, then it’s LTI (which is not a proof, I know).

Frequency Shift and Pitch Shift

First of all, I’d like to recapitulate what these things really are:

Frequency shift is adding a fixed amount to all frequencies. This is a simple thing to define, as frequency is a clearly defined scientific term.

It’s not so easy with pitch, as this is a subjective quantity, defined by humans comparing a tone to a sine wave (of a defined frequency). The important thing is that the auditory perception of pitch is logarithmic (as it is for loudness), i.e. shifting pitch means multiplying frequency.
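In numbers: a pitch shift of n semitones means multiplying all frequencies by 2^(n/12). A tiny Python illustration (the function name is mine):

    def semitones_to_ratio(n):
        return 2.0 ** (n / 12.0)     # pitch shift -> frequency factor

    print(semitones_to_ratio(12))    # 2.0: one octave up doubles the frequency
    print(semitones_to_ratio(1))     # ~1.0595: one semitone up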

Frequency Shift

Assuming that we can Fourier-transform all signals that make sense in our context (and I hope you believe that is the case, because I’m not in the mood to prove it), we can describe the effect in the frequency domain:

Writing X(f) for the transform of our input x(t), and Y(f) for the transform of the output y(t) = f(x(t)), shifting everything up by a frequency f_d is simply:

Y(f) = X(f-f_d)  (4).

With our fun Fourier identities, transforming that back results in

f(x(t)) = exp(i*2*pi*f_d*t)*x(t)  (4b).

In other words, a frequency shift (by a constant frequency) corresponds to SSBSC (single sideband suppressed carrier) amplitude modulation – closely related to a ring modulator, which however produces both sidebands (DSB-SC).
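This is also one standard way to build a digital frequency shifter: form the analytic signal (positive frequencies only) with a Hilbert transform, multiply by the complex exponential from (4b), and keep the real part. A sketch using NumPy/SciPy (the function name freq_shift is mine):

    import numpy as np
    from scipy.signal import hilbert

    def freq_shift(x, f_d, fs=48000):
        t = np.arange(len(x)) / fs
        # Analytic signal * complex exponential = single-sideband shift.
        return np.real(hilbert(x) * np.exp(2j * np.pi * f_d * t))

    # A 300 Hz sine shifted by +50 Hz comes out at 350 Hz -
    # a shift, not a multiplication (that would be a pitch shift).
    t = np.arange(48000) / 48000.0
    y = freq_shift(np.sin(2 * np.pi * 300 * t), 50.0)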

Applying our criteria from above, we get for

Additivity:

f(x+y) = exp(i*2*pi*f_d*t)*(x(t)+y(t)) = exp(i*2*pi*f_d*t)*x(t) + exp(i*2*pi*f_d*t)*y(t) = f(x)+f(y)

Homogeneity:

af(x(t)) = a*exp(i*2*pi*f_d*t)*x(t) = f(ax(t))

Time Invariance:

Feeding in the shifted signal x(t-d) gives exp(i*2*pi*f_d*t)*x(t-d), while the shifted output is y(t-d) = exp(i*2*pi*f_d*(t-d))*x(t-d). The two differ by a factor exp(-i*2*pi*f_d*d), so (2) does not hold.

In consequence:

A frequency shifter is linear, but not time-invariant.

Pitch Shift

Again with the Fourier transform, we would get

Y(f) = X(b*f)  (5)

with b a scaling factor that relates to the pitch shift (I don’t want to go into the details here). Transforming back (up to a constant gain factor, which doesn’t matter for our argument), this gives

f(x(t)) = x(t/b)  (5b),

which looks perfectly linear (and it is), but it is not time-invariant – there’s one problem: time itself moves slower or faster here! In fact, what this equation describes is the effect of speeding up or slowing down a tape or turntable – which we can’t do in realtime, and which is obviously not the way the pitch shifter effects in our stompboxes, rack effects or computers work.
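For illustration, here is (5b) implemented naively as resampling – the tape-speed trick (plain Python/NumPy; names are mine). Note how the output length changes with the shift: that’s the realtime problem in a nutshell:

    import numpy as np

    def tape_pitch_shift(x, semitones):
        ratio = 2.0 ** (semitones / 12.0)      # frequency scaling factor b
        pos = np.arange(0, len(x) - 1, ratio)  # fractional read positions
        i = pos.astype(int)
        frac = pos - i
        # Linear-interpolation resampling: pitch goes up by `ratio`,
        # duration shrinks by the same factor - inseparably.
        return (1.0 - frac) * x[i] + frac * x[i + 1]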

There are a number of approaches to doing this in realtime, from frequency multiplication in nonlinear devices (which, by definition, are nonlinear), over modulated delays, to granular effects and whatnot. This could easily turn into a huge discussion of the various pitch shifting methods, so I’d like to summarize it with the common experience that monophonic sources typically work better than polyphonic ones – which essentially disproves our f(x+y) = f(x)+f(y), and thus linearity. And most implementations do modulation things, meaning they’re not time-invariant. So, without a detailed proof:

A pitch shifter is nonlinear, and (most of the time) not time-invariant.


1: in our audio engineering context, we only need to deal with functions. For that reason, I will continue to only mention functions.

2: If you want to dive further into that topic: a system can’t be truly linear or time-invariant because of things attributed to both Einstein and Heisenberg (including that there aren’t completely continuous signals, and there isn’t a perfectly precise measure of time, except maybe in the moment of the creation of a universe). The degree of nonlinearity is typically quantified as the total harmonic distortion (THD).


3 thoughts on “Audio Engineering Myths: “Linearity””

  1. Hi – thanks for this explanation, it’s much appreciated.

Sony advertises that the S/N ratio of their PCM D100 recorder can be raised from 96 dB to 100 dB via the settings. Their explanation is as follows:
    ” The S/N 100dB function achieves a high S/N ratio by replacing two different leveled A/D converters with holding linearity. With the S/N 100dB function, you can record with low noise even with a low recording level.”

    Would this process reduce the dynamic range of the file do you think?
    Thanks!

    1. Hi Sharon,

      I just read the manual of said device (http://pdf.crse.com/manuals/4487745111.pdf), and here’s the explanation I can come up with (although I’m not sure that I completely understand your question). Here goes:

      Two converters are run in parallel, the second one having a 12dB lower input gain (i.e. before the converters). Normally, the first converter is used. If any clipping (i.e. level exceeding 0dBFS) is detected, the thing switches to the second converter to get an additional 12dB of headroom.

      Effectively, this increases the signal to noise ratio (it remains unclear whether it also increases the dynamic range – it would if 16bit converters are used), but only for the signals that do not exceed the 0dBFS of the first converter (or -12dBFS in the scale of the second one). This, however, is fairly acceptable, as for loud signals around the -12dB, noise is masked by the useful signal pretty well – you’re only looking for a very high SNR to be also on the safe side if your input level drops to -30dB or so (i.e. during a quiet passage).

      The tricky part here is of course to have both converters matched properly, to avoid nonlinearities when that switching happens.

      So to directly answer your question: no, it would not reduce the dynamic range of the file.

  2. Thanks so much for this. It’s hard to find the S/N ratings for portable recorders. Is 100dB particularly high for a portable unit? What would be the difference between an S/N rating and an EIN or Equivalent Input Noise rating?
