An introduction to computer audio
Most of the time, digital audio is PCM (Pulse Code Modulation) audio.
It consists of two components: the value of the signal (represented by 16- or 24-bit words) and the time step (the sample rate). In short: the signal and the time.
That sounds logical, but you often hear the 'bits are bits' theory, which implies that if the bits are right, everything is right. This theory leaves the other half, the time step, out of the equation.
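To make the two halves concrete, here is a minimal Python sketch that turns a sine tone into PCM: each sample is a pair of a time (n times 1/sample rate) and a signed integer word. The function name `pcm_samples` and the pure sine source are illustrative assumptions. Note that a real PCM stream carries only the words; the timing has to be recreated by the DAC's own clock, which is exactly the half the 'bits are bits' theory ignores.

```python
import math

def pcm_samples(freq_hz, sample_rate_hz, bits, count):
    """Return (time_s, value) pairs for a full-scale sine quantized to `bits`."""
    full_scale = 2 ** (bits - 1) - 1       # e.g. 32767 for 16-bit signed PCM
    samples = []
    for n in range(count):
        t = n / sample_rate_hz             # the time step: n * (1 / sample rate)
        value = round(full_scale * math.sin(2 * math.pi * freq_hz * t))
        samples.append((t, value))         # the value: an integer word
    return samples

# A tone at a quarter of the sample rate lands on 0, +max, 0, -max.
for t, v in pcm_samples(11_025, 44_100, 16, 4):
    print(f"t = {t * 1e6:7.2f} us  value = {v:6d}")
```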
To play PCM audio, the bits have to be translated into an equivalent voltage, and this must be done with a time step matching the sample rate.
This is done by a Digital-to-Analogue Converter, or DAC for short.
The sample rate is generated by a clock.
As absolute perfection does not exist, there is always some fluctuation in clock speed.
This is called clock jitter.
Interface jitter is jitter introduced in the transmission of digital signals.
Noisy power supplies, improper grounding and electromagnetic interference can all induce jitter.
What matters most is sampling jitter: deviations in the sampling interval at the D/A conversion stage.
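How much sampling jitter matters can be estimated with a standard rule of thumb: for a full-scale sine of frequency f and random jitter with an RMS value of σ_t, the jitter-limited signal-to-noise ratio is SNR = -20·log10(2π·f·σ_t). The sketch below assumes exactly those conditions (full-scale sine, random jitter); real jitter spectra are messier, so treat the numbers as ballpark figures.

```python
import math

def jitter_limited_snr_db(freq_hz, rms_jitter_s):
    """Jitter-limited SNR (dB) for a full-scale sine under random sampling jitter."""
    return -20 * math.log10(2 * math.pi * freq_hz * rms_jitter_s)

# Worst case for audio: a 20 kHz tone.
for jitter in (1e-9, 100e-12, 10e-12):     # 1 ns, 100 ps and 10 ps RMS
    print(f"{jitter * 1e12:6.0f} ps -> {jitter_limited_snr_db(20_000, jitter):5.1f} dB")
```

Around 1 ns of jitter already limits a 20 kHz tone to roughly 78 dB, about 13 bits' worth, which is why the clock at the conversion stage matters so much.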
According to the Red Book audio standard, the clock frequency should be within +/-100 ppm (parts per million).
A deviation of 100 ppm means that a 440 Hz tone deviates by +/-0.044 Hz.
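The ppm arithmetic is simply frequency × ppm × 1e-6. A small Python helper (the name `ppm_deviation_hz` is just for illustration) reproduces the 440 Hz figure and shows what the same tolerance means for the 44.1 kHz sample clock itself:

```python
def ppm_deviation_hz(freq_hz, ppm):
    """Maximum frequency deviation for a clock tolerance given in parts per million."""
    return freq_hz * ppm * 1e-6

print(ppm_deviation_hz(440, 100))      # the 440 Hz tone from the text
print(ppm_deviation_hz(44_100, 100))   # the CD sample clock itself
```

A 4.41 Hz offset on the sample clock is a static frequency error and far too small to hear as a pitch change; jitter is something else, the short-term variation of the individual sample intervals.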
The first generation of DACs was Non-Oversampling (NOS).
The DAC simply runs at the sample rate of the incoming stream.
Inherent to the conversion is that copies of the audio spectrum are created around multiples of the sample rate, the so-called aliases (strictly speaking, images). For CD audio the sample rate is 44.1 kHz and the usable audio band is half of that, so the first alias starts at 22.05 kHz.
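Where those aliases land follows from a simple rule: a tone at f, sampled at fs, reappears at k·fs - f and k·fs + f for every whole number k of 1 or more. A short Python sketch (the function name is hypothetical) lists the first few for CD audio:

```python
def alias_frequencies(tone_hz, sample_rate_hz, multiples=2):
    """Image frequencies k*fs - f and k*fs + f for k = 1..multiples."""
    images = []
    for k in range(1, multiples + 1):
        images.append(k * sample_rate_hz - tone_hz)   # lower image around k*fs
        images.append(k * sample_rate_hz + tone_hz)   # upper image around k*fs
    return images

print(alias_frequencies(1_000, 44_100))    # [43100, 45100, 87200, 89200]
print(alias_frequencies(20_000, 44_100))   # [24100, 64100, 68200, 108200]
```

The higher the tone, the closer its first alias moves down toward 22.05 kHz: a 20 kHz tone already has an alias at 24.1 kHz, only about 4 kHz above the audible band.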
To keep these aliases from burning out your tweeters, a low-pass filter is required.
This filter starts at 20 kHz and has to remove everything before the first alias starts (22.05 kHz, assuming CD audio), so it has to be very steep (a brick-wall filter). Filters this steep are expensive and complex, and they introduce all kinds of artifacts such as phase distortion and pre-ringing.
Schema of a NOS DAC
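How steep is "very steep"? Filter slopes are usually quoted in dB per octave, so the required average slope is the attenuation divided by the width of the transition band in octaves. The sketch below assumes the filter must reach 96 dB of attenuation (roughly the 16-bit dynamic range) between 20 kHz and 22.05 kHz; that 96 dB target is an assumption for illustration, not a figure from any standard.

```python
import math

def required_slope_db_per_octave(passband_edge_hz, stopband_edge_hz, attenuation_db):
    """Average slope needed to reach `attenuation_db` across the transition band."""
    octaves = math.log2(stopband_edge_hz / passband_edge_hz)   # width of the transition band
    return attenuation_db / octaves

slope = required_slope_db_per_octave(20_000, 22_050, 96)   # assumed 96 dB target
print(f"{slope:.0f} dB/octave")
```

That comes out at several hundred dB per octave. For comparison, a simple second-order analogue filter rolls off at 12 dB/octave, which is why an analogue brick-wall filter needs so many poles.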
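Most DACs today use oversampling to avoid the filter problem: the sample rate is raised digitally before conversion (with a digital filter doing the steep work), which pushes the first alias far above the audible range so that a gentle analogue filter is enough.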
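The effect on the analogue filter can be estimated with the same slope arithmetic as for the brick-wall case. Assuming N-times oversampling, the first alias band starts at N·fs - fs/2 instead of fs/2. The sketch keeps a 96 dB target and a 20 kHz passband edge as assumptions and ignores the digital filter, which does the steep work in the digital domain.

```python
import math

def analogue_slope_db_per_octave(oversampling, sample_rate_hz=44_100,
                                 passband_edge_hz=20_000, attenuation_db=96):
    """Slope the analogue filter needs when the DAC runs at `oversampling` x fs."""
    # After N-times oversampling, the first alias band starts at N*fs - fs/2.
    first_alias_hz = oversampling * sample_rate_hz - sample_rate_hz / 2
    octaves = math.log2(first_alias_hz / passband_edge_hz)
    return attenuation_db / octaves

for n in (1, 2, 4, 8):
    print(f"{n}x oversampling: {analogue_slope_db_per_octave(n):6.1f} dB/octave")
```

At 1x this reproduces the brick-wall figure; at 8x the analogue filter only needs a couple of dozen dB per octave.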
The input of a DAC is, in general, S/PDIF over coax or Toslink. Modern DACs offer USB input too.
Other designs are possible; an overview can be found here: http://en.wikipedia.org/wiki/Digital-to-analog_converter
A DAC might be a sound card or a separate box.
Very few companies build their own DAC (the converter chip itself).
dCS and Chord are the ones I know of that use an FPGA to build their own converter.
Most companies use a chip from Burr-Brown (today part of TI), AKM, Wolfson, Analog Devices, ESS, etc.
The number of bits a DAC supports is a nominal value.
It is not a performance metric.
All it says is that the DAC accepts samples with a 16- or 24-bit word length.
That is not to be mistaken for being able to resolve those samples down to the last bit.
A perfect DAC would have no linearity error.
As you can see in the graph, this DAC starts to deviate at -70 dBFS.
Since each bit is worth about 6 dB and 70/6.02 ≈ 11.6, this DAC is able to reproduce roughly 12 of the 16 bits accurately.
High-quality DACs are able to reproduce up to 22 bits correctly.
This is discussed in more detail here.
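The 6 dB per bit rule comes from 20·log10(2) ≈ 6.02 dB: every extra bit doubles the number of levels. A minimal Python sketch of the conversion in both directions (the helper names are illustrative):

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # ~6.02 dB: each bit doubles the number of levels

def effective_bits(linear_range_db):
    """Bits that fit in a range of `linear_range_db` dB below full scale."""
    return linear_range_db / DB_PER_BIT

def range_db(bits):
    """Dynamic range in dB spanned by `bits` bits."""
    return bits * DB_PER_BIT

print(f"linear down to -70 dBFS -> {effective_bits(70):.1f} bits")
print(f"16 bits -> {range_db(16):.1f} dB, 22 bits -> {range_db(22):.1f} dB")
```

So a DAC that resolves 22 bits correctly has to stay linear down to roughly -132 dBFS.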
Ask in an audio forum “what is the best way to connect the PC to the audio system?” and the replies will probably include the following:
The onboard sound card uses the PCI bus. Its analogue output is generally RCA; pro models have balanced outputs (XLR).
Toslink, S/PDIF and AES/EBU are popular in the audio world.
They are unidirectional protocols: there is no two-way communication.
The sender starts to stream in real time and the receiver has to lock onto the incoming stream.
In principle, any variation in the sender's clock speed will show up as input jitter at the receiver (the DAC). There are tricks to mitigate this, such as ASRC (asynchronous sample rate conversion).
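Why can't the receiver simply ignore the sender's clock and play from a buffer with its own fixed clock? Because two clocks never run at exactly the same speed, so the buffer slowly fills up or runs dry. The sketch below assumes a sender running 100 ppm fast (the Red Book tolerance), a receiver with a perfect clock, and a hypothetical 4096-sample buffer that starts half full:

```python
def drift_samples_per_second(sample_rate_hz, ppm_offset):
    """Net samples gained (or lost) per second when the clocks differ by `ppm_offset`."""
    return sample_rate_hz * ppm_offset * 1e-6

def seconds_until_buffer_overflow(buffer_samples, sample_rate_hz, ppm_offset):
    """Time until a half-full buffer overflows (or underruns) because of clock drift."""
    headroom = buffer_samples / 2          # room on either side of the half-full mark
    return headroom / abs(drift_samples_per_second(sample_rate_hz, ppm_offset))

print(drift_samples_per_second(44_100, 100))              # samples/s piling up
print(seconds_until_buffer_overflow(4_096, 44_100, 100))  # seconds until overflow
```

After less than eight minutes, well within the length of an album, the buffer would overflow. So the receiver either follows the sender's clock (and inherits its jitter) or resamples the stream to its own clock, which is what ASRC does.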
USB and FireWire are typically computer protocols.
Both can be used in different modes.
In asynchronous mode, the DAC sets the pace of the bus: it tells the computer how much data to send.
In this scenario the DAC's clock can run at a fixed speed, as there is no need to sync to an incoming stream. This mode is the way to eliminate input jitter.
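A toy model of that feedback loop in Python: every 1 ms frame the computer sends as many samples as the DAC last asked for, and the DAC asks for its own clock rate plus a small correction that steers its buffer back toward a target fill level. The proportional gain `k`, the 2048-sample target and the frame length are assumptions; this is not the actual USB Audio Class feedback format, only the principle, and it ignores that real transfers carry whole samples.

```python
def simulate_buffer(dac_ppm, seconds, use_feedback, start_fill=2_048.0,
                    target_fill=2_048.0, nominal_hz=44_100.0, frame_s=0.001, k=100.0):
    """Return the DAC buffer fill (in samples) after `seconds` of playback."""
    dac_hz = nominal_hz * (1 + dac_ppm * 1e-6)   # DAC runs on its own fixed clock
    fill = start_fill
    for _ in range(round(seconds / frame_s)):
        if use_feedback:
            # Asynchronous mode: the DAC requests its own rate, corrected by the fill error.
            host_hz = dac_hz + k * (target_fill - fill)
        else:
            # No feedback: the computer blindly sends at the nominal rate.
            host_hz = nominal_hz
        fill += (host_hz - dac_hz) * frame_s     # samples in minus samples out this frame
    return fill

print(simulate_buffer(100, 60, use_feedback=False))                     # slowly drains
print(simulate_buffer(100, 60, use_feedback=True, start_fill=1_500.0))  # converges toward the target
```

Without feedback the buffer loses about 4.4 samples per second and would eventually run dry; with feedback it stays put, and the DAC's clock never has to follow anything.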
Networking over Ethernet (wired) or Wi-Fi (wireless) is asynchronous by design.
Packets of data are sent using a protocol like TCP/IP.
TCP is very strict; it was built with bit-perfect transmission in mind.
It is a bidirectional protocol: if a packet fails the checksum test, the receiver discards it and does not acknowledge it, so the sender retransmits it.
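The checksum TCP uses is the Internet checksum (RFC 1071): the ones' complement of the ones' complement sum of all 16-bit words. A minimal Python sketch; it checksums a bare payload and leaves out the pseudo-header that real TCP also covers:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in (end-around carry)
    return ~total & 0xFFFF

payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(payload)
print(hex(checksum))                              # 0x220d (the RFC 1071 example)

# Receiver side: checksumming the data plus its checksum yields 0 if nothing changed.
received = payload + checksum.to_bytes(2, "big")
print(internet_checksum(received) == 0)           # True
corrupted = bytes([received[0] ^ 0x01]) + received[1:]
print(internet_checksum(corrupted) == 0)          # False: the packet is rejected
```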
That sounds good, but in practice other protocols such as UDP may be used.
UDP is a lightweight protocol and therefore very efficient for transporting large volumes of data like audio or video.
It is more efficient than TCP because it uses a simple transmission model without handshaking for reliability or ordering; a damaged datagram is simply dropped, never resent.
Thus, UDP provides an unreliable service: datagrams may arrive out of order, appear duplicated or go missing without notice.
However, as it is audio, who cares?
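UDP itself carries no sequence numbers, so a streaming application has to add its own (RTP, for example, puts a sequence number in each packet) to put datagrams back in order. A minimal Python sketch, assuming each datagram arrives as a hypothetical `(sequence_number, payload)` pair and that a lost packet is simply replaced by silence, which is only one of several ways to conceal a gap:

```python
def reassemble(datagrams, first_seq, count, frame_bytes):
    """Order datagrams by sequence number, drop duplicates and fill gaps with silence."""
    received = {}
    for seq, payload in datagrams:
        received.setdefault(seq, payload)       # keep the first copy, ignore duplicates
    stream, lost = [], []
    for seq in range(first_seq, first_seq + count):
        if seq in received:
            stream.append(received[seq])
        else:
            stream.append(bytes(frame_bytes))   # zero bytes = digital silence (signed PCM)
            lost.append(seq)
    return b"".join(stream), lost

# Packet 1 went missing, packet 0 arrived twice and packet 2 arrived before packet 0.
arrived = [(2, b"cc"), (0, b"aa"), (0, b"aa"), (3, b"dd")]
audio, lost = reassemble(arrived, first_seq=0, count=4, frame_bytes=2)
print(audio, lost)   # b'aa\x00\x00ccdd' [1]
```

With UDP, a lost datagram turns into a gap the receiver has to conceal; it is never resent.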
By design, all network input is buffered.
Networking solutions sound ideal: asynchronous, bit-perfect transmission, in other words a jitter-free, bit-perfect connection.
However, a networked DAC is simply a small computer with a sound card, which raises the same question again: how do you get the audio out of it?