WASAPI

WASAPI (Windows Audio Session API) is a low-latency audio interface.

It is Microsoft’s own ASIO.

ASIO is a proprietary protocol. You can only use it if your audio device supports it.

WASAPI is an integral part of Windows (Vista and higher).

When used in shared mode, it uses the Windows audio engine.

All audio sent to this engine is converted to the settings in the Windows audio panel.

Each audio stream is resampled if needed, converted to float, mixed, dithered and converted back to integer.
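
The chain described above (convert to float, mix, dither, convert back to integer) can be sketched as follows. This is a minimal illustration only, not the actual Windows audio engine: the function names, the 16-bit format and the choice of TPDF dither are assumptions made for the example, and resampling is left out.

```python
import random

INT16_MIN, INT16_MAX = -32768, 32767


def to_float(samples):
    """Convert 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def mix(streams):
    """Sum equally long float streams sample by sample."""
    return [sum(frame) for frame in zip(*streams)]


def to_int16(samples, rng=None):
    """Convert floats back to 16-bit integers, adding TPDF dither first."""
    rng = rng or random.Random()
    out = []
    for x in samples:
        dither = rng.random() - rng.random()  # triangular noise within +/- 1 LSB
        value = round(x * 32768.0 + dither)
        out.append(max(INT16_MIN, min(INT16_MAX, value)))  # clip to int16 range
    return out


def shared_mode_mix(int_streams, rng=None):
    """Hypothetical shared-mode path: int -> float -> mix -> dither -> int."""
    return to_int16(mix([to_float(s) for s in int_streams]), rng)


if __name__ == "__main__":
    music = [1000, -2000, 3000]
    system_sound = [500, 500, 500]
    print(shared_mode_mix([music, system_sound]))
```

Because of the dither, the output stays within one step of the exact sum but is not guaranteed to be bit-identical to it, even when only one stream is playing.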

When used in exclusive mode, WASAPI bypasses the Windows audio engine.

It transports the output of the media player directly (and unaltered) to the driver of the audio device.

In exclusive mode, no other application can use the sound card.
No more system sounds at full blast over the stereo!

Because the mode is exclusive (only one stream is playing), there is nothing to mix, so there is no need to resample or dither the audio stream. This means each audio file can be played at its native sample rate without user intervention (automatic sample rate switching).

Automatic sample rate switching and hardware
In case of a USB DAC (using native mode drivers) you get automatic sample rate switching using WASAPI exclusive.

Most of the time the onboard audio allows for automatic sample rate switching as well.
A lot of discrete sound cards don’t allow automatic switching using WASAPI.
If the discrete sound card comes with ASIO, you had better use that driver if you want automatic sample rate switching.

However, this also means the audio device must match the capabilities of the stream.

If your DAC is limited to 96 kHz, sending 192 kHz will fail.

The same goes for sending mono to a stereo DAC or 24-bit audio to a 16-bit DAC.

This initially gave WASAPI a bad reputation.

Over the years this problem has been alleviated.

In general, the capabilities of the audio device are known by Windows.

The USB device enumeration will tell what the capabilities of the USB DAC are.

S/PDIF can carry at most 24-bit data, etc.

This allows the developer to adapt the source to the capabilities of the audio device.
If the source is mono and the audio device 2 channel, the developer might decide to send the same signal to both channels.
If the sample rate of the source is not supported by the hardware, e.g. a 192 kHz source with a 96 kHz audio device, the program using WASAPI has to do the SRC (Sample Rate Conversion).
This can be done by calling the SRC provided by Windows or one provided by the application.
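
The decisions a developer has to make can be sketched as a small planning function. Everything here is hypothetical (the Format type, plan_conversion and its parameters are invented for illustration and are not WASAPI calls); it only shows the kind of logic a player might apply before opening the device in exclusive mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    sample_rate: int  # Hz
    bit_depth: int    # bits per sample
    channels: int


def plan_conversion(source, supported_rates, max_bit_depth, device_channels):
    """Return (target format, list of conversion steps) for a given device."""
    steps = []

    rate = source.sample_rate
    if rate not in supported_rates:
        # Assumed policy: highest supported rate that does not exceed the source.
        lower = [r for r in supported_rates if r <= rate]
        rate = max(lower) if lower else min(supported_rates)
        steps.append(f"resample {source.sample_rate} -> {rate} Hz")

    depth = min(source.bit_depth, max_bit_depth)
    if depth != source.bit_depth:
        steps.append(f"reduce {source.bit_depth} -> {depth} bit")

    channels = source.channels
    if source.channels == 1 and device_channels == 2:
        channels = 2
        steps.append("duplicate mono to both channels")

    return Format(rate, depth, channels), steps


if __name__ == "__main__":
    source = Format(sample_rate=192000, bit_depth=24, channels=1)
    target, steps = plan_conversion(source, {44100, 48000, 96000}, 24, 2)
    print(target)
    for step in steps:
        print(" -", step)
```

If the list of steps comes back empty, the stream can be sent to the device unaltered.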

The other factor is the hardware. Today's USB DACs support up to 32-bit data and sample rates of 384 kHz (or even more). You will have a hard time finding audio exceeding these capabilities.

Windows audio architecture

Vista has a completely new audio mixing engine, so WASAPI gives you the chance to plug directly into it rather than going through a layer of abstraction. The reasons for the new audio engine are:

A move to 32 bit floating point rather than 16 bit, which greatly improves audio quality when dealing with multiple audio streams or effects.
A move from kernel mode into user mode in a bid to increase system stability (bad drivers can't take the system down).
The concept of endpoints rather than audio devices - making it easier for Windows users to send sounds to "headphones" or record sound from "microphone" rather than requiring them to know technical details about the sound cards installed on their system.
Grouping audio streams. In Vista, you can group together all audio streams out of a single application and control their volume separately. In other words, a per-application volume control. This is a bit more involved than might be at first thought, because some applications such as IE host all kinds of processes and plugins that all play sound in their own way.
Support pro audio applications which needed to be as close to the metal as possible, and keep latency to a bare minimum. (see Larry Osterman's Where does WASAPI fit in the big multimedia API picture?)

Source: Mark .Net

Windows audio diagram (Vista and higher)

By default all sounds are sent to the mixer.
The mixer converts the audio to 32 bit float and does the mixing.
The result is dithered and converted back to a format the audio driver accepts (most of the time 16 or 24 bit).

The applications sending sound to the mixer must see to it that the sample rate matches the default rate of the mixer. This default is set in the Advanced tab of the audio panel.

Even if the source matches the default sample rate, dithering will be applied.

Q: If you
• don't apply any per-stream or global effects and
• only have one application outputting audio and
• the sample rate and bit-depth set for the sound card matches the material's sample rate
then there should theoretically be no difference to the original because a conversion from even 24-bit integer to 32-bit float is lossless.

A: Not quite. Since we can not assure that there was nothing added, no gain controls changed, etc, we must dither the final float->fix conversion, so you will incur one step of dithering at your card's level. As annoying as this is for gain=1 with single sources, we can't possibly single-source in general.

If you don't want even that, there is exclusive mode, which is roughly speaking a memcopy.
J. D. (JJ) Johnston
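
The claim in the question, that converting 24-bit integers to 32-bit float loses nothing, can be checked with a short round-trip test. This is a sketch under the assumption that samples are scaled by 2**23; the function name is made up for the example. A 32-bit float has a 24-bit significand, which is why every 24-bit sample value fits exactly.

```python
import struct

SCALE_24 = 8388608.0  # 2**23, full scale of a signed 24-bit sample


def through_float32(sample_24bit):
    """Scale a signed 24-bit sample to float32 and convert it back."""
    as_float = sample_24bit / SCALE_24
    # Packing as "<f" forces the value through real 32-bit float storage.
    stored = struct.unpack("<f", struct.pack("<f", as_float))[0]
    return int(stored * SCALE_24)


if __name__ == "__main__":
    # Check the extremes plus a coarse sweep over the whole 24-bit range.
    samples = [-8388608, -1, 0, 1, 8388607] + list(range(-8388608, 8388608, 4099))
    print(all(through_float32(s) == s for s in samples))
```

The loss JJ describes therefore comes from the dither applied on the way back to integer, not from the conversion to float itself.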

Exclusive mode

WASAPI in exclusive mode bypasses the audio engine (the mixer).

The conversion to 32-bit float and the dither as applied by the mixer are avoided.

It also locks the audio driver; no other application can use the audio device.

Shared mode

This is equivalent to DS (DirectSound).

All audio is sent to the mixer.

The application must invoke sample rate conversion if the sample rate differs from the value set in the Windows audio panel.

Typically, the application is responsible for providing the Audio Engine audio buffers in a format that is supported by the Audio Engine. Audio sample formats consist of the sampling frequency, the bit depth, and the number of channels. The native bit depth of samples that the Audio Engine uses internally is 32-bit float. However, the Audio Engine accepts most integer formats that are up to 32-bits. Additionally, the Audio Engine converts most formats to the floating point representation internally. The Audio Control Panel specifies the required sampling frequency as the “Default format.” The Default format specifies the format that is used to provide the content by the audio device. The number of channels that the Audio Engine supports is generally the number of speakers in the audio device.

Changing the sampling frequency and data bit depth is called sample rate conversion. An application may decide to write its own sample rate converter. Alternatively, an application may decide to use APIs such as PlaySound, WAVE, Musical Instrument Digital Interface (MIDI), or Mixer. In these APIs, the conversion occurs automatically. When it is required, Windows Media Player performs sample rate conversion in its internal Media Foundation pipeline. However, if Windows Media Player is playing audio that the Audio Engine can handle natively, Windows Media Player rebuilds its own pipeline without a sample rate converter. This behavior occurs to reduce the intermediate audio transformations and to improve performance.

Microsoft
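
To give an idea of what "writing its own sample rate converter" involves, here is a deliberately naive sketch using linear interpolation. It is an assumption-laden toy, not the converter Windows or any media player uses: a real converter applies a proper low-pass (anti-aliasing) filter, which this example omits entirely.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono float signal by linear interpolation (no filtering)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # hold the last sample at the end
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    ramp = [0.0, 1.0, 2.0, 3.0]
    print(resample_linear(ramp, 48000, 96000))  # twice as many samples
    print(resample_linear(ramp, 96000, 48000))  # half as many samples
```

Doubling the rate inserts a midpoint between each pair of samples; halving it simply keeps every second sample, which is exactly where the missing filter would matter with real music.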

Event style

WASAPI can be used in push and in pull mode (event style).

A number of asynchronous USB DACs had all kinds of problems in push mode due to buffer problems in WASAPI.
This has been solved by using WASAPI – Event style.
The audio device pulls the data from the system.

Most of the time you can't choose the mode. It simply depends on how the programmer implemented WASAPI in the media player.

The difference between doing push or doing event is only who is responsible to know when the host has to send audio to the hardware.

Event based:

- Host tells API that it wants to be informed when it is the appropriate moment to send audio
- Host might prepare some audio in a separate thread so that it is ready when the API asks for it
- API asks host for more audio
- Host sends the prepared buffer if it was ready, or else prepares the buffer then and sends it.

Push based:

- Host tells API that it will ask when it is the appropriate moment to send the audio.
- Host prepares some audio so that it is ready when the API is ready.
- Host asks the API if it is ready.
- If it is not ready, the host waits some time and asks again.
- When the API replies that it is ready, the host sends the prepared buffer. It might also prepare the buffer at this time and send it.

[JAZ]
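
The two styles above can be compared side by side in a small simulation. FakeEndpoint is a hypothetical stand-in for an audio device, built on a semaphore; none of the names correspond to real WASAPI calls. The only difference between the two hosts is who does the waiting: in event style the host blocks until it is told, in push style it keeps asking.

```python
import threading
import time


class FakeEndpoint:
    """Hypothetical audio endpoint that wants one buffer per period."""

    def __init__(self):
        self._wanted = threading.Semaphore(0)  # counts buffers the device wants
        self.played = []

    def run(self, periods, period_s=0.001):
        """Device side: once per period, signal that a buffer is wanted."""
        for _ in range(periods):
            time.sleep(period_s)
            self._wanted.release()

    def wait_until_ready(self):
        """Event style: block until the device signals."""
        self._wanted.acquire()

    def is_ready(self):
        """Push style: non-blocking check (True consumes one signal)."""
        return self._wanted.acquire(blocking=False)

    def write(self, buffer):
        self.played.append(buffer)


def event_host(endpoint, buffers):
    for buf in buffers:
        endpoint.wait_until_ready()  # the API tells the host when to send
        endpoint.write(buf)


def push_host(endpoint, buffers, poll_s=0.0005):
    for buf in buffers:
        while not endpoint.is_ready():  # the host asks, waits, asks again
            time.sleep(poll_s)
        endpoint.write(buf)


def play(host, count=5):
    endpoint = FakeEndpoint()
    device = threading.Thread(target=endpoint.run, args=(count,))
    device.start()
    host(endpoint, list(range(count)))
    device.join()
    return endpoint.played


if __name__ == "__main__":
    print(play(event_host))
    print(play(push_host))
```

Both hosts deliver the same buffers in the same order; the push host simply spends part of its time polling and sleeping, and its timing depends on the polling interval it happens to choose.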

WASAPI - Event Style

The output mode lets a sound device pull data from Media Center. This method is not supported by all hardware, but is recommended when supported.

WASAPI - Event Style has several advantages:

It lets the audio subsystem pull data (when events are set) instead of pushing data to the system. This allows lower latency buffer sizes, and removes an unreliable Microsoft layer.
It creates, uses, and destroys all WASAPI interfaces from a single thread.
The hardware (or WASAPI interface) never sees any pause or flush calls. Instead, on pause or flush, silence is delivered in the pull loop. This removes the need for hacks for cards that circle their buffers on pause, flush, etc. (ATI HDMI, etc.).
It allows for a more direct data path to the driver / hardware.
The main 'pull loop' uses a lock-free circle buffer (a system that J. River built for ASIO), so that fulfilling a pull request is as fast as possible.

WASAPI – JRiver Wiki
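
The idea of a circular buffer feeding a pull loop, with silence delivered during pause, can be sketched as below. This is not J. River's implementation, which is not shown in the source; the class and method names are invented, and plain Python gives no real lock-free guarantees. The sketch only shows the single-producer, single-consumer layout in which each side advances its own index, and a pull that always returns a full block.

```python
class RingBuffer:
    """Circular buffer for one producer (player) and one consumer (pull loop)."""

    def __init__(self, capacity):
        self._data = [0] * capacity
        self._capacity = capacity
        self._read = 0   # advanced only by the consumer
        self._write = 0  # advanced only by the producer

    def available(self):
        return self._write - self._read

    def push(self, samples):
        """Producer: store as many samples as fit and return that count."""
        count = min(self._capacity - self.available(), len(samples))
        for i in range(count):
            self._data[(self._write + i) % self._capacity] = samples[i]
        self._write += count
        return count

    def pull(self, count, paused=False):
        """Consumer: always return exactly `count` samples.

        While paused, or when the buffer runs dry, the gap is filled with
        silence (zeros), so the device never sees a pause or flush.
        """
        real = 0 if paused else min(count, self.available())
        out = [self._data[(self._read + i) % self._capacity] for i in range(real)]
        self._read += real
        return out + [0] * (count - real)


if __name__ == "__main__":
    ring = RingBuffer(4)
    ring.push([1, 2, 3, 4])
    print(ring.pull(2))               # audio
    print(ring.pull(2, paused=True))  # silence while paused
    print(ring.pull(2))               # playback resumes where it left off
```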

Practice

Using WASAPI in exclusive mode requires a media player that supports it.
Players like MusicBee or foobar2000 do.

Streaming audio services today offer a mixture of CD quality and Hi-res.
Check whether the service's player supports WASAPI exclusive mode; without it (or ASIO) you will not get automatic sample rate switching.

Conclusion

WASAPI is a low latency interface to the driver of the audio device.

In exclusive mode, it allows for automatic sample rate switching.
It is up to the developer or the user of the application using WASAPI to see to it that the properties of the audio file and the capabilities of the audio device match.

References

User-Mode Audio Components - MSDN
Exclusive-Mode Streams - MSDN
What's up with WASAPI? - Mark Heath
Where does WASAPI fit in the big multimedia API picture? - Larry Osterman
WASAPI – JRiver Wiki