Interactive Audio Visualisation with the Web Audio API


This experiment started out as an idea for a friend’s company. He is a musician who rents out PA systems for musical and corporate events. The idea was to convert the user's device screen into a responsive speaker cabinet, with multiple tweeters and woofers for larger devices, adapting to the available space.

As a studio engineer and musician, my friend understood the possibilities of using filter sweeps and panning effects to shape the audio. A filter sweep removes frequencies from the sound over time, and panning positions the sound within the stereo field. We considered attaching a filter sweep to the vertical drag movement, and panning to movement left and right.

When creating the visual elements I wanted to revisit some of the techniques I’d used previously in the pure CSS iPhone timeline project. I also went back and re-inspected some multiple box-shadow code on a single div before making a start on the woofer styles, which I based on a Google image search.

Anatomy of a loudspeaker

To get everything working nicely I had to brush up on a few CSS features: repeating-radial-gradient, sizing with vh and vw units, and setting breakpoints using only max-aspect-ratio and min-aspect-ratio media queries. You can see the woofer and tweeter styles in this Sass file.

Orientation-based media queries came in useful for switching the flex-direction between portrait and landscape layouts. I separated all the speaker layout styles into one .scss file, so if you want to see how the layout works you can have a look in more detail here.
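As a sketch of that idea (not the project's actual styles; the .speaker class name is an assumption), the orientation switch boils down to something like:

```scss
// Hypothetical sketch of an orientation-based flex-direction switch.
// The .speaker class name is assumed.
.speaker {
    display: flex;

    // Tall screens: stack the drivers vertically
    @media (orientation: portrait) {
        flex-direction: column;
    }

    // Wide screens: place the drivers side by side
    @media (orientation: landscape) {
        flex-direction: row;
    }
}
```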

Adding audio

To create the MVP we'll need to complete the following steps:

  1. Load an audio file to use as a repeating drum loop
  2. Start and stop the audio on user interaction
  3. Apply a low pass filter to the audio based on user Y position
  4. Apply a stereo panning effect to the audio based on user X position

When working with the Web Audio API it can sometimes make sense to use a library to reduce the amount of code you have to write. To code up our simple brief without a library you would have to: create an audio context; create an audio buffer; asynchronously load the file into the buffer; create the effects nodes; route the audio buffer through the effects in the desired order; and finally play the sound through the stereo output of the device.
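For comparison, here is a rough sketch of those steps using the raw Web Audio API with no library. This is not the project's code; the file path is reused from later in the article, the function name is hypothetical, and error handling is omitted:

```javascript
// Sketch: the same routing with the raw Web Audio API.
// File path reused from the article; function name is an assumption.
async function setupRawWebAudio() {
    const ctx = new AudioContext();

    // Load and decode the drum loop into an audio buffer
    const response = await fetch('/assets/audio/drum-break-99bpm.wav');
    const arrayBuffer = await response.arrayBuffer();
    const audioBuffer = await ctx.decodeAudioData(arrayBuffer);

    // Create the effects nodes
    const filter = ctx.createBiquadFilter();
    filter.type = 'lowpass';
    const panner = ctx.createStereoPanner();

    // Create a looping source and route it through the effects
    // to the stereo output of the device
    const source = ctx.createBufferSource();
    source.buffer = audioBuffer;
    source.loop = true;
    source.connect(filter).connect(panner).connect(ctx.destination);

    source.start();
    return { ctx, source, filter, panner };
}
```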

Using a library like Pizzicato.js reduces this code to a few lines that are easy to understand.

// Load the audio file for the loop
this.drumloop = new Pizzicato.Sound({
    source: 'file',
    options: {
        path: '/assets/audio/drum-break-99bpm.wav',
        loop: true
    }
});
// Create the Low Pass Filter effect
this.LPF = new Pizzicato.Effects.LowPassFilter();

// Create the Stereo Panner
this.panner = new Pizzicato.Effects.StereoPanner();

// Add the effects to the loop
this.drumloop.addEffect(this.LPF);
this.drumloop.addEffect(this.panner);

So far so good. We now have some audio loaded into our browser, with a low pass filter and a stereo panner applied to it. But we can’t hear anything yet. It’s time to add some event listeners and handlers to let the user control the audio.

We bind event listeners to the speaker element, making sure to listen for touch events and mouse events as we want this to work across all devices…

// Event Listeners (the speaker element selector is assumed)
$('.speaker')
    .on('touchstart mousedown', this.onTouchStart)
    .on('touchmove mousemove',  this.onTouchMove)
    .on('touchend mouseup',     this.onTouchEnd);

Playing or stopping the file is easy enough. It only makes sense to update the filter and pan settings when the loop is actually playing, so we use a boolean property to keep track of the play state.

// Event handlers (play/stop calls reconstructed from the description above)
onTouchStart(e) {
    this.isPlaying = true;
    this.drumloop.play();
}

onTouchMove(e) {
    if (!this.isPlaying) { return; }
    this.setLPF(e);
    this.setPan(e);
}

onTouchEnd(e) {
    this.isPlaying = false;
    this.drumloop.stop();
}

Rather than adding the filter and panning code inside the onTouchMove handler, we create a couple of new functions. This keeps things clean, and means that we can call these new functions from other event handlers if the need arises. Let’s look at these two functions in turn.

Low Pass Filter

/**
 * Set the Filter frequency based on pageY position
 * Possible values between 0 and 10,000
 * @param Event e   Touchmove or Mousemove event
 */
setLPF(e) {
    let freq = 10000 - ((this.getEvent(e).pageY / $(window).height()) * 10000);
    this.LPF.frequency = freq;
}

Our drum loop audio is a 44.1khz stereo WAV file containing a wide range of frequencies (0hz to ~22khz). Low frequencies from the kick drum, high frequencies from the hi-hats and cymbals, with toms and snare in between.

The average human ear can distinguish frequencies between 20hz and 20khz, and our drum loop file contains information right across that frequency range. The low pass filter enables us to set a frequency value, above which, all sound will be filtered out.

If the frequency of the filter is set to 0 then all frequencies above 0hz will be filtered out, resulting in silence. However, if it is set to 10,000hz then, although the very highest frequencies are still being filtered out, the audio will sound unfiltered to the user because nothing sounds muffled.

The setLPF function above sets the frequency of the low pass filter based on the user’s Y position on the screen. If the mouse is at the very bottom of the screen the filter frequency is 0hz and all sound is removed; if the mouse is at the top of the screen the frequency is 10,000hz and most of the frequencies will be heard.
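The Y-to-frequency mapping is just a linear scale, and can be sketched as a standalone function (a hypothetical helper, not part of the project code):

```javascript
// Map a vertical position to a filter frequency:
// top of screen = 10,000hz, bottom of screen = 0hz, linear in between.
function mapYToFrequency(pageY, windowHeight) {
    return 10000 - ((pageY / windowHeight) * 10000);
}

// e.g. on an 800px-tall window:
mapYToFrequency(0, 800);    // top of screen   -> 10000
mapYToFrequency(400, 800);  // halfway down    -> 5000
mapYToFrequency(800, 800);  // bottom edge     -> 0
```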

Stereo Panner

The following setPan function is similar to the setLPF function but this time we are reacting to the user’s X position to set a value between -1 (left speaker) and 1 (right speaker), with 0 as the center point.

/**
 * Set the Panner pan position based on pageX position
 * Possible values between -1 and 1
 * @param Event e   Touchmove or Mousemove event
 */
setPan(e) {
    let pos = (((this.getEvent(e).pageX / $(window).width()) * 2) - 1);
    this.panner.pan = pos;
}

Analysing the audio in real time

Our aim here is to add animations to moving parts of the speaker that are driven in real time by the volumes of specific frequencies within the audio. The Web Audio API includes an Analyser Node which is used to "provide real-time frequency and time-domain analysis information", intended for use in audio visualisations.

We create an analyser node and connect the output of the low pass filter effect to it, using Pizzicato again to do this...

// Create Analyser Node
this.analyser = Pizzicato.context.createAnalyser();

One of the properties we can set on the analyser is the FFT size. Setting this controls the resolution of the data that will be returned by the analyser: it will return an array that is half the length of the declared fftSize. This array length is also referred to as the frequencyBinCount, and it is automatically exposed as a read-only frequencyBinCount property on the analyser.

Setting high FFT values will produce higher data resolution but will be more processor hungry. The trick is to find a sensible mid point between performance and resolution. We will set our analyser fftSize to 1024, which will give us a frequencyBinCount of 512.

We can use this value to create a Uint8Array buffer to hold the audio data.

// Create Analyser Node
this.analyser = Pizzicato.context.createAnalyser();
this.analyser.fftSize = 1024;
this.bufferLength = this.analyser.frequencyBinCount;
this.dataArray = new Uint8Array(this.bufferLength);


Getting data from the analyser node

The analyser node has four methods we can call to read data from the audio:

  • getByteFrequencyData
  • getByteTimeDomainData
  • getFloatFrequencyData
  • getFloatTimeDomainData

We will be using the first, getByteFrequencyData, which "copies the current frequency data into the passed unsigned byte array".

We set up a requestAnimationFrame loop (onTick) which will run recursively whenever the audio is playing, and collect frequency data inside that to get a stream of data in real time.

onTick() {

    this.raf = requestAnimationFrame(this.onTick.bind(this));

    // Copy the current frequency data into our buffer
    this.analyser.getByteFrequencyData(this.dataArray);

    // animations go here...
}

Calling getByteFrequencyData every "tick" copies 512 integer values into this.dataArray each time it is called (typically 60 times per second, as requestAnimationFrame runs at the display's refresh rate).

Each of the 512 values in the dataArray is referred to as a "bin", and each bin represents the volume for a specific band of frequencies.

This 512 "bin" array can be thought of as a graphic equaliser, where each bin relates to a single column of the equaliser. A bin value of 0 = no lights/sound for that column, whereas a bin value of 255 would be fully lit/full blast.
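As an aside, the equaliser analogy can be made literal with a few lines of canvas drawing. This is a hypothetical sketch, separate from the speaker app, which assumes an analyser and dataArray set up as above and a canvas element to draw into:

```javascript
// Hypothetical sketch: draw the dataArray as graphic-equaliser bars.
// Assumes `analyser` and `dataArray` are set up as shown earlier.
function drawEqualiser(analyser, dataArray, canvas) {
    const ctx = canvas.getContext('2d');
    const barWidth = canvas.width / dataArray.length;

    analyser.getByteFrequencyData(dataArray);
    ctx.clearRect(0, 0, canvas.width, canvas.height);

    dataArray.forEach((volume, i) => {
        // volume is 0-255; scale the bar height to the canvas
        const barHeight = (volume / 255) * canvas.height;
        ctx.fillRect(i * barWidth, canvas.height - barHeight, barWidth, barHeight);
    });

    requestAnimationFrame(() => drawEqualiser(analyser, dataArray, canvas));
}
```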

The first bin refers to the sound levels at 0hz, and we can use the following equation to work out the bin frequency width which is consistent across all bins.

sample rate / fftSize = bin frequency width

So for our 44.1 kHz audio file with an fftSize of 1024, which was the value we set when creating the analyser node above, the bin frequency width is roughly 43hz...

44100 / 1024 = 43.0664062hz

Every bin represents a bin frequency width of ~43hz. If we were to write out the frequency values for each of our bins from 0hz upwards we would arrive at the following... (numbers rounded for clarity)

[0hz, 43hz, 86hz, 129hz, 172hz, 215hz, 258hz,... 22,050hz]

An actual dataArray of frequency volumes coming from our analyser node might look something like...

[0, 62, 110, 102, 125, 152, 94,... 0, 0, 0]

We are now able to look at this dataArray and say "The audio volume at 0hz is zero, the audio volume at 43hz is 62, the audio volume at 86hz is 110, etc…".

It should be no surprise that the highest frequency bins all have a volume level of zero. If you remember, we set the low pass filter's max frequency value to 10,000 earlier, so any frequencies above 10khz are being filtered out before being sent to the analyser.
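These conversions are worth wrapping in a couple of helper functions. This is a hypothetical sketch using the sample rate and fftSize from this article:

```javascript
// Convert between frequency (hz) and analyser bin index,
// using the sample rate and fftSize values from this article.
const SAMPLE_RATE = 44100;
const FFT_SIZE = 1024;
const BIN_WIDTH = SAMPLE_RATE / FFT_SIZE; // ~43.07hz per bin

function binToFrequency(binIndex) {
    return binIndex * BIN_WIDTH;
}

function frequencyToBin(frequency) {
    return Math.round(frequency / BIN_WIDTH);
}

Math.round(binToFrequency(2));   // -> 86 (the kick drum bin)
frequencyToBin(10000);           // -> 232 (bins above this are silent)
```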

Now we have all the information we need, we can use the frequency bin values to animate the scale of some moving parts within the CSS speaker.

Animating the speaker

When a speaker emits sound the cone moves backwards and forwards, pushing air and creating the sound. The HTML for each woofer in our app consists of a cone element, with three children representing the cone-outer, the cone-mid and the cone-inner.

<div class="cone">
    <div class="cone-outer"></div>
    <div class="cone-mid"></div>
    <div class="cone-inner"></div>
</div>

We can animate each of these elements independently based upon frequency volumes from separate bins. The fftSize value on our analyser node (1024) generates a high data resolution, with enough bins to wire the outer cone to the kick drum frequency (~86hz), the mid-cone to the low snare drum frequencies (~301hz), the inner-cone to hi snare frequencies (~689hz), and the tweeter inner-cone to the hi-hats and cymbals (~9,500hz).

I am using GSAP animations to scale the cone elements. The tween targets below (this.$wooferConeOuter and so on) are reconstructed placeholders for my element references...

// bin 3 : ~86 Hz - kick drum
let percVol = this.dataArray[2] / 255;
TweenMax.to(this.$wooferConeOuter, 0.001, {css:{scale: (1 + (percVol * 0.05)) }});

// bin 8 : ~301 Hz - low snare
percVol = this.dataArray[7] / 255;
TweenMax.to(this.$wooferConeMid, 0.001, {css:{scale: (1 + (percVol * 0.1)) }});

// bin 17 : ~689 Hz - high snare
percVol = this.dataArray[16] / 255;
TweenMax.to(this.$wooferConeInner, 0.001, {css:{scale: (1 + (percVol * 0.1)) }});

// bin 221 : ~9,500 Hz - hi hats
percVol = this.dataArray[220] / 255;
TweenMax.to(this.$tweeterConeInner, 0.001, {css:{scale: (1 + (percVol * 0.09)) }});

If you click and drag around the screen with the audio playing you will notice that the tweeters move more noticeably when you are at the top of the screen, and less noticeably in the lower half of the screen, where the filter frequency drops below ~5,000hz.

Round up

Despite requiring a bit of mathematics and an understanding of how audio files work, I hope you'll agree that using the Web Audio API to analyse audio in real time is really quite straightforward, especially when simplifying things with a library such as Pizzicato.js.

If you do want to go deeper into what you can do with audio in the browser then here's a link to the current Web Audio API documentation.

The Interactive Speaker

  • Touch/click the screen and drag