Introduction to Spectrogram Analysis
We’ve made some major progress towards being able to write a song-recognition app. It took us some time to break down what sound is and how to record it, and, at last, we developed some mathematical “chops” so that we can begin to quantify the musical contents of a recording. Indeed, Fourier analysis is our ticket towards distinguishing songs from one another in a systematic way; we need only extend the application of these methods slightly in order to extract “fingerprints” from these songs.
We will be learning about spectrogram analysis, which will allow us to describe what notes are being played in a song as well as when they are being played. To motivate this, let’s first identify the blind spots in our current tools for quantitatively analyzing audio data.
Listen to the following five-second audio clip.
Three notes enter one after another, each about half a second after the previous one, until all three are being played together. There is a brief pause right before the three-second mark, and then the three-note chord is struck once again and held for the remainder of the clip.
Leveraging Our Methods of Audio Analysis
Let’s see if our current means of analysis – plotting the temporal waveform of the recording alongside its Fourier spectrum – can bear this out.
The temporal waveform reveals that the audio clip lasts for about five seconds; it also shows a jump in loudness around the one-second mark and the brief gap just prior to the three-second mark, before the sound resumes. It certainly doesn’t provide us with any interpretable information about which notes are being played, but we already knew that this would be the case: distilling actual notes from this “cacophony of data” was the entire thrust of our introduction to Fourier analysis.
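To see concretely why the Fourier spectrum alone cannot tell us when notes occur, here is a minimal NumPy sketch. It assumes a hypothetical two-second synthetic clip (a 440 Hz note followed by a 660 Hz note) in place of the recording above; the sample rate and the `fourier_spectrum` helper are illustrative choices, not part of this text. Playing the clip backwards swaps the order of the notes, yet the amplitude spectrum does not change.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second (assumed CD-quality rate)
DURATION = 2.0       # seconds (hypothetical clip length)

t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE

# Hypothetical clip: a 440 Hz note for the first second, then a 660 Hz note
first_half = t < DURATION / 2
clip = np.where(first_half, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t))


def fourier_spectrum(samples, sample_rate):
    """Hypothetical helper: return (frequencies in Hz, |c_k|) for a real-valued recording."""
    amplitudes = np.abs(np.fft.rfft(samples)) / len(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1 / sample_rate)
    return freqs, amplitudes


freqs, amps = fourier_spectrum(clip, SAMPLE_RATE)

# The two tallest peaks recover the two notes (sorted from low to high)...
top_two = np.sort(freqs[np.argsort(amps)[-2:]])
print(top_two)

# ...but the reversed clip (660 Hz first, then 440 Hz) has the same amplitude
# spectrum, so the spectrum carries no information about the order of the notes.
_, amps_reversed = fourier_spectrum(clip[::-1], SAMPLE_RATE)
print(np.allclose(amps, amps_reversed))
```

The reversal result is exact rather than approximate: for a real-valued signal, reversing the samples only changes the phases of the Fourier coefficients, not their magnitudes.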
So what does the Fourier spectrum tell us? It reveals the three prominent notes being played: (approximately)
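It is also useful to plot the Fourier spectrum on a logarithmic scale, which makes quieter features, such as the overtones that accompany each note, easier to spot. What the Fourier spectrum cannot tell us, however, is when each of these notes is played.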
Conveying Time and Frequency: The Spectrogram
Ultimately, we want to marry the temporal information of the waveform with the incisive frequency decomposition of the Fourier spectrum; this is exactly the purpose of the spectrogram. The spectrogram, depicted below, tells us what notes are being played and when. It is a “heat map” whose horizontal axis represents time, whose vertical axis represents frequency, and whose colors tell us how prominent or quiet a given frequency is at a given moment.
The color map used here indicates the most prominent notes with bright yellow, while near-zero amplitudes are dark green.
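The spectrogram displayed above reveals that the three notes enter one at a time, roughly half a second apart, that all of them fall silent briefly just before the three-second mark, and that the full chord then returns and persists until the end of the clip.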
Simply put: a spectrogram is constructed by breaking the audio recording into brief temporal windows and performing a Fourier transform on the audio samples within each window. A vertical column of pixels in the spectrogram corresponds to a narrow time interval, and the heat map along that column stores the Fourier spectrum of the audio data in that time interval. The tall peaks in the Fourier spectrum for that time interval correspond to bright colors in the heat map along that column, and shallow regions of the Fourier spectrum correspond to dim colors.
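The recipe above translates almost line for line into NumPy. The sketch below is one possible from-scratch implementation, not the exercises’ reference solution; the helper name `spectrogram`, the 1024-sample window, the 512-sample hop between windows, and the Hann taper applied to each window are all assumptions made for illustration.

```python
import numpy as np


def spectrogram(samples, sample_rate, window_size=1024, hop=512):
    """Hypothetical helper: |Fourier coefficients| for each short time window.

    Returns (freqs, times, spec), where spec[i, j] is the amplitude at
    frequency freqs[i] in the window that starts at times[j].
    """
    taper = np.hanning(window_size)  # assumed Hann taper: softens window edges to reduce leakage
    starts = np.arange(0, len(samples) - window_size + 1, hop)
    # Each column of spec is the Fourier spectrum of one window of audio
    columns = [np.abs(np.fft.rfft(samples[s:s + window_size] * taper)) for s in starts]
    spec = np.stack(columns, axis=1)
    freqs = np.fft.rfftfreq(window_size, d=1 / sample_rate)
    times = starts / sample_rate
    return freqs, times, spec


SAMPLE_RATE = 8000  # samples per second (illustrative)
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
# Hypothetical clip: 440 Hz for one second, then 660 Hz for one second
clip = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t))

freqs, times, spec = spectrogram(clip, SAMPLE_RATE)
loudest = freqs[spec.argmax(axis=0)]  # frequency of the brightest pixel in each column
print(loudest[0], loudest[-1])
```

The window size embodies a trade-off: longer windows resolve frequency more finely (here 8000/1024, or about 7.8 Hz per bin) but blur the moment a note starts or stops, while shorter windows do the opposite.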
The next set of exercises will show us how to leverage matplotlib’s built-in spectrogram function to analyze audio recordings. They will also step us through the process of constructing our own spectrogram from scratch.
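For reference, here is a hedged sketch of the built-in route, assuming matplotlib’s `Axes.specgram` method (the same functionality is exposed as `plt.specgram`). The synthetic clip and the `NFFT` and `noverlap` values are illustrative assumptions, not settings taken from the exercises.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

SAMPLE_RATE = 8000  # samples per second (illustrative)
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE

# Hypothetical clip: 440 Hz alone for one second, then 440 Hz and 660 Hz together
clip = np.sin(2 * np.pi * 440 * t) + np.where(t >= 1.0, np.sin(2 * np.pi * 660 * t), 0.0)

fig, ax = plt.subplots()
# specgram splits the clip into 1024-sample windows that overlap by 512 samples,
# Fourier-transforms each window, and draws the result as a heat map (in dB).
spectrum, freqs, times, image = ax.specgram(clip, NFFT=1024, Fs=SAMPLE_RATE, noverlap=512)
ax.set_xlabel("Time [seconds]")
ax.set_ylabel("Frequency [Hz]")
fig.colorbar(image, ax=ax, label="Intensity [dB]")
# fig.savefig("spectrogram.png") or plt.show() would display the figure
```

Besides drawing the heat map, `specgram` returns the underlying power array along with its frequencies and the time at the midpoint of each window, which is the data a fingerprinting step would work with.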
To conclude, let’s watch a brief video that demonstrates a spectrogram that evolves in real time as sound is being recorded. This will help mature our intuition for what the spectrogram reveals to us about audio recordings. It will also provide some nice insight into the overtones that often appear in these Fourier analyses.
Reading Comprehension: Interpreting the Spectrogram:
Refer back to the Fourier spectrum of the audio recording, which was plotted on a logarithmic scale. Note the locations of the three most prominent peaks, which register above
Next, count the number of prominent peaks on the Fourier spectrum (plotted on the logarithmic scale). Can you find a one-to-one correspondence between these peaks and the distinct notes and overtones present in the spectrogram?
Reading Comprehension Exercise Solutions
Interpreting the Spectrogram: Solution
Refer back to the Fourier spectrum of the audio recording, which was plotted on a logarithmic scale. Note the locations of the three most prominent peaks, which register above
Next, count the number of prominent peaks on the Fourier spectrum (plotted on the logarithmic scale). Can you find a one-to-one correspondence between these peaks and the distinct notes and overtones present in the spectrogram?
There are 10 distinct peaks in the log-scaled Fourier spectrum: 3 notes and 7 overtones. All ten of these features manifest as distinctive horizontal lines on the spectrogram, residing at the same frequencies, which are plotted along the y-axis of the spectrogram and the x-axis of the Fourier spectrum.
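Why the logarithmic scale matters for this count can be illustrated with a small synthetic example, assuming NumPy. The 200 Hz note, its overtone amplitudes, the 80 dB threshold, and the `prominent_peaks` helper are all hypothetical; a real analysis might use a library peak finder such as `scipy.signal.find_peaks` instead.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (illustrative)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of audio

# Hypothetical note: a 200 Hz fundamental plus overtones at 400, 600, and 800 Hz
# whose amplitudes fall off quickly (the amplitudes are illustrative).
overtone_amps = {200: 1.0, 400: 0.05, 600: 0.01, 800: 0.002}
note = sum(a * np.sin(2 * np.pi * f * t) for f, a in overtone_amps.items())

amps = np.abs(np.fft.rfft(note)) / len(note)
freqs = np.fft.rfftfreq(len(note), d=1 / SAMPLE_RATE)


def prominent_peaks(values, threshold):
    """Hypothetical helper: indices of local maxima that exceed `threshold`."""
    inner = values[1:-1]
    is_peak = (inner > values[:-2]) & (inner > values[2:]) & (inner > threshold)
    return np.nonzero(is_peak)[0] + 1  # shift back to indices into `values`


# On a linear scale, only the fundamental clears 10% of the tallest peak...
linear_peaks = freqs[prominent_peaks(amps, 0.1 * amps.max())]

# ...but in decibels every overtone stands far above the near-zero floor.
# The tiny offset avoids taking log10(0) in otherwise empty bins.
db = 20 * np.log10(amps + 1e-12)
log_peaks = freqs[prominent_peaks(db, db.max() - 80)]
print(linear_peaks, log_peaks)
```

A linear plot of this spectrum would show a single spike, while the decibel version exposes all four components, mirroring how the log-scaled spectrum above reveals the overtones that a linear plot hides.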