Source Material:

The following exercises are adapted from Chapter 7 of Mark Newman’s book, “Computational Physics”

Exercises: Applications of DFTs

[ ]:
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple

%matplotlib notebook

Sunspot Data Analysis

(1.6.1a) Read in the sunspots.txt data as a 2-D array. This file contains the number of recorded sunspots in a given month over a timespan of hundreds of years.

There are two columns of numbers (separated by tabs) in the data. The first column is the number of the recorded month. The second column is the number of sunspots recorded in that month.

Use the following code to read in this data:

with open("data/sunspots.txt", "r") as F:
    # Produces a shape-(N, 2) array
    # column-0: month number
    # column-1: count of sunspots that month
    data = np.array([[float(i) for i in j.split('\t')] for j in F])

Once you read in the data, determine: how many months are accounted for in this dataset? What are the range (minimum and maximum) of number of sunspots per month recorded in this dataset?

Plot this data with labeled axes.

[ ]:
# STUDENT CODE HERE

(1.6.1b) Estimate the frequency of the slowly-oscillating pattern in the data that manifests between the major peaks that we see in the plot. Do this by visual inspection (you can click on the box button in the bottom-left of the plot to do a box-zoom); determine roughly the number of months that separate consecutive peaks, and then convert this into a frequency (with units of 1/month).

1.6.1b Solution: SOLUTION HERE

(1.6.2) Take a DFT of this real-valued data and plot the Fourier spectrum, \(|a_{k}|\) vs \(\nu_{k}\), to evaluate the periodicity of the data. Make sure your frequencies, \(\nu_{k}\), represent the appropriate physical units of 1/months.

Tips: - Figure out what the duration, \(T\), of this data is (in months) (recall that the end-point is not included in the samples, so the duration might not be quite as straightforward as you might expect). - Determine the number of samples, \(N\), in the dataset. - Compute \((c_k)_{k=0}^{\lfloor \frac{N}{2} \rfloor}\) for the data and then convert these to \((|a_k|)_{k=0}^{\lfloor \frac{N}{2} \rfloor}\) so that you can plot the Fourier spectrum. - Make sure you compute the appropriate \(\nu_k\) in 1/months for each \(|a_k|\).

[ ]:
# STUDENT CODE HERE

(1.6.3) What is the cause of the large peak at \(k = 0\)? Study the form of the equation for the discrete Fourier transform, specifically for \(c_{k=0}\). What is the simple relationship between \(c_{k=0}\) and the data \(({y_{n}})_{n=0}^{N-1}\)? Uniformly adjust the sunspot data so that \(c_{k=0} \approx 0\) when you take the Fourier transform of this updated data, and plot the Fourier spectrum of the new data.

SOLUTION HERE

[ ]:
# STUDENT CODE HERE

Do you see a prominent peak in this Fourier spectrum in correspondence with the period that you estimated? Make sure to check your results with a neighbor or TA.

Analyzing Audio Signals from Instruments

(1.6.4) Read in the digital audio signal for a trumpet, from data/trumpet.txt, as an array of integers. This signal was recorded at a rate of \(44,100\;\mathrm{Hz}\), which is the defacto standard for audio sampling (as implemented by Sony).

You can read in the digital samples with the code:

with open("data/trumpet.txt", 'r') as R:
    # each sample is written to a single line in the text file
    # this reads them in as a single integer-valued numpy array
    data = np.asarray([int(i) for i in R])

Plot the sampled data vs time. Be sure that the time values you use are calculated appropriately such that they reflect units of seconds. Label x axis appropriately. There is too much data to be plotted, so plot only every 100th datapoint.

[ ]:
# plot signal
# STUDENT CODE HERE

Now, play the audio (using all of the data) using

from IPython.display import Audio
Audio(data, rate=???)  # <- specify the sampling rate for this data
[ ]:
# playing audio signal
# STUDENT CODE HERE

(1.6.5) Plot the Fourier spectrum, \(|a_{k}|\) vs \(\nu_{k}\), for the first \(10,000\) \(k\)-values. Be sure to use NumPy’s fast Fourier transform (not the plain DFT) for real-valued data (np.fft.rfft). We are working with a lot of data, so the standard DFT would be slow.

Make sure that the frequency values \(\nu_{k}\) are scaled to have the appropriate units (Hz). Tip: determine out the number of samples, \(N\), and as well as the duration of the sample, \(T\). Don’t merely estimate \(T\) based on the audio player. Compute the actual value; we have all required information to do so.

Inspect Fourier spectrum – what notes are being played?

[ ]:
# STUDENT CODE HERE

Let’s try manipulating this audio signal via its Fourier coefficients.

First, make a copy of the complex-valued Fourier coefficients, \((c_k)_{k=0}^{\lfloor \frac{N}{2} \rfloor}\), to a new array. Then find all of the coefficient such that \(|a_{k}| > 100\) and set those complex-valued coefficients to 0. Next, take the inverse Fourier transform (np.fft.irfft) of the now-modified set of complex-valued Fourier coefficients, to produce a “mutated” collection of samples \((y^{\text{mutated}}_n)_{n=0}^{N-1}\). This is a new set of audio samples, but with those Fourier components removed from the samples!

[ ]:
# STUDENT CODE HERE

The result of this is a mutated version of the digital trumpet signal. Plot the wave form of this mutated signal against time (seconds).

Only plot every :math:`100^text{th}` sample.

[ ]:
# STUDENT CODE HERE

Play the audio from mutated set of samples. The sampling rate for the Audio player is still \(44,100\).

Consider what filtering you performed on the Fourier coefficients and how this affected the audio sample. Does this make sense to you? Chat with a neighbor about this.

[ ]:
# STUDENT CODE HERE

(1.6.6) Repeat all of the steps of analysis that you did for the trumpet, but for the piano audio signal "data/piano.txt". But feel free to be creative with how you ultimately mutate its audio.

Consider plotting the Fourier spectrum for the piano on a log-amplitude (y) scale, with

ax.set_yscale('log')
ax.set_ylim(1, None)
[ ]:
# plot signal
# STUDENT CODE HERE
[ ]:
# play audio signal
# STUDENT CODE HERE
[ ]:
# fourier transform
# STUDENT CODE HERE
[ ]:
# mutate signal
# STUDENT CODE HERE

# play audio signal
# STUDENT CODE HERE

Smoothing Stock Market Data

(1.6.7) Read in the stock market data from data/dow.txt. Each data point corresponds to the daily closing value of the Dow Jones Industrial Average (starting in late 2006 and ending in late 2010). Use the following code to read in the data:

with open("data/dow.txt", 'r') as R:
    # Each row of the txt file contains the closing value of the market in "points"
    # This data is read in as a numpy array of floating point values
    data = np.asarray([float(i) for i in R])

Plot the data on labeled axes.

[ ]:
# STUDENT CODE HERE

(1.6.8a) Note that this data looks roughly like a single period of a sine-wave. By visually inspecting this plot, what, roughly, are the frequency and amplitude of this sine wave? Answer with appropriate units.

1.6.8a Solution: SOLUTION HERE

(1.6.8b) Perform an FFT on this real-valued data, and plot the Fourier spectrum for this data. The \(\nu_{k}\)-axis should be scaled to be in units of \(\frac{1}{\text{days}}\). Label your axes.

Some tips:

  • Zero-center your data (such that its mean-value is \(0\)) so that \(|a_0|\) doesn’t dominate your Fourier spectrum

  • Plot on the domain \([0, 0.04]\) (units 1 / days) using ax.set_xlim(0, 0.04)

Do you see a peak in correspondence to the sinusoidal form discussed in 1.6.8a? Does the frequency and amplitude of this prominent peak make sense?

[ ]:
# STUDENT CODE HERE

(1.6.9) We want to smooth this stock market data. We can do this by “removing” the high-frequency coefficients of its Fourier spectrum. Try zeroing-out the top \(90\%\) high-frequency complex-valued coefficients \(c_k\), and then perform an inverse FFT using these altered coefficients.Plot these “smoothed” samples on top of a semi-transparent version of the original data (use the plot parameter alpha=0.5).

Then repeat this process, but with zeroing out the top \(98\%\) coefficients. In both of these cases,on what times scales are the fluctuations being filtered out?

SOLUTION HERE

[ ]:
# filter top 90% high frequencies
# STUDENT CODE HERE
[ ]:
# filter top 98% high frequencies
# STUDENT CODE HERE

(1.6.10) Now repeat this process but zero-out the bottom \(10\%\) low-frequency coefficients. What do you see? Why is there a huge down-shift in the recovered data? (Hint: recall the role of \(c_{k=0}\)). What would happen if you filtered out the bottom \(10\%\) low-frequency coefficients except for \(c_{0}\)? Try this.

SOLUTION HERE

[ ]:
# filter bottom 10% low frequencies
# STUDENT CODE HERE
[ ]:
# filter bottom 10% low frequencies, except c0
# STUDENT CODE HERE