This is a generalized framework that can be applied to a multitude of various applications relating to time series analysis. Specifically, this covers exploratory data analysis techniques involving the manipulation of time series data and a general introduction to the field of ERP research and Brain Computer Interfaces.
One of the main issues facing BCI research today is the lack of datasets to create models upon. Additionally, the data generated by EEG devices are frankly quite frightening.
This dataset was recorded at a sampling rate of 130 Hz. The first column is corresponds to 'TIME'.
The next 17 columns correspond to each of the EEG channels. 'F7', 'F3', 'FZ', 'F4', 'F8', 'T7', 'C3', 'CZ', 'C4', 'T8', 'P7', 'P3', 'PZ', 'P4', 'P8', 'O1', 'O2'.
The final two columns denotes the time at which a stimulus has occurred.
One of the the first steps I will be taking with this data is transposing it.
import pandas as pd
df = pd.read_csv('training.csv').T.to_csv('trainingInput.csv', header=False, index=False)
Now our data is ready to be plotted, using MNE.
import numpy as np
import mne
import matplotlib.pyplot as plt
#init cuda, must install CuPY, CUDA 11 and CUDNN
#mne.utils.set_config('MNE_USE_CUDA', 'true')
#mne.cuda.init_cuda(verbose=True)
data = np.loadtxt('trainingInput.csv', delimiter=',', usecols=range(1,46805))
ch_names = ['TIME', 'F7', 'F3', 'FZ', 'F4', 'F8', 'T7', 'C3', 'CZ', 'C4', 'T8', 'P7', 'P3', 'PZ', 'P4', 'P8', 'O1', 'O2', 'SIMULATION', 'TRARGET']
sfreq = 128
Regarding the initialization, the length of ch_names
must be equivalent to the rows in the CSV file. This is why it was important to transpose the dataset. Additionally, this data was recorded at a 3 and a half minute time frame. At a 128 Hz sampling rate, this means there are 46,805 datapoints for each of the 17 channels. Thats over 795,000 different numerical values. Thankfully, MNE allows to create raw object make it very simple to manipulate our data.
Additionally, I will be denoting wherever a testing event occurred during the data recording, and displaying it on our graph output.
info = mne.create_info(ch_names, sfreq)
raw = mne.io.RawArray(data, info)
eventDICT = {1: 'EVENT'}
events = mne.find_events(raw, stim_channel=['SIMULATION'])
print(events)
annoations = mne.annotations_from_events(events, sfreq=sfreq, event_desc=eventDICT)
print(annoations)
raw.set_annotations(annoations)
raw.plot(block=True)
Again, I would like to emphasize the lack of quality datasets for public use in the field of EEG analysis. As one can see in the first 5 seconds of this data recording, there is substantial noise within our signal.
This could severely effect our signal processing. And the quickest way to fix this data was honestly to zero out the entirety of the first five seconds.
If I did not do this, the results of the Fast Fourier Transform are ruined.
sr = 46804 # sampling rate
ts = 1.0/sr # sampling interval
t = np.arange(0,1,ts)
df = pd.read_csv('NoBs.csv')
x = df['P4'].to_numpy()
plt.figure(figsize = (8, 6))
plt.plot(t, x, 'r')
plt.ylabel('Amplitude')
plt.show()
Specifically the lines x = df['P4'].to_numpy
is especially important, as the fft()
function will only take in a NumPy array.
from scipy.fftpack import fftfreq, fft, ifft
X = fft(x)
N = len(X)
n = np.arange(N)
T = N/sr
freq = n/T
I am especially interesting the filtered values in the frequency lower than 30Hz. In order to do this I will implement a low-pass filter. There are two types of filters: low-pass and high-pass. Both work by assigning zeros to the FFT amplitudes where the absolute frequencies smaller than the cut-off
sig_fft = fft(x)
sig_fft_filtered = sig_fft.copy()
freq = fftfreq(len(x), d=1./2000)
cut_off = 30
sig_fft_filtered[np.abs(freq) > cut_off] = 0
filtered = ifft(sig_fft_filtered)
Using a Matrix Profile, one can discover motifs of patterns within time series data.Specifically, we want to see if there are any patterns attributed to the stimulus a subject is experiencing. The patterns are known as Event Related Potentials (ERP).
c = pd.DataFrame(filtered.astype(np.float64), columns = ['P4'])
m = 200
mp = stumpy.stump(c['P4'], m)
motif_idx = np.argsort(mp[:, 0])[0]
dists, inde = stumpy.motifs(c['P4'], mp[:,1], max_matches=10)
for z in range(0, inde.shape[1]):
rect = Rectangle((inde[0][z], 0), m, 10, facecolor=(1.0,0.0,0.0))
axs[0].add_patch(rect)
axs[1].plot(mp[:, 0])
axs[1].set_ylabel('Matrix profile', fontsize='20')
plt.show()
In the entire dataset, there were 784 different event indicators. Here I have highlighted only the top 10 of the strongest motifs in the data.