May 7, 2026

BCI part 1: from EEG to CSP features

First checkpoint on the BCI side project — load BCI Comp IV 2a, bandpass to mu/beta, epoch around the cue, then fit CSP and produce log-variance features ready for a linear classifier.

Scroll

A first checkpoint on the BCI side project. I'm decoding motor imagery from EEG, classical pipeline first, deep models later. This post goes from raw recordings to CSP features — the point at which the data is ready for a linear classifier.

The dataset

BCI Competition IV Dataset 2a, loaded through MOABB as BNCI2014_001. Nine subjects, 22 EEG channels at 250 Hz, four motor-imagery classes:

  • left hand
  • right hand
  • feet
  • tongue

Each subject has one training session and one test session. The cue lands at t=0t = 0; the imagery period runs from roughly 0.50.5 to 4s4\,\text{s} after.

Preprocessing

EEG channels only, bandpass 8–30 Hz, epoch from 0.5-0.5 to 4.0s4.0\,\text{s} around each cue with a (0.5,0)(-0.5, 0) baseline.

eeg = run_raw.copy().pick("eeg").filter(l_freq=8.0, h_freq=30.0)
events, ev_id = mne.events_from_annotations(eeg)
epochs = mne.Epochs(eeg, events, event_id=class_ids,
                    tmin=-0.5, tmax=4.0, baseline=(-0.5, 0),
                    preload=True)

The 8–30 Hz band isn't a tunable hyperparameter — it's the mu (8–13 Hz) and beta (13–30 Hz) ranges where motor imagery actually lives. Below 8 Hz is mostly slow drift and ocular artifacts; above 30 Hz is mostly muscle noise.

One session out: a tensor of shape (n_trials, 22, 1126), where 1126=250×4.5+11126 = 250 \times 4.5 + 1.

Sanity check: event-related desynchronization

Before any modeling, I want to see that the data actually contains motor imagery. The classic check is ERD: when a subject imagines a hand movement, mu/beta band power drops in the contralateral sensorimotor cortex.

Per trial and per channel, take the mean squared signal over a baseline window (0.50s-0.5 \to 0\,\text{s}) and an imagery window (0.52.5s0.5 \to 2.5\,\text{s}):

ERDc=PcimageryPcbaselinePcbaseline\mathrm{ERD}_c = \frac{P_c^{\text{imagery}} - P_c^{\text{baseline}}}{P_c^{\text{baseline}}}

Plot that as a topomap per class, blue for negative. For "left hand" you'd expect blue over right-hemisphere motor cortex, and the mirror for "right hand". If those topomaps line up, the rest of the pipeline is worth building.

CSP, with the actual math

CSP (Common Spatial Patterns) finds spatial filters — linear combinations of the 22 channels — that maximize the variance of one class while minimizing the variance of the other. Since the 8–30 Hz bandpass is already applied, post-filter variance is band power, which is exactly the signal ERD lives in.

For two classes with covariance matrices Σ1\Sigma_1 and Σ2\Sigma_2, CSP solves the generalized eigenvalue problem:

Σ1w=λ(Σ1+Σ2)w\Sigma_1 \mathbf{w} = \lambda \, (\Sigma_1 + \Sigma_2) \, \mathbf{w}

The eigenvalue λ[0,1]\lambda \in [0, 1] is the share of total variance the filter assigns to class 1. The largest eigenvectors are the strongest "class 1 vs rest" directions; the smallest are the strongest "class 2 vs rest" directions. I take the top kk and bottom kk to get 2k2k spatial filters per binary problem.

T = X.shape[2]
cov1 = (X[y == 0] @ X[y == 0].transpose(0, 2, 1)).mean(0) / (T - 1)
cov2 = (X[y == 1] @ X[y == 1].transpose(0, 2, 1)).mean(0) / (T - 1)
_, V = eigh(cov1, cov1 + cov2)
filters = np.concatenate([V[:, :k], V[:, -k:]], axis=1).T  # (2k, C)

Features: log of normalized variance

Filtering a trial gives Z=WXZ = W X of shape (2k, T). Each filter collapses to one number via variance over time, then I take the log of the normalized variance:

fi=logVar(zi)jVar(zj)f_i = \log \frac{\mathrm{Var}(z_i)}{\sum_j \mathrm{Var}(z_j)}

Normalization removes per-trial scale; the log pushes the distribution toward Gaussian, which is exactly what an LDA wants downstream.

Z = filters @ X                # (N, 2k, T)
v = (Z ** 2).mean(axis=2)      # (N, 2k)
features = np.log(v / v.sum(axis=1, keepdims=True))

Multi-class via one-vs-rest

CSP is binary by definition. For the four-class 2a problem I fit one CSP per class against the rest and concatenate the feature blocks:

features = np.concatenate(
    [csp_c.transform(X) for csp_c in csps_per_class],
    axis=1
)  # shape (N, num_classes * 2k)

Joint-diagonalization variants exist and probably do better on 2a, but one-vs-rest is the right baseline.

Where I am

End to end on a single subject: load → bandpass → epoch → CSP → log-variance features. The features are shaped to feed an LDA — that's next, then a per-subject accuracy table, then trying to actually beat the published baseline.

Subscribe

I'll send a note whenever I publish. No tracking, no other emails, just the posts.

Confirmation email only. Unsubscribe in one click.