Robust Geometrical Learning for Electroencephalography

Application deadline: 01/03/2023
Start date: 01/04/2023
End date: 30/09/2023

Division: Signals and Statistics
Position type: Internship
Contact: Stefano FORTUNATI (

Proposal for an M2 internship

General information
1. Laboratory: Laboratoire Signaux et Systèmes (L2S), CentraleSupélec, CNRS, Univ. Paris-Saclay
2. Supervision: Florent Bouchard (CNRS, L2S), Stefano Fortunati (IPSA, L2S), Ammar Mian
(Université Savoie Mont Blanc, LISTIC)
Electroencephalography (EEG) is a neuroimaging modality that captures the dynamics of brain activity
well but suffers from high variability, a low signal-to-noise ratio and poor spatial resolution. It is extensively
used in brain computer interfaces (BCI), where the subject interacts with a computer through their brain
signals. The challenge of BCI is to correctly classify incoming data. State-of-the-art methods are based
on sample covariance matrices and their associated Riemannian geometry. Even though such approaches
have been shown to be effective, they have several limitations, such as the Gaussianity assumption or the need
for very specific preprocessing. In this project, we propose to exploit geometry along with robust statistics
to develop original classification and clustering methods suited to EEG data. In particular, we will
investigate the potential benefit that a new class of robust learning methodologies, called R-estimators,
may bring in the context of EEG data. The performance of the algorithms to be developed will be
validated on real EEG data obtained from commonly used open BCI datasets.
Keywords: Robust learning – Riemannian geometry – Machine learning – Electroencephalography – Brain computer interfaces
Scientific description
Context Electroencephalography (EEG) is a non-invasive neuroimaging modality invented by Hans
Berger during the 1920s [1]. It consists of recording the electrical activity of the brain with electrodes placed
on the scalp. Its low cost, simplicity and high temporal resolution (it captures the dynamics of
brain activity well) have made this modality popular and allowed its use in many applications. EEG,
however, suffers from a low signal-to-noise ratio (SNR) and poor spatial resolution: electrical brain signals are
mixed while going through brain tissues, skull and scalp, and electrodes also record environmental (e.g.,
electrical appliances) and biological (e.g., heart, ocular movements) disturbances.
EEG is the preferred functional neuroimaging technique for brain computer interfaces (BCI), where
the subject interacts with a computer through their brain signals. A first use of BCI is for video games [2],
which can be employed to study specific brain phenomena, such as event-related potentials in the visual
cortex. BCIs are also used for medical purposes, for instance to assist disabled people [3], control an
exoskeleton [4] or help mechanical ventilation [5]. BCI datasets follow different paradigms; the
main ones are: event-related potentials (ERP) [2], which correspond to a response of the brain to a
stimulus (e.g., a light flash); steady state visual evoked potentials (SSVEP) [4], where the visual cortex is
synchronized with a light blinking at a fixed frequency; and motor imagery (MI) [6], where the subject
imagines moving the feet, the right hand or the left hand. From a data analysis point of view, the challenge of BCI is
to correctly classify incoming data so that the computer performs the appropriate action.
State of the art To classify EEG data, usual machine learning (ML) techniques have been employed
and several specific algorithms have been designed; see [7] for a recent review. Currently, the most
popular methods (thanks to their efficiency) are the ones based on sample covariance matrices (SCM)
and their associated Riemannian geometry; see the original paper [8]. Given $Z$ classes and $K^{(z)}$ training
EEG recordings for each class $z$, the related set $\{C_k^{(z)}\}$ of sample covariance matrices can be defined as
$$C_k^{(z)} = \frac{1}{T_k^{(z)}} \sum_{i=1}^{T_k^{(z)}} x_k^{(z)}(i)\, x_k^{(z)}(i)^T,$$
where $T_k^{(z)}$ is the number of samples and $x_k^{(z)}(i)$ is the $i$th sample of the $k$th preprocessed recording of the $z$th class.
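
As an illustration of the definition above, here is a minimal NumPy sketch of the SCM computation. The array layout (channels by samples), the function names and the synthetic Gaussian recordings are assumptions made for the example; they are not taken from the proposal or from any specific library.

```python
import numpy as np


def sample_covariance(x):
    """SCM of one recording x with shape (n_channels, n_samples).

    Assumes the recording is already centered (zero mean per channel).
    """
    n_samples = x.shape[1]
    # (1/T) * sum_i x(i) x(i)^T, written as a single matrix product
    return x @ x.T / n_samples


def sample_covariances(recordings):
    """Stack the SCMs of a list of recordings into an (n_recordings, p, p) array."""
    return np.stack([sample_covariance(x) for x in recordings])


# Hypothetical data: 3 recordings, 4 channels, 200 samples each
rng = np.random.default_rng(0)
recordings = [rng.standard_normal((4, 200)) for _ in range(3)]
scms = sample_covariances(recordings)
print(scms.shape)
```

Recordings of different lengths can be mixed, since each SCM has the same size (channels by channels) whatever the number of samples.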

From there, the minimum distance to mean (MDM) classifier computes $Z$ centers of mass $G^{(z)}$,
one for each class. Given an incoming SCM $C$, its class $z$ is the one achieving the
minimum distance $\min_z \{\delta(C, G^{(z)})\}$. Other classifiers are obtained by first computing a common center
of mass $G$ and projecting the SCMs onto the tangent space at $G$ with the Riemannian logarithm map to get
$\{\log_G(C_k^{(z)})\}$. A Euclidean classifier (LDA, SVM, etc.) is then learnt from the projected matrices and
the classification task is performed on $\log_G(C)$.
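
The two families of classifiers described above can be sketched as follows. This is a minimal illustration, assuming the affine-invariant Riemannian distance and a plain fixed-point iteration for the center of mass; the proposal does not fix these choices, and all function names and the synthetic two-class data are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def _sym_fun(a, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(a)
    return (v * fun(w)) @ v.T


def riemann_distance(a, b):
    """Affine-invariant distance between SPD matrices a and b."""
    # Generalized eigenvalues of (b, a) are the eigenvalues of a^{-1} b
    w = eigh(b, a, eigvals_only=True)
    return np.sqrt(np.sum(np.log(w) ** 2))


def riemann_mean(mats, n_iter=50, tol=1e-10):
    """Riemannian center of mass of SPD matrices (fixed-point iteration)."""
    g = np.mean(mats, axis=0)  # Euclidean mean as starting point
    for _ in range(n_iter):
        g_sqrt = _sym_fun(g, np.sqrt)
        g_isqrt = _sym_fun(g, lambda w: 1.0 / np.sqrt(w))
        # Average of the whitened matrices, taken in the log domain
        t = np.mean([_sym_fun(g_isqrt @ c @ g_isqrt, np.log) for c in mats], axis=0)
        g = g_sqrt @ _sym_fun(t, np.exp) @ g_sqrt
        if np.linalg.norm(t) < tol:
            break
    return g


def log_map(c, g):
    """Riemannian logarithm of c at g: G^{1/2} log(G^{-1/2} C G^{-1/2}) G^{1/2}."""
    g_sqrt = _sym_fun(g, np.sqrt)
    g_isqrt = _sym_fun(g, lambda w: 1.0 / np.sqrt(w))
    return g_sqrt @ _sym_fun(g_isqrt @ c @ g_isqrt, np.log) @ g_sqrt


def mdm_fit(mats, labels):
    """One center of mass per class."""
    return {z: riemann_mean(mats[labels == z]) for z in np.unique(labels)}


def mdm_predict(c, centers):
    """Class whose center of mass is closest to c."""
    return min(centers, key=lambda z: riemann_distance(c, centers[z]))


# Hypothetical two-class data: the classes differ in the variance of channel 0
rng = np.random.default_rng(0)
scales = {0: np.array([1.0, 1.0, 1.0]), 1: np.array([3.0, 1.0, 1.0])}


def simulate_scm(z, n_samples=200):
    x = scales[int(z)][:, None] * rng.standard_normal((3, n_samples))
    return x @ x.T / n_samples


labels = np.repeat([0, 1], 10)
mats = np.stack([simulate_scm(z) for z in labels])

centers = mdm_fit(mats, labels)
new_scm = simulate_scm(1)
print("MDM prediction:", mdm_predict(new_scm, centers))

# Tangent-space features at the common center of mass, ready for LDA, SVM, etc.
g = riemann_mean(mats)
tangent = np.stack([log_map(c, g) for c in mats])
print(tangent.shape)
```

In practice the symmetric tangent matrices are usually vectorized (for instance by keeping their upper triangular part) before being passed to the Euclidean classifier.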
On the practical side, a BCI benchmarking Python library called MOABB has recently been
developed to provide effective comparison tools [9]. It features open BCI datasets for the main paradigms
(ERP, SSVEP, MI) along with associated preprocessing procedures. State-of-the-art BCI classification
methods are also available through this library. This greatly facilitates the development and testing of
new methods.
Key issues Even though geometrical approaches have proven to be effective, state-of-the-art EEG
classification methods have severe limitations. In particular, these methods exploit SCMs and are thus
based on a Gaussianity assumption. However, due to their biological nature, EEG data present high
variability and contain outliers. Moreover, they are often limited in quantity. Therefore, one might
expect a heavy-tailed distribution for the observed data, and existing methods might be improved by
exploiting robust statistics theory; see e.g. [10]. Another striking example concerns the central role
of preprocessing and tuning [11]. In the context of EEG data, preprocessing might be a complicated and
very dataset-dependent task. Consequently, turning to robust methods might reduce the need
for involved and ad hoc preprocessing, leading to more general tools for EEG data analysis.
Proposed methodology The main aim of this project is then to develop robust classification and
clustering methods suited to EEG data analysis. In order to achieve this ambitious final goal, the internship
will be structured in three phases:
1. Statistical analysis of the raw data: As briefly discussed above, most existing classification
strategies for EEG signals are based on preprocessed data. However,
preprocessing is a delicate step that may cause the loss of statistical information contained in the
data and consequently affect the classification performance. The first goal of this internship will therefore
be to go back to the raw EEG data and statistically characterize them without any preprocessing.
Instead of assuming a centered multivariate Gaussian distribution for the observed data, we allow
for a broader statistical characterization based on the family of centered elliptical distributions [12],
whose probability density function is, up to a scale factor,
$$f(x \mid C) = \det(C)^{-1/2}\, g\left( x^T C^{-1} x / 2 \right),$$
where $x$ is the data vector, $C$ is the covariance matrix and $g : \mathbb{R}^+ \to \mathbb{R}^+$ is the so-called density
generator. In practice, the true density generator is unknown and a robust covariance matrix estimate
can be obtained with an M-estimator such as Tyler's [13]. Unfortunately,
a drawback of robust M-estimators is that they fail to be statistically efficient. To overcome this
limitation, we may rely on a semiparametric approach [14]. In fact, the class of centered elliptical
distributions can be seen as a semiparametric model where the finite-dimensional parameter vector of interest
is given by the (vectorized) covariance/scatter matrix, while the density generator represents an
infinite-dimensional nuisance function. Once the statistical model for the observed raw data is set,
the next step of the internship will be to derive efficient classification procedures for EEG signals.
2. New efficient covariance learning for existing classification strategies: The first approach that we
plan to follow is to combine a new robust and efficient learning strategy for the covariance matrix
of the raw EEG data with existing classification strategies. As a new efficient learning method,
the class of R-estimators has been proven to reconcile the two seemingly conflicting concepts
of robustness and (semiparametric) efficiency [15] [16] for elliptically distributed data. We therefore
plan to use R-estimators for elliptically distributed data
to obtain the set of covariance matrices $\{C_k^{(z)}\}$ related to the $Z$ EEG classes. Then,
geometrical classifiers, such as MDM, or some two-step approach (e.g. projection onto the tangent space
+ Euclidean classifier (SVM, ...)), will be exploited and tailored to the specific semiparametric
EEG classification tasks. This approach will allow us to fully understand the potential benefit that
joint geometrical-semiparametric efficient procedures may bring to EEG data analysis.
3. New efficient semiparametric classification strategies: The second approach that we plan to pursue
in this internship is more challenging than the first one, but it is very promising from both theoretical
and practical viewpoints. In fact, while in the first approach the robust learning of the data
covariance matrix and the geometrical distance-based classification are two consecutive, but still
separate, steps, in this second approach we plan to develop original semiparametric distance learning
methodologies, leading to optimal classification strategies. The starting point for this innovative
research line will be the seminal work of Hallin and Paindaveine [17] on a semiparametric, rank-
based generalization of the widely used Mahalanobis distance. The Gaussian-based Mahalanobis
distance is in fact a key ingredient of many classifiers. Its robust and semiparametrically efficient
generalization will allow us to drop the unrealistic Gaussian assumption in favor of the much more
general (semiparametric) elliptical one.
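
To make the robust estimation step of phase 1 more concrete, here is a minimal sketch of Tyler's M-estimator [13], computed with the usual fixed-point iteration C <- (p/n) sum_i x_i x_i^T / (x_i^T C^{-1} x_i) followed by a trace normalization. It is only an illustration of a robust scatter estimate on heavy-tailed data, not one of the R-estimators of [15] [16]. The function name, the trace normalization, the stopping rule and the synthetic multivariate Cauchy data are assumptions made for the example.

```python
import numpy as np


def tyler_estimator(x, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of scatter (shape matrix with trace equal to p).

    x has shape (n_samples, n_features), is assumed centered, with
    n_samples > n_features and no all-zero sample.
    """
    n, p = x.shape
    c = np.eye(p)
    for _ in range(n_iter):
        # Quadratic forms x_i^T C^{-1} x_i for every sample
        d = np.einsum("ij,ij->i", x @ np.linalg.inv(c), x)
        # Weighted sum of outer products: (p/n) * sum_i x_i x_i^T / d_i
        c_new = (p / n) * (x / d[:, None]).T @ x
        # Tyler's estimator is defined up to scale: fix the trace to p
        c_new *= p / np.trace(c_new)
        converged = np.linalg.norm(c_new - c, "fro") < tol
        c = c_new
        if converged:
            break
    return c


# Hypothetical heavy-tailed data: multivariate Cauchy (Student t with 1 d.o.f.)
rng = np.random.default_rng(0)
n, p = 500, 3
sigma = np.array([[1.5, 0.6, 0.0],
                  [0.6, 1.0, 0.0],
                  [0.0, 0.0, 0.5]])  # true shape matrix, trace equal to p
z = rng.multivariate_normal(np.zeros(p), sigma, size=n)
x = z / np.abs(rng.standard_normal((n, 1)))

tyler = tyler_estimator(x)
scm = x.T @ x / n
scm_shape = scm * p / np.trace(scm)  # same normalization, for a fair comparison

tyler_err = np.linalg.norm(tyler - sigma)
scm_err = np.linalg.norm(scm_shape - sigma)
print("Tyler error:", tyler_err)
print("SCM error:", scm_err)
```

On such data the SCM is dominated by a handful of very large samples, whereas Tyler's estimator only depends on the directions of the samples, which is why it is insensitive to the density generator.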
Validation, application to real data and software development In order to validate the developed
methods, numerical experiments on both simulated and real data will be conducted. To ensure the
practical interest of the proposed algorithms, several commonly used open BCI datasets will be employed.
These datasets will be selected from those available in MOABB. Furthermore, the developed software
solutions (in Python) are to be made available and integrated in MOABB to facilitate reproducibility.

[1] H. Berger. Über das Elektrenkephalogramm des Menschen. Archiv für Psychiatrie und
Nervenkrankheiten, 87(1):527–570, 1929.
[2] M. Congedo, M. Goyat, N. Tarrin, G. Ionescu, L. Varnet, B. Rivet, R. Phlypo, N. Jrad, M. Acquadro,
and C. Jutten. "Brain Invaders": a prototype of an open-source P300-based video game working
with the OpenViBE platform. In 5th International Brain-Computer Interface Conference 2011 (BCI
2011), pages 280–283, 2011.

[3] L. Mayaud, S. Cabanilles, A. Van Langhenhove, M. Congedo, A. Barachant, S. Pouplin, S. Filipe,
L. Pétégnief, O. Rochecouste, E. Azabou, C. Hugeron, M. Lejaille, D. Orlikowski, and D. Annane.
Brain-computer interface for the communication of acute patients: a feasibility study and a
randomized controlled trial comparing performance with healthy participants and a traditional assistive
device. Brain-Computer Interfaces, 3(4):197–215, 2016.
[4] E.K. Kalunga, S. Chevallier, Q. Barthélemy, K. Djouani, E. Monacelli, and Y. Hamam. Online
SSVEP-based BCI using Riemannian geometry. Neurocomputing, 191:55–68, 2016.
[5] S. Chevallier, G. Bao, P. Hammami, F. Marlats, L. Mayaud, D. Annane, F. Lofaso, and E. Azabou.
Brain-machine interface for mechanical ventilation using respiratory-related evoked potential. In
International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, 2018.
[6] Y. Yang, S. Chevallier, J. Wiart, and I. Bloch. Subject-specific time-frequency selection for
multi-class motor imagery-based BCIs using few Laplacian EEG channels. Biomedical Signal Processing
and Control, 38:302–311, 2017.
[7] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger. A
review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update.
Journal of Neural Engineering, 15(3), 2018.
[8] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Multiclass brain–computer interface
classification by Riemannian geometry. IEEE Transactions on Biomedical Engineering, 59(4):920–928,
[9] V. Jayaram and A. Barachant. MOABB: trustworthy algorithm benchmarking for BCIs. Journal of
Neural Engineering, 15(6):066011, 2018.
[10] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibián-Barrera. Robust statistics: theory and
methods (with R). John Wiley & Sons, 2019.
[11] S. Chevallier, E.K. Kalunga, Q. Barthélemy, and F. Yger. Riemannian classification for SSVEP
based BCI: offline versus online implementations. In BCI Handbook: Technological and Theoretical
Advances. CRC Press, 2018.
[12] E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor. Complex elliptically symmetric distributions:
Survey, new results and applications. IEEE Transactions on Signal Processing, 60(11):5597–5625,
[13] D. E. Tyler. A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, pages
234–251, 1987.
[14] P.J. Bickel, C.A.J. Klaassen, Y. Ritov, and J.A. Wellner. Efficient and Adaptive Estimation for
Semiparametric Models. Johns Hopkins University Press, 1993.
[15] Marc Hallin, Hannu Oja, and Davy Paindaveine. Semiparametrically efficient rank-based inference
for shape II. Optimal R-estimation of shape. The Annals of Statistics, 34(6):2757–2789, 2006.
[16] S. Fortunati, A. Renaux, and F. Pascal. Robust semiparametric efficient estimators in complex
elliptically symmetric distributions. IEEE Transactions on Signal Processing, 68:5003–5015, 2020.
[17] Marc Hallin and Davy Paindaveine. Optimal tests for multivariate location based on interdirections
and pseudo-Mahalanobis ranks. The Annals of Statistics, 30(4):1103–1133, 2002.