Post-doctoral position « Robust Geometrical Learning for Electroencephalography »

Application deadline: 21/06/2021
Start date: 01/09/2021
End date: 31/08/2022

Division: Signals and Statistics
Position type: Post-doc or ATER (temporary teaching and research associate)
Contact: Bouchard Florent (

General information
1. Laboratory: Laboratoire Signaux et Systèmes (L2S), CentraleSupélec, CNRS, Univ. Paris-Saclay
2. Supervision: Stefano Fortunati (faculty member, IPSA/L2S) and Florent Bouchard (CR

Electroencephalography (EEG) is a neuroimaging modality that captures the dynamics of brain activity well but suffers from high variability, a low signal-to-noise ratio and low spatial resolution. It is extensively used in brain computer interfaces (BCI), where the subject interacts with a computer through their brain signals. The challenge of BCI is to correctly classify incoming data. State-of-the-art methods are based on sample covariance matrices and their associated Riemannian geometry. Even though such approaches have proven effective, they have several limitations, such as the Gaussianity assumption or the need for very specific preprocessing. In this project, we propose to exploit geometry along with robust statistics to develop original classification and clustering methods suited to EEG data. In particular, we will investigate the potential benefit that a new class of robust, rank-based R-estimators may bring in the context of EEG data. The performance of the algorithms to be developed will be validated on real EEG data obtained from commonly used open BCI datasets.

Keywords: Robust statistics – Riemannian geometry – Machine learning – Electroencephalography – Brain computer interface

Scientific description

Context. Electroencephalography (EEG) is a non-invasive neuroimaging modality invented by Hans Berger during the 1920s [1]. It consists in recording the electrical activity of the brain with electrodes placed on the scalp. Its low cost, simplicity and high temporal resolution (it captures the dynamics of brain activity well) have made this modality popular and allowed its use in many applications. EEG, however, suffers from a low signal-to-noise ratio (SNR) and low spatial resolution: electrical brain signals are mixed while passing through brain tissue, skull and scalp, and the electrodes also record environmental (e.g., electrical appliances) and biological (e.g., heart activity, ocular movements) disturbances.

EEG is the preferred functional neuroimaging technique for brain computer interfaces (BCI), where the subject interacts with a computer through their brain signals. A first use of BCI is in video games [2], which can be employed to study specific brain phenomena, such as event-related potentials in the visual cortex. BCIs are also used for medical purposes, for instance to assist disabled people [3], control an exoskeleton [4] or help with mechanical ventilation [5]. Datasets exist for different paradigms, the main ones being: event-related potentials (ERP) [2], which correspond to a response of the brain to a stimulus (e.g., a light flash); steady-state visual evoked potentials (SSVEP) [4], where the visual cortex is synchronized with a light blinking at a fixed frequency; and motor imagery (MI) [6], where the subject imagines moving the feet or the right or left hand. From a data analysis point of view, the challenge of BCI is to correctly classify incoming data so that the computer performs the appropriate action.

State of the art. To classify EEG data, usual machine learning (ML) techniques have been employed and several specific algorithms have been designed; see [7] for a recent review. Currently, the most popular methods (thanks to their efficiency) are the ones based on sample covariance matrices (SCM) and their associated Riemannian geometry; see the original paper [8]. Given $Z$ classes and $K^{(z)}$ training EEG recordings for each class $z$, the related set $\{C_k^{(z)}\}$ of sample covariance matrices can be defined as

$$C_k^{(z)} = \frac{1}{T_k^{(z)}} \sum_{i=1}^{T_k^{(z)}} x_k^{(z)}(i) \, x_k^{(z)}(i)^T,$$

where $T_k^{(z)}$ is the number of samples and $x_k^{(z)}(i)$ is the $i$th sample of the $k$th preprocessed recording of the $z$th class. From there, the minimum distance to mean (MDM) classifier computes $Z$ centers of mass $G^{(z)}$, one for each class; given an incoming SCM $C$, its class $z$ is the one that achieves the minimum distance $\min_z \{\delta(C, G^{(z)})\}$, where $\delta$ denotes the Riemannian distance. Other classifiers are obtained by first computing a common center of mass $G$ and projecting the SCMs onto the tangent space at $G$ with the Riemannian logarithm map to get $\{\log_G(C_k^{(z)})\}$. A Euclidean classifier (LDA, SVM, etc.) is then learnt from the projected matrices and the classification task is performed on $\log_G(C)$.
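The SCM and MDM steps described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic Gaussian data, not the reference implementation of [8]: the helper names (`scm`, `riemann_distance`, `riemann_mean`, `mdm_fit`, `mdm_predict`), the affine-invariant form of the distance, the fixed-point iteration used for the center of mass and the simulated two-class data are all assumptions made for the example.

```python
import numpy as np

def spd_fun(C, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(C)
    return (V * fun(w)) @ V.T

def inv_sqrt(w):
    return 1.0 / np.sqrt(w)

def scm(X):
    """SCM of a centered recording X with shape (channels, samples)."""
    return X @ X.T / X.shape[1]

def riemann_distance(A, B):
    """Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F."""
    A_isqrt = spd_fun(A, inv_sqrt)
    w = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return np.sqrt(np.sum(np.log(w) ** 2))

def riemann_mean(covs, n_iter=50, tol=1e-10):
    """Center of mass of SPD matrices by a fixed-point iteration."""
    G = np.mean(covs, axis=0)  # Euclidean mean as starting point
    for _ in range(n_iter):
        G_sqrt, G_isqrt = spd_fun(G, np.sqrt), spd_fun(G, inv_sqrt)
        # Average of the matrices mapped to the (whitened) tangent space at G
        T = np.mean([spd_fun(G_isqrt @ C @ G_isqrt, np.log) for C in covs], axis=0)
        G = G_sqrt @ spd_fun(T, np.exp) @ G_sqrt
        if np.linalg.norm(T) < tol:
            break
    return G

def mdm_fit(covs, labels):
    """One center of mass per class."""
    return {z: riemann_mean(covs[labels == z]) for z in np.unique(labels)}

def mdm_predict(centers, C):
    """Class whose center of mass is closest to C."""
    return min(centers, key=lambda z: riemann_distance(C, centers[z]))

# Synthetic two-class data (hypothetical): classes differ in channel variances
rng = np.random.default_rng(0)
n_channels, n_samples = 4, 200
scales = {0: np.array([1.0, 1.0, 1.0, 1.0]), 1: np.array([3.0, 1.0, 1.0, 0.3])}

def simulate(z):
    return scales[int(z)][:, None] * rng.standard_normal((n_channels, n_samples))

labels = np.repeat([0, 1], 20)
covs = np.array([scm(simulate(z)) for z in labels])
centers = mdm_fit(covs, labels)

test_labels = np.repeat([0, 1], 10)
predictions = np.array([mdm_predict(centers, scm(simulate(z))) for z in test_labels])
accuracy = np.mean(predictions == test_labels)
print(f"MDM accuracy on synthetic data: {accuracy:.2f}")
```

The same functions accept any symmetric positive definite matrices, so a robust covariance estimate can be substituted for `scm` without changing the classifier.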

On the applied side, a BCI benchmarking Python library called MOABB has recently been developed to provide effective comparison tools [9]. It features open BCI datasets for the main paradigms (ERP, SSVEP, MI) along with associated preprocessing procedures. State-of-the-art BCI classification methods are also available through this library. This greatly facilitates the development and testing of new methods.

Key issues

Even though geometrical approaches have proven to be effective, state-of-the-art EEG classification methods have severe limitations. In particular, these methods exploit SCMs and are thus based on a Gaussianity assumption. However, due to their biological nature, EEG data present high variability and contain outliers. Moreover, they are often limited in quantity. Therefore, one might assume a heavy-tailed distribution for the observed data, and existing methods might be improved by exploiting robust statistics theory; see e.g. [10]. Another striking example concerns the central role of preprocessing and tuning [11]. In the context of EEG data, preprocessing can be a complicated and highly dataset-dependent task. Consequently, turning to robust methods might reduce the need for involved, ad hoc preprocessing, leading to more general tools for EEG data analysis.

Proposed methodology

The main aim of this project is then to develop robust classification and clustering methods suited to EEG data analysis. Instead of assuming a centered multivariate Gaussian distribution for the observed data, we allow for a broader statistical characterization based on the family of centered elliptical distributions [12], whose probability density function is, up to a scale factor,

$$f(x \mid C) = \det(C)^{-1/2} \, g\left(x^T C^{-1} x / 2\right),$$

where $x$ is the data vector, $C$ is the covariance matrix and $g : \mathbb{R}^+ \to \mathbb{R}^+$ is the so-called density generator. In practice, the true density generator is unknown, and the usual way to obtain a robust covariance matrix estimate is to employ an M-estimator such as Tyler's [13]. Unfortunately, a drawback of robust M-estimators is that they fail to be statistically efficient. To overcome this limitation, we may rely on a semiparametric approach [14]. Indeed, the class of centered elliptical distributions can be seen as a semiparametric model where the finite-dimensional parameter vector of interest is given by the (vectorized) covariance/scatter matrix, while the density generator represents an infinite-dimensional nuisance function. In this context, the class of R-estimators has been proved to reconcile the two seemingly conflicting concepts of robustness and (semiparametric) efficiency [15] [16]. This can be achieved by exploiting Le Cam's theory of one-step efficient estimators and rank-based statistics [17].
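As an illustration of the robust M-estimation step mentioned above, the following sketch implements the usual fixed-point iteration for Tyler's estimator [13] and compares it with the SCM on heavy-tailed data. The function name `tyler_estimator`, the trace normalization, the stopping rule and the simulated multivariate Cauchy samples are assumptions chosen for the example, not choices made in the project description.

```python
import numpy as np

def tyler_estimator(X, n_iter=200, tol=1e-9):
    """Tyler's scatter estimate for centered samples X of shape (p, n).

    The result is normalized to have trace p, since Tyler's estimator only
    identifies the scatter matrix up to a scale factor.
    """
    p, n = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        # Quadratic forms x_i^T C^{-1} x_i for every sample
        q = np.sum(X * np.linalg.solve(C, X), axis=0)
        C_new = (p / n) * (X / q) @ X.T
        C_new *= p / np.trace(C_new)
        converged = np.linalg.norm(C_new - C) < tol * np.linalg.norm(C)
        C = C_new
        if converged:
            break
    return C

rng = np.random.default_rng(1)
p, n = 4, 500
# True shape matrix with unit diagonal, hence trace p
shape = 0.7 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
# Multivariate Cauchy samples: Gaussian vectors divided by sqrt(chi2 with 1 dof)
gauss = np.linalg.cholesky(shape) @ rng.standard_normal((p, n))
X = gauss / np.sqrt(rng.chisquare(df=1.0, size=n))

scm = X @ X.T / n
scm *= p / np.trace(scm)  # same normalization, to compare shapes only
tyler = tyler_estimator(X)

err_scm = np.linalg.norm(scm - shape)
err_tyler = np.linalg.norm(tyler - shape)
print(f"shape error, SCM: {err_scm:.3f}; Tyler: {err_tyler:.3f}")
```

Each term of the update is divided by its own quadratic form, so rescaling any individual sample leaves the estimate unchanged; this is what makes the estimator insensitive to the density generator.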

Building on the above-mentioned theoretical results, in this project we plan to develop new robust, semiparametrically efficient, rank-based (R-) procedures for classification and clustering:

Approach 1: The first approach that we plan to follow is to use R-estimators for elliptically distributed data to obtain the set of covariance matrices $\{C_k^{(z)}\}$ related to the $Z$ EEG classes. Then, geometrical classifiers such as MDM, or some two-step approach (e.g., projection onto the tangent space followed by a Euclidean classifier such as an SVM), will be exploited and tailored to the specific semiparametric EEG classification tasks. This approach will allow us to fully understand the potential benefit that joint geometrical and semiparametrically efficient procedures may bring to EEG data analysis.

Approach 2: The second approach that we plan to pursue is more challenging than the first one, but it is very promising from both theoretical and applied viewpoints. While in the first approach the R-estimation of covariance matrices and the geometrical distance-based classification are two consecutive but still separate steps, in this second approach we plan to develop an original rank-based R-estimator of the distance itself, leading to optimal (à la Le Cam [18]) classification strategies. The starting point for this innovative research line will be the seminal work of Hallin and Paindaveine [19] on a semiparametric, rank-based generalization of the widely used Mahalanobis distance. The Gaussian-based Mahalanobis distance is in fact a key ingredient of many classifiers. Its robust and semiparametrically efficient generalization will allow us to drop the unrealistic Gaussian assumption in favor of the much more general (semiparametric) elliptical one.
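To give some intuition for why ranks of Mahalanobis-type distances are attractive in the elliptical model, the sketch below computes the distances $d_i = (x_i^T V^{-1} x_i)^{1/2}$ and their ranks, then stretches the radii monotonically to mimic a heavier-tailed density generator. It is not the construction of [19]: the scatter matrix `V` is assumed known rather than estimated, and the function names and simulated data are hypothetical.

```python
import numpy as np

def mahalanobis(X, V):
    """Distances sqrt(x_i^T V^{-1} x_i) for the columns x_i of X, shape (p, n)."""
    return np.sqrt(np.sum(X * np.linalg.solve(V, X), axis=0))

def ranks(d):
    """Ranks 1..n of the entries of d (no ties expected for continuous data)."""
    r = np.empty(len(d), dtype=int)
    r[np.argsort(d)] = np.arange(1, len(d) + 1)
    return r

rng = np.random.default_rng(2)
p, n = 3, 50
# Scatter matrix assumed known for this illustration
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
X = np.linalg.cholesky(V) @ rng.standard_normal((p, n))
d = mahalanobis(X, V)

# Keep every direction but stretch the radii monotonically: d -> d**3
X_heavy = X * d ** 2
d_heavy = mahalanobis(X_heavy, V)

print("largest distance before/after:", d.max(), d_heavy.max())
print("ranks unchanged:", np.array_equal(ranks(d), ranks(d_heavy)))
```

With `V` held fixed, the distances themselves change substantially while their ranks stay the same, which is the sense in which rank-based statistics are insensitive to the unknown density generator.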

Validation, application to real data and software development. In order to validate the developed methods, numerical experiments on both simulated and real data will be conducted. To ensure the practical interest of the proposed algorithms, several commonly used open BCI datasets will be employed. These datasets will be selected from those available in MOABB. Furthermore, the developed software solutions (in Python) are to be made available and integrated into MOABB to facilitate reproducibility.

[1] H. Berger. Über das Elektrenkephalogramm des Menschen. Archiv für Psychiatrie und Nervenkrankheiten, 87(1):527–570, 1929.
[2] M. Congedo, M. Goyat, N. Tarrin, G. Ionescu, L. Varnet, B. Rivet, R. Phlypo, N. Jrad, M. Acquadro, and C. Jutten. « Brain Invaders »: a prototype of an open-source P300-based video game working with the OpenViBE platform. In 5th International Brain-Computer Interface Conference 2011 (BCI 2011), pages 280–283, 2011.
[3] L. Mayaud, S. Cabanilles, A. Van Langhenhove, M. Congedo, A. Barachant, S. Pouplin, S. Filipe, L. Petegnief, O. Rochecouste, E. Azabou, C. Hugeron, M. Lejaille, D. Orlikowski, and D. Annane. Brain-computer interface for the communication of acute patients: a feasibility study and a randomized controlled trial comparing performance with healthy participants and a traditional assistive device. Brain-Computer Interfaces, 3(4):197–215, 2016.
[4] E.K. Kalunga, S. Chevallier, Q. Barthélemy, K. Djouani, E. Monacelli, and Y. Hamam. Online SSVEP-based BCI using Riemannian geometry. Neurocomputing, 191:55–68, 2016.
[5] S. Chevallier, G. Bao, P. Hammami, F. Marlats, L. Mayaud, D. Annane, F. Lofaso, and E. Azabou. Brain-machine interface for mechanical ventilation using respiratory-related evoked potential. In International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, 2018.
[6] Y. Yang, S. Chevallier, J. Wiart, and I. Bloch. Subject-specific time-frequency selection for multiclass motor imagery-based BCIs using few Laplacian EEG channels. Biomedical Signal Processing and Control, 38:302–311, 2017.
[7] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. Journal of neural engineering, 15(3), 2018.
[8] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Multiclass brain–computer interface classification by Riemannian geometry. IEEE Transactions on Biomedical Engineering, 59(4):920–928,
[9] V. Jayaram and A. Barachant. MOABB: trustworthy algorithm benchmarking for BCIs. Journal of neural engineering, 15(6):066011, 2018.
[10] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibian-Barrera. Robust statistics: theory and methods (with R). John Wiley & Sons, 2019.
[11] S. Chevallier, E.K. Kalunga, Q. Barthélemy, and F. Yger. Riemannian classification for SSVEP based BCI: offline versus online implementations. In BCI Handbook: Technological and Theoretical Advances. CRC Press, 2018.
[12] E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor. Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Transactions on Signal Processing, 60(11):5597–5625,
[13] D. E. Tyler. A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, pages 234–251, 1987.
[14] P.J. Bickel, C.A.J. Klaassen, Y. Ritov, and J.A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, 1993.
[15] Marc Hallin, Hannu Oja, and Davy Paindaveine. Semiparametrically efficient rank-based inference for shape II. Optimal R-estimation of shape. The Annals of Statistics, 34(6):2757–2789, 2006.
[16] S. Fortunati, A. Renaux, and F. Pascal. Robust semiparametric efficient estimators in complex elliptically symmetric distributions. IEEE Transactions on Signal Processing, 68:5003–5015, 2020.
[17] Marc Hallin and Bas J. M. Werker. Semi-parametric efficiency, distribution-freeness and invariance. Bernoulli, 9(1):137–165, 2003.
[18] Lucien Le Cam and Grace Lo Yang. Asymptotics in Statistics: Some Basic Concepts (second edition). Springer series in statistics, 2000.
[19] Marc Hallin and Davy Paindaveine. Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. The Annals of Statistics, 30(4):1103–1133, 2002.


(Framework: L2S call for projects 2021)