**Application deadline:** 01/03/2023

**Start date:** 01/04/2023

**End date:** 30/09/2023

**Division:** Signals and Statistics

**Position type:** Internship

**Contact:** Stefano FORTUNATI (stefano.fortunati@l2s.centralesupelec.fr)

Proposal for an M2 internship

Robust Geometrical Learning for Electroencephalography

General information

1. Laboratory: Laboratoire Signaux et Systèmes (L2S), CentraleSupélec, CNRS, Univ. Paris-Saclay

2. Supervision: Florent Bouchard (CNRS, L2S), Stefano Fortunati (IPSA, L2S), Ammar Mian (Université Savoie Mont Blanc, LISTIC)

Abstract

Electroencephalography (EEG) is a neuroimaging modality that captures the dynamics of brain activity well but suffers from high variability, a low signal-to-noise ratio and low spatial resolution. It is extensively used in brain-computer interfaces (BCI), where the subject interacts with a computer through their brain signals. The challenge of BCI is to correctly classify incoming data. State-of-the-art methods are based on sample covariance matrices and their associated Riemannian geometry. Even though such approaches have proven effective, they have several limitations, such as the Gaussianity assumption or the need for very specific preprocessing. In this project, we propose to exploit geometry along with robust statistics to develop original classification and clustering methods suited to EEG data. In particular, we will investigate the potential benefit that a new class of robust learning methodologies, called R-estimators, may bring in the context of EEG data. The performance of the algorithms developed will be validated on real EEG data from commonly used open BCI datasets.


Keywords

Robust learning – Riemannian geometry – Machine learning – Electroencephalography – Brain-computer interface


Scientific description

**Context.** Electroencephalography (EEG) is a non-invasive neuroimaging modality invented by Hans Berger during the 1920s [1]. It consists of recording the electrical brain activity with electrodes placed on the scalp. Its low cost, simplicity and high temporal resolution (it captures the dynamics of brain activity well) have made this modality popular and enabled its use in many applications. EEG, however, suffers from a low signal-to-noise ratio (SNR) and low spatial resolution: electrical brain signals are mixed while passing through brain tissue, skull and scalp, and electrodes also record environmental (e.g., electrical appliances) and biological (e.g., heart, ocular movements) disturbances.

EEG is the preferred functional neuroimaging technique for brain-computer interfaces (BCI), where the subject interacts with a computer through their brain signals. A first use of BCI is video games [2], which can be employed to study specific brain phenomena, such as event-related potentials in the visual cortex. BCIs are also used for medical purposes, for instance to assist disabled people [3], control an exoskeleton [4] or help mechanical ventilation [5]. Datasets exist for different paradigms; the main ones are event-related potentials (ERP) [2], which correspond to a response of the brain to a stimulus (e.g., a light flash); steady-state visual evoked potentials (SSVEP) [4], where the visual cortex is synchronized with a light blinking at a fixed frequency; and motor imagery (MI) [6], where the subject imagines moving their feet, right hand or left hand. From a data analysis point of view, the challenge of BCI is to correctly classify incoming data so that the computer performs the appropriate action.

**State of the art.** To classify EEG data, standard machine learning (ML) techniques have been employed and several specific algorithms have been designed; see [7] for a recent review. Currently, the most popular methods (owing to their effectiveness) are those based on sample covariance matrices (SCMs) and their associated Riemannian geometry; see the original paper [8]. Given $Z$ classes and $K^{(z)}$ training EEG recordings for each class $z$, the related set $\{\mathbf{C}^{(z)}_k\}$ of sample covariance matrices is defined as

$$\mathbf{C}^{(z)}_k = \frac{1}{T^{(z)}_k} \sum_{i} \mathbf{x}^{(z)}_k(i)\, \mathbf{x}^{(z)}_k(i)^T,$$

where $T^{(z)}_k$ is the number of samples and $\mathbf{x}^{(z)}_k(i)$ is the $i$th sample of the $k$th preprocessed recording of the $z$th class.
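For concreteness, here is a minimal NumPy sketch of the SCM definition above. It assumes that each preprocessed recording is stored as an array of shape (channels, samples) whose columns are the samples $\mathbf{x}^{(z)}_k(i)$; the helper names `sample_covariance` and `class_covariances` are hypothetical and the data are synthetic. Like the formula, the sketch subtracts no mean, so the recordings are assumed to be centered (e.g., after band-pass filtering).

```python
import numpy as np


def sample_covariance(x: np.ndarray) -> np.ndarray:
    """SCM C = (1/T) * sum_i x(i) x(i)^T for one recording x of shape (channels, T).

    No mean is subtracted: the recording is assumed to be centered.
    """
    n_samples = x.shape[1]
    return (x @ x.T) / n_samples


def class_covariances(recordings_per_class):
    """Hypothetical helper: {z: [x_1, x_2, ...]} -> {z: [C_1^(z), C_2^(z), ...]}."""
    return {z: [sample_covariance(x) for x in recs]
            for z, recs in recordings_per_class.items()}


# Synthetic usage: Z = 2 classes, K = 3 recordings per class, 4 channels, 250 samples.
rng = np.random.default_rng(0)
data = {z: [rng.standard_normal((4, 250)) for _ in range(3)] for z in (0, 1)}
covs = class_covariances(data)
print(covs[0][0].shape)  # (4, 4)
```

Writing the sum as a single matrix product `x @ x.T` is equivalent to accumulating the outer products sample by sample, but much faster.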


From there, the minimum distance to mean (MDM) classifier computes $Z$ centers of mass $\mathbf{G}^{(z)}$, one per class, and assigns an incoming SCM $\mathbf{C}$ to the class $z$ that minimizes $\delta(\mathbf{C}, \mathbf{G}^{(z)})$, where $\delta$ is the Riemannian distance. Other classifiers are obtained by first computing a common center of mass $\mathbf{G}$ and projecting the SCMs onto the tangent space at $\mathbf{G}$ with the Riemannian logarithm map, yielding $\{\log_{\mathbf{G}}(\mathbf{C}^{(z)}_k)\}$. A Euclidean classifier (LDA, SVM, etc.) is then learned from the projected matrices, and the classification is performed on $\log_{\mathbf{G}}(\mathbf{C})$.
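Both strategies can be sketched in a few lines of NumPy, assuming the affine-invariant Riemannian metric used in [8], for which $\delta(\mathbf{A}, \mathbf{B}) = \|\log(\mathbf{A}^{-1/2}\mathbf{B}\mathbf{A}^{-1/2})\|_F$. The center of mass is computed with a standard fixed-point (Karcher mean) iteration, and tangent vectors are taken as the upper triangle of $\log(\mathbf{G}^{-1/2}\mathbf{C}\mathbf{G}^{-1/2})$, i.e. the log map at $\mathbf{G}$ followed by whitening. All function and class names below are illustrative assumptions and the data are synthetic; this is a sketch, not a reference implementation.

```python
import numpy as np


def _apply_to_eigenvalues(S, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix S."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def sqrtm(S):
    return _apply_to_eigenvalues(S, np.sqrt)


def invsqrtm(S):
    return _apply_to_eigenvalues(S, lambda w: 1.0 / np.sqrt(w))


def logm(S):
    return _apply_to_eigenvalues(S, np.log)


def expm(S):
    return _apply_to_eigenvalues(S, np.exp)


def riemannian_distance(A, B):
    """Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F."""
    A_is = invsqrtm(A)
    return np.linalg.norm(logm(A_is @ B @ A_is), "fro")


def riemannian_mean(covs, n_iter=100, tol=1e-10):
    """Karcher mean of SPD matrices via fixed-point iteration."""
    G = np.mean(covs, axis=0)  # Euclidean mean as initial guess
    for _ in range(n_iter):
        G_s, G_is = sqrtm(G), invsqrtm(G)
        # Average of the whitened log maps at the current estimate
        T = np.mean([logm(G_is @ C @ G_is) for C in covs], axis=0)
        G = G_s @ expm(T) @ G_s
        if np.linalg.norm(T, "fro") < tol:
            break
    return G


class MDM:
    """Minimum distance to mean: one Riemannian center of mass per class."""

    def fit(self, covs, labels):
        covs, labels = np.asarray(covs), np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centers_ = [riemannian_mean(covs[labels == z]) for z in self.classes_]
        return self

    def predict(self, covs):
        dists = np.array([[riemannian_distance(C, G) for G in self.centers_]
                          for C in covs])
        return self.classes_[np.argmin(dists, axis=1)]


def tangent_vectors(covs, G):
    """Upper triangle of log(G^{-1/2} C G^{-1/2}) for each C.

    Off-diagonal entries are scaled by sqrt(2) so that the Euclidean norm of
    each vector equals the Frobenius norm of the corresponding matrix.
    """
    G_is = invsqrtm(G)
    rows, cols = np.triu_indices(G.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.array([weights * logm(G_is @ C @ G_is)[rows, cols] for C in covs])


# Synthetic example: class 1 has 9x more power on the first channel.
rng = np.random.default_rng(0)
scales = {0: np.ones(3), 1: np.array([3.0, 1.0, 1.0])}
covs, labels = [], []
for z, s in scales.items():
    for _ in range(20):
        x = s[:, None] * rng.standard_normal((3, 200))
        covs.append(x @ x.T / x.shape[1])  # SCM as defined above
        labels.append(z)
covs, labels = np.array(covs), np.array(labels)

mdm = MDM().fit(covs, labels)
print("MDM training accuracy:", np.mean(mdm.predict(covs) == labels))

G = riemannian_mean(covs)  # common center of mass for the tangent space
features = tangent_vectors(covs, G)  # shape (40, 6): input to LDA, SVM, etc.
print(features.shape)
```

The $\sqrt{2}$ weighting makes Euclidean distances between tangent vectors at $\mathbf{G}$ agree with the Riemannian distance to $\mathbf{G}$, which is what justifies feeding them to a Euclidean classifier. Tested implementations of both approaches exist in the pyRiemann package.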

On the applied side, a BCI benchmarking Python library called MOABB has recently been developed to provide effective comparison tools [9]. It features open BCI datasets for the main paradigms (ERP, SSVEP, MI) along with associated preprocessing procedures. State-of-the-art BCI classification methods are also available through this library, which greatly facilitates the development and testing of new methods.

**Key issues.** Even though geometrical approaches have proven effective, state-of-the-art EEG classification methods have severe limitations. In particular, these methods exploit SCMs and are thus based on a Gaussianity assumption. However, due to their biological nature, EEG data exhibit high variability and contain outliers. Moreover, they are often limited in quantity. Therefore, one might expect a heavy-tailed distribution for the observed data, and existing methods might be improved by exploiting robust statistics theory; see e.g. [10]. Another striking example concerns the central role of preprocessing and tuning [11]. In the context of EEG data, preprocessing can be a complicated and highly dataset-dependent task. Consequently, turning to robust methods might reduce the need for involved, ad hoc preprocessing, leading to more general tools for EEG data analysis.
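The fragility of SCMs under non-Gaussian data can be illustrated with a small synthetic experiment. It is a sketch, not a claim about any particular dataset: a single large sample, standing in for a hypothetical artifact such as a blink spike on one channel, is added to otherwise Gaussian data and is enough to dominate the SCM.

```python
import numpy as np

rng = np.random.default_rng(1)
n_channels, n_samples = 4, 200

# Clean, centered Gaussian data with identity covariance.
x_clean = rng.standard_normal((n_channels, n_samples))

# Same data plus one hypothetical artifact: a spike of amplitude 100 on channel 0.
spike = np.zeros((n_channels, 1))
spike[0, 0] = 100.0
x_contaminated = np.hstack([x_clean, spike])


def scm(x):
    """Sample covariance matrix (1/T) sum_i x(i) x(i)^T for centered data."""
    return x @ x.T / x.shape[1]


lam_clean = np.linalg.eigvalsh(scm(x_clean)).max()
lam_cont = np.linalg.eigvalsh(scm(x_contaminated)).max()
print(f"largest eigenvalue, clean: {lam_clean:.2f}, with one outlier: {lam_cont:.2f}")
```

One sample out of 201 contributes $100^2/201 \approx 50$ to the first diagonal entry, far more than all the other samples combined; robust scatter estimators such as Tyler's [13] are designed to limit this kind of influence.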

**Proposed methodology.** The main aim of this project is thus to develop robust classification and clustering methods suited to EEG data analysis. To achieve this ambitious goal, the internship will be structured in three phases:

1. Statistical analysis of the raw data: As discussed above, most existing classification strategies for EEG signals are based on preprocessed data. However, preprocessing is a delicate step that may cause the loss of statistical information contained in the data and consequently affect classification performance. Hence, the first goal of this internship will be to go back to the raw EEG data and characterize them statistically without any preprocessing. Instead of assuming a centered multivariate Gaussian distribution for the observed data, we allow for a broader statistical characterization based on the family of centered elliptical distributions [12], whose probability density function is, up to a scale factor,

$$f(\mathbf{x}|\mathbf{C}) = \det(\mathbf{C})^{-1/2}\, g\!\left(\mathbf{x}^T \mathbf{C}^{-1} \mathbf{x}/2\right),$$

where $\mathbf{x}$ is the data vector, $\mathbf{C}$ is the covariance matrix and $g : \mathbb{R}^+ \to \mathbb{R}^+$ is the so-called density generator. In practice, the true density generator is unknown, and the usual way to obtain a robust covariance matrix estimate is to employ an M-estimator such as Tyler's [13]. Unfortunately, robust M-estimators are not statistically efficient. To overcome this limitation, we may rely on a semiparametric approach [14]. In fact, the class of centered elliptical distributions can be seen as a semiparametric model in which the finite-dimensional parameter of interest is the (vectorized) covariance/scatter matrix, while the density generator is an infinite-dimensional nuisance function. Once the statistical model for the observed raw data is set, the next step of the internship will be to derive efficient classification procedures for EEG signals.

2. New efficient covariance learning for existing classification strategies: The first approach that we plan to follow is to combine new robust and efficient strategies for learning the covariance matrix of the raw EEG data with existing classification strategies. As a new efficient learning method, the class of R-estimators has been proven to reconcile the two seemingly conflicting concepts of robustness and (semiparametric) efficiency [15], [16] for elliptically distributed data. We therefore plan to use R-estimators for elliptically distributed data to obtain the set of covariance matrices $\{\mathbf{C}^{(z)}_k\}$ related to the $Z$ EEG classes. Geometrical classifiers such as MDM, or two-step approaches (e.g., projection onto the tangent space followed by a Euclidean classifier such as an SVM), will then be exploited and tailored to the specific semiparametric EEG classification tasks. This approach will allow us to fully understand the potential benefit that joint geometrical-semiparametric efficient procedures may bring to EEG data analysis.

3. New efficient semiparametric classification strategies: The second approach that we plan to pursue in this internship is more challenging than the first one, but it is very promising from both theoretical and applied viewpoints. While in the first approach the robust learning of the data covariance matrix and the geometrical distance-based classification are two consecutive but separate steps, in this second approach we plan to develop original semiparametric distance learning methodologies, leading to optimal classification strategies. The starting point for this research line will be the seminal work of Hallin and Paindaveine [17] on a semiparametric, rank-based generalization of the widely used Mahalanobis distance. The Gaussian-based Mahalanobis distance is in fact a key ingredient of many classifiers. Its robust and semiparametrically efficient generalization will allow us to drop the unrealistic Gaussian assumption in favor of the much more general (semiparametric) elliptical one.
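Two building blocks named in the phases above can be sketched as follows, under stated assumptions. The first is Tyler's M-estimator [13] for centered data, computed with the usual fixed-point iteration and normalized to trace $p$, since the scatter matrix of an elliptical distribution is only identified up to scale. The second is the ranks of the pseudo-Mahalanobis distances $d_i = \sqrt{\mathbf{x}_i^T \widehat{\mathbf{C}}^{-1}\mathbf{x}_i}$, the kind of rank information exploited in [17] and, with a preliminary scatter estimate, by R-estimators [15], [16]. This is not an R-estimator itself; function names are hypothetical, and the heavy-tailed data are drawn synthetically from a multivariate Student t distribution, which is elliptical.

```python
import numpy as np


def tyler_scatter(X, n_iter=200, tol=1e-9):
    """Tyler's M-estimator of scatter for centered data X of shape (p, n).

    Fixed point C = (p/n) sum_i x_i x_i^T / (x_i^T C^{-1} x_i), normalized
    so that trace(C) = p.
    """
    p, n = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        # Quadratic forms q_i = x_i^T C^{-1} x_i for all samples at once.
        q = np.einsum("ij,ij->j", X, np.linalg.solve(C, X))
        C_new = (p / n) * (X / q) @ X.T
        C_new *= p / np.trace(C_new)
        converged = np.linalg.norm(C_new - C) < tol * np.linalg.norm(C)
        C = C_new
        if converged:
            break
    return C


def pseudo_mahalanobis_ranks(X, C):
    """Distances d_i = sqrt(x_i^T C^{-1} x_i) and their ranks 1..n."""
    d = np.sqrt(np.einsum("ij,ij->j", X, np.linalg.solve(C, X)))
    ranks = np.empty(d.size, dtype=int)
    ranks[np.argsort(d)] = np.arange(1, d.size + 1)
    return d, ranks


# Heavy-tailed elliptical data: multivariate Student t with 1 degree of freedom.
rng = np.random.default_rng(2)
p, n, dof = 3, 5000, 1
scatter = np.diag([4.0, 1.0, 1.0])
Z = np.linalg.cholesky(scatter) @ rng.standard_normal((p, n))
X = Z / np.sqrt(rng.chisquare(dof, size=n) / dof)

C_tyler = tyler_scatter(X)
C_scm = X @ X.T / n
target = p * scatter / np.trace(scatter)  # true scatter, normalized to trace p
print("Tyler (trace-normalized):\n", np.round(C_tyler, 2))
print("SCM (trace-normalized):\n", np.round(p * C_scm / np.trace(C_scm), 2))

d, ranks = pseudo_mahalanobis_ranks(X, C_tyler)
```

Because each $\mathbf{x}_i$ appears in both the numerator and $q_i$, Tyler's estimate is unchanged if any sample is rescaled, so it depends only on the directions $\mathbf{x}_i/\|\mathbf{x}_i\|$. On such data the trace-normalized SCM is typically driven by a handful of extreme samples, whereas Tyler's estimate should be close to `target`.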

**Validation, application to real data and software development.** To validate the developed methods, numerical experiments on both simulated and real data will be conducted. To ensure the practical interest of the proposed algorithms, several commonly used open BCI datasets will be employed, selected among those available in MOABB. Furthermore, the developed software (in Python) will be made available and integrated into MOABB to facilitate reproducibility.

References

[1] H. Berger. Über das Elektrenkephalogramm des Menschen. Archiv für Psychiatrie und Nervenkrankheiten, 87(1):527–570, 1929.

[2] M. Congedo, M. Goyat, N. Tarrin, G. Ionescu, L. Varnet, B. Rivet, R. Phlypo, N. Jrad, M. Acquadro, and C. Jutten. "Brain Invaders": a prototype of an open-source P300-based video game working with the OpenViBE platform. In 5th International Brain-Computer Interface Conference 2011 (BCI 2011), pages 280–283, 2011.


[3] L. Mayaud, S. Cabanilles, A. Van Langhenhove, M. Congedo, A. Barachant, S. Pouplin, S. Filipe, L. Pétégnief, O. Rochecouste, E. Azabou, C. Hugeron, M. Lejaille, D. Orlikowski, and D. Annane. Brain-computer interface for the communication of acute patients: a feasibility study and a randomized controlled trial comparing performance with healthy participants and a traditional assistive device. Brain-Computer Interfaces, 3(4):197–215, 2016.

[4] E.K. Kalunga, S. Chevallier, Q. Barthélemy, K. Djouani, E. Monacelli, and Y. Hamam. Online SSVEP-based BCI using Riemannian geometry. Neurocomputing, 191:55–68, 2016.

[5] S. Chevallier, G. Bao, P. Hammami, F. Marlats, L. Mayaud, D. Annane, F. Lofaso, and E. Azabou. Brain-machine interface for mechanical ventilation using respiratory-related evoked potential. In International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, 2018.

[6] Y. Yang, S. Chevallier, J. Wiart, and I. Bloch. Subject-specific time-frequency selection for multi-class motor imagery-based BCIs using few Laplacian EEG channels. Biomedical Signal Processing and Control, 38:302–311, 2017.

[7] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. Journal of Neural Engineering, 15(3), 2018.

[8] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Multiclass brain–computer interface classification by Riemannian geometry. IEEE Transactions on Biomedical Engineering, 59(4):920–928, 2011.

[9] V. Jayaram and A. Barachant. MOABB: trustworthy algorithm benchmarking for BCIs. Journal of Neural Engineering, 15(6):066011, 2018.

[10] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibián-Barrera. Robust Statistics: Theory and Methods (with R). John Wiley & Sons, 2019.

[11] S. Chevallier, E.K. Kalunga, Q. Barthélemy, and F. Yger. Riemannian classification for SSVEP-based BCI: offline versus online implementations. In BCI Handbook: Technological and Theoretical Advances. CRC Press, 2018.

[12] E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor. Complex elliptically symmetric distributions: survey, new results and applications. IEEE Transactions on Signal Processing, 60(11):5597–5625, 2012.

[13] D. E. Tyler. A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, pages 234–251, 1987.

[14] P.J. Bickel, C.A.J. Klaassen, Y. Ritov, and J.A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, 1993.

[15] M. Hallin, H. Oja, and D. Paindaveine. Semiparametrically efficient rank-based inference for shape II. Optimal R-estimation of shape. The Annals of Statistics, 34(6):2757–2789, 2006.

[16] S. Fortunati, A. Renaux, and F. Pascal. Robust semiparametric efficient estimators in complex elliptically symmetric distributions. IEEE Transactions on Signal Processing, 68:5003–5015, 2020.

[17] M. Hallin and D. Paindaveine. Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. The Annals of Statistics, 30(4):1103–1133, 2002.

CentraleSupélec, bât. Bréguet, 3 rue Joliot Curie, 91190 Gif-sur-Yvette


©2022 L2S - All rights reserved, reproduction prohibited.