**General information**

1. Laboratory: Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, CNRS, Univ. Paris-Saclay

2. Supervision: Stefano Fortunati (lecturer-researcher, IPSA/L2S) and Florent Bouchard (CNRS research scientist, CR)

**Abstract**

Electroencephalography (EEG) is a neuroimaging modality that captures the dynamics of brain activity well but suffers from high variability, a low signal-to-noise ratio and low spatial resolution. It is extensively used in brain-computer interfaces (BCI), where the subject interacts with a computer through their brain signals. The challenge of BCI is to correctly classify incoming data. State-of-the-art methods are based on sample covariance matrices and their associated Riemannian geometry. Even though such approaches have proven effective, they have several limitations, such as the Gaussianity assumption or the need for very specific preprocessing. In this project, we propose to exploit geometry along with robust statistics to develop original classification and clustering methods suited to EEG data. In particular, we will investigate the potential benefit that a new class of robust, rank-based R-estimators may bring in the context of EEG data. The performance of the original algorithms to be developed will be validated on real EEG data obtained from commonly used open BCI datasets.

**Keywords**

Robust statistics · Riemannian geometry · Machine learning · Electroencephalography · Brain-computer interface

**Scientific description**

**Context** *Electroencephalography* (EEG) is a non-invasive neuroimaging modality invented by Hans Berger in the 1920s [1]. It consists of recording the electrical activity of the brain with electrodes placed on the scalp. Its low cost, simplicity and high temporal resolution (it captures the dynamics of brain activity well) have made this modality popular and enabled its use in many applications. EEG, however, suffers from a low signal-to-noise ratio (SNR) and low spatial resolution: electrical brain signals are mixed while passing through brain tissues, skull and scalp, and electrodes also record environmental (e.g., electrical appliances) and biological (e.g., heart, ocular movements) disturbances.

EEG is the preferred functional neuroimaging technique for brain-computer interfaces (BCI), where the subject interacts with a computer through their brain signals. A first use of BCI is in video games [2], which can be employed to study specific brain phenomena, such as event-related potentials in the visual cortex. BCIs are also used for medical purposes, for instance to assist disabled people [3], control an exoskeleton [4] or help with mechanical ventilation [5]. Datasets exist for different paradigms, the main ones being: event-related potentials (ERP) [2], which correspond to a response of the brain to a stimulus (e.g., a light flash); steady-state visual evoked potentials (SSVEP) [4], where the visual cortex synchronizes with a light blinking at a fixed frequency; and motor imagery (MI) [6], where the subject imagines moving the feet or the right or left hand. From a data analysis point of view, the challenge of BCI is to correctly classify incoming data so that the computer performs the appropriate action.

**State of the art** To classify EEG data, standard machine learning (ML) techniques have been employed and several specific algorithms have been designed; see [7] for a recent review. Currently, the most popular methods (thanks to their efficiency) are those based on sample covariance matrices (SCMs) and their associated Riemannian geometry; see the original paper [8]. Given $Z$ classes and $K^{(z)}$ training EEG recordings for each class $z$, the related set $\{\mathbf{C}_k^{(z)}\}$ of sample covariance matrices can be defined as

$$\mathbf{C}_k^{(z)} = \frac{1}{T_k^{(z)}} \sum_{i=1}^{T_k^{(z)}} \mathbf{x}_k^{(z)}(i)\,\mathbf{x}_k^{(z)}(i)^\top,$$

where $T_k^{(z)}$ is the number of samples and $\mathbf{x}_k^{(z)}(i)$ is the $i$th sample of the $k$th preprocessed recording of the $z$th class. From there, the minimum distance to mean (MDM) classifier computes $Z$ centers of mass $\mathbf{G}^{(z)}$, one per class; an incoming SCM $\mathbf{C}$ is then assigned to the class $z$ achieving the minimum distance $\min_z \delta(\mathbf{C}, \mathbf{G}^{(z)})$, where $\delta$ denotes the Riemannian distance. Other classifiers are obtained by first computing a common center of mass $\mathbf{G}$ and projecting the SCMs onto the tangent space at $\mathbf{G}$ with the Riemannian logarithm map to get $\{\log_{\mathbf{G}}(\mathbf{C}_k^{(z)})\}$. A Euclidean classifier (LDA, SVM, etc.) is then learnt from the projected matrices, and classification is performed on $\log_{\mathbf{G}}(\mathbf{C})$.
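The SCM and MDM steps just described can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the implementation of [8] nor the pyRiemann API: it assumes zero-mean preprocessed recordings stored as channels × samples arrays, uses the affine-invariant Riemannian distance $\delta(\mathbf{A},\mathbf{B}) = \|\log(\mathbf{A}^{-1/2}\mathbf{B}\mathbf{A}^{-1/2})\|_F$, and computes each class center as a Karcher mean by fixed-point iteration. The names `scm`, `airm_distance`, `riemannian_mean` and `MDM` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def _sym_fn(S, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = eigh((S + S.T) / 2.0)  # symmetrize to absorb round-off
    return (V * fn(w)) @ V.T


def scm(X):
    """SCM (1/T) sum_i x(i) x(i)^T of a zero-mean recording X (channels x samples)."""
    return X @ X.T / X.shape[1]


def airm_distance(A, B):
    """Affine-invariant distance sqrt(sum_j log^2 lambda_j), lambda_j eigenvalues of A^{-1} B."""
    lam = eigh(B, A, eigvals_only=True)  # generalized eigenproblem B v = lambda A v
    return np.sqrt(np.sum(np.log(lam) ** 2))


def riemannian_mean(covs, tol=1e-10, max_iter=100):
    """Karcher (Riemannian) mean of SPD matrices by fixed-point iteration."""
    G = np.mean(covs, axis=0)  # Euclidean mean as a starting point
    for _ in range(max_iter):
        G_sqrt = _sym_fn(G, np.sqrt)
        G_isqrt = _sym_fn(G, lambda w: 1.0 / np.sqrt(w))
        # Average of the whitened log-maps: the descent direction at G
        T = np.mean([_sym_fn(G_isqrt @ C @ G_isqrt, np.log) for C in covs], axis=0)
        G = G_sqrt @ _sym_fn(T, np.exp) @ G_sqrt  # exponential map back to the manifold
        if np.linalg.norm(T) < tol:
            break
    return G


class MDM:
    """Minimum distance to mean: one Riemannian center per class, nearest center wins."""

    def fit(self, covs, labels):
        # covs: (K, p, p) array, labels: (K,) NumPy array
        self.classes_ = np.unique(labels)
        self.centers_ = [riemannian_mean(covs[labels == z]) for z in self.classes_]
        return self

    def predict(self, covs):
        dists = np.array([[airm_distance(C, G) for G in self.centers_] for C in covs])
        return self.classes_[np.argmin(dists, axis=1)]
```

For instance, `MDM().fit(covs, labels).predict(new_covs)` classifies new SCMs once `covs` is a (K, p, p) array of training SCMs. The mean iteration starts at the Euclidean mean and stops once the averaged log-map is numerically zero.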

On the practical side, a BCI benchmarking Python library called MOABB has recently been developed to provide effective comparison tools [9]. It features open BCI datasets for the main paradigms (ERP, SSVEP, MI) along with the associated preprocessing procedures. State-of-the-art BCI classification methods are also available through this library. This greatly facilitates the development and testing of new methods.

**Key issues**

Even though geometrical approaches have proven effective, state-of-the-art EEG classification methods have severe limitations. In particular, these methods exploit SCMs and are thus based on a Gaussianity assumption. However, due to their biological nature, EEG data present high variability and contain outliers. Moreover, they are often limited in quantity. Therefore, one might expect a heavy-tailed distribution for the observed data, and existing methods might be improved by exploiting robust statistics theory; see, e.g., [10]. Another striking limitation concerns the central role of preprocessing and tuning [11]. In the context of EEG data, preprocessing can be a complicated and highly dataset-dependent task. Consequently, turning to robust methods might reduce the need for involved, ad hoc preprocessing, leading to more general tools for EEG data analysis.

**Proposed methodology**

The main aim of this project is then to develop robust classification and clustering methods suited to EEG data analysis. Instead of assuming a centered multivariate Gaussian distribution for the observed data, we allow for a broader statistical characterization based on the family of centered elliptical distributions [12], whose probability density function is, up to a scale factor,

$$f(\mathbf{x}\mid\mathbf{C}) = \det(\mathbf{C})^{-1/2}\, g\left(\mathbf{x}^\top\mathbf{C}^{-1}\mathbf{x}/2\right),$$

where $\mathbf{x}$ is the data vector, $\mathbf{C}$ is the covariance matrix and $g:\mathbb{R}^+\to\mathbb{R}^+$ is the so-called density generator. In practice, the true density generator is unknown, and the usual way to obtain a robust covariance matrix estimate is to employ an M-estimator such as Tyler's [13]. Unfortunately, a drawback of robust M-estimators is that they fail to be statistically efficient. To overcome this limitation, we may rely on a semiparametric approach [14]. Indeed, the class of centered elliptical distributions can be seen as a semiparametric model where the finite-dimensional parameter of interest is the (vectorized) covariance/scatter matrix, while the density generator is an infinite-dimensional nuisance function. In this context, the class of R-estimators has been proven to reconcile the two seemingly conflicting requirements of robustness and (semiparametric) efficiency [15], [16]. This is achieved by exploiting Le Cam's theory of one-step efficient estimators together with rank-based statistics [17].
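To make the elliptical model and Tyler's M-estimator concrete, the sketch below (i) evaluates the log-density above up to its normalizing constant for two density generators, the Gaussian one $g(t)=e^{-t}$ and a multivariate Student-$t$ one (chosen here only as an illustrative heavy-tailed example), and (ii) computes Tyler's estimator by iterating its fixed-point equation $\mathbf{C} = \frac{p}{n}\sum_i \mathbf{x}_i\mathbf{x}_i^\top / (\mathbf{x}_i^\top\mathbf{C}^{-1}\mathbf{x}_i)$. It assumes centered data stored column-wise; since Tyler's estimator only determines the scatter matrix up to a positive scale, the sketch normalizes it to trace $p$. This is the M-estimator baseline, not one of the R-estimators targeted by the project, and all function names are hypothetical.

```python
import numpy as np


def log_gaussian_generator(t):
    """log g(t) for g(t) = exp(-t); with t = x^T C^{-1} x / 2 this is the Gaussian case."""
    return -t


def make_log_student_generator(nu, p):
    """log g(t) for a p-variate Student-t with nu degrees of freedom (heavier tails)."""
    def log_g(t):
        return -(nu + p) / 2.0 * np.log1p(2.0 * t / nu)
    return log_g


def elliptical_logpdf_unnormalized(x, C, log_g):
    """log[det(C)^{-1/2} g(x^T C^{-1} x / 2)], i.e. the log-density up to a constant."""
    _, logdet = np.linalg.slogdet(C)
    q = x @ np.linalg.solve(C, x)  # quadratic form x^T C^{-1} x
    return -0.5 * logdet + log_g(q / 2.0)


def tyler_estimator(X, tol=1e-8, max_iter=200):
    """Tyler's M-estimator of scatter for centered data X of shape (p, n).

    The result is normalized to trace p, since Tyler's estimator is scale-free.
    """
    p, n = X.shape
    C = np.eye(p)
    for _ in range(max_iter):
        # q_i = x_i^T C^{-1} x_i for every column x_i
        q = np.einsum("ij,ij->j", X, np.linalg.solve(C, X))
        # Fixed-point update (p/n) sum_i x_i x_i^T / q_i, then trace normalization
        C_new = (p / n) * (X / q) @ X.T
        C_new *= p / np.trace(C_new)
        converged = np.linalg.norm(C_new - C) <= tol * np.linalg.norm(C)
        C = C_new
        if converged:
            break
    return C
```

Each observation enters Tyler's update only through its direction $\mathbf{x}_i/\|\mathbf{x}_i\|$, so rescaling individual samples (as heavy-tailed, compound-Gaussian data effectively do) leaves the estimate unchanged, whereas a trace-normalized SCM can be dominated by a few extreme samples.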

Building on the above-mentioned theoretical results, in this project we plan to develop new robust, semiparametric efficient, rank-based (R-) procedures for classification and clustering:

Approach 1: The first approach we plan to follow is to use R-estimators for elliptically distributed data to obtain the set of covariance matrices $\{\mathbf{C}_k^{(z)}\}$ related to the $Z$ EEG classes. Then, geometrical classifiers, such as MDM, or two-step approaches (e.g., projection onto the tangent space followed by a Euclidean classifier such as an SVM) will be exploited and tailored to the specific semiparametric EEG classification tasks. This approach will allow us to fully understand the potential benefit that joint geometrical and semiparametric efficient procedures may bring to EEG data analysis.
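Approach 1 keeps the usual two-step structure and only changes the covariance estimator. The sketch below makes that estimator a pluggable argument (the plain SCM serves as a placeholder, since no R-estimator is implemented here), projects the matrices onto the tangent space at a reference point, and trains a two-class Fisher LDA on the resulting vectors. Assumptions of this sketch: labels are 0/1, the reference point is the log-Euclidean mean (a cheaper stand-in for the Riemannian center of mass), and all class and function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def _sym_fn(S, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = eigh((S + S.T) / 2.0)
    return (V * fn(w)) @ V.T


def scm(X):
    """Placeholder covariance estimator; a robust R-estimator would be plugged in here."""
    return X @ X.T / X.shape[1]


def log_euclidean_mean(covs):
    """exp(mean_k log C_k): a cheap stand-in for the Riemannian center of mass."""
    return _sym_fn(np.mean([_sym_fn(C, np.log) for C in covs], axis=0), np.exp)


def tangent_vectors(covs, G):
    """Vectorized log-maps log(G^{-1/2} C G^{-1/2}) at the reference point G.

    Off-diagonal entries are weighted by sqrt(2) so that the Euclidean norm of each
    vector equals the Frobenius norm of the corresponding matrix.
    """
    G_isqrt = _sym_fn(G, lambda w: 1.0 / np.sqrt(w))
    rows, cols = np.triu_indices(G.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.array([weights * _sym_fn(G_isqrt @ C @ G_isqrt, np.log)[rows, cols] for C in covs])


class TangentSpaceLDA:
    """Two-step classifier: tangent-space projection + two-class Fisher LDA (labels 0/1)."""

    def __init__(self, estimator=scm):
        self.estimator = estimator  # maps a (channels x samples) trial to a p x p matrix

    def fit(self, trials, y):
        covs = [self.estimator(X) for X in trials]
        self.G_ = log_euclidean_mean(covs)
        V = tangent_vectors(covs, self.G_)
        m0, m1 = V[y == 0].mean(axis=0), V[y == 1].mean(axis=0)
        Sw = np.cov(V[y == 0], rowvar=False) + np.cov(V[y == 1], rowvar=False)
        # Small ridge added for numerical stability (an assumption of this sketch)
        self.w_ = np.linalg.solve(Sw + 1e-6 * np.eye(len(m0)), m1 - m0)
        self.b_ = -self.w_ @ (m0 + m1) / 2.0
        return self

    def predict(self, trials):
        V = tangent_vectors([self.estimator(X) for X in trials], self.G_)
        return (V @ self.w_ + self.b_ > 0).astype(int)
```

Swapping in a robust estimator only requires passing a different callable, e.g. `TangentSpaceLDA(estimator=my_robust_estimator)` with a hypothetical `my_robust_estimator`, which is the substitution Approach 1 envisions.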

Approach 2: The second approach we plan to pursue is more challenging than the first one, but it is very promising from both theoretical and applied viewpoints. Indeed, while in the first approach R-estimation of covariance matrices and geometrical distance-based classification are two consecutive but separate steps, in this second approach we plan to develop original rank-based R-estimators of the distance itself, leading to optimal (à la Le Cam [18]) classification strategies. The starting point for this innovative research line will be the seminal work of Hallin and Paindaveine [19] on a semiparametric, rank-based generalization of the widely used Mahalanobis distance. The Gaussian-based Mahalanobis distance is in fact a key ingredient of many classifiers. Its robust and semiparametric efficient generalization will allow us to drop the unrealistic Gaussian assumption in favor of the much more general (semiparametric) elliptical one.
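The Mahalanobis distance $d_i = \sqrt{(\mathbf{x}_i-\boldsymbol{\mu})^\top\mathbf{C}^{-1}(\mathbf{x}_i-\boldsymbol{\mu})}$ and the ranks of such distances are the basic ingredients of the rank-based procedures mentioned above. The sketch below only computes these ingredients; it is not the optimal test of [19] nor an R-estimator. It returns the distances under a given shape matrix, their ranks, and van der Waerden (Gaussian) scores, taken here as the chi-square($p$) quantiles of $R_i/(n+1)$, one common score choice in rank-based shape inference (an assumption of this sketch). Function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, rankdata


def mahalanobis_distances(X, C, mu=None):
    """d_i = sqrt((x_i - mu)^T C^{-1} (x_i - mu)) for each column x_i of X (shape p x n)."""
    D = X if mu is None else X - mu[:, None]
    return np.sqrt(np.einsum("ij,ij->j", D, np.linalg.solve(C, D)))


def pseudo_mahalanobis_ranks(X, C, mu=None):
    """Ranks (1..n) of the Mahalanobis distances computed under a shape matrix C."""
    return rankdata(mahalanobis_distances(X, C, mu))


def van_der_waerden_scores(ranks, p):
    """Gaussian scores: chi-square(p) quantiles of R_i / (n + 1)."""
    ranks = np.asarray(ranks, dtype=float)
    return chi2.ppf(ranks / (len(ranks) + 1.0), df=p)
```

Because the ranks depend on the distances only through their ordering, they are unchanged when $\mathbf{C}$ is rescaled or when each observation is moved radially by an increasing function of its distance. Under an elliptical model with the true location and shape, this makes them distribution-free with respect to the density generator $g$.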

**Validation, application to real data and software development** To validate the developed methods, numerical experiments on both simulated and real data will be conducted. To ensure the practical interest of the proposed algorithms, several commonly used open BCI datasets, selected from those available in MOABB, will be employed. Furthermore, the developed software (in Python) will be made available and integrated into MOABB to facilitate reproducibility.

**References**

[1] H. Berger. Über das Elektrenkephalogramm des Menschen. Archiv für Psychiatrie und Nervenkrankheiten, 87(1):527–570, 1929.

[2] M. Congedo, M. Goyat, N. Tarrin, G. Ionescu, L. Varnet, B. Rivet, R. Phlypo, N. Jrad, M. Acquadro, and C. Jutten. « Brain Invaders »: a prototype of an open-source P300-based video game working with the OpenViBE platform. In 5th International Brain-Computer Interface Conference 2011 (BCI 2011), pages 280–283, 2011.

[3] L. Mayaud, S. Cabanilles, A. Van Langhenhove, M. Congedo, A. Barachant, S. Pouplin, S. Filipe, L. Petegnief, O. Rochecouste, E. Azabou, C. Hugeron, M. Lejaille, D. Orlikowski, and D. Annane. Brain-computer interface for the communication of acute patients: a feasibility study and a randomized controlled trial comparing performance with healthy participants and a traditional assistive device. Brain-Computer Interfaces, 3(4):197–215, 2016.

[4] E.K. Kalunga, S. Chevallier, Q. Barthélemy, K. Djouani, E. Monacelli, and Y. Hamam. Online SSVEP-based BCI using Riemannian geometry. Neurocomputing, 191:55–68, 2016.

[5] S. Chevallier, G. Bao, P. Hammami, F. Marlats, L. Mayaud, D. Annane, F. Lofaso, and E. Azabou. Brain-machine interface for mechanical ventilation using respiratory-related evoked potential. In International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, 2018.

[6] Y. Yang, S. Chevallier, J. Wiart, and I. Bloch. Subject-specific time-frequency selection for multi-class motor imagery-based BCIs using few Laplacian EEG channels. Biomedical Signal Processing and Control, 38:302–311, 2017.

[7] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. Journal of Neural Engineering, 15(3), 2018.

[8] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Multiclass brain–computer interface classification by Riemannian geometry. IEEE Transactions on Biomedical Engineering, 59(4):920–928, 2011.

[9] V. Jayaram and A. Barachant. MOABB: trustworthy algorithm benchmarking for BCIs. Journal of Neural Engineering, 15(6):066011, 2018.

[10] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibian-Barrera. Robust statistics: theory and methods (with R). John Wiley & Sons, 2019.

[11] S. Chevallier, E.K. Kalunga, Q. Barthélemy, and F. Yger. Riemannian classification for SSVEP-based BCI: offline versus online implementations. In BCI Handbook: Technological and Theoretical Advances. CRC Press, 2018.

[12] E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor. Complex elliptically symmetric distributions: survey, new results and applications. IEEE Transactions on Signal Processing, 60(11):5597–5625, 2012.

[13] D. E. Tyler. A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, pages 234–251, 1987.

[14] P.J. Bickel, C.A.J. Klaassen, Y. Ritov, and J.A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, 1993.

[15] M. Hallin, H. Oja, and D. Paindaveine. Semiparametrically efficient rank-based inference for shape II. Optimal R-estimation of shape. The Annals of Statistics, 34(6):2757–2789, 2006.

[16] S. Fortunati, A. Renaux, and F. Pascal. Robust semiparametric efficient estimators in complex elliptically symmetric distributions. IEEE Transactions on Signal Processing, 68:5003–5015, 2020.

[17] M. Hallin and B. J. M. Werker. Semi-parametric efficiency, distribution-freeness and invariance. Bernoulli, 9(1):137–165, 2003.

[18] L. Le Cam and G. Lo Yang. Asymptotics in Statistics: Some Basic Concepts (second edition). Springer Series in Statistics, 2000.

[19] M. Hallin and D. Paindaveine. Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. The Annals of Statistics, 30(4):1103–1133, 2002.

(*Framework:* *L2S call for projects 2021*)