Application deadline: 1 March 2023
Start date: 1 April 2023
End date: 30 September 2023
Division: Signals and Statistics
Position type: Internship
Contact : Stefano FORTUNATI (stefano.fortunati@l2s.centralesupelec.fr)
Robust Geometrical Learning for Electroencephalography
From there, the minimum distance to mean (MDM) classifier computes $Z$ centers of mass $G^{(z)}$,
one per class. An incoming SCM $C$ is then assigned to the class whose center of mass is closest,
i.e. the class $z$ attaining $\min_z \{\delta(C, G^{(z)})\}$. Other classifiers are obtained by first computing a common center
of mass $G$ and projecting the SCMs onto the tangent space at $G$ with the Riemannian logarithm map to get
$\{\log_G(C_k^{(z)})\}$. A Euclidean classifier (LDA, SVM, etc.) is then learnt from the projected matrices, and
the classification task is performed on $\log_G(C)$.
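The two pipelines described above can be illustrated with a minimal NumPy sketch. It assumes that $\delta$ is the affine-invariant Riemannian distance on symmetric positive definite matrices and that the center of mass is the corresponding Riemannian (Karcher) mean computed by a simple fixed-point iteration; these choices, as well as all function names (`airm_distance`, `log_map`, `riemannian_mean`, `mdm_fit`, `mdm_predict`), are illustrative assumptions and not part of any existing library.

```python
import numpy as np


def _spd_fun(C, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(C)
    return (V * fun(w)) @ V.T


def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices A and B."""
    A_isqrt = _spd_fun(A, lambda w: 1.0 / np.sqrt(w))
    w = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return np.sqrt(np.sum(np.log(w) ** 2))


def log_map(G, C):
    """Riemannian logarithm of C at G (a symmetric tangent matrix)."""
    G_sqrt = _spd_fun(G, np.sqrt)
    G_isqrt = _spd_fun(G, lambda w: 1.0 / np.sqrt(w))
    return G_sqrt @ _spd_fun(G_isqrt @ C @ G_isqrt, np.log) @ G_sqrt


def exp_map(G, S):
    """Riemannian exponential of the tangent matrix S at G."""
    G_sqrt = _spd_fun(G, np.sqrt)
    G_isqrt = _spd_fun(G, lambda w: 1.0 / np.sqrt(w))
    return G_sqrt @ _spd_fun(G_isqrt @ S @ G_isqrt, np.exp) @ G_sqrt


def riemannian_mean(mats, n_iter=50, tol=1e-10):
    """Center of mass of SPD matrices via a fixed-point (Karcher) iteration."""
    G = np.mean(mats, axis=0)  # Euclidean mean as starting point
    for _ in range(n_iter):
        S = np.mean([log_map(G, C) for C in mats], axis=0)
        G = exp_map(G, S)
        if np.linalg.norm(S) < tol:
            break
    return G


def mdm_fit(covs, labels):
    """Compute one center of mass per class."""
    classes = sorted(set(labels))
    return {
        z: riemannian_mean([C for C, y in zip(covs, labels) if y == z])
        for z in classes
    }


def mdm_predict(centers, C):
    """Assign C to the class whose center of mass is closest."""
    return min(centers, key=lambda z: airm_distance(C, centers[z]))
```

For the tangent-space pipeline, one would instead compute a common center `G = riemannian_mean(covs)`, map every SCM with `log_map(G, C)`, vectorize the resulting symmetric matrices and feed them to any Euclidean classifier.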
On the applied side, a BCI benchmarking Python library called MOABB has recently been
developed to provide effective comparison tools [9]. It features open BCI datasets for the main paradigms
(ERP, SSVEP, MI) along with associated preprocessing procedures. State-of-the-art BCI classification
methods are also available through this library. This greatly facilitates the development and testing of
new methods.
Key issues Even though geometrical approaches have proven to be effective, state-of-the-art EEG
classification methods possess severe limitations. In particular, methods exploit SCMs and are thus
based on a Gaussianity assumption. However, due to their biological nature, EEG data present high
variability and contain outliers. Moreover, they are often limited in quantity. Therefore, one might
expect a heavy-tailed distribution for the observed data, and existing methods might be improved by
exploiting the robust statistics theory; see e.g. [10]. Another striking example concerns the central role
of preprocessing and tuning [11]. In the context of EEG data, preprocessing might be a complicated and
very dataset-dependent task. Consequently, turning to robust methods might allow one to reduce the need
for involved and ad-hoc preprocessing leading to more general tools for EEG data analysis.
Proposed methodology The main aim of this project is then to develop robust classification and
clustering methods suited to EEG data analysis. In order to achieve this ambitious final goal, the internship
will be structured in three phases:
1. Statistical analysis of the raw data: As briefly discussed above, most existing classification
strategies for EEG signals are based on preprocessed data. However, preprocessing
is a delicate step that may cause the loss of statistical information contained in the
data and consequently affect the classification performance. The first goal of this internship will therefore
be to go back to the raw EEG data and statistically characterize them without any preprocessing.
Instead of assuming a centered multivariate Gaussian distribution for the observed data, we allow
for a broader statistical characterization based on the family of centered elliptical distributions [12],
whose probability density function is, up to a scale factor,
$$f(x \mid C) = \det(C)^{-1/2} \, g\left( x^T C^{-1} x / 2 \right),$$
where $x$ is the data vector, $C$ is the covariance matrix and $g : \mathbb{R}^+ \to \mathbb{R}^+$ is the so-called density
generator. In practical cases, the true density generator is unknown and the solution to obtain a
robust covariance matrix estimate is to employ an M-estimator such as Tyler's [13]. Unfortunately,
a drawback of robust M-estimators is that they fail to be statistically efficient. To overcome this
limitation, we may rely on a semiparametric approach [14]. In fact, the class of centered elliptical
distributions can be seen as a semiparametric model where the finite-dimensional vector of interest
is given by the (vectorized) covariance/scatter matrix, while the density generator represents an
infinite-dimensional nuisance function. Once the statistical model for the observed raw data is set,
the next step of the internship will be to derive efficient classification procedures for EEG signals.
2. New efficient covariance learning for existing classification strategies: The first approach that we
plan to follow is to combine a new robust and efficient learning strategy for the covariance matrix
of the raw EEG data with existing classification strategies. As a new efficient learning method,
the class of R-estimators has been shown to reconcile the two seemingly conflicting concepts
of robustness and (semiparametric) efficiency [15], [16] for elliptically distributed data. We therefore
plan to use R-estimators for elliptically distributed data
to obtain the set of covariance matrices $\{C_k^{(z)}\}$ related to the $Z$ EEG classes. Subsequently,
geometrical classifiers, such as MDM, or some two-step approach (e.g. projection onto the tangent space
followed by a Euclidean classifier such as SVM), will be exploited and tailored to the specific semiparametric
EEG classification tasks. This approach will allow us to fully understand the potential benefit that
joint geometrical-semiparametric efficient procedures may bring to EEG data analysis.
3. New efficient semiparametric classification strategies: The second approach that we plan to pursue
in this internship is more challenging than the first one, but it is promising from both theoretical
and applied viewpoints. While in the first approach the robust learning of the data
covariance matrix and the geometrical distance-based classification are two consecutive but
separate steps, in this second approach we plan to develop original semiparametric distance learning
methodologies, leading to optimal classification strategies. The starting point for this
research line will be the seminal work of Hallin and Paindaveine [17] on a semiparametric,
rank-based generalization of the widely used Mahalanobis distance. The Gaussian-based Mahalanobis
distance is in fact a key ingredient of many classifiers. Its robust and semiparametrically efficient
generalization will allow us to drop the unrealistic Gaussian assumption in favor of the much more
general (semiparametric) elliptical one.
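As an illustration of the robust covariance estimation mentioned in phase 1, the fixed-point iteration commonly used to compute Tyler's M-estimator [13] can be sketched as follows. This is a minimal sketch under stated assumptions: the data are real-valued and centered, there are more samples than dimensions, and the estimate is normalized to have trace equal to the dimension (Tyler's estimator is only defined up to scale). The function names `tyler_estimator` and `sample_multivariate_t`, the stopping rule and the default parameters are illustrative choices, not taken from the cited references.

```python
import numpy as np


def tyler_estimator(X, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of scatter for centered data X of shape (n, p).

    Returns a shape matrix normalized so that its trace equals p.
    """
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        # Quadratic forms x_i^T C^{-1} x_i for every sample
        q = np.einsum("ij,ij->i", X @ np.linalg.inv(C), X)
        # Weighted outer products: (p / n) * sum_i x_i x_i^T / q_i
        C_new = (p / n) * (X / q[:, None]).T @ X
        # Fix the scale ambiguity with a trace normalization
        C_new *= p / np.trace(C_new)
        converged = np.linalg.norm(C_new - C, "fro") < tol
        C = C_new
        if converged:
            break
    return C


def sample_multivariate_t(n, Sigma, nu, seed=0):
    """Draw n centered multivariate t samples (heavy-tailed, elliptical)."""
    rng = np.random.default_rng(seed)
    p = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n, p)) @ L.T  # Gaussian part
    u = rng.chisquare(nu, size=n)          # random scaling
    return Z / np.sqrt(u / nu)[:, None]
```

On heavy-tailed samples, for instance `X = sample_multivariate_t(2000, Sigma, nu=1.0)`, one can compare `tyler_estimator(X)` with the trace-normalized SCM `X.T @ X / len(X)` to see how strongly a few extreme samples affect the latter.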
Validation, application to real data and software development In order to validate the developed
methods, numerical experiments on both simulated and real data will be conducted. To ensure the
practical interest of the proposed algorithms, several commonly used open BCI datasets will be employed.
These datasets will be selected from those available in MOABB. Furthermore, the developed software
solutions (in Python) are to be made available and integrated in MOABB to facilitate reproducibility.
[3] L. Mayaud, S. Cabanilles, A. Van Langhenhove, M. Congedo, A. Barachant, S. Pouplin, S. Filipe,
L. Pétégnief, O. Rochecouste, E. Azabou, C. Hugeron, M. Lejaille, D. Orlikowski, and D. Annane.
Brain-computer interface for the communication of acute patients: a feasibility study and a random-
ized controlled trial comparing performance with healthy participants and a traditional assistive
device. Brain-Computer Interfaces, 3(4):197–215, 2016.
[4] E.K. Kalunga, S. Chevallier, Q. Barthélemy, K. Djouani, E. Monacelli, and Y. Hamam. Online
SSVEP-based BCI using Riemannian geometry. Neurocomputing, 191:55–68, 2016.
[5] S. Chevallier, G. Bao, P. Hammami, F. Marlats, L. Mayaud, D. Annane, F. Lofaso, and E. Azabou.
Brain-machine interface for mechanical ventilation using respiratory-related evoked potential. In
International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, 2018.
[6] Y. Yang, S. Chevallier, J. Wiart, and I. Bloch. Subject-specific time-frequency selection for multi-
class motor imagery-based BCIs using few Laplacian EEG channels. Biomedical Signal Processing
and Control, 38:302–311, 2017.
[7] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger. A
review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update.
Journal of neural engineering, 15(3), 2018.
[8] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Multiclass brain–computer interface classi-
fication by Riemannian geometry. IEEE Transactions on Biomedical Engineering, 59(4):920–928,
2011.
[9] V. Jayaram and A. Barachant. MOABB: trustworthy algorithm benchmarking for BCIs. Journal of
neural engineering, 15(6):066011, 2018.
[10] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibián-Barrera. Robust statistics: theory and
methods (with R). John Wiley & Sons, 2019.
[11] S. Chevallier, E.K. Kalunga, Q. Barthélemy, and F. Yger. Riemannian classification for SSVEP
based BCI: offline versus online implementations. In BCI Handbook: Technological and Theoretical
Advances. CRC Press, 2018.
[12] E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor. Complex elliptically symmetric distributions:
Survey, new results and applications. IEEE Transactions on Signal Processing, 60(11):5597–5625,
2012.
[13] D. E. Tyler. A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, pages
234–251, 1987.
[14] P.J. Bickel, C.A.J. Klaassen, Y. Ritov, and J.A. Wellner. Efficient and Adaptive Estimation for
Semiparametric Models. Johns Hopkins University Press, 1993.
[15] M. Hallin, H. Oja, and D. Paindaveine. Semiparametrically efficient rank-based inference
for shape II. Optimal R-estimation of shape. The Annals of Statistics, 34(6):2757–2789, 2006.
[16] S. Fortunati, A. Renaux, and F. Pascal. Robust semiparametric efficient estimators in complex
elliptically symmetric distributions. IEEE Transactions on Signal Processing, 68:5003–5015, 2020.
[17] M. Hallin and D. Paindaveine. Optimal tests for multivariate location based on interdirections
and pseudo-Mahalanobis ranks. The Annals of Statistics, 30(4):1103–1133, 2002.
CentraleSupélec,
bât. Bréguet, 3, rue Joliot Curie,
91190 Gif-sur-Yvette