Data partitions in BCI make a difference

As in any machine-learning problem, care must be taken in how the brain signals in a dataset are used in the context of BCI. Otherwise, some form of data leakage may occur, affecting the results obtained and, in turn, the reported performance. It is therefore critical, when conducting experiments and reporting performance, to perform data partitioning properly and, in any case, to state clearly how data splits are performed, why, and what implications this may have. Not following these good practices may hinder progress in the field, and may confuse or frustrate authors who try to understand, make sense of, or reproduce other authors' approaches. One big danger is attributing (perhaps surprisingly) high performance to the proposed methodology, when data partitioning may explain much of the reported state-of-the-art performance (maybe more than, for instance, a newly proposed deep learning model).

In this paper we perform a detailed analysis of the effect of data partitioning on performance under different conditions, for the case of emotion recognition from electroencephalogram (EEG) signals elicited by video stimuli. Three data splits are considered, each representing a relevant BCI task: subject-independent (affective decoding), video-independent (affective annotation), and time-based (feature extraction). We find that classification accuracy may change substantially (e.g. from 50% to 90%) depending on how the data are partitioned.
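The three partitioning strategies above can be illustrated in code. The following sketch, using a synthetic dataset with hypothetical subject, video, and time annotations (all names and shapes are illustrative, not the paper's actual setup), shows how scikit-learn's `GroupShuffleSplit` can enforce subject- and video-independent splits, and how a chronological cut yields a time-based split:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Synthetic "EEG" trials, each labeled with a subject, a stimulus video,
# and a chronological index (all hypothetical placeholders).
n_trials = 120
X = rng.normal(size=(n_trials, 32))            # 32-dim feature vectors
subjects = rng.integers(0, 10, size=n_trials)  # 10 subjects
videos = rng.integers(0, 8, size=n_trials)     # 8 stimulus videos
time_idx = np.arange(n_trials)                 # trial order in time

gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)

# 1) Subject-independent: no subject appears in both train and test.
train_s, test_s = next(gss.split(X, groups=subjects))
assert set(subjects[train_s]).isdisjoint(subjects[test_s])

# 2) Video-independent: no stimulus video appears in both sets.
train_v, test_v = next(gss.split(X, groups=videos))
assert set(videos[train_v]).isdisjoint(videos[test_v])

# 3) Time-based: train on earlier trials, test on later ones,
#    so temporally adjacent (correlated) samples do not leak across sets.
cut = int(0.7 * n_trials)
train_t, test_t = time_idx[:cut], time_idx[cut:]
assert train_t.max() < test_t.min()
```

A naive random split over trials would violate all three constraints at once, mixing data from the same subject, video, and time window across train and test sets, which is one way the leakage discussed above can arise.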

Moreno-Alcayde, Y., Traver, V.J. & Leiva, L.A. Sneaky emotions: impact of data partitions in affective computing experiments with brain-computer interfacing. Biomed. Eng. Lett. 14, 103–113 (2024).