This repository contains the code for the EEG2Text paper presented at IEEE BigData 2024.
First, create the environment from environment.yml to install the required libraries.
Second, download the ZuCo v1.0 'Matlab files' for 'task1-SR', 'task2-NR' and 'task3-TSR' from https://osf.io/q3zws/files/ (under the 'OSF Storage' root), unzip them, and move all .mat files to ~/datasets/ZuCo/task1-SR/Matlab_files, ~/datasets/ZuCo/task2-NR/Matlab_files and ~/datasets/ZuCo/task3-TSR/Matlab_files, respectively. Then download the ZuCo v2.0 'Matlab files' for 'task1-NR' from https://osf.io/2urht/files/ (under the 'OSF Storage' root), unzip them, and move all .mat files to ~/datasets/ZuCo/task2-NR-2.0/Matlab_files.
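Before preprocessing, it can help to confirm that every target folder actually received its .mat files. The sketch below is a hypothetical helper (missing_matlab_dirs is not part of this repository); it only assumes the four folder names and the ~/datasets/ZuCo root listed in the step above.

```python
from pathlib import Path

# Folder names from the download step; task2-NR-2.0 holds the ZuCo v2.0 'task1-NR' files.
TASK_DIRS = ["task1-SR", "task2-NR", "task3-TSR", "task2-NR-2.0"]


def missing_matlab_dirs(root: Path = Path.home() / "datasets" / "ZuCo") -> list[str]:
    """Return the task names whose Matlab_files folder is absent or contains no .mat files.

    Hypothetical helper for illustration only; it does not modify anything on disk.
    """
    missing = []
    for task in TASK_DIRS:
        mat_dir = root / task / "Matlab_files"
        # any() stops at the first .mat file it finds, so large folders are cheap to check.
        if not mat_dir.is_dir() or not any(mat_dir.glob("*.mat")):
            missing.append(task)
    return missing


if __name__ == "__main__":
    gaps = missing_matlab_dirs()
    if gaps:
        print("No .mat files found for:", ", ".join(gaps))
    else:
        print("All ZuCo Matlab_files folders contain .mat files.")
```

An empty result means every folder has at least one .mat file; it does not verify that the download is complete.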
Third, run bash ./scripts/prepare_dataset_spectro.sh to preprocess the .mat files. For each task, all .mat files are converted into a single .pickle file stored at ~/datasets/ZuCo/<task_name>/<task_name>-dataset_spectro.pickle. (The suffix spectro does not mean the file is only for the spectro experiment; it distinguishes this file from the suffix-less dataset generated by the EEG-To-Text code. The base dataset produced by this script is shared by all experiments. The first three steps are essentially the same as those of EEG-To-Text: https://github.com/MikeWangWZHL/EEG-To-Text)
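To inspect one of the generated files, a minimal loading sketch follows. The helper names spectro_pickle_path and load_spectro_dataset are hypothetical, and the code deliberately makes no assumption about the internal structure of the pickled object. Because unpickling may import the libraries the object was built with, it should be run inside the environment created in the first step, and only on files you generated yourself.

```python
import pickle
from pathlib import Path

DEFAULT_ROOT = Path.home() / "datasets" / "ZuCo"


def spectro_pickle_path(task_name: str, root: Path = DEFAULT_ROOT) -> Path:
    """Build <root>/<task_name>/<task_name>-dataset_spectro.pickle (hypothetical helper)."""
    return root / task_name / f"{task_name}-dataset_spectro.pickle"


def load_spectro_dataset(task_name: str, root: Path = DEFAULT_ROOT):
    """Unpickle the preprocessed dataset for one task and return it unchanged."""
    with open(spectro_pickle_path(task_name, root), "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    path = spectro_pickle_path("task1-SR")
    if path.exists():
        data = load_spectro_dataset("task1-SR")
        # Print only the top-level type; the real layout depends on the preprocessing script.
        print(type(data))
    else:
        print("Not found:", path)
```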
Fourth, run get_dataset_xxx.py to generate either a masked or an unmasked dataset in .pkl format; the file is saved in the current folder. Here xxx denotes the dataset variant handled by that script: for example, raw means only sentence-level data is used, and spectro means the data is converted to the spectro format.
Fifth, run pretrain_xxx.sh to pretrain on the corresponding dataset, where xxx has the same meaning as in the fourth step. Comparing the runs across variants shows which data format performs best.
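Since steps four and five follow the same xxx naming convention, running several variants back to back can be scripted. The sketch below is a hypothetical driver, not part of the repository: it assumes get_dataset_<variant>.py and pretrain_<variant>.sh both sit in the working directory (adjust the paths if the scripts live elsewhere) and that "raw" and "spectro" are valid variant names, as in the examples above. It defaults to a dry run that only lists the commands.

```python
import subprocess
import sys


def step_commands(variant: str) -> list[list[str]]:
    """Commands for steps four and five for one dataset variant, e.g. "raw" or "spectro"."""
    return [
        # sys.executable reuses the interpreter of the active environment.
        [sys.executable, f"get_dataset_{variant}.py"],
        ["bash", f"pretrain_{variant}.sh"],
    ]


def run_variants(variants: list[str], dry_run: bool = True) -> list[list[str]]:
    """Run (or, when dry_run is True, only collect) the commands for each variant in order."""
    executed = []
    for variant in variants:
        for cmd in step_commands(variant):
            if not dry_run:
                # check=True raises on the first failing step, so pretraining never
                # starts on a .pkl that failed to build.
                subprocess.run(cmd, check=True)
            executed.append(cmd)
    return executed


if __name__ == "__main__":
    for cmd in run_variants(["raw", "spectro"]):
        print(" ".join(cmd))
```

Passing dry_run=False executes the commands sequentially; running the variants one after another keeps GPU usage predictable when comparing formats.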