
Neuroscience meets ML

Building models to encode the human brain.

On the left, images from COCO (specifically #10 and #15).

On the right, a 2D visualisation of the fMRI BOLD signal in the left and right hemispheres while viewing the images, plotted on a flattened FreeSurfer fsaverage surface.

Originally developed as the final project for the Artificial Intelligence for Games and Simulations course.

TL;DR

We trained models to encode the most relevant features of fMRI data (the training set) into a latent vector. The fMRI data were recorded while subjects looked at images that may or may not contain a human. The model was never explicitly trained to separate the two categories; it simply encodes highly complex fMRI data into a small vector. By running inference on data of known category (the test set), we observe that the encoded vectors are distinguishable by category.

Research Idea, Methods, and Results

The Natural Scenes Dataset (NSD) is a large-scale fMRI dataset acquired at ultra-high field (7T). It consists of whole-brain, high-resolution (1.8-mm isotropic voxels, 1.6-s sampling rate) fMRI measurements of 8 healthy adult subjects while they viewed thousands of colour natural scenes (adapted from their website).

The images are part of the COCO dataset and are labelled with categories. In particular, we were interested in images that contain a person/human body (the person category) and those that do not. Each subject viewed a total of ~10,000 images; a small portion (~1,000 images) was shared across all participants, while the rest were subject-specific, i.e. seen only by that subject. We selected a specific Region of Interest (ROI) in the visual cortex called floc-bodies, which specialises in recognising human bodies.
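
As an illustration, here is a minimal sketch of how such a person/non-person split can be derived from the COCO annotations, assuming pycocotools; the annotation path follows the dataset layout described below, and the variable names are illustrative rather than the exact procedure we used:

from pycocotools.coco import COCO

# Load the COCO instance annotations (path follows the layout under
# "Setting up the environment")
coco = COCO("dataset/coco/annotations/instances_train2017.json")

# The "person" category has its own ID; look it up by name
person_cat_ids = coco.getCatIds(catNms=["person"])

# Images containing at least one person annotation
person_img_ids = set(coco.getImgIds(catIds=person_cat_ids))

# Every remaining image forms the non-person set
non_person_img_ids = set(coco.getImgIds()) - person_img_ids

print(len(person_img_ids), "person images,", len(non_person_img_ids), "non-person images")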

For each subject, we trained a Sparse Autoencoder (SAE) on the BOLD signals from the subject-specific images (our training set). This way, the model learned to extract key features of the subject's brain state while looking at images.
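
For reference, a minimal sketch of a sparse autoencoder of this kind, assuming PyTorch; the layer sizes, sparsity weight, and names are illustrative, not the exact architecture we trained:

import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Compress a high-dimensional BOLD vector into a small latent code."""

    def __init__(self, n_voxels: int, n_latent: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_voxels, 256), nn.ReLU(),
            nn.Linear(256, n_latent), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(),
            nn.Linear(256, n_voxels),
        )

    def forward(self, x):
        z = self.encoder(x)        # latent vector
        return self.decoder(z), z  # reconstruction and latent code

def sae_loss(x, x_hat, z, l1_weight=1e-3):
    # Reconstruction error plus an L1 penalty that keeps the latent code sparse
    return nn.functional.mse_loss(x_hat, x) + l1_weight * z.abs().mean()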

The result was a set of 8 trained models, one per subject, which we used for inference on the shared images (our test set). We ran inference on ~600 images, split evenly between the person and non-person categories. We plotted the resulting vectors with t-SNE, observing different distributions for the two categories, which the model was completely unaware of during training.
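
A sketch of this visualisation step, assuming scikit-learn and matplotlib; `latents` and `is_person` below are random placeholders for the SAE-encoded test vectors and their known labels:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholders: in practice, `latents` holds the encoded test-set vectors and
# `is_person` the known category of each image (1 = person, 0 = non-person)
latents = np.random.rand(600, 32)
is_person = np.random.randint(0, 2, size=600)

# Project the latent vectors to 2D and colour points by category
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(latents)
plt.scatter(embedding[:, 0], embedding[:, 1], c=is_person, cmap="coolwarm", s=10)
plt.title("t-SNE of SAE latent vectors by category")
plt.show()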

(figure: per-subject t-SNE plots) _Note: greyed-out squares are non-significant subjects_

Nevertheless, this does not allow us to state that the model specifically encodes information about this categorisation. Natural scene images present a large spectrum of features, which in turn produce complex activation patterns in the brain. Those features most likely overlap between person and non-person stimuli, making it difficult for the model to distinguish the categories.

Moreover, this is a very small-scale experiment, and we specifically selected floc-bodies because it is known in the literature to be sensitive to stimuli containing bodies. Our results suggest that our sparse autoencoder, trained on activations within the floc-bodies region, might be able to capture information about whether the participant was looking at an image with or without a human body in it.

More detailed information can be found in our Research Report.

Setting up the environment

  1. Install Python 3.11.10
# macOS
brew install pyenv

# Windows PowerShell - not tested
Invoke-WebRequest -UseBasicParsing -Uri "https://raw.githubusercontent.com/pyenv-win/pyenv-win/master/pyenv-win/install-pyenv-win.ps1" -OutFile "./install-pyenv-win.ps1"; &"./install-pyenv-win.ps1"

# then
pyenv install 3.11.10 # or 3.11.9 if .10 is not available
  2. Create and activate a virtual environment
pyenv exec python3 -m venv .venv
source .venv/bin/activate # macOS

.venv\Scripts\Activate.ps1 # Windows PowerShell - not tested
  3. Install dependencies
pip install -r requirements.txt
  4. Download datasets
  • create dataset structure
mkdir dataset
mkdir dataset/coco/ dataset/nsd_data/
  • download coco annotations
cd dataset/coco/
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
unzip annotations_trainval2017.zip
unzip panoptic_annotations_trainval2017.zip
rm annotations_trainval2017.zip panoptic_annotations_trainval2017.zip
  • download the Algonauts dataset: visit the Algonauts Challenge form and fill it in to get access to the Google Drive folder containing the unzipped dataset for each subject.

The resulting structure should be the following:

dataset/
  nsd_coco.csv
  coco/
    annotations/
    panoptic_annotations/
  nsd_data/
    subj01/
    ...
    subj08/
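
A quick, hypothetical sanity check that the layout above is in place (paths taken from the tree):

from pathlib import Path

# Verify that the key files and folders from the tree above exist
root = Path("dataset")
expected = [
    root / "nsd_coco.csv",
    root / "coco" / "annotations",
    root / "coco" / "panoptic_annotations",
    root / "nsd_data" / "subj01",
]
for path in expected:
    print(f"{path}: {'ok' if path.exists() else 'MISSING'}")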

Run the training

...

Run inference

...
