
Speaker Attribution in German Parliamentary Debates through BERT models

This repository holds the code for the submission “Politics, BERTed: Automatic Attribution of Speech Events in German Parliamentary Debates” to the KONVENS 2023 Shared Task on Speaker Attribution, Task 1.

The accompanying paper is available as a PDF in the full proceedings.

The goal of the shared task is to automatically identify speech events in parliamentary debates and attribute them to their respective speakers, essentially determining who says what to whom.

The task is divided into two subtasks:

  • Task 1a is the full task, predicting both cue spans and their associated role spans.
  • Task 1b is the role prediction task only, where gold cue spans are already given.

Citing

If you are using the software and/or models, please consider citing the accompanying publication:

Ehrmanntraut, Anton. 2023. “Politics, BERTed: Automatic Attribution of Speech Events in German Parliamentary Debates.” In Proceedings of the GermEval 2023 Shared Task on Speaker Attribution in Newswire and Parliamentary Debates (SpkAtt-2023), edited by Ines Rehbein, Fynn Petersen-Frey, Annelen Brunner, Josef Ruppenhofer, Chris Biemann, and Simone Paolo Ponzetto, 22–30. Ingolstadt, Germany.

@inproceedings{ehrmanntraut_politics_2023,
	location = {Ingolstadt, Germany},
	title = {Politics, {BERTed}: Automatic Attribution of Speech Events in German Parliamentary Debates},
	pages = {22--30},
	booktitle = {Proceedings of the {GermEval} 2023 Shared Task on Speaker Attribution in Newswire and Parliamentary Debates ({SpkAtt}-2023)},
	author = {Ehrmanntraut, Anton},
	editor = {Rehbein, Ines and Petersen-Frey, Fynn and Brunner, Annelen and Ruppenhofer, Josef and Biemann, Chris and Ponzetto, Simone Paolo},
	date = {2023-09-18}
}

Models

Used Base Model     | SpkAtt-F1 (test set) | Match-F1 (dev set) | Download
--------------------|----------------------|--------------------|---------
aehrm/gepabert      | 82.8                 | 84.8               | Link
deepset/gbert-large | (not evaluated)      | 84.4               | Link
deepset/gbert-base  | (not evaluated)      | 81.2               | Link

Setup

The project uses Poetry for dependency management. Simply run poetry install to install all dependencies.

You can open a shell with the required Python packages and interpreter via poetry shell. Alternatively, run scripts with the project's Python interpreter via poetry run python <script.py>.
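
A typical setup session therefore looks like this (assuming Poetry itself is already installed on your system):

# install all project dependencies into a virtual environment
poetry install

# either open a shell inside that environment ...
poetry shell

# ... or run individual scripts through the project interpreter
poetry run python <script.py>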

Usage

Inference

Before inference, you either need to download the published models and place them into the models/ folder, or train the models yourself (see below).

After the models/ folder has been populated, you can run the full inference (1a) like this:

# e.g., download the GePaBERT models
(cd models; wget https://github.com/aehrm/spkatt_gepade/releases/download/konvens/gepabert_models.tar; tar xf gepabert_models.tar;)


# adjust if needed
#export PEFT_MODEL_DIR=./models

poetry run bash ./predict.sh 1a <input_dir> [output_dir]

The input_dir should hold the tokenized speeches as JSON files, in the same format as the GePaDe test dataset (the one provided for the shared task).
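
As a rough sanity check before invoking the pipeline, you can verify that every file in the input directory parses as JSON; this is not part of the repository's tooling, and the path below is only an example:

# report input files that are not well-formed JSON (example path)
for f in SpkAtt-2023-master/data/dev/task1/*.json; do
    python -m json.tool "$f" > /dev/null || echo "invalid JSON: $f"
done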

E.g., to reproduce the results, run

wget https://github.com/umanlp/SpkAtt-2023/archive/refs/heads/master.zip -O gepade.zip
unzip gepade.zip

poetry run bash ./predict.sh 1a SpkAtt-2023-master/data/dev/task1 [output_dir]

Alternatively, you can run subtask 1b (role prediction from gold cues) as follows, e.g., on the GePaDe dev dataset. Make sure the JSON files in input_dir contain annotation objects with cue spans.

poetry run bash ./predict.sh 1b path/to/spkatt_data/dev/task1 [output_dir]
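
For subtask 1b the input files must already carry cue annotations. A crude pre-flight check could grep for the cue field; note that the key name "Cue" is an assumption here and may need to be adjusted to the actual annotation schema of your data:

# list files that never mention a "Cue" field (key name is an assumption)
grep -L '"Cue"' path/to/spkatt_data/dev/task1/*.json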

Training

After downloading the full GePaDe dataset into the data/ folder, you can run the training like this:

# adjust if needed
#export BASE_MODEL_NAME=aehrm/gepabert
#export PEFT_MODEL_DIR=./models
#export TRAIN_FILES='./data/train/task1'
#export DEV_FILES='./data/dev/task1'

poetry run python ./train_cue_detector.py
poetry run python ./train_cue_joiner.py
poetry run python ./train_role_detector.py
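
The base model and data locations appear to be controlled through the environment variables above. For instance, a run that fine-tunes on top of deepset/gbert-large instead of GePaBERT might look like the following sketch (the output directory name is arbitrary):

# train the three components on top of gbert-large
export BASE_MODEL_NAME=deepset/gbert-large
export PEFT_MODEL_DIR=./models_gbert_large
export TRAIN_FILES=./data/train/task1
export DEV_FILES=./data/dev/task1

poetry run python ./train_cue_detector.py
poetry run python ./train_cue_joiner.py
poetry run python ./train_role_detector.py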
