VTI-Data

NLU and NLG datasets developed within the Latvian Language Technology Initiative

Alpaca Latvian dataset

ALPACA-LV is a machine-translated version of the Alpaca instruction dataset for Latvian.

COPA

COPA is a machine-translated version of the COPA benchmark dataset for Latvian.

MMLU

MMLU is a machine-translated version of the MMLU benchmark dataset for Latvian. The sociology_postedited.json file contains a post-edited version of the first 100 tasks in the sociology subject.
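
To inspect a file such as sociology_postedited.json before using it, a minimal sketch along the following lines may help. The JSON schema is not documented here, so the code assumes no field names and only reports the top-level type and item count. The helper names `load_json_tasks` and `summarize` are hypothetical, and the path in the comment is an assumption about where the file sits in the repository.

```python
import json
from pathlib import Path


def load_json_tasks(path):
    """Load a JSON file and return its parsed content.

    No field names are assumed, because the schema is not documented here.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Return (top-level type name, number of top-level items)."""
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


# Example call (the path is an assumption, adjust it to your checkout):
# data = load_json_tasks("mmlu/sociology_postedited.json")
# print(summarize(data))
```

Checking the top-level structure first avoids guessing at keys; once the actual layout is known, the individual tasks can be accessed directly.
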

LV-exams

Multiple-choice questions (MCQ) from Latvian Centralized High School Exams.

Citation

If you find this useful in your research, please consider citing:

@inproceedings{dargis-etal-2024-evaluating,
  author = "Darģis, Roberts and Bārzdiņš, Guntis and Skadiņa, Inguna and Grūzītis, Normunds and Saulīte, Baiba",
  title = "Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams",
  year = 2024,
  booktitle = "Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities",
  publisher = "Association for Computational Linguistics",
  pages = "289--293",
  month = "Nov",
  url = "https://aclanthology.org/2024.nlp4dh-1.28.pdf"
}

@inproceedings{Skadina-EtAl:2025,
  author = "Skadiņa, Inguna and Bakanovs, Bruno and Darģis, Roberts",
  title = "First Steps in Benchmarking Latvian in Large Language Models",
  year = 2025,
  booktitle = "Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)",
  publisher = "University of Tartu Library",
  pages = "86--95",
  url = "https://hdl.handle.net/10062/107120"
}
