- Paper: A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition
- Datasets Collection: tomg-group-umd/FictionalQA
- (6/18/25) Initial dataset generation code release.
- (6/5/25) Paper posted to arXiv.
FictionalQA is a dataset created to help researchers study the dual processes of fact memorization and verbatim sequence memorization. It consists of synthetically generated, webtext-like documents about fictional events and the facts they entail, along with question-answer pairs about the facts in those documents.
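As a rough illustration of the two kinds of records described above, the following sketch models a fictional document and a question-answer pair that points back to it. The class names, field names (`doc_id`, `text`, `question`, `answer`), and the helper `answer_in_document` are all hypothetical and chosen for illustration only; the released dataset may use a different schema. The example record is invented placeholder text, not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FictionalDocument:
    # Hypothetical fields; the real dataset schema may differ.
    doc_id: str
    text: str


@dataclass(frozen=True)
class QAPair:
    # Hypothetical fields; doc_id links the question to its source document.
    doc_id: str
    question: str
    answer: str


def answer_in_document(qa: QAPair, docs: Mapping[str, FictionalDocument]) -> bool:
    """Return True if the answer appears (case-insensitively) in the linked document."""
    doc = docs[qa.doc_id]
    return qa.answer.lower() in doc.text.lower()


# Invented placeholder record, not from the dataset.
docs = {
    "d1": FictionalDocument(
        doc_id="d1",
        text="The Harbor Lantern Festival was founded in Port Ysolde by Mira Tallow.",
    )
}
qa = QAPair(doc_id="d1", question="Who founded the Harbor Lantern Festival?", answer="Mira Tallow")

print(answer_in_document(qa, docs))
```

Keeping the document text and the question-answer pairs as separate records linked by an identifier makes it possible to probe fact recall (through the questions) independently of verbatim recall (through the document text).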
This repository contains a refactored version of the dataset construction code used to produce the dataset for the paper.
We are working on demo and usage instructions, so check back soon!