OTTO Recommender Systems Dataset

A real-world e-commerce dataset for session-based recommender systems research.

The OTTO session dataset is a large-scale, industry-grade dataset designed to bridge the gap between academic research and real-world applications in session-based and sequential recommendation. It features anonymized behavior logs from the OTTO webshop and app, supporting both multi-objective (predicting clicks, carts, and orders) and single-objective tasks. With ready-to-use formats, clear evaluation metrics, and a focus on realistic, scalable research, this dataset aims to drive innovation in the recommender systems community and has been featured in our own Kaggle competition.

Key Features

12M real-world anonymized user sessions
220M events, consisting of clicks, carts and orders
1.8M unique articles in the catalogue
Ready-to-use data in .jsonl format
Evaluation metrics for single and multi-objective tasks

Dataset Statistics

Dataset	#sessions	#items	#events	#clicks	#carts	#orders	Density [%]
Train	12,899,779	1,855,603	216,716,096	194,720,954	16,896,191	5,098,951	0.0005
Test	1,671,803	1,019,357	13,851,293	12,340,303	1,155,698	355,292	0.0005

	mean	std	min	50%	75%	90%	95%	max
Train #events per session	16.80	33.58	2	6	15	39	68	500
Test #events per session	8.29	13.74	2	4	8	18	28	498

#events per session histogram (90th percentile)

	mean	std	min	50%	75%	90%	95%	max
Train #events per item	116.79	728.85	3	20	56	183	398	129004
Test #events per item	13.59	70.48	1	3	9	24	46	17068

#events per item histogram (90th percentile)

Get the Data

The data is stored on the Kaggle platform and can be downloaded using their API:

kaggle datasets download -d otto/recsys-dataset

Data Format

The sessions are stored as JSON objects containing a unique session ID and a list of events:

{
    "session": 42,
    "events": [
        { "aid": 0, "ts": 1661200010000, "type": "clicks" },
        { "aid": 1, "ts": 1661200020000, "type": "clicks" },
        { "aid": 2, "ts": 1661200030000, "type": "clicks" },
        { "aid": 2, "ts": 1661200040000, "type": "carts"  },
        { "aid": 3, "ts": 1661200050000, "type": "clicks" },
        { "aid": 3, "ts": 1661200060000, "type": "carts"  },
        { "aid": 4, "ts": 1661200070000, "type": "clicks" },
        { "aid": 2, "ts": 1661200080000, "type": "orders" },
        { "aid": 3, "ts": 1661200080000, "type": "orders" }
    ]
}

session - the unique session id
events - the time-ordered sequence of events in the session
- aid - the article id (product code) of the associated event
- ts - the Unix timestamp of the event, in milliseconds
- type - the event type, i.e., whether a product was clicked, added to the user's cart, or ordered during the session
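
As a minimal sketch of how this format can be consumed, the snippet below reads sessions line by line from a .jsonl file and counts events per type. It assumes one JSON object per line, as in the example above. The helper names `iter_sessions` and `count_event_types` are illustrative and not part of the repository's code.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator


def iter_sessions(path: str) -> Iterator[Dict]:
    """Yield one session dict per non-empty line of a .jsonl file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_event_types(sessions: Iterable[Dict]) -> Counter:
    """Count how often each event type occurs across all sessions."""
    counts: Counter = Counter()
    for session in sessions:
        for event in session["events"]:
            counts[event["type"]] += 1
    return counts
```

Reading lazily with a generator avoids loading the whole file into memory. A call such as `count_event_types(iter_sessions("train.jsonl"))` would stream through the file once, where the file name `train.jsonl` is an assumption about the downloaded archive's contents.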

Train/Test Split

To evaluate a model's ability to predict future behavior, as required for deployment in a real-world webshop, we use a time-based validation split. The training set includes user sessions from a 4-week period, while the test set contains sessions from the following week. To prevent information leakage, any training sessions overlapping with the test period were trimmed, ensuring a clear separation between past and future data. The diagram below illustrates this process:
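
The trimming step can also be sketched in code. This is an illustration under stated assumptions, not the script used to create the official split: `split_ts` is a hypothetical cutoff timestamp marking the start of the test period, `trim_train_session` is an illustrative name, and the `min_events` threshold of 2 is assumed from the minimum session length reported in the dataset statistics.

```python
from typing import Dict, Optional


def trim_train_session(session: Dict, split_ts: int, min_events: int = 2) -> Optional[Dict]:
    """Drop all events at or after split_ts from a training session.

    Returns None when fewer than min_events events remain
    (the threshold is an assumption, not a documented rule).
    """
    kept = [event for event in session["events"] if event["ts"] < split_ts]
    if len(kept) < min_events:
        return None
    return {"session": session["session"], "events": kept}
```

Cutting each training session at the boundary, rather than discarding overlapping sessions entirely, keeps the events that happened before the test period while removing anything a deployed model could not have seen.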

Evaluation Metrics

To ensure research relevance and industry applicability, we provide standardized evaluation protocols that closely correlate with real-world performance. For consistent and reliable benchmarking, we strongly recommend:

Using the provided train/test split, without leaving out any items or sessions in the evaluation, to ensure direct comparability with other research results
Evaluating on the entire test sequences without truncation
Avoiding sampling during evaluation, as it leads to misleading results (see details here)

Single-Objective Evaluation

For click prediction tasks, we recommend using Recall@20 (preferred) and MRR@20, which have demonstrated strong correlation with business impact metrics in our production systems, as validated in our research paper.

Model	Recall@20	MRR@20	Epochs/h
GRU4Rec⁺	0.443	0.205	0.019
SASRec	0.307	0.180	0.248
TRON	0.472	0.219	0.227
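
For reference, Recall@20 and MRR@20 for next-item prediction can be sketched as follows. This uses the common definitions for a single ground-truth item per prediction and is not necessarily identical to the evaluation code behind the numbers above. The function names and the input layout (one ranked list of article ids per target) are assumptions for illustration.

```python
from typing import Sequence


def recall_at_k(predictions: Sequence[Sequence[int]], targets: Sequence[int], k: int = 20) -> float:
    """Fraction of targets that appear in the top-k of their ranked prediction list."""
    hits = sum(1 for preds, target in zip(predictions, targets) if target in preds[:k])
    return hits / len(targets)


def mrr_at_k(predictions: Sequence[Sequence[int]], targets: Sequence[int], k: int = 20) -> float:
    """Mean reciprocal rank of the target within the top-k (0 if it is missing)."""
    total = 0.0
    for preds, target in zip(predictions, targets):
        top_k = list(preds[:k])
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)  # ranks are 1-based
    return total / len(targets)
```

Recall@20 only asks whether the target is among the first 20 recommendations, while MRR@20 additionally rewards placing it near the top of the list.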

Multi-Objective Evaluation

For models predicting multiple user actions, we offer two approaches:

Joint Recall Metric: Developed for our Kaggle competition, this metric combines the recall scores for clicks, cart additions, and orders into a single measure
MultiTRON: An approach that optimizes for clicks and orders simultaneously, allowing for evaluation of different preference trade-offs as detailed in our research paper
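
A rough sketch of a joint recall computation is shown below. The weights (0.10 for clicks, 0.30 for carts, 0.60 for orders) and the per-type recall definition follow the Kaggle competition's description as far as we recall it. They are not stated in this section, so treat them as assumptions and consult KAGGLE.md for the authoritative definition. The function names and the dictionary layout are illustrative.

```python
from typing import Dict, List, Set

# Assumed weights per event type; see KAGGLE.md for the authoritative values.
WEIGHTS = {"clicks": 0.10, "carts": 0.30, "orders": 0.60}


def type_recall(predictions: Dict[int, List[int]], labels: Dict[int, Set[int]], k: int = 20) -> float:
    """Recall for one event type: total hits over total possible hits, capped at k per session."""
    hits = 0
    possible = 0
    for session_id, truth in labels.items():
        if not truth:
            continue  # sessions without ground truth for this type do not count
        top_k = set(predictions.get(session_id, [])[:k])
        hits += len(top_k & truth)
        possible += min(k, len(truth))
    return hits / possible if possible else 0.0


def joint_recall(
    predictions_by_type: Dict[str, Dict[int, List[int]]],
    labels_by_type: Dict[str, Dict[int, Set[int]]],
    k: int = 20,
) -> float:
    """Weighted sum of the per-type recalls."""
    return sum(
        weight * type_recall(predictions_by_type.get(t, {}), labels_by_type.get(t, {}), k)
        for t, weight in WEIGHTS.items()
    )
```

Here both inputs map an event type to a dictionary keyed by session id: ranked article ids for predictions, and the set of ground-truth article ids for labels.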

Note that multi-objective recommendation evaluation remains an active research area without definitive benchmarks. We welcome further research and contributions to improve evaluation methodologies for these complex scenarios.

Kaggle Competition

For detailed usage instructions and evaluation guidelines regarding the competition, please refer to the KAGGLE.md file.

FAQ

How is a user `session` defined?

A session comprises all activity by a single user in either the train or the test set.

Are there identical users in the train and test data?

No, train and test users are completely disjoint.

Are all test `aids` included in the train set?

Yes, all test items are also included in the train set.

How can a session start with an order or a cart?

This can happen if the ordered item was already in the customer's cart before the data extraction period started. Similarly, a wishlist in our shop can lead to cart additions without a previous click.

Are `aids` the same as article numbers on otto.de?

No, all article and session IDs are anonymized.

Are most of the clicks generated by our current recommendations?

No, our current recommendations generated only about 20% of the product page views in the dataset. Most users reached product pages via search results and product lists.

License

The OTTO dataset is released under the CC-BY 4.0 License, while the code is licensed under the MIT License.

Citation

BibTeX entry:

@online{philipp_normann_sophie_baumeister_timo_wilm_2023,
 title={OTTO Recommender Systems Dataset: A real-world e-commerce dataset for session-based recommender systems research},
 url={https://www.kaggle.com/dsv/4991874},
 doi={10.34740/KAGGLE/DSV/4991874},
 publisher={Kaggle},
 author={Philipp Normann and Sophie Baumeister and Timo Wilm},
 year={2023}
}
