This repository contains the implementation of the paper:
A Clustering-Based Framework for Individual Travel Behaviour Change Detection
Ye Hong, Yanan Xin, Henry Martin, Dominik Bucher, Martin Raubal
IKG, ETH Zurich
The results in the paper were obtained from the SBB Green Class dataset, which is not publicly available. We therefore provide a runnable example of the pipeline on the Geolife dataset. The steps to run the pipeline are as follows:
- Download the repo and install the necessary requirements and dependencies.
- Download the Geolife GPS tracking dataset. Unzip it and copy the Data folder into geolife/. The file structure should look like geolife/Data/000/...
- Define your working directories in utils/config.py.
- Run the utils/preProGeolife.py and utils/generateLocation.py scripts to generate trips and locations.
- Run the main_Geolife.py script for the travel behaviour change detection pipeline. The figures and detection results are saved in the config["resultFig"] folder.
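Before running the preprocessing scripts, it can help to confirm that the dataset was copied into the expected place. The following is a minimal sketch and not part of the repository: the helper name find_geolife_users is hypothetical, and it assumes only the geolife/Data/000/... layout described above, with one all-digit folder per user.

```python
from pathlib import Path


def find_geolife_users(root="geolife"):
    """Return the sorted user folder names found under <root>/Data.

    Hypothetical helper: assumes one all-digit folder per user,
    e.g. geolife/Data/000, as described in the steps above.
    """
    data_dir = Path(root) / "Data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected a Data folder at {data_dir}")
    # Keep only directories whose names are purely numeric user IDs.
    users = sorted(
        p.name for p in data_dir.iterdir() if p.is_dir() and p.name.isdigit()
    )
    if not users:
        raise FileNotFoundError(f"no user folders found in {data_dir}")
    return users
```

Calling find_geolife_users("geolife") returns the list of user folder names, or raises FileNotFoundError if the Data folder is missing or empty.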
Note: this is only for demonstration purposes, and the parameter combinations are not guaranteed to produce meaningful results.
Main entry points for the SBB and Geolife datasets:
- main_Geolife.py: the whole pipeline for the Geolife dataset.
- main_SBB.py: the whole pipeline for the SBB dataset.
Files containing the different steps of the pipeline:
- getActivitySet.py: generate the activity set and the important trip set.
- similarityMeasures.py: similarity measurement.
- clustering.py: clustering.
- clusterVisualization.py: clustering result analysis and plots.
- changeDetection.py: change detection algorithms and result plots.
- Jupyter notebook scripts:
  - stat.ipynb: get the preprocessed data size, proof of stability for the important trip set, and top1 location change detection (a proxy for home changes).
  - tracking_quality.ipynb: select users based on tracking coverage.
- Helper scripts in the utils/ folder:
  - config.py: define data paths for intermediate results.
  - data_figure.py: helper function to generate data for Figure 2.
  - generateLocation.py: location generation from stay points.
  - preProSBB.py: data loading and preprocessing (trip generation) for the SBB dataset.
  - preProGeolife.py: data loading and preprocessing (trip generation) for the Geolife dataset.
Users are pre-filtered based on overall and sliding-window tracking quality:
- The user is tracked for more than 300 days.
- For each time window of 10 weeks, the user's tracking quality is above 0.6.
All time series are cut at 2017-12-25, when the main study ends.
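The pre-filtering rules can be sketched as follows. This is an illustration under stated assumptions, not the code in tracking_quality.ipynb. It assumes that tracking quality is the mean fraction of each day covered by tracking records, that "tracked days" are days with any coverage, that the 10-week window slides one day at a time, and that the cutoff date is inclusive. The function name, its parameters, and the daily_coverage input format are all hypothetical.

```python
from datetime import date, timedelta


def passes_prefilter(daily_coverage, min_days=300, window_days=70,
                     min_quality=0.6, cutoff=date(2017, 12, 25)):
    """Check one user against the pre-filtering rules (sketch).

    daily_coverage: dict mapping datetime.date -> fraction of that day
    covered by tracking records (0.0 to 1.0). Missing days count as 0.
    Assumption: "tracking quality" is the mean daily coverage in a window.
    """
    # Cut the time series at the end of the main study.
    days = {d: c for d, c in daily_coverage.items() if d <= cutoff}
    tracked = [d for d, c in days.items() if c > 0]
    if len(tracked) <= min_days:
        return False

    # Build a gap-free daily series between the first and last tracked day.
    start, end = min(tracked), max(tracked)
    n_days = (end - start).days + 1
    series = [days.get(start + timedelta(days=i), 0.0) for i in range(n_days)]

    # Slide a 10-week window one day at a time over the tracked period.
    window_sum = sum(series[:window_days])
    if window_sum / window_days <= min_quality:
        return False
    for i in range(window_days, n_days):
        window_sum += series[i] - series[i - window_days]
        if window_sum / window_days <= min_quality:
            return False
    return True
```

A user is kept only if the function returns True. A different definition of tracking quality or a different window step would change which users pass.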
User selection for Figures:
- for demonstrating cluster result (Figure 3): user 1659.
- for demonstrating change detection results (Figure 4): user 1659.
- for comparing different users (Figure 5): (A) user 1632, (B) user 1641, (C) user 1620, and (D) user 1630.
Users who changed their top1 location during the study (a proxy for home location change):
- once: users 1651, 1624, 1608
- twice: users 1650 (probably a holiday house), 1620 (intercontinental travel, probably for business reasons)
- multiple times (probably multiple homes or a holiday house): users 1631, 1630
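Counting top1 location changes can be sketched as below. This is an illustration under stated assumptions, not the procedure in stat.ipynb: it assumes each time window is summarised by a hypothetical mapping from location ID to total stay duration, takes the top1 location to be the one with the longest duration, and counts a change whenever the top1 location differs from that of the previous non-empty window.

```python
def count_top1_changes(windows):
    """Count how often the top1 location changes between consecutive windows.

    windows: list of dicts, one per time window, mapping a location ID to
    the total stay duration in that window (hypothetical input format).
    Empty windows are skipped.
    """
    changes = 0
    previous = None
    for durations in windows:
        if not durations:
            continue
        # Assumption: the top1 location is the one with the longest stay.
        top1 = max(durations, key=durations.get)
        if previous is not None and top1 != previous:
            changes += 1
        previous = top1
    return changes
```

Under this counting, a temporary move away and back registers as two changes.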
Requirements and dependencies:
- NumPy
- Pandas
- GeoPandas
- Matplotlib
- trackintel
- tqdm
- scikit-learn-extra
If you find this code useful for your work or use it in your project, please consider citing:
@InProceedings{Hong_2021_GIScience,
author = {Hong, Ye and Xin, Yanan and Martin, Henry and Bucher, Dominik and Raubal, Martin},
title = {A Clustering-Based Framework for Individual Travel Behaviour Change Detection},
booktitle = {11th International Conference on Geographic Information Science (GIScience 2021) - Part II},
pages = {4:1--4:15},
year = {2021},
volume = {208},
doi = {10.4230/LIPIcs.GIScience.2021.II.4},
}
If you have any questions, please let me know:
- Ye Hong {hongy@ethz.ch}