earthat/transdim: Data imputation for urban transportation systems


transdim

Transportation data imputation (transdim).

Strategic aim

Creating accurate and efficient solutions for the spatio-temporal traffic data imputation and prediction tasks.

Tasks and challenges

  • Missing data imputation

    • Random missing: Each sensor loses observations completely at random. (★★★)
    • Non-random missing: Each sensor loses observations over several days. (★★★★)
  • Rolling traffic prediction

    • Forecasting without missing values. (★★★)
    • Forecasting with incomplete observations. (★★★★★)

What we are doing right now!

  • add a framework indicating overall studies;

Framework: The tensor completion task and its workflow, including data organization and tensor completion, where traffic measurements are only partially observed.

  • define the problems clearly;

    • Example: Traffic forecasting using matrix factorization models.

Real experiment setting: Observations from the first 56 days, with 0%, 20%, and 40% fiber missing rates, are treated as stationary inputs. Rolling inputs are then used to forecast traffic speed over the last 5 days (Monday to Friday) in a rolling manner.

  • describe the core challenges intuitively;
  • list main contributions of these studies.
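The forecasting setting above (56 stationary days followed by 5 rolling days) can be sketched as a data split. This is only an illustration under stated assumptions: the matrix size (214 road segments, 144 intervals per day) is borrowed from the Guangzhou data set described later, the speeds are random placeholders, and `rolling_inputs` is a hypothetical helper, not a function from this repository.

```python
import numpy as np

# Assumed sizes: 214 road segments, 61 days, 144 time intervals per day.
n_segments, n_days, n_intervals = 214, 61, 144
train_days, test_days = 56, 5

# Placeholder speeds standing in for the real data (hypothetical values).
rng = np.random.default_rng(0)
speed_mat = rng.uniform(10.0, 60.0, size=(n_segments, n_days * n_intervals))

# The first 56 days form the stationary input.
split = train_days * n_intervals
stationary_input = speed_mat[:, :split]

def rolling_inputs(mat, start, horizon):
    """Yield (history, target) pairs; the history grows by one step each time."""
    for t in range(start, mat.shape[1] - horizon + 1):
        yield mat[:, :t], mat[:, t:t + horizon]

# One-step-ahead rolling forecasts over the last 5 days.
pairs = list(rolling_inputs(speed_mat, split, horizon=1))
```

Each pair holds everything observed so far and the next interval to forecast, so a model can be updated or re-run at every step of the last 5 days.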

What we care about!

  • Best algebraic structure for data imputation.
  • The context of urban transportation (e.g., biases).
  • Data noise avoidance.
  • Competitive imputation and prediction performance.
  • Capability to handle various missing data scenarios.

Overview

With the development and application of intelligent transportation systems, large quantities of urban traffic data are collected continuously from various sources, such as loop detectors, cameras, and floating vehicles. These data sets capture the underlying states and dynamics of transportation networks and benefit many traffic operation and management applications, including routing, signal control, and travel time prediction. However, missing data are inevitable when collecting traffic data from intelligent transportation systems.

Publicly available at our Zenodo repository!

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with a third-order tensor representation, where the estimated result for road segment #1 is selected as an example. In both panels, red rectangles represent fiber missing (i.e., speed observations are lost for a whole day).
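To make the role of the CP rank concrete, here is a minimal sketch of how three factor matrices of rank r=15 combine into an estimated third-order tensor. It shows only the CP reconstruction step, not BGCP's Bayesian inference; the factor matrices are random placeholders, the tensor size (214, 61, 144) is taken from the data set description, and `cp_reconstruct` is a hypothetical helper name.

```python
import numpy as np

def cp_reconstruct(U, V, X):
    """Combine CP factor matrices into a third-order tensor.

    Entry (i, j, k) is the sum over r of U[i, r] * V[j, r] * X[k, r].
    """
    return np.einsum('ir,jr,kr->ijk', U, V, X)

rank = 15  # CP rank from the caption above
rng = np.random.default_rng(0)

# Random placeholder factors (a fitted model would learn these from data).
U = rng.normal(size=(214, rank))   # road segment factors
V = rng.normal(size=(61, rank))    # day factors
X = rng.normal(size=(144, rank))   # time-of-day factors

tensor_hat = cp_reconstruct(U, V, X)
```

In an imputation setting, the entries of `tensor_hat` at missing positions would serve as the estimates.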

Machine learning models

  • Missing data imputation

The urban traffic speed data set (i.e., the Guangzhou data set, Gdata) contains traffic speed data from 214 road segments over two months (61 days from August 1 to September 30, 2016) in Guangzhou, China. We organize the raw data into a time series matrix of size (214, 8784). Tensor-based models take a third-order tensor of size (214, 61, 144) as input, while matrix-based models are tested with the time series matrix of size (214, 8784).
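The relation between the two representations can be sketched with a reshape. This assumes the 8784 columns are ordered day by day (61 days × 144 intervals per day); the speed values are random placeholders, not the real data.

```python
import numpy as np

# Sizes from the text: 214 road segments, 61 days, 144 intervals per day.
n_segments, n_days, n_intervals = 214, 61, 144

# Placeholder data standing in for the real speed matrix (hypothetical values).
rng = np.random.default_rng(0)
dense_mat = rng.uniform(10.0, 60.0, size=(n_segments, n_days * n_intervals))

# Fold the time axis into (day, interval of day), assuming day-major column order.
dense_tensor = dense_mat.reshape(n_segments, n_days, n_intervals)

# Unfolding along the first mode recovers the original matrix.
recovered_mat = dense_tensor.reshape(n_segments, n_days * n_intervals)
```

Under this ordering, entry `dense_tensor[i, d, t]` equals `dense_mat[i, d * 144 + t]`.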

We consider two common missing data scenarios: random missing (RM) and non-random missing (NM). For RM, we randomly remove a certain fraction of the observed entries in the matrix and use these entries as ground truth to evaluate RMSE. For NM, we run a correlated fiber missing experiment: we randomly choose a certain fraction (e.g., 40%) of the (location, day) combinations and remove the whole daily time series in each chosen combination.
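The two scenarios can be illustrated with binary masks (1 = observed, 0 = removed). This is a sketch under assumptions rather than the repository's own procedure: each entry or fiber is dropped independently with probability equal to the missing rate, so the realized fraction is only approximately that rate, and both function names are hypothetical.

```python
import numpy as np

def random_missing_mask(shape, rate, rng):
    """RM: drop individual entries independently with probability `rate`."""
    return (rng.random(shape) >= rate).astype(int)

def fiber_missing_mask(shape, rate, rng):
    """NM: drop whole (location, day) fibers independently with probability `rate`."""
    n_loc, n_day, n_int = shape
    keep = rng.random((n_loc, n_day)) >= rate
    # Repeat each keep/drop decision across all intervals of that day.
    return np.repeat(keep[:, :, None], n_int, axis=2).astype(int)

rng = np.random.default_rng(0)
shape = (214, 61, 144)  # (road segment, day, interval of day)

rm_mask = random_missing_mask(shape, 0.4, rng)
nm_mask = fiber_missing_mask(shape, 0.4, rng)
```

Multiplying a data tensor elementwise by one of these masks yields the partially observed input, while the removed entries remain available as ground truth.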

Missingness   BGCP     BPMF     PMF      GAIN
-----------   ------   ------   ------   ------
20%, RM       3.5762   4.0403   4.0909   4.6718
40%, RM       3.5969   4.1578   4.2280   5.1776
20%, NM       4.4136   4.3828   4.3575   6.5500
40%, NM       4.6791   4.5586   4.4866   6.9947
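The RMSE evaluation described above, computed only on the removed entries, can be sketched as follows. The function name `masked_rmse` and the small arrays are hypothetical and serve only to show the calculation.

```python
import numpy as np

def masked_rmse(truth, estimate, eval_mask):
    """RMSE over the positions where eval_mask is nonzero (the held-out entries)."""
    pos = eval_mask.astype(bool)
    return float(np.sqrt(np.mean((truth[pos] - estimate[pos]) ** 2)))

# Toy example: two held-out entries with errors of 3 and 4.
truth = np.array([[30.0, 40.0], [50.0, 60.0]])
estimate = np.array([[33.0, 40.0], [50.0, 56.0]])
eval_mask = np.array([[1, 0], [0, 1]])

score = masked_rmse(truth, estimate, eval_mask)  # sqrt((9 + 16) / 2)
```

Restricting the error to held-out positions keeps entries the model actually observed from inflating its apparent accuracy.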

Selected references

Our publications

  • Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [slide] [data] [Matlab code]

  • Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Imputation example in Jupyter notebook (Matlab)] [Jupyter notebook (Python)]

  • Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]

    Please consider citing our papers if they help your research.

Our blog posts (in Chinese)

License

This work is released under the MIT license.
