The project automatically fetches the latest papers from arXiv based on keywords.
The subheadings in the README file represent the search keywords.
Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.
You can click the 'Watch' button to receive daily email notifications.
Last update: 2025-07-11
Title | Date | Abstract | Comment |
---|---|---|---|
Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing | 2025-07-09 | Electronic Health Records (EHR) are time-series relational databases that record patient interactions and medical events over time, serving as a critical resource for healthcare research and applications. However, privacy concerns and regulatory restrictions limit the sharing and utilization of such sensitive data, necessitating the generation of synthetic EHR datasets. Unlike previous EHR synthesis methods, which typically generate medical records consisting of expert-chosen features (e.g. a few vital signs or structured codes only), we introduce RawMed, the first framework to synthesize multi-table, time-series EHR data that closely resembles raw EHRs. Using text-based representation and compression techniques, RawMed captures complex structures and temporal dynamics with minimal preprocessing. We also propose a new evaluation framework for multi-table time-series synthetic EHRs, assessing distributional similarity, inter-table relationships, temporal dynamics, and privacy. Validated on two open-source EHR datasets, RawMed outperforms baseline models in fidelity and utility. The code is available at https://github.com/eunbyeol-cho/RawMed. |
|
PyPOTS: A Python Toolkit for Machine Learning on Partially-Observed Time Series | 2025-07-09 | PyPOTS is an open-source Python library dedicated to data mining and analysis on multivariate partially-observed time series with missing values. Particularly, it provides easy access to diverse algorithms categorized into five tasks: imputation, forecasting, anomaly detection, classification, and clustering. The included models represent a diverse set of methodological paradigms, offering a unified and well-documented interface suitable for both academic research and practical applications. With robustness and scalability in its design philosophy, best practices of software construction, for example, unit testing, continuous integration and continuous delivery, code coverage, maintainability evaluation, interactive tutorials, and parallelization, are carried out as principles during the development of PyPOTS. The toolbox is available on PyPI, Anaconda, and Docker. PyPOTS is open source and publicly available on GitHub at https://github.com/WenjieDu/PyPOTS. |
PyPOTS website is at https://pypots.com, and PyPOTS is open source at https://github.com/WenjieDu/PyPOTS |
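Since PyPOTS exposes all five tasks behind one fit/apply interface, a short sketch may help; this is a minimal imputation example assuming the documented SAITS constructor and the dict-based `{"X": ...}` dataset format from the PyPOTS docs (argument names can shift between releases, so verify against your installed version):

```python
# Minimal PyPOTS imputation sketch (assumes the pypots>=0.2 API; check the
# installed version's docs, as constructor arguments have changed over time).
import numpy as np
from pypots.imputation import SAITS

# Toy partially-observed data: 200 samples, 48 steps, 10 features, ~20% missing.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 48, 10))
X[rng.random(X.shape) < 0.2] = np.nan

model = SAITS(n_steps=48, n_features=10, n_layers=2, d_model=64,
              n_heads=4, d_k=16, d_v=16, d_ffn=128, dropout=0.1, epochs=5)
model.fit({"X": X})                  # datasets are passed as dicts keyed by "X"
X_imputed = model.impute({"X": X})   # older releases expose .impute(); newer ones
                                     # return the same result via .predict()
```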
The cost of ensembling: is it always worth combining? | 2025-07-09 | Given the continuous increase in dataset sizes and the complexity of forecasting models, the trade-off between forecast accuracy and computational cost is emerging as an extremely relevant topic, especially in the context of ensemble learning for time series forecasting. To assess it, we evaluated ten base models and eight ensemble configurations across two large-scale retail datasets (M5 and VN1), considering both point and probabilistic accuracy under varying retraining frequencies. We showed that ensembles consistently improve forecasting performance, particularly in probabilistic settings. However, these gains come at a substantial computational cost, especially for larger, accuracy-driven ensembles. We found that reducing retraining frequency significantly lowers costs, with minimal impact on accuracy, particularly for point forecasts. Moreover, efficiency-driven ensembles offer a strong balance, achieving competitive accuracy with considerably lower costs compared to accuracy-optimized combinations. Most importantly, small ensembles of two or three models are often sufficient to achieve near-optimal results. These findings provide practical guidelines for deploying scalable and cost-efficient forecasting systems, supporting the broader goals of sustainable AI in forecasting. Overall, this work shows that careful ensemble design and retraining strategy selection can yield accurate, robust, and cost-effective forecasts suitable for real-world applications. |
|
Note on a non-parametric method for change-point detection | 2025-07-09 | The purpose of this note is to present in detail the R code implementing a non-parametric method for change-point detection. The proposed approach is validated from various perspectives using simulations. This method is a competitor to that of Pettitt ([3]) and is, like the latter, based on the Wilcoxon-Mann-Whitney test. It is used in [4] for the study of relatively short time series obtained from measurements on cores sampled in the Bay of Brest. |
|
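The note presents R code, but the rank statistic it builds on is compact enough to sketch; below is a hypothetical NumPy version of the Pettitt-style Wilcoxon-Mann-Whitney statistic (the authors' actual implementation and p-value handling may differ):

```python
import numpy as np

def pettitt_change_point(x: np.ndarray):
    """Pettitt's non-parametric change-point statistic, built on the
    Wilcoxon-Mann-Whitney test: U_t = sum_{i<=t, j>t} sign(x_i - x_j)."""
    n = len(x)
    sign = np.sign(x[:, None] - x[None, :])        # pairwise sign matrix
    U = np.array([sign[: t + 1, t + 1:].sum() for t in range(n - 1)])
    t_hat = int(np.argmax(np.abs(U)))              # most likely change location
    K = float(np.abs(U).max())
    p_approx = 2.0 * np.exp(-6.0 * K**2 / (n**3 + n**2))  # standard approximation
    return t_hat, K, p_approx

# Example: a mean shift halfway through the series.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(1.5, 1, 100)])
print(pettitt_change_point(x))
```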
Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models | 2025-07-09 | This project introduces a new measure of elite polarization via actor and subject detection using artificial intelligence. I identify when politicians mention one another in parliamentary speeches, note who is speaking and who is being addressed, and assess the emotional temperature behind these evaluations. This maps how elites evaluate their various out-parties, allowing us to create an index of mutual out-party hostility, that is, elite polarization. While I analyzed polarization data over the past four decades for the UK, and two decades for Hungary and Italy, my approach lays the groundwork for a twenty-year, EU-wide time-series dataset on elite polarization. The results can be aggregated by party and quarter. The resulting index demonstrates good face validity: it reacts to events such as electoral campaigns, country- and party-level crises, and to parties losing and assuming power. |
|
Sparsity-Promoting Dynamic Mode Decomposition Applied to Sea Surface Temperature Fields | 2025-07-09 | In this paper, we leverage Koopman mode decomposition to analyze the nonlinear and high-dimensional climate systems acting on the observed data space. The dynamics of atmospheric systems are assumed to be equation-free, with the linear evolution of observables derived from measured historical long-term time-series data snapshots, such as monthly sea surface temperature records, to construct a purely data-driven climate dynamics. In particular, sparsity-promoting dynamic mode decomposition is exploited to extract the dominant spatial and temporal modes, which are among the most significant coherent structures underlying climate variability, enabling a more efficient, interpretable, and low-dimensional representation of the system dynamics. We hope that the combined use of Koopman modes and sparsity-promoting techniques will provide insights into the significant climate modes, enabling reduced-order modeling of the climate system and offering a potential framework for predicting and controlling weather and climate variability. |
8 pages |
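For readers unfamiliar with the machinery, a minimal exact-DMD sketch in NumPy follows; the paper's sparsity-promoting variant instead selects modes by solving an l1-penalized amplitude problem (typically via ADMM), which the crude amplitude threshold here only stands in for:

```python
import numpy as np

def dmd(snapshots: np.ndarray, rank: int):
    """Exact DMD on a snapshot matrix of shape (n_space, n_time)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh.conj().T[:, :rank]
    A_tilde = U.conj().T @ Y @ V / s             # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y @ V / s @ W                        # exact DMD modes
    amps = np.linalg.lstsq(modes, snapshots[:, 0], rcond=None)[0]
    return eigvals, modes, amps

# Crude stand-in for sparsity promotion: keep only large-amplitude modes.
data = np.random.default_rng(2).standard_normal((50, 200))
eigvals, modes, amps = dmd(data, rank=10)
keep = np.abs(amps) > 0.5 * np.abs(amps).max()
print("dominant modes:", keep.sum())
```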
MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | 2025-07-09 | As a prominent data modality task, time series forecasting plays a pivotal role in diverse applications. With the remarkable advancements in Large Language Models (LLMs), the adoption of LLMs as the foundational architecture for time series modeling has gained significant attention. Although existing models achieve some success, they rarely model both time and frequency characteristics in a pretraining-finetuning paradigm, leading to suboptimal performance in predicting complex time series, which requires both modeling periodicity and prior pattern knowledge of signals. We propose MoFE-Time, an innovative time series forecasting model that integrates time and frequency domain features within a Mixture of Experts (MoE) network. Moreover, we use the pretraining-finetuning paradigm as our training framework to effectively transfer prior pattern knowledge across pretraining and finetuning datasets with different periodicity distributions. Our method introduces both frequency and time cells as experts after attention modules and leverages the MoE routing mechanism to construct multidimensional sparse representations of input signals. In experiments on six public benchmarks, MoFE-Time has achieved new state-of-the-art performance, reducing MSE and MAE by 6.95% and 6.02% compared to the representative method Time-MoE. Beyond the existing evaluation benchmarks, we have developed a proprietary dataset, NEV-sales, derived from real-world business scenarios. Our method achieves outstanding results on this dataset, underscoring the effectiveness of the MoFE-Time model in practical commercial applications. |
|
Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention | 2025-07-08 | Mixture of Experts (MoE) models are well known for effectively scaling model capacity while preserving computational overheads. In this paper, we establish a rigorous relation between MoE and the self-attention mechanism, showing that each row of a self-attention matrix can be written as a quadratic gating mixture of linear experts. Motivated by this connection, we conduct a comprehensive convergence analysis of MoE models with two different quadratic gating functions, namely the quadratic polynomial gate and the quadratic monomial gate, offering useful insights into the design of gating and experts for the MoE framework. First, our analysis indicates that the use of the quadratic monomial gate yields an improved sample efficiency for estimating parameters and experts compared to the quadratic polynomial gate. Second, parameter and expert estimation rates become significantly faster when employing non-linear experts in place of linear experts. Combining these theoretical insights with the above link between MoE and self-attention, we propose a novel *active-attention* mechanism where we apply a non-linear activation function to the value matrix in the formula of self-attention. Finally, we demonstrate that the proposed active-attention outperforms the standard self-attention through several extensive experiments in various tasks, including image classification, language modeling, and multivariate time series forecasting. |
Pedram Akbarian, Huy Nguyen, and Xing Han made equal contributions to this work |
Forecasting the U.S. Renewable-Energy Mix with an ALR-BDARMA Compositional Time-Series Framework | 2025-07-08 | Accurate forecasts of the US renewable-generation mix are critical for planning transmission upgrades, sizing storage, and setting balancing-market rules. We present a Bayesian Dirichlet ARMA (BDARMA) model for monthly shares of hydro, geothermal, solar, wind, wood, municipal waste, and biofuels from January 2010 to January 2025. The mean vector follows a parsimonious VAR(2) in additive-log-ratio space, while the Dirichlet concentration parameter combines an intercept with ten Fourier harmonics, letting predictive dispersion expand or contract with the seasons. A 61-split rolling-origin study generates twelve-month density forecasts from January 2019 to January 2024. Relative to three benchmarks, a Gaussian VAR(2) in transform space, a seasonal naive copy of last year's proportions, and a drift-free additive-log-ratio random walk, BDARMA lowers the mean continuous ranked probability score by fifteen to sixty percent, achieves component-wise ninety percent interval coverage close to nominal, and matches Gaussian VAR point accuracy through eight months with a maximum loss of 0.02 Aitchison units thereafter. BDARMA therefore delivers sharp, well-calibrated probabilistic forecasts of multivariate renewable-energy shares without sacrificing point precision. |
|
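The additive-log-ratio (alr) transform underlying BDARMA maps a K-part composition into an unconstrained (K-1)-dimensional space where the VAR(2) can operate; here is a minimal sketch, using the last share as the reference part (an assumption, as the paper may pick a different reference):

```python
import numpy as np

def alr(shares: np.ndarray) -> np.ndarray:
    """Additive log-ratio transform: log(x_k / x_K) for k = 1..K-1."""
    return np.log(shares[..., :-1] / shares[..., -1:])

def alr_inverse(z: np.ndarray) -> np.ndarray:
    """Map back to the simplex (shares summing to one)."""
    expz = np.exp(z)
    denom = 1.0 + expz.sum(axis=-1, keepdims=True)
    return np.concatenate([expz / denom, 1.0 / denom], axis=-1)

mix = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02])  # 7 energy shares
z = alr(mix)                        # unconstrained 6-vector for the VAR
assert np.allclose(alr_inverse(z), mix)
```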
AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study | 2025-07-08 | This paper tackles the urgent need for efficient energy management in healthcare facilities, where fluctuating demands challenge operational efficiency and sustainability. Traditional methods often prove inadequate, causing inefficiencies and higher costs. To address this, the study presents an AI-based framework combining Long Short-Term Memory (LSTM), genetic algorithm (GA), and SHAP (Shapley Additive Explanations), specifically designed for healthcare energy management. Although LSTM is widely used for time-series forecasting, its application in healthcare energy prediction remains underexplored. The results reveal that LSTM significantly outperforms ARIMA and Prophet models in forecasting complex, non-linear demand patterns. LSTM achieves a Mean Absolute Error (MAE) of 21.69 and Root Mean Square Error (RMSE) of 29.96, far better than Prophet (MAE: 59.78, RMSE: 81.22) and ARIMA (MAE: 87.73, RMSE: 125.22), demonstrating superior performance. The genetic algorithm is applied to optimize model parameters and improve load balancing strategies, enabling adaptive responses to real-time energy fluctuations. SHAP analysis further enhances model transparency by explaining the influence of different features on predictions, fostering trust in decision-making processes. This integrated LSTM-GA-SHAP approach offers a robust solution for improving forecasting accuracy, boosting energy efficiency, and advancing sustainability in healthcare facilities. Future research may explore real-time deployment and hybridization with reinforcement learning for continuous optimization. Overall, the study establishes a solid foundation for using AI in healthcare energy management, highlighting its scalability, efficiency, and resilience potential. |
|
KnowIt: Deep Time Series Modeling and Interpretation | 2025-07-08 | KnowIt (Knowledge discovery in time series data) is a flexible framework for building deep time series models and interpreting them. It is implemented as a Python toolkit, with source code and documentation available from https://must-deep-learning.github.io/KnowIt. It imposes minimal assumptions about task specifications and decouples the definition of dataset, deep neural network architecture, and interpretability technique through well-defined interfaces. This ensures the ease of importing new datasets, custom architectures, and the definition of different interpretability paradigms, while maintaining on-the-fly modeling and interpretation of different aspects of a user's own time series data. KnowIt aims to provide an environment where users can perform knowledge discovery on their own complex time series data by building powerful deep learning models and explaining their behavior. With ongoing development, collaboration, and application, our goal is to make this a platform to progress this underexplored field and produce a trusted tool for deep time series modeling. |
|
DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting | 2025-07-08 | Accurate predictions of spatio-temporal systems are crucial for tasks such as system management, control, and crisis prevention. However, the inherent time variance of many spatio-temporal systems poses challenges to achieving accurate predictions whenever stationarity is not guaranteed. In order to address non-stationarity, we propose a Distribution and Relation Adaptive Network (DRAN) capable of dynamically adapting to relation and distribution changes over time. While temporal normalization and de-normalization are frequently used techniques to adapt to distribution shifts, this operation is not suitable for the spatio-temporal context, as temporal normalization scales the time series of nodes and possibly disrupts the spatial relations among nodes. In order to address this problem, a Spatial Factor Learner (SFL) module is developed to enable the normalization and de-normalization process. To adapt to dynamic changes in spatial relationships among sensors, we propose a Dynamic-Static Fusion Learner (DSFL) module that effectively integrates features learned from both dynamic and static relations through an adaptive fusion ratio mechanism. Furthermore, we introduce a Stochastic Learner to capture the noisy components of spatio-temporal representations. Our approach outperforms state-of-the-art methods on weather prediction and traffic flow forecasting tasks. Experimental results show that our SFL efficiently preserves spatial relationships across various temporal normalization operations. Visualizations of the learned dynamic and static relations demonstrate that DSFL can capture both local and distant relationships between nodes. |
15 pages, 10 figures |
skfolio: Portfolio Optimization in Python | 2025-07-08 | Portfolio optimization is a fundamental challenge in quantitative finance, requiring robust computational tools that integrate statistical rigor with practical implementation. We present skfolio, an open-source Python library for portfolio construction and risk management that seamlessly integrates with the scikit-learn ecosystem. skfolio provides a unified framework for diverse allocation strategies, from classical mean-variance optimization to modern clustering-based methods, state-of-the-art financial estimators with native interfaces, and advanced cross-validation techniques tailored for financial time series. By adhering to scikit-learn's fit-predict-transform paradigm, the library enables researchers and practitioners to leverage machine learning workflows for portfolio optimization, promoting reproducibility and transparency in quantitative finance. |
7 pages |
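Because skfolio follows scikit-learn's fit-predict paradigm, a toy workflow is only a few lines; this sketch assumes the documented `MeanRisk` optimizer and `prices_to_returns` helper, so treat the exact names and defaults as assumptions to verify against the installed release:

```python
# Hypothetical minimal skfolio workflow (names per the skfolio docs;
# verify against the installed version).
import numpy as np
import pandas as pd
from skfolio.optimization import MeanRisk
from skfolio.preprocessing import prices_to_returns

# Toy price panel: 500 days, 5 assets following geometric random walks.
rng = np.random.default_rng(3)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, (500, 5)), axis=0)),
    columns=[f"asset_{i}" for i in range(5)],
)
X = prices_to_returns(prices)

model = MeanRisk()                        # default: minimum-variance-style optimization
model.fit(X.iloc[:400])                   # estimate weights on the training window
portfolio = model.predict(X.iloc[400:])   # out-of-sample Portfolio object
print(model.weights_)
```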
Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality | 2025-07-08 | The explicit regularization and optimality of deep neural network estimators from independent data have made considerable progress recently. The study of such properties on dependent data is still a challenge. In this paper, we carry out deep learning from strongly mixing observations, and deal with the squared loss and a broad class of other loss functions. We consider sparse-penalized regularization for the deep neural network predictor. For a general framework that includes regression estimation, classification, time series prediction, and more, an oracle inequality for the expected excess risk is established, and a bound on the class of Hölder smooth functions is provided. For nonparametric regression from strongly mixing data with sub-exponential errors, we provide an oracle inequality for the |
|
Decomposing the Time Series Forecasting Pipeline: A Modular Approach for Time Series Representation, Information Extraction, and Projection | 2025-07-08 | With the advent of Transformers, time series forecasting has seen significant advances, yet it remains a challenging task, demanding effective sequence representation, meaningful information extraction, and precise future projection. Each dataset and forecasting configuration constitutes a distinct task, each posing unique challenges the model must overcome to produce accurate predictions. To systematically address these task-specific difficulties, this work decomposes the time series forecasting pipeline into three core stages: input sequence representation, information extraction and memory construction, and final target projection. Within each stage, we investigate a range of architectural configurations to assess the effectiveness of various modules, such as convolutional layers for feature extraction and self-attention mechanisms for information extraction, across diverse forecasting tasks, including evaluations on seven benchmark datasets. Our models achieve state-of-the-art forecasting accuracy while greatly enhancing computational efficiency, with reduced training and inference times and a lower parameter count. The source code is available at https://github.com/RobertLeppich/REP-Net. |
|
Predicting Graph Structure via Adapted Flux Balance Analysis | 2025-07-08 | Many dynamic processes such as telecommunication and transport networks can be described through discrete time series of graphs. Modelling the dynamics of such time series enables prediction of graph structure at future time steps, which can be used in applications such as detection of anomalies. Existing approaches for graph prediction have limitations, such as assuming that the vertices do not change between consecutive graphs. To address this, we propose to exploit time series prediction methods in combination with an adapted form of flux balance analysis (FBA), a linear programming method originating from biochemistry. FBA is adapted to incorporate various constraints applicable to the scenario of growing graphs. Empirical evaluations on synthetic datasets (constructed via the preferential attachment model) and real datasets (UCI Message, HePH, Facebook, Bitcoin) demonstrate the efficacy of the proposed approach. |
Extended and revised version of arXiv:2401.04280 |
KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks | 2025-07-08 | Time series anomaly detection (TSAD) underpins real-time monitoring in cloud services and web systems, allowing rapid identification of anomalies to prevent costly failures. Most TSAD methods driven by forecasting models tend to overfit by emphasizing minor fluctuations. Our analysis reveals that effective TSAD should focus on modeling "normal" behavior through smooth local patterns. To achieve this, we reformulate time series modeling as approximating the series with smooth univariate functions. The local smoothness of each univariate function ensures that the fitted time series remains resilient against local disturbances. However, a direct KAN implementation proves susceptible to these disturbances due to the inherently localized characteristics of B-spline functions. We thus propose KAN-AD, replacing B-splines with truncated Fourier expansions and introducing a novel lightweight learning mechanism that emphasizes global patterns while staying robust to local disturbances. On four popular TSAD benchmarks, KAN-AD achieves an average 15% improvement in detection accuracy (with peaks exceeding 27%) over state-of-the-art baselines. Remarkably, it requires fewer than 1,000 trainable parameters, resulting in a 50% faster inference speed compared to the original KAN, demonstrating the approach's efficiency and practical viability. |
11 pages, ICML 2025 |
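The substitution at the heart of KAN-AD, truncated Fourier expansions in place of localized B-splines, can be illustrated with plain least squares; this toy sketch (not the authors' code) fits one smooth univariate function to a series and flags points with large residuals:

```python
import numpy as np

def fourier_design(t: np.ndarray, n_terms: int) -> np.ndarray:
    """Design matrix [1, sin(k*t), cos(k*t)] for a truncated Fourier expansion."""
    cols = [np.ones_like(t)]
    for k in range(1, n_terms + 1):
        cols += [np.sin(k * t), np.cos(k * t)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(4)
t = np.linspace(0, 4 * np.pi, 500)
x = np.sin(t) + 0.3 * np.cos(3 * t) + 0.05 * rng.standard_normal(500)
x[250] += 1.0                                    # inject a point anomaly

Phi = fourier_design(t, n_terms=5)
coef, *_ = np.linalg.lstsq(Phi, x, rcond=None)   # smooth "normal" fit
resid = np.abs(x - Phi @ coef)
print("flagged:", np.where(resid > 5 * resid.std())[0])  # the injected anomaly
```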
Unifying Explainable Anomaly Detection and Root Cause Analysis in Dynamical Systems | 2025-07-08 | Dynamical systems, prevalent in various scientific and engineering domains, are susceptible to anomalies that can significantly impact their performance and reliability. This paper addresses the critical challenges of anomaly detection, root cause localization, and anomaly type classification in dynamical systems governed by ordinary differential equations (ODEs). We define two categories of anomalies: cyber anomalies, which propagate through interconnected variables, and measurement anomalies, which remain localized to individual variables. To address these challenges, we propose the Interpretable Causality Ordinary Differential Equation (ICODE) Networks, a model-intrinsic explainable learning framework. ICODE leverages Neural ODEs for anomaly detection while employing causality inference through an explanation channel to perform root cause analysis (RCA), elucidating why specific time periods are flagged as anomalous. ICODE is designed to simultaneously perform anomaly detection, RCA, and anomaly type classification within a single, interpretable framework. Our approach is grounded in the hypothesis that anomalies alter the underlying ODEs of the system, manifesting as changes in causal relationships between variables. We provide a theoretical analysis of how perturbations in learned model parameters can be utilized to identify anomalies and their root causes in time series data. Comprehensive experimental evaluations demonstrate the efficacy of ICODE across various dynamical systems, showcasing its ability to accurately detect anomalies, classify their types, and pinpoint their origins. |
Accepted by the AAAI-25 Workshop on Artificial Intelligence for Cyber Security (AICS) |
LATST: Are Transformers Necessarily Complex for Time-Series Forecasting | 2025-07-08 | Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short compared to simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST manages to achieve competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness. |
8 pages with referencing, 1 figure, 5 tables |
Regularized Estimation of the Loading Matrix in Factor Models for High-Dimensional Time Series | 2025-07-07 | High-dimensional data analysis using traditional models suffers from overparameterization. Two types of techniques are commonly used to reduce the number of parameters: regularization and dimension reduction. In this project, we combine them by imposing a sparse factor structure and propose a regularized estimator to further reduce the number of parameters in factor models. A challenge limiting the widespread application of factor models is that factors are hard to interpret, as both factors and the loading matrix are unobserved. To address this, we introduce a penalty term when estimating the loading matrix for a sparse estimate. As a result, each factor only drives a smaller subset of time series that exhibit the strongest correlation, improving the factor interpretability. The theoretical properties of the proposed estimator are investigated. The simulation results are presented to confirm that our algorithm performs well. We apply our method to Hawaii tourism data. |
|
Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting | 2025-07-07 | We propose Temporal Conformal Prediction (TCP), a novel framework for constructing prediction intervals in financial time-series with guaranteed finite-sample validity. TCP integrates quantile regression with a conformal calibration layer that adapts online via a decaying learning rate. This hybrid design bridges statistical and machine learning paradigms, enabling TCP to accommodate non-stationarity, volatility clustering, and regime shifts, which are hallmarks of real-world asset returns, without relying on rigid parametric assumptions. We benchmark TCP against established methods including GARCH, Historical Simulation, and static Quantile Regression across equities (S&P 500), cryptocurrency (Bitcoin), and commodities (Gold). Empirical results show that TCP consistently delivers sharper intervals with competitive or superior coverage, particularly in high-volatility regimes. Our study underscores TCP's strength in navigating the coverage-sharpness tradeoff, a central challenge in modern risk forecasting. Overall, TCP offers a distribution-free, adaptive, and interpretable alternative for financial uncertainty quantification, advancing the interface between statistical inference and machine learning in finance. |
|
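The abstract describes the calibration layer only at a high level; the sketch below shows a generic online conformal recursion with a decaying learning rate (in the style of adaptive conformal inference), which is an assumption standing in for TCP's exact update rule:

```python
import numpy as np

def online_conformal(scores, alpha=0.1, lr0=0.05, decay=0.51):
    """Track a conformal quantile online: the threshold q is widened after
    misses and tightened after covers, with learning rate lr0 * t^-decay."""
    q = np.quantile(scores[:50], 1 - alpha)   # warm-start on a burn-in window
    covered = []
    for t, s in enumerate(scores[50:], start=1):
        miss = float(s > q)                   # 1 if the interval failed to cover
        covered.append(1 - miss)
        q += lr0 * t ** (-decay) * (miss - alpha)
    return q, np.mean(covered)

# Heavy-tailed conformity scores, mimicking volatile asset returns.
scores = np.abs(np.random.default_rng(5).standard_t(df=3, size=2000))
q_final, cov = online_conformal(scores)
print(f"final quantile {q_final:.2f}, empirical coverage {cov:.3f}")
```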
A Bayesian workflow for securitizing casualty insurance risk | 2025-07-07 | Casualty insurance-linked securities (ILS) are appealing to investors because the underlying insurance claims, which are directly related to resulting security performance, are uncorrelated with most other asset classes. Conversely, casualty ILS are appealing to insurers as an efficient capital management tool. However, securitizing casualty insurance risk is non-trivial, as it requires forecasting loss ratios for pools of insurance policies that have not yet been written, in addition to estimating how the underlying losses will develop over time within future accident years. In this paper, we lay out a Bayesian workflow that tackles these complexities by using: (1) theoretically informed time-series and state-space models to capture how loss ratios develop and change over time; (2) historic industry data to inform prior distributions of models fit to individual programs; (3) stacking to combine loss ratio predictions from candidate models; and (4) both prior predictive simulations and simulation-based calibration to aid model specification. Using historic Schedule P filings, we then show how our proposed Bayesian workflow can be used to assess and compare models across a variety of key model performance metrics evaluated on future accident year losses. |
41 pages, 12 figures |
Causal Foundation Models: Disentangling Physics from Instrument Properties | 2025-07-07 | Foundation models for structured time series data must contend with a fundamental challenge: observations often conflate the true underlying physical phenomena with systematic distortions introduced by measurement instruments. This entanglement limits model generalization, especially in heterogeneous or multi-instrument settings. We present a causally-motivated foundation model that explicitly disentangles physical and instrumental factors using a dual-encoder architecture trained with structured contrastive learning. Leveraging naturally occurring observational triplets (i.e., where the same target is measured under varying conditions, and distinct targets are measured under shared conditions) our model learns separate latent representations for the underlying physical signal and instrument effects. Evaluated on simulated astronomical time series designed to resemble the complexity of variable stars observed by missions like NASA's Transiting Exoplanet Survey Satellite (TESS), our method significantly outperforms traditional single-latent space foundation models on downstream prediction tasks, particularly in low-data regimes. These results demonstrate that our model supports key capabilities of foundation models, including few-shot generalization and efficient adaptation, and highlight the importance of encoding causal structure into representation learning for structured data. |
8 pages, 5 figures. Accepted to the ICML 2025 Foundation Models for Structured Data Workshop and accepted to the Machine Learning for Astrophysics Workshop 2025 |
Bridging Expressivity and Scalability with Adaptive Unitary SSMs | 2025-07-07 | Recent work has revealed that state space models (SSMs), while efficient for long-sequence processing, are fundamentally limited in their ability to represent formal languages, particularly due to time-invariant and real-valued recurrence structures. In this work, we draw inspiration from adaptive and structured dynamics observed in biological neural systems and introduce the Adaptive Unitary State Space Model (AUSSM), a novel class of SSMs that leverages skew-symmetric, input-dependent recurrence to achieve unitary evolution and high expressive power. Using algebraic automata theory, we prove that AUSSM can perform modulo counting and simulate solvable group automata at finite precision, enabling SSMs to model a broad class of regular languages that are out of reach for other SSM architectures. To overcome the practical inefficiencies of adaptive recurrence, we develop a separable convolution formulation and a CUDA implementation that enables scalable parallel training. Empirically, we show that AUSSM, when interleaved with Mamba, outperforms prior SSMs on formal algorithmic tasks such as parity and modular arithmetic, and achieves competent performance on real-world long time-series classification benchmarks. Our results demonstrate that adaptive unitary recurrence provides a powerful and efficient inductive bias for both symbolic and continuous sequence modeling. |
|
Can Local Representation Alignment RNNs Solve Temporal Tasks? | 2025-07-07 | Recurrent Neural Networks (RNNs) are commonly used for real-time processing, streaming data, and cases where the amount of training samples is limited. Backpropagation Through Time (BPTT) is the predominant algorithm for training RNNs; however, it is frequently criticized for being prone to exploding and vanishing gradients and being biologically implausible. In this paper, we present and evaluate a target propagation-based method for RNNs, which uses local updates and seeks to reduce the said instabilities. Having stable RNN models increases their practical use in a wide range of fields such as natural language processing, time-series forecasting, anomaly detection, control systems, and robotics. The proposed solution uses local representation alignment (LRA). We thoroughly analyze the performance of this method, experiment with normalization and different local error functions, and invalidate certain assumptions about the behavior of this type of learning. Namely, we demonstrate that despite the decomposition of the network into sub-graphs, the model still suffers from vanishing gradients. We also show that gradient clipping as proposed in LRA has little to no effect on network performance. This results in an LRA RNN model that is very difficult to train due to vanishing gradients. We address this by introducing gradient regularization in the direction of the update and demonstrate that this modification promotes gradient flow and meaningfully impacts convergence. We compare and discuss the performance of the algorithm, and we show that the regularized LRA RNN considerably outperforms the unregularized version on three landmark tasks: temporal order, 3-bit temporal order, and random permutation. |
This is the version of the paper presented at ICCSM 2025 (July 2025 in Rome, Italy). No major changes in the content, but it uses a different LaTeX template and formatting |
Monitoring for a Phase Transition in a Time Series of Wigner Matrices | 2025-07-07 | We develop methodology and theory for the detection of a phase transition in a time series of high-dimensional random matrices. In the model we study, at each time point $t = 1, 2, \ldots$, we observe a deformed Wigner matrix $\mathbf{M}_t$, where the unobservable deformation represents a latent signal. This signal is detectable only in the supercritical regime, and our objective is to detect the transition to this regime in real time, as new matrix-valued observations arrive. Our approach is based on a partial sum process of extremal eigenvalues of $\mathbf{M}_t$. |
|
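As a rough illustration of the monitoring idea (not the authors' statistic, whose normalization is truncated above), one can track the largest eigenvalue of each incoming matrix and run a CUSUM-type detector on its partial sums:

```python
import numpy as np

rng = np.random.default_rng(6)
n, T, change = 200, 60, 30

# Rank-one deformation of strength 2.0, above the BBP detection threshold.
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
signal = 2.0 * np.outer(v, v)

top = []
for t in range(T):
    A = rng.standard_normal((n, n))
    M = (A + A.T) / np.sqrt(2 * n)          # Wigner matrix, spectral edge near 2
    if t >= change:
        M = M + signal                      # supercritical regime after t = 30
    top.append(np.linalg.eigvalsh(M)[-1])   # largest eigenvalue

top = np.array(top)
baseline, scale = top[:10].mean(), top[:10].std()   # pre-change calibration
cusum = np.cumsum(top - baseline)
alarms = np.where(cusum > 5 * scale * np.sqrt(np.arange(1, T + 1)))[0]
print("first alarm at t =", alarms[0] if alarms.size else "none")
```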
Solar Flare Prediction Using LSTM and DLSTM with Sliding Window Pattern Recognition | 2025-07-07 | We investigate the use of Long Short-Term Memory (LSTM) and Decomposition-LSTM (DLSTM) networks, combined with an ensemble algorithm, to predict solar flare occurrences using time-series data from the GOES catalog. The dataset spans from 2003 to 2023 and includes 151,071 flare events. Among the possible patterns, 7,552 yearly pattern windows are identified, highlighting the challenge of long-term forecasting due to the Sun's complex, self-organized criticality-driven behavior. A sliding window technique is employed to detect temporal quasi-patterns in both irregular and regularized flare time series. Regularization reduces complexity, enhances large flare activity, and captures active days more effectively. To address class imbalance, resampling methods are applied. LSTM and DLSTM models are trained on sequences of peak fluxes and waiting times from irregular time series, while LSTM and DLSTM, integrated with an ensemble approach, are applied to sliding windows of regularized time series with a 3-hour interval. Performance metrics, particularly TSS (0.74), recall (0.95), and the area under the curve (AUC = 0.87) of the receiver operating characteristic (ROC), indicate that DLSTM with an ensemble approach on regularized time series outperforms other models, offering more accurate large-flare forecasts with fewer false alarms than models trained on irregular time series. The superior performance of DLSTM is attributed to its ability to decompose time series into trend and seasonal components, effectively isolating random noise. This study underscores the potential of advanced machine learning techniques for solar flare prediction and highlights the importance of incorporating various solar cycle phases and resampling strategies to enhance forecasting reliability. |
Published in the Astrophysical Journal Supplement Series, volume 279, 2025, DOI: 10.3847/1538-4365/addc73 |
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | 2025-07-07 | Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise, and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models), we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility. |
Accepted at the 30th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA) |
A Novel Approach for Estimating Positive Lyapunov Exponents in One-Dimensional Chaotic Time Series Using Machine Learning | 2025-07-07 | Understanding and quantifying chaos in nonlinear dynamical systems remains a fundamental challenge in science and engineering. The Lyapunov exponent is a key measure of chaotic behavior, but its accurate estimation from experimental data is often hindered by methodological and computational limitations. In this work, we present a novel machine-learning-based approach for estimating the maximal positive Lyapunov exponent (MLE) from one-dimensional time series, using the growth of out-of-sample prediction errors as a proxy for trajectory divergence. Our method demonstrates high scientific relevance, offering a robust, data-driven alternative to traditional analytic techniques. Through comprehensive testing on several canonical chaotic maps, including the logistic, sine, cubic, and Chebyshev maps, we achieved a coefficient of determination R^2_pos > 0.9 between predicted and theoretical MLE values for time series as short as M = 200 points. The best accuracy was observed for the Chebyshev map (R^2_pos = 0.999). Notably, the proposed method maintains high computational efficiency and generalizes well across various machine learning algorithms. These results highlight the significance of our approach for practical chaos analysis in both synthetic and experimental settings, opening new possibilities for robust nonlinear dynamics assessment when only time series data are available. |
14 pages, 3 figures, 2 tables, 10 equations |
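The proxy the authors describe, out-of-sample prediction error growing at a rate governed by the MLE, can be mimicked with any short-horizon predictor; this sketch uses nearest-neighbour analogue forecasting on the logistic map, a simplification since the paper's models and fitting windows are not specified here:

```python
import numpy as np

# Logistic map at r = 4 has a known MLE of ln(2) ≈ 0.693.
rng = np.random.default_rng(7)
x = np.empty(2000)
x[0] = rng.random()
for i in range(1999):
    x[i + 1] = 4 * x[i] * (1 - x[i])

train, test = x[:1500], x[1500:]
horizons = np.arange(1, 8)
log_err = []
for h in horizons:
    errs = []
    for start in range(0, len(test) - h - 1, 10):
        x0 = test[start]
        nn = np.argmin(np.abs(train[:-h] - x0))   # nearest analogue in the past
        errs.append(abs(train[nn + h] - test[start + h]))
    log_err.append(np.log(np.mean(errs)))

slope = np.polyfit(horizons, log_err, 1)[0]       # error growth rate ≈ MLE
print(f"estimated MLE ≈ {slope:.2f} (theory: ln 2 ≈ 0.69)")
```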
Data Matters: The Case of Predicting Mobile Cellular Traffic | 2025-07-07 | Accurate predictions of base stations' traffic load are essential to mobile cellular operators and their users as they support the efficient use of network resources and allow delivery of services that sustain smart cities and roads. Traditionally, cellular network time-series have been considered for this prediction task. More recently, exogenous factors such as points of interest and other environmental knowledge have been explored too. In contrast to incorporating external factors, we propose to learn the processes underlying cellular load generation by employing population dynamics data. In this study, we focus on smart roads and use road traffic measures to improve prediction accuracy. Comprehensive experiments demonstrate that by employing road flow and speed, in addition to cellular network metrics, base station load prediction errors can be substantially reduced, by as much as |
8 pages, 5 figures, 5 tables |
DisMS-TS: Eliminating Redundant Multi-Scale Features for Time Series Classification | 2025-07-07 | Real-world time series typically exhibit complex temporal variations, making the time series classification task notably challenging. Recent advancements have demonstrated the potential of multi-scale analysis approaches, which provide an effective solution for capturing these complex temporal patterns. However, existing multi-scale analysis-based time series prediction methods fail to eliminate redundant scale-shared features across multi-scale time series, resulting in the model over- or under-focusing on scale-shared features. To address this issue, we propose a novel end-to-end Disentangled Multi-Scale framework for Time Series classification (DisMS-TS). The core idea of DisMS-TS is to eliminate redundant shared features in multi-scale time series, thereby improving prediction performance. Specifically, we propose a temporal disentanglement module to capture scale-shared and scale-specific temporal representations, respectively. Subsequently, to effectively learn both scale-shared and scale-specific temporal representations, we introduce two regularization terms that ensure the consistency of scale-shared representations and the disparity of scale-specific representations across all temporal scales. Extensive experiments conducted on multiple datasets validate the superiority of DisMS-TS over its competitive baselines, with accuracy improvements of up to 9.71%. |
This paper has been accepted for presentation at the ACM International Conference on Multimedia (ACM MM 2025) |
Selective Prediction via Training Dynamics | 2025-07-06 | Selective Prediction is the task of rejecting inputs a model would predict incorrectly on. This involves a trade-off between input space coverage (how many data points are accepted) and model utility (how good is the performance on accepted data points). Current methods for selective prediction typically impose constraints on either the model architecture or the optimization objective; this inhibits their usage in practice and introduces unknown interactions with pre-existing loss functions. In contrast to prior work, we show that state-of-the-art selective prediction performance can be attained solely from studying the (discretized) training dynamics of a model. We propose a general framework that, given a test input, monitors metrics capturing the instability of predictions from intermediate models (i.e., checkpoints) obtained during training w.r.t. the final model's prediction. In particular, we reject data points exhibiting too much disagreement with the final prediction at late stages in training. The proposed rejection mechanism is domain-agnostic (i.e., it works for both discrete and real-valued prediction) and can be flexibly combined with existing selective prediction approaches as it does not require any train-time modifications. Our experimental evaluation on image classification, regression, and time series problems shows that our method beats past state-of-the-art accuracy/utility trade-offs on typical selective prediction benchmarks. |
Published in Transactions on Machine Learning Research (TMLR) |
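Since the rejection rule needs only saved checkpoints, it is easy to sketch; this hypothetical snippet scores each test point by late-training disagreement with the final model and rejects the least stable fraction (the checkpoint schedule and weighting are choices the paper tunes, not the values used below):

```python
import numpy as np

def instability_scores(checkpoint_preds: np.ndarray, weights=None) -> np.ndarray:
    """checkpoint_preds: (n_checkpoints, n_samples) class predictions saved
    during training. Score = weighted fraction of checkpoints disagreeing
    with the final model, up-weighting late checkpoints."""
    final = checkpoint_preds[-1]
    disagree = (checkpoint_preds[:-1] != final).astype(float)
    if weights is None:                        # late checkpoints matter most
        weights = np.linspace(0.1, 1.0, len(disagree))
    return (weights[:, None] * disagree).sum(0) / weights.sum()

# Toy example: 5 checkpoints, 8 test points.
preds = np.array([[0, 1, 1, 0, 2, 1, 0, 0],
                  [0, 1, 0, 0, 2, 1, 1, 0],
                  [0, 1, 1, 0, 2, 2, 1, 0],
                  [0, 1, 1, 0, 2, 2, 1, 0],
                  [0, 1, 1, 0, 2, 1, 1, 0]])
scores = instability_scores(preds)
accept = scores <= np.quantile(scores, 0.75)   # reject the most unstable 25%
print(scores.round(2), accept)
```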
Surrogate modeling with functional nonlinear autoregressive models (F-NARX) | 2025-07-06 | We propose a novel functional approach to surrogate modeling of dynamical systems with exogenous inputs. This approach, named Functional Nonlinear AutoRegressive with eXogenous inputs (F-NARX), approximates the system response based on temporal features of the exogenous inputs and the system response. This marks a major step away from the discrete-time-centric approach of classical NARX models, which determines the relationship between selected time steps of the input/output time series. By modeling the system in a time-feature space, F-NARX takes advantage of the temporal smoothness of the process being modeled, providing more stable predictions and reducing the dependence of model performance on the discretization of the time axis. In this work, we introduce an F-NARX implementation based on principal component analysis and polynomial regression. To further improve prediction accuracy, we also introduce a modified hybrid least angle regression approach to identify a sparse model structure and minimize the expected forecast error, rather than the one-step-ahead prediction error. We investigate the behavior and capabilities of our F-NARX implementation on two case studies: an eight-story building under wind loading and a three-story steel frame under seismic loading. Our results demonstrate that F-NARX has several favorable properties that make it well-suited to surrogate modeling applications. |
|
DC-Mamber: A Dual Channel Prediction Model based on Mamba and Linear Transformer for Multivariate Time Series Forecasting | 2025-07-06 | In multivariate time series forecasting (MTSF), existing strategies for processing sequences are typically categorized as channel-independent and channel-mixing. The former treats all temporal information of each variable as a token, focusing on capturing local temporal features of individual variables, while the latter constructs a token from the multivariate information at each time step, emphasizing the modeling of global temporal dependencies. Current mainstream models are mostly based on Transformer and the emerging Mamba. Transformers excel at modeling global dependencies through self-attention mechanisms but exhibit limited sensitivity to local temporal patterns and suffer from quadratic computational complexity, restricting their efficiency in long-sequence processing. In contrast, Mamba, based on state space models (SSMs), achieves linear complexity and efficient long-range modeling but struggles to aggregate global contextual information in parallel. To overcome the limitations of both models, we propose DC-Mamber, a dual-channel time series forecasting model based on Mamba and a linear Transformer. Specifically, the Mamba-based channel employs a channel-independent strategy to extract intra-variable features, while the Transformer-based channel adopts a channel-mixing strategy to model cross-timestep global dependencies. DC-Mamber first maps the raw input into two distinct feature representations via separate embedding layers. These representations are then processed by a variable encoder (built on Mamba) and a temporal encoder (built on a linear Transformer), respectively. Finally, a fusion layer integrates the dual-channel features for prediction. Extensive experiments on eight public datasets confirm DC-Mamber's superior accuracy over existing models. |
|
Structural Classification of Locally Stationary Time Series Based on Second-order Characteristics | 2025-07-06 | Time series classification is crucial for numerous scientific and engineering applications. In this article, we present a numerically efficient, practically competitive, and theoretically rigorous classification method for distinguishing between two classes of locally stationary time series based on their time-domain, second-order characteristics. Our approach builds on the autoregressive approximation for locally stationary time series, combined with an ensemble aggregation and a distance-based threshold for classification. It imposes no requirement on the training sample size, and is shown to achieve zero misclassification error rate asymptotically when the underlying time series differ only mildly in their second-order characteristics. The new method is demonstrated to outperform a variety of state-of-the-art solutions, including wavelet-based, tree-based, convolution-based methods, as well as modern deep learning methods, through intensive numerical simulations and a real EEG data analysis for epilepsy classification. |
41 pages, 4 figures |
Generative Regression with IQ-BART | 2025-07-05 | Implicit Quantile BART (IQ-BART) posits a non-parametric Bayesian model on the conditional quantile function, acting as a model over a conditional model for |
48 pages, 7 figures |
Model selection for stochastic dynamics: a parsimonious and principled approach | 2025-07-05 | This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, |
PhD thesis, 201 pages, 29 figures. Defended at the University of Aix-Marseille on July 1, 2025. Supervised by Dr. Pierre Ronceray |
Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction using Weather Image Time Series | 2025-07-05 | Groundwater resources are one of the most relevant elements in the water cycle, therefore developing models to accurately predict them is a pivotal task in the sustainable resource management framework. Deep Learning (DL) models have been revealed to be very effective in hydrology, especially by feeding spatially distributed data (e.g. raster data). In many regions, hydrological measurements are difficult to obtain regularly or periodically in time, and in some cases, the last available data are not up to date. Conversely, weather data, which significantly impact water resources, are usually more available and of higher quality. More specifically, we have proposed two different DL models to predict the water table depth in the Grana-Maira catchment (Piemonte, IT) using only exogenous weather image time series. To deal with the image time series, both models are made of a first Time Distributed Convolutional Neural Network (TDC) which encodes the image available at each time step into a vectorial representation. The first model, TDC-LSTM, then uses a Sequential Module based on an LSTM layer to learn temporal relations and output the predictions. The second model, TDC-UnPWaveNet, instead uses a new version of the WaveNet architecture, adapted here to output a sequence shorter and completely shifted in the future with respect to the input one. To this aim, and to deal with the different sequence lengths in the UnPWaveNet, we have designed a new Channel Distributed layer, which acts like a Time Distributed one but on the channel dimension, i.e. applying the same set of operations to each channel of the input. TDC-LSTM and TDC-UnPWaveNet have both shown remarkable results. However, the two models have focused on different learnable information: TDC-LSTM has focused more on lowering the bias, while TDC-UnPWaveNet has focused more on the temporal dynamics, maximizing correlation and KGE. |
|
Transformer Model for Alzheimer's Disease Progression Prediction Using Longitudinal Visit Sequences | 2025-07-05 | Alzheimer's disease (AD) is a neurodegenerative disorder with no known cure that affects tens of millions of people worldwide. Early detection of AD is critical for timely intervention to halt or slow the progression of the disease. In this study, we propose a Transformer model for predicting the stage of AD progression at a subject's next clinical visit using features from a sequence of visits extracted from the subject's visit history. We also rigorously compare our model to recurrent neural networks (RNNs) such as long short-term memory (LSTM), gated recurrent unit (GRU), and minimalRNN and assess their performances based on factors such as the length of prior visits and data imbalance. We test the importance of different feature categories and visit history, as well as compare the model to a newer Transformer-based model optimized for time series. Our model demonstrates strong predictive performance despite missing visits and missing features in available visits, particularly in identifying converter subjects -- individuals transitioning to more severe disease stages -- an area that has posed significant challenges in longitudinal prediction. The results highlight the model's potential in enhancing early diagnosis and patient outcomes. |
Conference on Health, Inference, and Learning (CHIL, 2025) |
Temporal Window Smoothing of Exogenous Variables for Improved Time Series Prediction | 2025-07-04 | Although most transformer-based time series forecasting models primarily depend on endogenous inputs, recent state-of-the-art approaches have significantly improved performance by incorporating external information through exogenous inputs. However, these methods face challenges, such as redundancy when endogenous and exogenous inputs originate from the same source and limited ability to capture long-term dependencies due to fixed look-back windows. In this paper, we propose a method that whitens the exogenous input to reduce redundancy that may persist within the data based on global statistics. Additionally, our approach helps the exogenous input to be more aware of patterns and trends over extended periods. By introducing this refined, globally context-aware exogenous input to the endogenous input without increasing the lookback window length, our approach guides the model towards improved forecasting. Our approach achieves state-of-the-art performance in four benchmark datasets, consistently outperforming 11 baseline models. These results establish our method as a robust and effective alternative for using exogenous inputs in time series forecasting. |
Accepted at IJCNN 2025 |
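The whitening step is described only as redundancy reduction from global statistics; one plausible minimal reading is a ZCA-style transform of the exogenous channels computed over the full training history (an assumption, as the abstract does not spell out the exact transform):

```python
import numpy as np

def zca_whiten(exog: np.ndarray, eps: float = 1e-5):
    """ZCA-whiten exogenous series of shape (time, channels) using
    global (training-set-wide) mean and covariance."""
    mu = exog.mean(axis=0)
    cov = np.cov(exog - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return (exog - mu) @ W, (mu, W)

# Three highly correlated exogenous channels derived from one source.
rng = np.random.default_rng(8)
base = rng.standard_normal((1000, 1))
exog = np.hstack([base + 0.1 * rng.standard_normal((1000, 1)) for _ in range(3)])
white, _ = zca_whiten(exog)                    # correlated channels decorrelated
print(np.cov(white, rowvar=False).round(2))    # ≈ identity matrix
```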
Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations | 2025-07-04 | Discovering governing equations that describe complex chaotic systems remains a fundamental challenge in physics and neuroscience. Here, we introduce the PEM-UDE method, which combines the prediction-error method with universal differential equations to extract interpretable mathematical expressions from chaotic dynamical systems, even with limited or noisy observations. This approach succeeds where traditional techniques fail by smoothing optimization landscapes and removing the chaotic properties during the fitting process without distorting optimal parameters. We demonstrate its efficacy by recovering hidden states in the Rossler system and reconstructing dynamics from noise-corrupted electrical circuit data, where the correct functional form of the dynamics is recovered even when one of the observed time series is corrupted by noise 5x the magnitude of the true signal. We demonstrate that this method is capable of recovering the correct dynamics, whereas direct symbolic regression methods, such as SINDy, fail to do so with the given amount of data and noise. Importantly, when applied to neural populations, our method derives novel governing equations that respect biological constraints such as network sparsity - a constraint necessary for cortical information processing yet not captured in next-generation neural mass models - while preserving microscale neuronal parameters. These equations predict an emergent relationship between connection density and both oscillation frequency and synchrony in neural circuits. We validate these predictions using three intracranial electrode recording datasets from the medial entorhinal cortex, prefrontal cortex, and orbitofrontal cortex. Our work provides a pathway to develop mechanistic, multi-scale brain models that generalize across diverse neural architectures, bridging the gap between single-neuron dynamics and macroscale brain activity. |
46 pages, 9 figures |
CHIME: Conditional Hallucination and Integrated Multi-scale Enhancement for Time Series Diffusion Model | 2025-07-04 | ShowThe denoising diffusion probabilistic model has become a mainstream generative model, achieving significant success in various computer vision tasks. Recently, there has been initial exploration of applying diffusion models to time series tasks. However, existing studies still face challenges in multi-scale feature alignment and generative capabilities across different entities and long time scales. In this paper, we propose CHIME, a conditional hallucination and integrated multi-scale enhancement framework for time series diffusion models. By employing multi-scale decomposition and integration, CHIME captures the decomposed features of time series, achieving in-domain distribution alignment between generated and original samples. In addition, we introduce a feature hallucination module in the conditional denoising process, enabling the transfer of temporal features across long time scales. Experimental results on publicly available real-world datasets demonstrate that CHIME achieves state-of-the-art performance and exhibits excellent generative generalization capabilities in few-shot scenarios. |
|
ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series | 2025-07-04 | ShowThis paper studies causal discovery in irregularly sampled time series, a pivotal challenge in high-stakes domains like finance, healthcare, and climate science, where missing data and inconsistent sampling frequencies distort causal mechanisms. Traditional methods (e.g., Granger causality, PCMCI) fail to reconcile multi-scale interactions (e.g., hourly storms vs. decadal climate shifts), while neural approaches (e.g., CUTS+) lack interpretability, stemming from a critical gap: existing frameworks either rigidly assume temporal regularity or aggregate dynamics into opaque representations, neglecting real-world granularity and auditable logic. To bridge this gap, we propose ReTimeCausal, a novel integration of Additive Noise Models (ANM) and Expectation-Maximization (EM) that unifies physics-guided data imputation with sparse causal inference. Through kernelized sparse regression and structural constraints, ReTimeCausal iteratively refines missing values (E-step) and causal graphs (M-step), resolving cross-frequency dependencies and missing data issues. Extensive experiments on synthetic and real-world datasets demonstrate that ReTimeCausal outperforms existing state-of-the-art methods under challenging irregular sampling and missing data conditions. |
12 pages, 2 figures |
PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset | 2025-07-03 | ShowMultimodal deep learning holds promise for improving clinical prediction by integrating diverse patient data, including text, imaging, time-series, and structured demographics. Contrastive learning facilitates this integration by producing a unified representation that can be reused across tasks, reducing the need for separate models or encoders. Although contrastive learning has seen success in vision-language domains, its use in clinical settings remains largely limited to image and text pairs. We propose the Pipeline for Contrastive Modality Evaluation and Encoding (PiCME), which systematically assesses five clinical data types from MIMIC: discharge summaries, radiology reports, chest X-rays, demographics, and time-series. We pre-train contrastive models on all 26 combinations of two to five modalities and evaluate their utility on in-hospital mortality and phenotype prediction. To address performance plateaus with more modalities, we introduce a Modality-Gated LSTM that weights each modality according to its contrastively learned importance. Our results show that contrastive models remain competitive with supervised baselines, particularly in three-modality settings. Performance declines beyond three modalities, which supervised models fail to recover. The Modality-Gated LSTM mitigates this drop, improving AUROC from 73.19% to 76.93% and AUPRC from 51.27% to 62.26% in the five-modality setting. We also compare contrastively learned modality importance scores with attribution scores and evaluate generalization across demographic subgroups, highlighting strengths in interpretability and fairness. PiCME is the first to scale contrastive learning across all modality combinations in MIMIC, offering guidance for modality selection, training strategies, and equitable clinical prediction. |
|
Multiple data-driven missing imputation | 2025-07-03 | ShowThis paper introduces KZImputer, a novel adaptive imputation method for univariate time series, designed for short to medium-sized gaps of missing points (1-5 points and beyond) with tailored strategies for segments at the start, middle, or end of the series. KZImputer employs a hybrid strategy to handle various missing data scenarios. Its core mechanism differentiates between gaps at the beginning, middle, or end of the series, applying tailored techniques at each position to optimize imputation accuracy. The method leverages linear interpolation and localized statistical measures, adapting to the characteristics of the surrounding data and the gap size. The performance of KZImputer has been systematically evaluated against established imputation techniques, demonstrating its potential to enhance data quality for subsequent time series analysis. This paper describes the KZImputer methodology in detail and discusses its effectiveness in improving the integrity of time series data. Empirical analysis demonstrates that KZImputer achieves particularly strong performance for datasets with high missingness rates (around 50% or more), maintaining stable and competitive results across statistical and signal-reconstruction metrics. The method proves especially effective in high-sparsity regimes, where traditional approaches typically experience accuracy degradation. |
|
Ridge Regression for Manifold-valued Time-Series with Application to Meteorological Forecast | 2025-07-03 | ShowWe propose a natural intrinsic extension of the ridge regression from Euclidean spaces to general manifolds, which relies on Riemannian least-squares fitting, empirical covariance, and Mahalanobis distance. We utilize it for time-series prediction and apply the approach to forecast hurricane tracks and their wind speeds. |
|
Orientation-Aware Sparse Tensor PCA for Efficient Unsupervised Feature Selection | 2025-07-03 | ShowRecently, introducing Tensor Decomposition (TD) techniques into unsupervised feature selection (UFS) has been an emerging research topic. A tensor structure is beneficial for mining the relations between different modes and helps relieve the computation burden. However, while existing methods exploit TD to preserve the data tensor structure, they do not consider the influence of data orientation and thus have difficulty in handling orientation-specific data such as time series. To solve the above problem, we utilize the orientation-dependent tensor-tensor product from Tensor Singular Value Decomposition based on the star-M product (T-SVDM) and extend the one-dimensional Sparse Principal Component Analysis (SPCA) to a tensor form. The proposed sparse tensor PCA model can constrain sparsity at the specified mode and yield sparse tensor principal components, enhancing flexibility and accuracy in learning feature relations. To ensure fast convergence and a flexible description of feature correlation, we develop a convex version specially designed for general UFS tasks and propose an efficient slice-by-slice algorithm that performs dual optimization in the transform domain. Experimental results on real-world datasets demonstrate the effectiveness and remarkable computational efficiency of the proposed method for tensor data of diverse structures over the state-of-the-art. When transform axes align with feature distribution patterns, our method is promising for various applications. The codes related to our proposed methods and the experiments are available at https://github.com/zjj20212035/STPCA.git. |
|
Adaptive Probabilistic ODE Solvers Without Adaptive Memory Requirements | 2025-07-03 | ShowDespite substantial progress in recent years, probabilistic solvers with adaptive step sizes can still not solve memory-demanding differential equations -- unless we care only about a single point in time (which is far too restrictive; we want the whole time series). Counterintuitively, the culprit is the adaptivity itself: Its unpredictable memory demands easily exceed our machine's capabilities, making our simulations fail unexpectedly and without warning. Still, dropping adaptivity would abandon years of progress, which can't be the answer. In this work, we solve this conundrum. We develop an adaptive probabilistic solver with fixed memory demands building on recent developments in robust state estimation. Switching to our method (i) eliminates memory issues for long time series, (ii) accelerates simulations by orders of magnitude through unlocking just-in-time compilation, and (iii) makes adaptive probabilistic solvers compatible with scientific computing in JAX. |
|
DUST: A Duality-Based Pruning Method For Exact Multiple Change-Point Detection | 2025-07-03 | ShowWe tackle the challenge of detecting multiple change points in large time series by optimising a penalised likelihood derived from exponential family models. While dynamic programming algorithms can solve this task exactly with at most quadratic time complexity, recent years have seen the development of pruning strategies to improve their time efficiency. However, the two existing approaches have notable limitations: PELT struggles with pruning efficiency in sparse-change scenarios, while FPOP's structure is ill-suited for multi-parametric settings. To address these issues, we introduce the DUal Simple Test (DUST) framework, which prunes candidates by evaluating a dual function against a threshold. This approach is highly flexible and broadly applicable to parametric models of any dimension. Under mild assumptions, we establish strong duality for the underlying non-convex pruning problem. We demonstrate DUST's effectiveness across various change point regimes and models. In particular, for one-parametric models, DUST matches the simplicity of PELT with the efficiency of FPOP. Its use is especially advantageous for non-Gaussian models. Finally, we apply DUST to mouse monitoring time series under a change-in-variance model, illustrating its ability to recover the optimal change point structure efficiently. |
|
Assembling ensembling: An adventure in approaches across disciplines | 2025-07-03 | ShowWhen we think of model ensembling or ensemble modeling, there are many possibilities that come to mind in different disciplines. For example, one might think of a set of descriptions of a phenomenon in the world, perhaps a time series or a snapshot of multivariate space, and perhaps that set is comprised of data-independent descriptions, or perhaps it is quite intentionally fit to data, or even a suite of data sets with a common theme or intention. The very meaning of 'ensemble' - a collection together - conjures different ideas across and even within disciplines approaching phenomena. In this paper, we present a typology of the scope of these potential perspectives. It is not our goal to present a review of terms and concepts, nor is it to convince all disciplines to adopt a common suite of terms, which we view as futile. Rather, our goal is to disambiguate terms, concepts, and processes associated with 'ensembles' and 'ensembling' in order to facilitate communication, awareness, and possible adoption of tools across disciplines. |
34 pages, 4 figures |
Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations | 2025-07-03 | ShowThere has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformer presents a unique challenge as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information in the context of the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention mechanisms on these learned embeddings. Gating operations in both cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements of up to 20.7% over original models. Code is available at this repository: https://github.com/nyuolab/Gateformer. |
Accep...Accepted at ICML Workshop on Foundation Models for Structured Data |
Tensor-product interactions in Markov-switching models | 2025-07-03 | ShowMarkov-switching models are a powerful tool for modelling time series data that are driven by underlying latent states. As such, they are widely used in behavioural ecology, where discrete states can serve as proxies for behavioural modes and enable inference on latent behaviour driving e.g. observed movement. To understand drivers of behavioural changes, it is common to link model parameters to covariates. Over the last decade, nonparametric approaches have gained traction in this context to avoid unrealistic parametric assumptions. Nonetheless, existing methods are largely limited to univariate smooth functions of covariates, based on penalised splines, while real processes are typically complex, requiring consideration of interaction effects. We address this gap by incorporating tensor-product interactions into Markov-switching models, enabling flexible modelling of multidimensional effects in a computationally efficient manner. Based on the extended Fellner-Schall method, we develop an efficient automatic smoothness selection procedure that is robust and scales well with the number of smooth functions in the model. The method builds on a random effects view of the spline coefficients and yields a recursive penalised likelihood procedure. As special cases, this general framework accommodates bivariate smoothing, function-valued random effects, and space-time interactions. We demonstrate its practical utility through three ecological case studies of an African elephant, common fruitflies, and Arctic muskoxen. The methodology is implemented in the LaMa R package, providing applied ecologists with an accessible and flexible tool for semiparametric inference in hidden-state models. The approach has the potential to drastically improve the level of detail in inference, allowing one to fit HMMs with hundreds of parameters and 10-20 (potentially bivariate) smooths to thousands of observations. |
|
DeltaSHAP: Explaining Prediction Evolutions in Online Patient Monitoring with Shapley Values | 2025-07-03 | ShowThis study proposes DeltaSHAP, a novel explainable artificial intelligence (XAI) algorithm specifically designed for online patient monitoring systems. In clinical environments, discovering the causes driving patient risk evolution is critical for timely intervention, yet existing XAI methods fail to address the unique requirements of clinical time series explanation tasks. To this end, DeltaSHAP addresses three key clinical needs: explaining the changes in the consecutive predictions rather than isolated prediction scores, providing both magnitude and direction of feature attributions, and delivering these insights in real time. By adapting Shapley values to temporal settings, our approach accurately captures feature coalition effects. It further attributes prediction changes using only the actually observed feature combinations, making it efficient and practical for time-sensitive clinical applications. We also introduce new evaluation metrics to evaluate the faithfulness of the attributions for online time series, and demonstrate through experiments on online patient monitoring tasks that DeltaSHAP outperforms state-of-the-art XAI methods in both explanation quality (62% gain) and computational efficiency (33% time reduction) on the MIMIC-III decompensation benchmark. We release our code at https://github.com/AITRICS/DeltaSHAP. |
Accep...Accepted to ICML 2025 Workshop on Actionable Interpretability. Code is available at https://github.com/AITRICS/DeltaSHAP |
CineMyoPS: Segmenting Myocardial Pathologies from Cine Cardiac MR | 2025-07-03 | ShowMyocardial infarction (MI) is a leading cause of death worldwide. Late gadolinium enhancement (LGE) and T2-weighted cardiac magnetic resonance (CMR) imaging can respectively identify scarring and edema areas, both of which are essential for MI risk stratification and prognosis assessment. Although combining complementary information from multi-sequence CMR is useful, acquiring these sequences can be time-consuming and prohibitive, e.g., due to the administration of contrast agents. Cine CMR is a rapid and contrast-free imaging technique that can visualize both motion and structural abnormalities of the myocardium induced by acute MI. Therefore, we present a new end-to-end deep neural network, referred to as CineMyoPS, to segment myocardial pathologies, i.e., scars and edema, solely from cine CMR images. Specifically, CineMyoPS extracts both motion and anatomy features associated with MI. Given the interdependence between these features, we design a consistency loss (resembling the co-training strategy) to facilitate their joint learning. Furthermore, we propose a time-series aggregation strategy to integrate MI-related features across the cardiac cycle, thereby enhancing segmentation accuracy for myocardial pathologies. Experimental results on a multi-center dataset demonstrate that CineMyoPS achieves promising performance in myocardial pathology segmentation, motion estimation, and anatomy segmentation. |
|
Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting | 2025-07-03 | ShowTime series forecasting remains a challenging task for foundation models due to temporal heterogeneity, high dimensionality, and the lack of inherent symbolic structure. In this work, we propose DRAGON (Discrete Representation and Augmented Graph encoding Over de BruijN Graphs), a novel encoder that introduces Multivariate de Bruijn Graphs (MdBGs) to bridge the gap between symbolic representations and neural modeling. DRAGON discretizes continuous input sequences and maps them onto a fixed graph structure, enabling dynamic context recovery via graph-based attention. Integrated as an auxiliary module within a dual-branch architecture, DRAGON augments conventional CNN-based encoders with symbolic, structure-aware representations. All code developed for this study is available at: https://github.com/KurbanIntelligenceLab/MultdBG-Time-Series-Library |
|
Deep Learning-Based Forecasting of Hotel KPIs: A Cross-City Analysis of Global Urban Markets | 2025-07-02 | ShowThis study employs Long Short-Term Memory (LSTM) networks to forecast key performance indicators (KPIs), namely Occupancy (OCC), Average Daily Rate (ADR), and Revenue per Available Room (RevPAR), across five major cities: Manchester, Amsterdam, Dubai, Bangkok, and Mumbai. The cities were selected for their diverse economic profiles and hospitality dynamics. Monthly data from 2018 to 2025 were used, with 80% for training and 20% for testing. Advanced time series decomposition and machine learning techniques enabled accurate forecasting and trend identification. Results show that Manchester and Mumbai exhibited the highest predictive accuracy, reflecting stable demand patterns, while Dubai and Bangkok demonstrated higher variability due to seasonal and event-driven influences. The findings validate the effectiveness of LSTM models for urban hospitality forecasting and provide a comparative framework for data-driven decision-making. The model's generalisability across global cities highlights its potential utility for tourism stakeholders and urban planners. |
|
Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series | 2025-07-02 | ShowThe field of hypothesis generation promises to reduce costs in neuroscience by narrowing the range of interventional studies needed to study various phenomena. Existing machine learning methods can generate scientific hypotheses from complex datasets, but many approaches assume causal relationships are static over time, limiting their applicability to systems with dynamic, state-dependent behavior, such as the brain. While some techniques attempt dynamic causal discovery through factor models, they often restrict relationships to linear patterns or impose other simplifying assumptions. We propose a novel method that models dynamic graphs as a conditionally weighted superposition of static graphs, where each static graph can capture nonlinear relationships. This approach enables the detection of complex, time-varying interactions between variables beyond linear limitations. Our method improves f1-scores of predicted dynamic causal patterns by roughly 22-28% on average over baselines in some of our experiments, with some improvements reaching well over 60%. A case study on real brain data demonstrates our method's ability to uncover relationships linked to specific behavioral states, offering valuable insights into neural dynamics. |
|
Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers | 2025-07-02 | ShowThe investigation reported in this document focuses on identifying systems with symmetries using equivariant autoregressive reservoir computers. General results in structured matrix approximation theory are presented, exploring a two-fold approach. Firstly, a comprehensive examination of generic symmetry-preserving nonlinear time delay embedding is conducted. This involves analyzing time series data sampled from an equivariant system under study. Secondly, sparse least-squares methods are applied to discern approximate representations of the output coupling matrices. These matrices play a critical role in determining the nonlinear autoregressive representation of an equivariant system. The structural characteristics of these matrices are dictated by the set of symmetries inherent in the system. The document outlines prototypical algorithms derived from the described techniques, offering insight into their practical applications. Emphasis is placed on the significant improvement on structured identification precision when compared to classical reservoir computing methods for the simulation of equivariant dynamical systems. |
The v...The views expressed in the article do not necessarily represent the views of the National Commission of Banks and Insurance Companies of Honduras |
Characterizing control between interacting subsystems with deep Jacobian estimation | 2025-07-02 | ShowBiological function arises through the dynamical interactions of multiple subsystems, including those between brain areas, within gene regulatory networks, and more. A common approach to understanding these systems is to model the dynamics of each subsystem and characterize communication between them. An alternative approach is through the lens of control theory: how the subsystems control one another. This approach involves inferring the directionality, strength, and contextual modulation of control between subsystems. However, methods for understanding subsystem control are typically linear and cannot adequately describe the rich contextual effects enabled by nonlinear complex systems. To bridge this gap, we devise a data-driven nonlinear control-theoretic framework to characterize subsystem interactions via the Jacobian of the dynamics. We address the challenge of learning Jacobians from time-series data by proposing the JacobianODE, a deep learning method that leverages properties of the Jacobian to directly estimate it for arbitrary dynamical systems from data alone. We show that JacobianODEs outperform existing Jacobian estimation methods on challenging systems, including high-dimensional chaos. Applying our approach to a multi-area recurrent neural network (RNN) trained on a working memory selection task, we show that the "sensory" area gains greater control over the "cognitive" area over learning. Furthermore, we leverage the JacobianODE to directly control the trained RNN, enabling precise manipulation of its behavior. Our work lays the foundation for a theoretically grounded and data-driven understanding of interactions among biological subsystems. |
10 pages, 6 figures |
Towards Foundation Auto-Encoders for Time-Series Anomaly Detection | 2025-07-02 | ShowWe investigate a novel approach to time-series modeling, inspired by the successes of large pretrained foundation models. We introduce FAE (Foundation Auto-Encoders), a foundation generative-AI model for anomaly detection in time-series data, based on Variational Auto-Encoders (VAEs). By foundation, we mean a model pretrained on massive amounts of time-series data which can learn complex temporal patterns useful for accurate modeling, forecasting, and detection of anomalies on previously unseen datasets. FAE leverages VAEs and Dilated Convolutional Neural Networks (DCNNs) to build a generic model for univariate time-series modeling, which could eventually perform properly in out-of-the-box, zero-shot anomaly detection applications. We introduce the main concepts of FAE, and present preliminary results in different multi-dimensional time-series datasets from various domains, including a real dataset from an operational mobile ISP, and the well known KDD 2021 Anomaly Detection dataset. |
Prese...Presented at ACM KDD 2024, MiLeTS 2024 Workshop, August 25, 2024, Barcelona, Spain |
Modeling the Deterioration of Pavement Skid Resistance and Surface Texture After Preventive Maintenance | 2025-07-02 | ShowThis study investigates the deterioration of skid resistance and surface macrotexture following preventive maintenance using micro-milling techniques. Field data were collected from 31 asphalt pavement sections located across four climatic zones in Texas, encompassing a variety of surface types, milling depths, operational speeds, and drum configurations. A standardized data collection protocol was followed, with measurements taken before milling, immediately after treatment, and at 3, 6, 12, and 18 months post-treatment. Skid number and Mean Profile Depth (MPD) were used to evaluate surface friction and texture characteristics. The dataset was reformatted into a time-series structure with 930 observations, incorporating contextual variables such as climatic zone, treatment parameters, and baseline surface condition. A comparative modeling framework was applied to predict the deterioration trends of both skid resistance and macrotexture over time. Eight regression models, including linear, tree-based, and ensemble methods, were evaluated alongside a sequence-to-one transformer model. Results show that the transformer model achieved the highest prediction accuracy for skid resistance (R2 = 0.981), while Random Forest performed best for macrotexture prediction (R2 = 0.838). The findings indicate that the degradation of surface characteristics after preventive maintenance is nonlinear and influenced by a combination of environmental and operational factors. This study demonstrates the effectiveness of data-driven modeling in supporting transportation agencies with pavement performance forecasting and maintenance planning. |
|
Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks | 2025-07-02 | ShowIn remote control systems, transmitting large data volumes (e.g., images, video frames) from wireless sensors to remote controllers is challenging when uplink capacity is limited (e.g., RedCap devices or massive wireless sensor networks). Furthermore, controllers often need only information-rich representations of the original data. To address this, we propose a semantic-driven predictive control combined with a channel-aware scheduling to enhance control performance for multiple devices under limited network capacity. At its core, the proposed framework, coined Time-Series Joint Embedding Predictive Architecture (TS-JEPA), encodes high-dimensional sensory data into low-dimensional semantic embeddings at the sensor, reducing communication overhead. Furthermore, TS-JEPA enables predictive inference by predicting future embeddings from current ones and predicted commands, which are directly used by a semantic actor model to compute control commands within the embedding space, eliminating the need to reconstruct raw data. To further enhance reliability and communication efficiency, a channel-aware scheduling is integrated to dynamically prioritize device transmissions based on channel conditions and age of information (AoI). Simulations on inverted cart-pole systems show that the proposed framework significantly outperforms conventional control baselines in communication efficiency, control cost, and predictive accuracy. It enables robust and scalable control under limited network capacity compared to traditional scheduling schemes. |
|
Simulation and evaluation of local daily temperature and precipitation series derived by stochastic downscaling of ERA5 reanalysis | 2025-07-02 | ShowReanalysis products such as the ERA5 reanalysis are commonly used as proxies for observed atmospheric conditions. These products are convenient to use due to their global coverage, the large number of available atmospheric variables and the physical consistency between these variables, as well as their relatively high spatial and temporal resolutions. However, despite the continuous improvements in accuracy and increasing spatial and temporal resolutions of reanalysis products, they may not always capture local atmospheric conditions, especially for highly localised variables such as precipitation. This paper proposes a computationally efficient stochastic downscaling of ERA5 temperature and precipitation. The method combines information from ERA5 and surface observations from nearby stations in a non-linear regression framework that combines generalised additive models (GAMs) with regression splines and auto-regressive moving average (ARMA) models to produce realistic time series of local daily temperature and precipitation. Using a wide range of evaluation criteria that address different properties of the data, the proposed framework is shown to improve the representation of local temperature and precipitation compared to ERA5 at over 4000 locations in Europe over a period of 60 years. |
|
Non-collective Calibrating Strategy for Time Series Forecasting | 2025-07-02 | ShowDeep learning-based approaches have demonstrated significant advancements in time series forecasting. Despite these ongoing developments, the complex dynamics of time series make it challenging to establish the rule of thumb for designing the golden model architecture. In this study, we argue that refining existing advanced models through a universal calibrating strategy can deliver substantial benefits with minimal resource costs, as opposed to elaborating and training a new model from scratch. We first identify a multi-target learning conflict in the calibrating process, which arises when optimizing variables across time steps, leading to the underutilization of the model's learning capabilities. To address this issue, we propose an innovative calibrating strategy called Socket+Plug (SoP). This approach retains an exclusive optimizer and early-stopping monitor for each predicted target within each Plug while keeping the fully trained Socket backbone frozen. The model-agnostic nature of SoP allows it to directly calibrate the performance of any trained deep forecasting models, regardless of their specific architectures. Extensive experiments on various time series benchmarks and a spatio-temporal meteorological ERA5 dataset demonstrate the effectiveness of SoP, achieving up to a 22% improvement even when employing a simple MLP as the Plug (highlighted in Figure 1). Code is available at https://github.com/hanyuki23/SoP. |
Accep...Accepted by IJCAI 2025 |
MOMENTI: Scalable Motif Mining in Multidimensional Time Series | 2025-07-02 | ShowTime series play a fundamental role in many domains, capturing a plethora of information about the underlying data-generating processes. When a process generates multiple synchronized signals we are faced with multidimensional time series. In this context a fundamental problem is that of motif mining, where we seek patterns repeating twice with minor variations, spanning some of the dimensions. State of the art exact solutions for this problem run in time quadratic in the length of the input time series. We provide a scalable method to find the top-k motifs in multidimensional time series with probabilistic guarantees on the quality of the results. Our algorithm runs in time subquadratic in the length of the input, and returns the exact solution with probability at least |
14 pa...14 pages, 7 figures, extended experimental section, change of algorithm name due to a title clash with another paper published in the same issue |
Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers | 2025-07-02 | ShowTime series classification is a fundamental task in healthcare and industry, yet the development of time series foundation models (TSFMs) remains limited by the scarcity of publicly available time series datasets. In this work, we propose Time Vision Transformer (TiViT), a framework that converts time series into images to leverage the representational power of frozen Vision Transformers (ViTs) pretrained on large-scale image datasets. First, we theoretically motivate our approach by analyzing the 2D patching of ViTs for time series, showing that it can increase the number of label-relevant tokens and reduce the sample complexity. Second, we empirically demonstrate that TiViT achieves state-of-the-art performance on standard time series classification benchmarks by utilizing the hidden representations of large OpenCLIP models. We explore the structure of TiViT representations and find that intermediate layers with high intrinsic dimension are the most effective for time series classification. Finally, we assess the alignment between TiViT and TSFM representation spaces and identify a strong complementarity, with further performance gains achieved by combining their features. Our findings reveal a new direction for reusing vision representations in a non-visual domain. Code is available at https://github.com/ExplainableML/TiViT. |
Preprint |
A Bayesian framework for change-point detection with uncertainty quantification | 2025-07-02 | ShowWe introduce a novel Bayesian method that can detect multiple structural breaks in the mean and variance of a length |
|
Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting | 2025-07-02 | ShowIn this study, we apply 1D quantum convolution to address the task of time series forecasting. By encoding multiple points into the quantum circuit to predict subsequent data, each point becomes a feature, transforming the problem into a multidimensional one. Building on theoretical foundations from prior research, which demonstrated that Variational Quantum Circuits (VQCs) can be expressed as multidimensional Fourier series, we explore the capabilities of different architectures and ansatz. This analysis considers the concepts of circuit expressibility and the presence of barren plateaus. Analyzing the problem within the framework of the Fourier series enabled the design of an architecture that incorporates data reuploading, resulting in enhanced performance. Rather than a strict requirement for the number of free parameters to exceed the degrees of freedom of the Fourier series, our findings suggest that even a limited number of parameters can produce Fourier functions of higher degrees. This highlights the remarkable expressive power of quantum circuits. This observation is also significant in reducing training times. The ansatz with greater expressibility and number of non-zero Fourier coefficients consistently delivers favorable results across different scenarios, with performance metrics improving as the number of qubits increases. |
|
Fitting Sparse Markov Models to Categorical Time Series Using Convex Clustering | 2025-07-02 | ShowHigher-order Markov chains are frequently used to model categorical time series. However, a major problem with fitting such models is the exponentially growing number of parameters in the model order. A popular approach to parsimonious modeling is to use a Variable Length Markov Chain (VLMC), which determines relevant contexts (recent pasts) of variable orders and forms a context tree. A more general parsimonious modeling approach is given by Sparse Markov Models (SMMs), where all possible histories of order |
|
Tunnelling Through Time Series: A Probabilistic Visibility Graph for Local and Global Pattern Discovery | 2025-07-01 | ShowThe growing availability of high-resolution, long-term time series data has highlighted the need for methods capable of capturing both local and global patterns. To address this, we introduce the Probabilistic Visibility Graph (PVG), a novel approach inspired by the quantum tunnelling phenomenon. The PVG extends the classical Visibility Graph (VG) by introducing probabilistic connections between time points that are obstructed in the VG due to intermediate values. We demonstrate the PVG's effectiveness in capturing long-range dependencies through simulations of amplitude-modulated signals and analysis of electrocorticography (ECoG) data under rest and anesthesia conditions. Key results show that the PVG presents distinct network properties between rest and anesthesia, with rest exhibiting stronger small-worldness and scale-free behavior, reflecting a hub-dominated, centralized connectivity structure, compared to anesthesia. These findings highlight the PVG's potential for analyzing complex signals with interacting temporal scales, offering new insights into neural dynamics and other real-world phenomena. |
|
Uniform Validity of the Subset Anderson-Rubin Test under Heteroskedasticity and Nonlinearity | 2025-07-01 | ShowWe consider the Anderson-Rubin (AR) statistic for a general set of nonlinear moment restrictions. The statistic is based on the criterion function of the continuous updating estimator (CUE) for a subset of parameters not constrained under the Null. We treat the data distribution nonparametrically with parametric moment restrictions imposed under the Null. We show that subset tests and confidence intervals based on the AR statistic are uniformly valid over a wide range of distributions that include moment restrictions with general forms of heteroskedasticity. We show that the AR based tests have correct asymptotic size when parameters are unidentified, partially identified, weakly or strongly identified. We obtain these results by constructing an upper bound using a novel perturbation and regularization approach applied to the first-order conditions of the CUE. Our theory applies to both cross-sections and time series data and does not assume stationarity in time series settings or homogeneity in cross-sectional settings. |
|
Meta-Posterior Consistency for the Bayesian Inference of Metastable System | 2025-07-01 | ShowThe vast majority of the literature on learning dynamical systems or stochastic processes from time series has focused on stable or ergodic systems, for both Bayesian and frequentist inference procedures. However, most real-world systems are only metastable, that is, the dynamics appear to be stable on some time scale, but are in fact unstable over longer time scales. Consistency of inference for metastable systems may not be possible, but one can ask about metaconsistency: Do inference procedures converge when observations are taken over a large but finite time interval, but diverge on longer time scales? In this paper we introduce, discuss, and quantify metaconsistency in a Bayesian framework. We discuss how metaconsistency can be exploited to efficiently infer a model for a sub-system of a larger system, where inference on the global behavior may require much more data, or there is no theoretical guarantee as to the asymptotic success of inference procedures. We also discuss the relation between metaconsistency and the spectral properties of the model dynamical system in the case of uniformly ergodic and non-ergodic diffusions. |
32 pages, 3 figures |
Toward a Data Processing Pipeline for Mobile-Phone Tracking Data | 2025-07-01 | ShowAs mobile phones become ubiquitous, high-frequency smartphone positioning data are increasingly being used by researchers studying the mobility patterns of individuals as they go about their daily routines and the consequences of these patterns for health, behavioral, and other outcomes. A complex data pipeline underlies empirical research leveraging mobile phone tracking data. A key component of this pipeline is transforming raw, time-stamped positions into analysis-ready data objects, typically space-time "trajectories." In this paper, we break down a key portion of the data analysis pipeline underlying the Adolescent Health and Development in Context (AHDC) Study, a large-scale, longitudinal study of youth residing in the Columbus, OH metropolitan area. Recognizing that the bespoke "binning algorithm" used by AHDC researchers resembles a time-series filtering algorithm, we propose a statistical framework - a formal probability model and computational approach to inference - inspired by the binning algorithm for transforming noisy, time-stamped geographic positioning observations into mobility trajectories that capture periods of travel and stability. Our framework, unlike the binning algorithm, allows for formal smoothing via a particle Gibbs algorithm, improving estimation of trajectories as compared to the original binning algorithm. We argue that our framework can be used as a default data processing tool for future mobile-phone tracking studies. |
|
Time Series Foundation Models are Flow Predictors | 2025-07-01 | ShowWe investigate the effectiveness of time series foundation models (TSFMs) for crowd flow prediction, focusing on Moirai and TimesFM. Evaluated on three real-world mobility datasets (Bike NYC, Taxi Beijing, and Spanish national OD flows), these models are deployed in a strict zero-shot setting, using only the temporal evolution of each OD flow and no explicit spatial information. Moirai and TimesFM outperform both statistical and deep learning baselines, achieving up to 33% lower RMSE, 39% lower MAE and up to 49% higher CPC compared to state-of-the-art competitors. Our results highlight the practical value of TSFMs for accurate, scalable flow prediction, even in scenarios with limited annotated data or missing spatial context. |
arXiv...arXiv admin note: text overlap with arXiv:2203.07372 |
LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization | 2025-07-01 | ShowRecent research has shown an increasing interest in utilizing pre-trained large language models (LLMs) for a variety of time series applications. However, there are three main challenges when using LLMs as foundational models for time series forecasting: (1) Cross-domain generalization. (2) Cross-modality alignment. (3) Error accumulation in autoregressive frameworks. To address these challenges, we propose LangTime, a language-guided unified model for time series forecasting that incorporates cross-domain pre-training with reinforcement learning-based fine-tuning. Specifically, LangTime constructs Temporal Comprehension Prompts (TCPs), which include dataset-wise and channel-wise instructions, to facilitate domain adaptation and condense time series into a single token, enabling LLMs to better understand and align temporal data. To improve autoregressive forecasting, we introduce TimePPO, a reinforcement learning-based fine-tuning algorithm. TimePPO mitigates error accumulation by leveraging a multidimensional reward function tailored for time series and a repeat-based value estimation strategy. Extensive experiments demonstrate that LangTime achieves state-of-the-art cross-domain forecasting performance, while TimePPO fine-tuning effectively enhances the stability and accuracy of autoregressive forecasting. |
|
Over the Stability Space of a Multivariate Time Series | 2025-07-01 | ShowThis paper jointly addresses the challenges of non-stationarity and high dimensionality in analysing multivariate time series. Building on the classical concept of cointegration, we introduce a more flexible notion, called stability space, aimed at capturing stationary components in settings where traditional assumptions may not hold. Based on the dimensionality reduction techniques of Partial Least Squares and Principal Component Analysis, we propose two non-parametric procedures for estimating such a space and a targeted selection of components that prioritises stationarity. We compare these alternatives with the parametric Johansen procedure, when possible. Through simulations and real-data applications, we evaluate the performance of these methodologies across various scenarios, including high-dimensional configurations. |
31 pa...31 pages, 7 figures, 6 tables |
Towards Efficient Parametric State Estimation in Circulating Fuel Reactors with Shallow Recurrent Decoder Networks | 2025-07-01 | ShowThe recent developments in data-driven methods have paved the way to new methodologies to provide accurate state reconstruction of engineering systems; nuclear reactors represent particularly challenging applications for this task due to the complexity of the strongly coupled physics involved and the extremely harsh and hostile environments, especially for new technologies such as Generation-IV reactors. Data-driven techniques can combine different sources of information, including computational proxy models and local noisy measurements on the system, to robustly estimate the state. This work leverages the novel Shallow Recurrent Decoder architecture to infer the entire state vector (including neutron fluxes, precursors concentrations, temperature, pressure and velocity) of a reactor from three out-of-core time-series neutron flux measurements alone. In particular, this work extends the standard architecture to treat parametric time-series data, ensuring the possibility of investigating different accidental scenarios and showing the capabilities of this approach to provide an accurate state estimation in various operating conditions. This paper considers as a test case the Molten Salt Fast Reactor (MSFR), a Generation-IV reactor concept, characterised by strong coupling between the neutronics and the thermal hydraulics due to the liquid nature of the fuel. The promising results of this work are further strengthened by the possibility of quantifying the uncertainty associated with the state estimation, due to the considerably low training cost. The accurate reconstruction of every characteristic field in real-time makes this approach suitable for monitoring and control purposes in the framework of a reactor digital twin. |
|
AI Analyst: Framework and Comprehensive Evaluation of Large Language Models for Financial Time Series Report Generation | 2025-07-01 | ShowThis paper explores the potential of large language models (LLMs) to generate financial reports from time series data. We propose a framework encompassing prompt engineering, model selection, and evaluation. We introduce an automated highlighting system to categorize information within the generated reports, differentiating between insights derived directly from time series data, stemming from financial reasoning, and those reliant on external knowledge. This approach aids in evaluating the factual grounding and reasoning capabilities of the models. Our experiments, utilizing both data from the real stock market indices and synthetic time series, demonstrate the capability of LLMs to produce coherent and informative financial reports. |
|
Continuous Wavelet Transform and Siamese Network-Based Anomaly Detection in Multi-variate Semiconductor Process Time Series | 2025-07-01 | ShowSemiconductor manufacturing is an extremely complex process, characterized by thousands of interdependent parameters collected across diverse tools and process steps. Multi-variate time-series (MTS) analysis has emerged as a critical methodology for enabling real-time monitoring, fault detection, and predictive maintenance in such environments. However, anomaly prediction in semiconductor fabrication presents several critical challenges, including high data dimensionality, severe class imbalance due to the rarity of true faults, noisy and missing measurements, and non-stationary behavior of production systems. Furthermore, the complex interdependencies between variables and the delayed emergence of faults across downstream stages complicate both anomaly detection and root-cause analysis. This paper presents a novel and generic approach for anomaly detection in MTS data using machine learning. The proposed methodology consists of three main steps: a) converting MTS data into image-based representations using the Continuous Wavelet Transform, b) developing a multi-class image classifier by fine-tuning a pretrained VGG-16 architecture on custom CWT image datasets, and c) constructing a Siamese network composed of two identical sub-networks, each utilizing the fine-tuned VGG-16 as a backbone. The network takes pairs of CWT images as input: one serving as a reference or anchor (representing a known-good signal), and the other as a query (representing an unknown signal). The model then compares the embeddings of both inputs to determine whether they belong to the same class at a given time step. Our approach demonstrates high accuracy in identifying anomalies on a real FAB process time-series dataset, offering a promising solution for offline anomaly detection in process and tool trace data. Moreover, the approach is flexible and can be applied in both supervised and semi-supervised settings. |
13 pa...13 pages, 7 figures, submitted to IEEE Transactions on Semiconductor Manufacturing |
Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives | 2025-07-01 | ShowTime series forecasting traditionally relies on unimodal numerical inputs, which often struggle to capture high-level semantic patterns due to their dense and unstructured nature. While recent approaches have explored representing time series as text using large language models (LLMs), these methods remain limited by the discrete nature of token sequences and lack the perceptual intuition humans typically apply, such as interpreting visual patterns. In this paper, we propose a multimodal contrastive learning framework that transforms raw time series into structured visual and textual perspectives. Rather than using natural language or real-world images, we construct both modalities directly from numerical sequences. We then align these views in a shared semantic space via contrastive learning, enabling the model to capture richer and more complementary representations. Furthermore, we introduce a variate selection module that leverages the aligned representations to identify the most informative variables for multivariate forecasting. Extensive experiments on fifteen short-term and six long-term forecasting benchmarks demonstrate that our approach consistently outperforms strong unimodal and cross-modal baselines, highlighting the effectiveness of multimodal alignment in enhancing time series forecasting. Code is available at: https://github.com/Ironieser/TimesCLIP. |
Code:... |
Evaluation of a Foundational Model and Stochastic Models for Forecasting Sporadic or Spiky Production Outages of High-Performance Machine Learning Services | 2025-06-30 | ShowTime series forecasting models have diverse real world applications (e.g., from electricity metrics to software workload). The latest foundational models trained for time series forecasting show strengths (e.g., for long sequences and in zero-shot settings). However, foundational models have not yet been used for forecasting rare, spiky events, a challenging target because such events are a corner case of extreme events. In this paper, we optimize a state-of-the-art foundational model to forecast sporadic or spiky production outages of high-performance machine learning services powering billions of client devices. We evaluate the forecasting errors of the foundational model compared with classical stochastic forecasting models (e.g., moving average and autoregressive). The analysis helps us understand how each of the evaluated models performs for the sporadic or spiky events. For example, it identifies the key patterns in the target data that are well tracked by the foundational model vs. each of the stochastic models. We use the models with optimal parameters to estimate year-long outage statistics of a particular root cause with less than 6% value error. |
|
MamNet: A Novel Hybrid Model for Time-Series Forecasting and Frequency Pattern Analysis in Network Traffic | 2025-06-30 | ShowThe abnormal fluctuations in network traffic may indicate potential security threats or system failures. Therefore, efficient network traffic prediction and anomaly detection methods are crucial for network security and traffic management. This paper proposes a novel network traffic prediction and anomaly detection model, MamNet, which integrates time-domain modeling and frequency-domain feature extraction. The model first captures the long-term dependencies of network traffic through the Mamba module (time-domain modeling), and then identifies periodic fluctuations in the traffic using Fourier Transform (frequency-domain feature extraction). In the feature fusion layer, multi-scale information is integrated to enhance the model's ability to detect network traffic anomalies. Experiments conducted on the UNSW-NB15 and CAIDA datasets demonstrate that MamNet outperforms several recent mainstream models in terms of accuracy, recall, and F1-Score. Specifically, it achieves an improvement of approximately 2% to 4% in detection performance for complex traffic patterns and long-term trend detection. The results indicate that MamNet effectively captures anomalies in network traffic across different time scales and is suitable for anomaly detection tasks in network security and traffic management. Future work could further optimize the model structure by incorporating external network event information, thereby improving the model's adaptability and stability in complex network environments. |
16 pages |
Constructing Non-Markovian Decision Process via History Aggregator | 2025-06-30 | ShowIn the domain of algorithmic decision-making, non-Markovian dynamics manifest as a significant impediment, especially for paradigms such as Reinforcement Learning (RL), thereby exerting far-reaching consequences on the advancement and effectiveness of the associated systems. Nevertheless, the existing benchmarks are deficient in comprehensively assessing the capacity of decision algorithms to handle non-Markovian dynamics. To address this deficiency, we have devised a generalized methodology grounded in category theory. Notably, we established the category of Markov Decision Processes (MDP) and the category of non-Markovian Decision Processes (NMDP), and proved the equivalence relationship between them. This theoretical foundation provides a novel perspective for understanding and addressing non-Markovian dynamics. We further introduced non-Markovianity into decision-making problem settings via the History Aggregator for State (HAS). With HAS, we can precisely control the state dependency structure of decision-making problems in the time series. Our analysis demonstrates the effectiveness of our method in representing a broad range of non-Markovian dynamics. This approach facilitates a more rigorous and flexible evaluation of decision algorithms by testing them in problem settings where non-Markovian dynamics are explicitly constructed. |
|
Towards transparent and data-driven fault detection in manufacturing: A case study on univariate, discrete time series | 2025-06-30 | ShowEnsuring consistent product quality in modern manufacturing is crucial, particularly in safety-critical applications. Conventional quality control approaches, reliant on manually defined thresholds and features, lack adaptability to the complexity and variability inherent in production data and necessitate extensive domain expertise. Conversely, data-driven methods, such as machine learning, demonstrate high detection performance but typically function as black-box models, thereby limiting their acceptance in industrial environments where interpretability is paramount. This paper introduces a methodology for industrial fault detection, which is both data-driven and transparent. The approach integrates a supervised machine learning model for multi-class fault classification, Shapley Additive Explanations for post-hoc interpretability, and a domain-specific visualisation technique that maps model explanations to operator-interpretable features. Furthermore, the study proposes an evaluation methodology that assesses model explanations through quantitative perturbation analysis and evaluates visualisations by qualitative expert assessment. The approach was applied to the crimping process, a safety-critical joining technique, using a dataset of univariate, discrete time series. The system achieves a fault detection accuracy of 95.9%, and both quantitative selectivity analysis and qualitative expert evaluations confirmed the relevance and interpretability of the generated explanations. This human-centric approach is designed to enhance trust and interpretability in data-driven fault detection, thereby contributing to applied system design in industrial quality control. |
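The outage-forecasting entry at the top of this table compares an optimized foundational model against classical stochastic baselines. For reference, here is a minimal sketch of two such baselines, a moving-average and an autoregressive one-step-ahead forecast, in plain NumPy; the window size, lag order, and the Poisson toy series are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def moving_average_forecast(y, window=24):
    """One-step-ahead forecast: mean of the last `window` observations."""
    y = np.asarray(y, dtype=float)
    return float(y[-window:].mean())

def ar_forecast(y, p=3):
    """One-step-ahead AR(p) forecast with least-squares coefficients."""
    y = np.asarray(y, dtype=float)
    # Lag matrix: column k-1 holds the lag-k values, aligned with targets y[p:].
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return float(coef @ y[-1 : -p - 1 : -1])  # most recent lag first

# Toy sporadic/spiky series: mostly zeros with occasional bursts.
rng = np.random.default_rng(0)
y = rng.poisson(lam=0.2, size=500).astype(float)
print(moving_average_forecast(y), ar_forecast(y))
```

Baselines like these are cheap yardsticks: if a tuned foundational model cannot beat them on a spiky target, the added complexity is hard to justify.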
Title | Date | Abstract | Comment |
---|---|---|---|
Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation | 2025-07-09 | ShowRecent advances in diffusion-based and autoregressive video generation models have achieved remarkable visual realism. However, these models typically lack accurate physical alignment, failing to replicate real-world dynamics in object motion. This limitation arises primarily from their reliance on learned statistical correlations rather than capturing mechanisms adhering to physical laws. To address this issue, we introduce a novel framework that integrates symbolic regression (SR) and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories. These trajectories then guide video generation without requiring fine-tuning of existing models. Evaluated on scenarios in Classical Mechanics, including spring-mass, pendulums, and projectile motions, our method successfully recovers ground-truth analytical equations and improves the physical alignment of generated videos over baseline methods. |
|
ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture | 2025-07-09 | ShowTrajectory prediction for multi-agent interaction scenarios is a crucial challenge. Most advanced methods model agent interactions with efficiently factorized attention based on the temporal and agent axes. However, this static and forward modeling lacks explicit interactive spatio-temporal coordination, capturing only obvious and immediate behavioral intentions. Alternatively, the modern trajectory prediction framework refines successive predictions with a fixed-anchor selection strategy, which is difficult to adapt to different future environments. It is acknowledged that human drivers dynamically adjust initial driving decisions based on further assumptions about the intentions of surrounding vehicles. Motivated by human driving behaviors, this paper proposes ILNet, a multi-agent trajectory prediction method with Inverse Learning (IL) attention and a Dynamic Anchor Selection (DAS) module. IL attention employs an inverse learning paradigm to model interactions at neighboring moments, introducing proposed intentions to dynamically encode the spatio-temporal coordination of interactions, thereby enhancing the model's ability to capture complex interaction patterns. The learnable DAS module is then proposed to extract multiple trajectory change keypoints as anchors in parallel with almost no increase in parameters. Experimental results show that ILNet achieves state-of-the-art performance on the INTERACTION and Argoverse motion forecasting datasets. In particular, in challenging interaction scenarios, ILNet achieves higher accuracy and more multimodal trajectory distributions with fewer parameters. Our codes are available at https://github.com/mjZeng11/ILNet. |
|
Self-supervised learning predicts plant growth trajectories from multi-modal industrial greenhouse data | 2025-07-08 | ShowQuantifying organism-level phenotypes, such as growth dynamics and biomass accumulation, is fundamental to understanding agronomic traits and optimizing crop production. However, high-quality plant growth data is difficult to generate at scale. Here we use a mobile robotic platform to capture high-resolution environmental sensing and phenotyping measurements of a large-scale hydroponic leafy greens system. We describe a self-supervised modeling approach to build a map from observed growing data to the entire plant growth trajectory. We demonstrate our approach by forecasting future plant height and harvest mass of crops in this system. This approach represents a significant advance in combining robotic automation and machine learning, as well as providing actionable insights for agronomic research and operational efficiency. |
|
Driving View Synthesis on Free-form Trajectories with Generative Prior | 2025-07-08 | ShowDriving view synthesis along free-form trajectories is essential for realistic driving simulations, enabling closed-loop evaluation of end-to-end driving policies. Existing methods excel at view interpolation along recorded paths but struggle to generalize to novel trajectories due to limited viewpoints in driving videos. To tackle this challenge, we propose DriveX, a novel free-form driving view synthesis framework that progressively distills a generative prior into the 3D Gaussian model during its optimization. Within this framework, we utilize a video diffusion model to refine the degraded novel trajectory renderings from the in-training Gaussian model, while the restored videos in turn serve as additional supervision for optimizing the 3D Gaussian. Concretely, we craft an inpainting-based video restoration task, which disentangles the identification of degraded regions from the generative capability of the diffusion model and removes the need to simulate specific degradation patterns when training the diffusion model. To further enhance the consistency and fidelity of generated contents, the pseudo ground truth is progressively updated with the gradually improved novel trajectory rendering, allowing both components to co-adapt and reinforce each other while minimizing disruption to the optimization. By tightly integrating the 3D scene representation with a generative prior, DriveX achieves high-quality view synthesis beyond recorded trajectories in real time--unlocking new possibilities for flexible and realistic driving simulations on free-form trajectories. |
ICCV 2025 |
GC-GAT: Multimodal Vehicular Trajectory Prediction using Graph Goal Conditioning and Cross-context Attention | 2025-07-08 | ShowPredicting future trajectories of surrounding vehicles heavily relies on what contextual information is given to a motion prediction model. The context itself can be static (lanes, regulatory elements, etc.) or dynamic (traffic participants). This paper presents a lane graph-based motion prediction model that first predicts graph-based goal proposals and later fuses them with cross attention over multiple contextual elements. We follow the well-known encoder-interactor-decoder architecture, where the encoder encodes scene context using lightweight Gated Recurrent Units, the interactor applies cross-context attention over encoded scene features and graph goal proposals, and the decoder regresses multimodal trajectories via a Laplacian Mixture Density Network from the aggregated encodings. Using cross-attention over graph-based goal proposals gives robust trajectory estimates, since the model learns to attend to future goal-relevant scene elements for the intended agent. We evaluate our work on the nuScenes motion prediction dataset, achieving state-of-the-art results. |
|
SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation | 2025-07-07 | ShowRecent advancements in text-to-3D generation improve the visual quality of Score Distillation Sampling (SDS) and its variants by directly connecting Consistency Distillation (CD) to score distillation. However, due to the imbalance between self-consistency and cross-consistency, these CD-based methods inherently suffer from improper conditional guidance, leading to sub-optimal generation results. To address this issue, we present SegmentDreamer, a novel framework designed to fully unleash the potential of consistency models for high-fidelity text-to-3D generation. Specifically, we reformulate SDS through the proposed Segmented Consistency Trajectory Distillation (SCTD), effectively mitigating the imbalance issues by explicitly defining the relationship between self- and cross-consistency. Moreover, SCTD partitions the Probability Flow Ordinary Differential Equation (PF-ODE) trajectory into multiple sub-trajectories and ensures consistency within each segment, which can theoretically provide a significantly tighter upper bound on distillation error. Additionally, we propose a distillation pipeline for a more swift and stable generation. Extensive experiments demonstrate that our SegmentDreamer outperforms state-of-the-art methods in visual quality, enabling high-fidelity 3D asset creation through 3D Gaussian Splatting (3DGS). |
Accep...Accepted by ICCV 2025, project page: https://zjhjojo.github.io/ |
From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving | 2025-07-07 | ShowAccurate motion prediction of surrounding traffic participants is crucial for the safe and efficient operation of automated vehicles in dynamic environments. Marginal prediction models commonly forecast each agent's future trajectories independently, often leading to sub-optimal planning decisions for an automated vehicle. In contrast, joint prediction models explicitly account for the interactions between agents, yielding socially and physically consistent predictions on a scene level. However, existing approaches differ not only in their problem formulation but also in the model architectures and implementation details used, making it difficult to compare them. In this work, we systematically investigate different approaches to joint motion prediction, including post-processing of the marginal predictions, explicitly training the model for joint predictions, and framing the problem as a generative task. We evaluate each approach in terms of prediction accuracy, multi-modality, and inference efficiency, offering a comprehensive analysis of the strengths and limitations of each approach. Several prediction examples are available at https://frommarginaltojointpred.github.io/. |
Accep...Accepted at International Conference on Intelligent Transportation Systems 2025 (ITSC 2025) |
Beyond Features: How Dataset Design Influences Multi-Agent Trajectory Prediction Performance | 2025-07-07 | ShowAccurate trajectory prediction is critical for safe autonomous navigation, yet the impact of dataset design on model performance remains understudied. This work systematically examines how feature selection, cross-dataset transfer, and geographic diversity influence trajectory prediction accuracy in multi-agent settings. We evaluate a state-of-the-art model using our novel L4 Motion Forecasting dataset, based on our own data recordings in Germany and the US, which includes enhanced map and agent features. We compare our dataset to the US-centric Argoverse 2 benchmark. First, we find that incorporating the supplementary map and agent features unique to our dataset yields no measurable improvement over baseline features, demonstrating that modern architectures do not need extensive feature sets for optimal performance; the limited features of public datasets are sufficient to capture convoluted interactions without added complexity. Second, we perform cross-dataset experiments to evaluate how effectively domain knowledge can be transferred between datasets. Third, we group our dataset by country and examine knowledge transfer between different driving cultures. |
|
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition | 2025-07-07 | ShowDiscrete diffusion models have demonstrated great promise in modeling various sequence data, ranging from human language to biological sequences. Inspired by the success of RL in language models, there is growing interest in further improving the models by alignment with a certain reward. In this work, we propose a novel preference optimization method for masked discrete diffusion models through a principled diffusion trajectory alignment. Instead of applying the reward on the final output and backpropagating the gradient to the entire discrete denoising process, we decompose the problem into a set of stepwise alignment objectives. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and importantly, guarantees an equivalent optimal solution under additive factorization of the trajectory reward. Experiments across multiple domains including DNA sequence design, protein inverse folding, and language modeling consistently demonstrate the superiority of our approach. Notably, it achieves an up to 12% improvement over the most competitive RL-based baseline in terms of predicted activity on DNA sequence design, and further improves the GSM8K score from 78.6 to 80.7 on LLaDA-8B-Instruct for language modeling. |
22 pages, 3 figures |
Learning Maximal Safe Sets Using Hypernetworks for MPC-based Local Trajectory Planning in Unknown Environments | 2025-07-07 | ShowThis paper presents a novel learning-based approach for online estimation of maximal safe sets for local trajectory planning in unknown static environments. The neural representation of a set is used as the terminal set constraint for a model predictive control (MPC) local planner, resulting in improved recursive feasibility and safety. To achieve real-time performance and desired generalization properties, we employ the idea of hypernetworks. We use the Hamilton-Jacobi (HJ) reachability analysis as the source of supervision during the training process, allowing us to consider general nonlinear dynamics and arbitrary constraints. The proposed method is extensively evaluated against relevant baselines in simulations for different environments and robot dynamics. The results show an increase in success rate of up to 52% compared to the best baseline while maintaining comparable execution speed. Additionally, we deploy our proposed method, NTC-MPC, on a physical robot and demonstrate its ability to safely avoid obstacles in scenarios where the baselines fail. |
|
Risk-Aware Trajectory Optimization and Control for an Underwater Suspended Robotic System | 2025-07-07 | ShowThis paper focuses on the trajectory optimization of an underwater suspended robotic system comprising an uncrewed surface vessel (USV) and an uncrewed underwater vehicle (UUV) for autonomous litter collection. The key challenge lies in the significant uncertainty in drag and weight parameters introduced by the collected litter. We propose a dynamical model for the coupled UUV-USV system in the primary plane of motion and a risk-aware optimization approach incorporating parameter uncertainty and noise to ensure safe interactions with the environment. A stochastic optimization problem is solved using a conditional value-at-risk framework. Simulations demonstrate that our approach reduces collision risks and energy consumption, highlighting its reliability compared to existing control methods. |
|
LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction | 2025-07-07 | ShowIt has been challenging to model the complex temporal-spatial dependencies between agents for trajectory prediction. As each state of an agent is closely related to the states of adjacent time steps, capturing the local temporal dependency is beneficial for prediction, yet most studies overlook it. Besides, learning high-order motion state attributes is expected to enhance spatial interaction modeling, but this is rarely explored in previous works. To address this, we propose a lightweight framework, LTMSformer, to extract temporal-spatial interaction features for multi-modal trajectory prediction. Specifically, we introduce a Local Trend-Aware Attention mechanism to capture the local temporal dependency by leveraging a convolutional attention mechanism with hierarchical local time boxes. Next, to model the spatial interaction dependency, we build a Motion State Encoder to incorporate high-order motion state attributes, such as acceleration, jerk, and heading. To further refine the trajectory prediction, we propose a Lightweight Proposal Refinement Module that leverages Multi-Layer Perceptrons for trajectory embedding and generates the refined trajectories with fewer model parameters. Experiment results on the Argoverse 1 dataset demonstrate that our method outperforms the baseline HiVT-64, reducing the minADE by approximately 4.35%, the minFDE by 8.74%, and the MR by 20% (these standard metrics are sketched after this table). We also achieve higher accuracy than HiVT-128 with a 68% reduction in model size. |
|
Spatial-Temporal Conditional Random Field for Human Trajectory Prediction | 2025-07-07 | ShowTrajectory prediction is of significant importance in computer vision. Accurate pedestrian trajectory prediction benefits autonomous vehicles and robots in planning their motion. Pedestrians' trajectories are greatly influenced by their intentions. Prior studies have introduced various deep learning methods that attend only to the spatial and temporal information of trajectories, overlooking explicit intention information. In this study, we introduce a novel model, termed the S-T CRF (Spatial-Temporal Conditional Random Field), which judiciously incorporates intention information besides the spatial and temporal information of the trajectory. This model uses a Conditional Random Field (CRF) to generate a representation of future intentions, greatly improving the prediction of subsequent trajectories when combined with the spatial-temporal representation. Furthermore, the study innovatively devises a space CRF loss and a time CRF loss, meticulously designed to enhance interaction constraints and temporal dynamics, respectively. Extensive experimental evaluations on the ETH/UCY and SDD datasets demonstrate that the proposed method surpasses existing baseline approaches. |
|
Predicting Drivers' Route Trajectories in Last-Mile Delivery Using A Pair-wise Attention-based Pointer Neural Network | 2025-07-07 | ShowIn last-mile delivery, drivers frequently deviate from planned delivery routes because of their tacit knowledge of the road and curbside infrastructure, customer availability, and other characteristics of the respective service areas. Hence, the actual stop sequences chosen by an experienced human driver may be preferable to the theoretical shortest-distance routing under real-life operational conditions. Thus, being able to predict the actual stop sequence that a human driver would follow can help to improve route planning in last-mile delivery. This paper proposes a pair-wise attention-based pointer neural network for this prediction task using drivers' historical delivery trajectory data. In addition to the commonly used encoder-decoder architecture for sequence-to-sequence prediction, we propose a new attention mechanism based on an alternative specific neural network to capture the local pair-wise information for each pair of stops. To further capture the global efficiency of the route, we propose a new iterative sequence generation algorithm that is used after model training to identify the first stop of a route that yields the lowest operational cost. Results from an extensive case study on real operational data from Amazon's last-mile delivery operations in the US show that our proposed method can significantly outperform traditional optimization-based approaches and other machine learning methods (such as the Long Short-Term Memory encoder-decoder and the original pointer network) in finding stop sequences that are closer to high-quality routes executed by experienced drivers in the field. Compared to benchmark models, the proposed model can increase the average prediction accuracy of the first four stops from around 0.229 to 0.312, and reduce the disparity between the predicted route and the actual route by around 15%. |
|
D4orm: Multi-Robot Trajectories with Dynamics-aware Diffusion Denoised Deformations | 2025-07-07 | ShowThis work presents an optimization method for generating kinodynamically feasible and collision-free multi-robot trajectories that exploits an incremental denoising scheme in diffusion models. Our key insight is that high-quality trajectories can be discovered merely by denoising noisy trajectories sampled from a distribution. This approach has no learning component, relying instead on only two ingredients: a dynamical model of the robots to obtain feasible trajectories via rollout, and a fitness function to guide denoising with Monte Carlo gradient approximation. The proposed framework iteratively optimizes a deformation for the previous trajectory with the current denoising process, allows anytime refinement as time permits, supports different dynamics, and benefits from GPU acceleration. Our evaluations for differential-drive and holonomic teams with up to 16 robots in 2D and 3D worlds show its ability to discover high-quality solutions faster than other black-box optimization methods such as MPPI. In a 2D holonomic case with 16 robots, it is almost twice as fast. As evidence for feasibility, we demonstrate zero-shot deployment of the planned trajectories on eight multirotors. |
Accep...Accepted by 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) |
VLM-TDP: VLM-guided Trajectory-conditioned Diffusion Policy for Robust Long-Horizon Manipulation | 2025-07-06 | ShowDiffusion policy has demonstrated promising performance in the field of robotic manipulation. However, its effectiveness has been primarily limited to short-horizon tasks, and its performance significantly degrades in the presence of image noise. To address these limitations, we propose a VLM-guided trajectory-conditioned diffusion policy (VLM-TDP) for robust and long-horizon manipulation. Specifically, the proposed method leverages state-of-the-art vision-language models (VLMs) to decompose long-horizon tasks into concise, manageable sub-tasks, while also innovatively generating voxel-based trajectories for each sub-task. The generated trajectories serve as a crucial conditioning factor, effectively steering the diffusion policy and substantially enhancing its performance. The proposed Trajectory-conditioned Diffusion Policy (TDP) is trained on trajectories derived from demonstration data and validated using the trajectories generated by the VLM. Simulation experimental results indicate that our method significantly outperforms classical diffusion policies, achieving an average 44% increase in success rate, over 100% improvement in long-horizon tasks, and a 20% reduction in performance degradation in challenging conditions, such as noisy images or altered environments. These findings are further reinforced by our real-world experiments, where the performance gap becomes even more pronounced in long-horizon tasks. Videos are available at https://youtu.be/g0T6h32OSC8 |
|
Rapid and Safe Trajectory Planning over Diverse Scenes through Diffusion Composition | 2025-07-06 | ShowSafe trajectory planning remains a significant challenge in complex environments, where traditional methods often trade off computational efficiency for safety. Comprehensive obstacle modeling improves safety but is computationally expensive, while approximate methods are more efficient but may compromise safety. To address this issue, this paper introduces a rapid and safe trajectory planning framework based on state-based diffusion models. Leveraging only low-dimensional vehicle states, the diffusion models achieve notable inference efficiency while ensuring sufficient collision-free characteristics. By composing diffusion models, the proposed framework can safely generalize across diverse scenarios, planning collision-free trajectories even in unseen scenes. To further ensure the safety of the generated trajectories, an efficient rule-based safety filter is proposed, which selects optimal trajectories that satisfy both sufficient safety and control feasibility from among the candidate trajectories. In both seen and unseen scenarios, the proposed method achieves efficient inference time while maintaining high safety and stability. Evaluations on the F1TENTH vehicle further demonstrate that the proposed method is practical in real-world applications. The project page is at: https://rstp-comp-diffuser.github.io/. |
|
WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis | 2025-07-06 | ShowRecent advancements in large language models (LLMs) have significantly improved the capabilities of web agents. However, effectively navigating complex and dynamic web environments still requires more advanced trajectory-level planning and execution. Prior studies have addressed self-improving agents by collecting extensive GUI trajectories from real-environment interactions. Despite their effectiveness, these approaches encounter two critical challenges: (1) Uncontrollable environment states, where real or sandboxed web environments often yield unstable and non-deterministic feedback, complicating the reproduction and debugging of agent behaviors; and (2) High API costs, as generating even a single interaction trajectory can involve hundreds of queries, leading to considerable API usage and computational expenses. To address these limitations and enable scalable self-improvement for agents, we propose WebSynthesis, a novel framework for trajectory synthesis and training. WebSynthesis leverages a learned world model to simulate virtual web environments, allowing a policy agent to perform efficient and reversible tree-based planning. This approach supports the large-scale generation of diverse and high-quality trajectories, which are subsequently utilized to refine the agent's policy. Experimental results demonstrate that an agent trained using WebSynthesis on a small-scale synthetic dataset achieves performance comparable to or even surpassing that of models trained on large-scale real-world data. |
|
K Nearest Neighbor-Guided Trajectory Similarity Learning | 2025-07-06 | ShowTrajectory similarity is fundamental to many spatio-temporal data mining applications. Recent studies propose deep learning models to approximate conventional trajectory similarity measures, exploiting their fast inference time once trained. Although efficient inference has been reported, challenges remain in similarity approximation accuracy due to difficulties in trajectory granularity modeling and in exploiting similarity signals in the training data. To fill this gap, we propose TSMini, a highly effective trajectory similarity model with a sub-view modeling mechanism capable of learning multi-granularity trajectory patterns and a k nearest neighbor-based loss that guides TSMini to learn not only absolute similarity values between trajectories but also their relative similarity ranks. Together, these two innovations enable highly accurate trajectory similarity approximation. Experiments show that TSMini can outperform the state-of-the-art models by 22% in accuracy on average when learning trajectory similarity measures. |
|
SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement | 2025-07-06 | ShowAccurate prediction of multi-agent future trajectories is crucial for autonomous driving systems to make safe and efficient decisions. Trajectory refinement has emerged as a key strategy to enhance prediction accuracy. However, existing refinement methods often overlook the topological relationships between trajectories, which are vital for improving prediction precision. Inspired by braid theory, we propose a novel trajectory refinement approach, Soft-Braid Refiner (SRefiner), guided by the soft-braid topological structure of trajectories using Soft-Braid Attention. Soft-Braid Attention captures spatio-temporal topological relationships between trajectories by considering both spatial proximity and vehicle motion states at "soft intersection points". Additionally, we extend this approach to model interactions between trajectories and lanes, further improving the prediction accuracy. SRefiner is a multi-iteration, multi-agent framework that iteratively refines trajectories, incorporating topological information to enhance interactions within traffic scenarios. SRefiner achieves significant performance improvements over four baseline methods across two datasets, establishing a new state-of-the-art in trajectory refinement. Code is here https://github.com/Liwen-Xiao/SRefiner. |
|
AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction | 2025-07-06 | ShowAccurately predicting the future trajectories of traffic agents is essential in autonomous driving. However, due to the inherent imbalance in trajectory distributions, tail data in natural datasets often represents more complex and hazardous scenarios. Existing studies typically rely solely on a base model's prediction error, without considering the diversity and uncertainty of long-tail trajectory patterns. We propose an adaptive momentum and decoupled contrastive learning framework (AMD), which integrates unsupervised and supervised contrastive learning strategies. By leveraging an improved momentum contrast learning (MoCo-DT) and a decoupled contrastive learning (DCL) module, our framework enhances the model's ability to recognize rare and complex trajectories. Additionally, we design four types of trajectory random augmentation methods and introduce an online iterative clustering strategy, allowing the model to dynamically update pseudo-labels and better adapt to the distributional shifts in long-tail data. We propose three different criteria to define long-tail trajectories and conduct extensive comparative experiments on the nuScenes and ETH/UCY datasets. The results show that AMD not only achieves optimal performance in long-tail trajectory prediction but also demonstrates outstanding overall prediction accuracy. |
|
Particle Trajectory Representation Learning with Masked Point Modeling | 2025-07-06 | ShowEffective self-supervised learning (SSL) techniques have been key to unlocking large datasets for representation learning. While many promising methods have been developed using online corpora and captioned photographs, their application to scientific domains, where data encodes highly specialized knowledge, remains a challenge. Liquid Argon Time Projection Chambers (LArTPCs) provide high-resolution 3D imaging for fundamental physics, but analysis of their sparse, complex point cloud data often relies on supervised methods trained on large simulations, introducing potential biases. We introduce the Point-based Liquid Argon Masked Autoencoder (PoLAr-MAE), applying masked point modeling to unlabeled LArTPC images using domain-specific volumetric tokenization and energy prediction. We show that this SSL approach learns physically meaningful trajectory representations directly from data. This yields remarkable data efficiency: fine-tuning on just 100 labeled events achieves track/shower semantic segmentation performance comparable to the state-of-the-art supervised baseline trained on more than 100,000 events. Furthermore, internal attention maps exhibit emergent instance segmentation of particle trajectories. While challenges remain, particularly for fine-grained features, these results demonstrate SSL's potential for building a foundation model for LArTPC image analysis that can serve as a common base for all data reconstruction tasks. To facilitate further progress, we release PILArNet-M, a large dataset of 1M LArTPC events. Project site: https://youngsm.com/polarmae. |
Prepr...Preprint. 28 pages, 18 figures. v3 includes new results on data efficiency and attention maps |
Breaking Imitation Bottlenecks: Reinforced Diffusion Powers Diverse Trajectory Generation | 2025-07-05 | ShowMost end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning. |
16 pages, 6 figures |
Robust and Modular Multi-Limb Synchronization in Motion Stack for Space Robots with Trajectory Clamping via Hypersphere | 2025-07-05 | ShowModular robotics holds immense potential for space exploration, where reliability, repairability, and reusability are critical for cost-effective missions. Coordination between heterogeneous units is paramount for precision tasks -- whether in manipulation, legged locomotion, or multi-robot interaction. Such modular systems introduce challenges far exceeding those in monolithic robot architectures. This study presents a robust method for synchronizing the trajectories of multiple heterogeneous actuators, adapting dynamically to system variations with minimal system knowledge. This design makes it inherently robot-agnostic and thus highly suited for modularity. To ensure smooth trajectory adherence, the multidimensional state is constrained within a hypersphere representing the allowable deviation. The distance metric can be adapted; hence, depending on the task and the system under control, the constraint region can be deformed. This approach is compatible with a wide range of robotic platforms and serves as a core interface for Motion-Stack, our new open-source universal framework for limb coordination (available at https://github.com/2lian/Motion-Stack ). The method is validated by synchronizing the end-effectors of six highly heterogeneous robotic limbs, evaluating both trajectory adherence and recovery from significant external disturbances. |
6 pag...6 pages, 15 figures. Accepted at IROS 2025 |
Label-Free Long-Horizon 3D UAV Trajectory Prediction via Motion-Aligned RGB and Event Cues | 2025-07-04 | ShowThe widespread use of consumer drones has introduced serious challenges for airspace security and public safety. Their high agility and unpredictable motion make drones difficult to track and intercept. While existing methods focus on detecting current positions, many counter-drone strategies rely on forecasting future trajectories and thus require more than reactive detection to be effective. To address this critical gap, we propose an unsupervised vision-based method for predicting the three-dimensional trajectories of drones. Our approach first uses an unsupervised technique to extract drone trajectories from raw LiDAR point clouds, then aligns these trajectories with camera images through motion consistency to generate reliable pseudo-labels. We then combine kinematic estimation with a visual Mamba neural network in a self-supervised manner to predict future drone trajectories. We evaluate our method on the challenging MMAUD dataset, including the V2 sequences that feature wide-field-of-view multimodal sensors and dynamic UAV motion in urban scenes. Extensive experiments show that our framework outperforms supervised image-only and audio-visual baselines in long-horizon trajectory prediction, reducing 5-second 3D error by around 40 percent without using any manual 3D labels. The proposed system offers a cost-effective, scalable alternative for real-time counter-drone deployment. All code will be released upon acceptance to support reproducible research in the robotics community. |
|
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control | 2025-07-04 | ShowRecent advances in video diffusion models have demonstrated strong potential for generating robotic decision-making data, with trajectory conditions further enabling fine-grained control. However, existing trajectory-based methods primarily focus on individual object motion and struggle to capture the multi-object interactions crucial to complex robotic manipulation. This limitation arises from multi-feature entanglement in overlapping regions, which leads to degraded visual fidelity. To address this, we present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation. Unlike prior methods that decompose objects, our core idea is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction. Each stage is modeled using the feature of the dominant object, specifically the robotic arm in the pre- and post-interaction phases and the manipulated object during interaction, thereby mitigating the drawback of multi-object feature fusion present during interaction in prior work. To further ensure subject semantic consistency throughout the video, we incorporate appearance- and shape-aware latent representations for objects. Extensive experiments on the challenging Bridge V2 dataset, as well as in-the-wild evaluation, demonstrate that our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation. |
Proje...Project Page: https://fuxiao0719.github.io/projects/robomaster/ Code: https://github.com/KwaiVGI/RoboMaster |
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation | 2025-07-04 | ShowThis paper aims to manipulate multi-entity 3D motions in video generation. Previous methods on controllable video generation primarily leverage 2D control signals to manipulate object motions and have achieved remarkable synthesis results. However, 2D control signals are inherently limited in expressing the 3D nature of object motions. To overcome this problem, we introduce 3DTrajMaster, a robust controller that regulates multi-entity dynamics in 3D space, given user-desired 6DoF pose (location and rotation) sequences of entities. At the core of our approach is a plug-and-play 3D-motion grounded object injector that fuses multiple input entities with their respective 3D trajectories through a gated self-attention mechanism. In addition, we exploit an injector architecture to preserve the video diffusion prior, which is crucial for generalization ability. To mitigate video quality degradation, we introduce a domain adaptor during training and employ an annealed sampling strategy during inference. To address the lack of suitable training data, we construct a 360-Motion Dataset, which first correlates collected 3D human and animal assets with GPT-generated trajectories and then captures their motion with 12 evenly spaced surrounding cameras on diverse 3D UE platforms. Extensive experiments show that 3DTrajMaster sets a new state-of-the-art in both accuracy and generalization for controlling multi-entity 3D motions. Project page: http://fuxiao0719.github.io/projects/3dtrajmaster |
ICLR ...ICLR 2025. Project Page & Code & Data: http://fuxiao0719.github.io/projects/3dtrajmaster |
Coherent Track Before Detect: Detection via simultaneous trajectory estimation and long time integration | 2025-07-03 | ShowIn this work, we consider the detection of manoeuvring small objects with radars. Such objects induce low signal to noise ratio (SNR) reflections in the received signal. We consider both co-located and separated transmitter/receiver pairs, i.e., mono-static and bi-static configurations, respectively, as well as multi-static settings involving both types. We propose coherent track before detect: a detection approach which is capable of coherently integrating these reflections within a coherent processing interval (CPI) in all these configurations and continuing integration for an arbitrarily long time across consecutive CPIs. We estimate the complex value of the reflection coefficients for integration while simultaneously estimating the object trajectory. Compounded with these computations is the estimation of the unknown time reference shift of the separated transmitters necessary for coherent processing. Detection is made by using the resulting integration value in a Neyman-Pearson test against a constant false alarm rate threshold. We demonstrate the efficacy of our approach in a simulation example with a very low SNR object which cannot be detected with conventional techniques. |
This ...This article is based on Chapter 3 in "Reliable detection and characterisation of dim targets via track-before-detect", a PhD Thesis by Kimin Kim, The University of Edinburgh (https://era.ed.ac.uk/bitstream/handle/1842/38091/Kim2021.pdf?sequence=1&isAllowed=y) |
Control Synthesis Along Uncertain Trajectories Using Integral Quadratic Constraints | 2025-07-03 | ShowThe paper presents a novel approach to synthesize robust controllers for nonlinear systems along perturbed trajectories. The approach linearizes the system with respect to a reference trajectory. In contrast to existing methods rooted in robust linear time-varying synthesis, the approach accurately includes perturbations that drive the system away from the reference trajectory. Hence, the controller obtained in the linear framework provides a significantly more robust nonlinear performance. The calculation of the controller is derived from robust synthesis approaches rooted in the integral quadratic constraints framework. The feasibility of the approach is demonstrated on a pitch tracker design for a space launcher. |
Accep...Accepted for presentation at American Control Conference 2025 |
BERT4Traj: Transformer Based Trajectory Reconstruction for Sparse Mobility Data | 2025-07-03 | ShowUnderstanding human mobility is essential for applications in public health, transportation, and urban planning. However, mobility data often suffers from sparsity due to limitations in data collection methods, such as infrequent GPS sampling or call detail record (CDR) data that only capture locations during communication events. To address this challenge, we propose BERT4Traj, a transformer based model that reconstructs complete mobility trajectories by predicting hidden visits in sparse movement sequences. Inspired by BERT's masked language modeling objective and self-attention mechanisms, BERT4Traj leverages spatial embeddings, temporal embeddings, and contextual background features such as demographics and anchor points. We evaluate BERT4Traj on real world CDR and GPS datasets collected in Kampala, Uganda, demonstrating that our approach significantly outperforms traditional models such as Markov Chains, KNN, RNNs, and LSTMs. Our results show that BERT4Traj effectively reconstructs detailed and continuous mobility trajectories, enhancing insights into human movement patterns. |
This ...This paper was accepted at GIScience 2025 |
Trajectory Optimization for Differential Drive Mobile Manipulators via Topological Paths Search and Arc Length-Yaw Parameterization | 2025-07-03 | ShowWe present an efficient hierarchical motion planning pipeline for differential drive mobile manipulators. Our approach first searches for multiple collision-free and topologically distinct paths for the mobile base to extract the space in which optimal solutions may exist. Further sampling and optimization are then conducted in parallel to explore feasible whole-body trajectories. For trajectory optimization, we employ polynomial trajectories and arc length-yaw parameterization, enabling efficient handling of the nonholonomic dynamics while ensuring optimality. |
Technical Report |
LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection | 2025-07-03 | ShowThe rapid advancement of diffusion-based image generators has made it increasingly difficult to distinguish generated from real images. This can erode trust in digital media, making it critical to develop generalizable detectors for generated images. Recent methods leverage diffusion denoising cues, but mainly focus on single-step reconstruction errors, ignoring the inherent sequential nature of the denoising process. In this work, we propose LATTE - Latent Trajectory Embedding - a novel approach that models the evolution of latent embeddings across several denoising timesteps. By modeling the trajectory of such embeddings rather than single-step errors, LATTE captures subtle, discriminative patterns that distinguish real from generated images. Each latent is refined by our latent-visual feature refinement module and aggregated into a unified representation. Afterwards, it is fused with the visual features and finally passed into a lightweight classifier. Our experiments demonstrate that LATTE surpasses the baselines on several established benchmarks, such as GenImage and DiffusionFake. Moreover, it demonstrates strong performance in cross-generator and cross-dataset settings, highlighting the potential of using the trajectory of latent embeddings for generated image detection. The code is available at the following link: https://github.com/AnaMVasilcoiu/LATTE-Diffusion-Detector. |
10 pa...10 pages, 6 figures, submitted to NeurIPS 2025, includes benchmark evaluations on GenImage and Diffusion Forensics |
Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization | 2025-07-03 | ShowTrajectory prediction is an essential step in the pipeline of an autonomous vehicle. Inaccurate or inconsistent predictions regarding the movement of agents in its surroundings lead to poorly planned maneuvers and potentially dangerous situations for the end-user. Current state-of-the-art deep-learning-based trajectory prediction models can achieve excellent accuracy on public datasets. However, when used in more complex, interactive scenarios, they often fail to capture important interdependencies between agents, leading to inconsistent predictions among agents in the traffic scene. Inspired by the efficacy of incorporating human preference into large language models, this work fine-tunes trajectory prediction models in multi-agent settings using preference optimization. By taking as input automatically calculated preference rankings among predicted futures in the fine-tuning process, our experiments--using state-of-the-art models on three separate datasets--show that we are able to significantly improve scene consistency while minimally sacrificing trajectory prediction accuracy and without adding any excess computational requirements at inference time. |
Accep...Accepted for publication at ITSC 2025 |
Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN | 2025-07-02 | ShowTrajectory planning in robotics is understood as generating a sequence of joint configurations that will lead a robotic agent, or its manipulator, from an initial state to the desired final state, thus completing a manipulation task while considering constraints like robot kinematics and the environment. Typically, this is achieved via sampling-based planners, which are computationally intensive. Recent advances demonstrate that trajectory planning can also be performed by supervised sequence learning of trajectories, often requiring only a single or fixed number of passes through a neural architecture, thus ensuring a bounded computation time. Such fully supervised approaches, however, perform imitation learning; they do not learn based on whether the trajectories can successfully reach a goal, but try to reproduce observed trajectories. In our work, we build on this approach and propose a cognitively inspired self-supervised learning scheme based on a recurrent architecture for building a trajectory model. We evaluate the feasibility of the proposed method on a kinematic planning task for a robotic arm. The results suggest that the model is able to learn to generate trajectories using only the given paired forward and inverse kinematics models, and indicate that this novel method could facilitate planning for more complex manipulation tasks requiring adaptive solutions. |
12 pa...12 pages, 4 figures, 2 tables. To be published in 2025 International Conference on Artificial Neural Networks (ICANN) proceedings. This research was funded by the Horizon Europe project TERAIS, GA no. 101079338, and in part by the Slovak Grant Agency for Science (VEGA), project 1/0373/23 |
Tenure and Research Trajectories | 2025-07-02 | ShowTenure is a cornerstone of the US academic system, yet its relationship to faculty research trajectories remains poorly understood. Conceptually, tenure systems may act as a selection mechanism, screening in high-output researchers; a dynamic incentive mechanism, encouraging high output prior to tenure but low output after tenure; and a creative search mechanism, encouraging tenured individuals to undertake high-risk work. Here, we integrate data from seven different sources to trace US tenure-line faculty and their research outputs at an unprecedented scale and scope, covering over 12,000 researchers across 15 disciplines. Our analysis reveals that faculty publication rates typically increase sharply during the tenure track and peak just before obtaining tenure. Post-tenure trends, however, vary across disciplines: in lab-based fields, such as biology and chemistry, research output typically remains high post-tenure, whereas in non-lab-based fields, such as mathematics and sociology, research output typically declines substantially post-tenure. Turning to creative search, faculty increasingly produce novel, high-risk research after securing tenure. However, this shift toward novelty and risk-taking comes with a decline in impact, with post-tenure research yielding fewer highly cited papers. Comparing outcomes across common career ages but different tenure years or comparing research trajectories in tenure-based and non-tenure-based research settings underscores that breaks in the research trajectories are sharply tied to the individual's tenure year. Overall, these findings provide a new empirical basis for understanding the tenure system, individual research trajectories, and the shape of scientific output. |
|
Multi-Revolution Low-Thrust Trajectory Optimization With Very Sparse Mesh Pseudospectral Method | 2025-07-02 | ShowMulti-revolution low-thrust trajectory optimization problems are important and challenging in space mission design. In this paper, an efficient, accurate, and widely applicable pseudospectral method is proposed to solve multi-revolution low-thrust trajectory optimization problems with various objective functions and perturbations. The method is based on the Sundman transformation and pseudospectral method, together with a sparse mesh that is monotonic, near-uniformly spaced, and uniformly scattered on the unit circle. Two methods are proposed to construct the mesh: a deterministic method based on rotation mapping; a stochastic method utilizing autocorrelated random sequences. Core mechanisms ensuring the correctness of the method are analyzed, including the dual roles of mesh points as both integration points in the temporal domain and sampling points in the angular domain, the slow dynamics of the system excluding the fast angle variable, and the nearly commutative vector fields generated by applying different control inputs. The method is demonstrated through a multi-revolution low-thrust orbital rendezvous problem. Results show that the proposed method achieves high accuracy with only a few seconds of computational time for challenging problems. |
|
LANet: A Lane Boundaries-Aware Approach For Robust Trajectory Prediction | 2025-07-02 | ShowAccurate motion forecasting is critical for safe and efficient autonomous driving, enabling vehicles to predict future trajectories and make informed decisions in complex traffic scenarios. Most current motion prediction models are based on the representation of lane centerlines, which limits their capability to capture critical road environments, traffic rules, and constraints. In this work, we propose an enhanced motion forecasting model informed by multiple vector map elements, including lane boundaries and road edges, that facilitates a richer and more complete representation of driving environments. An effective feature fusion strategy is developed to merge information from different vector map components, so the model learns holistic information on road structures and their interactions with agents. Since encoding more information about the road environment increases memory usage and is computationally expensive, we developed an effective pruning mechanism that filters the map connections most relevant to the target agent, ensuring computational efficiency while maintaining essential spatial and semantic relationships for accurate trajectory prediction. Overcoming the limitations of lane centerline-based models, our method provides a more informative and efficient representation of the driving environment and advances the state of the art for autonomous vehicle motion forecasting. We verify our approach with extensive experiments on the Argoverse 2 motion forecasting dataset, where our method achieves improved performance while remaining competitive on AV2. Index Terms: autonomous driving, trajectory prediction, vector map elements, road topology, connection pruning, Argoverse 2. |
Accep...Accepted at the 17th IEEE International Conference on Advanced Computational Intelligence (ICACI 2025) |
clustra: A multi-platform k-means clustering algorithm for analysis of longitudinal trajectories in large electronic health records data | 2025-07-01 | ShowBackground and Objective: Variables collected over time, or longitudinally, such as biologic measurements in electronic health records data, are not simple to summarize with a single time-point, and thus can be more holistically conceptualized as trajectories over time. Cluster analysis with longitudinal data further allows for clinical representation of groups of subjects with similar trajectories and identification of unique characteristics, or phenotypes, that can be investigated as risk factors or disease outcomes. Some of the challenges in estimating these clustered trajectories lie in the handling of observations at inconsistent time intervals and the usability of algorithms across programming languages. Methods: We propose longitudinal trajectory clustering using a k-means algorithm with thin-plate regression splines, implemented across multiple platforms: the R package clustra and corresponding SAS macros. The SAS macros accommodate flexible clustering approaches, and also include visualization of the clusters and silhouette plots for diagnostic evaluation of the appropriate cluster number. The R package, designed in parallel, has similar functionality, with additional multi-core processing and Rand-index-based diagnostics. Results: The package and macros achieve comparable results when applied to an example of simulated blood pressure measurements based on real data from Veterans Affairs Healthcare recipients who were initiated on anti-hypertensive medication. Conclusion: The R package clustra and the SAS macros integrate a k-means clustering algorithm for longitudinal trajectories that operates on large electronic health record data. The implementations provide comparable results on both platforms, satisfying the needs of investigators familiar with, or constrained by access to, one or the other platform. (A stand-in sketch of this longitudinal k-means idea follows this table.) |
15 pa...15 pages, 11 figures, clustra package available in https://cran.r-project.org/web/packages/clustra/index.html, SAS macros available in https://github.com/MVP-CHAMPION/clustra-SAS |
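For intuition only, here is a minimal Python sketch of the core idea behind clustra-style trajectory clustering: smooth each subject's irregularly timed observations with a spline, evaluate on a shared grid, then run k-means on the resulting curves. All names and data here are illustrative; the actual implementations are the R package and SAS macros linked above, whose APIs differ.

```python
# Minimal sketch of spline-then-k-means trajectory clustering (illustrative only;
# the actual implementation is the R package `clustra` and its SAS macros).
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)          # common evaluation grid

def smooth_on_grid(t, y):
    """Fit a smoothing spline to one subject's irregular observations."""
    order = np.argsort(t)
    spl = UnivariateSpline(t[order], y[order], k=3, s=len(t))
    return spl(grid)

# Simulate 100 subjects with irregular visit times and two latent shapes.
curves = []
for i in range(100):
    t = np.sort(rng.uniform(0, 1, rng.integers(10, 30)))
    shape = np.sin(2 * np.pi * t) if i % 2 else 1.0 - t
    curves.append(smooth_on_grid(t, shape + rng.normal(0, 0.1, t.size)))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(curves))
print(np.bincount(labels))            # cluster sizes
```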
Flatness-based Finite-Horizon Multi-UAV Formation Trajectory Planning and Directionally Aware Collision Avoidance Tracking | 2025-07-01 | ShowOptimal collision-free formation control of the unmanned aerial vehicle (UAV) is a challenge. The state-of-the-art optimal control approaches often rely on numerical methods sensitive to initial guesses. This paper presents an innovative collision-free finite-time formation control scheme for multiple UAVs leveraging the differential flatness of the UAV dynamics, eliminating the need for numerical methods. We formulate a finite-time optimal control problem to plan a formation trajectory for feasible initial states. This optimal control problem in formation trajectory planning involves a collective performance index to meet the formation requirements to achieve relative positions and velocity consensus. It is solved by applying Pontryagin's principle. Subsequently, a collision-constrained regulating problem is addressed to ensure collision-free tracking of the planned formation trajectory. The tracking problem incorporates a directionally aware collision avoidance strategy that prioritizes avoiding UAVs in the forward path and relative approach. It assigns lower priority to those on the sides with an oblique relative approach, disregarding UAVs behind and not in the relative approach. The high-fidelity simulation results validate the effectiveness of the proposed control scheme. |
Accep...Accepted for Journal of the Franklin Institute |
Adaptive Modified RISE Control for Quadrotors: Enhancing Trajectory Tracking Through Uncertainty Compensation | 2025-07-01 | ShowThis paper presents an adaptive modified Robust Inverse of Signum Error (AM-RISE) control method, which achieves reliable trajectory tracking control for a quadrotor unmanned aerial vehicle. The proposed method systematically accounts for gyroscopic effects, rotor dynamics, parametric uncertainties, and external disturbances, ensuring robust performance across varying trajectory speeds. Through novel mathematical manipulation in the error system development, the quadrotor dynamics are expressed in a control-oriented form, which explicitly incorporates the uncertainty in the gyroscopic term and control actuation term. An adaptive modified RISE law is then designed to stabilize both the position and attitude loops of the quadrotor system. A rigorous Lyapunov-based analysis is utilized to prove asymptotic trajectory tracking, where the region of convergence can be made arbitrarily large through judicious control gain selection. Moreover, the stability analysis formally addresses gyroscopic effects and actuator uncertainty. To illustrate the performance of the control law, comparative numerical simulation results are provided, which demonstrate the improved closed-loop performance achieved under varying levels of parametric uncertainty, disturbance magnitudes and trajectory speeds. |
This ...This paper is under review |
Generating and Customizing Robotic Arm Trajectories using Neural Networks | 2025-06-30 | ShowWe introduce a neural network approach for generating and customizing the trajectory of a robotic arm, that guarantees precision and repeatability. To highlight the potential of this novel method, we describe the design and implementation of the technique and show its application in an experimental setting of cognitive robotics. In this scenario, the NICO robot was characterized by the ability to point to specific points in space with precise linear movements, increasing the predictability of the robotic action during its interaction with humans. To achieve this goal, the neural network computes the forward kinematics of the robot arm. By integrating it with a generator of joint angles, another neural network was developed and trained on an artificial dataset created from suitable start and end poses of the robotic arm. Through the computation of angular velocities, the robot was characterized by its ability to perform the movement, and the quality of its action was evaluated in terms of shape and accuracy. Thanks to its broad applicability, our approach successfully generates precise trajectories that could be customized in their shape and adapted to different settings. |
The c...The code is released at https://github.com/andylucny/nico2/tree/main/generate |
SICNav-Diffusion: Safe and Interactive Crowd Navigation with Diffusion Trajectory Predictions | 2025-06-30 | ShowTo navigate crowds without collisions, robots must interact with humans by forecasting their future motion and reacting accordingly. While learning-based prediction models have shown success in generating likely human trajectory predictions, integrating these stochastic models into a robot controller presents several challenges. The controller needs to account for interactive coupling between planned robot motion and human predictions while ensuring both predictions and robot actions are safe (i.e. collision-free). To address these challenges, we present a receding horizon crowd navigation method for single-robot multi-human environments. We first propose a diffusion model to generate joint trajectory predictions for all humans in the scene. We then incorporate these multi-modal predictions into a SICNav Bilevel MPC problem that simultaneously solves for a robot plan (upper-level) and acts as a safety filter to refine the predictions for non-collision (lower-level). Combining planning and prediction refinement into one bilevel problem ensures that the robot plan and human predictions are coupled. We validate the open-loop trajectory prediction performance of our diffusion model on the commonly used ETH/UCY benchmark and evaluate the closed-loop performance of our robot navigation method in simulation and extensive real-robot experiments demonstrating safe, efficient, and reactive robot motion. |
Accep...Accepted to IEEE Robotics and Automation Letters (RA-L) |
Predictive Risk Analysis and Safe Trajectory Planning for Intelligent and Connected Vehicles | 2025-06-30 | ShowThe safe trajectory planning of intelligent and connected vehicles is a key component in autonomous driving technology. Modeling the environment risk information by field is a promising and effective approach for safe trajectory planning. However, existing risk assessment theories only analyze the risk by current information, ignoring future prediction. This paper proposes a predictive risk analysis and safe trajectory planning framework for intelligent and connected vehicles. This framework first predicts future trajectories of objects by a local risk-aware algorithm, following with a spatiotemporal-discretised predictive risk analysis using the prediction results. Then the safe trajectory is generated based on the predictive risk analysis. Finally, simulation and vehicle experiments confirm the efficacy and real-time practicability of our approach. |
|
Intrinsic Dimensionality of Fermi-Pasta-Ulam-Tsingou High-Dimensional Trajectories Through Manifold Learning: A Linear Approach | 2025-06-30 | ShowA data-driven approach based on unsupervised machine learning is proposed to infer the intrinsic dimension |
15 pages, 15 figures |
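The "linear approach" of the title suggests a PCA-style estimate. As a hypothetical illustration of that general idea (not the paper's code), one can count the principal components needed to capture a fixed fraction of a trajectory's variance:

```python
# Hypothetical PCA-based intrinsic-dimension estimate for a sampled trajectory.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Fake trajectory: 2000 time samples of a 64-dimensional system that actually
# evolves on a 3-dimensional linear subspace plus small noise.
latent = rng.normal(size=(2000, 3))
mixing = rng.normal(size=(3, 64))
X = latent @ mixing + 0.01 * rng.normal(size=(2000, 64))

pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
intrinsic_dim = int(np.searchsorted(cum, 0.99) + 1)  # components for 99% variance
print(intrinsic_dim)  # -> 3
```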
Consistency Trajectory Matching for One-Step Generative Super-Resolution | 2025-06-30 | ShowCurrent diffusion-based super-resolution (SR) approaches achieve commendable performance at the cost of high inference overhead. Therefore, distillation techniques are utilized to accelerate the multi-step teacher model into a one-step student model. Nevertheless, these methods significantly raise training costs and constrain the performance of the student model by that of the teacher. To overcome these tough challenges, we propose Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free strategy that is able to generate photo-realistic SR results in one step. Concretely, we first formulate a Probability Flow Ordinary Differential Equation (PF-ODE) trajectory to establish a deterministic mapping from low-resolution (LR) images with noise to high-resolution (HR) images. Then we apply the Consistency Training (CT) strategy to directly learn the mapping in one step, eliminating the necessity of a pre-trained diffusion model. To further enhance the performance and better leverage the ground truth during the training process, we aim to align the distribution of SR results more closely with that of natural images. To this end, we propose to minimize the discrepancy between their respective PF-ODE trajectories from the LR image distribution with our meticulously designed Distribution Trajectory Matching (DTM) loss, resulting in improved realism of our recovered HR images. Comprehensive experimental results demonstrate that the proposed methods can attain comparable or even superior capabilities on both synthetic and real datasets while maintaining minimal inference latency. |
|
Joint Trajectory and Resource Optimization for HAPs-SAR Systems with Energy-Aware Constraints | 2025-06-29 | ShowThis paper investigates the joint optimization of trajectory planning and resource allocation for a high-altitude platform stations synthetic aperture radar (HAPs-SAR) system. To support real-time sensing and conserve the limited energy budget of the HAPs, the proposed framework assumes that the acquired radar data are transmitted in real time to a ground base station for SAR image reconstruction. A dynamic trajectory model is developed, and the power consumption associated with radar sensing, data transmission, and circular flight is comprehensively analyzed. In addition, solar energy harvesting is considered to enhance system sustainability. An energy-aware mixed-integer nonlinear programming (MINLP) problem is formulated to maximize radar beam coverage while satisfying operational constraints. To solve this challenging problem, a sub-optimal successive convex approximation (SCA)-based framework is proposed, incorporating iterative optimization and finite search. Simulation results validate the convergence of the proposed algorithm and demonstrate its effectiveness in balancing SAR performance, communication reliability, and energy efficiency. A final SAR imaging simulation on a 9-target lattice scenario further confirms the practical feasibility of the proposed solution. |
|
Mode Collapse Happens: Evaluating Critical Interactions in Joint Trajectory Prediction Models | 2025-06-29 | ShowAutonomous Vehicle decisions rely on multimodal prediction models that account for multiple route options and the inherent uncertainty in human behavior. However, models can suffer from mode collapse, where only the most likely mode is predicted, posing significant safety risks. While existing methods employ various strategies to generate diverse predictions, they often overlook the diversity in interaction modes among agents. Additionally, traditional metrics for evaluating prediction models are dataset-dependent and do not evaluate inter-agent interactions quantitatively. To our knowledge, none of the existing metrics explicitly evaluates mode collapse. In this paper, we propose a novel evaluation framework that assesses mode collapse in joint trajectory predictions, focusing on safety-critical interactions. We introduce metrics for mode collapse, mode correctness, and coverage, emphasizing the sequential dimension of predictions. By testing four multi-agent trajectory prediction models, we demonstrate that mode collapse indeed happens. When looking at the sequential dimension, although prediction accuracy improves closer to interaction events, there are still cases where the models are unable to predict the correct interaction mode, even just before the interaction mode becomes inevitable. We hope that our framework can help researchers gain new insights and advance the development of more consistent and accurate prediction models, thus enhancing the safety of autonomous driving systems. |
12 pa...12 pages, 8 figures, submitted to a journal |
Interpretable Interaction Modeling for Trajectory Prediction via Agent Selection and Physical Coefficient | 2025-06-28 | ShowA thorough understanding of the interaction between the target agent and surrounding agents is a prerequisite for accurate trajectory prediction. Although many methods have been explored, they assign correlation coefficients to surrounding agents in a purely learning-based manner. In this study, we present ASPILin, which manually selects interacting agents and replaces the attention scores in Transformer with a newly computed physical correlation coefficient, enhancing the interpretability of interaction modeling. Surprisingly, these simple modifications can significantly improve prediction performance and substantially reduce computational costs. We intentionally simplified our model in other aspects, such as map encoding. Remarkably, experiments conducted on the INTERACTION, highD, and CitySim datasets demonstrate that our method is efficient and straightforward, outperforming other state-of-the-art methods. |
Accep...Accepted by International Conference on Intelligent Robots and Systems (IROS 2025) |
Aircraft Trajectory Dataset Augmentation in Latent Space | 2025-06-28 | ShowAircraft trajectory modeling plays a crucial role in Air Traffic Management (ATM) and is important for various downstream tasks, including conflict detection and landing time prediction. Dataset augmentation through the addition of synthetically generated trajectory data is necessary to develop a more robust aircraft trajectory model and ensure that the trajectory dataset is sufficient and balanced. In this work, we propose a novel framework called ATRADA for aircraft trajectory dataset augmentation. In the proposed framework, a Transformer encoder learns the underlying patterns in the original trajectory dataset and converts each data point into a context vector in the learned latent space. The converted dataset in the latent space is projected into reduced dimensions using principal component analysis (PCA), and a Gaussian mixture model (GMM) is applied to fit the probability distribution of the data points in the reduced-dimensional space. Finally, new samples are drawn from the fitted GMM, the dimension of the samples is reverted to the original dimension, and they are decoded with a Multi-Layer Perceptron (MLP). Several experiments demonstrate that the framework effectively generates new, high-quality synthetic aircraft trajectory data, with results compared against several baselines. |
|
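The middle stages of the described pipeline (PCA compression, GMM fitting, sampling) map directly onto standard scikit-learn calls. A minimal sketch of just that stage, with random stand-ins for the paper's Transformer context vectors and MLP decoder:

```python
# Sketch of the PCA + GMM augmentation stage described in ATRADA
# (the Transformer encoder and MLP decoder are replaced by stand-ins here).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
context_vectors = rng.normal(size=(5000, 128))   # stand-in for encoder outputs

pca = PCA(n_components=16).fit(context_vectors)
z = pca.transform(context_vectors)               # reduced-dimensional space

gmm = GaussianMixture(n_components=8, random_state=0).fit(z)
z_new, _ = gmm.sample(1000)                      # draw synthetic latent points

synthetic_vectors = pca.inverse_transform(z_new) # back to the latent dimension;
# a trained decoder would then map these vectors to trajectories.
print(synthetic_vectors.shape)                   # (1000, 128)
```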
Global Optimization of Multi-Flyby Trajectories for Multi-Orbital-Plane Constellations Inspection | 2025-06-28 | ShowThe rapid expansion of mega-constellations in low Earth orbits has posed significant challenges to space traffic management, necessitating periodic inspections of satellites to ensure the sustainability of the space environment when economically feasible. This study addresses the orbital design challenge associated with inspecting numerous satellites distributed across multiple orbital planes through flybys by proposing an innovative orbital-plane-based inspection strategy. The proposed methodology reformulates the multi-satellite flyby problem into a multi-rendezvous trajectory planning problem by proposing an analytical approach to determine a maneuver-free inspection orbit that enables flyby of all satellites within a specific orbital plane. Additionally, a three-layer global optimization framework is developed to tackle this problem. The first layer establishes an approximate cost evaluation model for orbital plane visitation sequences, utilizing a genetic algorithm to identify the optimal sequence from a vast array of candidate planes, thereby maximizing inspection targets while minimizing fuel consumption. The second layer constructs a mixed-integer programming model to locally refine the rendezvous epochs and orbital parameters of each inspection orbit to reduce the total velocity increment. The third layer accurately computes the optimal impulsive maneuvers and trajectories between inspection orbits. In contrast to traditional low-Earth orbit rendezvous optimization frameworks, the proposed framework fully leverages the adjustable freedom in inclination and right ascension of the ascending node (RAAN) of inspection orbits, significantly reducing the total velocity increment. Simulation results demonstrate that the proposed method can effectively address the trajectory optimization problem associated with constellation inspection for tens of thousands of satellites. |
|
Kill Two Birds with One Stone! Trajectory enabled Unified Online Detection of Adversarial Examples and Backdoor Attacks | 2025-06-28 | ShowThe proposed UniGuard is the first unified online detection framework capable of simultaneously addressing adversarial examples and backdoor attacks. UniGuard builds upon two key insights: first, both AE and backdoor attacks have to compromise the inference phase, making it possible to tackle them simultaneously during run-time via online detection. Second, an adversarial input, whether a perturbed sample in AE attacks or a trigger-carrying sample in backdoor attacks, exhibits distinctive trajectory signatures from a benign sample as it propagates through the layers of a DL model in forward inference. The propagation trajectory of the adversarial sample must deviate from that of its benign counterpart; otherwise, the adversarial objective cannot be fulfilled. Detecting these trajectory signatures is inherently challenging due to their subtlety; UniGuard overcomes this by treating the propagation trajectory as a time-series signal, leveraging LSTM and spectrum transformation to amplify differences between adversarial and benign trajectories that are subtle in the time domain. UniGuard's exceptional efficiency and effectiveness have been extensively validated across various modalities (image, text, and audio) and tasks (classification and regression), across diverse model architectures, and against a wide range of AE and backdoor attacks, including challenging partial backdoors and dynamic triggers. When compared to SOTA methods, including ContraNet (NDSS 22), specific to AE detection, and TED (IEEE SP 24), specific to backdoor detection, UniGuard consistently demonstrates superior performance, even when matched against each method's strengths in addressing their respective threats: each SOTA method fails against part of the attack strategies, while UniGuard succeeds against all. |
|
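To make the trajectory-as-time-series idea concrete, the sketch below extracts a spectral feature from per-layer activation statistics; it is an illustrative stand-in, not UniGuard's implementation, and the LSTM detector is omitted.

```python
# Illustrative feature extraction: treat per-layer activation statistics as a
# time series and move to the frequency domain (the LSTM detector is omitted).
import numpy as np

def trajectory_spectrum(layer_activations):
    """layer_activations: list of per-layer activation arrays for one input."""
    signal = np.array([np.linalg.norm(a) for a in layer_activations])
    signal = (signal - signal.mean()) / (signal.std() + 1e-8)
    return np.abs(np.fft.rfft(signal))   # magnitude spectrum as detector input

# Stand-in activations for a 24-layer model on one benign and one adversarial input.
rng = np.random.default_rng(3)
benign = [rng.normal(size=512) for _ in range(24)]
adversarial = [rng.normal(loc=0.1 * i, size=512) for i in range(24)]
print(trajectory_spectrum(benign)[:4])
print(trajectory_spectrum(adversarial)[:4])
```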
Economic Model Predictive Control with a Non-Fixed Reference Trajectory for Optimal Microgrid Dispatch | 2025-06-27 | ShowEconomic Model Predictive Control (EMPC), instead of stabilizing a reference trajectory/state in the objective function like a Tracking MPC, optimizes the economic performance over the prediction horizon, making it attractive for economical microgrid (MG) dispatch. However, the demand charge component of the monthly electricity cost is difficult to encapsulate in additive stage costs and can make solutions violate the principle of optimality if naively introduced in the objective function. Moreover, previous EMPC-based works mostly rely on a-priori knowledge of an optimal economic steady state or optimal periodic trajectory for performance guarantees; for real-time economical MG dispatch, where load/generation forecasts are known only 24-48 h in advance, the former is not useful and the latter may not exist. This paper first proposes an EMPC formulation for a generic deterministic discrete non-linear time-varying system with hard state and input constraints, without any a-priori requirement of an optimal economic steady state or optimal periodic trajectory. It is proved that, under mild assumptions on the terminal cost and region, the asymptotic average economic cost of the proposed method is no worse than that of any other non-fixed arbitrary reference trajectory known only up to the current time-step. The EMPC framework is then leveraged for optimal MG dispatch by showing that the problem can be reformulated to satisfy the assumptions required for the asymptotic performance guarantee. Realistic simulations at the Port of San Diego MG demonstrate that, in a majority of cases, the proposed method also reduces monthly electricity costs in closed loop with respect to reference trajectories generated by directly optimizing the electricity cost function over the prediction horizon or by tracking an ideal grid import curve. |
18 pa...18 pages, 4 tables, Manuscript under review |
Pedestrian Intention and Trajectory Prediction in Unstructured Traffic Using IDD-PeD | 2025-06-27 | ShowWith the rapid advancements in autonomous driving, accurately predicting pedestrian behavior has become essential for ensuring safety in complex and unpredictable traffic conditions. The growing interest in this challenge highlights the need for comprehensive datasets that capture unstructured environments, enabling the development of more robust prediction models to enhance pedestrian safety and vehicle navigation. In this paper, we introduce an Indian driving pedestrian dataset designed to address the complexities of modeling pedestrian behavior in unstructured environments, such as illumination changes, occlusion of pedestrians, unsignalized scene types and vehicle-pedestrian interactions. The dataset provides high-level and detailed low-level comprehensive annotations focused on pedestrians requiring the ego-vehicle's attention. Evaluation of the state-of-the-art intention prediction methods on our dataset shows a significant performance drop of up to |
|
TrajFlow: Learning Distributions over Trajectories for Human Behavior Prediction | 2025-06-27 | ShowPredicting the future behavior of human road users is an important aspect for the development of risk-aware autonomous vehicles. While many models have been developed towards this end, effectively capturing and predicting the variability inherent to human behavior still remains an open challenge. This paper proposes TrajFlow - a new approach for probabilistic trajectory prediction based on Normalizing Flows. We reformulate the problem of capturing distributions over trajectories into capturing distributions over abstracted trajectory features using an autoencoder, simplifying the learning task of the Normalizing Flows. TrajFlow outperforms state-of-the-art behavior prediction models in capturing full trajectory distributions in two synthetic benchmarks with known true distributions, and is competitive on the naturalistic datasets ETH/UCY, rounD, and nuScenes. Our results demonstrate the effectiveness of TrajFlow in probabilistic prediction of human behavior. |
|
Universal Retrieval for Multimodal Trajectory Modeling | 2025-06-27 | ShowTrajectory data, capturing human actions and environmental states across various modalities, holds significant potential for enhancing AI agent capabilities, particularly in GUI environments. However, how to model the representation of trajectory-level data presents a significant challenge that has not been systematically addressed amid explosive trajectory data growth. In this work, we introduce Multimodal Trajectory Retrieval, bridging the gap between universal retrieval and agent-centric trajectory modeling. We construct the Unified Agent Trajectory Dataset (UATD) from annotated demonstrations and states across diverse real-world scenarios. Based on this, we present GAE-Bench, a benchmark containing a large number of trajectory-based retrieval pairs. In addition, we propose GAE-Retriever, a multimodal retrieval framework that adopts vision-language models and incorporates optimized contrastive learning through token selection and the GradCache mechanism. Comprehensive evaluations across multiple datasets show that GAE-Retriever consistently outperforms strong baselines in retrieval recall, highlighting its effectiveness in advancing multimodal trajectory retrieval. |
18 pa...18 pages, 3 figures, accepted by Workshop on Computer-use Agents @ ICML 2025 |
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | 2025-06-27 | ShowGraphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our codes, data, and checkpoints are available at https://qiushisun.github.io/OS-Genesis-Home/. |
ACL 2...ACL 2025 Camera Ready |
TROFI: Trajectory-Ranked Offline Inverse Reinforcement Learning | 2025-06-27 | ShowIn offline reinforcement learning, agents are trained using only a fixed set of stored transitions derived from a source policy. However, this requires that the dataset be labeled by a reward function. In applied settings such as video game development, the availability of the reward function is not always guaranteed. This paper proposes Trajectory-Ranked OFfline Inverse reinforcement learning (TROFI), a novel approach to effectively learn a policy offline without a pre-defined reward function. TROFI first learns a reward function from human preferences, which it then uses to label the original dataset making it usable for training the policy. In contrast to other approaches, our method does not require optimal trajectories. Through experiments on the D4RL benchmark we demonstrate that TROFI consistently outperforms baselines and performs comparably to using the ground truth reward to learn policies. Additionally, we validate the efficacy of our method in a 3D game environment. Our studies of the reward model highlight the importance of the reward function in this setting: we show that to ensure the alignment of a value function to the actual future discounted reward, it is fundamental to have a well-engineered and easy-to-learn reward function. |
Publi...Published at Reinforcement Learning and Video Games Workshop at RLC 2025 |
Active Disturbance Rejection Control for Trajectory Tracking of a Seagoing USV: Design, Simulation, and Field Experiments | 2025-06-26 | ShowUnmanned Surface Vessels (USVs) face significant control challenges due to uncertain environmental disturbances like waves and currents. This paper proposes a trajectory tracking controller based on Active Disturbance Rejection Control (ADRC) implemented on the DUS V2500. A custom simulation incorporating realistic waves and current disturbances is developed to validate the controller's performance, supported by further validation through field tests in the harbour of Scheveningen, the Netherlands, and at sea. Simulation results demonstrate that ADRC significantly reduces cross-track error across all tested conditions compared to a baseline PID controller but increases control effort and energy consumption. Field trials confirm these findings while revealing a further increase in energy consumption during sea trials compared to the baseline. |
Accep...Accepted for presentation at IROS 2025. Submitted version |
GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction | 2025-06-26 | ShowTrajectory prediction for surrounding agents is a challenging task in autonomous driving due to its inherent uncertainty and underlying multimodality. Unlike prevailing data-driven methods that primarily rely on supervised learning, in this paper, we introduce a novel Graph-oriented Inverse Reinforcement Learning (GoIRL) framework, which is an IRL-based predictor equipped with vectorized context representations. We develop a feature adaptor to effectively aggregate lane-graph features into grid space, enabling seamless integration with the maximum entropy IRL paradigm to infer the reward distribution and obtain the policy that can be sampled to induce multiple plausible plans. Furthermore, conditioned on the sampled plans, we implement a hierarchical parameterized trajectory generator with a refinement module to enhance prediction accuracy and a probability fusion strategy to boost prediction confidence. Extensive experimental results showcase our approach not only achieves state-of-the-art performance on the large-scale Argoverse & nuScenes motion forecasting benchmarks but also exhibits superior generalization abilities compared to existing supervised models. |
Accep...Accepted by ICML 2025 |
IMPACT: Behavioral Intention-aware Multimodal Trajectory Prediction with Adaptive Context Trimming | 2025-06-26 | ShowWhile most prior research has focused on improving the precision of multimodal trajectory predictions, the explicit modeling of multimodal behavioral intentions (e.g., yielding, overtaking) remains relatively underexplored. This paper proposes a unified framework that jointly predicts both behavioral intentions and trajectories to enhance prediction accuracy, interpretability, and efficiency. Specifically, we employ a shared context encoder for both intention and trajectory predictions, thereby reducing structural redundancy and information loss. Moreover, we address the lack of ground-truth behavioral intention labels in mainstream datasets (Waymo, Argoverse) by auto-labeling these datasets, thus advancing the community's efforts in this direction. We further introduce a vectorized occupancy prediction module that infers the probability of each map polyline being occupied by the target vehicle's future trajectory. By leveraging these intention and occupancy prediction priors, our method conducts dynamic, modality-dependent pruning of irrelevant agents and map polylines in the decoding stage, effectively reducing computational overhead and mitigating noise from non-critical elements. Our approach ranks first among LiDAR-free methods on the Waymo Motion Dataset and achieves first place on the Waymo Interactive Prediction Dataset. Remarkably, even without model ensembling, our single-model framework improves the soft mean average precision (softmAP) by 10 percent compared to the second-best method in the Waymo Interactive Prediction Leaderboard. Furthermore, the proposed framework has been successfully deployed on real vehicles, demonstrating its practical effectiveness in real-world applications. |
under review |
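The pruning step described above, which discards map polylines the target vehicle is unlikely to occupy before decoding, reduces to a threshold mask or a fixed-budget top-k gather. A schematic sketch (all shapes, probabilities, and the threshold are invented):

```python
# Schematic occupancy-based pruning: keep only map polylines the target vehicle
# is likely to occupy (probabilities, shapes, and threshold are invented).
import torch

n_polylines, feat_dim = 1000, 64
polyline_feats = torch.randn(n_polylines, feat_dim)
occupancy_prob = torch.rand(n_polylines)           # stand-in module output

keep = occupancy_prob > 0.9                        # threshold pruning...
pruned = polyline_feats[keep]

topk = torch.topk(occupancy_prob, k=64).indices    # ...or fixed-budget top-k
pruned_topk = polyline_feats[topk]
print(pruned.shape, pruned_topk.shape)
```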
Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery | 2025-06-25 | ShowThis paper presents a framework for extracting georeferenced vehicle trajectories from high-altitude drone imagery, addressing key challenges in urban traffic monitoring and the limitations of traditional ground-based systems. Our approach integrates several novel contributions, including a tailored object detector optimized for high-altitude bird's-eye view perspectives, a unique track stabilization method that uses detected vehicle bounding boxes as exclusion masks during image registration, and an orthophoto and master frame-based georeferencing strategy that enhances consistent alignment across multiple drone viewpoints. Additionally, our framework features robust vehicle dimension estimation and detailed road segmentation, enabling comprehensive traffic analysis. Conducted in the Songdo International Business District, South Korea, the study utilized a multi-drone experiment covering 20 intersections, capturing approximately 12TB of 4K video data over four days. The framework produced two high-quality datasets: the Songdo Traffic dataset, comprising approximately 700,000 unique vehicle trajectories, and the Songdo Vision dataset, containing over 5,000 human-annotated images with about 300,000 vehicle instances in four classes. Comparisons with high-precision sensor data from an instrumented probe vehicle highlight the accuracy and consistency of our extraction pipeline in dense urban environments. The public release of Songdo Traffic and Songdo Vision, and the complete source code for the extraction pipeline, establishes new benchmarks in data quality, reproducibility, and scalability in traffic research. Results demonstrate the potential of integrating drone technology with advanced computer vision for precise and cost-effective urban traffic monitoring, providing valuable resources for developing intelligent transportation systems and enhancing traffic management strategies. |
|
Towards Interpretable and Efficient Feature Selection in Trajectory Datasets: A Taxonomic Approach | 2025-06-25 | ShowTrajectory analysis is not only about obtaining movement data, but it is also of paramount importance in understanding the pattern in which an object moves through space and time, as well as in predicting its next move. Due to the significant interest in the area, data collection has improved substantially, resulting in a large number of features becoming available for training and predicting models. However, this introduces a high-dimensionality-induced feature explosion problem, which reduces the efficiency and interpretability of the data, thereby reducing the accuracy of machine learning models. To overcome this issue, feature selection has become one of the most prevalent tools. Thus, the objective of this paper was to introduce a taxonomy-based feature selection method that categorizes features based on their internal structure. This approach classifies the data into geometric and kinematic features, further categorizing them into curvature, indentation, speed, and acceleration. The comparative analysis indicated that a taxonomy-based approach consistently achieved comparable or superior predictive performance. Furthermore, due to the taxonomic grouping, which reduces combinatorial space, the time taken to select features was drastically reduced. The taxonomy was also used to gain insights into what feature sets each dataset was more sensitive to. Overall, this study provides robust evidence that a taxonomy-based feature selection method can add a layer of interpretability, reduce dimensionality and computational complexity, and contribute to high-level decision-making. It serves as a step toward providing a methodological framework for researchers and practitioners dealing with trajectory datasets and contributing to the broader field of explainable artificial intelligence. |
|
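As a hypothetical sketch of the taxonomy idea: group candidate features by category and search over groups rather than over individual features, which shrinks the combinatorial space. The groups, data, and model below are invented for illustration:

```python
# Hypothetical taxonomy-based selection: evaluate feature *groups* rather than
# all subsets of individual features (names and groups are illustrative).
from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 4] > 0).astype(int)          # depends on curvature + speed

taxonomy = {
    "curvature":    [0, 1],
    "indentation":  [2, 3],
    "speed":        [4, 5],
    "acceleration": [6, 7],
}

# 10 group combinations to score instead of 2**8 feature subsets.
best = max(
    (frozenset(groups) for r in (1, 2) for groups in combinations(taxonomy, r)),
    key=lambda groups: cross_val_score(
        RandomForestClassifier(random_state=0),
        X[:, sum((taxonomy[g] for g in groups), [])], y, cv=3).mean(),
)
print(sorted(best))
```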
RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | 2025-06-24 | ShowNeural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have emerged as powerful tools for 3D reconstruction and SLAM tasks. However, their performance depends heavily on accurate camera pose priors. Existing approaches attempt to address this issue by introducing external constraints but fall short of achieving satisfactory accuracy, particularly when camera trajectories are complex. In this paper, we propose a novel method, RA-NeRF, capable of predicting highly accurate camera poses even with complex camera trajectories. Following the incremental pipeline, RA-NeRF reconstructs the scene using NeRF with photometric consistency and incorporates flow-driven pose regulation to enhance robustness during initialization and localization. Additionally, RA-NeRF employs an implicit pose filter to capture the camera movement pattern and eliminate the noise for pose estimation. To validate our method, we conduct extensive experiments on the Tanks&Temple dataset for standard evaluation, as well as the NeRFBuster dataset, which presents challenging camera pose trajectories. On both datasets, RA-NeRF achieves state-of-the-art results in both camera pose estimation and visual quality, demonstrating its effectiveness and robustness in scene reconstruction under complex pose trajectories. |
IROS 2025 |
FusionForce: End-to-end Differentiable Neural-Symbolic Layer for Trajectory Prediction | 2025-06-24 | ShowWe propose an end-to-end differentiable model that predicts robot trajectories on rough offroad terrain from camera images and/or lidar point clouds. The model integrates a learnable component that predicts robot-terrain interaction forces with a neural-symbolic layer that enforces the laws of classical mechanics and consequently improves generalization on out-of-distribution data. The neural-symbolic layer includes a differentiable physics engine that computes the robot's trajectory by querying these forces at the points of contact with the terrain. As the proposed architecture comprises substantial geometrical and physics priors, the resulting model can also be seen as a learnable physics engine conditioned on real sensor data that delivers |
Code:... |
Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles | 2025-06-24 | ShowTracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data. |
|
Path Learning with Trajectory Advantage Regression | 2025-06-24 | ShowIn this paper, we propose trajectory advantage regression, a method of offline path learning and path attribution based on reinforcement learning. The proposed method can be used to solve path optimization problems while algorithmically only solving a regression problem. |
|
Trajectory Prediction in Dynamic Object Tracking: A Critical Study | 2025-06-24 | ShowThis study provides a detailed analysis of current advancements in dynamic object tracking (DOT) and trajectory prediction (TP) methodologies, including their applications and challenges. It covers various approaches, such as feature-based, segmentation-based, estimation-based, and learning-based methods, evaluating their effectiveness, deployment, and limitations in real-world scenarios. The study highlights the significant impact of these technologies in automotive and autonomous vehicles, surveillance and security, healthcare, and industrial automation, contributing to safety and efficiency. Despite the progress, challenges such as improved generalization, computational efficiency, reduced data dependency, and ethical considerations still exist. The study suggests future research directions to address these challenges, emphasizing the importance of multimodal data integration, semantic information fusion, and developing context-aware systems, along with ethical and privacy-preserving frameworks. |
|
Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses | 2025-06-24 | ShowEfficient and robust bioprocess control is essential for maximizing performance and adaptability in advanced biotechnological systems. In this work, we present a reinforcement-learning framework for multi-setpoint and multi-trajectory tracking. Tracking multiple setpoints and time-varying trajectories in reinforcement learning is challenging due to the complexity of balancing multiple objectives, a difficulty further exacerbated by system uncertainties such as uncertain initial conditions and stochastic dynamics. This challenge is relevant, e.g., in bioprocesses involving microbial consortia, where precise control over population compositions is required. We introduce a novel return function based on multiplicative reciprocal saturation functions, which explicitly couples reward gains to the simultaneous satisfaction of multiple references. Through a case study involving light-mediated cybergenetic growth control in microbial consortia, we demonstrate via computational experiments that our approach achieves faster convergence, improved stability, and superior control compliance compared to conventional quadratic-cost-based return functions. Moreover, our method enables tuning of the saturation function's parameters, shaping the learning process and policy updates. By incorporating system uncertainties, our framework also demonstrates robustness, a key requirement in industrial bioprocessing. Overall, this work advances reinforcement-learning-based control strategies in bioprocess engineering, with implications in the broader field of process and systems engineering. |
|
Augmenting Multi-Agent Communication with State Delta Trajectory | 2025-06-24 | ShowMulti-agent techniques such as role playing or multi-turn debates have been shown to be effective in improving the performance of large language models (LLMs) in downstream tasks. Despite their differences in workflows, existing LLM-based multi-agent systems mostly use natural language for agent communication. While this is appealing for its simplicity and interpretability, it also introduces inevitable information loss as one model must downsample its continuous state vectors to concrete tokens before transferring them to the other model. Such losses are particularly significant when the information to transfer is not simple facts, but reasoning logic or abstract thoughts. To tackle this problem, we propose a new communication protocol that transfers both natural language tokens and token-wise state transition trajectories from one agent to another. Particularly, compared to the actual state value, we find that the sequence of state changes in LLMs after generating each token can better reflect the information hidden behind the inference process, so we propose a State Delta Encoding (SDE) method to represent state transition trajectories. The experimental results show that multi-agent systems with SDE achieve SOTA performance compared to other communication protocols, particularly in tasks that involve complex reasoning. This shows the potential of communication augmentation for LLM-based multi-agent systems. |
22 pages, 5 figures |
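The delta encoding itself is easy to state: communicate first differences of the hidden states along the token dimension rather than the raw states. A minimal, framework-agnostic sketch with illustrative shapes:

```python
# Minimal sketch of State Delta Encoding: communicate token-wise *changes* in
# hidden state rather than raw state values (shapes are illustrative).
import numpy as np

rng = np.random.default_rng(5)
hidden = rng.normal(size=(12, 768))        # hidden state after each of 12 tokens

state_deltas = np.diff(hidden, axis=0)     # (11, 768): the transition trajectory
# The receiving agent conditions on the token text plus these delta vectors;
# the original states are recoverable given the initial state:
reconstructed = hidden[0] + np.cumsum(state_deltas, axis=0)
assert np.allclose(reconstructed, hidden[1:])
```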
Employing Laban Shape for Generating Emotionally and Functionally Expressive Trajectories in Robotic Manipulators | 2025-06-23 | ShowSuccessful human-robot collaboration depends on cohesive communication and a precise understanding of the robot's abilities, goals, and constraints. While robotic manipulators offer high precision, versatility, and productivity, they exhibit expressionless and monotonous motions that conceal the robot's intention, resulting in a lack of efficiency and transparency with humans. In this work, we use Laban notation, a dance annotation language, to enable robotic manipulators to generate trajectories with functional expressivity, where the robot uses nonverbal cues to communicate its abilities and the likelihood of succeeding at its task. We achieve this by introducing two novel variants of Hesitant expressive motion (Spoke-Like and Arc-Like). We also enhance the emotional expressivity of four existing emotive trajectories (Happy, Sad, Shy, and Angry) by augmenting Laban Effort usage with Laban Shape. The functionally expressive motions are validated via a human-subjects study, where participants equate both variants of Hesitant motion with reduced robot competency. The enhanced emotive trajectories are shown to be viewed as distinct emotions using the Valence-Arousal-Dominance (VAD) spectrum, corroborating the usage of Laban Shape. |
Accep...Accepted for presentation at the 2025 IEEE RO-MAN Conference |
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | 2025-06-23 | ShowProcess Reward Models (PRMs) have recently emerged as a powerful framework for supervising intermediate reasoning steps in large language models (LLMs). Previous PRMs are primarily trained on model final output responses and struggle to evaluate intermediate thinking trajectories robustly, especially in the emerging setting of trajectory-response outputs generated by frontier reasoning models like Deepseek-R1. In this work, we introduce ReasonFlux-PRM, a novel trajectory-aware PRM explicitly designed to evaluate the trajectory-response type of reasoning traces. ReasonFlux-PRM incorporates both step-level and trajectory-level supervision, enabling fine-grained reward assignment aligned with structured chain-of-thought data. We adapt ReasonFlux-PRM to support reward supervision under both offline and online settings, including (i) selecting high-quality model distillation data for downstream supervised fine-tuning of smaller models, (ii) providing dense process-level rewards for policy optimization during reinforcement learning, and (iii) enabling reward-guided Best-of-N test-time scaling. Empirical results on challenging downstream benchmarks such as AIME, MATH500, and GPQA-Diamond demonstrate that ReasonFlux-PRM-7B selects higher quality data than strong PRMs (e.g., Qwen2.5-Math-PRM-72B) and human-curated baselines. Furthermore, our derived ReasonFlux-PRM-7B yields consistent performance improvements, achieving average gains of 12.1% in supervised fine-tuning, 4.5% in reinforcement learning, and 6.3% in test-time scaling. We also release our efficient ReasonFlux-PRM-1.5B for resource-constrained applications and edge deployment. Projects: https://github.com/Gen-Verse/ReasonFlux |
Codes...Codes and Models: https://github.com/Gen-Verse/ReasonFlux |
Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | 2025-06-23 | ShowLarge Language Model (LLM)-based agents are increasingly employed to automate complex software engineering tasks such as program repair and issue resolution. These agents operate by autonomously generating natural language thoughts, invoking external tools, and iteratively refining their solutions. Despite their widespread adoption, the internal decision-making processes of these agents remain largely unexplored, limiting our understanding of their operational dynamics and failure modes. In this paper, we present a large-scale empirical study of the thought-action-result trajectories of three state-of-the-art LLM-based agents: RepairAgent, AutoCodeRover, and OpenHands. We unify their interaction logs into a common format, capturing 120 trajectories and 2822 LLM interactions focused on program repair and issue resolution. Our study combines quantitative analyses of structural properties, action patterns, and token usage with qualitative assessments of reasoning coherence and feedback integration. We identify key trajectory characteristics such as iteration counts and token consumption, recurring action sequences, and the semantic coherence linking thoughts, actions, and their results. Our findings reveal behavioral motifs and anti-patterns that distinguish successful from failed executions, providing actionable insights for improving agent design, including prompting strategies, failure diagnosis, and anti-pattern detection. We release our dataset and annotation framework to support further research on transparent and robust autonomous software engineering agents. |
|
Selective Social-Interaction via Individual Importance for Fast Human Trajectory Prediction | 2025-06-23 | ShowThis paper presents an architecture for selecting important neighboring people to predict the primary person's trajectory. To achieve effective neighboring people selection, we propose a people selection module called the Importance Estimator which outputs the importance of each neighboring person for predicting the primary person's future trajectory. To prevent gradients from being blocked by non-differentiable operations when sampling surrounding people based on their importance, we employ the Gumbel Softmax for training. Experiments conducted on the JRDB dataset show that our method speeds up the process with competitive prediction accuracy. |
MIRU 2025 |
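The Gumbel-Softmax trick mentioned above keeps the neighbor-sampling step differentiable. A minimal PyTorch sketch, with a single linear layer standing in for the paper's Importance Estimator:

```python
# Differentiable neighbor selection with Gumbel-Softmax (illustrative shapes;
# the Importance Estimator is stood in for by a single linear layer).
import torch
import torch.nn.functional as F

n_neighbors, feat_dim = 6, 16
neighbor_feats = torch.randn(n_neighbors, feat_dim, requires_grad=True)

importance_logits = torch.nn.Linear(feat_dim, 1)(neighbor_feats).squeeze(-1)

# hard=True returns one-hot samples in the forward pass while letting gradients
# flow through the soft relaxation in the backward pass.
sample = F.gumbel_softmax(importance_logits, tau=0.5, hard=True)
selected = sample @ neighbor_feats          # features of the sampled neighbor

selected.sum().backward()                   # gradients reach neighbor_feats
print(neighbor_feats.grad is not None)      # True
```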
Area between trajectories: Insights into optimal group selection and trajectory heterogeneity in group-based trajectory modeling | 2025-06-22 | ShowGroup-based trajectory modeling (GBTM) is commonly used to identify longitudinal patterns in health outcomes among older adults, with determining the optimal number of groups being a crucial step. While statistically grounded criteria are primarily relied upon, clinical relevance is increasingly emphasized in medicine to ensure that the identified trajectory heterogeneity appropriately reflects changes in a disease or symptom over time. However, such considerations are often judged through visual comparisons, without concrete approaches for their application. To address this, the Area Between Trajectories (ABTs) measure was introduced to quantify differences between trajectory groups. Using a simulated sleep quality dataset, GBTM was applied to build and compare models. Subsequently, ABTs was demonstrated in practice, highlighting both its limitations and its potential applications. |
15 pa...15 pages, 4 figures, 1 table |
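On a shared time grid, an area between two estimated trajectories reduces to trapezoidal integration of their absolute difference. A small sketch, assuming both trajectories are already evaluated on the same grid:

```python
# Trapezoidal area between two trajectories on a shared time grid (sketch).
import numpy as np

def area_between(t, traj_a, traj_b):
    diff = np.abs(np.asarray(traj_a) - np.asarray(traj_b))
    dt = np.diff(t)
    return float(np.sum(0.5 * dt * (diff[1:] + diff[:-1])))  # trapezoid rule

t = np.linspace(0, 10, 101)
print(area_between(t, np.sin(t), np.sin(t) + 0.5))  # ~5.0 (offset 0.5 over length 10)
```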
DriveSuprim: Towards Precise Trajectory Selection for End-to-End Planning | 2025-06-22 | ShowIn complex driving environments, autonomous vehicles must navigate safely. Relying on a single predicted path, as in regression-based approaches, usually does not explicitly assess the safety of the predicted trajectory. Selection-based methods address this by generating and scoring multiple trajectory candidates and predicting the safety score for each, but face optimization challenges in precisely selecting the best option from thousands of possibilities and distinguishing subtle but safety-critical differences, especially in rare or underrepresented scenarios. We propose DriveSuprim to overcome these challenges and advance the selection-based paradigm through a coarse-to-fine paradigm for progressive candidate filtering, a rotation-based augmentation method to improve robustness in out-of-distribution scenarios, and a self-distillation framework to stabilize training. DriveSuprim achieves state-of-the-art performance, reaching 93.5% PDMS in NAVSIM v1 and 87.1% EPDMS in NAVSIM v2 without extra data, demonstrating superior safety-critical capabilities, including collision avoidance and compliance with rules, while maintaining high trajectory quality in various driving scenarios. |
15 pages, 6 figures |
Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking | 2025-06-21 | ShowLearning-based control approaches like reinforcement learning (RL) have recently produced a slew of impressive results for tasks like quadrotor trajectory tracking and drone racing. Naturally, it is common to demonstrate the advantages of these new controllers against established methods like analytical controllers. We observe, however, that reliably comparing the performance of such very different classes of controllers is more complicated than might appear at first sight. As a case study, we take up the problem of agile tracking of an end-effector for a quadrotor with a fixed arm. We develop a set of best practices for synthesizing the best-in-class RL and geometric controllers (GC) for benchmarking. In the process, we resolve widespread RL-favoring biases in prior studies that provide asymmetric access to: (1) the task definition, in the form of an objective function, (2) representative datasets, for parameter optimization, and (3) feedforward information, describing the desired future trajectory. The resulting findings are the following: our improvements to the experimental protocol for comparing learned and classical controllers are critical, and each of the above asymmetries can yield misleading conclusions. Prior works have claimed that RL outperforms GC, but we find the gaps between the two controller classes are much smaller than previously published when accounting for symmetric comparisons. Geometric control achieves lower steady-state error than RL, while RL has better transient performance, resulting in GC performing better in relatively slow or less agile tasks, but RL performing better when greater agility is required. Finally, we open-source implementations of geometric and RL controllers for these aerial vehicles, implementing best practices for future development. Website and code is available at https://pratikkunapuli.github.io/rl-vs-gc/ |
Accep...Accepted for publication to RSS 2025. 10 pages, 5 figures. Project website: https://pratikkunapuli.github.io/rl-vs-gc/ |
Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions | 2025-06-21 | ShowAs the potential for autonomous vehicles to be integrated on a large scale into modern traffic systems continues to grow, ensuring safe navigation in dynamic environments is crucial for smooth integration. To guarantee safety and prevent collisions, autonomous vehicles must be capable of accurately predicting the trajectories of surrounding traffic agents. Over the past decade, significant efforts from both academia and industry have been dedicated to designing solutions for precise trajectory forecasting. These efforts have produced a diverse range of approaches, raising questions about the differences between these methods and whether trajectory prediction challenges have been fully addressed. This paper reviews a substantial portion of recent trajectory prediction methods proposing a taxonomy to classify existing solutions. A general overview of the prediction pipeline is also provided, covering input and output modalities, modeling features, and prediction paradigms existing in the literature. In addition, the paper discusses active research areas within trajectory prediction, addresses the posed research questions, and highlights the remaining research gaps and challenges. |
|
Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities | 2025-06-21 | ShowRetired electric vehicle batteries offer immense potential to support low-carbon energy systems, but uncertainties in their degradation behavior and data inaccessibilities under second-life use pose major barriers to safe and scalable deployment. This work proposes a Physics-Informed Mixture of Experts (PIMOE) network that computes battery degradation trajectories using partial, field-accessible signals in a single cycle. PIMOE leverages an adaptive multi-degradation prediction module to classify degradation modes using expert weight synthesis underpinned by capacity-voltage and relaxation data, producing latent degradation trend embeddings. These are input to a use-dependent recurrent network for long-term trajectory prediction. Validated on 207 batteries across 77 use conditions and 67,902 cycles, PIMOE achieves an average mean absolute percentage error (MAPE) of 0.88% with a 0.43 ms inference time. Compared to the state-of-the-art Informer and PatchTST, it reduces computational time and MAPE by 50%. Compatible with random state-of-charge region sampling, PIMOE supports 150-cycle forecasts with 1.50% average and 6.26% maximum MAPE, and operates effectively even with pruned 5 MB training data. Broadly, the PIMOE framework offers a deployable, history-free solution for battery degradation trajectory computation, redefining how second-life energy storage systems are assessed, optimized, and integrated into the sustainable energy landscape. |
|
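The expert-weight synthesis in PIMOE's adaptive multi-degradation module is not spelled out in the abstract; the following is a minimal PyTorch sketch of the general mixture-of-experts pattern it describes, where a gating network blends expert outputs into a latent trend embedding that feeds a recurrent trajectory head. All module names, dimensions, and the 150-cycle rollout here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Gating network blends expert outputs into one latent trend embedding."""
    def __init__(self, in_dim, latent_dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
             for _ in range(n_experts)]
        )
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x):                                   # x: (batch, in_dim) per-cycle features
        weights = torch.softmax(self.gate(x), dim=-1)       # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], 1) # (batch, n_experts, latent)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)    # weighted expert synthesis

# latent embeddings from a single cycle can then feed a recurrent trajectory head
moe = MixtureOfExperts(in_dim=32, latent_dim=16)
rnn = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
z = moe(torch.randn(8, 32)).unsqueeze(1)   # one cycle's embedding per battery
traj, _ = rnn(z.repeat(1, 150, 1))         # crude 150-cycle rollout sketch
```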
Trajectory tracking control of USV with actuator constraints in the presence of disturbances | 2025-06-20 | ShowPractical systems pose the problem of finite control capability, which can notably degrade performance if not properly addressed. Since actuator input bounds are typically known, integrating actuator saturation considerations into the control law design process can lead to enhanced performance and more precise trajectory tracking. Also, the actuators cannot provide the demanded forces or torques instantaneously; hence, there is also a limit on the rate at which their magnitude can change. This work proposes nonlinear feedback controller designs developed using the Lyapunov stability and backstepping method while actively considering the actuator magnitude and rate constraints. The system dynamics are augmented with a smooth control input saturation model. Additionally, an observer is incorporated to estimate the disturbance vector. Through Lyapunov stability analysis, we demonstrate the system's stability under the proposed controller for the Uncrewed Surface Vessel (USV), ensuring adherence to actuator constraints provided their initial values fall within the prescribed bounds. Extensive numerical simulations performed by considering various trajectories and multiple initial conditions demonstrate the effectiveness of the controller in maintaining tracking performance without violating actuator constraints. This work also relaxes the assumption that equally capable actuators must be used to control the motion of USVs, affirming the viability of the controller in practical applications. |
|
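As a rough illustration of the magnitude and rate constraints discussed above, here is a minimal NumPy sketch of a smooth (tanh-based) saturation model combined with first-order rate limiting. The specific saturation model and the constants used in the paper may differ; everything below is an assumption for illustration.

```python
import numpy as np

def smooth_sat(u, u_max):
    """Smooth (differentiable) stand-in for a hard magnitude limit."""
    return u_max * np.tanh(u / u_max)

def rate_limited(u_cmd, u_prev, rate_max, dt):
    """The actuator moves toward the command at a bounded rate."""
    du = np.clip(u_cmd - u_prev, -rate_max * dt, rate_max * dt)
    return u_prev + du

# example: a demanded thrust profile passed through both constraints
dt, u_max, rate_max = 0.01, 50.0, 200.0
u_prev = 0.0
for t in np.arange(0.0, 0.1, dt):
    u_cmd = 80.0 * np.sin(10 * t)   # controller demand (exceeds the limits)
    u_act = rate_limited(smooth_sat(u_cmd, u_max), u_prev, rate_max, dt)
    u_prev = u_act
```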
TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis | 2025-06-19 | ShowGPS trajectory data reveals valuable patterns of human mobility and urban dynamics, supporting a variety of spatial applications. However, traditional methods often struggle to extract deep semantic representations and incorporate contextual map information. We propose TrajSceneLLM, a multimodal perspective for enhancing semantic understanding of GPS trajectories. The framework integrates visualized map images (encoding spatial context) and textual descriptions generated through LLM reasoning (capturing temporal sequences and movement dynamics). Separate embeddings are generated for each modality and then concatenated to produce trajectory scene embeddings with rich semantic content which are further paired with a simple MLP classifier. We validate the proposed framework on Travel Mode Identification (TMI), a critical task for analyzing travel choices and understanding mobility behavior. Our experiments show that these embeddings achieve significant performance improvement, highlighting the advantage of our LLM-driven method in capturing deep spatio-temporal dependencies and reducing reliance on handcrafted features. This semantic enhancement promises significant potential for diverse downstream applications and future research in geospatial artificial intelligence. The source code and dataset are publicly available at: https://github.com/februarysea/TrajSceneLLM. |
Under...Under review for ACM SIGSPATIAL 2025 |
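A minimal sketch of the late-fusion pattern the TrajSceneLLM abstract describes: per-modality embeddings concatenated into a scene embedding and fed to a simple MLP classifier. The dimensions, dropout, and class count are illustrative assumptions, and producing the map-image and LLM-text embeddings themselves is out of scope here.

```python
import torch
import torch.nn as nn

class TMIClassifier(nn.Module):
    """Concatenate per-modality embeddings, then classify travel mode with an MLP."""
    def __init__(self, img_dim=512, txt_dim=768, n_modes=5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(),
            nn.Dropout(0.1), nn.Linear(256, n_modes),
        )

    def forward(self, img_emb, txt_emb):
        scene = torch.cat([img_emb, txt_emb], dim=-1)  # trajectory scene embedding
        return self.mlp(scene)

clf = TMIClassifier()
logits = clf(torch.randn(4, 512), torch.randn(4, 768))  # image + text embeddings
```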
VideoGAN-based Trajectory Proposal for Automated Vehicles | 2025-06-19 | ShowBeing able to generate realistic trajectory options is at the core of increasing the degree of automation of road vehicles. While model-driven, rule-based, and classical learning-based methods are widely used to tackle these tasks at present, they can struggle to effectively capture the complex, multimodal distributions of future trajectories. In this paper we investigate whether a generative adversarial network (GAN) trained on videos of bird's-eye view (BEV) traffic scenarios can generate statistically accurate trajectories that correctly capture spatial relationships between the agents. To this end, we propose a pipeline that uses low-resolution BEV occupancy grid videos as training data for a video generative model. From the generated videos of traffic scenarios we extract abstract trajectory data using single-frame object detection and frame-to-frame object matching. We choose a GAN architecture in particular for its fast training and inference times compared to diffusion models. We obtain our best results within 100 GPU hours of training, with inference times under 20 ms. We demonstrate the physical realism of the proposed trajectories in terms of distribution alignment of spatial and dynamic parameters with respect to the ground truth videos from the Waymo Open Motion Dataset. |
|
Enhanced Trust Region Sequential Convex Optimization for Multi-Drone Thermal Screening Trajectory Planning in Urban Environments | 2025-06-19 | ShowThe rapid detection of abnormal body temperatures in urban populations is essential for managing public health risks, especially during outbreaks of infectious diseases. Multi-drone thermal screening systems offer promising solutions for fast, large-scale, and non-intrusive human temperature monitoring. However, trajectory planning for multiple drones in complex urban environments poses significant challenges, including collision avoidance, coverage efficiency, and constrained flight environments. In this study, we propose an enhanced trust region sequential convex optimization (TR-SCO) algorithm for optimal trajectory planning of multiple drones performing thermal screening tasks. Our improved algorithm integrates a refined convex optimization formulation within a trust region framework, effectively balancing trajectory smoothness, obstacle avoidance, altitude constraints, and maximum screening coverage. Simulation results demonstrate that our approach significantly improves trajectory optimality and computational efficiency compared to conventional convex optimization methods. This research provides critical insights and practical contributions toward deploying efficient multi-drone systems for real-time thermal screening in urban areas. For readers interested in our research, we release our source code at https://github.com/Cherry0302/Enhanced-TR-SCO. |
|
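To clarify the mechanism the paper builds on, here is a generic trust-region loop: each iteration minimizes a convexified (here, simply linearized) model of the objective inside the trust region, then grows or shrinks the region by comparing predicted against actual decrease. The paper's formulation additionally handles obstacle, altitude, and coverage constraints, which this sketch omits; all constants are assumptions.

```python
import numpy as np

def tr_sco(f, grad, x0, radius=1.0, iters=30, eta=0.1):
    """Trust-region sequential convex optimization, minimal sketch."""
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        step = -radius * g / (np.linalg.norm(g) + 1e-12)  # best step for the linear model
        predicted = -g @ step                             # model-predicted decrease
        actual = f(x) - f(x + step)                       # true decrease
        rho = actual / (predicted + 1e-12)
        if rho > eta:        # model was trustworthy: accept and expand
            x = x + step
            radius *= 1.5
        else:                # poor model fit: reject and contract
            radius *= 0.5
    return x

# toy usage: fit a waypoint with a smooth, obstacle-free objective
target = np.array([3.0, 1.0])
f = lambda x: np.sum((x - target) ** 2)
grad = lambda x: 2 * (x - target)
x_opt = tr_sco(f, grad, np.zeros(2))
```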
Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme | 2025-06-18 | ShowUnmanned aerial vehicles (UAVs) in a disaster-prone environment play an important role in assisting rescue services and providing internet connectivity with the outside world. However, in such a complex environment, the selection of an optimum trajectory for the UAVs is of utmost importance. UAV trajectory optimization deals with finding the shortest path in the minimal possible time. In this paper, a cluster optimization scheme (COS) is proposed using the Henry gas optimization (HGO) metaheuristic algorithm to identify the shortest path having minimal transportation cost and algorithm complexity. The mathematical model is designed for COS using the HGO algorithm and compared with state-of-the-art metaheuristic algorithms such as particle swarm optimization (PSO), grey wolf optimization (GWO), the cuckoo search algorithm (CSA) and the barnacles mating optimizer (BMO). In order to prove the robustness of the proposed model, four different scenarios are evaluated: an ambient environment, a constricted environment, a tangled environment, and a complex environment. In all the aforementioned scenarios, the HGO algorithm outperforms the existing algorithms. Particularly, in the ambient environment, the HGO algorithm achieves a 39.3% reduction in transportation cost and a 16.8% reduction in computational time as compared to the PSO algorithm. Hence, the HGO algorithm can be used for autonomous trajectory optimization of UAVs in smart cities. |
12 pages, 9 figures |
Title | Date | Abstract | Comment |
---|---|---|---|
An AI Approach for Learning the Spectrum of the Laplace-Beltrami Operator | 2025-07-09 | ShowThe spectrum of the Laplace-Beltrami (LB) operator is central in geometric deep learning tasks, capturing intrinsic properties of the shape of the object under consideration. The best established method for its estimation, from a triangulated mesh of the object, is based on the Finite Element Method (FEM), and computes the top k LB eigenvalues with a complexity of O(Nk), where N is the number of points. This can render the FEM method inefficient when repeatedly applied to databases of CAD mechanical parts, or in quality control applications where part metrology is acquired as large meshes and decisions about the quality of each part are needed quickly and frequently. As a solution to this problem, we present a geometric deep learning framework to predict the LB spectrum efficiently given the CAD mesh of a part, achieving significant computational savings without sacrificing accuracy, demonstrating that the LB spectrum is learnable. The proposed Graph Neural Network architecture uses a rich set of part mesh features - including Gaussian curvature, mean curvature, and principal curvatures. In addition to our trained network, we make available, for repeatability, a large curated dataset of real-world mechanical CAD models derived from the publicly available ABC dataset used for training and testing. Experimental results show that our method reduces computation time of the LB spectrum by approximately 5 times over linear FEM while delivering competitive accuracy. |
18 pa...18 pages, 9 figures, submitted for publication |
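For context on the FEM baseline that the GNN above is trained to replace: the LB spectrum comes from a generalized eigenproblem K v = λ M v assembled from the mesh. A toy SciPy sketch follows, with a random graph Laplacian standing in for the FEM stiffness matrix; a real pipeline would assemble K and M from the triangulated mesh (cotangent stiffness, lumped mass).

```python
import numpy as np
from scipy.linalg import eigh

# stand-in stiffness (K) and mass (M) matrices on a toy "mesh" of n points
rng = np.random.default_rng(0)
n = 200
A = (rng.random((n, n)) < 0.03).astype(float)
A = np.maximum(A, A.T)            # symmetric toy adjacency
np.fill_diagonal(A, 0.0)
K = np.diag(A.sum(axis=1)) - A    # Laplacian-like stiffness proxy
M = np.eye(n)                     # lumped unit mass

# smallest k eigenvalues of the generalized problem K v = lambda M v
k = 10
lb_spectrum = eigh(K, M, eigvals_only=True, subset_by_index=[0, k - 1])
print(lb_spectrum)                # first value ~0 (constant mode)
```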
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | 2025-07-09 | ShowMicroscopic assessment of histopathology images is vital for accurate cancer diagnosis and treatment. Whole Slide Image (WSI) classification and captioning have become crucial tasks in computer-aided pathology. However, microscopic WSIs face challenges such as redundant patches and unknown patch positions due to subjective pathologist captures. Moreover, generating automatic pathology captions remains a significant challenge. To address these issues, we introduce a novel GNN-ViTCap framework for classification and caption generation from histopathological microscopic images. First, a visual feature extractor generates patch embeddings. Redundant patches are then removed by dynamically clustering these embeddings using deep embedded clustering and selecting representative patches via a scalar dot attention mechanism. We build a graph by connecting each node to its nearest neighbors in the similarity matrix and apply a graph neural network to capture both local and global context. The aggregated image embeddings are projected into the language model's input space through a linear layer and combined with caption tokens to fine-tune a large language model. We validate our method on the BreakHis and PatchGastric datasets. GNN-ViTCap achieves an F1 score of 0.934 and an AUC of 0.963 for classification, along with a BLEU-4 score of 0.811 and a METEOR score of 0.569 for captioning. Experimental results demonstrate that GNN-ViTCap outperforms state-of-the-art approaches, offering a reliable and efficient solution for microscopy-based patient diagnosis. |
|
Heterogeneous Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study | 2025-07-09 | ShowAccurate short-term state forecasting is essential for efficient and stable operation of modern power systems, especially in the context of increasing variability introduced by renewable and distributed energy resources. As these systems evolve rapidly, it becomes increasingly important to reliably predict their states in the short term to ensure operational stability, support control decisions, and enable interpretable monitoring of sensor and machine behavior. Modern power systems often span multiple physical domains - including electrical, mechanical, hydraulic, and thermal - posing significant challenges for modeling and prediction. Graph Neural Networks (GNNs) have emerged as a promising data-driven framework for system state estimation and state forecasting in such settings. By leveraging the topological structure of sensor networks, GNNs can implicitly learn inter-sensor relationships and propagate information across the network. However, most existing GNN-based methods are designed under the assumption of homogeneous sensor relationships and are typically constrained to a single physical domain. This limitation restricts their ability to integrate and reason over heterogeneous sensor data commonly encountered in real-world energy systems, such as those used in energy conversion infrastructure. In this work, we propose the use of Heterogeneous Graph Attention Networks to address these limitations. Our approach models both homogeneous intra-domain and heterogeneous inter-domain relationships among sensor data from two distinct physical domains - hydraulic and electrical - which exhibit fundamentally different temporal dynamics. Experimental results demonstrate that our method significantly outperforms conventional baselines on average by 35.5% in terms of normalized root mean square error, confirming its effectiveness in multi-domain, multi-rate power system state forecasting. |
25 pages, 9 figures |
Deep-Learning-Based Pre-Layout Parasitic Capacitance Prediction on SRAM Designs | 2025-07-09 | ShowTo achieve higher system energy efficiency, SRAM in SoCs is often customized. The parasitic effects cause notable discrepancies between pre-layout and post-layout circuit simulations, leading to difficulty in converging design parameters and excessive design iterations. Is it possible to well predict the parasitics based on the pre-layout circuit, so as to perform parasitic-aware pre-layout simulation? In this work, we propose a deep-learning-based 2-stage model to accurately predict these parasitics in pre-layout stages. The model combines a Graph Neural Network (GNN) classifier and Multi-Layer Perceptron (MLP) regressors, effectively managing class imbalance of the net parasitics in SRAM circuits. We also employ Focal Loss to mitigate the impact of abundant internal net samples and integrate subcircuit information into the graph to abstract the hierarchical structure of schematics. Experiments on 4 real SRAM designs show that our approach not only surpasses the state-of-the-art model in parasitic prediction, with up to a 19X reduction in error, but also accelerates the simulation process by up to 598X. |
Publi...Published in Proceedings of GLSVLSI2024 |
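Focal loss, used above to keep abundant internal-net samples from dominating training, is a standard remedy for class imbalance. A minimal PyTorch version consistent with the abstract (the exact gamma/alpha settings in the paper are not stated and are assumptions here):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Focal loss: down-weights easy (well-classified) examples."""
    ce = F.cross_entropy(logits, targets, weight=alpha, reduction="none")
    pt = torch.exp(-ce)                        # model confidence on the true class
    return ((1.0 - pt) ** gamma * ce).mean()

logits = torch.randn(16, 3)                    # e.g. 3 net classes
targets = torch.randint(0, 3, (16,))
loss = focal_loss(logits, targets)
```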
GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification | 2025-07-09 | ShowIntegrating powerful but computationally expensive Pre-trained Language Models (PLMs) with Graph Neural Networks (GNNs) is a key challenge, especially on text-rich heterophilic graphs. We propose the Graph Masked Language Model (GMLM), a framework designed for the efficient and effective fusion of graph structure and text semantics. GMLM employs a two-stage process: first, a contrastive pre-training stage with a novel soft masking technique builds a robust multi-scale GNN; second, an end-to-end fine-tuning stage uses a dynamic active node selection strategy for scalability and a bi-directional cross-attention module for deep fusion. Experiments on five heterophilic benchmarks show GMLM achieves state-of-the-art results on four, significantly outperforming prior GNN and large LLM-based methods. For instance, it improves accuracy on the Texas dataset by over 8% and on Wisconsin by nearly 5%. Our work demonstrates that a sophisticated, deeply-integrated architecture can be more effective and efficient than larger, general-purpose models for text-rich graph representation learning. |
|
DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction | 2025-07-08 | ShowPredicting the binding affinity of protein-ligand complexes plays a vital role in drug discovery. Unfortunately, progress has been hindered by the lack of large-scale and high-quality binding affinity labels. The widely used PDBbind dataset has fewer than 20K labeled complexes. Self-supervised learning, especially graph contrastive learning (GCL), provides a unique opportunity to break the barrier by pre-training graph neural network models based on vast unlabeled complexes and fine-tuning the models on much fewer labeled complexes. However, the problem faces unique challenges, including a lack of a comprehensive unlabeled dataset with well-defined positive/negative complex pairs and the need to design GCL algorithms that incorporate the unique characteristics of such data. To fill the gap, we propose DecoyDB, a large-scale, structure-aware dataset specifically designed for self-supervised GCL on protein-ligand complexes. DecoyDB consists of high-resolution ground truth complexes (less than 2.5 Angstrom) and diverse decoy structures with computationally generated binding poses that range from realistic to suboptimal (negative pairs). Each decoy is annotated with a Root Mean Squared Deviation (RMSD) from the native pose. We further design a customized GCL framework to pre-train graph neural networks based on DecoyDB and fine-tune the models with labels from PDBbind. Extensive experiments confirm that models pre-trained with DecoyDB achieve superior accuracy, label efficiency, and generalizability. |
|
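The customized GCL objective is not spelled out in the DecoyDB abstract; below is a hedged InfoNCE-style sketch of how RMSD-annotated decoys could define positive and negative pairs. The positive-pair RMSD threshold, temperature, and loss form here are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def rmsd_contrastive_loss(anchor, decoys, rmsd, tau=0.1, pos_thresh=2.0):
    """InfoNCE-style loss: decoys within pos_thresh Angstrom RMSD of the
    native pose act as positives, the rest as negatives.

    anchor: (d,) embedding of the ground-truth complex
    decoys: (n, d) embeddings of decoy poses
    rmsd:   (n,) RMSD of each decoy from the native pose
    """
    sim = F.cosine_similarity(anchor.unsqueeze(0), decoys, dim=-1) / tau
    pos = rmsd < pos_thresh
    if not pos.any():
        return sim.new_tensor(0.0)
    # pull positives above all decoys: -log( sum_pos e^s / sum_all e^s )
    return -(torch.logsumexp(sim[pos], 0) - torch.logsumexp(sim, 0))

loss = rmsd_contrastive_loss(torch.randn(64), torch.randn(32, 64), torch.rand(32) * 10)
```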
Learning-Augmented Model-Based Multi-Robot Planning for Time-Critical Search and Inspection Under Uncertainty | 2025-07-08 | ShowIn disaster response or surveillance operations, quickly identifying areas needing urgent attention is critical, but deploying response teams to every location is inefficient or often impossible. Effective performance in this domain requires coordinating a multi-robot inspection team to prioritize inspecting locations more likely to need immediate response, while also minimizing travel time. This is particularly challenging because robots must directly observe the locations to determine which ones require additional attention. This work introduces a multi-robot planning framework for coordinated time-critical multi-robot search under uncertainty. Our approach uses a graph neural network to estimate the likelihood of points of interest (PoIs) needing attention from noisy sensor data and then uses those predictions to guide a multi-robot model-based planner to determine the cost-effective plan. Simulated experiments demonstrate that our planner improves performance at least by 16.3%, 26.7%, and 26.2% for 1, 3, and 5 robots, respectively, compared to non-learned and learned baselines. We also validate our approach on real-world platforms using quad-copters. |
7 pag...7 pages, 6 figures, CASE 2025 |
RPHunter: Unveiling Rug Pull Schemes in Crypto Token via Code-and-Transaction Fusion Analysis | 2025-07-08 | ShowRug pull scams have emerged as a persistent threat to cryptocurrency, causing significant financial losses. A typical scenario involves scammers deploying honeypot contracts to attract investments, restricting token sales, and draining the funds, which leaves investors with worthless tokens. Current methods either rely on predefined patterns to detect code risks or utilize statistical transaction data to train detection models. However, real-world Rug Pull schemes often involve a complex interplay between malicious code and suspicious transaction behaviors. These methods, which solely focus on one aspect, fall short in detecting such schemes effectively. In this paper, we propose RPHunter, a novel technique that integrates code and transaction for Rug Pull detection. First, RPHunter establishes declarative rules and performs flow analysis to extract code risk information, further constructing a semantic risk code graph (SRCG). Meanwhile, to leverage transaction information, RPHunter formulates dynamic token transaction activities as a token flow behavior graph (TFBG) in which nodes and edges are characterized from network structure and market manipulation perspectives. Finally, RPHunter employs graph neural networks to extract complementary features from SRCG and TFBG, integrating them through an attention fusion model to enhance the detection of Rug Pull. We manually analyzed 645 Rug Pull incidents from code and transaction aspects and constructed a ground-truth dataset. We evaluated RPHunter on our dataset, achieving a precision of 95.3%, a recall of 93.8% and an F1 score of 94.5%, which highlights superior performance compared to existing methods. Furthermore, when applied to the real-world scenarios, RPHunter has identified 4801 Rug Pull tokens, achieving a precision of 90.7%. |
|
CoDy: Counterfactual Explainers for Dynamic Graphs | 2025-07-08 | ShowTemporal Graph Neural Networks (TGNNs) are widely used to model dynamic systems where relationships and features evolve over time. Although TGNNs demonstrate strong predictive capabilities in these domains, their complex architectures pose significant challenges for explainability. Counterfactual explanation methods provide a promising solution by illustrating how modifications to input graphs can influence model predictions. To address this challenge, we present CoDy, Counterfactual Explainer for Dynamic Graphs, a model-agnostic, instance-level explanation approach that identifies counterfactual subgraphs to interpret TGNN predictions. CoDy employs a search algorithm that combines Monte Carlo Tree Search with heuristic selection policies, efficiently exploring a vast search space of potential explanatory subgraphs by leveraging spatial, temporal, and local event impact information. Extensive experiments against state-of-the-art factual and counterfactual baselines demonstrate CoDy's effectiveness, with improvements of 16% in AUFSC+ over the strongest baseline. |
Proce...Proceedings in ICML 2025 |
WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling | 2025-07-08 | ShowGraph Neural Networks (GNNs) have demonstrated strong predictive performance on relational data; however, their confidence estimates often misalign with actual predictive correctness, posing significant limitations for deployment in safety-critical settings. While existing graph-aware calibration methods seek to mitigate this limitation, they primarily depend on coarse one-hop statistics, such as neighbor-predicted confidence, or latent node embeddings, thereby neglecting the fine-grained structural heterogeneity inherent in graph topology. In this work, we propose Wavelet-Aware Temperature Scaling (WATS), a post-hoc calibration framework that assigns node-specific temperatures based on tunable heat-kernel graph wavelet features. Specifically, WATS harnesses the scalability and topology sensitivity of graph wavelets to refine confidence estimates, all without necessitating model retraining or access to neighboring logits or predictions. Extensive evaluations across seven benchmark datasets with varying graph structures and two GNN backbones demonstrate that WATS achieves the lowest Expected Calibration Error (ECE) among all compared methods, outperforming both classical and graph-specific baselines by up to 42.3% in ECE and reducing calibration variance by 17.24% on average compared with graph-specific methods. Moreover, WATS remains computationally efficient, scaling well across graphs of diverse sizes and densities. Code will be released upon publication. |
|
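A minimal sketch of node-wise temperature scaling in the spirit of WATS: a tiny head maps per-node wavelet features to a positive temperature, fit on a held-out labeled split with the GNN frozen. Computing the heat-kernel wavelet features themselves is assumed done elsewhere; the head architecture and optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NodewiseTemperature(nn.Module):
    """Post-hoc calibration: each node gets its own temperature, predicted
    from (assumed precomputed) heat-kernel wavelet features."""
    def __init__(self, feat_dim):
        super().__init__()
        self.lin = nn.Linear(feat_dim, 1)

    def forward(self, logits, wavelet_feats):
        t = nn.functional.softplus(self.lin(wavelet_feats)) + 1e-3  # T > 0
        return logits / t

# fit on a held-out validation split, GNN weights frozen
logits = torch.randn(100, 7)                 # frozen GNN outputs, 7 classes
feats = torch.randn(100, 8)                  # wavelet features per node
labels = torch.randint(0, 7, (100,))
calib = NodewiseTemperature(8)
opt = torch.optim.Adam(calib.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(calib(logits, feats), labels)
    loss.backward()
    opt.step()
```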
Multi-Channel Hypergraph Contrastive Learning for Matrix Completion | 2025-07-08 | ShowRatings are a typical form of explicit user feedback, directly reflecting how much a user likes a related item. (Rating) matrix completion is essentially a rating prediction process, which is also a significant problem in recommender systems. Recently, graph neural networks (GNNs) have been widely used in matrix completion, which captures users' preferences over items by formulating a rating matrix as a bipartite graph. However, existing methods are susceptible to data sparsity and long-tail distributions in real-world scenarios. Moreover, the messaging mechanism of GNNs makes it difficult to capture high-order correlations and constraints between nodes, which are essentially useful in recommendation tasks. To tackle these challenges, we propose a Multi-Channel Hypergraph Contrastive Learning framework for matrix completion, named MHCL. Specifically, MHCL adaptively learns hypergraph structures to capture high-order correlations between nodes and jointly captures local and global collaborative relationships through attention-based cross-view aggregation. Additionally, to consider the magnitude and order information of ratings, we treat different rating subgraphs as different channels, encourage alignment between adjacent ratings, and further achieve mutual enhancement between different ratings through multi-channel cross-rating contrastive learning. Extensive experiments on five public datasets demonstrate that the proposed method significantly outperforms the current state-of-the-art approaches. |
|
GATMesh: Clock Mesh Timing Analysis using Graph Neural Networks | 2025-07-08 | ShowClock meshes are essential in high-performance VLSI systems for minimizing skew and handling PVT variations, but analyzing them is difficult due to reconvergent paths, multi-source driving, and input mesh buffer skew. SPICE simulations are accurate but slow; yet simplified models miss key effects like slew and input skew. We propose GATMesh, a Graph Neural Network (GNN)-based framework that models the clock mesh as a graph with augmented structural and physical features. Trained on SPICE data, GATMesh achieves high accuracy with average delay error of 5.27ps on unseen benchmarks, while achieving speed-ups of 47146x over multi-threaded SPICE simulation. |
|
HRRRCast: a data-driven emulator for regional weather forecasting at convection allowing scales | 2025-07-08 | ShowThe High-Resolution Rapid Refresh (HRRR) model is a convection-allowing model used in operational weather forecasting across the contiguous United States (CONUS). To provide a computationally efficient alternative, we introduce HRRRCast, a data-driven emulator built with advanced machine learning techniques. HRRRCast includes two architectures: a ResNet-based model (ResHRRR) and a Graph Neural Network-based model (GraphHRRR). ResHRRR uses convolutional neural networks enhanced with squeeze-and-excitation blocks and Feature-wise Linear Modulation, and supports probabilistic forecasting via the Denoising Diffusion Implicit Model (DDIM). To better handle longer lead times, we train a single model to predict multiple lead times (1h, 3h, and 6h), then use a greedy rollout strategy during inference. When evaluated on composite reflectivity over the full CONUS domain using ensembles of 3 to 10 members, ResHRRR outperforms HRRR forecast at light rainfall threshold (20 dBZ) and achieves competitive performance at moderate thresholds (30 dBZ). Our work advances the StormCast model of Pathak et al. [21] by: a) training on the full CONUS domain, b) using multiple lead times to improve long-range skill, c) training on analysis data instead of the +1h post-analysis data inadvertently used in StormCast, and d) incorporating future GFS states as inputs, enabling downscaling that improves long-lead accuracy. Grid-, neighborhood-, and object-based metrics confirm better storm placement, lower frequency bias, and higher success ratios than HRRR. HRRRCast ensemble forecasts also maintain sharper spatial detail, with power spectra more closely matching HRRR analysis. While GraphHRRR underperforms in its current form, it lays groundwork for future graph-based forecasting. HRRRCast represents a step toward efficient, data-driven regional weather prediction with competitive accuracy and ensemble capability. |
|
DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning | 2025-07-08 | ShowGraph Neural Networks (GNNs) have achieved state-of-the-art performance in various graph-based learning tasks. However, enabling privacy-preserving GNNs in encrypted domains, such as under Fully Homomorphic Encryption (FHE), typically incurs substantial computational overhead, rendering real-time and privacy-preserving inference impractical. In this work, we propose DESIGN (EncrypteD GNN Inference via sErver-Side Input Graph pruNing), a novel framework for efficient encrypted GNN inference. DESIGN tackles the critical efficiency limitations of existing FHE GNN approaches, which often overlook input data redundancy and apply uniform computational strategies. Our framework achieves significant performance gains through a hierarchical optimization strategy executed entirely on the server: first, FHE-compatible node importance scores (based on encrypted degree statistics) are computed from the encrypted graph. These scores then guide a homomorphic partitioning process, generating multi-level importance masks directly under FHE. This dynamically generated mask facilitates both input graph pruning (by logically removing unimportant elements) and a novel adaptive polynomial activation scheme, where activation complexity is tailored to node importance levels. Empirical evaluations demonstrate that DESIGN substantially accelerates FHE GNN inference compared to state-of-the-art methods while maintaining competitive model accuracy, presenting a robust solution for secure graph analytics. |
Under...Under Review in Conference on Neural Information Processing Systems (NeurIPS 2025) |
Learnable quantum spectral filters for hybrid graph neural networks | 2025-07-08 | ShowIn this paper, we describe a parameterized quantum circuit that can be considered as convolutional and pooling layers for graph neural networks. The circuit incorporates the parameterized quantum Fourier circuit where the qubit connections for the controlled gates are derived from the Laplacian operator. Specifically, we show that the eigenspace of the Laplacian operator of a graph can be approximated by using a QFT-based circuit whose connections are determined from the adjacency matrix. For an |
The s...The simulation code and results used for this paper is publicly available at: https://github.com/adaskin/gnn-qsf |
LLMs are Introvert | 2025-07-08 | ShowThe exponential growth of social media and generative AI has transformed information dissemination, fostering connectivity but also accelerating the spread of misinformation. Understanding information propagation dynamics and developing effective control strategies is essential to mitigate harmful content. Traditional models, such as SIR, provide basic insights but inadequately capture the complexities of online interactions. Advanced methods, including attention mechanisms and graph neural networks, enhance accuracy but typically overlook user psychology and behavioral dynamics. Large language models (LLMs), with their human-like reasoning, offer new potential for simulating psychological aspects of information spread. We introduce an LLM-based simulation environment capturing agents' evolving attitudes, emotions, and responses. Initial experiments, however, revealed significant gaps between LLM-generated behaviors and authentic human dynamics, especially in stance detection and psychological realism. A detailed evaluation through Social Information Processing Theory identified major discrepancies in goal-setting and feedback evaluation, stemming from the lack of emotional processing in standard LLM training. To address these issues, we propose the Social Information Processing-based Chain of Thought (SIP-CoT) mechanism enhanced by emotion-guided memory. This method improves the interpretation of social cues, personalization of goals, and evaluation of feedback. Experimental results confirm that SIP-CoT-enhanced LLM agents more effectively process social information, demonstrating behaviors, attitudes, and emotions closer to real human interactions. In summary, this research highlights critical limitations in current LLM-based propagation simulations and demonstrates how integrating SIP-CoT and emotional memory significantly enhances the social intelligence and realism of LLM agents. |
|
Graph Learning | 2025-07-08 | ShowGraph learning has rapidly evolved into a critical subfield of machine learning and artificial intelligence (AI). Its development began with early graph-theoretic methods, gaining significant momentum with the advent of graph neural networks (GNNs). Over the past decade, progress in scalable architectures, dynamic graph modeling, multimodal learning, generative AI, explainable AI (XAI), and responsible AI has broadened the applicability of graph learning to various challenging environments. Graph learning is significant due to its ability to model complex, non-Euclidean relationships that traditional machine learning struggles to capture, thus better supporting real-world applications ranging from drug discovery and fraud detection to recommender systems and scientific reasoning. However, challenges like scalability, generalization, heterogeneity, interpretability, and trustworthiness must be addressed to unlock its full potential. This survey provides a comprehensive introduction to graph learning, focusing on key dimensions including scalable, temporal, multimodal, generative, explainable, and responsible graph learning. We review state-of-the-art techniques for efficiently handling large-scale graphs, capturing dynamic temporal dependencies, integrating heterogeneous data modalities, generating novel graph samples, and enhancing interpretability to foster trust and transparency. We also explore ethical considerations, such as privacy and fairness, to ensure responsible deployment of graph learning models. Additionally, we identify and discuss emerging topics, highlighting recent integration of graph learning and other AI paradigms and offering insights into future directions. This survey serves as a valuable resource for researchers and practitioners seeking to navigate the rapidly evolving landscape of graph learning. |
178 pages |
Robust Learning on Noisy Graphs via Latent Space Constraints with External Knowledge | 2025-07-07 | ShowGraph Neural Networks (GNNs) often struggle with noisy edges. We propose Latent Space Constrained Graph Neural Networks (LSC-GNN) to incorporate external "clean" links and guide embeddings of a noisy target graph. We train two encoders--one on the full graph (target plus external edges) and another on a regularization graph excluding the target's potentially noisy links--then penalize discrepancies between their latent representations. This constraint steers the model away from overfitting spurious edges. Experiments on benchmark datasets show LSC-GNN outperforms standard and noise-resilient GNNs in graphs subjected to moderate noise. We extend LSC-GNN to heterogeneous graphs and validate it on a small protein-metabolite network, where metabolite-protein interactions reduce noise in protein co-occurrence data. Our results highlight LSC-GNN's potential to boost predictive performance and interpretability in settings with noisy relational structures. |
|
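The core of LSC-GNN reduces to a discrepancy penalty between the two encoders' latent representations. A minimal sketch is below; the penalty weight and the use of a plain MSE discrepancy are assumptions, and producing the two sets of embeddings (with and without the suspect links) is left to the encoders.

```python
import torch
import torch.nn as nn

def lsc_objective(task_loss, z_full, z_reg, lam=0.5):
    """Supervised loss plus a latent-space constraint: embeddings from the
    full (possibly noisy) graph are pulled toward those from the
    regularization graph that excludes the target's suspect links."""
    return task_loss + lam * nn.functional.mse_loss(z_full, z_reg)

# z_full: encoder over target + external edges; z_reg: encoder without noisy links
z_full, z_reg = torch.randn(50, 16), torch.randn(50, 16)
total = lsc_objective(torch.tensor(0.7), z_full, z_reg)
```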
Balancing Efficiency and Expressiveness: Subgraph GNNs with Walk-Based Centrality | 2025-07-07 | ShowSubgraph GNNs have emerged as promising architectures that overcome the expressiveness limitations of Graph Neural Networks (GNNs) by processing bags of subgraphs. Despite their compelling empirical performance, these methods are afflicted by a high computational complexity: they process bags whose size grows linearly in the number of nodes, hindering their applicability to larger graphs. In this work, we propose an effective and easy-to-implement approach to dramatically alleviate the computational cost of Subgraph GNNs and unleash broader applications thereof. Our method, dubbed HyMN, leverages walk-based centrality measures to sample a small number of relevant subgraphs and drastically reduce the bag size. By drawing a connection to perturbation analysis, we highlight the strength of the proposed centrality-based subgraph sampling, and further prove that these walk-based centralities can be additionally used as Structural Encodings for improved discriminative power. A comprehensive set of experimental results demonstrates that HyMN provides an effective synthesis of expressiveness, efficiency, and downstream performance, unlocking the application of Subgraph GNNs to dramatically larger graphs. Not only does our method outperform more sophisticated subgraph sampling approaches, it is also competitive, and sometimes better, than other state-of-the-art approaches for a fraction of their runtime. |
ICML 2025 |
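A toy NumPy sketch of walk-based centrality for choosing subgraph roots, as the HyMN abstract describes: count walks of bounded length starting at each node and keep the top-k nodes. The exact centrality measure, walk length, and sampling rule used by HyMN may differ; this only illustrates the pattern.

```python
import numpy as np

def walk_centrality(A, length=4):
    """Counts of walks up to a fixed length starting at each node."""
    score = np.zeros(A.shape[0])
    P = np.eye(A.shape[0])
    for _ in range(length):
        P = P @ A                      # P[i, j]: number of length-t walks i -> j
        score += P.sum(axis=1)
    return score

def sample_subgraph_roots(A, k=3):
    return np.argsort(-walk_centrality(A))[:k]   # top-k most central nodes

A = (np.random.default_rng(1).random((30, 30)) < 0.1).astype(float)
A = np.maximum(A, A.T)
roots = sample_subgraph_roots(A)   # each root spawns one subgraph in the bag
```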
Theoretical Learning Performance of Graph Neural Networks: The Impact of Jumping Connections and Layer-wise Sparsification | 2025-07-07 | ShowJumping connections enable Graph Convolutional Networks (GCNs) to overcome over-smoothing, while graph sparsification reduces computational demands by selecting a sub-matrix of the graph adjacency matrix during neighborhood aggregation. Learning GCNs with graph sparsification has shown empirical success across various applications, but a theoretical understanding of the generalization guarantees remains limited, with existing analyses ignoring either graph sparsification or jumping connections. This paper presents the first learning dynamics and generalization analysis of GCNs with jumping connections using graph sparsification. Our analysis demonstrates that the generalization accuracy of the learned model closely approximates the highest achievable accuracy within a broad class of target functions dependent on the proposed sparse effective adjacency matrix $A^*$. Thus, graph sparsification maintains generalization performance when $A^*$ preserves the essential edges that support meaningful message propagation. We reveal that jumping connections lead to different sparsification requirements across layers. In a two-hidden-layer GCN, the generalization is more affected by the sparsified matrix deviations from $A^*$. |
TMLR |
Bit-Flip Fault Attack: Crushing Graph Neural Networks via Gradual Bit Search | 2025-07-07 | ShowGraph Neural Networks (GNNs) have emerged as a powerful machine learning method for graph-structured data. A plethora of hardware accelerators has been introduced to meet the performance demands of GNNs in real-world applications. However, security challenges of hardware-based attacks have been generally overlooked. In this paper, we investigate the vulnerability of GNN models to hardware-based fault attack, wherein an attacker attempts to misclassify output by modifying trained weight parameters through fault injection in a memory device. Thus, we propose Gradual Bit-Flip Fault Attack (GBFA), a layer-aware bit-flip fault attack, selecting a vulnerable bit in each selected weight gradually to compromise the GNN's performance by flipping a minimal number of bits. To achieve this, GBFA operates in two steps. First, a Markov model is created to predict the execution sequence of layers based on features extracted from memory access patterns, enabling the launch of the attack within a specific layer. Subsequently, GBFA identifies vulnerable bits within the selected weights using gradient ranking through an in-layer search. We evaluate the effectiveness of the proposed GBFA attack on various GNN models for node classification tasks using the Cora and PubMed datasets. Our findings show that GBFA significantly degrades prediction accuracy, and the variation in its impact across different layers highlights the importance of adopting a layer-aware attack strategy in GNNs. For example, GBFA degrades GraphSAGE's prediction accuracy by 17% on the Cora dataset with only a single bit flip in the last layer. |
|
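A highly simplified sketch of the gradient-ranked bit selection idea: rank weights in a chosen layer by loss-gradient magnitude and flip a high-order bit of the most sensitive weight's (assumed) int8 representation. The Markov-model layer prediction and the full in-layer search procedure from the paper are omitted, and the int8 storage format is an assumption.

```python
import torch
import torch.nn as nn

# toy victim model and data; weights assumed stored as int8 on-device
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                            # gradients w.r.t. all weights

layer = model[2]                           # attack one (pre-selected) layer
g = layer.weight.grad.abs().view(-1)
idx = int(torch.argmax(g))                 # most loss-sensitive weight

with torch.no_grad():
    w = layer.weight.view(-1)              # view: writes hit the real weights
    q = torch.clamp((w[idx] * 127).round(), -128, 127).to(torch.int8)
    q = q ^ 0x40                           # flip a high-order bit
    w[idx] = q.float() / 127.0             # corrupted weight written back
```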
Inaugural MOASEI Competition at AAMAS'2025: A Technical Report | 2025-07-07 | ShowWe present the Methods for Open Agent Systems Evaluation Initiative (MOASEI) Competition, a multi-agent AI benchmarking event designed to evaluate decision-making under open-world conditions. Built on the free-range-zoo environment suite, MOASEI introduced dynamic, partially observable domains with agent and task openness: settings where entities may appear, disappear, or change behavior over time. The 2025 competition featured three tracks (Wildfire, Rideshare, and Cybersecurity), each highlighting distinct dimensions of openness and coordination complexity. Eleven teams from international institutions participated, with four of those teams submitting diverse solutions including graph neural networks, convolutional architectures, predictive modeling, and large language model-driven meta-optimization. Evaluation metrics centered on expected utility, robustness to perturbations, and responsiveness to environmental change. The results reveal promising strategies for generalization and adaptation in open environments, offering both empirical insight and infrastructure for future research. This report details the competition's design, findings, and contributions to the open-agent systems research community. |
Repor...Report from the MOASEI'2025 Competition held at AAMAS'2025 |
Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks | 2025-07-07 | ShowThe growing luminosity frontier at the Large Hadron Collider is challenging the reconstruction and analysis of particle collision events. Increased particle multiplicities are straining latency and storage requirements at the data acquisition stage, while new complications are emerging, including higher background levels and more frequent particle vertex misassociations. This in turn necessitates the development of more holistic and scalable reconstruction methods that take advantage of recent advances in machine learning. We propose a novel Heterogeneous Graph Neural Network (HGNN) architecture featuring unique representations for diverse particle collision relationships and integrated graph pruning layers for scalability. Trained with a multi-task paradigm in an environment mimicking the LHCb experiment, this HGNN significantly improves beauty hadron reconstruction performance. Notably, it concurrently performs particle vertex association and graph pruning within a single framework. We quantify reconstruction and pruning performance, demonstrate enhanced inference time scaling with event complexity, and mitigate potential performance loss using a weighted message passing scheme. |
21 pa...21 pages, 10 figures, 4 tables (planned submission to Machine Learning Science and Technology) |
Exploring Semantic Clustering and Similarity Search for Heterogeneous Traffic Scenario Graph | 2025-07-07 | ShowScenario-based testing is an indispensable instrument for the comprehensive validation and verification of automated vehicles (AVs). However, finding a manageable and finite, yet representative subset of scenarios in a scalable, possibly unsupervised manner is notoriously challenging. Our work is meant to constitute a cornerstone to facilitate sample-efficient testing, while still capturing the diversity of relevant operational design domains (ODDs) and accounting for the "long tail" phenomenon in particular. To this end, we first propose an expressive and flexible heterogeneous, spatio-temporal graph model for representing traffic scenarios. Leveraging recent advances of graph neural networks (GNNs), we then propose a self-supervised method to learn a universal embedding space for scenario graphs that enables clustering and similarity search. In particular, we implement contrastive learning alongside a bootstrapping-based approach and evaluate their suitability for partitioning the scenario space. Experiments on the nuPlan dataset confirm the model's ability to capture semantics and thus group related scenarios in a meaningful way despite the absence of discrete class labels. Different scenario types materialize as distinct clusters. Our results demonstrate how variable-length traffic scenarios can be condensed into single vector representations that enable nearest-neighbor retrieval of representative candidates for distinct scenario categories. Notably, this is achieved without manual labeling or bias towards an explicit objective such as criticality. Ultimately, our approach can serve as a basis for scalable selection of scenarios to further enhance the efficiency and robustness of testing AVs in simulation. |
accep...accepted in the IEEE IAVVC 2025 conference |
Conditional Graph Neural Network for Predicting Soft Tissue Deformation and Forces | 2025-07-07 | ShowSoft tissue simulation in virtual environments is becoming increasingly important for medical applications. However, the high deformability of soft tissue poses significant challenges. Existing methods rely on segmentation, meshing and estimation of stiffness properties of tissues. In addition, the integration of haptic feedback requires precise force estimation to enable a more immersive experience. We introduce a novel data-driven model, a conditional graph neural network (cGNN) to tackle this complexity. Our model takes surface points and the location of applied forces, and is specifically designed to predict the deformation of the points and the forces exerted on them. We trained our model on experimentally collected surface tracking data of a soft tissue phantom and used transfer learning to overcome the data scarcity by initially training it with mass-spring simulations and fine-tuning it with the experimental data. This approach improves the generalisation capability of the model and enables accurate predictions of tissue deformations and corresponding interaction forces. The results demonstrate that the model can predict deformations with a distance error of 0.35$\pm$0.03 mm for deformations up to 30 mm and the force with an absolute error of 0.37$\pm$0.05 N for forces up to 7.5 N. Our data-driven approach presents a promising solution to the intricate challenge of simulating soft tissues within virtual environments. Beyond its applicability in medical simulations, this approach holds the potential to benefit various fields where realistic soft tissue simulations are required. |
|
High Order Collaboration-Oriented Federated Graph Neural Network for Accurate QoS Prediction | 2025-07-07 | ShowPredicting Quality of Service (QoS) data is crucial for cloud service selection, where user privacy is a critical concern. Federated Graph Neural Networks (FGNNs) can perform QoS data prediction while maintaining user privacy. However, existing FGNN-based QoS predictors commonly implement on-device training on scattered explicit user-service graphs, thereby failing to utilize the implicit user-user interactions. To address this issue, this study proposes a high order collaboration-oriented federated graph neural network (HC-FGNN) to obtain accurate QoS prediction with privacy preservation. Concretely, it magnifies the explicit user-service graphs following the principle of the attention mechanism to obtain the high order collaboration, which reflects the implicit user-user interactions. Moreover, it utilizes a lightweight message aggregation approach to improve computational efficiency. Extensive experiments on two QoS datasets from real applications indicate that the proposed HC-FGNN offers the advantages of high prediction accuracy and privacy protection. |
|
Hierarchical Intent-guided Optimization with Pluggable LLM-Driven Semantics for Session-based Recommendation | 2025-07-07 | ShowSession-based Recommendation (SBR) aims to predict the next item a user will likely engage with, using their interaction sequence within an anonymous session. Existing SBR models often focus only on single-session information, ignoring inter-session relationships and valuable cross-session insights. Some methods try to include inter-session data but struggle with noise and irrelevant information, reducing performance. Additionally, most models rely on item ID co-occurrence and overlook rich semantic details, limiting their ability to capture fine-grained item features. To address these challenges, we propose a novel hierarchical intent-guided optimization approach with pluggable LLM-driven semantic learning for session-based recommendations, called HIPHOP. First, we introduce a pluggable embedding module based on large language models (LLMs) to generate high-quality semantic representations, enhancing item embeddings. Second, HIPHOP utilizes graph neural networks (GNNs) to model item transition relationships and incorporates a dynamic multi-intent capturing module to address users' diverse interests within a session. Additionally, we design a hierarchical inter-session similarity learning module, guided by user intent, to capture global and local session relationships, effectively exploring users' long-term and short-term interests. To mitigate noise, an intent-guided denoising strategy is applied during inter-session learning. Finally, we enhance the model's discriminative capability by using contrastive learning to optimize session representations. Experiments on multiple datasets show that HIPHOP significantly outperforms existing methods, demonstrating its effectiveness in improving recommendation quality. Our code is available: https://github.com/hjx159/HIPHOP. |
|
Graph Neural Networks as a Substitute for Transformers in Single-Cell Transcriptomics | 2025-07-05 | ShowGraph Neural Networks (GNNs) and Transformers share significant similarities in their encoding strategies for interacting with features from nodes of interest, where Transformers use query-key scores and GNNs use edges. Compared to GNNs, which are unable to encode relative positions, Transformers leverage dynamic attention capabilities to better represent relative relationships, thereby becoming the standard backbones in large-scale sequential pre-training. However, the subtle difference prompts us to consider: if positions are no longer crucial, could we substitute Transformers with Graph Neural Networks in some fields such as Single-Cell Transcriptomics? In this paper, we first explore the similarities and differences between GNNs and Transformers, specifically in terms of relative positions. Additionally, we design a synthetic example to illustrate their equivalence where there are no relative positions between tokens in the sample. Finally, we conduct extensive experiments on a large-scale position-agnostic dataset, single-cell transcriptomics, finding that GNNs achieve competitive performance compared to Transformers while consuming fewer computation resources. These findings provide novel insights for researchers in the field of single-cell transcriptomics, challenging the prevailing notion that the Transformer is always the optimum choice. |
9 pages, 5 figures |
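The claimed correspondence is easy to see in code: single-head self-attention is message passing on a complete graph whose edge weights are the softmax-normalized query-key scores, with no positional information involved. A minimal sketch (dimensions and random projections are illustrative):

```python
import torch

def attention_as_message_passing(X, Wq, Wk, Wv):
    """Single-head self-attention written as message passing on a complete
    graph: edge weights are softmax-normalized query-key scores, and the
    aggregated messages are the value vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / K.shape[-1] ** 0.5    # dense 'adjacency' of the graph
    edge_w = torch.softmax(scores, dim=-1)   # per-node normalized edge weights
    return edge_w @ V                        # aggregate neighbor messages

n, d = 6, 16
X = torch.randn(n, d)                        # position-agnostic tokens (cells/genes)
out = attention_as_message_passing(
    X, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d)
)
```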
Physics-Informed Graph Neural Networks to Reconstruct Local Fields Considering Finite Strain Hyperelasticity | 2025-07-05 | ShowWe propose a physics-informed machine learning framework called P-DivGNN to reconstruct local stress fields at the micro-scale, in the context of multi-scale simulation given a periodic micro-structure mesh and mean, macro-scale, stress values. This method is based on representing a periodic micro-structure as a graph, combined with a message passing graph neural network. We are able to retrieve local stress field distributions, providing average stress values produced by a mean field reduced order model (ROM) or Finite Element (FE) simulation at the macro-scale. The prediction of local stress fields is of utmost importance considering fracture analysis or the definition of local fatigue criteria. Our model incorporates physical constraints during training to constrain the local stress field equilibrium state and employs a periodic graph representation to enforce periodic boundary conditions. The benefits of the proposed physics-informed GNN are evaluated considering linear and non-linear hyperelastic responses applied to varying geometries. In the non-linear hyperelastic case, the proposed method achieves significant computational speed-ups compared to FE simulation, making it particularly attractive for large-scale applications. |
28 pa...28 pages, 17 figures, pre-print |
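To make "physical constraints during training" concrete, here is a hedged sketch of an equilibrium-residual penalty, div(σ) = 0 with no body forces, evaluated with central differences on a grid-sampled 2D stress field. The paper enforces the constraint on a graph/mesh representation, so this grid version and the penalty weight are only illustrative.

```python
import torch

def equilibrium_penalty(sigma, h=1.0):
    """Penalty on violation of local equilibrium div(sigma) = 0.

    sigma: (H, W, 3) with components (s_xx, s_yy, s_xy) on a uniform grid.
    """
    sxx, syy, sxy = sigma[..., 0], sigma[..., 1], sigma[..., 2]
    dsxx_dx = (sxx[1:-1, 2:] - sxx[1:-1, :-2]) / (2 * h)
    dsxy_dy = (sxy[2:, 1:-1] - sxy[:-2, 1:-1]) / (2 * h)
    dsxy_dx = (sxy[1:-1, 2:] - sxy[1:-1, :-2]) / (2 * h)
    dsyy_dy = (syy[2:, 1:-1] - syy[:-2, 1:-1]) / (2 * h)
    rx = dsxx_dx + dsxy_dy          # x-equilibrium residual
    ry = dsxy_dx + dsyy_dy          # y-equilibrium residual
    return (rx ** 2 + ry ** 2).mean()

sigma_pred = torch.randn(32, 32, 3, requires_grad=True)
target = torch.randn(32, 32, 3)
loss = ((sigma_pred - target) ** 2).mean() + 0.1 * equilibrium_penalty(sigma_pred)
loss.backward()
```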
Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models | 2025-07-05 | ShowUrban socioeconomic modeling has predominantly concentrated on extensive location and neighborhood-based features, relying on the localized population footprint. However, networks in urban systems are common, and many urban modeling methods don't account for network-based effects. In this study, we propose using commute information records from the census as a reliable and comprehensive source to construct mobility networks across cities. Leveraging deep learning architectures, we employ these commute networks across U.S. metro areas for socioeconomic modeling. We show that mobility network structures provide significant predictive performance without considering any node features. Consequently, we use mobility networks to present a supervised learning framework to model a city's socioeconomic indicator directly, combining Graph Neural Network and Vanilla Neural Network models to learn all parameters in a single learning pipeline. Our experiments in 12 major U.S. cities show the proposed model outperforms previous conventional machine learning models. This work provides urban researchers methods to incorporate network effects in urban modeling and informs stakeholders of wider network-based effects in urban policymaking and planning. |
|
Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs | 2025-07-05 | ShowKnowledge graphs offer a structured representation of real-world entities and their relationships, enabling a wide range of applications from information retrieval to automated reasoning. In this paper, we conduct a systematic comparison between traditional rule-based approaches and modern deep learning methods for link prediction. We focus on KBGAT, a graph neural network model that leverages multi-head attention to jointly encode both entity and relation features within local neighborhood structures. To advance this line of research, we introduce **GCAT** (Graph Collaborative Attention Network), a refined model that enhances context aggregation and interaction between heterogeneous nodes. Experimental results on four widely-used benchmark datasets demonstrate that GCAT not only consistently outperforms rule-based methods but also achieves competitive or superior performance compared to existing neural embedding models. Our findings highlight the advantages of attention-based architectures in capturing complex relational patterns for knowledge graph completion tasks. |
|
Hierarchical graph sampling based minibatch learning with chain preservation and variance reduction | 2025-07-05 | ShowGraph sampling-based Graph Convolutional Networks (GCNs) decouple sampling from forward and backward propagation during minibatch training, enhancing scalability with respect to layer depth and graph size. We propose HIS_GCNs, a hierarchical importance sampling-based learning method. By constructing minibatches using sampled subgraphs, HIS_GCNs focuses on the importance of both the core and periphery in a scale-free training graph. Specifically, it preserves the centrum of the core in most minibatches, which maintains connectivity between periphery nodes, and samples periphery edges without core node interference, which allows longer chains composed entirely of low-degree nodes to remain within the same minibatch. HIS_GCNs can maximize the discrete Ricci curvature (i.e., Ollivier-Ricci curvatures) of the edges in a subgraph, enabling preservation of important chains for information propagation. This approach can achieve a low node embedding variance and a high convergence speed. Diverse experiments on Graph Neural Networks (GNNs) with node classification tasks confirmed the superior performance of HIS_GCNs in terms of both accuracy and training time. Open-source code (https://github.com/HuQiaCHN/HIS-GCN). |
30 pages, 12 figures |
Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model | 2025-07-05 | ShowRecently, a novel two-phase framework named mol-infer for inference of chemical compounds with prescribed abstract structures and desired property values has been proposed. The framework mol-infer is primarily based on using mixed integer linear programming (MILP) to simulate the computational process of machine learning methods and describe the necessary and sufficient conditions to ensure such a chemical graph exists. The existing approaches usually first convert the chemical compounds into handcrafted feature vectors to construct prediction functions, but because of the limit on the kinds of descriptors originating from the need for tractability in the MILP formulation, the learning performances on datasets of some properties are not good enough. A lack of good learning performance can greatly lower the quality of the inferred chemical graphs, and thus improving learning performance is of great importance. On the other hand, graph neural networks (GNN) offer a promising machine learning method to directly utilize the chemical graphs as the input, and many existing GNN-based approaches to the molecular property prediction problem have shown that they can enjoy better learning performances compared to the traditional approaches that are based on feature vectors. In this study, we develop a molecular inference framework based on mol-infer, namely mol-infer-GNN, that utilizes GNN as the learning method while keeping the great flexibility originating from the two-layered model on the abstract structure of the chemical graph to be inferred. We conducted computational experiments on the QM9 dataset to show that our proposed GNN model can obtain satisfying learning performances for some properties despite its simple structure, and can infer small chemical graphs comprising up to 20 non-hydrogen atoms within reasonable computational time. |
arXiv...arXiv admin note: substantial text overlap with arXiv:2107.02381, arXiv:2109.02628 |
OrbitAll: A Unified Quantum Mechanical Representation Deep Learning Framework for All Molecular Systems | 2025-07-05 | ShowDespite the success of deep learning methods in quantum chemistry, their representational capacity is most often confined to neutral, closed-shell molecules. However, real-world chemical systems often exhibit complex characteristics, including varying charges, spins, and environments. We introduce OrbitAll, a geometry- and physics-informed deep learning framework that can represent all molecular systems with electronic structure information. OrbitAll utilizes spin-polarized orbital features from the underlying quantum mechanical method and combines them with graph neural networks satisfying SE(3)-equivariance. The resulting framework can represent and process any molecular system with arbitrary charges, spins, and environmental effects. OrbitAll demonstrates superior performance and generalization on predicting charged, open-shell, and solvated molecules, while also robustly extrapolating to molecules significantly larger than the training data by leveraging a physics-informed architecture. OrbitAll achieves chemical accuracy using 10 times less training data than competing AI models, with a speedup of approximately |
|
Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction | 2025-07-04 | ShowEquivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales, enabling investigation of the electronic properties of materials with extended defects, interfaces, or exhibiting disordered phases. However, as interactions between atomic orbitals typically extend over 10+ angstroms, the graph representations required for this task tend to be densely connected, and the memory requirements to perform training and inference on these large structures can exceed the limits of modern GPUs. Here we present a distributed eGNN implementation which leverages direct GPU communication and introduce a partitioning strategy of the input graph to reduce the number of embedding exchanges between GPUs. Our implementation shows strong scaling up to 128 GPUs, and weak scaling up to 512 GPUs with 87% parallel efficiency for structures with 3,000 to 190,000 atoms on the Alps supercomputer. |
13 pages, 8 figures |
Effective Capacitance Modeling Using Graph Neural Networks | 2025-07-04 | ShowStatic timing analysis is a crucial stage in the VLSI design flow that verifies the timing correctness of circuits. Timing analysis depends on the placement and routing of the design, but at the same time, placement and routing efficiency depend on the final timing performance. VLSI design flows can benefit from timing-related prediction to better perform the earlier stages of the design flow. Effective capacitance is an essential input for gate delay calculation, and finding exact values requires routing or routing estimates. In this work, we propose the first GNN-based post-layout effective capacitance modeling method, GNN-Ceff, that achieves significant speed gains due to GPU parallelization while also providing better accuracy than current heuristics. GNN-Ceff parallelization achieves 929x speedup on real-life benchmarks over the state-of-the-art method run serially. |
7 pag...7 pages, 5 figures, 6 tables |
CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning | 2025-07-04 | ShowCosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. In this paper we introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations whose runs required more than 41 million core-hours and generated over two petabytes of data. CosmoBench is the largest dataset of its kind: it contains 34 thousand point clouds from simulations of dark matter halos and galaxies at three different length scales, as well as 25 thousand directed trees that record the formation history of halos on two different time scales. The data in CosmoBench can be used for multiple tasks -- to predict cosmological parameters from point clouds and merger trees, to predict the velocities of individual halos and galaxies from their collective positions, and to reconstruct merger trees on finer time scales from those on coarser time scales. We provide several baselines on these tasks, some based on established approaches from cosmological modeling and others rooted in machine learning. For the latter, we study different approaches -- from simple linear models that are minimally constrained by symmetries to much larger and more computationally demanding models in deep learning, such as graph neural networks. We find that least-squares fits with a handful of invariant features sometimes outperform deep architectures with many more parameters and far longer training times. Still, there remains tremendous potential to improve these baselines by combining machine learning and cosmology to fully exploit the data. CosmoBench sets the stage for bridging cosmology and geometric deep learning at scale. We invite the community to push the frontier of scientific discovery by engaging with this dataset, available at https://cosmobench.streamlit.app |
|
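To make the "least-squares fits with a handful of invariant features" baseline concrete, here is a hedged sketch: translation- and rotation-invariant summary statistics of a point cloud fed into ordinary least squares. The feature set and the synthetic target are assumptions, not the CosmoBench baseline definitions:

```python
# Invariant-feature linear baseline for point-cloud regression (illustrative).
import numpy as np

rng = np.random.default_rng(0)

def invariant_features(points):
    c = points - points.mean(axis=0)             # translation invariance
    r = np.linalg.norm(c, axis=1)                # radial distances: rotation-invariant
    return np.array([r.mean(), r.std(), (r**3).mean(), len(points)])

clouds = [rng.normal(size=(rng.integers(50, 200), 3)) for _ in range(100)]
X = np.stack([invariant_features(p) for p in clouds])
true_w = np.array([0.5, -1.0, 0.02, 0.003])      # synthetic "cosmological parameter"
y = X @ true_w + 0.01 * rng.normal(size=len(X))

w, *_ = np.linalg.lstsq(X, y, rcond=None)        # the whole "model" is one solve
print("recovered weights:", np.round(w, 3))
```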
Plugging Attention into Power Grids: Towards Transparent Forecasting | 2025-07-04 | ShowAccurate electricity consumption forecasting is crucial for ensuring grid stability and optimizing power generation, particularly in increasingly decentralized and complex systems. While classical approaches such as Generalized Additive Models (GAMs) remain widely used, they often fail to capture the spatial dependencies inherent in energy networks. Graph Neural Networks (GNNs) offer a principled framework to incorporate this structure by directly leveraging graph topologies. In this work, we evaluate a broad set of GNN architectures -- including GCN, GraphSAGE, ChebConv, TAG, APPNP, TransformerConv, and Graph Attention Networks (GAT and GATv2) -- on two real-world electricity consumption datasets from France and the UK. Our experiments show that while complex architectures like GATv2 and TransformerConv do not consistently outperform their simpler counterparts, models such as GCN and APPNP achieve strong results in low-data or highly disaggregated settings. Nonetheless, the vanilla GAT remains highly competitive across both datasets and offers an additional interpretability layer via attention mechanisms. We perform a temporal analysis of attention weights, revealing evolving patterns of regional interaction linked to seasonal and meteorological variability. These results highlight that, although attention is not universally superior, it provides valuable explanatory power when spatial dependencies are prominent. Finally, we benchmark ensemble-based expert aggregation strategies, showing that uniform or learned combinations can enhance robustness and outperform individual models under data heterogeneity. |
16 pages |
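As a hedged illustration of inspecting attention weights the way the paper's temporal analysis does, here is a minimal single-head GAT-style layer in plain PyTorch that returns its per-edge attention coefficients. This is an illustrative layer, not the authors' implementation:

```python
# Tiny GAT-style layer exposing per-edge attention weights (illustrative).
import torch
import torch.nn.functional as F

class TinyGAT(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = torch.nn.Linear(d_in, d_out, bias=False)
        self.att = torch.nn.Parameter(torch.randn(2 * d_out))

    def forward(self, x, edge_index):
        src, dst = edge_index                    # edges point src -> dst
        z = self.lin(x)
        e = F.leaky_relu(torch.cat([z[src], z[dst]], dim=1) @ self.att)
        alpha = torch.zeros_like(e)
        for node in dst.unique():                # softmax per destination node
            mask = dst == node
            alpha[mask] = torch.softmax(e[mask], dim=0)
        out = torch.zeros_like(z)
        out.index_add_(0, dst, alpha.unsqueeze(1) * z[src])
        return out, alpha                        # alpha: one weight per edge

x = torch.randn(4, 3)                            # 4 regions, 3 load features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 1, 3, 1]])
out, alpha = TinyGAT(3, 8)(x, edge_index)
print("edge attention:", alpha.detach().numpy().round(3))
```

Logging `alpha` per timestep is then enough to study how regional interactions evolve with season or weather, in the spirit of the analysis above.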
A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications | 2025-07-04 | ShowThis paper presents a novel graph-based deep learning model for tasks involving relations between two nodes (edge-centric tasks), where the focus lies on predicting relationships and interactions between pairs of nodes rather than node properties themselves. The model combines supervised and self-supervised learning, with a loss function that accounts for the learned embeddings and for patterns observed with and without ground truth. Additionally, it incorporates an attention mechanism that leverages both node and edge features. The architecture, trained end-to-end, comprises two primary components: embedding generation and prediction. First, a graph neural network (GNN) transforms raw node features into dense, low-dimensional embeddings, incorporating edge attributes. Then, a feedforward neural model processes the node embeddings to produce the final output. Experiments demonstrate that our model matches or exceeds existing methods for protein-protein interaction prediction and Gene Ontology (GO) term prediction. The model also performs effectively with one-hot encoding for node features, providing a solution for the previously unsolved problem of predicting similarity between compounds with unknown structures. |
|
Simplifying Graph Neural Kernels: from Stacking Layers to Collapsed Structure | 2025-07-04 | ShowThe Graph Neural Tangent Kernel (GNTK) successfully bridges the gap between kernel methods and Graph Neural Networks (GNNs), addressing key challenges such as the difficulty of training deep networks and the limitations of traditional kernel methods. However, the existing layer-stacking strategy in GNTK introduces redundant computations, significantly increasing computational complexity and limiting scalability for practical applications. To address these issues, this paper proposes the Simplified Graph Neural Tangent Kernel (SGTK), which replaces the traditional multi-layer stacking mechanism with a continuous |
|
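A hedged sketch of what "collapsing" stacked layers can look like: propagate node features a fixed number of steps with a normalized adjacency, then take a linear kernel between graph-level readouts, in the spirit of SGC-style simplification. This is not the exact SGTK/SSGTK construction, only the general flavor:

```python
# Collapsed propagation + linear graph kernel (illustrative only).
import numpy as np

def propagate(A, X, k=3):
    deg = A.sum(axis=1)
    S = A / np.sqrt(np.outer(deg, deg))          # symmetric normalization
    for _ in range(k):                           # k propagation steps, no weights
        X = S @ X
    return X.mean(axis=0)                        # graph-level readout

def graph_kernel(graphs, k=3):
    R = np.stack([propagate(A, X, k) for A, X in graphs])
    return R @ R.T                               # kernel matrix between graphs

A1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float) + np.eye(3)  # self-loops
A2 = np.array([[0, 1], [1, 0]], float) + np.eye(2)
X1, X2 = np.ones((3, 2)), np.ones((2, 2))
print(graph_kernel([(A1, X1), (A2, X2)]))
```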
Molecular Machine Learning Using Euler Characteristic Transforms | 2025-07-04 | ShowThe shape of a molecule determines its physicochemical and biological properties. However, it is often underrepresented in standard molecular representation learning approaches. Here, we propose using the Euler Characteristic Transform (ECT) as a geometrical-topological descriptor. Computed directly on a molecular graph derived from handcrafted atomic features, the ECT enables the extraction of multiscale structural features, offering a novel way to represent and encode molecular shape in the feature space. We assess the predictive performance of this representation across nine benchmark regression datasets, all centered around predicting the inhibition constant |
|
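A hedged sketch of an Euler characteristic curve on a molecular graph: sweep a threshold over a scalar vertex function and record chi = |V_t| - |E_t|, where an edge enters the sublevel set once both endpoints have. The atomic-mass filtration and the single sweep direction are illustrative assumptions, not the paper's exact descriptor:

```python
# Euler characteristic curve of a vertex-function filtration (illustrative).
import numpy as np

def ec_curve(vertex_vals, edges, thresholds):
    vertex_vals = np.asarray(vertex_vals, float)
    curve = []
    for t in thresholds:
        v_in = vertex_vals <= t                          # vertices present at level t
        n_v = int(v_in.sum())
        n_e = sum(1 for u, w in edges if v_in[u] and v_in[w])
        curve.append(n_v - n_e)                          # chi = |V| - |E| on a graph
    return np.array(curve)

# Toy ethanol-like graph: vertices carry atomic masses, edges are bonds.
masses = [12.0, 12.0, 16.0, 1.0, 1.0, 1.0]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (1, 5)]
ts = np.linspace(0, 20, 5)
print(ec_curve(masses, bonds, ts))   # one ECT "direction"; stack many as features
```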
Multi-Level Fusion Graph Neural Network for Molecule Property Prediction | 2025-07-04 | ShowAccurate molecular property prediction is essential in drug discovery and related fields. However, existing graph neural networks (GNNs) often struggle to simultaneously capture both local and global molecular structures. In this work, we propose a Multi-Level Fusion Graph Neural Network (MLFGNN) that integrates Graph Attention Networks and a novel Graph Transformer to jointly model local and global dependencies. In addition, we incorporate molecular fingerprints as a complementary modality and introduce an attention-interaction mechanism to adaptively fuse information across representations. Extensive experiments on multiple benchmark datasets demonstrate that MLFGNN consistently outperforms state-of-the-art methods in both classification and regression tasks. Interpretability analysis further reveals that the model effectively captures task-relevant chemical patterns, supporting the usefulness of multi-level and multi-modal fusion in molecular representation learning. |
38 pa...38 pages, 11 figures, 6 tables |
AoI-Energy-Spectrum Optimization in Post-Disaster Powered Communication Intelligent Network via Hierarchical Heterogeneous Graph Neural Network | 2025-07-04 | ShowThis paper designs a post-disaster powered communication intelligent network (PDPCIN) to address communication disruptions caused by ground base station (GBS) failures within the post-disaster area. PDPCIN employs unmanned aerial vehicles (UAVs) to provide wireless data collection (WDC) and wireless energy transmission (WET) for affected areas and leverages low earth orbit satellites (LEO SATs) to relay UAV data to the nearest surviving GBS. To ensure basic post-disaster communication while co-optimizing age of information (AoI), energy efficiency, and spectrum efficiency, an intelligent synchronization-UAV (IS-UAV) architecture, an AoI-based four-threshold updating (AFTU) mechanism, and a dynamic multi-LEO access (DMLA) strategy are proposed. However, three key challenges remain: time-varying task-resource imbalances, complex topology caused by multi-device scheduling, and nonlinear coupling in multidimensional metric optimization, making system optimization NP-hard. Therefore, this paper proposes a hierarchical heterogeneous graph neural network (HHGNN) framework. It models heterogeneous device nodes and their communication relations as a hierarchical heterogeneous graph structure, integrating our defined graph sensing, exchange, and mask layers to handle the network's input, feature propagation, and output. To determine an appropriate number of single-LEO SATs, we propose a single-LEO SAT density optimization (S-LSDO) algorithm. Finally, we compare the proposed scheme with state-of-the-art benchmarks to validate its superior collaborative optimization of AoI, energy efficiency, and spectrum efficiency. We also derive expressions for the expected values of AoI and the stagnant AoI proportion. |
|
Learning Traffic Anomalies from Generative Models on Real-Time Observations | 2025-07-04 | ShowAccurate detection of traffic anomalies is crucial for effective urban traffic management and congestion mitigation. We use the Spatiotemporal Generative Adversarial Network (STGAN) framework combining Graph Neural Networks and Long Short-Term Memory networks to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. The images are processed to compute a flow metric representing vehicle density, which serves as input for the model. Training is conducted on data from April to November 2020, and validation is performed on a separate dataset from November 14 to 23, 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates. The detected anomalies include camera signal interruptions, visual artifacts, and extreme weather conditions affecting traffic flow. |
|
ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation | 2025-07-04 | Show3D world models (i.e., learning-based 3D dynamics models) offer a promising approach to generalizable robotic manipulation by capturing the underlying physics of environment evolution conditioned on robot actions. However, existing 3D world models are primarily limited to single-material dynamics using a particle-based Graph Neural Network model, and often require time-consuming 3D scene reconstruction to obtain 3D particle tracks for training. In this work, we present ParticleFormer, a Transformer-based point cloud world model trained with a hybrid point cloud reconstruction loss, supervising both global and local dynamics features in multi-material, multi-object robot interactions. ParticleFormer captures fine-grained multi-object interactions between rigid, deformable, and flexible materials, trained directly from real-world robot perception data without an elaborate scene reconstruction. We demonstrate the model's effectiveness both in 3D scene forecasting tasks, and in downstream manipulation tasks using a Model Predictive Control (MPC) policy. In addition, we extend existing dynamics learning benchmarks to include diverse multi-material, multi-object interaction scenarios. We validate our method on six simulation and three real-world experiments, where it consistently outperforms leading baselines by achieving superior dynamics prediction accuracy and less rollout error in downstream visuomotor tasks. Experimental videos are available at https://particleformer.github.io/. |
|
Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization | 2025-07-04 | ShowExplainable artificial intelligence (XAI) approaches have been increasingly applied in drug discovery to learn molecular representations and identify substructures driving property predictions. However, building end-to-end explainable machine learning models for structure-activity relationship (SAR) modeling for compound property prediction faces many challenges, such as limited activity data per target and the sensitivity of properties to subtle molecular changes. To address this, we leveraged activity-cliff molecule pairs, i.e., compounds sharing a common scaffold but differing sharply in potency, targeting three proto-oncogene tyrosine-protein kinase Src proteins (i.e., PDB IDs 1O42, 2H8H, and 4MXO). We implemented graph neural network (GNN) methods to obtain atom-level feature information and predict compound-protein affinity (i.e., half maximal inhibitory concentration, IC50). In addition, we trained GNN models with different structure-aware loss functions to adequately leverage molecular property and structure information. We also utilized group lasso and sparse group lasso to prune and highlight molecular subgraphs and enhance the structure-specific model explainability for the predicted property difference in molecular activity-cliff pairs. We improved drug property prediction by integrating common and uncommon node information and using sparse group lasso, reducing the average root mean squared error (RMSE) by 12.70%, and achieving the lowest averaged RMSE=0.2551 and the highest PCC=0.9572. Furthermore, applying regularization enhances feature attribution methods that estimate the contribution of each atom in the molecular graphs, boosting global direction scores and atom-level coloring accuracy, which improves model interpretability in drug discovery pipelines, particularly when investigating important molecular substructures in lead optimization. |
15 pages, 7 figures |
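A hedged sketch of adding a sparse group lasso penalty to a training loss: an element-wise L1 term plus a sum of group L2 norms that can switch whole groups (e.g., per-atom feature blocks) off. Group definitions and penalty weights here are assumptions, not the paper's settings:

```python
# Sparse group lasso regularizer added to a stand-in task loss (illustrative).
import torch

def sparse_group_lasso(w, groups, lam1=1e-3, lam2=1e-2):
    """w: 1-D parameter tensor; groups: index tensors partitioning w."""
    l1 = w.abs().sum()                                    # element-wise sparsity
    l2 = sum(torch.linalg.vector_norm(w[g]) for g in groups)  # group sparsity
    return lam1 * l1 + lam2 * l2

w = torch.randn(6, requires_grad=True)
groups = [torch.tensor([0, 1, 2]), torch.tensor([3, 4, 5])]  # e.g., per-atom blocks
loss = (w ** 2).sum() + sparse_group_lasso(w, groups)        # task loss + penalty
loss.backward()
print(w.grad)
```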
On the Adversarial Robustness of Graph Neural Networks with Graph Reduction | 2025-07-03 | ShowAs Graph Neural Networks (GNNs) become increasingly popular for learning from large-scale graph data across various domains, their susceptibility to adversarial attacks when using graph reduction techniques for scalability remains underexplored. In this paper, we present an extensive empirical study to investigate the impact of graph reduction techniques, specifically graph coarsening and sparsification, on the robustness of GNNs against adversarial attacks. Through extensive experiments involving multiple datasets and GNN architectures, we examine the effects of four sparsification and six coarsening methods on poisoning attacks. Our results indicate that, while graph sparsification can mitigate the effectiveness of certain poisoning attacks, such as Mettack, it has limited impact on others, like PGD. Conversely, graph coarsening tends to amplify the adversarial impact, significantly reducing classification accuracy as the reduction ratio decreases. Additionally, we provide a novel analysis of the causes driving these effects and examine how defensive GNN models perform under graph reduction, offering practical insights for designing robust GNNs within graph acceleration systems. |
|
RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes | 2025-07-03 | ShowNext activity prediction represents a fundamental challenge for optimizing business processes in service-oriented architectures such as microservices environments, distributed enterprise systems, and cloud-native platforms, which enables proactive resource allocation and dynamic service composition. Despite the prevalence of sequence-based methods, these approaches fail to capture non-sequential relationships that arise from parallel executions and conditional dependencies. Even though graph-based approaches address structural preservation, they suffer from homogeneous representations and static structures that apply uniform modeling strategies regardless of individual process complexity characteristics. To address these limitations, we introduce RLHGNN, a novel framework that transforms event logs into heterogeneous process graphs with three distinct edge types grounded in established process mining theory. Our approach creates four flexible graph structures by selectively combining these edges to accommodate different process complexities, and employs reinforcement learning formulated as a Markov Decision Process to automatically determine the optimal graph structure for each specific process instance. RLHGNN then applies heterogeneous graph convolution with relation-specific aggregation strategies to effectively predict the next activity. This adaptive methodology enables precise modeling of both sequential and non-sequential relationships in service interactions. Comprehensive evaluation on six real-world datasets demonstrates that RLHGNN consistently outperforms state-of-the-art approaches. Furthermore, it maintains an inference latency of approximately 1 ms per prediction, representing a highly practical solution suitable for real-time business process monitoring applications. The source code is available at https://github.com/Joker3993/RLHGNN. |
15 pa...15 pages, 7 figures. Business process prediction using reinforcement learning and heterogeneous graph neural networks |
Interpreting Graph Inference with Skyline Explanations | 2025-07-03 | ShowInference queries have been routinely issued to graph machine learning models such as graph neural networks (GNNs) for various network analytical tasks. Nevertheless, GNN outputs are often hard to interpret comprehensively. Existing methods typically commit to individual pre-defined explainability measures (such as fidelity), which often leads to biased, "one-sided" interpretations. This paper introduces skyline explanation, a new paradigm that interprets GNN output by simultaneously optimizing multiple explainability measures of users' interests. (1) We propose skyline explanations as a Pareto set of explanatory subgraphs that dominate others over multiple explanatory measures. We formulate skyline explanation as a multi-criteria optimization problem, and establish its hardness results. (2) We design efficient algorithms with an onion-peeling approach, which strategically prioritizes nodes and removes unpromising edges to incrementally assemble skyline explanations. (3) We also develop an algorithm to diversify the skyline explanations to enrich the comprehensive interpretation. (4) We introduce efficient parallel algorithms with load-balancing strategies to scale skyline explanation for large-scale GNN-based inference. Using real-world and synthetic graphs, we experimentally verify our algorithms' effectiveness and scalability. |
|
S2FGL: Spatial Spectral Federated Graph Learning | 2025-07-03 | ShowFederated Graph Learning (FGL) combines the privacy-preserving capabilities of federated learning (FL) with the strong graph modeling capability of Graph Neural Networks (GNNs). Current research addresses subgraph-FL only from the structural perspective, neglecting the propagation of graph signals on spatial and spectral domains of the structure. From a spatial perspective, subgraph-FL introduces edge disconnections between clients, leading to disruptions in label signals and a degradation in the class knowledge of the global GNN. From a spectral perspective, spectral heterogeneity causes inconsistencies in signal frequencies across subgraphs, which makes local GNNs overfit the local signal propagation schemes. As a result, spectral client drifts occur, undermining global generalizability. To tackle the challenges, we propose a global knowledge repository to mitigate label signal disruption and a frequency alignment to address spectral client drifts. The combination of spatial and spectral strategies forms our framework S2FGL. Extensive experiments on multiple datasets demonstrate the superiority of S2FGL. The code is available at https://github.com/Wonder7racer/S2FGL.git. |
|
Enhancing Swarms Durability to Threats via Graph Signal Processing and GNN-based Generative Modeling | 2025-07-03 | ShowSwarms, such as schools of fish or drone formations, are prevalent in both natural and engineered systems. While previous works have focused on the social interactions within swarms, the role of external perturbations--such as environmental changes, predators, or communication breakdowns--in affecting swarm stability is not fully understood. Our study addresses this gap by modeling swarms as graphs and applying graph signal processing techniques to analyze perturbations as signals on these graphs. By examining predation, we uncover a "detectability-durability trade-off", demonstrating a tension between a swarm's ability to evade detection and its resilience to predation, once detected. We provide theoretical and empirical evidence for this trade-off, explicitly tying it to properties of the swarm's spatial configuration. Toward task-specific optimized swarms, we introduce SwaGen, a graph neural network-based generative model. We apply SwaGen to resilient swarm generation by defining a task-specific loss function, optimizing the conflicting trade-off terms simultaneously. With this, SwaGen reveals novel spatial configurations, optimizing the trade-off at both ends. Applying the model can guide the design of robust artificial swarms and deepen our understanding of natural swarm dynamics. |
|
Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning | 2025-07-03 | ShowMolecular machine learning has gained popularity with the advancements of geometric deep learning. In parallel, retrieval-augmented generation has become a principled approach commonly used with language models. However, the optimal integration of retrieval augmentation into molecular machine learning remains unclear. Graph neural networks stand to benefit from clever matching to understand the structural alignment of retrieved molecules to a query molecule. Neural graph matching offers a compelling solution by explicitly modeling node and edge affinities between two structural graphs while employing a noise-robust, end-to-end neural network to learn affinity metrics. We apply this approach to mass spectrum simulation and introduce MARASON, a novel model that incorporates neural graph matching to enhance a fragmentation-based neural network. Experimental results highlight the effectiveness of our design, with MARASON achieving 28% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. Moreover, MARASON outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches. Code is publicly available at https://github.com/coleygroup/ms-pred |
ICML 2025 |
Generating Large Semi-Synthetic Graphs of Any Size | 2025-07-02 | ShowGraph generation is an important area in network science. Traditional approaches focus on replicating specific properties of real-world graphs, such as small diameters or power-law degree distributions. Recent advancements in deep learning, particularly with Graph Neural Networks, have enabled data-driven methods to learn and generate graphs without relying on predefined structural properties. Despite these advances, current models are limited by their reliance on node IDs, which restricts their ability to generate graphs larger than the input graph and ignores node attributes. To address these challenges, we propose Latent Graph Sampling Generation (LGSG), a novel framework that leverages diffusion models and node embeddings to generate graphs of varying sizes without retraining. The framework eliminates the dependency on node IDs and captures the distribution of node embeddings and subgraph structures, enabling scalable and flexible graph generation. Experimental results show that LGSG performs on par with baseline models for standard metrics while outperforming them in overlooked ones, such as the tendency of nodes to form clusters. Additionally, it maintains consistent structural characteristics across graphs of different sizes, demonstrating robustness and scalability. |
|
Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks | 2025-07-02 | ShowConformal prediction for graph neural networks (GNNs) offers a promising framework for quantifying uncertainty, enhancing GNN reliability in high-stakes applications. However, existing methods predominantly focus on static graphs, neglecting the evolving nature of real-world graphs. Temporal dependencies in graph structure, node attributes, and ground truth labels violate the fundamental exchangeability assumption of standard conformal prediction methods, limiting their applicability. To address these challenges, in this paper, we introduce NCPNET, a novel end-to-end conformal prediction framework tailored for temporal graphs. Our approach extends conformal prediction to dynamic settings, mitigating statistical coverage violations induced by temporal dependencies. To achieve this, we propose a diffusion-based non-conformity score that captures both topological and temporal uncertainties within evolving networks. Additionally, we develop an efficiency-aware optimization algorithm that improves the conformal prediction process, enhancing computational efficiency and reducing coverage violations. Extensive experiments on diverse real-world temporal graphs, including WIKI, REDDIT, DBLP, and IBM Anti-Money Laundering dataset, demonstrate NCPNET's capability to ensure guaranteed coverage in temporal graphs, achieving up to a 31% reduction in prediction set size on the WIKI dataset, significantly improving efficiency compared to state-of-the-art methods. Our data and code are available at https://github.com/ODYSSEYWT/NCPNET. |
accepted by KDD 2025 |
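A hedged sketch of the split conformal step underlying such frameworks: calibrate a finite-sample-corrected quantile of non-conformity scores, then build prediction sets for new inputs. The diffusion-based score and temporal corrections of NCPNET are not reproduced here; this is the standard exchangeable baseline they extend:

```python
# Split conformal prediction sets from softmax scores (illustrative baseline).
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_classes, alpha = 200, 4, 0.1

probs_cal = rng.dirichlet(np.ones(n_classes), size=n_cal)   # calibration softmax
y_cal = rng.integers(0, n_classes, size=n_cal)
scores = 1.0 - probs_cal[np.arange(n_cal), y_cal]           # non-conformity score

# Finite-sample-corrected quantile level for the calibration scores.
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, q_level)

probs_test = rng.dirichlet(np.ones(n_classes), size=3)
pred_sets = [np.where(1.0 - p <= qhat)[0] for p in probs_test]
print(pred_sets)   # each set covers the true label with prob ~ 1 - alpha
```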
Enhancing Power Flow Estimation with Topology-Aware Gated Graph Neural Networks | 2025-07-02 | ShowAccurate and scalable surrogate models for AC power flow are essential for real-time grid monitoring, contingency analysis, and decision support in increasingly dynamic and inverter-dominated power systems. However, most existing surrogates fall short of practical deployment due to their limited capacity to capture long-range nonlinear dependencies in meshed transmission networks and their weak enforcement of physical laws. These models often require extensive hyperparameter tuning, exhibit poor generalization under topology changes or large load swings, and typically do not quantify uncertainty or scale well beyond a few hundred buses. To address these challenges, this paper proposes a *gated graph neural network (GGNN)* surrogate for AC power-flow estimation under topological uncertainty. The model is trained across multiple IEEE benchmark networks of varying size and complexity, each incorporating randomized line contingencies and up to 40% load variation. To improve robustness and generalization, we explore both conventional supervised learning and physics-informed self-supervised training strategies. Comparative evaluations show that the proposed GGNN consistently outperforms prior GNN-based surrogates, achieving predictions closely aligned with Newton-Raphson solutions. By embedding operational constraints directly into the architecture and loss function, the model ensures physical consistency and delivers a lightweight, accurate, and scalable tool for real-time grid operations. |
|
Graph-Based Deep Learning for Component Segmentation of Maize Plants | 2025-07-02 | ShowIn precision agriculture, one of the most important tasks when exploring crop production is identifying individual plant components. There have been several attempts to accomplish this task using traditional 2D imaging, 3D reconstructions, and Convolutional Neural Networks (CNN). However, these have several drawbacks when processing 3D data and identifying individual plant components. Therefore, in this work, we propose a novel deep learning architecture to detect components of individual plants on Light Detection and Ranging (LiDAR) 3D point cloud (PC) data sets. This architecture is based on the concept of Graph Neural Networks (GNN) and feature enhancing with Principal Component Analysis (PCA). For this, each point is taken as a vertex, and the edges are established using a K-Nearest Neighbors (KNN) layer, thus representing the 3D PC data set. Subsequently, Edge-Conv layers are used to further increase the features of each point. Finally, Graph Attention Networks (GAT) are applied to classify visible phenotypic components of the plant, such as the leaf, stem, and soil. This study demonstrates that our graph-based deep learning approach enhances segmentation accuracy for identifying individual plant components, achieving an average IoU above 80%, thus outperforming other existing models based on point clouds. |
|
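A hedged sketch of the preprocessing described above: build a k-NN graph over LiDAR points and append local PCA eigenvalue features as the "feature enhancing with PCA" step. The choice of k and of the per-point features are assumptions, not the paper's exact configuration:

```python
# k-NN graph construction + local PCA shape features (illustrative).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))               # toy stand-in for a LiDAR cloud

k = 8
nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
_, idx = nn.kneighbors(points)                   # idx[:, 0] is the point itself
edges = [(i, int(j)) for i in range(len(points)) for j in idx[i, 1:]]

def local_pca(neigh):
    c = neigh - neigh.mean(axis=0)
    evals = np.linalg.eigvalsh(c.T @ c / len(neigh))   # local covariance spectrum
    return evals / evals.sum()                   # normalized: encodes local shape

feats = np.stack([local_pca(points[idx[i]]) for i in range(len(points))])
X = np.hstack([points, feats])                   # xyz + 3 PCA shape features
print(X.shape, len(edges))                       # node features and graph edges
```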
Joint Power Control and Precoding for Cell-Free Massive MIMO Systems With Sparse Multi-Dimensional Graph Neural Networks | 2025-07-02 | ShowCell-free massive multiple-input multiple-output (CF mMIMO) has emerged as a prominent candidate for future networks due to its ability to significantly enhance spectral efficiency by eliminating inter-cell interference. However, its practical deployment faces considerable challenges, such as high computational complexity and the optimization of its complex processing. To address these challenges, this correspondence proposes a framework based on a sparse multi-dimensional graph neural network (SP-MDGNN), which sparsifies the connections between access points (APs) and user equipments (UEs) to significantly reduce computational complexity while maintaining high performance. In addition, the weighted minimum mean square error (WMMSE) algorithm is introduced as a comparative method to further analyze the trade-off between performance and complexity. Simulation results demonstrate that the sparse method achieves an optimal balance between performance and complexity, significantly reducing the computational complexity of the original MDGNN method while incurring only a slight performance degradation, providing insights for the practical deployment of CF mMIMO systems in large-scale networks. |
5 pages, 5 figures |
MILP-SAT-GNN: Yet Another Neural SAT Solver | 2025-07-02 | ShowWe propose a novel method that enables Graph Neural Networks (GNNs) to solve SAT problems by leveraging a technique developed for applying GNNs to Mixed Integer Linear Programming (MILP). Specifically, k-CNF formulae are mapped into MILP problems, which are then encoded as weighted bipartite graphs and subsequently fed into a GNN for training and testing. From a theoretical perspective: (i) we establish permutation and equivalence invariance results, demonstrating that the method produces outputs that are stable under reordering of clauses and variables; (ii) we identify a theoretical limitation, showing that for a class of formulae called foldable formulae, standard GNNs cannot always distinguish satisfiable from unsatisfiable instances; (iii) we prove a universal approximation theorem, establishing that with Random Node Initialization (RNI), the method can approximate SAT solving to arbitrary precision on finite datasets, that is, the GNN becomes approximately sound and complete on such datasets. Furthermore, we show that for unfoldable formulae, the same approximation guarantee can be achieved without the need for RNI. Finally, we conduct an experimental evaluation of our approach, which shows that, despite the simplicity of the neural architecture, the method achieves promising results. |
|
Vehicle-group-based Crash Risk Prediction and Interpretation on Highways | 2025-07-01 | ShowPrevious studies in predicting crash risks primarily associated the number or likelihood of crashes on a road segment with traffic parameters or geometric characteristics, usually neglecting the impact of vehicles' continuous movement and interactions with nearby vehicles. Recent technology advances, such as Connected and Automated Vehicles (CAVs) and Unmanned Aerial Vehicles (UAVs) are able to collect high-resolution trajectory data, which enables trajectory-based risk analysis. This study investigates a new vehicle group (VG) based risk analysis method and explores risk evolution mechanisms considering VG features. An impact-based vehicle grouping method is proposed to cluster vehicles into VGs by evaluating their responses to the erratic behaviors of nearby vehicles. The risk of a VG is aggregated based on the risk between each vehicle pair in the VG, measured by inverse Time-to-Collision (iTTC). A Logistic Regression and a Graph Neural Network (GNN) are then employed to predict VG risks using aggregated and disaggregated VG information. Both methods achieve excellent performance with AUC values exceeding 0.93. For the GNN model, GNNExplainer with feature perturbation is applied to identify critical individual vehicle features and their directional impact on VG risks. Overall, this research contributes a new perspective for identifying, predicting, and interpreting traffic risks. |
Accep...Accepted and published in IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 6, pp. 7807-7818, June 2025. DOI: 10.1109/TITS.2025.3556543 |
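A hedged sketch of inverse time-to-collision (iTTC) aggregation inside a vehicle group: iTTC for a following pair is closing speed over gap (zero when the gap is opening), and group risk here is simply the mean over pairs. The grouping rule and the aggregation are illustrative assumptions, not the paper's exact definitions:

```python
# Pairwise iTTC and a simple group-level risk aggregate (illustrative).
import numpy as np

def ittc(gap, v_follow, v_lead):
    closing = v_follow - v_lead                  # positive when gap is shrinking
    return max(closing, 0.0) / gap               # units 1/s; larger = riskier

# (gap_m, follower_speed_mps, leader_speed_mps) for each interacting pair.
pairs = [(12.0, 28.0, 25.0), (20.0, 24.0, 26.0), (8.0, 30.0, 27.5)]
risks = np.array([ittc(*p) for p in pairs])
print("pair iTTCs:", risks.round(3), "group risk:", risks.mean().round(3))
```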
RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles | 2025-07-01 | ShowLow-cost indoor mobile robots have gained popularity with the increasing adoption of automation in homes and commercial spaces. However, existing lidar and camera-based solutions have limitations such as poor performance in visually obscured environments, high computational overhead for data processing, and high costs for lidars. In contrast, mmWave radar sensors offer a cost-effective and lightweight alternative, providing accurate ranging regardless of visibility. However, existing radar-based localization suffers from sparse point cloud generation, noise, and false detections. Thus, in this work, we introduce RaGNNarok, a real-time, lightweight, and generalizable graph neural network (GNN)-based framework to enhance radar point clouds, even in complex and dynamic environments. With an inference time of just 7.3 ms on the low-cost Raspberry Pi 5, RaGNNarok runs efficiently even on such resource-constrained devices, requiring no additional computational resources. We evaluate its performance across key tasks, including localization, SLAM, and autonomous navigation, in three different environments. Our results demonstrate strong reliability and generalizability, making RaGNNarok a robust solution for low-cost indoor mobile robots. |
8 pag...8 pages, accepted by IROS 2025 |
Understanding Generalization in Node and Link Prediction | 2025-07-01 | ShowUsing message-passing graph neural networks (MPNNs) for node and link prediction is crucial in various scientific and industrial domains, which has led to the development of diverse MPNN architectures. Although they work well in practical settings, their ability to generalize beyond the training set remains poorly understood. While some studies have explored MPNNs' generalization in graph-level prediction tasks, much less attention has been given to node- and link-level predictions. Existing works often rely on unrealistic i.i.d. assumptions, overlooking possible correlations between nodes or links, and assuming fixed aggregation and impractical loss functions while neglecting the influence of graph structure. In this work, we introduce a unified framework to analyze the generalization properties of MPNNs in inductive and transductive node and link prediction settings, incorporating diverse architectural parameters and loss functions and quantifying the influence of graph structure. Additionally, our proposed generalization framework can be applied beyond graphs to any classification task under the inductive or transductive setting. Our empirical study supports our theoretical insights, deepening our understanding of MPNNs' generalization capabilities in these tasks. |
arXiv...arXiv admin note: text overlap with arXiv:2412.07106 |
NN-Former: Rethinking Graph Structure in Neural Architecture Representation | 2025-07-01 | ShowThe growing use of deep learning necessitates efficient network design and deployment, making neural predictors vital for estimating attributes such as accuracy and latency. Recently, Graph Neural Networks (GNNs) and transformers have shown promising performance in representing neural architectures. However, each method has its disadvantages. GNNs lack the capability to represent complicated features, while transformers face poor generalization when the depth of the architecture grows. To mitigate the above issues, we rethink neural architecture topology and show that sibling nodes are pivotal yet overlooked in previous research. We thus propose a novel predictor leveraging the strengths of GNNs and transformers to learn the enhanced topology. We introduce a novel token mixer that considers siblings, and a new channel mixer named bidirectional graph isomorphism feed-forward network. Our approach consistently achieves promising performance in both accuracy and latency prediction, providing valuable insights for learning Directed Acyclic Graph (DAG) topology. The code is available at https://github.com/XuRuihan/NNFormer. |
Accep...Accepted to CVPR 2025. Code is available at https://github.com/XuRuihan/NNFormer |
Studying and Improving Graph Neural Network-based Motif Estimation | 2025-07-01 | ShowGraph Neural Networks (GNNs) are a predominant method for graph representation learning. However, beyond subgraph frequency estimation, their application to network motif significance-profile (SP) prediction remains under-explored, with no established benchmarks in the literature. We propose to address this problem, framing SP estimation as a task independent of subgraph frequency estimation. Our approach shifts from frequency counting to direct SP estimation and formulates the problem as multitarget regression. The reformulation is optimised for interpretability, stability and scalability on large graphs. We validate our method using a large synthetic dataset and further test it on real-world graphs. Our experiments reveal that 1-WL limited models struggle to make precise estimations of SPs. However, they can generalise to approximate the graph generation processes of networks by comparing their predicted SP with the ones originating from synthetic generators. This first study on GNN-based motif estimation also hints at how using direct SP estimation can help go past the theoretical limitations that motif estimation faces when performed through subgraph counting. |
This ...This manuscript represents a revised version from the paper on https://openreview.net/forum?id=PZVVOeu6xx. Still a work in progress. Comments are welcome! 23 pages (12 main text + references), 9 figures, 5 tables. (First update: Fix broken links, references and text review.) |
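For concreteness, a hedged sketch of the significance-profile target such a model would regress: z-scores of motif counts against randomized graph ensembles, normalized to unit length. The counts and ensemble statistics below are made up:

```python
# Significance profile (SP) from motif counts vs. a random ensemble (illustrative).
import numpy as np

def significance_profile(real_counts, rand_mean, rand_std):
    z = (np.asarray(real_counts, float) - rand_mean) / rand_std
    return z / np.linalg.norm(z)                 # SP_i = Z_i / sqrt(sum_j Z_j^2)

real = [120, 45, 300, 12]                        # motif counts in the real graph
mu = np.array([100.0, 50.0, 250.0, 20.0])        # means over randomized graphs
sd = np.array([10.0, 5.0, 30.0, 4.0])            # stds over randomized graphs
print(significance_profile(real, mu, sd).round(3))  # the regression target vector
```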
Neural Augmented Kalman Filters for Road Network assisted GNSS positioning | 2025-07-01 | ShowThe Global Navigation Satellite System (GNSS) provides critical positioning information globally, but its accuracy in dense urban environments is often compromised by multipath and non-line-of-sight errors. Road network data can be used to reduce the impact of these errors and enhance the accuracy of a positioning system. Previous works employing road network data are either limited to offline applications, or rely on Kalman Filter (KF) heuristics with little flexibility and robustness. We instead propose training a Temporal Graph Neural Network (TGNN) to integrate road network information into a KF. The TGNN is designed to predict the correct road segment and its associated uncertainty to be used in the measurement update step of the KF. We validate our approach with real-world GNSS data and open-source road networks, observing a 29% decrease in positioning error for challenging scenarios compared to a GNSS-only KF. To the best of our knowledge, ours is the first deep learning-based approach jointly employing road network data and GNSS measurements to determine the user position on Earth. |
Accep...Accepted to ICML 2025 workshop ML4Wireless |
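A hedged sketch of the hybrid idea above: a standard Kalman measurement update where the measurement is a point on the network-predicted road segment and the measurement noise is scaled by the predicted uncertainty. The TGNN itself is replaced by fixed placeholder values here:

```python
# KF measurement update with a (placeholder) TGNN road-segment measurement.
import numpy as np

x = np.array([10.0, 5.0])                        # state: 2-D position estimate
P = np.eye(2) * 4.0                              # state covariance
H = np.eye(2)                                    # we directly observe position

seg_point = np.array([11.5, 4.2])                # projection onto predicted segment
seg_sigma = 1.3                                  # predicted uncertainty (meters)
R = np.eye(2) * seg_sigma**2                     # measurement noise from the net

# Standard KF update: x <- x + K (z - H x),  P <- (I - K H) P.
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)
x = x + K @ (seg_point - H @ x)
P = (np.eye(2) - K @ H) @ P
print("updated position:", x.round(3))
```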
Cooperative Sheaf Neural Networks | 2025-07-01 | ShowSheaf diffusion has recently emerged as a promising design pattern for graph representation learning due to its inherent ability to handle heterophilic data and avoid oversmoothing. Meanwhile, cooperative message passing has also been proposed as a way to enhance the flexibility of information diffusion by allowing nodes to independently choose whether to propagate/gather information from/to neighbors. A natural question ensues: is sheaf diffusion capable of exhibiting this cooperative behavior? Here, we provide a negative answer to this question. In particular, we show that existing sheaf diffusion methods fail to achieve cooperative behavior due to the lack of message directionality. To circumvent this limitation, we introduce the notion of cellular sheaves over directed graphs and characterize their in- and out-degree Laplacians. We leverage our construction to propose Cooperative Sheaf Neural Networks (CSNNs). Theoretically, we characterize the receptive field of CSNN and show it allows nodes to selectively attend (listen) to arbitrarily far nodes while ignoring all others in their path, potentially mitigating oversquashing. Our experiments show that CSNN presents overall better performance compared to prior art on sheaf diffusion as well as cooperative graph neural networks. |
|
Rotational Sampling: A Plug-and-Play Encoder for Rotation-Invariant 3D Molecular GNNs | 2025-07-01 | ShowGraph neural networks (GNNs) have achieved remarkable success in molecular property prediction. However, traditional graph representations struggle to effectively encode the inherent 3D spatial structures of molecules, as molecular orientations in 3D space introduce significant variability, severely limiting model generalization and robustness. Existing approaches primarily focus on rotation-invariant and rotation-equivariant methods. Invariant methods often rely heavily on prior knowledge and lack sufficient generalizability, while equivariant methods suffer from high computational costs. To address these limitations, this paper proposes a novel plug-and-play 3D encoding module leveraging rotational sampling. By computing the expectation over the SO(3) rotational group, the method naturally achieves approximate rotational invariance. Furthermore, by introducing a carefully designed post-alignment strategy, strict invariance can be achieved without compromising performance. Experimental evaluations on the QM9 and C10 datasets demonstrate superior predictive accuracy, robustness, and generalization performance compared to existing methods. Moreover, the proposed approach maintains low computational complexity and enhanced interpretability, providing a promising direction for efficient and effective handling of 3D molecular information in drug discovery and material design. |
|
An Automatic Graph Construction Framework based on Large Language Models for Recommendation | 2025-07-01 | ShowGraph neural networks (GNNs) have emerged as state-of-the-art methods to learn from graph-structured data for recommendation. However, most existing GNN-based recommendation methods focus on the optimization of model structures and learning strategies based on pre-defined graphs, neglecting the importance of the graph construction stage. Earlier works for graph construction usually rely on specific rules or crowdsourcing, which are either too simplistic or too labor-intensive. Recent works start to utilize large language models (LLMs) to automate the graph construction, in view of their abundant open-world knowledge and remarkable reasoning capabilities. Nevertheless, they generally suffer from two limitations: (1) invisibility of global view (e.g., overlooking contextual information) and (2) construction inefficiency. To this end, we introduce AutoGraph, an automatic graph construction framework based on LLMs for recommendation. Specifically, we first use LLMs to infer the user preference and item knowledge, which is encoded as semantic vectors. Next, we employ vector quantization to extract the latent factors from the semantic vectors. The latent factors are then incorporated as extra nodes to link the user/item nodes, resulting in a graph with in-depth global-view semantics. We further design metapath-based message aggregation to effectively aggregate the semantic and collaborative information. The framework is model-agnostic and compatible with different backbone models. Extensive experiments on three real-world datasets demonstrate the efficacy and efficiency of AutoGraph compared to existing baseline methods. We have deployed AutoGraph in the Huawei advertising platform, and gained a 2.69% improvement on RPM and a 7.31% improvement on eCPM in the online A/B test. Currently, AutoGraph is used as the main traffic model, serving hundreds of millions of people. |
Accepted by KDD'25 |
A Recipe for Causal Graph Regression: Confounding Effects Revisited | 2025-07-01 | ShowThrough recognizing causal subgraphs, causal graph learning (CGL) has risen to be a promising approach for improving the generalizability of graph neural networks under out-of-distribution (OOD) scenarios. However, the empirical successes of CGL techniques are mostly exemplified in classification settings, while regression tasks, a more challenging setting in graph learning, are overlooked. We thus devote this work to tackling causal graph regression (CGR); to this end we reshape the processing of confounding effects in existing CGL studies, which mainly deal with classification. Specifically, we reflect on the predictive power of confounders in graph-level regression, and generalize classification-specific causal intervention techniques to regression through a lens of contrastive learning. Extensive experiments on graph OOD benchmarks validate the efficacy of our proposals for CGR. The model implementation and the code are provided on https://github.com/causal-graph/CGR. |
ICML 2025 accepted |
Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings | 2025-06-30 | ShowGenerating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational diversity altogether. We present latent diffusion for full protein generation (LD-FPG), a framework that constructs complete all-atom protein structures, including every side-chain heavy atom, directly from molecular dynamics (MD) trajectories. LD-FPG employs a Chebyshev graph neural network (ChebNet) to obtain low-dimensional latent embeddings of protein conformations, which are processed using three pooling strategies: blind, sequential and residue-based. A diffusion model trained on these latent representations generates new samples that a decoder, optionally regularized by dihedral-angle losses, maps back to Cartesian coordinates. Using D2R-MD, a 2-microsecond MD trajectory (12,000 frames) of the human dopamine D2 receptor in a membrane environment, the sequential and residue-based pooling strategy reproduces the reference ensemble with high structural fidelity (all-atom lDDT of approximately 0.7; C-alpha-lDDT of approximately 0.8) and recovers backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence of less than 0.03 compared to the MD data. LD-FPG thereby offers a practical route to system-specific, all-atom ensemble generation for large proteins, providing a promising tool for structure-based therapeutic design on complex, dynamic targets. The D2R-MD dataset and our implementation are freely available to facilitate further research. |
10 pa...10 pages (main text), 4 figures, 2 tables. Submitted to NeurIPS 2025. Code and data are publicly available |
Bridging Theory and Practice in Link Representation with Graph Neural Networks | 2025-06-30 | ShowGraph Neural Networks (GNNs) are widely used to compute representations of node pairs for downstream tasks such as link prediction. Yet, theoretical understanding of their expressive power has focused almost entirely on graph-level representations. In this work, we shift the focus to links and provide the first comprehensive study of GNN expressiveness in link representation. We introduce a unifying framework, the |
|
Graph Neural Networks in Wind Power Forecasting | 2025-06-30 | ShowWe study the applicability of GNNs to the problem of wind energy forecasting. We find that certain architectures achieve performance comparable to our best CNN-based benchmark. The study is conducted on three wind power facilities using five years of historical data. Numerical Weather Prediction (NWP) variables were used as predictors, and models were evaluated on a 24 to 36 hour ahead test horizon. |
|
Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations | 2025-06-30 | ShowTransparency and interpretability are crucial for enhancing customer confidence and user engagement, especially when dealing with black-box Machine Learning (ML)-based recommendation systems. Modern recommendation systems leverage Graph Neural Network (GNN) due to their ability to produce high-quality recommendations in terms of both relevance and diversity. Therefore, the explainability of GNN is especially important for Link Prediction (LP) tasks since recommending relevant items can be viewed as predicting links between users and items. GNN explainability has been a well-studied field, but existing methods primarily focus on node or graph-level tasks, leaving a gap in LP explanation techniques. This work introduces Z-REx, a GNN explanation framework designed explicitly for heterogeneous link prediction tasks. Z-REx utilizes structural and attribute perturbation to identify critical substructures and important features while reducing the search space by leveraging domain-specific knowledge. In our experimentation, we show the efficacy of Z-REx in generating contextually relevant and human-interpretable explanations for ZiGNN, a GNN-based recommendation engine, using a real-world real-estate dataset from Zillow Group, Inc. We compare against State-of-The-Art (SOTA) GNN explainers to show Z-REx outperforms them by 61% in the Fidelity metric by producing superior human-interpretable explanations. |
Accep...Accepted to be published in KDD Workshop in Machine Learning on Graphs in the Era of Generative Artificial Intelligence (MLoG-GenAI@KDD) 2025 |
NeuralOM: Neural Ocean Model for Subseasonal-to-Seasonal Simulation | 2025-06-30 | ShowAccurate Subseasonal-to-Seasonal (S2S) ocean simulation is critically important for marine research, yet remains challenging due to its substantial thermal inertia and extended time delay. Machine learning (ML)-based models have demonstrated significant advancements in simulation accuracy and computational efficiency compared to traditional numerical methods. Nevertheless, a significant limitation of current ML models for S2S ocean simulation is their inadequate incorporation of physical consistency and the slow-changing properties of the ocean system. In this work, we propose a neural ocean model (NeuralOM) for S2S ocean simulation with a multi-scale interactive graph neural network to emulate diverse physical phenomena associated with ocean systems effectively. Specifically, we propose a multi-stage framework tailored to model the ocean's slowly changing nature. Additionally, we introduce a multi-scale interactive messaging module to capture complex dynamical behaviors, such as gradient changes and multiplicative coupling relationships inherent in ocean dynamics. Extensive experimental evaluations confirm that our proposed NeuralOM outperforms state-of-the-art models in S2S and extreme event simulation. The codes are available at https://github.com/YuanGao-YG/NeuralOM. |
|
When GNNs Met a Word Equations Solver: Learning to Rank Equations (Extended Technical Report) | 2025-06-30 | ShowNielsen transformation is a standard approach for solving word equations: by repeatedly splitting equations and applying simplification steps, equations are rewritten until a solution is reached. When solving a conjunction of word equations in this way, the performance of the solver will depend considerably on the order in which equations are processed. In this work, the use of Graph Neural Networks (GNNs) for ranking word equations before and during the solving process is explored. For this, a novel graph-based representation for word equations is presented, preserving global information across conjuncts, enabling the GNN to have a holistic view during ranking. To handle the variable number of conjuncts, three approaches to adapt a multi-classification task to the problem of ranking equations are proposed. The training of the GNN is done with the help of minimum unsatisfiable subsets (MUSes) of word equations. The experimental results show that, compared to state-of-the-art string solvers, the new framework solves more problems in benchmarks where each variable appears at most once in each equation. |
|
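At its simplest, ranking conjuncts from a multi-class head reduces to scoring every equation and sorting, as in the sketch below; the logits standing in for the GNN output are made up.

```python
import numpy as np

def rank_from_logits(logits):
    """Recast multi-class output as a ranking: score every conjunct,
    then process the highest-scoring equations first."""
    return list(np.argsort(-np.asarray(logits)))

order = rank_from_logits([0.2, 1.7, -0.3])   # -> [1, 0, 2]
```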
Adaptive Supergeo Design: A Scalable Framework for Geographic Marketing Experiments | 2025-06-30 | ShowGeographic experiments are a gold-standard for measuring incremental return on ad spend (iROAS) at scale, yet their design is challenging: the unit count is small, heterogeneity is large, and the optimal Supergeo partitioning problem is NP-hard. We introduce Adaptive Supergeo Design (ASD), a two-stage framework that renders Supergeo designs practical for thousands of markets. A bespoke graph neural network first learns geo-embeddings and proposes a concise candidate set of 'supergeos'; a CP-SAT solver then selects a partition that balances both baseline outcomes and pre-treatment covariates believed to modify the treatment effect. We prove that ASD's objective value is within $(1+\epsilon)$ of the global optimum under mild community-structure assumptions. In simulations with up to 1,000 Designated Market Areas, ASD completes in minutes on standard hardware, retains every media dollar, and substantially cuts iROAS bias relative to existing methods. ASD therefore turns geo-lift testing into a routine, scalable component of media planning while preserving statistical rigour. |
|
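A greedy matcher conveys the flavor of the design stage: pair geos with the closest baseline outcomes. The actual method solves this with CP-SAT over GNN-proposed supergeos and balances effect-modifying covariates as well; the `outcome` dict and the pairing criterion here are illustrative assumptions.

```python
import itertools

def pick_balanced_pairs(units, outcome, k):
    """Choose k disjoint treatment/control pairs of geos with the
    closest baseline outcomes (greedy matching on absolute gap)."""
    pairs = sorted(itertools.combinations(units, 2),
                   key=lambda p: abs(outcome[p[0]] - outcome[p[1]]))
    used, chosen = set(), []
    for a, b in pairs:
        if a not in used and b not in used:
            chosen.append((a, b))
            used |= {a, b}
            if len(chosen) == k:
                break
    return chosen
```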
CPT: Competence-progressive Training Strategy for Few-shot Node Classification | 2025-06-30 | ShowGraph Neural Networks (GNNs) have made significant advancements in node classification, but their success relies on sufficient labeled nodes per class in the training data. Real-world graph data often exhibits a long-tail distribution with sparse labels, emphasizing the importance of GNNs' ability in few-shot node classification, which entails categorizing nodes with limited data. Traditional episodic meta-learning approaches have shown promise in this domain, but they face an inherent limitation: random and uniform task assignment, which ignores task difficulty levels, can lead the model to converge to suboptimal solutions. It can also confront the meta-learner with complex tasks too soon, hindering proper learning. Ideally, the meta-learner should start with simple concepts and advance to more complex ones, as humans do. We therefore introduce CPT, a novel two-stage curriculum learning method that aligns task difficulty with the meta-learner's progressive competence, enhancing overall performance. Specifically, in CPT's initial stage, the focus is on simpler tasks, fostering foundational skills for engaging with complex tasks later. Importantly, the second stage dynamically adjusts task difficulty based on the meta-learner's growing competence, aiming for optimal knowledge acquisition. Extensive experiments on popular node classification datasets demonstrate significant improvements of our strategy over existing methods. |
APWEB-WAIM 2025 |
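The competence-progressive idea can be reduced to one line of control flow: only sample tasks no harder than the learner's current competence. This sketch assumes a scalar difficulty per task in [0, 1]; CPT's actual difficulty measure and schedule differ.

```python
import random

def curriculum_sample(tasks, difficulty, competence):
    """Sample a task whose difficulty does not exceed the meta-
    learner's current competence; fall back to the full pool if
    nothing qualifies yet."""
    eligible = [t for t in tasks if difficulty[t] <= competence]
    return random.choice(eligible or tasks)
```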
Visual Re-Ranking with Non-Visual Side Information | 2025-06-30 | ShowThe standard approach for visual place recognition is to use global image descriptors to retrieve the most similar database images for a given query image. The results can then be further improved with re-ranking methods that re-order the top scoring images. However, existing methods focus on re-ranking based on the same image descriptors that were used for the initial retrieval, which we argue provides limited additional signal. In this work, we propose Generalized Contextual Similarity Aggregation (GCSA), a graph neural network-based re-ranking method that, in addition to the visual descriptors, can leverage other types of available side information. This can, for example, be other sensor data (such as the signal strength of nearby WiFi or Bluetooth endpoints) or geometric properties such as camera poses for database images. In many applications this information is already present or can be acquired with low effort. Our architecture leverages the concept of affinity vectors to allow for a shared encoding of the heterogeneous multi-modal input. Two large-scale datasets, covering both outdoor and indoor localization scenarios, are utilized for training and evaluation. In experiments, we show significant improvements not only on image retrieval metrics but also on the downstream visual localization task. |
Accep...Accepted at Scandinavian Conference on Image Analysis (SCIA) 2025 |
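The affinity-vector idea admits a compact sketch: visual affinities and side-information affinities are simply concatenated into one shared encoding for the re-ranking GNN. How GCSA actually constructs and normalizes these vectors differs; the arrays below are assumptions.

```python
import numpy as np

def affinity_vector(query_desc, db_descs, side_affinities):
    """Build a shared heterogeneous encoding: inner products against
    the database descriptors (visual part) concatenated with, e.g.,
    WiFi-signal similarities (side-information part).
    query_desc: (d,); db_descs: (m, d); side_affinities: (k,)."""
    visual = db_descs @ query_desc
    return np.concatenate([visual, side_affinities])
```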
RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks | 2025-06-30 | ShowModeling spatial heterogeneity in the data generation process is essential for understanding and predicting geographical phenomena. Despite their prevalence in geospatial tasks, neural network models usually assume spatial stationarity, which could limit their performance in the presence of spatial process heterogeneity. Several approaches have been proposed to incorporate spatial heterogeneity into neural networks by allowing model parameters to vary over space. However, current geographically weighted approaches are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy. We posit that the crux lies in the overfitting risk introduced by the large number of local parameters. Accordingly, we propose to model spatial process heterogeneity at the regional level rather than at the individual level, which largely reduces the number of spatially varying parameters. We further develop a heuristic optimization procedure to learn the region partition adaptively in the process of model training. Our proposed spatial-heterogeneity-aware graph convolutional network, named RegionGCN, is applied to the spatial prediction of county-level vote share in the 2016 US presidential election based on socioeconomic attributes. Results show that RegionGCN achieves significant improvement over the basic and geographically weighted GCNs. We also offer an exploratory analysis tool for the spatial variation of non-linear relationships through ensemble learning of regional partitions from RegionGCN. Our work contributes to the practice of Geospatial Artificial Intelligence (GeoAI) in tackling spatial heterogeneity. |
29 pages, 6 figures |
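The core trick, letting parameters vary by region rather than per node, is easy to picture: index a bank of weight matrices by each node's region. A minimal sketch, assuming square weight matrices and a fixed region assignment (RegionGCN learns the partition adaptively):

```python
import numpy as np

def regional_linear(h, region_of_node, weights):
    """Apply a region-specific (d, d) weight matrix to every node's
    features; parameters vary across regions, not individual nodes,
    which shrinks the count of spatially varying parameters."""
    out = np.empty_like(h)
    for i, r in enumerate(region_of_node):
        out[i] = h[i] @ weights[r]
    return out
```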
Strategic Counterfactual Modeling of Deep-Target Airstrike Systems via Intervention-Aware Spatio-Causal Graph Networks | 2025-06-30 | ShowThis study addresses the lack of structured causal modeling between tactical strike behavior and strategic delay in current strategic-level simulations, particularly the structural bottlenecks in capturing intermediate variables within the "resilience - nodal suppression - negotiation window" chain. We propose the Intervention-Aware Spatio-Temporal Graph Neural Network (IA-STGNN), a novel framework that closes the causal loop from tactical input to strategic delay output. The model integrates graph attention mechanisms, counterfactual simulation units, and spatial intervention node reconstruction to enable dynamic simulations of strike configurations and synchronization strategies. Training data are generated from a multi-physics simulation platform (GEANT4 + COMSOL) under NIST SP 800-160 standards, ensuring structural traceability and policy-level validation. Experimental results demonstrate that IA-STGNN significantly outperforms baseline models (ST-GNN, GCN-LSTM, XGBoost), achieving a 12.8 percent reduction in MAE and an 18.4 percent increase in Top-5 percent accuracy, while improving causal path consistency and intervention stability. IA-STGNN enables interpretable prediction of strategic delay and supports applications such as nuclear deterrence simulation, diplomatic window assessment, and multi-strategy optimization, providing a structured and transparent AI decision-support mechanism for high-level policy modeling. |
This ...This paper proposes the first closed-loop causal modeling framework (IA-STGNN) that links tactical strike variables to strategic delay outcomes via graph neural networks with counterfactual reasoning |
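Stripped of the domain specifics, a counterfactual query follows the usual do-style pattern: overwrite one input, re-run the predictor, compare. The dict-shaped `state` and the scalar-valued `model` below are hypothetical; IA-STGNN's intervention units are far more structured.

```python
def counterfactual_effect(model, state, feature, new_value):
    """Generic do-style intervention: replace one input feature with a
    counterfactual value and report the change in the prediction."""
    factual = model(state)
    counterfactual = {**state, feature: new_value}
    return model(counterfactual) - factual
```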
Data Filtering for Genetic Perturbation Prediction | 2025-06-29 | ShowGenomic studies, including CRISPR-based PerturbSeq analyses, face a vast hypothesis space, while gene perturbations remain costly and time-consuming. Gene expression models based on graph neural networks are trained to predict the outcomes of gene perturbations to facilitate such experiments. Active learning methods are often employed to train these models due to the cost of the genomic experiments required to build the training set. However, poor model initialization in active learning can result in suboptimal early selections, wasting time and valuable resources. While typical active learning mitigates this issue over many iterations, the limited number of experimental cycles in genomic studies exacerbates the risk. To this end, we propose graph-based data filtering as an alternative. Unlike active learning, data filtering selects the gene perturbations before training, meaning it is free of bias due to random initialization and initial random selection. Moreover, reducing the iterations between the wet lab and the model provides several operational advantages, resulting in significant acceleration. The proposed methods are motivated by theoretical studies of graph neural network generalization. The criteria are defined over the input graph and are optimized with submodular maximization. We compare them empirically to state-of-the-art baselines and active learning methods. The results demonstrate that graph-based data filtering achieves comparable accuracy while alleviating the aforementioned risks. |
21 pages |
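Submodular maximization of this kind is typically done greedily, as sketched below; the graph-based selection criteria from the paper are abstracted into a user-supplied `marginal_gain` function.

```python
def greedy_submodular(candidates, marginal_gain, budget):
    """Greedy maximization of a monotone submodular criterion:
    repeatedly add the candidate perturbation with the largest
    marginal gain given what is already selected."""
    selected, pool = [], set(candidates)
    for _ in range(budget):
        best = max(pool, key=lambda c: marginal_gain(selected, c))
        selected.append(best)
        pool.remove(best)
    return selected
```

For monotone submodular objectives, this greedy rule carries the classic (1 - 1/e) approximation guarantee, which is what makes it a natural optimizer for selection criteria of this kind.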
Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction | 2025-06-29 | ShowAir quality prediction is a challenging forecasting task due to its spatio-temporal complexity and its inherent dynamics and uncertainty. Most current models handle these two challenges by applying Graph Neural Networks or known physics principles, and quantify stochasticity through probabilistic networks like Diffusion models. Nevertheless, finding the right balance between certainty and uncertainty remains an open question. Therefore, we propose Double-Diffusion, a novel diffusion probabilistic model that harnesses the power of known physics to guide air quality forecasting with stochasticity. To the best of our knowledge, while conditional diffusion models have previously been used to predict air pollution, this is the first attempt to use physics as the condition in a generative approach for air quality prediction. Along with a sampling strategy adopted from image restoration and a new denoiser architecture, Double-Diffusion ranks first in most evaluation scenarios across two real-life datasets compared with other probabilistic models. It also cuts inference time by 30% to 50% while achieving a 3-12% improvement in Continuous Ranked Probability Score (CRPS). |
|
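A single physics-conditioned reverse-diffusion step looks roughly like the sketch below; the `denoiser` signature, the `beta` schedule, and the crude mean update are all placeholders, not the paper's sampler (which is adapted from image restoration).

```python
import numpy as np

def reverse_step(x_t, t, denoiser, physics_cond, beta):
    """One conditional reverse-diffusion step: the denoiser predicts
    the noise from the current state, the timestep, and a physics-
    based conditioning signal; the state is then partially denoised
    and re-noised."""
    eps_hat = denoiser(x_t, t, physics_cond)
    x_mean = x_t - beta[t] * eps_hat
    return x_mean + np.sqrt(beta[t]) * np.random.randn(*x_t.shape)
```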
Prediction Gaps as Pathways to Explanation: Rethinking Educational Outcomes through Differences in Model Performance | 2025-06-28 | ShowSocial contexts -- such as families, schools, and neighborhoods -- shape life outcomes. The key question is not simply whether they matter, but rather for whom and under what conditions. Here, we argue that prediction gaps -- differences in predictive performance between statistical models of varying complexity -- offer a pathway for identifying surprising empirical patterns (i.e., those not captured by simpler models) that highlight where theories succeed or fall short. Using population-scale administrative data from the Netherlands, we compare logistic regression, gradient boosting, and graph neural networks to predict university completion using early-life social contexts. Overall, prediction gaps are small, suggesting that previously identified indicators, particularly parental status, capture most measurable variation in educational attainment. However, gaps are larger for girls growing up without fathers, suggesting that for this group the effects of social context go beyond what simple models capture, in line with sociological theory. Our paper shows the potential of prediction methods to support sociological explanation. |
|
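Operationally, a prediction gap is just the difference in held-out performance between a flexible model and a simple baseline; subgroup-specific gaps follow by slicing the test set. A minimal sketch with scikit-learn (the paper additionally uses GNNs and population-scale registry data):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def prediction_gap(X_tr, y_tr, X_te, y_te):
    """Held-out AUC of a flexible model minus that of a simple one:
    a large gap flags structure the simple model misses."""
    simple = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    flexible = GradientBoostingClassifier().fit(X_tr, y_tr)
    auc = lambda m: roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    return auc(flexible) - auc(simple)
```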
Missing-Modality-Aware Graph Neural Network for Cancer Classification | 2025-06-28 | ShowA key challenge in learning from multimodal biological data is missing modalities, where all data from some modalities are missing for some patients. Current fusion methods address this by excluding patients with missing modalities, imputing missing modalities, or making predictions directly with partial modalities. However, they often struggle with diverse missing-modality patterns and the exponential growth of the number of such patterns as the number of modalities increases. To address these limitations, we propose MAGNET (Missing-modality-Aware Graph neural NETwork) for direct prediction with partial modalities, which introduces a patient-modality multi-head attention mechanism to fuse lower-dimensional modality embeddings based on their importance and missingness. MAGNET's complexity increases linearly with the number of modalities while adapting to missing-pattern variability. To generate predictions, MAGNET further constructs a patient graph with fused multimodal embeddings as node features and the connectivity determined by the modality missingness, followed by a conventional graph neural network. Experiments on three public multiomics datasets for cancer classification, with real-world instead of artificial missingness, show that MAGNET outperforms the state-of-the-art fusion methods. The data and code are available at https://github.com/SinaTabakhi/MAGNET. |
15 pages, 7 figures |
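The missingness-aware fusion can be sketched as attention with masked softmax weights: absent modalities get zero weight, so complexity stays linear in the number of modalities. The toy relevance scores below stand in for MAGNET's learned multi-head attention.

```python
import numpy as np

def masked_modality_fusion(embeds, present):
    """Fuse per-modality embeddings with softmax weights that zero out
    missing modalities. embeds: list of (d,) arrays; present: boolean
    array with at least one True entry."""
    scores = np.array([e.mean() for e in embeds])   # toy relevance scores
    scores = np.where(present, scores, -np.inf)     # mask the missing ones
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return sum(wi * ei for wi, ei in zip(w, embeds))
```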
Graph Contrastive Learning with Low-Rank Regularization and Low-Rank Attention for Noisy Node Classification | 2025-06-28 | ShowGraph Neural Networks (GNNs) have achieved remarkable success in learning node representations and have shown strong performance in tasks such as node classification. However, recent findings indicate that the presence of noise in real-world graph data can substantially impair the effectiveness of GNNs. To address this challenge, we introduce a robust and innovative node representation learning method named Graph Contrastive Learning with Low-Rank Regularization, or GCL-LRR, which follows a two-stage transductive learning framework for node classification. In the first stage, the GCL-LRR encoder is optimized through prototypical contrastive learning while incorporating a low-rank regularization objective. In the second stage, the representations generated by GCL-LRR are employed by a linear transductive classifier to predict the labels of unlabeled nodes within the graph. Our GCL-LRR is inspired by the Low Frequency Property (LFP) of the graph data and its labels, and it is also theoretically motivated by our sharp generalization bound for transductive learning. To the best of our knowledge, our theoretical result is among the first to theoretically demonstrate the advantage of low-rank regularization in transductive learning, which is also supported by strong empirical results. To further enhance the performance of GCL-LRR, we present an improved model named GCL-LR-Attention, which incorporates a novel LR-Attention layer into GCL-LRR. GCL-LR-Attention reduces the kernel complexity of GCL-LRR and contributes to a tighter generalization bound, leading to improved performance. Extensive evaluations on standard benchmark datasets evidence the effectiveness and robustness of both GCL-LRR and GCL-LR-Attention in learning meaningful node representations. The code is available at https://github.com/Statistical-Deep-Learning/GCL-LR-Attention. |
|
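A common way to encode a low-rank preference is to penalize the nuclear norm (the sum of singular values) of the embedding matrix; whether GCL-LRR uses exactly this penalty is an assumption here, and the combination with the prototypical contrastive term is only indicated in the comment.

```python
import numpy as np

def nuclear_norm(H):
    """Sum of singular values of the (n, d) node-embedding matrix:
    a standard convex surrogate for rank."""
    return np.linalg.svd(H, compute_uv=False).sum()

# e.g. total objective: loss = contrastive_loss + lam * nuclear_norm(H)
```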
Convergent Privacy Framework with Contractive GNN Layers for Multi-hop Aggregations | 2025-06-28 | ShowDifferential privacy (DP) has been integrated into graph neural networks (GNNs) to protect sensitive structural information, e.g., edges, nodes, and associated features across various applications. A common approach is to perturb the message-passing process, which forms the core of most GNN architectures. However, existing methods typically incur a privacy cost that grows linearly with the number of layers (USENIX Security '23), ultimately requiring excessive noise to maintain a reasonable privacy level. This limitation becomes particularly problematic when deep GNNs are necessary to capture complex and long-range interactions in graphs. In this paper, we theoretically establish that the privacy budget can converge with respect to the number of layers by applying privacy amplification techniques to the message-passing process, exploiting the contractive properties inherent to standard GNN operations. Motivated by this analysis, we propose a simple yet effective Contractive Graph Layer (CGL) that ensures the contractiveness required for theoretical guarantees while preserving model utility. Our framework, CARIBOU, supports both training and inference, equipped with a contractive aggregation module, a privacy allocation module, and a privacy auditing module. Experimental evaluations demonstrate that CARIBOU significantly improves the privacy-utility trade-off and achieves superior performance in privacy auditing tasks. |
23 pages |
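One simple way to make a linear layer contractive is to rescale its weights so the spectral norm is at most one, making the map 1-Lipschitz; this is a guess at the kind of construction a contractive layer needs, not the paper's exact CGL.

```python
import numpy as np

def contractive_apply(W, h):
    """Apply W after rescaling it so its largest singular value is at
    most 1; the layer then cannot expand distances, the property the
    privacy-amplification analysis relies on."""
    sigma = np.linalg.norm(W, 2)        # spectral norm of the matrix
    W_hat = W / max(sigma, 1.0)
    return h @ W_hat
```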
DistShap: Scalable GNN Explanations with Distributed Shapley Values | 2025-06-27 | ShowWith the growing adoption of graph neural networks (GNNs), explaining their predictions has become increasingly important. However, attributing predictions to specific edges or features remains computationally expensive. For example, classifying a node with 100 neighbors using a 3-layer GNN may involve identifying important edges from millions of candidates contributing to the prediction. To address this challenge, we propose DistShap, a parallel algorithm that distributes Shapley value-based explanations across multiple GPUs. DistShap operates by sampling subgraphs in a distributed setting, executing GNN inference in parallel across GPUs, and solving a distributed least squares problem to compute edge importance scores. DistShap outperforms most existing GNN explanation methods in accuracy and is the first to scale to GNN models with millions of features by using up to 128 GPUs on the NERSC Perlmutter supercomputer. |
12 pages |
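The underlying estimator is standard: Shapley values approximated by averaging marginal contributions over random permutations. A serial sketch is below; DistShap instead samples subgraphs across GPUs and solves a distributed least-squares system, and `value` here is any user-supplied coalition-value function.

```python
import random

def shapley_mc(players, value, n_samples=200):
    """Monte-Carlo Shapley estimates over 'players' (edges, in the
    GNN-explanation case): average each player's marginal contribution
    across random orderings."""
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        perm, coalition = random.sample(players, len(players)), []
        prev = value(coalition)
        for p in perm:
            coalition.append(p)
            cur = value(coalition)
            phi[p] += (cur - prev) / n_samples
            prev = cur
    return phi
```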
Learning Non-Local Molecular Interactions via Equivariant Local Representations and Charge Equilibration | 2025-06-27 | ShowGraph Neural Network (GNN) potentials relying on chemical locality offer near-quantum mechanical accuracy at significantly reduced computational costs. Message-passing GNNs model interactions beyond their immediate neighborhood by propagating local information between neighboring particles while remaining effectively local. However, locality precludes modeling long-range effects critical to many real-world systems, such as charge transfer, electrostatic interactions, and dispersion effects. In this work, we propose the Charge Equilibration Layer for Long-range Interactions (CELLI) to address the challenge of efficiently modeling non-local interactions. This novel architecture generalizes the classical charge equilibration (Qeq) method to a model-agnostic building block for modern equivariant GNN potentials. In this way, CELLI extends the capability of GNNs to model long-range interactions while providing high interpretability through explicitly modeled charges. On benchmark systems, CELLI achieves state-of-the-art results for strictly local models. CELLI generalizes to diverse datasets and large structures while providing high computational efficiency and robust predictions. |
|
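Classical charge equilibration itself is a small linear-algebra problem: minimize E = chi·q + q^T J q / 2 subject to a fixed total charge, via one Lagrange multiplier. The sketch below shows that solve; in CELLI the electronegativities `chi` would come from the equivariant network rather than being fixed inputs.

```python
import numpy as np

def charge_equilibration(chi, J, total_charge=0.0):
    """Solve the Qeq stationarity system [[J, 1], [1^T, 0]] (q, lam) =
    (-chi, Q): charges minimizing the quadratic energy under the
    total-charge constraint."""
    n = len(chi)
    A = np.block([[J, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    b = np.concatenate([-np.asarray(chi, dtype=float), [total_charge]])
    sol = np.linalg.solve(A, b)
    return sol[:n]          # sol[n] is the Lagrange multiplier
```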
CoATA: Effective Co-Augmentation of Topology and Attribute for Graph Neural Networks | 2025-06-27 | ShowGraph Neural Networks (GNNs) have garnered substantial attention due to their remarkable capability in learning graph representations. However, real-world graphs often exhibit substantial noise and incompleteness, which severely degrades the performance of GNNs. Existing methods typically address this issue through single-dimensional augmentation, focusing either on refining topology structures or perturbing node attributes, thereby overlooking the deeper interplays between the two. To bridge this gap, this paper presents CoATA, a dual-channel GNN framework specifically designed for the Co-Augmentation of Topology and Attribute. Specifically, CoATA first propagates structural signals to enrich and denoise node attributes. Then, it projects the enhanced attribute space into a node-attribute bipartite graph for further refinement or reconstruction of the underlying structure. Subsequently, CoATA introduces contrastive learning, leveraging prototype alignment and consistency constraints, to facilitate mutual corrections between the augmented and original graphs. Finally, extensive experiments on seven benchmark datasets demonstrate that the proposed CoATA outperforms eleven state-of-the-art baseline methods, showcasing its effectiveness in capturing the synergistic relationship between topology and attributes. |
ICMR |
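The first of CoATA's two augmentation directions, using topology to denoise attributes, can be approximated by plain feature propagation over a row-normalized adjacency; the bipartite-graph refinement and contrastive stages are not shown.

```python
import numpy as np

def propagate_attributes(adj, X, steps=2):
    """Smooth node attributes by repeatedly averaging over neighbors
    (row-normalized propagation): structural signals enrich and
    denoise the attribute space."""
    P = adj / adj.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(steps):
        X = P @ X
    return X
```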
How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data? | 2025-06-27 | ShowGraphs are a powerful data structure for representing relational data and are widely used to describe complex real-world systems. Probabilistic Graphical Models (PGMs) and Graph Neural Networks (GNNs) can both leverage graph-structured data, but their inherent functioning is different. The question is: how do they compare in capturing the information contained in networked datasets? We address this question by solving a link prediction task, conducting three main experiments on both synthetic and real networks: one focuses on how PGMs and GNNs handle input features, while the other two investigate their robustness to noisy features and to increasing heterophily of the graph. PGMs do not necessarily require features on nodes, while GNNs cannot exploit the network edges alone, and the choice of input features matters. We find that GNNs are outperformed by PGMs when input features are low-dimensional or noisy, mimicking many real scenarios where node attributes might be scalar or noisy. We also find that PGMs are more robust than GNNs when the heterophily of the graph is increased. Finally, to assess performance beyond prediction tasks, we also compare the two frameworks in terms of their computational complexity and interpretability. |
|
Task-Agnostic Contrastive Pretraining for Relational Deep Learning | 2025-06-27 | ShowRelational Deep Learning (RDL) is an emerging paradigm that leverages Graph Neural Network principles to learn directly from relational databases by representing them as heterogeneous graphs. However, existing RDL models typically rely on task-specific supervised learning, requiring training separate models for each predictive task, which may hamper scalability and reuse. In this work, we propose a novel task-agnostic contrastive pretraining approach for RDL that enables database-wide representation learning. To that end, we introduce three levels of contrastive objectives (row-level, link-level, and context-level) designed to capture the structural and semantic heterogeneity inherent to relational data. We implement the pretraining approach through a modular RDL architecture and an efficient sampling strategy tailored to the heterogeneous database setting. Our preliminary results on standard RDL benchmarks demonstrate that fine-tuning the pretrained models measurably outperforms training from scratch, validating the promise of the proposed methodology in learning transferable representations for relational data. |
arXiv...arXiv admin note: text overlap with arXiv:2506.22199 |
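For the row-level objective, a standard choice would be InfoNCE over two views of the same rows; whether the paper uses exactly this loss is an assumption, and the inputs are assumed L2-normalized.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE with in-batch negatives: paired rows (the diagonal) are
    positives, all other rows are negatives. z1, z2: (n, d) arrays of
    L2-normalized embeddings of the two views."""
    sims = (z1 @ z2.T) / tau
    logits = sims - sims.max(axis=1, keepdims=True)          # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```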
LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models | 2025-06-27 | ShowText-Attributed Graphs (TAGs), where each node is associated with text descriptions, are ubiquitous in real-world scenarios. They typically exhibit distinctive structure and domain-specific knowledge, motivating the development of a Graph Foundation Model (GFM) that generalizes across diverse graphs and tasks. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures with two-stage alignment, limiting their synergistic potential. Even worse, existing methods assign out-of-vocabulary (OOV) tokens to graph nodes, leading to graph-specific semantics, token explosion, and incompatibility with task-oriented prompt templates, which hinders cross-graph and cross-task transferability. To address these challenges, we propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning. PromptGFM comprises two key components: (1) Graph Understanding Module, which explicitly prompts LLMs to replicate the finest GNN workflow within the text space, facilitating seamless GNN-LLM integration and elegant graph-text alignment; (2) Graph Inference Module, which establishes a language-based graph vocabulary ensuring expressiveness, transferability, and scalability, enabling readable instructions for LLM fine-tuning. Extensive experiments demonstrate our superiority and transferability across diverse graphs and tasks. The code is available at https://github.com/agiresearch/PromptGFM. |
|
MisinfoTeleGraph: Network-driven Misinformation Detection for German Telegram Messages | 2025-06-27 | ShowConnectivity and message propagation are central, yet often underutilized, sources of information in misinformation detection -- especially on poorly moderated platforms such as Telegram, which has become a critical channel for misinformation dissemination, particularly in the German electoral context. In this paper, we introduce Misinfo-TeleGraph, the first German-language Telegram-based graph dataset for misinformation detection. It includes over 5 million messages from public channels, enriched with metadata, channel relationships, and both weak and strong labels. These labels are derived via semantic similarity to fact-checks and news articles using M3-embeddings, as well as manual annotation. To establish reproducible baselines, we evaluate both text-only models and graph neural networks (GNNs) that incorporate message forwarding as a network structure. Our results show that GraphSAGE with LSTM aggregation significantly outperforms text-only baselines in terms of Matthews Correlation Coefficient (MCC) and F1-score. We further evaluate the impact of subscribers, view counts, and automatically created versus human-created labels on performance, and highlight both the potential and challenges of weak supervision in this domain. This work provides a reproducible benchmark and open dataset for future research on misinformation detection in German-language Telegram networks and other low-moderation social platforms. |
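The winning aggregator, GraphSAGE with LSTM aggregation, runs an LSTM over a node's neighbor embeddings and combines the result with the node's own state. A minimal single-node PyTorch sketch under simplifying assumptions (one layer, arbitrary neighbor ordering, square dimensions):

```python
import torch
import torch.nn as nn

class LSTMSageLayer(nn.Module):
    """GraphSAGE-style update with an LSTM neighbor aggregator."""
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, h_self, h_neighbors):
        # h_self: (d,); h_neighbors: (k, d) for a single node
        _, (h_n, _) = self.lstm(h_neighbors.unsqueeze(0))
        agg = h_n[0, 0]                       # final LSTM hidden state
        return torch.relu(self.lin(torch.cat([h_self, agg])))
```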