feat: spatiotemporal agent by maciejmajek · Pull Request #453 · RobotecAI/rai · GitHub

feat: spatiotemporal agent #453

Open · maciejmajek wants to merge 13 commits into base: development
2 changes: 2 additions & 0 deletions .gitignore
@@ -174,3 +174,5 @@ src/examples/*-demo
artifact_database.pkl

imgui.ini

vectorstore_data/
4 changes: 4 additions & 0 deletions config.toml
@@ -21,6 +21,10 @@ complex_model = "llama3.1:70b"
embeddings_model = "llama3.2"
base_url = "http://localhost:11434"

[vectorstore]
type = "faiss"
uri = "vectorstore_data"

[tracing]
project = "rai"

119 changes: 119 additions & 0 deletions docs/agents/spatiotemporal.md
@@ -0,0 +1,119 @@
# SpatioTemporalAgent

![SpatioTemporalAgent](../imgs/spatiotemporal.png)

## Overview

The `SpatioTemporalAgent` is an intelligent agent designed to capture, process, and store spatiotemporal data - information that combines both spatial (location, pose) and temporal (time-based) aspects. It's particularly useful in robotics and autonomous systems where understanding both the spatial context and its evolution over time is crucial.

Key capabilities:

- Captures and processes image data from multiple sources
- Records spatial transformations (e.g., robot poses)
- Converts visual data into textual descriptions using LLMs
- Compresses the robot's message history into a temporal context
- Stores all data in MongoDB for persistence and retrieval

## Core Components

### Data Model

The agent works with several key data structures:

- `SpatioTemporalRecord`: The main data container that includes:
  - `timestamp`: When the data was captured
  - `images`: Dictionary of camera images (base64 encoded)
  - `tf`: Transform data (robot's pose)
  - `temporal_context`: Compressed history of messages
  - `image_text_descriptions`: LLM-generated descriptions of the images
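
To make the field list above concrete, here is a minimal sketch of such a record. `RecordSketch` and `encode_image` are hypothetical names, not part of the `rai` package, and the field types (in particular for `tf` and `image_text_descriptions`) are assumptions, since the actual class definition is not shown here.

```python
import base64
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class RecordSketch:
    """Hypothetical stand-in for SpatioTemporalRecord; field names follow the list above."""

    timestamp: float
    images: Dict[str, str]  # assumed mapping: camera topic -> base64-encoded image bytes
    tf: Optional[Dict[str, Any]]  # robot pose; the real type is an assumption
    temporal_context: str
    image_text_descriptions: Dict[str, str] = field(default_factory=dict)


def encode_image(raw: bytes) -> str:
    """Encode raw image bytes as a base64 ASCII string."""
    return base64.b64encode(raw).decode("ascii")


record = RecordSketch(
    timestamp=datetime.now(timezone.utc).timestamp(),
    images={"/camera/camera/color/image_raw": encode_image(b"\x89PNG placeholder")},
    tf={"translation": [0.0, 0.0, 0.0], "rotation": [0.0, 0.0, 0.0, 1.0]},
    temporal_context="",
)
```

Storing images as base64 strings keeps the whole record serializable as plain text, which fits a document database.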

### Configuration

The agent is configured through `SpatioTemporalConfig`:

- `db_url`: MongoDB connection URL
- `db_name`: Target database name
- `collection_name`: Collection for data storage
- `image_to_text_model`: AI model for image description
- `context_compression_model`: AI model for context compression
- `vector_db`: Vector store instance (e.g., the result of `get_vectorstore()`)
- `time_interval`: Interval between data captures (the example below uses `10.0`)

## ROS2 Implementation

The ROS2-specific implementation (`ROS2SpatioTemporalAgent`) extends the base agent to work with ROS2 systems.

### Additional Configuration

`ROS2SpatioTemporalConfig` adds ROS2-specific parameters:

- `camera_topics`: List of ROS2 topics providing image data
- `robot_frame`: The robot's reference frame (e.g., "base_link")
- `world_frame`: The world reference frame (e.g., "world")

### Setup and Usage

1. **Prerequisites**

First, ensure MongoDB is running. Using Docker is recommended:

```bash
docker run -d --name rai-mongo -p 27017:27017 mongo
```

2. **Agent Configuration**

```python
import rclpy
from rai.agents.spatiotemporal import ROS2SpatioTemporalAgent, ROS2SpatioTemporalConfig
from rai.utils.model_initialization import get_llm_model, get_vectorstore

# Initialize ROS2 before creating the agent, as the example script does.
rclpy.init()

config = ROS2SpatioTemporalConfig(
    robot_frame="base_link",
    world_frame="world",
    db_url="mongodb://localhost:27017/",
    db_name="rai",
    collection_name="spatiotemporal_collection",
    image_to_text_model=get_llm_model("simple_model"),
    context_compression_model=get_llm_model("simple_model"),
    time_interval=10.0,
    camera_topics=["/camera/camera/color/image_raw"],
    vector_db=get_vectorstore(),
)

agent = ROS2SpatioTemporalAgent(config)
```

A complete working example can be found in [examples/agents/spatiotemporal.py](../../examples/agents/spatiotemporal.py).

3. **Running the Agent**

```python
agent.run()
```
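
The example script calls `agent.run()` and then sleeps in a loop until interrupted, which suggests that `run()` returns immediately and the work happens in the background. The sketch below illustrates that pattern with a hypothetical `PeriodicWorker` class. It is not the agent's actual implementation, and it assumes `time_interval` is expressed in seconds.

```python
import threading
import time
from typing import Callable


class PeriodicWorker:
    """Hypothetical helper: calls `task` every `time_interval` seconds in a background thread."""

    def __init__(self, task: Callable[[], None], time_interval: float) -> None:
        self._task = task
        self._time_interval = time_interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def run(self) -> None:
        """Start the background loop and return immediately."""
        self._thread.start()

    def stop(self) -> None:
        """Signal the loop to exit and wait for the thread to finish."""
        self._stop_event.set()
        self._thread.join()

    def _loop(self) -> None:
        # Event.wait returns True once stop() is called, which ends the loop
        # without waiting out the rest of the interval.
        while not self._stop_event.wait(self._time_interval):
            self._task()


ticks = []
worker = PeriodicWorker(lambda: ticks.append(time.monotonic()), time_interval=0.05)
worker.run()
time.sleep(0.3)
worker.stop()
```

Waiting on an event rather than calling `time.sleep` inside the loop lets `stop()` take effect promptly even when the interval is long.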

## Best Practices

1. **Camera Topics**: Choose camera topics that provide stable, consistent data streams
2. **Transform Frames**: Ensure your `robot_frame` and `world_frame` are valid and published
3. **Time Interval**: Set based on your application needs - shorter for high-frequency monitoring, longer for periodic snapshots

## Troubleshooting

Common issues and solutions:

1. **Missing Images**

   - Verify camera topics are publishing
   - Check topic names and message types

2. **Transform Errors**

   - Confirm both frames exist in the TF tree
   - Check for timing issues in transform lookups
   - Verify transform chain is complete

3. **Database Issues**

   - Ensure MongoDB is running and accessible
   - Check database credentials and permissions
   - Verify network connectivity to database server
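
For the network connectivity check, a quick test that needs only the Python standard library is to open a TCP connection to the host and port in `db_url`. The helper below, `mongo_port_reachable`, is a hypothetical name and not part of `rai`. It only confirms that the port accepts connections and does not verify credentials or that the listener is actually MongoDB.

```python
import socket
from urllib.parse import urlparse


def mongo_port_reachable(db_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in db_url succeeds."""
    parsed = urlparse(db_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 27017  # 27017 is MongoDB's default port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `mongo_port_reachable("mongodb://localhost:27017/")` before starting the agent separates plain network problems from credential or permission issues.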
Binary file added docs/imgs/spatiotemporal.png
52 changes: 52 additions & 0 deletions examples/agents/spatiotemporal.py
@@ -0,0 +1,52 @@
# Copyright (C) 2025 Robotec.AI
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import time

import rclpy
from rai.agents.spatiotemporal import ROS2SpatioTemporalAgent, ROS2SpatioTemporalConfig
from rai.utils.model_initialization import get_llm_model, get_vectorstore


def create_agent():
    config = ROS2SpatioTemporalConfig(
        robot_frame="base_link",
        world_frame="world",
        db_url="mongodb://localhost:27017/",
        db_name="rai",
        collection_name="spatiotemporal_collection",
        image_to_text_model=get_llm_model("simple_model"),
        context_compression_model=get_llm_model("simple_model"),
        time_interval=10.0,
        camera_topics=["/camera/camera/color/image_raw"],
        vector_db=get_vectorstore(),
    )
    agent = ROS2SpatioTemporalAgent(config)
    return agent


def main():
    rclpy.init()
    agent = create_agent()
    agent.run()

    try:
        while True:
            time.sleep(1.0)
    except KeyboardInterrupt:
        agent.stop()


if __name__ == "__main__":
    main()