This project implements a Vision Transformer (ViT) based navigation system that combines RGB images, local occupancy maps, and goal information to generate robot control commands. The system is integrated with ROS for real-world robot control. In the hybrid mode, the control values predicted by the model are blended with Pure Pursuit control values.
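As a rough picture of that blending, here is a minimal sketch; the function name blend_commands, the fixed weight alpha, and the (linear, angular) tuple layout are assumptions, not the project's actual implementation:

```python
# Hypothetical blending of model and Pure Pursuit commands; the real
# controller's weighting scheme may differ.
def blend_commands(model_cmd, pp_cmd, alpha=0.5):
    """Blend (linear, angular) velocity commands from two sources.

    alpha weights the model output; (1 - alpha) weights Pure Pursuit.
    """
    v = alpha * model_cmd[0] + (1.0 - alpha) * pp_cmd[0]
    w = alpha * model_cmd[1] + (1.0 - alpha) * pp_cmd[1]
    return v, w
```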
```
goal_vit/
├── models/       # Neural network models
├── ros/          # ROS node implementations
├── utils/        # Utility functions
├── config/       # Configuration files
├── ros_launch/   # Gazebo simulator launch files
└── README.md     # This file
```
The core of the system is a custom Vision Transformer that processes:
- RGB camera images (64×192)
- Local occupancy maps (64×64)
- Goal information (relative position, distance, angle)
The model outputs (an interface sketch follows this list):
- Linear velocity
- Angular velocity
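A minimal sketch of this interface, assuming PyTorch and hypothetical names (GoalViTSketch, a 4-dimensional goal vector); the real model in models/ uses proper ViT patch embeddings rather than these placeholder encoders:

```python
import torch
import torch.nn as nn

class GoalViTSketch(nn.Module):
    """Skeleton mirroring the documented input/output shapes."""
    def __init__(self, embed_dim=256):
        super().__init__()
        # Placeholder linear encoders; the actual model is a ViT.
        self.rgb_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 192, embed_dim))
        self.map_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, embed_dim))
        self.goal_encoder = nn.Linear(4, embed_dim)  # assumed: rel. x, rel. y, distance, angle
        self.head = nn.Linear(3 * embed_dim, 2)      # -> (linear velocity, angular velocity)

    def forward(self, rgb, local_map, goal):
        # rgb: (B, 3, 64, 192), local_map: (B, 1, 64, 64), goal: (B, 4)
        feats = torch.cat([self.rgb_encoder(rgb),
                           self.map_encoder(local_map),
                           self.goal_encoder(goal)], dim=-1)
        return self.head(feats)  # (B, 2)
```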
The system includes:
- Navigation controller node
- Data collection node
- Visualization utilities
- Real-time navigation control
- Path following with Pure Pursuit correction
- Automatic local map generation (see the cropping sketch below)
- Visual debugging tools
- Comprehensive data collection
- Tested only in a relatively simple Gazebo environment; the coordinate transformation for path information still needs verification.
- The current end-to-end (E2E) model cannot avoid dynamic obstacles.
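As an illustration of the local map generation feature, here is a hedged sketch that crops a robot-centered 64×64 window from a global occupancy grid; the helper name crop_local_map is hypothetical, and the actual node may additionally rotate the window into the robot frame:

```python
import numpy as np

def crop_local_map(global_map, robot_xy, resolution, origin_xy, size=64):
    """Crop a robot-centered size x size window from a 2D occupancy grid.

    global_map: (H, W) array; robot_xy: world position in meters;
    resolution: meters per cell; origin_xy: world coords of cell (0, 0).
    """
    # Convert the world position to grid indices.
    col = int((robot_xy[0] - origin_xy[0]) / resolution)
    row = int((robot_xy[1] - origin_xy[1]) / resolution)
    half = size // 2
    # Pad so the crop never runs off the map edge.
    padded = np.pad(global_map, half, mode="constant", constant_values=0)
    return padded[row:row + size, col:col + size]
```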
- Clone the repository:
```bash
git clone <repository_url>
cd goal_vit
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Build the ROS package:
```bash
cd <catkin_workspace>
catkin_make
```
- Collect training data:
```bash
rosrun goal_vit data_collector.py
```
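As a hedged illustration of what a collector like this records, a minimal rospy sketch; the topic names /camera/image_raw and /cmd_vel are assumptions (the actual topics are configured in config/ros_config.py):

```python
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist

class DataCollectorSketch:
    """Pair the most recent camera image with each velocity command."""
    def __init__(self):
        self.samples = []
        self.last_image = None
        rospy.Subscriber("/camera/image_raw", Image, self.on_image)  # assumed topic
        rospy.Subscriber("/cmd_vel", Twist, self.on_cmd)             # assumed topic

    def on_image(self, msg):
        self.last_image = msg

    def on_cmd(self, msg):
        if self.last_image is not None:
            # Store (image, linear vel, angular vel) for offline training.
            self.samples.append((self.last_image, msg.linear.x, msg.angular.z))

if __name__ == "__main__":
    rospy.init_node("data_collector_sketch")
    DataCollectorSketch()
    rospy.spin()
```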
- Train the model:
```bash
python train.py --config config/model_config.py
```
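A hedged sketch of a single training step, assuming the model regresses the two velocity targets with an MSE loss (the loss choice and names are illustrative, not taken from train.py):

```python
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    """One optimization step on a (rgb, local_map, goal, target_cmd) batch."""
    rgb, local_map, goal, target_cmd = batch   # target_cmd: (B, 2)
    pred = model(rgb, local_map, goal)         # predicted (linear, angular)
    loss = F.mse_loss(pred, target_cmd)        # assumed regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```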
- Start the navigation controller:
```bash
rosrun goal_vit controller.py _model_path:=path/to/model.pth _map_path:=path/to/map.pgm _yaml_path:=path/to/map.yaml
```
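Inside the node, the _param:=value arguments above arrive as private ROS parameters; a minimal sketch of how they are read:

```python
import rospy

rospy.init_node("controller")
# _model_path:=... on the rosrun command line becomes the private
# parameter ~model_path, and likewise for the other two.
model_path = rospy.get_param("~model_path")
map_path = rospy.get_param("~map_path")
yaml_path = rospy.get_param("~yaml_path")
```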
- Send navigation goals via RViz or command line:
```bash
rostopic pub /move_base_simple/goal geometry_msgs/PoseStamped "..."
```
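Equivalently, a goal can be published from a short Python script; the frame_id "map" and the example coordinates are assumptions:

```python
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("goal_publisher")
pub = rospy.Publisher("/move_base_simple/goal", PoseStamped, queue_size=1)
rospy.sleep(1.0)  # give the publisher time to connect

goal = PoseStamped()
goal.header.frame_id = "map"   # assumed global frame
goal.header.stamp = rospy.Time.now()
goal.pose.position.x = 2.0     # example target (meters)
goal.pose.position.y = 1.0
goal.pose.orientation.w = 1.0  # identity orientation
pub.publish(goal)
```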
The system behavior can be customized through:
- config/model_config.py: Model architecture and training parameters
- config/ros_config.py: ROS node parameters and control settings
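The exact schema of these files is not fixed by this README; as a purely hypothetical example, config/model_config.py might contain something like:

```python
# Hypothetical contents of config/model_config.py; field names are
# illustrative, not the project's actual schema.
MODEL_CONFIG = {
    "img_size": (64, 192),   # RGB input (H, W)
    "map_size": (64, 64),    # local occupancy map (H, W)
    "embed_dim": 256,
    "depth": 6,              # number of transformer blocks
    "num_heads": 8,
}

TRAIN_CONFIG = {
    "batch_size": 64,
    "lr": 3e-4,
    "epochs": 100,
}
```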