A full-stack application for managing scientific datasets and their providers, built with a FastAPI backend and a React frontend and containerized with Docker for easy deployment.
- Overview
- Features
- Getting Started
- Architecture
- Environment Setup
- Development Workflow
- Production Deployment
- API Documentation
- Traefik Integration
- Maintenance Mode
- Troubleshooting
- Contributing
- License
Aggregator is designed to catalog and manage scientific datasets and their providers. It lets users browse datasets and administrators manage user access, and it provides a comprehensive API for integration with other systems.
- User Authentication and Authorization
  - JWT-based authentication
  - Role-based access control
  - Provider-specific permissions
- Provider Management
  - Create, read, update, and delete data providers
  - Associate metadata with providers
- Dataset Management
  - Organize datasets under providers
  - Track dataset sources and landing pages
  - Manage XML archives and useful links
- Modern Web Interface
  - Responsive React-based frontend
  - User-friendly dashboard
- API Integration
  - RESTful API for all operations
  - Legacy harvesting support
  - Comprehensive documentation
As a user of Aggregator, you can:
- Access the Platform:
  - Navigate to the application URL in your web browser
  - Log in with your provided credentials
- Browse Datasets:
  - View available datasets organized by provider
  - Access dataset details and related resources
- Use Dataset Resources:
  - Follow links to dataset landing pages
  - Access XML archives
  - Use the provided useful links for additional information
As an administrator, you have additional capabilities:
- User Management:
  - Create new user accounts
  - Assign roles and permissions
  - Manage provider-specific access
- Provider Administration:
  - Add new data providers to the system
  - Update provider information
  - Remove providers when necessary
- Dataset Administration:
  - Add, update, or remove datasets
  - Manage dataset metadata
  - Organize datasets under appropriate providers
As a developer working with Aggregator:
- Local Development Setup:
  ```bash
  # Clone the repository
  git clone <repository-url>
  cd aggregator

  # Copy and configure environment files
  cp .env.example .env
  cp backend/.env.example backend/.env
  cp frontend/.env.example frontend/.env

  # IMPORTANT: Generate a secure SECRET_KEY for backend/.env:
  # python -c "import secrets; print(secrets.token_hex(32))"
  # Replace the default SECRET_KEY with your generated key

  # Start the development environment
  docker-compose up
  ```
- API Integration:
  - Use the API documentation at `/docs` to understand available endpoints
  - Authenticate with JWT tokens
  - Make API calls to integrate with your systems
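As a starting point for integration, here is a minimal standard-library sketch of an authenticated call. It assumes the API accepts the JWT in an `Authorization: Bearer <token>` header (the usual convention with FastAPI's OAuth2 bearer scheme, but not confirmed by this README). The base URL, the `/api/providers` path, and the helper names `build_request` and `call_api` are hypothetical; check `/docs` for the real endpoints and for how tokens are issued.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(
    base_url: str, path: str, token: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build a request that carries the JWT as a bearer token (hypothetical helper)."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib sends POST when a body is present and GET otherwise.
    return urllib.request.Request(url, data=data, headers=headers)


def call_api(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_api(build_request("http://your-server", "/api/providers", token))` would fetch a (hypothetical) provider list once you hold a valid token. Keeping request construction separate from sending makes the header logic easy to test without a running server.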
Aggregator follows a modern microservices architecture:
- Backend: FastAPI application providing RESTful API endpoints
- Frontend: React single-page application for the user interface
- Database: PostgreSQL database for persistent storage
- Reverse Proxy: Traefik for routing, load balancing, and service discovery
The application uses Traefik as a modern reverse proxy and load balancer:
- Automatic Service Discovery: Traefik automatically discovers services through Docker labels
- Path-Based Routing:
  - `/api/*` routes are directed to the backend service
  - All other routes are directed to the frontend service
- Network Isolation: Services are connected through a dedicated Docker network (`app-network`)
- Security:
  - Only containers explicitly enabled with the `traefik.enable=true` label are exposed
  - SSL/TLS configuration (commented out but ready for production use)
  - API dashboard is disabled by default for security
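The path-based routing rule can be summarized as a small function. This models only the documented behavior (`/api/*` to the backend, everything else to the frontend); it is not Traefik's own matcher, and the service names `backend` and `frontend` are assumptions.

```python
def route(path: str) -> str:
    """Return the service that handles a request path, per the documented rule."""
    # "/api" itself and anything below "/api/" is backend traffic.
    if path == "/api" or path.startswith("/api/"):
        return "backend"
    # Everything else falls through to the React frontend.
    return "frontend"
```

For instance, `route("/api/datasets")` gives `"backend"`, while `route("/providers")` gives `"frontend"`, which is what lets the single-page app own all non-API URLs.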
To add a new service to Traefik:
- Connect the service to the `app-network` network in docker-compose
- Add the following labels to your service:
  ```yaml
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.[service-name].rule=PathPrefix(`/your-path`)"
    - "traefik.http.routers.[service-name].entrypoints=web"
    - "traefik.http.services.[service-name].loadbalancer.server.port=[internal-port]"
  ```
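Because the service name appears in three of the four labels, a typo in one of them silently breaks routing. The following hypothetical helper generates exactly the label set shown for a given service name, path prefix, and internal port; `reports`, `/reports`, and `8080` in the example are placeholders.

```python
from typing import List


def traefik_labels(service: str, path_prefix: str, port: int) -> List[str]:
    """Return the Traefik labels for one service (hypothetical helper)."""
    if not path_prefix.startswith("/"):
        raise ValueError("path_prefix must start with '/'")
    router = f"traefik.http.routers.{service}"
    return [
        "traefik.enable=true",
        f"{router}.rule=PathPrefix(`{path_prefix}`)",
        f"{router}.entrypoints=web",
        f"traefik.http.services.{service}.loadbalancer.server.port={port}",
    ]
```

Calling `traefik_labels("reports", "/reports", 8080)` yields the four strings, ready to paste under `labels:` in the Compose file.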
For production deployments, uncomment and configure the HTTPS sections in `traefik/traefik.yml` and set the permissions of the `acme.json` file to `600`.
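Traefik refuses to use an ACME storage file whose permissions are more open than `600`. A small sketch that creates the file if needed and restricts it to owner read/write; the path `traefik/acme.json` is an assumption, so point it at wherever your configuration stores certificates. On the server, `chmod 600` on that file does the same job.

```python
import os
import stat
from pathlib import Path


def secure_acme_file(path: str = "traefik/acme.json") -> int:
    """Create the ACME storage file if missing and set its mode to 600."""
    acme = Path(path)
    acme.parent.mkdir(parents=True, exist_ok=True)
    # touch() creates an empty file but never truncates an existing one.
    acme.touch(exist_ok=True)
    os.chmod(acme, 0o600)  # owner read/write only
    # Return the resulting permission bits so callers can verify them.
    return stat.S_IMODE(acme.stat().st_mode)
```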
The application includes a maintenance mode feature that can be enabled during deployments or updates:
When enabled, a maintenance page is displayed to users while the application services are being updated. This is implemented using Traefik's dynamic routing rules:
- A dedicated `maintenance` container serves a static maintenance page
- The container has a higher-priority route that intercepts all traffic
- Backend and frontend services are temporarily disabled in Traefik routing
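The interception relies on Traefik choosing, among all enabled routers that match a request, the one with the highest priority. The simplified Python model below captures that selection. The router priorities are hypothetical, since the real values live in Compose labels that this README does not show.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Router:
    service: str
    path_prefix: str
    priority: int
    enabled: bool = True  # stands in for the traefik.enable label


def select_service(routers: Iterable[Router], path: str) -> Optional[str]:
    """Pick the highest-priority enabled router whose prefix matches the path."""
    matching = [r for r in routers if r.enabled and path.startswith(r.path_prefix)]
    if not matching:
        return None  # Traefik answers 404 when no router matches
    return max(matching, key=lambda r: r.priority).service


def example_routers(maintenance: bool) -> List[Router]:
    # Hypothetical priorities: the maintenance route outranks everything.
    return [
        Router("maintenance", "/", priority=1000, enabled=maintenance),
        Router("backend", "/api", priority=20, enabled=not maintenance),
        Router("frontend", "/", priority=10, enabled=not maintenance),
    ]
```

In this model, `select_service(example_routers(True), "/api/datasets")` returns `"maintenance"`. The high-priority route alone would already win; disabling backend and frontend as well makes sure nothing leaks through if a priority is misconfigured.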
- Using CI/CD Variables:
  - In GitLab, go to Settings > CI/CD > Variables
  - Add the variable `MAINTENANCE_MODE` with value `true`
  - Run a deployment to enable maintenance mode
  - Set it back to `false` and re-deploy when maintenance is complete
- For a Single Deployment:
  - When manually triggering a pipeline, set the variable `MAINTENANCE_MODE=true`
  - After completing maintenance, run another deployment with `MAINTENANCE_MODE=false`
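This README does not show how the pipeline turns `MAINTENANCE_MODE` into routing changes. As a sketch only, a deploy script could parse the variable strictly, so that a typo such as `ture` fails the job instead of silently leaving maintenance off, and then derive the `traefik.enable` value for each service. Both function names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def maintenance_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse MAINTENANCE_MODE strictly; an unset variable means maintenance is off."""
    env = os.environ if env is None else env
    value = env.get("MAINTENANCE_MODE", "false").strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"MAINTENANCE_MODE must be 'true' or 'false', got {value!r}")
    return value == "true"


def traefik_enable_flags(maintenance: bool) -> Dict[str, str]:
    """Map each service to the traefik.enable value it should carry."""
    return {
        "maintenance": str(maintenance).lower(),
        "backend": str(not maintenance).lower(),
        "frontend": str(not maintenance).lower(),
    }
```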
You can also enable or disable maintenance mode directly on the server. The `traefik` binary has no `service update` subcommand, and Docker cannot change the labels of a running container, so set the `traefik.enable` label on the `maintenance`, `backend`, and `frontend` services in the Compose file and then recreate those containers:

```bash
# Enable maintenance mode: in docker-compose.prod.yml set
#   maintenance: traefik.enable=true
#   backend, frontend: traefik.enable=false
docker-compose -f docker-compose.prod.yml up -d maintenance backend frontend

# Disable maintenance mode: revert the labels
# (maintenance: false; backend, frontend: true), then recreate again
docker-compose -f docker-compose.prod.yml up -d maintenance backend frontend
```
The maintenance page is located at `/maintenance/index.html` and can be customized:
- Content: Modify the HTML to change the maintenance message
- Styling: Update the CSS in the style section
- Countdown: By default, the page shows a 30-minute countdown
- Behavior: The page will automatically refresh after the countdown ends
The maintenance page includes:
- GFBio branding
- Informative message about the maintenance
- Visual countdown timer
- Automatic refresh to check if the service is back online
The application uses environment variables for configuration:
- Root `.env` file:
  ```bash
  # Database configuration
  DB_USER=user
  DB_PASSWORD=password
  DB_NAME=dbname

  # Security
  SECRET_KEY=your-secret-key

  # Frontend configuration (for development)
  REACT_APP_API_URL=http://localhost:8000
  ```
- Environment-specific configuration:
  - Development: Uses local directories mounted as volumes for hot-reloading
  - Production: Uses built Docker images with optimized settings
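Before starting the stack, it helps to confirm that a `.env` file defines the database variables listed above and that `SECRET_KEY` is no longer a placeholder. This minimal sketch assumes simple `KEY=VALUE` lines (no quoting or `export` support) and treats an empty value or `your-secret-key` as a placeholder; if `backend/.env.example` ships a different default, add it to `PLACEHOLDER_SECRETS`. A missing or placeholder key is replaced with `secrets.token_hex(32)`, the same generator suggested in the setup steps.

```python
import secrets
from pathlib import Path
from typing import Dict, List, Tuple

REQUIRED_DB_KEYS = ("DB_USER", "DB_PASSWORD", "DB_NAME")
# Assumed placeholder values; extend if your example files use another default.
PLACEHOLDER_SECRETS = {"", "your-secret-key"}


def _key_of(line: str) -> str:
    """Return the variable name on a KEY=VALUE line, or "" for blanks/comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return ""
    return stripped.partition("=")[0].strip()


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines into a dict."""
    return {
        _key_of(line): line.partition("=")[2].strip()
        for line in text.splitlines()
        if _key_of(line)
    }


def check_and_fix_env(path: str) -> Tuple[List[str], bool]:
    """Report missing DB keys; replace a missing or placeholder SECRET_KEY in place."""
    env_file = Path(path)
    text = env_file.read_text()
    values = parse_env(text)
    missing = [key for key in REQUIRED_DB_KEYS if key not in values]
    if values.get("SECRET_KEY", "") not in PLACEHOLDER_SECRETS:
        return missing, False
    new_line = f"SECRET_KEY={secrets.token_hex(32)}"  # 64 hex characters
    lines = [
        new_line if _key_of(line) == "SECRET_KEY" else line
        for line in text.splitlines()
    ]
    if "SECRET_KEY" not in values:
        lines.append(new_line)
    env_file.write_text("\n".join(lines) + "\n")
    return missing, True
```

Running it twice is safe: the second call finds a real key and leaves the file untouched.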
- Start the Development Environment:
  ```bash
  docker-compose up
  ```
- Backend Development:
  - Edit files in the `backend/` directory
  - FastAPI hot-reloads changes automatically
  - Access API documentation at http://localhost:8000/docs
- Frontend Development:
  - Edit files in the `frontend/` directory
  - The React development server hot-reloads changes
  - Access the frontend at http://localhost:3000
- Database Migrations:
  ```bash
  # Inside the backend container
  alembic revision --autogenerate -m "description"
  alembic upgrade head
  ```
- Build and Start Production Services:
  ```bash
  docker-compose -f docker-compose.prod.yml up -d
  ```
- Access the Application:
  - Frontend: http://your-server
  - Backend API: http://your-server/api
- Scaling Considerations:
  - Adjust memory limits in `docker-compose.prod.yml` if needed
  - Consider using a container orchestration platform for larger deployments
Once the application is running, you can access:
- Interactive API documentation (Swagger UI): http://localhost:8000/docs (development) or http://your-server/api/docs (production)
- Alternative API documentation (ReDoc): http://localhost:8000/redoc (development) or http://your-server/api/redoc (production)
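Behind these pages FastAPI also publishes the machine-readable OpenAPI schema, by default at `/openapi.json` (so http://localhost:8000/openapi.json in development; the production URL depends on how the backend is mounted behind `/api` and is not stated here). A short sketch that lists every operation in such a schema, which is handy for spotting which endpoints exist before integrating:

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            # Path items can also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


def fetch_spec(url: str = "http://localhost:8000/openapi.json") -> Dict[str, Any]:
    """Download the schema from a running development backend."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

For example, `for method, path in list_operations(fetch_spec()): print(method, path)` prints one line per endpoint.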
- Database Connection Errors:
  - Verify database credentials in `.env`
  - Ensure the PostgreSQL service is running
  - Check network connectivity between containers
- Frontend Not Loading:
  - Check the browser console for JavaScript errors
  - Verify the API URL configuration
  - Ensure Nginx is properly configured
- API Request Failures:
  - Verify the authentication token is valid
  - Check the CORS configuration
  - Ensure proper permissions for the requested operation
- Docker Issues:
  - Run `docker-compose down` and then `docker-compose up --build` to rebuild the images
  - Check Docker logs with `docker-compose logs`
  - Verify Docker and Docker Compose versions
- Traefik Routing Issues:
  - Verify container labels are correctly configured
  - Check Traefik logs with `docker-compose logs traefik`
  - Ensure your service is connected to the `app-network` network
  - Check that the container has the `traefik.enable=true` label
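To check the labels of a running container, `docker inspect --format '{{json .Config.Labels}}' <container>` prints them as a JSON object. The sketch below parses that output and reports the label problems listed above. It checks labels only, not network membership, and the wording of the messages is illustrative.

```python
import json
from typing import List


def traefik_label_problems(labels_json: str) -> List[str]:
    """Return issues found in the JSON printed for a container's .Config.Labels."""
    labels = json.loads(labels_json) or {}  # "null" means no labels at all
    problems = []
    if str(labels.get("traefik.enable", "")).lower() != "true":
        problems.append("traefik.enable=true label is missing")
    if not any(
        k.startswith("traefik.http.routers.") and k.endswith(".rule") for k in labels
    ):
        problems.append("no traefik.http.routers.<name>.rule label")
    if not any(
        k.startswith("traefik.http.services.")
        and k.endswith(".loadbalancer.server.port")
        for k in labels
    ):
        problems.append(
            "no loadbalancer.server.port label (required if the container "
            "exposes several ports or none)"
        )
    return problems
```

Pass the `docker inspect` output as a string; an empty result means the labels have the expected shape, so the next place to look is the network and the Traefik logs.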
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit your changes: `git commit -m 'Add some feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.