A robust, scalable financial chatbot capable of processing structured and unstructured financial documents. The system extracts key insights, summarizes content, and answers user queries based on uploaded data.
This implementation uses an agentic AI architecture powered by Google's Gemini models. The system consists of:

- Core Agent: A central coordinator that manages:
  - Intent detection
  - Tool selection
  - Response generation
  - Language detection and translation
- Tool Registry System: Modular tools that can be dynamically invoked:
  - Tools register capabilities and schemas
  - The agent selects appropriate tools based on user intent
  - Loose coupling allows easy extension
- LLM Integration: Google Gemini integration providing:
  - Intent detection
  - Structured output generation
  - Response generation
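The registry pattern described above can be sketched as follows. The class and method names here (`Tool`, `ToolRegistry`, `register`, `list_capabilities`) are illustrative assumptions, not the project's actual API; the real base class lives in `src/tools/tool_registry.py`:

```python
# Minimal sketch of a tool registry with dynamic lookup.
# Names are illustrative, not the real FinWise API.

class Tool:
    name = "base"
    description = ""

    def execute(self, **kwargs) -> dict:
        raise NotImplementedError


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool_cls):
        # Store an instance keyed by its declared name
        self._tools[tool_cls.name] = tool_cls()

    def get(self, name):
        return self._tools.get(name)

    def list_capabilities(self):
        # The agent can present these descriptions to the LLM for tool selection
        return {name: t.description for name, t in self._tools.items()}


registry = ToolRegistry()


class SummarizerTool(Tool):
    name = "summarize"
    description = "Summarizes a block of text"

    def execute(self, text: str) -> dict:
        return {"summary": text[:50]}


registry.register(SummarizerTool)
```

Because tools only share the `Tool` interface with the agent, new capabilities can be added without touching the coordinator.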
Key features include:

- Process multiple file formats (CSV, Excel, PDF, DOCX)
- Extract financial data and insights
- Analyze trends and metrics
- Support for multiple languages
- Real-time chat interface
- Document history management
- Web search and online content analysis
- Advanced data visualization capabilities
- Export functionality for session data
- Robust database with local fallback storage
- Comprehensive history tracking
- Third-party charting library integration
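Multi-format processing is typically implemented by dispatching on file extension. The stdlib-only sketch below illustrates the shape of that dispatch; the handler bodies are placeholders, whereas the project's `file_processor.py` uses Pandas, PyPDF, and python-docx for the actual parsing:

```python
# Sketch of extension-based dispatch for document processing.
# Handler bodies are placeholders for the real parsing logic.
from pathlib import Path

def process_tabular(path: Path) -> str:
    return f"parsed table: {path.name}"

def process_pdf(path: Path) -> str:
    return f"extracted PDF text: {path.name}"

def process_docx(path: Path) -> str:
    return f"extracted DOCX text: {path.name}"

HANDLERS = {
    ".csv": process_tabular,
    ".xlsx": process_tabular,  # tabular formats share a handler here
    ".pdf": process_pdf,
    ".docx": process_docx,
}

def process_file(path: str) -> str:
    p = Path(path)
    handler = HANDLERS.get(p.suffix.lower())
    if handler is None:
        raise ValueError(f"Unsupported file format: {p.suffix}")
    return handler(p)
```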
```
FinWise/
├── keys/                              # API keys and credentials
├── src/
│   ├── agents/
│   │   └── financial_agent.py         # Core agent coordinator
│   ├── tools/
│   │   ├── file_processor.py          # File processing tool
│   │   ├── financial_analysis_tool.py # Financial analysis tool
│   │   ├── text_summarization.py      # Text summarization tool
│   │   ├── language_tool.py           # Language detection/translation tool
│   │   ├── web_search_tool.py         # Web content extraction tool
│   │   ├── search_api_tool.py         # Search API integration
│   │   ├── data_visualization_tool.py # Visualization tool
│   │   ├── csv_analyzer_tool.py       # CSV analysis tool
│   │   ├── dynamic_visualization_tool.py # Advanced visualization
│   │   └── tool_registry.py           # Tool registry system
│   ├── llm/
│   │   ├── llm_manager.py             # LLM provider manager
│   │   ├── gemini_integration.py      # Google Gemini integration
│   │   └── groq_integration.py        # Groq LLM integration
│   ├── db/
│   │   └── db_connection.py           # Database connection manager
│   ├── config/
│   │   └── config.py                  # Configuration settings
│   ├── models/                        # Data models
│   ├── utils/                         # Utility functions
│   └── routes/                        # API routes
├── docs/
│   ├── screenshots/                   # Application screenshots
│   └── diagrams/                      # System architecture diagrams
├── uploads/                           # Directory for uploaded files
├── local_storage/                     # Local storage for database fallback
├── main.py                            # Streamlit application
├── system_architecture.md             # Architecture diagrams
├── visualization/                     # Visualization capabilities documentation
├── evaluation_report.md               # System performance and evaluation report
├── debug_tools.py                     # Debugging and tool validation utilities
├── requirements.txt                   # Dependencies
└── .env                               # Environment variables
```
- Language: Python 3.8+
- LLM: Google Gemini Pro, Groq
- Frontend: Streamlit
- Database: MongoDB with local storage fallback
- Data Processing:
  - Pandas for tabular data
  - PyPDF for PDF parsing
  - python-docx for DOCX processing
- Data Visualization:
  - Matplotlib
  - Plotly
  - Altair
  - Seaborn
- Web Search:
  - SerpAPI integration
  - BeautifulSoup for web scraping
- Libraries:
  - langdetect for language detection
  - streamlit for web UI
  - langchain for LLM integrations
- Clone the repository:

```bash
git clone https://github.com/yourusername/financial-intelligence-chatbot.git
cd financial-intelligence-chatbot
```

- Create and activate a virtual environment (recommended):

```bash
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
```

- Install the required dependencies:

```bash
pip install -r requirements.txt
```

- Create a `.env` file with the following variables:

```
# LLM API Keys
GEMINI_API_KEY=your_gemini_api_key_here
GROQ_API_KEY=your_groq_api_key_here

# SerpAPI for web search
SERPAPI_KEY=your_serpapi_key_here

# MongoDB connection (optional)
MONGODB_CONNECTION_STRING=your_mongodb_connection_string
```
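For reference, `.env` files are plain `KEY=VALUE` lines. The stdlib-only parser below is a sketch of how such a file is read into the process environment; in practice the python-dotenv package is the usual choice for this:

```python
# Minimal .env parser sketch (stdlib only). Real projects typically use
# the python-dotenv package instead of hand-rolling this.
import os

def load_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """
# LLM API Keys
GEMINI_API_KEY=abc123
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
"""
config = load_env(sample)
os.environ.update(config)
```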
To set up Google Cloud credentials:

- Go to the Google Cloud Console: https://console.cloud.google.com/
- Select your project: at the top, click the project dropdown and choose the project you're working with (`finbot` in your case).
- Enable the required API (if not already enabled):
  - For generative language models or Vertex AI, search for and enable the Vertex AI API or the Generative Language API.
- Go to IAM & Admin > Service Accounts:
  - Navigation Menu (☰) → IAM & Admin → Service Accounts
- Create a new service account (or select an existing one):
  - Click "Create Service Account"
  - Give it a name like `gen-lang-client`
  - Click "Create and Continue"
- Assign role(s):
  - Choose appropriate roles. For generative AI use, select Vertex AI User or Generative Language API User.
  - Click "Continue" and then "Done"
- Generate the key file (JSON):
  - After creation, click the service account name
  - Go to the "Keys" tab
  - Click "Add Key" → "Create new key"
  - Select "JSON" → Click "Create"
  - The `.json` key file will download automatically.
- Move or save the key file to your path:
  - Save it to `C:\your\path\keys\` (create the folders if they don't exist)

Once you've downloaded the JSON key file, set an environment variable so your code can authenticate with that key.

On macOS/Linux (bash):

```bash
export GOOGLE_APPLICATION_CREDENTIALS="C:/path/to/project/finbot/keys/gen-lang-client-0049803850-db4864d6249d.json"
```

On Windows (Command Prompt):

```cmd
set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\project\finbot\keys\gen-lang-client-0049803850-db4864d6249d.json
```

On Windows (PowerShell):

```powershell
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\path\to\project\finbot\keys\gen-lang-client-0049803850-db4864d6249d.json"
```

Never share this key publicly. It grants access to your Google Cloud resources; treat it like a password.
The application supports two database options:

Option 1: MongoDB

- Set up a MongoDB database (local or cloud-based, such as MongoDB Atlas)
- Add your connection string to the `.env` file:

```
MONGODB_CONNECTION_STRING=mongodb+srv://username:password@cluster.mongodb.net/financial_chatbot
```

Option 2: Local storage fallback

- If no MongoDB connection is provided, the system automatically falls back to local file storage
- Local storage files are saved in the `local_storage/` directory
- This option works out of the box with no additional setup
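The fallback behavior could look like the sketch below, which stores JSON documents on disk when no MongoDB connection string is configured. The class and function names are assumptions for illustration; the project's actual logic lives in `src/db/db_connection.py`:

```python
# Sketch of a storage layer that falls back to local JSON files when
# no MongoDB connection string is configured. Names are illustrative.
import json
import os
from pathlib import Path

class LocalStorage:
    def __init__(self, root="local_storage"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def save(self, collection: str, doc_id: str, doc: dict):
        path = self.root / f"{collection}__{doc_id}.json"
        path.write_text(json.dumps(doc))

    def load(self, collection: str, doc_id: str) -> dict:
        path = self.root / f"{collection}__{doc_id}.json"
        return json.loads(path.read_text())

def get_storage():
    conn = os.getenv("MONGODB_CONNECTION_STRING")
    if conn:
        # Real code would return a pymongo-backed store here
        raise NotImplementedError("MongoDB path omitted in this sketch")
    return LocalStorage()
```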
You can configure multiple API keys for the same LLM provider to enable key rotation and load balancing. To add multiple keys:

- In your `.env` file, add multiple keys using comma separation:

```
OPENAI_API_KEY=key1,key2,key3
GEMINI_API_KEY=key1,key2
```

- The system will automatically rotate between these keys to:
  - Distribute API calls across multiple keys
  - Avoid rate-limiting issues
  - Provide failover if one key encounters errors
The application implements an intelligent key rotation strategy:
- Keys are used in sequence, rotating to the next key after each API call
- If a key encounters an error (like rate limiting), it's temporarily skipped
- The system tracks usage metrics for each key to optimize distribution
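The rotation strategy above can be sketched as follows. This is a simplified model: it rotates round-robin and benches a key for a cooldown period after an error, while the real implementation also tracks per-key usage metrics:

```python
# Sketch of round-robin key rotation with a temporary skip on error.
import itertools
import time

class KeyRotator:
    def __init__(self, keys, cooldown_seconds=60.0):
        self.keys = list(keys)
        self.cooldown = cooldown_seconds
        self._cycle = itertools.cycle(self.keys)
        self._blocked_until = {}  # key -> timestamp when it becomes usable again

    def next_key(self) -> str:
        # Try each key at most once per call; skip keys still cooling down
        for _ in range(len(self.keys)):
            key = next(self._cycle)
            if time.time() >= self._blocked_until.get(key, 0.0):
                return key
        raise RuntimeError("All API keys are cooling down")

    def report_error(self, key: str):
        # e.g. on a 429 rate-limit response, bench the key briefly
        self._blocked_until[key] = time.time() + self.cooldown

# Comma-separated keys from the .env file split directly into a rotation pool
rotator = KeyRotator("key1,key2,key3".split(","))
```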
Start the Streamlit app:

```bash
streamlit run main.py
```

The application will be available at http://localhost:8501

You can customize the application startup with these options:

```bash
streamlit run main.py -- --port 8502 --no-db-connection --mock-llm
```

Available options:

- `--port`: Specify a custom port (default: 8501)
- `--no-db-connection`: Run without attempting a database connection
- `--mock-llm`: Use mock LLM responses for testing without API costs
- User Query Processing:
  - The system analyzes the user's query to determine intent
  - It selects the appropriate tools based on the detected intent
  - The agent coordinates tool execution
- Document Processing:
  - Files are uploaded through the Streamlit interface
  - The file processor tool extracts content based on file type
  - Different processing strategies are applied for different formats
- Analysis & Insights:
  - The financial analysis tool extracts trends, metrics, and insights
  - Results are formatted for easy understanding
- Response Generation:
  - The agent generates a natural language response using Gemini
  - Responses incorporate insights from the tool executions
- Multilingual Support:
  - Language is detected automatically
  - Users can select their preferred language
  - Responses are translated into the selected language
- Web Research:
  - Extracts content from URLs mentioned in queries
  - Performs web searches for relevant financial information
  - Summarizes online content and provides source citations
- Data Visualization:
  - Analyzes data to determine appropriate visualization types
  - Generates charts based on financial data patterns
  - Supports multiple visualization libraries for different chart types
  - Saves visualizations for future reference
- Export & History Management:
  - Tracks session history including chat, documents, and searches
  - Exports data in multiple formats (JSON, CSV)
  - Generates specialized reports for different aspects of analysis
  - Provides access to document processing history for audit purposes
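The query-processing flow at the top of this list can be sketched end to end. The keyword-based intent detector below is a stand-in for the Gemini classification call the real agent makes, and the tool callables are placeholders:

```python
# End-to-end sketch of the agent loop: detect intent, pick a tool, run it.
# A keyword map stands in for the LLM-based intent classifier.

INTENT_KEYWORDS = {
    "summarize": ["summarize", "summary", "tl;dr"],
    "visualize": ["chart", "plot", "graph", "visualize"],
    "analyze":   ["trend", "metric", "analyze", "insight"],
}

TOOLS = {
    "summarize": lambda q: {"response": "summary of document"},
    "visualize": lambda q: {"response": "generated chart"},
    "analyze":   lambda q: {"response": "financial analysis"},
}

def detect_intent(query: str) -> str:
    lowered = query.lower()
    for intent, words in INTENT_KEYWORDS.items():
        if any(w in lowered for w in words):
            return intent
    return "analyze"  # default intent for financial queries

def handle_query(query: str) -> dict:
    intent = detect_intent(query)
    result = TOOLS[intent](query)
    result["intent"] = intent
    return result
```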
The application interface is designed for intuitive navigation:
- Chat Interface: Central area for conversation with the AI
- Sidebar: Contains settings, document management, and history
- Fixed Chat Input: Always accessible at the bottom of the screen
- Chat Settings: Language selection and session management
- Documents: Upload and manage financial documents
- Visualizations: View saved charts and graphs
- History: Browse conversation history
- Document History: Track document processing actions
- Search History: View past web searches
- Export Data: Export session data in various formats
- About: Information about the application
The application provides comprehensive export capabilities:

- Export Formats:
  - JSON: Complete data export with all details
  - CSV: Spreadsheet-friendly format for easy analysis
  - Full Archive: Complete session backup
- Report Types:
  - Chat History: Export conversations with timestamps
  - Document Summary: Summary of all processed documents
  - Visualization History: Export of charts and analysis
  - Activity Log: Complete audit trail of system usage
- Export Process:
  - Navigate to the "Export Data" tab in the sidebar
  - Select your preferred export format
  - Choose whether to include visualizations
  - Click "Export Current Session" or select a specific report
  - Download the generated file
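Serializing a chat session to the JSON and CSV formats above needs only the stdlib; the record fields shown here (`timestamp`, `role`, `message`) are illustrative, not necessarily the project's exact schema:

```python
# Sketch of session export to JSON and CSV using only the stdlib.
import csv
import io
import json

session = [
    {"timestamp": "2024-01-01T10:00", "role": "user", "message": "Summarize Q4"},
    {"timestamp": "2024-01-01T10:01", "role": "bot", "message": "Q4 revenue rose 8%."},
]

def export_json(rows) -> str:
    # Complete data export with all details
    return json.dumps(rows, indent=2)

def export_csv(rows) -> str:
    # Spreadsheet-friendly flat format
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["timestamp", "role", "message"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```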
The system supports multiple charting libraries for different visualization needs:

- Built-in Libraries:
  - Matplotlib: Standard static charts
  - Plotly: Interactive visualizations
  - Altair: Declarative charts
  - Seaborn: Statistical visualizations
- Chart Types:
  - Line charts for trend analysis
  - Bar charts for comparisons
  - Scatter plots for correlation analysis
  - Pie charts for composition analysis
  - Heatmaps for complex data patterns
  - Box plots for distribution analysis
- Dynamic Chart Selection:
  - The system automatically determines the most appropriate chart type based on data characteristics
  - Users can specify preferred chart types in their queries
  - Charts adapt to data size and dimensionality
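Dynamic chart selection amounts to a set of rules over data characteristics. The heuristic below is a sketch with made-up thresholds, not the project's actual logic, but it shows the shape of the decision:

```python
# Heuristic sketch of chart-type selection from column characteristics.
# The rules and thresholds are illustrative assumptions.

def pick_chart(numeric_cols: int, categorical_cols: int, has_time_index: bool) -> str:
    if has_time_index and numeric_cols >= 1:
        return "line"        # trends over time
    if numeric_cols >= 2 and categorical_cols == 0:
        return "scatter"     # correlation between measures
    if categorical_cols == 1 and numeric_cols == 1:
        return "bar"         # compare one measure across categories
    if categorical_cols == 1 and numeric_cols == 0:
        return "pie"         # composition of a single categorical field
    return "table"           # fall back to tabular display
```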
To add new tools:

- Create a new tool class that inherits from the `Tool` base class
- Implement the `execute` method with your tool's functionality
- Define input and output schemas
- Register the tool with the `tool_registry`
Example of a new tool implementation:

```python
from src.tools.tool_registry import Tool, tool_registry


class NewFinancialTool(Tool):
    """Tool for specialized financial analysis."""

    name = "NewFinancialTool"
    description = "Performs specialized financial analysis"
    input_schema = {
        "type": "object",
        "properties": {
            "data": {
                "type": "string",
                "description": "Financial data to analyze"
            }
        },
        "required": ["data"]
    }
    output_schema = {
        "type": "object",
        "properties": {
            "result": {
                "type": "string",
                "description": "Analysis result"
            }
        }
    }

    def execute(self, data: str) -> dict:
        """Execute the tool with the provided data."""
        result = self._analyze_data(data)
        return {"result": result}

    def _analyze_data(self, data: str) -> str:
        # Custom analysis logic goes here
        return "Analysis results"


# Register the tool
tool_registry.register(NewFinancialTool)
```
The system architecture is documented in detail in `system_architecture.md`, which contains Mermaid diagrams illustrating:
- System Architecture Diagram: High-level components and their relationships
- Block Diagram: Logical organization of the system's layers
- Workflow Diagram: Sequence of operations for typical user interactions
- Component Hierarchy: Organization and nesting of system components
- Data Flow Diagram: How data moves through the system
- Entity-Relationship Diagram: Data relationships
- State Diagram: System states during different operations
- Deployment Diagram: Physical deployment structure
To view these diagrams, open the file in a Markdown viewer that supports Mermaid, such as:
- GitHub's web interface
- VS Code with the Markdown Preview Mermaid Support extension
- Various online Mermaid viewers
The application supports API key rotation for Google Gemini to help manage rate limits and ensure uninterrupted service:
- Multiple API Keys:
  - The system can use multiple Gemini API keys
  - Keys are automatically rotated when rate limits are reached
- Adding API Keys:
  - Open `src/llm/gemini_integration.py`
  - Add your additional API keys to the `API_KEYS` list. Example:

```python
API_KEYS = [GEMINI_API_KEY, "YOUR_SECOND_API_KEY_HERE", "YOUR_THIRD_API_KEY_HERE"]
```

- Benefits:
  - Increased resilience against rate limiting
  - Higher throughput for LLM requests
  - Seamless fallback when one key reaches its quota
The chatbot supports multiple LLM providers with automatic fallback functionality:

- Available Providers:
  - Google Gemini: Primary provider for most queries
  - Groq: Used as a fallback when Gemini encounters errors or rate limits
- Setting Up Groq:
  - Create a Groq account at groq.com
  - Get your API key from the Groq console
  - Add it to your `.env` file:

```
GROQ_API_KEY=your_groq_api_key_here
```

- Provider Management:
  - The system automatically switches between providers when errors occur
  - Error detection identifies rate limits and quota issues
  - Cooldown periods are managed for each provider
- Benefits:
  - Increased reliability through provider redundancy
  - Continued operation during provider outages
  - Ability to leverage the unique strengths of each model
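The provider management described above (ordered fallback plus per-provider cooldowns) can be sketched like this; the `ProviderManager` name and the stand-in provider callables are illustrative, not the real Gemini and Groq clients:

```python
# Sketch of primary/fallback provider dispatch with cooldown on failure.
# Callables stand in for the real Gemini and Groq clients.
import time

class ProviderManager:
    def __init__(self, providers, cooldown_seconds=30.0):
        self.providers = providers  # ordered list of (name, callable) pairs
        self.cooldown = cooldown_seconds
        self._blocked_until = {}

    def generate(self, prompt: str) -> str:
        last_error = None
        for name, call in self.providers:
            if time.time() < self._blocked_until.get(name, 0.0):
                continue  # provider still cooling down
            try:
                return call(prompt)
            except Exception as exc:  # rate limit, quota, outage...
                self._blocked_until[name] = time.time() + self.cooldown
                last_error = exc
        raise RuntimeError(f"All providers failed: {last_error}")

def flaky_gemini(prompt):
    raise RuntimeError("429 rate limited")

def groq(prompt):
    return f"groq: {prompt}"

manager = ProviderManager([("gemini", flaky_gemini), ("groq", groq)])
```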
Database connection issues:

- Check your MongoDB connection string in the `.env` file
- Ensure your IP address is whitelisted in MongoDB Atlas (if using Atlas)
- The system will automatically fall back to local storage if the database is unavailable

LLM API issues:

- Verify your API keys in the `.env` file
- Check quota limits on your LLM provider accounts
- The system will attempt to use alternative providers if available

File processing issues:

- Ensure uploaded files are in supported formats
- Check file encoding (UTF-8 is recommended)
- Large files may need to be split into smaller pieces

Web search issues:

- Verify your SerpAPI key
- Check your internet connection
- Some websites may block content extraction
We welcome contributions to improve the Financial Intelligence Chatbot:

- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature-name`
- Commit your changes: `git commit -m 'Add some feature'`
- Push to the branch: `git push origin feature/your-feature-name`
- Open a Pull Request
This project is licensed under the MIT License.