Personalized arXiv Paper Recommendations with Multiple AI Models
This repository generates an enhanced daily digest of newly published arXiv papers matched to your research interests, leveraging multiple AI models (OpenAI GPT, Google Gemini, and Anthropic Claude) for relevancy ratings, detailed analysis, and topic clustering.
- Features
- Quick Start
- What This Repo Does
- Model Integrations
- Design Paper Discovery
- Output Formats
- Setting Up and Usage
- API Usage Notes
- Directory Structure
- Roadmap
- Contributing
- Multi-Model Integration: Support for OpenAI, Gemini, and Claude models for paper analysis
- Latest Models: Support for GPT-4o, GPT-4o mini, Claude 3.5, and other current models
- Two-Stage Processing: Efficient paper analysis with quick filtering followed by detailed analysis
- Enhanced Analysis: Detailed paper breakdowns including key innovations, critical analysis, and practical applications
- HTML Report Generation: Clean, organized reports saved with date-based filenames
- Adjustable Relevancy Threshold: Interactive slider for filtering papers by relevance score
- Design Automation Backend: Specialized tools for analyzing design-related papers
- Topic Clustering: Group similar papers using AI-powered clustering (Gemini)
- Robust JSON Parsing: Reliable extraction of analysis results from LLM responses
- Standardized Directory Structure: Organized codebase with `/src`, `/data`, and `/digest` directories
- Improved Web UI: Clean Gradio interface with dynamic topic selection and error handling
Try it out on Hugging Face using your own API keys.
Staying up to date on arXiv papers is time-consuming, with hundreds of new papers published daily. Even with the official daily digest service, categories like cs.AI still contain 50-100 papers per day.
This repository creates a personalized daily digest by:
- Crawling arXiv for recent papers in your areas of interest
- Analyzing papers in-depth using AI models (OpenAI, Gemini, or Claude)
- Two-stage processing for efficiency:
- Stage 1: Quick relevancy filtering using only title and abstract
- Stage 2: Detailed analysis of papers that meet the relevancy threshold
- Scoring relevance on a scale of 1-10 based on your research interests
- Providing detailed analysis of each paper, including:
- Key innovations
- Critical analysis
- Implementation details
- Practical applications
- Related work
- Generating reports in HTML format with clean organization
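As a rough illustration of the two-stage flow above (the function names and the keyword-overlap scorer are hypothetical stand-ins for the repo's actual LLM calls):

```python
# Illustrative sketch of the two-stage pipeline; names are hypothetical.

def quick_score(paper, interests):
    """Stage 1: cheap relevancy score from title + abstract only.

    The real pipeline asks an LLM for a 1-10 rating; simple keyword
    overlap stands in here so the sketch runs without an API key.
    """
    text = (paper["title"] + " " + paper["abstract"]).lower()
    hits = sum(1 for kw in interests if kw.lower() in text)
    return min(10, 2 * hits)

def run_digest(papers, interests, threshold=2):
    selected = []
    for paper in papers:
        score = quick_score(paper, interests)
        if score >= threshold:
            # Stage 2: run the expensive detailed analysis only for
            # papers that passed the cheap filter.
            selected.append(dict(paper, score=score,
                                 analysis="(detailed LLM analysis here)"))
    return sorted(selected, key=lambda p: p["score"], reverse=True)
```

The point of the split is cost: the detailed analysis is only ever run on the small subset of papers that clear the threshold.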
The system supports three major AI providers:
- OpenAI GPT (gpt-3.5-turbo-16k, gpt-4, gpt-4-turbo, gpt-4o, gpt-4o-mini)
- Google Gemini (gemini-1.5-flash, gemini-1.5-pro, gemini-2.0-flash)
- Anthropic Claude (claude-3-haiku, claude-3-sonnet, claude-3-opus, claude-3.5-sonnet)
You can use any combination of these models, allowing you to compare results or choose based on your needs.
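A minimal sketch of how such provider dispatch could work (the stub analyzers below stand in for real API calls, and the repo's `model_manager.py` may be organized differently):

```python
# Hypothetical provider-dispatch sketch; not the repo's actual API.

def analyze_with(provider, prompt, analyzers):
    """Route a prompt to the chosen provider's analyzer callable."""
    try:
        return analyzers[provider](prompt)
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}; "
                         f"choose from {sorted(analyzers)}")

# Stubs stand in for real OpenAI / Gemini / Claude SDK calls.
analyzers = {
    "openai": lambda p: f"[gpt-4o] {p}",
    "gemini": lambda p: f"[gemini-1.5-pro] {p}",
    "claude": lambda p: f"[claude-3.5-sonnet] {p}",
}
```

Keeping each provider behind a uniform callable is what makes it easy to swap models or run the same paper through several providers for comparison.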
Reports are generated in multiple formats:
- HTML Reports: Clean, organized reports saved to the `/digest` directory with date-based filenames
- Console Output: Summary information displayed in the terminal
- JSON Data: Raw paper data saved to the `/data` directory
Every HTML report includes:
- Paper title, authors, and link to arXiv
- Relevancy score with explanation
- Abstract and key innovations
- Critical analysis and implementation details
- Experiments, results, and discussion points
- Related work and practical applications
Example HTML report:
Modify `config.yaml` with your preferences:
```yaml
# Main research area
topic: "Computer Science"

# Specific categories to monitor
categories: ["Artificial Intelligence", "Computation and Language", "Machine Learning", "Information Retrieval"]

# Minimum relevance score (1-10)
threshold: 2

# Your research interests in natural language
interest: |
  1. AI alignment and AI safety
  2. Mechanistic interpretability and explainable AI
  3. Large language model optimization
  4. RAGs, Information retrieval
  5. AI Red teaming, deception and misalignment
```
To run locally with the simplified UI:
- Install requirements: `pip install -r requirements.txt`
- Run the app: `python src/app_new.py`
- Open the URL displayed in your terminal
- Enter your API key(s) and configure your preferences
- Use the relevancy threshold slider to adjust paper filtering (default is 2)
To set up automated daily digests:
- Fork this repository
- Update `config.yaml` with your preferences
- Set the following secrets in your repository settings: `OPENAI_API_KEY` (and/or `GEMINI_API_KEY` or `ANTHROPIC_API_KEY`)
- The GitHub Action will run on schedule or can be triggered manually
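For reference, a minimal scheduled workflow could look like the sketch below; the filename, cron schedule, and entry point are illustrative, so check the repository's own file under `.github/workflows/` for the authoritative version:

```yaml
# .github/workflows/daily-digest.yml (illustrative sketch, not the repo's actual workflow)
name: daily-digest
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
  workflow_dispatch: {}    # allow manual triggering
jobs:
  digest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      # The CI entry point may differ from the local UI script.
      - run: python src/app_new.py
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```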
For advanced users:
```bash
# Regular paper digests with simplified UI
python src/app_new.py

# Design paper finder
./src/design/find_design_papers.sh --days 7 --analyze
```
This tool respects arXiv's robots.txt and implements proper rate limiting. If you encounter 403 Forbidden errors:
- Wait a few hours before trying again
- Consider reducing the number of categories you're fetching
- Increase the delay between requests in the code
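One way to implement that kind of politeness, sketched here with an injectable `opener` so it can be exercised without network access (this is an illustration, not the repo's actual crawler code):

```python
import time
import urllib.error
import urllib.request

def fetch_with_backoff(url, delay=3.0, retries=3, opener=urllib.request.urlopen):
    """Fetch a URL politely: sleep before each attempt, retry on HTTP 403
    with a growing delay, and re-raise any other HTTP error immediately."""
    last = None
    for attempt in range(retries):
        time.sleep(delay * (attempt + 1))  # linear backoff between attempts
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 403:
                raise
            last = e
    raise RuntimeError(f"Still blocked (403) after {retries} attempts: {url}") from last
```

Passing `opener` as a parameter also makes the retry logic easy to unit-test with a fake response.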
The repository is organized as follows:
- `/src` - All Python source code
  - `app_new.py` - Simplified interface with improved threshold handling and UI
  - `download_new_papers.py` - arXiv crawler
  - `relevancy.py` - Paper scoring and analysis with robust JSON parsing
  - `model_manager.py` - Multi-model integration
  - `gemini_utils.py` - Gemini API integration
  - `anthropic_utils.py` - Claude API integration
  - `design/` - Design automation tools
  - `paths.py` - Standardized path handling
- `/data` - JSON data files (auto-created)
- `/digest` - HTML report files (auto-created)
- Support multiple AI models (OpenAI, Gemini, Claude)
- Generate comprehensive HTML reports with date-based filenames
- Specialized analysis for design automation papers
- Topic clustering via Gemini
- Standardized directory structure
- Enhanced HTML reports with detailed analysis sections
- Pre-filtering of arXiv categories for efficiency
- Adjustable relevancy threshold with UI slider
- Robust JSON parsing for reliable LLM response handling
- Simplified UI focused on core functionality
- Dynamic topic selection UI with improved error handling
- Support for newer models (GPT-4o, GPT-4o mini, Claude 3.5)
- Two-stage paper processing for efficiency (quick filtering followed by detailed analysis)
- Removed email functionality in favor of local HTML reports
- Full PDF content analysis
- Author-based ranking and filtering
- Fine-tuned open-source model support: Ollama, LocalAI...
You're encouraged to modify this code for your personal needs. If your modifications would be useful to others, please submit a pull request.
Valuable contributions include:
- Additional AI model integrations
- New analysis capabilities
- UI improvements
- Prompt engineering enhancements