- https://github.com/i-krishna/AI-Agents_LLMs/blob/main/fine-tune-deepseek-medical-data.py
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Tools:
Unsloth (efficient LLM fine-tuning)
Hugging Face Transformers & Datasets
Weights & Biases for experiment tracking
PyTorch for auxiliary tasks
Kaggle Notebooks for free GPU access
- Setup instructions:
Activate GPU in Kaggle
Obtain API tokens for Weights & Biases and Hugging Face
Store them securely in Kaggle's secrets manager
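The token-retrieval step above can be sketched as a small helper. Note the secret labels (`WANDB_API_KEY`, `HF_TOKEN`) are placeholders for whatever names you chose in Kaggle's secrets manager, and the helper itself is an illustration, not part of the linked script:

```python
def fetch_tokens(secrets_client, labels=("WANDB_API_KEY", "HF_TOKEN")):
    """Pull each secret from a provider exposing get_secret(label) -> str,
    e.g. kaggle_secrets.UserSecretsClient() inside a Kaggle notebook.
    Fails early if any secret is missing or empty."""
    tokens = {label: secrets_client.get_secret(label) for label in labels}
    missing = [k for k, v in tokens.items() if not v]
    if missing:
        raise RuntimeError(f"Missing secrets: {missing}")
    return tokens

# Inside a Kaggle notebook you would then log in (not run here):
#   from kaggle_secrets import UserSecretsClient
#   import wandb
#   from huggingface_hub import login
#   tokens = fetch_tokens(UserSecretsClient())
#   wandb.login(key=tokens["WANDB_API_KEY"])
#   login(token=tokens["HF_TOKEN"])
```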
- https://github.com/i-krishna/AI-Agents_LLMs/blob/main/fine-tune-llm.py
- Fine-tuning adjusts the internal parameters (weights/biases) of a pre-trained LLM to specialize it for a specific task (e.g., GPT-3 → ChatGPT).
Base (e.g., GPT-3): general-purpose text completion. Fine-tuned (e.g., text-davinci-003): instruction-aligned and more practical.
Example: the 1.3B-parameter InstructGPT outperforms 175B GPT-3 on instruction-following tasks
- Self-Supervised Learning: Predict the next token from curated text
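The next-token objective can be sketched in PyTorch; the model producing the logits is assumed, and only the loss computation is shown:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, token_ids):
    """Self-supervised objective: predict token t+1 from tokens <= t.

    logits:    (batch, seq_len, vocab) model outputs
    token_ids: (batch, seq_len) the input text itself -- no labels needed,
               the targets are just the inputs shifted left by one.
    """
    shift_logits = logits[:, :-1, :]   # last position has nothing to predict
    shift_targets = token_ids[:, 1:]   # first token has no preceding context
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_targets.reshape(-1),
    )
```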
- Supervised Learning: Train on labeled input-output pairs
- Reinforcement Learning from Human Feedback (RLHF): human feedback → reward model → PPO fine-tuning
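The reward-model step of that pipeline can be sketched with the standard Bradley-Terry preference loss; the PPO stage is omitted, and the tensor names here are illustrative:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss for training an RLHF reward model: push the
    reward of the human-preferred response above the rejected one.
    The trained reward model then scores rollouts during PPO
    fine-tuning (PPO loop not shown here)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```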
- Choose task
- Prepare dataset - https://github.com/i-krishna/AI-Agents_LLMs/blob/main/fine-tune-llm.py#L22
- Select base model
- Fine-tune
- Evaluate
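The steps above can be sketched end to end on a synthetic task; the "base model" and data here are toy stand-ins (a linear probe on random points), not a real pre-trained LLM:

```python
import torch

torch.manual_seed(0)

# 1. Task: binary classification.  2. Dataset: linearly separable toy points.
X = torch.randn(200, 4)
y = (X[:, 0] + X[:, 1] > 0).float()

# 3. "Base model": a linear layer standing in for a pre-trained backbone.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.BCEWithLogitsLoss()

def evaluate():
    with torch.no_grad():
        preds = (model(X).squeeze(-1) > 0).float()
        return (preds == y).float().mean().item()

acc_before = evaluate()

# 4. Fine-tune: full-batch gradient steps on the labeled pairs.
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()

# 5. Evaluate again after training.
acc_after = evaluate()
```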
- Full Training: Update all model weights
- Transfer Learning: Update only the final layers
- PEFT (e.g., LoRA): Freeze base weights, inject small trainable layers
Dramatically reduces trainable parameters (e.g., 1M → 4K), improving efficiency
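A from-scratch sketch of the LoRA idea in plain PyTorch (not the Unsloth or PEFT API): a 1000×1000 linear layer has 1M frozen weights, and a rank-2 update adds only 4K trainable parameters, matching the reduction quoted above:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x).  Only A and B are trained."""
    def __init__(self, base: nn.Linear, r: int = 2, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pre-trained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

base = nn.Linear(1000, 1000, bias=False)     # 1,000,000 frozen weights
lora = LoRALinear(base, r=2)
trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
# A is 2x1000 and B is 1000x2, so 4,000 trainable vs 1,000,000 frozen
```

Because B starts at zero, the wrapped layer initially computes exactly what the base layer does, so fine-tuning begins from the pre-trained behavior.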
Example – DistilBERT Sentiment Classifier:
Model: distilbert-base-uncased
Task: Binary sentiment classification
Steps: Tokenization, formatting, padding, accuracy metric
The base model scores ~50% accuracy (random chance for binary classification)
After fine-tuning, training accuracy improves, though with some overfitting; real-world sentiment prediction improves slightly
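The accuracy-metric step of that example can be sketched in the `compute_metrics(eval_pred)` shape that Hugging Face's `Trainer` expects; the tokenizer, padding, and model setup are omitted here:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy metric for the Trainer: it receives (logits, labels)
    and returns a dict of named metrics."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)        # predicted class per example
    return {"accuracy": float((preds == labels).mean())}
```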
- https://github.com/i-krishna/AI-Agents_LLMs/blob/main/ai_agent_researchpaper_replication.py
If AI can read papers, understand them, write and test code for them, and evaluate the results…
then we're heading toward AI improving AI, which could accelerate innovation at a pace faster than humans alone can achieve.
An AI Agent is an autonomous system that perceives its environment, processes information, and takes actions to achieve specific goals. In AI research, these agents can read papers, write code, run experiments, and even innovate.
How AI Agents Conduct AI Research (4-Step Process)
- Agent Submission
Given research papers to replicate (e.g., OpenAI's PaperBench https://cdn.openai.com/papers/22265bac-3191-44e5-b057-7aaacd8e90cd/paperbench.pdf).
- Reproduction Execution
Develops codebases to reproduce paper results.
- Automated Grading
An LLM judge (e.g., GPT-4) scores replication accuracy (see also https://github.com/google/automl).
- Performance Analysis
Evaluates whether agents can replicate, and improve on, the research.
- AI-Agents_LLMs/chat.py
- AI-Agents_LLMs/agents.py