- https://github.com/i-krishna/AI-Agents_LLMs/blob/main/fine-tune-deepseek-medical-data.py
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Tools:
Unsloth (efficient LLM fine-tuning)
Hugging Face Transformers & Datasets
Weights & Biases for experiment tracking
PyTorch for auxiliary tasks
Kaggle Notebooks for free GPU access
- Setup instructions:
Activate GPU in Kaggle
Obtain API tokens for Weights & Biases and Hugging Face
Store them securely in Kaggle's secrets manager
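The token-retrieval step above can be sketched as a small helper. Note the secret labels (`WANDB_API_KEY`, `HF_TOKEN`) are placeholders for whatever names you chose in Kaggle's secrets manager, and the helper itself is an illustration, not part of the linked script:

```python
def fetch_tokens(secrets_client, labels=("WANDB_API_KEY", "HF_TOKEN")):
    """Pull each secret from a provider exposing get_secret(label) -> str,
    e.g. kaggle_secrets.UserSecretsClient() inside a Kaggle notebook.
    Fails early if any secret is missing or empty."""
    tokens = {label: secrets_client.get_secret(label) for label in labels}
    missing = [k for k, v in tokens.items() if not v]
    if missing:
        raise RuntimeError(f"Missing secrets: {missing}")
    return tokens

# Inside a Kaggle notebook you would then log in (not run here):
#   from kaggle_secrets import UserSecretsClient
#   import wandb
#   from huggingface_hub import login
#   tokens = fetch_tokens(UserSecretsClient())
#   wandb.login(key=tokens["WANDB_API_KEY"])
#   login(token=tokens["HF_TOKEN"])
```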
- https://github.com/i-krishna/AI-Agents_LLMs/blob/main/fine-tune-llm.py
- Fine-tuning adjusts the internal parameters (weights/biases) of a pre-trained LLM to specialize it for a specific task (e.g., GPT-3 → ChatGPT).
Base (e.g., GPT-3): general-purpose text completion. Fine-tuned (e.g., text-davinci-003): instruction-aligned and more practical.
Example: the 1.3B-parameter InstructGPT outperforms 175B GPT-3 on instruction-following tasks
- Self-Supervised Learning: Predict the next token from curated text
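The next-token objective can be sketched in PyTorch; the model producing the logits is assumed, and only the loss computation is shown:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, token_ids):
    """Self-supervised objective: predict token t+1 from tokens <= t.

    logits:    (batch, seq_len, vocab) model outputs
    token_ids: (batch, seq_len) the input text itself -- no labels needed,
               the targets are just the inputs shifted left by one.
    """
    shift_logits = logits[:, :-1, :]   # last position has nothing to predict
    shift_targets = token_ids[:, 1:]   # first token has no preceding context
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_targets.reshape(-1),
    )
```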
- Supervised Learning: Train on labeled input-output pairs
- Reinforcement Learning from Human Feedback (RLHF): human feedback → reward model → PPO fine-tuning
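The reward-model step of that pipeline can be sketched with the standard Bradley-Terry preference loss; the PPO stage is omitted, and the tensor names here are illustrative:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss for training an RLHF reward model: push the
    reward of the human-preferred response above the rejected one.
    The trained reward model then scores rollouts during PPO
    fine-tuning (PPO loop not shown here)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```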
- Choose task
- Prepare dataset - https://github.com/i-krishna/AI-Agents_LLMs/blob/main/fine-tune-llm.py#L22
- Select base model
- Fine-tune
- Evaluate
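The steps above can be sketched end to end on a synthetic task; the "base model" and data here are toy stand-ins (a linear probe on random points), not a real pre-trained LLM:

```python
import torch

torch.manual_seed(0)

# 1. Task: binary classification.  2. Dataset: linearly separable toy points.
X = torch.randn(200, 4)
y = (X[:, 0] + X[:, 1] > 0).float()

# 3. "Base model": a linear layer standing in for a pre-trained backbone.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.BCEWithLogitsLoss()

def evaluate():
    with torch.no_grad():
        preds = (model(X).squeeze(-1) > 0).float()
        return (preds == y).float().mean().item()

acc_before = evaluate()

# 4. Fine-tune: full-batch gradient steps on the labeled pairs.
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()

# 5. Evaluate again after training.
acc_after = evaluate()
```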
- Full Training: Update all model weights
- Transfer Learning: Update only the final layers
- PEFT (e.g., LoRA): Freeze base weights, inject small trainable layers
Dramatically reduces trainable parameters (e.g., 1M → 4K), improving efficiency
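A from-scratch sketch of the LoRA idea in plain PyTorch (not the Unsloth or PEFT API): a 1000×1000 linear layer has 1M frozen weights, and a rank-2 update adds only 4K trainable parameters, matching the reduction quoted above:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x).  Only A and B are trained."""
    def __init__(self, base: nn.Linear, r: int = 2, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pre-trained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

base = nn.Linear(1000, 1000, bias=False)     # 1,000,000 frozen weights
lora = LoRALinear(base, r=2)
trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
# A is 2x1000 and B is 1000x2, so 4,000 trainable vs 1,000,000 frozen
```

Because B starts at zero, the wrapped layer initially computes exactly what the base layer does, so fine-tuning begins from the pre-trained behavior.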
Example – DistilBERT Sentiment Classifier:
Model: distilbert-base-uncased
Task: Binary sentiment classification
Steps: Tokenization, formatting, padding, accuracy metric
The base model scores ~50% accuracy (random chance for binary classification)
After fine-tuning, training accuracy improves, though with some overfitting; real-world sentiment prediction improves slightly
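The accuracy-metric step of that example can be sketched in the `compute_metrics(eval_pred)` shape that Hugging Face's `Trainer` expects; the tokenizer, padding, and model setup are omitted here:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy metric for the Trainer: it receives (logits, labels)
    and returns a dict of named metrics."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)        # predicted class per example
    return {"accuracy": float((preds == labels).mean())}
```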
- https://github.com/i-krishna/AI-Agents_LLMs/blob/main/ai_agent_researchpaper_replication.py
If AI can read papers, understand them, write and test code for them, and evaluate the results…
then we're heading toward AI improving AI, which could accelerate innovation at a pace faster than humans alone can achieve.
An AI Agent is an autonomous system that perceives its environment, processes information, and takes actions to achieve specific goals. In AI research, these agents can read papers, write code, run experiments, and even innovate.
How AI Agents Conduct AI Research (4-Step Process)
- Agent Submission
Given research papers to replicate (e.g., OpenAI's PaperBench https://cdn.openai.com/papers/22265bac-3191-44e5-b057-7aaacd8e90cd/paperbench.pdf).
- Reproduction Execution
Develops codebases to reproduce paper results.
- Automated Grading
An LLM judge (e.g., GPT-4) scores replication accuracy (see also https://github.com/google/automl).
- Performance Analysis
Evaluates whether agents can replicate, and improve on, the research.
- AI-Agents_LLMs/chat.py
- AI-Agents_LLMs/agents.py