This repository provides a RESTful API, built with Flask, for interacting with NGILlama3, a custom language model based on the Llama architecture. The API performs text generation using a pre-trained language model from Hugging Face and is designed for a range of natural language processing tasks.
- Text Generation: Generate responses based on user input.
- Custom Model: The API uses the NGILlama3 model, a fine-tuned version of Llama, for improved natural language understanding and generation.
- Hugging Face Integration: Uses the Hugging Face `transformers` library for easy access to pre-trained models and tokenizers.
To use the application, ensure the following dependencies are installed:
- Docker: Required for running the application in a containerized environment.
- Python 3.8+: If running the application outside of Docker, you need Python and the associated libraries.
- Clone this repository:

  ```bash
  git clone https://github.com/HeReFanMi/NGI_LLM.git
  cd NGI_LLM
  ```
- Build the Docker image:

  ```bash
  docker build -t ngillama3-flask-api .
  ```
Once the image is built, you can run the container using:

```bash
docker run -d -p 5002:5002 ngillama3-flask-api
```

This will start the Flask application on port 5002 inside the container and expose it to your host machine.
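For reference, a representative Dockerfile for this setup might look like the sketch below. This is not necessarily the file shipped in the repository; the `app.py` entry point and the `requirements.txt` file are assumptions.

```dockerfile
# Sketch of a Dockerfile for the Flask API; the repository's actual file may differ.
FROM python:3.10-slim

WORKDIR /app

# Install Python dependencies first to take advantage of Docker layer caching.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (entry point assumed to be app.py).
COPY . .
ENV FLASK_APP=app.py

EXPOSE 5002
CMD ["flask", "run", "--host=0.0.0.0", "--port=5002"]
```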
The Flask API exposes a single endpoint:

- `POST /predict`: Takes a JSON payload with a text input and returns a generated response.
Make a POST request to `/predict` with the following JSON payload:

```json
{
  "text": "Your input text here."
}
```
The API will respond with a JSON object containing the generated text:
```json
{
  "response": "The model-generated text here."
}
```
Example request using curl (note that this example sends a retrieval-style payload of text chunks plus a question, rather than the single `text` field shown above):

```bash
curl -X POST http://127.0.0.1:5002/predict \
  -H "Content-Type: application/json" \
  -d '{"chunks": ["A new study has shown that regular exercise can help reduce the risk of chronic diseases such as diabetes and heart disease.", "Research also indicates that physical activity improves mental health and overall quality of life."], "question": "What are the health benefits of regular exercise?"}'
```
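For orientation, here is a minimal sketch of how a `/predict` handler matching the documented `text` payload could be implemented. It is illustrative only, not the repository's actual code; the model name is taken from the Model Details section below, and the generation parameters are arbitrary defaults.

```python
# Minimal sketch of a /predict handler; the repository's implementation may differ.
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

MODEL_NAME = "a-hamdi/NGILlama3-merged"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

@app.route("/predict", methods=["POST"])
def predict():
    # Read the "text" field from the incoming JSON payload.
    text = request.get_json().get("text", "")
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return jsonify({"response": response})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5002)
```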
- Model Name: `a-hamdi/NGILlama3-merged`
- Architecture: Fine-tuned Llama model.
- Hugging Face Model: [NGILlama3-merged](https://huggingface.co/a-hamdi/NGILlama3-merged) on Hugging Face
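For a quick sanity check outside the API, the model can be loaded directly with the `transformers` library. A minimal sketch follows; the prompt and generation length are illustrative, not the API's settings.

```python
# Minimal sketch: load NGILlama3-merged directly via the transformers pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="a-hamdi/NGILlama3-merged")
result = generator("What are the health benefits of regular exercise?", max_new_tokens=128)
print(result[0]["generated_text"])
```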
To run the application locally without Docker, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/HeReFanMi/NGI_LLM.git
  cd NGI_LLM
  ```
- Set up the Conda environment:

  ```bash
  conda create --name unsloth_env python=3.10 pytorch-cuda=11.8 pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers -y
  conda activate unsloth_env
  ```
- Install the required Python dependencies:

  ```bash
  pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
  pip install --no-deps "trl<0.9.0" peft accelerate bitsandbytes
  ```
- Run the Flask app:

  ```bash
  flask run --port 5002
  ```

  By default, `flask run` serves on port 5000, so pass `--port 5002` to match the Docker setup. If Flask cannot locate the application, set the `FLASK_APP` environment variable to the application module.

This will start the application at http://127.0.0.1:5002.
The application relies on the following Python libraries:
- `transformers==4.33.2`: Hugging Face Transformers library for working with pre-trained models.
- `torch==2.0.1`: PyTorch for model inference.
- `flask==2.3.2`: Flask web framework for building the API.
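If you prefer to pin these with pip, a minimal `requirements.txt` matching the versions above would look like this (the repository may ship its own file):

```text
transformers==4.33.2
torch==2.0.1
flask==2.3.2
```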
- Model Loading Issues: Ensure the model is available on Hugging Face and the internet connection is stable.
- Out of Memory Errors: If you are running the app locally and hit memory limits, consider using a machine with a more powerful GPU or reducing the model's memory footprint, for example by loading it in 8-bit (see the sketch below).
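Since `bitsandbytes` is installed in the Conda setup above, one way to shrink the footprint is 8-bit loading. A minimal sketch, not the repository's actual loading code:

```python
# Sketch: load the model in 8-bit to reduce GPU memory usage.
# Assumes bitsandbytes is available (installed in the Conda setup above).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "a-hamdi/NGILlama3-merged"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # spread layers across available devices
)
```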