FastAPI Goose3 HTML Content Extractor

This project is a REST API service built with FastAPI that extracts and returns cleaned text from HTML content using the Goose3 library.

Features

Accepts HTML content via a POST request.
Extracts and returns the main text content from the HTML.
Simple and fast implementation using FastAPI and Goose3.

Requirements

Python 3.8+
FastAPI
Goose3
Uvicorn

Installation

Clone the repository:

```
git clone https://github.com/rbehzadan/extract-text-api.git
cd extract-text-api
```

Create a virtual environment and activate it:

```
python -m venv venv
source venv/bin/activate
```

Install the dependencies:
```
pip install -r requirements.txt
```

Running the Application

Start the FastAPI application using Uvicorn:

```
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```

Usage

Send a POST request to /extract-text with a JSON body whose content field holds the HTML:

```
POST /extract-text
Content-Type: application/json

{
  "content": "<html><body><h1>Sample Article</h1><p>This is a sample paragraph.</p></body></html>"
}
```

The API will return a JSON response with the cleaned text:

```
{
  "text": "Sample Article\nThis is a sample paragraph."
}
```
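
The request and response above can be exercised from Python with only the standard library. This is a minimal client sketch, not part of the project: the helper names `build_request` and `extract_text` are hypothetical, and the base URL `http://localhost:8000` assumes the service was started locally with the Uvicorn command shown earlier.

```python
import json
import urllib.request


def build_request(html, base_url="http://localhost:8000"):
    """Build the POST request for /extract-text (base_url is an assumption)."""
    # The API expects a JSON object with the HTML under the "content" key.
    body = json.dumps({"content": html}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/extract-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(html, base_url="http://localhost:8000"):
    """Send the HTML to the service and return the "text" field of the reply."""
    request = build_request(html, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["text"]
```

With the service running, calling `extract_text("<html><body><p>Hello</p></body></html>")` should return the extracted text as a string. Splitting request construction from sending keeps the payload format easy to inspect without a live server.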

Docker

To run the application using Docker:

Build the Docker image:
```
docker build -t extract-text-api .
```

Run the Docker container:

```
docker run -d -p 8000:8080 extract-text-api
```

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any changes.
