rbehzadan/extract-text-api

FastAPI Goose3 HTML Content Extractor

This project is a REST API service built with FastAPI that extracts and returns cleaned text from HTML content using the Goose3 library.

Features

  • Accepts HTML content via a POST request.
  • Extracts and returns the main text content from the HTML.
  • Simple and fast implementation using FastAPI and Goose3.

Requirements

  • Python 3.8+
  • FastAPI
  • Goose3
  • Uvicorn

Installation

  1. Clone the repository:

    git clone https://github.com/rbehzadan/extract-text-api.git
    cd extract-text-api
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate
  3. Install the dependencies:

    pip install -r requirements.txt

Running the Application

Start the FastAPI application using Uvicorn:

uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Usage

Send a POST request to /extract-text with a JSON body whose content field holds the HTML:

POST /extract-text
Content-Type: application/json

{
  "content": "<html><body><h1>Sample Article</h1><p>This is a sample paragraph.</p></body></html>"
}

The API returns a JSON response with the cleaned text in the text field:

{
  "text": "Sample Article\nThis is a sample paragraph."
}
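The same request can be sent from Python using only the standard library. The sketch below assumes the service is reachable at http://localhost:8000 (the port from the Uvicorn command above); the helper names build_request and extract_text are hypothetical and not part of this project.

```python
import json
import urllib.request


def build_request(html: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /extract-text without sending it."""
    payload = json.dumps({"content": html}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/extract-text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(html: str, base_url: str = "http://localhost:8000", timeout: float = 10.0) -> str:
    """Send the HTML to the running service and return the "text" field."""
    request = build_request(html, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["text"]
```

With the server running, extract_text("<html><body><p>Hello</p></body></html>") returns the string from the response's text field. Building the request in a separate function makes the payload easy to inspect or test without a live server.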

Docker

To run the application using Docker:

  1. Build the Docker image:

    docker build -t extract-text-api .
  2. Run the Docker container:

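    docker run -d -p 8000:8080 extract-text-api

    The -p 8000:8080 flag maps host port 8000 to container port 8080, so the image is assumed to listen on 8080 rather than the 8000 used in the Uvicorn command above.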

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any changes.
