The Bookshelf Scanner App is an end-to-end solution for detecting books from an image of a bookshelf and identifying their titles and authors. It leverages cutting-edge computer vision (YOLO segmentation) and Large Language Models (Moondream2) to deliver a seamless user experience via a modern web interface.
This repository contains three main components:
- AI (Python, Poetry):
  - Purpose: Performs book segmentation on the bookshelf image using a YOLO model. Each detected book spine is then cropped and, if necessary, rotated to the correct orientation.
  - LLM Inference (Moondream2): After segmentation, the AI service uses Moondream2 to extract the book title and author from each cropped book image. The Moondream2 model is fine-tuned and quantized for faster inference.
- Backend (Python, FastAPI, Poetry):
  - Purpose: Provides an HTTP API (POST /api/predict) that accepts an uploaded image file and streams the prediction results. The first response chunk contains the segmented image encoded as base64, followed by incremental chunks of recognized book titles and authors.
  - Features:
    - Asynchronous API for efficient streaming of inference results.
    - FastAPI-based implementation for scalability and ease of deployment.
- Frontend (Angular, Bun):
  - Purpose: Offers a simple UI where users can upload an image of a bookshelf.
  - Features:
    - Displays the segmented image returned by the backend.
    - Dynamically shows the recognized books’ titles and authors as they stream from the backend.
  - Tech Stack: Angular for the SPA, Bun as the runtime and package manager for better performance.
- The user uploads an image via the Frontend.
- The Backend receives the image and sends it to the AI service.
- The AI processes the image:
  - Segments the bookshelf image using YOLO.
  - Extracts each book spine.
  - Uses the Moondream2 model to recognize the title and author from each spine.
- The Backend streams these results back to the Frontend, starting with the segmented image, followed by the books’ data.
- The Frontend updates the UI in real time as the stream arrives.
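The streaming order described above (segmented image first, then one entry per recognized book) can be sketched with a plain async generator. This is a minimal illustration, not the project's actual code: `stream_results`, `collect`, the one-based index, and the plain-text chunk layout are assumptions, and `asyncio.sleep(0)` stands in for awaiting model inference.

```python
import asyncio
import base64
from typing import AsyncIterator, Iterable, List, Tuple


async def stream_results(
    segmented_image: bytes,
    books: Iterable[Tuple[str, str]],
) -> AsyncIterator[str]:
    """Yield the segmented image first, then one line per recognized book."""
    # First chunk: the segmented image, base64-encoded so it travels as text.
    yield base64.b64encode(segmented_image).decode("ascii")

    # Following chunks: one entry per book, emitted as soon as it is ready.
    for index, (title, author) in enumerate(books, start=1):
        await asyncio.sleep(0)  # placeholder for awaiting the recognition step
        yield f"Book {index}: {title} by {author}"


async def collect(segmented_image: bytes, books: Iterable[Tuple[str, str]]) -> List[str]:
    """Drain the stream into a list (handy for experiments and tests)."""
    return [chunk async for chunk in stream_results(segmented_image, books)]


if __name__ == "__main__":
    for chunk in asyncio.run(collect(b"fake-image-bytes", [("Dune", "Frank Herbert")])):
        print(chunk)
```

In a FastAPI application, an async generator like this could be handed to `StreamingResponse`, so each chunk is sent as soon as it is produced and the response is never buffered whole.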
- Image Upload: The user uploads an image of a bookshelf.
- Preprocessing: The image size is reduced to 2560 pixels, then the contrast and brightness are increased and noise is removed.
- Segmentation: The image is sent to the AI service, which segments the books from the background using the YOLO 11x segmentation model.
- Book Extraction: Each segmented book spine is cropped and rotated if necessary.
- Recognition: Each individual cropped book image is sent to the Moondream2 model for title and author recognition.
- Streaming: The backend streams the segmented image and the recognized book titles and authors back to the frontend. The first chunk contains the segmented image, followed by incremental chunks of book data as each book is recognized, in the format Book {index}: {title} by {author}.
- Display: The frontend displays the segmented image and dynamically updates the recognized book titles and authors as they arrive.
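The resize step in Preprocessing comes down to a little arithmetic. The sketch below assumes that "2560 pixels" caps the longer side of the image and that the aspect ratio is preserved. The text does not say which dimension is meant, and `fit_within` is a hypothetical helper. The contrast, brightness, and denoising steps are left out because they depend on the imaging library in use.

```python
from typing import Tuple

MAX_SIDE = 2560  # pixel limit taken from the Preprocessing step


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough: never upscale.
        return width, height
    # Scale both sides by the same factor to preserve the aspect ratio.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(fit_within(5120, 2880))  # landscape photo, halved
    print(fit_within(3000, 4000))  # portrait photo
    print(fit_within(1280, 720))   # small image, unchanged
```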
You need the following software installed to run the project:
- Git
- Python and Poetry (AI and backend)
- Bun (frontend)
Clone the repository:
git clone https://github.com/suxrobGM/bookshelf-scanner.git
cd bookshelf-scanner
- Navigate to the backend directory:
  cd ./backend
- Install Python dependencies with Poetry:
  If you haven't installed Poetry yet, follow the official Poetry installation instructions. Optionally, configure Poetry to create virtual environments inside the project directory, which keeps each project's dependencies isolated:
  poetry config virtualenvs.in-project true
  Then install the dependencies:
  poetry install
- Run the Backend API:
  poetry run fastapi dev src/main.py
  The API should now be accessible at http://localhost:8000/docs.
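Once the backend is running, the endpoint can also be exercised from a script instead of the UI. The following is a rough sketch using only the standard library. The form field name `file` and the newline-delimited chunk framing are assumptions, so check the generated docs page for the real parameter name. `build_upload_request` and `stream_predictions` are hypothetical helpers.

```python
import urllib.request
import uuid
from typing import Iterator

API_URL = "http://localhost:8000/api/predict"  # dev server address from the step above


def build_upload_request(
    image_bytes: bytes,
    filename: str,
    field_name: str = "file",  # assumed form field name
) -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one image file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def stream_predictions(request: urllib.request.Request) -> Iterator[str]:
    """Send the request and yield response lines as they arrive.

    Assumes the server separates chunks with newlines.
    """
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    with open("bookshelf.jpg", "rb") as image_file:  # any local bookshelf photo
        upload = build_upload_request(image_file.read(), "bookshelf.jpg")
    for line in stream_predictions(upload):
        print(line[:80])  # the first line may be a long base64 string
```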
- Navigate to the frontend directory:
  cd ./frontend
- Install Dependencies:
  bun install
- Run the Frontend Dev Server:
  bun run start
  The UI should now be accessible at http://localhost:8001.
- Open the Frontend in your browser.
- Upload an image of your bookshelf.
- Wait for the segmented image to appear, followed by streaming book titles and authors as they’re recognized.
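When the stream is consumed outside the UI, each book entry in the Book {index}: {title} by {author} format can be split back into fields. The parser below is a small sketch. `Book` and `parse_book_line` are hypothetical names, and the pattern assumes one entry per line. Because the title group is greedy, the split happens at the last " by ", so titles that themselves contain "by" are kept intact.

```python
import re
from typing import NamedTuple, Optional


class Book(NamedTuple):
    index: int
    title: str
    author: str


# Greedy title group: the final " by " in the line separates title from author.
_BOOK_LINE = re.compile(r"^Book (\d+): (.+) by (.+)$")


def parse_book_line(line: str) -> Optional[Book]:
    """Parse one streamed entry; return None for anything that is not a book line."""
    match = _BOOK_LINE.match(line.strip())
    if match is None:
        return None
    index, title, author = match.groups()
    return Book(int(index), title, author)


if __name__ == "__main__":
    print(parse_book_line("Book 1: Dune by Frank Herbert"))
    print(parse_book_line("Book 2: Death by Black Hole by Neil deGrasse Tyson"))
    print(parse_book_line("not a book line"))
```

Returning None for non-matching input lets a caller skip the initial image chunk without special-casing it.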
If you have any questions or suggestions, feel free to reach out by email at suxrobgm@gmail.com or via LinkedIn.
My LinkedIn profile: Sukhrob Ilyosbekov