PDF Summarizer is a Java Spring Boot web application that enables users to upload PDF files and receive concise text summaries. It employs Apache PDFBox to extract text and a custom frequency-based algorithm to identify and display the most relevant sentences.
- User-Friendly Interface: Simple and intuitive design for easy navigation.
- PDF Upload: Easily upload PDF files for summarization.
- Text Extraction: Utilizes Apache PDFBox for accurate text extraction.
- Summarization Algorithm: Implements a custom frequency-based algorithm for summarizing text.
- Download Summaries: Download the summarized text for further use.
- Web-Based: Accessible from any device with a web browser.
- Java: The core programming language for backend development.
- Spring Boot: Framework used to create the web application.
- Apache PDFBox: Library for PDF content extraction.
- HTML/CSS/JavaScript: Technologies used for the frontend.
- Thymeleaf: Template engine for rendering web pages.
To get started with PDF Summarizer, follow these steps:
- Clone the Repository:
git clone https://github.com/kyawhtetoo134/pdf-summarizer.git
- Navigate to the Project Directory:
cd pdf-summarizer
- Build the Project: Use Maven to build the project.
mvn clean install
- Run the Application: Start the application using the following command:
mvn spring-boot:run
- Access the Application: Open your web browser and go to http://localhost:8080.
- Upload a PDF: Click on the upload button to select a PDF file from your device.
- Generate Summary: After the file is uploaded, click on the summarize button.
- View Summary: The application will display the summarized text.
- Download Summary: Click the download button to save the summary as a text file.
The PDF Summarizer uses a two-step process to generate summaries:
- Text Extraction: The application uses Apache PDFBox to extract the text content of the uploaded PDF file, which becomes the input to the summarizer.
- Summarization: A custom frequency-based algorithm analyzes the extracted text. It scores each sentence by how frequently its terms occur in the document, ranks the sentences by score, and presents the top-ranked ones as the summary.
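The summarization step can be illustrated with a minimal, self-contained sketch. This is not the project's actual implementation: the class name `FrequencySummarizer`, the method `summarize`, the small stop-word list, and the choice to score a sentence by the average frequency of its words are all assumptions made for illustration. It shows one common way a frequency-based extractive summarizer can work on text that has already been extracted from a PDF.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class; names and scoring details are assumptions, not the project's code.
class FrequencySummarizer {

    // Small illustrative stop-word list (assumption; a real list would be longer).
    private static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is", "it",
            "of", "on", "or", "that", "the", "this", "to", "was", "with");

    static List<String> summarize(String text, int maxSentences) {
        // Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.trim().isEmpty()) sentences.add(s.trim());
        }

        // Count how often each non-stop word occurs in the whole text.
        Map<String, Integer> freq = new HashMap<>();
        for (String sentence : sentences) {
            for (String w : words(sentence)) freq.merge(w, 1, Integer::sum);
        }

        // Score each sentence by the average document frequency of its words.
        double[] scores = new double[sentences.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            List<String> ws = words(sentences.get(i));
            int total = 0;
            for (String w : ws) total += freq.get(w);
            scores[i] = ws.isEmpty() ? 0.0 : (double) total / ws.size();
            order.add(i);
        }

        // Rank by score (highest first; the sort is stable, so ties keep document order).
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        int keep = Math.min(Math.max(maxSentences, 0), order.size());
        List<Integer> top = new ArrayList<>(order.subList(0, keep));

        // Restore document order so the summary reads naturally.
        Collections.sort(top);
        List<String> summary = new ArrayList<>();
        for (int i : top) summary.add(sentences.get(i));
        return summary;
    }

    // Lower-cases a sentence and returns its non-stop-word tokens.
    private static List<String> words(String sentence) {
        List<String> result = new ArrayList<>();
        for (String token : sentence.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) result.add(token);
        }
        return result;
    }
}
```

Calling `FrequencySummarizer.summarize(extractedText, 3)` would return up to three sentences in their original order. Averaging rather than summing word frequencies is a design choice in this sketch that keeps long sentences from winning purely on length.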
We welcome contributions from the community! If you would like to contribute, please follow these steps:
- Fork the Repository: Click the fork button on the top right of the repository page.
- Create a Branch: Create a new branch for your feature or bug fix.
git checkout -b feature/YourFeatureName
- Make Changes: Implement your changes and commit them.
git commit -m "Add your message here"
- Push Changes: Push your changes to your forked repository.
git push origin feature/YourFeatureName
- Create a Pull Request: Go to the original repository and create a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For any inquiries, please contact:
- Kyaw Htet Oo: kyawhtetoo134@gmail.com
To download the latest version of PDF Summarizer, visit the Releases section of the repository, where the latest builds are available.
This repository covers various topics related to automated text summarization, including:
- Automated Text Summarization System
- Document Summarizer Using Spring Boot
- Extractive Summarization of PDF Documents
- Java-Based PDF Summary Generator
- Natural Language Processing for PDF Summarization
- PDF Content Extraction and Summarization
- PDF File Summarizer
- Sentence Scoring Based Summarizer
- Text Summarization from PDF Files
- Web-Based PDF Summarizer Using Java
Thank you for checking out PDF Summarizer! We hope you find it useful for your document summarization needs.