algernon28/pdf-analyzer

This Spring Boot command-line application checks PDF documents for grammar issues and typos, compares different versions of a PDF, and includes placeholder checks for layout and semantics. It generates an Allure report to visualize the analysis results.

A main PdfAnalyzerRunner orchestrates three specialized task classes: TemplatePdfAnalyzerTask, CompiledVsTemplateComparisonTask, and InternalComparisonTask. Allure reporting utilities are centralized in AllureReportUtil.

Prerequisites

Project Structure

  • cli/PdfAnalyzerRunner.java: Main CLI orchestrator.
  • cli/AllureReportUtil.java: Static helpers for Allure reporting.
  • cli/TemplatePdfAnalyzerTask.java: Handles analysis of the template PDF.
  • cli/CompiledVsTemplateComparisonTask.java: Handles comparison of compiled PDF vs. template.
  • cli/InternalComparisonTask.java: Handles comparison of sections within the compiled PDF.
  • config/PdfConfiguration.java: Application configurations.
  • model/: Data models (GrammarIssue, ParagraphInfo, WordInfo).
  • services/: Core PDF processing services.

The application.yaml file in src/main/resources/ should contain configurations for PDF processing, highlighting, LanguageTool, and CSV reporting.

Building the Application

To build the executable JAR:

mvn clean package

This will produce a JAR file in the target/ directory (e.g., pdf-analyzer-0.0.1-SNAPSHOT.jar).

Running the Application

Execute the JAR from your terminal.

Usage:

java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar [templatePdfPath] [compiledPdfPath] [outputDirectoryForFiles]

  • No arguments:

    java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar

    • The application looks for template.pdf and compiled.pdf in the same directory as the JAR file.
    • Output files are saved to a directory named pdf_analysis_output, created in the JAR's directory.

  • With specific PDF paths:

    java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/your/template.pdf" "path/to/your/compiled.pdf"

    (The output directory defaults to pdf_analysis_output in the current working directory.)

  • With specific PDF paths and output directory:

    java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/template.pdf" "path/to/compiled.pdf" "custom_output_folder"

The application creates an allure-results directory in the current working directory (where the command is run), populating it with JSON files required for the Allure report.
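
The argument handling described above can be sketched as follows. This is a minimal illustration, not the repository's code: the class CliArgs and its parse method are hypothetical names, and the fallback directory is passed in as baseDir because the README describes it as the JAR's directory in one case and the current working directory in another. Treating each positional argument independently (for example, when only one path is given) is also an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the repository): resolves the three optional
// positional arguments, falling back to the default names the README mentions.
final class CliArgs {
    final Path template;
    final Path compiled;
    final Path outputDir;

    private CliArgs(Path template, Path compiled, Path outputDir) {
        this.template = template;
        this.compiled = compiled;
        this.outputDir = outputDir;
    }

    // baseDir is the directory used for defaults; which directory the real
    // application uses is an assumption left to the caller.
    static CliArgs parse(String[] args, Path baseDir) {
        if (args.length > 3) {
            throw new IllegalArgumentException(
                "Expected at most 3 arguments: [templatePdfPath] [compiledPdfPath] [outputDirectoryForFiles]");
        }
        Path template = args.length > 0 ? Paths.get(args[0]) : baseDir.resolve("template.pdf");
        Path compiled = args.length > 1 ? Paths.get(args[1]) : baseDir.resolve("compiled.pdf");
        Path outputDir = args.length > 2 ? Paths.get(args[2]) : baseDir.resolve("pdf_analysis_output");
        return new CliArgs(template, compiled, outputDir);
    }

    public static void main(String[] args) {
        CliArgs resolved = parse(args, Paths.get("."));
        System.out.println("template:  " + resolved.template);
        System.out.println("compiled:  " + resolved.compiled);
        System.out.println("outputDir: " + resolved.outputDir);
    }
}
```

Passing the base directory explicitly keeps the fallback logic testable without depending on where the JAR or the shell happens to be.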

Generating and Viewing the Allure Report

After the application finishes:

  1. Generate the report: Navigate to the directory where the JAR was run (where allure-results was created) and execute:

    allure generate allure-results -o allure-report --clean

    For a single-file HTML report:

    allure generate allure-results -o allure-report --clean --single-file

  2. Open the report:

    allure open allure-report

    Or open allure-report/index.html manually.

Analysis Tasks Reported in Allure

  1. Template PDF Analysis: Grammar, typos, layout (WordInfo CSV), semantics (keyword check).
  2. Compiled vs. Template Comparison: Full text diff.
  3. Internal Compiled PDF Consistency: Compares the "COPIA EDENRED" section against the "COPIA CLIENTE" section (by default, pages 1-3 vs. pages 4-6) with heuristic filtering.
    • Limitation: Filtering and page ranges are basic; may need refinement.
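
To make the internal consistency check more concrete, here is a minimal sketch of the idea: join the text of each page range, filter out lines that are expected to differ, and compare what remains line by line. It assumes page text has already been extracted into strings, and every name in it (InternalComparisonSketch, joinPages, normalize, differences) is hypothetical. The filtering rule shown (drop blank lines and lines containing the copy label) is only one possible heuristic, not necessarily the one InternalComparisonTask applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not from the repository) of comparing two page ranges
// of the same document after heuristic filtering.
final class InternalComparisonSketch {

    // Joins the text of pages [from, to] (1-based, inclusive) into one string.
    static String joinPages(List<String> pages, int from, int to) {
        return String.join("\n", pages.subList(from - 1, to));
    }

    // Heuristic filter: trims each line, then drops blank lines and lines
    // containing the copy label, since those are expected to differ.
    static List<String> normalize(String text, String copyLabel) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.contains(copyLabel)) {
                kept.add(trimmed);
            }
        }
        return kept;
    }

    // Compares the two line lists position by position and describes each mismatch.
    static List<String> differences(List<String> left, List<String> right) {
        List<String> diffs = new ArrayList<>();
        int max = Math.max(left.size(), right.size());
        for (int i = 0; i < max; i++) {
            String l = i < left.size() ? left.get(i) : "<missing>";
            String r = i < right.size() ? right.get(i) : "<missing>";
            if (!l.equals(r)) {
                diffs.add("line " + (i + 1) + ": '" + l + "' vs '" + r + "'");
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Made-up page texts standing in for extracted PDF content.
        List<String> pages = Arrays.asList(
            "COPIA EDENRED\nTotal: 100", "Terms A", "Signature",
            "COPIA CLIENTE\nTotal: 100", "Terms B", "Signature");
        List<String> first = normalize(joinPages(pages, 1, 3), "COPIA EDENRED");
        List<String> second = normalize(joinPages(pages, 4, 6), "COPIA CLIENTE");
        System.out.println(differences(first, second));
    }
}
```

A positional comparison like this reports every later line as different once a single line is inserted or removed, which is one reason the filtering and page ranges may need refinement.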

Configuration

Configure via src/main/resources/application.yaml. TODO comments in the code indicate areas for potential future configuration enhancements.
