This Spring Boot command-line application analyzes PDF documents for grammar issues and typos, compares different PDF versions, and includes placeholder checks for layout and semantics. It generates an Allure report to visualize the analysis results.
The application is structured around a main PdfAnalyzerRunner that orchestrates tasks performed by specialized classes: TemplatePdfAnalyzerTask, CompiledVsTemplateComparisonTask, and InternalComparisonTask. Allure reporting utilities are centralized in AllureReportUtil.
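The orchestration described above can be pictured with a minimal, framework-free sketch. The `AnalysisTask` interface and `TaskOrchestrator` class are hypothetical names introduced only for illustration; the real PdfAnalyzerRunner and task classes may be wired differently. Continuing after a failed task is also an assumption, not confirmed behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical common shape for one analysis step; the real task classes may differ. */
interface AnalysisTask {
    String name();
    void run() throws Exception;
}

/** Runs tasks in order and records one summary line per task. */
final class TaskOrchestrator {
    private final List<AnalysisTask> tasks;

    TaskOrchestrator(List<AnalysisTask> tasks) {
        this.tasks = List.copyOf(tasks);
    }

    /** Assumption: a failing task is recorded and the remaining tasks still run. */
    List<String> runAll() {
        List<String> summary = new ArrayList<>();
        for (AnalysisTask task : tasks) {
            try {
                task.run();
                summary.add(task.name() + ": OK");
            } catch (Exception e) {
                summary.add(task.name() + ": FAILED (" + e.getMessage() + ")");
            }
        }
        return summary;
    }
}
```

Keeping each task behind a small interface lets the runner treat template analysis and the two comparisons uniformly, and a single failing step does not hide the results of the others.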
- Java 17 JDK or newer
- Apache Maven (for building)
- Allure Commandline Tool (for generating HTML reports)
- cli/PdfAnalyzerRunner.java: Main CLI orchestrator.
- cli/AllureReportUtil.java: Static helpers for Allure reporting.
- cli/TemplatePdfAnalyzerTask.java: Handles analysis of the template PDF.
- cli/CompiledVsTemplateComparisonTask.java: Handles comparison of compiled PDF vs. template.
- cli/InternalComparisonTask.java: Handles comparison of sections within the compiled PDF.
- config/PdfConfiguration.java: Application configurations.
- model/: Data models (GrammarIssue, ParagraphInfo, WordInfo).
- services/: Core PDF processing services.
The application.yaml file in src/main/resources/ should contain configurations for PDF processing, highlighting, LanguageTool, and CSV reporting.
To build the executable JAR:
mvn clean package
This will produce a JAR file in the target/ directory (e.g., pdf-analyzer-0.0.1-SNAPSHOT.jar).
Execute the JAR from your terminal.
Usage:
java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar [templatePdfPath] [compiledPdfPath] [outputDirectoryForFiles]
- No arguments:
  java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar
  - The application will look for template.pdf and compiled.pdf in the same directory as the JAR file.
  - Output files will be saved to a directory named pdf_analysis_output created in the JAR's directory.
- With specific PDF paths:
  java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/your/template.pdf" "path/to/your/compiled.pdf"
  (Output directory will be pdf_analysis_output in the current working directory.)
- With specific PDF paths and output directory:
  java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/template.pdf" "path/to/compiled.pdf" "custom_output_folder"
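The three invocation forms above amount to a small defaulting rule, which can be sketched as follows. `ResolvedPaths` and `ArgResolver` are hypothetical names, not classes from this project, and rejecting any other argument count is an assumption: the usage notes do not say what happens with one argument or more than three.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical holder for the three resolved locations. */
record ResolvedPaths(Path template, Path compiled, Path outputDir) {}

final class ArgResolver {
    static final String DEFAULT_TEMPLATE = "template.pdf";
    static final String DEFAULT_COMPILED = "compiled.pdf";
    static final String DEFAULT_OUTPUT = "pdf_analysis_output";

    private ArgResolver() {}

    /**
     * @param args    raw command-line arguments (0, 2 or 3 entries)
     * @param jarDir  directory containing the JAR, used when no arguments are given
     * @param workDir current working directory, used for the default output
     *                directory when only the two PDF paths are given
     */
    static ResolvedPaths resolve(String[] args, Path jarDir, Path workDir) {
        switch (args.length) {
            case 0:
                return new ResolvedPaths(
                        jarDir.resolve(DEFAULT_TEMPLATE),
                        jarDir.resolve(DEFAULT_COMPILED),
                        jarDir.resolve(DEFAULT_OUTPUT));
            case 2:
                return new ResolvedPaths(
                        Paths.get(args[0]),
                        Paths.get(args[1]),
                        workDir.resolve(DEFAULT_OUTPUT));
            case 3:
                return new ResolvedPaths(
                        Paths.get(args[0]),
                        Paths.get(args[1]),
                        Paths.get(args[2]));
            default:
                // Assumption: other argument counts are treated as a usage error.
                throw new IllegalArgumentException(
                        "Expected 0, 2 or 3 arguments, got " + args.length);
        }
    }
}
```

Passing the JAR directory and the working directory in as parameters keeps the rule easy to unit-test without touching the file system.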
The application creates an allure-results directory in the current working directory (where the command is run), populating it with JSON files required for the Allure report.
After the application finishes:
- Generate the report: Navigate to the directory where the JAR was run (where allure-results was created) and execute:
  allure generate allure-results -o allure-report --clean
  For a single HTML file report:
  allure generate allure-results -o allure-report --clean --single-file
- Open the report:
  allure open allure-report
  Or open allure-report/index.html manually.
- Template PDF Analysis: Grammar, typos, layout (WordInfo CSV), semantics (keyword check).
- Compiled vs. Template Comparison: Full text diff.
- Internal Compiled PDF Consistency: Compares "COPIA EDENRED" vs. "COPIA CLIENTE" (default pages 1-3 vs 4-6) with heuristic filtering.
- Limitation: Filtering and page ranges are basic; may need refinement.
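The internal consistency check can be illustrated with a rough sketch that works on already-extracted text lines. `InternalCopyDiff` is a hypothetical class, and the filtering shown (dropping blank lines and any line that mentions the copy label, then collapsing whitespace) is only one plausible heuristic; the filtering in InternalComparisonTask may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class InternalCopyDiff {
    private InternalCopyDiff() {}

    /** Collapses whitespace, then drops blank lines and lines containing the copy label. */
    static List<String> normalize(List<String> lines, String copyLabel) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim().replaceAll("\\s+", " ");
            if (t.isEmpty() || t.contains(copyLabel)) {
                continue;
            }
            out.add(t);
        }
        return out;
    }

    /** Compares the two normalized copies position by position, including length mismatches. */
    static List<String> diff(List<String> first, String firstLabel,
                             List<String> second, String secondLabel) {
        List<String> a = normalize(first, firstLabel);
        List<String> b = normalize(second, secondLabel);
        List<String> differences = new ArrayList<>();
        int max = Math.max(a.size(), b.size());
        for (int i = 0; i < max; i++) {
            String left = i < a.size() ? a.get(i) : "<missing>";
            String right = i < b.size() ? b.get(i) : "<missing>";
            if (!left.equals(right)) {
                differences.add("line " + (i + 1) + ": '" + left + "' vs '" + right + "'");
            }
        }
        return differences;
    }
}
```

For example, calling `diff` with the lines of pages 1-3 and the label "COPIA EDENRED", and the lines of pages 4-6 and the label "COPIA CLIENTE", returns only the lines that differ beyond the copy label itself. A positional comparison like this is sensitive to inserted or missing lines, which fits the limitation noted above.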
Configure via src/main/resources/application.yaml. TODO comments in the code indicate areas for potential future configuration enhancements.