AlbinisRudaku/stylometrics-analyzer

Stilo - Stylometric Analysis Tool

A comprehensive tool for analyzing writing style and document characteristics, providing detailed metrics and ML-ready outputs.

Features

  • Comprehensive Analysis: Extracts and analyzes multiple aspects of writing style

    • Lexical features (word usage, vocabulary richness)
    • Syntactic patterns (sentence structure, complexity)
    • Structural elements (paragraph organization, text density)
    • Readability metrics (Flesch Reading Ease, Gunning Fog)
  • Multiple Output Formats:

    • Detailed JSON reports
    • ML-ready CSV format
    • Human-readable summaries
  • Advanced Metrics:

    • Style consistency scoring
    • Document complexity analysis
    • Writing pattern detection
    • Vocabulary usage assessment
  • Performance:

    • Efficient PDF text extraction
    • Parallel processing for large documents
    • Optimized feature calculations
  • Developer-Friendly:

    • Modular architecture
    • Extensive logging
    • Clear documentation
    • Type-safe implementation
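
To make the lexical and readability features above more concrete, here is a minimal sketch of two of them: a type-token ratio (one common measure of vocabulary richness) and the standard Flesch Reading Ease formula. This is not Stilo's own implementation. The function names and the vowel-group syllable heuristic are assumptions chosen for illustration, and the tool's actual tokenization (NLTK/spaCy) will give different counts.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def type_token_ratio(words) -> float:
    """Vocabulary richness: distinct words divided by total words."""
    lowered = [w.lower() for w in words]
    return len(set(lowered)) / len(lowered) if lowered else 0.0


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
```

Higher Flesch scores indicate easier text. The naive regex splitting here is only meant to show the arithmetic.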

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)
  • Virtual environment (recommended)

Installation

  1. Clone the repository:
git clone https://github.com/AlbinisRudaku/stylometrics-analyzer.git
cd stylometrics-analyzer
  2. Create and activate a virtual environment:
# Windows
python -m venv venv
.\venv\Scripts\activate

# Linux/Mac
python -m venv venv
source venv/bin/activate
  3. Install dependencies:
# Install required packages
pip install -r requirements.txt

# Install NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger')"

# Install spaCy model
python -m spacy download en_core_web_sm

# Install the package in development mode
pip install -e .

Usage

Basic Analysis (Generates both JSON and CSV)

python -m src.main "<path_to_pdf>"

or

stilo "<path_to_pdf>"

# Creates: 
# - results/analysis_TIMESTAMP.json
# - results/analysis_TIMESTAMP.csv

This will create both JSON and CSV files in the results directory with a timestamp.
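
Because each run writes timestamped files, a script that consumes the results usually needs to find the newest one. A minimal sketch, assuming the default `results` directory and the `analysis_*.json` naming shown above. The helper names are hypothetical, and the sketch assumes the report is a JSON object at the top level, which this README does not confirm.

```python
import json
from pathlib import Path
from typing import Optional


def latest_report(results_dir: str = "results") -> Optional[Path]:
    """Return the most recently modified analysis_*.json file, or None if there is none."""
    candidates = Path(results_dir).glob("analysis_*.json")
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def load_report(path: Path) -> dict:
    """Parse a report file; assumes the top-level JSON value is an object."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Selecting by modification time avoids having to parse the timestamp in the file name, whose exact format is not documented here.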

Specific Format Output

  1. JSON output only:
python -m src.main "<path_to_pdf>" --format json

or

stilo "<path_to_pdf>" --format json

# Creates: ./results/analysis_TIMESTAMP.json
  2. CSV output only:
python -m src.main "<path_to_pdf>" --format csv

or

stilo "<path_to_pdf>" --format csv

# Creates: ./results/analysis_TIMESTAMP.csv
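
The CLI takes one PDF per invocation in the examples above, so analyzing a folder means calling it once per file. The following is a minimal sketch of a batch wrapper. It assumes the `stilo` command is on the PATH and that it exits with a non-zero status on failure (the README does not state this). `build_command` and `analyze_folder` are hypothetical helper names, not part of the package.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(pdf: Path, fmt: Optional[str] = None) -> List[str]:
    """Build the stilo command line; fmt=None requests both JSON and CSV."""
    cmd = ["stilo", str(pdf)]
    if fmt is not None:
        if fmt not in ("json", "csv"):
            raise ValueError(f"unsupported format: {fmt!r}")
        cmd += ["--format", fmt]
    return cmd


def analyze_folder(folder: str, fmt: Optional[str] = None) -> List[Path]:
    """Run stilo on every PDF in a folder and return the PDFs whose run failed."""
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Assumption: a non-zero exit code signals a failed analysis.
        result = subprocess.run(build_command(pdf, fmt))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Passing the arguments as a list rather than a shell string means paths containing spaces need no extra quoting.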

Output Formats

  1. JSON (--format json):

    • Complete analysis with all metrics
    • Formatted for readability (pretty-printed)
    • Includes all features and analysis results
  2. CSV (optimized for ML):

    • Flattened data structure
    • Key metrics and features only
    • Ready for machine learning or spreadsheet analysis

When no format is specified, both JSON and CSV files are generated automatically.
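
The "flattened data structure" of the CSV output can be pictured as collapsing a nested report into a single row with dotted column names. The sketch below shows that general technique only. It is not the tool's actual exporter, and the keys used in the usage example (`lexical.ttr`, `readability.flesch`) are made up for illustration.

```python
import csv
import io


def flatten(nested: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_csv_row(report: dict) -> str:
    """Render a nested report as CSV text: one header line and one data line."""
    flat = flatten(report)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(flat))
    writer.writeheader()
    writer.writerow(flat)
    return buffer.getvalue()
```

For example, `flatten({"lexical": {"ttr": 0.5}, "readability": {"flesch": 60}})` yields `{"lexical.ttr": 0.5, "readability.flesch": 60}`, which maps directly onto one CSV row per document.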

Project Structure

stylometrics-analyzer/
├── src/                    # Source code
│   ├── main.py            # Entry point
│   ├── features/          # Feature extractors
│   ├── models/            # Analysis models
│   └── utils/             # Utility functions
├── results/               # Output directory
├── tests/                 # Test files
└── docs/                  # Documentation

Available Metrics

See METRICS_GUIDE.md for a detailed explanation of:

  • Style metrics (complexity, consistency)
  • Writing patterns
  • Lexical features
  • Syntactic features
  • Readability scores

Troubleshooting

  1. If you get import errors:

    pip install -e .
  2. If NLTK data is missing:

    python -c "import nltk; nltk.download('all')"
  3. If spaCy model is missing:

    python -m spacy download en_core_web_sm
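
Before reaching for the fixes above, it can help to see which dependency is actually missing. The following is a minimal diagnostic sketch, assuming the three NLTK resources and the spaCy model named in the installation steps. `check_environment` is a hypothetical helper, not something shipped with the package.

```python
import importlib.util


def check_environment() -> dict:
    """Report which NLTK resources and spaCy model appear to be installed."""
    status = {}

    # spaCy models are installed as regular Python packages, so find_spec is enough.
    status["spacy"] = importlib.util.find_spec("spacy") is not None
    status["spacy:en_core_web_sm"] = importlib.util.find_spec("en_core_web_sm") is not None

    try:
        import nltk
    except ImportError:
        status["nltk"] = False
        return status

    status["nltk"] = True
    resources = {
        "punkt": "tokenizers/punkt",
        "stopwords": "corpora/stopwords",
        "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    }
    for name, resource in resources.items():
        try:
            nltk.data.find(resource)  # raises LookupError when the data is absent
            status[f"nltk:{name}"] = True
        except LookupError:
            status[f"nltk:{name}"] = False
    return status


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as missing points to the matching install command in the steps above.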
