Stilo - Stylometric Analysis Tool

A comprehensive tool for analyzing writing style and document characteristics, providing detailed metrics and ML-ready outputs.

Features

- Comprehensive Analysis: Extracts and analyzes multiple aspects of writing style
  - Lexical features (word usage, vocabulary richness)
  - Syntactic patterns (sentence structure, complexity)
  - Structural elements (paragraph organization, text density)
  - Readability metrics (Flesch Reading Ease, Gunning Fog)
- Multiple Output Formats:
  - Detailed JSON reports
  - ML-ready CSV format
  - Human-readable summaries
- Advanced Metrics:
  - Style consistency scoring
  - Document complexity analysis
  - Writing pattern detection
  - Vocabulary usage assessment
- Performance:
  - Efficient PDF text extraction
  - Parallel processing for large documents
  - Optimized feature calculations
- Developer-Friendly:
  - Modular architecture
  - Extensive logging
  - Clear documentation
  - Type-safe implementation
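
To make the lexical and readability features listed above more concrete, here is a minimal sketch of how vocabulary richness (type-token ratio), Flesch Reading Ease, and Gunning Fog are commonly computed. It uses the standard published formulas, but the tokenizer, sentence splitter, and syllable counter are simplified assumptions for illustration and are not taken from Stilo's source; the function names are hypothetical.

```python
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    # Heuristic: count vowel groups, dropping a silent trailing "e".
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def type_token_ratio(words):
    # Vocabulary richness: distinct words divided by total words.
    return len(set(words)) / len(words) if words else 0.0


def flesch_reading_ease(text):
    # 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def gunning_fog(text):
    # 0.4 * ((words / sentences) + 100 * (complex words / words)),
    # where a "complex" word has three or more syllables.
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * ((len(words) / len(sentences))
                  + 100 * (len(complex_words) / len(words)))
```

Calling `flesch_reading_ease(text)` or `gunning_fog(text)` on a plain string returns a float. The syllable heuristic is approximate, so scores may differ slightly from those produced by Stilo or by dictionary-based tools.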

Prerequisites

- Python 3.8 or higher
- pip (Python package installer)
- Virtual environment (recommended)

Installation

Clone the repository:

git clone https://github.com/yourusername/stylometrics-analyzer.git
cd stylometrics-analyzer

Create and activate a virtual environment:

# Windows
python -m venv venv
.\venv\Scripts\activate

# Linux/Mac
python -m venv venv
source venv/bin/activate

Install dependencies:

# Install required packages
pip install -r requirements.txt

# Install NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger')"

# Install spaCy model
python -m spacy download en_core_web_sm

# Install the package in development mode
pip install -e .

Usage

Basic Analysis (Generates both JSON and CSV)

python -m src.main "<path_to_pdf>"

or

stilo "<path_to_pdf>"

# Creates: 
# - results/analysis_TIMESTAMP.json
# - results/analysis_TIMESTAMP.csv

Both files are written to the results directory, with a timestamp in each filename.

Specific Format Output

JSON output only:

python -m src.main "<path_to_pdf>" --format json

or


stilo "<path_to_pdf>" --format json

# Creates: ./results/analysis_TIMESTAMP.json

CSV output only:

python -m src.main "<path_to_pdf>" --format csv

or

stilo "<path_to_pdf>" --format csv

# Creates: ./results/analysis_TIMESTAMP.csv
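
For scripted use, the commands above can be wrapped in a small Python helper. This is a sketch under stated assumptions: it relies only on the `stilo` command and the `--format` flag shown in this section, assumes output files match `results/analysis_*`, and picks the newest file by modification time because the exact timestamp format is not documented here. The function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def build_command(pdf_path, fmt=None):
    # Mirror the documented CLI: stilo "<path_to_pdf>" [--format json|csv]
    cmd = ["stilo", str(pdf_path)]
    if fmt is not None:
        cmd += ["--format", fmt]
    return cmd


def run_stilo(pdf_path, fmt=None):
    # Raises CalledProcessError if the command exits with a non-zero status.
    return subprocess.run(build_command(pdf_path, fmt), check=True)


def latest_result(results_dir="results", suffix=".json"):
    # Assumption: outputs are named analysis_<TIMESTAMP><suffix>.
    # Choose the most recently modified match, or None if there is none.
    candidates = sorted(
        Path(results_dir).glob(f"analysis_*{suffix}"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def load_report(path):
    # Read a JSON report into a Python dictionary.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A typical sequence would be `run_stilo("paper.pdf", "json")` followed by `load_report(latest_result())`, where `paper.pdf` is a placeholder path.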

Output Formats

JSON (complete report):
- Complete analysis with all features and metrics
- Pretty-printed for readability
CSV (optimized for ML):
- Flattened data structure
- Key metrics and features only
- Ready for machine learning or spreadsheet analysis

When no format is specified, both JSON and CSV files are generated automatically.
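
The relationship between the nested JSON report and the flattened CSV can be illustrated with a short sketch. The flattening scheme (dotted column names), the function names, and the report keys and values below are assumptions made for the example; Stilo's actual column names and contents may differ.

```python
import csv
import io


def flatten(data, parent_key="", sep="."):
    # Collapse nested dictionaries into one level, joining keys with `sep`.
    flat = {}
    for key, value in data.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_csv_row(report):
    # Render one flattened report as a header line plus one data line.
    flat = flatten(report)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(flat))
    writer.writeheader()
    writer.writerow(flat)
    return buffer.getvalue()


# Hypothetical report; keys and values are illustrative only.
example_report = {
    "lexical": {"type_token_ratio": 0.62},
    "readability": {"flesch_reading_ease": 58.3, "gunning_fog": 11.4},
}
```

With this scheme each document becomes one row, so rows from several reports can be stacked into a single table for machine learning or spreadsheet work.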

Project Structure

stylometrics-analyzer/
├── src/                    # Source code
│   ├── main.py            # Entry point
│   ├── features/          # Feature extractors
│   ├── models/            # Analysis models
│   └── utils/             # Utility functions
├── results/               # Output directory
├── tests/                 # Test files
└── docs/                  # Documentation

Available Metrics

See METRICS_GUIDE.md for a detailed explanation of:

- Style metrics (complexity, consistency)
- Writing patterns
- Lexical features
- Syntactic features
- Readability scores

Troubleshooting

If you get import errors:
```
pip install -e .
```

If NLTK data is missing:

python -c "import nltk; nltk.download('all')"

If spaCy model is missing:
```
python -m spacy download en_core_web_sm
```
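
To see which of the problems above applies before reinstalling anything, a small diagnostic script can help. This is a sketch, not part of Stilo: the function names are hypothetical, and the NLTK resource paths are the usual locations for the three packages downloaded during installation, which may vary between NLTK versions.

```python
import importlib.util

# Assumed NLTK data locations for the resources named in the install step.
NLTK_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
}


def package_available(name):
    # True if a top-level package can be located without importing it.
    return importlib.util.find_spec(name) is not None


def missing_nltk_resources():
    # If NLTK itself is absent, every resource counts as missing.
    if not package_available("nltk"):
        return list(NLTK_RESOURCES)
    import nltk

    missing = []
    for name, path in NLTK_RESOURCES.items():
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(name)
    return missing


def environment_report():
    # spaCy models install as regular packages, so find_spec can detect them.
    return {
        "nltk_installed": package_available("nltk"),
        "spacy_installed": package_available("spacy"),
        "spacy_model_installed": package_available("en_core_web_sm"),
        "missing_nltk_resources": missing_nltk_resources(),
    }
```

Printing `environment_report()` inside the activated virtual environment shows which of the fixes above is likely needed.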

Repository: AlbinisRudaku/stylometrics-analyzer