AutomatedCleaning is a Python library for automated data cleaning. It helps preprocess and analyze datasets by handling missing values, outliers, spelling errors, and more.
- Supports both large (100+ GB) and small datasets
- Detects and handles missing values and duplicate records
- Identifies and corrects spelling errors in categorical values
- Detects outliers
- Detects and fixes data imbalance
- Identifies and corrects skewness in numerical data
- Checks for correlation and detects multicollinearity
- Analyzes cardinality in categorical columns
- Identifies and cleans text columns
- Detects JSON-type columns
- Detects and masks columns that contain PII
- Performs univariate, bivariate, and multivariate analysis and saves the results in a dashboard
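To make a few of the checks listed above concrete, here is a minimal sketch of what missing-value, duplicate, and outlier detection can look like. It is not AutomatedCleaning's implementation: the helper names (`count_missing`, `find_duplicates`, `iqr_outliers`), the sample rows, and the choice of the 1.5 × IQR rule are all assumptions made for illustration, and it uses only the standard library so it runs without extra dependencies.

```python
from statistics import quantiles


def count_missing(rows, column):
    """Count rows where the column is None or an empty string."""
    return sum(1 for row in rows if row.get(column) in (None, ""))


def find_duplicates(rows):
    """Return the indices of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample data, invented for this example.
rows = [
    {"name": "Ann", "salary": 50},
    {"name": "Bob", "salary": 52},
    {"name": "Bob", "salary": 52},   # exact duplicate of the previous row
    {"name": "Cid", "salary": None}, # missing salary
    {"name": "Dee", "salary": 51},
    {"name": "Eve", "salary": 49},
    {"name": "Fay", "salary": 500},  # suspiciously large value
]

# Outlier detection needs numeric values only, so skip the missing ones.
salaries = [row["salary"] for row in rows if row["salary"] is not None]

print("missing salaries:", count_missing(rows, "salary"))
print("duplicate row indices:", find_duplicates(rows))
print("salary outliers:", iqr_outliers(salaries))
```

The library automates checks of this kind across every column, so the sketch is only meant to show what each bullet refers to, not how to reproduce the library's results.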
python3.11 -m venv .venv
.venv\Scripts\activate
uv pip install AutomatedCleaning==1.1
Optional step: provide a Claude API key, which you can get from https://console.anthropic.com/settings/keys
import automatedcleaning as ac
df = ac.load_data("Company_Data.csv")
df = ac.clean_data(df, background_image_path="assets/gradient.png")