GitHub - DataSpoof/AutomatedCleaning: Automated Data Cleaning Library

DataSpoof/AutomatedCleaning


AutomatedCleaning


AutomatedCleaning is a Python library for automated data cleaning. It helps you preprocess and analyze datasets by handling missing values, outliers, spelling errors, and more.


Features

  • Supports both large (100+ GB) and small datasets
  • Detects and handles missing values and duplicate records
  • Identifies and corrects spelling errors in categorical values
  • Detects outliers
  • Detects and fixes data imbalance
  • Identifies and corrects skewness in numerical data
  • Checks for correlation and detects multicollinearity
  • Analyzes cardinality in categorical columns
  • Identifies and cleans text columns
  • Detects JSON-type columns
  • Detects and masks columns that contain PII
  • Performs univariate, bivariate, and multivariate analysis and saves the results in a dashboard
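
To make a few of these checks concrete, here is a minimal standard-library sketch of counting missing values, dropping duplicate records, and flagging outliers with Tukey's IQR rule. It is an illustration only: the helper names (`count_missing`, `drop_duplicates`, `iqr_outliers`) and the sample rows are hypothetical and are not part of AutomatedCleaning's API, and the library may use different techniques internally.

```python
import statistics

def count_missing(rows, columns):
    """Count None or empty-string values per column."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample data, invented for this illustration.
rows = [
    {"city": "Pune", "sales": 118},
    {"city": "Pune", "sales": 118},    # exact duplicate of the first row
    {"city": "", "sales": 120},        # missing city
    {"city": "Delhi", "sales": 122},
    {"city": "Delhi", "sales": 125},
    {"city": "Mumbai", "sales": 127},
    {"city": "Mumbai", "sales": 130},
    {"city": "Pune", "sales": 131},
    {"city": "Delhi", "sales": 900},   # far from the other values
]

missing = count_missing(rows, ["city", "sales"])
unique_rows = drop_duplicates(rows)
outliers = iqr_outliers([r["sales"] for r in unique_rows])

print(missing)           # {'city': 1, 'sales': 0}
print(len(unique_rows))  # 8
print(outliers)          # [900]
```

The IQR rule is only one common way to flag outliers; it is used here because it needs no assumptions about the distribution of the data.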

Installation

python3.11 -m venv .venv
.venv\Scripts\activate
uv pip install AutomatedCleaning==1.1

Usage

Optional step: the library can use a Claude API key, which you can get from https://console.anthropic.com/settings/keys

import automatedcleaning as ac

df = ac.load_data("Company_Data.csv")
df = ac.clean_data(df, background_image_path="assets/gradient.png")

🎥 Watch the Demo

Watch the video
