ziliHarvey/Optical-Character-Recognition: Hard-coded optical character recognition and feature extraction using both MLP and CNN


Optical Character Recognition and Feature Extraction

This repository contains an implementation of optical character recognition and feature extraction using both an MLP and a CNN.
Bounding box extraction and detection are not magic! You can achieve them with a single-layer neural network and sufficient image manipulation skills instead of a bulky Region-CNN.
Here is the pipeline, implemented from scratch. All you need is NumPy and scikit-image; no fancy deep learning framework, I promise.

Raw image -> Extraction -> Recognition -> Readable text  
Extraction: denoise -> greyscale -> threshold -> morphology -> clear border -> label -> measure region -> plot box  
Recognition: train NN on NIST36 -> load resized/scaled bbox -> run NN
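
The extraction stage above can be sketched with scikit-image roughly as follows. This is a minimal illustration, not the code in src/findLetters.py: the function name `find_letter_boxes`, the `min_area` cutoff, the Gaussian denoiser, the Otsu threshold and the 3x3 closing footprint are all assumptions chosen for the example, and the final "plot box" step is left out.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import gaussian, threshold_otsu
from skimage.measure import label, regionprops
from skimage.morphology import closing
from skimage.segmentation import clear_border


def find_letter_boxes(image, min_area=50):
    """Return (boxes, ink) for an RGB float image with dark ink on light paper.

    boxes is a list of (min_row, min_col, max_row, max_col) tuples.
    ink is the cleaned boolean mask (True where a stroke was detected).
    Hypothetical helper: name and defaults are illustrative only.
    """
    # Denoise: smooth each colour channel separately with a small Gaussian.
    smoothed = np.stack(
        [gaussian(image[..., c], sigma=1) for c in range(image.shape[-1])],
        axis=-1,
    )
    # Greyscale, then threshold: pixels darker than the Otsu level are ink.
    grey = rgb2gray(smoothed)
    ink = grey < threshold_otsu(grey)
    # Morphology: close small gaps inside strokes.
    ink = closing(ink, np.ones((3, 3), dtype=bool))
    # Clear border: drop blobs that touch the image edge.
    ink = clear_border(ink)
    # Label connected components and measure each region.
    labels = label(ink)
    boxes = [r.bbox for r in regionprops(labels) if r.area >= min_area]
    return boxes, ink


# Usage on a synthetic "page" with two dark blobs standing in for letters.
page = np.ones((60, 100, 3))
page[20:40, 20:35] = 0.0
page[20:40, 60:80] = 0.0
boxes, ink = find_letter_boxes(page)
print(len(boxes), "candidate letters found")
```

The `min_area` filter is one simple way to discard specks that survive thresholding; a real page will likely need the threshold and footprint tuned to the handwriting.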

Files included

src/util.py contains derivatives of various activation functions
src/nn.py contains functions for constructing a fully-connected neural network
src/fakeDataTest.py trains and tests a single-layer fully-connected neural network on a randomly generated dataset
src/realDataTest.py trains and tests a single-layer fully-connected neural network on the NIST36 dataset
src/findLetters.py contains a function for extracting text from neighboring pixels and plotting it with bounding boxes
src/textExtraction.py extracts hand-written text from an image and classifies it using the network trained in realDataTest.py
src/vaeCompression.py compresses NIST36 dataset images with a 2-hidden-layer vanilla autoencoder
src/pcaCompression.py compresses NIST36 dataset images using principal component analysis
src/pytorchMLP.py implements a fully-connected network using PyTorch
src/pytorchCNN.py implements a CNN using PyTorch
src/cnnVisualization.py visualizes feature maps and filters in a trained CNN
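
To give a feel for what the NumPy-only pieces in src/util.py and src/nn.py involve, here is a small sketch of an activation function, its derivative, and a forward pass. It is not the repository's code: every function name is hypothetical, and the layer sizes (1024 inputs, 64 hidden units, 36 classes), the sigmoid hidden activation and the Xavier-style initialisation are assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_deriv(post_act):
    # Derivative written in terms of the sigmoid output, as used in backprop.
    return post_act * (1.0 - post_act)


def softmax(x):
    # Subtract the row-wise max for numerical stability.
    shifted = x - x.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def init_layer(n_in, n_out, rng):
    # Xavier-style uniform initialisation (an assumption, not from the repo).
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out)), np.zeros(n_out)


def forward(x, params):
    """Forward pass: input -> sigmoid hidden layer -> softmax class scores."""
    (w1, b1), (w2, b2) = params
    hidden = sigmoid(x @ w1 + b1)
    probs = softmax(hidden @ w2 + b2)
    return probs, hidden


rng = np.random.default_rng(0)
params = [init_layer(1024, 64, rng), init_layer(64, 36, rng)]
batch = rng.random((5, 1024))          # five fake flattened images
probs, hidden = forward(batch, params)
print(probs.shape)                     # one row of class probabilities per image
```

Expressing the derivative in terms of the activation's output avoids recomputing the exponential during the backward pass.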

Hand-written text extraction and recognition

python textExtraction.py
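
Before a detected box can be fed to the classifier, it has to be brought to the size and scale of the training images. The sketch below shows one way to do that. It is not taken from src/textExtraction.py: the helper name `crop_to_input`, the 32x32 target size, the padding, and the dark-ink-on-white polarity are all assumptions that should be checked against the actual NIST36 data.

```python
import numpy as np
from skimage.transform import resize


def crop_to_input(ink, bbox, size=32, pad=4):
    """Turn one bounding box of a boolean ink mask into a flat input vector.

    Hypothetical helper. ink is True where a stroke was detected and
    bbox is (min_row, min_col, max_row, max_col).
    """
    min_row, min_col, max_row, max_col = bbox
    crop = ink[min_row:max_row, min_col:max_col].astype(float)
    h, w = crop.shape
    # Centre the crop on a padded square canvas so the aspect ratio is kept.
    side = max(h, w) + 2 * pad
    canvas = np.zeros((side, side))
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = crop
    # Scale down to the network's input resolution.
    small = resize(canvas, (size, size), anti_aliasing=True)
    # Assumption: training images show dark ink on a white background.
    return 1.0 - small.ravel()


# Usage with a fake mask containing a single tall stroke.
mask = np.zeros((60, 100), dtype=bool)
mask[20:40, 30:36] = True
vector = crop_to_input(mask, (20, 30, 40, 36))
print(vector.shape)
```

Padding to a square before resizing keeps narrow letters such as "I" or "l" from being stretched to fill the whole input.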

Image compression

python vaeCompression.py  
python pcaCompression.py
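
For intuition, PCA compression can be written in a few lines of NumPy. This is a generic sketch of the technique, not the contents of src/pcaCompression.py: the function names and the use of `np.linalg.svd` on mean-centred data are assumptions, and random data stands in for the NIST36 images. The autoencoder variant is not covered here.

```python
import numpy as np


def pca_compress(data, k):
    """Project each row of data onto the top-k principal components."""
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    components = vt[:k]
    codes = (data - mean) @ components.T
    return codes, components, mean


def pca_reconstruct(codes, components, mean):
    """Map k-dimensional codes back to the original feature space."""
    return codes @ components + mean


rng = np.random.default_rng(0)
images = rng.random((50, 20))          # stand-in for flattened images
for k in (5, 10, 20):
    codes, components, mean = pca_compress(images, k)
    restored = pca_reconstruct(codes, components, mean)
    error = np.mean((images - restored) ** 2)
    print(k, codes.shape, error)
```

Keeping more components lowers the reconstruction error at the cost of a larger code per image.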

Convolutional network visualization

python cnnVisualization.py

Dataset

Accuracy

Visualization
