jhrcook/CoreMLDemoApp
Demo using CoreML


This is a demonstration of using CoreML to recognize succulents from images. It is still very much in its early stages.

Overview of the process

  1. Create an R script that scrapes the plant names from World of Succulents.
  2. Create a shell script that uses 'googliser' to download the images into a directory called "images/", with a subdirectory for each plant.
  3. Use TensorFlow to retrain an image classifier on my new data set.
  4. Use the 'tfcoreml' Python package to convert the TensorFlow model into one that can be imported into Xcode for CoreML.
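The directory convention from steps 1–2 (one subdirectory per plant, such as "images/Euphorbia_obesa") can be sketched in Python; the helper name is hypothetical, not from the repository:

```python
def plant_image_dir(plant_name, base="images"):
    # Each plant gets its own subdirectory under "images/", with
    # spaces replaced by underscores, matching the --output paths
    # passed to googliser (e.g. "images/Euphorbia_obesa").
    return f"{base}/{plant_name.replace(' ', '_')}"
```

For example, `plant_image_dir("Euphorbia obesa")` yields "images/Euphorbia_obesa", the same path used in the download command below.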

Current Status

  • I have made the web-scraping script and created a list of over 1,500 succulents.
  • I have 'googliser' functioning and a job-array submission system to parallelize the process for each plant.
  • Here, I have demonstrated the feasibility of the workflow using a sample of 5 plants.

Work-flow

Data Acquisition

Create plant name list

I scraped plant names from World of Succulents using 'rvest' to retrieve and parse the HTML. The code is in "make_plant_list.r" and outputs a list of names to "plant_names.txt".

Rscript make_plant_list.r

Download images

I am using the bash tool 'googliser' to download plant images. It currently has a limit of 1,000 images per query. This should be sufficient for my needs, though.

Set up 'googliser'

The tool can be installed from GitHub using the following command.

wget -qN git.io/googliser.sh && chmod +x googliser.sh

It requires imagemagick, which is available on O2.

module load imageMagick/6.9.1.10

Below is an example command to download 20 images of Euphorbia obesa.

./googliser.sh \
  --phrase "Euphorbia obesa" \
  --number 20 \
  --no-gallery \
  --output images/Euphorbia_obesa

Downloading the images in parallel

I downloaded all of the images for every plant by submitting a job-array, where each job downloads N images for a single plant. The script "download_google_images.sh" takes an integer (the job number) and downloads the images for the plant on that line of "plant_names.txt".

sbatch \
  --array=1-$(wc -l < plant_names.txt) \
  --constraint="scratch2" \
  download_google_images.sh plant_names.txt
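Each array task maps its SLURM_ARRAY_TASK_ID to one line of the plant list. The actual script is shell, but the lookup amounts to the following Python sketch (function name hypothetical):

```python
def plant_for_task(plant_names_path, task_id):
    # SLURM array task IDs are 1-based, so task 1 downloads the
    # plant named on the first line of plant_names.txt.
    with open(plant_names_path) as f:
        lines = [line.strip() for line in f]
    return lines[task_id - 1]
```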

Remove corrupted files and wrong formats

(The following step may no longer be necessary since each image is reportedly a JPEG.)

Some of the images were corrupted or of WEBP format that the TensorFlow script could not accept. These were filtered using another R script.

module load imageMagick/6.9.1.10
Rscript filter_bad_images.r
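The filtering step comes down to checking file signatures. The R script relies on ImageMagick, so the following is only an illustrative Python sketch of the same idea:

```python
def looks_like_jpeg(data):
    # JPEG files begin with the bytes FF D8 FF.
    return data[:3] == b"\xff\xd8\xff"

def looks_like_webp(data):
    # WEBP files are RIFF containers with "WEBP" at byte offset 8.
    return data[:4] == b"RIFF" and data[8:12] == b"WEBP"
```

Files failing the JPEG check (or matching WEBP) would be removed before retraining.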

Ensure all images were properly downloaded.

The R Markdown file "check_images_downloaded.Rmd" checks that each plant has images downloaded. It outputs an HTML file of the results.

Rscript -e 'rmarkdown::render("check_images_downloaded.Rmd")'

In addition, if there are plants that do not have all of the images downloaded (or are within 50 images of the expected number), it creates the file "failed_dwnlds_plant_names.txt" with a list of plant names to be run again.
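One plausible reading of that check, sketched in Python (the function name and the exact threshold logic are assumptions, not taken from the Rmd):

```python
def flag_failed_downloads(image_counts, expected, tolerance=50):
    # Flag any plant whose downloaded-image count falls more than
    # `tolerance` images short of the expected number; these names
    # would be written out for resubmission.
    return [name for name, count in image_counts.items()
            if count < expected - tolerance]
```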

sbatch \
  --array=1-$(wc -l < failed_dwnlds_plant_names.txt) \
  --constraint="scratch2" \
  download_google_images.sh failed_dwnlds_plant_names.txt

ML Model Creation

I began by following the tutorial How to Retrain an Image Classifier for New Categories to retrain a general image classifier to recognize the images. I can then convert the model to CoreML format and import it into a simple iOS app that tries to identify the succulent in a new image.

Install TensorFlow and TensorFlow Hub

TensorFlow is an incredibly powerful machine learning framework that is used extensively in education, research, and production. (Excitingly, there is also Swift for TensorFlow, though it is still in beta as of August 18, 2019.)

"TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models."

To install both, we can use pip from within the virtual environment.

Create virtual environment.

# create and activate a virtual environment
module load python/3.6.0
python3 -m venv image-download
source image-download/bin/activate

# install the necessary packages
pip3 install --upgrade pip
pip3 install 'setuptools>=41.0.0'  # quoted so the shell does not treat '>' as a redirect
pip3 install tensorflow tensorflow-hub
pip3 install coremltools==3.0b5 tfcoreml==0.4.0b1  # betas required for CoreML3

Example retraining: practice with flowers

There is an example on the tutorial for retraining ImageNet to identify several different plants by their flower. All of this was performed in a subdirectory called "flowers_example".

mkdir flowers_example
cd flowers_example

The images were downloaded and unarchived.

curl -LO http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
ls flower_photos
#> daisy  dandelion  LICENSE.txt  roses  sunflowers  tulips

The retraining script was downloaded from GitHub.

curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py

The script was run on the plant images.

python retrain.py --image_dir ./flower_photos

If the connection to O2 is set up correctly, the TensorBoard can be run and opened locally.

tensorboard --logdir /tmp/retrain_logs
#> TensorBoard 1.14.0 at http://compute-e-16-229.o2.rc.hms.harvard.edu:6006/ (Press CTRL+C to quit)

Finally, the new model was used to classify a photo using the "label_image.py" script (downloaded from GitHub).

# download the script
curl -LO https://github.com/tensorflow/tensorflow/raw/master/tensorflow/examples/label_image/label_image.py
# run it on an image
python label_image.py \
    --graph=/tmp/output_graph.pb \
    --labels=/tmp/output_labels.txt \
    --input_layer=Placeholder \
    --output_layer=final_result \
    --image=./flower_photos/daisy/21652746_cc379e0eea_m.jpg
#> daisy 0.99798715
#> sunflowers 0.0011478926
#> dandelion 0.00045892605
#> tulips 0.0003524925
#> roses 5.3392014e-05

It worked!

Small-scale experiment

You can see the results from a small-scale experiment here. Overall, it went well, but the plants used were obviously different from each other, so it may be worth running a test with more similar types of plants.

Retraining work-flow

Activate the virtual environment.

module load python/3.6.0
source image-download/bin/activate

Retrain ImageNet.

python3 imageClassifierModel/retrain.py \
  --image_dir=/n/scratch2/jc604_plantimages \
  --output_graph=imageClassifierModel/tf_succulent_classifier.pb \
  --output_labels=imageClassifierModel/tf_output_labels.txt \
  --summaries_dir=imageClassifierModel/tf_summaries \
  --output_layer=plant_classifier \
  --random_brightness=5

Test on some images.

python label_image.py \
    --graph=imageClassifierModel/tf_succulent_classifier.pb \
    --labels=imageClassifierModel/tf_output_labels.txt \
    --input_layer=Placeholder \
    --output_layer=plant_classifier \
    --image="imageClassifierModel/my_plant_images/Euphorbia obesa_5.JPG"

Convert to CoreML format. (untested)

import tfcoreml as tf_converter

tf_converter.convert(tf_model_path='my_model.pb',
                     mlmodel_path='my_model.mlmodel',
                     output_feature_names=['softmax'],
                     input_name_shape_dict={'input': [1, 227, 227, 3]},
                     use_coreml_3=True)

Notes

Image sources

Code Sources

For CoreML3
