PlantPal is a web app that rapidly identifies houseplants from photos and provides basic care information, along with information on any potential toxicity to the user's pets, potentially saving those pets from years of discomfort or serious illness.
The web app and algorithm are hosted at PlantPal.org.
It should work something similar to this:
Have a bug? Please create an issue here on GitHub at https://github.com/Alex-Robson/PlantPal/issues.
Requirements can be found in requirements.txt.
They can be installed to your environment with pip:
pip install -r requirements.txt
Using the first four notebooks you can scrape the dataset and remove duplicate images:
- 01_scrape_google_images.ipynb - scrape houseplant images from Google Images
- 02_scrape_shutterstock.ipynb - scrape houseplant images from Shutterstock
- 03_scrape_plantnet.ipynb - scrape houseplant images from Pl@ntNet (a database of user-submitted images)
- 04_duplicate_image_remove.ipynb - automatically remove duplicate images.
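As a rough illustration of the deduplication step, here is a minimal sketch that detects byte-identical files by hashing their contents. This is an assumption about one possible approach, not necessarily what 04_duplicate_image_remove.ipynb does (it may, for example, also catch resized or re-encoded copies, which this sketch will not). The function names and the list of file extensions are hypothetical.

```python
import hashlib
from pathlib import Path


def find_duplicate_images(image_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return paths whose bytes exactly match an earlier file in image_dir."""
    seen = {}        # content hash -> first path seen with that content
    duplicates = []  # later files with identical content
    for path in sorted(Path(image_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates


def remove_duplicate_images(image_dir):
    """Delete exact duplicates, keeping the first copy; return how many were removed."""
    duplicates = find_duplicate_images(image_dir)
    for path in duplicates:
        path.unlink()
    return len(duplicates)
```

Calling `find_duplicate_images` first makes it possible to review the list before anything is deleted.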
Some manual filtering of images is still necessary, such as removing cartoons of the plant that may come up in Google Images results.
Notebook 05_EDA.ipynb primarily explores and visualizes the data imbalance in the current dataset.
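A quick way to see this kind of imbalance is to count images per class. The sketch below assumes a one-folder-per-species layout (for example `dataset/monstera/*.jpg`), which is an assumption and not something the notebooks are known to require. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(dataset_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Count image files in each immediate subfolder (one subfolder per class)."""
    counts = Counter()
    for class_dir in Path(dataset_dir).iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for p in class_dir.iterdir()
                if p.is_file() and p.suffix.lower() in extensions
            )
    return counts


def imbalance_ratio(counts):
    """Largest class size divided by the smallest; inf if some class is empty."""
    smallest = min(counts.values())
    return float("inf") if smallest == 0 else max(counts.values()) / smallest
```

`counts.most_common()` then lists the classes from largest to smallest, which is a convenient input for a bar chart.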
06_train_network.ipynb shows examples of the data, their preprocessing/augmentation, and the resulting training.
Training was done via transfer learning with models such as Inception-ResNet-v2, Inception v3, ResNet50, and VGG16, starting from the ImageNet weights. For the first 20 epochs the pretrained layers were frozen and only a new training block was free to train. After these initial 20 epochs the whole model was unfrozen and the learning rate was reduced before another 80 epochs of training.
Finally, 07_model_evaluation.ipynb compares how the different models' metrics (validation accuracy and top-5 accuracy) vary throughout training, and explores the confusion matrix and other useful information for model evaluation.
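The two-phase schedule described above could look roughly like the following Keras sketch. Only the backbone choice, the ImageNet starting weights, and the 20 + 80 epoch split come from this README. The head layers (dropout plus a softmax layer), the 299x299 input size, the Adam optimizer, and both learning rates are assumptions made for illustration, and `train_ds` / `val_ds` are hypothetical datasets yielding preprocessed images with one-hot labels.

```python
from tensorflow import keras


def build_model(num_classes, weights="imagenet"):
    """Frozen Inception-ResNet-v2 base plus a small trainable head (head is assumed)."""
    base = keras.applications.InceptionResNetV2(
        include_top=False, weights=weights, input_shape=(299, 299, 3), pooling="avg"
    )
    base.trainable = False  # phase 1: pretrained weights stay fixed

    inputs = keras.Input(shape=(299, 299, 3))
    x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
    x = keras.layers.Dropout(0.2)(x)  # assumed regularization
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs), base


def compile_model(model, learning_rate):
    """(Re)compile; required after changing which layers are trainable."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=[
            "accuracy",
            keras.metrics.TopKCategoricalAccuracy(k=5, name="top5_accuracy"),
        ],
    )


def train_two_phase(model, base, train_ds, val_ds):
    # Phase 1: 20 epochs, only the new head trains (learning rate is assumed).
    compile_model(model, learning_rate=1e-3)
    model.fit(train_ds, validation_data=val_ds, epochs=20)

    # Phase 2: unfreeze everything, lower the learning rate, train 80 more epochs.
    base.trainable = True
    compile_model(model, learning_rate=1e-5)
    model.fit(train_ds, validation_data=val_ds, initial_epoch=20, epochs=100)
    return model
```

Recompiling after unfreezing matters because Keras only picks up a change to `trainable` when the model is compiled again.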
Ultimately, most of the models considered, including the final model built on Inception-ResNet-v2, reached ~95% validation accuracy. More importantly, the model shows ~99.5% top-5 accuracy, meaning that for only about 1 in 200 images of a plant present in the database is the correct plant not offered in the dropdown for the user to select. In that case they can select "None of the above" and the app will provide tips on taking a more readily identified photo.
The web app is built using Streamlit and is hosted on AWS at PlantPal.org.
The final model implemented is Inception-ResNet-v2.
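To make the top-5 metric concrete: a prediction counts as a hit when the true class is among the five highest-scoring classes, so 99.5% top-5 accuracy corresponds to a miss rate of 0.5%, or 1 image in 200. The sketch below is a plain-Python illustration with made-up scores, not the evaluation code from the notebooks.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Made-up scores for three images over six classes (illustrative only).
scores = [
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked first
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked second
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 5 is ranked last
]
labels = [0, 0, 5]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
```

Here only the first image is a top-1 hit, while the first two are top-5 hits. The third image is the kind of case where the user would fall back to "None of the above".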