TFLcam

Firmware (arduino sketch) for the ESP32-CAM turning it into a TensorFlow-Lite based classification sensor.

Save a TensorFlow Lite model on an SD card, slide the SD card in the ESP32-CAM, and the ESP will print over Serial what it sees with it's camera.

Here is a demo video.

Introduction

The TFLcam project started with getting to know the ESP32-CAM. The ESP32-CAM is a cost effective and small board with an ESP32, an OmniVision OV2640 camera, an SD card, and even a flash LED. It does not have an on-board USB-serial converter, I would advise to buy one separately.

The ESP32-CAM board is powerful enough to capture images with its camera and pass them through a TensorFlow Lite interpreter running on the ESP32 itself ("edge AI"). I piloted that with a rock-paper-scissors classifier. A Python script on a PC trains and generates the (TensorFlow Lite) model, and saves it as a flatbuffer file. The flatbuffer is linked in in the ESP32 sketch, together with a TensorFlow Lite interpreter (I took EloquentTinyML), resulting in a rock-paper-scissors classifier, see video.

This project goes one step further. It puts the flatbuffer on the SD card. It also puts a configuration file on the SD card: what is the input of the TensorFlow model (e.g. which area of the camera to crop, how much to subsample) and what is the output of the TensorFlow model (how many classes, what are their names).

This means that by writing the two files to the SD card, one can configure the TFLcam to be either a rock-paper-scissors classifier or, for example, a simple Lego brick classifier.

Architecture

The core modules of the TFLcam project are

TensorFlow Lite interpreter (tflu.cpp and tflu.h).
Commands (cmds.cpp and cmds.h), you also need my Arduino l 8000 ib for the command interpreter cmd.
Camera driver (cam.cpp and cam.h).
SD card file system driver (file.cpp and file.h).
Top-level application sketch (TFLcam.ino and TFLcam.h).

The command interpreter contains several commands. Key are the commands that configure the processing done on the camera output (cropping, sub sampling); the reduced frame buffer is fed to the TensorFlow interpreter. Another configuration command loads the TensorFlow model from the SD card into the TensoFlow interpreter. Finally there is a command that configures the labels of the classes predicted by the model.

All these configuration commands are typically stored in a file boot.cmd. This file is automatically run on (power-on) reset:

// File boot.cmd for classifier: rock, paper, scissors
@img crop  left 122  top 36  width 112  height 184  xsize 28  ysize 46
@labels none paper rock scissors // order as in rps.tfl
@file load /rps.tfl // from 28x46 to 4

The example model (rps.tfl) is trained on input images of 28×46 (see crop xsize and ysize) and is expected to output a vector of four class probabilities (see labels none, paper, rock, and scissors).

Find a typical SD card image in this repo, in the sub directory sdcard.

Sample run

Here is a sample run. Observe that the ESP32 boots: it shows a banner and successful initialization (no error messages). Then it runs boot.cmd, which configures the input and output for the TensorFlow Lite interpreter, and loads the TensorFlow Lite model rps.tfl.

   _______ ______ _
  |__   __|  ____| |
     | |  | |__  | |     ___ __ _ _ __ ___
     | |  |  __| | |    / __/ _` | '_ ` _ \
     | |  | |    | |___| (_| (_| | | | | | |
     |_|  |_|    |______\___\__,_|_| |_| |_|
TFLcam - TensorFlow Lite camera - version 1.0.0

type 'help' for help
>> file run /boot.cmd

>> // File boot.cmd for classifier: rock, paper, scissors
>> @img crop  left 122  top 36  width 112  height 184  xsize 28  ysize 46
>> @img trans rotcw
>> @img proc histeq
>> @labels none paper rock scissors // order as in rps.tfl
>> @file load /rps.tfl // from 28x46 to 4
>>

>> mode continuous 2
mode: continuous (stable 2)
>>
predict: 0/none
predict: 2/rock
predict: 3/scissors
predict: 1/paper
predict: 2/rock
predict: 0/none

Manually I put the sensor in continuous mode (mode continuous 2). The final 2 asks for reports, only when the prediction changes and when it is stable for 2 measurements.

I have put my rock hand in front of the camera, changed it to scissors, paper and back to rock, and then removed it. Notice that I indeed got one report for each.

If you want to replay, put your ESP32-CAM 40cm from a white wall and keep your hand at the right position. You can check that with mode single ascii.

>>  mode single ascii
y\x: 00                                            45
  0: |WWW@@888OOooo===---            ----====oooOO8W|
  1: |WW@@888OOOoo====--              ---====oooOOOW|
  2: |WW@@888OOooo====--               ---===oooOOOW|
  3: |W@@@888OOooo==---                ---===oooOOOW|
  4: |W@@@88OOOooo==---                ---===oooOOOW|
  5: |W@@@88OOOoo===---                 --===oooOOOW|
  6: |W@@@88OOooo===---                ---===oooOOOW|
  7: |W@@@88OOOooo==----               ---===oooOOOW|
  8: |W@@@888OOooo==---                ---===oooOO8W|
  9: |W@@@888OOooo==----               ---===oooOO8W|
 10: |W@@@888OOooo===---              ----==ooooOO8W|
 11: |W@@@@88OOooo===-----            ----==oooOOO8W|
 12: |WW@@@88OOOoo=o=-----           ----===oooOOO8W|
 13: |WW@@@88OOOoooo====-----        ----===ooOOO88W|
 14: |WW@@@888OOOoooo====----    - ----=====ooOOO88@|
 15: |WWW@@@888OOoooo=====-----------======ooOOO888W|
 16: |WWW@@@888OOooooo======--------=====ooooOOO888@|
 17: |WWW@@@@888OOooooo======---========ooooOOO888@@|
 18: |WWWW@@@8888Ooooooo==============oooooOOO8888@W|
 19: |WWWW@@@88888OoOOoooo===========oooooOOO8888@@@|
 20: |WWWWW@@@8888OOOOOooooo=======ooooooOOO8888@@@@|
 21: |WWWWW@@@@8888OOOOOoooooooooooooooOOOO88888@@@@|
 22: |WWWWWW@@@@888OO8OOOOooooooooooOOOOOOO8888@@@@W|
 23: |WWWWWWW@@@@@88O8OOOOOOOOOOOOOOOOOOOO888@@@@@WW|
 24: |WWWWWWW@@@@@@888OO8888OOOOO8OOO88@@@@@@@@@@@WW|
 25: |WWWWWWWW@@@@@@@@8O888888888888@WWWWWWW@@@@@WWW|
 26: |WWWWWWWWW@@@@@@@88@@888888888@WWWWWWWW@@@@WWW@|
 27: |WWWWWWWWWW@@@@@@@8@@@@@888@@@WWWWWWWWW@@WWWWWW|
predict: 0/none
>>  mode single ascii
y\x: 00                                            45
  0: |==-------                             -------=|
  1: |=-------                               ------=|
  2: |=-------                               ------=|
  3: |=-------        =Oo=----               ------=|
  4: |--------        W8O88OOooo--           ------=|
  5: |--------        8@88oooo==oOooo=-      ------=|
  6: |-------       ---OW8OOOo===oooooo-    -------=|
  7: |=----========oO88=-@@@@Oo=oooooooO=-  -------=|
  8: |=-=WWWW8O8OOOOOOO@WWWWW8OooO8OooooOo=--------=|
  9: |=-WWW@8Ooooooo=ooooOOO8WWWW@@8OooooOO=-------=|
 10: |=-WW@@8OOooooo==oooooooO88WW88Oo=oooO8O=-----=|
 11: |oWWWWW@@@8OoOooooooo=o=OO@@@OOOO===ooO8O-----=|
 12: |WWWW@@@@@@@8@@@@8@@o=ooO8@OOOOoooooooOO8o=---=|
 13: |WWWW@88OOOooOooO8@8o===oO@8888OoooooOOO@@8O===|
 14: |WWWWWW@88OOoo==ooOOOOoooO8W8@@8OOoOOO88@@@WWWW|
 15: |oOWWWWW@8OOOoooooOOoo8OooO8W@@@88OOO888@@@@@@W|
 16: |ooWWWWWW@@@@@@88@OOOoo8ooOo8@@W@@OO88888@88@@@|
 17: |=oWWWW@@8OOOOOoO8WOooooOoOOOO@WWW@888888888@@@|
 18: |=ooO@WWW@8OO8OooO8Oooooooo8OOo8@WW@8@888888@@@|
 19: |======@WWW@O88OooOOoOOOooOOOOOO88@88888@@@@@@@|
 20: |======WWWW@WW@@W@@OOOOOoOOOOOO888@88O888@@@@@@|
 21: |======@WW@@8OOOOOO@@OOOoOOO8888@8@8OOO88@@@@@@|
 22: |=======oWWW@888ooOO@8oooOO8@@8@@@@8O88@@@@@WWW|
 23: |==========o8@WWW@8O8@@8O888@@@@888888@@WWW@WWW|
 24: |===============O8WW@8@@O88888@88888@@@WWWWWWWW|
 25: |=o================oO@@@@@@@@@@@@@@@WWWWWWWWWWW|
 26: |=o=======----======oooO@WWWWWWWWWWWWWWWWWWWWWW|
 27: |=oo=======--------=====o==oOO88OWWWWWW8@WWWWWW|
predict: 1/paper
>>  mode single ascii
y\x: 00                                            45
  0: |ooo===-----                         ------===O|
  1: |o====------                         ------===O|
  2: |o====-----                          ------===O|
  3: |o===------                          ------===O|
  4: |o===-----                           ------===O|
  5: |o===-----        -OOOOo--  -=-      -----====O|
  6: |o===------      o@8O8888@888Oo=-   ------====O|
  7: |o====-----     o@8OO88W@@88888Oo-  ------====O|
  8: |o===-------  - 888888888O8OOOO88o=------=====O|
  9: |o====----------@88W8OOOOOOOOOOO88Oo=----=====O|
 10: |o====-------==oO8@W8OooOOOOOOOOO888Oo--=====oO|
 11: |o====-------=ooOWWW8OOOOO888OOOO888@8o=====ooO|
 12: |oo===-------=oo@8@@OOO888W@88OOOO888@@Ooo==ooO|
 13: |oo====------=o888W@OO88@@WW@888OO888@@@88OoooO|
 14: |oo=====----==o8@@W@888@O8WWW@88888888@@@WWWWWW|
 15: |ooo=========oo8@@W@88W8O8WWW@@@88888@@@@@@@WWW|
 16: |ooo=========oOOO@W8@W888@@@WWW@@@@@@@@@@@@@@@@|
 17: |oooo=======oooOOOW@W@888OO8WWWWW@@@@@@@@8@@@@@|
 18: |oooo=======oooOOoW@88888OO8WWWWWWW@@@@@@@@@@@@|
 19: |ooooo======oooOOOW@88888888WWWWWWW@@@@@@@@@@@@|
 20: |Ooooo=====ooooOOOOW@88@@@88WWW@@88@@@@@@@@@@@@|
 21: |OOoooo=====oooOOOOOW@888OO8WWW@888888@@@@@@@@@|
 22: |OOOoooooo=oooooOOOOW@@@8888W@8@8888@@@@@@WWWW@|
 23: |OOOOoooooooooooOOOO@W@@@@W@8888@88@@@W@WWWWWWW|
 24: |OOOOOooooooooooOOO8OWWWWWW@88888@@@WWWWWWWWWWW|
 25: |OOOOOooooooooooOOOO888WWWWWWWW@WWWWWWWWWWWWWWW|
 26: |OOOOOOoooooooooOooOOOOOO88@WWWWWWWWWWWWWWWWWWW|
 27: |OOOOOOoooooooooooooOOOOOO888O88@WWWWWWWWWWWWWW|
predict: 2/rock
>> mode single ascii
y\x: 00                                            45
  0: |ooo==-----                         -----=====O|
  1: |oo===-----                         ------====O|
  2: |oo==------                          -----====O|
  3: |o===-----                          ------====o|
  4: |====------                         ------====O|
  5: |====----- -OOOOOoo=-               ------====o|
  6: |====----- WW@@888OOOOOOo-          ------===oo|
  7: |====------WW@@8888OOO8OOOOo-       -----====oo|
  8: |====-----=8WW@@@@88888OOOOO88OOOo=------====oo|
  9: |=====----===OOO8@@@@@@8OOOOO8888888Oo=--====oo|
 10: |=o===-----========oO@@@@88888888888888Oooo=ooo|
 11: |=o===------------==ooO8@@@8@88@888@@@@@@88OOOo|
 12: |=o===----------  --==ooooW888@W8@@@WW@@@@@@@@8|
 13: |=o====-------    -=oOO8@@@@8O8W@@WW@8@@@WW@@@@|
 14: |=oo===----oOO88888@@888@@@@W@8@W@WW8888@@@@@@W|
 15: |=oo====-8WW@@@888888OO888888@WWWWW88OO8888@@@W|
 16: |=oo====OWWW@8@8888@888888@@@@WWWW@8OO8888888@@|
 17: |oooo==oOWWW@@@@88@@@@@@8@W8@WWW@88OO88888888@@|
 18: |ooooo=oo8WW@8OOOOOOOOooO@8@W@W@88OO88@88888@@W|
 19: |ooooo=oooo=o=======ooooO8WW@@@@888@@W@@@@@@WWW|
 20: |oOooooo======-======oooWWW@@@@@8@@WWWWWWWWWWWW|
 21: |OOOoooo======--====ooO8@@8@88@@@WW88@WWWWWWWWW|
 22: |OOOOooooo========oooOOW88888@WW@8888@WWWWWWWWW|
 23: |OOOOooooooo==o=ooooOOO@8O8WWWW@@@88@WW@WWW@WWW|
 24: |OOOOOoooooooooooooOOOOO@WWW@OW@@@@WWWW@@@WWWWW|
 25: |OOOOOOooooooooooooOOOOOOOOOO8@WWWWWWWWWWWWWWWW|
 26: |OOOOOOOoooooooooooOOOOOOOOOOOO8@WWWWWWWWWWWWWW|
 27: |OOOOOOOoooooooooOoOOOOOOOOOOOO88@WWWWWWWWWWWWW|
predict: 3/scissors

Versions

I have often problems with Arduino versions, so TFLcam tries to record them. When giving the version command, the used versions are shown.

>> version
app     : TensorFlow Lite camera (TFLcam) 1.0.3
library : cmd 8.0.1
library : EloquentTinyML 0.0.10
runtime : Arduino ESP32 1_0_6
compiler: 5.2.0
arduino : 10816
compiled: Sep 23 2021, 22:06:36
>>

Also, it appears important to have File > Preferences > Compiler warnings on Default.

Limitations

At this moment the biggest limitations are that the sketch uses the camera in gray scale only (but that could be changed), and the available RAM, a structural problem.

The RAM needs to hold the "Tensors" (the values of the "pixels" in the various layers). Secondly, because we opted for a dynamic model, the RAM also needs to hold the model (flatbuffer). These are #defines in the code, both below 100 kilo byte. Arduino currently reports Global variables use 118588 bytes (36%) of dynamic memory, but I have trouble allocating big consecutive blocks (as needed for the tensors and the flatbuffer).

Personally I do not care too much about speed, but that might be different for your application. At this moment, the rock paper scissors classifier runs at about 5FPS. Note that 2/3 of the time is spend on capture and normalize, and 1/3 on prediction (inference).

>> mode single time                                                                
predict: 0/none                                                                 
time: 5.10 FPS, 196 ms = 119 (capture) + 11 (crop) + 66 (predict) + 0 (output)  
>> mode single time                                                             
predict: 3/scissors                                                             
time: 6.49 FPS, 154 ms = 76 (capture) + 11 (crop) + 67 (predict) + 0 (output)   
>> mode single time                                                             
predict: 1/paper                                                                
time: 5.21 FPS, 192 ms = 114 (capture) + 11 (crop) + 67 (predict) + 0 (output)  
>> mode single time                                                             
predict: 2/rock                                                                 
time: 4.50 FPS, 222 ms = 144 (capture) + 11 (crop) + 67 (predict) + 0 (output)  
>> mode single time                                                             
predict: 0/none                                                                 
time: 5.56 FPS, 180 ms = 102 (capture) + 11 (crop) + 66 (predict) + 1 (output)  
>> mode single time                                                             
predict: 2/rock                                                                 
time: 6.13 FPS, 163 ms = 84 (capture) + 12 (crop) + 66 (predict) + 1 (output)   
>> mode single time                                                             
predict: 3/scissors                                                             
time: 4.76 FPS, 210 ms = 132 (capture) + 12 (crop) + 66 (predict) + 0 (output)  
>> mode single time                                                             
predict: 1/paper                                                                
time: 4.81 FPS, 208 ms = 130 (capture) + 11 (crop) + 67 (predict) + 0 (output)  
>>

Application

As a first application, I'm using TFLcam to make a sensors for Lego Mindstorms, even with a Lego compatible casing. That sensor is used in a rock, paper, scissors robot model.

References

TensorFlow Lite for ESP32: starter post, associated library, dynamic loading of model
The camera driver
Using an SD card on the ESP32
Histogram equalization as an image improvement step
Save image as text file in PGM format
Omnivision OV2640 camera
ESP32 reference manual
Details on the ESP32-CAM, e.g. schematics and how to prevent Flash LED to clash with SD card
Brownout work around
ASCII font generator

(end)

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
TFLcam		TFLcam
casing		casing
mindstorms		mindstorms
model		model
sdcard		sdcard
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TFLcam

Introduction

Architecture

Sample run

Versions

Limitations

Application

References

About

Uh oh!

Releases

Packages

Languages

License

maarten-pennings/TFLcam

Folders and files

Latest commit

History

Repository files navigation

TFLcam

Introduction

Architecture

Sample run

Versions

Limitations

Application

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages