This code monitors a video stream, fed to it one frame at a time, and tracks all the people present.
The tracker also attempts to recognise individuals, so if a person leaves the camera view and later returns they should be recognised as having been seen before, even if their identity is not known.
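As a rough sketch of how such a tracker might be driven frame by frame, the example below reads a video with OpenCV and hands each frame to a stand-in `TrackerManager`; the class, its `processFrame` method, and the `TrackedPerson` struct are hypothetical placeholders rather than this project's actual API.

```cpp
// Hypothetical sketch of driving a frame-by-frame tracker; TrackerManager,
// processFrame and TrackedPerson are illustrative names, not this project's API.
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

struct TrackedPerson {
    int id;            // stable id, reused if the person is recognised on return
    cv::Rect box;      // current face location in the frame
    std::string name;  // empty if the identity is not known
};

// Stub standing in for the real tracker so the sketch is self-contained.
class TrackerManager {
public:
    // Feed one frame; the tracker updates its internal state and reports
    // everyone it is currently tracking.
    std::vector<TrackedPerson> processFrame(const cv::Mat& frame) {
        (void)frame;
        return {};
    }
};

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    cv::VideoCapture capture(argv[1]);
    TrackerManager tracker;
    cv::Mat frame;
    while (capture.read(frame)) {
        // Each frame is handed to the tracker individually.
        for (const auto& person : tracker.processFrame(frame)) {
            cv::rectangle(frame, person.box, cv::Scalar(0, 255, 0), 2);
        }
    }
    return 0;
}
```

To build: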
mkdir -p build/x86_64
cd build/x86_64
cmake -D CMAKE_BUILD_TYPE=Release -D USE_AVX_INSTRUCTIONS=1 ../..
make
This accepts an input video and a list of known people, and produces an output video annotated with the current frame rate (calculated using an exponential moving average), the faces currently being tracked, and their identities where known.
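The frame-rate figure can be smoothed with an exponential moving average along the lines of the sketch below; the class name and the smoothing factor `alpha` are illustrative choices, not taken from the actual code.

```cpp
// Minimal sketch of an exponentially smoothed frame-rate estimate.
#include <chrono>

class FpsEstimator {
public:
    explicit FpsEstimator(double alpha = 0.1) : alpha_(alpha) {}

    // Call once per frame; returns the smoothed frames-per-second value.
    double update() {
        const auto now = std::chrono::steady_clock::now();
        if (hasPrevious_) {
            const double seconds =
                std::chrono::duration<double>(now - previous_).count();
            const double instantaneous = seconds > 0.0 ? 1.0 / seconds : 0.0;
            // Exponential moving average: blend the new sample with the history.
            smoothedFps_ = alpha_ * instantaneous + (1.0 - alpha_) * smoothedFps_;
        }
        previous_ = now;
        hasPrevious_ = true;
        return smoothedFps_;
    }

private:
    double alpha_;
    double smoothedFps_ = 0.0;
    bool hasPrevious_ = false;
    std::chrono::steady_clock::time_point previous_;
};
```

Usage: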
./manager-demo <INPUT_VIDEO_FILE> <OUTPUT_VIDEO_FILE> <MOTION_DIFF_METHOD> [NAME1 FACE1.jpg] ... [NAME_N FACE_N.jpg]
where MOTION_DIFF_METHOD can be one of:
- ALWAYS - always "detect" motion, used for comparison
- NEVER - never detect motion, used to measure the cost of reading the video with no processing overhead
- CONTOURS - use the PyImageSearch method based on contours
- MSE - use mean squared error
- MSE_WITH_BLUR - use mean squared error after blurring
- DIFF - use frame differencing
- DIFF_WITH_BLUR - use frame differencing after blurring
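To make the MSE and DIFF variants above concrete, here is a rough OpenCV sketch of that style of check; the function names, blur kernel size, and threshold handling are illustrative rather than the project's actual implementation.

```cpp
// Rough sketch of MSE_WITH_BLUR / DIFF_WITH_BLUR style motion checks.
#include <opencv2/opencv.hpp>

// Mean squared error between two greyscale frames.
double meanSquaredError(const cv::Mat& a, const cv::Mat& b) {
    cv::Mat diff;
    cv::absdiff(a, b, diff);
    diff.convertTo(diff, CV_32F);
    diff = diff.mul(diff);
    return cv::mean(diff)[0];
}

// Returns true if the difference between consecutive frames suggests motion.
bool motionDetected(const cv::Mat& previous, const cv::Mat& current,
                    bool blurFirst, double threshold) {
    cv::Mat prevGrey, currGrey;
    cv::cvtColor(previous, prevGrey, cv::COLOR_BGR2GRAY);
    cv::cvtColor(current, currGrey, cv::COLOR_BGR2GRAY);
    if (blurFirst) {
        // Blurring suppresses sensor noise so small pixel flicker
        // is not reported as motion.
        cv::GaussianBlur(prevGrey, prevGrey, cv::Size(21, 21), 0);
        cv::GaussianBlur(currGrey, currGrey, cv::Size(21, 21), 0);
    }
    return meanSquaredError(prevGrey, currGrey) > threshold;
}
```

The CONTOURS option follows the PyImageSearch tutorial approach, which (roughly) thresholds a blurred difference image and then looks for contours large enough to count as motion.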
The benchmarks are intended to compare the various motion detection methods and to get a feel for the performance improvement gained by using the manager compared to a naive implementation.
./manager-benchmark <VIDEO_FILE> 1
The micro-benchmarks are intended to get rough performance figures for the basic operations performed by the face tracking and motion detection code. The aim is to guide the implementation and to get a feel for how expensive the various operations are on a desktop and on a Raspberry Pi.
./micro-benchmarks <EXAMPLE_IMAGE_WITH_FACE>
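For reference, these micro-benchmarks boil down to timing loops of roughly the shape sketched below; the helper name and the placeholder workload are made up for illustration, and the real benchmarks presumably time operations such as face detection or a tracker update on the sample image.

```cpp
// Illustrative timing loop of the kind used for micro-benchmarks.
#include <chrono>
#include <iostream>

template <typename Operation>
double millisecondsPerCall(Operation op, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        op();  // the operation whose cost we want to measure
    }
    const auto stop = std::chrono::steady_clock::now();
    const double totalMs =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return totalMs / iterations;
}

int main() {
    // Placeholder workload; in the real benchmarks this would be a face
    // detection, face encoding, or tracker update on a sample image.
    volatile double sink = 0.0;
    const double ms = millisecondsPerCall([&] {
        for (int i = 0; i < 1000; ++i) sink = sink + i * 0.5;
    }, 100);
    std::cout << "average: " << ms << " ms per call\n";
    return 0;
}
```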
There are some micro-benchmark results for my desktop (x86_64 with an NVIDIA GTX 1080 GPU) and a Raspberry Pi 3 in benchmark-results. I plan to add results for the Raspberry Pi Zero soon. The results should be taken with a large grain of salt, and there are several things that should be improved before they are taken too seriously:
- Running at the larger resolutions caused swapping on the Pi 3, so those results are unlikely to reflect pure CPU performance.
- The benchmarks were not all run at the same time, so there may be some slight variation in performance.
- The tracker benchmarks were run repeatedly on the same patch; it would be more realistic to run them against different patches and over several frames.
Once again, please don't take the exact values too seriously. They are just a guide to the cost of the operations.
Tests use Catch2.
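For anyone unfamiliar with Catch2, a test looks roughly like the example below (assuming the Catch2 v2 single-header include path; the test case itself is made up and is not one of the project's actual tests).

```cpp
// Illustrative Catch2 test; the TEST_CASE shown here is a made-up example.
#define CATCH_CONFIG_MAIN  // let Catch2 provide main()
#include <catch2/catch.hpp>

TEST_CASE("exponential moving average blends new samples with the history") {
    const double alpha = 0.1;
    double ema = 5.0;                         // first sample seeds the average
    ema = alpha * 7.0 + (1.0 - alpha) * ema;  // blend in a second sample
    REQUIRE(ema == Approx(5.2));
}
```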
Generally follows the Google C++ Style Guide, with the following exceptions:
- constants use Java-style ALL_CAPS
- indentation is 4 spaces, not 2
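A tiny made-up snippet illustrating those two exceptions:

```cpp
// ALL_CAPS for constants, 4-space indentation (names here are invented).
const int MAX_TRACKED_FACES = 16;

int clampTrackedFaces(int requested) {
    if (requested > MAX_TRACKED_FACES) {
        return MAX_TRACKED_FACES;
    }
    return requested;
}
```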
Feel free to point out places where I need to fix code style :-)
This code is licensed under the Boost Software License 1.0.
- Many thanks to Davis King (@nulhom) for creating dlib and for providing the trained facial feature detection and face encoding models used in this library. For more information on the ResNet that powers the face encodings, check out his blog post.
This code was originally written as the final project for Satya Mallick's Computer Vision for Faces course. I intend to continue developing it into a useful package to support human interaction with IoT devices and robots.