
VideoFace2.0

Transforming faces into video stories

Authors: Branko Brkljač $^{\text{§}}$, Vladimir Kalušev $^{\text{§}}$, Branislav Popović and Milan Sečujski

$^{\text{§}}$ equal contribution


VideoFace 2.0 processing workflow and applications


Abstract

Face detection and face recognition have been a focus of the vision community since its very beginnings. Inspired by the success of the original Videoface digitizer, a pioneering device that allowed users to capture video signals from any source, we have designed an advanced video analytics tool to efficiently create structured video stories, i.e. identity-based information catalogs. VideoFace2.0 is the name of the developed system for spatial and temporal localization of each unique face in the input video, i.e. face re-identification (ReID), which also allows their cataloging, characterization and creation of structured video outputs for later downstream tasks. The developed near real-time solution is primarily designed for application scenarios involving TV production and media analysis, and as an efficient tool for creating large video datasets necessary for training machine learning (ML) models in challenging vision tasks such as lip reading and multimodal speech recognition. The conducted experiments confirm the applicability of the proposed face ReID algorithm, which combines the concepts of face detection, face recognition and passive tracking-by-detection in order to achieve robust and efficient face ReID. The system is envisioned as a compact and modular extension of the existing video production equipment. The presented results are based on a test implementation that achieves 18-25 fps on a consumer-grade notebook. Ablation experiments also confirmed that the proposed algorithm brings a relative gain in the reduction of the number of false identities in the range of 73-93%. We hope that the presented work and shared code will stimulate further interest in the development of similar, application-specific video analysis tools, and lower the entry barrier for the production of high-quality multi-modal ML datasets in the future.

Original Videoface device

For more information, please see our conference publication at the link below:

📄 Publication preprint available at: DOI:10.48550/arXiv.2505.02060

code TBA ...


Main characteristics include:

  • Near real-time operation at ~18-25 fps (consumer-grade notebook with GPU)

  • On-line or off-line processing mode with different types of result visualizations

  • 🔍 Detailed log file of face identities found by the system, suitable for video cataloging and spatial-temporal localization of every face image in which the same person appears

  • Fast post-production of video stories based on the results of video analysis stored in the corresponding log file: a single run of face ReID produces multiple outputs

  • Modular and independent of the specific choice of methods for each of the components in Algorithm 1 (face detection and face recognition models)

  • ⚡ Successfully tested on open-set face ReID in open-world indoor and outdoor scenes


Main applications include:

  • TV production, media analysis and creative industries

  • ⭐ Production of custom video-based datasets for machine learning (ML) tasks involving multi-modal inputs such as speech, text and image

  • Automated video analysis and cataloging

  • ⭐ Production of structured video outputs, i.e. video stories

  • 🔨 Editing of interviews, reportages, talk shows, podcasts and other formats that include multiple speakers or participants



Video stories and VideoFace2.0 face ReID

An example of the produced testVideo2 --> person30_video_story

  • The input testVideo2, which corresponds to a reportage brought by a field reporter to the TV studio, is automatically processed by VideoFace2.0, and the shown video story, corresponding to an unknown person identified by the system as "person30", is created.

  • Produced "person30" video story alongside extracted face and mouth region videos (side-by-side visualization):

▶️ testVideo2 --> "person30" + face + mouth region video stories

Watch the testVideo2 "person30" video story alongside extracted face and mouth region videos

  • All frames of the original input video in which the open-set face of the selected person appears are identified by the system and merged into the shown video story with synchronized audio.

  • Original testVideo2 reportage: "Vancouver Talks" - by @impsquared YouTube™ channel.

  • Produced video story also includes overlaid visualizations of face bounding boxes and face landmarks of all other persons that are present in the same frames in which the selected "person30" appears (other persons identified by the proposed face ReID procedure).

  • Produced video story, download link: ▶️testVideo2-->person30_video_story.

  • More video examples are available at the corresponding links in the Experimental results section below.


Other examples of video stories and VideoFace2.0 analyses:


  

   (a) face region video story;

  

   (b) mouth region video story;

  

   (c) face identity mismatch;


   (d) ablation experiments on testVideo2 (side-by-side visual comparison of 4 different algorithms);


   (e) on-screen presence of all 23 identities found by VideoFace2.0 in testVideo2 in the case of the full Algorithm 1, i.e. the proposed face ReID procedure corresponding to the best face ReID result shown in the lower-right part of the ablation experiments visualization in (d).

Proposed generic face ReID procedure:
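
The full procedure is specified as Algorithm 1 in the publication. Below is a minimal Python sketch of such a generic loop, assuming hypothetical `detect(frame)` and `embed(frame, box)` callables that stand in for arbitrary pre-trained detection and recognition models; the exact roles assigned here to the parameters $\sigma_h$, $\tau_d$, $\tau$ and $t_{min}$ are illustrative assumptions, not the repository implementation:

```python
import numpy as np

def cos_sim(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def iou(a, b):
    # intersection-over-union of two [x1, y1, x2, y2] boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def face_reid(frames, detect, embed, sigma_h=0.6, tau_d=0.6, tau=0.8, t_min=60):
    """Hypothetical sketch of a generic face ReID loop combining
    detection + recognition + passive tracker filtering of new identities
    + detection confidence score + temporal post filtering (cf. exp 4)."""
    gallery = {}    # approved identities: id -> last embedding
    pending = {}    # tentative identities: id -> frame of first appearance
    last_box = {}   # last known box per identity (passive tracking state)
    log, next_id = [], 0
    for t, frame in enumerate(frames):
        for box, score in detect(frame):
            if score < tau_d:                        # detection confidence filter
                continue
            emb = embed(frame, box)
            # recognition: best cosine match against approved identities
            sims = {p: cos_sim(emb, g) for p, g in gallery.items()}
            pid = max(sims, key=sims.get) if sims else None
            if pid is None or sims[pid] < tau:
                # passive tracking-by-detection: strong overlap with the last
                # box of a known identity suppresses spawning a new identity
                ovl = {p: iou(box, b) for p, b in last_box.items()}
                pid = max(ovl, key=ovl.get) if ovl else None
                if pid is None or ovl[pid] < sigma_h:
                    pid = next_id; next_id += 1      # tentative new identity
                    pending[pid] = t
            if pid in pending and t - pending[pid] >= t_min:
                del pending[pid]                     # temporal post filtering
            if pid not in pending:                   # approve persistent ids only
                gallery[pid] = emb
                log.append((t, pid, tuple(box)))     # spatio-temporal record
            last_box[pid] = box
    return log
```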

  



Table 1 - Summary of ablation experiments

Reduction of false identities brought by Algorithm 1 (relative gain $\gamma$); columns exp 1-4 give the number of found identities:

| Video | exp 1 | exp 2 | exp 3 | exp 4 | true | $\gamma$ [%] | [m:s] |
|------------|------:|------:|------:|------:|-----:|-------------:|------:|
| testVideo1 |    50 |    42 |    30 |     7 |    4 |           83 | 02:44 |
| testVideo2 |   421 |   378 |   263 |    23 |   13 |           93 | 07:25 |
| testVideo3 |    39 |    37 |    25 |     9 |    6 |           73 | 18:45 |

Table 1 notes:

  • $\exp_i$, $i=1,\dots,4$: ablation experiments

  • $\gamma$: relative gain of Algorithm 1 in terms of the number of found identities in comparison to the other experiments, calculated as:
    $\gamma = \left(1 - \dfrac{\exp_4}{\frac{1}{3}\sum_{i=1}^{3}\exp_i}\right) \times 100\%$

  • exp 1: detection + recognition

  • exp 2: detection + recognition + passive tracker filtering of new identities

  • exp 3: detection + recognition + passive tracker filtering of new identities + detection confidence score

  • exp 4 (full Algorithm 1): detection + recognition + passive tracker filtering of new identities + detection confidence score + temporal post filtering

  • "true": the expected (true) number of unique identities in each video, i.e. the number of distinct faces that the system is expected to find. This does not mean that all of these faces have significant on-screen presence.

  • [m:s] indicates duration in minutes and seconds.
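
The $\gamma$ values in Table 1 can be verified directly from the listed identity counts:

```python
# Recompute the relative gain from Table 1:
# gamma = (1 - exp4 / ((exp1 + exp2 + exp3) / 3)) * 100
rows = {"testVideo1": (50, 42, 30, 7),
        "testVideo2": (421, 378, 263, 23),
        "testVideo3": (39, 37, 25, 9)}
for name, (e1, e2, e3, e4) in rows.items():
    gamma = (1 - e4 / ((e1 + e2 + e3) / 3)) * 100
    print(f"{name}: gamma = {gamma:.1f}%")
# testVideo1: gamma = 82.8%  -> 83 in Table 1
# testVideo2: gamma = 93.5%  -> 93 in Table 1
# testVideo3: gamma = 73.3%  -> 73 in Table 1
```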



Experimental results:

Video demonstrations of VideoFace2.0 functionalities are available on the following YouTube™ channel:

  • The presented experiments include 3 specific test videos with challenging face ReID situations and scene environments characteristic of the above-mentioned application scenarios.

💡 Below are image previews and individual YouTube™ links for some of the conducted experiments.



testVideo1

Face ReID results based on full Algorithm 1 with parameters set to: $\sigma_h=0.6$, $\tau_d=0.6$, $\tau=0.8$, and $t_{min}=60$ frames*:

   ▶️testVideo1 face ReID results

   Watch the testVideo1 face ReID results


Face ReID ablation experiments, side-by-side comparison:

   ▶️testVideo1 face ReID ablation experiments

   Watch the testVideo1 face ReID ablation experiments

   Experiments are numbered 1-4 and consist of:

     1. Upper left: detection + recognition (exp 1)

     2. Upper right: detection + recognition + passive tracker filtering of new identities (exp 2)

     3. Lower left: detection + recognition + passive tracker filtering of new identities + detection confidence score (exp 3)

     4. Lower right: detection + recognition + passive tracker filtering of new identities + detection confidence score + temporal post filtering (proposed full Algorithm 1, exp 4)

   $^{\text{*}}$ Note that the introduced $t_{min}$ delay in new identity approval only affects the initial appearance of new identities, but does not affect ReID of identities already present in the gallery (real-time operation after a new identity is approved as valid). It could therefore be replaced by a more complex ReID decision rule that would play the same role as the introduced post filtering. If new identities need to appear immediately during real-time operation, $t_{min} \approx 0$ should be used.
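
As a minimal illustration of that remark (hypothetical code, not the repository API), the approval decision reduces to a single comparison, so switching between delayed and immediate appearance of new identities is only a parameter change:

```python
def is_approved(first_seen: int, frame: int, t_min: int = 60) -> bool:
    """A new identity becomes visible in the output only after persisting
    for t_min frames; identities already in the gallery are unaffected."""
    return frame - first_seen >= t_min

# delayed approval: an identity first seen at frame 100 appears at frame 160
assert is_approved(100, 160) and not is_approved(100, 159)
assert is_approved(100, 100, t_min=0)   # immediate appearance in live use
```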


Face video story:

   ▶️testVideo1 face video story

   Watch the testVideo1 face video story


Mouth region video story:

   ▶️testVideo1 mouth region video story

   Watch the testVideo1 mouth region video story



testVideo2

Face ReID results based on full Algorithm 1 (with the same set of parameters as for testVideo1):

   ▶️testVideo2 face ReID results

   Watch the testVideo2 face ReID results


Face ReID results together with face and mouth region extraction (side-by-side) for the selected person identified as "person30":

   ▶️testVideo2 person30 face ReID with face and mouth region extraction

   Watch the testVideo2 person30 face ReID with face and mouth region extraction

   Video consists of 3 parts:

   1. Left side: Face re-identification (ReID) results.

      Video shows all persons that have been identified as present together (in the same frame) with the selected "person30": their bounding boxes, person IDs and face landmark points.

   2. Top right: Zoomed-in face image regions for the selected person.

      The video part contains face images of "person30" cropped to the face detection bounding box, in two variants (see the sketch after this list):

      * AVG-scaled: scaled to the average width and height of the face ROI over all frames in which "person30" appears (non-uniform scaling); shown as the face image on the left.
      * MAX-scaled: the face image is only positioned next to the AVG-scaled version (original image without scaling). The shown face video dimensions correspond to the face appearance with maximum width and height in the original video.

   3. Bottom right: Mouth region extraction for the selected person.

      The interpretation of this video part is the same as for the face image regions described in point 2 above.
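
A minimal sketch of how the two fixed-size tracks could be assembled from the per-frame face crops, assuming OpenCV and NumPy (a hypothetical helper, not the repository code):

```python
import numpy as np
import cv2  # OpenCV, assumed available

def scale_face_track(crops):
    """crops: list of HxWx3 uint8 face images of one identity over time.
    Returns (avg_track, max_track), each with a single fixed frame size."""
    hs = [c.shape[0] for c in crops]
    ws = [c.shape[1] for c in crops]
    avg_w, avg_h = round(sum(ws) / len(ws)), round(sum(hs) / len(hs))
    max_w, max_h = max(ws), max(hs)
    # AVG-scaled: every crop non-uniformly resized to the mean ROI size
    avg_track = [cv2.resize(c, (avg_w, avg_h)) for c in crops]
    # MAX-scaled: crops kept at original size, centered on a canvas that
    # fits the largest appearance (no scaling, only positioning)
    max_track = []
    for c in crops:
        canvas = np.zeros((max_h, max_w, 3), dtype=c.dtype)
        y0, x0 = (max_h - c.shape[0]) // 2, (max_w - c.shape[1]) // 2
        canvas[y0:y0 + c.shape[0], x0:x0 + c.shape[1]] = c
        max_track.append(canvas)
    return avg_track, max_track
```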


Face ReID ablation experiments, side-by-side comparison:

   ▶️testVideo2 face ReID ablation experiments

   Watch the testVideo2 face ReID ablation experiments



testVideo3

Face ReID results (with the same set of parameters as for testVideo1):

   ▶️testVideo3 face ReID results

   Watch the testVideo3 face ReID results


Face re-identification results together with landmark points, face and mouth region extraction (side-by-side) for the selected person identified as "person1":

▶️ testVideo3 --> "person1" video story

Watch the testVideo3 "person1" video stories


Face ReID ablation experiments, side-by-side comparison:

   ▶️testVideo3 face ReID ablation experiments

   Watch the testVideo3 face ReID ablation experiments



Licenses:

The original testVideo2 and testVideo3 are available at the following links under YouTube™'s "Creative Commons Attribution license (reuse allowed)":

The presented implementation and experimental results are based on the pre-trained face detection and face recognition models kindly provided by the InsightFace project - State-of-the-art 2D and 3D face analysis.

VideoFace2.0 is released under the MIT License terms in the provided LICENSE file.



How to cite:

[1] Brkljač, B., Kalušev, V., Popović, B., Sečujski, M. (2025). Transforming faces into video stories - VideoFace2.0. In Preprint submitted to the 14th Mediterranean Conference on Embedded Computing - MECO 2025, Budva, Montenegro, 10-14 June, 2025 DOI:number


    @inproceedings{brkljacVideoface2025,
    author = {Brklja{\v{c}}, Branko and Kalu{\v{s}}ev, Vladimir and Popovi{\'c}, Branislav and Se{\v{c}}ujski, Milan},
    title = {Transforming faces into video stories - {VideoFace2.0}},
    booktitle = {Preprint submitted to the 14\textsuperscript{th} Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro},
    volume = {1},
    pages = {1--4},   
    month = {10--14 June},
    year = {2025},
    doi = {-}
    }

[2] Brkljač, B., Kalušev, V., Popović, B., Sečujski, M. (2025). Transforming faces into video stories - VideoFace2.0. arXiv preprint arXiv:2505.02060


      @misc{brkljac2025transformingfacesvideostories,
      title = {Transforming faces into video stories - {VideoFace2.0}},
      author = {Branko Brkljač and Vladimir Kalušev and Branislav Popović and Milan Sečujski},
      year = {2025},
      eprint = {2505.02060},
      archivePrefix = {arXiv},
      primaryClass = {cs.CV},
      url = {https://arxiv.org/abs/2505.02060},
      doi = {10.48550/arXiv.2505.02060}
      }

DOI:10.48550/arXiv.2505.02060

