Error Rates in Users of Automatic Face Recognition Software

PLoS One. 2015 Oct 14;10(10):e0139827. doi: 10.1371/journal.pone.0139827. eCollection 2015.

Abstract

In recent years, wide deployment of automatic face recognition systems has been accompanied by substantial gains in algorithm performance. However, benchmarking tests designed to evaluate these systems do not account for the errors of human operators, who are often an integral part of face recognition solutions in forensic and security settings. This causes a mismatch between evaluation tests and operational accuracy. We address this by measuring user performance in a face recognition system used to screen passport applications for identity fraud. Experiment 1 measured target detection accuracy in algorithm-generated 'candidate lists' selected from a large database of passport images. Accuracy was notably poorer than in previous studies of unfamiliar face matching: participants made over 50% errors for adult target faces, and over 60% when matching images of children. Experiment 2 then compared the performance of student participants to that of trained passport officers, who use the system in their daily work, and found equivalent performance in these groups. Encouragingly, a group of highly trained and experienced "facial examiners" outperformed these groups by 20 percentage points. We conclude that human performance curtails the accuracy of face recognition systems, potentially reducing benchmark estimates by 50% in operational settings. Mere practice does not attenuate these limits, but the superior performance of trained examiners suggests that recruitment and selection of human operators, in combination with effective training and mentorship, can improve the operational accuracy of face recognition systems.
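The abstract's central claim, that human error can roughly halve benchmark accuracy in operation, follows from the pipeline's serial structure: the target must first appear in the algorithm's candidate list, and the operator must then detect it. A minimal sketch of this combined-accuracy bound (the function name and rates below are illustrative assumptions, not figures or methods from the paper):

```python
def operational_accuracy(algorithm_hit_rate: float,
                         human_detection_rate: float) -> float:
    """Probability that the target is both returned in the candidate
    list AND then correctly identified by the human operator.

    Assumes the two stages are independent, so the end-to-end
    accuracy is simply the product of the stage accuracies.
    """
    return algorithm_hit_rate * human_detection_rate


# Hypothetical benchmark hit rate of 0.90, combined with ~50% human
# error (of the order reported for untrained participants), roughly
# halves the system's effective accuracy:
print(operational_accuracy(0.90, 0.50))  # 0.45
```

This independence assumption is a simplification; in practice, operator error rates may vary with candidate-list quality, but the sketch illustrates why operator performance is a hard ceiling on the deployed system.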

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Algorithms*
  • Databases, Factual
  • Ethnicity
  • Face*
  • Female
  • Humans
  • Image Processing, Computer-Assisted*
  • Male
  • Middle Aged
  • Software Design
  • Software*

Grants and funding

This research was supported by Australian Research Council grants to Kemp (LP110100448, LP130100702). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.