A Brief Introduction to the paper

A Face Annotation Framework with Partial Clustering and Interactive Labeling


What does the paper do?

The overall goal is to design a photo annotation system. It aims to reduce the human labor needed to label large numbers of digital photos, in particular to group them by the identities of the people (faces) that appear in them.

Recent advances in Face Detection and Face Recognition make this possible, but only to a limited extent. The performance of Face Detection now approaches commercial strength, but Face Recognition is still far from being a useful and stable component. In practice, facial features are sensitive to changes in illumination, viewpoint and pose, and are therefore not reliable.

This makes fully automatic grouping by faces an intractable problem. We take a semi-automatic approach instead, aiming to reduce human labor as much as possible.


The framework

First, we use standard Face Detection techniques to extract faces from the images, compute multiple features for each face, and fuse them into a similarity matrix. This similarity matrix is not necessarily precise and may contain a lot of noise.
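The fusion step can be illustrated with a minimal sketch. The fusion rule used in the paper is not described here, so the weighted average below is only an assumption, and the names fuse_similarities, feature_a and feature_b are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fuse_similarities(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Fuse per-feature similarity matrices by a weighted average.

    Assumption: a weighted average is only one possible fusion rule;
    it is not taken from the paper.
    """
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need one weight per similarity matrix")
    n = len(matrices[0])
    total = sum(weights)
    fused = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(matrices, weights):
        for i in range(n):
            for j in range(n):
                # Normalize the weights so that they sum to one.
                fused[i][j] += (weight / total) * matrix[i][j]
    return fused


# Toy data: three faces, two hypothetical features.
feature_a = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]]
feature_b = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.0], [0.6, 0.0, 1.0]]
fused = fuse_similarities([feature_a, feature_b], [0.5, 0.5])
```

Because each input matrix is noisy, the fused matrix is noisy as well, which is why the later steps do not trust every entry equally.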

Then we divide the labeling process into two parts: an unsupervised part and an interactive part. The overall framework is as follows:


Unsupervised part

In the unsupervised part, we cluster faces via Partial Clustering, one of the contributions of the paper. Partial Clustering groups only the evident (good) clusters, in which similarity is consistent, and leaves all the other faces, whose similarity measure is contaminated by noise and not reliable, in the litterbin.
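The idea of keeping only evident clusters and sending everything else to a litterbin can be sketched as follows. This is not the paper's Partial Clustering algorithm: it swaps in a simple threshold-linking rule with a consistency check, and the function name and the parameters link, floor and min_size are hypothetical.

```python
from itertools import combinations


def partial_cluster(sim, link=0.8, floor=0.6, min_size=2):
    """Return (clusters, litterbin) from a similarity matrix.

    Assumption: faces are linked when their similarity is at least `link`,
    and a linked group counts as an evident cluster only if every pair in
    it has similarity of at least `floor`. This rule is a stand-in, not
    the method from the paper.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] >= link:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    clusters, litterbin = [], []
    for members in groups.values():
        consistent = all(sim[i][j] >= floor for i, j in combinations(members, 2))
        if len(members) >= min_size and consistent:
            clusters.append(members)
        else:
            # Unreliable faces are left for the interactive part.
            litterbin.extend(members)
    return clusters, sorted(litterbin)


# Toy data: faces 0-2 are mutually similar, faces 3 and 4 are ambiguous.
sim = [
    [1.0, 0.9, 0.85, 0.3, 0.2],
    [0.9, 1.0, 0.9, 0.4, 0.3],
    [0.85, 0.9, 1.0, 0.5, 0.1],
    [0.3, 0.4, 0.5, 1.0, 0.7],
    [0.2, 0.3, 0.1, 0.7, 1.0],
]
clusters, litterbin = partial_cluster(sim)
```

In this toy example faces 0, 1 and 2 form one evident cluster, while faces 3 and 4 fall below the linking threshold and end up in the litterbin.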


Interactive part

In the interactive part, the user is involved. Users are asked to label the evident (good) clusters, so that we obtain many labeled faces as "seeds" without making the experience tedious for them.

Then we apply Efficient Labeling, the second contribution in the paper, to the half-labeled face set.

This algorithm picks a list of faces Q, one list at a time, for the user to label. Treating the labeling process as an influx of information that resolves the ambiguity of the half-labeled set, the list is deliberately chosen so that the ratio of information gain to the estimated number of user interactions, or "information efficiency", is maximized. We expect the user to be able to label many faces with few interactions.

The number of user interactions is estimated via Subset Saliency, which indicates how cohesive Q is with respect to the other unlabeled faces.
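The selection criterion can be illustrated with a small sketch. The paper's definitions of information gain and Subset Saliency are not given here, so this example assumes that the gain of a candidate list is the summed entropy of hypothetical per-face label posteriors, and it takes the estimated number of interactions as an input rather than computing it. All names below are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_efficiency(candidate, posteriors, estimated_interactions):
    """Ratio of information gain to the estimated number of interactions.

    Assumption: the gain is the total label uncertainty removed once every
    face in the candidate list is labeled.
    """
    gain = sum(entropy(posteriors[face]) for face in candidate)
    return gain / estimated_interactions


def pick_most_efficient(candidates, posteriors, costs):
    """Return the index of the candidate list with the highest ratio."""
    return max(
        range(len(candidates)),
        key=lambda k: information_efficiency(candidates[k], posteriors, costs[k]),
    )


# Toy data: label posteriors of three unlabeled faces over two identities.
posteriors = {0: [0.5, 0.5], 1: [0.5, 0.5], 2: [0.9, 0.1]}
candidates = [[0, 1], [0, 1, 2], [2]]
# Placeholder costs; in the paper these come from Subset Saliency.
costs = [1, 3, 1]
best = pick_most_efficient(candidates, posteriors, costs)
```

Here the first list wins: it removes two bits of uncertainty with one estimated interaction, whereas adding face 2 contributes little extra information for a much higher estimated cost.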



We conducted several experiments. Detailed experimental descriptions and results can be found in the paper.

Here we compare the overall performance of our framework with that of Riya (see http://www.riya.com).

Our framework outperforms Riya's by about 46%.


Download PDF: A Face Annotation Framework with Partial Clustering and Interactive Labeling