The FACES algorithm
Put as succinctly as possible, the FACES algorithm works as follows.
We believe that no off-the-shelf modules can be used directly, and that a combination of different methods may be required. Establishing these methods necessitates a careful study of artists’ renditions in order to extract the maximum relevant information and to model their styles.
Figure 1: Overview of the Algorithm
Figure 1 provides an overview of the algorithm. Local feature (LF) and anthropometric distance (AD) descriptors are extracted from each face. A subset of these features is identified as characteristic of an artist’s style by means of the random subspace ensemble learning method. Non-parametric statistical permutation tests give the importance of the selected features. Where too few images of an artist are available to learn a specific style, all the features are used. Similarity scores between image pairs are computed using either the weighted style features or all features, yielding style-specific or general match/non-match scores as appropriate. These scores are then validated (using the robust Siegel-Tukey non-parametric statistical test in the artist-specific case). The learned similarity scores for the general case, referred to as the Portrait Feature Space (PFS), are used to identify unknown instances by means of statistical hypothesis tests. We describe the details below.
1.0 Details of the Algorithm
Once we have obtained the image descriptors that characterize a face image (local features across fiducial points and salient anthropometric distances, as mentioned in Sec 1 of Report 1), we wish to learn which of these are characteristic of an artist’s style. In other words, we want to learn a subset of these image descriptors that characterizes an artist’s renditions. Towards this we employ the random subspace ensemble learning method. An ensemble consists of several classifiers (a classifier assigns a label to an object and thus helps in categorization) and outputs a class based on the outputs of the individual classifiers. For convenience, in the rest of this report, we refer to the local feature image descriptors and the anthropometric distance image descriptors as features.
1.1 Random Subspace Ensemble Learning
The random subspace method randomly samples a subset of these features and performs training in this reduced feature space. Multiple sets (or bags) of randomly sampled features are generated, and for each bag the parameters are learned. This method can cope with the difficulties of learning from small sample sizes and performs better than a single classifier (Ho, 1998).
More specifically, suppose we are given Z training image pairs and D features. Let L be the number of individual classifiers in the ensemble. For the i-th classifier we sample di < D features without replacement. For each classifier, we use its di features to determine the match and non-match scores (as appropriate), obtaining LF and AD similarity scores using appropriate measures as follows.
Where Sn (I, I’) is any normalized similarity measure computed between image pairs I, I’.
In order to identify the features that give the highest separation between match and non-match scores, we then compute the Fisher Linear Discriminant function for each classifier (as described in Step 3 of Sec 2 in Report 1). We choose as our style features the union of the features from those classifiers that give the top k Fisher Linear Discriminant values, where k is chosen experimentally. Note that we select the style features separately for local features and for anthropometric distances.
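The selection procedure in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this report: the bag score is assumed to be the mean of the per-feature similarities, the Fisher criterion is assumed to take its standard two-class form on one-dimensional scores, and the names `fisher_score` and `select_style_features` are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_score(match_scores, nonmatch_scores):
    """Assumed two-class Fisher criterion on 1-D scores:
    (difference of means)^2 / (sum of the two variances)."""
    gap = mean(match_scores) - mean(nonmatch_scores)
    spread = pvariance(match_scores) + pvariance(nonmatch_scores)
    if spread == 0:
        # Degenerate case: no spread at all in either score set.
        return float("inf") if gap != 0 else 0.0
    return gap * gap / spread


def select_style_features(pair_sims, is_match, n_classifiers, bag_size, top_k, seed=0):
    """pair_sims[z][f]: similarity of feature f for training pair z (assumed in [0, 1]).
    is_match[z]: True if training pair z depicts the same sitter.
    Returns the union of the features used by the top_k classifiers."""
    rng = random.Random(seed)
    n_features = len(pair_sims[0])
    ranked = []
    for _ in range(n_classifiers):
        # Sample d_i < D feature indices without replacement.
        bag = rng.sample(range(n_features), bag_size)
        # Assumption: a pair's score in this subspace is the mean feature similarity.
        scores = [mean(row[f] for f in bag) for row in pair_sims]
        match = [s for s, m in zip(scores, is_match) if m]
        nonmatch = [s for s, m in zip(scores, is_match) if not m]
        ranked.append((fisher_score(match, nonmatch), bag))
    # Keep the classifiers with the largest match/non-match separation.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return sorted({f for _, bag in ranked[:top_k] for f in bag})
```

In this sketch the procedure would be run twice, once on the LF similarities and once on the AD similarities, since the style features are selected separately for the two descriptor types.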
1.2 Importance of the Chosen Features
Not all features identified by the above method are equally important in representing a style. In order to understand the importance of the chosen features, we use the non-parametric permutation test (Good, 2009). Permutation tests help in assessing which features are the same (in other words, invariant) across all the instances belonging to a class. Features that are more invariant across the instances of a class can be regarded as more characteristic of that class and are therefore assigned greater importance. Permutation tests have previously been applied to determine invariant features in artworks, for example by Li et al. (2012).
The null hypothesis H0 (in statistics, the null hypothesis refers to a default scenario) states that two image groups G1 and G2 have the same average value (μ) of a particular feature v; the alternative hypothesis H1 states that the average value of that feature differs between the two groups. Thus,
H0: μG1(v) = μG2(v),   H1: μG1(v) ≠ μG2(v).
If the null hypothesis is true, then it should not matter how the values of feature v are assigned among the images in the groups. For instance, the mouth corner looks a certain way when a person smiles. If, on average, this appearance is the same across all images and across groups, then there will be no significant difference when the mouth corners are randomly reassigned across images (i.e. when the feature of one person is assigned to the corresponding feature of another person). Thus, if there are many images by an artist depicting different sitters, this test essentially captures the important features that are invariant across the works of that artist.
Specifically, if there are Ns images of a style class S, we can divide these Ns images into two subgroups consisting of Ns1 and Ns2 images, with Ns = Ns1 + Ns2. Let the feature values in the first group be [v_1, v_2, …, v_Ns1] and in the second group be [v_(Ns1+1), …, v_(Ns1+Ns2)]. The two-sided permutation test is done by randomly shuffling [v_1, …, v_Ns] and assigning the first Ns1 shuffled values to the first group and the remaining Ns2 shuffled values to the second group.
For the original two groups we compute,
δ0 denotes the variation of the feature v as exhibited by the various image instances Ii in the 2 groups under consideration.
For any two permuted groups we compute,
δs denotes the variation in the feature v of style class S after the feature values depicted by the images Ii, i = 1, 2, …, l, have been reassigned to images that are not necessarily Ii.
The value obtained from the permutation test, referred to as the p value in the statistics community, reflects the variation of the feature between the two groups. It is given by the fraction of permutations for which δs > δ0.
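The permutation test for a single feature can be sketched as below. This is an illustrative sketch rather than the report's exact procedure: it assumes that the variation δ is the absolute difference between the two group means, counts ties with `>=` (a common convention), and estimates p from a fixed number of random shuffles. The function name and its parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group1, group2, n_permutations=2000, seed=0):
    """Two-sided permutation test on one feature.
    group1, group2: values of feature v in the two image subgroups."""
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    # Assumed statistic for the original grouping: absolute difference of means.
    delta_0 = abs(mean(group1) - mean(group2))
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign feature values to images
        delta_s = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if delta_s >= delta_0:
            exceed += 1
    # Fraction of shuffles at least as extreme as the observed grouping.
    return exceed / n_permutations
```

A feature whose values are interchangeable between the two subgroups gives a p value near 1, whereas a feature that separates the subgroups gives a p value near 0.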
A smaller p denotes stronger evidence against the null hypothesis, meaning that the feature differed considerably between the two groups. If a feature shows no difference between the two groups, then it does not matter with which image the feature is associated, since the average value does not change; it can be regarded as a random assignment to any image in the pool. We compute p values for each feature chosen by the random subspace method as described above and use them as weights in computing the similarity scores between image pairs, so that more invariant features (larger p) receive greater weight. The p-normalized similarity score sp(I, I’) is now given by
Where pv is the p value of the feature v as determined by the permutation test and M is the number of features as chosen by the random subspace method.
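As a hedged illustration of the weighting, the sketch below assumes that the p-normalized score is the average of the per-feature similarities, each multiplied by its p value. The exact normalization used in this report may differ, and the function name is hypothetical.

```python
def p_weighted_similarity(feature_sims, p_values):
    """feature_sims[v]: normalized similarity of feature v between images I and I'.
    p_values[v]: permutation-test p value of feature v, used as its weight.
    Assumed form: (1 / M) * sum over v of p_v * s_v."""
    if not feature_sims or len(feature_sims) != len(p_values):
        raise ValueError("need one p value per selected feature")
    m = len(feature_sims)  # M: number of features chosen by the random subspace method
    return sum(p * s for p, s in zip(p_values, feature_sims)) / m
```

Under this assumption, a feature with a p value close to 0 contributes almost nothing to the score, while a highly invariant feature contributes nearly its full similarity.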
Subsequently the p-normalized similarity scores from the two measures (LF/AD) are fused in an optimal manner as described in Report 1 (Sec 2).
1.3 Validation of the Style Features
Our goal here is to show that the match/non-match scores obtained from the style features of a class carry higher confidence than the match/non-match scores obtained using all features (independent of the style). In other words, we wish to show that style-specific similarity scores are better representations of the style class than the similarity scores obtained using all features. Towards this, we employ a robust non-parametric statistical test called the Siegel-Tukey test, which checks the null hypothesis that two independent score sets come from the same population (style) against the alternative hypothesis that they come from populations differing in variability or spread. If the style features are indeed good representations of the class, then there should be a higher level of confidence associated with the null hypothesis than for the style-independent features.
The principle behind this test is as follows. Suppose there are two groups A and B, with n observations (in our case similarity scores) in the first group and m observations in the second, so that there are N = n + m observations in total. If all N observations are arranged in ascending order and there are no differences between the two groups (the null hypothesis), the values of the two groups can be expected to be mixed randomly. This would mean that among the extreme (high and low) scores there would be similar numbers of values from Group A and Group B. If, say, Group A were more inclined to extreme values (the alternative hypothesis), then there would be a higher proportion of observations from Group A among the low or high values and a reduced proportion at the center. Thus the p values of this test provide a measure of the confidence of the learned style-specific similarity scores.
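The ranking idea behind the Siegel-Tukey test can be sketched as follows. This is a simplified illustration, not the implementation used in this report: it assumes no tied scores, assumes the two groups share the same central location, and obtains the p value from the normal approximation to the rank-sum statistic. The function names are hypothetical.

```python
import math


def siegel_tukey_ranks(n):
    """Ranks for n sorted positions: rank 1 to the lowest value, ranks 2 and 3 to
    the two highest, ranks 4 and 5 to the next two lowest, and so on inward."""
    ranks = [0] * n
    lo, hi = 0, n - 1
    rank, take_low, step = 1, True, 1
    while lo <= hi:
        for _ in range(step):
            if lo > hi:
                break
            if take_low:
                ranks[lo] = rank
                lo += 1
            else:
                ranks[hi] = rank
                hi -= 1
            rank += 1
        take_low = not take_low
        step = 2  # after the first single rank, alternate in pairs
    return ranks


def siegel_tukey_p_value(scores_a, scores_b):
    """Two-sided p value for equal spread (normal approximation, no tie correction)."""
    pooled = sorted([(x, 0) for x in scores_a] + [(x, 1) for x in scores_b])
    ranks = siegel_tukey_ranks(len(pooled))
    n, m = len(scores_a), len(scores_b)
    # Rank sum of group A; extreme values receive small ranks.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n * (n + m + 1) / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    z = (w - expected) / sigma
    return math.erfc(abs(z) / math.sqrt(2))
```

A small p value here indicates that one score set is noticeably more spread out than the other, while a p value near 1 is consistent with both sets having the same variability.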
For artists/images where style could not be learnt, we use all the features (LF/AD) in computing the similarity scores (match/non-match). The learned similarity scores (match and non-match) were used to construct the Portrait Feature Space (distribution of match/non-match scores). The PFS was then validated using the same procedure as mentioned in Report 1.
1.4 Identification Framework
It is to be noted that the similarity scores obtained using the style learning algorithm described above are associated with greater confidence than the ones obtained in Phase 1. Thus, identity verification is more robust. The method for identity verification is similar to that described in Report 1 and is included here for completeness.
Given the learned PFS, the question now is to verify an unknown test image against a reference image. Towards this, we employ hypothesis testing.
This is a method for testing a claim or hypothesis (in this case, that of a match or non-match between portrait pairs). Below, we summarize how it is applied to the learned PFS to arrive at a conclusion regarding a match.
- The null hypothesis claims that the match distribution accounts for the test image’s similarity score (with the reference) better than the non-match distribution. The alternative hypothesis is that the non-match distribution models the score better.
- We set the level of significance α (the test’s probability of incorrectly rejecting the null hypothesis) to 0.05, as per common practice in such problems.
- We compute the test statistic using a one-sample, non-directional z test, which determines the number of standard deviations by which the similarity score deviates from the mean similarity score of the learned distributions.
- We compute p values, which are the probabilities of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. If p < α, we reject the null hypothesis.
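The z test step in the list above can be sketched as below, assuming that a learned distribution is summarized by its mean and standard deviation and that the p value is taken from the standard normal distribution. The function names and the numbers in the usage note are hypothetical and are not results from this report.

```python
import math


def z_test_p_value(score, dist_mean, dist_std):
    """Non-directional (two-sided) z test of one similarity score against a
    learned score distribution summarized by its mean and standard deviation."""
    z = (score - dist_mean) / dist_std  # deviation from the mean in standard deviations
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z


def is_consistent_with(score, dist_mean, dist_std, alpha=0.05):
    """True if the hypothesis that the score comes from this distribution
    is not rejected at significance level alpha."""
    return z_test_p_value(score, dist_mean, dist_std) >= alpha
```

For example, with a hypothetical match distribution of mean 0.8 and standard deviation 0.1, a score of 0.75 lies half a standard deviation from the mean and is not rejected, whereas a score of 0.3 lies five standard deviations away and is rejected at α = 0.05.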
In order to examine the validity of the chosen approach, we consider similarity scores of the test image with artworks known to depict persons other than the one depicted in the reference image. We call these images distracters. Depending on availability, we choose similar works by the same artist (the artist of the reference image) as distracters. If a test image indeed represents the same subject as the reference image, not only should its score with the reference image be modeled by the match distribution, but its scores with the distracter faces should also be modeled by the non-match distribution.
We computed similarity scores of each test case with its corresponding reference image and with 10 distracters. Table 1 lists the various hypothesis test scenarios that can arise and the corresponding conclusions that one can infer. Match and non-match cases are straightforward to infer from Table 1. In cases where the match and non-match distributions are equally likely to account for the test data, the learned PFS cannot accurately describe the test data (black rows in Table 1). If either the match or the non-match distribution is more likely to account for both the test image and the distracters (magenta rows in Table 1), it can be inferred that the chosen features do not possess sufficient discriminating power to prune outliers. Thus, in these scenarios, it is not possible to reach any conclusion.
References
T. Ho, The random subspace method for constructing decision forests, IEEE Trans. Patt. Anal. Mach. Intell., vol 20, no 8, pp 832-844, 1998.
P. I. Good, Permutation, Parametric, and Bootstrap Tests of Hypotheses, 3rd Ed., Springer, 2009.
J. Li, L. Yao, E. Hendriks and J. Wang, Rhythmic Brushstrokes Distinguish van Gogh from His Contemporaries: Findings via Automated Brushstroke Extraction, IEEE Trans. Patt. Anal. Mach. Intell., vol 34, no 6, pp 1159-1176, 2012.