Album organizer using face recognition with deep learning

Posted on November 21, 2024
Posted by Team Staple AI

If you have ever wanted to organize your digital photo closet or find pictures of yourself or your friends in a heap of gigabytes of data, you have come to the right place. In this blog, you will learn how to detect and filter pictures of your friends from an album of thousands using face recognition with deep learning. After a brief overview of the basics of face recognition, we will build an album organizer for your custom dataset. All you need is about 5–10 pictures of each person in advance for the model to recognize your favorite people (including yourself).

But how does it work?

Our brain is great at memorizing the faces of the people we meet or interact with regularly. A computer, however, needs to identify unique features of a face, such as the height and width of the face, the shape of the nose and lips, the average color, and the relative distances and angles between these features. We call this information a “feature vector” or a “face encoding”: a quantification of a face's visual appearance, encoded in a set of real numbers (128 here) that a computer can process and compare (see Image 3).

Here we use the face_recognition library to create a 128-dimensional (128-D) feature vector for the face of interest. You can find more details about this library and how it works in this article, and you can install it with pip3 install face-recognition. The library creates feature vectors using the OpenFace project (a face recognition implementation in Python and Torch) based on the FaceNet model, introduced by Schroff and co-authors in 2015. FaceNet is a deep convolutional neural network trained on a set of three images per training instance: an anchor (reference face), a positive (the same face as the anchor), and a negative (a different face from the anchor). The loss function for this training process is known as the triplet loss: it minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.
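
As a quick taste of the library, here is a minimal sketch that computes the 128-D encoding of the first face found in an image (the filename chandler.jpg is a hypothetical example):

----------

import face_recognition

# load an image (as an RGB array) and compute its face encodings;
# "chandler.jpg" is a hypothetical example file with one visible face
image = face_recognition.load_image_file("chandler.jpg")
encodings = face_recognition.face_encodings(image)

# one 128-D feature vector per detected face
print(len(encodings), encodings[0].shape)  # e.g. 1 (128,)

----------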

The network is trained such that the 128-D feature vectors of every face of the same person (even across different images) are close to each other and, at the same time, far away from the 128-D feature vector of any face of another person. In this context, the “closeness” of feature vectors is quantified by the L2 distance, which in turn provides a way to characterize similar-looking faces. Image 1 shows a representation of the learning process, where the triplet loss minimizes the L2 distance between the feature vectors of the anchor and positive while maximizing the distance between those of the anchor and negative.

Image 1: Triplet loss minimizing the distance between the feature vectors of anchors and positives while maximizing it between anchors and negatives. The anchor and positive are pictures of the same person's face, while the negative is a picture of a different face. The image is taken from FaceNet.
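
To make the training objective concrete, here is a minimal sketch of the triplet loss on a single (anchor, positive, negative) triple of encodings; the margin value alpha=0.2 is an assumption for illustration:

----------

import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss for three 128-D encodings; alpha is the margin hyperparameter."""
    # squared L2 distance of the anchor to the positive and to the negative
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    # the loss is zero once the positive is at least `alpha` closer than the negative
    return max(pos_dist - neg_dist + alpha, 0.0)

----------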

What's actually happening under the hood?

Behind the scenes, the first step is to detect all the faces in the input image using the Histogram of Oriented Gradients (HOG) or a deep learning method. This is followed by finding facial landmarks, which are used to transform the image to compensate for poor illumination and side poses of the face. The cropped and aligned face is then fed to a trained deep neural network (FaceNet) that generates a 128-D feature vector, which can be further used for classification, similarity detection against other faces, or general clustering. The pipeline in Image 2 shows the steps involved in face recognition.

Image 2: The pipeline for generating the 128-D feature vector of a face using OpenCV and FaceNet deep neural network. The image is taken from OpenFace.
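
The face_recognition library exposes each of these stages, so we can sketch the pipeline in a few lines (friend.jpg is a hypothetical input photo):

----------

import face_recognition

# "friend.jpg" is a hypothetical input photo
image = face_recognition.load_image_file("friend.jpg")

# step 1: detect the faces with HOG; returns (top, right, bottom, left) boxes
boxes = face_recognition.face_locations(image, model="hog")

# step 2: find the facial landmarks (eyes, nose, lips, chin, ...)
landmarks = face_recognition.face_landmarks(image, boxes)

# step 3: run the (aligned) face crops through the pre-trained network,
# giving one 128-D feature vector per detected face
encodings = face_recognition.face_encodings(image, boxes)

----------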

Remember, we are not training any deep neural network in this blog to create the feature vectors; we are merely accessing a pre-trained one through the face_recognition library. This is mostly because training such neural networks requires a dataset of roughly 3 million images and enormous computational resources. You can also refer to this article on face recognition with deep learning by Adam Geitgey if you want to learn about face recognition in more detail.

For building an album organizer, there are two key steps:

  1. Identify the location of the face in each picture of the input data (the rectangular border enclosing the face is commonly known as a ‘bounding box’). Construct the feature vector of the face of interest and store it in a reference database (Image 3).
  2. Take a test image from the album, locate the bounding boxes of the faces, construct the feature vector of each detected face, and finally match it against the reference feature vectors in the database to identify the people of interest.

Image 3: The face is detected and cropped, and the 128-D feature vector of the face is created. This feature vector is the unique identity of the face as produced by the pre-trained deep neural network.

Let’s get to the code!

Suppose you have six friends: Chandler, Joey, Rachel, Monica, Phoebe, and Ross, and you want to sort their pictures out of thousands of images. As input, we need a few pictures of each friend so that face recognition can construct their individual feature vectors. We can organize them in a directory structure like the one below.

  • Folder FriendsData has sub-folders, one per friend, each with about 5 pictures of that friend.
  • Python script album.py contains the code for converting your friends’ pictures into their respective face encodings (which are stored in friends_face_encodings.pickle).
  • Python script recognize.py finds the encodings of the faces in a target picture, compares them to your friends’ encodings, and determines whether they match. If there is a match, it draws a bounding box around your friends’ faces in the target picture and labels them.
  • Folders TestData and OutputData store, respectively, the pictures of the album you want to sort and the output pictures with your friends’ faces labeled on them after running the recognize.py script. A sketch of the full layout follows below.
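
Putting it all together, the project layout might look like this (one sub-folder per friend, using the example names from above):

----------

FriendsData/
    Chandler/    # ~5 pictures of Chandler
    Joey/
    Rachel/
    Monica/
    Phoebe/
    Ross/
TestData/        # the unsorted album
OutputData/      # labeled output images
album.py
recognize.py
friends_face_encodings.pickle    # created by album.py

----------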

The first step (see code below) involves collecting the paths of all the input images of your friends (in the folder FriendsData) in a list datapaths (line 56). In the function get_encodings, we loop over all the image paths, get the name of the friend in each image (line 22), identify the bounding box around the face (line 29), and construct the corresponding face encoding (line 32). We store the names and encodings in a dictionary data and dump it in a pickle file friends_face_encodings.pickle (line 44).

----------

 1  import os, sys
 2  import cv2
 3  from imutils import paths
 4  import pickle
 5  import face_recognition
 6
 7  def get_encodings(datapaths):
 8      """
 9      Input: list of the image paths of all your friends' pictures.
10      This function loops over all the image paths, detects the face positions in each image,
11      constructs the face encodings, and stores them and their corresponding names in a
12      pickled file for further use.
13      """
14
15      friendsEncodings = []
16      friendsNames = []
17
18      # loop over all the images and extract the face encodings
19      for count, imagepath in enumerate(datapaths):
20
21          # get the friend's name from the image path
22          name = imagepath.split(os.path.sep)[-2]
23
24          # read the image and convert it from BGR (OpenCV default) to RGB
25          image = cv2.imread(imagepath)
26          image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
27
28          # get the bounding box of the face using HOG (Histogram of Oriented Gradients)
29          bboxes = face_recognition.face_locations(image, model='hog')
30
31          # construct the encoding of the face within the bounding box
32          encodings = face_recognition.face_encodings(image, bboxes)
33
34          # store the face encodings and the corresponding name in the respective lists
35          for encoding in encodings:
36              friendsEncodings.append(encoding)
37              friendsNames.append(name)
38
39      # make a dictionary that stores all the encodings and their corresponding names
40      data = {"encodings": friendsEncodings,
41              "names": friendsNames}
42
43      # save the dictionary locally in a pickle file to be used later for the recognition part
44      with open('friends_face_encodings.pickle', "wb") as fe:
45          fe.write(pickle.dumps(data))
46
47      return data
48
49  if __name__ == '__main__':
50
51      # define the main directory and the directory with your friends' data
52      main_dir = os.getcwd()
53      data_dir = os.path.join(main_dir, 'FriendsData')
54
55      # list of paths of all your friends' pictures
56      datapaths = list(paths.list_images(data_dir))
57
58      # get the encodings of your friends' faces
59      data = get_encodings(datapaths)

----------

In the first step, we made a database of the feature vectors of all our friends’ faces and saved it as a pickle file. Now we have the identities of our friends’ faces in terms of their feature vectors.
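
If you want to sanity-check the database before moving on, here is a small optional sketch (the count of about 30 assumes six friends with roughly five pictures each):

----------

import pickle

# load the reference database created by album.py
with open('friends_face_encodings.pickle', 'rb') as fe:
    data = pickle.loads(fe.read())

# one 128-D numpy vector per reference face, e.g. ~30 for 6 friends x 5 pictures
print(len(data["encodings"]), data["encodings"][0].shape)

----------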

But how do we recognize their faces in an unknown image?

In the second step (see code below), we first load the unknown test images (line 36) from the directory in which we want to recognize the familiar faces of our friends. In each test image, we detect the bounding boxes of the faces (line 39) and make a face encoding for each of them (line 40). Once we have all the encodings of the faces in the test image, we compare each of them with the reference feature vectors in the database. If we get at least one match, we assign the name that receives the highest number of votes. For example, say we have 30 face encodings in the database saved in friends_face_encodings.pickle and we detect 1 face in the test image. We compare this face with all 30 reference encodings using the compare_faces function (line 47). This function computes the Euclidean distance between the test encoding and each of the 30 encodings in the database; if a distance is below a certain threshold (0.6 by default, adjustable via the tolerance argument), it returns True for that entry and False otherwise. We therefore get a list matches of length 30 with boolean values such as [True, True, True, …, False], each indicating whether the test face matched the corresponding reference encoding. We then extract the indices matchedIdxs where we hit a match, get the respective names from the loaded pickled file (the one we saved from the script album.py), and tally the vote count counts. The name with the maximum votes wins. If no match is found, we simply label the face as Unknown.

 1  # Usage: python3 recognize.py -i <path_to_test_image_dir> -o <path_to_output_dir>
 2  import os, sys
 3  import cv2
 4  from imutils import paths
 5  import pickle
 6  import face_recognition
 7  import argparse
 8
 9  # load the reference encodings created in the script album.py
10  data = pickle.loads(open('friends_face_encodings.pickle', "rb").read())
11
12  # make the argument parser and parse the arguments
13  ap = argparse.ArgumentParser()
14
15  # provide a path to the directory containing the test images and
16  # a path to the directory where you would like to save your output data
17  ap.add_argument("-i", "--test_directory", required=True,
18                  help="path to the test image directory")
19  ap.add_argument("-o", "--output_directory", required=True,
20                  help="path to the output directory")
21  args = vars(ap.parse_args())
22
23  test_dir = args["test_directory"]
24  output_dir = args["output_directory"]
25
26  # initialize a map linking each friend's name to the filenames their face is found in
27  filemap = {names: [] for names in data["names"]}
28
29  # loop over all the images in the test directory
30  for count, image in enumerate(os.listdir(test_dir)):
31
32      imagepath = os.path.join(test_dir, image)
33      filename = imagepath.split(os.path.sep)[-1]
34
35      # load the image and convert it from BGR (OpenCV default) to RGB
36      testimage = cv2.imread(imagepath)
37      rgb = cv2.cvtColor(testimage, cv2.COLOR_BGR2RGB)
38      # extract the bounding boxes of the faces and their corresponding face encodings
39      bboxes = face_recognition.face_locations(rgb, model='hog')
40      encodings = face_recognition.face_encodings(rgb, bboxes)
41
42      names = []
43
44      # loop over the found encodings and compare each one to the reference database
45      for encoding in encodings:
46
47          matches = face_recognition.compare_faces(data["encodings"], encoding)
48          name = "Unknown"
49
50          # if the test face matched at least one face in the database
51          if True in matches:
52
53              # extract the matched indices
54              matchedIdxs = [i for (i, b) in enumerate(matches) if b]
55              counts = {}
56
57              # get the names at the matched indices and tally a vote count per name
58              for i in matchedIdxs:
59                  name = data["names"][i]
60                  counts[name] = counts.get(name, 0) + 1
61
62              # the name with the maximum number of votes wins
63              name = max(counts, key=counts.get)
64
65          names.append(name)
66
67      # draw the bounding boxes around the faces with their detected names
68      for ((top, right, bottom, left), name) in zip(bboxes, names):
69
70          cv2.rectangle(testimage, (left, top), (right, bottom), (0, 255, 0), 2)
71          cv2.putText(testimage, name, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
72          if name != 'Unknown':
73              filemap[name].append(filename)
74
75      outputname = "output_%s" % filename
76      outputpath = os.path.join(output_dir, outputname)
77      cv2.imwrite(outputpath, testimage)

We save the output images in output_dir with the faces labeled, and we build a filemap dictionary whose keys are the names of your friends and whose values are lists of the filenames they were detected in. This dictionary can be further used to sort out the pictures. Here are a few examples of the test images and the detected faces in them:
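
As a final sketch of that sorting step, here is one possible way to use filemap to copy each friend's pictures into a folder of their own; the output root SortedAlbum is a hypothetical name, and the snippet assumes it is appended to the end of recognize.py, where filemap and test_dir are defined:

----------

import os, shutil

# "SortedAlbum" is a hypothetical output root; `filemap` and `test_dir`
# come from the recognize.py script above
for name, filenames in filemap.items():
    friend_dir = os.path.join("SortedAlbum", name)
    os.makedirs(friend_dir, exist_ok=True)
    # de-duplicate, since a friend may be detected more than once in the same file
    for filename in set(filenames):
        shutil.copy(os.path.join(test_dir, filename), friend_dir)

----------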

Needless to say, the more encodings we have in our reference database, the more accurate our detection will be. Have fun with it! Reorganize your album, find out which celebrities you look like, or just filter your pictures out of a random mess of images. Here is a link to the full code on GitHub. Feel free to leave any comments or suggestions related to this blog.

References

  1. https://www.pyimagesearch.com/2018/06/18/face-recognition-with-opencv-python-and-deep-learning/
  2. https://cmusatyalab.github.io/openface/
  3. https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf
  4. https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78
