UDC 004

Building a Smart Face ID System Using a Camera and Machine Learning

Konyrova Kulyash Kairovna – Master's student, Faculty of Information Technology, Kazakh-British Technical University (Republic of Kazakhstan, Almaty).

Abstract: In recent years, face recognition technology has gained immense popularity due to its wide range of potential applications in various industries, such as security systems, user identification, and targeted advertising. This article describes the development of a face recognition system utilizing a camera and machine learning techniques. Historically, facial recognition systems relied primarily on conventional methods such as feature extraction and template matching. In recent years, however, advanced techniques involving deep neural networks, combined with innovative image processing methods, have been employed to enhance the accuracy of facial recognition systems. The proposed system utilizes a siamese convolutional neural network that takes two face images as input and computes the distance between their embeddings. This network was trained to minimize the distance between pictures of the same person and maximize the distance between pictures of different persons. The developed face recognition system exhibits a remarkable capacity for identifying individuals based on their facial features. Its potential applications extend to security systems, user identification, and targeted advertising. With sustained research and development, facial recognition technology is poised to become even more precise and reliable, fostering wider adoption across numerous sectors.


Keywords: face recognition, siamese neural network, facial clusters, model training, feature extraction.



Face recognition systems have become increasingly important in various applications due to their ability to accurately and efficiently identify individuals. They offer numerous benefits such as enhanced security, improved accessibility, and personalized experiences. The authors of [1] present a new linear discriminant analysis (LDA)-based face recognition system that solves the small sample size problem encountered when applying LDA. The proposed technique proves that the most expressive vectors derived from principal component analysis (PCA) in the null space of the within-class scatter matrix are equal to the optimal discriminant vectors obtained from LDA in the original space. The experimental results demonstrate a significant improvement in the face recognition system's performance. The work [2] investigates a technique for face recognition based on computing 25 local autocorrelation coefficients. The study focuses on recognizing a large number of known human faces while rejecting unknown faces that lie close in pattern space. The multiresolution system achieves a 95% recognition rate and falsely accepts only 1.5% of unknown faces, operating at about one face per second. Without rejection of unknown faces, the technique obtains a peak recognition rate of 99.9%. The research [3] proposes a face recognition system that achieves a high degree of robustness and stability to illumination variation, image misalignment, and partial occlusion. The system uses sparse representation tools to align a test face image to a set of frontal training images, and the region of attraction of the alignment algorithm is computed empirically for public face data sets such as Multi-PIE. A complete face recognition system, including a projector-based training acquisition system, is implemented and effectively recognizes faces under a variety of realistic conditions.
In [4], the researchers compare two simple but general strategies for computer recognition of human faces, based on the computation of geometrical features and on almost-gray-level template matching, respectively. The results favor the template-matching approach, which achieves almost perfect recognition. The survey [5] covers past work on subproblems related to face recognition, including detection of a pattern as a face, identification of the face, analysis of facial expressions, and classification based on physical features of the face. The paper discusses the capability of the human visual system, serves as a guide for building an automated system, and briefly discusses some new approaches to these problems. The work [6] presents a method for face recognition across variations in pose and illumination that simulates the process of image formation in 3D space and estimates the 3D shape and texture of faces from single images. The estimate is obtained by fitting a statistical, morphable model of 3D faces to images, and faces are represented by model parameters for 3D shape and texture. The study presents results obtained with images from publicly available databases. The paper [7] proposes a novel feature extraction method named uniform pursuit to address the one-sample problem in face recognition. The method achieves high recognition accuracy with only one training sample per person by exploiting the sparse representation of the face images. The study [8] compares the accuracy of state-of-the-art face recognition algorithms with humans on a face-matching task, in which they determine whether pairs of face images taken under different illumination conditions show the same person or different people. The results show that current algorithms compete favorably with humans, with some algorithms surpassing human performance on both difficult and easy face pairs, highlighting the need to compare algorithms with humans to achieve the best control.
Authors [9] discuss the "one sample per person" problem, where given a stored database of faces, the goal is to identify a person from the database later in time in any different and unpredictable poses, lighting, etc. from just one image. The paper categorizes and evaluates the numerous algorithms developed to solve this problem, addressing relevant issues such as data collection, the influence of small sample size, and system evaluation, and proposes several promising directions for future research. The paper [10] discusses two important feature extraction methods, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), and their limitations when dealing with image data due to the high dimensionality of image vectors. In addition, the paper presents two-dimensional PCA (2DPCA) and two-dimensional LDA (2DLDA) as solutions that work directly on 2-D image matrices without a vectorization procedure, reducing computational effort and the possibility of singularity in feature extraction. It shows that these matrix-based 2-D algorithms are equivalent to special cases of image block-based feature extraction, providing a better understanding of the 2-D feature extraction approaches.


  1. The first step in implementing any machine learning algorithm is to obtain appropriate data. Despite the long history of face detection research, there is still no universally accepted public benchmark database that incorporates both color and depth information obtained from the same sensor. To address this issue, the authors of [11] proposed an RGB-D database containing synchronized RGB-D-T facial images of individuals, incorporating variations in facial orientation, lighting conditions, and expressions. Notably, the faces in the images have not been cropped from either the RGB or depth data, which makes the database useful for both detection and recognition purposes. This dataset closely simulates images taken with a smartphone’s front camera at the moment of authentication.
  2. Keras, an open-source library that facilitates the development and training of neural networks, was used. Keras is built on top of other machine learning libraries, such as TensorFlow, and supports both CPU and GPU computation. It simplifies the process of building, training, and deploying deep learning models, and its high-level interface allows for rapid prototyping and experimentation, making it an ideal tool for this research.
  3. One-shot learning is a useful technique in face recognition because it enables the model to accurately recognize an individual with only a single image, rather than requiring a large number of labeled images for training. One-shot learning also allows for the recognition of individuals with significant facial variations, such as changes in facial hair, glasses, or makeup, which can be challenging for traditional machine learning approaches.
  4. A siamese convolutional neural network based on the SqueezeNet architecture [12] was created. Siamese CNNs emerged to solve tasks where acquiring large amounts of data is impossible. Unlike traditional neural networks, siamese networks can provide accurate predictions using only a few images, which makes them highly valuable for problems with limited data availability. When presented with the faces of two individuals in RGBD format, i.e., four-channel images, the neural network computes the distance between their respective embeddings (Figure 1).


Figure 1. Siamese network architecture.
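The shared-weight computation described in step 4 can be sketched as follows. This is a minimal NumPy illustration of the siamese forward pass only: a toy linear projection `W` stands in for the SqueezeNet backbone, and the 64×64×4 input shape is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the SqueezeNet embedding branch: one shared linear
# projection from a flattened 64x64x4 RGBD image to a 128-d embedding.
# Both branches of the siamese network use the SAME weights.
W = rng.standard_normal((64 * 64 * 4, 128)) * 0.01

def embed(rgbd_image: np.ndarray) -> np.ndarray:
    """Map a (64, 64, 4) RGBD image to a 128-dimensional embedding."""
    return rgbd_image.reshape(-1) @ W

def siamese_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Euclidean distance between the two shared-weight embeddings."""
    return float(np.linalg.norm(embed(img_a) - embed(img_b)))

# Two random RGBD "faces" for demonstration.
face_1 = rng.standard_normal((64, 64, 4))
face_2 = rng.standard_normal((64, 64, 4))

d_same = siamese_distance(face_1, face_1)  # identical inputs give 0
d_diff = siamese_distance(face_1, face_2)  # different inputs give > 0
print(d_same, d_diff)
```

Because both branches share `W`, identical inputs always map to identical embeddings, which is precisely the property the contrastive training objective exploits.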

This neural network is trained with a contrastive loss function, where Y is a tensor of labels indicating image similarity, Dw is the Euclidean distance between the image embeddings, and margin is a constant enforcing the minimum distance between dissimilar pairs. The loss aims to minimize the distance between images of the same individual and maximize the distance between images of different individuals:
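A standard formulation of this contrastive loss (the common Hadsell-style form is assumed here, with Y = 1 marking a dissimilar pair, Y = 0 a similar pair, and m the margin):

```latex
L(Y, D_w) = (1 - Y)\,\tfrac{1}{2}\, D_w^{2} \;+\; Y\,\tfrac{1}{2}\,\bigl(\max\{0,\; m - D_w\}\bigr)^{2}
```

Minimizing the first term pulls embeddings of the same person together, while the second term pushes embeddings of different people apart until they are at least m away from each other.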


  1. The t-distributed stochastic neighbor embedding (t-SNE) technique was used to visualize facial clusters in the embedding space. The approach maps high-dimensional data points into a 2-D or 3-D space, facilitating the exploration of relationships between them.


The t-SNE algorithm was employed to transform the 128-dimensional embedding space into a two-dimensional visualization. Each color in the resulting plot represents a distinct individual, showcasing the network's ability to effectively cluster similar images together. It is important to note that when using the t-SNE algorithm, the distances between these clusters hold no significant meaning. This observation is illustrated in Figure 2.
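This projection step can be sketched with scikit-learn as follows; the synthetic 128-dimensional embeddings (three "people" with twenty samples each) and the perplexity value are assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for the network output: 3 "people" with
# 20 embeddings each, clustered around distinct 128-d centers.
centers = rng.standard_normal((3, 128)) * 5.0
embeddings = np.vstack([c + rng.standard_normal((20, 128)) for c in centers])
labels = np.repeat(np.arange(3), 20)

# Project the 128-d embeddings down to 2-D for plotting.
points_2d = TSNE(n_components=2, perplexity=15, init="pca",
                 random_state=0).fit_transform(embeddings)

print(points_2d.shape)  # (60, 2)
```

Each row of `points_2d` can then be scatter-plotted with one color per entry in `labels` to reproduce a figure of this kind.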


Figure 2. Facial clusters in the embedding space (t-SNE).

Furthermore, an attempt was made to use the t-SNE-transformed data as input to a classification algorithm, specifically random forest, in order to make predictions. However, the accuracy achieved was relatively low, approximately 28%, likely due to the distortion introduced by the t-SNE transformation.
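The experiment can be sketched as below, with synthetic embeddings assumed in place of the real network output. Note that scikit-learn's t-SNE provides no transform for unseen points, so all points must be embedded together before splitting, which already limits how such a classifier could be deployed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic 128-d face embeddings: 5 "people", 30 samples each.
centers = rng.standard_normal((5, 128)) * 4.0
X = np.vstack([c + rng.standard_normal((30, 128)) for c in centers])
y = np.repeat(np.arange(5), 30)

# t-SNE has no out-of-sample transform, so train and test points
# are embedded jointly before the split.
X_2d = TSNE(n_components=2, perplexity=20,
            random_state=0).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_2d, y, test_size=0.25, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=100,
                             random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"accuracy: {accuracy:.2f}")
```

Classifying directly on the 128-dimensional embeddings, or on a projection with a proper out-of-sample transform (e.g., PCA), avoids this limitation.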


This research examined clustering facial images based on their extracted features and then feeding the result to a classification algorithm to make predictions. The analysis showed that alternative data pre-processing methods or algorithms may be necessary to improve the accuracy of the classification task. The entire system was implemented using an asynchronous web framework for Python, and the choice of technology stack influenced the execution time. Future work will focus on improving both the model's accuracy and the system's execution time. To improve the model's accuracy, the next training dataset will incorporate a more precise RGB-D face database. In addition, including images of the same people with slightly altered appearance (e.g., glasses, makeup, mustaches, hats) in the next training dataset is essential for improving model predictions.


  1. L.-F. Chen, H.Y.M. Liao, M.T. Ko, J.C. Lin, G.J. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Patt. Recogn. 33 (10) (2000) 1713–1726.
  2. F. Goudail, E. Lange, T. Iwamoto, K. Kyuma, N. Otsu, Face recognition system using local autocorrelations and multiscale integration, IEEE Trans. Pattern Anal. Mach. Intell. 18 (10) (1996) 1024–1028.
  3. Wagner, A.; Wright, J.; Ganesh, A.; Zihan, Zhou.; Mobahi, H.; Yi, Ma. Toward a Practical Face Recognition System: Robust Alignment and Illumination by Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 372–386.
  4. Roberto B, Poggio T (1993) Face recognition: features versus templates. IEEE Tran Pattern Anal Mach Intell 15(10):1042–1052
  5. Samal, A. and Iyengar, P. 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Patt. Recog. 25, 65–77.
  6. Blanz, V. and Vetter, T. 2003. Face recognition based on fitting a 3D morphable model. IEEE Trans. Patt. Anal. Mach. Intell. 25, 1063–1074.
  7. W. Deng, J. Hu, J. Guo, W. Cai, and D. Feng, Robust, Accurate and Efficient Face Recognition from a Single Training Image: A Uniform Pursuit Approach, Pattern Recognition, vol. 43, pp. 1748–1762, 2010.
  8. A. J. O’Toole, P. J. Phillips, F. Jiang, J. Ayyad, N. Penard, H. Abdi, Face recognition algorithms surpass humans matching faces over changes in illumination, IEEE Trans. Pattern Anal. Mach. Intell. 29 (9) (2007) 1642–1646.
  9. Xiaoyang Tan, Songcan Chen, Zhi-Hua Zhou, and Fuyan Zhang. 2006. Face recognition from a single image per person: A survey. Pattern Recognit. 39, 9 (2006), 1725–1745.
  10. L. Wang, X. Wang, and J. Feng, On image matrix based feature extraction algorithms, IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 1, pp. 194–197, Feb. 2006.
  11. Olegs Nikisins, Kamal Nasrollahi, Modris Greitans, and Thomas B. Moeslund, RGB-D-T based Face Recognition, International Conference on Pattern Recognition (ICPR), Sweden, 2014.
  12. F. Iandola, M. Moskewicz, K. Ashraf, S. Han, W. Dally, and K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size, 2016.
