Step 1: Collecting the right dataset of images
After careful research into multiple datasets (some needed to be requested, others were publicly available or open source), I came across a black-and-white dataset that I thought would be ideal for the model I am building. Since the project is about facial emotion expression recognition and not just facial recognition, I assumed this might reduce bias in my system. However, I was proven wrong: a model trained only on such images would likely struggle with colour photos taken under different angles of light, and when working with human faces, which should be handled inclusively and with minimal bias, these models cannot perform well without high-quality data.
Step 2: Programming the Machine Learning model in Google Colaboratory
While testing the dataset with only 60 images, I got stuck on the layering and the training generator, which took a lot of time to figure out and slowed down the creation of my app. I realised that with a large dataset you cannot train a model on a personal computer, since it takes significantly more processing power; you need a large system with several GPUs. I will therefore be using a pre-trained model for emotion recognition, as it is more efficient and saves time.
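For illustration, this is roughly the kind of setup I was struggling with: a small convolutional network fed by a Keras ImageDataGenerator. The folder layout, image size and layer choices below are placeholders, not the final values I used.

import tensorflow as tf
from tensorflow.keras import layers, models

# Training generator: reads images from one sub-folder per emotion class
# ('data/train' and the 48x48 grayscale size are assumptions for this sketch)
train_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255).flow_from_directory(
    'data/train',
    target_size=(48, 48),
    color_mode='grayscale',
    class_mode='categorical',
    batch_size=32)

# Layering: a minimal CNN that maps a face image to emotion probabilities
model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(48, 48, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(train_gen.num_classes, activation='softmax')])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_gen, epochs=10)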
The dataset I decided to use is AffectNet, the largest database of facial expressions, valence, and arousal in the wild, allowing research into automated facial expression detection using two distinct emotion models. Access to the dataset needed to be requested two weeks in advance. AffectNet comprises over 1M face images gathered from the Internet by querying three of the most popular search engines. Approximately half of the retrieved photos were manually labelled for the occurrence of seven distinct facial expressions (the categorical model) and for the strength of valence and arousal (the dimensional model). The facial expressions include:

Link to dataset: http://mohammadmahoor.com/affectnet/
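To get a feel for the two emotion models, here is a minimal sketch of how the annotations could be inspected once access is granted. The file and column names are assumptions for illustration only; the actual AffectNet release documents its own annotation layout.

import pandas as pd

# Hypothetical file/column names; check the AffectNet documentation for the real layout
annotations = pd.read_csv('affectnet_training.csv')

# Categorical model: one discrete expression label per image
print(annotations['expression'].value_counts())

# Dimensional model: continuous valence and arousal values per image
print(annotations[['valence', 'arousal']].describe())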
About the model
There are many pre-trained models using this dataset, and I had to pick the one that best suits my objectives and accuracy requirements. I decided to test three different models to determine which one is the most suitable for my application:
1. The first model is a categorical model, in which two baseline deep neural networks are used to classify images and predict the strength of valence and arousal. Several evaluation metrics show that the deep neural network baselines outperform traditional machine learning approaches and off-the-shelf facial expression detection systems. https://paperswithcode.com/paper/affectnet-a-database-for-facial-expression
2. The second model performs facial expression and attribute recognition (age, gender, ethnicity) based on multi-task learning of lightweight neural networks – https://github.com/HSE-asavchenko/face-emotion-recognition
This model enabled the HSE-NN team to place third in the Multi-Task Learning challenge, fourth in the Valence-Arousal and Expression challenges, and fifth in the Action Unit Detection challenge. It was developed with support from the Russian Science Foundation (RSF).
3. The third model is built with Keras and focuses on emotion detection only – https://gitlab.com/Giovygio97/emotiondetectioncoreml
Testing the models:
To determine which model works with the highest accuracy on random pictures, I decided to test each of them in Google Colaboratory. The model with the highest accuracy will be used in my application.
I loaded the models and pointed them at a test folder of random images, then plotted the photos and added the necessary labels (facial expressions).
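The testing loop in each notebook follows roughly this pattern (a sketch, assuming a Keras .h5 model with 48x48 grayscale inputs; the file names, label order and preprocessing are placeholders and depend on the model being tested):

import os
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.preprocessing import image

LABELS = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']  # placeholder label order

model = keras.models.load_model('emotion_model.h5')  # hypothetical file name

files = sorted(os.listdir('test_images'))
plt.figure(figsize=(12, 8))
for i, fname in enumerate(files[:12]):
    # Load and preprocess each test image the way the model expects
    img = image.load_img(os.path.join('test_images', fname), target_size=(48, 48), color_mode='grayscale')
    x = image.img_to_array(img) / 255.0
    pred = model.predict(x[np.newaxis, ...], verbose=0)[0]
    # Plot the photo with its predicted facial expression as the label
    plt.subplot(3, 4, i + 1)
    plt.imshow(img, cmap='gray')
    plt.title(LABELS[int(np.argmax(pred))])
    plt.axis('off')
plt.tight_layout()
plt.show()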
Testing the first model:
https://colab.research.google.com/drive/1uVEByhgTMbkmnnBekVWm3xQ2xi8V_WVX?usp=sharing
with an accuracy of 70%
Testing the second model:
https://colab.research.google.com/drive/1Kw943irL8eDJuQD9AgJ_od9EZoGt2-6C?usp=sharing
with an accuracy of 62%
Testing the third model:
https://colab.research.google.com/drive/1jlA0bcXlcG_NCHvVJhYCBoY17H4ZTOiG#scrollTo=Yw3P9Si763Fj
with an accuracy of 89%
I chose to proceed with the third model, which I had to convert in order to implement it in my app. Since coremltools supports the Keras .h5 format, I loaded the model in Python and saved it again in the correct format.

In order to easily import the model into Xcode, I had to save it in the Core ML “.mlmodel” format.
ml_model.save('./model_v6.mlmodel')
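Put together, the conversion looks roughly like this (a sketch, assuming the trained Keras model was exported as model_v6.h5 and that the coremltools unified converter handles its layers):

import coremltools as ct
from tensorflow import keras

# Load the trained Keras model from its .h5 file (file name is an assumption)
keras_model = keras.models.load_model('model_v6.h5')

# Convert to Core ML; the neuralnetwork backend produces a single .mlmodel file
ml_model = ct.convert(keras_model, convert_to='neuralnetwork')

# Save in the “.mlmodel” format that Xcode can import
ml_model.save('./model_v6.mlmodel')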
Xcode compiles the Core ML model into a resource that has been optimised to run on a device. This optimised representation of the model is included in my app bundle and is what is used to make predictions while the app is running on a device.