INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND MATHEMATICAL THEORY (IJCSMT)
E-ISSN 2545-5699
P-ISSN 2695-1924
VOL. 10 NO. 4 2024
DOI: 10.56201/ijcsmt.v10.no4.2024.pg1.20
ABDULLAHI Adamu Isa, Dr. Yusuf Musa Malgwi, Yakubu Hassan Zali, Mohammed Usman
Abstract
Facial expression recognition is a critical component of affective computing, with significant applications in areas such as human-computer interaction, security, and mental health monitoring. This study investigates a hybrid deep learning approach that combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for effective facial expression recognition. Using the FERPlus dataset, which comprises a diverse range of facial images annotated with seven distinct emotions (happiness, sadness, anger, surprise, fear, disgust, and neutral), the model was trained to classify and recognize emotional states. Training involved comprehensive data preprocessing, including noise reduction, image resizing, and normalization, as well as data augmentation to improve the model’s generalization. The hybrid CNN-RNN architecture was implemented to leverage both the spatial feature extraction of CNNs and the temporal sequence learning of RNNs. Performance was evaluated with a confusion matrix and with accuracy and loss curves over 50 training epochs, demonstrating the model’s ability to identify various facial expressions accurately. The results indicate that the hybrid model captures and distinguishes between different emotions with a high degree of accuracy, and its deployment showed promising results for real-time emotion detection on live video streams. The study highlights the potential of combining CNNs and RNNs to advance facial expression recognition systems and suggests further exploration of model optimization, multi-modal approaches, and real-world application integration. This research contributes to the ongoing development of emotion recognition technologies and provides a foundation for future improvements in the accuracy and applicability of facial expression analysis systems.
Keywords: Facial Expression Recognition, Deep Learning, Convolutional Neural Networks
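To make the pipeline concrete, the following is a minimal sketch of such a hybrid CNN-RNN classifier in Keras. It is an illustration under stated assumptions, not the authors' implementation: the layer widths, the 16-frame sequence length, and the 48x48 grayscale input shape are hypothetical choices, while the seven-class softmax output follows directly from the emotion set described in the abstract.

# Minimal sketch of a hybrid CNN-RNN facial expression classifier in
# Keras/TensorFlow. The architecture details below (layer widths, clip
# length, frame size) are illustrative assumptions, not the paper's
# published configuration; only the seven emotion classes come from the text.
from tensorflow.keras import layers, models

SEQ_LEN, H, W, C = 16, 48, 48, 1   # assumed clip length and frame shape
NUM_CLASSES = 7  # happiness, sadness, anger, surprise, fear, disgust, neutral

# CNN backbone: extracts a spatial feature vector from a single frame.
frame_in = layers.Input(shape=(H, W, C))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(frame_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)
frame_cnn = models.Model(frame_in, x, name="frame_cnn")

# RNN head: models how the per-frame features evolve across the clip.
clip_in = layers.Input(shape=(SEQ_LEN, H, W, C))  # pixels assumed scaled to [0, 1]
y = layers.TimeDistributed(frame_cnn)(clip_in)    # apply the CNN to every frame
y = layers.LSTM(64)(y)                            # temporal sequence learning
y = layers.Dropout(0.5)(y)
out = layers.Dense(NUM_CLASSES, activation="softmax")(y)

model = models.Model(clip_in, out, name="hybrid_cnn_rnn")
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

Here TimeDistributed applies the same CNN to every frame and the LSTM aggregates the resulting feature sequence, which is one common way to realize the spatial-plus-temporal design the abstract describes; training such a model with model.fit over 50 epochs would produce accuracy and loss curves of the kind the study reports.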