A YOLO Detector Providing Fast and Accurate Pupil Center Estimation using Regions Surrounding a Pupil

Wattanapong Kurdthongmee, Piyadhida Kurdthongmee, Korrakot Suwannarat, Jeremy K. Kiplagat


Eye-tracking technology has many applications, including Virtual Reality (VR) devices, Augmented Reality (AR) devices, and assistive technology. Its main objective is to detect eye position and track eye movements, and the eye position can be determined once the pupil center is detected. This paper presents a deep learning-based approach to detecting pupil centers in webcam images. Unlike previous object detection approaches, which train the detector on the objects to be detected, our detector was trained on both the region surrounding a pupil and the region between an eye and the region surrounding a pupil. The latter set of regions was found to increase the overall detection accuracy. A novel post-processing algorithm is also presented to estimate the pupil center from all the detected regions. To achieve real-time performance, we adopted the tiny architecture of YOLOv3, which has 23 layers and can be executed without a GPU accelerator. The detectors were trained on different variations of regions covering a pupil and an eye, as well as on expanding regions surrounding a pupil and an eye. The PUPPIE dataset was used as the primary input for training. The setting with the best detection accuracy was then applied to the publicly available datasets I2Head, MPIIGaze, and U2Eyes. The results indicate that the accuracy of pupil center estimation is comparable to that of the state-of-the-art approach: the estimation error is below the size of a constricted pupil in more than 98.24% of images. Furthermore, detection is 2.8 times faster than the state-of-the-art approach.
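The abstract does not spell out the post-processing step, so the following is only a minimal sketch of one plausible way to combine several detected regions into a single pupil center estimate. It assumes each detection is an axis-aligned box with a confidence score and takes the confidence-weighted mean of the box centers. The names `Box`, `estimate_center`, and `pixel_error` are hypothetical and are not taken from the paper, and the weighting scheme is an illustrative assumption rather than the authors' algorithm.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Box:
    """Hypothetical detection: top-left corner, size, and detector confidence."""
    x: float
    y: float
    w: float
    h: float
    conf: float


def estimate_center(boxes: Sequence[Box]) -> Tuple[float, float]:
    """Confidence-weighted mean of box centers (an assumption, not the paper's method)."""
    total = sum(b.conf for b in boxes)
    if total <= 0:
        raise ValueError("need at least one detection with positive confidence")
    cx = sum(b.conf * (b.x + b.w / 2) for b in boxes) / total
    cy = sum(b.conf * (b.y + b.h / 2) for b in boxes) / total
    return cx, cy


def pixel_error(est: Tuple[float, float], truth: Tuple[float, float]) -> float:
    """Euclidean distance in pixels between an estimate and the labeled center."""
    return math.hypot(est[0] - truth[0], est[1] - truth[1])


# Two made-up detections: a tight box around the pupil and a larger surrounding box.
detections = [Box(0, 0, 10, 10, 1.0), Box(10, 10, 10, 10, 3.0)]
center = estimate_center(detections)
```

With this scheme, regions that the detector is more certain about pull the estimate toward their own centers, and a single low-confidence outlier has limited influence. Comparing `pixel_error(center, labeled_center)` against a pupil-size threshold would give the kind of per-image accuracy check the abstract reports, although the exact threshold used in the paper is not stated here.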


DOI: 10.28991/ESJ-2022-06-05-05


Keywords: Eye Tracking; Pupil Center Detection; Convolutional Neural Networks; You-Only-Look-Once.


Copyright (c) 2022 Wattanapong Kurdthongmee