T-CER-Net: Attention-Based Temporal Cross-Eye Regression for Noise-Resilient Detection of Intermittent Strabismus
Automated strabismus screening using video is difficult in unconstrained settings, where brief events such as blinking, head movement, or tracking errors can easily be mistaken for true ocular misalignment. The objective of this study is to improve diagnostic specificity while maintaining sensitivity in automated pre-screening scenarios. To address this problem, a temporal analysis framework, termed the Temporal Cross-Eye Regression Network (T-CER-Net), is proposed. The method introduces the Cross-Eye Regression Error (CERE), a scale- and position-invariant temporal signal that characterizes deviations in binocular coordination by measuring prediction error between the two eyes. Rather than relying on frame-level deviation estimates, the approach analyzes extended CERE sequences using a Transformer Encoder to assess temporal consistency. In addition, the training procedure explicitly accounts for real-world variability through oversampling of normal sequences containing common artifacts and the use of class weighting. The proposed method was evaluated against static threshold-based classifiers and a CNN–LSTM temporal baseline. On a held-out test set, T-CER-Net achieved an area under the ROC curve of 0.9140, with a sensitivity of 0.8421 and a specificity of 0.8500, showing improved robustness to noise-induced false positives. The findings suggest that treating binocular misalignment as a temporal pattern, together with attention-based sequence analysis, offers a practical and robust basis for automated strabismus pre-screening in real-world settings.
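To make the idea of a scale- and position-invariant cross-eye signal concrete, the sketch below builds a toy version of it on synthetic pupil trajectories. This is not the paper's implementation: the actual CERE uses a learned cross-eye regressor, whereas here an identity mapping between the two normalized pupil positions stands in for it, and all landmark geometry (corner positions, eye width, noise levels) is invented for illustration. Normalizing each pupil position by its own eye's corner-to-corner width removes head translation and camera-distance scale, so only genuine binocular miscoordination raises the residual.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
t = np.linspace(0, 2 * np.pi, T)

# Shared gaze trajectory (as a fraction of eye width), plus head
# translation and a camera-distance scale change that corrupt raw
# pixel coordinates but should not affect the invariant signal.
gaze = 0.5 + 0.2 * np.sin(t)
head_x = 20 * np.sin(0.5 * t)          # head translation (px)
scale = 1.0 + 0.3 * t / t.max()        # apparent eye size grows

def eye_pixels(offset, deviation=0.0):
    """Hypothetical pixel coordinates for one eye's landmarks."""
    inner = offset + head_x                         # inner corner x
    outer = inner + 40 * scale                      # outer corner x
    pupil = inner + (gaze + deviation) * (outer - inner)
    pupil = pupil + rng.normal(0, 0.5, T)           # tracking jitter
    return pupil, inner, outer

dev = np.zeros(T)
dev[40:60] = 0.15                      # intermittent misalignment episode
lp, li, lo = eye_pixels(100.0)
rp, ri, ro = eye_pixels(200.0, deviation=dev)

# Scale- and position-invariant normalized pupil position per eye.
left_n = (lp - li) / (lo - li)
right_n = (rp - ri) / (ro - ri)

# Stand-in CERE: per-frame residual between the normalized positions
# (the paper learns a cross-eye regressor; identity is assumed here).
cere = np.abs(left_n - right_n)
print(round(cere[:40].mean(), 3), round(cere[40:60].mean(), 3))
```

The residual stays near the jitter floor during coordinated frames and jumps during the deviated segment; in the proposed method, windows of this sequence (rather than single frames) are what the Transformer encoder classifies, which is what lets transient artifacts such as blinks be separated from sustained misalignment.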
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.