Vision Transformer Embedded Feature Fusion Model with Pre-Trained Transformers for Keratoconus Disease Classification

Keywords: Feature Fusion Model; Keratoconus; Vision Transformer; DenseNet121; EfficientNetB0; InceptionResNetV2; InceptionV3; MobileNetV2; ResNet50; VGG16; VGG19.

Authors

  • Md Fatin Ishrak, Department of Electrical and Computer Engineering, University of Memphis, Memphis, United States, https://orcid.org/0000-0001-6644-3822
  • Md Maruf Rahman, Department of Marketing & Business Analytics, Texas A&M University-Commerce, Texas, United States
  • Md Imran Kabir Joy, MSA in Engineering Management, Central Michigan University, Michigan, United States
  • Anna Tamuly, Department of Computer Science, University of Memphis, Memphis, United States
  • Salma Akter, Department of Public Administration, Gannon University, Pennsylvania, United States
  • Dewan M. Tanim, Department of Computer and Information Science, Gannon University, Pennsylvania, United States
  • Shahajada Jawar, Department of Computer and Information Science, Gannon University, Pennsylvania, United States
  • Nayeem Ahmed, Department of Computer Science, University of Memphis, Memphis, United States
  • Md Sadekur Rahman
    sadekur.cse@daffodilvarsity.edu.bd
    Department of Computer Science and Engineering, Daffodil International University, Birulia, Bangladesh


Keratoconus is a progressive eye disorder that, if undetected, can lead to severe visual impairment or blindness, making early and accurate diagnosis essential. The primary objective of this research is to develop a feature-fusion hybrid deep learning framework that integrates pretrained Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs) for the automated classification of keratoconus into three categories: Keratoconus, Normal, and Suspect. The dataset employed in this study is sourced from a widely recognized, publicly available online repository. Prior to model development, comprehensive preprocessing was applied, including removal of low-quality samples, image resizing, rescaling, and data augmentation; the dataset was then partitioned into training, validation, and testing subsets to support robust model training and performance evaluation. Eight state-of-the-art CNN architectures (DenseNet121, EfficientNetB0, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, VGG16, and VGG19) were used for feature extraction, while a ViT served as the classification head, leveraging its global attention mechanism for enhanced contextual learning. The best configurations achieved near-perfect accuracy (e.g., DenseNet121+ViT: 99.28%). This study underscores the potential of hybrid CNN-ViT architectures to revolutionize keratoconus diagnosis, offering scalable, accurate, and efficient solutions that overcome limitations of traditional diagnostic methods while paving the way for broader applications in medical imaging.
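
To make the described pipeline concrete, below is a minimal sketch in Keras of one such CNN+ViT hybrid: a frozen DenseNet121 backbone extracts a spatial feature map, which is flattened into tokens and passed through a small transformer encoder before a three-way softmax. The layer sizes, single-block encoder depth, and hyperparameters here are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch of the CNN feature extractor + ViT classification head
# described in the abstract. Sizes and depths are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 3  # Keratoconus, Normal, Suspect

def build_hybrid(input_shape=(224, 224, 3), embed_dim=256, num_heads=4):
    # Pretrained CNN backbone, frozen so it acts purely as a feature extractor.
    backbone = tf.keras.applications.DenseNet121(
        include_top=False, weights="imagenet", input_shape=input_shape)
    backbone.trainable = False

    inputs = layers.Input(shape=input_shape)
    x = layers.Rescaling(1.0 / 255)(inputs)      # rescaling, as in the paper
    x = backbone(x)                              # (7, 7, 1024) feature map
    x = layers.Reshape((49, 1024))(x)            # 49 spatial "patch" tokens
    x = layers.Dense(embed_dim)(x)               # project tokens to embed_dim

    # Learned positional embeddings so the encoder sees token positions.
    positions = tf.range(start=0, limit=49, delta=1)
    x = x + layers.Embedding(input_dim=49, output_dim=embed_dim)(positions)

    # One transformer encoder block: self-attention + MLP, each with a
    # residual connection and layer normalization.
    attn = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=embed_dim // num_heads)(x, x)
    x = layers.LayerNormalization()(x + attn)
    mlp = layers.Dense(embed_dim * 2, activation="gelu")(x)
    mlp = layers.Dense(embed_dim)(mlp)
    x = layers.LayerNormalization()(x + mlp)

    # Pool token representations and classify into the three categories.
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_hybrid()
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Swapping DenseNet121 for any of the other seven backbones (with the Reshape adjusted to that backbone's output feature-map shape) yields the remaining hybrid variants; freezing the backbone mirrors the feature-fusion idea, though the full study may fine-tune the CNN or use a deeper ViT head.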


DOI: 10.28991/ESJ-2025-09-02-027

Full Text: PDF