Vision Transformer Embedded Feature Fusion Model with Pre-Trained Transformers for Keratoconus Disease Classification

Keywords: Feature Fusion Model; Keratoconus; Vision Transformer; DenseNet121; EfficientNetB0; InceptionResNetV2; InceptionV3; MobileNetV2; ResNet50; VGG16; VGG19.

Authors

  • Md Fatin Ishrak, Department of Electrical and Computer Engineering, University of Memphis, Memphis, United States, https://orcid.org/0000-0001-6644-3822
  • Md Maruf Rahman, Department of Marketing & Business Analytics, Texas A&M University-Commerce, Texas, United States
  • Md Imran Kabir Joy, MSA in Engineering Management, Central Michigan University, Michigan, United States
  • Anna Tamuly, Department of Computer Science, University of Memphis, Memphis, United States
  • Salma Akter, Department of Public Administration, Gannon University, Pennsylvania, United States
  • Dewan M. Tanim, Department of Computer and Information Science, Gannon University, Pennsylvania, United States
  • Shahajada Jawar, Department of Computer and Information Science, Gannon University, Pennsylvania, United States
  • Nayeem Ahmed, Department of Computer Science, University of Memphis, Memphis, United States
  • Md Sadekur Rahman
    sadekur.cse@daffodilvarsity.edu.bd
    Department of Computer Science and Engineering, Daffodil International University, Birulia, Bangladesh


Keratoconus is a progressive eye disorder that, if undetected, can lead to severe visual impairment or blindness, making early and accurate diagnosis essential. The primary objective of this research is to develop a feature-fusion hybrid deep learning framework that integrates pretrained Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs) for the automated classification of keratoconus into three categories: Keratoconus, Normal, and Suspect. The dataset employed in this study is sourced from a widely recognized, publicly available online repository. Prior to model development, comprehensive preprocessing was applied, including removal of low-quality samples, image resizing, rescaling, and data augmentation; the dataset was then partitioned into training, validation, and testing subsets to support robust model training and performance evaluation. Eight state-of-the-art CNN architectures (DenseNet121, EfficientNetB0, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, VGG16, and VGG19) were used for feature extraction, while a ViT served as the classification head, leveraging its global attention mechanism for enhanced contextual learning. The best configurations achieved near-perfect accuracy (e.g., DenseNet121+ViT: 99.28%). This study underscores the potential of hybrid CNN-ViT architectures to revolutionize keratoconus diagnosis, offering scalable, accurate, and efficient solutions that overcome limitations of traditional diagnostic methods while paving the way for broader applications in medical imaging.
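
To make the described pipeline concrete, below is a minimal sketch in Keras of one such CNN+ViT hybrid: a frozen DenseNet121 backbone extracts a spatial feature map, which is flattened into tokens and passed through a small transformer encoder before a three-way softmax. The layer sizes, single-block encoder depth, and hyperparameters here are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch of the CNN feature extractor + ViT classification head
# described in the abstract. Sizes and depths are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 3  # Keratoconus, Normal, Suspect

def build_hybrid(input_shape=(224, 224, 3), embed_dim=256, num_heads=4):
    # Pretrained CNN backbone, frozen so it acts purely as a feature extractor.
    backbone = tf.keras.applications.DenseNet121(
        include_top=False, weights="imagenet", input_shape=input_shape)
    backbone.trainable = False

    inputs = layers.Input(shape=input_shape)
    x = layers.Rescaling(1.0 / 255)(inputs)      # rescaling, as in the paper
    x = backbone(x)                              # (7, 7, 1024) feature map
    x = layers.Reshape((49, 1024))(x)            # 49 spatial "patch" tokens
    x = layers.Dense(embed_dim)(x)               # project tokens to embed_dim

    # Learned positional embeddings so the encoder sees token positions.
    positions = tf.range(start=0, limit=49, delta=1)
    x = x + layers.Embedding(input_dim=49, output_dim=embed_dim)(positions)

    # One transformer encoder block: self-attention + MLP, each with a
    # residual connection and layer normalization.
    attn = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=embed_dim // num_heads)(x, x)
    x = layers.LayerNormalization()(x + attn)
    mlp = layers.Dense(embed_dim * 2, activation="gelu")(x)
    mlp = layers.Dense(embed_dim)(mlp)
    x = layers.LayerNormalization()(x + mlp)

    # Pool token representations and classify into the three categories.
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_hybrid()
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Swapping DenseNet121 for any of the other seven backbones (with the Reshape adjusted to that backbone's output feature-map shape) yields the remaining hybrid variants; freezing the backbone mirrors the feature-fusion idea, though the full study may fine-tune the CNN or use a deeper ViT head.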


DOI: 10.28991/ESJ-2025-09-02-027

Full Text: PDF