SMOTE‑ENSEMBLE: A REVIEW OF DATA‑BALANCING TECHNIQUES AND HYBRID MACHINE LEARNING MODELS FOR EARLY PREDICTION OF DIABETIC RETINOPATHY COMPLICATIONS

  • Ha Ngoc Tuan, Hung Yen University of Technology and Education
  • Pham Thi Anh Huong, Hung Yen University of Technology and Education
  • Tran Thi Thu Huyen, Hung Yen University of Technology and Education
  • Ngo Thi Lan Anh, Hung Yen University of Technology and Education

Abstract

Diabetic retinopathy (DR) is one of the most common microvascular complications of diabetes and a leading cause of vision loss worldwide, yet its early stages often produce no noticeable symptoms. Detection therefore depends on large‑scale fundus image screening programs, which are hampered by the inherent class imbalance in DR datasets: severe cases requiring urgent intervention are scarce compared to mild or non‑DR images. To address this, we provide a comprehensive overview of SMOTE‑ensemble strategies for improving detection sensitivity on underrepresented classes. First, we analyze DR’s pathophysiological progression and dataset characteristics, showing how imbalance reduces model recall on the minority classes. We then detail SMOTE and its variants (including Borderline‑SMOTE, ADASYN, and Geometric SMOTE), highlighting how each controls the generation of synthetic minority samples [1]. Next, we review ensemble learning frameworks (Bagging, Boosting, Voting, Stacking) and their integration with SMOTE, with emphasis on the SMOTEBoost algorithm and recent refinements. Synthesizing results from over thirty studies, we show that SMOTE‑ensemble methods yield 5–18% improvements in AUC, F1 score, and recall for severe DR detection [2].

Finally, we discuss current limitations and propose future research directions—such as GAN‑based augmentation, multitask learning, and interpretable model design—to accelerate clinical deployment.
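
As an illustration of the over‑sampling idea summarized above, the following is a minimal Python sketch of the interpolation step at the core of basic SMOTE: a synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. It is not taken from the reviewed studies. The function name `smote_sketch`, the neighbour count `k`, the seed, and the two‑dimensional toy feature vectors are all assumptions made for illustration, and the variants named in the abstract (Borderline‑SMOTE, ADASYN, Geometric SMOTE) change how base samples or regions are chosen, which this sketch does not cover.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up feature vectors standing in for a scarce severe-DR class.
severe = [(0.9, 0.8), (1.0, 1.0), (0.8, 1.1), (1.1, 0.9)]
new_points = smote_sketch(severe, n_new=6)
print(len(severe) + len(new_points))
```

In a SMOTE‑ensemble pipeline, points generated this way would be added to the training split only, before the ensemble is fitted, so that evaluation data keep their original class distribution. A maintained implementation, such as the `SMOTE` class in the imbalanced‑learn package, would normally be preferred over a hand‑written routine.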

Published
2025-12-08