SMOTE‑ENSEMBLE: A REVIEW OF DATA‑BALANCING TECHNIQUES AND HYBRID MACHINE LEARNING MODELS FOR EARLY PREDICTION OF DIABETIC RETINOPATHY COMPLICATIONS

  • Ha Ngoc Tuan, Hung Yen University of Technology and Education
  • Pham Thi Anh Huong, Hung Yen University of Technology and Education
  • Tran Thi Thu Huyen, Hung Yen University of Technology and Education
  • Ngo Thi Lan Anh, Hung Yen University of Technology and Education
Keywords: SMOTE Ensemble, Ensemble Learning, Borderline-SMOTE, ADASYN, Geometric SMOTE, Imbalanced Data

Abstract

Diabetic retinopathy (DR) is one of the most common microvascular complications of diabetes and a leading cause of vision loss worldwide, yet its early stages often produce no noticeable symptoms. Detection therefore depends on large‑scale fundus image screening, and these programs must cope with the class imbalance inherent in DR datasets: severe cases requiring urgent intervention are scarce compared with mild or non‑DR images. To address this, we provide a comprehensive overview of SMOTE ensemble strategies that improve detection sensitivity for underrepresented classes. First, we analyze the pathophysiological progression of DR and the characteristics of DR datasets, showing how class imbalance lowers recall on the minority (severe) classes. We then detail SMOTE and its variants, including Borderline‑SMOTE, ADASYN, and Geometric SMOTE, highlighting how each controls where synthetic minority samples are generated [1]. Next, we review ensemble learning frameworks (Bagging, Boosting, Voting, Stacking) and their integration with SMOTE, with emphasis on the SMOTEBoost algorithm and its recent refinements. Synthesizing results from more than thirty studies, we show that SMOTE ensemble methods yield improvements of 5–18% in AUC, F1 score, and recall for severe DR detection [2].

Finally, we discuss current limitations and propose future research directions—such as GAN‑based augmentation, multitask learning, and interpretable model design—to accelerate clinical deployment.
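The over‑sampling step shared by these methods can be illustrated with a minimal NumPy sketch of SMOTE‑style interpolation. It is not the authors' implementation: the function name `smote`, its parameters, and the toy data are hypothetical and chosen only for illustration. Each synthetic sample is placed at a random point on the segment between a minority sample x_i and one of its k nearest minority neighbors x_nn, that is x_new = x_i + λ(x_nn − x_i) with λ drawn uniformly from [0, 1).

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=None):
    """SMOTE-style over-sampling (hypothetical helper, not the authors' code).

    X_min: (n, d) array of minority-class feature vectors.
    n_synthetic: number of synthetic rows to generate.
    k: number of nearest minority neighbors considered for each seed sample.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class only.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    # Row i holds the indices of the k nearest minority neighbors of point i.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for j in range(n_synthetic):
        i = rng.integers(n)                 # random minority seed sample
        nn = neighbors[i, rng.integers(k)]  # one of its k nearest neighbors
        lam = rng.random()                  # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + lam * (X_min[nn] - X_min[i])
    return synthetic


# Toy, made-up data (hypothetical): 95 majority rows vs 5 minority rows.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(95, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))

X_new = smote(X_minor, n_synthetic=90, k=3, seed=1)
X_bal = np.vstack([X_major, X_minor, X_new])
y_bal = np.array([0] * 95 + [1] * (5 + 90))
print(X_bal.shape, np.bincount(y_bal))  # (190, 2) [95 95]
```

The variants reviewed here change how seeds and generation regions are chosen: Borderline‑SMOTE uses only minority samples near the class boundary as seeds, ADASYN generates more samples around minority points that are harder to learn, and Geometric SMOTE draws samples from a geometric region around the seed instead of a line segment. For real experiments, the `imblearn.over_sampling` module of imbalanced-learn provides `SMOTE`, `BorderlineSMOTE`, and `ADASYN`. Whichever is used, resampling should be fitted on the training folds only, so that synthetic points never leak into the evaluation data.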

References

S. Yadav, “A Comparative Analysis of Sampling Techniques for Imbalanced Datasets in Machine Learning,” International Journal of Innovative Research and Development, vol. 7, Sep. 2021, doi: 10.5281/zenodo.14203644.

X. Wan et al., “Predicting diabetic retinopathy based on routine laboratory tests by machine learning algorithms,” Eur J Med Res, vol. 30, no. 1, p. 183, 2025, doi: 10.1186/s40001-025-02442-5.

D. Yan, X. Li, Y. Wang, and Z. Cai, “Optimized prediction of diabetes complications using ensemble learning with Bayesian optimization: a cost-efficient laboratory-based approach,” Front Endocrinol (Lausanne), vol. 16, 2025, doi: 10.3389/fendo.2025.1593068.

O. I. Ogunyemi et al., “Detecting diabetic retinopathy through machine learning on electronic health record data from an urban, safety net healthcare system,” JAMIA Open, vol. 4, no. 3, p. ooab066, Aug. 2021, doi: 10.1093/jamiaopen/ooab066.

georgedouzas, “imbalanced-learn-extra,” GitHub repository, https://github.com/georgedouzas/imbalanced-learn-extra.

A. Khan, O. Chaudhari, and R. Chandra, “A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation,” Expert Syst Appl, vol. 244, p. 122778, Jul. 2024, doi: 10.1016/j.eswa.2023.122778.

S. Naik, D. Kamidi, S. Govathoti, R. Cheruku, and A. Reddy, “RETRACTED ARTICLE: Efficient diabetic retinopathy detection using convolutional neural network and data augmentation,” Soft Comput, vol. 28, p. 617, Jun. 2023, doi: 10.1007/s00500-023-08537-7.

V. Gulshan et al., “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs,” JAMA, vol. 316, no. 22, pp. 2402–2410, Dec. 2016, doi: 10.1001/jama.2016.17216.

Published
2025-12-08
How to Cite
Ha Ngoc Tuan, Pham Thi Anh Huong, Tran Thi Thu Huyen, & Ngo Thi Lan Anh. (2025). SMOTE‑ENSEMBLE: A REVIEW OF DATA‑BALANCING TECHNIQUES AND HYBRID MACHINE LEARNING MODELS FOR EARLY PREDICTION OF DIABETIC RETINOPATHY COMPLICATIONS. Journal of Applied Science and Technology, 48, 51-57. Retrieved from https://jst.utehy.edu.vn/index.php/jst/article/view/833