IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification

Lilis Yuningsih; Gede Angga Pradipta; Dadang Hermawan; Putu Desiana Wulaning Ayu; Dandy Pramana Hostiadi; Roy Rudolf Huizen

doi:10.28991/ESJ-2023-07-05-04

Authors

Lilis Yuningsih, Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234, Indonesia https://orcid.org/0000-0002-6087-619X
Gede Angga Pradipta (angga_pradipta@stikom-bali.ac.id), Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234, Indonesia
Dadang Hermawan, Department of Digital Business, Faculty Business and Vocation, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234, Indonesia
Putu Desiana Wulaning Ayu, Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234, Indonesia
Dandy Pramana Hostiadi, Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234, Indonesia
Roy Rudolf Huizen, Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234, Indonesia

Vol. 7 No. 5 (2023): October

Research Articles

Abstract

Imbalanced learning problems arise when the samples in a dataset are unevenly distributed among classes, which poses a challenge for classifiers. The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the best-known data pre-processing methods for this problem. However, oversampling with SMOTE can introduce noise, small disjuncts, and overfitting when the imbalance ratio of a dataset is high. A high imbalance ratio combined with low variance causes the synthetic samples to cluster in narrow areas and in regions where classes conflict, which makes machine learning methods prone to overfitting during training. Therefore, this research proposes a combination of Radius-SMOTE and the bagging algorithm, called the IRS-BAG model. Each sub-sample generated by bootstrapping is oversampled using Radius-SMOTE; oversampling at the sub-sample level is expected to reduce the overfitting that might otherwise occur. Experiments compared the performance of the IRS-BAG model with that of various earlier oversampling methods on imbalanced public data. The results with three different classifiers showed that every classifier improved notably when combined with the proposed IRS-BAG model, compared with previous state-of-the-art oversampling methods.
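
The idea of oversampling inside each bootstrap sub-sample, rather than once on the full training set, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Radius-SMOTE itself is not reproduced here and is replaced by a plain SMOTE-style interpolation between same-class neighbours, the base learner is a simple 3-nearest-neighbour vote, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def interpolate_oversample(X, y, rng):
    """Balance classes by SMOTE-style interpolation.

    Stand-in for Radius-SMOTE (an assumption made for this sketch).
    """
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            a = rng.choice(members)
            # Nearest other sample of the same class; fall back to a itself.
            others = [m for m in members if m is not a] or [a]
            b = min(others, key=lambda m: sq_dist(a, m))
            t = rng.random()
            X_out.append([p + t * (q - p) for p, q in zip(a, b)])
            y_out.append(label)
    return X_out, y_out


def knn_predict(train, x, k=3):
    """Base learner: majority label among the k nearest training samples."""
    X, y = train
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def fit_bagged(X, y, n_estimators=11, seed=0):
    """Draw bootstrap sub-samples and oversample each one separately."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    while len(models) < n_estimators:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < len(set(y)):
            continue  # redraw if a class is missing from this sub-sample
        models.append(interpolate_oversample(Xb, yb, rng))
    return models


def predict_bagged(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(knn_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 40 majority samples (class 0), 4 minority (class 1).
X = [[(i % 5) * 0.2, (i // 5) * 0.2] for i in range(40)]
X += [[5.0, 5.0], [5.2, 5.0], [5.0, 5.2], [5.2, 5.2]]
y = [0] * 40 + [1] * 4

models = fit_bagged(X, y)
print(predict_bagged(models, [5.1, 5.1]), predict_bagged(models, [0.1, 0.1]))
```

Because each base learner sees a different bootstrap sub-sample, the synthetic minority samples differ from one learner to the next, which is the property the abstract relies on to limit overfitting.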
