Comparison of Machine Learning Approach for Waste Bottle Classification

Machine learning is increasingly used for image classification, and many methods can classify images with good accuracy. Convolutional Neural Network (CNN) and Support Vector Machine (SVM) are two popular methods for this task. The two approaches differ in how the training data are processed to reach the classification objective, and each has its own advantages. This research compares CNN and SVM in terms of the training process and the resulting classification accuracy. The process is divided into pre-processing, training, and testing stages. The objects used are medium-sized waste plastic bottles from ten different brands, with a total of 1100 images. Based on the observations, both methods have advantages and disadvantages in the data training and classification process. However, CNN achieves better accuracy than SVM: 99% for CNN and 74% for SVM. The experiments in this study therefore indicate that CNN performs better than SVM for this task.

label to predict based on pattern data. Therefore, SL needs training input data to learn the models. Many SL methods can be used to classify an image, but the most frequently used are the Convolutional Neural Network (CNN) and the Support Vector Machine (SVM) [6]. CNN is a deep learning method based on neural networks that is used to classify images [7,8], voice [9,10], and video [11,12]. It has an input layer, an output layer, and hidden layers [13]. The hidden layers include convolutional layers, pooling layers, normalization layers, ReLU layers, fully connected layers, and loss layers [14]. Convolutional layers are the core of the computation in a CNN. The convolution is performed by sliding a convolution kernel over the image to extract information from it [15]. CNN is widely used for image classification because it can predict many objects and classes with high accuracy [16,17]. Another method that can be used to classify images is the Support Vector Machine (SVM) [18]. SVM is a mathematical, algorithm-based classification method that can solve linear and non-linear classification problems [19]. The basic task of SVM is to find the best hyperplane, namely the one with the maximum margin between the classes [20]. The hyperplane is a function that separates the classes in a high-dimensional space [21].
The two methods use different training techniques to classify objects. CNN produces models that can serve as reference classification models in applications such as recycling detection [22], vehicle detection [23,24], and medical applications [25,26]. SVM, in contrast, offers a simple technique for classifying object images [27,28].
Recent literature comparing CNN and SVM reports mixed results. On two-class databases, SVM provides higher accuracy than CNN [29,30]. On multi-class databases, CNN provides higher performance than SVM [31]. Given the differences between the CNN and SVM training processes reported in earlier research, the purpose of this research is to find the best technique for the multi-class classification of plastic bottle images. The evaluation is based on the process carried out with each method, CNN and SVM.
This research aims to compare CNN and SVM for object classification based on their training techniques. Both methods are applied to classify a plastic bottle waste database. This work presents modified CNN and SVM learning methods implemented in code. In the final stage, the accuracy of each method and the classification results for each class in the database are compared. The objects used are medium-sized plastic bottles from 10 different brands, photographed with a camera at a resolution of 2352×4160 pixels.

2-Material and Methods
Implementing this research requires several preparation steps, starting with the image data of plastic bottles. The plastic bottle image data come from ten brands, namely Aqua, Ades, Coca-cola, Fanta, Floridina, Freshtea, Fruittea, Ichiocha, Minute-maid, and Sprite, as shown in Figure 1. The code was run on a laptop with the following specifications: Intel Core i5-3317U CPU, 8 GB RAM, and an Nvidia GeForce 740M Graphical Processing Unit (GPU). The software used is Anaconda for Python programming.

Figure 1. Sample Image Data of Plastic Bottles
The research methodology of this study is shown in Figure 2. In general, a research methodology based on an artificial intelligence approach begins with collecting data as learning material for the system. The collected data are then split into learning data and performance data. Learning is carried out in stages on the learning dataset, and its final product is a model. The learned model is then run on the performance data to measure its accuracy. A model whose performance is satisfactory and whose accuracy is good is used as the final model of this artificial intelligence approach. If the accuracy is unsatisfactory, another learning method is applied until good performance is achieved.

Figure 2. Flowchart of research methodology
The system is built in two parts because CNN and SVM each have their own methods. However, both share the essential steps of pre-processing, training, and testing, which can be seen in Figure 3. For Supervised Learning (SL) with both CNN and SVM, the collected data must be labeled before the pre-processing step. The labeling ensures that each generated output is stored under a specific class in the data knowledge or model results. The labeled images are split into two parts: the first part is used for training, and the other is used for testing. The process in Figure 3-a starts with collecting the database for training. The number of training images is the per-class count multiplied by the ten classes, which gives a total of 1000 training images for both CNN and SVM. Next, the pre-processing for CNN uses the whole original image. In contrast, the SVM classifier needs a GrabCut implementation to remove the image background during the pre-processing step. Figure 3-b shows the testing process, which uses ten images for each class to evaluate the system performance. With ten classes, this gives a total of 100 testing images. The performance measure mentioned earlier is the percentage accuracy, based on the match between the estimated class and the target class.
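The split into training and testing images described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how images were assigned to each part, so a seeded random hold-out is assumed here, and the function name `split_per_class`, the file names, and the figure of 110 images per brand (100 for training plus 10 for testing) are hypothetical.

```python
import random


def split_per_class(images_by_class, test_per_class, seed=0):
    """Hold out a fixed number of images per class for testing (assumed random)."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(images_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        # The first `test_per_class` shuffled images become test data.
        test += [(path, label) for path in shuffled[:test_per_class]]
        train += [(path, label) for path in shuffled[test_per_class:]]
    return train, test


# Hypothetical file names: 110 images for each of two example brands.
images = {
    brand: [f"{brand}_{i:03d}.jpg" for i in range(110)]
    for brand in ("Aqua", "Ades")
}
train_set, test_set = split_per_class(images, test_per_class=10)
```

With ten brands instead of two, the same call would yield 1000 training and 100 testing images, matching the totals given in the text.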

2-1-Convolutional Neural Network
Convolutional Neural Network (CNN) is an approach that uses pixels as a reference to estimate classes. There are many types of CNN, such as ResNet [32], DenseNet [33], RCNN [34], Fast R-CNN [35], and Faster R-CNN [36]. The CNN model used in this research is Faster R-CNN. Faster R-CNN introduces a Region Proposal Network (RPN), which leads to the best results among the R-CNN architectures. Both training and testing in this research use the Tensorflow platform. Tensorflow is used to accelerate numerical computing and makes machine learning easier to implement. The Tensorflow version used is Tensorflow 1.15.0, with the Faster-RCNN model. This model can be upgraded to improve the classification accuracy. The CNN work steps can be seen in Figure 4.

Figure 4. CNN Work Step
This method works by applying a convolution, or filter, process that divides the image into several small parts so that the system obtains new information. Each image resulting from the convolution is used as an input to obtain a feature representation for recognizing an object through the neural network. The results for the input image data are saved into a new array. The array is then downsampled to reduce its size using max pooling, which takes the largest pixel value in each pooling kernel. Feature extraction in the CNN method is the encoding of an image into features in the form of numeric values. It takes place in the convolutional and pooling layers. At the end of the convolutional network there is an evaluation section that measures system performance, called the accuracy and loss measurement. The measurement formulas are based on Mao (2020) [37] and are given in Equations 1 and 2, where N is the number of observations and K is the number of data classes.
The initial stage of pre-processing, or image data preparation, is to prepare the plastic bottle image data and the configuration before training. The Faster-RCNN model requires two folders before training: one for the training image data and one for the validation image data. Each plastic bottle image is labeled to describe the object in the image, and the label is stored in .xml format. The labeling results are then converted into a .csv file, which gathers the information from the labeling process in one place. Before the training process, the CSV file must be converted to TFRecord so that Tensorflow can read the input data. The next preparation step is to create a LabelMap, which contains the sequence of ids and class names used for classification. Finally, the Faster RCNN model requires configuration in the training script, such as the number of classes, the number of training steps (num_steps), and the paths of the image data, TFRecord, and LabelMap files.
The training process trains on all the training data and produces a model that can detect objects. The time required for training depends on the number of steps and the equipment used; a GPU is the recommended device. After training, the results are saved into an Inference Graph (.pb) file. Testing is done by feeding new images to the previously trained model. Each image is then classified and given a label based on the predefined classes.
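The convolution and max pooling steps described above can be illustrated with a small pure-Python sketch. It is not the Faster R-CNN implementation used in this study: the 5×5 input, the 2×2 kernel, and the function names `convolve2d` and `max_pool2d` are assumptions chosen only to make the arithmetic easy to follow.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most CNN libraries, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 5x5 grayscale patch and a 2x2 diagonal-difference kernel.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]

features = convolve2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)         # 2x2 map after downsampling
```

The 5×5 input shrinks to a 4×4 feature map after the convolution and to a 2×2 array after pooling, which is the size reduction the text refers to as downsampling.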
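The conversion chain from labeled .xml files to a .csv file and a LabelMap can be sketched as follows. This is a minimal illustration, not the conversion script used in the study: it assumes the annotations follow the Pascal VOC layout that LabelImg writes, the CSV column names are an assumed convention, and the sample annotation, file name, and box coordinates are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column layout for the CSV file (not taken from the paper).
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def annotation_to_rows(xml_text):
    """Turn one Pascal VOC style annotation into one CSV row per labeled box."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def rows_to_csv(rows):
    """Serialize the collected rows, header first, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def label_map_text(class_names):
    """Build LabelMap text with ids starting at 1, in the given class order."""
    items = [
        f"item {{\n  id: {index}\n  name: '{name}'\n}}\n"
        for index, name in enumerate(class_names, start=1)
    ]
    return "\n".join(items)


# Hypothetical annotation for a single image.
SAMPLE_XML = """
<annotation>
  <filename>aqua_001.jpg</filename>
  <size><width>2352</width><height>4160</height><depth>3</depth></size>
  <object>
    <name>Aqua</name>
    <bndbox><xmin>800</xmin><ymin>1200</ymin><xmax>1500</xmax><ymax>3600</ymax></bndbox>
  </object>
</annotation>
"""

rows = annotation_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
label_map = label_map_text(["Aqua", "Ades"])
```

The resulting CSV text would then be passed to a TFRecord generation script, which is omitted here because it depends on the Tensorflow installation.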

2-2-Support Vector Machine
Unlike CNN, the SVM classification model uses an additional technique, namely Multiclass SVM, because it must classify many classes. The Multiclass SVM in this study combines the binary SVMs of the various classes into a single optimization process. This approach is also known as the One-Against-All approach and is built on the binary SVM model extended to multiple classes. The classification model is trained on all the data to produce accurate classification results.
Like the previous system, this system goes through pre-processing, training, and testing. In the pre-processing stage, the image data are grouped by class and then cropped so that only the desired object is displayed. The images are then resized from their original size to 227 × 227 pixels and converted to RGB color. Unlike the CNN method, the SVM process does not require manual labeling; the labels are generated automatically by a script that produces arrays of values before the training process.
The training process uses the id array, label array, and image array of each image to generate an SVM model. The SVM model is built with the Sklearn framework. After the SVM model is generated, it is tested with new image data, which go through the same pre-processing as before. The testing process produces an id from 0 to 10 according to the order of the classes. The id is then mapped to the corresponding class name as the prediction. All the processes in SVM are shown in Figures 3-a and 3-b.
The final training process yields the CNN and SVM models. Each model is also known as data knowledge and is used in the testing process. This data knowledge consists of many algorithm parameters, which are used to classify the input images during the testing process.
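The One-Against-All idea can be sketched in plain Python as follows. This is an illustration of the general technique rather than the Sklearn-based model used in the study: each class gets its own binary classifier trained on "this class versus all others", and a new sample is assigned to the class whose classifier returns the largest decision value. The linear decision function, the two-dimensional features, and the hand-picked weights below are hypothetical stand-ins for trained binary SVMs.

```python
def one_vs_all_targets(labels, positive_class):
    """Relabel a multiclass problem as 'this class' (+1) versus 'all others' (-1)."""
    return [1 if label == positive_class else -1 for label in labels]


def decision_value(weights, bias, x):
    """Signed score of a linear binary SVM: w . x + b."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def predict_one_vs_all(models, x):
    """Pick the class whose binary classifier gives the largest decision value."""
    return max(models, key=lambda name: decision_value(*models[name], x))


# Hypothetical (weights, bias) pairs standing in for three trained binary SVMs.
models = {
    "Aqua": ([1.0, 0.0], -0.5),
    "Fanta": ([0.0, 1.0], -0.5),
    "Sprite": ([-1.0, -1.0], 0.5),
}

binary_targets = one_vs_all_targets(["Aqua", "Fanta", "Aqua"], "Aqua")
predicted = predict_one_vs_all(models, [0.9, 0.1])
```

For ten bottle brands, ten such binary classifiers would be trained, one per brand, and the same arg-max rule would select the predicted class.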

3-1-Convolutional Neural Network
The first CNN pre-processing step is labeling each image using software called LabelImg. The folder containing the image data is opened in the software. Once an image is displayed, a box is dragged around the plastic bottle and labeled with the class name used. The labeled files are then saved in .xml format in the same folder. Labeling is carried out for all images used, including those in the validation data folder. After labeling, the resulting .xml files are converted into .csv format. The Tensorflow library provides a script that can be used to convert the files from XML to CSV format. The next step is to configure the TFRecord generation using a Python script. The configuration contains the class names that will be used for the classification process. Next, the Labelmap file is created and saved in .pbtxt format. The Labelmap lists the plastic bottle brand names as items numbered 1 to 10. The last file to prepare before training is the faster_rcnn_inception_v2_pets.config configuration file, which contains the paths to the image data and the other required files. The training process is carried out with the Tensorflow framework and is started by running a Python script. During training, the loss value and the number of training steps completed are displayed. The training process reads all the training data and repeats until the specified number of epochs is reached. The loss value in the last epoch, computed with Equation 2, is 0.0207. The training loss graph is shown in Figure 5. This loss value indicates that the resulting model is good, because the smaller the loss value, the better the model. The graph comes from Tensorboard, a Tensorflow feature that records the training process, such as the number of steps and the loss values, in real time. The training progress is stored in checkpoints so that training can be paused and resumed. The training results are then converted into an inference graph, which the system uses for classification. Testing is done by running the Python program that is already available in the Tensorflow model. The detection results of the CNN method on the test data are displayed by drawing a box around the detected object, as shown in Figure 6.
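To make the link between a small loss value and a good model concrete, the sketch below computes a categorical cross-entropy loss over N observations and K classes. This form is an assumption: it is a common definition that matches the symbols N and K used for Equation 2, but Faster R-CNN training in practice combines several loss terms, so the sketch should not be read as the exact quantity reported by Tensorflow. The function name and the probability values are hypothetical.

```python
import math


def cross_entropy_loss(true_classes, predicted_probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class.

    true_classes: list of N integer class ids in the range 0..K-1.
    predicted_probs: list of N lists, each holding K class probabilities.
    With one-hot targets, the double sum over N and K reduces to picking
    the predicted probability of the true class for every observation.
    """
    total = 0.0
    for true_k, probs in zip(true_classes, predicted_probs):
        p = min(max(probs[true_k], eps), 1.0)  # clamp to avoid log(0)
        total += -math.log(p)
    return total / len(true_classes)


# Two hypothetical predictions for the same two observations (K = 3).
targets = [0, 2]
confident = [[0.98, 0.01, 0.01], [0.01, 0.01, 0.98]]
unsure = [[0.40, 0.30, 0.30], [0.30, 0.30, 0.40]]

loss_confident = cross_entropy_loss(targets, confident)
loss_unsure = cross_entropy_loss(targets, unsure)
```

Predictions that place almost all probability on the correct class give a loss close to zero, while hesitant predictions give a clearly larger loss.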

Figure 6. Result Classification with CNN
The test is carried out by classifying new images from the 10 plastic bottle classes, with 10 images per class. The system accuracy, calculated with Equation 1 from the experimental results, is shown in Table 1. Testing with the Convolutional Neural Network (CNN) method gives a high accuracy of 99%. In more detail, the Confusion Matrix in Table 2 describes the detection rate of the CNN system, presenting the accuracy and detection errors of each class. The training carried out shows that classification with the CNN method achieves a good level of accuracy. The accuracy could still be improved by increasing the amount of training data and by varying the shape and position of the objects in each image.
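The relation between a confusion matrix and the accuracy figure can be sketched as follows, assuming Equation 1 is the usual ratio of correctly classified images to all tested images, expressed as a percentage. The three-class labels below are made up for illustration and are not data from this study; the function names are hypothetical.

```python
def confusion_matrix(targets, predictions, num_classes):
    """Count how often each target class (row) is predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for target, predicted in zip(targets, predictions):
        matrix[target][predicted] += 1
    return matrix


def accuracy_percent(matrix):
    """Correct predictions (the diagonal) divided by all predictions, in percent."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Illustrative three-class example with one misclassified image.
targets = [0, 0, 1, 1, 2, 2]
predictions = [0, 0, 1, 2, 2, 2]

matrix = confusion_matrix(targets, predictions, num_classes=3)
accuracy = accuracy_percent(matrix)
```

Each off-diagonal entry records a specific confusion between two classes, which is the per-class error information that the confusion matrix tables report.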

3-2-Support Vector Machine
As with the CNN method, the SVM pre-processing and data split follow the scheme in Figure 3-a. The data are divided into ten folders named after the plastic bottle brands. The images are then converted from RGB to BGR and resized to 227×227 pixels. Before the training process, GrabCut segmentation is applied to each image to remove the background so that only the object remains, as shown in Figure 7.

Figure 7. Result GrabCut Image
Based on the GrabCut results above, only the plastic bottle object is visible. The final result looks clearer because it emphasizes only the object used. However, this process takes a long time because it is done manually on each image. Unlike training with the CNN method, SVM training is carried out by dividing the data into two parts, namely training data and validation data. The training and prediction processes are carried out in the Python programming language and also use a framework from Tensorflow. However, the training process must be repeated each time the system is run. As shown in Table 3, the SVM test has a lower success rate than the CNN method, and the error rate is relatively high. The system predicts 74 of the 100 tested images correctly, so the accuracy according to Equation 1 is 74%. Similar to the CNN approach, the Confusion Matrix in Table 4 describes the detection rate of the SVM system, presenting the accuracy and detection errors of each class.

4-Discussion
According to the findings of this research, the two systems for classifying plastic bottles each have advantages and disadvantages. These are assessed in two parts: the system creation process and the level of system accuracy.
The system creation process differs between CNN and SVM in several ways. The CNN method requires a label for each piece of data used. This can be advantageous because it does not require a segmentation process or special feature extraction. However, the labeling process takes a long time when a large amount of training data is needed. The SVM method is more complex because each training image must first go through a segmentation process. This can be an advantage because segmentation removes objects that are not needed. However, the segmentation results do not necessarily match the desired characteristics, and the process takes a long time because it is done manually.
The two methods also differ in the training process. CNN requires a GPU device to speed up training. However, this device is costly, and the specifications of the computer must be adjusted accordingly. In contrast, SVM does not require such a device, and simple computing resources are enough to run the system. In addition, CNN training takes a long time because it must reach a small loss to produce a good model, but it is done only once. With SVM, in contrast, each classification run first requires training on the dataset, which adds time to the classification process.
The classification results of the two methods are also presented differently. The results from CNN are displayed by drawing a box around the recognized object, whereas the SVM results are shown as the id value and class name at the command prompt, without a visualization of the recognized object.
Finally, the research findings indicate that CNN is highly recommended for the practical automation of waste management, especially for plastic bottles. Compared to SVM, CNN overcomes the inaccuracy problems and is easy to develop into an actual application. In several works, such as [29,30], the SVM method achieves good accuracy with a small number of classes. The studies by Lamberti (2021) [29] and Sonmez (2022) [30] use two or three classes in their classification tasks and indicate that SVM obtains better accuracy than CNN. However, for classification with a larger number of classes, the accuracy of the SVM method is not as high. This study supports the conclusion that CNN has very high performance and surpasses SVM in multi-class classification. This result is in line with the research on a multi-class database by Han (2021) [31], which concludes that the CNN method is very suitable for classification with many classes. To obtain better accuracy and a simpler application, further comparison with other machine learning algorithms is needed.

5-Conclusion
Based on the results of the two methods used for the classification of plastic bottles, namely Convolutional Neural Network (CNN) and Support Vector Machine (SVM), the two methods have different development processes. The most visible difference is in pre-processing: CNN requires image labeling to identify the brand of each plastic bottle, while SVM only divides the images into classes and performs segmentation with GrabCut. Training on CNN takes much longer than on SVM because CNN conducts more in-depth training, and the SVM training method is much easier than the CNN method. According to previous research, the number of classes in the database affects the accuracy of the artificial intelligence approach. With a small number of classes, SVM has a higher level of accuracy and classification correctness than CNN. However, CNN surpasses SVM in accuracy for a large number of classes. The results obtained with the 10-class database in this study are in line with those previous studies. The confusion matrices of the two methods show that SVM has a higher classification error rate than CNN. The accuracy achieved by CNN is 99%, much higher than the 74% achieved by SVM. Therefore, based on the results obtained, CNN is better suited to plastic bottle classification than SVM.

6-2-Data Availability Statement
The data presented in this study are available in the article.

6-3-Funding
This research is funded by Universitas Ahmad Dahlan, Yogyakarta, Indonesia.

6-4-Acknowledgements
The authors would like to extend special thanks to Universitas Ahmad Dahlan for providing laboratory access and article funding.

6-7-Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.