Development of Computer Vision Algorithms for Multi-class Waste Segregation and Their Analysis

Classification of waste for recycling has been a focal point for researchers interested in environmental conservation. Recycling consists of numerous steps, one of the most crucial being the segregation of recyclables from all other waste. Due to a lack of safety standards in developing countries, waste collection is often done manually by domestic helpers or "rag-pickers". Such a process puts individual and public health at risk. The waste collection methods may also cause waste to become non-recyclable due to cross-contamination. The literature shows that research in this direction focuses on detecting a single class of waste. The proposed work investigates CNN, YOLO, and Faster RCNN-based multi-class classification methods to detect different types of waste at the collection point. The proposed smart dustbin employs these computer vision methods with a Raspberry Pi microcontroller and camera module. The experimental results for multi-class classification show that the CNN achieves an accuracy of 80% with a loss of 60%, whereas the YOLO algorithm achieves an accuracy of 88% and a loss of 40%. The best results were obtained from Faster RCNN object detection with the API, with an accuracy of 91% and a loss of 16%. Since a method for making a smart dustbin already exists, the results are compared to show how computer vision can be used to make a smart dustbin.

Author contributions: methodology, N.N., A.R., and S.H.; software, N.N.; validation, A.R. and S.H.; formal analysis, S.H. and A.R.; investigation, A.R., N.N., and S.H.; resources, A.R.; data curation, N.N.; writing (original draft preparation), N.N., A.R., and S.H.; writing (review and editing), N.N., A.R., and S.H.; visualization, N.N., A.R., and S.H.; supervision, A.R.; project administration, A.R.; funding, A.R.

as it helps to promote better public health and also increases livelihood through the improved quality of recycled materials.
Due to the growth of the urban population globally, waste generation is increasing at a record rate. However, in developing areas, for example Varanasi city in India, only 24% of the population segregates waste for recycling [2], because 89% of the city's population has never received any formal education on waste segregation. People do not know about the advantages of segregation or the disadvantages of accumulating waste together. Hence, a problem arises: household and community bins produce mixed waste, and it falls on the local municipality to manage its segregation. As shown in Figure 1, a sample landfill mixes all waste together, which causes cross-contamination. Moreover, waste collection has become an enormous task, especially in big metropolitan cities like Mumbai, which generates 9400 tons of solid waste every day [3]. This huge amount of waste pollutes the environment, including the groundwater. It is therefore necessary to find a solution to reduce pollution.

Figure 1. A landfill site
The research article aims to find optimal multiclass waste segregation methods at the source point. Three image classification algorithms are taken into consideration to classify the waste. These algorithms aim to have high accuracy and reduce computational time and the complexity of the hardware required. This research proposes building smart dustbins that use these algorithms to segregate waste at the source point.
The rest of this study is arranged as follows: Section 2 presents a detailed literature review. Section 3 explains the methods of the proposed work, whereas Section 4 discusses the development of algorithms to detect material for waste classification. Section 5 analyses the algorithms and presents the results obtained. Section 6 concludes the paper and presents the future scope of the work.

2-Literature Review
Balamurugan et al. [4] described a low-power waste management system that uses bins to collect the trash that is decomposed daily. A GSM module and an Arduino Uno microcontroller send information regarding three levels of waste. Jain et al. [5] analysed problems concerning waste management around the world. Improper planning of waste management and the lack of technical support are the main factors affecting citizens' health. The waste management system has four models based on size, budget, route, and waste processing machines. The risk management module helps the municipal corporation of a city manage waste in an environmentally and economically sound way. Sreejith et al. [6] designed a dustbin that avoids contamination of waste during rain and sends information about its fill level. The method is effective when solid waste disposal utilizes robotic technology. Saha et al. [7] explained waste management strategies that can generate revenue. The suggested IoT technology treats the waste for animal feeding, recycling, composting, fermentation, landfills, and burning. Adam et al. [8] suggested the use of wireless sensor networks and IoT technologies to manage waste, along with real-time monitoring of containers and their fill levels.
Cristina et al. [9] analysed waste management in two ways. The first was the application of panel data order, and the second was the use of bootstrapped truncated regression. The results show that certain political and socio-economic factors of local governments increase cost-efficiency. White and Beaven [10] described the LDAT landfill model. The input and output data are obtained from calculations that use the full sets of data to convert conventional waste characteristics into degradation. Ferronato et al. [11] modelled waste management using different data normalizations. In this respect, economies of scale are confirmed, along with the critical role of an adequate waste facility in cost minimization. Ferronato et al. [12] explained a geographic information system that enables the selective collection of municipal solid waste in developing cities. The study shows that the implementation of formal and informal recycling is the main advantage. Viau et al. [13] aimed at a life cycle assessment that needs to be critically analysed to recover recyclables from municipal solid waste management systems. Laura et al. [14] followed a pioneering approach to obtain a global inefficiency score and an individual inefficiency score for each variable integrated into the model. The results indicated that one-third of the municipalities evaluated were eco-efficient in the provision of services. Riedewald et al. [15] explained the eco-efficiency assessment of municipal solid waste services using exogenous variables. The results show that a reliable and accessible market for solid waste is available. Kumar et al. [16] used machine learning to estimate the generation rate of different plastic wastes and the revenue recovered from the recycling process. Xu et al. [17] studied artificial neural networks (ANNs) for solving solid waste-related issues. ANNs are widely used in the literature for predicting waste generation and technological parameters. Jain et al. [18] examined the heavy metal content of soil reclaimed from landfills and reported that solid waste was characterized by the concentration of various heavy metals. Funch et al. [19] classified glass and metal waste using convolutional neural networks. The results support the CNN method for real-time waste classification. The SSD (Single Shot MultiBox Detector) method was suggested by Liu et al. [20] for detecting objects in images using a single deep neural network. The SSD method discretises the output space into bounding boxes of different aspect ratios and scales.

3-Methodology
The research article aims to find the most suitable algorithm for multiclass waste segregation. Three image classification algorithms are considered for classifying waste into the six most common categories: cardboard, glass, metal, paper, plastic, and other trash. The implementation is evaluated in terms of accuracy, computational time, and the complexity of the hardware required. Such algorithms aim to reduce human involvement in waste management and to provide safer working environments. The reduction in human effort also increases the quality and quantity of the waste that is segregated. Hence, it is possible to build smart dustbins that use these algorithms to segregate waste at the source point. For an economical design, the smart dustbin employs a Raspberry Pi microcontroller with a camera module, as shown in Figure 2. Such smart bins make it easier for the local municipality to manage waste collection. The municipality can then transport segregated waste directly to recycling plants, which receive materials without much additional effort.

3-1-Concepts of Convolutional Neural Networks
The convolutional neural network (CNN) is a class of neural networks very commonly applied to image classification [21]. CNNs are inspired by biological neural networks: they are regularized versions of multilayer perceptrons, which are fully connected networks, and their connectivity pattern resembles that of neurons in the visual cortex of animals [22]. A CNN requires less image pre-processing than other image classification algorithms. It consists of an input layer, an output layer, and multiple hidden layers, as shown in Figure 3. The hidden layers perform convolutions using a dot product followed by the Rectified Linear Unit (ReLU) activation function. The input is usually an image tensor of shape (number of images) × height × width × depth, and the output is the image with the appropriate bounding box and label. Each neuron computes its output value by applying the activation function to the value it receives as input from the previous layer. The CNN uses pooling to reduce the dimensionality of the data by combining the outputs of several neurons in one layer into a single input to the next layer. Figure 4 shows how a convolutional neural network takes an input image, converts it into a vector of labels, and applies pooling. The initial image is divided into an 8×8 vector, which the previous layer converts into a 7×5 vector. The current layer converts this into a 2×2 vector. This process continues until the output is an n×1 vector, where n is the number of images.
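As a small illustration of the activation and pooling steps described above, the following sketch applies ReLU and then 2×2 max pooling to a toy 4×4 feature map. It is a minimal pure-Python example rather than the network used in this work; the feature-map values, the window size, and the stride are illustrative assumptions.

```python
def relu(feature_map):
    """Apply the ReLU activation element-wise to a 2-D list of numbers."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Max pooling over a 2-D list; windows that do not fit entirely are dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Toy 4x4 feature map (illustrative values, not taken from the experiments).
feature_map = [
    [1.0, -2.0, 3.0, 0.5],
    [0.0, 4.0, -1.0, 2.0],
    [-3.0, 1.5, 2.5, -0.5],
    [2.0, 0.0, -4.0, 1.0],
]

activated = relu(feature_map)   # negative responses are set to zero
pooled = max_pool(activated)    # 4x4 map is reduced to 2x2
print(pooled)
```

Each 2×2 window is replaced by its largest value, so the spatial size halves in each dimension while the strongest responses are kept.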

3-2-Theory of YOLO
You Only Look Once (YOLO) is an object detection algorithm that comes in four versions. The proposed research work employs version three. A CNN forms the base of YOLO. YOLO divides an image into an S×S grid and draws bounding boxes around the parts of the image that match the classes learned during training. YOLO has twenty-four convolutional layers followed by two fully connected layers. It reasons globally about the image when making predictions [25,26]. Figures 5-a and 5-b demonstrate how an image is divided into a grid. Each box in the grid is assigned a number, calculated as the probability that the box contains a detectable object. After all probabilities have been calculated, a bounding box is drawn to encompass the object, and a label is assigned. The advantage of YOLO over a plain CNN is that it performs object detection directly on the full image, simultaneously predicting multiple bounding boxes and the class probabilities for those boxes [27]. The class-specific probability [28,29] for each grid cell is defined in Equation 3, and the output pixel is calculated as in Equations 4 to 7, where T represents the time of data transmission, 2 refers to the operations, Tin, Tweight, and Tout are trip counts, and DSin, DSweight, and DSout are data block sizes. The loss function is expressed in Equation 8, where bx and by refer to the centre of the prediction, bw and bh are the dimensions of the bounding box, λcoord and λnoobj are weights used to increase emphasis, C is the confidence, p(c) is the classification prediction, and the two indicator terms refer to the jth bounding box in the ith cell and to the ith cell, respectively.
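The grid assignment and the class-specific score can be sketched in a few lines. The example below follows the standard YOLO formulation, in which the class-specific confidence is the product of the conditional class probability, the objectness probability, and the intersection over union (IoU) between the predicted and true boxes. It is a hedged illustration and may not match Equation 3 of this paper term for term; the function names, the 416×416 image size, the 13×13 grid, and the box coordinates are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_cell(box, image_w, image_h, s):
    """Return (row, col) of the S x S grid cell that contains the box centre."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    col = min(int(cx / image_w * s), s - 1)
    row = min(int(cy / image_h * s), s - 1)
    return row, col


def class_specific_confidence(class_prob, objectness, overlap):
    """Pr(class | object) * Pr(object) * IoU, as in the standard YOLO papers."""
    return class_prob * objectness * overlap


# Hypothetical ground-truth and predicted boxes on a 416x416 image.
truth = (100.0, 100.0, 300.0, 300.0)
prediction = (150.0, 150.0, 350.0, 350.0)

overlap = iou(truth, prediction)
cell = responsible_cell(truth, image_w=416, image_h=416, s=13)
score = class_specific_confidence(class_prob=0.9, objectness=0.8, overlap=overlap)
print(cell, round(overlap, 3), round(score, 3))
```

The cell that contains the centre of the object is the one responsible for predicting it, which is why only the box centre is used in `responsible_cell`.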

3-3-Theory of Faster RCNN
Faster RCNN is a Region-based Convolutional Neural Network (RCNN) used for real-time object detection. It is built around a region proposal network. The RCNN places anchors on the image to centre bounding boxes, detects potential regions of the target object, and eliminates unlikely regions based on their probability. Faster RCNN is useful because region proposal and object detection are performed simultaneously [30]. This increases the speed, so the algorithm delivers results faster. A CNN-based Region Proposal Network (RPN) is used instead of the Selective Search method used by its predecessors, RCNN and Fast RCNN [31]. Moreover, the detection network also uses a CNN. While generating region proposals, the RPN [32,33] slides a window over each location on the feature map. At each location, anchor boxes of different scales and aspect ratios are used to generate region proposals. The subsequent two layers identify whether an object is present in those regions and the bounding box needed for it. After the RPN returns its results, the CNN classifies the objects detected by the RPN. Figure 6 illustrates the working of Faster RCNN as explained above. Its loss function is similar to that of YOLO, but the model is more accurate, as the total loss eventually plateaus over the iterations.
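To make the role of anchor scales and aspect ratios concrete, the sketch below generates anchor boxes at every sliding-window position of a feature map. It is an illustrative approximation, not the configuration used in this work: the scales (128, 256, 512), the aspect ratios (0.5, 1, 2), and the stride of 16 pixels are the defaults commonly quoted for Faster RCNN and are assumed here, and the function names are hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x_min, y_min, x_max, y_max) anchors centred on (cx, cy).

    Each anchor has an area of scale**2 and a width/height ratio of `ratio`.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            anchors.append((cx - width / 2, cy - height / 2,
                            cx + width / 2, cy + height / 2))
    return anchors


def anchors_for_feature_map(feature_w, feature_h, stride=16):
    """Slide over every feature-map location and collect its anchors."""
    all_anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Centre of this feature-map cell in input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            all_anchors.extend(make_anchors(cx, cy))
    return all_anchors


# A tiny 3x2 feature map yields 3 * 2 * 9 = 54 candidate anchors.
print(len(anchors_for_feature_map(feature_w=3, feature_h=2)))
```

The RPN then scores each of these candidate boxes for the presence of an object and regresses offsets to refine them.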
where x and y are the coordinates of the image, and w and h are the width and height of the image.
The loss function is given in Equation 10, where L, Lc, and Lt are the joint loss, classification loss, and regression loss; Nc and Nr are the number of categories; λ is the weight coefficient and k indexes the boxes; pk and pk* are the probabilities (predicted and actual) that box k is the object; tk is the predicted offset; and tk* is the offset between the anchor box and the actual box.
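For reference, the standard Faster RCNN multi-task loss can be written with the symbols used in this section. This is the usual form found in the Faster RCNN literature and is offered as an assumption about the shape of the joint loss; it may differ in detail from Equation 10 of this paper. Here p_k* denotes the actual label of box k and t_k* the actual offset, and N_c and N_r act as normalisation terms for the classification and regression sums.

```latex
L\bigl(\{p_k\},\{t_k\}\bigr)
  = \frac{1}{N_c}\sum_{k} L_c\bigl(p_k,\, p_k^{*}\bigr)
  + \lambda\,\frac{1}{N_r}\sum_{k} p_k^{*}\, L_t\bigl(t_k,\, t_k^{*}\bigr)
```

The factor p_k* in the second term means that the regression loss only contributes for boxes that actually contain an object.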

4-Development of Algorithms to Detect Material for Waste Classification
To develop the algorithms to the implementation level, simulations were run on a machine with Windows 10 as the operating system. The simulation programs were written in Python 3.7, using Keras and TensorFlow 2.0 to create, train, and test the models. Microsoft VoTT (Visual Object Tagging Tool) was used to label the images. The flowchart in Figure 7 shows how the implemented algorithm classifies the waste.

4-1-Implementing CNN Model for Waste classification
The following steps show the implementation of the CNN model:
 Step 6. The detector gets the results after completing its training.

4-2-Implementing the "You Only Look Once" Object Detection Model for waste classification
The following steps are used to implement the YOLO algorithm:
 Step 1. The images are split and stored in two folders named test and train.
 Step 2. The images in the train folder are annotated using VoTT, which gives an output in the form of a CSV file.
 Step 3. This CSV file is used to create a text file that is readable by the training program. The YOLO algorithm is trained using this text file, which contains the dimensions of the bounding boxes.
 Step 4. The darknet weights used by the training code are downloaded.
 Step 5. The object detector is trained as indicated in Figure 8.
 Step 6. After training is complete, the code is run to label the images in the test folder based on the trained model, as shown in Figure 9.
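Steps 2 and 3 above can be sketched as a small conversion routine. The example assumes that the VoTT CSV export has the columns image, xmin, ymin, xmax, ymax, and label, and that the training program expects one line per image of the form "path x_min,y_min,x_max,y_max,class_id ...", as several Keras-based YOLOv3 training scripts do. Both the column names and the target format are assumptions, as are the function name and the sample rows; the actual files used in this work may differ.

```python
import csv
import io
from collections import defaultdict

# The six waste categories considered in this study, in an assumed order.
CLASS_NAMES = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]


def csv_to_annotation_lines(csv_text, class_names):
    """Group the boxes of each image into one space-separated annotation line."""
    boxes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        class_id = class_names.index(row["label"])
        # Coordinates may be exported as floats; truncate them to whole pixels.
        coords = [int(float(row[key])) for key in ("xmin", "ymin", "xmax", "ymax")]
        boxes[row["image"]].append(",".join(str(v) for v in coords + [class_id]))
    return [f"{image} {' '.join(items)}" for image, items in boxes.items()]


# Hypothetical export with two boxes on one image and one box on another.
sample_csv = (
    "image,xmin,ymin,xmax,ymax,label\n"
    "bottle1.jpg,10.5,20.0,110.2,220.9,plastic\n"
    "bottle1.jpg,5,5,50,50,glass\n"
    "can1.jpg,0,0,64,64,metal\n"
)

for line in csv_to_annotation_lines(sample_csv, CLASS_NAMES):
    print(line)
```

Keeping all boxes of an image on one line lets the training program load each image once together with all of its labelled objects.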

4-3-Implementing the Faster RCNN for Training and Classification
The following steps explain the implementation of Faster RCNN:
 Step 1. Both the training and the testing images are annotated using a labelling tool, resulting in an XML file as output.
 Step 2. The annotation information is converted into two CSV files, similar to the one made for the YOLO algorithm.
 Step 3. These files are used to generate TFRecord files, which are mapped to the config file.
 Step 4. The config file gives details such as the input and output directories, the learning rate, and the neural network structure.
 Step 5. Training is carried out using the config file, and the iterations are executed as shown in Figure 10.
 Step 6. The training is stopped, and the inference graph is frozen.
 Step 7. The trained model is run on the test images, and the output is evaluated.
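Steps 1 and 2 above can be illustrated with a short conversion from an XML annotation to CSV rows. The sketch assumes the XML follows the Pascal VOC layout (filename, size, and one object element per box with a bndbox child), which is what common image labelling tools produce; the source does not state the exact schema, so this layout, the CSV column names, and the sample annotation are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Return one dictionary per annotated object in a Pascal VOC style XML string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "width": width,
            "height": height,
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical annotation of one image containing a single sheet of paper.
sample_xml = """
<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>paper</name>
    <bndbox><xmin>12</xmin><ymin>34</ymin><xmax>200</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

print(rows_to_csv(voc_xml_to_rows(sample_xml)))
```

In practice the same routine would be run over every XML file in the train and test folders to produce the two CSV files mentioned in Step 2.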

5-Analysis of Algorithms
The CNN, YOLO, and RCNN algorithms take images of waste as input, and all three are trained to classify waste. Once the waste has been identified and classified, the image file of each item is stored in the appropriate folder. Figure 11 shows the classification of images by the CNN together with their probability graphs. The probability graph indicates the possible classifications of an image. Glass is classified with the maximum probability. In most of the graphs, the algorithm predicts only one class. However, in some cases the algorithm also assigns probability to two other classes, metal and plastic, because of similarities between the current image and the images used to train the network to identify metal and plastic. The network was trained on a total of 2500 training images with a train/test split of 90/10. A batch size of 32 was used initially. However, due to the limited capacity of the system while training the network, the batch size was reduced to 8.

Figure 11. Classified Images and their Probability Graphs
The graph in Figure 12 displays the accuracy as the network is trained. The accuracy goes through small fluctuations initially, but overall it follows an increasing trend until the final accuracy of 80% is obtained. The loss is also plotted in Figure 12, where it keeps fluctuating per iteration while the overall loss decreases. However, towards the 100th epoch, the difference between training and validation loss increases. This difference indicates that the model is moving towards overfitting, and measures are needed to prevent overfitting in future runs.
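The overfitting signal described above, a growing gap between validation and training loss, can be checked programmatically. The following is a minimal sketch under stated assumptions: the function names, the window of three epochs, the tolerance, and the loss values are all illustrative and are not taken from the experiments.

```python
def generalisation_gap(train_loss, val_loss):
    """Per-epoch difference between validation and training loss."""
    return [v - t for t, v in zip(train_loss, val_loss)]


def gap_is_widening(train_loss, val_loss, window=3, tolerance=1e-6):
    """True when the gap grew in each of the last `window` epoch-to-epoch steps."""
    gaps = generalisation_gap(train_loss, val_loss)
    recent = gaps[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough epochs to judge
    return all(later - earlier > tolerance
               for earlier, later in zip(recent, recent[1:]))


# Illustrative curves: training loss keeps falling while validation loss stalls.
train = [1.2, 0.9, 0.7, 0.6, 0.5, 0.4]
val = [1.3, 1.0, 0.8, 0.75, 0.72, 0.70]

print(gap_is_widening(train, val))
```

A check of this kind could be used to trigger early stopping or a lower learning rate in future runs.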

5-2-Results and Discussions of YOLO
The time taken by the YOLO network to finish training was monitored. As with the CNN algorithm, the batch size initially had to be reduced from 32 to 8 due to the capabilities of the computer system. The network was first trained for 51 epochs at a specific learning rate and then for another 49 epochs at a reduced learning rate. This method was adopted to prevent overfitting by reducing the learning rate when the loss was plateauing. After training, the algorithm was tested on over two hundred images. The results showed an accuracy of 88%. Thus, although the algorithm took more time to train than the simple CNN, it has better accuracy. Figure 14 displays six of the two hundred tested images. The accuracy shows that the YOLO algorithm is better than the CNN algorithm. The algorithm is also better at classifying multiple items of waste in the same frame. This suggests that YOLO would be a better algorithm for rovers that go into landfills and conduct waste segregation. Figure 15 shows the loss of YOLO over ninety iterations. The first ten iterations are not shown in the graph because their loss was high and would hide the small fluctuations in the loss of the later iterations. As with the CNN, there are small fluctuations in the validation loss, but the overall loss tends to decrease. However, unlike the CNN, there are no fluctuations in the training loss. The difference between training loss and validation loss is constant, which indicates that the network is not overfitting the images. However, the loss from YOLO is much higher than the loss from the CNN because YOLO requires more training images to classify the images into six classes. Further tests show the same trend, in which images of some classes are predicted more accurately than others, because more similar images are available for some classes than for others, as shown in the graph.
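The two-stage training schedule described above can be written down as a simple per-epoch learning-rate list, together with a plateau check that could decide when to switch stages. This is a sketch only: the base learning rate of 0.001, the reduction factor of 0.1, the patience, and the loss values are hypothetical, since the source does not report the actual rates or the switching criterion.

```python
def two_stage_learning_rates(base_lr, factor, first_stage=51, second_stage=49):
    """Learning rate for every epoch: base_lr first, then base_lr * factor."""
    return [base_lr] * first_stage + [base_lr * factor] * second_stage


def has_plateaued(losses, patience=3, min_delta=1e-3):
    """True if the last `patience` epochs improved the best loss by less than min_delta."""
    if len(losses) <= patience:
        return False  # not enough history yet
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta


# Hypothetical values: 51 epochs at 1e-3, then 49 epochs at a tenth of that.
schedule = two_stage_learning_rates(base_lr=1e-3, factor=0.1)
print(len(schedule), schedule[50], schedule[51])

# Illustrative loss history that has stopped improving.
print(has_plateaued([5.0, 3.0, 2.0, 2.0, 2.001, 1.9995]))
```

Lowering the learning rate once the loss stops improving lets the optimiser take smaller steps, which is the motivation given above for the second training stage.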
Figure 16 shows the percentage of images correctly classified for each class, based on 252 images. The classifier correctly classifies cardboard and metal with 88% accuracy. Paper and plastic are classified with the lowest accuracy because many paper images are similar to cardboard, and many plastic images resemble glass. To mitigate this problem, an extensive dataset of images is needed, which local municipal authorities can easily obtain.
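The per-class percentages plotted in Figure 16 correspond to a per-class accuracy, which can be computed from the true and predicted labels as in the sketch below. The label lists are toy data chosen to mimic the paper/cardboard and plastic/glass confusions mentioned above; they are not the experimental results, and the function name is hypothetical.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of images of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels (illustrative only).
true_labels = ["paper", "paper", "cardboard", "plastic", "plastic", "glass"]
predicted = ["cardboard", "paper", "cardboard", "glass", "plastic", "glass"]

for label, accuracy in per_class_accuracy(true_labels, predicted).items():
    print(f"{label}: {accuracy:.0%}")
```

Reporting accuracy per class rather than as a single number makes it visible which categories need more training images.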

5-3-Results and Discussions of Faster RCNN
Along with Faster RCNN, the TensorFlow Application Programming Interface (API) for object detection is employed. This API helps to implement various object detection algorithms. For example, municipal corporations that cannot afford high-end equipment for every dustbin in the city can use lower-cost equipment for the object detection mechanism [9]. Training was performed for 16000 iterations, after which the model was run on the testing images. The results showed an accuracy of 91%. The loss of the model was around 0.16, making it the best classifier so far. However, this algorithm requires high computational power and an extensive amount of data to work. Initially, the loss oscillated between almost zero and very large values. However, the learning rate was reduced after 2500 iterations, and the loss decreased to about 0.16 (16%) after 16000 iterations, as shown in Figure 17.

Figure 17. Total Loss
Training was stopped once the loss had settled, as shown in Figure 17, and the inference graph was then frozen. The Python program trains the model to recognise these objects from various photographs of common waste. Figure 18 demonstrates that these algorithms are accurate enough to classify waste. They can even be run on live video footage, as would be needed in smart dustbins.

Figure 19 compares the accuracy and loss of the models used to train smart dustbins, and their performance values are given in Table 1. The CNN shows 80% accuracy when classifying waste, with a loss of 60%. The YOLO algorithm exhibits an accuracy of 88% and a loss of 40%. Faster RCNN object detection with the API achieves an accuracy of 91% and a loss of 16%. The Faster RCNN object detection model therefore has the highest accuracy and the lowest loss. TensorFlow object detection via the API can also be used with other models, such as those suited to low-resolution mobile cameras.

The CNN does not require annotation and has the fastest training time. However, it has the lowest accuracy and the highest loss. YOLO takes more time and requires annotation, but it is more accurate than the CNN and can be used on systems with less power. Faster RCNN has the highest accuracy and the lowest loss. However, it requires the most computational power and an extensive dataset, which makes it the most computationally complex algorithm. TensorFlow object detection using the API provides an infrastructure that can host a variety of models for the segregation of waste. It provides scripts with which trained models can be applied to images and live video feeds, making it the best infrastructure for the smart dustbin. Therefore, depending on the budget and the capability of existing systems, a municipal corporation can design a dustbin by employing any one of these algorithms.
Since computer vision is employed to identify and classify waste, the strength of this research is that waste is classified well when a proper image of it is analysed. The limitation is that images of the waste need to resemble the stored training images. If the shape of the waste has changed and such images are processed, the results are erroneous.

6-1-Further Research
In this work, the algorithms were run on a high-configuration PC to classify the waste. The same algorithms can also be used to build a smart dustbin with a microcontroller and a camera module, so that its cost is lower and it is more affordable. A combination of two or more computer vision algorithms may be investigated to increase accuracy. Furthermore, the dataset used for this study is very limited in terms of the shapes and materials of waste available. Therefore, testing with an extensive dataset is crucial. Lastly, our team wishes to combine the computer vision algorithms tested here with the sensors currently available on the market, and to continue this research towards an economically viable smart dustbin.

7-2-Data Availability Statement
Data sharing is not applicable to this article.

7-3-Funding
The authors received no financial support for the research, authorship, and/or publication of this article.