Optimization of Markov Weighted Fuzzy Time Series Forecasting Using Genetic Algorithm (GA) and Particle Swarm Optimization (PSO)

The Markov Weighted Fuzzy Time Series (MWFTS) is a forecasting method developed from the fuzzy time series (FTS) algorithm. MWFTS overcomes certain limitations of FTS, such as the repetition of fuzzy logical relationships and the weighting of those relationships. The main challenge of the MWFTS method is the absence of standardized rules for determining partition intervals. To address this problem and obtain an optimal partition, this study combines the MWFTS model with two partition methods, Genetic Algorithm-Fuzzy K-Medoids clustering (GA-FKM) and Fuzzy K-Medoids clustering-Particle Swarm Optimization (FKM-PSO), and compares the results. In GA-FKM, the GA optimizes the FKM clustering to obtain the best partition interval. In FKM-PSO, the PSO optimizes the interval lengths obtained from the FKM procedure. The proposed method was applied to air quality data from Anand Vihar, India. MWFTS combined with the GA-FKM partitioning method reduced the mean absolute percentage error (MAPE) from 17.440% to 16.85% and the root mean square error (RMSE) from 48.179 to 47.701. MWFTS combined with the FKM-PSO partitioning method reduced the MAPE from 9.78% to 7.58% and the RMSE from 30.638 to 24.863.

calculate data using ordinary time series methods. This method is widely used in forecasting for finance [10], air pollution [11], and temperature [12]. The Fuzzy Time Series method has been developed extensively. Chen [13] reduced the computational complexity of the Fuzzy Time Series model by using a fuzzy logical relationship table; Huarng expanded Chen's model by adding intuitive information [14,15]; and Yu [16] introduced the weighted fuzzy time series method, which considers the repetition of fuzzy relations to provide better forecasting accuracy. Efendi [17] further developed the weighted fuzzy time series method. Tsaur [18] combined the fuzzy time series method with the Markov chain to develop the Markov Chain Fuzzy Time Series method. This model finds the largest probability value based on the transition probability matrix that has been formed. For air pollution forecasting specifically, Alyousifi [19] used the Markov Weighted Fuzzy Time Series method.
This study uses the Markov Weighted Fuzzy Time Series method studied by Alyousifi [19] as the primary forecasting method. This method is considered able to cover the shortcomings of the Markov Chain Fuzzy Time Series method, especially in forming appropriate weights. Another weakness of the fuzzy time series method is that there are no specific rules for grouping the data, so different groupings give different accuracy values. To address this, this study uses fuzzy k-medoids as the clustering method. Dincer and Akkus [7] proposed this method because it reduces the negative effect of outliers in the data. The main development in this study lies in the use of optimization methods. Two optimization methods are proposed: the Genetic Algorithm and Particle Swarm Optimization. The Genetic Algorithm improves the fuzzy k-medoids clustering process. In fuzzy k-medoids clustering, an initial value of the membership function is required to determine the initial cluster centre, and this initial value is normally chosen randomly. The Genetic Algorithm helps determine this initial value and thereby improves the fuzzy k-medoids clustering. Research conducted by Istianto [20] shows that using a genetic optimization algorithm can improve accuracy. Kuo [21] introduced another optimization method, Particle Swarm Optimization, to improve the intervals formed by the clustering method. Like the Genetic Algorithm, Particle Swarm Optimization also significantly improves accuracy. In this study, Particle Swarm Optimization refines the intervals formed by fuzzy k-medoids clustering. The improved intervals improve the fuzzification and, in turn, the forecast values. The two optimization methods are compared to find which is better for air quality forecasting.

2-2-Fuzzy K-Medoids
Fuzzy k-medoids (FKM) is a clustering method used to group data into k clusters. The grouping uses a distance criterion, calculated from each cluster's centre to each data point. The fundamental difference between FKM and Fuzzy C-Means (FCM) lies in how the cluster centre is chosen: in FCM the cluster centre can take any value in the universe of discourse, while in FKM it is always located at one of the data points (a medoid) [23].
The FKM calculation follows the same concept as the FCM method; the difference lies only in the final step, which determines the cluster centre. Both methods minimise the value of the objective function (Equation 2) to obtain good clustering results. The FKM method also uses membership degrees to calculate the cluster centre: the initial membership degrees are generated randomly according to the number of clusters and are then updated at each iteration (Equation 3). Once the membership degrees have been obtained, the cluster centre can be calculated from them.
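The FKM loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it works on one-dimensional data, uses the FCM-style membership update with fuzzifier m = 2, and picks each medoid as the data point that minimises the membership-weighted squared distance, which is a common fuzzy k-medoids choice. The function names and the `init_u` parameter are hypothetical.

```python
import random


def update_memberships(data, medoids, m=2.0):
    """FCM-style membership degree of every point in every cluster."""
    u = []
    for x in data:
        d = [abs(x - c) for c in medoids]
        zeros = [j for j, dj in enumerate(d) if dj == 0]
        if zeros:
            # The point coincides with one or more medoids.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
        else:
            power = 2.0 / (m - 1.0)
            row = [1.0 / sum((dj / dl) ** power for dl in d) for dj in d]
        u.append(row)
    return u


def update_medoids(data, u, k, m=2.0):
    """Assumed medoid rule: the data point with the lowest weighted cost."""
    medoids = []
    for j in range(k):
        def cost(c, j=j):
            return sum((u[i][j] ** m) * (x - c) ** 2 for i, x in enumerate(data))
        medoids.append(min(data, key=cost))
    return medoids


def fuzzy_k_medoids(data, k, m=2.0, max_iter=100, init_u=None, seed=0):
    """Return (medoids, memberships) for 1-D data."""
    if init_u is None:
        # Random initial membership degrees, each row normalised to sum to 1.
        rng = random.Random(seed)
        init_u = []
        for _ in data:
            row = [rng.random() + 1e-12 for _ in range(k)]
            total = sum(row)
            init_u.append([v / total for v in row])
    u, medoids = init_u, None
    for _ in range(max_iter):
        new_medoids = update_medoids(data, u, k, m)
        u = update_memberships(data, new_medoids, m)
        if new_medoids == medoids:  # medoids stopped moving
            break
        medoids = new_medoids
    return medoids, u
```

Because the result depends on the initial membership degrees, passing different `init_u` values is one way to see why an optimizer for the initial values can matter.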

2-3-Fuzzy Set
A fuzzy set is expressed as a set of ordered pairs of an element x and its membership value [24]:
$$A = \{(x, \mu_A(x)) \mid x \in X\}$$
where $\mu_A(x)$ is the characteristic function of set A, indicating the degree of membership of x in set A. The membership function $\mu_A$ maps $X$ to the membership space $M = [0, 1]$.

2-4-Membership Function
The membership function is a curve that maps input data points to membership values in the interval from 0 to 1. In this research, a triangular membership function is used.
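A triangular membership function can be written directly from its definition. The sketch below assumes a triangle with feet `a` and `c` and peak `b` (with a < b < c); the parameter names are illustrative and do not come from the paper.

```python
def triangular_membership(x, a, b, c):
    """Membership degree of x in a triangle with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge
```

For example, with a = 0, b = 5, c = 10, the value x = 2.5 lies halfway up the rising edge and receives a membership degree of 0.5.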

2-5-Fuzzification
Definition 2.5.1. Fuzzification is the process of changing a crisp value into a fuzzy value [25]. This mapping produces fuzzy values that are used to form fuzzy relations. Fuzzification is performed on all members of the crisp set [26].

2-6-Defuzzification
Definition 2.6.1. Defuzzification is the opposite of fuzzification: the fuzzy set is changed into a crisp value [27]. The input of the defuzzification process is the fuzzy set obtained from the composition of the fuzzy rules, while its result is the output of the fuzzy logic control system [28].

2-7-Fuzzy Time Series
Fuzzy time series is a prediction method that uses the principles of fuzzy logic, and it is generally used on historical data in the form of linguistic data. There are several stages in the fuzzy time series model, including defining the universe of discourse U, dividing U into several intervals or classes, fuzzification, forming fuzzy relations, defuzzification, and determining predictive values [19,29].
where $f_{A_i}: U \to [0, 1]$ is the membership function of the fuzzy set $A_i$ and $f_{A_i}(u_j)$ is the degree of membership of $u_j$ to $A_i$, with $1 \le j \le p$. Subsequently, the membership degree value $f_{A_i}(u_j)$ is defined as follows [30]:
Suppose $Y(t)$ with $t = 1, 2, 3, \dots$ is a subset of $\mathbb{R}$ and the universe of discourse on which the fuzzy sets $f_i(t)$ are defined, and $F(t)$ is the collection of $f_i(t)$; then $F(t)$ is called a fuzzy time series on $Y(t)$ [7].
Suppose $F(t)$ is only affected by $F(t-1)$; then the fuzzy relation between $F(t)$ and $F(t-1)$ is stated as follows:
$$F(t) = F(t-1) \circ R(t, t-1)$$
where the symbol "$\circ$" is the Max-Min operator, and the relation $R$ is called the first-order model of $F(t)$ [7].
For example, if $F(t-1) = A_i$ and $F(t) = A_j$, then the fuzzy relation between $F(t)$ and $F(t-1)$ is called a fuzzy logical relationship (FLR), denoted by $A_i \to A_j$, where $A_i$ is the left-hand side (LHS) and $A_j$ is the right-hand side (RHS) of the FLR [7].
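As a small illustration, first-order FLRs can be formed by pairing each fuzzified observation with its successor, and grouped by left-hand side. The sketch assumes the series has already been fuzzified into state labels (here integers); the function names are hypothetical.

```python
from collections import defaultdict


def build_flrs(states):
    """First-order FLRs as (state at t-1, state at t) pairs."""
    return list(zip(states[:-1], states[1:]))


def build_flrgs(flrs):
    """Group FLRs by their left-hand side, keeping repeated right-hand sides."""
    groups = defaultdict(list)
    for lhs, rhs in flrs:
        groups[lhs].append(rhs)
    return dict(groups)


states = [1, 1, 2, 3, 2, 1]   # illustrative fuzzified series
flrs = build_flrs(states)     # [(1, 1), (1, 2), (2, 3), (3, 2), (2, 1)]
flrgs = build_flrgs(flrs)     # {1: [1, 2], 2: [3, 1], 3: [2]}
```

Keeping repeated right-hand sides in each group preserves the repetition counts that weighted variants of the method rely on.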
Fuzzification is one of the stages in the fuzzy time series where each data is converted into a linguistic value for further formation of FLR. The construction of this fuzzification requires a value known as the upper and lower bound, and these two values are obtained from the calculation of the clustered interval [31].
For the lower bound value on lb 1 and the upper bound on ub , the following rules are used:

2-8-Weighted Fuzzy Time Series
The weighted fuzzy time series is a method that uses weights as an indicator for forecasting calculations. The weights are obtained from the repetition of FLRs and form a weight matrix [17]. Suppose there is $A_i \to A_i, A_j, A_k, A_l$ ($i, j, k, l = 1, 2, 3, \dots, p$), a Fuzzy Logical Relationship Group (FLRG) in which $A_i$ has $n_1$ relationships with itself, $n_2$ with $A_j$, $n_3$ with $A_k$, and $n_4$ with $A_l$, where $n_1, n_2, n_3, n_4 \in \mathbb{N}$. The values $n_1, n_2, n_3, n_4$ are the repetition counts of the FLRs, from which the weight values are computed (equation (15)). The weight elements are then represented as a weight matrix.
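One way to build such a weight matrix is to count how often each FLR occurs and normalise each row by its total. This is a sketch under the assumption that each weight is the repetition count divided by the row total; the exact weighting formula of the paper is not reproduced here, and the function name is hypothetical.

```python
def weight_matrix(states, p):
    """Row-normalised FLR repetition counts; states are labels 0..p-1."""
    counts = [[0] * p for _ in range(p)]
    for lhs, rhs in zip(states[:-1], states[1:]):
        counts[lhs][rhs] += 1
    weights = []
    for row in counts:
        total = sum(row)
        # Rows with no observed relationship are left as zeros.
        weights.append([c / total if total else 0.0 for c in row])
    return weights


w = weight_matrix([0, 0, 1, 2, 1, 0], p=3)
# w == [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
```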

2-9-Markov Rule
The Markov chain is a forecasting method introduced by Andrei A. Markov in 1907 that analyses the present behaviour of variables to predict their behaviour in the future. It is known as the Markov Chain method because it has chain properties [32,33].
According to the concept of FLR and FLRG, the future probability value under the Markov transition probability matrix model is determined using the equation below:
$$P_{ij} = \frac{M_{ij}}{M_i}, \quad i, j = 1, 2, \dots, p$$
where $P_{ij}$ is the transition probability from state $A_i$ to state $A_j$, $M_{ij}$ is the number of moves from $A_i$ to $A_j$, and $M_i$ is the total number of data in $A_i$. Therefore, the transition probability matrix is written as follows:
$$P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1p} \\ P_{21} & P_{22} & \cdots & P_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ P_{p1} & P_{p2} & \cdots & P_{pp} \end{bmatrix}$$
According to Alyousifi [19], there are two rules for forecasting calculations using Markov's rule, which are explained as follows:
Rule 1. When the FLRG of $A_i$ is one-to-one, that is, there is only one transition from $A_i$ to $A_k$, where $A_i$ and $A_k$ are fuzzy linguistic variables, the forecast $F(t+1) = m_k$ is the middle value of the lower and upper bounds of the $k$-th cluster. Mathematically, this is written as follows:
$$F(t+1) = m_k P_{ik} = m_k$$
Rule 2. When the FLRG of $A_j$ is one-to-many, that is, there is more than one transition from $A_j$ to $A_1, A_2, \dots, A_p$, the forecast is
$$F(t+1) = m_1 P_{j1} + \dots + m_{j-1} P_{j(j-1)} + Y(t) P_{jj} + m_{j+1} P_{j(j+1)} + \dots + m_p P_{jp}$$
where $m_1, m_2, \dots, m_p$ are the middle values of the lower and upper bounds of the corresponding clusters, while $m_j$ is replaced with $Y(t)$ in state $A_j$ to get better accuracy.
After the $F(t+1)$ value is obtained, the forecasting value is calculated using the following equation:
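Rules 1 and 2 can be illustrated with a short Python sketch. It assumes the transition probability matrix and the interval middle values are already available, and it covers only the two rules; the final adjustment step is not included. The function name and the fallback for a state with no observed transitions are assumptions made for this example.

```python
def markov_forecast(current_state, current_value, transition, midpoints):
    """Forecast F(t+1) from the state index at t, Y(t), P and the midpoints."""
    row = transition[current_state]
    targets = [j for j, prob in enumerate(row) if prob > 0]
    if not targets:
        # Assumption: no observed transition, fall back to the own midpoint.
        return midpoints[current_state]
    if len(targets) == 1:
        # Rule 1: one-to-one FLRG, use the midpoint of the single target.
        return midpoints[targets[0]]
    # Rule 2: one-to-many FLRG, probability-weighted midpoints, with the
    # current state's midpoint replaced by the actual value Y(t).
    forecast = 0.0
    for j, prob in enumerate(row):
        value = current_value if j == current_state else midpoints[j]
        forecast += prob * value
    return forecast


P = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
mid = [10.0, 20.0, 30.0]
rule1 = markov_forecast(1, 22.0, P, mid)  # single transition to state 2
rule2 = markov_forecast(0, 12.0, P, mid)  # 0.5 * 12.0 + 0.5 * 20.0
```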

3-1-Genetic Algorithm
A genetic algorithm (GA) is usually used to determine the optimal solution to a problem by selecting, from several candidate solutions, the one that best matches the criteria of the fitness function. GA has an excellent global search capability and can easily yield multiple solutions [34]. GA is an evolutionary algorithm based on Darwin's theory of evolution in biology, with mechanisms such as inheritance, natural selection, gene mutation, and crossover. Its main components are chromosomes, crossover, fitness, mutation, and population [35], and it has been widely used to resolve cases in public services and processes in companies [36].
The three basic operations in GA are described as follows [37,38]:
• Selection: This selects chromosomes from the population as parents to produce new offspring. Before this stage, the quality of each chromosome is first measured with a fitness value, which shows the extent to which the chromosome has a chance to survive and reproduce as a parent in the next iteration. In this research, the objective function of FKM is taken as the fitness function of the GA. Roulette wheel selection, which often performs well in selection operations, is used; it first determines the selection probability with the following formula:
$$p_i = \frac{f_i}{\sum_{j=1}^{N} f_j}$$
where $f_i$ indicates the fitness degree of the $i$-th individual in the population and $N$ is the population size. Afterwards, the elite descendants with the best fitness degrees are passed directly from the new generation to the next iteration. The chosen elite offspring do not participate in the crossover and mutation processes in the current iteration. A new parent generation is then selected from the rest of the population using roulette wheel selection.
• Crossover: This is the process of reproducing child chromosomes to explore the search space, in which two parent chromosomes exchange genes to share information.
• Mutation: This refers to sudden changes in genes to exploit the search space, in which some genes of the selected chromosome are mutated according to the mutation probability. It is essential to note that genes mutate randomly.
Subsequently, the reproduced and mutated chromosomes continue to the next generation. Figure 1 illustrates the genetic algorithm process.
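The selection step, with elitism followed by roulette wheel selection, might be sketched as follows. The sketch assumes that a higher fitness is better and that two elite chromosomes are kept; the function names and the `rng` parameter are illustrative.

```python
import random


def selection_probabilities(fitness):
    """Roulette wheel probability: each fitness divided by the total."""
    total = sum(fitness)
    return [f / total for f in fitness]


def select_parents(population, fitness, n_elite=2, rng=None):
    """Keep the n_elite fittest; draw the other parents by roulette wheel."""
    rng = rng or random.Random()
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    elite = [population[i] for i in order[:n_elite]]
    rest = order[n_elite:]
    probs = selection_probabilities([fitness[i] for i in rest])
    chosen = rng.choices(rest, weights=probs, k=len(rest))
    parents = [population[i] for i in chosen]
    return elite, parents
```

The elite chromosomes are returned separately so that they can bypass crossover and mutation, as the selection description requires.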

3-2-Particle Swarm Optimization
Particle Swarm Optimization is an optimization method inspired by the behaviour of a flock of birds and used to find solutions to nonlinear problems. The individuals in a group are referred to as particles, and each particle influences the group, or swarm, in such a way that when one of them discovers the fastest or optimum route to food, the other individuals in the same group follow it [39].
It is important to note that these particles experience two types of learning. When a particle learns from its own experience, it is called cognitive learning; when it learns from the experiences of other particles, it is called social learning. The best solution found through cognitive learning is denoted $P_{best}$, while the best solution found through social learning is denoted $G_{best}$ [40]. The particle's movement to a new point is then calculated from its current position and velocity together with $P_{best}$ and $G_{best}$.
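A single PSO update can be sketched with the standard textbook velocity and position equations. The inertia weight `w` and the acceleration coefficients `c1` and `c2` below are assumed default values, not parameters reported in the paper, and the function name is hypothetical.

```python
import random


def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One update of every particle: new velocity first, then new position."""
    rng = rng or random.Random()
    new_positions, new_velocities = [], []
    for x, v, p in zip(positions, velocities, pbest):
        next_v, next_x = [], []
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            vel = (w * v[d]
                   + c1 * r1 * (p[d] - x[d])       # cognitive term (own best)
                   + c2 * r2 * (gbest[d] - x[d]))  # social term (swarm best)
            next_v.append(vel)
            next_x.append(x[d] + vel)
        new_velocities.append(next_v)
        new_positions.append(next_x)
    return new_positions, new_velocities
```

In an interval-optimization setting, each position vector would hold the candidate interval boundaries, and the personal and global bests would be refreshed after each step by re-evaluating the forecasting error.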

4-1-Mean Absolute Percentage Error
The accuracy of the forecasting model needs to be evaluated. One measure of model accuracy is the Mean Absolute Percentage Error (MAPE), which quantifies the error generated by the model. Forecasting results are considered excellent when the MAPE value is less than 10% [41,42].
$$MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{Y_t - F_t}{Y_t} \right| \times 100\%$$
where $n$ is the number of data, $Y_t$ is the original data at time $t$, and $F_t$ is the forecast value at time $t$.
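The MAPE calculation is short enough to write out directly. The sketch assumes the actual values are non-zero; the function name is illustrative.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values non-zero)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(actual)
```

For instance, forecasts of 110 and 180 against actual values of 100 and 200 are each off by 10%, so the MAPE is 10%.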

4-2-Root Mean Square Error
Root Mean Square Error (RMSE) is a measure used to evaluate forecasting models. Its value is obtained by squaring the differences between the actual data and the predicted results, averaging them over the number of data, and taking the square root. The closer the RMSE value is to zero, the more accurate the forecasting results [42,43].
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$$
where $n$ is the number of data and $e_i$ is the difference between the $i$-th actual data and the $i$-th forecast value.
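The same definition in Python, as a minimal sketch with an illustrative function name:

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(actual))
```

Unlike MAPE, the RMSE is expressed in the units of the data rather than as a percentage.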

5-1-MWFTS-GA Model Algorithm
The MWFTS-GA forecasting calculation algorithm (Figure 2) is as follows:
• Pre-process the data. Import the data, fix missing data, handle outliers, and visualize the data to be used.
• Determine the universal set using the Fuzzy K-Medoids partition method optimized by the Genetic Algorithm.
-Stage 1. Determine the desired number of clusters, the population size, Pm, Pc, and the maximum number of iterations.
-Stage 2. Form the initial partition matrix and the initial population.
-Stage 3. Calculate the Euclidean distance between each data value and the population centroids.
-Stage 4. Calculate the fitness value of each individual in the population with Equation 25.
-Stage 8. Update the population and update the partition matrix U using Equation 3.
-Stage 9. Repeat stages 2 to 8 while the stopping criterion or the maximum number of iterations has not been met.
-Stage 10. Find the data index with the highest value in the partition matrix U.

5-2-MWFTS-PSO Model Algorithm
The following is the algorithm of the MWFTS-PSO forecasting model (Figure 3):
• Pre-process the data. This stage fixes blank data and performs an initial visualization.
-Stage 4. Calculate the distance from each data point to the cluster centre using Equation 1.
-Stage 5. Determine the cluster based on the largest membership degree value.
-Stage 6. Update the partition matrix using Equation 3.
-Stage 7. Calculate the objective function using Equation 2.
-Stage 8. Repeat stages 3 to 7 until the maximum number of iterations is reached or the tolerance value is below the error limit.
• Group the data based on the clusters formed.

6-Result and Discussion
The data used in this research are air quality data for Anand Vihar, India, recorded from January 1 to April 30, 2021. The dataset was obtained from the Air Quality Historical Data Platform website, www.aqicn.org. Table 1 displays the dataset, which consists of the following variables: date is the date on which the values were recorded; pm25 and pm10 represent the levels of particles measuring 2.5 microns and 10 microns, respectively; O3 indicates the ozone level; NO2 indicates the nitrogen dioxide level; SO2 indicates the sulfur dioxide level; and CO indicates the carbon monoxide level. The initial stage of processing the dataset uses Microsoft Excel to fill in the blank values and determine the maximum value for each recording date, which is used for prediction. Table 2 shows the maximum values obtained, while Figure 4 shows a graph of the maximum values used.

6-1-MWFTS-GA
The first step is to form the partition intervals used in the fuzzification process, using the fuzzy k-medoids clustering algorithm optimized by the genetic algorithm. Table 3 shows the results of the data normalization used for the calculations. The second step is to form the initial partition matrix, which is generated randomly, to obtain the distance between the data and the chromosomes in the initial population.
The third step is the initialization of the initial population and its fitness. The genetic algorithm is used to generate the initial cluster centres, with stages including initialization of the population, calculation of the fitness value of each chromosome, crossover, mutation, and continued iteration. In this research, the parameters are: number of clusters = 5, initial population = 20, Pc = 0.7, Pm = 0.15, and maximum iterations = 500. The fitness value of each chromosome is calculated using equation (25), while the objective function used in calculating the fitness value is given in equation (26). After obtaining the population and its fitness values, the chromosomes are sorted by fitness value, highest first. The results of the initial population formation and population fitness are presented in Table 4. The fourth step is selection, in which the operator retains the two chromosomes with the highest fitness because they have the best adaptability. A new parent generation is then selected from the remaining chromosomes using roulette wheel selection. The selection probability for each individual is shown in Table 5. The fifth step is crossover, in which a random value is generated for each gene of the odd-numbered chromosomes and compared with the crossover probability (Pc). The gene is exchanged with another one when the random value is less than Pc. Table 6 shows the results of the crossover process. The sixth step is mutation, which is applied to only one chromosome. At the mutation stage, a random value is generated for each gene of the chromosome, and the gene is changed when this value is less than the mutation probability (Pm). Table 7 shows the results of the mutation process.
After the mutation results are obtained, the two chromosomes retained in the selection stage are combined with them. This gives a new population, which is used to update the Euclidean distances and the partition matrix to obtain the clustering results. The cluster centre of each cluster is selected as the data point with the highest membership value in that cluster. The second to sixth steps are repeated until the desired maximum number of iterations is reached.
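The per-gene crossover and mutation described in the fifth and sixth steps might look as follows. The sketch uses the reported probabilities (Pc = 0.7, Pm = 0.15) as defaults; how a mutated gene receives its new value is not specified in the text, so drawing it uniformly from [low, high] is an assumption, as are the function names.

```python
import random


def crossover(parent_a, parent_b, pc=0.7, rng=None):
    """Swap each gene between the two parents when a random draw is below pc."""
    rng = rng or random.Random()
    child_a, child_b = list(parent_a), list(parent_b)
    for g in range(len(child_a)):
        if rng.random() < pc:
            child_a[g], child_b[g] = child_b[g], child_a[g]
    return child_a, child_b


def mutate(chromosome, pm=0.15, low=0.0, high=1.0, rng=None):
    """Replace a gene when a random draw is below pm (assumed uniform redraw)."""
    rng = rng or random.Random()
    return [rng.uniform(low, high) if rng.random() < pm else gene
            for gene in chromosome]
```

With normalized data, the default range [0, 1] keeps mutated genes inside the data range; a different range would be needed for unnormalized values.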
In this research, the termination criterion is 500 iterations. After this number of iterations, the best solution, with a fitness of 0.0713, was obtained, as shown in Table 8. The results of data clustering are shown in Table 9. The cluster labels (0, 1, 2, 3, and 4) are not the final labels; the final labelling must be assigned after the cluster centre values have been obtained. Table 10 shows the cluster centres and the labelling based on their values, where the labelling is performed sequentially to help calculate the lower and upper bounds. Table 11 shows the lower bound, upper bound, and middle value of each interval, which are used in the fuzzification to obtain forecasting results. The data are fuzzified based on the intervals obtained to get the forecast values. Table 12 shows the forecasting results. The MWFTS using the Fuzzy K-Medoids partition method yields a MAPE of 17.440% and an RMSE of 48.179. Meanwhile, the MWFTS-GA method yields a MAPE of 16.85% and an RMSE of 47.701. Figure 5 compares the forecasting results with the actual data.

6-2-MWFTS-PSO
MWFTS forecasting with PSO optimization was found to have a simpler algorithm than with GA optimization. The first step of this method is the formation of clusters using Fuzzy K-Medoids (FKM). Table 13 shows the clustering results of the FKM method, which are then grouped by cluster to determine the lower and upper bound values. Tables 14 to 18 show the distribution of data across the clusters. The cluster labels (0, 1, 2, 3, and 4) are not the final labels; the final labelling must be assigned after the cluster centre values have been obtained. Each cluster centre is obtained by selecting the data point with the highest membership degree in the cluster. Table 19 shows the labelling based on the cluster centre values. This labelling is performed sequentially to help calculate the lower and upper bounds. Table 20 shows the calculated lower bounds, upper bounds, and middle values, which are used in the fuzzification to obtain forecasting results. This table is also taken as particle 1 in the PSO optimization. The PSO algorithm begins by generating random particles. Table 21 shows four random particles generated to help improve accuracy; their values lie within the interval range given by the lower and upper bounds of the clustering results. The data are fuzzified using the intervals of each particle to obtain the forecast values, as shown in Table 22. Table 23 shows the MAPE and RMSE values of each particle in the initial iteration. Particle 1, which corresponds to the original FKM intervals, has a MAPE of 9.779% and an RMSE of 30.638. Iterations then continue until the specified maximum number of iterations is reached. According to Figure 6, the forecast values of the prediction model do not differ greatly from the actual data.

7-Conclusion
Air pollution is one of the causes of deterioration that governments and scientists are examining. The growing amount of industrial land increases the likelihood of air pollution. Consequently, this research is expected to yield information on air quality forecasting methods and on optimizing the accuracy of the forecasting model. This study proposes an optimization technique applied to the MWFTS forecasting model to predict the air quality in the Anand Vihar region of India. Fuzzy K-Medoids is the partitioning method employed because it is more robust to data with outliers. Although some research employs the MWFTS with Fuzzy K-Medoids as the partitioning method, optimizing the MWFTS is still uncommon. Therefore, using GA and PSO optimization is expected to provide insight into which optimization method is superior. The MWFTS-GA method reduces the MAPE from 17.44% to 16.85% and the RMSE from 48.18 to 47.70. The MWFTS-PSO method reduces the MAPE from 9.78% to 7.58% and the RMSE from 30.638 to 24.863, indicating that MWFTS-PSO provides greater accuracy.
The partitioning and optimization methods should be developed further in future research. The partitioning could employ clustering models such as Fuzzy C-Means and DBSCAN to produce more accurate interval values, while the optimization could use techniques such as Bee Colony, Neural Network, and Cat-Mouse Based Optimizer. In addition, the PSO method used in this study is the standard one, which leaves room for developing it into an adaptive PSO or a hybrid of PSO with other methods.

8-1-Author Contributions
Conceptualization, S.G.Y.; methodology, S.G.Y.; programming, A.N. and N.S.A.; programming validation, G.K.W.; writing (draft preparation), T.T.W.; draft review, A.B.S. All authors have read and agreed to the published version of the manuscript.

8-2-Data Availability Statement
Data sharing is not applicable to this article.

8-3-Funding
The Article Processing Cost (APC) will be supported by INTI International University, Nilai, Malaysia.

8-4-Acknowledgements
The authors gratefully acknowledge FAST Ahmad Dahlan University Indonesia, especially Mathematic Laboratory, and Faculty of Data Science and Information Technology, INTI International University, Nilai Malaysia for supporting this work.

8-7-Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.