Implementation of Takagi Sugeno Kang Fuzzy with Rough Set Theory and Mini-Batch Gradient Descent Uniform Regularization

The Takagi Sugeno Kang (TSK) fuzzy approach is popular because its output is either a constant or a function of the inputs. Building a TSK fuzzy system requires two key steps: structure identification and parameter identification. The number of input dimensions affects the number of rules produced: more dimensions typically yield more rules, which increases rule complexity. This issue can be addressed with a dimension reduction technique. The resulting rules are then optimized with Mini-Batch Gradient Descent (MBGD) modified with uniform regularization (UR), which can enhance the generalization performance of the fuzzy TSK classifier. This study examines how the rough set method can be used to reduce data dimensions and how Mini-Batch Gradient Descent with Uniform Regularization (MBGD-UR) can optimize the rules produced by TSK. Body fat data from 252 respondents were used as input, and the mean absolute percentage error (MAPE) was used to evaluate the results. Data processing was carried out in Python using Jupyter Notebook. The analysis yielded a MAPE of 37%, which falls into the reasonable category.


1-Introduction
Data sets vary in dimensionality, and high-dimensional data tend to be more complex, which affects the number of rules produced when the data are used in a fuzzy inference system. Using high-dimensional data as input typically results in more rules. The Takagi Sugeno Kang (TSK) fuzzy system used in one study [1] was found to have limitations when the input dimension was large. TSK, first developed by Takagi and Sugeno in 1985 [2], is a form of fuzzy inference that is frequently employed for prediction or classification [3][4][5]. Shaheen et al. [6] used TSK to implement the AP-TSK-PID method for dealing with stochastic and non-stochastic uncertainties in nonlinear dynamic systems. In medicine, Du et al. [7] used it to forecast how well hemodialysis patients respond to their treatment, and Pan et al. [8] used it to address the difficulty of controlling blood glucose levels in type 1 diabetes.
To achieve improved running time, TSK fuzzy can be tuned using Gradient Descent (GD) [9]. GD is an optimization technique commonly used in machine learning to minimize the cost function, and it works by iteratively updating each parameter based on the previous step [10,11]. GD can be divided into three types: batch, stochastic, and mini-batch. Several previous studies employed the batch and stochastic types for classification [12][13][14][15][16]. The Mini-Batch Gradient Descent (MBGD) method was adopted in this work because it tends to have a lower computational load and faster convergence, since only the data from one batch are used in each iteration [17,18]. MBGD has been the subject of numerous studies: Gou and Yu [19] used it to train an ANN equalizer effectively, Messaoud et al. [20] to maximize the IoT 4.0 market, and Hu et al. [21] combined it with MMLDA to predict lncRNA-disease associations. Additionally, a regularization strategy was employed to prevent overfitting and improve generalization, because it keeps the coefficients from being fitted too closely to the training sample data [22]. Regularization is described by Kukačka et al. [23] as "any change made to a learning system intended to reduce generalization mistakes and not training errors." It is also considered necessary to stabilize numerical calculations [24].
The rough set is an extension of set theory that Pawlak first proposed in 1982. A rough set is a subset of the universe that is characterized by two ordinary sets known as the lower and upper approximations. Equivalence relations, that is, relations that are reflexive, symmetric, and transitive, are the primary component of the rough set model [25]. Wang et al. [26] applied rough sets to feature selection with a genetic algorithm and observed that the selected features produced good results. The technique was also employed for decision making in the study by Zhan et al. [27], and Jothi et al. [28] used it to classify leukemia by identifying prominent features.
This study builds on earlier work by Cui et al. [1] on the fuzzy TSK system with optimization based on Mini-Batch Gradient Descent (MBGD) for classification problems. To enhance the generalization performance of TSK fuzzy classification and prevent overfitting, uniform regularization (UR) is used, as in Cui et al. [1]. In addition, this study proposes applying the rough set method. The primary contributions of this study are as follows:
1. The rough set technique is used to reduce the dimensions of the data. The experimental findings demonstrate that the rough set improves the rules produced by TSK fuzzy classification.
2. Using the rough set and UR approach, the resulting model falls into the reasonable category.

2-1-Rough Set
The rough set is one of the dimension reduction techniques developed by Pawlak in 1982; its principle rests on an equivalence relation, that is, a relation that is reflexive, symmetric, and transitive [25]. In the rough set, an information system can be represented as a table $S = (U, A)$, where $U$ is a non-empty finite set of objects and $A$ is a non-empty finite set of attributes [29]. If the classification output is added to the information table, it becomes a decision system table denoted by $S = (U, A \cup \{d\})$, where $d \notin A$ is the decision attribute. The indiscernibility relation groups objects that cannot be distinguished from one another because they have the same values for the condition attributes considered. This can also occur in a decision system. Suppose $S = (U, A)$ is an information system and $B \subseteq A$. The indiscernibility relation of objects according to the attribute set $B$, denoted by $IND_S(B)$, can be defined as follows:

$$IND_S(B) = \{(x, x') \in U^2 \mid \forall a \in B,\ a(x) = a(x')\}$$

Attributes in the rough set can be removed without losing information by using the core and reducts. A reduct is a minimal set of attributes that produces the same classification as the full attribute set. Attributes that do not belong to any reduct are not useful in the classification process [29]. The core is the intersection of all reducts, so every core attribute is included in every reduct. Suppose $B \subseteq A$; the core of $B$ is the set of all indispensable attributes of $B$ and can be defined as follows [30]:

$$\mathrm{Core}(B) = \bigcap \mathrm{Red}(B)$$

where $\mathrm{Red}(B)$ is the set of all reducts of $B$.
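The indiscernibility relation, reducts, and core can be illustrated with a small sketch. The decision table below is a hypothetical toy example (it is not the body fat data), the function names are illustrative, and the brute-force search assumes a consistent decision table with only a handful of attributes; it is not the procedure used in this study.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group object indices into indiscernibility classes over `attrs`."""
    classes = {}
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(idx)
    return sorted(classes.values())

def is_consistent(rows, attrs, decision):
    """True if objects indiscernible on `attrs` always share one decision value."""
    return all(
        len({rows[i][decision] for i in block}) == 1
        for block in partition(rows, attrs)
    )

def reducts(rows, conditions, decision):
    """Brute-force search for minimal attribute subsets that preserve the decision."""
    found = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if is_consistent(rows, subset, decision):
                found.append(subset)
    return found

def core(all_reducts):
    """Core = intersection of all reducts."""
    sets = [set(r) for r in all_reducts]
    return set.intersection(*sets) if sets else set()

# Hypothetical, already discretized decision table (assumption for illustration).
table = [
    {"weight": "low",  "abdomen": "low",  "chest": "low",  "fat": "low"},
    {"weight": "low",  "abdomen": "high", "chest": "high", "fat": "high"},
    {"weight": "high", "abdomen": "low",  "chest": "low",  "fat": "high"},
    {"weight": "high", "abdomen": "high", "chest": "high", "fat": "low"},
]
conditions = ["weight", "abdomen", "chest"]

found_reducts = reducts(table, conditions, "fat")
print(partition(table, ["abdomen"]))  # indiscernibility classes on one attribute
print(found_reducts)                  # minimal decision-preserving subsets
print(core(found_reducts))            # attributes shared by every reduct
```

In this toy table, "abdomen" and "chest" carry the same information, so either can be dropped but not both, while "weight" appears in every reduct and therefore forms the core.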

2-2-Fuzzy Set
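Definition 1: Let $X$ represent the universe of discourse, $x$ a member of $X$, and $A$ a fuzzy set in $X$. A fuzzy set $A$ is characterized by a membership function $\mu_A(x)$:

$$\mu_A : X \rightarrow [0, 1]$$

Definition 2: The fuzzy set $A$ in universe $X$ can be defined as a set of ordered pairs as indicated in the following equation:

$$A = \{(x, \mu_A(x)) \mid x \in X\}$$

where $\mu_A(x)$ is the membership degree of $x$ in the fuzzy set $A$, which lies in the interval [0,1] [31].
Definition 3: The rules of a TSK system are defined as follows [32]:

$$R_i:\ \text{IF } x_1 \text{ is } A_{i1} \text{ AND } \dots \text{ AND } x_n \text{ is } A_{in} \text{ THEN } z_i = p_{i0} + p_{i1} x_1 + \dots + p_{in} x_n \qquad (5)$$

where $A_{ij}$ is the fuzzy set of the j-th input in the i-th rule and $p_{i0}, \dots, p_{in}$ are the consequent constants. Defuzzification converts fuzzy numbers into crisp numbers. The defuzzification value ($z^*$) was calculated using the following equation:

$$z^* = \frac{\sum_{i} \alpha_i z_i}{\sum_{i} \alpha_i} \qquad (6)$$

where $\alpha_i$ is the alpha predicate (firing level) of the i-th rule and $z_i$ is the output value of the i-th rule.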
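A minimal sketch of TSK inference with weighted-average defuzzification is given below. The source does not state the membership function shape or the AND operator, so Gaussian memberships and the product operator are assumptions here, and the function names, array shapes, and numeric values are hypothetical.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree in [0, 1] (assumed membership shape)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def tsk_predict(x, centers, sigmas, coeffs):
    """
    x:       (d,) crisp input vector
    centers: (R, d) membership centers, one row per rule
    sigmas:  (R, d) membership widths, one row per rule
    coeffs:  (R, d + 1) consequent constants, bias term first
    """
    memberships = gaussian_mf(x, centers, sigmas)  # fuzzification, shape (R, d)
    alpha = memberships.prod(axis=1)               # firing level per rule (product AND, assumed)
    z = coeffs[:, 0] + coeffs[:, 1:] @ x           # first-order consequent per rule
    return float((alpha * z).sum() / alpha.sum())  # weighted-average defuzzification

# Two hypothetical rules on a single input; x = 1 lies midway between the centers,
# so both rules fire equally and the output is the mean of the two consequents.
centers = np.array([[0.0], [2.0]])
sigmas = np.array([[1.0], [1.0]])
coeffs = np.array([[1.0, 0.0], [3.0, 0.0]])
print(tsk_predict(np.array([1.0]), centers, sigmas, coeffs))
```

With zero slopes in `coeffs`, each rule reduces to a constant consequent, which corresponds to the zero-order form of TSK.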

2-4-Mini Batch Gradient Descent (MBGD)
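MBGD is a Gradient Descent (GD) method that updates the parameters using a mini-batch of the training data in each iteration. The parameter update can be defined as follows [17]:

$$\theta_{t+1} = \theta_t - \eta \nabla_{\theta} J(\theta_t; B_t)$$

where $\theta_t$ denotes the parameters at iteration $t$, $J(\theta_t; B_t)$ is the cost function evaluated on the mini-batch $B_t$, and $\eta > 0$ is the learning rate (step size) [33].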
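The mini-batch update can be sketched on a simple linear model. This is an illustration of the update rule only, not the TSK training procedure of this study; the squared-error cost, the function name `mbgd_linear`, the hyperparameter values, and the synthetic data are all assumptions.

```python
import numpy as np

def mbgd_linear(X, y, lr=0.5, batch_size=4, epochs=500, seed=0):
    """Fit y ~ theta[0] + X @ theta[1:] with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a bias column
    theta = np.zeros(d + 1)
    for _ in range(epochs):
        order = rng.permutation(n)         # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = Xb[idx] @ theta - y[idx]
            grad = Xb[idx].T @ residual / len(idx)  # gradient of half the batch MSE
            theta -= lr * grad             # step against the mini-batch gradient
    return theta

# Synthetic noiseless data (assumption): y = 2 + 3x on 20 points in [0, 1].
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 + 3.0 * X[:, 0]
theta = mbgd_linear(X, y)
print(theta)  # should approach the generating coefficients
```

Only `batch_size` samples enter each gradient computation, which is the source of the lower per-iteration cost compared with batch GD.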

2-5-Uniform Regularization (UR)
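UR is a regularization method that forces the rules to have similar average firing levels by minimizing a regularization loss [1]. It can be calculated as follows:

$$\ell_{UR} = \sum_{r=1}^{R} \left( \frac{1}{N} \sum_{n=1}^{N} \bar{f}_r(x_n) - \tau \right)^2$$

where $R$ is the number of rules, $\bar{f}_r(x_n)$ is the normalized firing level of the r-th rule for sample $x_n$, $N$ is the number of training samples, and $\tau$ is the expected firing level of each rule. Furthermore, $\ell_{UR}$ is added to the loss function in MBGD-based TSK classification training using a mini-batch with $N$ training samples, and this is represented as follows:

$$L = \ell + \lambda \ell_{UR}$$

where $\ell$ is the original training loss and $\lambda \geq 0$ is the weight of the UR term.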

2-6-Mean Absolute Percentage Error (MAPE)
MAPE is commonly used to evaluate a model, and its value can be determined using the following equation [34]:

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - y'_i}{y_i} \right| \times 100\%$$

where $y_i$ is the i-th actual value, $y'_i$ is the i-th forecast value, and $n$ is the total number of data points. The prediction criteria for MAPE as indicated by Rohmah et al. [34] are as follows (Table 1):
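As a brief illustration, MAPE can be computed as in the sketch below. The function name and the sample values are hypothetical; the explicit check for zero actual values is an added safeguard, since the percentage error is undefined when an actual value is zero.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Hypothetical values: each forecast is off by 10% of its actual value.
print(mape([10.0, 20.0], [9.0, 22.0]))
```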

2-7-Flowchart
The flowchart in this study can be seen in Figure 1.

Figure 1. Flowchart Rough Set TSK-MBGDUR
As shown in Figure 1, the data are first processed with the rough set method. The rough set results are then converted into fuzzy numbers through fuzzification. Once fuzzification is complete, the basic IF-THEN rules are created, where IF is the antecedent and THEN is the consequent. The established rules are then optimized using Mini-Batch Gradient Descent with Uniform Regularization (MBGD-UR). The final stage is defuzzification, which converts the fuzzy output back into a crisp value.

3-Results
The body fat data used in this study were obtained from Kaggle (https://www.kaggle.com/datasets/fedesoriano/body-fat-prediction-dataset). The data set consists of records for 252 respondents with 14 independent variables and 1 dependent variable, as indicated in Table 2. The dimensions of the data set were reduced using a rough set, and the results are presented in Table 3. The data in Table 3 were further converted into fuzzy numbers, and the results are presented in Table 4. The input in Table 4 was used to obtain the following rules: The equations in each rule were later determined using MBGD-UR, and the results are indicated as follows: Defuzzification was conducted on the rules obtained, and the results are indicated in Table 5.

4-Discussion
This study used 252 records with 14 independent variables and 1 dependent variable, and the results of the dimension reduction using the rough set method are presented in Table 3. The 14 variables were reduced to 4 variables, namely weight ($x_1$), height ($x_2$), density ($x_3$), and body fat ($y$).
The rough set results were subsequently used as input to TSK, with each variable subjected to a fuzzification process that converts the data to fuzzy numbers using membership functions. This led to the generation of 7 rules in IF-THEN form, as in Equation 5. Moreover, the consequents of each rule were optimized using MBGD-UR, and the constants generated for each rule were arranged into Equation 8.
The defuzzification process was then used to obtain output in the form of crisp numbers. The defuzzification value was determined by multiplying the $z$ value by the alpha predicate in each rule, summing over the rules, and dividing by the total alpha predicate. The defuzzification value ($y'$) was calculated using Equation 6, and the results are shown in Table 5.
Figure 2 presents the results in Table 5 graphically, comparing the predicted (defuzzified) data with the actual data. In addition, the MAPE value was calculated to determine the accuracy of the model, and a value of 37% was obtained, which belongs to the reasonable category as shown in Table 1.

5-Conclusion
In this study, the rough set method is employed to reduce high-dimensional data to a smaller number of dimensions. The rules are then optimized in the fuzzy TSK classification process using MBGD modified with UR to obtain better results. MBGD updates the parameters using mini-batches of the data, while UR pushes the rules toward similar firing levels by minimizing the regularization loss. MAPE (Mean Absolute Percentage Error), which measures how large the forecast error is in comparison to the actual value of the series, is used to assess the accuracy of the forecast. The MAPE value is recorded at 37%, indicating that the model falls into the reasonable category. Further study is advised to compare other reduction techniques, such as a decision tree, to evaluate the model with Raw Classification Accuracy (RCA), and to determine the minimum number of variables for dimension reduction based on the analysis findings and conclusions.

6-2-Data Availability Statement
The data presented in this study are available on request from the corresponding author.

6-3-Funding
The authors received no financial support for the research, authorship, and/or publication of this article.

6-6-Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.