Evolving Genetic Programming Tree Models for Predicting the Mechanical Properties of Green Fibers for Better Biocomposite Materials

Advances in modern technology and the industrial sustainability theme have contributed to the implementation of composite materials in various industrial applications. Green composites are among the desired alternatives for green products. However, to properly control the performance of green composites, predicting the properties of their constituents is of paramount importance. This work presents innovative evolving genetic programming tree models for predicting the mechanical properties of natural fibers based upon several inherent chemical and physical properties. The cellulose, hemicellulose, lignin, and moisture contents, as well as the Microfibrillar angle, of various natural fibers were considered to establish the prediction models. A one-hold-out methodology was applied for the training/testing phases. Robust models were developed to predict the tensile strength, Young's modulus, and elongation at break of the natural fibers. It was revealed that the Microfibrillar angle was dominant in determining the ultimate tensile strength of the natural fibers, with a relative impact of 44.7% compared with the other considered properties, while the impact of cellulose content in the model was only 35.6%. This, in turn, would facilitate utilizing artificial intelligence in predicting the overall mechanical properties of natural fibers without experimental effort and cost, supporting the development of better green composite materials for various industrial applications.


1-Introduction
Modern green products and industrial sustainability have driven much of the effort in developing proper composite technology. The requirement for robust, stronger, and lighter constructions has offered a chance for composite materials to outperform other regularly utilized materials [1,2]. Manufacturing and maintenance expenses for biomaterials have been reduced because of the development of new and improved fabrication techniques in this sector [3,4]. In addition, composites have had the largest influence on athletic equipment and sustainability [5,6]. Moreover, bio-composites that utilize green lignocellulosic fibers offer more benefits than synthetic materials because of their desirable environmental performance, the negative impact of plastics on the environment, the unavailability of landfill space, and the depletion of petroleum resources [7][8][9]. Therefore, the appropriate deployment of accessible natural resources as well as wastes has become vital for industrial sustainability, green products, and environmental issues [10,11].
Nevertheless, the overall performance of bio-composites strongly depends upon the mechanical performance of their constituents as well as their compatibility [12,13]. Green products using natural fiber-reinforced polymeric composite materials are rapidly displacing traditional materials; for example, automotive applications, packaging, and a variety of sports goods are now virtually entirely made of innovative composites [14,15]. Without question, the utilization of green composite materials will increase in the future as a requirement of modern societies as well as the green production theme. However, the characteristics of the green product strongly depend upon the constituents' individual characteristics as well as their compatibility [16,17]. The chemical composition of the fibers controls their interactions with the polymer matrices and substantially affects the composite properties. The chemical composition of commonly used natural fibers and their mechanical and physical properties are listed in Table 2 [18][19][20][21][22][23].
Cellulose, hemicellulose, lignin, and moisture are the main chemical constituents of natural fibers. However, their ratios differ, even within the same plant at various positions [24,25]. These ratios, as well as the Microfibrillar angle, highly influence the fiber mechanical performance and thus affect the overall green composite properties [26,27]. As the strength of the fiber is the most important feature to be considered in green composites, the selection of an appropriate fiber type is vital in such materials [28,29].
In order to select the appropriate green fiber type, suitable predictions of its mechanical performance are important [30,31]. This type of computational prediction problem can be approached using various advanced computational methods, including genetic programming (GP). GP is an evolutionary algorithm inspired by the Darwinian principles of evolution and natural selection. It was originally proposed as an extension of the genetic algorithm (GA) [32,33]. The primary objective was to solve problems automatically, starting from only abstract, general information about the solutions. Since its proposal, it has been applied to various applications, including image processing, industrial modeling and control, finance, and bioinformatics [34,35]. However, it has had particular success and wide adoption in fitting regression models, where there is often a weak understanding of the relationships among the features of the respective problem [36,37]. The GP algorithm shares many similarities with the GA, which consists of a population of solutions that evolve iteratively through genetic operators. The critical difference is the representation of the individuals (solutions) as tree structures rather than strings of bits. Unlike the GA, which uses fixed-length individuals, the GP algorithm represents the individuals using hierarchical, variable-length solutions that are more capable of modeling the tasks of a computer program [38]. The GP algorithm creates a population of computer programs (tree-based individuals) that evolve over generations using genetic operators (i.e., selection, subtree crossover, and subtree mutation), which makes it suitable for various engineering applications.
Consequently, this work introduces novel, evolving GP tree models for predicting the mechanical properties of green fibers based upon several intrinsic chemical and physical properties. This aim was set because no previous works were found in the literature that predict the mechanical properties of natural fibers from their inherent properties using genetic programming tree methods. Here, the cellulose, hemicellulose, lignin, and moisture contents, together with the physical Microfibrillar angle, of various natural fibers are investigated and utilized in constructing the prediction models. The impact of each chemical and physical property in predicting the mechanical performance of natural fibers was determined using genetic programming, applied for the first time as one of the artificial intelligence methods in this field. A one-hold-out methodology is applied for training/testing in the developed prediction models. The one-hold-out procedure was repeated ten times to enhance the reliability of the established models in demonstrating the impact of the fiber chemical and physical properties on the tensile strength, tensile modulus, and elongation at break. This would facilitate predicting the overall mechanical properties of natural fibers without costly experimental effort, supporting their proper selection for green composite materials and the development of more sustainable green products.

2-1-Data Collection
In order to establish proper evolving GP tree models for predicting the mechanical characteristics of green fibers, several intrinsic chemical and physical properties of cellulosic fibers were collected from reliable experimental works found in the literature. These include the cellulose, hemicellulose, lignin, and moisture contents and the Microfibrillar angle of natural fibers worldwide. To enhance the reliability of the work, data were collected only from peer-reviewed journals indexed in Scopus and Clarivate Analytics. It is worth noting that the reported values varied for almost all the considered fibers due to variations in fiber type, age, place, climate, soil, and fertilizers. This demonstrates the complexity of performing such experimental work for most of the available natural fibers and reveals the importance of establishing prediction models to determine the most important factors that affect the overall mechanical performance of such green fibers.

2-2-GP Procedure
Conceptually, the GP algorithm shares the same procedural structure as other algorithms in the field of evolutionary computation. Figure 1 illustrates the abstract flow of the GP algorithm. Initially, any evolutionary algorithm has a population of individuals (i.e., solutions) that evolve over generations to converge on optimal regions and find the best solutions. The evolutionary processes involve the probabilistic selection of the parent solutions, reproduction (crossover), mutation, and elitism. However, the GP algorithm has a population of computer programs, which are represented by tree structures and evolve using subtree crossover and subtree mutation operations. The following steps discuss the GP algorithm and its genetic operators more closely.
• Selection: this operation decides which individuals will be transferred to the next iteration to generate a new population of individuals. Broadly, the selection process might be random-based (e.g., tournament selection) or ranking-based, depending on the fitness scores of individuals. This step is critical, since it influences the diversity of the next generation, which considerably affects the overall performance of the algorithm.
• Crossover: uses the selected parents to produce new variants of individuals by exchanging their genetic material. The GP crossover is the subtree crossover, which has a randomly selected crossover point (node) in each parent individual. Although crossover points are selected randomly, they are preferred to be far from the root or the leaves to avoid increasing the complexity of the new offspring. In the one-point crossover, the parents swap their subtrees at the crossover points to produce new individuals, as illustrated in Figure 2(c).
• Mutation (subtree mutation): this acts on a single parent individual only, where the subtree at a randomly selected mutation point is replaced by another random subtree. This is demonstrated by subfigures (a) and (b) in Figure 2, where (b) is a new mutated parent individual.
• Elitism: this operation preserves the best-found individual(s) at each generation.
• Fitness Assessment (Objective Function): every solution (individual or tree) of the population is assessed and assigned a score that indicates its capability to address the targeted problem. This score is known as the fitness value of an individual and refers to the optimization objective of the problem. For example, in prediction problems where the aim is to minimize the error, the mean absolute error (MAE), the root mean square error (RMSE), or the ratio of error can be used as the objective function.
• Termination: the evolutionary process is iterative. The algorithm loops over the evolutionary operations, from selection to reproduction (crossover and mutation) and then fitness evaluation, over the course of generations until an optimal solution is reached or a termination condition is satisfied. Often, the maximum number of iterations is used as the criterion for stopping the optimization procedure.
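The operators listed above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the nested-tuple tree encoding, the function set {+, -, *}, the variable names, and all parameter values (population size, tournament size of 3, mutation rate of 0.15, depth limit) are assumptions chosen for illustration only. Crossover and mutation points are drawn uniformly here, without the preference for points far from the root or leaves mentioned above.

```python
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
VARS = ["C", "H", "L", "Mc", "Ma"]  # hypothetical names for the five fiber inputs


def random_tree(rng, depth=3):
    """Grow a random tree: (op, left, right), a variable name, or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARS) if rng.random() < 0.7 else round(rng.uniform(-1, 1), 2)
    return (rng.choice(list(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, row):
    """Evaluate a tree on one instance given as a dict of variable values."""
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))
    return float(row[tree]) if isinstance(tree, str) else tree


def depth(tree):
    """Number of edges on the longest root-to-leaf path."""
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def paths(tree, prefix=()):
    """List the path (sequence of child indices) to every node of the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found += paths(tree[i], prefix + (i,))
    return found


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)


def crossover(rng, a, b):
    """Subtree crossover: swap randomly chosen subtrees between two parents."""
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    return (replace_subtree(a, pa, get_subtree(b, pb)),
            replace_subtree(b, pb, get_subtree(a, pa)))


def mutate(rng, tree):
    """Subtree mutation: replace a random subtree with a freshly grown one."""
    return replace_subtree(tree, rng.choice(paths(tree)), random_tree(rng, depth=2))


def fitness(tree, rows, targets):
    """Mean absolute error (lower is better); non-finite errors count as worst."""
    err = sum(abs(evaluate(tree, r) - t) for r, t in zip(rows, targets)) / len(rows)
    return err if math.isfinite(err) else math.inf


def evolve(rows, targets, pop_size=30, generations=10, max_depth=8, seed=0):
    """Run the loop: selection, crossover, mutation, elitism, fixed generation budget."""
    rng = random.Random(seed)
    score = lambda t: fitness(t, rows, targets)
    pop = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):  # termination: maximum number of iterations
        pop.sort(key=score)
        history.append(score(pop[0]))
        nxt = [pop[0]]  # elitism: carry the best individual over unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=score)
            for child, parent in zip(crossover(rng, p1, p2), (p1, p2)):
                if rng.random() < 0.15:
                    child = mutate(rng, child)
                # Reject oversized offspring to keep tree growth bounded.
                nxt.append(child if depth(child) <= max_depth else parent)
        pop = nxt[:pop_size]
    best = min(pop, key=score)
    return best, history + [score(best)]
```

Calling `evolve(rows, targets)` with `rows` as a list of dictionaries keyed by the variable names and `targets` as the measured property returns the best tree and the best error per generation. Because of elitism, that error sequence never increases.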

2-3-Identification of the Most Relevant Features
A main advantage of GP is its ability to evolve tree-based models that are easy to interpret. By nature, GP can provide a better understanding of the most significant variables involved in the prediction process through an incorporated feature selection mechanism. Over the course of the evolutionary cycle of GP, some variables will survive and have a higher probability of appearing in later generations, while variables with less impact will gradually disappear. To determine the most important features in the developed GP tree models, the approach referred to as the measurement of the "relative impact" of input variables is used [39,40]. The relative impact of an input variable is measured according to the number of references to this variable in all generated GP models, starting from the initial population until the last generation. The relative impact of a given input variable can be measured as follows:
• Let $X$ be a set of input variables $\{x_1, x_2, \ldots, x_n\}$ where $x_i \in X$.
• Through the iteration of GP, the number of appearances of $x_i$ in a model $m$ can be denoted as $\mathrm{RefCount}(x_i, m)$.
• The occurrence rate of $x_i$ in population $P$ can be formulated as:
$$\mathrm{freq}(x_i, P) = \sum_{m \in P} \mathrm{RefCount}(x_i, m) \qquad (1)$$
As a result, Equation 2 shows that the number of references of variable $x_i$ in a population $P$, represented by $\mathrm{freq}(x_i, P)$, is divided by the total number of variable references to determine the relative frequency, $\mathrm{rel}(x_i, P)$, of variable $x_i$ in $P$:
$$\mathrm{rel}(x_i, P) = \frac{\mathrm{freq}(x_i, P)}{\sum_{k=1}^{n} \mathrm{freq}(x_k, P)} \qquad (2)$$
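The reference-count measure described above can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each model is a nested tuple `(operator, left, right)` whose leaves are either variable names (strings) or numeric constants; this encoding and the function names are hypothetical and are not taken from the software used in this work.

```python
from collections import Counter


def ref_count(tree, counts=None):
    """Count references to each variable in one model, i.e. RefCount(x_i, m)."""
    counts = Counter() if counts is None else counts
    if isinstance(tree, tuple):
        for child in tree[1:]:  # skip the operator symbol stored at index 0
            ref_count(child, counts)
    elif isinstance(tree, str):  # a leaf holding a variable name
        counts[tree] += 1
    return counts


def relative_impact(populations, variables):
    """Relative frequency of each variable over all models of all generations."""
    freq = Counter()
    for population in populations:
        for model in population:
            freq.update(ref_count(model))
    total = sum(freq[v] for v in variables)
    return {v: (freq[v] / total if total else 0.0) for v in variables}
```

Passing the list of populations from every generation gives the share of references held by each variable; the shares sum to one whenever at least one variable is referenced.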

2-4-Evaluation Metrics
The evaluation metrics utilized in this study to evaluate the ultimately attained GP models are:
1. Root Mean Squared Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $y$ and $\hat{y}$ are the actual and the predicted values based on the developed GP models, and $n$ is the number of instances used in the experiments.
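The results sections report RMSE together with MAE and R². A minimal Python sketch of the three metrics is given here for reference. The function names are illustrative, and R² is assumed to be the coefficient of determination (one minus the ratio of residual to total sum of squares); the text does not state which definition of R² was used.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot
```

Note that `r_squared` is undefined when all actual values are equal, since the total sum of squares is then zero.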

3-Results and Discussion
In order to generate prediction models for the mechanical performance of green fibers based upon the intrinsic characteristics of the fiber using the GP method, a one-hold-out methodology was applied for training/testing. That is, all instances were used for training except one instance, which was left for testing. This process was repeated ten times, each time with different training and testing instances, to obtain more reliable results.
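The repeated one-hold-out procedure can be sketched as follows. This is an illustrative reading of the description above: the function name, the random choice of the held-out instance, and the requirement that the ten held-out instances be distinct are assumptions, since the text does not specify how the test instance was selected in each repetition.

```python
import random


def one_hold_out_splits(n_instances, repeats=10, seed=0):
    """Yield (train_indices, test_index) pairs, holding out one instance per repetition."""
    rng = random.Random(seed)
    # Assumption: each repetition holds out a different, randomly chosen instance.
    held_out = rng.sample(range(n_instances), k=min(repeats, n_instances))
    for test_index in held_out:
        train_indices = [i for i in range(n_instances) if i != test_index]
        yield train_indices, test_index
```

Each yielded pair can be used to fit a model on the training indices and to compare its prediction for the single held-out instance against the measured value.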

3-1-Ultimate Tensile Strength GP Model
To demonstrate the impact of the intrinsic features of the green fiber on the ultimate tensile strength, a GP prediction model was established considering the cellulose (C), hemicellulose (H), lignin (L), moisture content (Mc), and Microfibrillar angle (Ma) of the fiber simultaneously. The best generated GP model is expressed by Equation 6, where c0, c1, c2, c3, c4, and c5 are constants with the following specific values. It can be confirmed from the best GP prediction model that the ultimate tensile strength was mainly influenced by the cellulose content, the Microfibrillar angle, and, to some extent, hemicellulose. However, it was not influenced by moisture content and lignin. The average and best GP results for the ultimate tensile strength case in terms of RMSE, MAE, and $R^2$ are given in Table 2. The estimated values for ultimate tensile strength by the developed GP models of the best experiment are given in Table 3. It can also be confirmed that both the cellulose content and the Microfibrillar angle of the fiber have the main influence in determining the ultimate tensile strength of the natural fibers. Moreover, the actual vs. estimated ultimate tensile strength values of the best one-hold-out cross-validation GP experiment are shown in Figure 3. The relative impact of each variable identified by GP over the course of iteration is shown in Figure 4. The Microfibrillar angle was found to be the dominant factor in determining the ultimate tensile strength of the natural fibers, with a relative impact of 44.7%, and cellulose content was the second dominant factor with 35.6%. That is, the Microfibrillar angle and the cellulose content are the most influential parameters in determining the ultimate tensile strength of the green fibers, whereas hemicellulose, moisture content, and lignin have only very minor effects on this mechanical property of the fibers.

3-2-Elongation at Break (%)
The elongation at break property of the natural fibers was also investigated via GP models. The actual vs. predicted elongation at break values of the best one-hold-out cross-validation GP experiment are illustrated in Figure 5. It can be seen that, despite the existence of extreme values, the values were predicted with high accuracy by the evolved GP tree model. Table 4 shows the evaluation results of GP and multiple linear regression (MLR) for modeling the elongation at break (%). The $R^2$ values for the best and average GP were 0.968 and 0.855, respectively. This indicates that the prediction models were capable of predicting the elongation at break values of the natural fibers with high confidence compared with the linear regression model. Moreover, the GP elongation at break model is expressed in Equation 7 with its corresponding constant values. It can be seen that the elongation at break of the natural fibers was modeled by GP as a function of hemicellulose and moisture content. However, the remaining properties, namely cellulose content, Microfibrillar angle, and lignin, had no significant influence in the GP model.
The relative impact of the mechanical properties identified by GP for the elongation at break (%) model is illustrated in Figure 6, and the estimated values by the best one-hold-out cross-validation experiment for elongation at break with various ranges of the fiber properties are tabulated in Table 5. It is shown that the moisture content of the natural fiber has the main influence in determining this property relative to the other contents, with about 63% of the model. Hemicellulose content, on the other hand, also has an important effect on the elongation property, with about 29.4%. However, cellulose content, Microfibrillar angle, and lignin were not found to influence the elongation at break dramatically. Cellulose, for instance, has only a 3.4% relative impact in the GP prediction model.

3-3-Young's Modulus
The GP prediction model of the Young's modulus of the natural fibers is expressed in Equation 8. It can be seen that the best GP model was generated utilizing the Microfibrillar angle, lignin, and hemicellulose. The estimated values by the best one-hold-out cross-validation experiment for Young's modulus are tabulated in Table 6, and the actual vs. estimated Young's modulus values of the best one-hold-out cross-validation GP experiment are shown in Figure 7. Although it is very difficult to predict the Young's modulus of natural fibers based on their intrinsic chemical composition, the GP model was capable of doing so with acceptable relative errors. Moreover, the lignin and hemicellulose contents of the fibers were the dominant factors in the GP prediction model, as demonstrated in Figure 8. In Equation 8, c0, c1, c2, c3, and c4 are constants with the following specific values.

4-Conclusion
Predicting the mechanical performance of green fibers is still challenging for designers due to the variations in their chemical and physical properties. Despite this complexity, this work established evolving GP tree models for predicting the mechanical properties of green fibers and determining the chemical and physical parameters that influence the overall mechanical performance. Each mechanical property of the natural fibers was found to be primarily influenced by certain intrinsic properties. The best GP prediction models showed that the ultimate tensile strength was mainly influenced by the cellulose content, Microfibrillar angle, and hemicellulose of the fiber, but not by the moisture and lignin contents. In contrast, the moisture content of the natural fiber has the main influence in determining the elongation at break relative to the other contents, with about 63% dominance in the model. Hemicellulose content was also found to influence the elongation at break by about 29.4%. However, cellulose content, Microfibrillar angle, and lignin were found not to affect this property dramatically. Moreover, the hemicellulose and lignin contents of the fibers were found to be significant in determining the Young's modulus according to the established GP prediction models. This would facilitate the proper selection of natural fibers for desired green composites that fulfill sustainable industrial requirements and customer satisfaction attributes.

5-2-Data Availability Statement
The data presented in this study are available on request from the corresponding author.

Figure 1. A flowchart of the GP algorithm

Figure 3. Actual vs. estimated ultimate tensile strength values by best one-hold-out cross-validation GP experiment

Figure 5. Actual vs. estimated elongation at break values by best one-hold-out cross-validation GP experiment

Figure 6. Relative impact of the mechanical properties identified by GP for the elongation at break (%) model

Figure 7. Actual vs. estimated Young's modulus values by best one-hold-out cross-validation GP experiment