Adaptive Learning and Integrated Use of Information Flow Forecasting Methods

This research aims to improve quality indicators in solving classification and regression problems through the adaptive selection of machine learning models on separate data subsamples drawn from local segments. The proposed method combines different machine learning models and algorithms on individual subsamples in regression and classification problems by calculating quality indicators and selecting the best models on local sample segments. Detecting changes in the data and time sequences makes it possible to form samples in which the data have different properties (for example, variance, sample fraction, data span, and others). Data segmentation is used to search for trend-change points in a time series and to provide analytical information. The experiments used real data samples and produced experimental values of the loss function for various classifiers on individual segments and on the entire sample. In terms of practical novelty, the obtained results can be used to increase quality indicators in classification and regression problem solutions while developing machine learning models and methods. The proposed method increases classification quality indicators (F-measure, Accuracy, AUC) and forecasting quality (RMSE) by 1%–8% on average due to segmentation and the assignment of the best-performing models to individual segments.

Under these conditions, the data distribution can change over time, which leads to concept drift [1,3]: changes occur in the conditional distribution of the output values given the input attributes, while the distribution of the input data itself can remain unchanged. Constantly changing data streams characterize processes in various subject areas [4]. A stream processing model should provide specified quality indicators for prediction tasks at high update rates. This necessitates a simultaneous analysis of both the qualitative results of the processing model and the properties of the processed data.
Most machine learning methods use "centralized data", where samples store all the information on the observed objects. Collection processes are performed over some period of time and usually produce tuples of values recorded while the observed system is in different states and is affected by many heterogeneous factors. This results in phenomena involving the transformation of properties and shifts in the ranges of values obtained from the recording elements. All this leads to heterogeneity of the data in samples. In separate sequences within a sample, class imbalance, changes of distributions, event probabilities, and objects of observation can occur.
Various statistical effects can make it difficult for machine learning methods to solve prediction problems. For example, if Simpson's paradox [3] appears in the data, the standard approach of centralized intelligent sample analysis may not achieve the specified qualitative indicators of data processing, and the processing result may not correspond to the true state. Modern approaches to building processing models involve forming, analyzing, and combining local results using aggregation methods. Methods and algorithms that solve classification and regression problems may yield different results for the selected quality indicator on the same data set. The values obtained by different classifiers when processing objects of observation can differ, so the classifiers are considered complementary. By integrating several models, it is possible in some cases to improve the quality of classification.
Currently, ensemble methods are dominant. Among them, the best known are approaches based on simple combined voting [4,5] and on the application of several aggregating functions that calculate the maximum, average, median, and other class probabilities, averaging the prediction result over a set of responses. Alternatively, various aggregators based on ranking classification algorithms, arbitrators, and combinators [5,6] are used that apply to both binary and multiclass problems. Another direction relates to the formation of samples. Some researchers (e.g., [4,6-10]) investigated various aspects of vertically distributed data and proposed technologies, basic algorithms, and combined strategies to select observation objects, making it possible to obtain the main characteristics of sequences and samples and to exclude from consideration the values that distort data properties [11-14].
Recently, some fields have used hybrid classifiers. Combinations of methods, where different models are based on relatively simple classification algorithms and complex neural networks, achieve high rates of completeness and accuracy [14,15]. However, the capabilities of a single model depend on the properties of the training sample, and if the characteristics of the data change, the quality indicators can decrease significantly [16-20]. The accuracy and completeness of processing results depend on many factors [21,22]. The application of such approaches often leads to situations where the aggregation of different models not only fails to improve the quality indicators but, on the contrary, worsens the results [23-25]. Such effects are often leveled out on a large data sample but are clearly visible in its segments. As a result, errors in the processing of data streams are possible due to different settings of the classification models [26-28].
Thus, it is necessary to develop new strategies, and adapt existing ones, that enable accurate and reliable training within separating functions and samples. Almost all approaches, methods, and algorithms for machine learning proposed today are highly specialized [29-32]. Each model achieves particular qualitative indicators for the subject areas where it was optimized and for the data on which it was trained. One of the main problems in achieving qualitative indicators with machine learning methods is that when the properties of incoming data change, additional training is required [33-36]. Most models that solve prediction problems are trained on a predetermined set of observational objects. When the properties of information sequences are transformed, the quality of processing decreases. Thus, there is a need to improve the completeness and accuracy of model classification in prediction problems under the influence of external factors.
The approach proposed in this paper is based on partitioning data samples into subsamples with their own properties. These properties allow us to choose the most efficient algorithms and models for classification tasks and time-series prediction. The novelty of the proposed method is that the sample is pre-partitioned into subsamples based on calculated information about the variations in the ranges of the target variables and predictors. The use of concept drift detection models allows the real-time formation of data subspaces with their own properties, which can subsequently be used for continuous learning and monitoring of model performance. This study improves the quality of the prediction problem solution based on segmentation and the adaptive selection of different machine learning models on selected segments of the local data sample.
The rest of the paper is organized as follows: Section 2 describes the formalized problem statement and the method developed in this study. Section 3 presents the test results based on the experiments performed. Section 4 discusses the applicability of the considered approach. The conclusion interprets the results.

2-1-Basic Notation
The use of models whose improvements are based on updated local information is one of the problematic issues in classification and regression [18]. Typically, the training sample is considered a single set. However, the data tuples comprising it can be obtained under the influence of various factors. For example, the appearance of individual control commands increases the number of service messages in the network traffic. The change of seasons and the increasing length of the day are reflected in the power consumption of the power supply systems. Many factors that affect the values of the training set variables are known in advance. In this regard, it becomes possible to identify the training sample tuples received at the time of exposure to a factor.
Let X be a sample of size N, and let A = {a_1, a_2, ..., a_m} be the set of basic classification algorithms. The problem arises of determining the classification algorithm that is most suitable for the data sample with respect to a given quality indicator.

A set of factors V = {v_1, v_2, ..., v_k} affects the values of the target variables in the tuples x ∈ X.

L(a, x, v) is the loss function at the time of action of factor v.

The quality functional is determined by an expression related to the action of the factor v:

Q(a, X_v) = (1/|X_v|) Σ_{x ∈ X_v} L(a, x, v).    (1)

It is then necessary to minimize this functional:

a*_v = argmin_{a ∈ A} Q(a, X_v),    (2)

which makes it possible to assign algorithms a ∈ A to a sample X_v during the formation of which the ranges of variable values were influenced by factors v. Such a formulation allows for consideration of the influence of known factors that can cause effects affecting the spread and bias of the classifiers' answers.

2-2-Method Description
One of the problematic issues with adapting machine learning models is the lack of effective methods of information pre-processing aimed at calculating and analyzing properties that allow dividing incoming sequences into segments in real time. Such complex methods should solve not only the usual problems of filtering, noise removal, and outlier handling but also provide information about the properties of the data in order to select and determine the most suitable models. Figure 1 shows an example of the model.

Figure 1. Flowchart of method steps
The model shown in Figure 1 has two parts. In the left part, a continuous information flow is processed; in the right part, procedures ensure the implementation of the "mechanism" of training and the selection of the most effective model that solves problems of classification, regression, and prediction. A feature of the presented solution is the segmentation of the data sample, which allows for the preliminary pre-training and tuning of algorithms. Let us consider the steps to implement the method.
1. For the initial start of processing, it is necessary to have preliminary information about the values x_1, ..., x_t of the information sequence. These are included in the initial training set.
2. The sample is analyzed to determine individual segments where the data properties differ. Its separation is possible both on the basis of a predetermined system of rules and with the help of algorithms that automatically search for characteristic points where the properties of incoming information sequences change. The separation of objects of observation can be carried out using models, methods, and algorithms that calculate the points of decomposition and concept change. They automatically define the segment boundaries.
3. The initial sample is divided into several parts X_1, ..., X_k, and their properties P_1, ..., P_k are analyzed. Depending on the algorithm underlying the segmentation, it is possible to determine the direction of the trend, the probability density of the analyzed events, etc.
4. The properties and characteristics of the segments are compared, and if there is a match, the number of segments under consideration can be reduced.
5. On each segment X_i, the quality functional Q(a_j(x), X_i) is determined for each model a_j(x). Based on its values, it is possible to rank the models {a_1, ..., a_m} ∈ A and select those with the highest quality indicators for each segment.
6. In parallel with the right part, the procedures for segmenting and determining the properties of the data sequence are performed when processing incoming data. Analyzing the properties of the segments identified during the processing of the information flow and comparing them with the properties of the subsamples obtained from the training sample makes it possible to assign one of the pre-trained models {a_1, ..., a_m} ∈ A to the current segment. The selected model a_i(x) is used to solve flow processing problems. Thus, it is possible to implement a constantly learning method, where the processes of learning and of processing information flows are carried out in parallel. When complex classification or regression models are used, pre-trained models can reduce the time spent on training when the data properties change.
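As a minimal sketch of the segment-wise selection in the steps above, the following fragment scores each candidate model on each segment with a quality functional Q (here, a 0/1 error rate) and assigns the best model per segment. The toy threshold classifiers, segment data, and boundary values are illustrative assumptions, not the paper's experimental setup.

```python
def q_error_rate(model, segment):
    """Quality functional Q(a, X_i): misclassification frequency on a segment."""
    return sum(1 for x, y in segment if model(x) != y) / len(segment)

def assign_models(segments, models):
    """Return, for each segment, the name of the model with the lowest Q value."""
    assignment = {}
    for seg_name, seg in segments.items():
        scores = {m_name: q_error_rate(m, seg) for m_name, m in models.items()}
        assignment[seg_name] = min(scores, key=scores.get)
    return assignment

# Two toy "pre-trained" classifiers: one suits low values, one high values.
models = {
    "low_threshold": lambda x: int(x > 2),
    "high_threshold": lambda x: int(x > 7),
}

# Two segments whose data properties (value ranges) differ.
segments = {
    "seg_1": [(1, 0), (2, 0), (3, 1), (4, 1)],   # class boundary near 2
    "seg_2": [(6, 0), (7, 0), (8, 1), (9, 1)],   # class boundary near 7
}

best = assign_models(segments, models)
print(best)  # each segment is assigned the model matching its boundary
```

Each segment ends up paired with the classifier whose decision boundary matches that segment's data, which is the intent of ranking models by Q per segment.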
Currently, no single algorithm works well with all data. It is difficult to predict in advance which learning algorithm is appropriate for a particular set. In this regard, the problem arises of combining several classifiers into a single structure to obtain a better decision-making model. The effectiveness of employing various classifiers depends on the properties of the information and the features of their algorithms. Algorithms with different characteristics are selected depending on the tasks to be solved and the required speed, completeness, and accuracy. Each learning algorithm uses its own methodology for the dataset. Some of them require "fine-tuning"; others have a high processing speed; and still others are sensitive or insensitive to outliers.
The discussed approach to processing dynamic information flows proposes to aggregate machine-learning algorithms tuned to data properties.
The information flow, represented by the information data sequence, is sent for pre-processing. Data properties are evaluated, and the sample is split into separate segments with matching data properties. As a result, the X set is split into subsets X_1, X_2, ..., X_k, each having properties different from the others.
Pre-processing allows for adaptive tuning of the base classifiers {a_1, a_2, ..., a_m} ∈ A on each subset X_i ∈ X. They are trained to form the parameters and weight matrices of the classification algorithms, and then their results are analyzed. The selection is made based on minimizing the empirical risk, i.e., finding the algorithm for which, on X_i ∈ X, the following condition is satisfied:

a*_i = argmin_{a ∈ A} Q(a, X_i).    (3)

Using Equation 3 for each subsample, it becomes possible to choose the algorithm with the best performance indicators. A simple aggregate function, applying the best algorithms found by Equation 3 on the subsamples X_1, X_2, ..., X_k, takes the form:

F(x) = a*_i(x), if x ∈ X_i.    (4)

Depending on the problem being solved, the preset qualitative indicators, the data properties, and the features of the training subsets, it is possible to form more complex functions that consider the classifiers' weights and use additive coefficients, changing the "importance of the voice" of each algorithm. The obtained processing result is then fed into the training sample and used by the pre-processing algorithm to refine the model. The proposed multilevel approach evaluates candidate algorithms at the preliminary stage, followed by the selection of an algorithm and its aggregation with others. While implementing complex machine learning models, several problematic issues arise related to the effectiveness of applying their individual components. Each basic algorithm has different performance indicators for data with different properties. In real systems, the frequency of the observed events may change, the range of values may shift, and an imbalance may appear in the dataset over time. In this regard, it is necessary to develop effective methods for information pre-processing to calculate and analyze the properties of the data entering the analyzer input. They must perform the usual tasks of filtering, removing noise and outliers, calculating the data properties, and forming their segments. A set of such methods should be used to select and determine the most appropriate models for classification and regression problems.
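The simple aggregate function described above, which applies to each subsample the algorithm selected for it, can be sketched as a piecewise predictor that routes each incoming tuple by segment membership. The membership rule and the base algorithms below are hypothetical placeholders.

```python
def make_aggregate(segment_of, best_algorithm):
    """Build the piecewise predictor F(x) = a*_i(x) for x in X_i."""
    def F(x):
        i = segment_of(x)            # which subsample X_i does x fall into?
        return best_algorithm[i](x)  # apply the algorithm chosen for X_i
    return F

# Hypothetical membership rule: split the input range into two subsamples.
segment_of = lambda x: 0 if x < 5 else 1

# Algorithms assumed to have been selected per subsample by minimizing
# the empirical risk on each X_i.
best_algorithm = {0: lambda x: int(x > 2), 1: lambda x: int(x > 7)}

F = make_aggregate(segment_of, best_algorithm)
print([F(x) for x in (1, 3, 6, 8)])  # -> [0, 1, 0, 1]
```

More elaborate aggregators (weighted voting, additive coefficients) would replace the single routed call with a weighted combination of several algorithms' answers.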
The application of these methods is based on pre-processing, where individual segments with similar properties are differentiated from the initial sample.For example, in regression problems, these can be trends and seasonal changes.With the automatic separation method, points are calculated at which the direction of the trend changes or situations of concept change are analyzed.
The proposed method initially assumes that known factors affect the data properties. These can be commands, control actions, or events associated with a change in the environment. The formed training sample consists of tuples whose values are obtained under the action of these factors. Information about internal and external influences is used to divide subsets in such a way as to reduce the number of noise objects and improve the class distinguishability properties. The application of the method can be considered for classification problems. Let us consider the error indicator I as a function for measuring the losses of the classification algorithm a_j(x) acting on the X_i sample.

I(x_i, a) = [y_i ≠ a(x_i)],    (5)

where y_i is the true value for tuple x_i. The error rate ν of the algorithm a(x) is determined by the following expression:

ν(a, X) = (1/N) Σ_{i=1}^{N} I(x_i, a).    (6)

The recorded data are affected by factors V. Factors can be defined explicitly; for example, for many datasets in the field of power generation, it can be seen that the length of daylight hours and working and non-working hours may significantly affect power consumption. However, it is sometimes impossible to unambiguously interpret their effects due to their large number and complicated interpretation. To improve the performance indicators of machine learning methods affected by data outliers, noise, or changes in the probability density of event occurrence, it is necessary to split the X set into subsets with regard to the influence of the factors v_i ∈ V, i = 1, ..., k, on the data X. In essence, the impact of factors is expressed as a change in the ranges of the target variables. Such moments are tracked by various concept drift detection methods. The set can be split by analyzing the data properties in the information flow, for example, the probability density of occurrence of the classified events. Various methods are used for this; some of the simplest are DDM and SEED.
The Drift Detection Method (DDM) [37] uses the binomial distribution, which represents the probability of a given number of errors in a sample of i examples. For each i-th object of observation from the X set, the probability of misclassification is p_i, with standard deviation s_i = sqrt(p_i(1 − p_i)/i). It is assumed (the PAC learning model) that with an increasing number of examples, the error rate p_i of the learning algorithm will decrease as long as the distribution of examples remains stationary. A significant increase in the error rate indicates that the class distribution has changed and, therefore, the properties of the class distributions will change.
The DDM method considers the condition p_i + s_i > p_min + 2·s_min for the warning level; beyond this level, a context change is possible. The condition p_i + s_i ≥ p_min + 3·s_min defines the drift level; beyond this level, the concept drift is assumed to be confirmed, the model built by the training method is reset, and a new model is trained using the examples saved since the warning level was fired. The values of p_min and s_min are also reset.
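The warning and drift rules above can be sketched as follows. This is a simplified reading of DDM, not the reference implementation from [37]; in particular, the 30-instance warm-up before testing is an assumption borrowed from common implementations.

```python
import math

class DDM:
    """Simplified DDM: track error rate p and s = sqrt(p*(1-p)/i), remember
    the minimum p_min + s_min, warn at p + s > p_min + 2*s_min, and signal
    drift at p + s >= p_min + 3*s_min."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.i = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning', or 'drift'."""
        self.i += 1
        self.errors += int(is_error)
        p = self.errors / self.i
        s = math.sqrt(p * (1 - p) / self.i)
        if self.i < self.min_instances:
            return "stable"                 # warm-up: not enough evidence yet
        if p + s < self.p_min + self.s_min:  # remember the best level seen
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + 3 * self.s_min:
            self.reset()                    # drift confirmed: start over
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"

ddm = DDM()
# A stream with a ~10% baseline error rate, then a burst of errors
# simulating a concept change at position 50.
stream = ([0] * 9 + [1]) * 5 + [1] * 30
states = [ddm.update(e) for e in stream]
print(states.index("drift"))  # drift is flagged shortly after the burst starts
```

The detector stays stable through the noisy baseline, passes through the warning level as errors accumulate, and then confirms drift, at which point its statistics are reset as described above.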
Huang et al. [38] proposed the SEED algorithm, in which blocks of a fixed size are formed from the data. By controlling the initial block settings for the training samples, it is possible to control the number of candidate concept change points. By determining the start and end points of a block, neighboring blocks are calculated and examined; if their statistical properties match, they are grouped together. This operation, called "block compression," removes candidate change points that are less likely to be true change points. SEED compares two subwindows; when the two windows have different mean values, the old subwindow is discarded. The SEED parameters in MOA are the block size, the compression ratio, and a threshold parameter that controls the size of the increment.
The concept drift detection points make it possible to split the X set into non-intersecting sets:

X = X_1 ∪ X_2 ∪ ... ∪ X_k, where X_i ∩ X_j = ∅ for i ≠ j.

The empirical risk functional Q for the X_i sample determined by the influence of factor v_i is:

Q(a, X_i) = ν(a, X_i),

where ν is the error frequency determined in Equation 6.
The subset obtained with regard to the influence of factors can be assigned a classifier.

By assuming that the sample is simple and repeats the properties of the general population, it becomes possible to consider the algorithm variants A = {a_1, a_2, ..., a_m} and choose the classifier a_i for the X_i set subject to the condition:

a_i = argmin_{a ∈ A} Q(a, X_i).

Here, the classifying algorithm can be trained separately on each data segment. Due to this manipulation of samples, it is possible in some cases to improve the performance indicators of the algorithms.

The obtained processing result is analyzed and can participate in forming the X_i sample. Later, preference is given to the most appropriate model trained on the subset selected using the proximity measure.
Model training is complicated not only by the large dimension of the attribute space but also by the presence of variable factors influencing the values of the attributes.
The main limitation of machine learning methods is that classification algorithms cannot always be effective in a system constantly functioning under the influence of various external and internal actions. The system is dynamic; there are constant transitions from one state to another. External and internal factors change the values of the characteristics [39-44].
The analysis and consideration of the factors influencing these data make it possible to split the set into subsets.In the future, by determining the properties of the obtained samples, it will be possible to solve the problem of applying the most efficient processing algorithms.
A general view of the processing algorithm is shown in Figure 2.

2-3-Application Method for Single Classifiers
Consider the classifier a(x, W). Tuple x arrives at the input. The parameter matrix of the trained classifier is used for decision-making. Two ways of splitting datasets are possible: production rules and membership functions. The use of production rules assumes that the factors influencing the data values are computable. It is possible to form various subsets from the incoming information flow by analyzing changes in data properties. On the formed training sample, it is possible to determine changes in the properties of the sample data and the probability densities of occurrence of the events under study, which can be done by determining the points of concept change. The data and their properties that account for the effects on the sample values should be calculated. Generally, such a model is presented in predictive form:

a(x, W_i), x ∈ X_i,

where a denotes the classifying algorithms that use weight matrices to compare against the incoming data vector; W = {W_1, ..., W_k} is the set of parameter matrices of the trained classifiers, whose values depend on the factors that affect the data in the system; and X is the set of object descriptions consisting of subsets of the data sample. Each subset has its own classifier weight matrix.
The values v_i ∈ V may be selected on the basis of the production model. The X_i data subset is determined with respect to the influencing factor. Each subset can be assigned a classifying algorithm a ∈ A. The grouped variable subset X_i determines the weight matrix W_i with regard to the properties of the classifying algorithms. This allows using the W_i matrix on the sample determined by the concept change detector. A new tuple x arriving at the input is identified by the classifier a(x, W_i).
The segmentation of the dataset takes such changes into account and improves the performance indicators of the classification model as a whole. The other direction is based on the use of the membership function. It can be used when there are impacts that can be described analytically (for example, the seasonal length of daylight hours, the latitude of the place in the subsystems for supplying electricity to urban facilities, or peak load hours in the information system). Let V be the set of factors influencing the target variables in the data sample. Such factors v_i ∈ V can be processed using the membership function (indicator function). Based on this, the data sample X is split into a finite number of non-intersecting measurable subsets X_1 ∪ X_2 ∪ ... ∪ X_k. In the simplest case, the membership function μ_i of the subset X_i ∈ X, where x is a training sample tuple, can be represented as:

μ_i(x) = 1 if factor v_i acts on x, and μ_i(x) = 0 otherwise.    (10)

Equation 10 makes it possible to determine the membership of an element x ∈ X of the data sample in the X_i subset at the time of factor v_i action.
In the general case, the sample consists of subsets. Membership in a subset is determined by the functions μ_i(x). Classification into class y ∈ Y becomes possible on each subset X_1 ∪ X_2 ∪ ... ∪ X_k. Test and training samples are formed with regard to the acting factors v_i. The classifier a(x, W) can be supplemented with the W(v_i) function depending on the subset being processed. The W(v_i) function considers the factor v_i influencing the subset X_i and determines the weight matrix W_i = W(v_i) by its value. The classifier takes the form a(x, W(v_i)).
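A membership-function split in the sense of Equation 10 can be sketched as follows. The "season of the month" rule and the sample tuples are hypothetical illustrations of an analytically describable factor; any computable indicator of a factor's action could take its place.

```python
def season_of(month):
    """Membership rule: map a calendar month to a seasonal factor label."""
    if month in (12, 1, 2):
        return "winter"
    if month in (6, 7, 8):
        return "summer"
    return "mid-season"

def split_by_membership(sample, membership):
    """Split X into non-intersecting subsets X_i keyed by the factor value."""
    subsets = {}
    for tup in sample:
        subsets.setdefault(membership(tup["month"]), []).append(tup)
    return subsets

# Hypothetical timestamped load records.
sample = [
    {"month": 1, "load": 310},
    {"month": 7, "load": 180},
    {"month": 12, "load": 295},
    {"month": 4, "load": 240},
]

subsets = split_by_membership(sample, season_of)
print(sorted(subsets))  # -> ['mid-season', 'summer', 'winter']
```

Each subset can then receive its own weight matrix W_i, so that an incoming tuple is classified by a(x, W(v_i)) for the factor acting on it.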
The loss function can be used as one of the model evaluation measures for regression problems. The loss function L(X_i) for the X_i subset is determined by the expression:

L(X_i) = (1/n_i) Σ_{j=1}^{n_i} ℓ(a(x_j), y_j),    (11)

where n_i is the number of observation objects in the subset and ℓ is the per-object loss.
The average amount of losses over the data of the X set is:

L(X) = (1/k) Σ_{i=1}^{k} L(X_i),    (12)

where k is the number of factors affecting the sample.
Applying Equations 11 and 12 and minimizing, it is possible to search for the optimal parameters based on the expression:

W* = argmin_{W} L(X).    (13)

Equation 13 makes it possible to determine the qualitative indicator of the classifier loss function, considering the splitting of the data sample.
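Equations 11 and 12 can be sketched directly. The squared-error per-object loss, the toy regression model, and the subset values below are illustrative assumptions.

```python
def subset_loss(model, subset):
    """L(X_i) = (1/n_i) * sum of per-tuple losses on the subset (Eq. 11)."""
    return sum((model(x) - y) ** 2 for x, y in subset) / len(subset)

def average_loss(model, subsets):
    """L(X) = (1/k) * sum of subset losses over the k subsets (Eq. 12)."""
    return sum(subset_loss(model, s) for s in subsets) / len(subsets)

model = lambda x: 2 * x                 # a toy regression model
subsets = [
    [(1, 2), (2, 4)],                   # X_1: the model fits exactly
    [(3, 7), (4, 9)],                   # X_2: the model is off by 1
]

print(subset_loss(model, subsets[0]))   # -> 0.0
print(average_loss(model, subsets))     # -> 0.5
```

Minimizing the averaged loss over the model parameters, as in Equation 13, then selects parameters that account for the split rather than for the pooled sample alone.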

3-1-Experimental Setting
To confirm the advantages of the proposed approach, four publicly available datasets were used. They contain information sequences of electricity generation data. Time series data from conventional power plants and renewable energy sources were used for modeling [45-48]. The proposed solution is based on sample segmentation. A data sequence is processed, the sample is analyzed, and individual segments are determined. In the experiment, segmentation is performed both on the basis of a predetermined system of rules and with the help of algorithms that automatically search for characteristic points where the incoming data properties change. In the case of heuristics, the sequence is studied: based on the analysis of the dataset, trends, periods, segments, and clusters with different characteristics are distinguished. Observation object separation can be carried out using models, methods, or algorithms that calculate the points of decomposition and concept change. The algorithm for the selected processing model is presented in Figure 3.

3-2-Data Processing
Several datasets [45-48] containing data on electricity generation in various regions from 1995 to 2020 were considered as experimental data. Classification and regression problems were considered. The classification quality indicators (AUC, the area under the ROC curve; accuracy; and F-measure) and the forecasting quality indicator (RMSE) were evaluated both for the entire samples and with segmentation.
The Power Supply dataset, which shows the power supply capacities of two stations [45], was chosen as the first experimental dataset. The choice was justified by its structure, containing two predictors, and by the ability to determine periods based on the timestamps of the records. Two classes, working hours and non-working hours, were determined by the values of the two predictors. The experiment considered the problem of determining working and non-working hours according to the readings of the capacities supplied to the municipal network from two substations. The seasonal effect was determined to be an influencing factor.
Figure 4 shows a general view of the data.The axes show the days of observations and hours in the horizontal plane, and the power consumption is plotted along the vertical axis.
The concept detector is used to select several points where the density of probability of occurrence changes.The points determined by the detector make it possible to determine the dataset split into subsets, where their properties change.Simultaneously, it is possible to identify segments based on a heuristic approach using membership functions.
At the beginning (Figure 4-a), when analyzing the dataset using the SEED method, four segments were obtained by selecting parameters; these segments were compared to the segments determined by the membership function describing seasonality.Then the window size was reduced, which increased the number of identifiable concept change points (Figure 4-b).

Figure 4. Segmented time series of the SEED electricity consumption dataset with different window widths
During the experiment, segments were first obtained using SEED. Their analysis was then followed by a comparison with the segments determined on the basis of a time scale showing the calendar change of seasons. In the experiment, similar segments obtained automatically using the SEED method and calculated by the membership function were compared to analyze the performance indicators. Consider the resulting subsets obtained based on SEED and on membership functions.
In Figure 5, the axes show the values of the powers generated by the two power plants. Seasonal factors are used to segment data based on the rules defined by the membership function and by the SEED method. Figure 5 shows the values of the general population of the entire sample: light areas correspond to working days and shaded areas to non-working days in the winter period. When the graphs are superimposed, the overlapping areas make it possible to estimate the probabilities of errors of the first and second kind. The intersection area for the entire data set is larger than that for a subset segment of the winter months. Subsets can be estimated using the silhouette coefficient. The compactness hypothesis specifies that sequences belonging to one target class will be close to each other and far from objects of another class. It is assumed that data values of the same class are grouped side by side. Therefore, the silhouette function is used to evaluate such clusters. With its help, it is possible to check the consistency of the data in the areas.

S_i = (b_i − a_i) / max(a_i, b_i),    (14)

where a_i is the average distance from the i-th point to the other points in the same cluster, and b_i is the minimum average distance from the i-th point to the points in another cluster. The entire cluster structure was evaluated as the average of the indicators over the elements:

S = (1/N) Σ_{i=1}^{N} S_i.

Figure 7 shows the graphs of the silhouette coefficient for the segments obtained automatically and on the basis of the applied heuristic procedure. It determines how close each point in one class is located relative to the points in an adjacent area. In each experiment, the dataset was split several times, and the silhouette values were grouped by domain. By evaluating the quality of each domain using the silhouette coefficient, it is possible to give an a priori estimate of the classifying model [49,50]. The silhouette coefficient values show that the data obtained using the considered segmentation methods have approximately the same "compactness" properties. At the same time, on average, the silhouette coefficient values are better for the segmented samples than for the entire sample, indicating data uniformity and a possible improvement in data processing within segments. The graphs show that, in the case of a segmented set, the values are better balanced and form a more compact domain compared to the data of the entire set (Figure 8).
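The silhouette computation of Equation 14 and its averaged indicator can be sketched for one-dimensional data as follows. The point values are illustrative, and with only two clusters the minimum average inter-cluster distance b_i reduces to the mean distance to the other cluster.

```python
def silhouette(points, labels):
    """Mean silhouette over all points; distance is the absolute difference."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        other = [abs(p - q) for q, l in zip(points, labels) if l != lab]
        a = sum(same) / len(same)      # mean intra-cluster distance a_i
        b = sum(other) / len(other)    # mean distance to the other cluster b_i
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated segments score close to 1; mixed labels score lower.
points = [1.0, 1.2, 1.1, 9.0, 9.3, 9.1]
compact = silhouette(points, [0, 0, 0, 1, 1, 1])
mixed = silhouette(points, [0, 1, 0, 1, 0, 1])
print(compact > mixed)  # -> True
```

A higher mean silhouette for a segmentation indicates more compact, better-separated subsets, which is how the segmented and whole-sample cases are compared above.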

3-3-Algorithm Evaluation
Two divisions were carried out to evaluate changes in the performance indicators. The entire sample was split into parts containing energy consumption values based on the indicator function and on the concept change detection method. Subsequently, these two splitting methods were used to analyze the classifying algorithms. The statistical properties of the predicted target variables change over time; in the cases under consideration, the tuple data values are affected by a predetermined seasonality factor, and the datasets were specially selected for this. Various algorithms were chosen to assess the impact of the subsets on the quality of the results of the machine learning models: linear discriminant analysis (LD), quadratic discriminant analysis (QD), the naive Bayes classifier (NB), k-nearest neighbors (KNN), the decision tree (DT), and the random forest (RF). The influence of sample segmentation on the qualitative indicators F-measure, accuracy, and AUC for classification and regression tasks was considered. Table 1 gives the results of classifier testing (AUC, the area under the ROC curve; accuracy; and F-measure). In the case of segmentation, on average, there is a decrease in the loss function values compared to the full non-segmented sample. Data segmentation makes it possible to reduce the loss function in different sample areas and to allocate separate segments with a smaller data span, which yields lower values of the loss function on average in the regression problem. The RMSE loss function results for different classifiers and different numbers of segments for the SEED method are presented in Table 5. The allocation of sequence segments of the information flow and the evaluation of their properties allow finding and assigning the machine learning methods with the best characteristics. On individual segments, the methods show lower values of the loss function than when processing the entire sample. The results show that applying the proposed method, where each data sample segment is assigned the method that has the best quality indicators on it, reduces the values of the RMSE loss function by 1% to 8% compared to processing the entire sample.
In the future, to improve quality, it will be possible to use a combination of methods, where each method is assigned to its own segment.
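The per-segment assignment described above can be sketched in a few lines. This is a minimal illustration with toy prediction vectors and hand-picked segment boundaries; the actual experiment uses the classifiers from Table 1 and SEED-detected segments, not these stand-ins.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def assign_best_models(y_true, predictions, boundaries):
    """For each segment [b, e), pick the model with the lowest RMSE.

    predictions: mapping model_name -> full-length prediction list.
    boundaries:  (start, end) index pairs covering the sample.
    Returns (segment, best_model_name, best_rmse) triples.
    """
    assignment = []
    for b, e in boundaries:
        scores = {name: rmse(y_true[b:e], pred[b:e])
                  for name, pred in predictions.items()}
        best = min(scores, key=scores.get)
        assignment.append(((b, e), best, scores[best]))
    return assignment

# Toy data: a "flat" model fits the first segment, a "trend" model the second.
y = [1.0, 1.1, 0.9, 1.0, 2.0, 3.0, 4.0, 5.0]
preds = {
    "flat":  [1.0] * 8,
    "trend": [0.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0],
}
print(assign_best_models(y, preds, [(0, 4), (4, 8)]))
```

Each segment ends up with its own winner, mirroring how the method assigns a different model to each detected segment instead of one model to the whole sample.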

3-4-Results Analysis
A situation arises in which different models achieve the best attainable quality indicators on different segments and on the full sample. Processing quality can then be improved by selecting, for each segment, the algorithm with the best value on it. Thus, selecting data segments and evaluating their properties allow searching for and assigning the machine learning models with the best characteristics. In the same way, it is possible to compare ensembles consisting of several complex models or of elementary algorithms.
In practice, it is not always possible to create a variety of independent models. In the example above, the algorithms are trained on the same sets, which reduces their diversity. It is not always possible to divide the training sample so that the data are random, homogeneous, and independent. As a result, a situation may arise where there is, for example, one "good" and one "bad" algorithm by quality indicators, and the ensemble result will be of worse quality than that of the "good" algorithm alone.
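The "good"/"bad" effect can be checked numerically. The prediction vectors below are made up solely to illustrate the claim: averaging an accurate model with a systematically biased one degrades the accurate model's result.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [2.0, 4.0, 6.0, 8.0]
good   = [2.1, 3.9, 6.1, 7.9]   # small, unbiased errors
bad    = [4.0, 6.0, 8.0, 10.0]  # systematically biased by +2
mean_of_models = [(g + b) / 2 for g, b in zip(good, bad)]

print(rmse(y_true, good))            # ~0.1
print(rmse(y_true, mean_of_models))  # ~1.0, worse than "good" alone
```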
At the same time, the computational costs of aggregating and training a group of complex predictive models are higher than those of training a single classifier. When a concept change or a change in the data properties occurs, this can increase time and computational costs compared to "substituting" a ready-made model. It is not always possible to build models on different combinations of features, for example, when analyzing one-dimensional series, which in turn makes it impossible to achieve model diversity. Averaging the models improves the result only if the models are independent of each other.
The transformation of data properties can occur in information flows with constantly incoming sequence data. As a result, strong classification models trained on historical data may become weak on other time intervals, and vice versa. Such changes in the properties of predictive models can occur over a very short period of time, leading to worse problem-solving quality for an ensemble of models than for a single classifier.

4-1-Main Findings of This Study
One of the main problematic issues in machine learning is processing data whose properties transform over time. Improving the "quality" of processing is usually achieved by forming complex and relatively resource-intensive models, which are labor-intensive and require considerable computational resources to automate. The proposed method is aimed at segmenting the data sample and is based on considering the factors that influence changes in the ranges of the target variables. Segmentation can be implemented automatically using models and methods for detecting concept change and drift points. Identifying these effects makes it possible to form segmented data samples that reflect current situations. For each resulting segment, a pre-trained model can then be selected and assigned depending on the data properties.
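One common way to automate the detection step is a Page-Hinkley-style test for a shift in the stream's mean. The sketch below is a generic detector with illustrative thresholds; it is not the SEED implementation used in the experiments.

```python
# Minimal Page-Hinkley-style detector for upward mean shifts.
# delta and threshold are illustrative tuning values, not paper settings.
def page_hinkley(values, delta=0.05, threshold=5.0):
    """Return indices where an upward mean shift is detected."""
    change_points = []
    mean = 0.0
    cum = 0.0       # cumulative deviation statistic
    cum_min = 0.0   # running minimum of the statistic
    n = 0
    for i, x in enumerate(values):
        n += 1
        mean += (x - mean) / n          # incremental mean
        cum += x - mean - delta
        cum_min = min(cum_min, cum)
        if cum - cum_min > threshold:   # shift detected: reset state
            change_points.append(i)
            mean, cum, cum_min, n = 0.0, 0.0, 0.0, 0
    return change_points

# Synthetic stream whose mean jumps from 0 to 3 at index 50.
stream = [0.0] * 50 + [3.0] * 50
print(page_hinkley(stream))
```

The detected indices split the stream into segments with (approximately) homogeneous properties, which can then each receive their own pre-trained model.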

4-2-Comparison with Other Studies
One way to improve quality is to use models based on refined local information [24]. The analysis is conducted on the individual predictors that have the maximum impact on processing quality. However, quality data tuples can be obtained under the influence of various factors [27], and after some time their properties may transform, requiring additional analysis. Pre-training the models on the segments and evaluating the properties of the obtained sample segments make it possible to assign the most efficient algorithms and classification models to each subset. Assigning to a segment the algorithm with the best quality indicators yields an increase in various quality indicators of 1% to 8% for each classifier. This is comparable to the quality indicators of ensemble models [11,12], which, however, unlike the proposed method, require complex aggregation functions and computing resources for the parallel operation of the data processing models.
In the proposed solution, a separate, best-quality pre-trained model can be selected for each segment, avoiding the cost of aggregating the results of ensemble methods [7,11]. Detected changes in data properties make it possible to switch the assigned model quickly. The proposed method can also be applied as an addition to complex data processing models: sequences are segmented first, improving the qualitative performance of the constituent algorithms.
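The "substitution" idea can be illustrated with a plain lookup: when the detected segment changes, the corresponding pre-trained model is swapped in, with no aggregation step at all. The segment labels and the lambda stand-in models below are invented for illustration.

```python
# Hypothetical registry of pre-trained models keyed by segment label.
pretrained = {
    "winter_working":     lambda x: 0.9 * x,   # stand-in model
    "winter_non_working": lambda x: 0.4 * x,   # stand-in model
}

def predict(segment_label, x, models=pretrained, default=lambda x: x):
    """Route the input to the model pre-assigned to its segment."""
    return models.get(segment_label, default)(x)

print(predict("winter_working", 100.0))   # routed to the 0.9*x model
print(predict("unknown_segment", 100.0))  # falls back to the default model
```

Switching models is a dictionary lookup, which is why a concept change costs far less here than re-aggregating or re-training an ensemble.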

4-3-Implications and Explanation of the Findings
Data sample segmentation provides an opportunity to reduce the loss function on individual segments. The algorithm that searches for trend change points allows selecting individual segments with a smaller data range, which yields lower loss function values on average. Highlighting segments of data stream sequences and evaluating their properties allow identifying the machine learning models with the best performance. On individual segments, algorithms show lower loss function values than when processing the entire sample. By considering the loss function, the model with the best value can be assigned to each segment. Pre-training on samples with similar properties can reduce model preparation time. Analyzing the model results against the actual values of the sequence can be used to generate training data for refining the model. Hierarchies are further possible, where a top-level model assigns the most efficient lower-level model to an individual segment. The proposed solution aims at further improving and extending ensemble methods and hybrid classifiers. It represents a functional engineering technique that improves the quality of individual elements of a data processing model by partitioning the set into subsets.

5-Conclusion
To improve model performance, pre-processing of the data sample can be implemented. Different analyzed segments require separating surfaces of various complexity, which leads to different models performing better on different subsamples. Collecting observation objects is time-consuming, and various shifts in the values of individual parameters can occur within the tuples. Extracted features may lose their relevance if concept drift occurs. In this regard, it is necessary to continuously process incoming data samples and analyze each segment. Information about data properties in the segments strongly depends on how the sample is segmented and separated. Processing these data is necessary to obtain information about class separability, to form a separating surface, and to improve the quality of the classifying algorithm.
Using several models in the form of ensemble methods to improve prediction quality leads to situations where, despite the various ways of combining individual algorithms into a model, the combination may not only fail to improve the result but even worsen it. Such situations must be prevented, which the proposed solution facilitates.

S4. Subsamples S_1, …, S_k are received at the input of the models; their training and the analysis of the achieved quality indicators take place.

S7. The selected model produces the processing result.

S8. The obtained results are compared with the available ones, and their quality indicators are analyzed.

S9. Comparing the model output with the real values allows deciding on the formation of data to refine the algorithm; these data are subsequently added to the training sample.

S10. Information is gathered for updating the training sample.
where N is the sample size and N_i is the size of the i-th segment. Each subset S_i ∈ S, formed as a result of analyzing the action of the factors and used for training the classifying algorithm, can be split into training and control samples, S_i = S_i^train ∪ S_i^control with N_i = m_i + k_i, where j = 1, …, N_i (N_i being the total number of objects in the segment) enumerates the splitting options for the sample S_i, and m_i and k_i are the lengths of the training and control segment subsamples.
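For one segment, the training/control split described above can be realized as a simple index split. The 70/30 ratio below is an assumption for illustration, not a value fixed by the paper.

```python
# Split one segment into training (length m_i) and control (length k_i)
# subsamples, so that m_i + k_i equals the segment size N_i.
def split_segment(segment, train_fraction=0.7):
    """Return (training, control) subsamples of one segment."""
    m = int(len(segment) * train_fraction)  # m_i: training length
    return segment[:m], segment[m:]         # k_i = N_i - m_i

seg = list(range(10))
train, control = split_segment(seg)
print(len(train), len(control))  # 7 3
```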

Algorithm 1 (Figure 2). The general view of the processing algorithm.

Algorithm 2 (Figure 3). Algorithm for the selected processing model.

Figure 5. A subset of working and non-working days' power generation in four-part segmentation: a) full dataset, b) winter segment after segmentation by the membership function, c) «conditional» winter segment after segmentation by the SEED method.

Compared to the data of the entire set, there is a shift in the ranges of the variables. Using information about the individual factors that affect the values, it becomes possible to reduce the range of data change by segmenting the sample. In Figure 6, based on the frequencies of values, a probability density estimation function for working (blue) and non-working (red) time is built for the SEED method and for the membership function.

Figure 6. An example of a probability density estimation function for classes during working and non-working days' power generation: a) full dataset, b) winter segment after segmentation by the membership function, c) «conditional» winter segment after segmentation by the SEED method.
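A density estimate like the ones shown in Figure 6 can be approximated from value frequencies with a normalized histogram. The sample values and bin count below are illustrative, not the paper's power-generation data.

```python
# Normalized histogram as a crude per-class density estimate.
def histogram_density(values, bins=5):
    """Return (bin_start, bin_end, density) triples integrating to 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant sample
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values) * width      # normalizer so densities integrate to 1
    return [(lo + i * width, lo + (i + 1) * width, c / total)
            for i, c in enumerate(counts)]

working     = [5.1, 5.3, 4.9, 5.0, 5.2, 5.4]  # made-up "working" class
non_working = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]  # made-up "non-working" class
for row in histogram_density(working):
    print(row)
```

Computing one such estimate per class per segment shows how segmentation narrows the value ranges and separates the class densities.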

Figure 7. Binary class silhouette for segmentation methods: a) full dataset, b) winter segment after segmentation by the membership function, c) «conditional» winter segment after segmentation by the SEED method.

Figure 8. The "Silhouette" coefficient values for four divisions of the samples (blue: segmentation based on the membership function; red: segmentation based on the SEED method).
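The silhouette comparison in Figures 7 and 8 can be reproduced in miniature for one-dimensional, binary-labeled data. This naive O(n²) version is a sketch with made-up values, not the implementation behind the figures.

```python
# Mean silhouette coefficient for 1-D points with binary labels:
# s(i) = (b - a) / max(a, b), where a is the mean distance to the own
# class (excluding the point itself) and b to the other class.
def silhouette_1d(values, labels):
    def mean_dist(x, group):
        return sum(abs(x - g) for g in group) / len(group)
    scores = []
    for i, (x, lab) in enumerate(zip(values, labels)):
        own = [v for j, (v, l) in enumerate(zip(values, labels))
               if l == lab and j != i]
        other = [v for v, l in zip(values, labels) if l != lab]
        a, b = mean_dist(x, own), mean_dist(x, other)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Well-separated classes give a silhouette close to 1.
vals = [1.0, 1.1, 0.9, 9.0, 9.1, 8.9]
labs = [0, 0, 0, 1, 1, 1]
print(silhouette_1d(vals, labs))
```

Higher silhouette values on a segment indicate better class separability, which is the property the segmentation step is trying to create.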