Enhancing Learning Object Analysis through Fuzzy C-Means Clustering and Web Mining Methods

The development of learning objects (LOs) and e-pedagogical practices has significantly changed the performance of e-learning systems. This development promotes genuine sharing of resources and creates new opportunities for learners to explore them easily. A system for categorizing these objects has therefore become necessary. In this vein, classification theories combined with web mining techniques can highlight the performance of these LOs and make them more useful for learners. This study consists of two main phases. First, we extract metadata from learning objects using web mining techniques such as feature selection, which is mainly used to find the set of features that allows us to build useful models. The key role of feature selection in learning object classification is to identify pertinent features and eliminate redundant ones from a high-dimensional dataset. Second, we group learning objects according to their similarity using Multi-Label Classification (MLC) based on the Fuzzy C-Means (FCM) algorithm. As a clustering algorithm, Fuzzy C-Means is used to improve classification accuracy, with the Euclidean distance as the similarity measure. Finally, to assess the effectiveness of LO classification with FCM, a series of experiments was conducted on a real-world dataset. The findings indicate that the proposed approach outperforms the traditional approach and leads to viable results.

The purpose of this study is two-fold. First, it seeks to establish a theoretical model of learning objects' classification based on a Multi-Label Classification approach combined with a fuzzy logic method. Second, it aims to suggest a new method of sharing objects based on classification by using web-mining techniques. This document is structured as follows: Section one provides an in-depth definition of the main ideas used in this article. Section two offers a summary of learning object classification methods. Section three will present our approach, and finally, Section four will display the experimental results.

2-Preliminaries
Before reviewing the literature for this study, it is essential to identify several fundamental concepts that are the main pillars of learning object classification. These notions are presented as follows. In the first subsection, we shed light on the Multi-Label Classification (MLC) approach, which is very useful in classification theories. We then move to the Fuzzy C-Means (FCM) method and its performance. Web data mining (WDM), a main concept in the classification area, is briefly described in the third subsection. Finally, machine learning is introduced in order to enhance the clustering in our proposal.

2-1-Multi-Label Classification Approach
Multi-Label Classification (MLC) is an automatic process that uses analysis techniques to label objects and classify them by topic [4]. This approach uses a supervised learning method in which an instance may be associated with multiple labels. It is opposed to single-label classification, where each instance is associated with only a single class (label). Furthermore, MLC is widely used in real-world problems such as bioinformatics and e-commerce. Because it copes efficiently with very large data sets and with the difficulty of assigning a single label to an object, MLC plays an important role in the process of learning object classification [4]. However, few studies have explored the MLC problem in the e-learning area.

2-2-Fuzzy C-Means Clustering Algorithm
Clustering is a statistical analysis method used to organize raw data into homogeneous groups [5]. Within each cluster, data are grouped according to a common characteristic. A clustering algorithm measures the proximity between elements based on defined criteria. The purpose of clustering algorithms is to make sense of data and extract value from large amounts of structured or unstructured data. These algorithms segment data based on their properties or functionality and group them into clusters based on their similarities. There are two types of clustering: hierarchical and non-hierarchical [6]. Fuzzy C-Means is an unsupervised, non-hierarchical clustering algorithm that tries to partition a finite collection of elements into a collection of fuzzy clusters with respect to some given criterion [7]. The algorithm can be summarized as follows: it first fixes the value of $c$ (the number of clusters) and selects a value of $m$ (generally between 1.25 and 2), initializes the partition matrix $U$, and then computes the cluster centres and updates $U$, repeating these two computations until convergence is reached. Based on these steps, we have developed the following algorithm:

Step 1: Initialize the matrix $U = [u_{ij}]$, $U^{(0)}$.

Step 2: At iteration $k$, estimate the centre vectors $C^{(k)} = [c_j]$ with $U^{(k)}$ by:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$

Step 3: Update $U^{(k)}$ to $U^{(k+1)}$:

$$u_{ij} = \frac{1}{\sum_{l=1}^{c} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{\frac{2}{m-1}}}$$

Here $x_i$ is the $i$-th of the $N$ data points, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, and $\lVert \cdot \rVert$ is the Euclidean norm.
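The iteration described above (initialize the partition matrix, compute the centres, update the memberships, repeat until convergence) can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: the function name `fuzzy_c_means`, the random initialization, the stopping threshold `eps`, and the iteration cap `max_iter` are all assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `c` fuzzy clusters.

    Returns (centres, U), where U[i][j] is the membership of point i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Step 1: random partition matrix; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])

    centres = []
    for _ in range(max_iter):
        # Step 2: each centre is a mean of the points weighted by u_ij ** m.
        centres = []
        for j in range(c):
            weights = [U[i][j] ** m for i in range(n)]
            denom = sum(weights)
            centres.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / denom
                for d in range(dim)
            ))

        # Step 3: update memberships from Euclidean distances to the centres.
        new_U = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centres]
            if min(dists) == 0.0:
                # The point coincides with a centre: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_U.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_U.append([v / total for v in inv])

        # Assumed stopping rule: largest membership change below eps.
        change = max(abs(a - b)
                     for r_new, r_old in zip(new_U, U)
                     for a, b in zip(r_new, r_old))
        U = new_U
        if change < eps:
            break
    return centres, U
```

Each row of the returned matrix sums to 1, so a learning object can belong to several clusters at once with different degrees of membership. The membership update is written in the equivalent normalised form, where each membership is proportional to the distance raised to the power $-2/(m-1)$.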

2-3-Web Data Mining
The Web Data Mining concept was defined in 1996 as "a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data" [8]. The fundamental objectives of web data mining can be summarized as follows: to bring invisible information to the fore, to take into account the volume of web data, to transform the massive amount of web data into expert knowledge, and to provide valuable knowledge to users. Despite the numerous attempts to characterize this field, the term "web data mining process" is frequently used for a combination of techniques from various disciplines, including data analysis, artificial intelligence, and machine learning [9]. A typical web data mining process can be described in three successive steps: data preparation (pre-processing), pattern discovery, and pattern analysis.
The pre-processing phase includes the cleaning operations needed to normalize LO metadata. In other words, it reduces the data dimension through tasks that eliminate extra information such as stop words, double adjectives, etc. In the second phase, the information prepared in the pre-processing step is passed to the extraction methods, in which all data are labelled using machine learning algorithms. Finally, in the analysis phase, the set of appropriate patterns is presented by degree of similarity [10].
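As an illustration of the pre-processing step, the following sketch lowercases a metadata string, tokenises it, and drops stop words and repeated terms. The function name `preprocess_metadata`, the tiny stop-word list, and the tokenisation rule are assumptions made for this example; they are not taken from the study.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with", "is"}


def preprocess_metadata(text):
    """Lowercase, tokenise, and drop stop words and repeated terms.

    Terms are returned in the order in which they first appear.
    """
    # Keep hyphenated words such as "e-learning" as a single token.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    seen = set()
    terms = []
    for tok in tokens:
        if tok in STOP_WORDS or tok in seen:
            continue
        seen.add(tok)
        terms.append(tok)
    return terms


title = "An Introduction to the Theories of E-learning and the E-learning Systems"
print(preprocess_metadata(title))
# -> ['introduction', 'theories', 'e-learning', 'systems']
```

The reduced term list is what a later labelling or clustering step would consume in place of the raw metadata string.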

2-4-Machine Learning
Machine learning is an artificial intelligence technology that allows computers to learn without being explicitly programmed [11]. More specifically, it consists of allowing algorithms to discover "patterns" in data sets. These data can be numbers, words, images, statistics, etc. In the machine-learning field, algorithms are divided into two categories: supervised and unsupervised [12]. In the case of supervised algorithms, the data used for training are already "labelled". Therefore, the machine-learning model already knows what it should look for (pattern, element, etc.) in these data. At the end of training, the trained model is able to find the same elements in unlabelled data. Among supervised algorithms, we distinguish between classification algorithms (non-numerical predictions) and regression algorithms (numerical predictions), depending on the problem to be solved. Unsupervised learning algorithms, on the other hand, train the model on data without labels. In this case, the machine goes through the data without any indications and tries to discover recurring patterns. This approach is commonly used in fields such as cyber security, information retrieval, and indexing. Among the unsupervised models, we distinguish clustering algorithms (to find groups of similar objects), association algorithms (to find links between objects), and dimensionality reduction algorithms (to select or extract features) [13]. In our study, we use dimensionality reduction and clustering algorithms because our objective is the extraction and reduction of the learning objects' dimensions based on their metadata. To do that, we have implemented the Naïve Bayesian algorithm. Naïve Bayesian classification is a simple probabilistic classification method based on Bayes' theorem with strong independence assumptions [14]. In our case, the Naïve Bayesian classifier is used as follows:

$$P(\text{class} \mid LO) = \frac{P(LO \mid \text{class})\, P(\text{class})}{P(LO)}$$

where $P(\text{class} \mid LO)$ is the probability that a learning object $LO$ belongs to a class $C$, $P(LO)$ is the probability of observing the learning object, $P(\text{class})$ is the probability of the class in the whole collection, based on the total number of learning objects in all sections, and $P(LO \mid \text{class})$ is the probability of the learning object given a specific class. A learning object can be modelled as a set of terms, so $P(LO \mid \text{class})$ can be written as:

$$P(LO \mid \text{class}) = \prod_{i} P(O_i \mid \text{class})$$

where $P(O_i \mid \text{class})$ is the probability that term $O_i$ of a given learning object appears in class $C$, which can be determined as follows:

$$P(O_i \mid C) = \frac{n_{iC} + \lambda}{N_C + \lambda V}$$

where $n_{iC}$ is the number of times the term $O_i$ appears in category $C$, $N_C$ is the number of terms in category $C$, $V$ is the size of the vocabulary table, and $\lambda$ is a positive constant, normally 1 or 0.5, used to prevent zero probabilities.
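The smoothed term probabilities and the product over terms can be sketched as a small classifier. The class name `NaiveBayesLO`, its method names, and the toy data in the usage lines are hypothetical; the sketch works with log-probabilities and omits $P(LO)$, which is the same for every class and therefore does not affect which class ranks highest.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesLO:
    """Naive Bayes over metadata terms with additive smoothing (lam)."""

    def __init__(self, lam=1.0):
        self.lam = lam  # positive smoothing constant, e.g. 1 or 0.5

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # learning objects per class
        self.term_counts = defaultdict(Counter)   # term frequencies per class
        for terms, label in zip(docs, labels):
            self.term_counts[label].update(terms)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def log_posterior(self, terms, label):
        # log P(class) + sum_i log P(O_i | class), up to the constant log P(LO).
        counts = self.term_counts[label]
        total_terms = sum(counts.values())
        n_objects = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_objects)
        for t in terms:
            p = (counts[t] + self.lam) / (total_terms + self.lam * len(self.vocab))
            score += math.log(p)
        return score

    def predict(self, terms):
        return max(self.class_counts, key=lambda c: self.log_posterior(terms, c))


# Toy usage with made-up metadata terms.
docs = [["quiz", "exercise", "feedback"], ["quiz", "test"],
        ["theory", "lecture", "slides"], ["lecture", "concept"]]
labels = ["practice", "practice", "presentation", "presentation"]
model = NaiveBayesLO(lam=1.0).fit(docs, labels)
print(model.predict(["quiz", "feedback"]))  # -> practice
```

Thanks to the smoothing constant, a term never seen in a class still receives a small non-zero probability instead of forcing the whole product to zero.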

3-1-Learning Object
In recent years, there has been an intense debate about creating modern and effective digital teaching materials. These materials are often described as "Learning Objects" (LOs). The main idea behind the creation and schematization of LOs as specific pedagogical tools is not far from the usual school materials traditionally used by teachers in the classroom. What distinguishes LOs from other types of documents is their digital form, their creation in a computer environment, and their special features that allow them to be searched in a specific repository. These objects, which are recognized in the international scientific community as LOs, present significant diversity. In 2006, A. Robertson attempted to schematize existing approaches: "For some, it is a numerical or non-numerical entity that can be indexed for learning purposes" [15]. LOs can be associated with content objects, educational objects, information objects, and knowledge objects [16]. As codified by Dodani [17], LOs must have the following characteristics in order to function adequately at an individual level and to be easily and effectively used and transformed in different educational environments:
• They are small, self-contained units of learning that offer a concept, information, or a process. These entities are distributed after having been previously tested and evaluated.
• They are described by "metadata" that allows us to classify and search for them.
• They are combined with other LOs to create complex educational entities, such as a set of concepts.
In practice, a learning object can be a web page, an image, a simulation, a test, or any other type of element involved in learning. Learning objects are not limited to courses or training content. A learning object can also refer to a procedure or guidelines that help learners along their academic pathway.

3-2-Classification of Learning Object
Many research studies have classified learning objects detected in e-learning systems by using classification techniques [18,19]. In Albreiki et al. [20], the authors shed light on the main classifiers that have proven efficient in e-learning systems. The authors then propose a model that combines decision trees, neural networks, and Naïve Bayesian methods into a single module. Several studies on the classification of learning objects have been conducted using different resources and properties [3]. Anantharaman et al. [21] introduced a new concept of learning object classification based on Long Short-Term Memory (LSTM) and the Random Forest classification approach. In this article, the authors provide an overview of the application of data mining methods in the e-learning process with web-based learning. However, with multi-label classification paradigms, only nine papers in the literature have developed LOs in e-learning systems. These papers offer a variety of approaches for incorporating feature-label correlations into learning objects' metadata in order to improve accessibility and recommendation at the same time. These works concentrate on how to adapt formal single-label classifiers to multiple labels. Working on a model for multi-label classification and ranking of learning objects, the authors of López et al. [4] addressed the search for LOs described by Learning Object Metadata (LOM). More specifically, the model provides a methodology for the multi-label mapping of LOs to different kinds of queries. In Aldrees & Chikh [22], researchers identify learning items by comparing and contrasting four multi-label classification systems. Carrillo et al. [23] investigate hierarchical multi-label categorization in the context of recommender systems. They propose a hierarchical multi-label metadata categorization with a machine-learning method to enhance the search and classification of educational resources. Additionally, this study contributes to previous research by providing a hierarchical multi-label learning object dataset in an appropriate format. Table 1 presents a review of the significant research articles that have adopted MLC methods for learning object content using LO metadata.

Table 1. Research articles that have adopted MLC methods for learning objects using LO metadata

| Reference | Title |
|---|---|
| González et al. (2017) [24] | Automatic classification of learning objects to reducing the number of used features |
| Batista et al. (2011) [25] | A System for multi-label classification of learning objects |
| López et al. (2012) [4] | A model for multi-label classification and ranking of learning objects |
| Anantharaman et al. (2018) [21] | Modelling an adaptive e-learning system using LSTM and random forest classification |
| Rani et al. (2020) [27] | Multi-label classification of learning objects using machine learning algorithms |

4-Our Approach
In this article, our starting point was the idea that semantically similar terms are used in similar contexts. This raises two questions. How can we manipulate e-learning resources such as courses, videos, and pedagogical support in a flexible manner? And how can we explore and reuse these resources efficiently in different areas? For the first question, we adopted the learning object philosophy, in which every resource can be analyzed and described according to the IEEE learning object standard (metadata, taxon, identifier, etc.). To explore these metadata, we focus on web mining techniques and the Fuzzy C-Means clustering algorithm. Web mining techniques refer to all the techniques aimed at exploring, processing, and analyzing large amounts of information related to web activity, while FCM divides numerical data into clusters. This idea can be applied to learning objects by taking the metadata of each object, which contain a set of words that co-occur in a snippet with a target term, and applying FCM to calculate the distance, and hence the similarity, between the words [28]. For the second question, we use multi-label classification so that learning objects can be reused in different contexts.
Our methodology is composed of two stages, as indicated in Figure 1. In the first phase, known as the learning phase, our algorithm generates a list of learning objects based on the extraction performed by web mining techniques and the similarity computed by FCM. In the second phase, our algorithm classifies the learning objects listed in the first phase according to their degree of similarity and labels them with multi-label classification in order to reuse objects and propose the top recommendations to learners. With this approach, we pursue two objectives: improving the similarity of the terms analyzed and extracted by web mining, and optimizing the metadata of each learning object to facilitate its use.
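One possible way to connect the two phases is to turn the fuzzy membership degrees of a learning object into a set of labels and then rank objects for recommendation. The sketch below is only an assumption about how this could be done: the membership threshold, the fallback to the single best label, and the function names `memberships_to_labels` and `top_n_for_label` are hypothetical and are not described in the source.

```python
def memberships_to_labels(membership, label_names, threshold=0.3):
    """Return every label whose membership reaches `threshold`.

    Labels are ordered from strongest to weakest membership. If no label
    reaches the threshold, the single best label is returned.
    """
    ranked = sorted(zip(label_names, membership),
                    key=lambda pair: pair[1], reverse=True)
    chosen = [name for name, u in ranked if u >= threshold]
    return chosen or [ranked[0][0]]


def top_n_for_label(object_ids, memberships, label_index, n=3):
    """Rank learning objects by their membership in one cluster and keep the top n."""
    ranked = sorted(zip(object_ids, memberships),
                    key=lambda pair: pair[1][label_index], reverse=True)
    return [obj for obj, _ in ranked[:n]]


names = ["presentation", "practice", "conceptual"]
print(memberships_to_labels([0.55, 0.35, 0.10], names))
# -> ['presentation', 'practice']
```

With a threshold, an object that sits between two clusters receives both labels, which is what makes it reusable in more than one context.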

5-1-Corpora Summary
Data from several companies were selected to conduct the present study. These data consist of presentation, practice, and conceptual-model learning objects in a variety of categories. Table 2 shows the descriptions of the learning objects used in our study.

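Table 2. Descriptions of the learning objects used in our study

| Learning object type | Description | Example |
|---|---|---|
| Presentation object | Transmits specific information about a specific subject | Theories of e-learning systems |
| Practice object | Practice with feedback | Quizzes, exercises, MCQs |
| Conceptual model | Summary of some subjects with related concepts | Visual representation of phenomena |
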
The data were collected from three platforms: Moodle, Blackboard, and Schoology (Figure 2).
Moodle is a free and open-source learning platform. It offers specific solutions to various educational needs, such as Moodle App, Moodle Education, Moodle Net, Moodle Workplace, and Certified Integrations [29].
Blackboard is one of the most popular names in the digital learning area. The platform provides the core learning management features and can also manage online and blended classes [30].
The Schoology platform aims to provide all the tools that professors need to design lessons, communicate with their students, and collaborate with other educators [31].

5-2-Experimental Procedure and Performance Measure
To evaluate the performance of our approach, we used a set of indicators expressed as mathematical formulas: classification accuracy, precision, recall, and F1-measure. These indicators are generally used to examine the performance of a proposed system compared to existing ones. They are described as follows:

5-2-1-Classification Accuracy
Classification accuracy shows how many of the predictions are correct.

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Number of all predictions}} \qquad (7)$$

In some situations, accuracy serves as a good measure, while in others it is insufficient. For instance, an accuracy of 94% indicates that 94 out of 100 samples were predicted correctly, but it does not tell us which ones.

5-2-2-Precision and Recall
Going beyond classification accuracy, the precision and recall measures give us a clearer picture of how well a model performs. The task and our objectives determine which one we should favour.

5-2-3-Precision
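Precision measures how good our model is when the prediction is positive, that is, the fraction of positive predictions that are correct:

$$\text{Precision} = \frac{TP}{TP + FP}$$

where $TP$ is the number of true positives and $FP$ the number of false positives.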

5-2-4-Recall
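Recall measures how good our model is at correctly predicting positive classes, that is, the fraction of actual positives that are found:

$$\text{Recall} = \frac{TP}{TP + FN}$$

where $TP$ is the number of true positives and $FN$ the number of false negatives.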
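The indicators in this section can be computed from the counts of true positives, false positives, and false negatives. The sketch below is illustrative only: the function name `classification_metrics` and the toy labels are assumptions, and the F1-measure is computed with its standard definition as the harmonic mean of precision and recall, which the text names but does not define.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against empty denominators (no positive predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 1 = relevant learning object, 0 = not relevant.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy example there are three true positives, one false positive, and one false negative, so all four indicators equal 0.75.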

[Figure: axes labelled "Learning objects" and "Number of Learning Objects"]

5-3-Dimensionality Reduction
The selection criterion is mainly aimed at selecting the most relevant model, the one that suggests the most appropriate learning objects. For this, classification techniques are very beneficial, both for the management systems that contain these objects and for the learners, who benefit from a simple and fast search. In addition, a good classifier transforms the initial data into a lower-dimensional representation by discarding the large amount of data that is not relevant.
In our article, we evaluate the validity of our approach by comparing its performance indicators (PR: precision, RE: recall, and F1-measure) with those of traditional machine learning algorithms for classification and suggestion, such as SVM and Naïve Bayesian. For this purpose, we use Python as a software platform, in which all these machine learning algorithms are already implemented. In our approach, we first test the performance on the data with SVM, and then test the same data with SVM combined with multi-label classification and FCM. To confirm the precision of our suggested approach, the same data were used with NB (Naïve Bayesian) and with NB combined with multi-label classification and FCM.
The results appear in Table 2 and show that our approach performs better for the classification of learning objects than other learning algorithms on different corpora. The computed results of the different classifiers in Table 3 confirm that SVM combined with the fuzzy clustering method and multi-label classification for data reduction generally performs better than NB and its variants.

6-Conclusion
This paper addresses the problem of classifying learning objects based on multi-label classification and Fuzzy C-Means in order to improve the recommendation of these objects to learners and facilitate their handling and sharing by e-learning systems. For this purpose, we have organized all these e-learning resources according to the IEEE learning objects standard. We then used web mining techniques to explore them, especially their metadata. Our objective in using the multi-label classification approach in the e-learning system is to make all the learning objects reusable in different contexts. Moreover, the multi-label classification approach may enhance the performance of the top-n recommendations. The Fuzzy C-Means technique is used mainly to calculate the similarity between learning objects and consequently reduce the massive amount of data by ignoring the items that are not appropriate. To examine the efficacy of our proposed approach, we have used data sources from three platforms (Moodle, Blackboard, and Schoology), which are among the most widely used e-learning platforms. To evaluate the performance of the proposed approach, we have used the most widely known indicators in the area of classification and recommendation: precision, recall, and F1-measure. Using preprogrammed Python libraries, we have compared traditional machine learning algorithms such as SVM and NB with our approach on the same data. The results of this study show that the proposed method classifies learning objects better than conventional techniques.

7-2-Data Availability Statement
Data sharing is not applicable to this article.

Figure 1. Illustration of our design with the Multi-Label Classification algorithm combined with Fuzzy C-Means and web-mining techniques


• TP (True positive): predicting a positive object as positive (ok).
• FP (False positive): predicting a negative object as positive (not ok).
• FN (False negative): predicting a positive object as negative (not ok).