A Proposed Framework of Knowledge Management for COVID-19 Mitigation based on Big Data Analytic

The COVID-19 pandemic has highlighted the importance of effective knowledge management in mitigating the impact of public health crises. Big data analytics can play a critical role in providing insights and informing decision-making during a pandemic. However, collecting, analyzing, and managing the data is a complex task, especially given privacy and security concerns. This paper proposes a knowledge management framework for COVID-19 mitigation using a big data analytics approach. The framework includes a systematic process for data collection, analysis, and dissemination, as well as a set of best practices for knowledge management, and it complies with data protection and privacy regulations. The proposed framework aims to support public health officials and other stakeholders in managing the COVID-19 pandemic by providing timely and accurate information. It can also be adapted to other public health crises and serve as a useful tool for addressing the challenges of big data analytics in public health. The paper presents the proposed framework in detail and describes how its components can be applied to COVID-19 in Indonesia.

analyzed, have been major challenges, as the data may be incomplete, inconsistent, or inaccurate [4]. Data privacy and security have also raised concerns, because collecting and sharing large amounts of personal data is sensitive by nature [5]. Data integration and standardization, that is, combining data from multiple sources into a consistent form, have been difficult because different countries and organizations use different data systems and formats. Data interpretation and validation are also challenging, since it can be difficult to establish causality and to distinguish correlation from causation. Ethical and legal considerations, such as informed consent, data protection, and intellectual property rights, must be taken into account when collecting and sharing data related to COVID-19. The lack of interoperability between the platforms and systems used to collect, store, and share data makes it difficult for organizations to access and exchange information. Keeping knowledge management up to date with the fast pace of the pandemic, including new strains and therapies, is another significant challenge. Overall, these challenges show how complex it is to manage big data and knowledge during a pandemic; with the right strategies and tools, however, they can be overcome and the available data put to good use.
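To make the integration and standardization problem concrete, the following Python sketch maps daily case records from two sources onto a common schema and flags dates on which the sources disagree. The source names, field names, and date formats are hypothetical assumptions chosen for illustration; they do not describe the actual Ministry of Health or task force feeds.

```python
from datetime import datetime

# Hypothetical field mappings: each source names the same fields differently
# and writes dates in a different format.
SOURCE_SCHEMAS = {
    "ministry": {"date": "tanggal", "province": "provinsi",
                 "confirmed": "kasus_baru", "date_format": "%d/%m/%Y"},
    "task_force": {"date": "date", "province": "region",
                   "confirmed": "new_cases", "date_format": "%Y-%m-%d"},
}


def standardize(record, source):
    """Map one raw record from a known source onto the common schema."""
    schema = SOURCE_SCHEMAS[source]
    day = datetime.strptime(record[schema["date"]], schema["date_format"]).date()
    return {
        "date": day.isoformat(),                                  # ISO 8601 string
        "province": record[schema["province"]].strip().upper(),  # canonical spelling
        "confirmed": int(record[schema["confirmed"]]),            # counts may arrive as text
        "source": source,
    }


def integrate(batches):
    """Standardize all batches; keep the first value per (date, province)
    and list the keys on which the sources disagree."""
    merged, conflicts = {}, []
    for source, records in batches.items():
        for record in records:
            row = standardize(record, source)
            key = (row["date"], row["province"])
            if key in merged and merged[key]["confirmed"] != row["confirmed"]:
                conflicts.append(key)
            merged.setdefault(key, row)
    return merged, conflicts
```

Keeping the first value and reporting conflicts, rather than silently overwriting, leaves the decision about which source to trust to a human reviewer.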
Data plays a crucial role in understanding and mitigating the spread of COVID-19. By collecting, analyzing, and interpreting data on the pandemic, governments and health organizations can make informed decisions on how best to respond to the outbreak [6]. One key use of data in COVID-19 mitigation is tracking and tracing infections. By analyzing data on confirmed cases and their contacts, health officials can identify and isolate infected individuals, as well as implement quarantine measures to prevent the further spread of the virus.
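A minimal sketch of how contact data can be used to decide whom to trace, assuming contacts are available as a simple graph (a dictionary from each person to the people they met). The breadth-first search returns everyone within a chosen number of hops from a confirmed case; the two-hop default and the data layout are illustrative assumptions, not a public health guideline.

```python
from collections import deque


def contacts_to_trace(contacts, confirmed, max_hops=2):
    """Breadth-first search over a contact graph.

    contacts: dict mapping each person to the set of people they met.
    confirmed: iterable of confirmed cases (the search starts from them).
    Returns {person: hops} for everyone within max_hops of a confirmed case.
    """
    hops = {person: 0 for person in confirmed}
    queue = deque(hops)
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the tracing radius
        for other in contacts.get(person, ()):
            if other not in hops:  # first visit is the shortest path in BFS
                hops[other] = hops[person] + 1
                queue.append(other)
    # Confirmed cases themselves (hop 0) are isolated, not traced.
    return {person: h for person, h in hops.items() if h > 0}
```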
Data can also be used to analyze the effectiveness of various mitigation strategies, such as mask mandates, social distancing measures, and lockdowns. Officials can assess the impact of these interventions and make adjustments as necessary by comparing data on case numbers and transmission rates before and after implementation. Another important use of data is modeling the virus's spread and forecasting future case numbers. By analyzing data on transmission patterns and demographic information on infected individuals, researchers can create models to predict the outbreak's course and inform decisions on resource allocation and pandemic response.
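The before-and-after comparison described above can be sketched as follows, assuming a dictionary of daily case counts keyed by date and an illustrative seven-day window on each side of the intervention. As noted in the discussion of data interpretation, such a naive comparison shows correlation only; it does not by itself establish that the intervention caused the change.

```python
from datetime import date, timedelta
from statistics import mean


def before_after(daily_cases, intervention, window=7):
    """Compare mean daily cases in the `window` days before the intervention
    with the `window` days starting on the intervention date.

    daily_cases: dict mapping datetime.date -> confirmed cases that day.
    Returns (mean_before, mean_after, relative_change).
    """
    before = [daily_cases[intervention - timedelta(days=i)] for i in range(1, window + 1)]
    after = [daily_cases[intervention + timedelta(days=i)] for i in range(window)]
    mean_before, mean_after = mean(before), mean(after)
    return mean_before, mean_after, (mean_after - mean_before) / mean_before


# Synthetic example: 100 cases/day before 8 January 2021, 80 from then on.
start = date(2021, 1, 1)
cases = {start + timedelta(days=d): (100 if d < 7 else 80) for d in range(14)}
print(before_after(cases, date(2021, 1, 8)))
```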
Data can also help identify high-risk populations and target interventions to those most in need. By analyzing data on demographics, socioeconomic status, and underlying health conditions, officials can identify communities and individuals at higher risk of severe illness and death and provide targeted interventions to reduce these risks. Overall, data plays a crucial role in understanding the COVID-19 pandemic and making decisions on how to mitigate its spread. By collecting, analyzing, and interpreting data, governments and health organizations can respond effectively to the outbreak, save lives, and reduce its impact on society.
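As an illustration of identifying high-risk groups, the sketch below computes the case fatality rate per group and flags groups whose rate exceeds a multiple of the overall rate. The grouping, the input format, and the 1.5 times threshold are assumptions; raw case fatality rates are also sensitive to the testing and reporting gaps discussed in Section 2.

```python
from collections import defaultdict


def high_risk_groups(cases, ratio=1.5):
    """cases: iterable of (group, died) pairs, where died is True or False.

    Returns {group: case_fatality_rate} for groups whose rate exceeds
    `ratio` times the overall case fatality rate.
    """
    totals, deaths = defaultdict(int), defaultdict(int)
    for group, died in cases:
        totals[group] += 1
        deaths[group] += died  # True counts as 1, False as 0
    overall = sum(deaths.values()) / sum(totals.values())
    rates = {group: deaths[group] / totals[group] for group in totals}
    return {group: rate for group, rate in rates.items() if rate > ratio * overall}
```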
The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has had a significant impact on global health and the economy, resulting in many deaths and widespread disruptions to daily life. Effective knowledge management (KM), which involves capturing, distributing, and effectively using knowledge, is crucial for mitigating the impact of the pandemic. In the context of a public health crisis like COVID-19, KM is essential for understanding the disease's spread and developing effective strategies to control it, such as identifying high-risk populations, developing effective communication strategies, and identifying potential treatments and vaccines.
Big data plays a critical role in KM for COVID-19 mitigation. By collecting, analyzing, and leveraging large amounts of data from sources such as social media, electronic health records, and public health databases, public health officials and researchers can gain insights into the disease's spread and identify potential mitigation strategies. However, using big data for KM in the context of COVID-19 also presents challenges, including data quality, privacy, and the need to analyze and disseminate information quickly enough to inform public health decision-making. An effective KM strategy for COVID-19 mitigation must therefore balance the need for timely and accurate information against privacy, data quality, and responsible data use.

2-The Challenge in COVID-19 Mitigation
Indonesia, being one of the most populous countries in the world, has been greatly affected by the COVID-19 pandemic. The country's government has been collecting and sharing data on the pandemic through various sources, such as the Ministry of Health and the COVID-19 task force [7]. However, the data collection and reporting have been criticized for being inconsistent, incomplete, and unreliable. There have been reports of underreporting of cases and deaths, as well as discrepancies in the data reported by different sources. The government has also been criticized for its handling of the pandemic, particularly for its slow response in the early stages of the outbreak and its lack of transparency in sharing information.
In addition, Indonesia has faced challenges in managing big data and knowledge related to COVID-19 [8]. The lack of a centralized data system and of standardization in data collection and reporting has made the data difficult to analyze and interpret. In recent months, the government has implemented measures to improve data collection and reporting, such as the use of digital platforms and the integration of data from various sources, with the aim of providing more accurate and transparent information to the public. In summary, Indonesia, like many other countries, has faced several problems in collecting, reporting, and managing COVID-19 data: inconsistency and underreporting; a lack of transparency; a lack of standardization; the absence of a centralized data system; limited testing and contact tracing; limited access to healthcare; and limited access to technology and digital platforms. These issues have made it difficult to understand the true extent of the outbreak in Indonesia and to implement effective measures to control the spread of the virus. The government has taken steps to address these problems, but much remains to be done to ensure accurate and transparent data management.
Knowledge Management (KM) plays a critical role in the development and use of COVID-19 models. KM systems can ensure that the data and information used in the models are accurate, up-to-date, and accessible to the right people.
One key aspect of KM in COVID-19 models is data management. KM systems can collect, store, and organize large amounts of data on the pandemic, such as information on confirmed cases, deaths, and transmission patterns. These data can then be used as model inputs, allowing for more accurate and reliable predictions.
Another key aspect of KM in COVID-19 models is the management of knowledge and expertise. KM systems can share information and best practices among researchers and modelers, supporting the development of more robust and accurate models. They can also connect modelers with subject matter experts in fields such as epidemiology and virology, who can provide valuable insights and feedback on the models. In addition, KM can manage and share each model's documentation, assumptions, and results, making the model's development and results transparent and traceable. This is crucial for building trust in, and acceptance of, the models and the decisions based on them.
KM systems can also manage access to and distribution of the models, ensuring that they are available to the right people at the right time. This helps ensure that the models are used effectively in decision-making and that their results inform policy and resource allocation. Overall, KM plays a crucial role in the development and use of COVID-19 models by ensuring that data and information are accurate, up-to-date, and accessible to the right people, and by making the models' development and results transparent and traceable.
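One way a KM system could record model documentation, assumptions, and results while controlling who sees which version is a small model registry. The sketch below is hypothetical: the record fields, the role names, and the rule that a registered version is never overwritten are illustrative assumptions rather than features of a specific KM product.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    """Hypothetical metadata stored for one version of a model."""
    name: str
    version: int
    assumptions: list
    data_sources: list
    metrics: dict = field(default_factory=dict)
    audience: set = field(default_factory=lambda: {"modelers"})  # roles allowed to view


class ModelRegistry:
    """Keeps every registered version so that results stay traceable."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        key = (record.name, record.version)
        if key in self._records:
            # Never overwrite: a changed model must get a new version number.
            raise ValueError(f"{key} already registered")
        self._records[key] = record

    def history(self, name):
        """All versions of a model, oldest first."""
        return sorted((r for r in self._records.values() if r.name == name),
                      key=lambda r: r.version)

    def latest_for(self, name, role):
        """Newest version of `name` that `role` may see, or None."""
        visible = [r for r in self.history(name) if role in r.audience]
        return visible[-1] if visible else None
```

In this design a version can be shared with officials only after review, while modelers keep working on newer, not yet released versions.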

3-1-Current Approach
Research and development activities for big data analysis aimed at mitigating the spread of COVID-19 and increasing awareness of it in Indonesia were initiated through the "Indonesian COVID-19 Response" on March 13, 2020 [7]. Big data were collected and analyzed on a small scale (in a laboratory), drawing on official sources supplemented by data from media portals. A COVID-19 analysis model was developed that supports mapping and portrays the latest conditions; it can be accessed via http://covid19.gamabox.id/analysis. This research served as the baseline for the work presented in this paper. Several studies have analyzed COVID-19 data using different approaches, such as time-series prediction [9], machine learning methods [10,11], and deep learning models [10,11]. These studies investigate factors that significantly affect the spread of the virus and predict future trends. In addition, some studies focus on analyzing COVID-19 cases in specific regions, such as West African countries, using quartic curve estimation models and estimators [12][13][14][15].
Previous research has shown that effective knowledge management (KM) practices are crucial in managing public health crises. KM practices such as knowledge management systems and communities of practice can improve outbreak response by facilitating the sharing of information and expertise among public health professionals. Creating a centralized repository of information and promoting collaboration among stakeholders can improve decision-making during a public health crisis by giving public health officials timely and accurate information. Big data and analytics can also improve KM during public health crises [16][17][18], for example by analyzing social media data to gauge public perception and sentiment during a crisis, or by applying natural language processing techniques to electronic health records to track the spread of infectious diseases.
However, there are also challenges in applying KM practices during public health crises. A lack of standardization and interoperability among KM systems can impede information sharing during an outbreak, and a lack of trust among stakeholders can impede collaboration. In summary, previous research on KM in the context of public health crises has shown the importance of effective KM practices, such as knowledge management systems, communities of practice, and big data analytics, in managing and mitigating the impact of a crisis such as COVID-19. It also highlights challenges that must be addressed, such as a lack of standardization, interoperability, and trust among stakeholders.

3-2-Proposed Method
There are several methods that can be used to collect and analyze big data related to COVID-19, such as social media data collection, electronic health records, and surveys. Social media platforms, such as Twitter and Facebook, can be used to collect data on public perceptions and sentiment related to COVID-19. This can include data on the spread of misinformation, discussions of symptoms and treatment, and overall public sentiment towards the pandemic. Tools such as APIs, web scraping, and sentiment analysis can be used to collect and analyze this data. Electronic health record (EHR) data collection can be used to track the spread of COVID-19 and identify high-risk populations. This data can include information on patient demographics, symptoms, test results, and treatment. Data can be collected from hospital systems, health insurance companies, and other sources. Public health databases, such as the World Health Organization's (WHO) data, can provide information on the number of confirmed cases and deaths, as well as information on outbreaks and risk factors. Surveys can be used to collect data on public perceptions and behaviors related to COVID-19. This can include information on compliance with public health measures such as mask-wearing and social distancing.
Once the data is collected, it can be analyzed using a variety of methods, such as natural language processing (NLP), machine learning (ML), and data visualization. NLP techniques can be used to analyze unstructured data, such as social media posts, to gain insights into public perceptions and sentiment related to COVID-19 [19][20][21]. ML algorithms can be used to analyze large amounts of data, such as EHR data, to identify patterns and trends related to the spread of COVID-19. Data visualization tools can be used to present the results of data analysis in a clear and easy-to-understand manner.
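A deliberately simple, lexicon-based version of the sentiment analysis mentioned above is sketched below. The word lists are toy assumptions for English text; a real study of Indonesian social media would need a validated lexicon or a trained classifier and proper handling of informal language.

```python
import re
from collections import Counter

# Toy lexicons for illustration only (assumption, not a validated resource).
POSITIVE = {"recovered", "safe", "vaccinated", "grateful", "hope"}
NEGATIVE = {"fear", "sick", "lockdown", "hoax", "died"}


def tokenize(text):
    """Lower-case the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment(text):
    """Return +1, -1, or 0 from the balance of positive and negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def summarize(posts):
    """Count positive, negative, and neutral posts."""
    counts = Counter(sentiment(post) for post in posts)
    return {"positive": counts[1], "negative": counts[-1], "neutral": counts[0]}
```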
It is important to note that these methods also require compliance with data protection and privacy regulations, such as HIPAA in the United States and GDPR in Europe, to protect personal information. The findings from the data analysis will vary depending on the specific data sources and methods used. However, some possible insights that may be gained include:
- Identification of high-risk populations: Analysis of electronic health record (EHR) data can reveal the demographics of individuals most at risk of severe illness from COVID-19, such as older adults or those with underlying health conditions.
- Understanding public perceptions and behaviors: Analysis of social media data can provide insights into public perceptions and behaviors related to COVID-19, such as compliance with public health measures like mask-wearing and social distancing.
- Identification of potential hot spots: Analysis of public health databases and social media data can provide information on the spread of COVID-19 and identify potential hot spots where outbreaks may be occurring.
- Identification of potential treatments and vaccines: Analysis of clinical trial data and scientific literature can provide information on the efficacy of potential treatments and vaccines for COVID-19.
- Identification of misinformation: Analysis of social media data can help identify misinformation about COVID-19 and inform the development of strategies to combat its spread.
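The hot-spot idea in the list above can be sketched as a comparison of the latest window of cases against the preceding one for each region. The seven-day window and the 50% growth threshold are illustrative assumptions, not thresholds used in this study.

```python
def find_hot_spots(cases_by_region, window=7, growth_threshold=0.5):
    """Flag regions whose case count in the latest `window` days exceeds the
    previous `window` days by more than `growth_threshold` (0.5 = +50%).

    cases_by_region: dict mapping region -> list of daily cases, oldest first.
    Returns {region: growth} for the flagged regions.
    """
    hot = {}
    for region, series in cases_by_region.items():
        if len(series) < 2 * window:
            continue  # not enough history to compare two full windows
        recent = sum(series[-window:])
        previous = sum(series[-2 * window:-window])
        if previous == 0:
            growth = float("inf") if recent > 0 else 0.0
        else:
            growth = (recent - previous) / previous
        if growth > growth_threshold:
            hot[region] = growth
    return hot
```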
A proposed knowledge management framework for COVID-19 mitigation using big data analytics is shown in Figure 1. It consists of the following stages:
- Data Collection: This stage involves gathering data from various sources, such as electronic health records, social media, and public health databases. The data is then cleaned, pre-processed, and transformed to make it suitable for analysis.
- Data Analysis: This stage involves using big data analytics techniques to extract insights from the collected data. This may include identifying high-risk populations, understanding public perceptions and behaviors, identifying potential hot spots, and identifying potential treatments and vaccines.
- Knowledge Creation: This stage involves creating new knowledge from the insights gained through data analysis. This may include developing predictive models, identifying patterns, and generating hypotheses.
- Knowledge Sharing: This stage involves sharing the knowledge created with relevant stakeholders, such as public health officials, healthcare providers, and researchers. This may include creating reports, dashboards, and other visualizations to make the information easy to understand and use.
- Knowledge Utilization: This stage involves using the shared knowledge to inform decision-making and develop mitigation strategies. This may include targeted outreach to high-risk populations, the development of effective communication strategies, targeted testing and contact tracing, the promotion of effective treatments and vaccines, and combating misinformation.
- Knowledge Evaluation: This stage involves monitoring and evaluating the effectiveness of the knowledge management framework and the mitigation strategies developed. This may include collecting feedback from stakeholders, analyzing the impact of the strategies, and making adjustments as needed.
- Data Governance: This stage involves ensuring compliance with data protection and privacy regulations, such as GDPR and HIPAA. This may include implementing security measures to protect the data, creating policies and procedures for data access and use, and training staff on data governance best practices.
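For the Data Governance stage, one common safeguard is pseudonymization before data are shared for analysis. The sketch below replaces a national ID with a keyed HMAC-SHA256 hash, drops direct identifiers, and coarsens age into ten-year bands. The field names and the choice of identifiers are assumptions; which fields must be removed is a legal question, and pseudonymized data generally still count as personal data under GDPR.

```python
import hashlib
import hmac

# Assumption: these fields are treated as direct identifiers for this illustration only.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def pseudonymize(record, secret_key):
    """Return a copy of `record` that is safer to share for analysis."""
    out = {k: v for k, v in record.items()
           if k not in DIRECT_IDENTIFIERS and k != "national_id"}
    # Keyed hash: the same person always maps to the same ID, but the ID cannot
    # be recomputed from a known national ID without the secret key.
    out["patient_id"] = hmac.new(secret_key, record["national_id"].encode("utf-8"),
                                 hashlib.sha256).hexdigest()
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"  # ten-year band instead of exact age
    return out
```

The secret key should be stored separately from the shared data, for example by the data controller, so that analysts can link records belonging to the same patient without being able to re-identify them.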

4-1-Knowledge Management using Exploratory Data Analytic
Indonesia has 34 provinces, and we decided to focus on the major ones for this project for two reasons. First, the smaller provinces have little data, which would weaken the analysis. Second, the spread of COVID-19 in Indonesia is concentrated mainly on Java Island. The provinces considered in this project are therefore DKI Jakarta, Jawa Barat, Jawa Tengah, Jawa Timur, and Yogyakarta, together with all 34 provinces combined.
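The province selection above could be implemented as follows, assuming each row is a dictionary with hypothetical `date`, `province`, and `daily_confirmed` fields. The sketch keeps the five focus provinces and adds a combined series computed over every province in the input; the province labels must match the spelling used in the actual dataset.

```python
from collections import defaultdict

FOCUS_PROVINCES = {"DKI Jakarta", "Jawa Barat", "Jawa Tengah", "Jawa Timur", "Yogyakarta"}
COMBINED = "All provinces"  # hypothetical label for the combined series


def select_provinces(rows):
    """Keep rows for the focus provinces and append one combined row per date.

    rows: dicts with hypothetical keys 'date', 'province', 'daily_confirmed'.
    The combined series sums every province in `rows`, not only the focus ones.
    """
    combined = defaultdict(int)
    selected = []
    for row in rows:
        combined[row["date"]] += row["daily_confirmed"]
        if row["province"] in FOCUS_PROVINCES:
            selected.append(row)
    selected.extend({"date": d, "province": COMBINED, "daily_confirmed": n}
                    for d, n in sorted(combined.items()))
    return selected
```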
We narrowed the features down to six that we expect to have a significant impact on the recovery rate of COVID-19 in Indonesia: Date, Cumulative Confirmed Cases, Daily Confirmed Cases, Daily Recovered Cases, Daily Death Cases, and Province. Using more features would lengthen processing, so we limited the set to these six. In the first step, we used Google Data Studio to perform an exploratory analysis of the COVID-19 case reports for five provinces in Indonesia, as shown in Figure 2. Users can filter the data by a date range set in the upper right corner of the dashboard. The overview section shows the main figures: Total Daily Confirmed Cases, Total Daily Recovered Cases, and Total Daily Death Cases. The lower part of the dashboard visualizes daily confirmed cases and cumulative confirmed cases, both for each of the five provinces and for their total, which is represented by the blue line. Combining this visualization with the COVID-19 roadmap in Indonesia explained in the first section yields several insights. Some peaks coincide with events in the country, while others do not; we have marked these periods with circles, which are explained below. There was a momentary rise between May 21 and 23, 2020, relative to the previous weeks, with up to 684 total daily confirmed cases; the province with the most cases was Jawa Timur. There was an abrupt rise on July 9, with 1891 total daily confirmed cases, and no event to explain the spike; this time, the province with the most daily confirmed cases was Jawa Barat. Cases followed a slightly declining trend in October 2020, despite the protests in several cities against the Omnibus Law on Job Creation. There was a steady rise during the campaign and election period, averaging 4000 daily confirmed cases, with only a slight difference between DKI Jakarta and Jawa Barat. Cases rose sharply between Christmas Eve and the New Year period, peaking at 10455 daily confirmed cases. President Joko Widodo received two doses of the vaccine on January 13 and 27, 2021, respectively. The monthly trend shows a steady fall from February 2021 onwards.
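The circled peaks were identified visually on the dashboard. A simple programmatic counterpart, sketched below under our own assumptions, sums daily confirmed cases across provinces, records the province with the most cases each day, and flags days whose total exceeds 1.5 times the mean of the preceding seven days. The field names, window, and factor are illustrative, not the procedure used to produce Figure 2.

```python
from collections import defaultdict


def daily_totals(rows):
    """Sum daily confirmed cases per date and record the top province per date.

    rows: dicts with hypothetical keys 'date' (ISO string), 'province',
    'daily_confirmed'. Returns a list of (date, total, top_province), sorted by date.
    """
    totals, top = defaultdict(int), {}
    for row in rows:
        day = row["date"]
        totals[day] += row["daily_confirmed"]
        if day not in top or row["daily_confirmed"] > top[day][1]:
            top[day] = (row["province"], row["daily_confirmed"])
    return [(day, totals[day], top[day][0]) for day in sorted(totals)]


def find_spikes(series, window=7, factor=1.5):
    """Flag days whose total exceeds `factor` times the mean of the previous `window` days."""
    spikes = []
    for i in range(window, len(series)):
        baseline = sum(total for _, total, _ in series[i - window:i]) / window
        day, total, province = series[i]
        if total > factor * baseline:
            spikes.append((day, total, province))
    return spikes
```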

4-2-Knowledge Management using Predictive and Diagnostic Analytic
For the prediction, we used a Jupyter notebook with the Python programming language. Before moving on to visualization and machine learning, we cleaned the data so that it was ready to use, and then performed a brief exploratory analysis of the data trends. As the line chart in Figure 3 shows, cumulative confirmed cases in all five provinces follow an uptrend with no sign of leveling off. Scatter plots of daily confirmed cases give a more detailed view, since they show the raw daily counts; unlike the cumulative series, these can fall from one day to the next. The trends in daily recovered cases and daily death cases are shown in Figures 4 and 5, respectively. To deepen our understanding of the data, we examined the correlation between features by plotting their pairwise relationships together with a correlation heatmap. The heatmap shows that most features are highly correlated with each other, except for daily death cases, whose correlations with the other features are close to 0 (zero).

4-2-1-Prediction Analysis using Machine Learning
Before feeding features into any machine learning algorithm, we first analyze them; in particular, we want to know how many values are missing so that we can pre-process the data accordingly. The package we mainly use is JCop ML. The features have relatively low rates of missing values, except for daily death cases, which we impute with the median value. For modeling, we tried four algorithms with different strategies, ranging from simple linear regression to distance-based (K-Nearest Neighbors), tree-based (Random Forest), and gradient-boosting (XGBoost) models. Each explanation below includes the hyperparameter tuning used: grid search, random search, or manual tuning. To better understand the problem, we also plot actual-versus-predicted plots and residual plots.
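The missing-value check and median imputation can be sketched without any third-party package, assuming rows are dictionaries and missing values are stored as `None`. This is not the JCop ML implementation; in scikit-learn the equivalent imputation step is `SimpleImputer(strategy="median")`. To avoid leaking information, the median should be computed on the training split only.

```python
from statistics import median


def missing_report(rows, features):
    """Fraction of rows in which each feature is missing (None or absent)."""
    n = len(rows)
    return {f: sum(row.get(f) is None for row in rows) / n for f in features}


def impute_median(rows, feature):
    """Return (new_rows, fill_value): copies of `rows` with missing values of
    `feature` replaced by the median of the observed values."""
    observed = [row[feature] for row in rows if row.get(feature) is not None]
    fill = median(observed)
    filled = [{**row, feature: fill if row.get(feature) is None else row[feature]}
              for row in rows]
    return filled, fill
```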

Linear Regression
We started with linear regression because of its simplicity: a simple baseline lets us judge whether more complex models perform well enough to justify their complexity. Using linear regression combined with grid search, we obtained the scores in Table 1. Adding polynomial features fitted our data better, with considerably higher accuracy than the plain linear model. Before moving on to more complex algorithms, we examine the features' importance using the mean score decrease.
As we can see, daily death cases and province have a low impact on the recovery rate. However, since we have only four features, we will not discard these two, to avoid high bias.
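The mean score decrease mentioned above is permutation importance: shuffle one feature column, re-score the fitted model, and average the drop in score over several shuffles. The sketch below assumes a model exposed as a plain `predict(row)` function and uses R² as the score; it is not the implementation used in this study, and scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_score_decrease(predict, X, y, n_repeats=5, seed=0):
    """Permutation importance: mean drop in R^2 when one feature column is shuffled.

    predict: function mapping one row (a list of feature values) to a prediction.
    X: list of rows; y: list of targets. Returns one importance per feature.
    """
    rng = random.Random(seed)
    base = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - r2_score(y, [predict(row) for row in X_perm]))
        importances.append(mean(drops))
    return importances
```

A feature the model ignores gets an importance of exactly zero, while a feature the model relies on produces a large drop; a low but non-zero value, as for daily death cases and province here, means shuffling it costs little accuracy.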

K-Nearest Neighbours
For this model, we tried two types of hyperparameter tuning, random search and grid search; the scores are listed in Table 2. The model overfits severely, which indicates that distance-based algorithms such as K-Nearest Neighbors may not be suitable for our case.
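Why K-Nearest Neighbors can overfit is easy to see with k = 1: assuming no duplicate inputs, every training point is its own nearest neighbour, so the training score is perfect regardless of noise, while the test score is not. The one-dimensional sketch below, with made-up data, computes R² on the training and test sets and their gap, the quantity used in this section to judge overfitting.

```python
from statistics import mean


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean([train_y[i] for i in nearest])


def r2(y_true, y_pred):
    """Coefficient of determination R^2."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def train_test_scores(train_x, train_y, test_x, test_y, k):
    """Return (train R^2, test R^2, gap); a large gap signals overfitting."""
    train = r2(train_y, [knn_predict(train_x, train_y, x, k) for x in train_x])
    test = r2(test_y, [knn_predict(train_x, train_y, x, k) for x in test_x])
    return train, test, train - test


# Made-up data: y = x plus noise on the training set, noise-free test targets.
train_x = list(range(10))
train_y = [0, 3, 1, 4, 2, 6, 5, 8, 7, 9]
test_x = [x + 0.4 for x in range(10)]
test_y = test_x
for k in (1, 3):
    print(k, train_test_scores(train_x, train_y, test_x, test_y, k))
```

Larger values of k smooth the prediction, lowering the training score but usually narrowing the gap, which is what the random and grid searches over k explore.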

Random Forest
This time, we see a significant improvement: compared with our simple linear regression model, accuracy increased by almost 10%. However, the gap between the training and test scores is also almost 10%, indicating considerable overfitting. This is tolerable because we use only four features, and with so few features, bias tends to increase (the number of features and bias are inversely related). We accept this bias-variance tradeoff because our main concern is efficiency, and a gap of about 10% is acceptable, as shown in the results in Table 3.

XGBoost
As we have seen, our data and models tend to overfit, so we expected XGBoost to overfit even more, as the results in Table 4 confirm. Given the model's complexity and the small number of features, it clearly overfits. Furthermore, its accuracy is not significantly different from that of the random forest model. Given its more complex and time-consuming training process, this model is not worth using in production.

5-Conclusion
In summary, the key findings from data analysis in the context of COVID-19 mitigation using big data include identifying high-risk populations, understanding public perceptions and behaviors, and identifying potential hot spots, potential treatments and vaccines, and misinformation. These findings can inform effective mitigation strategies, such as targeted outreach, effective communication strategies, targeted testing and contact tracing, the promotion of effective treatments and vaccines, and efforts to combat misinformation.
For future research on knowledge management for COVID-19 mitigation using big data, we recommend further work on using big data analytics to identify high-risk populations, social media data to understand public perceptions and behaviors, and EHR data to track the spread of COVID-19 and identify hot spots. Research is also needed on KM practices that improve collaboration and information sharing among public health officials and other stakeholders during a public health crisis, and on the ethical and privacy implications of using big data in the context of COVID-19, including methods to ensure compliance with data protection and privacy regulations. Finally, knowledge management strategies will need ongoing monitoring and updating as new information and knowledge become available during this pandemic.

Figure 1. A Proposed Framework of Knowledge Management for COVID-19 Mitigation using Big Data Analytic
4-Result and Discussion on COVID-19 Mitigation using Big Data Analytic

Figure 2. Exploratory Data Analytic of cumulative cases