Batch and Streaming Data Ingestion towards Creating Holistic Health Records

The healthcare sector has been moving toward Electronic Health Record (EHR) systems that produce enormous amounts of healthcare data, driven by the increased emphasis on getting the appropriate information to the right person, wherever they are, at any time. This highlights the need for a holistic approach to ingest, exploit, and manage these huge amounts of data for achieving better health management and promotion in general. This manuscript proposes such an approach, providing a mechanism that allows all health ecosystem entities to obtain actionable knowledge from heterogeneous data in a multimodal way. The mechanism includes diverse techniques for automatically ingesting healthcare-related information from heterogeneous sources that produce batch/streaming data, and for managing, fusing, and aggregating this data into new data structures, i.e., Holistic Health Records (HHRs). The latter enable the aggregation of data coming from different sources, such as Internet of Medical Things (IoMT) devices and online/offline platforms. To construct the HHRs effectively, the mechanism develops various data management techniques covering the overall data path, from data acquisition and cleaning to data integration, modelling, and interpretation. The mechanism has been evaluated on different healthcare scenarios, ranging from hospital-retrieved data to patient platforms, combined with data obtained from IoMT devices, and has produced useful insights towards its successful and wide adoption in this domain. To implement a paradigm shift away from heterogeneous and independent data sources, limited data exploitation, and health records, the mechanism combines multidisciplinary technologies toward HHRs.

the authors in Johnson and Khoshgoftaar [3] where they compare data-level and algorithm-level deep learning methods across different class distributions, but also could improve the general quality of healthcare. The Personal Health Records (PHRs) and EHRs of today are, however, a long way from what the public views as being valuable to their health. This is in line with the view of 80% of the population, who believe that health encompasses more than just the absence of disease and also includes a range of components of daily life, such as the environment, an active and healthy lifestyle, nutrition, and mental and emotional health, as described by García [4], who also discusses the fact that corruption is embedded in health systems.
It would be beneficial to collect this data and link it to other data in EHRs and PHRs to learn more about the effectiveness of patient pathway management, diseases, and the results of prevention measures and health policies. As a result, today's delivery of sustainable healthcare services and platforms is based on data exchange across heterogeneous healthcare systems with a focus on healthcare management. This is discussed in Cave et al. [5], where the authors describe how the growing volume and complexity of data that is currently being recorded across a variety of settings and devices makes it possible to better characterize illnesses, treatments, and the performance of pharmaceutical goods in various healthcare systems. The healthcare sector generates massive volumes of irrelevant data, from ordinary patient treatment to record keeping, which by themselves have little real value, as discussed in Asah et al. [6], where the authors note that healthcare organizations misplace their attention on what and how they should learn. As a result, there is a growing need to develop methodologies and procedures for successfully integrating and merging such heterogeneous data.
At the same time, this data nowadays derives either from data sources that contain historical or already captured data (e.g., online and offline platforms, hospitals' and laboratories' databases containing citizens' healthcare data) or from data sources producing real-time data (e.g., Internet of Medical Things (IoMT) devices that automatically measure and monitor various medical parameters of the human body in real time). Although a plethora of methods and techniques already exists for automatically capturing historical or already captured data in batches, this is not the case for the ingestion of real-time (i.e., streaming) data. As a result, current assisted living solutions need to be enhanced to support such functionalities, since citizens have their personal IoMT devices to monitor their individual parameters (e.g., body temperature, breathing activity) and track their daily activities (e.g., distance walked, calories burned), giving recommendations for improving their lifestyle and their personal activities in their living environments, as well as preventing the onset of health-related problems, as described in the overview of the IoMT in Vishnu et al. [7]. All these devices should be uniformly discoverable and able to be integrated with the various existing healthcare platforms. However, the existing IoMT devices are in most cases highly heterogeneous, since they have diverse capabilities, functionalities, and characteristics. In such cases, it becomes essential to offer abstractions of these devices to both the platforms and the end-users and to develop tools to handle the interoperability among them, as provided in Noura et al. [8], where it is discussed that IoT interoperability, or the capacity for numerous IoT platforms from different suppliers to coexist, is being supported by a number of academic, business, and standards groups in order to facilitate smooth resource sharing between different IoT vendors. Therefore, the first challenge that arises refers to the heterogeneity of the existing IoMT devices in combination with the difficulty that the existing healthcare systems/platforms have in communicating and interacting with these devices.
On top of this, interlinking the data from such heterogeneous devices with citizens' EHRs and PHRs could create a comprehensive picture of individuals' health parameters, thus detecting conditions that could lead to health deterioration and triggering the corresponding interventions by healthcare professionals, resulting in more effective preventive care. In addition, Miorandi et al. [9] discuss that, through the use of the right information and communication technologies, the IoT anticipates a time when it will be possible to connect digital and physical entities, opening up a whole new range of services and applications. Mapping clinical information to other data about citizens' lives could create several advantages and benefits for better decision-making and for identifying prevention strategies' outcomes, illnesses, and clinical pathways' efficiency [10]. All these highlight the need for a holistic approach to gather and exploit the vast amounts of healthcare data for achieving better health management and patient outcomes, the prevention of diseases, effective and targeted policy making, and health promotion in general. Hence, the challenge that emerges is to merge all the available data in order to exploit the advantages of community knowledge, by constructing new data structures that contain data of any type and category that is relevant to a citizen's overall health (i.e., medical, nutritional, social care data, lifestyle, etc.).
Considering all these challenges, by effectively gathering and integrating data from both individuals' EHRs and PHRs, as well as from their personal IoMT devices, collective community knowledge could be extracted, serving the dual goal of collecting, fusing, and analyzing information from different entities and of extracting valuable knowledge towards the provision of actionable insights at the point of care. To address such gaps and requirements, this manuscript introduces a mechanism that aims to integrate methodologies for a paradigm shift from heterogeneous and independent data sources, limited data exploitation, and health records (i.e., EHRs and PHRs) to complete integrated data views via Holistic Health Records (HHRs). The latter include a newly proposed data model and structure that enable the aggregation of real-time and batch data coming from different sources. To effectively construct the HHRs, the mechanism develops various data management techniques that cover the complete data lifecycle, from the collection of the heterogeneous data until its aggregation, fusion, and linking. In more detail, the mechanism consists of the functions of Data Ingestion, through which it may connect to many heterogeneous data sources and gather their data, and Data Processing, where it can process the external healthcare data it receives and store it in its internal datastore. The proposed mechanism has been evaluated through diverse scenarios that provide different datasets, ranging from hospital-retrieved data to patient platforms, combined with data obtained from IoMT devices and data derived from external data sources, proving its applicability and overall efficiency.
The remainder of the paper is structured as follows. Section 2 describes the overall architecture of the proposed mechanism, depicting all its components, combined with the intercommunications among them to achieve heterogeneous healthcare data integration towards the construction of the HHRs. Section 3 evaluates the reference implementation of the mechanism against a specific healthcare scenario, whereas Section 4 discusses its effectiveness and overall contribution. Finally, Section 5 contains the conclusion of this manuscript, outlining our next plans.

2-1-Architecture
In this section, a blueprint of the proposed mechanism is presented, along with the internal process that takes place for its seamless interaction and integration with either streaming data sources (i.e., IoMT devices) or batch data sources (i.e., external systems and platforms), as depicted in Figure 1. In short, the mechanism consists of the operations of two (2) discrete pillars: (i) Data Ingestion, in which the mechanism can connect to the various heterogeneous data sources and collect their data, and (ii) Data Processing, where the mechanism is able to process the received external healthcare data and store it in its internal datastore. Finally, it must be noted that, to perform the mechanism's operations, it is assumed that the citizens concerned own an IoMT device, whereas the external batch data sources contain historical personal data of the corresponding citizens.

2-1-1-Data Ingestion
The Data Ingestion pillar is responsible for undertaking all the functionalities that are related to the integration, anonymization, and verification of the incoming healthcare data. In this pillar, the mechanism initially takes as input data coming from known and unknown sources. The unknown sources refer to streaming data sources (i.e., IoMT devices like wearable IoT devices), whereas the known sources refer to sources whose data, at rest, already exists in diverse healthcare datastores and is considered trustworthy and reliable without needing further inspection.
To be more specific, a wearable device (i.e., an IoMT device) owned by a citizen is regarded as an unknown source for the mechanism. Thus, the incoming data is immediately delivered to the Trust & Reputation component at the start of this pillar. This component collects the required reputation and trust ratings for the specified device from an existing datastore of trust evaluation models and generates feedback based on those ratings for the associated input. After the evaluation, the mechanism ranks the unknown device, deciding whether the device will be characterized as trustworthy or not, and thus whether it will be allowed to connect to the mechanism.
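The accept/reject decision described above can be sketched as follows. This is a minimal Python illustration, not the authors' Java implementation: the text does not specify the trust evaluation model, so the rating scale (0 to 1), the averaging rule, the threshold of 0.5, and all function and variable names are assumptions made for the example.

```python
from statistics import mean

# Hypothetical threshold; the source does not state how ratings are combined.
TRUST_THRESHOLD = 0.5


def is_trusted(ratings, threshold=TRUST_THRESHOLD):
    """Return True if the mean of the collected ratings reaches the threshold.

    A device with no ratings is treated as untrusted (an assumption).
    """
    if not ratings:
        return False
    return mean(ratings) >= threshold


def admit_device(device_id, ratings_store):
    """Look up the ratings of a device and decide whether it may connect."""
    ratings = ratings_store.get(device_id, [])
    return "connect" if is_trusted(ratings) else "reject"
```

Here `ratings_store` stands in for the datastore of trust evaluation models; a plain dictionary that maps a device identifier to a list of ratings is enough to exercise the sketch.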
Depending on the data source type that has been connected (i.e., either known or unknown sources) and the corresponding way that must be used for ingesting its data (i.e., streaming collection for unknown sources and batch collection for known sources), the mechanism flow has two (2) distinct paths, as follows.
In the first path (streaming ingestion for unknown sources), the Plug'n'play Sources component is invoked first. This component provides multiple methodologies for integrating new streaming data sources into the mechanism during runtime and for gathering all their data. To accomplish that, it exploits the approach proposed in Mavrogiorgou et al. [11] to interact with the different heterogeneous IoMT devices of unknown nature and ingest their data. More specifically, three (3) layers of this approach are exploited, namely the Devices Connection, the Devices Type Recognition, and the Devices Data Collection. In the first layer (i.e., Devices Connection), all the IoMT devices available for connection are identified and connected to the mechanism via a provided Bluetooth interface, which collects their characteristics (specifications) and Application Programming Interfaces (APIs). However, the mechanism can interact and communicate only with devices that offer open APIs (i.e., give public access to the methods that they include), since the devices that are based on private APIs do not publicly provide information about their incorporated methods. As soon as this connection happens, the mechanism retrieves the device's name and Media Access Control (MAC) address, which are provided as inputs to the MAC Vendors API [12], revealing the name of the manufacturer of the IoMT device. Having identified the manufacturer of the device, the mechanism gathers information about the offered APIs from the manufacturer's website. Consequently, it gets information about (i) the API Uniform Resource Locator (URL) paths that provide access to the different methods of the API and (ii) the descriptions of the API methods. Subsequently, in the second layer (i.e., Devices Type Recognition), the mechanism, by applying the approach of Mavrogiorgou et al. [13], calculates the syntactic similarity between the connected IoMT device and a list of already recognized IoMT devices, based on their specifications (i.e., manufacturer and name), in order to assign the connected IoMT device the device type of the already recognized (known) device whose specifications are most similar.
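The type recognition step can be illustrated with a short sketch. It assumes a particular similarity measure (the ratio computed by `difflib.SequenceMatcher` from the Python standard library on lower-cased names), which is not necessarily the metric of the approach in [13]. The first two device names come from the evaluation in Section 3; the third entry and all function names are hypothetical.

```python
from difflib import SequenceMatcher

# Already recognized devices (name -> type). The first two entries appear in
# the evaluation of this manuscript; the third is a hypothetical placeholder.
KNOWN_DEVICES = {
    "iHealth Ease": "Blood pressure monitor",
    "iHealth View": "Blood pressure monitor",
    "Example Gluco X": "Glucometer",  # hypothetical
}


def name_similarity(a, b):
    """Case-insensitive syntactic similarity in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recognize_type(device_name, known=KNOWN_DEVICES):
    """Return (best_matching_name, its_type, similarity) for a new device."""
    best = max(known, key=lambda name: name_similarity(device_name, name))
    return best, known[best], name_similarity(device_name, best)
```

Calling `recognize_type("iHealth Feel")` selects the most similar known name and returns its type. Because the metric is an assumption, the similarity values need not coincide with the percentages reported in the evaluation.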
Afterwards, in the third layer (i.e., Devices Data Collection), since the type of the connected IoMT device has automatically been identified, the mechanism finds the specific purpose of each distinct API method of the device's manufacturer. To do so, it determines the semantic similarities between the descriptions of all the manufacturer's available API methods and the descriptions of the API methods of the IoMT devices that the mechanism has already recognized, following the process provided in Kiourtis et al. [14]. Following this step, the methods for obtaining the device's data are extracted. Finally, the API calls of the connected IoMT device that were found to collect the device's data may include distinct functions and therefore collect diverse data from the IoMT device, whereas the user may not want to retrieve all this data. For this reason, the mechanism informs the user about the methods that can be utilized, and the user selects which of them she would like to use to gather data from the device. The user must fill in her personal information so that it can be verified before her personal data is allowed to be exported and delivered through the offered consent interface. This information consists of her unique login information for the online account of the relevant device manufacturer from which she desires to send her data. Thus, she is eventually given the option to transfer her data to the mechanism if the personal information she enters matches that of her personal account. After the completion of this process, all the data is provided to the Gateway component to be transported into the remaining architecture pipeline.
In the second path (batch ingestion for known sources), only the Gateway component is invoked, where both communication and connectivity problems are resolved simultaneously, to collect the data from the connected known sources (i.e., sources that are already considered reliable). In this context, the Gateway provides a unified and abstracted API [15] that collects information from several data sources (and, as a result, from several interface implementations), including but not limited to healthcare organizations, sensors, mobile applications, and laboratories. It facilitates the resolution of the connectivity and communication challenges with such information sources, ensuring the interaction with the rest of the internal components of the mechanism.
After the data has been successfully passed into the mechanism, the Data Anonymization component anonymizes all the ingested data by exploiting the ARX anonymization tool [16]. It should be noted that for the known sources, when requested, the whole data anonymization procedure may take place within the organizations that provide the healthcare data, to protect the data, comply with privacy policies, and avoid possible security issues that would arise in case of a network transmission. Hence, in this scenario, the data is anonymized at the level of these entities before entering the mechanism.

2-1-2-Data Processing
The Data Processing pillar is responsible for transforming and cleaning all the ingested data and for constructing the corresponding HHRs, an extension of EHRs that contains data of any type and category that is relevant to a citizen's overall health (i.e., medical, nutritional, lifestyle, social care data, etc.). As soon as this process is complete, the pillar stores all the acquired information within the internal datastore of the mechanism for future usage either by the mechanism itself or by the involved users of the mechanism (analyzed in Section 2.3). At the beginning of this pillar, the Data Conversion component retrieves all the data ingested by the Gateway component and implements two (2) functionalities to make all this data interoperable in terms of both structure and terminology, translating it into the HHR FHIR format [17].
The first functionality, following the approach of Kiourtis et al. [14], performs the semantic transformation of the incoming data, particularly wherever there is a need to translate among terminologies used within different data models, or to perform other kinds of semantic operations. In this context, it transforms the raw data into HHR FHIR format using the HHR model [18] that is produced by the HHR Creation component. Although thousands of medical data models exist [19], they mainly target the integration of data from clinical trials. Hence, the proposed HHR model represents in a conformant way all the data required by the underlying data sources that refers to the same citizen. It is implemented through an eXtensible Markup Language (XML)-based language developed for the HHR model, which makes it possible to describe the structure of the HHR types in a machine-interpretable way and to align them with the structure of the corresponding FHIR resources [20].
The second functionality of the Data Conversion component is responsible for identifying the semantics and the terminologies (e.g., SNOMED CT, LOINC, ICD-9, ICD-10) of the transformed HHR data. It also interacts with the provided data to understand the terminologies and to perform terminological mappings of the content of this data among different terminology systems, thus providing a common view of the ingested data. To accomplish this translation, the mechanism offers a collection of operations over terminologies that are described using the HL7 FHIR specifications. These operations include: (i) Value Set Expansion, (ii) Concept Lookup / Decomposition, (iii) Value Set Validation, (iv) Subsumption Testing, (v) Batch Validation, (vi) Batch Translation, and (vii) Maintaining a Closure Table. This allows for the provision of several functionalities (semantics) about these information elements located within more complicated structures.
As soon as all the obtained data has been converted into HHR FHIR format, and because it is critical to have confidence in the "freshness" and suitability of the newly created information, the generated HHR FHIR data, along with relevant historical data that is retrieved from the internal datastore, is sent to the Data Cleaning component to be cleaned. To achieve that, this component follows a specific procedure [21] in order to: (i) ensure that the data measurements adhere to established business rules or constraints by identifying problems related to conformity to specified requirements, (ii) correct/remove any problems found throughout the validation procedure, (iii) ensure that the provided dataset is accurate, complete, and complies with all needed fields and required attributes (required fields which cannot be empty), and (iv) ensure that the information provided is accurate.
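A minimal sketch of such rule-based validation is given below. It is not the procedure of [21]: the required field, the accepted range, and all names are hypothetical, and out-of-range values are only flagged, since the text does not specify how they are corrected. The placeholder -999999 is the default value that the Gateway writes for empty values (Section 3.1).

```python
MISSING = -999999  # placeholder the Gateway writes for empty values

# Hypothetical rules: which fields are mandatory and which ranges are valid.
REQUIRED_FIELDS = ("diagnosis_code",)
VALID_RANGES = {"heart_rate": (20, 250)}


def clean(records):
    """Validate records; return (kept_records, log_of_actions)."""
    kept, log = [], []
    for index, record in enumerate(records):
        # Rule 1: a record with an empty required field is removed.
        missing = [f for f in REQUIRED_FIELDS
                   if record.get(f) in (None, "", MISSING)]
        if missing:
            log.append((index, "removed", missing))
            continue
        # Rule 2: numeric values outside the accepted range are flagged.
        out_of_range = [
            f for f, (low, high) in VALID_RANGES.items()
            if f in record and not low <= record[f] <= high
        ]
        if out_of_range:
            log.append((index, "flagged", out_of_range))
        kept.append(record)
    return kept, log
```

The returned log plays the role of the alerts raised by the component: each entry names the record, the action taken, and the fields that caused it.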
Subsequently, all the cleaned HHR FHIR data is sent to the Data Aggregation component. The aggregation functionality offered by this component communicates with the internal datastore of the mechanism to aggregate and finally store all the ingested and processed data. Hence, it gathers all the input HHR FHIR data and aggregates it into the appropriate HHRs, storing them in the datastore. It should be emphasized that the HHRs are converted into tuples and stored in the data tables of the relational schema of the datastore, which was created in accordance with the entity-relationship definition of the HHR model, rather than being saved as raw HHR documents.
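The idea of storing HHRs as tuples in relational tables, rather than as raw documents, can be sketched as follows. The two-table schema, the column names, and the use of SQLite are assumptions made for illustration only; the actual schema follows the entity-relationship definition of the HHR model, which is not reproduced here.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a two-table schema (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citizen (group_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE observation ("
        " group_id TEXT REFERENCES citizen(group_id),"
        " code TEXT, value TEXT)"
    )
    return conn


def aggregate(conn, records):
    """Attach each record to its citizen, creating the citizen if absent."""
    for rec in records:
        conn.execute(
            "INSERT OR IGNORE INTO citizen (group_id) VALUES (?)",
            (rec["group_id"],),
        )
        conn.execute(
            "INSERT INTO observation (group_id, code, value) VALUES (?, ?, ?)",
            (rec["group_id"], rec["code"], rec["value"]),
        )
    conn.commit()
```

Records that share a group identifier end up linked to the same citizen row, which mirrors the correlation of new data with already existing patients.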
To sum up, the proposed mechanism encompasses several data elements that are ingested, processed, stored, potentially updated, and analysed to successfully collect citizens' personal data deriving from different data sources (either of known or of unknown nature), and construct their corresponding HHRs, based upon all their existing data.

2-2-Time Journey of Data
As described so far, the mechanism encompasses several data elements that are ingested, stored, potentially updated, and analysed. These data elements refer to heterogeneous types of data, such as raw data, historical data, created HHRs, etc., as depicted in Figure 2. The latter provides a snapshot of how these data elements are correlated in terms of their time journey across the mechanism and the potential time frames of their updates and processing. Since one of the main objectives of the mechanism is to be able to collect the incoming data in either a streaming or a batch way, Figure 2 also distinguishes the time journey of the ingested data according to the way it is collected.

Figure 2. Data time journey
In more detail, as depicted in Figure 2, the data time journey follows the two (2) key pillars of the mechanism: Data Ingestion and Data Processing. In the context of Data Ingestion, the mechanism gathers a segment of medical data from various available unknown data sources (i.e., sensors and IoMT devices, online tools) in real time, in a streaming way, while another segment of data is gathered offline in a batch way (i.e., historical data and health records). As depicted in the figure, some of this data is updated frequently, while other data is updated infrequently or even sporadically. After the data collection, in Data Processing, the Data Conversion component is invoked, followed by the Data Cleaning component, which takes as an additional input the historical data that exists in the internal datastore of the mechanism, in order to apply the corresponding removal/corrective actions. Subsequently, the Data Aggregation component aggregates all the ingested data into the formulated HHRs (through the HHR Creation component), which are finally stored in the internal datastore.

2-3-Involved Users
In the healthcare ecosystem, multiple end users can provide their data and benefit from the field's results. In the context of the proposed mechanism, there is a plethora of users that can take advantage of the mechanism's functionalities, as depicted in Figure 3. More specifically, these users include healthcare professionals, healthcare providers, and citizens. The most critical stakeholders among them are the citizens, since the whole ecosystem of the mechanism has been built on medical data that is provided by them. Apart from the citizens, a major role in this concept is played by the healthcare providers. The latter provide healthcare diagnosis and treatment services, while the healthcare professionals provide healthcare advice and treatment according to formal experience and training. In both cases, these users can retrieve from the mechanism the HHRs that have been constructed for the required citizens, thus obtaining a complete view of their health and making the appropriate diagnosis and care treatment decisions.

3-Performance Evaluation
In this section, the performance of the core components of the proposed mechanism is analyzed, investigating its feasibility and efficiency in the healthcare domain. In detail, we focus on evaluating the effectiveness of the operation of the Data Ingestion and Data Processing pillars. It should be noted that the evaluated components have been developed in Java SE. For our proof of concept, both pillars are implemented on a desktop PC equipped with an Intel i7-4790 at 3.60 GHz and 16 GB RAM, running Windows 10.

3-1-Data Ingestion
For the evaluation of the Data Ingestion pillar, we created a representative use case that analyzes critical information for the technical infrastructure of the mechanism and performs all the steps described in Section 2. The use case has been populated with data from the CareAcross platform [22], which aims to connect cancer patients with peers and doctors. The chosen dataset deals with the condition of breast cancer and represents the known source that feeds the mechanism. A snapshot of the underlying original dataset coming from the CareAcross platform is depicted in Table 1. The data provided by the CareAcross platform is fully anonymized by the source itself, due to (i) GDPR regulations, (ii) ethical concerns, and (iii) the terms of the CareAcross service agreed to by the patients. Thus, it is immediately delivered as input to the Gateway component as raw data in Comma Separated Values (CSV) format, bypassing the Data Anonymization component. Apart from the known source, the iHealth Feel device [23] is utilized, serving as the connected IoMT device of unknown source type for the purposes of the experiment. Thus, in this pillar, following the process described in Section 2.1.1, through the Plug'n'play Sources component, the mechanism starts by retrieving the device's required specifications (i.e., name and MAC address), while the fields of device manufacturer and type are unknown, having not yet been identified by the mechanism (Table 2). It should be noted that this device belonged to one of the patients of the CareAcross platform.

Table 2
Name | MAC address | Manufacturer | Type
iHealth Feel | 00070D621C4B | Unknown | Unknown

Then, through the MAC Vendors API, the mechanism identifies the manufacturer of the connected device, which is iHealth, and automatically retrieves from the iHealth API information regarding: (i) the API methods, in terms of their requested URL paths and available endpoints, and (ii) the API methods' specific functionalities, in terms of their general descriptions (Table 3). In this context, it should be noted that since the mechanism is not yet aware of the type of the connected device, it retrieves from the iHealth API all the methods for all the supported types of devices (i.e., glucometers, blood pressure monitors, activity trackers, pulse oximeters, and thermometers). Once all the methods have been retrieved, the mechanism finds the type of the connected device. To achieve that, the Plug'n'play Sources component calculates the syntactic similarity between the connected IoMT device's name and the name of each already known device from its internal list. When all the comparison combinations have been computed, the mechanism arrives at the results of Table 4, which depicts the top-5 calculated percentages of similarity among the devices' names.

Table 4 (surviving rows)
Device name | Device type | Similarity to iHealth Feel
iHealth Ease | Blood pressure monitor | 83%
iHealth View | Blood pressure monitor | 77%

Based on the captured results, the estimated similarity between iHealth Ease and iHealth Feel is the highest among the calculated percentages (highlighted in grey), and the device type of iHealth Ease is known to the mechanism to be a blood pressure monitor (since it already exists in its list of recognized IoMT devices). Therefore, the unknown device iHealth Feel is identified to be a blood pressure monitor as well. Once the type is recognized, out of all the retrieved iHealth API methods, the ones that correspond to devices of type "blood pressure monitors (BPM)" are used (Table 3). Then, the available methods are displayed to the user so that she can decide which one she prefers to use to send her data to the mechanism. It should be noted that for the user to have access to these methods, she must successfully log in to her iHealth personal account, as described in Section 2.1.1. In this example, the user successfully logs in to her account and retrieves the chosen data of the device, which was all the blood pressure data of the device. A snapshot of the collected data of iHealth Feel is depicted in XML format in Figure 4.

Figure 4. Data ingested from the iHealth Feel IoMT device
Once the Gateway component retrieves all the sent data (both from the CareAcross platform and from the connected IoMT device, iHealth Feel), it converts it into XML format so that it can be processed and understood by the other components of the mechanism. At the same time, the Gateway component is responsible for replacing possible missing/empty values of the received data with the default value of -999999. A snapshot of the current output of the Gateway component is presented in Figure 5, referring to the first two entries of the received dataset from the CareAcross platform.
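The conversion performed by the Gateway can be sketched as follows, assuming a well-formed CSV input whose column names are valid XML element names. The element names `records` and `entry` and the function name are hypothetical, and the output is not meant to reproduce the layout of Figure 5; only the default value -999999 is taken from the text.

```python
import csv
import io
import xml.etree.ElementTree as ET

MISSING = "-999999"  # default value stated in the text


def csv_to_xml(csv_text):
    """Convert CSV rows to an XML string, filling empty cells with MISSING.

    Column names are assumed to be valid XML element names.
    """
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        for column, value in row.items():
            cell = ET.SubElement(entry, column)
            # Empty or whitespace-only cells receive the default value.
            cell.text = value.strip() if value and value.strip() else MISSING
    return ET.tostring(root, encoding="unicode")
```

For example, `csv_to_xml("patient_id,diagnosis\n1,ER positive\n2,\n")` yields two `entry` elements, the second of which carries the default value in its `diagnosis` element.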

3-2-Data Processing
Subsequently, in the Data Processing pillar, the Data Conversion component takes as input all the provided raw data in XML format and converts it to HHR format. For that purpose, it uses the HHR model provided by the HHR Creation component in combination with the converter provided within the Data Conversion component, to convert all the XML data into HHR XML format. Additionally, during this phase, the used codes are automatically translated into a set of agreed, commonly used standardized terminologies, so that different codes from different countries are translated into a common terminology. In particular, the code "Oestrogen Receptor (ER) positive" is translated to the formal terminology of ICD-10 using the code "Z17.0 Estrogen receptor positive status [ER+]", allowing different datasets to coexist in the mechanism and enabling the successful usage of their contained data. A snapshot of the output that is provided by the Data Conversion component is depicted in Figure 6, illustrating the transformation of the first two entries of the received raw XML data from the CareAcross platform into the corresponding HHR XML data. In short, the XML file of Figure 6 is mapped to the HHR format, where the "system" XML element represents the source of the data, the "identifiers" element represents the diagnosis and group identifiers, the "type" element represents the ICD-10 mapped terminology of the "Oestrogen Receptor (ER) positive" value, and the "type" attribute of the "member" element indicates that this listing refers to a specific type of diagnosis. The same procedure is applied to the XML data retrieved from the connected IoMT device, producing the corresponding XML file.
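The terminology translation step can be illustrated with a simple lookup table. This is only a sketch under stated assumptions: the mechanism relies on terminology operations described in the HL7 FHIR specifications, whereas the dictionary-based lookup, the shape of the returned coding, and the function name below are hypothetical. The single mapping entry is the example given in the text.

```python
# Illustrative mapping table; its single entry is the example from the text.
LOCAL_TO_ICD10 = {
    "Oestrogen Receptor (ER) positive": (
        "Z17.0",
        "Estrogen receptor positive status [ER+]",
    ),
}


def translate(local_term, table=LOCAL_TO_ICD10):
    """Return a coding dict for a local term, or None if it is unmapped."""
    match = table.get(local_term.strip())
    if match is None:
        return None
    code, display = match
    return {"system": "ICD-10", "code": code, "display": display}
```

Returning `None` for unmapped terms leaves the decision about untranslatable codes to the caller, for instance the cleaning step that follows.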

Figure 6. Output of CareAcross data from the Data Conversion component
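
As an illustration of the terminology translation and HHR mapping described above, the following Python sketch builds one "member" element. The element names "system", "identifiers", "type", and "member" follow the description in the text, but their nesting, the "group" child element, the "code" attribute, and the lookup table are assumptions made for this sketch and are not the converter used by the Data Conversion component.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: source label -> (ICD-10 code, display text).
# Only the ER-positive entry is taken from the text.
TERMINOLOGY_MAP = {
    "oestrogen receptor (er) positive": (
        "Z17.0",
        "Estrogen receptor positive status [ER+]",
    ),
}


def translate_code(label):
    """Return (code, display) for a source label, or None if unmapped."""
    key = " ".join(label.lower().split())  # normalize case and whitespace
    return TERMINOLOGY_MAP.get(key)


def to_hhr_member(system, group_id, diagnosis_label):
    """Build an HHR-style 'member' element (assumed layout)."""
    member = ET.Element("member", attrib={"type": "diagnosis"})
    ET.SubElement(member, "system").text = system
    identifiers = ET.SubElement(member, "identifiers")
    ET.SubElement(identifiers, "group").text = group_id
    type_el = ET.SubElement(member, "type")
    mapped = translate_code(diagnosis_label)
    if mapped is not None:
        code, display = mapped
        type_el.set("code", code)
        type_el.text = display
    else:
        # Keep the original label so that no information is lost.
        type_el.text = diagnosis_label
    return member


member = to_hhr_member("CareAcross", "G1", "Oestrogen Receptor (ER) positive")
hhr_xml = ET.tostring(member, encoding="unicode")
```

Normalizing the label before the lookup is one simple way to tolerate differences in case and spacing between sources; a production converter would more likely rely on a maintained terminology service.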
Afterwards, in the same component, the converted HHR data is converted into FHIR data, to be compliant with the HL7 FHIR standard. For that reason, the Data Conversion component takes as input the previously converted HHR XML data and, by combining it with the same HHR model provided by the HHR Creation component, converts all the HHR XML data into HHR FHIR XML format. A snapshot of the final output of the Data Conversion component is illustrated in Figure 7, outlining the first two entries of the HHR XML data after their transformation into the corresponding HHR FHIR XML data. In short, the XML file of Figure 7 is mapped to the HHR FHIR format, where the "Group" XML FHIR element represents the source of the data and the diagnosis and group identifiers, the "coding" element represents the ICD-10 mapped terminology of the "Oestrogen Receptor (ER) positive" value, and the "category" element indicates that this listing refers to a specific type of diagnosis. The same procedure is applied to the XML data retrieved from the connected IoMT device, producing the corresponding XML file.

Figure 7. Final output of CareAcross data from the Data Conversion component
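
The second conversion step can be sketched in Python as follows. This is a simplified illustration, assuming a FHIR-style "Group" element that carries the identifier and an ICD-10 "coding"; it is not the converter of the mechanism, it omits the "category" element mentioned above, and its output has not been validated against the HL7 FHIR specification. The identifier system URI and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"
ICD10_SYSTEM = "http://hl7.org/fhir/sid/icd-10"  # assumed coding system URI


def _child(parent, tag, value=None):
    """Append a child; FHIR XML stores primitives in a 'value' attribute."""
    element = ET.SubElement(parent, tag)
    if value is not None:
        element.set("value", value)
    return element


def hhr_to_fhir_group(source_uri, group_id, code, display):
    """Return a FHIR-style Group resource as an XML string (sketch only)."""
    group = ET.Element("Group", attrib={"xmlns": FHIR_NS})
    identifier = _child(group, "identifier")
    _child(identifier, "system", source_uri)
    _child(identifier, "value", group_id)
    _child(group, "type", "person")
    _child(group, "actual", "true")
    code_el = _child(group, "code")
    coding = _child(code_el, "coding")
    _child(coding, "system", ICD10_SYSTEM)
    _child(coding, "code", code)
    _child(coding, "display", display)
    return ET.tostring(group, encoding="unicode")


# Hypothetical identifier values, for illustration only.
fhir_xml = hhr_to_fhir_group(
    "urn:example:careacross",
    "G1",
    "Z17.0",
    "Estrogen receptor positive status [ER+]",
)
```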
As soon as all the provided data has been successfully transformed into HHR FHIR XML format, it is fed to the Data Cleaning component, which applies data quality assessment techniques. This step covers data cleaning, originality, and quality checks, and informs the mechanism of any faults or errors that have occurred. In this case, an alert is triggered, stating that the acquired dataset contains an undefined value, and for that reason the specific measurement is erased (cleaned). In particular, the second entry of the dataset did not contain the code of the diagnosis and was erased. A snapshot of the output of this process is illustrated in Figure 8, depicting the cleaned version of the second entry of the dataset in HHR FHIR XML format. Regarding the iHealth Feel device, its data contained two (2) erroneous attributes (the values of HP and HR, which were outside the accepted ranges), which were successfully corrected by the mechanism. All the applied cleaning actions are depicted in Table 5.

Since the data is now fully cleaned and interoperable, it is ready to be aggregated with any already existing measurements of the same patient and finally stored in the internal datastore of the mechanism. Taking as input the cleaned HHR FHIR data in XML format, the Data Aggregation component accepts and extracts the relevant data based on the existing HHR model. It then checks whether the data already exists in the mechanism and, based on the specific group identifier, correlates it with the corresponding patients already existing in the mechanism. In this case, the Data Aggregation component outputs the same HHR FHIR data in XML format, since no correlation took place, indicating that the patients had not previously provided any data to the platform. Next, the internal datastore takes as input the HHR FHIR data in XML format and stores it in the form of a relational schema that is compliant with the ER schema of the constructed HHR model.
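
The cleaning and aggregation logic described above can be sketched as follows, using plain Python dictionaries instead of HHR FHIR XML for brevity. The field names ("group_id", "diagnosis_code"), the alert wording, and the in-memory store are hypothetical. The sketch only covers the removal of entries with undefined values and the correlation by group identifier; the correction of out-of-range device values is not shown, since the text does not state how it is performed.

```python
MISSING_VALUE = "-999999"  # placeholder inserted upstream by the Gateway


def clean_entries(entries, required_field):
    """Drop entries whose required field is undefined.

    Returns (kept_entries, alerts), with one alert per removed entry.
    """
    kept, alerts = [], []
    for index, entry in enumerate(entries):
        value = entry.get(required_field)
        if value is None or value == "" or value == MISSING_VALUE:
            alerts.append(
                f"entry {index}: undefined '{required_field}', entry removed"
            )
        else:
            kept.append(entry)
    return kept, alerts


def aggregate_by_group(existing, incoming, key="group_id"):
    """Merge incoming entries into a copy of the store, keyed by group id."""
    merged = {group: list(items) for group, items in existing.items()}
    for entry in incoming:
        merged.setdefault(entry[key], []).append(entry)
    return merged


# Hypothetical entries mirroring the scenario in the text:
# the second entry lacks its diagnosis code.
entries = [
    {"group_id": "G1", "diagnosis_code": "Z17.0"},
    {"group_id": "G2", "diagnosis_code": MISSING_VALUE},
]
kept, alerts = clean_entries(entries, "diagnosis_code")
store = aggregate_by_group({}, kept)  # empty store: no prior patient data
```

With an empty store, as in the scenario above, aggregation simply passes the cleaned entries through, which matches the reported outcome that no correlation took place.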

4-Discussion
According to the architecture depicted in Section 2 and the experimental results stated in Section 3, the proposed mechanism achieved its main purpose of designing, implementing, and providing a data integration healthcare framework for exploiting non-homogeneous healthcare data while addressing different healthcare aspects. In more detail, the mechanism incorporates technologies for a paradigm shift from heterogeneous and independent data sources, from siloed exploitation of data, and from health records (i.e., EHRs and PHRs), to complete integrated data views through the HHRs. This makes it feasible to precisely target treatment on a personalized basis, since it enables the derivation of fine-grained patient profiles from HHR data, which complements the increasingly detailed characterization of disease correlations. The information gathered by the mechanism expedites the identification of new possibilities and diagnostics. With patient consent on sharing in-depth information on treatments, symptoms, and outcomes, anyone who is interested in medical research can sample information, looking at different variables to see who responds to what treatments. In this context, the mechanism allows patients to share information about treatment outcomes, creating a functioning system in which errors, best practices, and effective treatments become visible and connected. In this way, it supports the vision of a learning healthcare model where every patient's experience informs the next patient's experience, so that the system is learning in real time. On top of that, the healthcare system is constantly evolving, and to keep up with the changes, learning healthcare models are becoming more and more popular. The Agency for Healthcare Research and Quality (AHRQ) developed a series of case studies to help health system chief executive officers and stakeholders better understand the concept of a learning health system and the value of making investments in transformation [24]. Building this understanding is part of the AHRQ's ongoing effort to accelerate learning and innovation in healthcare delivery to ensure that people receive the highest quality, safest, and most up-to-date care.
Among its main advantages, the mechanism allows healthcare professionals and caregivers to synchronously monitor the progress of their patients, allowing better coordination of their care. What is more, the mechanism is a valuable tool in the day-to-day operations of people across the healthcare spectrum. In summary, it gives healthcare providers a more effective way to administer care through better planning and fewer pointless consultations, and to better prepare treatment and prescription recommendations, as all their data are stored in one location, the HHRs. In particular, it facilitates well-informed decision-making through the continuous and substantive flow of the data described above. Healthcare professionals can access all the available knowledge related to each patient they are treating. In this direction, the mechanism contributes to the shift from acute-based to community-based care by providing improved access to patient-related information across disciplines. Thus, from prevention to follow-up, care delivery may be integrated across the continuum of services and coordinated across all sites. Finally, it contributes to reducing the financial pressure on healthcare systems through a more active and healthier population, since the emphasis on preventive care and on managing chronic diseases can help to keep people healthy and out of the hospital, ultimately leading to cost savings.
Among its indirect impacts, by effectively gathering data from individuals' EHRs and PHRs as well as from personal IoMT devices, collective community knowledge could be extracted, serving a dual goal: (i) fusing, collecting, and analyzing information from multiple entities to gather valuable knowledge towards providing actionable insights at the point of care, and (ii) paving the way for targeted and efficient policy making at all levels. The impact of such solutions using collective community knowledge in the domain of healthcare is apparent, since, according to the research of Busse et al. [25], information sharing has changed their overall approach towards healthy life. This fact highlights the need for a holistic approach to exploit the huge amounts of existing healthcare data for achieving better health management and patient outcomes, personalized medicine, the prevention of diseases, effective and targeted policy making, and health promotion in general. In particular, effective and targeted policy making is nowadays essential due to various virus outbreaks (e.g., COVID-19) [26,27]. Building a model of healthcare that would meet all patients' needs is one of the main challenges to be faced. To achieve this, the 7th World Health Assembly (WHA) declared Universal Health Coverage (UHC) a priority [28]. This is an initiative to provide equitable access to quality health services for all, at affordable costs, by the year 2030 as part of the Sustainable Development Goals (SDGs). Currently, the UHC initiative is led by the World Health Organization (WHO), in collaboration with the governments of different countries. At present, only around one third of the world's population is covered under basic health insurance schemes. Consequently, such schemes and policies are vital: recognizing, for example, that the symptoms of COVID-19 may be mild, the development of pragmatic policies both for healthcare professionals and for patients who have respiratory illness should be considered [29].
It is worth mentioning that the overall environment of the mechanism has been designed and implemented in such a way that it allows several cases of extensibility. First, it allows for extensibility in terms of new datasets, since the functionalities of the Gateway, the Data Aggregation, and the Data Conversion components enable new datasets to be directly ingested into the internal datastore by following the corresponding path, finally being represented in the mechanism as HHRs. Apart from this, the mechanism allows for extensibility in terms of new data sources, since the architecture introduces the Plug'n'play Sources component, which allows new (i.e., unknown) data sources to be identified and mapped to specific (i.e., known) data sources, and thus facilitates data acquisition from these sources. As soon as these new data sources are identified, the overall data ingestion flow can be followed as described in the first extensibility case above.

5-Conclusions
The current manuscript has further described and examined a mechanism that provides a holistic environment, allowing the health ecosystem to collect and analyze actionable knowledge from healthcare data in a multi-modal way. The mechanism includes techniques for obtaining information from heterogeneous data sources, fusing it, and aggregating it into new data structures (i.e., HHRs). This set of information provides the groundwork and the support for the health ecosystem entities, opening opportunities for successfully achieving personalized medicine and disease prevention, and effectively leading to a reduction in readmission rates. The crux of the mechanism is the adaptation to unknown devices, which offers undeniable interoperability and assertions in the ecosystem. Having designed and quantitatively evaluated the mechanism through different use case scenarios, it was proven that the mechanism is a turnkey solution towards the successful collection and integration of either streaming or batch data from heterogeneous data sources. We anticipate that the findings of this work will help in the development of plans, frameworks, and tools for boosting the data management and interoperability capabilities of the healthcare ecosystem and improving patient care.
The findings of this paper's research can be expanded in a variety of ways in subsequent work [30]. We created and constructed a prototype for the mechanism's proof-of-concept implementation. Different use case scenarios were used to evaluate the mechanism's applicability in terms of gathering and processing data from heterogeneous real-world data sources, having a variety of data formatting, analytics requirements, knowledge to be provided in HHRs, focus

Figure 1. Overall architecture of the proposed mechanism

Figure 5. Output of CareAcross data from the Gateway component

Figure 8. Output of CareAcross data from the Data Cleaning component