SleepCon: Sleeping Posture Recognition Model using Convolutional Neural Network

Recognition of sleep patterns and postures has sparked interest in various clinical applications. Sleep postures can be monitored autonomously and continuously to provide useful information for decreasing health risks. Existing systems mostly train models on images drawn from many sensors, for example, a camera, a pressure sensor, and an electrocardiogram. In this study, a deep learning model (named SleepCon) was designed with the capability to be trained on threshold images obtained from any sensor. This paper presents a system in which data was obtained from a camera installed above the mattress. The camera tracked the body posture on the mattress while the subject was lying down. A CNN and several pre-processing steps were used to collect the data and then analyze it to recognize different sleep postures. The trained model was stored for use in real-time applications. The system can recognize the three major postures, i.e., left, right, and supine. A real-time application was also developed that operates the stored SleepCon model through an accompanying desktop application for detecting the posture live. The classification accuracy was greater than 90%, while the actual application accuracy was 100% in the experiments carried out on the SleepCon model.

The spine should be aligned in side-lying positions to prevent muscle stiffness. The side-lying position has been reported to significantly promote sleep quality ratings [3]. People living with asthma are advised to adopt a right-side sleep posture, whereas the prone sleep posture should be avoided. Moreover, people who sleep in an upright posture are more likely to report waking symptoms and the presence of medical conditions (e.g., respiratory disorders, heartburn) that affect their sleep quality. For healthy people, the percentage of time spent in various sleeping positions is, in descending order: supine, side-lying, and prone [4].
Sleeping positions have been proven to affect the symptoms of numerous diseases and have a vital role in systemic physiology, including metabolism and immunological, hormonal, and cardiovascular system functioning [5]. For example, hypertension, diabetes, hyperlipidemia, obesity, and cardiovascular and cerebrovascular diseases have all been associated with chronic sleep deprivation [6]. Recent research has shown that in-bed positions have a significant impact on the prevalence of sleep disturbances such as apnea [7], even carpal tunnel syndrome [8], and pressure ulcers [9]. For example, the most effective approach to preventing pressure ulcers is to make postural modifications while keeping the correct body position [9]. This paper examines the analysis of sleeping posture on the bed using Deep Learning. The position is classified into one of three groups resulting from this analysis: supine, left, or right. The purpose of the study is to create a model that can be used to classify sleeping postures based on binary image input. This type of analysis is essential for classifying the patient's body motor function.
Counter to the approaches described above, a camera-based pose detection system reduces the initial cost of developing an effective deep learning model for accurate recognition. To this end, SleepCon (Sleeping Posture Recognition Model using Convolutional Neural Network) was developed as an end-to-end framework for in-bed posture classification. SleepCon takes binary input images, each forming a single frame of a sleeping posture, and classifies them using a deep convolutional neural network (CNN). Results show that the developed system significantly outperforms modern methods and other research. Furthermore, the model used data augmentation with custom layers and fine-tuned hyperparameters, which enhanced the model and avoided overfitting.

The main contributions of the research are:
 Development of SleepCon: an effective CNN model for recognizing sleeping posture
 The system will work with any input image data obtained from a camera or pressure sensor
 This work also contributes its own dataset, which will be open to other researchers for further study
There were a few challenges to achieving the main objectives. The first was obtaining the features from the input data; the second was obtaining a dataset for evaluating the model. Deep learning's CNN architecture was used to overcome these issues, where the first stage was feature learning and the second stage was classification. Furthermore, to solve the problem of input data, a system was developed to obtain image data. This system can use any camera, such as a webcam or a Wi-Fi camera. In this study, a personal smartphone was attached to the ceiling and connected over Wi-Fi to the workstation. This also helped to obtain high-resolution input images for a better-trained model. In addition, the proposed system is cost-effective, as it reduces hardware costs.
The paper is organized as follows: a literature review of posture recognition and deep learning models is covered in Section 2. Section 3 presents the overall research methodology of the proposed system, which includes the process of data collection and pre-processing; details of the proposed SleepCon model are also covered in this section. In Section 4, results of the best-tuned hyperparameter model are presented together with a discussion of the experiment. Finally, Section 5 states the conclusion and future work.

2-Literature Review
One of the major components of health and well-being is sleep [10]. Currently, there are two categories of sleep monitoring: sleep quality analysis, such as [11, 12], and posture recognition [13, 14]. Liu et al. [15] used passive RFID tags to monitor sleeping posture, employing a battery-free passive tag matrix rather than cameras, pressure sensors, or electrocardiograms. They proposed a hierarchical recognition scheme that identifies individual postures from coarse-grained subsumption without any personalized training process, achieving 96.7% posture-recognition accuracy and a respiration-rate error of 0.7 bpm. They used a pre-processing PCA-based rotation algorithm but did not mention which classifier they used.
Grimm et al. [16] used a single depth camera to recognize sleeping posture. They used Bed Aligned Maps (BAMs) [17] with a Convolutional Neural Network (CNN) model for classification and achieved 94.0% accuracy. Many studies on deep-learning-based posture estimation have been published [18-22]. Research conducted by Matar et al. [22] suggested a pressure-sensing mattress to generate binary pictures for monitoring the patient's posture. This technique saves space by reducing storage requirements and computing costs. Although the proposed method could identify posture with an outstanding Cohen's Kappa coefficient of 0.866, the system was not implemented in hardware at the time. Afterwards, to expand the work, a hardware implementation employing an artificial neural network and a 27-node piezoresistive pressure sensor array was developed for classification.
That study was able to correctly categorize six postures with an accuracy of 97.6% [21]. To assess a person's sleep posture across six health-related postures, the most recent research used a shallow Convolutional Neural Network and a bedsheet with 1024 sensor nodes (32 rows and 32 columns). However, this method increases the system's cost by using an excessive number of sensor nodes and requires an independent PC to process the data. Furthermore, the data cannot be monitored remotely, only locally. Another recent work is based on a CNN [14]. That system can recognize four significant postures: right lateral, left lateral, face down, and face up. The research reported a classification accuracy of about 90% in the experiment. The authors also stated that the system could continuously monitor the subject's sleep position, allowing potential pressure hot spots to be discovered and necessary interventions to be applied.
On the other hand, Grimm et al. [16] proposed using an alleviation map to describe depth camera pictures; a CNN then classifies three postures: supine, left side-lying, and right side-lying. Their approach achieved 94% accuracy, while a four-posture coarse classification technique reported 97.5% overall accuracy when factoring in blanket interference. Another study [23] used a high-resolution force-sensing pressure mattress with 2048 sensors. Only three different postures were identified by this approach, namely "supine", "right side", and "left side".
Mohammadi et al. [24] proposed a design that applied supervised machine learning algorithms to noncontact vision-based infrared camera data through a transfer learning strategy. The result was a noncontact sleep monitoring system that analyzes the posture and movement of the body. This approach proved efficient at estimating participants' sleep poses even while they were under a blanket. Manually scored sleep poses from clinical polysomnography were used for evaluation. The system showed better results than existing video-based approaches for the subjects measured, and it also achieved higher performance than the clinical-standard polysomnographic position sensor. Another approach, targeting older adults facing sudden danger in a home environment, was a camera-based posture recognition system using an ensemble of convolutional neural networks (CNNs) [25].
Enokibori et al. [26] proposed a living-alone support system for elderly people. The system monitored fall-detection, out-of-bed, and bed-going situations; it was essentially a bed monitoring system. They used infrared and pressure sensors to detect falling events, applying the finite state machine (FSM) method.
The CNN-based model tends to be the best-known deep learning model for predicting human posture. However, there is still room to improve model performance. In this study, the main target is to enhance the accuracy of a customized CNN model for a better recognition system. The model was designed so that it can be trained with any image input data.

3-Methodology
The research methodology presented in Figure 1 was followed to develop the final best deep model for posture recognition. The two main processes were data pre-processing and model training and testing. Before data processing, participant sleeping images S and background images B were collected. Then, using a subtractor, the background was removed (S − B), and the result was converted to a threshold image T, followed by image augmentation to obtain the input images X. Next, the training data X_train and the test data X_test, which are used in the subsequent steps, were split from the dataset D = {X_train, X_test}.
In the second primary process, the split data X_train and X_test with their corresponding labels y are fitted into the deep learning model; by tuning the hyperparameters, the best parameters and layers for deep learning are selected and stored in the model for the final actual application. From this step, the final model for SleepCon was scripted, as presented in Figure 2.
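As a minimal sketch of the dataset split D = {X_train, X_test} described above (the 80/20 ratio, shuffling, and seed are assumptions, since the paper does not state them):

```python
import numpy as np

def split_dataset(X, y, test_frac=0.2, seed=0):
    """Shuffle-split the augmented images X and labels y into
    D = {X_train, X_test}; the split fraction is an assumption."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random ordering of sample indices
    n_test = int(len(X) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Tiny illustrative dataset: 10 dummy 256x256 threshold images, 3 classes.
X = np.zeros((10, 256, 256), dtype=np.uint8)
y = np.arange(10) % 3
X_tr, y_tr, X_te, y_te = split_dataset(X, y)
```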

3-1-1-Participants
The testing and training dataset includes six (6) participants for sleeping posture. The participants' common features were selected in such a way that the model works for a range of people. None of the participants had a history of sleeping disorders. Table 1 illustrates the age, gender, height, and weight of the participants. Every participant permitted the distribution of the collected data and its use in future research.

3-1-2-Dataset Description
For this study, data was collected before training the learning model for the sleeping posture recognition system. Data was collected using a video camera (a mobile camera connected over Wi-Fi) with a resolution of 1080×1920 pixels. Each screen capture is cropped to an M×N square matrix I = {p_ij}, where p_ij indicates the pixel value at the i-th row and j-th column, 1 ≤ i ≤ M and 1 ≤ j ≤ N, with M = N = 1024. The raw data was collected by masking threshold images of sleeping postures, gathering around 10,000-16,000 images per subject. Each posture includes data frames of around 15-20 s (about 100 frames). There are ten different poses for supine and three different poses each for left and right. A few samples of the stored images of different poses are presented in Figure 3.

3-1-3-Data Pre-Processing
For this research, the device used to capture images was a personal smartphone; hence, no initial cost was required to evaluate the proposed model. Before taking snapshots of sleeping postures, the background was captured. The background subtractor then computed the foreground mask by subtracting the background model from the frame containing the sleeping subject. This helps to isolate movement in the scene from the background and to capture the actual characteristics of the sleeping posture, as shown in Figure 4. Once the background was removed, the image was converted to a bi-level (binary) image by applying a threshold (T). Next, each image was pre-processed using the image augmentation technique. This process helps overcome overfitting; the initial dataset of this research was comparatively small for deep learning (DL) to achieve good performance, and data augmentation helped to overcome this data size issue. The following transformations were applied to the initial image dataset:
 Geometric transformations: width shifting, height shifting, shearing, zooming, vertical flipping, and rotating.
 Colour space transformations: rescaling by 1./255, i.e., pixel values are scaled down by a factor of 255 before training the DL model.
Here, each pixel value of the threshold image is either 0 or 255, where 0 is black and 255 is white.
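The subtraction-and-threshold step described above can be sketched with plain NumPy (the threshold value T and the tiny array shapes here are illustrative assumptions; the actual system applies the same idea to full camera frames):

```python
import numpy as np

def to_threshold_image(frame, background, T=30):
    """Subtract the static background from a frame and binarize.

    frame, background: grayscale uint8 arrays of equal shape.
    T: threshold separating foreground (255) from background (0).
    """
    # Absolute difference isolates the moving subject from the static scene.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    # Pixels whose change exceeds T become white (255); the rest black (0).
    return np.where(diff > T, 255, 0).astype(np.uint8)

# Tiny synthetic example: an empty 4x4 "background" and a frame with a bright patch.
bg = np.zeros((4, 4), dtype=np.uint8)
frame = bg.copy()
frame[1:3, 1:3] = 200          # the "subject"
mask = to_threshold_image(frame, bg)
```

In practice a library routine such as OpenCV's background subtractors would replace the plain difference, but the resulting bi-level image is the same kind of input the model trains on.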

3-2-SleepCon: Sleeping Posture Model using Convolutional Neural Network
SleepCon is designed to be trained on threshold images to detect three main sleeping postures; these outputs are denoted as classes C = {c1, c2, c3}, corresponding to the left, right, and supine postures. There are two main stages in this model: feature learning and classification. Owing to its feature learning capabilities, no feature engineering was needed before classification. As illustrated in Figure 2, the SleepCon model is divided into two main steps, first feature learning and then classification, carried out by a feed-forward neural network. The feature learning step consists of 4 blocks. The first block contains two convolution layers with the ReLU activation function (Equation 1). The second and third blocks each have the structure MaxPool-Convolution+ReLU-Convolution+ReLU. Before transferring the data to classification, it is passed through a final MaxPool layer. The next step is classification, where the data is first flattened to produce neurons; of this large number of neurons, some are dropped out (dropout rate = 0.5). The result is then passed through a Dense-512 layer with the ReLU activation function. Finally, a dense layer with the SoftMax activation function (Equation 2) was added as the output layer. Corresponding to the output classes C, the last layer contains three neurons.
where x_i is the input of a node. The activation functions ReLU(x) = max(0, x) (Equation 1) and SoftMax(x_i) = e^(x_i) / Σ_j e^(x_j) (Equation 2) were used to obtain the node outputs. The output range of ReLU is [0, ∞), while the SoftMax outputs lie in (0, 1) and sum to one, so they can be read as class probabilities.
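A sketch of the described architecture in Keras (matching the paper's TensorFlow/Keras toolchain; the filter counts and 3×3 kernel sizes are assumptions, while the block layout, Dropout(0.5), Dense-512, and the three-way SoftMax output follow the text):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_sleepcon(input_shape=(256, 256, 1), n_classes=3):
    """Sketch of the SleepCon layer layout described in Section 3-2.

    Filter counts (32/64/128) and 3x3 kernels are assumptions; the paper
    does not state them explicitly.
    """
    return models.Sequential([
        layers.Input(shape=input_shape),
        # Block 1: two convolution layers with ReLU (Equation 1).
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        # Blocks 2 and 3: MaxPool followed by two Conv+ReLU layers each.
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        # Final MaxPool before classification.
        layers.MaxPooling2D(),
        # Classification: flatten, dropout 0.5, Dense-512+ReLU,
        # then a three-neuron SoftMax output (Equation 2).
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(512, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_sleepcon()
```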

4-Experiments and Results
The proposed model for SleepCon was implemented using Python's TensorFlow/Keras on an NVIDIA GeForce RTX 3080 with 8 GB of GPU memory. The input image size is 1024×1024; however, the input is resized to 256×256 before being fed into the model. The image augmentation and dropout layers help to prevent over-fitting of the SleepCon model. During the training stage, the Adam optimizer was used with a learning rate of 0.001. The initial training took around 18 hours and 46 minutes to complete.
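The input preparation described above (a 1024×1024 threshold image resized to 256×256 and rescaled by 1/255) can be sketched as follows; stride-4 subsampling is a stand-in assumption for whatever interpolation the implementation actually uses:

```python
import numpy as np

def prepare_input(img_1024):
    """Downsample a 1024x1024 threshold image to 256x256 and rescale to [0, 1].

    Stride-4 subsampling stands in for proper interpolation
    (e.g. a library resize), which the paper does not specify.
    """
    small = img_1024[::4, ::4]                 # 1024 / 4 = 256 per side
    x = small.astype(np.float32) / 255.0       # rescale by 1/255 (Section 3-1-3)
    return x[np.newaxis, ..., np.newaxis]      # add batch and channel dimensions

# Example: an all-white threshold image becomes a (1, 256, 256, 1) batch of ones.
batch = prepare_input(np.full((1024, 1024), 255, dtype=np.uint8))
```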
Due to the COVID-19 situation at the time, the model was initially trained and evaluated using a two-person dataset. After evaluation, the training and validation performance graph is presented in Figure 5.

Figure 5. Training and validation loss using two participant data
According to Figure 5, increasing the number of epochs yielded no improvement in accuracy. Hence, the model was next trained using the sleeping data of all six participants: 100 epochs took 6 hours and 20 minutes and reached a maximum accuracy of 98.5%, while 200 epochs required around 12 hours of training time and reached a maximum accuracy of 99.59%; the output is presented in Figure 6. Increasing the epoch count to 500 clearly showed that accuracy does not increase beyond 200 epochs.

Figure 6. Training and validation loss using six participant data
The trained model was then implemented in an actual application. The output visualization of the application is presented in Figure 7. Each position was tested, and accuracy was calculated using Equation 3 (accuracy = correct predictions / total predictions × 100%). According to this calculation, a 100% accurate result was obtained in real-life detection. A few of the outcomes from testing in the application can be seen in Figure 7. Hence, the trained model performed strongly and can be further used in posture recognition applications.
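A minimal sketch of the accuracy computation, assuming Equation 3 takes the standard correct-over-total form:

```python
def accuracy_percent(correct, total):
    # Equation 3 (assumed form): accuracy (%) = correct / total * 100
    return 100.0 * correct / total
```

For example, if all 30 test positions are detected correctly, `accuracy_percent(30, 30)` gives the reported 100%.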
The Python application is presented in Figure 7, where the prediction outputs after capturing camera data and converting it to threshold images for right, supine, and left are illustrated in Figures 7-a, 7-b, and 7-c, respectively. The colour image of the individual is the streaming video input; the black-and-white image is the same stream converted to a threshold image, which is fed into the trained model, and the resulting prediction is presented as text in the application. Next, the research presented in this paper is compared with cutting-edge works. Table 2 summarizes the outputs and techniques of other deep learning approaches for posture recognition. The majority of earlier works presented good accuracy, but SleepCon provides higher accuracy with fewer device requirements for prediction. This is due to the use of threshold images, which have lower noise. It also detects the three most common postures. It can be seen that [26] provides slightly better performance, but the input data for that model is a pressure image from a Force Sensitive Resistor, which will not work with different devices. The proposed model tends to give 100% accurate outcomes; hence, SleepCon can be considered to recognize postures correctly.
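The application's prediction step can be sketched as follows; the class ordering and the dummy predictor are hypothetical stand-ins for the trained SleepCon model:

```python
import numpy as np

CLASSES = ["left", "right", "supine"]   # hypothetical class ordering

def classify_frame(threshold_img, predict_fn):
    """Map a 256x256 threshold image to a posture label.

    predict_fn stands in for the trained SleepCon model's predict method:
    it takes a (1, 256, 256, 1) batch and returns class probabilities.
    """
    # Rescale to [0, 1] and add batch/channel dimensions, as in training.
    x = threshold_img.astype(np.float32)[np.newaxis, ..., np.newaxis] / 255.0
    probs = predict_fn(x)[0]
    return CLASSES[int(np.argmax(probs))]

# Usage with a dummy predictor that always favours the first class:
label = classify_frame(np.zeros((256, 256), dtype=np.uint8),
                       lambda x: np.array([[0.9, 0.05, 0.05]]))
```

In the real application, each camera frame is converted to a threshold image and passed through this step continuously to display the live posture label.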

5-Conclusion
In this research, a CNN-based algorithm was developed for sleep posture recognition on images captured by a single Wi-Fi camera attached to the ceiling, covering three common postures. The CNN was trained on a repository of 10,000-16,000 images per subject of size 1024×1024, later resized to 256×256. The results showed that the proposed model achieves high classification accuracy even when the training and testing images are obtained from different people. Finally, a deep learning model known as SleepCon was designed and developed, trained using a CNN and other pre-processing steps, to obtain an effective sleep posture recognition system. This system was then implemented and can be further used in a smart home environment. It was shown that the developed CNN model can be used effectively for recognizing sleeping posture. Sleep position classification from a camera, by converting the image to a threshold image for three posture classes (left, right, and supine), was shown to provide 99.59% accuracy. Moreover, the system will work with any input image data obtained from a camera or pressure sensor. This work also contributes its own dataset, which will be available to other researchers for further study. The model training approach explained above shows that the sleep posture recognition deep learning model can be trained using different input sources; only the images need to be converted into binary images. Collecting experimental data from a larger group of people is planned for the future. Collecting raw sleep data from RFID scanners and RFID bedsheets for healthy individuals over prolonged periods is also our intention for future work.

6-2-Data Availability Statement
The data presented in this study are available on request from the corresponding author.

6-3-Funding and Acknowledgements
We would like to thank RMC, Multimedia University Malaysia (MMU) for the IR Fund to conduct this study on sleeping posture (Project SAP ID: MMUI/210024). This research was conducted in the Centre for Engineering Computational Intelligence, Faculty of Engineering & Technology, Multimedia University, Malaysia.

6-5-Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.

6-6-Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.