New Approach to Image Segmentation: U-Net Convolutional Network for Multiresolution CT Image Lung Segmentation

Image processing is a central topic in computer vision. As the number of images in use grows, images are becoming more diverse in resolution and quality. Low image resolution introduces uncertainty into image processing tasks, so a high-performance method is needed. U-Net is a Convolutional Neural Network (CNN) architecture for pixel-wise semantic segmentation. It consists of an encoder network and a decoder network that together produce a segmented image. In this paper, we applied the U-Net architecture to a lung CT image dataset in which each image has a different resolution, in order to produce segmented lung images. We conducted experiments with several training-to-testing data ratios and compared model performance between the single-resolution dataset and the multiresolution dataset. The segmentation accuracy on the single-resolution dataset was 66.00% at a 5:5 ratio, 88.96% at 8:2, and 94.47% at 9:1. On the multiresolution dataset, it was 82.42% at 5:5, 90.12% at 8:2, and 93.66% at 9:1. The training time on the single-resolution dataset was 59.94 seconds at 5:5, 87.16 seconds at 8:2, and 195.34 seconds at 9:1; on the multiresolution dataset, it was 49.60 seconds at 5:5, 102.08 seconds at 8:2, and 199.79 seconds at 9:1. Based on these results, the best accuracy was obtained with the single-resolution dataset at a 9:1 ratio, and the shortest training time with the multiresolution dataset at a 5:5 ratio.


1-Introduction
Computer vision is often involved when discussing image processing. Deep learning is one of the methods most often used in image processing [1], and it has proven successful in many fields [2]. The term "deep learning" or "deep neural network" refers to a multi-layer Artificial Neural Network (ANN), and the Convolutional Neural Network (CNN) is one of the most popular deep neural network methods in recent years [3]. CNN has made tremendous progress, especially in image processing and vision-related tasks [4]. This is very useful for technological developments in artificial intelligence, which remains a trend, especially in image processing. Image processing itself has many applications, such as classification, clustering, and prediction.
Semantic segmentation provides categorical information at the pixel level by assigning a category label to each image pixel. Many real-world applications benefit from this task, such as self-driving vehicles, pedestrian detection, disability detection, therapy planning, and computer-assisted diagnosis. Pixel-level semantic information helps intelligent systems understand spatial positions or make important judgments [5]. In the medical field, deep CNN is often used to solve biomedical segmentation problems [6]. CNN has continued to develop, and one of these developments is the segmentation model [7]. One of the CNN architectures used for image segmentation is the U-Net Convolutional Network [8]. This architecture was introduced in 2015 and has evolved rapidly over the years. U-Net is often used in the medical field for semantic segmentation [9]. U-Net is based on the encoder-decoder architecture, and encoder-decoder-based deep learning methods are very effective in dealing with various problems in artificial intelligence applications [10]. U-Net combines the high-level semantic feature maps of the decoder with the low-level feature maps of the encoder by using skip connections [7].
Research on image processing in the biomedical field is a rapidly emerging area that includes biomedical signal acquisition, image generation from signals, image processing, and image display for medical diagnosis. Clinical imaging devices combine hardware and software. The number of medical imaging sensors is of interest to researchers in the field of biomedical image processing. Medical image classification is a sub-topic of image classification. Of the various types of cancer, lung cancer is the deadliest. This is because patients with non-small cell lung cancer (NSCLC) are diagnosed at an advanced stage. Lung cancer accounted for 18.4% of all cancer deaths worldwide, as calculated in 2019 [11]. Therefore, lung segmentation is important: determining the lung area on CT scan images can be useful in treating medical problems related to the lungs.

2-1-Related Work
Identifying objects is easy for humans. For computers, however, it is complex [12]. The development of technology causes the amount of available data to increase. Nowadays, digital image processing is used in various application domains. Due to the inherent drawbacks associated with digital cameras and their image quality, there is great scope for developing techniques to improve image quality [13]. The number of images, and the diversity of their types, may therefore increase [14]. As a result, a capable image processing model is needed to process images with high performance. With the increasing amount of image data taken by different optical sensors that produce images at various resolutions, a good multi-resolution image processing model will have a positive impact on computer vision [15]. Such a model is more flexible because it can be applied to datasets of various image resolutions.
Image segmentation is one of the important tasks in computer vision [16]. It can be applied to many tasks, such as medical image processing [17], scene understanding [18], autonomous driving [19], and augmented reality [20]. Image segmentation separates the semantic entities in an image by defining the boundaries between them [21]. In this paper, we propose U-Net for image segmentation. We apply the proposed method to the segmentation of multi-resolution lung images, inspired by Soomro et al. [22]. We evaluate the method using accuracy and computational time to examine how well the model performs lung segmentation. In the study that inspired this research, Soomro et al. (2019) segmented retinal vessels in retinal images [22]. The data they used had a uniform resolution, and they suggested applying the segmentation model to data with different resolutions, that is, multi-resolution data [22]. We therefore study image segmentation on a multi-resolution image dataset; specifically, we segment the lungs in CT scan images. Because the data types differ ([22] used a retinal dataset, whereas our study uses a lung CT scan dataset), we compare the performance of the model when applied to a multi-resolution and a uniform-resolution image dataset.

2-2-Dataset
The dataset that we use to implement the proposed method is the NSCLC-Radiomics dataset from Zheng et al. (2020) [23]. The NSCLC-Radiomics dataset contains 422 CT scan images of Non-Small Cell Lung Cancer (NSCLC) patients (Figure 1), the mask images for the lung area, and the associated metadata. All images in the dataset are in DICOM format with an image size of 512×512. For training, we used the CT scan images and their masks.

3-1-Min Max Scaler
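The min-max scaler is a normalization technique that rescales all data values into the range 0 to 1 using the minimum and maximum of the data [24]. The min-max scaler is denoted as Equation 1:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

where $x$ is an original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the data, and $x'$ is the scaled value.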
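As a minimal sketch of min-max scaling applied to one image, the function below rescales a 2-D grid of pixel intensities to the range 0 to 1. The function name `min_max_scale`, the sample intensities, and the handling of a constant image are assumptions made for illustration; plain Python lists are used only to keep the example dependency-free, and this is not necessarily how the original experiments were implemented.

```python
def min_max_scale(image):
    """Rescale a 2-D list of pixel intensities to the range [0, 1]."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption: a constant image has no range to stretch, so map it to 0.
        return [[0.0 for _ in row] for row in image]
    return [[(value - lo) / (hi - lo) for value in row] for row in image]


# Hypothetical CT-like intensities, chosen only for illustration.
sample = [[-1000, -500], [0, 1000]]
scaled = min_max_scale(sample)
```

After scaling, the smallest intensity in the image maps to 0 and the largest maps to 1, so images with different intensity ranges become comparable inputs for the network.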

3-2-Convolutional Neural Network
Convolutional Neural Network (CNN) performance has grown rapidly in the field of computer vision. A CNN is formed by several layers of neural computing connections that require minimal systematic processing [25]. CNN plays an important role in tasks related to image processing, such as image recognition, image segmentation, and object detection [26]. The first thing needed to understand CNN is convolution [27]. The workflow of the U-Net convolutional network is described in Figure 2. In this paper, we implement the U-Net Convolutional Network architecture on a dataset of images with various resolutions to segment the lung area.

3-2-1-Convolutional Layer
Convolution is the process of applying a filter matrix, also known as a kernel, to an image. It typically reduces the size of the image, although layers of padding can be added to keep the size the same. Convolution is also used to perform feature extraction on the image [28]. Define $S(i, j)$ as the convolution function at position $(i, j)$ in image $I$ as follows:

$$S(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)$$

where $m, n$ index the rows and columns of the kernel matrix $K$ [29]. The kernel is one of the main components of CNN. The kernel is a square matrix of dimension $k \times k$, where $k$ is an integer and is usually a small number. The kernel is used for sharpening, blurring, and so on [27].
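The sliding-window operation described above can be sketched in plain Python as follows. The function name `convolve2d`, the stride of 1, and the zero-padding scheme are assumptions for illustration. The sum is written without flipping the kernel, which is the form most CNN libraries compute. With no padding, the output is smaller than the input; with a padding of 1 and a 3×3 kernel, the size is preserved.

```python
def convolve2d(image, kernel, padding=0):
    """Slide `kernel` over `image` (2-D lists) and sum the elementwise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    if padding:
        # Surround the image with `padding` rows and columns of zeros.
        width = len(image[0]) + 2 * padding
        border = [[0] * width for _ in range(padding)]
        middle = [[0] * padding + list(row) + [0] * padding for row in image]
        image = border + middle + [[0] * width for _ in range(padding)]
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for m in range(k_rows):
                for n in range(k_cols):
                    total += image[i + m][j + n] * kernel[m][n]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image and 3x3 summing kernel, for illustration only.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
valid = convolve2d(image, kernel)             # 2x2 output, size reduced
same = convolve2d(image, kernel, padding=1)   # 4x4 output, size preserved
```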

3-2-2-ReLU Activation Function
The Rectified Linear Unit (ReLU) activation function converts negative values to zero and leaves non-negative values unchanged [30]. The ReLU activation function is defined as:

$$f(x) = \max(0, x)$$
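A minimal sketch of ReLU applied elementwise to a feature map is shown here; the names `relu` and `relu_map` and the sample values are hypothetical and serve only to illustrate the definition.

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return x if x > 0 else 0


def relu_map(feature_map):
    """Apply ReLU to every value of a 2-D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


# Hypothetical feature-map values, for illustration only.
activated = relu_map([[-2, 0], [3, -0.5]])
```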

3-2-3-Dice Loss Function
The Dice coefficient is often used in computer vision to calculate the similarity between two images, and it has been adapted into an error function called Dice loss [31]. Dice loss is denoted as:

$$L_{Dice} = 1 - \frac{2 \sum_{x} p_c(x)\, g_c(x)}{\sum_{x} p_c(x) + \sum_{x} g_c(x)} \tag{5}$$

where $p_c(x)$ is the probability that pixel $x$ belongs to class $c$ and $g_c(x)$ is a vector of truth labels [32].
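As a hedged sketch, the Dice coefficient and the corresponding loss for a binary mask can be computed as follows. The function names, the flattened-list inputs, and the small `smooth` term (added here to avoid division by zero on empty masks) are assumptions for illustration and are not taken from the original implementation.

```python
def dice_coefficient(pred, truth, smooth=1e-6):
    """Dice overlap between predicted probabilities and 0/1 truth labels.

    Both inputs are flat lists with one value per pixel.
    """
    intersection = sum(p * g for p, g in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # `smooth` is an assumed stabilizer for the case where both masks are empty.
    return (2.0 * intersection + smooth) / (total + smooth)


def dice_loss(pred, truth, smooth=1e-6):
    """Dice loss: one minus the Dice coefficient."""
    return 1.0 - dice_coefficient(pred, truth, smooth)


# Hypothetical 4-pixel masks, for illustration only.
perfect = dice_loss([1, 1, 0, 0], [1, 1, 0, 0])
partial = dice_loss([1, 0, 0, 0], [1, 1, 0, 0])
disjoint = dice_loss([1, 1, 0, 0], [0, 0, 1, 1])
```

A perfect prediction gives a loss close to 0, a prediction with no overlap gives a loss close to 1, and partial overlap falls in between.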

3-3-U-Net
Along with the development of deep neural networks, the performance of semantic segmentation has also increased. Many researchers have made focused efforts to overcome the limitations of this field [33]. U-Net is one of the semantic segmentation architectures based on a fully convolutional network [34]. The U-Net architecture is shown in Figure 3 and detailed in Tables 1 and 2.
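To illustrate how an encoder-decoder with skip connections interacts with input resolution, the sketch below traces the side length of a square feature map through a generic U-Net-style network. It assumes size-preserving convolutions, 2×2 pooling in the encoder, 2× upsampling in the decoder, and a depth of 4 as in the original 2015 U-Net; these are assumptions and not the layer configuration of this paper, and the function names are hypothetical.

```python
def unet_spatial_sizes(size, depth=4):
    """Trace the side length of a square input through an encoder-decoder.

    Assumes convolutions keep the size, pooling halves it (rounding down),
    and upsampling doubles it.
    """
    encoder_sizes = []
    current = size
    for _ in range(depth):
        encoder_sizes.append(current)  # feature map kept for the skip connection
        current //= 2                  # 2x2 pooling
    bottleneck = current
    decoder_pairs = []
    for skip_size in reversed(encoder_sizes):
        current *= 2                   # 2x upsampling
        decoder_pairs.append((current, skip_size))
    return encoder_sizes, bottleneck, decoder_pairs


def skips_align(size, depth=4):
    """True if every upsampled map matches the size of its skip connection."""
    _, _, pairs = unet_spatial_sizes(size, depth)
    return all(up == skip for up, skip in pairs)


trace_512 = unet_spatial_sizes(512)
trace_50 = unet_spatial_sizes(50)
```

Under these assumptions, a 512×512 input aligns at every skip connection, while a 50×50 input does not, because odd sizes are rounded down during pooling. A network of this kind therefore needs inputs to be resized, padded, or cropped to compatible sizes.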

4-1-Experimental Setup
To implement the proposed method, we used Python in a Jupyter Notebook on a computer with a 2-core Intel(R) Xeon(R) CPU @ 2.20GHz and 32GB of RAM.

4-2-Training Process
First, we convert all the image data into various pixel resolutions chosen at random, from 50×50 to 600×600, using the reshape tool. This step turns the dataset into the multiresolution dataset studied in this paper. We used CT images from 421 patients. For each patient, we used 3 CT images, so the total number of images used in this study was 1263. We trained a U-Net on the dataset to segment lung regions, with a convolutional network as the encoder and a deconvolution network as the decoder. The output of the U-Net architecture is an image with segmented regions: pixels in the lung region keep their original values, and all other pixels are set to black. Our experiment uses training-to-testing ratios of 5 to 5, 8 to 2, and 9 to 1 to compare the accuracy on two types of datasets, namely the multi-resolution image dataset and the single-resolution image dataset.
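The data preparation described above can be sketched as follows. The function names, the fixed random seed, the shuffling before the split, and the choice of square target sizes are assumptions for illustration; the original code may have split and resized the data differently.

```python
import random


def random_resolution(rng, low=50, high=600):
    """Pick a random square target resolution between low x low and high x high."""
    side = rng.randint(low, high)
    return side, side


def split_dataset(items, train_parts, test_parts, seed=0):
    """Shuffle `items` and split them by a train:test ratio such as 8:2."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


# 421 patients with 3 CT images each give 1263 images, as in the text.
image_ids = list(range(421 * 3))
rng = random.Random(0)
target_sizes = {image_id: random_resolution(rng) for image_id in image_ids}

# Number of training and testing images for each ratio used in the experiments.
split_sizes = {}
for train_parts, test_parts in [(5, 5), (8, 2), (9, 1)]:
    train, test = split_dataset(image_ids, train_parts, test_parts)
    split_sizes[(train_parts, test_parts)] = (len(train), len(test))
```

With 1263 images, the ratios do not divide evenly, so this sketch rounds the training count down and assigns the remaining images to the test set.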

4-3-Result
Measurement is important for knowing the performance of a model. We used the Dice coefficient as the testing accuracy metric, together with the training time. The testing performance is shown in Table 3, which reports the accuracy and computation time for three different training ratios. Figure 4 shows the change in model accuracy at each epoch of the training and validation process. As for the segmentation results, Table 4 shows the predicted mask image and the cropped lung area.

5-Conclusion
In this study, we applied the U-Net network architecture to segment the lung areas on CT scan images of the lungs. From the experiments, we can see that the lung segmentation accuracy follows the same pattern on both datasets: the higher the training ratio, the higher the accuracy, but the longer the computation time. In the study of Soomro et al. [22], the resulting accuracy was above 90%. In this research, the highest accuracy on the single-resolution dataset was 94.47%, obtained with a training-to-testing ratio of 9 to 1, and the highest accuracy on the multiresolution dataset was 93.66%, also obtained with a ratio of 9 to 1. The fastest computation time, 49.6 seconds, was obtained with the multiresolution dataset. In terms of accuracy, our proposed model performs image segmentation quite well, with accuracy above 90% at the 9 to 1 ratio on both datasets. This indicates that our proposed model segments well even on a multi-resolution image dataset.
In the future, the authors want to conduct further experiments using other models to compare their performance. In addition, the authors want to apply this segmentation model to other objects in multiresolution image datasets and to use other loss functions such as IoU. The aim is to see how the model performs with various architectures and other loss functions. Such work would also benefit the field of medical image processing when applied to medical treatment.

6-2-Data Availability Statement
The data presented in this study are available on request from the corresponding author.

Figure 1. Preview CT scan and mask images sample: a) Sample of a CT scan image, b) The image mask of image

Table 3 is the result of the simulation to get the best model shown in Figure 4, where the ratio resolution training was already determined before.