State of Charge Estimation of Lead Acid Battery using Neural Network for Advanced Renewable Energy Systems

The Solar Dryer Dome (SDD), an independent energy system equipped with Artificial Intelligence to support the drying process, has been developed. However, inaccurate state-of-charge (SOC) predictions for each battery cell left the battery vulnerable to over-charging and over-discharging, which accelerated the degradation of battery performance. This research aims to develop an accurate neural network model for predicting the SOC at the battery-cell level. The model aims to maintain battery cell balance under dynamic load applications. It is accompanied by a dashboard that monitors the battery and provides crucial information for its early maintenance in the SDD. The results show that the neural network estimates the SOC with the lowest MAE of 0.175, followed by the Random Forest and support vector machine methods with MAE of 0.223 and 0.259, respectively. A dashboard was developed to help farmers monitor batteries efficiently. This research contributes battery-cell level SOC prediction and a dashboard for battery status monitoring.


1-Introduction
Indonesia is the largest archipelagic country in the world, with five major islands and 30 smaller groups of islands separated by the ocean. This condition leads to uneven electricity distribution network infrastructure across remote areas [1,2]. The problem also affects agricultural processes, especially farmers' drying processes, which depend heavily on unpredictable weather. To tackle the issue, researchers have developed an independent energy system combined with Artificial Intelligence (AI), called the Solar Dryer Dome (SDD) [3].
The SDD uses photovoltaics (PV) as its power source and is equipped with AI that uses sensors to assist decision-making [4][5][6]. The system adopts the concept of precision agriculture for monitoring and controlling drying operations. Decentralized areas in Indonesia, where most people work as farmers, do not have sufficient economic resources, so the system is designed to be not only independent and durable but also low-cost. AI provides many important features in the system, such as optimizing energy production, balancing load functions, and robust system optimization for increased reliability.
Machine Learning (ML) techniques for battery applications are divided into two categories: model-based and data-driven techniques. Model-based techniques analyze the physical and chemical components of the battery and use a mathematical model to describe the performance degradation process. The model-based techniques most widely used in battery applications are the Particle Filter (PF) and the Kalman Filter (KF). Pola et al. [7] used PF to efficiently and effectively estimate battery state of charge (SOC) and predict discharge times. Mo et al. [8] used KF and an improved PF to predict Remaining Useful Life (RUL), which not only enhanced the precision but also overcame particle degradation. Zhang et al. [9] also predicted RUL by using an improved unscented PF, showing higher accuracy than PF and unscented PF. Loukil et al. [10] used a recursive least squares algorithm to account for changes in battery characteristics in SOC estimation and delivered good performance. However, the model-based approach is sensitive to noise and environmental disturbances when tracking dynamic load characteristics, especially in batteries [11].
A data-driven method analyzes historical data to build complex non-linear models. This method is well known in battery applications because it can predict dynamic battery characteristics with high accuracy. Liu et al. [12] used an Adaptive Recurrent Neural Network (ARNN) with recursive Levenberg-Marquardt (RLM) to make predictions and showed that it was effective for predicting RUL. The technique demonstrated superior results compared to classical Recurrent Neural Networks (RNN) and Recurrent Neural Fuzzy systems (RNF). Zhang et al. [13] used a Long Short-Term Memory (LSTM) RNN equipped with the resilient mean square back-propagation (Rprop) method and the dropout technique to predict RUL. Their research produced more accurate predictions than PF and SimRNN. Qu et al. [11] used LSTM with particle swarm optimization for RUL prediction and State of Health (SOH) monitoring. The result showed higher accuracy than RNN, LSTM, and relevance vector machine (RVM).
Besides RUL prediction, many studies have also used data-driven methods for SOC estimation. Hannan et al. [14] used Back Propagation Neural Networks (BPNN) that outperformed other neural network models in estimating SOC. Xia et al. [15] used a piecewise L-M multi-hidden layer wavelet neural network (PLMMWNN) that achieved higher accuracy than the method of Hannan et al., which uses BPNN and an extended Kalman filter (EKF). Yang et al. [16] employed a gated RNN for SOC estimation with an RMSE of 3.5%. Chen et al. [17] achieved a 2.0% error using an improved Feed-Forward Neural Network (FFNN) and extended KF. Huang et al. [18] used a deep learning method called convolution gated recurrent unit (CNN-GRU) that achieved higher accuracy than recurrent neural network gated recurrent unit (RNN-GRU), SVM, and extreme learning machines. How et al. [19] also proposed a deep neural network and compared it with other deep learning methods such as LSTM and CNN-GRU. Almeida et al. [20] used only voltage, current, and charge/discharge time as inputs to an ANN and demonstrated high accuracy for SOC estimation. In 2021, Feng et al. [21] used a clockwork recurrent neural network (CWRNN), reporting a low RMSE of less than 1.29%. Li et al. [22] used RNN to predict SOC while considering the battery degradation process and delivered an average error of less than 3%. Ee et al. [23] proposed using a Deep Neural Network (DNN) to predict SOC with an MSE of less than 0.12%. Costa et al. [24] used CNN to diagnose battery degradation modes with an RMSE of around 2%. Fasahat & Manthouri [25] used a hybrid autoencoder and LSTM NN, obtaining high SOC prediction accuracy. Sun et al. [26] used AdaBoost to improve the SOC prediction accuracy of lead acid batteries. Over time, deep learning has become the most favored neural network technique wherever sufficient data are available. The reviewed studies suggest that data-driven methods deliver better accuracy in battery applications than model-based methods.
The SDD faces several challenges. The number of sensors and tools increases the power needed and the amount of PV used. In addition, conventional distribution of energy from the PV to the battery will cause the battery to fail in a short time because it neither avoids peak loads nor prevents excess or deficient energy. Battery performance will continue to decrease due to repeated unsuitable charging cycles. When the battery capacity is below 80%, the battery performance also tends to drop faster [27]. Apart from the challenges above, one of the main concerns is the state-of-charge (SOC) of the battery, which needs to be accurately estimated to prevent over-charging and over-discharging. Although it should be accurate, SOC cannot be measured directly because it depends on many factors, such as current, temperature, and battery degradation [28]. Moreover, the SOC differs for each battery cell due to dynamic energy usage. This indicates that accurately predicting the SOC of each cell could optimize the load distribution of the battery, which increases its reliability [29]. The model will help the Battery Management System (BMS) perform the optimization. Another advantage of an accurate SOC is the ability to carry out early maintenance on low-capacity or damaged cells without having to replace the entire battery.
The second main concern is that data-driven techniques alone will not have a direct impact on farmers who use the SDD. Hence, a dashboard was created to give farmers real-time monitoring of the battery. The dashboard plays an important role in making the collected data valuable for decision-makers [30]. It is not only for visualization but also helps farmers understand a suitable strategy for the battery's early maintenance [31].
This research uses neural network modeling for SOC prediction and a dashboard to monitor the battery using the predicted data. The paper is organized as follows: Section II presents an overview of the methods. Section III describes the experimental procedures. Section IV provides model evaluation and dashboard visualization. Section V presents the conclusion, remarks, and future research recommendations.

2-1-Solar Dryer Dome
Solar dryer dome food drying is commonly used in developed countries for food preservation. However, dynamic environmental conditions such as rain, wind, and pests heavily affect the drying process [32]. Solar dryers have been shown to be faster, more efficient, more hygienic, and more resistant to pests, and to have lower crop losses. The dome adheres to the concept of a decentralized, smart, and low-cost energy system. The system is decentralized because it uses solar PV as an energy source; it is smart because it utilizes AI (Artificial Intelligence) and the Internet of Things (IoT); and it is low-cost because it uses various optimizations.
Greenhouse drying is commonly powered using fossil-fuel electricity to obtain a stable energy supply [33]. With the worldwide trend toward sustainability, fossil-fuel electricity is beginning to be replaced by renewable energy because of the negative environmental impact of fossil fuels. Governments worldwide also encourage the move towards renewable energy [34].

2-2-Neural Network
The research used a multi-layer feed-forward neural network. The network comprises elements called neurons that are capable of identifying complex patterns between inputs and outputs. The algorithm adjusts the connection weights based on the error between the actual and estimated results, which is backpropagated through the network. This capability allows a complex pattern such as battery behavior to be predicted. The network consisted of three layers: (1) an input layer of three neurons (one for each input variable), (2) a hidden layer of fifty neurons (the size that gave the lowest error), and (3) an output layer of one neuron (for the output variable). The hidden neurons could be arranged in more hidden layers, but this was not done in this research. A network is only considered a deep learning method if it uses four or more hidden layers.
The model was initialized with small weights assigned randomly to the connections between neurons. The output was calculated using the following equation:

$$ y_j^{l} = \sum_{i=1}^{n^{l-1}} w_{ji}^{l}\, y_i^{l-1} + b_j^{l} \tag{1} $$

where $j$ is a neuron, $l$ is a layer, $w_{ji}^{l}$ is the weight, $b_j^{l}$ is the bias of the network, and $n^{l-1}$ is the number of inputs of neuron $j$ in the layer $l-1$. A model also needs a transfer function (activation). There are many types of activation, but ReLU is the most suitable activation for this research [35,36]. The activation function is described by the following equation:

$$ f(x) = \max(0, x) \tag{2} $$

where $x$ can take any real value. The case $x = 0$ can be ignored because it is considered irrelevant in practice.
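To make the structure concrete, the following is a minimal sketch of one forward pass through a 3-50-1 network with ReLU in the hidden layer, written in plain Python. The zero-initialized biases, the linear output neuron, the weight range, and the example input values are assumptions made for illustration; they are not taken from the original model.

```python
import random


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def init_layer(n_inputs, n_neurons, rng, scale=0.1):
    """Small random weights; zero biases are an assumption of this sketch."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def weighted_sums(inputs, weights, biases):
    """Weighted sum of the inputs plus bias, for every neuron in a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, hidden, output):
    """3 inputs -> 50 ReLU hidden neurons -> 1 output (assumed linear)."""
    hidden_out = [relu(z) for z in weighted_sums(inputs, *hidden)]
    return weighted_sums(hidden_out, *output)[0]


rng = random.Random(0)  # fixed seed, only so the sketch is reproducible
hidden_layer = init_layer(3, 50, rng)
output_layer = init_layer(50, 1, rng)

# Hypothetical standardized values for the three input neurons
soc_estimate = forward([0.2, -1.1, 0.5], hidden_layer, output_layer)
print(soc_estimate)
```

With untrained random weights the printed value is meaningless as an SOC; training would adjust the weights and biases through backpropagation of the error.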

2-3-State of Charge
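SOC is the ratio of the remaining capacity to the maximum capacity of the battery. The SOC is calculated using the following mathematical formulation:

$$ \mathrm{SOC} = \frac{Q_{\mathrm{remaining}}}{Q_{\mathrm{max}}} \times 100\% \tag{3} $$

where $Q_{\mathrm{remaining}}$ is the remaining capacity and $Q_{\mathrm{max}}$ is the maximum capacity of the battery. The capacity of the battery needs to be maintained above 50% to limit battery degradation over time and over periodic charging.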
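As a small worked illustration of this ratio, the sketch below computes SOC as a percentage. The function name, the choice of ampere-hours as the unit, and the example capacities are assumptions for illustration only.

```python
def state_of_charge(remaining_capacity, maximum_capacity):
    """Return SOC in percent: remaining capacity over maximum capacity."""
    if maximum_capacity <= 0:
        raise ValueError("maximum_capacity must be positive")
    return 100.0 * remaining_capacity / maximum_capacity


# Hypothetical cell with 60 Ah remaining out of a 100 Ah maximum
print(state_of_charge(60.0, 100.0))
```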

3-Research Method, Results and Discussion
The method is organized as shown in Figure 1. Firstly, a review of the literature was conducted to enrich the analysis and determine the attributes for SOC prediction. Based on these attributes, the data were collected from the field and then pre-processed. Several techniques were employed in data pre-processing: data cleaning, transformation, standardization, and dimensionality reduction. The data were split into three parts (training, validation, and testing) to develop the prediction model with the neural network. To validate the model, the neural network prediction model was compared to a Random Forest model and a Support Vector Machine (SVM) model. The validated prediction model was implemented on a dashboard.

3-1-Data Collection
The raw data were collected from the lead acid battery in the solar power plant in Makalehi, Indonesia. The battery in the solar power plant has the same characteristics as the battery in the SDD, so the model can also be applied to the battery in the SDD using transfer learning. The data collection period was three days. The raw data were taken every 10 minutes. Records of certain events (such as the inverter starting, charging starting, or discharging starting) were also added to the raw data.
The dataset consisted of 777 rows with 231 attributes. The attributes were reduced to 16 by selecting those that describe the battery's behavior. The attributes were then reduced once more to four (current, voltage, temperature, and time). The research of Westerhoff et al. [37] showed that these attributes are enough to design an accurate neural network for battery applications. Using fewer attributes also has several advantages for building the model [38]. Redi et al. [39] also reported that the parameters most commonly applied in battery applications correspond to these four attributes.

3-2-Data Pre-processing
The objective of data pre-processing is to transform the raw data into clean data for further modeling and to prevent biased estimation models. Raw data that have not been transformed induce many problems [40]. However, data pre-processing has its own challenges: (1) improper techniques can lead to the loss of relevant information; (2) the involvement of an expert may be required for data verification; and (3) the process may have to be done repeatedly.
Firstly, the duplicated and null-valued data needed to be treated immediately. In this case, the treatment was to remove them. In total, eight duplicate values and twelve null values were removed. The data were removed because they did not have a significant effect on the modeling. The second step was the treatment of negative values in the current attribute. A negative current value only indicates a different direction of the current, so it was transformed into a positive value. The statistical description of the data is shown in Table 1. More information about the attributes is needed to determine the appropriate treatment. A pair-plot and a correlation heatmap are used to show the relationship and the distribution of each attribute, as shown in Figures 2 and 3.
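The cleaning steps described above can be sketched in plain Python as follows. The row layout, the key names, and the sample values are hypothetical; the sketch only assumes that each record is a mapping with a `current` field and that missing values appear as `None`.

```python
def clean_rows(rows):
    """Drop rows with missing values and exact duplicates, then make
    the current positive, in the same order as the steps in the text."""
    seen = set()
    cleaned = []
    for row in rows:
        # Remove null-valued records
        if any(value is None for value in row.values()):
            continue
        # Remove exact duplicates of a record already kept
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # A negative current only encodes direction, so keep its magnitude
        fixed = dict(row)
        fixed["current"] = abs(fixed["current"])
        cleaned.append(fixed)
    return cleaned


# Hypothetical raw records
raw = [
    {"time": 0, "current": -4.2, "voltage": 12.6, "temperature": 30.1},
    {"time": 0, "current": -4.2, "voltage": 12.6, "temperature": 30.1},
    {"time": 10, "current": 3.8, "voltage": None, "temperature": 30.4},
    {"time": 20, "current": 5.0, "voltage": 12.9, "temperature": 31.0},
]
cleaned = clean_rows(raw)
print(len(cleaned))
```

The input list is left unmodified, which makes it easy to compare row counts before and after cleaning.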

Figure 2. Dataset pair-plot
Figure 3. Correlation heatmap of the lead-acid battery
Thirdly, the data were standardized because of the inefficient weighting factor of raw data [41]. The standardization performed by StandardScaler transforms the mean value of the distribution into zero and the standard deviation into one [42]. The StandardScaler method was calculated using the following mathematical formulation:

$$ z = \frac{x - \mu}{\sigma} \tag{4} $$

where $z$ is the standardized value, $x$ is the original value, $\mu$ is the mean, and $\sigma$ is the standard deviation. Standardization was the final step of data pre-processing before the data were used for generating the model. The final data for the model were 757 rows with 4 attributes.
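The standardization step can be illustrated with a short plain-Python sketch that applies the same formula to one attribute. It uses the population standard deviation, which is also what scikit-learn's StandardScaler uses; the function names and the sample voltages are assumptions for illustration.

```python
import statistics


def fit_scaler(values):
    """Return the mean and population standard deviation of one attribute."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("attribute is constant and cannot be standardized")
    return mean, std


def standardize(values, mean, std):
    """Apply z = (x - mean) / std to every value."""
    return [(x - mean) / std for x in values]


# Hypothetical voltage readings
voltages = [12.0, 12.5, 13.0, 13.5]
mean, std = fit_scaler(voltages)
z_scores = standardize(voltages, mean, std)
print(z_scores)
```

After the transformation the attribute has a mean of zero and a standard deviation of one, up to floating-point rounding.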

3-3-Neural Network Model
The data were split into a training dataset, a validation dataset, and a testing dataset with a ratio of 60:20:20, respectively. The training data were used to build the estimation model. The validation data indicate whether the model is overfitting or underfitting. The testing data are used to determine the performance of the model when faced with new data.
The final model is structured to give the most accurate estimates with the lowest error. The model structure is shown in Figure 4. The structure has been explained in Section II, where the number of neurons is determined by the purpose of each layer. The transfer function used for the hidden layer is ReLU, as mentioned in Section II.
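A 60:20:20 split of the 757 cleaned rows can be sketched as below. Whether the original split was shuffled or chronological is not stated, so the shuffling and the fixed seed here are assumptions, as is the function name.

```python
import random


def split_60_20_20(rows, seed=0):
    """Split rows into training, validation, and testing parts (60:20:20)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the testing part
    return train, val, test


# Row indices stand in for the 757 pre-processed records
train, val, test = split_60_20_20(range(757))
print(len(train), len(val), len(test))  # prints: 454 151 152
```

Because 757 is not divisible by five, the leftover row is assigned to the testing part in this sketch.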

3-4-Model Evaluation
The model was generated through 1,000 iterations. The number of iterations affected the error, since the error tended to decrease with every iteration. The model was first evaluated by comparing the errors on the training and validation datasets. There are three conditions of model validation, as depicted in Figure 5: underfitting, optimal fitting, and overfitting. This step is very important to validate the performance of the model. Unnoticed overfitting often occurs in model development. Overfitting can be described as a model that has fitted the noise in its training data. In the end, the model only memorizes patterns observed in the past and does not learn the relevant patterns [43]. The results are shown in Figure 6. The comparison shows that the model is in the optimal fitting condition, neither underfitting nor overfitting.

After the comparison, the model was evaluated using the four following mathematical equations:

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{5} $$

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{6} $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{7} $$

$$ R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{8} $$

where $y_i$ is the actual SOC, $\hat{y}_i$ is the estimated SOC, $\bar{y}$ is the mean of the actual SOC, and $n$ is the number of samples. MAE, MSE, and RMSE measure how far the predictions deviate from the actual values, while $R^2$ measures the proportion of the variation in the actual SOC that is explained by the model. For MAE, MSE, and RMSE, the closer the result is to 0, the better the model. In contrast, the value of $R^2$ needs to be as close as possible to 1. The result is shown in Table 2. As Table 2 shows, the model performs very well in estimation. On average, the prediction deviates by only about 0.175 from the actual value, and the model explains about 91% of the variation in the actual SOC.

To increase confidence in the model, other methods were also used for comparison. Random Forest and SVM were chosen because they are similarly able to predict nonlinear data. These methods were trained on the same data as the neural network. The results are shown in Table 3. As Table 3 shows, the neural network performs better than Random Forest and SVM. It produced the lowest MAE of 0.175, compared with 0.223 for Random Forest and 0.259 for SVM.
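The four evaluation metrics named above (MAE, MSE, RMSE, and R²) can be computed with a few lines of plain Python, as in the sketch below. The function name and the toy values are assumptions for illustration; they are not results from this research.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Toy values, not data from the study
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

The same function can be applied to the predictions of each candidate model (neural network, Random Forest, SVM) on the same testing data so that the comparison is like for like.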

3-5-Dashboard
Because the model performed well, its predicted values were used in the dashboard. The data needed to be inverse-transformed before they were used in the dashboard. The inverse transform used the following mathematical formulation:

$$ x = z \sigma + \mu \tag{9} $$

Equation 9 is the inverse of Equation 4, with the same components: $x$ is the original value, $z$ is the standardized value, $\mu$ is the mean, and $\sigma$ is the standard deviation. After the data had been inversely transformed, the visualization was made so that the battery could be monitored easily using the model. The visualization was made using Microsoft Excel, as shown in Figure 7.
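The inverse transform can be sketched in plain Python as the mirror image of standardization. The mean and standard deviation must be the ones obtained when the data were standardized; the values used here are hypothetical.

```python
def inverse_standardize(z_values, mean, std):
    """Map standardized values back to the original scale: x = z * std + mean."""
    return [z * std + mean for z in z_values]


# Hypothetical scaler parameters and standardized model outputs
restored = inverse_standardize([-1.0, 0.0, 2.0], mean=12.75, std=0.5)
print(restored)  # prints: [12.25, 12.75, 13.75]
```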

Figure 7. Battery monitoring dashboard
A threshold was set to indicate when the battery should be charged. A capacity below 50% shows that the battery needs to be charged. A capacity between 50% and 70% is within the safe limit, but the battery should be charged to maintain its performance. This type of lead-acid battery needs to be charged periodically to compensate for capacity degradation [44].
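The threshold logic described above can be sketched as a small function that turns a predicted SOC into a dashboard status. The status labels and the function name are hypothetical, and the sketch assumes that a value above 70% needs no action, which the text implies but does not state.

```python
def battery_status(soc_percent):
    """Map an SOC percentage to a status using the 50% and 70% thresholds."""
    if not 0.0 <= soc_percent <= 100.0:
        raise ValueError("soc_percent must be between 0 and 100")
    if soc_percent < 50.0:
        return "charge now"          # below the 50% threshold
    if soc_percent <= 70.0:
        return "charge recommended"  # safe limit, charging maintains performance
    return "ok"                      # assumed: no action needed above 70%


for soc in (35.0, 60.0, 85.0):
    print(soc, battery_status(soc))
```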

4-Conclusions
Data-driven methods show good performance in battery applications. Moreover, battery research is growing as battery applications increase. In this research, a neural network-based model is applied to predict the state of charge (SOC) of a lead acid battery, with the aim of predicting the SOC accurately and helping the battery management system (BMS) optimize the load distribution. The dataset was collected from the battery at the Makalehi Power Plant. The data used for the model were treated with several pre-processing steps to remove noise and achieve higher accuracy. The model delivers good performance in both validation and evaluation. The performance of the model was compared with other methods, and the neural network outperformed them. The model was used not only for prediction but also for generating data for dashboard visualization, to create more impact for local farmers in Indonesia. There are two important results from this research:
• The neural network-based model performs better, with an MAE of 0.175, than Random Forest and SVM, with MAE of 0.223 and 0.259, respectively. Therefore, the neural network will be used rather than the two other methods.
• The dashboard has been developed with the expected results, so that farmers using the SDD can monitor the battery to prevent over-charging and over-discharging.
The results of this research can be extended in several directions. The dataset needs to be expanded, since neural networks require a lot of data to improve their performance. Hyperparameter tuning is needed and may improve the accuracy of the model significantly. Lastly, the models and the dashboard need to be integrated into a cloud service for real-time operation in order to increase the satisfaction of both stakeholders and farmers.

5-2-Data Availability Statement
Data sharing is not applicable to this article.

5-3-Funding and Acknowledgements
This work is supported by the Research and Technology Transfer Office, Bina Nusantara University, as a part of Bina Nusantara University's International Research Grant entitled: Enabling Low-Cost Renewable Energy Systems for Localized, Self-Sufficient Power Production-Battery Charging Optimization using Machine Learning, with contract number No.061/VR.RTT/IV/2022 and contract date 8 April 2022. A.S.B., D.S., E.H., and T.P. gratefully acknowledge the support and infrastructure provided by the Oregon Institute of Technology (OIT) via the Provost Student-Faculty Innovation Grant No. 1435036/PVT433 titled "Localized, Self-Sufficient Power Production." A.S.B. and E.H. also gratefully acknowledge the Oregon Renewable Energy Center (OREC) at OregonTech. The authors are grateful for the support and facilities from PT Impack Pratama Industri, Tbk, Jakarta, Indonesia, which provided the technology and module of the solar dryer dome (SDD) (https://www.impack-pratama.com/solar-dryer-dome-2/).

5-6-Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.