The Effect of Swabs on Modeling the First Wave of the COVID-19 Pandemic in Italy

The daily fluctuations in the released number of Covid-19 cases played a big role at the beginning of the pandemic, when local authorities in Italy had to decide whether imposing restrictive policies. When an increase/decrease was communicated, especially a large one, it was difficult to understand if it was due to a change in the epidemic evolution or if it was a fluctuation due to other reasons, such as an increase in the number of swabs or a delay in the swab processing. The aim of this paper is both to model the main trend of the outbreak evolution in the number of confirmed cases and to describe the daily fluctuations strongly dependent on the daily number of swabs. For our analysis, we propose a nonlinear asymmetric diffusion model, which includes information on the daily number of swabs, to describe daily fluctuations in the number of confirmed cases in addition to the main trend of the outbreak evolution. The proposed model is found to be the more efficient for prediction, as compared to 6 already existing models, including the SIRD and the logistic models. The new model combines the properties of innovation diffusion models with a parsimonious way to exploit information about swabs.

The capacity of processing swabs is of particular importance for detecting the state of the epidemic, measuring the lockdown effects and, most importantly, reducing the outbreak. In fact, only a swab outcome enacts the procedure to eventually isolate the infected individual together with his/her close contacts. Consistent delays in this process will lead to failure in controlling the outbreak. Moreover, public attention is focused every day on the released number of confirmed cases as a measure of the state of the epidemic, especially at the beginning of the outbreak and during the lockdown to evaluate its effect. When an increase/decrease is communicated, especially a large one, it is not easy to understand if it is due to a change in the epidemic evolution or if it is a fluctuation due to other reasons, such as delays in the swab process or in the laboratory organization. The effect, however, is relevant both to public opinion, spreading inaccurate optimistic/pessimistic views of the situation, and to authorities who must decide whether to adopt restrictions and at which level.
Our opinion is that it is necessary to include the number of swabs to describe the local fluctuations in the epidemic evolution in addition to detecting the main trend. To the best of our knowledge, only a few models are present in literature with this characteristic. In fact, at the beginning of the outbreak, the curve of confirmed cases was usually modeled through an exponential [5] or a logistic growth model [6,7]. When the data collection window became long enough, the models were usually of two types: the compartmental and ARIMA models. The compartmental models represent the more blooming field of research and are based on modeling the infection, recovery and mortality rates by using the times series of the actually positive, recovered and dead cases [1,[8][9][10][11][12][13][14][15][16]. A review on the epidemic models used for the COVID-19 epidemic is presented by Xiang et al. [17].
In the field of spatial statistics, Guliyev [18] contributed a spatial panel data model on confirmed, recovered and dead cases. Bartolucci and Farcomeni [19] proposed a space-temporal model for the weekly number of positives, where the number of swabs is used as an offset. Meanwhile, Benvenuto et al. [20] and Chintalapudi et al. [21] represent two contributions based on an ARIMA model. Triacca and Triacca [22] proposed a log-polynomial model for the ratio between the daily new diagnosed cases and the number of swabs, with a forecasting aim: in particular, the model overcame an ARIMA model in terms of forecasting accuracy especially for a 1-step-ahead horizon. Such an approach is thus useful for very short-term predictions, but it does not allow to model the global cycle of the pandemic diffusion.
In this paper, we made an effort to describe the cumulative number of confirmed cases in the five most affected Italian regions, based on the combination of a nonlinear model and the number of completed swabs. In the class of growth models, aimed at describing a diffusion cycle, we propose a new version of the dynamic potential model [23], where the novelty consists of the formulation of a new intervention function with the number of daily swabs as an explanatory variable. The model is particularly parsimonious since the intervention function has only one additional parameter. The base of the dynamic potential model was chosen since a) it has an asymmetric shape and makes it possible to model a 'saddle', which is a rather common nonlinear pattern; b) it gives an estimate of the total number of confirmed cases at the end of the epidemic; and c) the total number of confirmed cases is not fixed throughout the outbreak, but it is allowed to change over time. Since the capability of processing swabs increased over time and, consequently, the meeting criteria for people for being tested were enlarged with the aim of detecting a larger number of asymptomatic positive subjects, it is sensible to suppose that the number of diagnosed cases increases with time.
The proposed model was compared, in each region, with five alternative growth models: the logistic model was used as a benchmark; the Generalized Bass model [24], eventually including a parameter accounting for asymmetry [25], with fixed market potential; and the classic dynamic potential model [23], eventually including a seasonal component [26]. Moreover, a SIRD model was used as a second benchmark by summing the predictions of actually positive, recovered and dead cases. Three-week forecasts of the spreading dynamics were provided for each model as well. The models were compared in terms of 2 and BIC values, for the cumulative values. The squared linear correlation coefficient between observed and fitted daily values was evaluated as well.
The rest of the paper proceeds as follows: In Section 2, we provide a description of the available data. In Section 3, we describe the proposed model and the 6 competing models. The results obtained in the 5 Italian regions are illustrated in Section 4. Some concluding remarks follow in Section 5.

2-Data
The data of the five Italian regions most affected by the epidemic, namely Veneto, Lombardy, Piedmont, Tuscany and Emilia-Romagna, were downloaded from the Italian Civil Protection Department website [27]. The data collection period was from the 24th of February to the 3rd of May 2020, which is the last day before the so-called Phase 2, when the lockdown was partially removed. For each day, the data consist of currently infected patients, both hospitalised or home isolated, cumulated recovered people, cumulated deaths, the total number of confirmed cases, which is given by the sum of the latter three components, and the cumulative number of swabs.
Official data before the 24th of February are not available, but the first infected people were detected on the 21st of February in both Veneto and Lombardy. Since the first days are important to correctly estimate the spread of an epidemic, for the latter regions we integrated the official dataset with information released by the newspapers and/or the official websites of the Regions about cases for the 21st to 23rd of February.
The reconstruction of the number of swabs in Lombardy for the first three days was facilitated by the Region press release on the 21st of February [28] and by the news article of [29] on the 23rd; the value of the 22nd was imputed through their mean. For the Veneto region, the imputation was subjectively fixed at 200, 700 and 1500 for days 21-23, respectively. Moreover, the swab time series were cleaned since the cumulative values were not nondecreasing. This happened on the 25th of February for Lombardy, and the imputation was made by the average of the former and latter values. For Emilia-Romagna, there was an analogous problem for the 28th to 30th of March, and the imputation was performed based on the information released in [30,31]. Figure 1 shows the daily number of confirmed cases in the five regions. The diffusion process peaked around the second half of March in the regions where the epidemic started: Lombardy, Veneto and Emilia-Romagna. First, it is worth noting the peculiarity of the epidemic in Lombardy, as the spread was much greater than in the other regions. What is common is that the shape of the spread was asymmetric with a faster increase and a much slower decrease. Piedmont has a different peculiarity, with a rather flat spread in the second part of April because this region encountered problems with taking and processing the swabs. The capability of each region in processing swabs changed over time. Initially, swabs were essentially performed only on symptomatic patients and their strict contacts. Some regions, however, quickly increased their capability to process swabs, and asymptomatic cases could be detected and isolated as well. Since only people with a positive swab can be officially recorded as infected, and as many infected people do not show symptoms, a large number of swabs is a prerequisite for diagnosis: the more swabs taken, the more cases found. The relationship between confirmed cases and the number of swabs is shown in Figure 5, with daily values. The fluctuations in the number of confirmed cases depend, to a certain extent, on the number of swabs processed, and for some regions, such as Lombardy, Veneto and Tuscany, the data show an almost regular weekly pattern, due perhaps to the organisation of the laboratories.

3-Research Methodology
A general diffusion of innovations model can be defined through a nonlinear regression model as follows: where ( ) are the cumulative sales of a product at time and ( , ) = ( ) is a specific structure to be used to describe an evolution process. Here, are assumed to be i.i.d. Gaussian with variance 2 . The components of the parameter vector are jointly estimated using nonlinear least squares (or, equivalently, likelihood estimation).
In this paper, we will compare the performance of alternative evolution structures. The basic model is a logistic one (LOG): where is the mode, median and average of the distribution, while is a shape parameter. Parameter is the market potential, which is the limiting value for ( ), as goes to infinity.
The Generalized Bass Model [24] is defined starting from the differential equation; Its solution, for the initial condition (0) = 0, is; where is the market potential, is the innovation coefficient, is the imitation coefficient and ( ) can be any integrable function. The effect of the intervention function ( ) is to accelerate or decrease diffusion with respect to a symmetric unimodal path, which would arise in (4) for ( ) = 1 for all values. For values such as ( ) > 1 diffusion is accelerated, while ( ) < 1 corresponds to time periods with decreased diffusion speed. Below, we examine the model (GBM ) arising when ( ) is specified by the so-called rectangular shock: This allows us to describe the diffusion of a product for which we observe a constant shock with intensity , either positive or negative, in the time interval [ , ], [32].
Due to the asymmetric path observed for almost every region, we also examine a more flexible model from Bemmaor [25]: where is a further parameter allowing for asymmetry (positive asymmetry for > 1, negative asymmetry for < 1).
If we insert a rectangular shock into it, we obtain the following model (BeGBM ): with function specified as in Equation 5.
A different way to provide flexibility to the evolutive structure can be obtained through a dynamic market potential model (DMP) [23]: where and are two parameters to describe how fast the dynamic market potential approaches its maximum value, .
Additionally, the DMP can be perturbed by shocks. For a general intervention function, ( ), we obtain: If in model Equation 9 we use, as proposed in Guidolin and Guseo [26], the intervention function: We allow the model to incorporate cyclic seasonal fluctuations of width 1 and 2 with period (DMPseas).
2 GBM with rectangular shock GBM 6 ( , , , , , ) (4)+(5) 3 Bemmaor GBM with rectangular shock BeGBM 7 ( , , , , , , ) 5 Dynamic market potential + seasonal effect DMPseas 8 ( , , , , , 1 , 2 , ) (9)+(10) 6 Dynamic market potential + swabs DMPsw 6 ( , , , , , ) Here, we propose to assess the usefulness of a dynamic market potential model as in Equation 9, but with an intervention function depending upon the number of swabs analyzed at day , ( ) (DMPsw). In particular, we suggest using; where and are the average and the standard deviation, respectively, of the ( ) values recorded during the observation period. It is easy to appreciate that such a structure accelerates, with respect to an underlying trend described by a DMP, the number of cases whenever ( ) exceeds its average, while cases are reduced with a below-average number of swabs.
A further benchmark is proposed in this work with the SIRD model, a compartmental model used for describing and predicting the evolution of an infectious disease. Every individual of the population may flow between the compartments of 'Susceptibles' ( = ( )), 'Infected' ( = ( )), 'Recovered' ( = ( )) and 'Deaths' ( = ( )). We could apply this model using the data of currently infected patients ( ), cumulated recovered individuals ( ) and cumulated deaths ( ). The forecasts of the confirmed cases are then calculated by summing the forecasts of , and .
This model uses the following system of differential equations: where is the population size, while , , and are the rates of infection, recovery and mortality, respectively. The initial conditions 0 and 0 correspond to the observed number of recovered and dead individuals on the first day of the collection period. In this work, 0 and are estimated to maximize the fitting, as the goal of this work is to describe and predict the evolution of the total number of confirmed cases. The last initial value to be defined is 0 = − 0 − 0 − 0 . The parameter set corresponds to ( , , , , 0 ). Table 1 proposes a summary of all the models that will be applied in this study, while Figure 3 describes the research methodology steps.

4-Results and Discussion
Models of Table 1 were applied to the data of the five considered regions, and forecasts up to May 24th are provided (three weeks ahead for each region). The first six models were fitted to the cumulative confirmed cases using NLS estimation; asymptotic standard errors and 95% asymptotic marginal confidence intervals ( ) are provided.
The SIRD model was fitted to I, R and D time series, assuming errors to be normal distributed, and therefore using MLE estimation. was used with the bbmle library [33]. To ensure that , , estimates lay in (0,1), the model was reparametrized using their logit transformations. Moreover, for computational aspects, the natural logarithms of both and 0 were used. Standard errors and 95% profile likelihood (based on inverting a spline fit to the profile likelihood) are given. By summing the fitted values of I, R and D, we also obtained the fitted values of the cumulative confirmed cases: this is used as a benchmark for evaluating the performance of the proposed models. For each region, Figure 2 depicts the evolution of I, R and D on the right −axis and of the daily confirmed cases on the left −axis; solid lines correspond to fitted values. Table 2 summarizes the values of the determination index 2 for all models: the huge values of 2 are unsurprising, given that we are working with cumulative data and any S-shaped fitting produces high determination indexes. A standard approach advises the use of the 2 measure for comparative purposes only [34,35]; we, therefore, provided also the BIC (evaluated with cumulative values) and the squared linear correlation coefficient 2 between observed and fitted daily values as well.

4-1-Veneto
The results for Veneto are displayed in Table 2 ( 2 , BIC and 2 between observed and fitted daily values), in Tables 3, A.1-A.6 (for parameter estimates for all the models fitted) and in Figures 2(a) and 4, where observed and fitted daily values are plotted. From these results, we can infer that the logistic (Figure 4(a)) and the SIRD (Figure 2(a)) models, which both represent two commonly used benchmarks, are not adequate to describe the asymmetrical evolution of the epidemic, together with the large fluctuations; in fact, for the Veneto data, 2 and 2 values are much lower than obtained for the other models; while BIC is larger, confirming their reduced fitting.
The results in Tables A.2 and A.3 show that a positive (̂> 0) rectangular shock is significantly diagnosed both in the GBM and the BeGBM . In both cases, the shock denotes an increase in cases with respect to the main trend starting around March 5th ( ≃ 14) and March 9th ( ≃ 18), respectively, and ending around March 24th ( ≃ 33) and March 25th ( ≃ 34), respectively. Within these models, the shock has the function of fitting the steep increase in cases in the first period of the epidemic. Since the data for Veneto start from February 21st, both shocks end almost exactly two weeks later than the lockdown, established on March 8th. This confirms that the lockdown policy was essential in reducing the spread of the epidemics, as the incubation period is up to 14 days. The BeGBM suggests that the decrease after the peak is much slower than the initial growth (̂=2.316> 1), and, for this reason, we expect that the subsequent models, which allow for asymmetry too, will also have good performance.
The DMP model (Figure 4(d) and Table A.4) summarizes the trend of the series well, but the 2 and the BIC are slightly worse than the BeGBM . We also highlight that a good fit with the DMP could not be attained in the early phase of the outbreak with a smaller number of observations, while the BeGBM could be correctly identified. Among asymmetric models, however, only the DMPseas and DMPsw are able to take into account the fluctuations around the main trend. The performance of the DMPseas is very good ( 2 =0.999825, 2 =0.834856). However, the weekly cycle (̂= 7.003 days, see Table A.5) in the DMPseas is still partially unsatisfactory because the range of its fluctuations is not sufficiently large compared to the range of the observations (Figure 4(e)). Moreover, the high number of parameters in the DMPseas penalizes it in terms of BIC (689.2245), whose value is larger than the corresponding values for the BeGBM and DMP.
Results of the fitting discussed up to this point highlight that the large fluctuations cannot be completely described only with a shock, nor with a weekly cycle. As highlighted in Section 2, the complete time series with the daily number of processed swabs is available. Figure 5(a) shows its values (right −axis) in relation to the number of daily confirmed cases (left −axis). We notice the good agreement between the paths of the two series, and the correspondence of their peaks suggests that this relationship could be exploited. The DMPsw (Figure 4(f) and Table 3) performs very well. In fact, we obtained the largest values for 2 , 0.999898, and 2 , 0.858459, for this model as well as the lowest BIC value, 641.4728. The latter value, in particular, is reduced by the small number of parameters of this model. Clearly, the strict comparison between the BIC of this model and the values obtained for the other models should take into account that, in this model, the complete series of processed daily swabs, ( ), is used as an input to the model. However, this information is available, and the performance of the model suggests it is useful for achieving an accurate description. These results suggest that the daily number of cases in Veneto followed an asymmetric trend, as modelled by a DMP model, but large fluctuations around that trend can be observed as a consequence of different numbers of swabs processed each day. In particular, ̂≃ 0.469 suggests that the intervention function ( ) (11) increases to 1.5 approximately in a day with a number of swabs equal to + (while ( ) is equal to 1 in days with average number of processed swabs). This means that increasing the number of swabs from ≃ 5250 to + ≃ 8768 will result in one and a half more the expected diagnosed cases of that day. * It is important to underline that the forecasts displayed in Figure 4(f) have been obtained assuming that the number of swabs processed in the last week will be repeated in the subsequent three weeks. Of course, different scenarios could be easily used to simulate the effect of swabs of diagnosed cases.
With the fitted models, we also obtained the estimate of the total number of infected people during the first wave of the epidemic, with the information up to May 3rd 2020: that is, for the SIRD model and for the other models. Both the LOG and the SIRD models, which are the lower performing models, provided smaller estimates (̂= 18270 and ̂= 17586, respectively). The estimates obtained for in the remaining models are very similar, ranging from 19432 (GBM ) to 20085 (BeGBM ). In particular, for the DMPsw model, ̂=19932. Notice that this model predicts 1162 (+6.4%) and 1846 (+10%) more cases than the LOG and the SIRD models, respectively. Were the lockdown policy of Phase 1 confirmed after May 3rd, the estimate of given by the DMPsw model suggests that Veneto experienced, by May 3rd, 92% of all expected cases of the first wave (the total number of cases until May 3rd was 18318). As we will notice in the rest of the paper, this is a very high value.

4-2-Lombardy
Lombardy is the Italian region where COVID-19 spread in the most dramatic way. The total number of infected people on May 3rd was 77528 with more than 14000 deaths (about half of the death toll up to that date in Italy as a whole). The results for Lombardy are displayed in Table 2 ( 2 , BIC and 2 ), in Tables 4, A.7-A.12 (for parameter estimates for all the models fitted) and in Figures 2(b) and 6, where observed and fitted daily values are plotted.
For this region, the logistic (Figure 6(a)) and SIRD (Figure 2(b)) models are less effective models in describing the asymmetrical evolution of the epidemic. For the SIRD model, in particular, a good convergence point could not be attained. The results in Tables A.8 and A.9 show that, as observed for Veneto, a positive (̂> 0) rectangular shock is significantly diagnosed at the beginning of the time series, both in the GBM and the BeGBM . The GBM estimates the end of the shock on March 25th ( ≃ 34), but according to Figure 6(b), this is not perfectly matching with the data. This is the reason why, for this model, 2 is particularly small (0.690817). * Conversely, the BeGBM better identifies the end of the shock three days later, on March 28th ( ≃ 37), when we observe a relevant stable decrease. For this region, the lockdown policy had a delayed effect compared to what happened in Veneto, as the decrease in the number of cases was registered 20 days after March 8th, while the incubation period is up to 14 days. One reason for such a wider interval could be possible delays in taking and processing the swabs; in fact, the health system of Lombardy experienced an unexpected overload. * The 2 is evaluated on cumulative cases, which are the response variable. Since cumulative cases are measured on a much larger scale, discrepancies between fitted and observed values are less relevant on the 2 than they are on the daily values. In this case, the lack-of-fit around the peak heavily penalizes the 2 . With the DMP model (Figure 6(d) and Table A.10), it is possibile to fully appreciate the asymmetrical shape of the outbreak, especially the slow decrease in the number of cases in this region. However, its performance in terms of 2 , 2 and BIC is worse than that of the BeGBM .
The performance of the DMPseas, with a weekly cycle (̂= 7.005 days) (Table A.11), is not satisfactory, as it does not adequately capture the fluctuations (except for the very end of the series). Here, too, the 2 , 2 and BIC values are worse than those obtained with the BeGBM .
Finally, the DMPsw (Figure 6(f) and Table 4) performs very well. With this model, we obtained the largest values for 2 , 0.999919, and 2 , 0.902698. The BIC value for this model, 830.5628, supports it with respect to the BeGBM (850.6930), which was the best model up to this point. Figure 5(b) shows the number of swabs (right −axis) in relation to daily cases (left −axis). For this region, too, there is a great agreement between the paths of the two series, with almost perfect correspondence of their peaks. By comparing panels (a) and (b) of Figure 5, we can appreciate the differences in swab policies adopted by Veneto and Lombardy. The latter region, which has about twice the number of inhabitants as Veneto, processed on average 5705 swabs each day, which was not much more than the average in Veneto (5250), even though Lombardy experienced more than four times the number of officially diagnosed people compared to Veneto. We also notice that, for this region ̂≃ 0.538. The effect of swabs on diagnosed cases is thus larger than observed for Veneto.
The estimates obtained for in BeGBM , DMP, DMPseas and DMPsw substantially agree, ranging from 95018 (DMPsw) to 98723 (DMPseas). As underlined at the beginning of this subsection, the total number of cases until May 3rd was 77528. The difference between this number and ̂=95018 of the DMPsw is quite large, confirming that this region, by May 3rd, started Phase 2 in a riskier context than Veneto, having experienced only 82% of all expected cases of the first wave of the epidemic.

4-3-Piedmont
Piedmont is the Italian region where the peak in diagnosed cases occurred later (see Figure 1), as, at the beginning of April, one month after the lockdown, we still observed the highest number of daily cases. This could be due to the limited swabbing capacity in the first part of the epidemic, when swabs never exceeded 400 per day (see Figure 5(c)). The results for Piedmont are displayed in Table 2 ( 2 , BIC and 2 ), in Tables 5, A Also for this region, the logistic (Figure 7(a)) and SIRD (Figure 2(c)) models are the poorest performing models in terms of describing the asymmetrical evolution of the epidemic. These models have the lowest 2 and 2 values and the highest BIC.
The results in Tables A.14 and A. 15 show that here also a positive (̂> 0) rectangular shock is significantly diagnosed at the beginning of the time series, both in the GBM and the BeGBM . The two models provide the same estimate for the end of the shock on March 24th ( ≃ 30), exactly as observed for Veneto. However, after the end of shock, the number of daily cases continued to increase, although at a slower rate (see Figure 7(b) and (c)).
The DMP model (Figure 7(d) and Table A.16) enables describing, without shocks, the bimodal behaviour of this time series: the 'saddle', which is the slowdown between two relative peaks, is exactly positioned immediately after the end of the shocks estimated with the GBM and the BeGBM . If, on the one hand, the lockdown policy had the effect of reducing the number of daily cases (saddle after the first peak), on the other hand, the second peak is due to the increase in the number of swabs after April 8th, which made it possible to detect more infected people. The DMP for this region performs very well ( 2 =0.99988), with a small BIC value, also due to the parsimony of a model with only five parameters.  The DMPseas, with a weekly cycle (̂= 7.01 days) (Table A.17), describes the frequency of the fluctuations up to the second half of April, with an insufficient amplitude throughout the entire observed period. For this region, however, we observe the largest 2 value among all the previous models, 2 =0.999895, although the BIC value is larger than that observed for the simpler DMP.
The DMPsw (Figure 7(f) and Table 5) provides the largest values for 2 , 0.999905, and 2 , 0.843469. The BIC value for this model, 658.7328, supports it with respect to all other examined models. The width of the fluctuations in the observed series is not, however, fully described by this model (Figure 7(f)). This is also apparent from the value of ̂= 0.13 (Table 5), which is lower compared to the estimates for Veneto (0.469) and Lombardy (0.538).

The estimates obtained for in GBM
, BeGBM , DMP, DMPseas and DMPsw range from 31621 (DMPsw) to 34351 (BeGBM ). Conversely, we obtained ̂= 28968 for the LOG and ̂= 28955 for the SIRD; also for this region, the LOG and SIRD models predictions are smaller than for other models. The total number of cases until May 3rd was 27430. By comparing this value to ̂=31621 of the DMPsw, we notice that this region, by May 3rd, experienced 87% of all expected cases of the first wave of the epidemic.

4-4-Tuscany
The results for Tuscany are displayed in Table 2 ( 2 , BIC and 2 ), in Tables 6, 7, A.19-A.23 (for parameter estimates for all the models fitted) and in Figures 2(d) and 8, where observed and fitted daily values are plotted.
Also for this region, the logistic (Figure 8(a)) and SIRD (Figure 2(d)) models are the worst performing models in describing the asymmetrical evolution of the epidemic.
The results in Tables A.20 and A.21 show that a positive (̂> 0) rectangular shock is significantly diagnosed at the beginning of the time series, both in the GBM and the BeGBM . The two models provide the same estimate for the end of the shock on March 24th ( ≃ 29), exactly as observed for Veneto and Piedmont. Notice that here the data start on February 25th because Toscana did not report cases earlier than that. Differently from other regions, however, for the BeGBM the path is apparently less perturbed by the shock (see Figure 8(c) and ̂= 0.183 from Table A.21).
The DMP model (Figure 8(d) and Table A.22) enables effectively describing the asymmetric behaviour of this time series without shocks. The 2 is very high (0.999725), with a small BIC value equal to 570.1541, also due to the parsimony of a model with five parameters only.
The DMPseas, with a weekly cycle (̂= 6.982 days) ( Table 6), well describes the frequency of the fluctuations and, differently from other regions, this model is also able to describe their width (Figure 8(e)). The 2 is equal to 0.999792, although the BIC value is larger than that observed for the simpler DMP. The DMPsw (Table 7) returns a value for 2 , 0.999796, slightly larger than that observed for the DMPseas. From Figure 8(f), however, we can see that, after the peak, fitted values are almost unaffected by changes in the number of daily swabs, ( ), although this time series shows important variations in time (see Figure 5(d)), and ̂ has a large value (0.78) compared to other regions. Both the confirmed cases and swab time series exhibit a weekly pattern, probably due to the organisation of the swab hubs and laboratories, but since April, data of the two series do not appear to be fully synchronized. This consideration probably explains why the 2 value for the DMPsw model (0.778397) is lower than observed for the DMPseas (0.842169). The latter model better recognizes the weekly fluctuations in cases, and the less parsimonious model performs better.

4-5-Emilia-Romagna
The results for Emilia-Romagna are displayed in Table 2 ( 2 , BIC and 2 ), in Tables 8, A.24-A.29 (for parameter estimates for all the models fitted) and in Figures 2(e) and 9, where observed and fitted daily values are plotted. Also for this region, the logistic (Figure 9(a)) and SIRD (Figure 2(e)) models are the worst performing models for describing the asymmetrical evolution of the epidemic.
The results in Tables A. 25 and A.26 show that also here a positive (̂> 0) rectangular shock is significantly diagnosed at the beginning of the time series, both in the GBM and the BeGBM . The two models provide the same estimate for the end of the shock on March 28th ( ≃ 34), as observed for Lombardy, which is later than for Veneto, Piedmont and Tuscany. If we observe Figures 9(b) and (c), we notice, however, that the fit around the peak is not completely satisfactory, as a small decrease in the number of confirmed cases actually occurred a few days earlier than predicted by both models. The 2 for BeGBM is very high (0.999923).
The DMP model (Figure 9(d) and Table A.27) allows for a partially satisfactory description without shocks of the asymmetric behaviour of this time series. The 2 is equal to 0.999776, and its BIC value and 2 are worse than for the BeGBM (713.4355 and 0.9077, respectively).
The DMPseas, with a weekly cycle (̂= 7.004 days) (Table A.28), is not able to describe the fluctuations ( Figure  9(e)). The 2 is equal to 0.999862, and the BIC value is larger than observed for the BeGBM .
The DMPsw (Table 8) shows a value for 2 , 0.999925, that is slightly larger than that observed for the BeGBM . From Figure 9(f), we can see that the fitted values follow the observed data very well, except from the values around the peak. This behaviour, already noticed for the GBM R , is responsible for the low 2 value, which is equal to 0.904318, lower than that observed for the DMPseas (0.921988). * Here we obtained ̂≃ 0.409, a value very similar to the estimate obtained for Veneto. The estimates obtained for in the LOG model (25140) and in the SIRD model (24982) are even lower than the final observation, 26016, representing the total number of cases until May 3rd. The estimates obtained for in GBM , BeGBM , DMP, DMPseas and DMPsw range from 28094 (GBM R ) to 33428 (DMP). If we consider ̂=30633 of the DMPsw, we notice that this region, by May 3rd, experienced 84.9% of all expected cases of the first wave of the epidemic. * If we remove the observations from =27 to =33 (around the peak) from the evaluation of 2 for all the considered models, we obtain the following

5-Conclusions
The aim of this study was to propose a new model to describe the pattern of COVID-19 cases in the five most affected Italian regions during the first wave of the epidemic. The new model and alternative existing nonlinear model structures are fitted to the available data.
Our results suggest that the commonly used models, that is, the logistic and SIRD models, are not flexible enough. Not only are they incapable of describing fluctuations; they also fail to follow the asymmetric trend typical of all the regions: the increase in daily cases has been faster than the decrease observed in the second part of the outbreak.
In all the analyzed regions, both the GBM and the BeGBM highlight that a positive shock increased the number of daily cases in a period starting about two weeks after the first cases and ending around March 24th. From these results, we deduce that: i) the first diagnosed cases in a few hospitals of northern Italy at the end of February gave rise to a relevant acceleration in the spread of COVID-19 diagnoses after two weeks; ii) the lockdown policy established by the Italian government on March 8th played a fundamental role in reducing the spread of the virus and significantly decreasing daily cases two weeks after the start of the lockdown.
The models that are available in the literature perform quite well in describing the main trend of daily cases. However, observations also show significant fluctuations. As highlighted in the introductory section, daily changes have been reported by the media and have been the focus throughout the most critical weeks of the outbreak. The available data reveal that the pattern of analyzed swabs is often concordant with the pattern of confirmed cases. This is not surprising, but the models available in the literature cannot exploit this information. The model proposed here, starting from a trend described by a dynamic market potential diffusion model, makes it possible to perturb the trend through an intervention function depending on the number of analyzed swabs. The larger the number of swabs with respect to the average, the larger the number of predicted daily cases. The proposed model, which is highly parsimonious, is able to describe the daily fluctuations in cases very well and proved to be the best of the models analysed here for four of the five regions (Veneto, Lombardy, Piedmont and Emilia-Romagna). For the fifth region, Tuscany, the pattern of daily cases exhibits a weekly pattern, but it does not correspond to the pattern of the processed swabs. For this region, DMPseas performs better in terms of describing the observed data.
Ahead forecasts have also been evaluated for a period of three weeks. Forecasting is not the aim of this study since the final observation corresponds to the last day of complete lockdown. We can, however, use our forecasts as a benchmark corresponding to the evaluation of the trend under lockdown for comparison with actual observations pertaining to the so-called Phase 2, where many restrictions have been removed. Note that the Italian government decided to start Phase 2 simultaneously for all the regions, even though there still were differences among them. By comparing the final cumulative value of number of cases with the estimated final number of infected patients in each region at the end of the first wave, we observed that, while Veneto and Tuscany reached about 92% of the total number number of the expected cases by May 3rd, Piedmont, Emilia-Romagna and Lombardy, in particular, were still facing a more critical situation, having experienced, respectively, only 87%, 85% and 82% of all expected cases.
The proposed structure for the intervention function is quite intuitive, and we highlight that the proposed model could also be used to examine the effect on confirmed cases of different swabbing strategies by modifying the number of swabs in the intervention function. This is a useful feature to assess the effect of alternative scenarios for swabbing strategies. Alternative formulations, with a changepoint to allow for a different effect of standardised swabs before and after the changepoint, have also been estimated, but the improvement with respect to the proposed DMPsw model was negligible.

6-1-Author Contributions
Conceptualization, C.F. and C.M.; methodology, C.F. and C.M.; writing-original draft preparation, C.F. and C.M.; writing-review and editing, C.F. and C.M. All authors have read and agreed to the published version of the manuscript.

6-2-Data Availability Statement
The data presented in this study are available on request from the corresponding author.

6-3-Funding
The authors received no financial support for the research, authorship, and/or publication of this article.

6-4-Conflicts of Interest
The authors declare that there is no conflict of interests regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.

Appendix I
This Appendix lists all the parameter estimates for models not included in the main text.