Determinants Affecting the Use of the Internet by Older People

Objectives: The purpose of this study is to detect and analyze some factors which hinder or contribute to the positive use of the Internet by older people living in Central Europe, specifically in one region of the Czech Republic. Methods: The key method is a questionnaire whose results were processed by using a model of logistic regression. The research sample includes 432 seniors from senior houses, municipal ICT courses and the University of the Third Age, all coming from the region of Hradec Kralove in the Czech Republic. Findings: The findings of the proposed model confirmed that the key determinants in the Internet use by older people were age, previous experience with IT in their past occupation and active use of IT enhanced by some kind of training, in this case attending IT courses of the University of the Third Age. Education and gender have not proved to be significant determinants in this study. Novelty/ improvement: The introduced model of logistic regression enriches current literature on the subject by emphasizing the possible factors that influence the use of the Internet by seniors in the region. The survey also investigates which factors in comparison with each other act more and which less, and which factors are significant within the model and which are not.

Although there is still a significant gap between younger and older generations regarding Internet use, since older adults have less experience with computers and the World Wide Web [13], this gap is narrowing. The present generation of seniors is the generation of the so-called baby boomers born in the 1960s. For instance, in 2000, only 14% of seniors aged 65+ years used the Internet, whereas nowadays 67% of them do. Even though research shows that seniors are more connected than ever, they still spend less than half as much time online as their younger counterparts. The data reveal that people aged 65+ years spend 15 hours online each week, compared to 32 hours among young people aged between 16 and 24 years [14]. Moreover, research also indicates that Internet use is increasing among older adults living in residential care facilities [15].
The smaller use of the Internet by older people is not only connected with the higher age of this target group, but also a lack of user experience [16][17][18]. The breaking age in this respect seems to be 75 years when the use of the Internet rapidly declines (cf. [17,19]). Furthermore, Ramon-Jeronimo et al. [18] list the following skills the older people usually lack. These include searching, navigating, sorting, filtering, and utilizing Internet information. Apart from the age and the lack of user experience, the use of the Internet by the older people is limited by other factors, such as low educational status [20], gender differences [18], low income [20,21], worse cognitive and physiological functions [16], living alone/or with someone, or rural/urban living [22].
This study focuses on the detection and analysis of some of these factors which hinder or contribute to positive Internet use by older people living in Central Europe, specifically in the Czech Republic. The structure of the article is as follows. First, in section 2, the research questions are presented, the data obtained are described and the statistical procedures used are briefly characterized (the main attention is paid to the description of the logistic regression model, the use of which is original in the context of Internet use by the elderly). This aggregate model makes it possible to describe, characterize and assess more possible explanatory influences on the use of the Internet by seniors than the partial models found in previously published papers. In section 3, the authors present the results of the statistical analysis of the data, dealing mainly with the selection of explanatory variables and the construction and description of the resulting logistic model. In section 4, the results are discussed in more detail, including an interpretation of the coefficient values found for the logistic model and a comparison with known published results. Finally, in section 5, conclusions are formulated, including a warning about possible limitations of the obtained model and the findings derived from it.

2-Materials and Methods
This section on research methodology sets the research question, describes the research sample, data collection and general characteristics of the data file. In addition, the authors generally describe the selected model and statistical procedures. The stages of research and methodology are summarized in Figure 1.

2-1-Research Question
Since society in Europe is rapidly aging, and current society calls for the use of various technological devices, including the Internet, the use of the Internet by older people is quite topical. Therefore, the authors set the following research questions:
• Which determinants affect the use of the Internet by older adults, and to what extent do the individual determinants increase or decrease Internet use?
• Which determinants influence Internet use in the Hradec Kralove region more significantly and which are less influential?
Due to the nature of the data obtained by the questionnaire, a logistic regression was chosen to answer the research question [23].

2-2-Research Sample
Altogether, 432 seniors took part in the research. They were all inhabitants of the Hradec Kralove region, Czech Republic. The research sample was divided into three groups:
• The so-called 'passive' seniors, i.e., those who were not interested in acquiring any knowledge of the latest ICT and its use, represented by the respondents living in senior houses (group 1, DD, n = 109);
• The so-called 'active' seniors involved in further education, especially in the ICT field, who attended ICT courses held by the Hradec Kralove municipality (group 2, HK, n = 159);
• The so-called 'active' seniors enrolled in further education in general, who attended courses of the University of the Third Age (group 3, U3A, n = 164).
The age of 55 is the starting seniors' age in the Czech Republic [24]. In 2016, when the research was performed, the respondents' age ranged between 55 years (born in 1961) and 94 years (born in 1922). The sample comprised 317 females (F, 73%) and 115 males (M, 27%); this distribution follows the structure of the Czech senior population [25]. There were 15 respondents aged 85+ years: 12 females and 3 males. Most of the respondents (257; F 188; M 69) were born in the period 1942-51.

2-3-Questionnaire
The questionnaire was disseminated among the older people in the senior houses (group 1, DD, i.e. Domov Důchodců in Czech), in the ICT courses after their completion (group 2, HK, i.e. Hradec Kralove), and in the U3A courses (group 3, U3A, i.e. University of the Third Age). The respondents in each group were given basic information on the purpose of the research at the same time and filled in the questionnaire in printed form. Completing the questionnaire took approximately 15 minutes. It included 9 items, starting with personal data (Q1: gender, age, level of education). One year before the research started, the questionnaire was piloted in a group of 22 seniors, participants of ICT courses, and adjusted to its present final form.

2-4-Data and material
The dichotomous variable Internet, which indicates Internet use, was chosen as the dependent variable. Of the 432 respondents, 101 (23.4%) do not use the Internet and 331 (76.6%) use it. Gender, Education, Employment, Source and Age were selected as independent (explanatory) variables. Their description and the reasons for their inclusion in the model are given below.
Gender is a dichotomous variable that represents the sex of seniors: 1 - male, 2 - female. Based on the widespread assumption that men are closer to technologies of various types, it can be assumed that this variable could help to identify Internet users. The contingency table for this variable is shown in Table 1.
Education is an ordinal variable with three categories representing the education of respondents: 1 - elementary education, 2 - secondary education and 3 - higher education. We included it in the model on the assumption that seniors with higher education have greater IT confidence than seniors with lower education. The contingency table for the Education and Internet variables is shown in Table 1.
Employment is a dichotomous variable that characterizes whether a senior used IT in one of his/her earlier jobs: 0 - did not use, 1 - used. The reason for including this variable is the assumption that seniors with IT experience use the Internet more than those who do not have that experience. The contingency table for the Employment and Internet variables is shown in Table 1.
Source is a categorical variable with three categories representing three different groups of seniors: DD - a group of seniors from retirement homes; HK - a group of seniors who participated in an IT course organized by the University of Hradec Králové; U3A - a group of seniors who actively attend courses of the University of the Third Age in Hradec Králové, which are not primarily focused on using IT. The contingency table is shown in Table 1. The reason for including this variable in the model is the assumption that active seniors attending further education courses use the Internet more than the older seniors who are less active. Additionally, the seniors attending an ICT course can be expected to be able to use the Internet.
Age is a quantitative variable that represents the age of respondents at the time of interviewing, i.e. in 2016. The idea that younger seniors have more experience with the Internet than older ones speaks for the inclusion of this variable in the model. The description of the variable is shown in Table 2.

2-5-Statistical Procedures
Typically, the explained (dependent) variable in the classical linear model is a continuous numeric variable [23]. If an alternative (dichotomous) variable is to be explained, the linear model must be adapted. To overcome the limitations of the classical linear model, the logit transformation of the alternative variable can be used [26]. Let us consider the alternative variable Internet with the parameter π. In this paper, the variable indicates whether a senior uses the Internet (Internet = 1) or does not use it (Internet = 0). The parameter π represents the probability that the alternative variable has the value 1, i.e. π = P(Internet = 1). The opposite event has the probability 1-π, i.e. 1-π = P(Internet = 0). In order to explain this value with a linear model, it is necessary to transform it so that the explained variable is continuous and ranges over the whole set of real numbers. For this transformation, the concept of odds is used, defined as odds(Internet = 1) = P(Internet = 1)/P(Internet = 0) = π/(1-π). Finally, the natural logarithm of the odds is taken, i.e. logit(Internet = 1) = ln(π/(1-π)). One can then search for a linear model for this variable. A more detailed description of logistic regression can be found in [26,27].
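The transformation from probability to odds to logit can be illustrated with a minimal Python sketch. The function names are illustrative and are not part of the study, which used IBM SPSS; the only figure taken from the text is the 76.6% share of Internet users in the sample.

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p, i.e. p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Natural logarithm of the odds; maps (0, 1) onto all real numbers."""
    return math.log(odds(p))

def inverse_logit(z: float) -> float:
    """Map a value of the linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

share_users = 0.766  # share of Internet users reported for the sample
print(round(odds(share_users), 2))   # about 3.27 users per non-user
print(round(logit(share_users), 2))  # about 1.19
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0, which is why positive coefficients in a logistic model push the predicted probability above one half and negative ones below it.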
First, the dependence of Internet use on the individual variables described in Table 1 and Table 2 was determined. We dealt with these problems and variables in detail in [16] and [17], and these two preceding studies allowed us to identify the possible independent (explanatory) variables. The chi-square test of independence was then used for the categorical variables Gender, Education, Employment and Source. The results are summarized in Table 3. To measure the association between these variables and the Internet variable, Cramer's V, Somers' d or Cohen's κ were computed [26]. The dependence of Internet use on the Age variable was examined using the t-test for two independent samples. Based on these tests, the variables Gender, Education, Employment, Source and Age were included as explanatory variables in the logistic model.
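As a rough illustration of the chi-square test of independence and Cramer's V mentioned above, the following Python sketch computes both from a contingency table. The counts are hypothetical and serve only to show the calculation; they are not the values of Table 1, and the function names are illustrative.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

def cramers_v(table):
    """Cramer's V, a measure of association on a scale from 0 to 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square_statistic(table) / (n * k))

# HYPOTHETICAL counts: rows = did not use / used IT at work,
# columns = does not use / uses the Internet.
example = [[30, 10],
           [20, 40]]
print(round(chi_square_statistic(example), 2))  # 16.67
print(degrees_of_freedom(example))              # 1
print(round(cramers_v(example), 2))             # 0.41
```

The p-value is then obtained by comparing the statistic with the chi-square distribution with the given degrees of freedom, which statistical packages do automatically.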
The significance of the independent variables was tested using the Wald criterion. The final logistic model is presented in Table 4 in Section 3. The model is then evaluated using the chi-square test for the overall model, classification tables, the Cox and Snell R² and Nagelkerke R² coefficients, and the Hosmer and Lemeshow goodness-of-fit test; finally, the suitability of the model was also assessed on the basis of the ROC curve and the AUC value [26][27]. A significance level of 0.05 was used for all tests performed. IBM SPSS Statistics 25 software was used for statistical calculations and modelling.

3-Results
To characterize the logistic model, the authors first identify the explanatory variables. Secondly, the values of coefficients are estimated. Finally, the verification of the model is introduced.

3-1-Explanatory Variables
The selection of explanatory variables is partly based on the authors' previous studies [16,17]. Dependence on other possible variables in our research sample did not appear to be statistically significant. This means that a data-driven method of selecting explanatory variables was applied. The suitability of the variables described in Table 1 and Table 2 is discussed here. According to the results in Table 3, the hypothesis of independence of the variables Gender and Internet cannot be rejected. The data thus provide no evidence that the use of the Internet by older people depends on gender; they suggest that older men and women use the Internet to a similar extent. The Gender variable will be included in the logistic regression model, but based on these findings, its contribution can be expected to be negligible, so that it can eventually be omitted. Based on the results in Table 3, the hypothesis of independence of the Education and Internet variables can be rejected. Therefore, it can be stated that the use of the Internet by older people is not independent of the education attained by the senior. Given that the variable Education is ordinal and the Internet variable can also be considered ordinal for these purposes, measures of ordinal association can be used. Somers' d for the dependent variable Internet has a value of 0.24 (asymptotic p-value <0.01). Thus, the dependence between the Education and Internet variables (on a scale of -1 to 1) is not very strong, but it exists. The positive sign of Somers' d means that seniors with higher education use the Internet more. Similarly, Spearman's correlation coefficient is 0.29 (p-value <0.001). Therefore, we can confirm a weak positive relationship between the level of education and Internet use.
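For readers unfamiliar with Somers' d, the following Python sketch shows one way to compute it from an ordered contingency table, with the column variable (here Internet) treated as dependent. It is an illustration only: the function name is an assumption, and the tables used are toy examples, not the data behind Table 3.

```python
def somers_d(table):
    """Somers' d with the column variable as the dependent variable.

    Rows and columns must be listed in increasing order of their categories.
    Only pairs of respondents from different rows enter the calculation:
    d = (concordant - discordant) / (concordant + discordant + tied on columns).
    """
    concordant = discordant = tied_on_columns = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for k in range(i + 1, n_rows):  # second respondent sits in a higher row
            for j in range(n_cols):
                for l in range(n_cols):
                    pairs = table[i][j] * table[k][l]
                    if j < l:
                        concordant += pairs
                    elif j > l:
                        discordant += pairs
                    else:
                        tied_on_columns += pairs
    return (concordant - discordant) / (concordant + discordant + tied_on_columns)

# Toy tables: rows = education level (low to high), columns = Internet (0, 1).
print(somers_d([[1, 0], [0, 1]]))  # 1.0, perfect positive association
print(somers_d([[1, 1], [1, 1]]))  # 0.0, no association
```

A value of 0.24, as reported for Education, therefore lies much closer to the no-association case than to perfect association.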
Based on the above, it can be assumed that the Education variable may be useful for distinguishing Internet users from those who do not use the Internet. Therefore, it can be included in the logistic regression model. As Table 3 shows, the hypothesis of independence of the Employment and Internet variables can be rejected. It can be said that Internet use by older people is not independent of the seniors' experience of using IT in previous jobs. Cramer's V equals 0.44 (p-value <0.05). Thus, the dependence between the Employment and Internet variables is not very strong on the scale from 0 to 1, but it exists. In this case, Cohen's κ coefficient can be used to assess the degree of agreement. This coefficient is 0.46 (p-value <0.001), indicating a moderate degree of agreement. This means that those who did not use IT in a previous job are also currently less likely to use the Internet, whereas those who used IT in a previous job tend to use it. Based on the above, it can be assumed that the Employment variable is useful for distinguishing Internet users, and this variable can be included in the logistic regression model. Based on the results in Table 3, the hypothesis of independence of the Source and Internet variables can be rejected. Therefore, it can be stated that Internet use by older people is not independent of the group to which the senior belongs. Cramer's V equals 0.44 (p-value <0.05). Thus, the dependence between Source and Internet is not very strong on the scale from 0 to 1, but it does exist. On this basis, it can be assumed that the Source variable is a suitable candidate for distinguishing Internet users from those who do not use the Internet. Therefore, the variable Source can be considered a valid variable in the logistic regression model.
Finally, we consider the relationship between the Age and Internet variables. Since the test of equality of variances indicates unequal variances (F = 17.5, p-value <0.05), the t-test for two independent samples with unequal variances was used, and the hypothesis of equality of means (t = 5.86, d.f. = 133, p-value <0.01) can be rejected. It can therefore be said that the average age of the seniors who use the Internet differs from the average age of the seniors who do not. These facts lead to the assumption that the Age variable is a suitable candidate for the logistic regression model.
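The t-test for unequal variances (often called Welch's test) can be sketched in Python as follows. The ages below are hypothetical and only demonstrate the calculation; the function name is illustrative, and the study itself relied on SPSS output.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# HYPOTHETICAL ages of non-users and users of the Internet.
non_users = [72, 80, 85, 78, 90, 69]
users = [60, 65, 70, 63, 68, 72]
t, df = welch_t(non_users, users)
print(round(t, 2), round(df, 1))
```

Under unequal variances, the degrees of freedom are generally smaller than the pooled value n1 + n2 - 2, which would be consistent with the reported d.f. = 133 for a sample of 432 respondents.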

3-2-Logistic Model
When constructing the model, it turned out that it is not necessary to include the Gender variable (Wald = 0.340, d.f. = 1, p-value = 0.560) or the Education variable (for the reference category, Wald = 2, p-value = 0.530, and similarly for the other categories). The contribution of both variables proved to be negligible; the Wald statistics and their significance show quantitatively that they can be omitted from the model [26][27]. Instead of the five initially considered explanatory variables, only three appear in the resulting model
ln(π/(1-π)) = B0 + B1·Source + B2·Age + B3·Employment + e
or, equivalently,
π = exp(B0 + B1·Source + B2·Age + B3·Employment + e)/(1 + exp(B0 + B1·Source + B2·Age + B3·Employment + e)),
where B0, B1, B2, B3 are unknown real parameters that need to be estimated, exp is the exponential function and e is the error term (disturbance) representing other factors that affect the dependent variable.
In order to use the logistic regression model for nominal data with more than two categories, it is necessary to encode them using indicator variables [26]. To specify the Source variable, which represents three groups of seniors, two indicator variables are needed: the IT_Course indicator variable, representing membership of the group attending IT courses, and the U3A_Course indicator variable, representing the group of seniors attending the University of the Third Age. The group of seniors from senior houses serves as the reference category, cf. [26]. The estimated parameters of the proposed logistic regression model are summarized in Table 4, which also includes the Wald chi-square tests of the individual predictors.
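The indicator coding of Source and the resulting linear predictor can be sketched in Python as below. This is an illustration under stated assumptions, not the fitted model: the slope coefficients are natural logarithms of the rounded odds ratios quoted in the Discussion (11.2, 3.7, 0.9, 6.1), and the intercept B0 is a purely hypothetical placeholder, so the printed probabilities have no empirical meaning.

```python
import math

def encode_source(source):
    """Return (IT_Course, U3A_Course) indicators; 'DD' is the reference group."""
    if source not in {"DD", "HK", "U3A"}:
        raise ValueError("source must be 'DD', 'HK' or 'U3A'")
    return (1 if source == "HK" else 0, 1 if source == "U3A" else 0)

def predicted_probability(source, age, employment, coef):
    """Probability of Internet use implied by a set of coefficients."""
    it_course, u3a_course = encode_source(source)
    z = (coef["B0"]
         + coef["IT_Course"] * it_course
         + coef["U3A_Course"] * u3a_course
         + coef["Age"] * age
         + coef["Employment"] * employment)
    return 1.0 / (1.0 + math.exp(-z))

COEF = {
    "B0": 7.0,                      # HYPOTHETICAL intercept, not from the paper
    "IT_Course": math.log(11.2),    # log of rounded odds ratio
    "U3A_Course": math.log(3.7),    # log of rounded odds ratio
    "Age": math.log(0.9),           # log of rounded odds ratio
    "Employment": math.log(6.1),    # log of rounded odds ratio
}

for group in ("DD", "HK", "U3A"):
    print(group, round(predicted_probability(group, 70, 0, COEF), 2))
```

Because the senior houses group is encoded as (0, 0), the coefficients of IT_Course and U3A_Course always express a comparison with that reference group.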

3-3-Evaluation of the Model
Before using and interpreting the coefficients of the introduced model, it is necessary to consider whether it fits the data well. Based on all available tests and statistics, shown in Table 5, it can be assumed that it does. The result for the overall model means that the joint hypothesis of zero coefficients can be rejected, so that information about the values of the independent variables allows a better prediction of the dependent variable than would be possible without it. Moreover, the result of the Hosmer and Lemeshow test means that we do not reject the null hypothesis that the model fits the data well. The quality of the logistic regression can also be assessed using the ROC curve and the AUC value [27]. The results can be found in Table 6. Because the AUC is significantly greater than 0.5, the model with the explanatory variables Source, Age and Employment can be used to classify seniors as Internet users or non-users. With this model, the classification of seniors is significantly better than random assignment. The sensitivity of the model is almost 94%, which means that the model correctly classifies almost 94% of the Internet users. The specificity of the model is 56%, which means that the model correctly identifies 56% of the seniors who do not use the Internet.
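Sensitivity, specificity and the AUC can be computed from predicted probabilities as in the following Python sketch. The labels and scores are hypothetical, the 0.5 cut-off is an assumption, and the function names are illustrative; the values in Tables 5 and 6 come from SPSS.

```python
def sensitivity_specificity(actual, predicted):
    """Share of correctly classified users (1) and non-users (0)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(actual, scores):
    """Probability that a random user scores higher than a random non-user.

    Ties count as one half; this equals the area under the ROC curve.
    """
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# HYPOTHETICAL respondents: 1 = uses the Internet, 0 = does not.
actual = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]          # predicted probabilities
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off 0.5

print(sensitivity_specificity(actual, predicted))  # (0.75, 0.5)
print(auc(actual, scores))                         # 0.875
```

An AUC of 0.5 corresponds to random assignment and 1.0 to perfect separation, which is why the model is judged against the 0.5 benchmark.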

4-Discussion
To interpret the values in Table 4, we use the logistic model for the odds of using the Internet and the odds ratios (OR) given in this table.
• The first odds ratio, OR = 11.2, means the following: if the variable IT_Course, representing the seniors trained in the IT course, increases by 1 and the other variables do not change, the odds of using the Internet increase about 11 times. In other words, seniors who attended the IT course have about 11 times greater odds of using the Internet than the seniors in retirement homes.
• The second odds ratio, OR = 3.7, refers to the variable U3A_Course, which indicates the group of seniors who actively attend the University of the Third Age. This value indicates that the seniors in this group have about 4 times greater odds of using the Internet than the seniors in retirement homes.
• The odds ratio OR = 0.9 refers to the Age variable representing the age of seniors. Because the value is less than one, the odds of using the Internet fall with increasing age. In particular, for two seniors whose ages differ by 1 year and whose other explanatory variables are the same, the odds of using the Internet are about 0.9 times as high for the older one, i.e. they decrease by roughly 10% per year of age.
• The last odds ratio in Table 4, OR = 6.1, refers to the Employment variable, which indicates whether the senior used IT in his/her earlier job. Since the value is greater than one, seniors who used IT in their jobs have greater odds of using the Internet. Provided the other explanatory variables do not change, these seniors have about 6 times higher odds of using the Internet.
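The arithmetic behind these interpretations can be made explicit with a short Python sketch. It uses only the rounded odds ratios quoted above, so the results are approximate; the function names are illustrative.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_over(odds_ratio_per_unit, units):
    """Odds ratio for a change of several units of a quantitative variable."""
    return odds_ratio_per_unit ** units

# Age: rounded OR = 0.9 per additional year.
print(round(percent_change_in_odds(0.9), 1))  # -10.0 (% per year)
print(round(odds_ratio_over(0.9, 10), 2))     # 0.35 for a 10-year difference

# Employment: rounded OR = 6.1.
print(round(percent_change_in_odds(6.1), 1))  # 510.0 (% higher odds)
```

Under the model, odds ratios multiply rather than add, so a ten-year age difference corresponds to roughly one third of the odds, not to a 100% decrease. Because the per-year value of 0.9 is rounded, the ten-year figure should be read as indicative only.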
The above findings can be interpreted as follows; we also relate them to other published results.
• The analysis shows that the odds of using the Internet are increased most by two factors: whether the senior is active and attends an IT course, and whether the senior used IT in his/her earlier job. These results are not surprising. The seniors who enrolled in the IT course represent a group of seniors who are active and interested in learning about new information technologies. They have been taught that useful information can be found on the Internet and have tried out how to use it as part of the course. They have therefore lost their initial inhibitions about using new, unfamiliar technology. Similarly, the seniors who worked on computers in their previous jobs have skills they can apply when using the Internet.
• The effect is smaller when the senior is generally active. In this case, this concerned the seniors attending different courses of the University of the Third Age. This confirms the findings of a study by Loipha [5]. It can be assumed that generally active seniors are naturally more motivated to use the Internet but may have problems overcoming technological barriers. As mentioned in the previous paragraph, training in information technologies significantly increases the ability and willingness of seniors to use the Internet.
• Age has a significant impact on Internet use, and it is this factor that slightly decreases the seniors' odds of using the Internet. Author et al. [16] point out that seniors should be divided into three basic age groups according to their special needs, which must be taken into consideration when designing, developing or revising a new technological device for these aging population groups. Similarly, the study [22], also based on logistic regression, showed that age strongly affects Internet use and that even a single year makes a significant difference. The research indicates that this might be caused by disinterest [28], physical disorders [5], or cognitive impairments [1].
• On the contrary, determinants such as level of education and gender proved to be insignificant. As far as education is concerned, its role in comparison with the other determinants is not measurable. This is surprising because in most research studies this factor is quite important, cf. [18,19,29].
• Furthermore, the results of the logistic regression also indicated that there was no difference in the use of the Internet between males and females, which is in accordance with the findings of [5,29,30], which also did not identify any differences between older male and female adults in Internet use. For example, [30] revealed that gender in the younger age group (50-65 years) was no longer a significant predictor of Internet use, but social salience appeared to be important for the subjects' own Internet access. However, other studies, such as [18,22], consider this determinant an important factor. In these studies, male respondents appeared to be more prominent in Internet use.
Overall, the findings show that governments should support Internet access among the older population since ICT skills can enable better employability in the future, especially with the threat of extended working life for EU citizens [31][32][33]. Furthermore, they should focus on a person's life course, social network and infrastructural ICT environment [30]. Future research should be more representative and focus on other significant determinants, such as level of income, rural/urban living, or living alone or with someone.
The limitation of this study lies in the selective sample of respondents, who come from only one region of the Czech Republic and are predominantly active Internet users.

5-Conclusion
The findings of the proposed model have confirmed that the key determinants in the use of the Internet by older people are age, previous experience with IT, and active use of IT enhanced by some kind of training, in this case attending IT courses of the University of the Third Age. Therefore, it is advisable to provide training in the use of the Internet for older people in order to facilitate its adoption among this generation group. On the contrary, gender and education status have not proved to be significant determinants in this study. The authors are aware that the given findings concern seniors living in the Hradec Králové Region. For more general findings about Czech seniors, it would be necessary to extend the selection to the whole Czech Republic. The introduced model of logistic regression enriches current literature on the subject by emphasizing the possible factors that influence the use of the Internet by seniors in the region. The survey also investigates which factors in comparison with each other act more and which less, and which factors are significant within the model and which are not.

6-2-Data Availability Statement
The data are available upon request from the corresponding author.

6-3-Funding and Acknowledgements
This research was supported by the SPEV project 2021, run at the Faculty of Informatics and Management, University of Hradec Kralove.

6-4-Conflicts of Interest
The authors declare that there is no conflict of interests regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.