Overview on Case Study Penetration Testing Models Evaluation

Model evaluation is a cornerstone of scientific research as it represents the findings' accuracy and model performance. A case study is commonly used in evaluating software engineering models. Due to criticism in terms of generalization from a single case study and testers, deciding on the number of case studies used for evaluation and the number of testers has been one of the researchers’ challenges. Multiple case studies with multiple testers can be difficult in some domains, such as penetration testing, due to the complexity and time needed to prepare test cases. This study aims to review the literature and examine the evaluation methods used pertaining to the number of case studies and testers involved. This study is beneficial for researchers, students, and penetration testers as it provides case study design steps that are useful to determine the appropriate number of test cases and testers required. The paper's findings and novelty highlight that a single case study with a single tester is enough to evaluate a model. It also strikes a balance between what is enough for the evaluation and the need to reduce criticisms of a single case study by using two case studies with a single tester.

Among the challenging decisions that case study evaluation needs to consider is determining the number of case studies and the number of testers needed to evaluate the testing model [37,38]. Since most research is criticized for generalization, researchers are attempting to create case studies that will assist them in generalizing their findings [7]. Therefore, researchers frequently refer to relevant, well-known articles as criteria for determining the number of case studies and testers to employ in their model evaluation [39]. During their Ph.D. journey, the authors spent a significant amount of time analyzing their models. A large portion of this time was spent trying to establish the reasons for the generalization critiques they received as a result of using a single case study with a single tester. The primary objectives of this work are as follows: (i) Provide design steps for using a case study in evaluating penetration testing models. (ii) Review related research in terms of the number of case studies. (iii) Review related research in terms of the number of testers (participants) used. (iv) Discuss the recommended number of case studies and number of testers when conducting penetration testing model evaluation. (v) Establish the fundamentals for future complex model evaluation guidelines.
In this work, we refer to the penetration tester who is taking part in the model evaluation as either a participant or a tester. Furthermore, in this context, the penetration testing model refers to the methodologies, procedures, rules, frameworks, and tools used to enhance and improve the penetration testing process in the domain of software engineering, specifically software security testing. Moreover, the term case study here refers to what other literature also calls an observational study or field research. Case studies, in particular, are empirical investigations that examine a contemporary event in depth and within its context [8].
This study mainly answers the following five research questions while striking a balance between complexity, time, and the generalizability of the results. RQ1 (what are the steps to design penetration testing model evaluation using a case study?). RQ2 (what is the number of test cases used in the penetration testing model evaluation?). RQ3 (what is the number of testers used in the penetration testing model evaluation?). RQ4 (how many test cases should be used to evaluate the penetration testing model?). RQ5 (how many testers will be needed to assess the penetration testing model?). The findings of this paper provide case study design steps for researchers to direct their evaluation of penetration testing models. This may reduce the time and complexity of the evaluation process by helping the researcher to determine the optimal number of case studies and testers. This work contributes by recommending case study design steps based on the literature to help researchers, students, and practitioners in the field of penetration testing throughout the evaluation process. This was accomplished by adhering to a well-structured and approved framework in terms of the number of implemented case studies and a sufficient number of testers. The answers to the aforementioned research questions will aid in the development of a procedural process design for evaluating the penetration testing model using a case study. Furthermore, they aid in establishing the number of case studies and testers required to evaluate the penetration testing model while taking into account the complexity of the testing and delivering generalizable conclusions. This study answers crucial questions and serves as a guideline for researchers and students when designing case studies. This paper's innovation lies in its simplicity and comprehensiveness. At the same time, this paper presents many perspectives and criticisms of research that uses few case studies or a small number of testers, which might be useful in enlightening researchers on additional facts that they should consider and justify while doing their research. This work paves the way for future research and development in areas such as software development, implementation, and maintenance models.

2-1-Penetration Testing
This section presents a deeper background on penetration testing in order to illustrate the benefits behind this type of testing, discusses the main processes, highlights real scenarios where penetration testing can be helpful, and provides an introduction to the penetration testing model. This section can be considered a bird's-eye view of penetration testing, which may help in gaining some insights into the current state-of-the-art practice in this domain. Penetration testing has been defined by Engebretson [40] as the tests that are made to discover hidden vulnerabilities by examining and exploiting computer systems (network and software) with the intention of enhancing the security of the systems under test. In other words, in penetration testing, testers try to mimic the actions of hackers in order to find the weak points of the system under test and report them. These reports are then used by system administrators and developers to address these weak points [41].
As presented in the discussion, penetration testing can be viewed as a reverse engineering process of the software and network in order to find what to test, how to exploit security vulnerabilities, and how to report the results. This reverse engineering process can be conducted using the source code (white box), or it can be done using the executable software and the network implementation (black box), while in some cases it can be a mixed method where only part of the code is available (gray box). The white box, black box, and gray box represent the main types of penetration testing [42].
Penetration testing tries to find vulnerabilities before hackers do [43]. This gives software and network engineers an advantage over hackers, since they can fix the issues reported by penetration testing before these are exploited in malicious attacks. Another advantage of penetration testing is that it is well-structured and clear for testers through the steps that are presented in models [44]. This advantage also makes it amenable to automation, which can be helpful when implemented on large-scale heterogeneous information systems [45].
Penetration testing can help software engineering in many ways. For example, penetration testing can be used to test the system against certain types of attacks, such as denial of service attacks [46], SQL injection attacks [47], and cross-site scripting attacks [48]. In each case, the tester starts by analyzing the computer system under test, generating a set of test cases, selecting the important test cases, executing those test cases, and reporting the results [29,49-52].
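As an illustration only, the selection, execution, and reporting activities described above can be sketched as a small pipeline. This is a minimal sketch and not an implementation taken from the cited models: the names `PenTestCase`, `select_cases`, `execute_cases`, and `report`, the integer priority scale, and the toy checks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PenTestCase:
    """One generated test case (hypothetical structure for illustration)."""
    name: str
    priority: int              # higher means more important (assumed scale)
    check: Callable[[], bool]  # returns True if the weakness was exposed


def select_cases(cases: List[PenTestCase], limit: int) -> List[PenTestCase]:
    """Keep the `limit` most important test cases."""
    return sorted(cases, key=lambda c: c.priority, reverse=True)[:limit]


def execute_cases(cases: List[PenTestCase]) -> Dict[str, bool]:
    """Run each selected test case and record whether it exposed a weakness."""
    return {case.name: case.check() for case in cases}


def report(results: Dict[str, bool]) -> List[str]:
    """List the names of test cases that exposed a weakness, sorted by name."""
    return sorted(name for name, exposed in results.items() if exposed)


# Toy stand-ins for generated test cases; real checks would probe the system under test.
generated = [
    PenTestCase("sql_injection_login", priority=3, check=lambda: True),
    PenTestCase("xss_search_field", priority=2, check=lambda: False),
    PenTestCase("dos_upload_endpoint", priority=1, check=lambda: True),
]

selected = select_cases(generated, limit=2)
findings = report(execute_cases(selected))
print(findings)
```

Keeping selection separate from execution makes it visible which generated test cases were never run, which matters when the reported results are later compared across case studies.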

2-2-Model Evaluation: Related Works
Model evaluation is one of the most challenging tasks of research as it is complex, time-consuming, and requires extensive effort [53]. Researchers have certain metrics to evaluate within the domain of the research, such as technical requirements, generally accepted standards, usability, and complexity. Along with the domain metrics, researchers need to meet the generalization criteria that confirm that their results can be generalized [14,18,54-59]. Among the most common methods used in model evaluation is the case study evaluation method [14,58,59].
Researchers who apply case studies for model evaluations normally tend to use the evaluation methods of previous research to support and guide their own evaluations, or they use case study guidelines and frameworks. Certain studies were published to provide guidelines and frameworks to support researchers during the evaluation phase in multiple domains (e.g., software engineering, software testing, and other domains). Table 1 presents a sample of works that used or proposed a method, framework, or guideline for using case studies in research. There are also a huge number of research publications that use case studies.

Table 1

Reference | Domain | Title
[60] | Software engineering | Case studies for software engineers.
[61] | | Can you trust a single data source exploratory software engineering case study?
[62] | | Qualitative methods in empirical studies of software engineering.
[58] | | Scaling up case study research to real-world software practice.
[8] | General | Case study research: Design and methods.
[64] | | Systematic case study research: A practice-oriented introduction to building an evidence base for counselling and psychotherapy.
[65] | | Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue.

Table 2

Reference | Domain | Title
[70] | Software testing | Effect of test set minimization on fault detection effectiveness.
[71] | | Managing crowd sourced software testing: A case study-based insight on the challenges of a crowdsourcing intermediary.
[72] | | Internet banking security management through trust management.
[73] | | Improving software security with precise static and runtime analysis.
[74] | | Automated runtime testing of web services.
[75] | | An evolutionary approach for system testing of android applications.
[76] | | Utilizing output in web application server-side testing.
[77] | Software engineering | Two's company, three's a crowd: A case study of crowdsourcing software development.
[78] | | A longitudinal case study of an emerging software ecosystem: Implications for practice and theory.
[79] | | Software product line scoping and requirements engineering in a small and medium-sized enterprise: An industrial case study.
[80] | Penetration testing | Variability testing in the wild: The Drupal case study.
[81] | | Improving security and privacy of integrated web applications.
[73] | | Improving software security with precise static and runtime analysis.
[82] | | A hybrid framework for the systematic detection of software security vulnerabilities in source code.
[83] | | Cybersecurity testing and intrusion detection for cyber-physical power systems.
[84] | | Advanced automated web application vulnerability analysis.
[37] | Business | Designing and conducting case studies in international business.
[85] | Social | Rey: An intensive single case study of a probation youth with immigrant background participating in wraparound Santa Cruz.
[86] | | A case study of extensive reading with an unmotivated L2 reader.
[87] | Health | Is our food safe? An assessment: On the European Union food safety policy, concerning the safety of meat and animal-derived food products in the EU.
As shown in Table 1, many studies have proposed guidelines or frameworks that researchers can adapt to their research. For instance, Perry et al. [60] proposed guidelines for software engineering research when using the case study. Ghauri [37] proposed guidelines that can be adapted to international business domain research when using the case study. Similarly, Yin [8] has discussed in detail how to use case studies in research. Flyvbjerg [63] discussed the general misunderstanding of case studies. McLeod & Elliott [64], Zivkovic [68], and Kratochwill et al. [67] are researchers who discussed case studies in the research domain.
Similarly, Table 2 shows multiple studies in certain related domains that used a case study approach in the evaluation process of their proposed models. For instance, Zhou [81], Livshits [73], Hanna [82], and Pan [83] have used case studies in the penetration testing domain for model evaluation. Meanwhile, Alshahwan [76], Mahmood [75], Koskosas and Koskosa [72], Livshits [73], and Ramollari [74] have used case studies in the software testing domain. Hanssen [78] and Da Silva et al. [79] have used case studies in the domain of software engineering.

2-3-Research Methodology
There are a large number of studies that have discussed the case study approach and used case studies in model evaluation. Therefore, in order to meet the objectives of this paper, evidence-based practice was used as proposed by Bastian et al. [88]. In order to collect evidence, this paper has conducted journal searches, database queries, and Internet-based searches similar to those conducted by Sana and Li [89], Ali et al. [90], Alzoubi et al. [91], and Al-Ahmad et al. [92].

Figure 1. Research Methodology Phases
During the analysis phase, online databases for the literature review were selected. These databases are Scopus, EBSCO, Springer, Emerald, Wiley, MDPI, Taylor, Sage, IGI, IOS, Google Scholar, ACM Digital Library, and Google Search for multiple university libraries. The filtering tools provided by these databases have been used for each study to limit study outcomes [102]. Initial reading of the abstract and the title of the papers was conducted [103,104]. Then the chosen papers were fully read to sort out irrelevant papers [105]. The search targeted the importance of case studies in model evaluation, guidelines for using a case study in model evaluation from multiple domains, and previous research that used case studies in model evaluation. To this end, a set of relevant search terms was selected and combined using the "AND" and "OR" Boolean operators. The selected search terms for this study are "Case study", "Evaluation", "Penetration testing", "Model", "Framework", "Methodology", "Best practice", "Generalization", "Software testing", "Software engineering", and "Tester".
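To make the combination of search terms concrete, the following minimal sketch builds one Boolean query string from the eleven terms listed above. The grouping of the terms into three OR-groups joined by AND is an assumption made for illustration; the paper does not state which combinations were actually submitted, and individual databases may require their own query syntax. The helper names `any_of` and `all_of` are hypothetical.

```python
from typing import Sequence


def any_of(terms: Sequence[str]) -> str:
    """Quote each term, join the terms with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def all_of(groups: Sequence[str]) -> str:
    """Join already-built groups with AND."""
    return " AND ".join(groups)


# Assumed grouping of the paper's search terms (not stated in the source).
METHOD_TERMS = ["Case study", "Evaluation", "Generalization"]
DOMAIN_TERMS = ["Penetration testing", "Software testing", "Software engineering"]
ARTIFACT_TERMS = ["Model", "Framework", "Methodology", "Best practice", "Tester"]

query = all_of([any_of(METHOD_TERMS), any_of(DOMAIN_TERMS), any_of(ARTIFACT_TERMS)])
print(query)
```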
During the screening phase, academic papers written in English and listed in reputable and high-quality journals and conferences published from 1995 to 2021 were selected. More than 700 articles published in top-ranked peer-reviewed journals and conferences were found using the search terms. During the exclusion phase, 540 were excluded after reading the title and abstract. Only 160 papers were selected based on the review's objectives. After carefully reading the manuscripts of these selected papers, this study chose the 82 most relevant research articles during the inclusion phase. This body of work reflects considerable interest in case study and penetration testing-related research areas.
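The counts reported for the exclusion and inclusion phases can be checked with a short calculation. This is a sketch only: the source reports "more than 700" articles found, so 700 is used here as a lower bound, and the class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    found: int     # articles returned by the search (lower bound of "more than 700")
    excluded: int  # removed after reading title and abstract
    included: int  # kept after full-text reading

    @property
    def screened(self) -> int:
        """Papers remaining after the exclusion phase."""
        return self.found - self.excluded

    @property
    def dropped_at_full_text(self) -> int:
        """Papers set aside between the exclusion and inclusion phases."""
        return self.screened - self.included


counts = ScreeningCounts(found=700, excluded=540, included=82)
print(counts.screened, counts.dropped_at_full_text)  # prints: 160 78
```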

3-1-Case Study Design Steps
Multiple researchers have tried to build models for applying penetration testing to multiple types of applications [24,106,107]. They applied penetration testing to web applications, mobile applications, desktop applications, web services, and databases, using models and frameworks to find hidden vulnerabilities that may be used by attackers to harm applications, interrupt systems, and steal data [106,108-113]. These models need to be evaluated to prove their efficiency and effectiveness in finding these hidden vulnerabilities. A case study is one of the most common methods used previously in penetration testing model evaluation; for example, it was used by Sánchez et al. [80], Zhou [81], Livshits [73], Hanna [82], Pan [83], and Doupé [84].
Previous case study practices disregarded the selection of the number of case studies and the number of testers for the case study. Therefore, due to the specialized nature of penetration testing and the increasing complexity of the application domain under test, as discussed earlier, we found that there is a need for a specific case study reference for penetration testing model evaluation. This reference should provide a comprehensive grounding to help the researcher save time in determining the number of case studies and the number of testers while meeting the needs of generalization. It should also take into consideration the complexity of penetration testing and the time available to researchers, while keeping the research quality acceptable.
The proposed case study design steps shown in Figure 2 aim to answer RQ1 (what are the steps to design penetration testing model evaluation using a case study?). This design is an adaptation of the previously proposed case study design models, frameworks, methods, and guidelines published in highly ranked journals and used by a large number of researchers (e.g., [7,12,14,18,38,58,114-116]). The proposed design simplifies the penetration testing process, as it is highly abstracted and presents only the main process for designing a case study for penetration testing model evaluation. The design starts with selecting the number of case studies and the number of testers. Next, testbed applications are selected for the two case studies, and the application environment is prepared. The single penetration tester is then chosen as part of the data collection design, in preparation for data collection in the research for which the proposed design will be used. The single case study approach has been applied both in social science in general and in software engineering in particular. A single case study with one tester is sufficient for generalization [37,60,63]. It also helps to make the case more recognizable [64] and may be used to endorse or contest a model or hypothesis [68]. Additionally, a single case study can be the basis for significant explanations and generalizations [114]. The single case study approach is not limited to the social sciences but has also been applied in software engineering, software testing, security testing, and penetration testing research. It has been used previously in software engineering research as it provides more flexible procedures and patterns of working that allow the researcher to find out how different issues hold different significance and to focus on research identification and interpretation [62,70,77-79]. The single case study approach has also been used in software testing research as it provides a deeper understanding and in-depth insight into important issues [71,80], in security testing as it provides a rich description of the case study [72,74], and in penetration testing [82]. The single case study has also been used by research on real-world software testing case studies in order to efficiently analyze and control the results, as such studies produce a large volume of results [58].
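The ordering of the design steps described above can be sketched as a small checklist. This is an illustrative sketch under stated assumptions, not the authors' tooling: the step names paraphrase the text rather than reproduce Figure 2, the class `CaseStudyDesign` is hypothetical, and the rule of one testbed application per case study is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Step names paraphrase the text; they are not copied from Figure 2.
DESIGN_STEPS = (
    "select number of case studies and testers",
    "select testbed applications",
    "prepare application environment",
    "design data collection",
)


@dataclass
class CaseStudyDesign:
    num_case_studies: int = 2  # two case studies, as the design suggests
    num_testers: int = 1       # a single tester
    testbed_apps: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the next pending step, or None when all steps are done."""
        if len(self.completed) < len(DESIGN_STEPS):
            return DESIGN_STEPS[len(self.completed)]
        return None

    def complete(self, step: str) -> None:
        """Mark a step as done, enforcing the order given in DESIGN_STEPS."""
        if step != self.next_step():
            raise ValueError(f"expected {self.next_step()!r}, got {step!r}")
        # Assumption: one testbed application per case study.
        if (step == "select testbed applications"
                and len(self.testbed_apps) != self.num_case_studies):
            raise ValueError("one testbed application is needed per case study")
        self.completed.append(step)


design = CaseStudyDesign()
design.complete("select number of case studies and testers")
design.testbed_apps = ["testbed_app_a", "testbed_app_b"]  # placeholder names
design.complete("select testbed applications")
print(design.next_step())  # prints: prepare application environment
```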
However, common criticisms of a single case study include limited generalization [63]. Nevertheless, Flyvbjerg [63] contended that findings from one case study can be generalized and that single cases have been used by scientists such as Galileo, Newton, Einstein, and Bohr. A single case study generalization can be done with careful design and implementation. It is still a matter of selecting and designing a single case study, and having a huge number of case studies does not guarantee greater generalization. This study presents a design for the use of case studies in penetration testing model evaluation and also determines the number of case studies and testers to be used during this evaluation in order to improve the generalization of the case study results.
Deciding the number of case studies is a difficult task, as there is no limit to the maximum number of case studies [37]. Table 3 summarizes our review of studies that support the use of a single case study when conducting an evaluation. These studies, published between 2004 and 2015, support the use of a single case study as it helps researchers focus on the main components of their research and reduces the complexity and time required for evaluation.

Table 3

Reference | Criteria | Single case study characteristics
[64] | Credibility | It represents a more important value than having large-scale cases with less credible results.
[60] | | It reduces complexity and supports unique cases.
[87] | Uniqueness | It is a detailed examination that investigates a single subject or model in a specific, unique, bounded system.
[78] | | It is the choice for evaluation when the case study is a unique case, at least for the researcher of the study.
[63] | Generalization | It can be generalized.
[37] | | There is no limit to the maximum number of case studies, while in many cases one is enough.
[68] | | It supports and demonstrates a model or theory. Single-case studies are strongest in describing the results and findings.
[62] | | In the case study, the data will be collected from the same project development, which makes a single case study enough.
[70] | | It is limited to other cases that have the same characteristics. It improves the research data describing the case study.
[58] | | Many software engineering studies use a single case study.
[61] | Reusability | It provides reusability that can be used by other researchers and reduces time.
[79] | Flexibility | It is more flexible and exploratory than having multiple case studies.
[77] | In-depth | It provides in-depth detail of the results and findings.
[71] | | It provides a deeper understanding of the important issues. It allows the researchers to observe, explore, and explain the results and findings in real-life environment variables.
[80] | | It helps in finding correlations and prioritizing the results.
[72] | | It should be employed mainly when researching a previously unresearched domain. It allows the results to be investigated in depth.
Ghauri [37], Yin [8], and Flyvbjerg [63] have discussed the number of case studies, and they all agreed on the difficulty of selecting it. At the same time, they all agree that one case study is enough for generalization, especially when dealing with highly complex implementations. Using a single case study may help in making the case study generalizable and central to scientific development, which enriches the research with a deeper focus on the important components rather than replicating the same process multiple times. A single case study is an empirical inquiry and a detailed examination that investigates a single subject or model in a specific, unique, and bounded system [87]. Perry et al. [60] and McLeod and Elliott [64] agreed that in evaluating models and theories, it is sufficient to use a single case study. A single case study can be helpful to represent complex and unique cases, and its result represents a more important value than large-scale cases with less credible results. Also, Zivkovic [68] found that a single case can be used to support and demonstrate a model or theory and is strongest in describing the results and findings.
A single case study can reduce time while enhancing results. It can also support reusability, which can be used by other researchers [61]. Similarly, Seaman [62] found that in the single case study, the data will be collected from the same project, which makes it sufficient. Moreover, Wong et al. [70] found that the observations made on a single case study extend only to other cases that have the same characteristics, which in fact helps to state the limitations that define the scope of the study. Therefore, researchers need to provide a detailed description of the characteristics of the case study and the evaluation limitations that make a single case study a better choice.
In software engineering areas such as testing, many projects use a single case study [14,58,59]. Stol and Fitzgerald [77] agreed that a single case study can be used to provide in-depth details of the results and findings. Hanssen [78] found that a single case study is a good choice for evaluation, especially when the case study is a unique case, at least for the researcher of the study. Da Silva et al. [79] also found that a single case study can provide more flexibility and explanatory power than multiple case studies. Also, Zogaj et al. [71] stated that single case studies provide a deeper understanding of the important issues when the data are limited, as they allow the researchers to observe, explore, and explain the results and findings in real-life environmental variables. Likewise, Sánchez et al. [80] found that a single case study helps in finding correlations and prioritizing the results. Similarly, Koskosas & Koskosa [72] mentioned that a single case study should be used when studying a previously unexplored domain because it allows the findings to be thoroughly analyzed.
On the other hand, some researchers showed that using a single case study may narrow the scope of generalization, while multiple case studies may increase it [61]. We found in the literature that previous penetration testing theses also used two case studies with one tester [72,74,81,117-120]. More than two test cases were very rare in the penetration testing domain. The authors have extensively reviewed the publications from the selected online databases and found hardly any model evaluations that used more than two case studies. For example, three test cases with one tester were used by Goseva-Popstojanova & Perhinschi [121], Al-Azzani et al. [122], and Mouelhi et al. [123]. Therefore, to keep the evaluation framework and its results clear and readable, improve the reliability and generalization of the model evaluation framework, and reduce the criticisms of the single case study, we suggest using two case studies. The number of case studies required is strongly linked to the complexity of the application under test. It is argued that a two-case study approach is required to cater to multiple types of applications, i.e., mobile, desktop, cloud, and hybrid implementations like mobile cloud computing and fog computing, where it can be used for static and dynamic offloading [29-31, 51, 52].
To conclude, in the context of penetration testing, a single case study is sufficient to generalize the research and may enable the researcher to determine and identify the evaluation results and findings, which is consistent with what was found and practiced by Ahmad et al. [124], Pandey & Mishra [125], and Ceric & Holland [126]. However, using two case studies is also sufficient for the evaluation, keeps the results identifiable by presenting the details of each case study, and protects the study from single case study criticisms. Using two case studies was also practiced previously in multiple penetration testing studies such as [118,127,128]. Therefore, the proposed design suggests using two case studies to provide in-depth results within a reasonable amount of time as well as to protect the research from single-case study criticism.

3-3-Number of Testers
As implied in the name, single-case designs have traditionally involved the use of a single tester [65,69,86]. In this proposed design, the participant is referred to as the tester. A single case study is a comprehensive analysis of a single collection of data, a single subject or item, or a single depository in a single case, rather than the methods of investigation used [87]. It has been defined as an individual "case" unit of intervention and unit of data analysis that may include a single participant [67]. This section reviews the literature of previous penetration testing model evaluations in order to answer RQ3 (what is the number of testers used in the penetration testing model evaluation?). At the end of this section, a discussion concludes and recommends the number of testers to be used in evaluating penetration testing models, which answers RQ5 (how many testers will be needed to assess the penetration testing model?).
The single tester evaluation is applied in this proposed design as it is an accepted evaluation practice in penetration testing, having been practiced previously in multiple studies such as Hanna [82], Doupé [84], Pan [83], Sánchez et al. [80], and Zhou [81]. It is also a recognized evaluation standard in domains related to penetration testing, such as security testing, software security, and software testing. Moreover, the single tester/participant evaluation has been successfully applied in software engineering, as illustrated in Figure 3.
Among the key reasons why single tester evaluation is recognized in penetration testing domains are the enormous input data size and the correspondingly large volume of results. There is a need to analyze this huge volume of results efficiently and present them clearly in order to draw conclusions and justify findings. Therefore, one tester is sufficient for evaluation purposes, as it allows the researcher to explore the relationship between a phenomenon and its context in greater detail, and thus uncover important context variables that may otherwise be missed [130]. Using a single tester to evaluate models is a common practice that has been used in many studies published in reputable journals. Table 4 lists many studies that have used a single tester to evaluate their model in multiple domains. In this context, Table 5 summarizes the results of reviewing certain Ph.D. and master's theses from reputable universities that use a single participant, which is equivalent to a single tester in penetration testing. A single case study helps researchers manage and enhance the quality of their research. Table 6 summarizes the main characteristics of using a single tester that can support the use of one tester in evaluating the penetration testing model.

Reference | Domain | Title
Livshits [73] | Software security | Improving software security.
Zhou [81] | | Improving security and privacy of integrated web applications.
Livshits & Lam [131] | | Finding security vulnerabilities in Java applications with static analysis.
Mahmood [75] | | An evolutionary approach for system testing of android applications.
Pan [83] | | Cybersecurity testing and intrusion detection for cyber-physical power systems.
Hanna [82] | | A hybrid framework for the systematic detection of software security vulnerabilities in source code.
Alshahwan [76] | Software testing | Utilizing output in web application server-side testing.
Ramollari [74] | | Automated runtime testing of web services.

Reference | Title | Single tester characteristics
[85] | Rey: An intensive single case study of a probation youth with immigrant background | Allow for sensitization. Increase awareness for the results.
[86] | A case study of extensive reading with an unmotivated L2 reader | Provide in-depth analysis.
[66] | Types of case study work: A conceptual framework for case-based research | Enhance resources efficiency.
[69] | Single-case research | Improve controlling the variables.
Based on the above discussion, it can be argued that a single participant is acceptable in the academic domain for researchers and students. Many studies published in scientific journals have used single participants. Furthermore, several Ph.D. and master's theses have been conducted using a single-participant approach. Thus, a single participant can be used in evaluating the penetration testing model, as it is a generally accepted approach in the academic domain, especially if the complexity of the evaluation of the penetration testing model is considered.
The single participant allows for sensitization and increases awareness of the results [85]. Furthermore, the single participant helps to make the primary analysis of a single individual with specific characteristics [86]. Edwards [66] found that resources are more likely to be available to obtain information from multiple sources when using a single participant. Similarly, Barker et al. [69] found that one participant is valuable and helps in controlling the variables in order to get meaningful results that detect positive and negative points.

3-4-Select Testbed Applications
Testbed applications are tools used in evaluating a model. In the context of penetration testing, a testbed application is a vulnerable application that contains known vulnerabilities [132,133]. This application is used as an input to the testing process, and the evaluation compares the number of vulnerabilities exposed by the model with the real number of vulnerabilities [134]. Many applications can be used as testbed applications, such as GoatDroid, OWASP web test application, HerdFinancial, and FourGoats [135,136].
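The comparison between the vulnerabilities exposed by a model and the real vulnerabilities of a testbed application can be illustrated with a minimal sketch. The function name, the returned metric names, and the vulnerability identifiers below are hypothetical and are not taken from the reviewed studies; the sketch only assumes that both the known and the reported vulnerabilities can be listed as identifiers.

```python
def evaluate_detection(known: set[str], reported: set[str]) -> dict[str, float]:
    """Compare vulnerabilities reported by a model with the known ones.

    `known` holds the identifiers of the vulnerabilities documented for the
    testbed application; `reported` holds the identifiers the model exposed.
    All identifiers are illustrative placeholders.
    """
    detected = known & reported        # known vulnerabilities the model found
    missed = known - reported          # known vulnerabilities the model did not find
    false_alarms = reported - known    # reported items that are not known vulnerabilities

    # Share of known vulnerabilities that were exposed by the model.
    detection_rate = len(detected) / len(known) if known else 0.0
    # Share of reported items that correspond to known vulnerabilities.
    precision = len(detected) / len(reported) if reported else 0.0

    return {
        "detected": len(detected),
        "missed": len(missed),
        "false_alarms": len(false_alarms),
        "detection_rate": detection_rate,
        "precision": precision,
    }


# Hypothetical example: four known vulnerabilities, three reported by the model.
example = evaluate_detection({"V1", "V2", "V3", "V4"}, {"V1", "V2", "V5"})
```

Reporting missed vulnerabilities and false alarms alongside the detection rate keeps the comparison transparent when only one or two case studies are available.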
Good testbed applications must be reliable, compatible, and extensible. Reliable means that the application should itself be evaluated before it is used in an evaluation [137,138]. Compatible means that the application must be suitable for use as an application within the domain under study [139]. Extensible means that new features and functionality can be added to the application [140,141]. These requirements should be mapped and applied in each step of selecting the testbed application for the evaluation of any proposed penetration testing model.

3-5-Environment Preparation
The application intended to be tested under penetration testing needs to run in order to execute the prepared test cases [142]. The process of converting a testbed application into a running application requires deploying the testbed application's required services [143]. Therefore, environment preparation must include two phases: backend and installation [144,145]. The first phase prepares the backend part of the environment. This includes the selection of the service providers to use and the deployment of these services. The second phase of environment preparation includes the selection of devices and the installation process.
Each case study needs certain parameters and instances to be prepared before conducting penetration testing in order to emulate the real-life scenario [142]. For example, gateway addresses and service addresses need to be configured, as well as the library's location, DNS address, and firewall setup attributes. These preparations must be specifically implemented and structured in order to assist the researcher in presenting the method and findings more effectively [145,146]. Since many of the environmental variables are shared across multiple case studies, having solid environment preparation aids in implementing the other case studies.
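One way to keep the shared environmental variables in a single place and reuse them across case studies is sketched below. The class name, field names, and all values are assumptions made for illustration (the addresses come from ranges reserved for documentation); they do not describe the environment used in any reviewed study.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EnvironmentConfig:
    """Hypothetical container for the parameters of one case study environment."""
    gateway_address: str
    service_address: str
    dns_address: str
    library_path: str
    firewall_open_ports: tuple[int, ...]


# Settings prepared once for the first case study (placeholder values).
base_environment = EnvironmentConfig(
    gateway_address="192.0.2.1",
    service_address="192.0.2.10",
    dns_address="192.0.2.53",
    library_path="/opt/testbed/lib",
    firewall_open_ports=(80, 443),
)

# The second case study reuses the shared settings and overrides only
# the parameter that differs, here the address of the deployed service.
case_study_two = replace(base_environment, service_address="192.0.2.20")
```

Declaring the configuration as frozen prevents one case study from silently changing values that another case study relies on.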

4-1-Evaluation Using Case Study
As shown in the previous sections, the process of case study design for penetration testing model evaluation follows certain steps: (i) Select the number of case studies. (ii) Select the number of testers. (iii) Select testbed applications for the case studies. (iv) Prepare the application environment. This process represents the answer to RQ1 (what are the steps to design penetration testing model evaluation using a case study?). The following discussion summarizes the numbers of test cases and testers used to evaluate previous penetration testing models and presents the recommended numbers of test cases and testers to be used when evaluating penetration testing models, which answers research questions 2-5.

RQ2: What is the number of test cases used in the related research?
In the area of software engineering, a single case study with one participant is sufficient for generalization [147]. Other fields accept that a single case study with one sample is sufficient for generalization [37,63]. In most single-case research, selection is generally not a concern, even if one participant is exposed [67]. The single case study with a single participant has been used previously as it simplifies the model evaluation framework [148].

RQ3: What is the number of testers used in the related research?
Although some penetration and security testing and evaluation frameworks have used a single tester for one case study [74,76,82], others have used two case studies [73]. A single case study with one tester has been applied to focus on the details of the case study [74,76,82], while two case studies with a single tester have been used to efficiently determine the effect of their proposed manifests [73].

RQ4: How many test cases should be used to evaluate the penetration testing model?
RQ5: How many testers will be needed to assess the penetration testing model?
The proposed design of the evaluation will be based on two case studies: one will be used for static and the other for dynamic offloading, with one penetration tester. Using a single case study with a single participant is sufficient; however, using two case studies with one tester will improve evaluation generalization and make the cases more identifiable by presenting the details of each case study. The main advantages of using one tester are that it (i) is accepted in the areas of penetration testing, security testing, software security, software testing, and software engineering; (ii) is sufficient for generalization; and (iii) allows the researcher to explore the relationship between a phenomenon and its context.

4-2-Main Findings of the Present Study
The findings of this paper lead to the conclusion that case study design for penetration testing model evaluation has to start with determining the number of case studies and the number of testers, followed by selecting and preparing testbed applications. These steps are critical and subject to criticism, which requires researchers to support their decisions with solid literature and logic. This paper has discussed many studies that evaluated penetration and security testing models and frameworks using a single tester for one case study, as well as other studies that used two case studies. By studying and analyzing previous works, this paper has found that designing the evaluation of testing models and frameworks around two case studies with one tester is sufficient. The findings also revealed that two case studies with one tester will provide researchers with additional data, protect them from the criticism of the single case study, and improve the generalization and clarity of the results.

4-3-Comparison with Other Studies
In this paper, we reviewed studies in a variety of domains, including penetration testing, security testing, software security, software testing, and software engineering, and analyzed their case study evaluation methods to derive guidelines that can be used to construct a solid, efficient, and generalizable structure for a penetration testing evaluation case study. The proposed case study design in this paper was adopted from previously proposed case study design models, frameworks, methods, and guidelines [7,12,14,18,38,58,114-116]. Furthermore, this paper has found that a single test case was previously used as a sufficient evaluation by Ahmad et al. [124], Pandey and Mishra [125], and Ceric and Holland [126], while two case studies, which are more identifiable and offer stronger support for generalization, were used in multiple penetration testing studies such as Chung, Mueller, and Kim [118,127,128]. In the same context, a single participant is sufficient, which is also supported by other studies (e.g., [66,69,85,86]).

4-4-Implication and Explanation of Findings
Simplifying the process by utilizing a single case study for model evaluation, specifically in the domain of penetration testing, will encourage researchers and practitioners to become more involved in the research domain by reducing complexity and time. Using a single case study can also be very beneficial in applying the models, testing tools, and guidelines while also improving the quality of the results and safeguarding the model from generalization criticism by using a well-structured and solid approach.
There have been few studies to standardize the usage of the case study approach in the domain of software engineering and even fewer in the domain of software testing. Future directions should be toward developing rules and standards that provide metrics for assessing the evaluation process while meeting the expectations of testers, researchers, and practitioners about the use of case studies in the domain of software testing. These metrics should take into account resource constraints as well as the need for increased production to meet the fast-evolving technologies in this domain in order to improve the quality of information systems globally in terms of security, performance, usability, accessibility, and functionality.

Figure 2. Case study design steps

Figure 3. Software engineering domain and sub-domains single tester/participant and single case study evaluation

Table 1. Method, framework, and guideline to use case study in research study
presents a sample of research publications that use case studies.