Journal of Systems Architecture 160 (2025) 103343
journal homepage: www.elsevier.com/locate/sysarc

Component-based architectural regression test selection for modularized software systems

Mohammed Al-Refai a,∗, Mahmoud M. Hammad b
a Computer Science, Computer and Information Technology, Jordan university of science and technology, P.O. Box 3030, Irbid, 22110, Jordan
b Software Engineering, Computer and Information Technology, Jordan university of science and technology, P.O. Box 3030, Irbid, 22110, Jordan

ARTICLE INFO
Keywords: Regression test selection; Static analysis; Component-based architecture; Java platform module system; Software architecture

ABSTRACT
Regression testing is an essential part of software development, but it can be costly and require significant computational resources. Regression Test Selection (RTS) improves regression testing efficiency by only re-executing the tests that have been affected by code changes. Recently, dynamic and static RTS techniques for Java projects showed that selecting tests at a coarser granularity, class-level, is more effective than selecting tests at a finer granularity, method- or statement-level. However, prior techniques mainly consider Java object-oriented projects but not modularized Java projects. Although the Java Platform Module System (JPMS), introduced in Java 9, provides explicit support for architectural constructs, these research efforts are not customized for component-based Java projects. To that end, we propose two static component-based RTS approaches, CORTS and its variant C2RTS, tailored for component-based Java software systems. CORTS leverages architectural information, such as components and ports, specified in the module descriptor files, to construct a module-level dependency graph and identify relevant tests.
The variant, C2RTS, is a hybrid approach that integrates analysis at both the module and class levels, employing module descriptor files and compile-time information to construct the dependency graph and identify relevant tests. We evaluated CORTS and C2RTS on 1200 revisions of 12 real-world open source software systems, and compared the results with those of class-level dynamic (Ekstazi) and static (STARTS) RTS approaches. The results showed that CORTS and C2RTS outperformed the static class-level RTS in terms of safety violation, which measures to what extent an RTS technique misses test cases that should be selected. Using Ekstazi as the baseline, the average safety violation with respect to Ekstazi was 1.14% for CORTS, 2.21% for C2RTS, and 3.19% for STARTS. On the other hand, the results showed that CORTS and C2RTS selected more test cases than Ekstazi and STARTS. The average reduction in test suite size was 22.78% for CORTS and 43.47% for C2RTS, compared to 68.48% for STARTS and 84.21% for Ekstazi. For all the studied subjects, CORTS and C2RTS reduced the size of the static dependency graphs compared to those generated by static class-level RTS, leading to faster graph construction and analysis for test case selection. Additionally, CORTS and C2RTS achieved reductions in overall end-to-end regression testing time compared to the retest-all strategy.

∗ Corresponding author.
E-mail addresses: mnalrefai@just.edu.jo (M. Al-Refai), m-hammad@just.edu.jo (M.M. Hammad).
https://doi.org/10.1016/j.sysarc.2025.103343
Received 30 May 2024; Received in revised form 12 January 2025; Accepted 12 January 2025
Available online 18 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

1. Introduction

Regression testing is the process of running the existing test cases on a new version of a software system to ensure that the performed modifications do not introduce new faults to previously tested code [1–3]. Regression testing is one of the most expensive activities performed during the lifecycle of a software system, with some studies [4–10] estimating that it can take up to 80% of the testing budget and up to 50% of the software maintenance cost. For instance, Google reported that their regression-testing system, TAP [11], experienced a linear growth in both the number of software changes and the average test-suite execution time, which ultimately resulted in a quadratic rise in the overall test-suite execution time. This rapid increase poses a challenge to manage, even for a company with extensive computing resources [12]. Regression test selection (RTS) approaches are used to improve regression testing efficiency [3,12]. RTS is defined as the activity of selecting a subset of test cases from an existing test set to verify that the affected functionality of a program is still correct [3,12,13].

The RTS problem has been studied for over three decades [14,15]. Traditional code-based RTS approaches take four inputs: the two versions (new and old) of a software system, the original test suite, and dependency information of the test cases on the old version. The output is the subset of test cases, from an existing test set, that must be re-executed on the modified version of the software system [12].

RTS techniques vary in the granularity at which they compute test dependencies from test cases to code statements, basic blocks, methods, or classes. Recently, researchers showed that, for individual projects, class-level RTS can be more efficient and beneficial than identifying changes and computing dependencies at lower granularities, e.g., statement and method levels [12,16,17]. Therefore, the current trend [12,16–20] is to focus on class-level RTS by (1) identifying changes at the class level and (2) computing dependencies from test cases to the classes under test. In addition to supporting class-level RTS, these approaches consider a test class as a test case, and thus, select test classes instead of test methods [12,18,19].1

Footnote 1: From this point until the end of the paper we use the term test case to refer to a test class.

Class-level RTS can be static or dynamic, by analyzing dependencies from test cases to classes under test statically or dynamically. A recent extensive experimental evaluation of static class-level RTS [17,18] showed that it is comparable with the state-of-the-art dynamic class-level RTS approach, called Ekstazi [12]. While such a dynamic RTS approach requires code instrumentation and runtime information to find affected tests, static class-level RTS does not require such information. Instead, it builds a dependency graph of program types based on compile-time information, and selects test cases that can reach changed types in the transitive closure of the dependency graph [17,18]. However, static class-level RTS approaches can be unsafe, which means they might miss selecting test cases that are impacted by code changes. The use of Java reflection is the main cause of unsafety in static RTS approaches when compared with dynamic RTS approaches. Reflection in Java allows for runtime behaviors that can be challenging to predict statically, which means static RTS might miss identifying some dependencies during test selection [17,18].

The previous dynamic and static class-level RTS techniques have primarily focused on Java object-oriented projects, without addressing the unique needs of modularized Java applications. Since the introduction of the Java Platform Module System (JPMS) [21] in Java 9, existing RTS research approaches have not been adapted to accommodate the architectural constructs of component-based Java projects. To bridge this gap, we propose two static component-based RTS approaches, CORTS and its variant C2RTS, specifically designed for component-based Java software systems that are developed using the JPMS architectural constructs.

JPMS provides explicit implementation-level support for well-known architectural constructs, such as components (called modules) and ports (called module directives). These constructs provide a higher level of abstraction than Java packages and classes. CORTS leverages the architectural construct information, such as components and ports, presented in the module descriptor files, named "module-info.java" [21], to construct a module-level dependency graph. The variant, C2RTS, is a hybrid technique that integrates module- and class-level analysis, and therefore, uses both the module descriptor files and part of the compile-time information to construct the dependency graph. The two approaches, CORTS and C2RTS, find relevant test cases that can reach some changed module/class in the transitive closure of the dependency graph. Similar to recent RTS approaches [12,16–20], CORTS and C2RTS consider each test class as a test case.

CORTS and C2RTS can improve safety over traditional static class-level RTS techniques by capturing runtime module-level dependencies that are related to reflection and dynamic class loading mechanisms. This is possible because such dependencies are explicitly defined inside the module descriptor files using the open and opens to directives [21].

We evaluated CORTS and C2RTS in terms of (1) safety and precision violations, (2) reduction in test suite size, (3) reduction in dependency graph size with respect to static class-level RTS techniques, (4) execution time required to construct and analyze the static dependency graph to select relevant test cases, and (5) reduction in the end-to-end regression testing time compared to the retest-all strategy. We compared the results obtained by CORTS and C2RTS with those of the state-of-the-art class-level dynamic (Ekstazi [12]) and static (STARTS [18]) RTS approaches, using 1200 revisions of 12 real-world Maven-based Java software systems.

This paper is organized as follows. Section 2 provides an illustrative example to explain the work of our approach. Section 3 describes the proposed approaches, CORTS and C2RTS. Section 4 presents the evaluation. Section 5 describes the threats to the validity of our approach and results. Related work is summarized in Section 6. Conclusions and plans for future work are outlined in Section 7.

2. Illustrative example

This section presents an illustrative example of a Java 9 Component-Based (CB) application of a university system, which is adapted from the example used in Hammad et al. [22]. We use this example in the following section (i.e., Section 3) to demonstrate how our approaches, CORTS and C2RTS, are used with a CB application.

The university system example is developed according to the Java Platform Module System (JPMS) [21], which is a key feature of project Jigsaw [23], designed to provide a scalable module system for Java. It enables developers to build applications using modular constructs, i.e., components (modules) and ports (module directives), offering a higher level of abstraction than packages or classes. The modularized Java 9 JRE allows applications to depend on specific modules of the JRE rather than the entire runtime environment. Each module in JPMS includes a descriptor file called "module-info.java", which specifies its dependencies and exported services. The JPMS supports various ports that enable a module to export its services or require services from other modules, facilitating clear and maintainable module interactions.

Fig. 1 shows the component-based architecture of the university system. Hammad et al. [22] created this university system by converting its equivalent Java 8 Object Oriented (OO) version to the CB version using the OO2CB tool proposed in [22], which converts Java 8 OO apps to equivalent Java 9 CB apps following the least-privilege security principle. A least-privilege architecture is an architecture in which each component is only granted the exact privileges, in terms of inter-component communications as well as the required JRE modules, that it needs to provide its functionality [22,24]. This principle is also important for performing safe and precise regression test selection based on exactly the needed inter-component communications/dependencies.

Before presenting the example details, it is also important to mention that there are generally two common methods for organizing test cases in CB applications: (1) placing them in separate test-components or (2) placing them alongside core application classes within app-components. The first method, adopted in this paper for illustration, creates distinct components for test classes, aligning with the separation of concerns principle by isolating production and test code. This approach ensures clear boundaries and flexible management of dependencies specific to testing. Both CORTS and C2RTS are compatible with either method. In this paper, we use test-components to refer to the modules containing test classes, and use app-components to refer to the modules containing the core application classes, i.e., production code.

As depicted in Fig. 1, the university system consists of four app-components, i.e., modules,2 which are location, registration, stuService, and serviceProvider. In addition, the system contains three test-components, which are locationTest, registrationTest, and serviceProviderTest. The java.logging component is also used by the system. The Java classes in the system interact as follows: The StuSchedule class generates a suggested schedule for a student and logs relevant details to a log file using the java.util.logging.Logger class. Additionally, StuSchedule dynamically loads the ClassRoomManager class and invokes its methods via Java reflection to retrieve classroom information, which it logs in the students' schedules. A student can either be an Undergraduate or a Graduate, with both classes implementing the IStudent interface.

Footnote 2: In the paper, we use the terms module and component interchangeably.

Fig. 1. Component-Based (CB) application adapted from [22].

The corresponding "module-info.java" files for the four app-components are presented in Fig. 2(a), while those for the three test-components are shown in Fig. 2(b). In the remainder of this section, we discuss some of the key directives used in these "module-info.java" files: provides with, exports to, opens to, and uses.

As shown in Fig. 1, the stuService component contains the IStudent interface inside the people package. In order for the Undergraduate and Graduate classes from the serviceProvider component to implement the interface, the stuService component needs to export the people package using the exports to port that is shown in Line 12 of Fig. 2(a). In addition to the exports to port, the serviceProvider component needs to define two more ports. One port requires the stuService component, as shown in Line 18 of Fig. 2(a), and the other, a provides with port, provides the functionalities of the IStudent interface using the Graduate and Undergraduate implementations, as shown in Lines 19–21 of Fig. 2(a).

The class StuSchedule located in the registration component contains code to dynamically load the class ClassRoomManager and invoke its methods using Java reflection. Therefore, the location component that contains the ClassRoomManager class defines an opens to port to open the package ClassRoom to the registration component as shown in Line 26 of Fig. 2(a). This port enables the classes of the registration component to load and access all classes of the ClassRoom package using the Java reflection mechanisms.

The test classes TestGraduate and TestUndergraduate, belonging to the serviceProviderTest test-component, use the IStudent interface from the stuService component. As a result, serviceProviderTest declares a requires directive in its "module-info.java" file to establish this communication, as shown on Line 8 of Fig. 2(b). Simultaneously, the stuService component defines an exports to directive to expose the people package to serviceProviderTest, as illustrated on Line 13 of Fig. 2(a). Additionally, because IStudent is an interface, serviceProviderTest also declares a uses directive, as indicated on Line 9 of Fig. 2(b).

Fig. 2. module-info.java files.

3. Approach

This section describes our proposed component-based RTS approaches, CORTS and C2RTS, which are static analysis tools based on analyzing dependencies from test cases to the components of the software application under test. CORTS and C2RTS assume that the software application is a component-based Java application. It is also worth mentioning that if the app is constructed according to the least-privilege architecture, in which each component is only granted the precise dependencies to components and resources that are needed to provide its functionality, our RTS approaches yield more precise test case selection.

Consistent with the current trend in code-based RTS research [12,18,19], CORTS and C2RTS consider a test class to be a test case. They support both unit and system test cases. The inputs to CORTS and C2RTS are the previous version of the Java application along with its test cases, i.e., the application before modification, and the current (modified) version of the Java application. The output is the set of selected test cases that must be re-executed on the current version of the application.

We present CORTS in Section 3.1 and its variant C2RTS in Section 3.2.

3.1. The CORTS approach

CORTS takes the previous version of a CB Java app along with its test cases, then it parses the module descriptor files (the module-info.java files) of all app-components and test-components. While parsing the descriptors, CORTS constructs a directed graph, called the Entity Dependency Graph (EDG), where each node represents a component or a test case (test class), and the directed edges among the nodes represent the various types of dependencies among the components, such as requires, uses and provides with dependencies. After that, CORTS compares the previous version of the CB app with the current version of the app to identify the modified components and flag their corresponding nodes in the EDG. Then, CORTS finds and returns the set of affected test cases that directly or transitively reach a modified component in the EDG. The detailed process of CORTS consists of three steps:

1. Building the EDG from the component-based application (Section 3.1.1).
2. Identifying the modified components in the EDG (Section 3.1.2).
3. Selecting the affected test cases (Section 3.1.3).

We demonstrate these steps in light of the illustrative example shown in Fig. 1.

3.1.1. Building the EDG from the component-based application

In this step, CORTS parses the module-info.java descriptors of the app- and test-components of the previous version of the Java application. While parsing the descriptor files, CORTS builds the EDG, where each node in this directed graph represents a component or a test case, and the directed edges among the nodes represent the various types of dependencies among the components. As an example, Fig. 3 shows the extracted EDG for the CB example shown in Fig. 1.

CORTS distinguishes between descriptor files of app-components and those of test-components. If the module-info.java descriptor is for an app-component A, then a node is added in the EDG for A, and all communication ports that are directed towards or emanating from A, e.g., requires or uses ports, are represented in the EDG as directed edges leading to or originating from the node A. However, if the descriptor is for a test-component T, then CORTS adds a node in the EDG for each individual test class that belongs to T. Subsequently, all communication ports that are directed towards or emanating from T are represented in the EDG as directed edges leading to or originating from every node representing a test class belonging to T. CORTS identifies all test classes associated with a given test-component by navigating the file system directory designated for the test-component and locating the class files contained inside it. In component-based Java applications organized using JPMS features, each component is assigned a distinct OS directory. This directory houses all Java packages, classes, and the module-info.java file pertinent to the component, facilitating the identification process.

When CORTS scans the module-info.java file of each component in the CB app, it adds directed edges in the EDG according to the following rules. We demonstrate each of these rules using the extracted EDG shown in Fig. 3 for the illustrative CB example depicted in Fig. 1.

Rule 1 (Requires Port). Let 𝑀₁ be a component that requires another component 𝑀₂, where this communication is represented using the statement "requires 𝑀₂" in the module-info.java file of 𝑀₁. This requires port means that one or more classes belonging to 𝑀₁ depend on or communicate with one or more classes belonging to 𝑀₂. According to this dependency, CORTS adds a directed edge from node 𝑀₁ to node 𝑀₂ in the EDG.

For example, the registration component requires the stuService component as specified in the corresponding module descriptor file shown in Fig. 2, and therefore, a directed edge is added in the EDG from node registration to node stuService as depicted in Fig. 3. Moreover, as shown in Fig. 2, the serviceProviderTest component requires the stuService component. Therefore, a directed edge is added in the EDG from every test class node belonging to serviceProviderTest, i.e., the test classes TestGraduate and TestUndergraduate, to the node stuService, as shown in Fig. 3.

Rule 2 (Provides With and Uses Ports). Let 𝐶₁ be a class in module 𝑀₁ and 𝐴₂ be an abstract class or an interface in module 𝑀₂, where 𝐶₁ implements or extends 𝐴₂. This dependency is represented using the statement "provides 𝐴₂ with 𝐶₁" in the module-info.java file of 𝑀₁. Additionally, let 𝐶₃ be a class that belongs to module 𝑀₃, where 𝐶₃ uses 𝐴₂, which is represented using the statement "uses 𝐴₂" in the module-info.java file of 𝑀₃. Then, the component 𝑀₃ can utilize the java.util.ServiceLoader from the java.base JPMS JDK module to load implementations (i.e., 𝐶₁ belonging to 𝑀₁) of the service 𝐴₂. According to this dependency from component 𝑀₃ to component 𝑀₁ that contains the concrete class 𝐶₁, CORTS adds a directed edge from node 𝑀₃ to node 𝑀₁ in the EDG.

Fig. 3. Entity Dependency Graph (EDG) extracted by CORTS.

For example, as depicted in the module configuration files shown in Fig. 2, the serviceProvider component provides the interface IStudent of the component stuService with the concrete classes Graduate and Undergraduate. Additionally, the registration component uses the IStudent interface. Those communication ports enable the component registration to access the component serviceProvider and load the two concrete classes, Graduate and Undergraduate, via the class java.util.ServiceLoader. Therefore, a directed edge is added in the EDG from node registration to node serviceProvider as shown in Fig. 3. Likewise, the test component serviceProviderTest uses the IStudent interface as depicted in Fig. 2, which grants this test component access to the concrete classes Graduate and Undergraduate of the component serviceProvider. Therefore, a directed edge is added in the EDG from each test class node (i.e., nodes representing TestGraduate and TestUndergraduate that belong to serviceProviderTest) to node serviceProvider, as shown in Fig. 3.

Rule 3 (Opens To Port). Let 𝑝₁ be a package that belongs to a module 𝑀₁, and let this module open 𝑝₁ to another module 𝑀₂, such that this dependency is represented using the statement "opens 𝑝₁ to 𝑀₂" in the module-info.java file of 𝑀₁. Then, 𝑀₂ can communicate with 𝑀₁ and load and access classes of the package 𝑝₁ via the Java reflection and dynamic class loading mechanisms. According to this dependency from 𝑀₂ to 𝑀₁, CORTS adds a directed edge from node 𝑀₂ to node 𝑀₁ in the EDG.

For example, in the module configuration files shown in Fig. 2, the location component opens its package classRoom to the registration component. Therefore, a directed edge is added in the EDG from node registration to node location, as shown in Fig. 3.

3.1.2. Identifying the modified components in the EDG

This step involves identifying the modified components to mark their associated nodes in the EDG as modified. CORTS considers a component modified if any of its classes have undergone changes. There are several methods to determine which classes have been modified. For instance, the Linux diff command can be used to compare the directories of a component across the previous and current versions of the Java application. Should this command highlight a component's directory due to alterations or removal of any class within it, or the addition of new classes into it, CORTS will then mark the node representing that component in the EDG as modified. Another method involves comparing the smart checksums of the previous and current versions of each compiled Java file (i.e., .class files) to identify changed classes [12]. In environments employing Continuous Integration (CI) for Java application development, like GitHub, the modifications can also be traced through version-control-specific commands, such as git diff, to find the changed classes and components. Currently, CORTS primarily utilizes the Linux diff strategy to pinpoint and mark the modified components within the EDG. However, it is straightforward to make CORTS support other strategies.

For example, if the ClassRoomManager class is modified, e.g., some of its source code is changed to add/delete/modify methods, then the component containing this class, which is location, is marked as modified in the EDG shown in Fig. 3.

3.1.3. Selecting the affected test cases

In this step, mirroring the methodology of firewall static RTS approaches [17,25], CORTS traverses the EDG to identify the nodes of all test cases that reach nodes representing modified components. In particular, CORTS calculates the transitive closure for each test case to find all the components that a test case depends on. Subsequently, the set of impacted test cases whose transitive dependencies include some modified component is returned as the output by CORTS. We used the JGraphT library [26] to construct the EDG and to calculate the transitive closures for the test cases within the EDG.

To complete the demonstration example, if the class ClassRoomManager is modified and its component location is marked in the EDG shown in Fig. 3, then all test cases that transitively reach the location node, which are TestClassRoomMngr and TestStuSchedule, will be selected and returned as the output of CORTS.

Fig. 4. Entity Dependency Graph (EDG) extracted by C2RTS.

3.2. The C2RTS approach

We have developed a hybrid RTS approach, called C2RTS, that combines aspects from both the Component-level and Class-level RTS techniques. This variant of CORTS integrates module- and class-level dependency analyses, striking a balance between safety and precision by adjusting the level of granularity from modules to classes depending on the specific classes where code changes have been made. C2RTS trades off some safety for increased precision compared to CORTS.

While constructing the EDG, C2RTS distinguishes between modified and unmodified app-components within the Java application. Specifically, each unmodified app-component is represented as a single node in the EDG, whereas all classes belonging to a modified app-component are represented as individual nodes.

As an example of an EDG constructed by C2RTS, Fig. 4 shows the constructed EDG given that the app-component serviceProvider is identified as a modified app-component by C2RTS, and thus, all classes belonging to this app-component are represented as individual nodes in the EDG. The remaining app-components are unmodified, and thus, each of them is represented as a single node in the EDG. The subsequent subsections elaborate on the entire process undertaken by C2RTS to construct the EDG and select test cases.

3.2.1. Building the EDG from the component-based application

We explain the steps applied by C2RTS to build the EDG nodes and edges from the component-based application.

3.2.1.1. Mappings from components to nodes in the EDG

This section explains how C2RTS maps the unmodified app-components, modified app-components, and test-components to nodes in the EDG.

Representing modified app-components as nodes in the EDG. Given the previous and current versions of the CB app, if an app-component is modified between the two versions, then instead of representing this app-component as a single node in the EDG, C2RTS represents each class belonging to the app-component as a single node in the EDG.

In our illustrative example, we suppose that the app-component serviceProvider depicted in Fig. 1 is identified as modified by C2RTS. Consequently, all the classes of this component (i.e., the Undergraduate and Graduate classes) are represented as nodes in the EDG, as shown in Fig. 4.

Representing unmodified app-components as nodes in the EDG. The unmodified app-components of the application are handled in the same way as in CORTS: they are represented as nodes in the EDG using the method described previously in Section 3.1.1. For example, the app-component location is identified by C2RTS as unmodified, and therefore, is represented as a single node in the EDG.

Representing test-components as nodes in the EDG. Similar to CORTS, the C2RTS approach creates a separate node for each individual test class in the EDG. For example, the EDG shown in Fig. 4 contains a node for each test class, such as the test classes TestStuSchedule and TestGraduate.

Next, we describe how C2RTS handles (1) extracting dependencies among classes of a modified app-component in Section 3.2.1.2, (2) extracting dependencies among unmodified app-components in Section 3.2.1.3, (3) extracting dependencies between unmodified and modified app-components in Section 3.2.1.4, and (4) extracting dependencies between test-components and app-components in Section 3.2.1.5.

3.2.1.2. Extracting dependencies among modified app-component classes

The dependencies among the classes of a modified app-component are extracted using the Oracle Java Class Dependency Analyzer (jdeps) tool [27].3 These dependencies are represented as directed edges in the EDG between the nodes representing the classes of the modified app-component.

Footnote 3: jdeps is now shipped with the standard JDK, and is used to analyze the module-level, package-level, and class-level dependencies of Java class files.

3.2.1.3. Extracting dependencies among unmodified app-components

The dependencies among the unmodified app-components are extracted and represented in the EDG according to Rules 1, 2, and 3 described previously in Section 3.1.1. For example, in the EDG represented in Fig. 4, C2RTS added an edge from node registration to node location according to Rule 3.

3.2.1.4. Extracting dependencies between unmodified and modified app-components

The dependencies between the unmodified components and the classes of the modified components are extracted using: (1) information extracted from the component configuration files, the module-info.java files, and (2) information extracted using the jdeps tool. These extracted dependencies are used to construct the EDG according to three formally defined rules, Rules 4, 5, and 6. The three rules are described using the following assumptions:

• Let 𝐴𝑝𝑝 be a previous version of a component-based Java application that was modified to the current version 𝐴𝑝𝑝′.
• Let a module 𝑀₁ represent an app-component that was identified as modified between 𝐴𝑝𝑝 and 𝐴𝑝𝑝′, i.e., some classes that belong to 𝑀₁ were modified.
• Let a module 𝑀₂ represent an app-component that belongs to 𝐴𝑝𝑝 and is not modified in 𝐴𝑝𝑝′, i.e., an unmodified app-component.

Given these assumptions, C2RTS represents 𝑀₂ as a single node in the EDG, and instead of representing 𝑀₁ as a single node, C2RTS represents all the classes/interfaces belonging to 𝑀₁ as nodes in the EDG. Subsequently, C2RTS extracts the dependencies between the classes of 𝑀₁ and the module 𝑀₂ and reflects them as edges in the EDG based on the following rules:

Rule 4 (Provides With and Uses Ports). Let 𝐶₁ be a class in module 𝑀₁ and 𝐴₂ be an abstract class or an interface in module 𝑀₂, where 𝐶₁ implements or extends 𝐴₂. This port is represented using the statement "provides 𝐴₂ with 𝐶₁" in the "module-info.java" file for 𝑀₁. Additionally, let 𝐶₃ be a class that belongs to an unmodified module 𝑀₃ (𝑀₃ is an app-component), where 𝐶₃ uses 𝐴₂, and this port is represented using the statement "uses 𝐴₂" in the "module-info.java" file for 𝑀₃. Then, the component 𝑀₃ can utilize the java.util.ServiceLoader from the java.base JPMS JDK module to load the implementation of 𝐶₁ that belongs to 𝑀₁. According to this dependency from component 𝑀₃ to class 𝐶₁, C2RTS adds a directed edge from node 𝑀₃ to node 𝐶₁ in the EDG.

We explain how this rule is applied to the EDG nodes represented in Fig. 4 given the module configuration files shown in Fig. 2. In the configuration files, the serviceProvider component provides the interface IStudent of the stuService component with the concrete classes Undergraduate and Graduate. Additionally, the registration component uses the IStudent interface. These

added in the EDG from the nodes representing Undergraduate and Graduate to the node representing stuService, as shown in Fig. 4.

3.2.1.5. Extracting dependencies between test-components and app-components

The C2RTS approach represents each individual test class belonging to a test-component as a single node in the EDG. The dependencies from test classes to unmodified app-components are extracted and represented as edges in the EDG according to Rules 1, 2, and 3 explained previously in Section 3.1.1. On the other hand, dependencies from test classes to classes belonging to modified app-components are extracted according to the following two rules, Rules 7 and 8, which are modified versions of Rules 4 and 5, respectively.

Rule 7 (Provides With and Uses Ports). Let 𝐶₁ be a class in module 𝑀₁ and 𝐴₂ be an abstract class or an interface in module 𝑀₂, where 𝐶₁ implements or extends 𝐴₂. This port is represented using the statement "provides 𝐴₂ with 𝐶₁" in the "module-info.java" file for 𝑀₁. Additionally, let 𝑇𝑀₃ be a test-module, where some test classes belonging to 𝑇𝑀₃ use 𝐴₂, and this dependency is represented using the statement "uses 𝐴₂" in the "module-info.java" file of 𝑇𝑀₃. This uses port enables test classes belonging to the test-component 𝑇𝑀₃ to utilize the java.util.ServiceLoader from the java.base JPMS JDK module to load the implementation of 𝐶₁ that belongs to 𝑀₁. Subsequently, C2RTS applies the jdeps tool to 𝑇𝑀₃ and 𝑀₂ to find the test classes of 𝑇𝑀₃ that depend on 𝐴₂. Suppose jdeps reports that a test class 𝑇𝐶₃ belonging to 𝑇𝑀₃ depends on 𝐴₂. Then, C2RTS adds a directed edge from node 𝑇𝐶₃ to node 𝐶₁ in the EDG because 𝑇𝐶₃ can load 𝐶₁ via the class java.util.ServiceLoader.

For example, in the module configuration files shown in Fig. 2, the serviceProvider component provides the interface IStudent of the stuService component with the concrete classes Undergraduate and Graduate. Additionally, the test-component serviceProviderTest uses the IStudent interface as depicted in Fig. 2.
Therefore, C2RTS finds, using jdeps, which test classes ports enable the component registration to load implementa- of serviceProviderTest depend on IStudent, and the jdeps tions of the two concrete classes Graduate and Undergraduate. returns that the test classes TestGraduate and TestUndergrad- Therefore, two directed edges are added in the EDG from the node uate depend on IStudent. Subsequently, directed edges are added registration to the nodes Graduate and Undergraduate as in the EDG from each of these test classes to the concrete classes shown in Fig. 4. Graduate and Undergraduate as depicted in Fig. 4. Rule 5 (Requires Port from Unmodified to Modified App-Component). Let 𝑀2 requires 𝑀1 , where this port is represented using the statement Rule 8 (Requires Port from Test-Component to Modified App-Component). "requires 𝑀1 " in the "module-info.java" file for 𝑀2 . Then, ac- Let 𝑇 𝑀1 be a test-component that requires the modified app-component 𝑀1 , cording to this dependency from an unmodified module (𝑀2 ) to a modified such that this dependency is represented using the statement "requires module (𝑀1 ), C2RTS uses the jdeps technique with 𝑀2 and 𝑀1 to find 𝑀1 " in the "module-info.java" file for 𝑇 𝑀1 . Then, based on this the set of dependencies from classes belonging to 𝑀2 into classes belonging dependency, C2RTS uses the jdeps technique with 𝑇 𝑀1 and 𝑀1 to find to 𝑀1 . Let jdeps result included some dependencies from some class(es) the set of dependencies from test classes belonging to 𝑇 𝑀1 into classes of 𝑀2 to a class called 𝐶1 that belongs to 𝑀1 . Then, C2RTS adds a directed belonging to 𝑀1 . From these dependencies, C2RTS extracts the names of edge from node 𝑀2 to node 𝐶1 in the EDG. the source and target classes and connect their corresponding nodes in the EDG with the proper directed edges. Rule 6 (Requires Port from Modified to Unmodified App-Component). Let 𝑀1 requires 𝑀2 , which is represented using the statement "requires 𝑀2 " 3.2.2. 
Mark modified nodes and select affected test cases in the "module-info.java" file of 𝑀1 . According to this require dependency from a modified module (𝑀1 ) to an unmodified module (𝑀2 ), To mark the modified classes and compute the set of selected C2RTS applies the jdeps technique with 𝑀1 and 𝑀2 to find the classes of test cases, C2RTS applies the same steps explained previously in Sec- 𝑀1 that depend on classes of 𝑀2 . Let jdeps result included that a class 𝐶1 tions 3.1.2 and 3.1.3 with one difference. That is instead of marking belonging to 𝑀1 depends on some class(es) belonging to 𝑀2 . Then, C2RTS nodes representing modified components in the EDG, C2RTS marks adds a directed edge from node 𝐶1 to node 𝑀2 in the EDG. the nodes that represent modified classes. Then, C2RTS computes the For example, in the modules’ configurations shown in Fig. 2, the transitive closure of each test case to find all components and classes app-component serviceProvider, which was identified by C2RTS that each test depends on. Thereafter, the set of impacted test cases as modified, requires the unmodified app-component stuSer- whose transitive dependencies in the EDG include some changed type, vice. Hence, the jdeps technique is applied with these two com- is returned by C2RTS as the output. For example, if the class Under- ponents and returns that the classes Undergraduate and Grad- graduate is modified and marked in the EDG shown in Fig. 4, then uate belonging to serviceProvider depend on the class ISu- the test cases TestUndergraduate, TestGraduate, and TestS- dent that belongs to stuService. Consequently, directed edges are tuSchedule will be selected and returned as the output of C2RTS. 7 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 4. Experimental evaluation • RQ2: What is the precision violation w.r.t. Ekstazi of the proposed approaches CORTS and C2RTS? 
Furthermore, how does this pre- The goal of the evaluation is to compare CORTS and C2RTS with the cision violation compare to the precision violation w.r.t. Ekstazi state-of-the-art class-level RTS tools in terms of (1) safety violation, (2) achieved by the static class-level RTS approach STARTS? precision violation, (3) test suite reduction, (4) size of the dependency • RQ3: What is the reduction in test suite size achieved by CORTS graph that represents static dependencies from test cases to code en- and C2RTS? tities, and (5) reduction in test selection and execution times. An RTS • RQ4: How does the size of the static dependency graphs ex- technique is safe if it does not miss any modification-traversing test tracted by CORTS and C2RTS compare to the size of the static cases that should be selected for regression testing. An RTS technique is dependency graph extracted by STARTS? precise if it does not select non-modification traversing test cases. A test • RQ5: What is the time taken by CORTS and C2RTS to construct case is considered as a modification-traversing test case if it exercises and analyze the static dependency graph to select relevant tests, during its execution a modified, new, or previously removed code and what is their overall end-to-end testing time? statements. Only modification-traversing test cases can reveal faults in the modified version of a software system, and hence, must be selected for regression testing. 4.2. Subjects We compared CORTS and C2RTS with two RTS tools, Ekstazi [12] and STARTS [18]. They are both state-of-the-art for class-level RTS and We evaluated CORTS and C2RTS using the 12 subjects listed in have been widely evaluated on a large number of revisions of real world Table 1. These are open-source real-world Java projects, which are projects [17]. 
The class-level RTS process identifies changes at the class known to be compatible with Ekstazi and STARTS since they were level, instead of method or statement levels, and selects every test-class widely used in their evaluation [12,17,18]. Table 1 shows for each that traverses or depends on any changed class. Ekstazi uses dynamic subject, the latest revision (i.e., most recent revision of the project) on analysis and STARTS uses static analysis of compiled Java code. We which our experiments started (SHA), the number of the source classes compared CORTS and C2RTS with these class-level RTS approaches (CLASSES) of the latest reversion, i.e., classes of the core program because we aimed to investigate (1) how the safety can be improved without counting test classes, the number of the source test classes by raising the RTS granularity from class-level to component-level, (2) (TESTS) of the latest reversion, number of recovered components of how increasing the RTS granularity from class-level to component-level the latest revision (COMPS), number of ports between the recovered impacts the precision and test suite reduction, and (3) how increasing components (PORTS), and the number of used revisions (REVS). the RTS granularity reduces the size of the static component-level dependency graph compared to the static class-level dependency graph. Converting the projects to equivalent component-based projects. In order to evaluate the safety and precision of CORTS and C2RTS, It was not possible to evaluate CORTS and C2RTS using existing open- we computed their safety violations and precision violations w.r.t. Ek- source component-based Java applications, i.e., multi-module Java stazi [12]. Ekstazi is a code-based RTS approach known to be safe in applications developed using the JPMS capabilities. There are two terms of selecting all the modification-traversing test classes, widely main reasons for that. 
First, the great majority of existing open-source evaluated on a large number of revisions, and being adopted by several Object Oriented (OO) Java applications have not been converted to popular open source projects; as such it can be considered the state- component-based equivalent applications using JPMS. For example, of-the-art for class-level dynamic RTS tools. Assuming that a program as mentioned in Hammad et al. [22] after analyzing more than 1300 P, which has an original test suite T, was modified to a new version open-source Java projects, they found that only 33 are utilizing JPMS P’. Furthermore, assuming that two RTS approaches, RTS1 and Ekstazi capabilities. This finding comports with the results reported in prior were applied to select test cases from T based on the code modifications work [29] as well. Second, even for the 33 existing component-based to move the program from P to P’, such that RTS1 selected the set of projects that utilize JPMS capabilities, the modules of each project are test cases TRTS1 and Ekstazi selected the set of test cases TEkstazi . Then, open to all the system, leading to a situation in which components the safety violation of RTS1 w.r.t. Ekstazi, precision violation of RTS1 (i.e., modules) are granted more access than they need to function, w.r.t. Ekstazi, and reduction in test suite size obtained by RTS1 are and this violates the least-privilege architecture principle. Additionally, defined as follows: these projects are relatively small in size, significantly smaller than |𝐓𝐄𝐤𝐬𝐭 𝐚𝐳𝐢 ∖ 𝐓𝐑𝐓𝐒𝟏 | those listed in Table 1, and were created for educational purposes, 𝑆 𝑎𝑓 𝑒𝑡𝑦𝑉 𝑖𝑜𝑙𝑎𝑡𝑖𝑜𝑛 𝑤.𝑟.𝑡. 𝐸 𝑘𝑠𝑡𝑎𝑧𝑖 = |𝐓𝐄𝐤𝐬𝐭 𝐚𝐳𝐢 ∪ 𝐓𝐑𝐓𝐒𝟏 | meaning they are not real-world component-based Java applications. Therefore, we could not use these component-based projects to evaluate |𝐓𝐑𝐓𝐒𝟏 ∖ 𝐓𝐄𝐤𝐬𝐭 𝐚𝐳𝐢 | 𝑃 𝑟𝑒𝑐 𝑖𝑠𝑖𝑜𝑛𝑉 𝑖𝑜𝑙𝑎𝑡𝑖𝑜𝑛 𝑤.𝑟.𝑡. 𝐸 𝑘𝑠𝑡𝑎𝑧𝑖 = CORTS and C2RTS. 
|𝐓𝐄𝐤𝐬𝐭 𝐚𝐳𝐢 ∪ 𝐓𝐑𝐓𝐒𝟏 | In order to overcome this challenge, we converted the OO Java |𝐓| ∖ |𝐓𝐑𝐓𝐒𝟏 | projects listed in Table 1 to equivalent component-based Java projects. 𝑇 𝑒𝑠𝑡 𝑠𝑢𝑖𝑡𝑒 𝑟𝑒𝑑 𝑢𝑐 𝑡𝑖𝑜𝑛 𝑜𝑏𝑡𝑎𝑖𝑛𝑒𝑑 𝑏𝑦 𝑅𝑇 𝑆 1 = To do that, we leveraged the OO2CB [22] which utilizes the JPMS |𝐓| capabilities and converts an OO Java application to an equivalent The safety violation, precision violation, and test suite reduction component-based Java application according to the least-privilege se- are multiplied by 100 to make them percentages. Lower percentages curity principle. The OO2CB uses a component recovery framework for safety violation, precision violation, and higher percentages for test implemented by Garcia et al. [30], called ARCADE, to automatically de- suite reduction are better [17,28]. The size of the static dependency termine the components of an OO application. The ARCADE framework graph is computed in terms of number of nodes and edges of the graph. utilizes several well-known component recovery tools such as Architec- ture Recovery using Concerns (ARC) [31], Bunch [32], and Algorithm 4.1. Research questions for Comprehension-Driven Clustering (ACDC) [33]. OO2CB [22] takes as inputs the suggested components provided by the ACDC tool and In this research, we try to answer the following Research Questions the binary code of the OO Java application, and outputs the equivalent (RQ): component-based Java application that utilizes the JPMS features along • RQ1: What is the safety violation w.r.t. Ekstazi of the pro- with all the modules’ descriptors, i.e., the "module-info.java" posed static component-level RTS approaches CORTS and C2RTS? files, generated according to the least-privilege security principle. Furthermore, do CORTS and C2RTS reduce the safety viola- Selecting revisions. We downloaded the revisions of every subject tion w.r.t. 
Ekstazi (i.e., improve safety) compared to the static among the 12 subjects listed in Table 1 using the methodology in Le- class-level RTS approach STARTS? gunsen et al. [17]. First, we found the latest revision (specified by SHA 8 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 Table 1 The Java projects used in our study. Subject SHA CLASSES TESTS COMPS PORTS REVS commons-math 96f2b16 864 485 59 1116 100 commons-configuration 5de7c48 261 171 18 190 100 commons-compress a189697 201 105 22 190 100 commons-collections f9f99cc 351 230 23 251 100 commons-dbcp 23f6717 60 54 4 12 100 commons-io 8d1b994 128 106 9 53 100 commons-lang 82fd251 154 153 13 83 100 commons-validator e2edf6a 64 76 3 6 100 commons-pool fde71c6 48 26 5 11 100 JFreeChart 86abdc8 638 344 33 443 100 jankotek.mapdb a333530 87 61 11 43 100 OpenTripPlanner 45c1a9f 1099 285 147 2724 100 Table 2 Average and median safety violation w.r.t. Ekstazi. Subject A-SVCORTS % M-SVCORTS % A-SVC2RTS % M-SVC2RTS % A-SVSTARTS % M-SVSTARTS % commons-math 0.0 0.0 0.04 0.0 0.58 0.0 commons-configuration 11.53 0.0 19.26 0.0 22.04 1.89 commons-compress 0.0 0.0 0.0 0.0 0.0 0.0 commons-collections 0.0 0.0 0.0 0.0 0.0 0.0 commons-dbcp 0.0 0.0 0.11 0.0 0.56 0.0 commons-io 0.0 0.0 0.0 0.0 0.0 0.0 commons-lang 0.0 0.0 0.0 0.0 0.0 0.0 commons-validator 0.14 0.0 1.57 0.0 6.19 0.0 commons-pool 0.79 0.0 1.29 0.0 1.65 0.0 JFreeChart 0.0 0.0 0.0 0.0 0.0 0.0 jankotek.mapdb 0.0 0.0 3.04 2.12 3.32 3.22 OpenTripPlanner 1.12 0.0 1.21 0.0 3.97 1.36 A- or M-SVi is the average/median (per subject) safety violation of a tool (i.e., CORTS, C2RTS, or STARTS) with respect to Ekstazi. in Table 1) that satisfied these conditions: (1) does not have a build or identifier, enabling users to retrieve the exact source code revision compile error, (2) no test case failures, and (3) successfully ran with directly from the respective project’s GitHub repository. The following STARTS and Ekstazi. 
Second, among all the revisions preceding SHA, subsections present and discuss the RTS results of all our experiments. we selected up to a hundred revisions (including the SHA revision) that met these conditions. The total number of selected revisions, for the 12 4.3. RQ1: Safety violation subjects, was 1200. These revisions met the prerequisites for Ekstazi and STARTS: (1) Maven version 3.2.5 or above, (2) Surefire version Table 2 shows for each subject the results of the median and average 2.14 or above, (3) JUnit version 3 or above, (4) Java version 1.8 or safety violation w.r.t. Ekstazi achieved by CORTS, C2RTS, and STARTS. above. We used OO2CB [22] to convert each of the 1200 revisions to The median and average values are computed per subject among all the an equivalent component-based version. Table 1 shows for each subject, subject’s revisions. As shown in Table 2, CORTS and C2RTS achieved the numbers of recovered components (COMPS) and ports (PORTS) better results for safety violation compared to STARTS. among the components of the latest subject’s revision used in our study. The median safety violation obtained by CORTS was zero for all For each subject, starting from the oldest revision, among the hun- the 12 subjects, while C2RTS had a value greater than zero for only dred revisions, up to the most recent revision specified by SHA, we ran one subject. For STARTS, the median safety violation was higher than Ekstazi and STARTS techniques on the successive pairs of revisions, zero for three subjects, i.e., 1.89% for commons-configuration and ran CORTS and C2RTS on the corresponding component-based and 3.22 for jankotek.mapdb. The average safety violation values versions of these revisions. 
To identify changed classes between the of CORTS and C2RTS were smaller than those for STARTS for 7 out previous and current pair of revisions, Ekstazi compares the smart of the 12 subjects, while all the RTS approaches achieved an average checksums of the previous and current versions of each compiled Java file (i.e., .class files). STARTS reuses the part of the Ekstazi source safety violation of zero for the remaining subjects. As it can be seen code to compute smart checksums and identify changed classes in the in Table 2, CORTS reduced the average safety violation almost by half same way. In order to ensure equitable comparisons with both Ek- from 22.04% to 11.53% for commons-configuration. stazi and STARTS, we adhere to the same methodology for comparing The proposed approaches, CORTS and C2RTS, outperformed STARTS smart checksums to identify changed classes, subsequently marking the in terms of safety violation because they compute dependencies from components housing these classes as modified. In particular, the list of test cases to code entities at a higher level of granularity (i.e., component- changed classes in STARTS can be generated by executing the Linux level) than STARTS. This component-level dependency analysis results command-line STARTS: diff,4 and we utilized this command-line in in higher over-estimation of test dependencies compared to class- our experiments to generate the list of classes that are identified as level test dependencies, in which more impacted (i.e., modification- changed through smart checksum comparisons. traversing) test cases are selected. 
In particular, the static analysis of The experimental dataset, which comprises the ACDC-recovered test dependencies at the component (or module) level rather than at the architectures for all 1200 revisions of the 12 Java projects, is publicly class-level can lead to the identification of a broader set of potentially available at https://github.com/mohammedrefai/RTS_ComponentLeve impacted test cases. This is due to the module-level analysis treating l . Each revision’s files are labeled with their corresponding SHA all classes within a module as a single entity. By considering the module as a unified unit, this approach inherently accounts for inter- class interactions within the module including dynamic dependencies 4 STARTS provides the command-line option to list types identified as that involve reflection, even without explicitly tracking such dynamic changed via smart checksum computation. dependencies. This holistic view increases the likelihood of capturing 9 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 Table 3 Average and median precision violation with w.r.t. Ekstazi. Subject A-PVCORTS % M-PVCORTS % A-PVC2RTS % M-PVC2RTS % A-PVSTARTS % M-PVSTARTS % commons-math 83.47 96.1 50.27 53.33 33.12 25.0 commons-configuration 59.68 64.57 45.64 59.19 24.54 22.14 commons-compress 75.41 86.36 62.54 79.09 53.37 64.0 commons-collections 58.54 95.21 18.89 0.0 7.15 0.0 commons-dbcp 57.41 61.29 30.05 31.81 18.18 3.03 commons-io 53.09 80.19 28.24 0.0 16.21 0.0 commons-lang 63.88 84.07 56.97 76.35 48.16 63.84 commons-validator 60.11 92.11 40.25 11.21 17.45 10.71 commons-pool 55.01 54.54 53.71 52.17 35.11 26.66 JFreeChart 49.88 73.17 42.21 0.0 32.33 0.0 jankotek.mapdb 70.29 77.38 31.89 20.41 29.02 17.74 OpenTripPlanner 87.69 91.42 87.67 91.41 73.26 75.0 A- or M-PVi is the average/median (per subject) precision violation of a tool (i.e., CORTS, C2RTS, or STARTS) with respect to Ekstazi. Table 4 Reduction in test suite size. 
Subject A-RCORTS % A-RC2RTS % A-RSTARTS % A-REkstazi % commons-math 10.98 48.57 76.27 89.94 commons-configuration 18.03 32.51 66.46 77.03 commons-compress 11.01 24.07 56.42 86.42 commons-collections 36.96 77.96 92.87 95.51 commons-dbcp 10.32 41.74 55.98 67.72 commons-io 36.64 61.66 82.05 89.41 commons-lang 26.19 35.86 53.32 90.07 commons-validator 31.52 54.17 90.45 91.55 commons-pool 19.77 21.41 50.03 69.58 JFreeChart 43.83 51.12 84.64 93.32 jankotek.mapdb 8.12 52.56 55.61 70.29 OpenTripPlanner 20.04 20.06 57.67 89.67 A-RX is the average reduction (per subject) in test suite size achieved by an RTS approach X. dependencies that might be overlooked when analyzing at the finer STARTS. On the other hand, the average/median precision violation granularity of individual classes. values of C2RTS are smaller when compared with those yielded by Moreover, the average safety violation values of CORTS are smaller CORTS with a significant variance observed across most of the sub- than those of C2RTS. This is because C2RTS mixes tracking dependen- jects. For example, the average and median precision violation val- cies both between modules and within them at the class-level for the ues of CORTS are 58.54% and 95.21%, respectively, for the subject modified modules, in which inter-class dynamic dependencies that are commons-collections. These values are reduced by C2RTS to related to reflection are missed by C2RTS, resulting in missing impacted 18.89% and 0.0%, respectively. test cases that are captured by CORTS. C2RTS did make more mistakes in choosing irrelevant test cases It is essential to acknowledge that the component recovery tools uti- compared to STARTS, but the precision violation yielded by C2RTS was lized, namely ACDC and OO2CB, are based on static analysis and do not not too far from that provided by STARTS. For 8 out of the 12 subjects, detect the dynamic class dependencies or communications associated C2RTS was, on average, only up to 13% less accurate than STARTS. 
For with dynamic class loading and reflection. Consequently, the resultant the remaining subjects, the difference went up to 21%. Interestingly, in component-based applications in our experimentation lack the ‘‘opens 3 out of the 12 subjects, C2RTS had a median precision violation of 0%. with’’ directive within the generated ‘‘module-info.java’’ files. Conse- 4.5. RQ3: Test suite reduction quently, CORTS and C2RTS overlooked impacted test cases, resulting in safety violation values higher than zero for some of the subjects as seen Table 4 shows for each subject the average reduction in test suite in Table 2. We anticipate that CORTS and C2RTS will yield diminished size achieved by CORTS, C2RTS, STARTS, and Ekstazi. The average safety violation values, potentially zero or near-zero, provided that values are computed per subject among all the subject’s revisions. reflection-related dependencies are comprehensively captured and rep- The four RTS approaches achieved reduction in test suite size. The resented within the ‘‘module-info.java’’ files of the evaluation subjects. average reduction in test suite size overall the 12 subjects was 22.78% This would entail modifications to ACDC and OO2CB to accurately cap- for CORTS, 43.47% for C2RTS, 68.48% for STARTS, and 84.21% for ture and represent reflection-related dependencies within the recovered Ekstazi. The highest reduction was yielded by Ekstazi since it is a component-based applications. We plan to investigate this direction in dynamic approach. the future. It is evident that (1) both CORTS and C2RTS achieved a reduction for all the subjects even though they perform RTS at a higher level 4.4. RQ2: Precision violation of granularity than STARTS, and (2) C2RTS increased the reduction compared to CORTS from 22.78% to 43.47% on average since it tracks Table 3 shows, for each subject, the results of the median and dependencies within the modified components at the class-level. More- average precision violation w.r.t. 
Ekstazi achieved by CORTS, C2RTS, over, C2RTS achieved high reduction by more than 50% on average and STARTS. The median and average values are computed per subject for 5 out of the 12 subjects, and a reduction by more than 40% on among all the subject’s revisions. average for 2 other subjects. Furthermore, the comparative analysis The average and median safety violations of CORTS and C2RTS are with STARTS reveals that C2RTS maintains a competitive edge, with higher than those of STARTS. This is because CORTS and C2RTS com- the difference in average test suite reduction between C2RTS and pute test dependencies at a higher levels of granularity than STARTS STARTS not surpassing 21% for 5 subjects and remaining below 38% and have higher overestimation of impacted test cases than that of across all the 12 subjects. 10 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 Table 5 Dependency graph size. Subject NODESCORTS EDGESCORTS NODESC2RTS EDGESC2RTS NODESSTARTS EDGESSTARTS commons-math 503 4391 567 5090 2099 12 689 commons-configuration 190 1532 284 2470 827 4743 commons-compress 147 933 219 1713 547 2299 commons-collections 202 1396 236 1763 907 3536 commons-dbcp 36 147 126 684 178 711 commons-io 109 554 154 881 336 1017 commons-lang 167 982 227 1537 746 2252 commons-validator 77 208 142 581 179 592 commons-pool 27 108 124 574 208 748 JFreeChart 373 2594 468 4298 1033 7092 jankotek.mapdb 197 876 494 4600 1281 7342 OpenTripPlanner 432 5335 698 8910 2884 15 479 NODESX or EDGESX is the average number of nodes or edges in the dependency graph that was constructed by an RTS approach (X ). Table 6 Dependency graph size reduction ratios of CORTS and C2RTS with respect to STARTS. 
| Subject | R_NODES_CORTS | R_EDGES_CORTS | R_NODES_C2RTS | R_EDGES_C2RTS |
|---|---|---|---|---|
| commons-math | 4.17 | 2.89 | 3.70 | 2.49 |
| commons-configuration | 4.35 | 3.10 | 2.91 | 1.92 |
| commons-compress | 3.72 | 2.46 | 2.50 | 1.34 |
| commons-collections | 4.49 | 2.53 | 3.84 | 2.01 |
| commons-dbcp | 4.94 | 4.84 | 1.41 | 1.04 |
| commons-io | 3.08 | 1.84 | 2.18 | 1.15 |
| commons-lang | 4.47 | 2.29 | 3.29 | 1.47 |
| commons-validator | 2.32 | 2.59 | 1.26 | 1.02 |
| commons-pool | 7.56 | 6.91 | 1.66 | 1.31 |
| JFreeChart | 2.76 | 2.73 | 2.21 | 1.65 |
| jankotek.mapdb | 6.48 | 8.37 | 2.59 | 1.58 |
| OpenTripPlanner | 6.67 | 2.91 | 4.12 | 1.73 |

R_NODES_X / R_EDGES_X is the average reduction ratio of nodes/edges of the class-level dependency graph achieved by an RTS approach X.

4.6. RQ4: Reduction in dependency graph size

Table 5 shows, for each subject, the average number of nodes and edges of the static dependency graphs extracted by CORTS, C2RTS, and STARTS. The average values are computed per subject among all the subject's revisions. It is evident that CORTS and C2RTS generated smaller dependency graphs than STARTS.

Table 6 shows, for each subject, the average size reduction ratio of the dependency graphs extracted by CORTS and C2RTS with respect to the dependency graph extracted by STARTS. The size reduction ratio is computed separately for nodes and edges as follows. For a specific revision of a subject, the size reduction ratio for nodes/edges achieved by CORTS/C2RTS is computed as the number of nodes/edges of the graph extracted by STARTS divided by the number of nodes/edges of the graph extracted by CORTS/C2RTS.

Referring to the data in Table 6, CORTS reduced the node count of the STARTS dependency graph by an average factor ranging from 4.17 to 7.56 for 8 of the subjects, and by a factor of approximately 3 (2.32 to 3.72) for the remaining subjects. On the other hand, C2RTS reduced the node count of the STARTS dependency graph by an average factor higher than 2 (i.e., ranging from 2.18 to 4.12) for 9 out of the 12 subjects. Furthermore, CORTS reduced the edge count of the STARTS dependency graph by average factors ranging approximately from 2 up to 8 for 11 subjects (the exception being commons-io, at 1.84), while C2RTS reduced the edge count by average factors ranging from 1.02 up to 2.49.

The results presented in Table 6 are encouraging and indicate that CORTS and C2RTS are effective in reducing the static dependency graph size compared to class-level RTS techniques. This capability matters for several reasons. First, the reduced complexity of the dependency graphs makes our RTS approaches more scalable to very large applications, such as enterprise-level applications with extensive codebases. Second, smaller graphs require fewer computational resources for analysis and consume less memory. Furthermore, this efficiency in graph size management is particularly beneficial in cloud-based Continuous Integration (CI) environments, where resource and memory consumption directly influences costs, suggesting that such optimizations can result in economic advantages.

4.7. RQ5: Selection phase and end-to-end testing times

The end-to-end execution time of an RTS approach includes two main phases: (1) the selection phase, which analyzes what test cases to select, and (2) the execution phase, which runs the selected test cases. For static RTS approaches, the selection phase time consists of the time taken to construct the static dependency graph, read the changed classes and flag them in the graph, and analyze (i.e., traverse) the graph to select relevant test cases. Table 7 reports the selection phase time for CORTS (SELECT_CORTS), C2RTS (SELECT_C2RTS), and static class-level RTS (SELECT_STARTS-like), as well as the end-to-end time for CORTS (E2E_CORTS), C2RTS (E2E_C2RTS), and STARTS (E2E_STARTS). Table 7 also presents TESTAll, which is a strategy that just runs all test cases without performing any RTS analysis. We use the TESTAll time as the baseline and compare the end-to-end times of the RTS approaches with it. Table 7 displays, per subject, the overall cumulative time for all the 100 revisions of the subject.

It is important to mention that for static class-level RTS, we did not measure the selection phase time (i.e., the SELECT_STARTS-like time in Table 7) using STARTS itself. Instead, we developed a STARTS-like tool that functions similarly to STARTS by using jdeps to extract class dependencies and building a class-level dependency graph. However, the STARTS-like tool utilizes JGraphT for graph construction and analysis, whereas STARTS uses the custom, faster yasgl library [34]. To ensure a fair comparison, we compared the selection phase time of CORTS and C2RTS with STARTS-like, since all three use JGraphT for graph operations. Additionally, STARTS does not provide specific commands to report the exact selection phase time separately from other RTS phases and operations, e.g., computing and storing smart checksums. For the end-to-end time, we reported the time taken by STARTS (i.e., the E2E_STARTS time in Table 7) instead of the STARTS-like tool.

Table 7
Selection phase time and end-to-end testing time in seconds.

| Subject | SELECT_CORTS | SELECT_C2RTS | SELECT_STARTS-like | TESTAll | E2E_CORTS | E2E_C2RTS | E2E_STARTS |
|---|---|---|---|---|---|---|---|
| commons-math | 4.153 | 11.859 | 23.209 | 11,042.579 | 9572.399 | 6023.054 | 3664.108 |
| commons-configuration | 1.445 | 6.283 | 10.329 | 2579.310 | 2132.351 | 1826.158 | 1592.784 |
| commons-compress | 0.871 | 1.819 | 4.263 | 819.969 | 730.118 | 621.127 | 572.821 |
| commons-collections | 1.206 | 2.586 | 8.957 | 1287.695 | 809.505 | 259.606 | 124.392 |
| commons-dbcp | 0.273 | 0.526 | 1.237 | 8252.194 | 7466.198 | 5032.529 | 4653.431 |
| commons-io | 0.436 | 1.119 | 2.275 | 4975.982 | 3106.881 | 2123.687 | 1828.807 |
| commons-lang | 0.852 | 1.583 | 3.781 | 1621.582 | 1131.397 | 1067.314 | 776.306 |
| commons-validator | 0.251 | 0.569 | 0.908 | 168.435 | 117.999 | 82.979 | 38.188 |
| commons-pool | 0.252 | 0.456 | 0.809 | 31,183.825 | 26,125.138 | 26,026.082 | 24,691.189 |
| JFreeChart | 2.235 | 10.406 | 38.091 | 538.722 | 305.523 | 274.989 | 149.916 |
| jankotek.mapdb | 1.195 | 9.279 | 13.849 | 56,929.985 | 54,353.748 | 40,567.913 | 38,828.116 |
| OpenTripPlanner | 8.167 | 20.198 | 45.879 | 17,672.936 | 17,318.823 | 17,330.737 | 14,778.681 |

SELECT_X is the summation (per subject) of the overall execution time taken by an RTS approach X for the test selection process (i.e., constructing and analyzing the dependency graph to select tests). E2E_X is the summation (per subject) of the overall end-to-end execution time taken by an RTS approach X. TESTAll is the summation (per subject) of the overall time taken to just run all test cases.

... that fewer impacted test cases are missed, reducing safety violations. However, this comes at the expense of increased test suite size and higher precision violations, as non-impacted test cases may also be selected.
On the other hand, STARTS operates at the class level, and The CORTS had the shortest selection phase time across all the 12 thus, achieves higher test suite reduction and precision, minimizing subjects, followed by C2RTS, with STARTS-like taking the longest. For the selection of unnecessary test cases. However, this finer granularity instance, in the JFreeChart project, CORTS took 2.235 s, C2RTS took can lead to higher safety violations compared to component-level RTS. 10.4 s, and class-level RTS took 38.09 s. By averaging the selection phase This is because static class-level RTS may miss dynamic dependencies time across the 12 subjects, we found that the overall average selection related to reflection and dynamic class loading. In contrast, CORTS phase time was 1.77 s for CORTS, 5.56 s for C2RTS, and 12.79 s for and C2RTS can account for such dependencies as they are explicitly STARTS. This is because the dependency graphs for CORTS and C2RTS declared in the module-info.java files. are smaller compared to the static class-level dependency graph. The RTS execution time. The dependency graph construction and reported selection phase times suggest that CORTS and C2RTS scale analysis time for CORTS and C2RTS is significantly shorter than for better for larger graphs, requiring less time to construct and analyze STARTS. This improvement is due to the smaller size of component- dependency graphs for test case selection. level dependency graphs. However, the time spent on dependency All three RTS approaches, i.e., CORTS, C2RTS, and STARTS, reduced graph processing constitutes a minor fraction of the overall end-to- the overall end-to-end testing time compared to the TESTAll baseline end testing time, which is dominated by test execution. Consequently, across all 12 subjects. 
For example, in the commons-pool project, STARTS significantly outperformed CORTS and C2RTS in terms of the which comes with long running JUnit test cases, the TESTAll took overall end-to-end testing time, as it obtained higher reduction in test 31,183 s for running all test cases of the subject, which is the total suite size and smaller precision violations. time summed-up for all the 100 revisions of this subjects, while the While dependency graph efficiency does not drastically impact total overall end-to-end testing time was 26,125 s for CORTS, 26,026 s end-to-end testing time, it plays a crucial role in continuous integration for C2RTS, and 24,691 for STARTS. For all the subjects, STARTS had (CI) environments by enabling faster feedback cycles for developers. the shortest end-to-end time due to its highest reduction in the test Rapid test selection allows for immediate identification of impacted suite size, followed by C2RTS, while CORTS took the longest time. By tests, reducing delays in iterative development workflows. averaging the end-to-end testing time across the 12 subjects, we found Scenarios for Class- versus Component-level RTS. The choice of that the overall average time was 11,422.76 s for TESTALL, 10,264.17 s RTS approach depends on the application’s context and requirements. for CORTS, 8436.34 s for C2RTS, and 7641.56 s for STARTS. Despite For example, class-level RTS is preferable in resource-constrained en- that CORTS had the longest time, it still showed reduction in the end- vironments where test execution cost and time are critical, e.g., mo- to-end testing time compared to the TESTAll strategy, indicating its bile app development pipelines. It is also preferable for applications practical value in regression testing. 
C2RTS achieved better results than with frequent but small changes where the likelihood of missing im- CORTS in reducing the end-to-end time, showing that such a hybrid pacted test cases is minimal, such as utility libraries or microservices RTS technique can provide balancing between component- and class- with isolated functionality. On the other hand, component-level RTS level, where it achieves reduction in the regression testing time while (e.g., CORTS and C2RTS) can be more preferable in safety-critical do- still maintaining high safety and scalability, making it suitable for large mains where ensuring comprehensive test coverage outweighs reducing JPMS-based programs where balance between performance and safety regression testing execution time, such as in component-based adaptive is critical. systems with fault tolerance mechanisms [35], aircraft [36], aerospace or other safety-critical systems [37,38]. Additionally, component-level 4.8. Results discussion RTS can be more appropriate for large-scale, monolithic enterprise systems with complex interdependencies across components, and we Balancing metrics across RTS approaches. The evaluation re- plan to investigate this direction in the future. sults highlight a trade-off between key regression test selection (RTS) metrics: safety violation, precision violation, test suite reduction, and 5. Threads to validity end-to-end testing time. Specifically, STARTS outperforms CORTS and C2RTS in terms of precision violation, test suite reduction, and test External validity. External validity affects the generalizability of execution time, while CORTS and C2RTS achieve lower safety violation our results. One external validity threat is the use of 12 Java projects rates. which might not be representative, so our results may not generalize. CORTS and C2RTS emphasize safety by employing a coarser granu- However, the selected subjects are widely used to evaluate RTS ap- larity (component-level) in dependency analysis. 
This strategy ensures proaches [12,17–19,39], vary in size, application domain, and number 12 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 of test classes, which reduces this threat. Additionally, the results could Each node in a CFG represents a simple or conditional statement, and differ for larger Enterprise Resource Planning (ERP) systems, but we each edge represents the flow of control between statements. Entities anticipate that component-level RTS would be even more scalable and affected by modifications are selected by traversing in parallel the CFGs reusable in such cases compared to class-level RTS approaches. We plan of P and P’, and when the target entities of like-labeled CFG edges in to investigate this direction in the future. P and P’ differ, the edge is added to the set of affected entities. There- Another threat to the external validity of our experimental results after, Rothermel and Harrold extended the CFG-based algorithm for is the use of OO2CB tool which uses the ACDC tool to determine C++ using Inter-procedural Control-Flow Graphs (ICFG) [43]. Harrold the components of OO Java projects. Therefore, OO2CB inherits the et al. [44] further extended the CFG approach for Java software using shortcoming of the ACDC in determining components of OO Java the Java Inter-class Graph (JIG) to handle Java features and incomplete projects. Using other component recovery tools, such as Architecture programs. Recovery using Concerns (ARC) [31], Bunch [32], or Weighted Com- Vokolos and Frankl [45] consider RTS based on text differencing bined Algorithm (WCA) [40], could change the RTS results. To reduce using the Unix diff command. 
The approach compares the original this threat, we leveraged the best-performing component recovery tool, code version with the modified version to identify the modified state- i.e., ACDC [33], as concluded in prior comparative studies of recovery ments, and selects test cases that exercise code blocks containing these techniques [30]. statements. Internal validity. An internal factor that can affect the outcome To improve the efficiency of dynamic RTS, a number of techniques is the possible errors in the implementations of CORTS and C2RTS. at coarser granularity (e.g., method- or class-level) rather than the To mitigate this threat, we built our implementation on mature tools finer granularity CFG level (e.g., statement-level) were proposed. Ren (i.e., JGraphT [41] and jdeps [27]) and tested it thoroughly. et al. [46] and Zhang et al. [47] applied change-impact analysis at the Another threat to internal validity is the use of the OO2CB tool [22] method-level, based on call graphs techniques, to improve RTS. Recent to generate the (JPMS) component-based projects from the subjects RTS approaches [12,18,19] were proposed to make RTS more cost- according to the least-privilege security principle. The threat is related effective in modern software systems by focusing on class-level RTS that to the false positives caused by the static analysis of class dependencies. (1) identifies changes at the class level and (2) computes dependencies Static analysis results may overestimate the communications between from test cases to the classes under test. Additionally, these approaches components which might lead to granting a component more privileges consider a test class as a test case, and thus, select test classes instead than it needs in the resulted CB app. Consequently, this could impact of test methods. Gligoric et al. 
[12] proposed Ekstazi, an approach that the results of our experiments, especially the precision and reduction tracks dynamic dependencies of test cases at the class level and selects in test suite size. However, OO2CB uses the BCEL tool [42], a widely- test cases that traverse modified classes. Ekstazi is a safe RTS approach, used library in the industry to analyze Java apps, which mitigates this and its safety is based on the formally proven safety of the change-based threat. Furthermore, OO2CB defines all port types in the component- RTS approach [48]. Zhang [19] proposed HyRTS, which is a dynamic based applications except the ones responsible for Java reflection and and hybrid approach that supports analyzing the adapted classes at dynamic class loading techniques, i.e., the open and opens with multiple granularity levels (i.e., method and class levels) to improve ports. This limitation impacted the safety violation results yielded by the precision and selection time. Running HyRTS using the class-level CORTS and C2RTS. We expect that these results could be even better if mode produces the same RTS results as Ekstazi [19]. Thus, HyRTS was CORTS and C2RTS are applied to component-based applications that not considered in our experimental evaluation. utilize the opens with port to denote dependencies deriving from While these dynamic RTS techniques can be safe, they require dy- reflection and dynamic class loading among components. This area will namic test coverage information which may be absent, costly to collect, be a focus of our future research endeavors. or require prohibitive instrumentation (e.g., for non-deterministic or Construct validity. In general, we could have used other metrics real-time code). On the other hand, our proposed approaches, CORTS (e.g., test coverage and fault detection ability) to evaluate the effec- and C2RTS are static, but still can capture runtime information repre- tiveness of CORTS and C2RTS. 
However, we used the most common sented in the module descriptors (i.e., module-info.java file) such as de- metrics in the research literature: safety violation, precision violation, reduction in test suite size, and reduction in end-to-end test execution pendencies related to dynamic class loading and reflection represented time. We also used the reduction in dependency graph size as measure using the opens with directive. to evaluate the scalability of CORTS and C2RTS to large subjects. Static RTS. Kung et al. [49,50], Hsia et al. [51], and White and Ab- Another threat to construct validity is that we chose Ekstazi as dullah [25] proposed firewall-based approaches. The firewall contains the ground truth against which to evaluate the static RTS techniques, the changed classes and their dependent classes, where the dependent e.g., we computed the safety and precision violations with respect to classes are identified based on static analysis. Test cases that traverse Ekstazi. Although Ekstazi is a state-of-the-art, and is recognized as classes in the firewall are selected. Jang et al. [52] apply firewall- a leading and accessible dynamic class-level RTS tool, it might not based RTS at the method level to C++ software. They identify firewalls encompass the entirety of benchmarks for all RTS scenarios. around all the methods affected by a change and select all the test cases Conclusion Validity. We only used 12 subjects to evaluate exercising these methods for regression testing. Ryder and Tip [53] CORTS and C2RTS. The use of additional subjects could affect the proposed a call-graph-based static change-impact analysis technique conclusions of the evaluation. To reduce this threat, we used large real- and evaluated only one call-graph analysis on 20 revisions of one world Java projects that have been used in other experiments for fair project [54]. Skoglund and Runeson’s [48] proposed a change-based comparison. 
approach that only selects those test cases that exercise the changed classes. ChEOPSJ [55,56] is a static change-based approach that uses 6. Related work the FAMIX model to represent software entities including test cases and building dependencies between them. These approaches use fine- RTS can reduce regression testing efforts and has been studied for grained information such as constructor calls and method invocation over three decades [14,15]. Below we summarize the existing dynamic statements to build dependencies between software entities. and static RTS approaches. Legunsen et al. [17,18] proposed STARTS, which is a static RTS Dynamic RTS. Many graph-walk approaches address the problem approach that is based on the idea of the class-level firewall. STARTS of RTS. Rothermel and Harrold [2] propose a safe approach for RTS for builds a dependency graph of program types based on compile-time procedural programs. The algorithm uses control-flow graphs (CFG) to information, and selects test cases that can reach changed types in the represent each procedure in a program P and its modified version P’. transitive closure of the dependency graph. Yu et al. [16] evaluated 13 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 method-level and class-level static RTS in continuous integration en- that component-level RTS can be more scalable for large-scale projects. vironments. Class-level RTS was determined to be more practical and Additionally, C2RTS demonstrated better precision than CORTS, thus time-saving than method-level RTS. balancing between safety and precision while still reducing the size of Gyori et al. [20] compared variants of dynamic and static class- the static dependency graph compared to static class-level RTS. Both, level RTS with project-level RTS in the Maven Central open source CORTS and C2RTS, reduced the end-to-end testing time in comparison ecosystem. 
An ecosystem may contain a large number interconnected to running all test cases without performing RTS. projects, where client projects transitively depend on library projects. We plan to extend the application of CORTS and C2RTS to large- Project-level RTS identifies changes at the project level and computes scale enterprise Java systems. Furthermore, it is critical to acknowledge dependencies from test cases to projects. When a library changes, then that the component recovery tools used in our experimental evalua- all test cases in the library and all test cases in all the library’s transitive tions, such as ACDC, do not possess the capability to detect dynamic clients are selected. Class-level RTS was found to be less costly than dependencies, such as those involving reflection, and consequently, do project-level RTS in terms of reduction in test suite size. not incorporate these dependencies into the module descriptor files. Shi et al. [8] focused on optimizing RTS in continuous integra- Moving forward, we plan to explore and experiment with alternative tion (CI) environments. They compared module- and class-level RTS component recovery tools that can better capture dynamic dependen- techniques in the Travis cloud-based CI environment, and developed cies and reflection. Additionally, we plan to explore the application a hybrid RTS technique, called GIBstazi, that combines aspects of of our RTS approaches in the context of Java 9 modular applications the module- and class-level RTS techniques. Their work focuses on when treating the modularized Java Runtime Environment (JRE) and Maven modules (i.e., build-system modules) utilizing techniques like third-party libraries, along with their dependencies, as part of the CB the Git Inferred Build (GIB) to optimize test selection based on module application. 
This investigation aims to evaluate the impact of reduced dependencies determined by the build system (i.e., focusing on build- runtime size, e.g., only including the required modules of the JRE and time dependencies). While the work of Shi et al. [8] is more aligned third-party libraries, on the RTS performance. with multi-module Java applications structured with the Maven build system, our approaches, CORTS and C2RTS, utilize JPMS modules that CRediT authorship contribution statement emphasize dependencies -including runtime dependencies specified us- ing the opens with directive- and encapsulation according to the Mohammed Al-Refai: Writing – review & editing, Writing – origi- least privilege concept. Although we did not empirically compare the nal draft, Visualization, Validation, Supervision, Software, Resources, precision of JPMS-module level RTS to build-system module level RTS, Project administration, Methodology, Investigation, Formal analysis, we anticipate that the latter may be less precise in detecting affected Data curation, Conceptualization. Mahmoud M. Hammad: Writing – tests due to the broader scope of build-time dependencies. On the review & editing, Visualization, Validation, Resources, Formal analysis. other hand, JPMS-based RTS could potentially offer more precise and safer test selection due to the explicit module dependencies and en- Declaration of competing interest capsulation provided by JPMS. As a future work, we plan to transform multi-module maven-based Java applications into their JPMS-utilizing The authors declare that they have no known competing finan- counterparts to further evaluate the efficacy of CORTS and C2RTS in cial interests or personal relationships that could have appeared to such environments. influence the work reported in this paper. 
Overall, CORTS and C2RTS are similar to the described static RTS approaches in terms of applying the firewall impact analysis tech- References nique, but at the module-level rather than the class- and method-levels. However, unlike the existing static RTS techniques, our proposed ap- [1] A. Bertolino, Software testing research: Achievements, challenges, dreams, proaches can capture runtime information that are explicitly included in: 2007 Future of Software Engineering, IEEE Computer Society, 2007 in the module descriptor files. pp. 85–103. [2] G. Rothermel, M.J. Harrold, A safe, efficient regression test selection technique, ACM Trans. Softw. Eng. Methodol. 6 (2) (1997) 173–210. 7. Conclusions and future work [3] M.J. Harrold, Testing evolving software, J. Syst. Softw. 47 (2–3) (1999) 173–181. [4] H.K.N. Leung, L.J. White, Insights into regression testing, in: Proceedings of As software systems become increasingly complex and large, espe- Conference on Software Maintenance, IEEE, Miami, FL, USA, 1989, pp. 60–69. cially with the implementation of the Java Platform Module System [5] P.K. Chittimalli, M.J. Harrold, Recomputing coverage information to assist (JPMS), traditional regression test selection (RTS) techniques at the regression testing, IEEE Trans. Softw. Eng. 35 (4) (2009) 452–469. method and class levels often face challenges in efficiency and resource [6] E. Engström, P. Runeson, A qualitative survey of regression testing practices, management. This research was driven by the desire to refine RTS in: International Conference on Product Focused Software Process Improvement, for Java applications modularized with JPMS. This research leverages Springer, 2010, pp. 3–16. [7] R. Greca, B. Miranda, A. Bertolino, State of practical applicability of regression component-level granularity and provides a substantial foundation for testing research: A live systematic literature review, ACM Comput. Surv. 
55 (13s) advancing RTS practices tailored to modern Java applications, pre- (2023) 1–36. senting a strong case for the adoption of component-level analysis in [8] A. Shi, P. Zhao, D. Marinov, Understanding and improving regression test professional and large-scale development environments. selection in continuous integration, in: 2019 IEEE 30th International Symposium We introduced two novel static component-based RTS approaches, on Software Reliability Engineering, ISSRE, IEEE, 2019, pp. 228–238. CORTS and its variant C2RTS, tailored for component-based Java soft- [9] W. Sun, X. Xue, Y. Lu, J. Zhao, M. Sun, Hashc: Making deep learning coverage ware systems modularized with JPMS. CORTS constructs a module- testing finer and faster, J. Syst. Archit. 144 (2023) 102999. level dependency graph using architectural metadata from module [10] Y. Lu, K. Shao, J. Zhao, W. Sun, M. Sun, Mutation testing of unsupervised descriptor files to determine the impact of changes and select relevant learning systems, J. Syst. Archit. 146 (2024) 103050. [11] Testing at the speed and scale of Google, 2011, http://google-engtools.blogspot. test cases. C2RTS extends this by incorporating class-level analysis for com/2011/06/testing-at-speed-and-scale-of-google.html. modified modules, offering a hybrid approach that balances granu- [12] M. Gligoric, L. Eloussi, D. Marinov, Practical regression test selection with larity to improve precision while maintaining safety. Our evaluation dynamic file dependencies, in: Proceedings of the 2015 International Symposium of CORTS and C2RTS on real-world software systems demonstrated on Software Testing and Analysis, ISSTA’15, ACM, Baltimore, MD, USA, 2015, improvements, in terms of safety, over static class-level RTS paradigms. pp. 211–222. Additionally, both CORTS and C2RTS reduced the dependency graph [13] L.C. Briand, Y. Labiche, S. 
He, Automating regression test selection based on size compared to static class-level RTS, thus, providing an evidence UML designs, J. Inf. Softw. Technol. 51 (1) (2009) 16–30. 14 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 [14] E. Engström, P. Runeson, M. Skoglund, A systematic review on regression test [40] O. Maqbool, H. Babri, Hierarchical clustering for software architecture recovery, selection techniques, Inf. Softw. Technol. 52 (1) (2010) 14–30. IEEE Trans. Softw. Eng. 33 (11) (2007) 759–780. [15] S. Yoo, M. Harman, Regression testing minimization, selection and prioritization: [41] B. Naveh, J.V. Sichi, JGraphT a free Java graph library, 2011. A survey, J. Softw. Test. Verif. Reliab. 22 (2) (2012) 67–120. [42] BCEL documentation available at http://jakarta.apache.org/bcel/. [16] T. Yu, T. Wang, A study of regression test selection in continuous integration en- [43] G. Rothermel, M.J. Harrold, J. Dedhia, Regression test selection for C++ vironments, in: S. Ghosh, R. Natella (Eds.), Proceedings of the 29th International software, Softw. Test. Verif. Reliab. 10 (2) (2000) 77–109. Symposium on Software Reliability Engineering, ISSRE’18, IEEE, Memphis, TN, [44] M.J. Harrold, J.A. Jones, T. Li, D. Liang, A. Orso, M. Pennings, S. Sinha, S.A. USA, 2018, pp. 135–143. Spoon, A. Gujarathi, Regression test selection for Java software, in: J. Vlissides [17] O. Legunsen, F. Hariri, A. Shi, Y. Lu, L. Zhang, D. Marinov, An extensive study (Ed.), Proceedings of the 16th Conference on Object-Oriented Programming, of static regression test selection in modern software evolution, in: J. Cleland- Systems, Languages, and Applications, OOPSLA’01, ACM, Tampa, FL, USA, 2001, Huang, Z. Su (Eds.), Proceedings of the 2016 24th ACM SIGSOFT International pp. 312–326. Symposium on Foundations of Software Engineering, FSE’16, ACM, Seattle, WA, [45] F. Vokolos, P.G. Frankl, Empirical evaluation of the textual differencing re- USA, 2016, pp. 583–594. 
gression testing technique, in: Proceedings of the International Conference on [18] O. Legunsen, A. Shi, D. Marinov, STARTS: Static regression test selection, Software Maintenance, SM’98, Bethesda, MD, USA, 1998, pp. 44–53. in: M. Di Penta, T.N. Nguyen (Eds.), Proceedings of the 32nd IEEE/ACM [46] X. Ren, F. Shah, F. Tip, B.G. Ryder, O. Chesley, Chianti: a tool for change International Conference on Automated Software Engineering, ASE’17, IEEE impact analysis of java programs, in: Proceedings of the 19th Annual ACM Press, Urbana-Champaign, IL, USA, 2017, pp. 949–954. SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and [19] L. Zhang, Hybrid regression test selection, in: M. Chechik, M. Harman (Eds.), Applications, 2004, pp. 432–448. Proceedings of the 40th International Conference on Software Engineering, [47] L. Zhang, M. Kim, S. Khurshid, Faulttracer: a change impact and regression fault ICSE’18, IEEE, Gotheburg, Sweden, 2018, pp. 199–209. analysis tool for evolving java programs, in: Proceedings of the ACM SIGSOFT [20] A. Gyori, O. Legunsen, F. Hariri, D. Marinov, Evaluating regression test selection 20th International Symposium on the Foundations of Software Engineering, 2012, opportunities in a very large open-source ecosystem, in: S. Ghosh, R. Natella pp. 1–4. (Eds.), Proceedings of the 29th International Symposium on Software Reliability [48] M. Skoglund, P. Runeson, Improving class firewall regression test selection by Engineering, ISSRE’18, IEEE, Memphis, TN, USA, 2018, pp. 112–122. removing the class firewall, Int. J. Softw. Eng. Knowl. Eng. 17 (3) (2007) [21] JPMS. http://openjdk.java.net/projects/jigsaw/spec/. 359–378. [22] M.M. Hammad, I. Abueisa, S. Malek, Tool-assisted componentization of Java ap- [49] D.C. Kung, J. Gao, P. Hsia, J. Lin, Y. Toyoshima, Class firewall, test order, and plications, in: 2022 IEEE 19th International Conference on Software Architecture, regression testing of object-oriented programs, J. Occup. Organ. 
Psychol. 8 (2) ICSA, 2022, pp. 36–46, http://dx.doi.org/10.1109/ICSA53651.2022.00012. (1995) 51–65. [23] OpenJDK: Jigsaw project. https://openjdk.java.net/projects/jigsaw/. [50] D.C. Kung, J. Gao, P. Hsia, Y. Toyoshima, C. Chen, On regression testing of [24] R.N. Taylor, N. Medvidovic, E.M. Dashofy, Software architecture: foundations, object-oriented programs, J. Syst. Softw. 32 (1) (1996) 21–40. theory, and practice, Google Sch. Google Sch. Digit. Libr. Digit. Libr. (2009) [51] P. Hsia, X. Li, D.C.-H. Kung, C.-T. Hsu, L. Li, Y. Toyoshima, C. Chen, A technique (2009). for the selective revalidation of OO software, J. Software: Evol. Process. 9 (4) [25] L.J. White, K. Abdullah, A firewall approach for regression testing of object- (1997) 217–233. oriented software, in: Proceedings of the 10th International Software Quality [52] Y.K. Jang, M. Munro, Y.R. Kwon, An improved method of selecting regression Week, QW’97, San Francisco, CA, USA, 1997. tests for C++ programs, J. Softw. Maint. Evol. 13 (5) (2011) 331–350. [26] D. Michail, J. Kinable, B. Naveh, J.V. Sichi, JGraphT—A Java library for graph [53] B.G. Ryder, F. Tip, Change impact analysis for object-oriented programs, in: data structures and algorithms, ACM Trans. Math. Software 46 (2) (2020). Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis [27] jdeps: The Java class dependency analyzer. Available from Oracle: https://docs. for Software Tools and Engineering, 2001, pp. 46–53. oracle.com/javase/8/docs/technotes/tools/unix/jdeps.html. [54] X. Ren, F. Shah, F. Tip, B.G. Ryder, O. Chesley, J. Dolby, Chianti: A Prototype [28] A. Shi, A. Gyori, M. Gligoric, A. Zaytsev, D. Marinov, Balancing trade-offs in Change Impact Analysis Tool for Java, Tech. Rep., Rutgers University, 2003. test-suite reduction, in: A. Orso, M.-A. Storey (Eds.), Proceedings of the 22nd [55] Q.D. Soetens, S. Demeyer, A. 
Zaidman, Change-based test selection in the International Symposium on Foundations of Software Engineering, FSE’14, ACM, presence of developer tests, in: A. Cleve, F. Ricca (Eds.), Proceedings of the 17th Hong Kong, China, 2014, pp. 246–256. European Conference on Software Maintenance and Reengineering, CSMR’13, [29] N. Ghorbani, J. Garcia, S. Malek, Detection and repair of architectural inconsis- IEEE, Genoa, Italy, 2013, pp. 101–110. tencies in Java, in: 2019 IEEE/ACM 41st International Conference on Software [56] Q.D. Soetens, S. Demeyer, A. Zaidman, J. Pérez, Change-based test selection: An Engineering, ICSE, 2019, pp. 560–571, http://dx.doi.org/10.1109/ICSE.2019. empirical evaluation, Empir. Softw. Eng. (2015) 1–43. 00067. [30] J. Garcia, I. Ivkovic, N. Medvidovic, A comparative analysis of software archi- tecture recovery techniques, in: 2013 28th IEEE/ACM International Conference Dr. Mohammed Al-Refai is an Assistant Professor in the on Automated Software Engineering, ASE, IEEE, 2013, pp. 486–496. Computer Science Department within the Computer and [31] J. Garcia, D. Popescu, C. Mattmann, N. Medvidovic, Y. Cai, Enhancing architec- Information Technology School at the Jordan University of tural recovery using concerns, in: 2011 26th IEEE/ACM International Conference Science and Technology (JUST). Al-Refai’s research focuses on Automated Software Engineering, ASE 2011, IEEE, 2011, pp. 552–555. on various areas within software engineering, including [32] B.S. Mitchell, S. Mancoridis, On the automatic modularization of software model-driven development, model-based testing, software systems using the bunch tool, IEEE Trans. Softw. Eng. 32 (3) (2006) 193–208. architecture, software testing, regression test selection and prioritization, software security, and the integration of fuzzy [33] V. Tzerpos, R.C. 
Holt, ACDC: an algorithm for comprehension-driven clustering, logic and machine learning in software engineering applica- in: Proceedings Seventh Working Conference on Reverse Engineering, IEEE, 2000, tions. Al-Refai earned his Ph.D. in Computer Science from pp. 258–267. Colorado State University, Fort Collins, Colorado, under the [34] Yet another simple graph library. https://github.com/TestingResearchIllinois/ supervision of Prof. Sudipto Ghosh. He also holds M.S. and yasgl. B.S. in Computer Science from Jordan University of Science [35] M. Stoicescu, J.-C. Fabre, M. Roy, Architecting resilient computing systems: A and Technology. Al-Refai is a member of the Association for component-based approach for adaptive fault tolerance, J. Syst. Archit. 73 (2017) Computing Machinery (ACM) and the Institute of Electrical 6–16. and Electronics Engineers (IEEE). [36] H. Usach, J.A. Vila, C. Torens, F. Adolf, Architectural design of a safe mission manager for unmanned aircraft systems, J. Syst. Archit. 90 (2018) 94–108. [37] Z. Yang, Z. Qiu, Y. Zhou, Z. Huang, J.-P. Bodeveix, M. Filali, C2AADL_Reverse: Dr. Mahmoud Hammad is an Associate Professor in the A model-driven reverse engineering approach to development and verification Software Engineering Department within the Computer and of safety-critical software, J. Syst. Archit. 118 (2021) 102202. Information Technology School at the Jordan University of Science and Technology (JUST). He is also the director of [38] I. Allende, N. Mc Guire, J. Perez, L.G. Monsalve, R. Obermaisser, Towards the Center for E-Learning and Open Educational Resources. Linux based safety systems—A statistical approach for software execution path Hammad’s research interests are in the field of software coverage, J. Syst. Archit. 116 (2021) 102047. engineering, specifically in the area of software architecture, [39] M.K. Shin, S. Ghosh, L.R. 
Vijayasarathy, An empirical comparison of self-adaptive software systems, mobile computing, software four Java-based regression test selection techniques, J. Syst. Softw. 186 analysis, software security, natural language processing and (2022) 111174, http://dx.doi.org/10.1016/j.jss.2021.111174, URL https://www. machine learning. Hammad received his Ph.D. in Software sciencedirect.com/science/article/pii/S0164121221002582. Engineering from the University of California, Irvine (UCI) 15 M. Al-Refai and M.M. Hammad Journal of Systems Architecture 160 (2025) 103343 under the supervision of Prof. Sam Malek . During his Ph.D., received his M.S. in Software Engineering from George Hammad developed a self-protecting Android software sys- Mason University, VA, USA and B.S. in Computer Science tem , an Android software system that can monitor itself and from Yarmouk University, Jordan . Hammad is a member adapt (change) its behavior at runtime to keep the system of the Association of Computing Machinery (ACM), ACM secure and protected from Inter-Component Communication Special Interest Group on Software Engineering (SIGSOFT), attacks at all times. Hammad and the Institute of Electrical and Electronics Engineers (IEEE). https://hammadmahmoud.github.io/ 16
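As an illustration of the module-level firewall idea summarized in the related work and conclusions (select the tests that can reach a changed node in the transitive closure of the dependency graph), the following is a minimal Java sketch. It is not the CORTS implementation: the class name ModuleFirewallSketch, the method selectAffected, the module names, and the plain map-based graph are all hypothetical and chosen only for illustration, whereas the tools described in the paper build their graphs with JGraphT from module descriptors and jdeps output.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; all names and graph contents are hypothetical. */
final class ModuleFirewallSketch {

    /**
     * Returns the test nodes that reach at least one changed node by
     * following "depends on" edges transitively. A changed test node
     * is selected as well.
     */
    static Set<String> selectAffected(Map<String, Set<String>> dependsOn,
                                      Set<String> testNodes,
                                      Set<String> changed) {
        Set<String> selected = new TreeSet<>();
        for (String test : testNodes) {
            if (reachesAny(dependsOn, test, changed)) {
                selected.add(test);
            }
        }
        return selected;
    }

    /** Depth-first search from start; the visited set guards against cycles. */
    private static boolean reachesAny(Map<String, Set<String>> dependsOn,
                                      String start,
                                      Set<String> targets) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(start);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (!seen.add(node)) {
                continue; // already visited
            }
            if (targets.contains(node)) {
                return true;
            }
            for (String next : dependsOn.getOrDefault(node, Set.<String>of())) {
                work.push(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical module graph: each key depends on the modules in its value set.
        Map<String, Set<String>> dependsOn = Map.of(
                "app.tests.io", Set.of("app.io"),
                "app.tests.ui", Set.of("app.ui"),
                "app.io", Set.of("app.core"),
                "app.ui", Set.<String>of(),
                "app.core", Set.<String>of());
        Set<String> tests = Set.of("app.tests.io", "app.tests.ui");

        // Only app.tests.io reaches the changed module app.core.
        System.out.println(selectAffected(dependsOn, tests, Set.of("app.core")));
        // prints [app.tests.io]
    }
}
```

In this toy graph a change to app.core selects app.tests.io but not app.tests.ui, because app.ui declares no dependency on app.core. A coarser graph has fewer nodes and edges to traverse but selects every test attached to an affected module, which mirrors the safety versus precision trade-off discussed in Section 4.8.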