Journal of Systems Architecture 160 (2025) 103343




                                                           Journal of Systems Architecture
                                                           journal homepage: www.elsevier.com/locate/sysarc


Component-based architectural regression test selection for modularized
software systems
Mohammed Al-Refai a,∗, Mahmoud M. Hammad b
a Computer Science, Computer and Information Technology, Jordan University of Science and Technology, P.O. Box 3030, Irbid, 22110, Jordan
b Software Engineering, Computer and Information Technology, Jordan University of Science and Technology, P.O. Box 3030, Irbid, 22110, Jordan


ARTICLE INFO

Keywords:
Regression test selection
Static analysis
Component-based architecture
Java platform module system
Software architecture

ABSTRACT

Regression testing is an essential part of software development, but it can be costly and require significant computational resources. Regression Test Selection (RTS) improves regression testing efficiency by re-executing only the tests that are affected by code changes. Recently, dynamic and static RTS techniques for Java projects showed that selecting tests at a coarser granularity (class level) is more effective than selecting tests at a finer granularity (method or statement level). However, prior techniques mainly target object-oriented Java projects, not modularized Java projects. Although the Java Platform Module System (JPMS), introduced in Java 9, provides explicit support for architectural constructs, these research efforts are not customized for component-based Java projects. To that end, we propose two static component-based RTS approaches, CORTS and its variant C2RTS, tailored for component-based Java software systems. CORTS leverages architectural information, such as components and ports, specified in the module descriptor files to construct a module-level dependency graph and identify relevant tests. The variant, C2RTS, is a hybrid approach that integrates analysis at both the module and class levels, employing module descriptor files and compile-time information to construct the dependency graph and identify relevant tests.

We evaluated CORTS and C2RTS on 1200 revisions of 12 real-world open source software systems, and compared the results with those of class-level dynamic (Ekstazi) and static (STARTS) RTS approaches. The results showed that CORTS and C2RTS outperformed the static class-level RTS in terms of safety violation, which measures the extent to which an RTS technique misses test cases that should be selected. Using Ekstazi as the baseline, the average safety violation with respect to Ekstazi was 1.14% for CORTS, 2.21% for C2RTS, and 3.19% for STARTS. On the other hand, the results showed that CORTS and C2RTS selected more test cases than Ekstazi and STARTS. The average reduction in test suite size was 22.78% for CORTS and 43.47% for C2RTS, compared to 68.48% for STARTS and 84.21% for Ekstazi. For all the studied subjects, CORTS and C2RTS reduced the size of the static dependency graphs compared to those generated by static class-level RTS, leading to faster graph construction and analysis for test case selection. Additionally, CORTS and C2RTS achieved reductions in overall end-to-end regression testing time compared to the retest-all strategy.


1. Introduction

Regression testing is the process of running the existing test cases on a new version of a software system to ensure that the performed modifications do not introduce new faults to previously tested code [1–3]. Regression testing is one of the most expensive activities performed during the lifecycle of a software system with some studies [4–10] estimating that it can take up to 80% of the testing budget and up to 50% of the software maintenance cost. For instance, Google reported that their regression-testing system, TAP [11], experienced a linear growth in both the number of software changes and the average test-suite execution time, which ultimately resulted in a quadratic rise in the overall test-suite execution time. This rapid increase poses a challenge to manage, even for a company with extensive computing resources [12]. Regression test selection (RTS) approaches are used to improve regression testing efficiency [3,12]. RTS is defined as the activity of selecting a subset of test cases from an existing test set to verify that the affected functionality of a program is still correct [3,12,13].

The RTS problem has been studied for over three decades [14,15]. Traditional code-based RTS approaches take four inputs: the two versions (new and old) of a software system, the original test suite, and dependency information of the test cases on the old version. The output


    ∗ Corresponding author.
       E-mail addresses: mnalrefai@just.edu.jo (M. Al-Refai), m-hammad@just.edu.jo (M.M. Hammad).

https://doi.org/10.1016/j.sysarc.2025.103343
Received 30 May 2024; Received in revised form 12 January 2025; Accepted 12 January 2025
Available online 18 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.


is the subset of test cases, from an existing test set, that must be re-executed on the modified version of the software system [12].

RTS techniques vary in the granularity at which they compute test dependencies from test cases to code statements, basic blocks, methods, or classes. Recently, researchers showed that, for individual projects, class-level RTS can be more efficient and beneficial than identifying changes and computing dependencies at lower granularities, e.g., statement and method levels [12,16,17]. Therefore, the current trend [12,16–20] is to focus on class-level RTS by (1) identifying changes at the class level and (2) computing dependencies from test cases to the classes under test. In addition to supporting class-level RTS, these approaches consider a test class as a test case, and thus, select test classes instead of test methods [12,18,19].¹

Class-level RTS can be static or dynamic, by analyzing dependencies from test cases to classes under test statically or dynamically. A recent extensive experimental evaluation of static class-level RTS [17,18] showed that it is comparable with the state-of-the-art dynamic class-level RTS approach, called Ekstazi [12]. While such a dynamic RTS approach requires code instrumentation and runtime information to find affected tests, static class-level RTS does not require such information, and instead, it builds a dependency graph of program types based on compile-time information, and selects test cases that can reach changed types in the transitive closure of the dependency graph [17,18]. However, static class-level RTS approaches can be unsafe, which means they might miss selecting test cases that are impacted by code changes. The use of Java reflection is the main cause of unsafety in static RTS approaches when compared with dynamic RTS approaches. Reflection in Java allows for runtime behaviors that can be challenging to predict statically, which means static RTS might miss identifying some dependencies during test selection [17,18].

The previous dynamic and static class-level RTS techniques have primarily focused on Java object-oriented projects, without addressing the unique needs of modularized Java applications. Although the Java Platform Module System (JPMS) [21] was introduced in Java 9, existing RTS research approaches have not been adapted to accommodate the architectural constructs of component-based Java projects. To bridge this gap, we propose two static component-based RTS approaches, CORTS and its variant C2RTS, specifically designed for component-based Java software systems that are developed using the JPMS architectural constructs.

JPMS provides explicit implementation-level support for well-known architectural constructs, such as components (called modules) and ports (called module directives). These constructs provide a higher level of abstraction than Java packages and classes. CORTS leverages the architectural construct information, such as components and ports, presented in the module descriptor files, named ‘‘module-info.java’’ [21], to construct a module-level dependency graph. The variant, C2RTS, is a hybrid technique that integrates module- and class-level analysis, and therefore, uses both the module descriptor files and part of the compile-time information to construct the dependency graph. The two approaches, CORTS and C2RTS, find relevant test cases that can reach some changed module/class in the transitive closure of the dependency graph. Similar to recent RTS approaches [12,16–20], CORTS and C2RTS consider each test class as a test case.

CORTS and C2RTS can improve safety over traditional static class-level RTS techniques by capturing runtime module-level dependencies that are related to reflection and dynamic class loading mechanisms. This is possible because such dependencies are explicitly defined inside the module descriptor files using the open and opens to directives [21].

We evaluated CORTS and C2RTS in terms of (1) safety and precision violations, (2) reduction in test suite size, (3) reduction in dependency graph size with respect to static class-level RTS techniques, (4) execution time required to construct and analyze the static dependency graph to select relevant test cases, and (5) reduction in the end-to-end regression testing time compared to the retest-all strategy. We compared the results obtained by CORTS and C2RTS with those of the state-of-the-art class-level dynamic (Ekstazi [12]) and static (STARTS [18]) RTS approaches, using 1200 revisions of 12 real-world Maven-based Java software systems.

This paper is organized as follows. Section 2 provides an illustrative example to explain how our approaches work. Section 3 describes the proposed approaches, CORTS and C2RTS. Section 4 presents the evaluation. Section 5 describes the threats to the validity of our approach and results. Related work is summarized in Section 6. Conclusions and plans for future work are outlined in Section 7.

2. Illustrative example

This section presents an illustrative example of a Java 9 Component-Based (CB) application of a university system, which is adapted from the example used in Hammad et al. [22]. We use this example in the following section (i.e., Section 3) to demonstrate how our approaches, CORTS and C2RTS, are used with a CB application.

The university system example is developed according to the Java Platform Module System (JPMS) [21], which is a key feature of Project Jigsaw [23], designed to provide a scalable module system for Java. It enables developers to build applications using modular constructs, i.e., components (modules) and ports (module directives), offering a higher level of abstraction than packages or classes. The modularized Java 9 JRE allows applications to depend on specific modules of the JRE rather than the entire runtime environment. Each module in JPMS includes a descriptor file called ‘‘module-info.java’’, which specifies its dependencies and exported services. The JPMS supports various ports that enable a module to export its services or require services from other modules, facilitating clear and maintainable module interactions.

Fig. 1 shows the component-based architecture of the university system. It is important to mention that Hammad et al. [22] created this university system by converting its equivalent Java 8 object-oriented (OO) version to the CB version using the OO2CB tool proposed in [22], which converts Java 8 OO apps to equivalent Java 9 CB apps following the least-privilege security principle. A least-privilege architecture is an architecture in which each component is only granted the exact privileges, in terms of inter-component communications as well as the required JRE modules, it needs to provide its functionality [22,24]. This principle is also important to perform safe and precise regression test selection based on the exact needed inter-component communications/dependencies.

Before presenting the example details, it is also important to mention that generally, there are two common methods for organizing test cases in CB applications: (1) placing them in separate test-components or (2) alongside core application classes within app-components. The first method, adopted in this paper for illustration, creates distinct components for test classes, aligning with the separation of concerns principle by isolating production and test code. This approach ensures clear boundaries and flexible management of dependencies specific to testing. Both CORTS and C2RTS are compatible with either method. In this paper, we use test-components to refer to the modules containing test classes, and use app-components to refer to the modules containing the core application classes, i.e., production code.

As depicted in Fig. 1, the university system consists of four app-components, i.e., modules,² which are location, registration, stuService, and serviceProvider. In addition, the system contains three test-components, which are locationTest, registrationTest, and serviceProviderTest. The java.logging component is also used by the system. The Java classes in the system

¹ From this point until the end of the paper we use the term test case to refer to a test class.
² In the paper, we use the terms module and component interchangeably.




                                               Fig. 1. Component-Based (CB) application adapted from [22].


interact as follows: The StuSchedule class generates a suggested                serviceProviderTest declares a requires directive in its
schedule for a student and logs relevant details to a log file using            ‘‘module-info.java’’ file to establish this communication, as shown on
the java.util.logging.Logger class. Additionally, StuSchedule                   Line 8 of Fig. 2(b). Simultaneously, the stuService component
dynamically loads the ClassRoomManager class and invokes its methods via        defines an exports to directive to expose the people package to
Java reflection to retrieve classroom information, which it logs in the         serviceProviderTest, as illustrated on Line 13 of Fig. 2(a).
students’ schedules. A student can either be an Undergraduate or a              Additionally, because IStudent is an interface, serviceProviderTest
Graduate, with both classes implementing the IStudent interface.                also declares a uses directive, as indicated on Line 9 of Fig. 2(b).
    The corresponding ‘‘module-info.java’’ files for the four app-components
are presented in Fig. 2(a), while those for the three test-components           3. Approach
are shown in Fig. 2(b). In the remainder of this section, we discuss some
of the key directives used in these ‘‘module-info.java’’ files:                     This section describes our proposed component-based RTS approaches,
provides with, exports to, opens to, and uses.                                  CORTS and C2RTS, which are static analysis tools based on analyzing
                                                                                dependencies from test cases to the components of the
    As shown in Fig. 1, the stuService component contains the
                                                                                software application under test. CORTS and C2RTS assume that the
IStudent interface inside the people package. In order for the Un-
                                                                                software application is a component-based Java application. It is also worth
dergraduate and Graduate classes from the serviceProvider
                                                                                mentioning that if the app is constructed according to the least-
component to implement the interface, the stuService needs to ex-
                                                                                privilege architecture, in which each component is only granted the
port the people package using the exports to port that is shown in
                                                                                precise dependencies to components and resources that are needed to
Line 12 of Fig. 2(a). In addition to the exports to port, the servi-
                                                                                provide its functionality, our RTS approaches yield more precise test
ceProvider component needs to define two more ports. One port to                case selection.
require the stuService component as shown in Line 18 of Fig. 2(a)
                                                                                    Consistent with the current trend in code-based RTS research [12,
and another provides with port to provide the functionalities of
                                                                                18,19], CORTS and C2RTS consider a test class to be a test case. They
the IStudent interface using the Graduate and Undergraduate                     support both unit and system test cases. The inputs to CORTS and
implementation as shown in Lines 19–21 of Fig. 2(a).                            C2RTS are the previous version of the Java application along with its
    The class StuSchedule located in the registration component contains        test cases, i.e., the application before modification, and the current
code to dynamically load the class ClassRoomManager and invoke its              (modified) version of the Java application. The output is the set of
methods using Java reflection. Therefore, the                                   selected test cases that must be re-executed on the current version of
location component that contains the ClassRoomManager class                     the application.
defines an opens to port to open the package ClassRoom to the                       We present CORTS in Section 3.1 and its variant C2RTS is described
registration component as shown in Line 26 of Fig. 2(a). This                   in Section 3.2.
port enables the classes of the registration component to load and
access all classes of the ClassRoom package using the Java reflection           3.1. The CORTS approach
mechanisms.
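
One way to picture how a qualified opens to directive exposes a reflective dependency to a static analysis is sketched below in Python. The edge direction used here, from the module that gains reflective access to the module that opens the package, is an assumption that mirrors an ordinary requires dependency; reflective_edges and its input layout are hypothetical names and are not taken from CORTS.

```python
def reflective_edges(opens_by_module):
    """Map qualified 'opens <package> to <targets>' directives to dependency edges.

    opens_by_module: {opening_module: [(package, [target_module, ...]), ...]}
    Returns a set of (dependent_module, opening_module) pairs.
    """
    edges = set()
    for opening_module, directives in opens_by_module.items():
        for _package, targets in directives:
            for target in targets:
                # The target may load classes of the opened package reflectively,
                # so it is treated as depending on the opening module (assumption).
                edges.add((target, opening_module))
    return edges

# Example from the text: location opens the package ClassRoom to registration.
OPENS = {"location": [("ClassRoom", ["registration"])]}

if __name__ == "__main__":
    print(reflective_edges(OPENS))  # {('registration', 'location')}
```

Because the directive is written in the descriptor, this dependency is visible without executing the program, which is the property the text relies on when it describes reflective access from registration to location.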
    The test classes TestGraduate and TestUndergraduate, be-                        CORTS takes the previous version of a CB Java app along with its
longing to the serviceProviderTest test-component, use the IS-                  test cases, then it parses the module descriptor files (the module-
tudent interface from the stuService component. As a result,                    info.java files) of all app-components and test-components. While



                                                               Fig. 2. module-info.java files.


parsing the descriptors, CORTS constructs a directed graph, called En-           all communication ports that are directed towards or emanating from T
tity Dependency Graph (EDG), where each node represents a component              are represented in the EDG as directed edges leading to or originating
or a test case (test class), and the directed edges among the nodes              from every node representing a test class belonging to T. CORTS is capable
represent the various types of dependencies among the components,                of identifying all test classes associated with a given test-component
such as requires, uses and provides with dependencies. After                     through a straightforward method. This involves navigating the file
that, CORTS compares the previous version of the CB app with the                 system directory designated for the test-component and locating the
current version of the app to identify the modified components and               class files contained inside it. In the context of component-based Java
flag their corresponding nodes in the EDG. Then, CORTS finds and                 applications organized using JPMS features, each component is as-
returns the set of affected test cases that directly or transitively reach       signed a distinct OS directory. This directory houses all Java packages,
a modified component in the EDG. The detailed process of CORTS                   classes, and the module-info.java file pertinent to the component,
consists of three steps:                                                         facilitating the identification process.
                                                                                     When CORTS scans the module-info.java file of each compo-
    1. Building the EDG from the component-based application (Sec-               nent in the CB app, it adds directed edges in the EDG according to the
       tion 3.1.1).                                                              following rules. We demonstrate each of these rules using the extracted
    2. Identifying the modified components in the EDG (Section 3.1.2).           EDG shown in Fig. 3 for the illustrative CB example depicted in Fig. 1.
    3. Selecting the affected test cases (Section 3.1.3).
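
The three steps can be sketched end to end in Python as follows. This is a minimal illustration under stated assumptions, not the CORTS implementation: the edge list is written by hand from the dependencies described for the university example, the test class TestStuSchedule is hypothetical, the checksum dictionaries stand in for whatever change-detection strategy is used, and selection is done with a reverse breadth-first search from the modified nodes, which yields the same set as computing each test's transitive closure.

```python
from collections import deque

# Step 1 (assumed already done): EDG edges as (dependent, dependency) pairs.
EDGES = [
    ("registration", "stuService"),
    ("registration", "serviceProvider"),
    ("serviceProvider", "stuService"),
    ("TestGraduate", "stuService"),
    ("TestUndergraduate", "stuService"),
    ("TestGraduate", "serviceProvider"),
    ("TestUndergraduate", "serviceProvider"),
    ("TestStuSchedule", "registration"),  # hypothetical test class
]
TESTS = {"TestGraduate", "TestUndergraduate", "TestStuSchedule"}

def modified_components(old, new):
    """Step 2: a component is modified if its {class_file: checksum} map changed."""
    return {c for c in old.keys() | new.keys() if old.get(c) != new.get(c)}

def select_tests(edges, test_nodes, modified):
    """Step 3: return the test nodes that reach a modified node in the EDG."""
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, set()).add(src)
    affected, queue = set(modified), deque(modified)
    while queue:
        node = queue.popleft()
        for dependent in reverse.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(t for t in test_nodes if t in affected)

if __name__ == "__main__":
    old = {"serviceProvider": {"Graduate.class": "a1"}}
    new = {"serviceProvider": {"Graduate.class": "a2"}}
    changed = modified_components(old, new)
    print(select_tests(EDGES, TESTS, changed))
```

In this sketch a change to serviceProvider selects all three tests, the hypothetical TestStuSchedule only transitively through registration, while a change confined to registration selects TestStuSchedule alone.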
                                                                                 Rule 1 (Requires Port). Let 𝑀1 be a component that requires another
   We demonstrate these steps in light of the illustrative example               component 𝑀2 , where this communication is represented using the state-
shown in Fig. 1.                                                                 ment "requires 𝑀2 " in the module-info.java file of 𝑀1 . This
                                                                                 requires port means that one or more classes of 𝑀1 depend on or
3.1.1. Building the EDG from the component-based application                     communicate with one or more classes of 𝑀2 . According to this
    In this step, CORTS parses the module-info.java descriptors                  dependency, CORTS adds a directed edge from node 𝑀1 to node 𝑀2 in the
of the app- and test-components of the previous version of the Java              EDG.
application. While parsing the descriptor files, CORTS builds the EDG,               For example, the registration component requires the
where each node in this directed graph represents a component or a               stuService component as specified in the corresponding module de-
test case, and the directed edges among the nodes represent the various          scriptor file shown in Fig. 2, and therefore, a directed edge is added in
types of dependencies among the components. As an example, Fig. 3                the EDG from node registration to node stuService as depicted
shows the extracted EDG for the CB example shown in Fig. 1.                      in Fig. 3. Moreover, as shown in Fig. 2, the serviceProviderTest
    CORTS distinguishes between descriptor files of app-components               component requires the stuService component. Therefore, a
and those of test-components. If the module-info.java descriptor                 directed edge is added in the EDG from every test class node belonging
is for an app-component A, then a node is added in the EDG for A,                to serviceProviderTest, i.e., the test classes TestGraduate
and all communication ports that are directed towards or emanating               and TestUndergraduate, to the node stuService, as shown in
from A, e.g., requires or uses ports, are represented in the EDG as              Fig. 3.
directed edges leading to or originating from the node A. However, if
the descriptor is for a test-component T, then CORTS adds a node in              Rule 2 (Provides With and Uses Ports). Let 𝐶1 be a class in module
the EDG for each individual test class that belongs to T. Subsequently,          𝑀1 and 𝐴2 be an abstract class or an interface in module 𝑀2 , where



                                                    Fig. 3. Entity Dependency Graph (EDG) extracted by CORTS.


𝐶1 implements or extends 𝐴2 . This dependency is represented using the               3.1.2. Identifying the modified components in the EDG
statement "provides 𝐴2 with 𝐶1 " in the module-info.java file                            This step involves identifying the modified components to mark
of 𝑀1 . Additionally, let 𝐶3 be a class that belongs to module 𝑀3 , where            their associated nodes in the EDG as modified. CORTS considers a com-
𝐶3 uses 𝐴2 , which is represented using the statement "uses 𝐴2 " in the              ponent modified if any of its classes have undergone changes. There are
module-info.java file of 𝑀3 . Then, the component 𝑀3 can utilize                     several methods to determine which classes have been modified. For in-
the java.util.ServiceLoader from the java.base JPMS JDK                              stance, the Linux diff command can be used to compare the directories
module to load implementations (i.e., 𝐶1 belonging to 𝑀1 ) of the service 𝐴2 .       of a component across the previous and current versions of the Java
According to this dependency from component 𝑀3 to component 𝑀1 that                  application. Should this command highlight a component’s directory
contains the concrete class 𝐶1 , CORTS adds a directed edge from node 𝑀3             due to alterations or removal of any class within it, or the addition
to node 𝑀1 in the EDG.                                                               of new classes into it, CORTS will then mark the node representing
    For example, as depicted in the module configuration files shown in              that component in the EDG as modified. Another method involves
Fig. 2, the serviceProvider component provides the interface                         comparing the smart checksums of the previous and current versions
IStudent of the component stuService with the concrete classes                       of each compiled Java file (i.e., .class files) to identify changed
Graduate and Undergraduate. Additionally, the registration                           classes [12]. In environments employing Continuous Integration (CI)
component uses the IStudent interface. Those communication ports                     for Java application development, like GitHub, the modifications can
enable the component registration to access the component ser-                       also be traced through version control specific commands, such as git
viceProvider and load the two concrete classes, Graduate and                         diff, to find the changed classes and components. Currently, CORTS
Undergraduate, via the class java.util.ServiceLoader.                                primarily utilizes the Linux diff strategy to pinpoint and mark the
Therefore, a directed edge is added in the EDG from node registration                modified components within the EDG. However, CORTS can easily be
to node serviceProvider as shown in Fig. 3. Likewise,                                extended to support other strategies.
the test component serviceProviderTest uses the IStudent                                For example, if the ClassRoomManager class is modified, e.g.,
interface as depicted in Fig. 2, which grants this test component                    some of its source code is changed to add/delete/modify methods, then
access to the concrete classes Graduate and Undergraduate of                         the component containing this class, which is location, is marked as
the component serviceProvider. Therefore, a directed edge is                         modified in the EDG shown in Fig. 3.
added in the EDG from each test class node (i.e., nodes representing
TestGraduate and TestUndergraduate that belong to ser-
viceProviderTest) to node serviceProvider, as shown in                               3.1.3. Selecting the affected test cases
Fig. 3.                                                                                  In this step, mirroring the methodology of firewall static RTS
                                                                                     approaches [17,25], CORTS traverses the EDG to identify the nodes of
Rule 3 (Opens with Port). Let 𝑝1 be a package that belongs to a module               all test cases that reach nodes representing modified components. In
𝑀1 , and let this module open 𝑝1 to another module 𝑀2 , such that this               particular, CORTS calculates the transitive closure for each test case
dependency is represented using the statement "opens 𝑝1 to 𝑀2 " in the               to find all the components that a test case depends on. Subsequently,
module-info.java file of 𝑀1 . Then, 𝑀2 can communicate with 𝑀1                       the set of impacted test cases whose transitive dependencies include
and load and access classes of the package 𝑝1 via the Java reflection and            some modified component, is returned as the output by CORTS. We
dynamic class loading mechanisms. According to this dependency from 𝑀2               used the JGraphT library [26] to construct the EDG and to calculate
to 𝑀1 , CORTS adds a directed edge from node 𝑀2 to node 𝑀1 in the EDG.               the transitive closures for the test cases within the EDG.
    For example, in the module configuration files shown in Fig. 2,                    To complete the demonstration example, if the class ClassRoom-
the location component opens its package classRoom to the                            Manager is modified and its component location is marked in
registration component. Therefore, a directed edge is added in                       the EDG shown in Fig. 3, then all test cases that transitively reach
the EDG from node registration to node location, as shown                            the location node, which are TestClassRoomMngr and Test-
in Fig. 3.                                                                           StuSchedule, will be selected and returned as the output of CORTS.
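
The edge rules and the selection step described in Sections 3.1.1 and 3.1.3 can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the CORTS implementation (which, according to the text, is built on JGraphT). The function names, the dictionary-based encoding of the module descriptors, and the two test edges marked as assumed are hypothetical; they were chosen only so that the result agrees with the outcome stated for Fig. 3.

```python
from collections import defaultdict


def build_edg(uses, provides, opens_to, test_edges):
    """Build a component-level EDG as adjacency sets (hypothetical encoding).

    uses:       {module: set of service interfaces named in `uses`}
    provides:   {module: set of service interfaces named in `provides ... with`}
    opens_to:   {module: set of modules named in `opens <pkg> to`}
    test_edges: iterable of (test_class, component) dependencies
    """
    edg = defaultdict(set)
    # Rule 2: a module that uses a service depends on every provider of it.
    for consumer, services in uses.items():
        for provider, provided in provides.items():
            if consumer != provider and services & provided:
                edg[consumer].add(provider)
    # Rule 3: `opens p1 to M2` declared in M1 yields the edge M2 -> M1.
    for owner, targets in opens_to.items():
        for target in targets:
            edg[target].add(owner)
    for test, component in test_edges:
        edg[test].add(component)
    return edg


def reachable(edg, start):
    """Return every node reachable from `start` (its transitive closure)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edg.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def select_tests(edg, tests, modified):
    """Select the tests whose transitive dependencies hit a modified node."""
    return {t for t in tests if reachable(edg, t) & modified}


edg = build_edg(
    uses={"registration": {"IStudent"}},
    provides={"serviceProvider": {"IStudent"}},
    opens_to={"location": {"registration"}},
    test_edges=[
        ("TestGraduate", "serviceProvider"),
        ("TestUndergraduate", "serviceProvider"),
        ("TestClassRoomMngr", "location"),    # assumed edge, not given in the text
        ("TestStuSchedule", "registration"),  # assumed edge, not given in the text
    ],
)
tests = {"TestGraduate", "TestUndergraduate", "TestClassRoomMngr", "TestStuSchedule"}
selected = select_tests(edg, tests, modified={"location"})
```

Under these assumptions, marking location as modified selects TestClassRoomMngr and TestStuSchedule, which matches the example above. Computing reachability once per test keeps the sketch short; a reverse traversal from the modified nodes would give the same result.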



                                                Fig. 4. Entity Dependency Graph (EDG) extracted by C2RTS.


3.2. The C2RTS approach                                                             Representing unmodified app-components as nodes in the EDG.
                                                                                The unmodified app-components of the application are handled in
    We have developed a hybrid RTS approach that combines aspects               the same way as in CORTS: they are represented as nodes in the
from both the Component-level and Class-level RTS techniques, called            EDG using the method described previously in
C2RTS. This variant of CORTS integrates module- and class-level                 Section 3.1.1. For example, the app-component location is identified
dependency analyses, striking a balance between safety and                      by C2RTS as unmodified, and therefore, is represented as a single node
precision by adjusting the level of granularity from modules to classes         in the EDG.
depending on the specific classes where code changes have been made.                Representing test-components as nodes in the EDG. Similar to
C2RTS trades off some safety for increased precision compared to                CORTS, the C2RTS approach creates a separate node for each individual
CORTS.                                                                          test class in the EDG. For example, the EDG shown in Fig. 4 contains a
    While constructing the EDG, C2RTS distinguishes between modified            node for each test class, such as the test classes TestStuSchedule
and unmodified app-components within the Java application. Specifi-             and TestGraduate.
cally, each unmodified app-component is represented as a single node                Next, we describe the various ways of C2RTS for (1) extracting
in the EDG, whereas all classes belonging to a modified app-component           dependencies among classes of a modified app-component in Sec-
are represented as individual nodes.                                            tion 3.2.1.2, (2) extracting dependencies among unmodified app-
    As an example of an EDG constructed by C2RTS, Fig. 4 shows the
                                                                                unmodified and modified app-components in Section 3.2.1.4, and (4)
constructed EDG given that the app-component serviceProvider is
                                                                                extracting dependencies between test-components and app-components
identified as a modified app-component by C2RTS, and thus, all classes
                                                                                in Section 3.2.1.5.
belonging to this app-component are represented as individual nodes
in the EDG. The remaining app-components are unmodified, and thus,              3.2.1.2. Extracting dependencies among modified app-component classes.
each of them is represented as a single node in the EDG. The subsequent         The dependencies among the classes of a modified app-component are
subsections elaborate on the entire process undertaken by C2RTS to              extracted using the Oracle Java Class Dependency Analyzer (jdeps)
construct the EDG and select test cases.                                        tool [27].3 These dependencies are represented as directed edges in
                                                                                the EDG between the nodes representing the classes of the modified
3.2.1. Building the EDG from the component-based application                    app-component.
   We explain the steps applied by C2RTS to build the EDG nodes and
                                                                                3.2.1.3. Extracting dependencies among unmodified app-components. The
edges from the component-based application.
                                                                                dependencies among the unmodified app-components are extracted
3.2.1.1. Mappings from components to nodes in the EDG. This section             and represented in the EDG according to Rules 1, 2, and 3 described
explains how C2RTS maps the unmodified app-components, modified                 previously in Section 3.1.1. For example, in the EDG represented in
app-components, and test-components to nodes in the EDG.                        Fig. 4, C2RTS added an edge from node registration to node
    Representing modified app-components as nodes in the EDG.                   location according to Rule 3.
Given the previous and current versions of the CB app, if an app-               3.2.1.4. Extracting dependencies between unmodified and modified app-
component is modified between the two versions, then instead of                 components. The dependencies between the unmodified components
representing this app-component as a single node in the EDG, C2RTS              and the classes of the modified components are extracted using: (1)
represents each class belonging to the app-component as a single node           information extracted from the component configuration files, the
in the EDG.                                                                     module-info.java files, and (2) information extracted using the
    In our illustrative example, we suppose that the app-component              jdeps tool. These extracted dependencies are used to construct the
serviceProvider depicted in Fig. 1 is identified as modified by
C2RTS. Consequently, all the classes of this component (i.e., the Un-
dergraduate and Graduate classes) are represented as nodes in                    3
                                                                                   jdeps is now shipped with the JDK, and is used to analyze the
the EDG, as shown in the EDG represented in Fig. 4.                             module-level, package-level, and class-level dependencies of Java class files.
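
The node mapping of Section 3.2.1.1 can be illustrated with a short sketch. It is written in Python for brevity and is not the authors' implementation; the function name `edg_nodes` and the dictionary inputs are hypothetical, and class lists are given only where the text names the classes.

```python
def edg_nodes(app_components, test_components, modified):
    """Return the EDG node set under the C2RTS mapping (illustrative only).

    app_components:  {component name: list of its class names}
    test_components: {test-component name: list of its test class names}
    modified:        set of app-component names identified as modified
    """
    nodes = set()
    for name, classes in app_components.items():
        if name in modified:
            nodes.update(classes)  # modified component: one node per class
        else:
            nodes.add(name)        # unmodified component: a single node
    for test_classes in test_components.values():
        nodes.update(test_classes)  # every test class is its own node
    return nodes


nodes = edg_nodes(
    app_components={
        "serviceProvider": ["Undergraduate", "Graduate"],
        "stuService": ["IStudent"],
        "location": ["ClassRoomManager"],
        "registration": [],  # class names are not listed in the text
    },
    test_components={"serviceProviderTest": ["TestGraduate", "TestUndergraduate"]},
    modified={"serviceProvider"},
)
```

With serviceProvider marked as modified, its two classes appear as individual nodes, while stuService, location, and registration each remain a single node.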




EDG according to three formally defined rules, Rules 4, 5, 6. The three           added in the EDG from the nodes representing Undergraduate and
rules are described using the following assumptions:                              Graduate to the node representing stuService, as shown in Fig. 4.
    • Let 𝐴𝑝𝑝 be a previous version of a component-based Java application           3.2.1.5. Extracting dependencies between test-components and app-components.
      that was modified to the current version 𝐴𝑝𝑝′ .                               The C2RTS approach represents each individual test class belonging
    • Let a module 𝑀1 represent an app-component that was identified                to a test-component as a single node in the EDG. The dependencies
      as modified between 𝐴𝑝𝑝 and 𝐴𝑝𝑝′ , i.e., some classes that belong           from test classes to unmodified app-components are extracted
      to 𝑀1 were modified.                                                        and represented as edges in the EDG according to Rules 1, 2, and 3
    • Let a module 𝑀2 represent an app-component that belongs to                  explained previously in Section 3.1.1. On the other hand, dependencies
      𝐴𝑝𝑝 and is not modified in 𝐴𝑝𝑝′ , i.e., an unmodified                       from test classes to classes belonging to modified app-components are
      app-component.                                                              extracted according to the following two rules, Rules 7 and 8, which are
   Given these assumptions, C2RTS represents 𝑀2 as a single node in               modified versions of Rules 4 and 5, respectively.
the EDG, and instead of representing 𝑀1 as a single node, C2RTS rep-
resents all the classes/interfaces belonging to 𝑀1 as nodes in the EDG.           Rule 7 (Provides with and Uses Ports). Let 𝐶1 be a class in module 𝑀1
Subsequently, C2RTS extracts the dependencies between the classes of              and 𝐴2 be an abstract class or an interface in module 𝑀2 , where 𝐶1
𝑀1 and the module 𝑀2 and reflects them as edges in the EDG based                  implements or extends 𝐴2 . This port is represented using the statement
on the following rules:                                                           "provides 𝐴2 with 𝐶1 " in the "module-info.java" file for 𝑀1 .
                                                                                  Additionally, let 𝑇 𝑀3 be a test-module, where some test classes belonging
Rule 4 (Provides with and Uses Ports). Let 𝐶1 be a class in module 𝑀1 and         to 𝑇 𝑀3 use 𝐴2 , and this dependency is represented using the statement
𝐴2 be an abstract class or an interface in module 𝑀2 , where 𝐶1 implements        "uses 𝐴2 " in the "module-info.java" file of 𝑇 𝑀3 . This uses
or extends 𝐴2 . This port is represented using the statement "provides            port enables test classes belonging to the test-component 𝑇 𝑀3 to utilize
𝐴2 with 𝐶1 " in the "module-info.java" file for 𝑀1 . Additionally,
                                                                                  the java.util.ServiceLoader from the java.base JPMS JDK
let 𝐶3 be a class that belongs to an unmodified module 𝑀3 (𝑀3 is an
                                                                                  module to load the implementation of 𝐶1 that belongs to 𝑀1 . Subsequently,
app-component), where 𝐶3 uses 𝐴2 , and this port is represented using the
                                                                                  C2RTS applies the jdeps technique with 𝑇 𝑀3 and 𝑀2 to find the test
statement "uses 𝐴2 " in the "module-info.java" file for 𝑀3 . Then,
                                                                                  classes of 𝑇 𝑀3 that depend on 𝐴2 . Suppose jdeps reports that a test class
the component 𝑀3 can utilize the java.util.ServiceLoader from
the java.base JPMS JDK module to load the implementation of 𝐶1 that               𝑇 𝐶3 belonging to 𝑇 𝑀3 depends on 𝐴2 . Then, C2RTS adds a directed edge
belongs to 𝑀1 . According to this dependency from component 𝑀3 to class           from node 𝑇 𝐶3 to node 𝐶1 in the EDG because 𝑇 𝐶3 can load 𝐶1 via the
𝐶1 , C2RTS adds a directed edge from node 𝑀3 to node 𝐶1 in the EDG.               class java.util.ServiceLoader.
    We explain how this rule is applied to the EDG nodes represented                 For example, in the module configuration files shown in Fig. 2,
in Fig. 4 given the module configuration files shown in Fig. 2. In the            the serviceProvider component provides the interface IS-
configuration files, the serviceProvider component provides                       tudent of the stuService component with the concrete classes
the interface IStudent of the stuService component with the                       Undergraduate and Graduate. Additionally, the test-component
concrete classes Undergraduate and Graduate. Additionally, the                    serviceProviderTest uses the IStudent interface as depicted
registration component uses the IStudent interface. These                         in Fig. 2. Therefore, C2RTS finds, using jdeps, which test classes
ports enable the component registration to load implementa-                       of serviceProviderTest depend on IStudent, and the jdeps
tions of the two concrete classes Graduate and Undergraduate.
                                                                                  returns that the test classes TestGraduate and TestUndergrad-
Therefore, two directed edges are added in the EDG from the node
                                                                                  uate depend on IStudent. Subsequently, directed edges are added
registration to the nodes Graduate and Undergraduate as
                                                                                  in the EDG from each of these test classes to the concrete classes
shown in Fig. 4.
                                                                                  Graduate and Undergraduate as depicted in Fig. 4.
Rule 5 (Requires Port from Unmodified to Modified App-Component).
Let 𝑀2 require 𝑀1 , where this port is represented using the statement            Rule 8 (Requires Port from Test-Component to Modified App-Component).
"requires 𝑀1 " in the "module-info.java" file for 𝑀2 . Then, ac-                  Let 𝑇 𝑀1 be a test-component that requires the modified app-component 𝑀1 ,
cording to this dependency from an unmodified module (𝑀2 ) to a modified          such that this dependency is represented using the statement "requires
module (𝑀1 ), C2RTS uses the jdeps technique with 𝑀2 and 𝑀1 to find               𝑀1 " in the "module-info.java" file for 𝑇 𝑀1 . Then, based on this
the set of dependencies from classes belonging to 𝑀2 into classes belonging       dependency, C2RTS uses the jdeps technique with 𝑇 𝑀1 and 𝑀1 to find
to 𝑀1 . Suppose the jdeps result includes dependencies from some class(es)        the set of dependencies from test classes belonging to 𝑇 𝑀1 into classes
of 𝑀2 to a class called 𝐶1 that belongs to 𝑀1 . Then, C2RTS adds a directed       belonging to 𝑀1 . From these dependencies, C2RTS extracts the names of
edge from node 𝑀2 to node 𝐶1 in the EDG.                                          the source and target classes and connects their corresponding nodes in the
                                                                                  EDG with the proper directed edges.
Rule 6 (Requires Port from Modified to Unmodified App-Component). Let
𝑀1 require 𝑀2 , which is represented using the statement "requires 𝑀2 "
                                                                                  3.2.2. Mark modified nodes and select affected test cases
in the "module-info.java" file of 𝑀1 . According to this requires
dependency from a modified module (𝑀1 ) to an unmodified module (𝑀2 ),                To mark the modified classes and compute the set of selected
C2RTS applies the jdeps technique with 𝑀1 and 𝑀2 to find the classes of           test cases, C2RTS applies the same steps explained previously in
𝑀1 that depend on classes of 𝑀2 . Suppose the jdeps result shows that a class 𝐶1  Sections 3.1.2 and 3.1.3 with one difference: instead of marking
belonging to 𝑀1 depends on some class(es) belonging to 𝑀2 . Then, C2RTS           nodes representing modified components in the EDG, C2RTS marks
adds a directed edge from node 𝐶1 to node 𝑀2 in the EDG.                          the nodes that represent modified classes. Then, C2RTS computes the
   For example, in the modules’ configurations shown in Fig. 2, the               transitive closure of each test case to find all components and classes
app-component serviceProvider, which was identified by C2RTS                      that each test depends on. Thereafter, the set of impacted test cases
as modified, requires the unmodified app-component stuSer-                        whose transitive dependencies in the EDG include some changed type,
vice. Hence, the jdeps technique is applied with these two com-                   is returned by C2RTS as the output. For example, if the class Under-
ponents and returns that the classes Undergraduate and Grad-                      graduate is modified and marked in the EDG shown in Fig. 4, then
uate belonging to serviceProvider depend on the class IStudent                    the test cases TestUndergraduate, TestGraduate, and TestS-
that belongs to stuService. Consequently, directed edges are                      tuSchedule will be selected and returned as the output of C2RTS.
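
To make the mixed-granularity selection of Section 3.2.2 concrete, the following Python sketch encodes the edges that the text derives for Fig. 4 and runs the selection step. It is an illustration under stated assumptions rather than the C2RTS implementation: the function name `affected_tests` is hypothetical, and the two edges marked as assumed are not given in the text but are consistent with the stated outcome.

```python
def affected_tests(edges, tests, changed):
    """Select the tests that transitively depend on a changed class node."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    selected = set()
    for test in tests:
        seen, stack = set(), [test]
        while stack:  # depth-first traversal from the test node
            for nxt in graph.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen & changed:
            selected.add(test)
    return selected


edges = [
    # Rule 4: registration uses IStudent, provided with both concrete classes.
    ("registration", "Graduate"), ("registration", "Undergraduate"),
    # Rule 6: classes of the modified component depend on stuService.
    ("Graduate", "stuService"), ("Undergraduate", "stuService"),
    # Rule 7: both test classes can load both providers via ServiceLoader.
    ("TestGraduate", "Graduate"), ("TestGraduate", "Undergraduate"),
    ("TestUndergraduate", "Graduate"), ("TestUndergraduate", "Undergraduate"),
    # Rule 3: location opens a package to registration.
    ("registration", "location"),
    # Assumed edges, not stated in the text:
    ("TestStuSchedule", "registration"), ("TestClassRoomMngr", "location"),
]
tests = {"TestGraduate", "TestUndergraduate", "TestStuSchedule", "TestClassRoomMngr"}
selected = affected_tests(edges, tests, changed={"Undergraduate"})
```

With Undergraduate marked as changed, the sketch selects TestUndergraduate, TestGraduate, and TestStuSchedule, as in the example above. TestGraduate is selected because Rule 7 connects each test class to every provider class that it could load, not only to the one it is named after.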



4. Experimental evaluation                                                             • RQ2: What is the precision violation w.r.t. Ekstazi of the proposed
                                                                                         approaches CORTS and C2RTS? Furthermore, how does this pre-
    The goal of the evaluation is to compare CORTS and C2RTS with the                    cision violation compare to the precision violation w.r.t. Ekstazi
state-of-the-art class-level RTS tools in terms of (1) safety violation, (2)             achieved by the static class-level RTS approach STARTS?
precision violation, (3) test suite reduction, (4) size of the dependency              • RQ3: What is the reduction in test suite size achieved by CORTS
graph that represents static dependencies from test cases to code en-                    and C2RTS?
tities, and (5) reduction in test selection and execution times. An RTS                • RQ4: How does the size of the static dependency graphs ex-
technique is safe if it does not miss any modification-traversing test                   tracted by CORTS and C2RTS compare to the size of the static
cases that should be selected for regression testing. An RTS technique is                dependency graph extracted by STARTS?
precise if it does not select non-modification traversing test cases. A test
                                                                                       • RQ5: What is the time taken by CORTS and C2RTS to construct
case is considered as a modification-traversing test case if it exercises
                                                                                         and analyze the static dependency graph to select relevant tests,
during its execution modified, new, or previously removed code
                                                                                         and what is their overall end-to-end testing time?
statements. Only modification-traversing test cases can reveal faults in
the modified version of a software system, and hence, must be selected
for regression testing.                                                            4.2. Subjects
    We compared CORTS and C2RTS with two RTS tools, Ekstazi [12]
and STARTS [18]. They are both state-of-the-art for class-level RTS and                We evaluated CORTS and C2RTS using the 12 subjects listed in
have been widely evaluated on a large number of revisions of real world            Table 1. These are open-source real-world Java projects, which are
projects [17]. The class-level RTS process identifies changes at the class         known to be compatible with Ekstazi and STARTS since they were
level, instead of method or statement levels, and selects every test-class         widely used in their evaluation [12,17,18]. Table 1 shows for each
that traverses or depends on any changed class. Ekstazi uses dynamic               subject, the latest revision (i.e., most recent revision of the project) on
analysis and STARTS uses static analysis of compiled Java code. We
                                                                                   which our experiments started (SHA), the number of the source classes
compared CORTS and C2RTS with these class-level RTS approaches
                                                                                   (CLASSES) of the latest revision, i.e., classes of the core program
because we aimed to investigate (1) how the safety can be improved
                                                                                   without counting test classes, the number of the source test classes
by raising the RTS granularity from class-level to component-level, (2)
                                                                                   (TESTS) of the latest revision, the number of recovered components of
how increasing the RTS granularity from class-level to component-level
                                                                                   the latest revision (COMPS), number of ports between the recovered
impacts the precision and test suite reduction, and (3) how increasing
                                                                                   components (PORTS), and the number of used revisions (REVS).
the RTS granularity reduces the size of the static component-level
dependency graph compared to the static class-level dependency graph.                  Converting the projects to equivalent component-based projects.
    In order to evaluate the safety and precision of CORTS and C2RTS,              It was not possible to evaluate CORTS and C2RTS using existing open-
we computed their safety violations and precision violations w.r.t. Ek-            source component-based Java applications, i.e., multi-module Java
stazi [12]. Ekstazi is a code-based RTS approach known to be safe in               applications developed using the JPMS capabilities. There are two
terms of selecting all the modification-traversing test classes, widely            main reasons for that. First, the great majority of existing open-source
evaluated on a large number of revisions, and being adopted by several             Object Oriented (OO) Java applications have not been converted to
popular open source projects; as such it can be considered the state-              component-based equivalent applications using JPMS. For example,
of-the-art for class-level dynamic RTS tools. Assume that a program                as mentioned in Hammad et al. [22] after analyzing more than 1300
P, which has an original test suite T, was modified to a new version               open-source Java projects, they found that only 33 are utilizing JPMS
P’. Furthermore, assume that two RTS approaches, RTS1 and Ekstazi,                 capabilities. This finding comports with the results reported in prior
were applied to select test cases from T based on the code modifications           work [29] as well. Second, even for the 33 existing component-based
to move the program from P to P’, such that RTS1 selected the set of               projects that utilize JPMS capabilities, the modules of each project are
test cases TRTS1 and Ekstazi selected the set of test cases TEkstazi . Then,       open to all the system, leading to a situation in which components
the safety violation of RTS1 w.r.t. Ekstazi, precision violation of RTS1           (i.e., modules) are granted more access than they need to function,
w.r.t. Ekstazi, and reduction in test suite size obtained by RTS1 are              and this violates the least-privilege architecture principle. Additionally,
defined as follows:                                                                these projects are relatively small in size, significantly smaller than
\[ \text{Safety violation w.r.t. Ekstazi} = \frac{|T_{\mathrm{Ekstazi}} \setminus T_{\mathrm{RTS1}}|}{|T_{\mathrm{Ekstazi}} \cup T_{\mathrm{RTS1}}|} \]
\[ \text{Precision violation w.r.t. Ekstazi} = \frac{|T_{\mathrm{RTS1}} \setminus T_{\mathrm{Ekstazi}}|}{|T_{\mathrm{Ekstazi}} \cup T_{\mathrm{RTS1}}|} \]
\[ \text{Test suite reduction obtained by RTS1} = \frac{|T| - |T_{\mathrm{RTS1}}|}{|T|} \]
                                                                                   those listed in Table 1, and were created for educational purposes,
                                                                                   meaning they are not real-world component-based Java applications.
                                                                                   Therefore, we could not use these component-based projects to evaluate
                                                                                   CORTS and C2RTS.
                                                                                       In order to overcome this challenge, we converted the OO Java
                                                                                   projects listed in Table 1 to equivalent component-based Java projects.
                                                                                   To do that, we leveraged the OO2CB [22] which utilizes the JPMS
                                                                                   capabilities and converts an OO Java application to an equivalent
    The safety violation, precision violation, and test suite reduction            component-based Java application according to the least-privilege se-
are multiplied by 100 to make them percentages. Lower percentages                  curity principle. The OO2CB uses a component recovery framework
for safety violation, precision violation, and higher percentages for test         implemented by Garcia et al. [30], called ARCADE, to automatically de-
suite reduction are better [17,28]. The size of the static dependency              termine the components of an OO application. The ARCADE framework
graph is computed in terms of number of nodes and edges of the graph.              utilizes several well-known component recovery tools such as Architec-
                                                                                   ture Recovery using Concerns (ARC) [31], Bunch [32], and Algorithm
4.1. Research questions
                                                                                   for Comprehension-Driven Clustering (ACDC) [33]. OO2CB [22] takes
                                                                                   as inputs the suggested components provided by the ACDC tool and
   In this research, we try to answer the following Research Questions
                                                                                   the binary code of the OO Java application, and outputs the equivalent
(RQ):
                                                                                   component-based Java application that utilizes the JPMS features along
    • RQ1: What is the safety violation w.r.t. Ekstazi of the pro-                 with all the modules’ descriptors, i.e., the "module-info.java"
      posed static component-level RTS approaches CORTS and C2RTS?                 files, generated according to the least-privilege security principle.
      Furthermore, do CORTS and C2RTS reduce the safety viola-                         Selecting revisions. We downloaded the revisions of every subject
      tion w.r.t. Ekstazi (i.e., improve safety) compared to the static            among the 12 subjects listed in Table 1 using the methodology in Le-
      class-level RTS approach STARTS?                                             gunsen et al. [17]. First, we found the latest revision (specified by SHA
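
As a worked example of the three measures defined in Section 4 (safety violation, precision violation, and test suite reduction, each expressed as a percentage), the following Python sketch computes them from sets of test names. The function name `rts_metrics`, the test names, and the choice to report 0 when neither tool selects any test are assumptions made for illustration and are not taken from the paper.

```python
def rts_metrics(all_tests, selected, ekstazi_selected):
    """Return (safety violation, precision violation, reduction) in percent."""
    union = selected | ekstazi_selected
    if union:
        safety = 100 * len(ekstazi_selected - selected) / len(union)
        precision = 100 * len(selected - ekstazi_selected) / len(union)
    else:
        # Assumption: no violation is reported when both selections are empty.
        safety = precision = 0.0
    reduction = 100 * (len(all_tests) - len(selected)) / len(all_tests)
    return safety, precision, reduction


T = {f"t{i}" for i in range(1, 11)}   # original suite with 10 tests
T_rts1 = {"t1", "t2", "t3", "t4"}     # tests selected by the technique under study
T_ekstazi = {"t1", "t2", "t5"}        # tests selected by Ekstazi
safety, precision, reduction = rts_metrics(T, T_rts1, T_ekstazi)
```

Here the union of the two selections contains five tests. RTS1 misses one test that Ekstazi selects (t5), giving a safety violation of 20%. It selects two tests that Ekstazi does not (t3 and t4), giving a precision violation of 40%. It runs 4 of the 10 tests, giving a test suite reduction of 60%.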

                           Table 1
                           The Java projects used in our study.
                            Subject                               SHA           CLASSES            TESTS          COMPS       PORTS          REVS
                            commons-math                          96f2b16       864                485            59          1116           100
                            commons-configuration                 5de7c48       261                171            18          190            100
                            commons-compress                      a189697       201                105            22          190            100
                            commons-collections                   f9f99cc       351                230            23          251            100
                            commons-dbcp                          23f6717       60                 54             4           12             100
                            commons-io                            8d1b994       128                106            9           53             100
                            commons-lang                          82fd251       154                153            13          83             100
                            commons-validator                     e2edf6a       64                 76             3           6              100
                            commons-pool                          fde71c6       48                 26             5           11             100
                            JFreeChart                            86abdc8       638                344            33          443            100
                            jankotek.mapdb                        a333530       87                 61             11          43             100
                            OpenTripPlanner                       45c1a9f       1099               285            147         2724           100


               Table 2
               Average and median safety violation w.r.t. Ekstazi.
                 Subject                              A-SV_CORTS %      M-SV_CORTS %       A-SV_C2RTS %    M-SV_C2RTS %    A-SV_STARTS %      M-SV_STARTS %
                 commons-math                         0.0               0.0                0.04            0.0             0.58               0.0
                 commons-configuration                11.53             0.0                19.26           0.0             22.04              1.89
                 commons-compress                     0.0               0.0                0.0             0.0             0.0                0.0
                 commons-collections                  0.0               0.0                0.0             0.0             0.0                0.0
                 commons-dbcp                         0.0               0.0                0.11            0.0             0.56               0.0
                 commons-io                           0.0               0.0                0.0             0.0             0.0                0.0
                 commons-lang                         0.0               0.0                0.0             0.0             0.0                0.0
                 commons-validator                    0.14              0.0                1.57            0.0             6.19               0.0
                 commons-pool                         0.79              0.0                1.29            0.0             1.65               0.0
                 JFreeChart                           0.0               0.0                0.0             0.0             0.0                0.0
                 jankotek.mapdb                       0.0               0.0                3.04            2.12            3.32               3.22
                 OpenTripPlanner                      1.12              0.0                1.21            0.0             3.97               1.36

               A-SV_i or M-SV_i is the average or median (per subject) safety violation of a tool i (i.e., CORTS, C2RTS, or STARTS) with respect to Ekstazi.


in Table 1) that satisfied these conditions: (1) it had no build or                        identifier, enabling users to retrieve the exact source code revision
compile errors, (2) it had no test case failures, and (3) it ran successfully              directly from the respective project’s GitHub repository. The following
with STARTS and Ekstazi. Second, among all the revisions preceding SHA,                    subsections present and discuss the RTS results of all our experiments.
we selected up to a hundred revisions (including the SHA revision) that
met these conditions. The total number of selected revisions, for the 12                   4.3. RQ1: Safety violation
subjects, was 1200. These revisions met the prerequisites for Ekstazi
and STARTS: (1) Maven version 3.2.5 or above, (2) Surefire version
                                                                                               Table 2 shows for each subject the results of the median and average
2.14 or above, (3) JUnit version 3 or above, (4) Java version 1.8 or
                                                                                           safety violation w.r.t. Ekstazi achieved by CORTS, C2RTS, and STARTS.
above. We used OO2CB [22] to convert each of the 1200 revisions to
                                                                                           The median and average values are computed per subject among all the
an equivalent component-based version. Table 1 shows for each subject,
                                                                                           subject’s revisions. As shown in Table 2, CORTS and C2RTS achieved
the numbers of recovered components (COMPS) and ports (PORTS)
                                                                                           better results for safety violation compared to STARTS.
among the components of the latest subject’s revision used in our study.
                                                                                               The median safety violation obtained by CORTS was zero for all
    For each subject, starting from the oldest revision, among the hun-
                                                                                           the 12 subjects, while C2RTS had a value greater than zero for only
dred revisions, up to the most recent revision specified by SHA, we ran
                                                                                           one subject. For STARTS, the median safety violation was higher than
Ekstazi and STARTS techniques on the successive pairs of revisions,
                                                                                           zero for three subjects, i.e., 1.89% for commons-configuration,
and ran CORTS and C2RTS on the corresponding component-based
                                                                                           3.22% for jankotek.mapdb, and 1.36% for OpenTripPlanner. The average safety violation values
versions of these revisions. To identify changed classes between the
                                                                                           of CORTS and C2RTS were smaller than those for STARTS for 7 out
previous and current pair of revisions, Ekstazi compares the smart
                                                                                           of the 12 subjects, while all the RTS approaches achieved an average
checksums of the previous and current versions of each compiled Java
file (i.e., .class files). STARTS reuses part of the Ekstazi source                        safety violation of zero for the remaining subjects. As can be seen
code to compute smart checksums and identify changed classes in the                        in Table 2, CORTS reduced the average safety violation almost by half
same way. In order to ensure equitable comparisons with both Ek-                           from 22.04% to 11.53% for commons-configuration.
stazi and STARTS, we adhere to the same methodology for comparing                              The proposed approaches, CORTS and C2RTS, outperformed STARTS
smart checksums to identify changed classes, subsequently marking the                      in terms of safety violation because they compute dependencies from
components housing these classes as modified. In particular, the list of                   test cases to code entities at a coarser level of granularity (i.e., component-
changed classes in STARTS can be generated by executing the Linux                          level) than STARTS. This component-level dependency analysis
command-line STARTS: diff,4 and we utilized this command-line in                           over-estimates test dependencies more than class-level analysis
our experiments to generate the list of classes that are identified as                     does, so more impacted (i.e., modification-
changed through smart checksum comparisons.                                                traversing) test cases are selected. In particular, the static analysis of
    The experimental dataset, which comprises the ACDC-recovered                           test dependencies at the component (or module) level rather than at the
architectures for all 1200 revisions of the 12 Java projects, is publicly                  class-level can lead to the identification of a broader set of potentially
available at https://github.com/mohammedrefai/RTS_ComponentLevel.                          impacted test cases. This is due to the module-level analysis treating
Each revision’s files are labeled with their corresponding SHA                             all classes within a module as a single entity. By considering the
                                                                                           module as a unified unit, this approach inherently accounts for inter-
                                                                                           class interactions within the module including dynamic dependencies
  4
    STARTS provides the command-line option to list types identified as                    that involve reflection, even without explicitly tracking such dynamic
changed via smart checksum computation.                                                    dependencies. This holistic view increases the likelihood of capturing


               Table 3
               Average and median precision violation w.r.t. Ekstazi.
                 Subject                               A-PV_CORTS %     M-PV_CORTS %        A-PV_C2RTS %    M-PV_C2RTS %      A-PV_STARTS %         M-PV_STARTS %
                 commons-math                          83.47            96.1                50.27           53.33             33.12                 25.0
                 commons-configuration                 59.68            64.57               45.64           59.19             24.54                 22.14
                 commons-compress                      75.41            86.36               62.54           79.09             53.37                 64.0
                 commons-collections                   58.54            95.21               18.89           0.0               7.15                  0.0
                 commons-dbcp                          57.41            61.29               30.05           31.81             18.18                 3.03
                 commons-io                            53.09            80.19               28.24           0.0               16.21                 0.0
                 commons-lang                          63.88            84.07               56.97           76.35             48.16                 63.84
                 commons-validator                     60.11            92.11               40.25           11.21             17.45                 10.71
                 commons-pool                          55.01            54.54               53.71           52.17             35.11                 26.66
                 JFreeChart                            49.88            73.17               42.21           0.0               32.33                 0.0
                 jankotek.mapdb                        70.29            77.38               31.89           20.41             29.02                 17.74
                 OpenTripPlanner                       87.69            91.42               87.67           91.41             73.26                 75.0

               A-PV_i or M-PV_i is the average or median (per subject) precision violation of a tool i (i.e., CORTS, C2RTS, or STARTS) with respect to Ekstazi.
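
The safety and precision violation values in Tables 2 and 3 compare the tests selected by a static tool against those selected by Ekstazi. The following is a minimal sketch of that comparison, assuming the set-based definitions commonly used in the methodology of Legunsen et al. [17] (tests Ekstazi selects that the tool misses, and tests the tool selects that Ekstazi does not, each divided by the union of both selections). The function name and the test names are hypothetical and are not taken from the authors' tooling.

```python
def violations(ekstazi_selected, tool_selected):
    """Return (safety violation %, precision violation %) of a tool w.r.t. Ekstazi.

    Assumed definitions (not restated in this section):
      safety    = |E \\ T| / |E u T|   tests Ekstazi selects but the tool misses
      precision = |T \\ E| / |E u T|   tests the tool selects but Ekstazi does not
    """
    e = set(ekstazi_selected)
    t = set(tool_selected)
    union = e | t
    if not union:
        # Neither tool selected anything, so there is nothing to violate.
        return 0.0, 0.0
    safety = 100.0 * len(e - t) / len(union)
    precision = 100.0 * len(t - e) / len(union)
    return safety, precision


# Hypothetical selections for one revision.
ekstazi = {"FooTest", "BarTest"}
tool = {"BarTest", "BazTest", "QuxTest"}
sv, pv = violations(ekstazi, tool)
print(f"safety violation = {sv:.2f}%, precision violation = {pv:.2f}%")
```

Under these assumptions, the per-subject averages and medians in the tables would be obtained by applying this computation to each pair of successive revisions and then aggregating.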


                           Table 4
                           Reduction in test suite size.
                            Subject                                  A-R_CORTS %             A-R_C2RTS %            A-R_STARTS %            A-R_Ekstazi %
                            commons-math                             10.98                   48.57                  76.27                   89.94
                            commons-configuration                    18.03                   32.51                  66.46                   77.03
                            commons-compress                         11.01                   24.07                  56.42                   86.42
                            commons-collections                      36.96                   77.96                  92.87                   95.51
                            commons-dbcp                             10.32                   41.74                  55.98                   67.72
                            commons-io                               36.64                   61.66                  82.05                   89.41
                            commons-lang                             26.19                   35.86                  53.32                   90.07
                            commons-validator                        31.52                   54.17                  90.45                   91.55
                            commons-pool                             19.77                   21.41                  50.03                   69.58
                            JFreeChart                               43.83                   51.12                  84.64                   93.32
                            jankotek.mapdb                           8.12                    52.56                  55.61                   70.29
                            OpenTripPlanner                          20.04                   20.06                  57.67                   89.67

                           A-R_X is the average reduction (per subject) in test suite size achieved by an RTS approach X.
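
As an illustration of how the per-subject averages in Table 4 could be derived, the sketch below computes the reduction in test suite size for each revision as the percentage of tests that were not selected, and then averages over revisions. This formula is an assumption based on the usual meaning of test suite reduction. The function names and the sample counts are hypothetical.

```python
def reduction_percent(total_tests, selected_tests):
    """Percentage of the test suite that an RTS approach did not select."""
    if total_tests == 0:
        return 0.0
    return 100.0 * (total_tests - selected_tests) / total_tests


def average_reduction(per_revision):
    """Average reduction over (total_tests, selected_tests) pairs, one per revision."""
    values = [reduction_percent(total, selected) for total, selected in per_revision]
    return sum(values) / len(values)


# Hypothetical (total, selected) counts for three revisions of one subject.
revisions = [(100, 20), (100, 60), (120, 30)]
print(f"average reduction = {average_reduction(revisions):.2f}%")
```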


dependencies that might be overlooked when analyzing at the finer                           STARTS. On the other hand, the average/median precision violation
granularity of individual classes.                                                          values of C2RTS are smaller when compared with those yielded by
    Moreover, the average safety violation values of CORTS are smaller                      CORTS with a significant variance observed across most of the sub-
than those of C2RTS. This is because C2RTS tracks dependencies                              jects. For example, the average and median precision violation val-
between modules but tracks them at the class-level within the modified                      ues of CORTS are 58.54% and 95.21%, respectively, for the subject
modules. Inter-class dynamic dependencies that are related to                               commons-collections. These values are reduced by C2RTS to
reflection are therefore missed by C2RTS, which in turn misses impacted                     18.89% and 0.0%, respectively.
test cases that are captured by CORTS.                                                          C2RTS did make more mistakes in choosing irrelevant test cases
    It is essential to acknowledge that the component recovery tools uti-                   compared to STARTS, but the precision violation yielded by C2RTS was
lized, namely ACDC and OO2CB, are based on static analysis and do not                       not too far from that provided by STARTS. For 8 out of the 12 subjects,
detect the dynamic class dependencies or communications associated                          C2RTS was, on average, only up to 13% less accurate than STARTS. For
with dynamic class loading and reflection. Consequently, the resultant                      the remaining subjects, the difference went up to 21%. Interestingly, in
component-based applications in our experimentation lack the ‘‘opens                        3 out of the 12 subjects, C2RTS had a median precision violation of 0%.
with’’ directive within the generated ‘‘module-info.java’’ files. As a
                                                                                            4.5. RQ3: Test suite reduction
result, CORTS and C2RTS overlooked impacted test cases, resulting in
safety violation values higher than zero for some of the subjects as seen
                                                                                                Table 4 shows for each subject the average reduction in test suite
in Table 2. We anticipate that CORTS and C2RTS will yield diminished
                                                                                            size achieved by CORTS, C2RTS, STARTS, and Ekstazi. The average
safety violation values, potentially zero or near-zero, provided that
                                                                                            values are computed per subject among all the subject’s revisions.
reflection-related dependencies are comprehensively captured and rep-
                                                                                            The four RTS approaches achieved reduction in test suite size. The
resented within the ‘‘module-info.java’’ files of the evaluation subjects.
                                                                                            average reduction in test suite size over all the 12 subjects was 22.78%
This would entail modifications to ACDC and OO2CB to accurately cap-
                                                                                            for CORTS, 43.47% for C2RTS, 68.48% for STARTS, and 84.21% for
ture and represent reflection-related dependencies within the recovered
                                                                                            Ekstazi. The highest reduction was yielded by Ekstazi since it is a
component-based applications. We plan to investigate this direction in                      dynamic approach.
the future.                                                                                     It is evident that (1) both CORTS and C2RTS achieved a reduction
                                                                                            for all the subjects even though they perform RTS at a coarser level
4.4. RQ2: Precision violation                                                               of granularity than STARTS, and (2) C2RTS increased the reduction
                                                                                            compared to CORTS from 22.78% to 43.47% on average since it tracks
   Table 3 shows, for each subject, the results of the median and                           dependencies within the modified components at the class-level. More-
average precision violation w.r.t. Ekstazi achieved by CORTS, C2RTS,                        over, C2RTS achieved high reduction by more than 50% on average
and STARTS. The median and average values are computed per subject                          for 5 out of the 12 subjects, and a reduction by more than 40% on
among all the subject’s revisions.                                                          average for 2 other subjects. Furthermore, the comparative analysis
   The average and median precision violations of CORTS and C2RTS are                       with STARTS reveals that C2RTS maintains a competitive edge, with
higher than those of STARTS. This is because CORTS and C2RTS com-                           the difference in average test suite reduction between C2RTS and
pute test dependencies at a coarser level of granularity than STARTS                        STARTS not surpassing 21% for 5 subjects and remaining below 38%
and have higher overestimation of impacted test cases than that of                          across all the 12 subjects.


               Table 5
               Dependency graph size.
                 Subject                             NODES_CORTS         EDGES_CORTS        NODES_C2RTS    EDGES_C2RTS        NODES_STARTS       EDGES_STARTS
                 commons-math                        503                 4391               567            5090               2099               12,689
                 commons-configuration               190                 1532               284            2470               827                4743
                 commons-compress                    147                 933                219            1713               547                2299
                 commons-collections                 202                 1396               236            1763               907                3536
                 commons-dbcp                        36                  147                126            684                178                711
                 commons-io                          109                 554                154            881                336                1017
                 commons-lang                        167                 982                227            1537               746                2252
                 commons-validator                   77                  208                142            581                179                592
                 commons-pool                        27                  108                124            574                208                748
                 JFreeChart                          373                 2594               468            4298               1033               7092
                 jankotek.mapdb                      197                 876                494            4600               1281               7342
                 OpenTripPlanner                     432                 5335               698            8910               2884               15,479

               NODES_X or EDGES_X is the average number of nodes or edges in the dependency graph that was constructed by an RTS approach X.


                           Table 6
                           Dependency graph size reduction ratios of CORTS and C2RTS with respect to STARTS.
                            Subject                               R_NODES_CORTS            R_EDGES_CORTS       R_NODES_C2RTS           R_EDGES_C2RTS
                            commons-math                          4.17                     2.89                3.70                    2.49
                            commons-configuration                 4.35                     3.10                2.91                    1.92
                            commons-compress                      3.72                     2.46                2.50                    1.34
                            commons-collections                   4.49                     2.53                3.84                    2.01
                            commons-dbcp                          4.94                     4.84                1.41                    1.04
                            commons-io                            3.08                     1.84                2.18                    1.15
                            commons-lang                          4.47                     2.29                3.29                    1.47
                            commons-validator                     2.32                     2.59                1.26                    1.02
                            commons-pool                          7.56                     6.91                1.66                    1.31
                            JFreeChart                            2.76                     2.73                2.21                    1.65
                            jankotek.mapdb                        6.48                     8.37                2.59                    1.58
                            OpenTripPlanner                       6.67                     2.91                4.12                    1.73

                           R_NODES_X / R_EDGES_X is the average reduction ratio of nodes/edges of the class-level dependency graph achieved by an RTS
                           approach X.
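
The reduction ratio reported in Table 6 is defined per revision as the STARTS graph size divided by the graph size of CORTS or C2RTS, and is then averaged over a subject's revisions. A minimal sketch of that calculation follows. The function names and the node counts are hypothetical and only illustrate the arithmetic.

```python
def reduction_ratio(starts_size, tool_size):
    """Size of the STARTS graph divided by the size of the tool's graph."""
    if tool_size <= 0:
        raise ValueError("tool graph size must be positive")
    return starts_size / tool_size


def average_reduction_ratio(per_revision_sizes):
    """Average of per-revision ratios over (starts_size, tool_size) pairs."""
    ratios = [reduction_ratio(s, t) for s, t in per_revision_sizes]
    return sum(ratios) / len(ratios)


# Hypothetical node counts (STARTS, CORTS) for two revisions.
node_counts = [(200, 50), (300, 100)]
avg_ratio = average_reduction_ratio(node_counts)

# Ratio of the averaged sizes, shown for contrast with the average of ratios.
mean_starts = sum(s for s, _ in node_counts) / len(node_counts)
mean_tool = sum(t for _, t in node_counts) / len(node_counts)
ratio_of_means = mean_starts / mean_tool

print(f"average of ratios = {avg_ratio:.2f}, ratio of averages = {ratio_of_means:.2f}")
```

Because the ratio is averaged per revision, it need not equal the ratio of the averaged sizes in Table 5. This may explain small differences such as commons-pool, where 208/27 is about 7.70 while Table 6 reports 7.56.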


4.6. RQ4: Reduction in dependency graph size                                                memory. Furthermore, this efficiency in graph size management is
                                                                                            particularly beneficial in cloud-based Continuous Integration (CI) envi-
    Table 5 shows – for each subject – the average number of nodes                          ronments, where resource and memory consumption directly influences
and edges of the static dependency graphs extracted by CORTS, C2RTS,                        costs, suggesting that such optimizations can result in economical
and STARTS. The average values are computed per subject among all                           advantages.
the subject’s revisions. It is evident that CORTS and C2RTS generated
dependency graphs of smaller sizes compared to STARTS.                                      4.7. RQ5: Selection phase and end-to-end testing times
    Table 6 shows, for each subject, the average size reduction ratio of
the dependency graphs extracted by CORTS and C2RTS with respect to                              The end-to-end execution time of an RTS approach includes two
the dependency graph extracted by STARTS. The size reduction ratio                          main phases, which are: (1) the selection phase that analyzes what test
is computed separately for nodes and edges as follows. For a specific                       cases to select, and (2) the execution phase that runs the selected test
revision of a subject, the size reduction ratio for nodes/edges achieved                    cases. For static RTS approaches, the selection phase time consists of
by CORTS/C2RTS is computed as the number of nodes/edges of the                              the time taken to construct the static dependency graph, read adapted
graph extracted by STARTS divided by the number of nodes/edges of                           classes and flag them in the graph, and analyze (i.e., traverse) the graph
the graph extracted by CORTS/C2RTS.                                                         to select relevant test cases. Table 7 reports the selection phase time
    Referring to the data in Table 6, CORTS achieved an average                             for CORTS (SELECT_CORTS), C2RTS (SELECT_C2RTS), and static class-level
reduction in the STARTS dependency graph node count by a factor                             RTS (SELECT_STARTS-like), as well as the end-to-end time for CORTS
starting from 4 up to 7 for 8 of the subjects, and by a factor of                           (E2E_CORTS), C2RTS (E2E_C2RTS), and STARTS (E2E_STARTS). Table 7 also
approximately 3 for the remaining subjects. On the other hand, C2RTS                        presents TESTAll, which is a strategy that just runs all test cases
achieved an average reduction in the STARTS dependency graph node                           without performing any RTS analysis. We use the TESTAll strategy
count by a factor higher than 2 (i.e., ranging from 2.18 to 4.12) for                       time as the baseline and compare the end-to-end times of the RTS
9 subjects out of the 12 subjects. Furthermore, CORTS achieved an                           approaches with it. Table 7 displays, per subject, the overall cumulative
average reduction in the STARTS dependency graph edge count by                              time for all the 100 revisions of the subject.
factors ranging approximately from 2 up to 8 for 11 subjects, while                             It is important to mention that for static class-level RTS, we did not
C2RTS achieved an average reduction in edge count by factors ranging                        separately measure the selection phase time (i.e., SELECT_STARTS-like time
from 1.02 up to 2.49.                                                                       in Table 7) using STARTS. Instead, we developed a STARTS-like tool
    The results presented in Table 6 are encouraging and indicate that                      that functions similarly to STARTS by using jdeps to extract class
CORTS and C2RTS are effective in minimizing the static dependency                           dependencies and building a class-level dependency graph. However,
graph size compared to class-level RTS techniques. This capability                          the STARTS-like tool utilizes JGraphT for graph construction and
is crucial and presents significant implications for several reasons.                       analysis, whereas STARTS uses the custom, faster yasgl library [34].
First, the reduced complexity of dependency graphs makes our RTS                            To ensure a fair comparison, we compared the selection phase time of
approaches more scalable to very large applications such as enterprise-                     CORTS and C2RTS with STARTS-like, since all three use JGraphT
level applications with extensive codebases. Second, smaller graphs                         for graph operations. Additionally, STARTS does not provide specific
require less computational resources for analysis and consume less                          commands to report the exact selection phase time separately from


Table 7
Selection phase time and end-to-end testing time in seconds.
 Subject                              SELECT_CORTS         SELECT_C2RTS        SELECT_STARTS-like       TESTAll            E2E_CORTS          E2E_C2RTS           E2E_STARTS
 commons-math                         4.153                11.859              23.209                   11,042.579         9572.399           6023.054            3664.108
 commons-configuration                1.445                6.283               10.329                   2579.310           2132.351           1826.158            1592.784
 commons-compress                     0.871                1.819               4.263                    819.969            730.118            621.127             572.821
 commons-collections                  1.206                2.586               8.957                    1287.695           809.505            259.606             124.392
 commons-dbcp                         0.273                0.526               1.237                    8252.194           7466.198           5032.529            4653.431
 commons-io                           0.436                1.119               2.275                    4975.982           3106.881           2123.687            1828.807
 commons-lang                         0.852                1.583               3.781                    1621.582           1131.397           1067.314            776.306
 commons-validator                    0.251                0.569               0.908                    168.435            117.999            82.979              38.188
 commons-pool                         0.252                0.456               0.809                    31,183.825         26,125.138         26,026.082          24,691.189
 JFreeChart                           2.235                10.406              38.091                   538.722            305.523            274.989             149.916
 jankotek.mapdb                       1.195                9.279               13.849                   56,929.985         54,353.748         40,567.913          38,828.116
 OpenTripPlanner                      8.167                20.198              45.879                   17,672.936         17,318.823         17,330.737          14,778.681

SELECTX is the sum (per subject) of the overall execution time taken by an RTS approach (X) for the test selection process (i.e., constructing and analyzing the dependency graph
to select tests).
E2EX is the sum (per subject) of the overall end-to-end execution time taken by an RTS approach (X).
TESTAll is the sum (per subject) of the overall time taken to just run all test cases.
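
As a worked illustration of how the TESTAll and E2E columns relate, the following minimal Java sketch computes the percentage of end-to-end time saved relative to running all tests, using the commons-pool row above. The class and method names are hypothetical and do not belong to any of the evaluated tools; the column order (TESTAll, then E2E for CORTS, C2RTS, and STARTS) is assumed from the commons-pool values quoted in the text.

```java
import java.util.Locale;

final class E2eReduction {

    /** Percentage of time saved by an RTS approach relative to running all tests. */
    static double reductionPercent(double testAllSeconds, double e2eSeconds) {
        return 100.0 * (testAllSeconds - e2eSeconds) / testAllSeconds;
    }

    public static void main(String[] args) {
        // commons-pool row: TESTAll and the three end-to-end times, in seconds.
        double testAll = 31183.825;
        String[] approaches = {"CORTS", "C2RTS", "STARTS"};
        double[] e2e = {26125.138, 26026.082, 24691.189};

        for (int i = 0; i < approaches.length; i++) {
            System.out.printf(Locale.ROOT, "%s: %.1f%% saved%n",
                    approaches[i], reductionPercent(testAll, e2e[i]));
        }
    }
}
```

For this subject, the sketch yields savings of roughly 16% for CORTS and C2RTS and roughly 21% for STARTS.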


other RTS phases and operations, e.g., computing and storing smart                         that fewer impacted test cases are missed, reducing safety violations.
checksums. For the end-to-end time, we reported the time taken by                          However, this comes at the expense of increased test suite size and
STARTS (i.e., E2ESTARTS time in Table 7) instead of the STARTS-like                        higher precision violations, as non-impacted test cases may also be
tool.                                                                                      selected. On the other hand, STARTS operates at the class level, and
     CORTS had the shortest selection phase time across all 12                             thus, achieves higher test suite reduction and precision, minimizing
subjects, followed by C2RTS, with STARTS-like taking the longest. For                      the selection of unnecessary test cases. However, this finer granularity
instance, in the JFreeChart project, CORTS took 2.235 s, C2RTS took                        can lead to higher safety violations compared to component-level RTS.
10.4 s, and class-level RTS took 38.09 s. By averaging the selection phase                 This is because static class-level RTS may miss dynamic dependencies
time across the 12 subjects, we found that the overall average selection                   related to reflection and dynamic class loading. In contrast, CORTS
phase time was 1.77 s for CORTS, 5.56 s for C2RTS, and 12.79 s for                         and C2RTS can account for such dependencies as they are explicitly
STARTS-like. This is because the dependency graphs for CORTS and C2RTS                     declared in the module-info.java files.
are smaller compared to the static class-level dependency graph. The                           RTS execution time. The dependency graph construction and
reported selection phase times suggest that CORTS and C2RTS scale                          analysis time for CORTS and C2RTS is significantly shorter than for
better for larger graphs, requiring less time to construct and analyze                     STARTS. This improvement is due to the smaller size of component-
dependency graphs for test case selection.                                                 level dependency graphs. However, the time spent on dependency
     All three RTS approaches, i.e., CORTS, C2RTS, and STARTS, reduced                     graph processing constitutes a minor fraction of the overall end-to-
the overall end-to-end testing time compared to the TESTAll baseline                       end testing time, which is dominated by test execution. Consequently,
across all 12 subjects. For example, in the commons-pool project,                          STARTS significantly outperformed CORTS and C2RTS in terms of the
which comes with long-running JUnit test cases, TESTAll took                               overall end-to-end testing time, as it obtained higher reduction in test
31,183 s for running all test cases of the subject, which is the total                     suite size and smaller precision violations.
time summed up over all 100 revisions of this subject, while the                               While dependency graph efficiency does not drastically impact total
overall end-to-end testing time was 26,125 s for CORTS, 26,026 s                           end-to-end testing time, it plays a crucial role in continuous integration
for C2RTS, and 24,691 s for STARTS. For all the subjects, STARTS had                       (CI) environments by enabling faster feedback cycles for developers.
the shortest end-to-end time due to its highest reduction in the test                      Rapid test selection allows for immediate identification of impacted
suite size, followed by C2RTS, while CORTS took the longest time. By                       tests, reducing delays in iterative development workflows.
averaging the end-to-end testing time across the 12 subjects, we found                         Scenarios for Class- versus Component-level RTS. The choice of
that the overall average time was 11,422.76 s for TESTAll, 10,264.17 s                     RTS approach depends on the application’s context and requirements.
for CORTS, 8436.34 s for C2RTS, and 7641.56 s for STARTS. Although                         For example, class-level RTS is preferable in resource-constrained en-
CORTS had the longest time, it still showed a reduction in the end-                        vironments where test execution cost and time are critical, e.g., mo-
to-end testing time compared to the TESTAll strategy, indicating its                       bile app development pipelines. It is also preferable for applications
practical value in regression testing. C2RTS achieved better results than                  with frequent but small changes where the likelihood of missing im-
CORTS in reducing the end-to-end time, showing that such a hybrid                          pacted test cases is minimal, such as utility libraries or microservices
RTS technique can strike a balance between component- and class-                           with isolated functionality. On the other hand, component-level RTS
level granularity: it reduces the regression testing time while                            (e.g., CORTS and C2RTS) can be preferable in safety-critical do-
still maintaining high safety and scalability, making it suitable for large                mains where ensuring comprehensive test coverage outweighs reducing
JPMS-based programs where a balance between performance and safety                         regression testing execution time, such as in component-based adaptive
is critical.                                                                               systems with fault tolerance mechanisms [35], aircraft [36], aerospace
                                                                                           or other safety-critical systems [37,38]. Additionally, component-level
4.8. Results discussion                                                                    RTS can be more appropriate for large-scale, monolithic enterprise
                                                                                           systems with complex interdependencies across components, and we
    Balancing metrics across RTS approaches. The evaluation re-                            plan to investigate this direction in the future.
sults highlight a trade-off between key regression test selection (RTS)
metrics: safety violation, precision violation, test suite reduction, and                  5. Threats to validity
end-to-end testing time. Specifically, STARTS outperforms CORTS and
C2RTS in terms of precision violation, test suite reduction, and test                         External validity. External validity affects the generalizability of
execution time, while CORTS and C2RTS achieve lower safety violation                       our results. One external validity threat is the use of 12 Java projects
rates.                                                                                     which might not be representative, so our results may not generalize.
    CORTS and C2RTS emphasize safety by employing a coarser granu-                         However, the selected subjects are widely used to evaluate RTS ap-
larity (component-level) in dependency analysis. This strategy ensures                     proaches [12,17–19,39], vary in size, application domain, and number



of test classes, which reduces this threat. Additionally, the results could        Each node in a CFG represents a simple or conditional statement, and
differ for larger Enterprise Resource Planning (ERP) systems, but we               each edge represents the flow of control between statements. Entities
anticipate that component-level RTS would be even more scalable and                affected by modifications are selected by traversing in parallel the CFGs
reusable in such cases compared to class-level RTS approaches. We plan             of P and P’, and when the target entities of like-labeled CFG edges in
to investigate this direction in the future.                                       P and P’ differ, the edge is added to the set of affected entities. There-
     Another threat to the external validity of our experimental results           after, Rothermel and Harrold extended the CFG-based algorithm for
is the use of the OO2CB tool, which uses the ACDC tool to determine                C++ using Inter-procedural Control-Flow Graphs (ICFG) [43]. Harrold
the components of OO Java projects. Therefore, OO2CB inherits the                  et al. [44] further extended the CFG approach for Java software using
shortcomings of ACDC in determining components of OO Java                          the Java Inter-class Graph (JIG) to handle Java features and incomplete
projects. Using other component recovery tools, such as Architecture               programs.
Recovery using Concerns (ARC) [31], Bunch [32], or Weighted Com-                       Vokolos and Frankl [45] consider RTS based on text differencing
bined Algorithm (WCA) [40], could change the RTS results. To reduce                using the Unix diff command. The approach compares the original
this threat, we leveraged the best-performing component recovery tool,             code version with the modified version to identify the modified state-
i.e., ACDC [33], as concluded in prior comparative studies of recovery             ments, and selects test cases that exercise code blocks containing these
techniques [30].                                                                   statements.
     Internal validity. An internal factor that can affect the outcome                 To improve the efficiency of dynamic RTS, a number of techniques
is the possible errors in the implementations of CORTS and C2RTS.                  at coarser granularity (e.g., method- or class-level) rather than the
To mitigate this threat, we built our implementation on mature tools               finer granularity CFG level (e.g., statement-level) were proposed. Ren
(i.e., JGraphT [41] and jdeps [27]) and tested it thoroughly.                      et al. [46] and Zhang et al. [47] applied change-impact analysis at the
     Another threat to internal validity is the use of the OO2CB tool [22]         method-level, based on call-graph techniques, to improve RTS. Recent
to generate the (JPMS) component-based projects from the subjects                  RTS approaches [12,18,19] were proposed to make RTS more cost-
according to the least-privilege security principle. The threat is related         effective in modern software systems by focusing on class-level RTS that
to the false positives caused by the static analysis of class dependencies.        (1) identifies changes at the class level and (2) computes dependencies
Static analysis results may overestimate the communications between                from test cases to the classes under test. Additionally, these approaches
components which might lead to granting a component more privileges                consider a test class as a test case, and thus, select test classes instead
than it needs in the resulting CB app. Consequently, this could impact
                                                                                   of test methods. Gligoric et al. [12] proposed Ekstazi, an approach that
the results of our experiments, especially the precision and reduction
                                                                                   tracks dynamic dependencies of test cases at the class level and selects
in test suite size. However, OO2CB uses the BCEL tool [42], a widely-
                                                                                   test cases that traverse modified classes. Ekstazi is a safe RTS approach,
used library in the industry to analyze Java apps, which mitigates this
                                                                                   and its safety is based on the formally proven safety of the change-based
threat. Furthermore, OO2CB defines all port types in the component-
                                                                                   RTS approach [48]. Zhang [19] proposed HyRTS, which is a dynamic
based applications except the ones responsible for Java reflection and
                                                                                   and hybrid approach that supports analyzing the adapted classes at
dynamic class loading techniques, i.e., the open and opens with
                                                                                   multiple granularity levels (i.e., method and class levels) to improve
ports. This limitation impacted the safety violation results yielded by
                                                                                   the precision and selection time. Running HyRTS using the class-level
CORTS and C2RTS. We expect that these results could be even better if
                                                                                   mode produces the same RTS results as Ekstazi [19]. Thus, HyRTS was
CORTS and C2RTS are applied to component-based applications that
                                                                                   not considered in our experimental evaluation.
utilize the opens with port to denote dependencies deriving from
                                                                                       While these dynamic RTS techniques can be safe, they require dy-
reflection and dynamic class loading among components. This area will
                                                                                   namic test coverage information which may be absent, costly to collect,
be a focus of our future research endeavors.
                                                                                   or require prohibitive instrumentation (e.g., for non-deterministic or
     Construct validity. In general, we could have used other metrics
                                                                                   real-time code). On the other hand, our proposed approaches, CORTS
(e.g., test coverage and fault detection ability) to evaluate the effec-
                                                                                   and C2RTS, are static but can still capture runtime information repre-
tiveness of CORTS and C2RTS. However, we used the most common
                                                                                   sented in the module descriptors (i.e., module-info.java file) such as de-
metrics in the research literature: safety violation, precision violation,
reduction in test suite size, and reduction in end-to-end test execution           pendencies related to dynamic class loading and reflection represented
time. We also used the reduction in dependency graph size as a measure             using the opens with directive.
to evaluate the scalability of CORTS and C2RTS to large subjects.                      Static RTS. Kung et al. [49,50], Hsia et al. [51], and White and Ab-
     Another threat to construct validity is that we chose Ekstazi as              dullah [25] proposed firewall-based approaches. The firewall contains
the ground truth against which to evaluate the static RTS techniques,              the changed classes and their dependent classes, where the dependent
e.g., we computed the safety and precision violations with respect to              classes are identified based on static analysis. Test cases that traverse
Ekstazi. Although Ekstazi is state-of-the-art and is recognized as                 classes in the firewall are selected. Jang et al. [52] apply firewall-
a leading and accessible dynamic class-level RTS tool, it might not                based RTS at the method level to C++ software. They identify firewalls
encompass the entirety of benchmarks for all RTS scenarios.                        around all the methods affected by a change and select all the test cases
     Conclusion validity. We only used 12 subjects to evaluate                     exercising these methods for regression testing. Ryder and Tip [53]
CORTS and C2RTS. The use of additional subjects could affect the                   proposed a call-graph-based static change-impact analysis technique
conclusions of the evaluation. To reduce this threat, we used large real-          and evaluated only one call-graph analysis on 20 revisions of one
world Java projects that have been used in other experiments for fair              project [54]. Skoglund and Runeson [48] proposed a change-based
comparison.                                                                        approach that only selects those test cases that exercise the changed
                                                                                   classes. ChEOPSJ [55,56] is a static change-based approach that uses
6. Related work                                                                    the FAMIX model to represent software entities including test cases
                                                                                   and building dependencies between them. These approaches use fine-
    RTS can reduce regression testing efforts and has been studied for             grained information such as constructor calls and method invocation
over three decades [14,15]. Below we summarize the existing dynamic                statements to build dependencies between software entities.
and static RTS approaches.                                                             Legunsen et al. [17,18] proposed STARTS, which is a static RTS
    Dynamic RTS. Many graph-walk approaches address the problem                    approach that is based on the idea of the class-level firewall. STARTS
of RTS. Rothermel and Harrold [2] propose a safe approach for RTS for              builds a dependency graph of program types based on compile-time
procedural programs. The algorithm uses control-flow graphs (CFG) to               information, and selects test cases that can reach changed types in the
represent each procedure in a program P and its modified version P’.               transitive closure of the dependency graph. Yu et al. [16] evaluated



method-level and class-level static RTS in continuous integration en-                   that component-level RTS can be more scalable for large-scale projects.
vironments. Class-level RTS was determined to be more practical and                     Additionally, C2RTS demonstrated better precision than CORTS, thus
time-saving than method-level RTS.                                                      balancing between safety and precision while still reducing the size of
    Gyori et al. [20] compared variants of dynamic and static class-                    the static dependency graph compared to static class-level RTS. Both
level RTS with project-level RTS in the Maven Central open source                       CORTS and C2RTS reduced the end-to-end testing time in comparison
ecosystem. An ecosystem may contain a large number of interconnected                    to running all test cases without performing RTS.
projects, where client projects transitively depend on library projects.                    We plan to extend the application of CORTS and C2RTS to large-
Project-level RTS identifies changes at the project level and computes                  scale enterprise Java systems. Furthermore, it is critical to acknowledge
dependencies from test cases to projects. When a library changes, then                  that the component recovery tools used in our experimental evalua-
all test cases in the library and all test cases in all the library’s transitive        tions, such as ACDC, do not possess the capability to detect dynamic
clients are selected. Class-level RTS was found to be less costly than                  dependencies, such as those involving reflection, and consequently, do
project-level RTS in terms of reduction in test suite size.                             not incorporate these dependencies into the module descriptor files.
    Shi et al. [8] focused on optimizing RTS in continuous integra-                     Moving forward, we plan to explore and experiment with alternative
tion (CI) environments. They compared module- and class-level RTS                       component recovery tools that can better capture dynamic dependen-
techniques in the Travis cloud-based CI environment, and developed                      cies and reflection. Additionally, we plan to explore the application
a hybrid RTS technique, called GIBstazi, that combines aspects of                       of our RTS approaches in the context of Java 9 modular applications
the module- and class-level RTS techniques. Their work focuses on                       when treating the modularized Java Runtime Environment (JRE) and
Maven modules (i.e., build-system modules) utilizing techniques like                    third-party libraries, along with their dependencies, as part of the CB
the Git Inferred Build (GIB) to optimize test selection based on module                 application. This investigation aims to evaluate the impact of reduced
dependencies determined by the build system (i.e., focusing on build-                   runtime size, e.g., only including the required modules of the JRE and
time dependencies). While the work of Shi et al. [8] is more aligned                    third-party libraries, on the RTS performance.
with multi-module Java applications structured with the Maven build
system, our approaches, CORTS and C2RTS, utilize JPMS modules that                      CRediT authorship contribution statement
emphasize dependencies (including runtime dependencies specified us-
ing the opens with directive) and encapsulation according to the                           Mohammed Al-Refai: Writing – review & editing, Writing – origi-
least privilege concept. Although we did not empirically compare the                    nal draft, Visualization, Validation, Supervision, Software, Resources,
precision of JPMS-module level RTS to build-system module level RTS,                    Project administration, Methodology, Investigation, Formal analysis,
we anticipate that the latter may be less precise in detecting affected                 Data curation, Conceptualization. Mahmoud M. Hammad: Writing –
tests due to the broader scope of build-time dependencies. On the                       review & editing, Visualization, Validation, Resources, Formal analysis.
other hand, JPMS-based RTS could potentially offer more precise and
safer test selection due to the explicit module dependencies and en-                    Declaration of competing interest
capsulation provided by JPMS. As future work, we plan to transform
multi-module Maven-based Java applications into their JPMS-utilizing                        The authors declare that they have no known competing finan-
counterparts to further evaluate the efficacy of CORTS and C2RTS in                     cial interests or personal relationships that could have appeared to
such environments.                                                                      influence the work reported in this paper.
    Overall, CORTS and C2RTS are similar to the described static RTS
approaches in terms of applying the firewall impact analysis tech-
                                                                                        References
nique, but at the module-level rather than the class- and method-levels.
However, unlike the existing static RTS techniques, our proposed ap-                     [1] A. Bertolino, Software testing research: Achievements, challenges, dreams,
proaches can capture runtime information that is explicitly included                         in: 2007 Future of Software Engineering, IEEE Computer Society, 2007,
in the module descriptor files.                                                              pp. 85–103.
                                                                                         [2] G. Rothermel, M.J. Harrold, A safe, efficient regression test selection technique,
                                                                                             ACM Trans. Softw. Eng. Methodol. 6 (2) (1997) 173–210.
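
To make the module-level firewall idea from the related-work discussion concrete, the following Java sketch marks every module that transitively depends on a changed module as affected, and then selects the test classes that reside in affected modules. It is a simplified illustration under stated assumptions, not the CORTS implementation: the class name, the module names, the dependsOn map (standing in for edges derived from module descriptors), and the test-to-module mapping are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ModuleFirewallSketch {

    /** Returns the changed modules plus every module that transitively depends on one. */
    static Set<String> affectedModules(Map<String, Set<String>> dependsOn, Set<String> changed) {
        // Invert the edges: for each module, record which modules depend on it.
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            for (String target : entry.getValue()) {
                dependents.computeIfAbsent(target, k -> new HashSet<>()).add(entry.getKey());
            }
        }
        // Worklist traversal from the changed modules along the inverted edges.
        Set<String> affected = new TreeSet<>(changed);
        Deque<String> worklist = new ArrayDeque<>(changed);
        while (!worklist.isEmpty()) {
            String module = worklist.pop();
            for (String dependent : dependents.getOrDefault(module, Set.of())) {
                if (affected.add(dependent)) {
                    worklist.push(dependent);
                }
            }
        }
        return affected;
    }

    /** Selects the test classes whose containing module is affected. */
    static Set<String> selectTests(Map<String, String> testToModule, Set<String> affected) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, String> entry : testToModule.entrySet()) {
            if (affected.contains(entry.getValue())) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical module graph: app -> core -> util, report -> util.
        Map<String, Set<String>> dependsOn = Map.of(
                "app", Set.of("core"),
                "core", Set.of("util"),
                "report", Set.of("util"));
        Map<String, String> testToModule = Map.of(
                "AppTest", "app", "CoreTest", "core",
                "UtilTest", "util", "ReportTest", "report");

        Set<String> affected = affectedModules(dependsOn, Set.of("core"));
        System.out.println(affected);                            // prints [app, core]
        System.out.println(selectTests(testToModule, affected)); // prints [AppTest, CoreTest]
    }
}
```

In this toy graph, a change to core leaves util and report outside the firewall, so UtilTest and ReportTest are not selected. A class-level analysis would refine the selection further within the affected modules.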
7. Conclusions and future work
                                                                                         [3] M.J. Harrold, Testing evolving software, J. Syst. Softw. 47 (2–3) (1999) 173–181.
                                                                                         [4] H.K.N. Leung, L.J. White, Insights into regression testing, in: Proceedings of
    As software systems become increasingly complex and large, espe-                         Conference on Software Maintenance, IEEE, Miami, FL, USA, 1989, pp. 60–69.
cially with the implementation of the Java Platform Module System                        [5] P.K. Chittimalli, M.J. Harrold, Recomputing coverage information to assist
(JPMS), traditional regression test selection (RTS) techniques at the                        regression testing, IEEE Trans. Softw. Eng. 35 (4) (2009) 452–469.
method and class levels often face challenges in efficiency and resource                 [6] E. Engström, P. Runeson, A qualitative survey of regression testing practices,
management. This research was driven by the desire to refine RTS                             in: International Conference on Product Focused Software Process Improvement,
for Java applications modularized with JPMS. This research leverages                         Springer, 2010, pp. 3–16.
                                                                                         [7] R. Greca, B. Miranda, A. Bertolino, State of practical applicability of regression
component-level granularity and provides a substantial foundation for
                                                                                             testing research: A live systematic literature review, ACM Comput. Surv. 55 (13s)
advancing RTS practices tailored to modern Java applications, pre-
                                                                                             (2023) 1–36.
senting a strong case for the adoption of component-level analysis in                    [8] A. Shi, P. Zhao, D. Marinov, Understanding and improving regression test
professional and large-scale development environments.                                       selection in continuous integration, in: 2019 IEEE 30th International Symposium
    We introduced two novel static component-based RTS approaches,                           on Software Reliability Engineering, ISSRE, IEEE, 2019, pp. 228–238.
CORTS and its variant C2RTS, tailored for component-based Java soft-                     [9] W. Sun, X. Xue, Y. Lu, J. Zhao, M. Sun, Hashc: Making deep learning coverage
ware systems modularized with JPMS. CORTS constructs a module-                               testing finer and faster, J. Syst. Archit. 144 (2023) 102999.
level dependency graph using architectural metadata from module                         [10] Y. Lu, K. Shao, J. Zhao, W. Sun, M. Sun, Mutation testing of unsupervised
descriptor files to determine the impact of changes and select relevant                      learning systems, J. Syst. Archit. 146 (2024) 103050.
                                                                                        [11] Testing at the speed and scale of Google, 2011, http://google-engtools.blogspot.
test cases. C2RTS extends this by incorporating class-level analysis for
                                                                                             com/2011/06/testing-at-speed-and-scale-of-google.html.
modified modules, offering a hybrid approach that balances granu-
                                                                                        [12] M. Gligoric, L. Eloussi, D. Marinov, Practical regression test selection with
larity to improve precision while maintaining safety. Our evaluation                         dynamic file dependencies, in: Proceedings of the 2015 International Symposium
of CORTS and C2RTS on real-world software systems demonstrated                               on Software Testing and Analysis, ISSTA’15, ACM, Baltimore, MD, USA, 2015,
improvements, in terms of safety, over static class-level RTS paradigms.                     pp. 211–222.
Additionally, both CORTS and C2RTS reduced the dependency graph                         [13] L.C. Briand, Y. Labiche, S. He, Automating regression test selection based on
size compared to static class-level RTS, thus providing evidence                             UML designs, J. Inf. Softw. Technol. 51 (1) (2009) 16–30.




Dr. Mohammed Al-Refai is an Assistant Professor in the Computer Science Department within the Computer and Information Technology School at the Jordan University of Science and Technology (JUST). Al-Refai’s research focuses on various areas within software engineering, including model-driven development, model-based testing, software architecture, software testing, regression test selection and prioritization, software security, and the integration of fuzzy logic and machine learning in software engineering applications. Al-Refai earned his Ph.D. in Computer Science from Colorado State University, Fort Collins, Colorado, under the supervision of Prof. Sudipto Ghosh. He also holds M.S. and B.S. in Computer Science from Jordan University of Science and Technology. Al-Refai is a member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).

Dr. Mahmoud Hammad is an Associate Professor in the Software Engineering Department within the Computer and Information Technology School at the Jordan University of Science and Technology (JUST). He is also the director of the Center for E-Learning and Open Educational Resources. Hammad’s research interests are in the field of software engineering, specifically in the area of software architecture, self-adaptive software systems, mobile computing, software analysis, software security, natural language processing and machine learning. Hammad received his Ph.D. in Software Engineering from the University of California, Irvine (UCI)




under the supervision of Prof. Sam Malek. During his Ph.D., Hammad developed a self-protecting Android software system that can monitor itself and adapt (change) its behavior at runtime to keep the system secure and protected from Inter-Component Communication attacks at all times. Hammad received his M.S. in Software Engineering from George Mason University, VA, USA and B.S. in Computer Science from Yarmouk University, Jordan. Hammad is a member of the Association for Computing Machinery (ACM), ACM Special Interest Group on Software Engineering (SIGSOFT), and the Institute of Electrical and Electronics Engineers (IEEE). https://hammadmahmoud.github.io/

