1092 lines
145 KiB
Plaintext
1092 lines
145 KiB
Plaintext
Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
Contents lists available at ScienceDirect
|
||
|
||
|
||
Journal of Systems Architecture
|
||
journal homepage: www.elsevier.com/locate/sysarc
|
||
|
||
|
||
|
||
|
||
An efficient string solver for string constraints with regex-counting and
|
||
string-length
|
||
Denghang Hu ∗, Zhilin Wu
|
||
Key Laboratory of System Software and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, University of Chinese
|
||
Academy of Sciences, Beijing, China
|
||
|
||
|
||
|
||
ARTICLE INFO ABSTRACT
|
||
|
||
Keywords: Regular expressions (regex for short) and string-length function are widely used in string-manipulating
|
||
String constraints programs. Counting is a frequently used feature in regexes that counts the number of matchings of sub-
|
||
String solver patterns. The state-of-the-art string solvers are incapable of solving string constraints with regex-counting and
|
||
Regular expression
|
||
string-length efficiently, especially when the counting and length bounds are large. In this work, we propose
|
||
String length
|
||
an automata-theoretic approach for solving such class of string constraints. The main idea is to symbolically
|
||
Counting operator
|
||
model the counting operators by registers in automata instead of unfolding them explicitly, thus alleviating the
|
||
state explosion problem. Moreover, the string-length function is modeled by a register as well. As a result, the
|
||
satisfiability of string constraints with regex-counting and string-length is reduced to the satisfiability of linear
|
||
integer arithmetic, which the off-the-shelf SMT solvers can then solve. To improve the performance further,
|
||
we also propose various optimization techniques. We implement the algorithms and validate our approach on
|
||
49,843 benchmark instances. The experimental results show that our approach can solve more instances than
|
||
the state-of-the-art solvers, at a comparable or faster speed, especially when the counting and length bounds are
|
||
large or when the counting operators are nested with some other counting operators or complement operators.
|
||
|
||
|
||
|
||
1. Introduction proposed to solve such string constraints efficiently (see e.g. [8,17]).
|
||
Counting (aka repetition) is a convenient feature in regexes that
|
||
The string data type plays a crucial role in modern programming counts the number of matchings of sub-patterns. For instance, 𝑎{2,4}
|
||
languages such as JavaScript, Python, Java, and PHP. String manipula- specifies that 𝑎 occurs at least twice and at most four times, and 𝑎{2,∞}
|
||
tions are error-prone and could even give rise to severe security vulner- specifies that 𝑎 occurs at least twice. Note that although the Kleene
|
||
abilities (e.g., cross-site scripting, aka XSS). One powerful method for
|
||
star and the Kleene plus operator are special cases of counting, i.e., 𝑒∗
|
||
identifying such bugs is symbolic execution. It analyzes the feasibility
|
||
is equivalent to 𝑒{0,∞} and 𝑒+ is equivalent to 𝑒{1,∞} , in the rest of this
|
||
of execution paths in a program by encoding the paths as logical
|
||
formulas and utilizing the constraint solvers (e.g. Z3 [1]) to solve the paper, for clarity, we use counting to denote expressions of the form
|
||
satisfiability of the formulas. Symbolic execution of string manipulating 𝑒{𝑚,𝑛} and 𝑒{𝑚,∞} , but not 𝑒∗ or 𝑒+ . Counting is a frequently used feature
|
||
programs has motivated the highly active research area of string con- of regexes. According to the statistics from [13], counting occurs in
|
||
straint solving, resulting in the development of numerous string solvers more than 20% of 1204 Python projects. Therefore, an efficient analysis
|
||
in the last decade, e.g., Z3seq [1], CVC4/5 [2,3], Z3str/2/3/4 [4– of string manipulating programs requires efficient solvers for string
|
||
7], Z3str3RE [8], Z3-Trau [9,10], OSTRICH [11], Slent [12], among constraints containing regexes with counting and string-length function
|
||
many others. Regular expressions (regex for short) and the string-length at least.
|
||
function are widely used in string-manipulating programs. According The aforementioned state-of-the-art string constraint solvers still
|
||
to the statistics from [13–15], regexes are used in about 30–40% of
|
||
suffer from such string constraints, especially when the counting and
|
||
Java, JavaScript, and Python software projects. Moreover, string-length
|
||
length bounds are large. For instance, none of the string solvers CVC5,
|
||
occupies 78% of the occurrences of string operations in 18 Javascript
|
||
applications, according to the statistics from [16]. As a result, most Z3seq, Z3-Trau, Z3str3, Z3str3RE, and OSTRICH is capable of solving
|
||
of the aforementioned string constraint solvers support both regexes the following constraint within 120 s,
|
||
and string-length function. Moreover, specialized algorithms have been 𝑥 ∈ (𝛴 ⧵ 𝑎){1,60} (𝛴 ⧵ 𝑏){1,60} (𝛴 ⧵ 𝑐){0,60} ∧ 𝑥 ∈ 𝛴 ∗ 𝑐 + ∧ |𝑥| > 120. (1)
|
||
|
||
|
||
∗ Corresponding author.
|
||
E-mail address: hudh@ios.ac.cn (D. Hu).
|
||
|
||
https://doi.org/10.1016/j.sysarc.2025.103340
|
||
Received 2 June 2024; Received in revised form 7 January 2025; Accepted 7 January 2025
|
||
Available online 17 January 2025
|
||
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
Intuitively, the constraint in (1) specifies that 𝑥 is a concatenation of and back references. Real-world regexes have been addressed in sym-
|
||
three strings 𝑥1 , 𝑥2 , 𝑥3 where 𝑎 (resp. 𝑏, 𝑐) does not occur in 𝑥1 (resp. bolic execution of Javascript programs [21] and string constraint
|
||
𝑥2 , 𝑥3 ), moreover, 𝑥 ends with a nonempty sequence of 𝑐’s, and the solving [22]. Nevertheless, the counting operators are still unfolded
|
||
length of 𝑥 is greater than 120. It is easy to observe that this constraint explicitly in [22]. The Trau tool focuses on string constraints involving
|
||
is unsatisfiable since on the one hand, |𝑥| > 120 and the counting upper flat regular languages and string-length function and solves them by
|
||
bound 60 in both (𝛴 ⧵ 𝑎){1,60} and (𝛴 ⧵ 𝑏){1,60} imply that 𝑥 must end computing LIA formulas that define the Parikh images of flat regular
|
||
with some character from 𝛴 ⧵ 𝑐, that is, a character different from 𝑐, languages [10]. The Slent tool solves the string constraints involving
|
||
and on the other hand, 𝑥 ∈ 𝛴 ∗ 𝑐 + requires that 𝑥 has to end with 𝑐. string-length function by encoding them into so-called length-encoded
|
||
A typical way for string constraint solvers to deal with regular automata, then utilizing symbolic model checkers to solve their non-
|
||
expressions with counting is to unfold them into those without counting emptiness problem [12]. However, neither Trau nor Slent supports
|
||
using the concatenation operator. For instance, 𝑎{1,4} is unfolded into counting operators explicitly, in other words, counting operators in
|
||
𝑎(𝜀 + 𝑎 + 𝑎𝑎 + 𝑎𝑎𝑎) and 𝑎{2,∞} is unfolded into 𝑎𝑎𝑎∗ . Since the unfolding regexes should be unfolded before solved by them. Cost registers in
|
||
incurs an exponential blow-up on the sizes of constraints (assuming that CEFAs are different from registers in (symbolic) register automata [23,
|
||
the counting in string constraints are encoded in binary), the unfolding 24]: In register automata, registers are used to store input values and
|
||
penalizes the performance of the solvers, especially when the length can only be compared for equality/inequality, while in CEFAs, cost
|
||
bounds are also available. registers are used to store integer-valued costs and can be updated by
|
||
Contribution. In this work, we focus on the class of string con- adding/subtracting integer constants and constrained by the accepting
|
||
straints with regular membership and string-length function, where the conditions which are LIA formulas. Therefore, cost registers in CEFAs
|
||
counting operators may occur (called RECL for brevity). We make the are more like counters in counter automata/machines [25], that is,
|
||
following contributions. CEFAs can be obtained from counter machines by removing transition
|
||
guards and adding accepting conditions. CEFAs can also be considered
|
||
• We propose an automata-theoretical approach for solving RECL as a variation of Parikh automata [26] in which the numbers in the
|
||
constraints. Our main idea is to represent the counting operators alphabet can be integers, instead of natural numbers. Counting-set
|
||
by cost registers in cost-enriched finite automata (CEFA, see Sec- automata were proposed in [27,28] to quickly match a subclass of
|
||
tion 5 for the definition), instead of unfolding them explicitly. The regexes with counting. Moreover, a variant of nondeterministic counter
|
||
string-length function is modeled by cost registers as well. The automata, called bit vector automata, was proposed recently in [29]
|
||
satisfiability of RECL constraints is reduced to the non-emptiness to enable fast matching of a more expressive class of regexes with
|
||
problem of CEFA w.r.t. a linear integer arithmetic (LIA) formula. counting. Nevertheless, the non-emptiness problem of these automata
|
||
According to the results from [18], an LIA formula can be com- was not considered, and it is unclear whether these automata models
|
||
puted to represent the potential values of registers in CEFA. Thus, can be used for solving string constraints with regex-counting and
|
||
the non-emptiness of CEFA w.r.t. LIA formulas can be reduced to string-length.
|
||
the satisfiability of LIA formulas, which is then tackled by off-the- Structure. The rest of this paper is structured as follows: Section 2 in-
|
||
shelf SMT solvers. Moreover, we propose techniques to reduce the troduces the preliminaries. Section 3 gives an overview of the approach
|
||
sizes (i.e., the number of states and transitions) of CEFA in order in this paper. Section 4 presents the syntax and semantics of RECL.
|
||
to achieve better performance. Section 5 defines CEFA. Section 6 introduces the algorithm to solve
|
||
• For complex RECL constraints where the counting operators may RECL constraints. Section 7 presents some heuristics for improving the
|
||
be nested with the other counting operators or complement oper- performance of solving RECL constraints where the counting operators
|
||
ators, we propose optimization techniques to improve the perfor- are nested with some other counting operators or complement opera-
|
||
mance. Furthermore, for the satisfiable RECL constraints where tors. Section 9 shows the experiment results. Section 10 concludes this
|
||
the solutions of the LIA formulas can be obtained, but the models paper.
|
||
of the original RECL constraint cannot be generated, we pro- A preliminary version of this paper has appeared in the proceedings
|
||
pose optimization techniques to accelerate the model generation of SETTA 2023 [30]. This paper extends the SETTA paper in the
|
||
process. following three aspects.
|
||
• We implement the algorithm on top of OSTRICH, resulting in a
|
||
• To improve the performance of complex RECL constraints where
|
||
string solver1 called OSTRICHRECL . Finally, we utilize a collec-
|
||
the counting operators are nested with some other counting op-
|
||
tion of benchmark suites2 comprising 49,843 problem instances
|
||
erators or complement operators, various optimizations are pro-
|
||
in total to evaluate the performance of OSTRICHRECL . The ex-
|
||
posed.
|
||
periment results show that OSTRICHRECL solves the RECL con-
|
||
• To deal with the issue that the satisfiability of LIA formulas can be
|
||
straints more efficiently than the state-of-the-art string solvers
|
||
solved, but models of the original RECL constraints cannot be gen-
|
||
(see Tables 1–5 for details). For instance, OSTRICHRECL solves
|
||
erated, we propose techniques to optimize the model generation
|
||
49,638 out of 49,843 instances, while the best state-of-the-art
|
||
process.
|
||
solver solve only 47,721 instances. We also do experiments to jus-
|
||
• The benchmarks are extended to cover the situations where the
|
||
tify some of our technical choices and evaluate the performance
|
||
counting operators are nested with some other counting opera-
|
||
of the optimizations tors or complement operators, and the experiments are extended
|
||
Related work. We discuss more related work on regexes with counting, accordingly.
|
||
string-length function, and automata with registers/counters. Deter-
|
||
minism of regexes with counting was investigated in [19,20]. Real- 2. Preliminaries
|
||
world regexes in programming languages include features beyond
|
||
classical regexes, e.g., the greedy/lazy Kleene star, capturing groups, We write N and Z for the sets of natural and integer numbers. For
|
||
𝑛 ∈ N with 𝑛 ≥ 1, [𝑛] denotes {1, … , 𝑛}; for 𝑚, 𝑛 ∈ N with 𝑚 ≤ 𝑛,
|
||
[𝑚, 𝑛] denotes {𝑖 ∈ N ∣ 𝑚 ≤ 𝑖 ≤ 𝑛}. Throughout the paper, 𝛴 is a finite
|
||
1
|
||
The solver is open source and available at https://github.com/ alphabet, ranged over 𝑎, 𝑏, ….
|
||
SimpleXiaohu/ostrich/tree/jsa2024 For a function 𝑓 from 𝑋 to 𝑌 and 𝑋 ′ ⊆ 𝑋, we use 𝗉𝗋𝗃𝑋 ′ (𝑓 ) to
|
||
2
|
||
The benchmarks are available at https://github.com/SimpleXiaohu/ denote the restriction (aka projection) of 𝑓 to 𝑋 ′ , that is, 𝗉𝗋𝗃𝑋 ′ (𝑓 ) is
|
||
ostrich/tree/jsa2024/zaligvinder/models/jsa2024. the function from 𝑋 ′ to 𝑌 , and 𝗉𝗋𝗃𝑋 ′ (𝑓 )(𝑥′ ) = 𝑓 (𝑥′ ) for each 𝑥′ ∈ 𝑋 ′ .
|
||
|
||
2
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
Table 1
|
||
Overall performance evaluation.
|
||
CVC5 Z3str3RE Z3str3 Z3seq Ostrich OSTRICHRECL
|
||
sat 28,282 28,449 23,322 28,212 25,975 29,084
|
||
unsat 16,809 19,316 12,742 18718 20,529 20,554
|
||
unknown 8 134 7021 203 162 14
|
||
timeout 4744 1944 6758 2710 3177 191
|
||
error 0 44 44 56 0 0
|
||
solved 45,091 47,721 36,020 46,874 46,504 49,638
|
||
average time
|
||
6.43 1.99 8.48 4.04 6.75 2.88
|
||
(s)
|
||
|
||
|
||
Table 2
|
||
Performance evaluation on the NestC benchmark suite.
|
||
CVC5 Z3str3RE Z3str3 Z3seq Ostrich OSTRICHRECL
|
||
sat 508 166 196 451 0 635
|
||
unsat 0 4 0 67 238 238
|
||
unknown 0 35 31 105 2 6
|
||
timeout 492 795 773 377 760 121
|
||
error 0 0 0 0 0 0
|
||
solved 508 170 196 518 238 873
|
||
average time
|
||
31.57 20.09 50.06 26.13 46.11 19.49
|
||
(s)
|
||
|
||
|
||
Table 3 𝐿2 , denoted by 𝐿1 ⋅ 𝐿2 , is defined as {𝑢 ⋅ 𝑣 ∣ 𝑢 ∈ 𝐿1 , 𝑣 ∈ 𝐿2 }. The union
|
||
Evaluation of the technical choices and optimizations in Section 6. (resp. intersection) of 𝐿1 and 𝐿2 , denoted by 𝐿1 ∪ 𝐿2 (resp. 𝐿1 ∩ 𝐿2 ), is
|
||
OSTRICHRECL
|
||
NUXMV
|
||
OSTRICHRECL
|
||
−ASR
|
||
OSTRICHRECL
|
||
defined as {𝑢 ∣ 𝑢 ∈ 𝐿1 or 𝑢 ∈ 𝐿2 } (resp. {𝑢 ∣ 𝑢 ∈ 𝐿1 and 𝑢 ∈ 𝐿2 }). The
|
||
sat 26,607 27,295 29,084
|
||
complement of 𝐿1 , denoted by 𝐿1 , is defined as {𝑢 ∣ 𝑢 ∈ 𝛴 ∗ , 𝑢 ∉ 𝐿1 }.
|
||
unsat 20,499 20,513 20,554
|
||
The difference of 𝐿1 and 𝐿2 , denoted by 𝐿1 ⧵ 𝐿2 , is defined as 𝐿1 ∩ 𝐿2 .
|
||
unknown 45 68 14
|
||
For a language 𝐿 and 𝑛 ∈ N, we define 𝐿𝑛 inductively as follows:
|
||
timeout 2,693 1,968 191 ⋃
|
||
error 0 0 0 𝐿0 = {𝜀}, 𝐿𝑛+1 = 𝐿 ⋅ 𝐿𝑛 for every 𝑛 ∈ N. Finally, define 𝐿∗ = 𝑛∈N 𝐿𝑛 .
|
||
solved 47,106 47,808 49,638 Finite automata. A (nondeterministic) finite automaton (NFA) is a tuple
|
||
average time (s) 6.86 4.76 2.88
|
||
= (𝑄, 𝛴 , 𝛿 , 𝐼 , 𝐹 ), where 𝑄 is a finite set of states, 𝛴 is a finite
|
||
alphabet, 𝛿 ⊆ 𝑄 × 𝛴 × 𝑄 is the transition relation, 𝐼 ⊆ 𝑄 is the set
|
||
Table 4 of initial states, and 𝐹 ⊆ 𝑄 is the set of final states. For readability, we
|
||
Evaluation of the optimizations in Section 7. 𝑎 𝑎
|
||
write a transition (𝑞 , 𝑎, 𝑞 ′ ) ∈ 𝛿 as 𝑞 ←←→
|
||
← 𝑞 ′ (or simply 𝑞 ←←→
|
||
← 𝑞 ′ ). The size
|
||
OSTRICHRECL
|
||
−NEST
|
||
OSTRICHRECL
|
||
−COMP
|
||
OSTRICHRECL 𝛿
|
||
sat 553 296 635
|
||
of an NFA , denoted by ||, is defined as the number of transitions
|
||
unsat 238 238 238 of . A run of on a string 𝑤 = 𝑎1 ⋯ 𝑎𝑛 is a sequence of transitions
|
||
𝑎1 𝑎𝑛
|
||
unknown 27 8 6 𝑞0 ←←←←→ ← 𝑞𝑛 such that 𝑞0 ∈ 𝐼. The run is accepting if 𝑞𝑛 ∈ 𝐹 . A
|
||
← 𝑞1 ⋯ 𝑞𝑛−1 ←←←←→
|
||
timeout 182 458 121 string 𝑤 is accepted by an NFA if there is an accepting run of on
|
||
error 0 0 0 𝑤. In particular, the empty string 𝜀 is accepted by if 𝐼 ∩ 𝐹 ≠ ∅. The
|
||
solved 791 534 873 language of , denoted by (), is the set of strings accepted by . An
|
||
average time (s) 19.95 37.49 19.49 NFA is said to be deterministic if 𝐼 is a singleton and, for every 𝑞 ∈ 𝑄
|
||
and 𝑎 ∈ 𝛴, there is at most one state 𝑞 ′ ∈ 𝑄 such that (𝑞 , 𝑎, 𝑞 ′ ) ∈ 𝛿. We
|
||
Table 5 use DFA to denote deterministic finite automata.
|
||
Evaluation of the optimizations for the model generation It is well-known that finite automata capture regular languages.
|
||
in Section 8. Moreover, the class of regular languages is closed under union, inter-
|
||
OSTRICHRECL
|
||
−MODEL
|
||
OSTRICHRECL
|
||
section, concatenation, Kleene star, complement, and language differ-
|
||
sat 28,735 29,084
|
||
ence [31].
|
||
unsat 20,554 20,554
|
||
Let 𝑤 ∈ 𝛴 ∗ . The Parikh image of 𝑤, denoted by 𝗉𝖺𝗋𝗂𝗄𝗁(𝑤), is defined
|
||
unknown 13 14
|
||
timeout 541 191
|
||
as the function 𝜂 ∶ 𝛴 → N such that 𝜂(𝑎) = |𝑤|𝑎 for each 𝑎 ∈ 𝛴.
|
||
error 0 0 The Parikh image of an NFA , denoted by 𝗉𝖺𝗋𝗂𝗄𝗁(), is defined as
|
||
{𝗉𝖺𝗋𝗂𝗄𝗁(𝑤) ∣ 𝑤 ∈ ()}.
|
||
solved 49,289 49,638
|
||
average time (s) 3.57 2.88 Linear integer arithmetic and Parikh images. We use standard existential
|
||
linear integer arithmetic (LIA) formulas, which typically range over
|
||
𝜓 , 𝜑, 𝛷, 𝛼. For a set X of variables, we use 𝜓∕𝜑∕𝛷∕𝛼(X) to denote the
|
||
set of existential LIA formulas whose free variables are from X. For
|
||
Strings and languages. A string over 𝛴 is a (possibly empty) sequence example, we use 𝜑(⃗x) with ⃗x = (x1 , … , x𝑘 ) to denote an LIA formula 𝜑
|
||
of elements from 𝛴, denoted by 𝑢, 𝑣, 𝑤, …. An empty string is denoted whose free variables are from {x1 , … , x𝑘 }. For an LIA formula 𝜑(⃗x), we
|
||
by 𝜀. We use 𝛴𝜀 to denote 𝛴 ∪ {𝜀}. We write 𝛴 ∗ (resp., 𝛴 + ) for the use 𝜑[⃗𝑡∕⃗x] to denote the formula obtained by replacing (simultaneously)
|
||
set of all (resp. nonempty) strings over 𝛴. For a string 𝑢, we use |𝑢| to x𝑖 with 𝑡𝑖 for every 𝑖 ∈ [𝑘] where ⃗x = (x1 , … , x𝑘 ) and ⃗𝑡 = (𝑡1 , … , 𝑡𝑘 ) are
|
||
denote the number of letters in 𝑢. In particular, |𝜀| = 0. Moreover, for tuples of integer terms.
|
||
𝑎 ∈ 𝛴, let |𝑢|𝑎 denote the number of occurrences of 𝑎 in 𝑢. Assume that At last, we recall the result about Parikh images of NFA. For each
|
||
𝑢 = 𝑎1 ⋯ 𝑎𝑛 is nonempty and 1 ≤ 𝑖 < 𝑗 ≤ 𝑛. We let 𝑢[𝑖] denote 𝑎𝑖 and 𝑎 ∈ 𝛴, let z𝑎 be an integer variable. Let Z𝛴 denote the set of integer
|
||
𝑢[𝑖, 𝑗] for the substring 𝑎𝑖 ⋯ 𝑎𝑗 . Let 𝑢, 𝑣 be two strings. We use 𝑢 ⋅ 𝑣 to variables z𝑎 for 𝑎 ∈ 𝛴. Let be an NFA over the alphabet 𝛴. Then
|
||
denote the concatenation of 𝑢 and 𝑣. A language 𝐿 over 𝛴 is a subset of we say that an LIA formula 𝜓(Z𝛴 ) defines the Parikh image of , if
|
||
strings. Let 𝐿1 , 𝐿2 be two languages. Then the concatenation of 𝐿1 and {𝜂 ∶ 𝛴 → N ∣ 𝜓[(𝜂(𝑎)∕z𝑎 )𝑎∈𝛴 ] is true} = 𝗉𝖺𝗋𝗂𝗄𝗁().
|
||
|
||
3
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
|
||
|
||
Fig. 1. CEFA for (𝛴 ⧵ 𝑎){1,60} (𝛴 ⧵ 𝑏){1,60} (𝛴 ⧵ 𝑐){0,60} , 𝛴 ∗ 𝑐 + , and |𝑥|.
|
||
|
||
|
||
|
||
|
||
Fig. 2. 1 ∩ 2 ∩ 3 : Intersection of 1 , 2 , and 3 .
|
||
|
||
|
||
Theorem 1 ([32]). Given an NFA , an existential LIA formula 𝜓 (Z𝛴 ) the LIA formula 𝜑 ≡ 𝑟4 > 120, that is, whether there exist 𝑤 ∈ 𝛴 ∗ and
|
||
can be computed in linear time that defines the Parikh image of . (𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 ) ∈ Z4 such that 𝑤 is accepted by , so that the resulting
|
||
registers values (𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 ) satisfy both 1 ≤ 𝑣1 ≤ 60 ∧ 1 ≤ 𝑣2 ≤ 60 ∧ 0 ≤
|
||
3. Overview 𝑣3 ≤ 60 and 𝜑. It is not hard to observe that the non-emptiness of
|
||
with respect to 𝜑 is independent of the characters of . Therefore,
|
||
In this section, we utilize the string constraint in Eq. (1) to illustrate the characters in can be ignored, resulting into an NFA over the
|
||
the approach in our work. alphabet C, where C is the set of vectors from Z4 occurring in the
|
||
At first, we construct a CEFA for the regular expression (𝛴 ⧵ transitions of (see Fig. 3(a)). Then the original problem is reduced
|
||
𝑎){1,60} (𝛴 ⧵ 𝑏){1,60} (𝛴 ⧵ 𝑐){0,60} . Three registers are introduced, say to the problem of deciding whether there exists a string 𝑤′ ∈ C∗ that
|
||
𝑟1 , 𝑟2 , 𝑟3 , to represent the three counting operators; the nondeterministic is accepted by and its Parikh image (i.e., numbers of occurrences of
|
||
finite automaton (NFA) for (𝛴 ⧵ 𝑎)∗ (𝛴 ⧵ 𝑏)∗ (𝛴 ⧵ 𝑐)∗ is constructed; characters), say 𝜂𝑤′ ∶ C → N, satisfies 1 ≤ 𝑣′1 ≤ 60 ∧ 1 ≤ 𝑣′2 ≤ 60 ∧ 0 ≤
|
||
∑
|
||
the updates of registers are added to the transitions of the NFA; the 𝑣′3 ≤ 60 ∧ 𝑣′4 > 120, where (𝑣′1 , 𝑣′2 , 𝑣′3 , 𝑣′4 ) = 𝑣∈C
|
||
⃗ ⃗ 𝑣⃗ for each 𝑣⃗ ∈ C.
|
||
𝜂𝑤′ (𝑣)
|
||
′ ′ ′ ′
|
||
Intuitively, (𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 ) is a weighted sum of vectors 𝑣⃗ ∈ C, where the
|
||
counting bounds are specified by the accepting condition 1 ≤ 𝑟1 ≤
|
||
60 ∧ 1 ≤ 𝑟2 ≤ 60 ∧ 0 ≤ 𝑟3 ≤ 60, resulting in a CEFA 1 illustrated weight is the number of occurrences of 𝑣⃗ in 𝑤′ . (See Section 6.2 for
|
||
in Fig. 1(a). 𝑟1 + + means that we increment the value of 𝑟1 by one more detailed arguments.)
|
||
after running the transition. A string 𝑤 is accepted by 1 if, when Let C = {𝑣⃗1 , … , 𝑣⃗𝑚 }. From the results in [32,33], an existential LIA
|
||
reading the characters in 𝑤, 1 applies the transitions to update the formula 𝜓 (x1 , … , x𝑚 ) can be computed to define the Parikh image of
|
||
state and the values of registers, reaching a final state 𝑞 in the end, strings that are accepted by , where x1 , … , x𝑚 are the integer variables
|
||
and the resulting values of the three registers, say 𝑣1 , 𝑣2 , 𝑣3 , satisfy the to denote the number of occurrences of 𝑣⃗1 , … , 𝑣⃗𝑚 . Therefore, the satis-
|
||
accepting condition. In addition, we construct other two CEFAs 2 for fiability of the string constraint in (1) is reduced to the satisfiability of
|
||
𝛴 ∗ 𝑐 + (see Fig. 1(b)) and 3 for string length function (see Fig. 1(c)). the following existential LIA formula,
|
||
⋀ ∑
|
||
In 3 , a register 𝑟4 is used to denote the length of strings and the 𝜓 (x1 , … , x𝑚 ) ∧ 1≤𝑗≤4 𝑟𝑗 = 1≤𝑘≤𝑚 x𝑘 𝑣𝑘,𝑗 ∧
|
||
accepting condition is 𝚝𝚛𝚞𝚎 (See Section 6.1 for more details about the (2)
|
||
1 ≤ 𝑟1 ≤ 60 ∧ 1 ≤ 𝑟2 ≤ 60 ∧ 0 ≤ 𝑟3 ≤ 60 ∧ 𝑟4 > 120.
|
||
construction of CEFA.) Note that we represent the counting operators
|
||
symbolically by registers instead of unfolding them explicitly. which can be solved by the off-the-shelf SMT solvers.
|
||
Next, we construct 1 ∩ 2 ∩ 3 , that is, the intersection (aka Nevertheless, when the original regexes are complicated (e.g. con-
|
||
product) of 1 , 2 , and 3 , as illustrated in Fig. 2(a), where the states tain occurrences of negation or intersection operators), the sizes of
|
||
that cannot reach the final states are removed. For technical conve- the NFA can still be big and the sizes of the LIA formulas defining
|
||
nience, we also think of the updates of registers in transitions as vectors the Parikh image of are also big. Since the satisfiability of LIA
|
||
(𝑢1 , 𝑢2 , 𝑢3 , 𝑢4 ), where 𝑢𝑖 ∈ Z is the update on the register 𝑟𝑖 for each 𝑖 ∈ formulas is an NP-complete problem [34], big sizes of LIA formulas
|
||
[4]. For instance, the transitions corresponding to the self-loop around would be a bottleneck of the performance. To tackle this issue, we
|
||
(𝑝0 , 𝑞0 , 𝑞0′ ) are thought as ((𝑝0 , 𝑞0 , 𝑞0′ ), 𝑎′ , (𝑝0 , 𝑞0 , 𝑞0′ ), (1, 0, 0, 1)) with 𝑎′ ∈ propose techniques to reduce the sizes of the NFA .
|
||
𝛴 ⧵ {𝑎}, since 𝑟1 and 𝑟4 are incremented by one in these transitions. Specifically, to reduce the sizes of , we determinize , and apply
|
||
After considering the updates of registers as vectors, the CEFA is like the minimization algorithm to the resulting deterministic finite au-
|
||
Fig. 2(b). tomaton (DFA), resulting in a DFA , as illustrated in Fig. 3(b). Note
|
||
Finally, the satisfiability of the original string constraint is reduced that contains only three states 𝑞0′′ , 𝑞1′′ , 𝑞2′′ and six transitions, while
|
||
to the non-emptiness of the CEFA ≡ 1 ∩ 2 ∩ 3 with respect to contains four states and eight transitions. Furthermore, if contains
|
||
|
||
4
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
|
||
|
||
Fig. 3. Reduced automaton and .
|
||
|
||
|
||
|
||
|
||
Fig. 4. Syntax of RECL.
|
||
|
||
|
||
⃗
|
||
0-labeled transitions, then we can take these transitions as 𝜖-transitions 𝜃 ⊧ 𝜑, if one of the following holds: (1) 𝜑 ≡ 𝑥 ∈ 𝑒 and 𝜃(𝑥) ∈ (𝑒), (2)
|
||
and potentially reduce the sizes of automata further. 𝜑 ≡ 𝑡1 𝑜 𝑡2 and 𝜃(𝑡1 ) 𝑜 𝜃(𝑡2 ), (3) 𝜑 ≡ 𝜑1 ∧ 𝜑2 and 𝜃 ⊧ 𝜑1 and 𝜃 ⊧ 𝜑2 .
|
||
We implement all the aforementioned techniques on the top of OS- A RECL formula 𝜑 is satisfiable if there is a mapping 𝜃 such as 𝜃 ⊧ 𝜑.
|
||
TRICH, resulting in a solver OSTRICHRECL . It turns out that The satisfiability problem for RECL (which is abbreviated as 𝖲𝖠𝖳𝖱𝖤𝖢𝖫 )
|
||
OSTRICHRECL is able to solve the string constraint in (1) within one is deciding whether a given RECL formula 𝜑 is satisfiable.
|
||
second, while the state-of-the-art string solvers are incapable of solving
|
||
it within 120 s. 5. Cost-enriched finite automata
|
||
|
||
4. String constraints with regex-counting and string-length In this section, we define cost-enriched finite state automata (CEFA),
|
||
which was introduced in [18] and will be used to solve the satisfiability
|
||
In the sequel, we define the string constraints with regex-counting problem of RECL later on. Intuitively, CEFA adds write-only cost regis-
|
||
and string-length functions, i.e., REgex-Counting Logic (abbreviated as ters to finite state automata. ‘‘write-only’’ means that the cost registers
|
||
RECL). The syntax of RECL is defined by the rules in Fig. 4, where 𝑥 is a can only be written/updated but cannot be read, i.e., they cannot be
|
||
string variable, x is an integer variable, 𝑎 is a character from an alphabet used to guard the transitions.
|
||
𝛴, and 𝑚, 𝑛 are integer constants. A RECL formula 𝜑 is a conjunction
|
||
of atomic formulas of the form 𝑥 ∈ 𝑒 or 𝑡1 𝑜 𝑡2 , where 𝑒 is a regular Definition 1 (Cost-Enriched Finite Automaton). A cost-enriched finite
|
||
expression, 𝑡1 and 𝑡2 are integer terms, and 𝑜 ∈ {=, ≠, ≤, ≥, <, >}. Atomic automaton is a tuple (𝑅, 𝑄, 𝛴 , 𝛿 , 𝐼 , 𝐹 , 𝛼) where
|
||
formulas of the form 𝑥 ∈ 𝑒 are called regular membership constraints,
|
||
• 𝑅 = {𝑟1 , … , 𝑟𝑘 } is a finite set of registers,
|
||
and atomic formulas of the form 𝑡1 𝑜 𝑡2 are called length constraints.
|
||
A regular expression 𝑒 is built from ∅, the empty string 𝜖, and the • 𝑄, 𝛴 , 𝐼 , 𝐹 are as in the definition of NFA,
|
||
character 𝑎 by using concatenation ⋅, union +, Kleene star ∗ , intersection • 𝛿 ⊆ 𝑄 × 𝛴 × 𝑄 × Z𝑅 is a transition relation, where Z𝑅 denotes the
|
||
∩, complement ̄, difference ⧵, counting {𝑚,𝑛} or {𝑚,∞} . An integer term updates on the values of registers.
|
||
is built from constants 𝑛, variables x, and string lengths |𝑥| by operators • 𝛼 ∈ 𝛷(𝑅) is an LIA formula specifying an accepting condition.
|
||
+ and −.
|
||
For convenience, we use 𝑅 to denote the set of registers of .
|
||
Moreover, for 𝑆 ⊆ 𝛴 with 𝑆 = {𝑎1 , … , 𝑎𝑘 }, we use 𝑆 as an
|
||
We assume a linear order on 𝑅 and write 𝑅 as a vector (𝑟1 , … , 𝑟𝑘 ).
|
||
abbreviation of 𝑎1 + ⋯ + 𝑎𝑘 .
|
||
Accordingly, we write an element of Z𝑅 as a vector (𝑣1 , … , 𝑣𝑘 ), where
|
||
For each regular expression 𝑒, the language defined by 𝑒, denoted
|
||
𝑣𝑖 is the update of 𝑟𝑖 for each 𝑖 ∈ [𝑘]. We also write a transition
|
||
by (𝑒), is defined recursively. For instance, (∅) = ∅, (𝜀) = {𝜀}, 𝑎
|
||
(𝑞 , 𝑎, 𝑞 ′ , 𝑣) ← 𝑞′ .
|
||
⃗ ∈ 𝛿 as 𝑞 ←←→
|
||
(𝑎) = {𝑎}, and (𝑒1 ⋅ 𝑒2 ) = (𝑒1 ) ⋅ (𝑒2 ), (𝑒1 + 𝑒2 ) = (𝑒1 ) ∪ (𝑒2 ), 𝑣⃗
|
||
and so on. It is well-known that regular expressions define the same The semantics of CEFA is defined as follows. Let = (𝑅, 𝑄, 𝛴 , 𝛿 , 𝐼 ,
|
||
class of languages as finite state automata, that is, the class of regular 𝐹 , 𝛼) be a CEFA. A run of on a string 𝑤 = 𝑎1 ⋯ 𝑎𝑛 is a sequence
|
||
𝑎1 𝑎𝑛 𝑎𝑖
|
||
languages [31]. 𝑞0 ←←←←→ ← 𝑞𝑛 such that 𝑞0 ∈ 𝐼 and 𝑞𝑖−1 ←←←←→
|
||
← 𝑞1 ⋯ 𝑞𝑛−1 ←←←←→ ← 𝑞𝑖 for each 𝑖 ∈ [𝑛].
|
||
𝑣⃗1 𝑣⃗𝑛 𝑣⃗𝑖
|
||
Let 𝜑 be a RECL formula and 𝖲𝖵𝖺𝗋𝗌(𝜑) (resp. 𝖨𝖵𝖺𝗋𝗌(𝜑)) denote the set 𝑎 1 𝑎𝑛
|
||
of string (resp. integer) variables occurring in 𝜑. Then the semantics of A run 𝑞0 ←←←←→← 𝑞1 ⋯ 𝑞𝑛−1 ←←←←→← 𝑞𝑛 is accepting if 𝑞𝑛 ∈ 𝐹 and 𝛼(𝑣⃗′ ∕𝑅) is true,
|
||
𝑣⃗1 𝑣⃗𝑛
|
||
𝜑 is defined with respect to a mapping 𝜃 ∶ 𝖲𝖵𝖺𝗋𝗌(𝜑) → 𝛴 ∗ ⊎𝖨𝖵𝖺𝗋𝗌(𝜑) → Z ∑
|
||
where 𝑣⃗′ = 𝑗∈[𝑛] 𝑣⃗𝑗 . The string 𝑤 is 𝑎𝑐 𝑐 𝑒𝑝𝑡𝑒𝑑 by the CEFA if and only
|
||
(where ⊎ denotes the disjoint union). Note that the mapping 𝜃 can ∑
|
||
if there is an accepting run of 𝑤. The vector 𝑣⃗′ = 𝑗∈[𝑛] 𝑣⃗𝑗 is called the
|
||
naturally extend to the set of integer terms. For instance, 𝜃(|𝑥|) = |𝜃(𝑥)|, 𝑎1 𝑎𝑛
|
||
𝜃(𝑡1 + 𝑡2 ) = 𝜃(𝑡1 ) + 𝜃(𝑡2 ). A mapping 𝜃 is said to satisfy 𝜑, denoted by cost of an accepting run 𝑞0 ←←←←→ ← 𝑞𝑛 . Note that the values of
|
||
← 𝑞1 ⋯ 𝑞𝑛−1 ←←←←→
|
||
𝑣⃗1 𝑣⃗𝑛
|
||
|
||
|
||
5
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
∑
|
||
all registers are initiated to zero and updated to 𝑗∈[𝑛] 𝑣⃗𝑗 after all the
|
||
From the construction,
|
||
transitions in the run are executed. We use 𝑣⃗′ ∈ (𝑤) to denote the fact
|
||
that there is an accepting run of on 𝑤 whose cost is 𝑣⃗′ . We define (1 ∩ 2 ) = {(𝑤; 𝑣) ⃗ ∈ (1 ) and (𝑤; 𝗉𝗋𝗃𝑅2 (𝑣))
|
||
⃗ ∣ (𝑤; 𝗉𝗋𝗃𝑅1 (𝑣)) ⃗ ∈ (2 )}.
|
||
the semantics of a CEFA , denoted by (), as {(𝑤; 𝑣⃗′ ) ∣ 𝑣⃗′ ∈ (𝑤)}. Intuitively, 1 ∩ 2 accepts the words that are accepted by both 1
|
||
In particular, if 𝐼 ∩ 𝐹 ≠ ∅ and 𝛼[0∕𝑅] ⃗ is true, then (𝜀; 0)⃗ ∈ ().
|
||
and 2 and outputs the costs of 1 and 2 .
|
||
Moreover, we define the output of a CEFA , denoted by (), as The concatenation of 1 and 2 , denoted by 1 ⋅ 2 , is defined
|
||
{𝑣⃗′ ∣ ∃𝑤. 𝑣⃗′ ∈ (𝑤)}. similarly as that of NFA, that is, a tuple (𝑅′ , 𝑄′ , 𝛴 , 𝛿 ′ , 𝐼 ′ , 𝐹 ′ , 𝛼 ′ ), where
|
||
We want to remark that the definition of CEFA above is slightly 𝑅′ = 𝑅1 ∪ 𝑅2 , 𝑄′ = 𝑄1 ∪ 𝑄2 , 𝐼 ′ = 𝐼1 , 𝛼 ′ = 𝛼1 ∧ 𝛼2 , 𝛿 ′ =
|
||
different from that of [18], where CEFA did not include accepting ⃗ ∣ (𝑞1 , 𝑎, 𝑞 ′ , 𝑣⃗1 ) ∈ 𝛿1 } ∪ {(𝑞2 , 𝑎, 𝑞 ′ , (0,
|
||
{(𝑞1 , 𝑎, 𝑞1′ , (𝑣⃗1 , 0)) ⃗ 𝑣⃗2 )) ∣ (𝑞2 , 𝑎, 𝑞 ′ , 𝑣⃗2 ) ∈
|
||
1 2 2
|
||
conditions. Moreover, the accepting conditions 𝛼 in CEFA are defined 𝛿2 } ∪ {(𝑞1 , 𝑎, 𝑞2 , (0,⃗ 𝑣⃗2 )) ∣ 𝑞1 ∈ 𝐹1 , ∃𝑞 ′ ∈ 𝐼2 . (𝑞 ′ , 𝑎, 𝑞2 , 𝑣⃗2 ) ∈ 𝛿2 }, moreover, if
|
||
in a global fashion because the accepting condition does not distinguish 𝐼2 ∩𝐹2 ≠ ∅, then 𝐹 ′ = 𝐹1 ∪𝐹2 , otherwise, 𝐹 ′ = 𝐹2 . From the construction,
|
||
final states. This technical choice is made so that the determinization
|
||
and minimization of NFA can be utilized to reduce the size of CEFA in (1 ⋅2 ) = {(𝑤1 𝑤2 ; 𝑣) ⃗ ∈ (1 ) and (𝑤2 ; 𝗉𝗋𝗃𝑅2 (𝑣))
|
||
⃗ ∣ (𝑤1 ; 𝗉𝗋𝗃𝑅1 (𝑣)) ⃗ ∈ (2 )}.
|
||
the next section.
|
||
In the sequel, we define three CEFA operations: union, intersection, Furthermore, the union, intersection, and concatenation operations
|
||
and concatenation. The following section will use these three oper- can be extended naturally to multiple CEFA, that is, 1 ∪ ⋯ ∪ 𝑛 ,
|
||
ations to solve RECL constraints. Note that the union, intersection, 1 ∩ ⋯ ∩ 𝑛 , 1 ⋅ ⋯ ⋅ 𝑛 . For instance, 1 ∪ 2 ∪ 3 = (1 ∪ 2 ) ∪ 3 ,
|
||
and concatenation operations are defined in a slightly more involved 1 ∩ 2 ∩ 3 = (1 ∩ 2 ) ∩ 3 , and 1 ⋅ 2 ⋅ 3 = (1 ⋅ 2 ) ⋅ 3 .
|
||
manner than register automata [23] and counter automata [19], as a
|
||
result of the (additional) accepting conditions. 6. Solving RECL constraints
|
||
Let 1 = (𝑅1 , 𝑄1 , 𝛴 , 𝛿1 , 𝐼1 , 𝐹1 , 𝛼1 ) and 2 = (𝑅2 , 𝑄2 , 𝛴 , 𝛿2 , 𝐼2 , 𝐹2 , 𝛼2 )
|
||
be two CEFA that share the alphabet. Moreover, suppose that 𝑅1 ∩𝑅2 = The goal of this section is to show how to solve RECL constraints by
|
||
∅ and 𝑄1 ∩ 𝑄2 = ∅. utilizing CEFA. At first, we reduce the satisfiability of RECL constraints
|
||
The union of 1 and 2 is denoted by 1 ∪ 2 . Two fresh auxiliary to a decision problem defined in the sequel. Then we propose a decision
|
||
registers say 𝑟′1 , 𝑟′2 ∉ 𝑅1 ∪ 𝑅2 , are introduced so that the accepting procedure for this problem.
|
||
condition knows whether a run is from 1 (or 2 ). Note that the
|
||
fresh auxiliary registers are introduced to distinguish whether 1 or Definition 2 (𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠)). Let 𝑥1 , … , 𝑥𝑛 be string variables, 𝛬𝑥1 , … ,
|
||
2 accepts and weave the two accepting conditions of 1 and 2 into 𝛬𝑥𝑛 be nonempty sets of CEFA over the alphabet 𝛴 with 𝛬𝑥𝑖 =
|
||
one accepting condition. Specifically, 1 ∪ 2 = (𝑅′ , 𝑄′ , 𝛴 , 𝛿 ′ , 𝐼 ′ , 𝐹 ′ , 𝛼 ′ ) {𝑖,1 , … , 𝑖,𝑙𝑖 } for every 𝑖 ∈ [𝑛] where the sets of registers 𝑅1,1 , … ,
|
||
where 𝑅1,𝑙 , … , 𝑅𝑛,1 , … , 𝑅𝑛,𝑙 are mutually disjoint, moreover, let 𝜑 be
|
||
1 𝑛 ⋃
|
||
an LIA formula whose free variables are from 𝑖∈[𝑛],𝑗∈[𝑙𝑖 ] 𝑅𝑖,𝑗 . Then
|
||
• 𝑅′ = 𝑅1 ∪ 𝑅2 ∪ {𝑟′1 , 𝑟′2 }, 𝑄′ = 𝑄1 ∪ 𝑄2 ∪ {𝑞0′ } with 𝑞0′ ∉ 𝑄1 ∪ 𝑄2 , the CEFA in 𝛬𝑥1 , … , 𝛬𝑥𝑛 are said to be nonempty w.r.t. 𝜑 if there
|
||
𝐼 ′ = {𝑞0′ }, are assignments 𝜃 ∶ {𝑥1 , … , 𝑥𝑛 } → 𝛴 ∗ and vectors 𝑣⃗𝑖,𝑗 such that
|
||
• 𝛿 ′ is the union of four transitions sets: (𝜃(𝑥𝑖 ); 𝑣⃗𝑖,𝑗 ) ∈ (𝑖,𝑗 ) and 𝜑[(𝑣⃗𝑖,𝑗 ∕𝑅𝑖,𝑗 )] is true, for every 𝑖 ∈ [𝑛], 𝑗 ∈ [𝑙𝑖 ].
|
||
⃗ 1, 0)) ∣ ∃𝑞1 ∈ 𝐼1 . (𝑞1 , 𝑎, 𝑞 ′ , 𝑣⃗1 ) ∈ 𝛿1 }
|
||
– {(𝑞0′ , 𝑎, 𝑞1′ , (𝑣⃗1 , 0, 1
|
||
′ ⃗
|
||
– {(𝑞 , 𝑎, 𝑞 , (0, 𝑣⃗2 , 0, 1)) ∣ ∃𝑞2 ∈ 𝐼2 . (𝑞2 , 𝑎, 𝑞 ′ , 𝑣⃗2 ) ∈ 𝛿2 }
|
||
′ Proposition 1 ([18]). 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) is PSPACE-complete.
|
||
0 2 2
|
||
⃗ 0, 0)) ∣ 𝑞1 ∉ 𝐼1 . (𝑞1 , 𝑎, 𝑞 ′ , 𝑣⃗1 ) ∈ 𝛿1 }
|
||
– {(𝑞1 , 𝑎, 𝑞1′ , (𝑣⃗1 , 0, Note that the decision procedure in [18] was only used to prove the
|
||
1
|
||
⃗ 𝑣⃗2 , 0, 0)) ∣ 𝑞2 ∉ 𝐼2 . (𝑞2 , 𝑎, 𝑞 ′ , 𝑣⃗2 ) ∈ 𝛿2 }
|
||
– {(𝑞2 , 𝑎, 𝑞2′ , (0, upper bound in Proposition 1 and not implemented as a matter of fact.
|
||
2
|
||
Instead, the symbolic model checker nuXmv [35] was used to solve
|
||
⃗ 1, 0) is a vector that updates 𝑅1 by 𝑣⃗1 , updates 𝑅2 by
|
||
where (𝑣⃗1 , 0, 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠). We do not rely on nuXmv in this work and shall propose
|
||
⃗ and updates 𝑟′ , 𝑟′ by 1,0 respectively. Similarly for (0,
|
||
0, ⃗ 𝑣⃗2 , 0, 1), a new algorithm for solving 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) in Section 6.2.
|
||
1 2
|
||
and so on.
|
||
• 𝐹 ′ and 𝛼 ′ are defined as follows, 6.1. From 𝖲𝖠𝖳𝖱𝖤𝖢𝖫 to 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠)
|
||
|
||
⃗ belongs to (1 ) or (2 ), i.e., one of the two
|
||
– if (𝜖; 0) Let 𝜑 be a RECL constraint and 𝑥1 , … , 𝑥𝑛 be an enumeration of the
|
||
automata accepts the empty string 𝜖, then 𝐹 ′ = 𝐹1 ∪𝐹2 ∪ {𝑞0′ } string variables occurring in 𝜑. Moreover, let 𝜑 ≡ 𝜑1 ∧ 𝜑2 such that
|
||
and 𝛼 ′ = (𝑟′1 = 0 ∧ 𝑟′2 = 0) ∨ (𝑟′1 = 1 ∧ 𝛼1 ) ∨ (𝑟′2 = 1 ∧ 𝛼2 ), 𝜑1 is a conjunction of regular membership constraints of 𝜑, and 𝜑2 is a
|
||
– otherwise, 𝐹 ′ = 𝐹1 ∪ 𝐹2 and 𝛼 ′ = (𝑟′1 = 1 ∧ 𝛼1 ) ∨ (𝑟′2 = 1 ∧ 𝛼2 ). conjunction of length constraints of 𝜑. We shall reduce the satisfiability
|
||
of 𝜑 to an instance of 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠).
|
||
At first, we show how to construct a CEFA from a regex where
|
||
From the construction, we know that counting operators may occur. Our approach largely follows Thomp-
|
||
⎧ | (𝑤; 𝗉𝗋𝗃 (𝑣)) ⎫
|
||
| 𝑅1 ⃗ ∈ (1 ) and 𝗉𝗋𝗃𝑟′1 (𝑣) ⃗ = 1, or son’s construction [36] but we avoid using the 𝜖-transitions. Similar
|
||
⎪ | ⎪
|
||
⃗ || (𝑤; 𝗉𝗋𝗃𝑅2 (𝑣))
|
||
(1 ∪ 2 ) = ⎨(𝑤; 𝑣) ⃗ ∈ (2 ) and 𝗉𝗋𝗃𝑟′ (𝑣) ⃗ = 1, or ⎬. ideas of representing counting operators as counters in automata have
|
||
⎪ | 2
|
||
⎪ appeared in e.g. [27].
|
||
⎩ | (𝑤; ⃗
|
||
𝑣) = (𝜖 , ⃗
|
||
0) if or accepts 𝜖 ⎭
|
||
| 1 2
|
||
Let us start with register-representable regexes defined in the sequel.
|
||
Intuitively, 1 ∪ 2 accepts the words that are accepted by one of the Let us fix an alphabet 𝛴.
|
||
1 and 2 and outputs the costs of the corresponding automaton. Note Let 𝑒 be a regex over 𝛴. Then an occurrence of counting operators in
|
||
that an integer vector 𝑣⃗1 in 1 , is copied at most twice in 1 ∪ 2 , 𝑒, say (𝑒′ ){𝑚,𝑛} (or (𝑒′ ){𝑚,∞} ), is said to be register-representable if (𝑒′ ){𝑚,𝑛}
|
||
⃗ 0, 0) and (𝑣⃗1 , 0,
|
||
namely, (𝑣⃗1 , 0, ⃗ 1, 0). Similarly for the integer vectors in (or (𝑒′ ){𝑚,∞} ) is not in the scope of a Kleene star, another counting
|
||
2 . operator, complement, or language difference in 𝑒. We say that 𝑒 is
|
||
The intersection of 1 and 2 , denoted by 1 ∩2 = (𝑅′ , 𝑄′ , 𝛴 , 𝛿 ′ , 𝐼 ′ , register-representable if and only if all occurrences of counting operators
|
||
𝐹 , 𝛼 ′ ), is defined in the sequel.
|
||
′ in 𝑒 are register-representable. For instance, 𝑎{2,6} ∩ 𝑎{4,∞} is register-
|
||
representable, while 𝑎{2,6} and (𝑎{2,6} ){4,∞} are not since 𝑎{2,6} is in the
|
||
• 𝑅′ = 𝑅1 ∪ 𝑅2 , 𝑄′ = 𝑄1 × 𝑄2 , 𝐼 ′ = 𝐼1 × 𝐼2 , 𝐹 ′ = 𝐹1 × 𝐹2 , 𝛼 ′ = 𝛼1 ∧ 𝛼2 , scope of complement and the counter operator {4, ∞} respectively.
|
||
• 𝛿 ′ comprises the tuples ((𝑞1 , 𝑞2 ), 𝑎, (𝑞1′ , 𝑞2′ ), (𝑣⃗1 , 𝑣⃗2 )) such that (𝑞1 , 𝑎, Let 𝑒 be a register-representable regex over 𝛴. By the following
|
||
𝑞1′ , 𝑣⃗1 ) ∈ 𝛿1 and (𝑞2 , 𝑎, 𝑞2′ , 𝑣⃗2 ) ∈ 𝛿2 . procedure, we will construct a CEFA out of 𝑒, denoted by 𝑒 .
|
||
|
||
|
||
6
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
1. For each sub-expression (𝑒′ ){𝑚,𝑛} with 𝑚 ≤ 𝑛 (resp. (𝑒′ ){𝑚,∞} ) Observation 2. For each CEFA = (𝑅, 𝑄, 𝛴 , 𝛿 , 𝐼 , 𝐹 , 𝛼),
|
||
of 𝑒, we construct a CEFA (𝑒′ ){𝑚,𝑛} (resp. (𝑒′ ){𝑚,∞} ). Let 𝑒′ = ⎧ | ⎡∑ ⎫
|
||
(𝑄, 𝛴 , 𝛿 , 𝐼 , 𝐹 ). Then (𝑒′ ){𝑚,𝑛} = ((𝑟′ ), 𝑄′ , 𝛴 , 𝛿 ′′ , 𝐼 ′ , 𝐹 ′ , 𝛼 ′ ), where ⎪∑ |
|
||
| / ⎤ ⎪
|
||
() = ⎨ ⃗ 𝑣⃗ | 𝜂 ∈ 𝗉𝖺𝗋𝗂𝗄𝗁( ) and 𝛼 ⎢
|
||
𝜂(𝑣) ⃗ 𝑣⃗ 𝑅⎥ is true ⎬ .
|
||
𝜂(𝑣)
|
||
𝑟′ is a new register, 𝑄′ = 𝑄 ∪ {𝑞0 } with 𝑞0 ∉ 𝑄, 𝐼 ′ = {𝑞0 }, | ⎢ ⎥
|
||
⎪𝑣∈C
|
||
⃗ | ⎣𝑣∈C
|
||
⃗ ⎦ ⎪
|
||
𝐹 ′ = 𝐹 ∪ {𝑞0 }, and ⎩ | ⎭
|
||
𝛿 ′′ = {(𝑞 , 𝑎, 𝑞 ′ , (0)) ∣ (𝑞 , 𝑎, 𝑞 ′ ) ∈ 𝛿} ∪
|
||
For 𝑖 ∈ [𝑛], let 𝛼𝑖 be the accepting condition of 𝑖 . Then from
|
||
{(𝑞0 , 𝑎, 𝑞 ′ , (1)) ∣ ∃𝑞0′ ∈ 𝐼 . (𝑞0′ , 𝑎, 𝑞 ′ ) ∈ 𝛿} ∪
|
||
Observation 2, we know that the following two conditions are equiva-
|
||
{(𝑞 , 𝑎, 𝑞 ′ , (1)) ∣ 𝑞 ∈ 𝐹 , ∃𝑞0′ ∈ 𝐼 . (𝑞0′ , 𝑎, 𝑞 ′ ) ∈ 𝛿},
|
||
lent,
|
||
moreover, 𝛼 ′ = 𝑚 ≤ 𝑟′ ≤ 𝑛 if 𝐼 ∩ 𝐹 = ∅, otherwise 𝛼 ′ = 𝑟′ ≤ 𝑛.
|
||
(Intuitively, if 𝜀 is accepted by 𝑒′ , then the value of 𝑟′ can be • there are outputs 𝑣⃗1 ∈ (1 ), … , 𝑣⃗𝑛 ∈ (𝑛 ) such that
|
||
less than 𝑚.) Moreover, (𝑒′ ){𝑚,∞} is constructed by adapting 𝛼 ′ 𝜑[𝑣⃗1 ∕𝑅1 , … , 𝑣⃗𝑛 ∕𝑅𝑛 ] is true,
|
||
in (𝑒′ ){𝑚,𝑛} as follows: 𝛼 ′ = 𝑚 ≤ 𝑟′ if 𝐼 ∩ 𝐹 = ∅ and 𝛼 ′ = 𝚝𝚛𝚞𝚎 • there are 𝜂1 ∈ 𝗉𝖺𝗋𝗂𝗄𝗁(1 ), … , 𝜂𝑛 ∈ 𝗉𝖺𝗋𝗂𝗄𝗁(𝑛 ) such that
|
||
otherwise.
|
||
⋀ ⎡ ∑ / ⎤ ⎡ ∑ / ∑ / ⎤
|
||
2. For each sub-expression 𝑒′ of 𝑒 such that 𝑒′ contains occurrences 𝛼𝑖 ⎢ ⃗ 𝑣⃗ 𝑅𝑖 ⎥ ∧ 𝜑 ⎢
|
||
𝜂𝑖 (𝑣) 𝜂1 (𝑣)
|
||
⃗ 𝑣⃗ 𝑅1 , … , ⃗ 𝑣⃗ 𝑅𝑛 ⎥
|
||
𝜂𝑛 (𝑣)
|
||
of counting operators but 𝑒′ itself is not of the form (𝑒′1 ){𝑚,𝑛} or ⎢ ⎥ ⎢ ⎥
|
||
𝑖∈[𝑛] ⎣𝑣∈C
|
||
⃗ 𝑖 ⎦ ⎣𝑣∈C
|
||
⃗
|
||
1
|
||
𝑣∈C
|
||
⃗ 𝑛 ⎦
|
||
(𝑒′1 ){𝑚,∞} , from the assumption that 𝑒 is register-representable,
|
||
we know that 𝑒′ is of the form 𝑒′1 ⋅ 𝑒′2 , 𝑒′1 + 𝑒′2 , 𝑒′1 ∩ 𝑒′2 , or (𝑒′1 ). For is true.
|
||
𝑒′ = (𝑒′1 ), we have 𝑒′ = 𝑒′ . For 𝑒′ = 𝑒′1 ⋅ 𝑒′2 , 𝑒′ = 𝑒′1 + 𝑒′2 , Therefore, to solve the non-emptiness of CEFA in 𝛬′𝑥 , … , 𝛬′𝑥 w.r.t.
|
||
1 1 𝑛
|
||
or 𝑒′ = 𝑒′1 ∩ 𝑒′2 , suppose that CEFA 𝑒′ and 𝑒′ have been 𝜑, it is sufficient to compute the existential LIA formulas 𝜓 (ZC ), …,
|
||
1 2 1 1
|
||
constructed. 𝜓 (ZC ) to represent the Parikh images of 1 , ⋯, 𝑛 respectively,
|
||
𝑛 𝑛
|
||
3. For each maximal sub-expression 𝑒′ of 𝑒 such that 𝑒′ contains where ZC = {z𝑖,𝑣⃗ ∣ 𝑣⃗ ∈ C𝑖 } for 𝑖 ∈ [𝑛], and solve the satisfiability of
|
||
𝑖
|
||
no occurrences of counting operators, an NFA 𝑒′ can be con- the following existential LIA formula
|
||
structed by structural induction on the syntax of 𝑒′ . Then we
|
||
⋀⎛ ⎡ ∑ / ⎤⎞
|
||
have 𝑒′ = 𝑒′ ⋅ 𝑒′ , 𝑒′ = 𝑒′ ∪ 𝑒′ , or 𝑒′ = 𝑒′ ∩ 𝑒′ . ⎜𝜓 (ZC ) ∧ 𝛼𝑖 ⎢ z𝑖,𝑣⃗ 𝑣⃗ 𝑅𝑖 ⎥⎟ ∧
|
||
1 2 1 2 1 2
|
||
⎜ 𝑖 𝑖 ⎢ ⎥⎟
|
||
𝑖∈[𝑛] ⎝ ⎣𝑣∈C
|
||
⃗ 𝑖 ⎦⎠
|
||
For non-register-representable regexes, we first convert them into
|
||
register- ⎡ ∑ / ∑ / ⎤
|
||
representable regexes by unfolding all the non-register-representable 𝜑⎢ z1,𝑣⃗ 𝑣⃗ 𝑅1 , … , z𝑛,𝑣⃗ 𝑣⃗ 𝑅𝑛 ⎥ .
|
||
⎢ ⎥
|
||
occurrences of counting operators. After that, we utilize the aforemen- ⎣𝑣∈C
|
||
⃗
|
||
1
|
||
𝑣∈C
|
||
⃗ 𝑛 ⎦
|
||
tioned procedure to construct CEFA. For instance, (𝑎{2,6} ⋅ 𝑏∗ ){2,∞} is Intuitively, the integer variables z𝑖,𝑣⃗ represent the number of occur-
|
||
transformed into (𝑎𝑎(𝜀 + 𝑎 + 𝑎𝑎 + 𝑎𝑎𝑎 + 𝑎𝑎𝑎𝑎) ⋅ 𝑏∗ ){2,∞} . rences of 𝑣⃗ in the strings accepted by 𝑖 .
|
||
For each 𝑖 ∈ [𝑛], let 𝑥𝑖 ∈ 𝑒𝑖,1 , … , 𝑥𝑖 ∈ 𝑒𝑖,𝑙𝑖 be an enumeration of Because the sizes of the LIA formulas 𝜓 (ZC ), … , 𝜓 (ZC ) are
|
||
1 1 𝑛 𝑛
|
||
the regular membership constraints for 𝑥𝑖 in 𝜑1 . Then we can construct proportional to the sizes (more precisely, the alphabet size, the number
|
||
CEFA 𝑖,𝑗 from 𝑒𝑖,𝑗 for each 𝑖 ∈ [𝑛] and 𝑗 ∈ [𝑙𝑖 ]. Moreover, we construct of states and transitions) of NFA 1 , … , 𝑛 , and the satisfiability of
|
||
another CEFA 𝑖,0 for each 𝑖 ∈ [𝑛] to model the length of 𝑥𝑖 . Specifi- existential LIA formulas is NP-complete, it is vital to reduce the sizes
|
||
cally, 𝑖,0 is constructed as ((𝑟𝑖,0 ), {𝑞𝑖,0 }, 𝛴 , 𝛿𝑖,0 , {𝑞𝑖,0 }, {𝑞𝑖,0 }, 𝚝𝚛𝚞𝚎) where of these NFAs to improve the performance.
|
||
𝑟𝑖,0 is a fresh register and 𝛿𝑖,0 = {(𝑞𝑖,0 , 𝑎, 𝑞𝑖,0 , (1)) ∣ 𝑎 ∈ 𝛴}. Let 𝛬𝑥𝑖 = ∑ ∑
|
||
Since 𝑣∈C(⃗ 𝑖)
|
||
⃗ 𝑣⃗ = 𝑣∈C(
|
||
𝜂(𝑣) ⃗ 𝜂(𝑣) ⃗ 𝑣⃗ for each 𝑖 ∈ [𝑛] and 𝜂 ∈
|
||
{𝑖,0 , 𝑖,1 , … , 𝑖,𝑙𝑖 } for each 𝑖 ∈ [𝑛], and 𝜑′2 ≡ 𝜑2 [𝑟1,0 ∕|𝑥1 |, … , 𝑟𝑛,0 ∕|𝑥𝑛 |]. ⃗ 𝑖 )⧵{0}
|
||
𝗉𝖺𝗋𝗂𝗄𝗁(𝑖 ), it turns out that the 0-labeled ⃗ transitions in 𝑖 do not
|
||
Then the satisfiability of 𝜑 is reduced to the non-emptiness of CEFAs in ∑
|
||
𝛬𝑥1 , … , 𝛬𝑥𝑛 w.r.t. 𝜑′2 . contribute to the final output 𝑣∈C( ⃗ 𝑖 ) 𝜂(𝑣)
|
||
⃗ 𝑣.
|
||
⃗ Therefore, we can apply
|
||
the following size-reduction technique for 𝑖 ’s.
|
||
6.2. Solving 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) Step 2 (Reducing automata sizes). For each 𝑖 ∈ [𝑛], we view the
|
||
transitions (𝑞 , 0, ⃗ 𝑞 ′ ) in as 𝜀-transitions (𝑞 , 𝜀, 𝑞 ′ ), and remove the 𝜀-
|
||
𝑖
|
||
In this section, we present a procedure to solve the 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) transitions from 𝑖 . Then we determinize and minimize the resulting
|
||
problem: Suppose that 𝑥1 , … , 𝑥𝑛 are mutually distinct string variables, NFA. □
|
||
𝛬𝑥1 , … , 𝛬𝑥𝑛 are nonempty sets of CEFA over the same alphabet 𝛴 where For 𝑖 ∈ [𝑛], let 𝑖 denote the DFA obtained from 𝑖 by executing
|
||
𝛬𝑥𝑖 = {𝑖,1 , …, 𝑖,𝑙𝑖 } for every 𝑖 ∈ [𝑛]. Moreover, the sets of registers Step 2 and C𝑖 ∶= C𝑖 ⧵ {0}. ⃗ From the construction, we know that
|
||
𝑅1,1 , … , 𝑅𝑛,𝑙 are mutually disjoint, and 𝜑 is a LIA formula whose 𝗉𝖺𝗋𝗂𝗄𝗁(𝑖 ) = 𝗉𝗋𝗃C (𝗉𝖺𝗋𝗂𝗄𝗁(𝑖 )) for each 𝑖 ∈ [𝑛]. Therefore, we compute
|
||
𝑛 ⋃ 𝑖
|
||
free variables are from 𝑖∈[𝑛],𝑗∈[𝑙𝑖 ] 𝑅𝑖,𝑗 . The procedure comprises three LIA formulas from 𝑖 ’s, instead of 𝑖 ’s, to represent the Parikh images.
|
||
steps. Step 3 (Computing an LIA formula). For each 𝑖 ∈ [𝑛], we compute an
|
||
Step 1 (Computing intersection automata). For each 𝑖 ∈ [𝑛], compute existential LIA formula 𝜓𝑖 (ZC ) from 𝑖 to represent 𝗉𝖺𝗋𝗂𝗄𝗁(𝑖 ). Then
|
||
𝑖
|
||
a CEFA 𝑖 = 𝑖,1 ∩ ⋯ ∩ 𝑖,𝑙𝑖 , and let 𝛬′𝑥 ∶= {𝑖 }. □ the 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) problem is reduced to the satisfiability of the following
|
||
𝑖
|
||
After Step 1, the non-emptiness of CEFAs in 𝛬𝑥1 , … , 𝛬𝑥𝑛 w.r.t. 𝜑 is LIA formula, □
|
||
reduced to the non-emptiness of CEFAs in 𝛬′𝑥 , … , 𝛬′𝑥 w.r.t. 𝜑.
|
||
1 𝑛 ⋀⎛ ⎡ ∑ / ⎤⎞
|
||
In the following steps, we reduce the non-emptiness of CEFAs in ⎜𝜓 (ZC ) ∧ 𝛼𝑖 ⎢ z𝑖,𝑣⃗ 𝑣⃗ 𝑅𝑖 ⎥⎟ ∧
|
||
𝛬′𝑥 , …, 𝛬′𝑥 w.r.t. 𝜑 to the satisfiability of an LIA formula. The reduction ⎜ 𝑖 𝑖 ⎢ ⎥⎟
|
||
1 𝑛
|
||
𝑖∈[𝑛] ⎝ ⎣𝑣∈C
|
||
⃗ 𝑖 ⎦⎠
|
||
relies on the following two observations. (3)
|
||
⎡ ∑ / ∑ / ⎤
|
||
𝜑 ⎢ z1,𝑣⃗ 𝑣⃗ 𝑅1 , … , z𝑛,𝑣⃗ 𝑣⃗ 𝑅𝑛 . ⎥
|
||
Observation 1. CEFA in 𝛬′𝑥 = {1 }, … , 𝛬′𝑥 = {𝑛 } are nonempty w.r.t. ⎢ ⎥
|
||
1 𝑛 ⎣𝑣∈C
|
||
⃗
|
||
1
|
||
𝑣∈C
|
||
⃗ 𝑛 ⎦
|
||
𝜑 iff there are outputs 𝑣⃗1 ∈ (1 ), … , 𝑣⃗𝑛 ∈ (𝑛 ) such that 𝜑[𝑣⃗1 ∕𝑅1 , …,
|
||
𝑣⃗𝑛 ∕𝑅𝑛 ] is true. Step 4 (Solving the LIA formula). We utilize some off-the-shelf SMT
|
||
Let = (𝑅, 𝑄, 𝛴 , 𝛿 , 𝐼 , 𝐹 , 𝛼) be a CEFA and C = {𝑣⃗ ∣ ∃𝑞 , 𝑎, 𝑞 ′ . (𝑞 , 𝑎, 𝑞 ′ , solver (e.g. Z3) to solve the satisfiability of the LIA formula in Eq. (3).
|
||
⃗ ∈ 𝛿}. Moreover, let = (𝑄, C , 𝛿 ′ , 𝐼 , 𝐹 ) be an NFA over the alpha-
|
||
𝑣) If the solver reports ‘‘sat’’ and returns a solution m, then we go to Step
|
||
bet C that is obtained from by dropping the accepting condition 5. Otherwise, the 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) problem has no solution. □
|
||
and ignoring the characters, that is, 𝛿 ′ comprises tuples (𝑞 , 𝑣, ⃗ 𝑞 ′ ) such Step 5 (Constructing a solution). Let m be a solution of the LIA formula
|
||
that (𝑞 , 𝑎, 𝑞 ′ , 𝑣)
|
||
⃗ ∈ 𝛿 for 𝑎 ∈ 𝛴. in Eq. (3). We want to construct from m and the CEFAs 𝑖 for 𝑖 ∈ [𝑛],
|
||
|
||
7
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
a solution m′ of the 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) problem. The construction of m′ is RECL constraint: (i) unfold the counting operators in 𝑒1 , (ii)
|
||
obtained by constructing m′ (𝑥𝑖 ) for each 𝑖 ∈ [𝑛]. Let 𝑖 ∈ [𝑛]. We unfold the counting operator {𝑚,𝑛} . Let 𝗎𝖿 𝗅𝖽1 (𝑒) and 𝗎𝖿 𝗅𝖽2 (𝑒) denote
|
||
describe how to construct m′ (𝑥𝑖 ). The free variables ZC correspond the resulting regular expressions. Note that in 𝗎𝖿 𝗅𝖽1 (𝑒), all the
|
||
𝑖
|
||
to the number of occurrences of characters in C𝑖 , i.e. the alphabet of counting operators in 𝑒1 (if there is any) disappear, while in
|
||
the NFA 𝑖 . Moreover, in 𝑖 , the characters are just the integer vectors 𝗎𝖿 𝗅𝖽2 (𝑒), all the counting operators in 𝑒1 are preserved and copied
|
||
and the characters in the original CEFAs 𝑖 used to construct 𝑖 are for 𝑛 times.
|
||
ignored. The construction of m′ (𝑥𝑖 ) is actually to solve the reachability • In the sequel, we estimate the sizes (the number of transitions)
|
||
problem in a finite transition system where of the NFAs 𝗎𝖿 𝗅𝖽 (𝑒) and 𝗎𝖿 𝗅𝖽 (𝑒) , denoted by 𝑠𝑧1 and 𝑠𝑧2 . If
|
||
1 2
|
||
𝑠𝑧1 ≤ 𝑠𝑧2 , then we choose to unfold the counting operators in
|
||
• the nodes are of the form (𝑞 , (n𝑖,𝑣⃗ )𝑣∈C
|
||
⃗ 𝑖 ), where n𝑖,𝑣⃗ denotes the 𝑒1 . Otherwise, we choose to unfold the counting operator {𝑚,𝑛} .
|
||
number of the remaining occurrences of the integer vector 𝑣⃗ in • To estimate the sizes of 𝗎𝖿 𝗅𝖽 (𝑒) and 𝗎𝖿 𝗅𝖽 (𝑒) , we first esti-
|
||
1 2
|
||
𝑖 , mate the sizes of the CEFAs 𝗎𝖿 𝗅𝖽1 (𝑒) and 𝗎𝖿 𝗅𝖽2 (𝑒) as two pairs
|
||
• the transitions are of the form (𝗌𝗇𝗎𝗆1 (𝑒), 𝗏𝗇𝗎𝗆1 (𝑒)) and (𝗌𝗇𝗎𝗆2 (𝑒), 𝗏𝗇𝗎𝗆2 (𝑒)), where 𝗌𝗇𝗎𝗆1 (𝑒) and
|
||
((𝑞 , (n𝑖,𝑣⃗ )𝑣∈C ′
|
||
⃗), (𝑞 ′ , (n′𝑖,𝑣⃗ )𝑣∈C 𝗏𝗇𝗎𝗆1 (𝑒) are the estimates of the number of states and integer vec-
|
||
⃗ ), (𝑞 , 𝑎, 𝑞 , 𝑢 ⃗ ))
|
||
𝑖 𝑖 tors in 𝗎𝖿 𝗅𝖽1 (𝑒) separately, similarly for (𝗌𝗇𝗎𝗆2 (𝑒), 𝗏𝗇𝗎𝗆2 (𝑒)). Since
|
||
such that (𝑞 , 𝑎, 𝑞 ′ , 𝑢⃗) is a transition in 𝑖 , n′ = n𝑖,𝑢⃗ − 1, moreover, the NFAs 𝗎𝖿 𝗅𝖽 (𝑒) and 𝗎𝖿 𝗅𝖽 (𝑒) are constructed from 𝗎𝖿 𝗅𝖽1 (𝑒)
|
||
𝑖,𝑢⃗ 1 2
|
||
n′ = n𝑖,𝑣⃗ for every 𝑣⃗ ≠ 𝑢⃗. and 𝗎𝖿 𝗅𝖽2 (𝑒) by dropping the accepting conditions and ignore
|
||
𝑖,𝑣⃗
|
||
characters, 𝗌𝗇𝗎𝗆1 (𝑒) and 𝗏𝗇𝗎𝗆1 (𝑒) are the estimates of the number
|
||
Moreover, the initial and final state of the reachability problem are set of states and characters of 𝗎𝖿 𝗅𝖽 (𝑒) , similarly for 𝗎𝖿 𝗅𝖽 (𝑒) . Then
|
||
1 2
|
||
to be (𝑞0 , (m(z𝑖,𝑣⃗ ))𝑣∈C
|
||
⃗ 𝑖 ) and (𝑞𝑓 , (0, … , 0)) for some initial state 𝑞0 and we estimate the sizes (number of transitions) of the NFAs 𝗎𝖿 𝗅𝖽 (𝑒)
|
||
1
|
||
final state 𝑞𝑓 in 𝑖 . If a path from the initial state to the final state is and 𝗎𝖿 𝗅𝖽 (𝑒) as 𝑠𝑧1 = (𝗌𝗇𝗎𝗆1 (𝑒))2 ⋅ 𝗏𝗇𝗎𝗆1 (𝑒) and 𝑠𝑧2 = (𝗌𝗇𝗎𝗆2 (𝑒))2 ⋅
|
||
found, then the path gives us a desired string m′ (𝑥𝑖 ). □ 2
|
||
𝗏𝗇𝗎𝗆2 (𝑒) respectively.
|
||
|
||
7. Optimizations for non-register-representable RECL constraints It remains to show how to compute (𝗌𝗇𝗎𝗆1 (𝑒), 𝗏𝗇𝗎𝗆1 (𝑒)) and
|
||
(𝗌𝗇𝗎𝗆2 (𝑒), 𝗏𝗇𝗎𝗆2 (𝑒)) from 𝑒 = (𝑒1 ){𝑚,𝑛} . Let 𝑒 = (𝑒1 ){𝑚,𝑛} be a reg-
|
||
In Section 6, to deal with the non-register-representable RECL con- ular expression where 𝑒1 is a register-representable regular expres-
|
||
straints, we unfold all the non-register-representable occurrences of sion. Note that there is only one register in 𝗎𝖿 𝗅𝖽1 (𝑒) . As a result,
|
||
counting operators. Nevertheless, the naive unfolding incurs an expo- we set 𝗏𝗇𝗎𝗆1 (𝑒) = 2. Moreover, in the sequel, we inductively com-
|
||
nential blow-up over the formula sizes, especially when the counting pute 𝗌𝗇𝗎𝗆′1 (𝑒1 ) and (𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) from 𝑒1 , then let 𝗌𝗇𝗎𝗆1 (𝑒) =
|
||
bounds are large. Recall that an occurrence of counting operators is said 𝗌𝗇𝗎𝗆′1 (𝑒1 ) and (𝗌𝗇𝗎𝗆2 (𝑒), 𝗏𝗇𝗎𝗆2 (𝑒)) = (𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )).
|
||
to be non-register-representable if it is in the scope of some Kleene star,
|
||
another counting operator, complement, or language difference. Since • If 𝑒1 = ∅ or 𝑒1 = 𝜖, then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 0 and
|
||
Kleene star can be seen as a special case of the counting operators, and (𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) = (0, 0).
|
||
language difference 𝑒1 ⧵ 𝑒2 can be rewritten as 𝑒1 ∩ 𝑒2 , without loss of
|
||
(Note that here the constant 0, instead of 1, is chosen, so that
|
||
generality, we can assume that a non-register representable occurrence
|
||
later on, when we consider 𝑒1 = 𝑒2 ∩ 𝑒3 , 𝗌𝗇𝗎𝗆′2 (𝑒2 ) × 𝗌𝗇𝗎𝗆′2 (𝑒3 ) and
|
||
of a counting operator is in the scope of another counting operator or
|
||
𝗏𝗇𝗎𝗆′2 (𝑒2 ) ×𝗏𝗇𝗎𝗆′2 (𝑒3 ) will become 0 since 𝜖 ∩𝑒3 = 𝜖 and ∅ ∩𝑒3 = ∅.)
|
||
some complement operator. In the sequel, to alleviate the formula-size
|
||
blow-up problem resulted from the naive unfolding, we present various
|
||
• If 𝑒1 = 𝑎, then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 2 and
|
||
optimizations, for the nesting of counting operators, and the nesting of
|
||
counting operators and a complement operator respectively. (𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) = (2𝑛, 1).
|
||
|
||
7.1. Optimizations for the nesting of counting operators
|
||
• If 𝑒1 = 𝑒2 ⋅ 𝑒3 , then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 𝗌𝗇𝗎𝗆′1 (𝑒2 ) + 𝗌𝗇𝗎𝗆′1 (𝑒3 ) and
|
||
(𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) =
|
||
In the sequel, we propose an optimization for the nesting of count-
|
||
ing operators. The optimization is designed for the special case that (𝑛(𝗌𝗇𝗎𝗆′2 (𝑒2 ) + 𝗌𝗇𝗎𝗆′2 (𝑒3 )), 𝑛(𝗏𝗇𝗎𝗆′2 (𝑒2 ) + 𝗏𝗇𝗎𝗆′2 (𝑒3 ))).
|
||
the nesting depth of counting operators is one. For a RECL constraint (Note that because 𝑒1 is concatenated for 𝑛 times in 𝗎𝖿 𝗅𝖽2 (𝑒), we
|
||
whose nesting depth of counting operators is greater than two, we can define 𝗌𝗇𝗎𝗆′2 (𝑒1 ) as 𝑛(𝗌𝗇𝗎𝗆′2 (𝑒2 ) + 𝗌𝗇𝗎𝗆′2 (𝑒3 )). Moreover, since the
|
||
apply the optimization for the inner-most nesting of counting operators integer vectors in 1 ⋅ 2 are of the form (𝑣⃗1 , 0) ⃗ and (0,
|
||
⃗ 𝑣⃗2 ) where
|
||
and decrement the nesting depth, then apply the optimization again to 𝑣⃗1 and 𝑣⃗2 are integer vectors in 1 and 2 respectively, we define
|
||
the resulting constraint, and continue this process, until obtaining a 𝗏𝗇𝗎𝗆′2 (𝑒1 ) as 𝑛(𝗏𝗇𝗎𝗆′2 (𝑒2 ) + 𝗏𝗇𝗎𝗆′2 (𝑒3 ))).
|
||
register-representable RECL constraint. • If 𝑒1 = 𝑒2 + 𝑒3 , then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 𝗌𝗇𝗎𝗆′1 (𝑒2 ) + 𝗌𝗇𝗎𝗆′1 (𝑒3 ) and
|
||
The main idea of the optimization is illustrated by the constraint
|
||
(𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) =
|
||
𝑥 ∈ (𝑎{1,1000} ){1,2} . If we apply the naive unfolding in Section 6, we
|
||
(𝑛(𝗌𝗇𝗎𝗆′2 (𝑒2 ) + 𝗌𝗇𝗎𝗆′2 (𝑒3 ) + 1), 𝑛(2𝗏𝗇𝗎𝗆′2 (𝑒2 ) + 2𝗏𝗇𝗎𝗆′2 (𝑒3 ))).
|
||
will unfold 𝑎{1,1000} and obtain the constraint 𝑥 ∈ (𝑎(𝜀 + 𝑎 + 𝑎𝑎 +
|
||
⋯ + 𝑎 ⋯ 𝑎 )){1,2} . It is easy to observe that a smarter way is to unfold (Recall that each integer vector in 1 resp. 2 is copied at most
|
||
⏟⏟⏟ twice in the construction of 1 ∪ 2 .)
|
||
999
|
||
{1,2} , instead of {1,1000} . If we do this, then we get a constraint 𝑥 ∈ • If 𝑒1 = 𝑒2 ∩ 𝑒3 , then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 𝗌𝗇𝗎𝗆′1 (𝑒2 ) × 𝗌𝗇𝗎𝗆′1 (𝑒3 ) and
|
||
(𝑎{1,1000} )(𝜀 + 𝑎{1,1000} ), whose size is much smaller, compared to 𝑥 ∈ (𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) =
|
||
(𝑎(𝜀+𝑎+𝑎𝑎+⋯+ 𝑎 ⋯ 𝑎 )){1,2} . (Recall that we assume a binary encoding (𝑛(𝗌𝗇𝗎𝗆′2 (𝑒2 ) × 𝗌𝗇𝗎𝗆′2 (𝑒3 )), 𝑛(𝗏𝗇𝗎𝗆′2 (𝑒2 ) × 𝗏𝗇𝗎𝗆′2 (𝑒3 ))).
|
||
⏟⏟⏟
|
||
999 (Recall that the integer vectors in 1 ∩ 2 are of the form
|
||
of the integer constants in counting operators.)
|
||
(𝑣⃗1 , 𝑣⃗2 ), where 𝑣⃗1 and 𝑣⃗2 are the integer vectors in 1 and 2
|
||
Let us describe the optimization more precisely.
|
||
respectively.)
|
||
′
|
||
• For a regular expression 𝑒 = (𝑒1 ){𝑚,𝑛} where 𝑒1 is a register- • If 𝑒1 = 𝑒2 , then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 2𝗌𝗇𝗎𝗆1 (𝑒2 ) and
|
||
′
|
||
representable regular expression, we can unfold the counting (𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) = (𝑛 ⋅ 2𝗌𝗇𝗎𝗆2 (𝑒2 ) , 1).
|
||
operators in the following ways to obtain a register-representable
|
||
|
||
8
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
(In this case, since 𝑒1 is register-representable, we know that 𝑒2
|
||
Intuitively, 𝑞𝑠 is a sink state that deals with the situations where there
|
||
contains no counting operators. As a result, we first construct the
|
||
are no runs, and 𝑟′ is a register newly introduced so that the different
|
||
NFA for 𝑒2 , then determinize and complement it to construct the
|
||
cases of the non-accepting runs of 𝑒1 can be captured by a single
|
||
automaton for 𝑒1 .)
|
||
accepting condition 𝛼 ′ .
|
||
• If 𝑒1 = (𝑒2 )∗ , then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 𝗌𝗇𝗎𝗆′1 (𝑒2 ) and
|
||
Note that is a strict over-approximation of 𝑒 = 𝑒1 since 𝑒1 may be
|
||
(𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) = (𝑛𝗌𝗇𝗎𝗆′2 (𝑒2 ), 1). nondeterministic. For instance, if there are two runs of 𝑒1 on a string
|
||
𝑤, say one run that ends at a non-final state and another run that ends
|
||
(In this case, since 𝑒1 is register-representable, we know that 𝑒2
|
||
at a final state and satisfies accepting condition 𝛼. Then 𝑤 ∈ (𝑒1 ),
|
||
contains no counting operators.)
|
||
′ ′ ′ thus 𝑤 ∉ (𝑒1 ). Nevertheless, according to the construction, 𝑤 is
|
||
• If 𝑒1 = (𝑒2 ){𝑚 ,𝑛 } or 𝑒1 = (𝑒2 ){𝑛 ,∞} , then 𝗌𝗇𝗎𝗆′1 (𝑒1 ) = 𝑛′ 𝗌𝗇𝗎𝗆′1 (𝑒2 )
|
||
accepted by , witnessed by the run of 𝑒1 that ends at a non-final
|
||
and
|
||
state.
|
||
(𝗌𝗇𝗎𝗆′2 (𝑒1 ), 𝗏𝗇𝗎𝗆′2 (𝑒1 )) = (𝑛𝗌𝗇𝗎𝗆′2 (𝑒2 ), 𝑛).
|
||
Combination of under- and over-approximation. We combine the under-
|
||
(In this case, since 𝑒1 is register-representable, we know that 𝑒2 and over-approximation as well as the decision procedure into the
|
||
contains no counting operators.) following procedure.
|
||
|
||
1. We first apply the under-approximation.
|
||
7.2. Optimizations for the nesting of counting operators and a complement
|
||
2. If the under-approximation returns ‘‘sat’’, then we know that
|
||
operator
|
||
the original constraint is satisfiable. Otherwise, the under-appro-
|
||
As mentioned in Section 6, if some counting operators are in the ximation returns ‘‘unsat’’ and we apply the over-approximation.
|
||
scope of a complement operator, it is necessary to unfold these counting 3. If the over-approximation returns ‘‘unsat’’, then we know that
|
||
operators before the complement operation. The exponential blow- the original constraint is unsatisfiable. Otherwise, the over-
|
||
up in the unfolding combined with the blow-up in the complement approximation returns ‘‘sat’’ and we resort to the decision pro-
|
||
operation will produce a huge number of states. To alleviate this cedure (without approximations) in Section 6 to solve the RECL
|
||
problem, we propose two approximations of the complement operation, constraint.
|
||
i.e., an under-approximation and an over-approximation, and combine
|
||
them into a procedure that achieves a nice balance between efficiency 8. Optimizations for model generation
|
||
and precision.
|
||
Let 𝑒 = 𝑒1 be a regular expression, where 𝑒1 is a register- Recall that in Step 5 of Section 6, we construct a solution m′ from
|
||
representable regular expression. the model m of the LIA formula in Eq. (3) by solving the reachability
|
||
problems in a finite transition system. It is easy to see that the transition
|
||
Under-approximation. One way of doing the complement operation is
|
||
system contains exponentially many nodes in general and a naive
|
||
to transform 𝑒1 into a regular expression that does not contain counting
|
||
search over the transition system is not efficient. In the sequel, we
|
||
operators. We compute an under-approximation of 𝑒 by replacing each
|
||
describe two optimizations so that the reachability problem can be
|
||
occurrence of the counting operators in 𝑒1 , say (𝑒2 ){𝑚,𝑛} or (𝑒{𝑛,∞}
|
||
2
|
||
), by
|
||
solved more efficiently.
|
||
𝑒∗2 . Let us use 𝑒′1 to denote the resulting regular expression. Note that
|
||
To ease the presentation, let us assume that 𝑛, i.e. the number of
|
||
𝑒′1 does not contain counting operators. Moreover, it is easy to observe
|
||
string variables, to be 1.
|
||
that (𝑒1 ) ⊆ (𝑒′1 ). As a result, we have (𝑒′1 ) ⊆ (𝑒1 ). That is, 𝑒′1 is an At first, we refine the states of the transition system by recording
|
||
under-approximation of 𝑒1 . the number of transitions, instead of just the number of occurrences of
|
||
Over-approximation. For 𝑒 = 𝑒1 , we utilize the CEFA constructed from the integer vectors.
|
||
𝑒1 , say 𝑒1 , to construct a CEFA of 𝑒 so that () is an over- Refining the states of the transition system by recording the number of
|
||
approximation of (𝑒). Let 𝑒1 = (𝑅, 𝑄, 𝛴 , 𝛿 , 𝐼 , 𝐹 , 𝛼). The main idea of transitions. Let 𝜃 denote the formula in Eq. (3). We shall construct the
|
||
the construction is explained as follows: If a string 𝑤 is not accepted formula 𝜃 ′ so that any model of 𝜃 ′ necessarily encodes the number of
|
||
by 𝑒1 , then either there are no runs of 𝑒1 on 𝑤, or every run of 𝑒1 occurrences of transitions within 1 .
|
||
on 𝑤 stop at a non-final state or enter a final state but do not satisfy ⋀
|
||
𝜃 ′ ≡ 𝜓1 (Z1 ) ∧ z1,𝑣⃗ = m(z1,𝑣⃗ ) ∧
|
||
the accepting condition 𝛼. We shall construct a CEFA that accepts
|
||
𝑣∈C
|
||
⃗
|
||
when one of these situations occurs. Then (𝑒1 ) ⊆ (), that is, is ⋀
|
||
1
|
||
∑
|
||
an over-approximation of 𝑒. z1,𝑣⃗ = z1,(𝑞 ,𝑎,𝑞′ ,𝑣)
|
||
⃗,
|
||
Without loss of generality, we can assume that there are no transi- 𝑣∈C
|
||
⃗
|
||
1 (𝑞 ,𝑎,𝑞 ′ ) s.t. (𝑞 ,𝑎,𝑞 ′ ,𝑣)
|
||
⃗ is a transition in 1
|
||
tions out of 𝐹 in 𝑒1 . If this is not the case, then a new state 𝑞𝑓 can ′ ⃗ is a transition in } is the set of
|
||
where Z1 = {z1,(𝑞 ,𝑎,𝑞′ ,𝑣) ⃗ ∣ (𝑞 , 𝑎, 𝑞 , 𝑣) 1
|
||
be introduced to transform 𝑒1 into a CEFA satisfying this constraint, ′ ⃗
|
||
variables z1,(𝑞 ,𝑎,𝑞′ ,𝑣)⃗ denoting the number of occurrences of (𝑞 , 𝑎, 𝑞 , 𝑣).
|
||
where 𝑞𝑓 is the unique accepting state. Then we define = (𝑅∪ {𝑟′ }, 𝑄∪ ′
|
||
Note that compared to 𝜃, 𝜃 uses the LIA formula 𝜓1 , which is
|
||
{𝑞𝑠 }, 𝛴 , 𝛿 ′ , 𝐼 , 𝑄 ∪ {𝑞𝑠 }, 𝛼 ′ ), where
|
||
constructed from 1 by ignoring its accepting condition and tak-
|
||
• 𝑟′ is a new register not in 𝑅, ing it as an NFA where the alphabet is seen as the set of transi-
|
||
tions. Furthermore, 𝜃 ′ relates the values of the variables z1,𝑣⃗ and
|
||
• 𝑞𝑠 is a new state not in 𝑄,
|
||
• 𝛿 ′ comprises the following transitions, ⃗ , while simultaneously constraining the values of z1,𝑣⃗ to be
|
||
z1,(𝑞 ,𝑎,𝑞 ′ ,𝑣)
|
||
m(z1,𝑣⃗ ). We then submit 𝜃 ′ to SMT solver, which returns a model
|
||
– (𝑞 , 𝑎, 𝑞 ′ , (𝑣,
|
||
⃗ 0)) such that (𝑞 , 𝑎, 𝑞 ′ , 𝑣)
|
||
⃗ ∈ 𝛿 and 𝑞 ′ ∉ 𝐹 , of 𝜃 ′ , in particular, the values of the variables z1,(𝑞,𝑎,𝑞 ′ ,𝑣) ⃗ . Equipped
|
||
⃗ 0)) such that there do not exist 𝑎-labeled transi-
|
||
– (𝑞 , 𝑎, 𝑞𝑠 , (0, with the model, we refine the transition system by considering the
|
||
tions out of 𝑞, that is, for every 𝑞 ′ and 𝑣, ⃗ we have (𝑞 , 𝑎, 𝑞 ′ , 𝑣)
|
||
⃗ ∉ nodes (𝑞 , (n1,(𝑞,𝑎,𝑞 ′ ,𝑣) ⃗ is a transition in 1 ), where n1,(𝑞 ,𝑎,𝑞 ′ ,𝑣)
|
||
⃗ )(𝑞 ,𝑎,𝑞 ′ ,𝑣) ⃗ de-
|
||
𝛿, notes the number of remaining occurrences of (𝑞 , 𝑎, 𝑞 ′ , 𝑣). ⃗ Note that
|
||
– (𝑞 , 𝑎, 𝑞 ′ , (𝑣,
|
||
⃗ 1)) such that (𝑞 , 𝑎, 𝑞 ′ , 𝑣)
|
||
⃗ ∈ 𝛿 and 𝑞 ′ ∈ 𝐹 , this refinement can avoid searching useless transitions. For instance,
|
||
if n1,(𝑞,𝑎,𝑞 ′ ,𝑣) ′ ⃗ then we can avoid searching the
|
||
⃗
|
||
– (𝑞𝑠 , 𝑎, 𝑞𝑠 , (0, 0)) such that 𝑎 ∈ 𝛴, ⃗ = 0 for some (𝑞 , 𝑎, 𝑞 , 𝑣),
|
||
transition (𝑞 , 𝑎, 𝑞 ′ , 𝑣)
|
||
⃗ out of 𝑞. This stands in contrast to the original
|
||
• 𝛼 ′ is defined as 𝑟′ = 1 → ¬𝛼. transition system where n1,𝑣⃗ is used, it may be the case that we have
|
||
|
||
9
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
n1,𝑣⃗ > 0 (since there may be multiple transitions associated with 𝑣)
|
||
⃗ and Based on this template, we generate 500 instances.
|
||
the transition (𝑞 , 𝑎, 𝑞 ′ , 𝑣)
|
||
⃗ should still be searched. For the benchmarks that contain the nestings of counting operators
|
||
Moreover, we propose a method to reduce the number of transitions and a complement operator, we utilize the following template
|
||
in the transition system further.
|
||
𝑥 ∈ 𝑒𝑐 𝑜𝑚𝑝1 ∧ 𝑥 ∈ 𝑒𝑐 𝑜𝑚𝑝2 ∧ 𝑥 ∈ 𝑒𝑠𝑎𝑛𝑖 ∧ |𝑥| > 50,
|
||
Pruning the transitions in the transition system. Let the current state of {𝑚 ,𝑛 } {𝑚 ,𝑛 } {𝑚 ,𝑛 } {𝑚 ,𝑛 }
|
||
the transition system be (𝑝, (n1,(𝑞 ,𝑎,𝑞′ ,𝑣) where both 𝑒𝑐 𝑜𝑚𝑝1 and 𝑒𝑐 𝑜𝑚𝑝2 are of the form 𝑒1 1 1 𝑒2 2 2 𝑒3 3 3 𝑒4 4 4 ,
|
||
⃗ is a transition in 1 ). Let
|
||
⃗ )(𝑞 ,𝑎,𝑞 ′ ,𝑣)
|
||
𝑝 denote the set of transitions that are reachable from the state 𝑝 in such that for each 𝑖 ∈ [4], 20 ≤ 𝑚𝑖 ≤ 𝑛𝑖 ≤ 1000 and 𝑒𝑖 contains
|
||
1 . If {(𝑞 , 𝑎, 𝑞 ′ , 𝑣) no counting operators. Note that two regular expressions 𝑒𝑐 𝑜𝑚𝑝1 and
|
||
⃗ > 0} ⊈ 𝑝 , in other words, there exists
|
||
⃗ ∣ n1,(𝑞 ,𝑎,𝑞′ ,𝑣)
|
||
some transitions (𝑞 , 𝑎, 𝑞 ′ , 𝑣) ⃗ not reachable from the state 𝑝 in 1 while 𝑒𝑐 𝑜𝑚𝑝2 are introduced to have a better balance between the numbers of
|
||
satisfiable and unsatisfiable instances. Based on this template, we also
|
||
⃗ > 0 is established in the current state (𝑝, (n1,(𝑞 ,𝑎,𝑞 ′ ,𝑣)
|
||
n1,(𝑞,𝑎,𝑞 ′ ,𝑣) ⃗ )(𝑞 ,𝑎,𝑞 ′ ,𝑣)
|
||
⃗
|
||
is a transition in 1 ), then we know that it is impossible to reach the generate 500 instances.
|
||
final state (𝑞𝑓 , (0, … , 0)) from the current state so that we can remove In total, the NestC benchmark suite contains 1000 instances.
|
||
all the transitions out of the current state in the transition system. Note that the two templates are chosen so that on the one hand,
|
||
the nestings of counting operators or nestings of counting operators
|
||
and complement operators are available, and on the other hand, some
|
||
9. Experiments
|
||
additional regular and length constraints are introduced, so that the
|
||
constraints are not trivially satisfied, in particular, the constraints are
|
||
We implemented the procedures and optimizations in Sections 6–8
|
||
not satisfied by very short strings.
|
||
on top of OSTRICH, resulting in a string solver called OSTRICHRECL . In
|
||
this section, we evaluate the performance of OSTRICHRECL on three Distribution of problem instances w.r.t. counting bounds. The distribution
|
||
benchmark suites, that is, RegCoL, AutomatArk, and NestC. In the of problem instances w.r.t. the counting bounds in the three suites is
|
||
sequel, we first describe the three benchmark suites as well as the shown in Fig. 5, where the 𝑥-axis represents the counting bound and
|
||
experiment setup. Then we present the experiment results. We evaluate the 𝑦-axis represents the number of problem instances containing the
|
||
the performance and correctness of our solver against the state-of-the- counting bound. In our statistics, while most problem instances contain
|
||
art string constraint solvers, including CVC5 [3], Z3seq [1], Z3str3 [6], only small bounds, there are still around 3000 (about 6%) of them using
|
||
Z3str3RE [8], and OSTRICH [11]. We also compare OSTRICHRECL with large counting bounds (i.e., greater than or equal to 50).
|
||
several variants of itself, to evaluate the effectiveness of the technical
|
||
Experiment setup. All experiments are conducted on CentOS Stream
|
||
choices and optimizations in Sections 6–8.
|
||
release 8 with 4 Intel(R) Xeon(R) Platinum 8269CY 3.10 GHz CPU cores
|
||
and 190 GB memory. We use the zaligvinder framework [42] to execute
|
||
9.1. Benchmark suites and the experiment setup the experiments, with a timeout of 60 s per instance.
|
||
|
||
Our experiments utilize three benchmark suites, namely, RegCoL, 9.2. Performance evaluation against the other solvers
|
||
AutomatArk, and NestC. There are 49,843 instances in total. All bench-
|
||
mark instances are in the SMTLIB2 format. In the sequel, we give more We evaluate the performance of OSTRICHRECL against the state-
|
||
details of these three benchmark suites. of-the-art string constraint solvers, including CVC5 [3], Z3seq [1],
|
||
RegCoL benchmark suite. There are 40,628 RECL instances in the Reg- Z3str3 [6], Z3str3RE [8], and OSTRICH [11], on RegCoL, AutomatArk
|
||
CoL suite. These instances are generated by extracting regexes with and NestC benchmark suites. The experiment results can be found in
|
||
counting operators from the open source regex library [27,37] and Table 1. Note that we take the results of CVC5 as the ground truth,3
|
||
manually constructing a RECL constraint 𝑥 ∈ 𝑒 ∧ 𝑥 ∈ 𝑒𝑠𝑎𝑛𝑖 ∧ |𝑥| > and the results different from the ground truth are classified as error.
|
||
10 for each regex 𝑒, where 𝑒𝑠𝑎𝑛𝑖 ≡ 𝛴 ∗ (⟨+⟩ +′ +′′ + &)𝛴 ∗ is a regular We can see that OSTRICHRECL solves almost all instances, except 182
|
||
expression that sanitizes all occurrence of special characters <, >, ′ , ′′ , of them, that is, it solves 49,638 instances correctly. The number is
|
||
or &. The expression 𝑒𝑠𝑎𝑛𝑖 is introduced in view of the fact that these 4,547/1,917/13,618/2,764/3,134 more than the number of instances
|
||
characters are usually sanitized in Web browsers to alleviate the XSS solved by CVC5/Z3str3RE/Z3str3/Z3seq/OSTRICH respectively. More-
|
||
attacks [16,38]. over, OSTRICHRECL is the second fastest solver, whose average time on
|
||
each instance is close to the fastest solver Z3str3RE (2.88 s versus 1.99
|
||
AutomatArk benchmark suite. This benchmark suite is adapted from the s).
|
||
AutomatArk suite [39] by picking out the string constraints containing Furthermore, in order to see how OSTRICHRECL performs on the
|
||
counting operators. We also add the length constraint |𝑥| > 10 for RECL constraints where the counting and length bounds are large or
|
||
each string variable 𝑥. There are 8215 instances in the AutomatArk when the counting operators are nested with some other counting op-
|
||
suite. Note that the original AutomatArk benchmark suite [39] in- erators or with complement operators, we also compare OSTRICHRECL
|
||
cludes 19,979 instances, which are conjunctions of regular membership with the other solvers solely on the NestC benchmark suite. The ex-
|
||
queries generated out of regular expressions in [40]. periment results can be found in Table 2. From the results, we can see
|
||
NestC benchmark suite. We generate this benchmark suite by utilizing that OSTRICHRECL outperforms the other solvers in both the number
|
||
the string fuzzing tool stringfuzz [41]. Note that stringfuzz is slightly of solved instances and the average time. Specifically, OSTRICHRECL
|
||
extended by adding the counting operators. For the benchmarks that solves 873 out of 1000 instances, while the best state-of-the-art solver
|
||
contain the nestings of counting operators, we use the following tem- (Z3seq) solves only 518 instances. Moreover, while the average time
|
||
plate of OSTRICHRECL is only slightly better than Z3str3RE (19.49 s versus
|
||
20.09 s), OSTRICHRECL solves much more instances than it (873 versus
|
||
𝑥 ∈ 𝑒𝑛𝑒𝑠𝑡 ∧ 𝑥 ∈ 𝑒𝑠𝑎𝑛𝑖 ∧ |𝑥| > 50 170). The results show the great advantage of OSTRICHRECL against the
|
||
where 𝑒𝑛𝑒𝑠𝑡 is of the form other solvers on the NestC benchmark suite.
|
||
{𝑚 ,𝑛 } {𝑚 ,𝑛 } {𝑚 ,𝑛 } {𝑚 ,𝑛 }
|
||
(𝑒1 1 1 𝑒2 2 2 𝑒3 3 3 𝑒4 4 4 ){𝑚5 ,𝑛5 } ,
|
||
3
|
||
Initially, we used the majority vote of the results of the solvers as the
|
||
such that 0 ≤ 𝑚𝑖 ≤ 𝑛𝑖 ≤ 20 for each 𝑖 ∈ [5] and 𝑒𝑖 contains no counting ground truth. Nevertheless, on some problem instances, all the results of the
|
||
operators for each 𝑖 ∈ [4], moreover, three solvers in the Z3 family are wrong (after manual inspection), thus failing
|
||
this approach on these instances.
|
||
𝑒𝑠𝑎𝑛𝑖 ≡ 𝛴 ∗ (⟨+⟩ +′ +′′ + &)𝛴 ∗ .
|
||
|
||
10
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
|
||
|
||
Fig. 5. Distribution of problem instances w.r.t. counting bounds.
|
||
|
||
|
||
9.3. Evaluation of the technical choices and optimizations in Sections 6 -8 RECL constraints, the optimization for the nesting of counting operators
|
||
and a complement operator plays a greater role since it allows solving
|
||
At first, we do experiments to evaluate the effectiveness of the around 1/3 more instances.
|
||
automata size-reduction techniques (i.e., Step 2) in Section 6. Then
|
||
we justify our technical choices of reducing the satisfiability prob- Evaluation of the optimizations for the model generation in Section 8.
|
||
lem to that of LIA formulas and resort to the SMT solvers to solve We compare the performance of OSTRICHRECL with OSTRICHRECL −MODEL
|
||
,
|
||
them. Specifically, we compare OSTRICHRECL with a variant where where OSTRICHRECL −MODEL
|
||
is obtained from OSTRICHRECL by removing
|
||
the nuXmv model checker like [18] is used to solve the 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) the optimizations for the model generation.
|
||
problem. We also evaluate the effectiveness of the optimizations in The experiment results can be found in Table 5. From the results, we
|
||
Section 7, i.e., the optimizations for the nesting of counting operators as can see that compared to OSTRICHRECL , OSTRICHRECL −MODEL
|
||
solves 349
|
||
well as the nesting of counting operators and a complement operator. fewer instances and is 23% slower on average (3.57 s versus 2.88 s).
|
||
Finally, we evaluate the effectiveness of the optimizations for the model As a result, the optimization for the model generation does improve
|
||
generation in Section 8. the performance, especially when the sizes of the CEFAs are huge. Note
|
||
that the optimization for model generation only affects the satisfiable
|
||
Evaluation of the technical choices and optimizations in Section 6. We instances, that is, those instances where the generated LIA formulas can
|
||
compare OSTRICHRECL with OSTRICHRECL NUXMV , which is a variant of be solved, but the generation of models for these RECL constraints is
|
||
OSTRICHRECL where the nuXmv model checker is used to solve the unsuccessful.
|
||
𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) problem. Moreover, we also compare OSTRICHRECL with
|
||
OSTRICHRECL
|
||
−ASR
|
||
, which is obtained from OSTRICHRECL by removing the
|
||
automata size-reduction technique (i.e. Step 2 in Section 6.2). 10. Conclusion
|
||
The experiment results can be found in Table 3. From the results,
|
||
we can see that OSTRICHRECL solves 2532 more instances and is 2.38 This work proposed an efficient automata-theoretical approach for
|
||
times faster than OSTRICHRECLNUXMV . Therefore, the technical choice in solving string constraints with regex-counting and string-length. The
|
||
Section 6 where LIA formulas are computed and then solved by the SMT approach is based on encoding counting operators in regular expres-
|
||
solvers is reasonable, since OSTRICHRECL with this technical choice sions by cost registers symbolically instead of unfolding them explicitly.
|
||
is more efficient than OSTRICHRECL
|
||
NUXMV , where nuXmv is used to solve Moreover, this work proposed various optimization techniques to im-
|
||
the 𝖭𝖤𝖫𝖨𝖠 (𝖢𝖤𝖥𝖠) problem. Moreover, OSTRICHRECL solves 1083 more prove the performance further. Finally, we used three benchmark suites
|
||
instances and is 1.65 times faster than OSTRICHRECL
|
||
−ASR
|
||
. From the results, comprising 49,843 instances in total to evaluate the performance of our
|
||
we can see that the automata-size reduction technique indeed plays an approach. The experimental results show that our approach can solve
|
||
important role for the performance improvement. more instances than the best state-of-the-art solver, at a comparable
|
||
Evaluation of the optimizations in Section 7. We also compare the perfor- or faster speed, especially when the counting and length bounds are
|
||
mance of OSTRICHRECL with its variants OSTRICHRECL and large or when counting operators are nested with some other counting
|
||
−NEST
|
||
OSTRICHRECL , where OSTRICH RECL
|
||
is obtained from OSTRICH RECL operators or complement operators.
|
||
−COMP −NEST
|
||
by removing the optimization for the nesting of counting operators,
|
||
RECL
|
||
and OSTRICH−COMP is obtained by removing the optimization for CRediT authorship contribution statement
|
||
the nesting of counting operators and a complement operator. Since
|
||
these optimizations are not targeted for register-representable RECL Denghang Hu: Writing – original draft, Methodology. Zhilin Wu:
|
||
constraints and there are very few non-register-representable RECL con- Writing – review & editing.
|
||
straints in the RegCoL and AutomatArk benchmark suites, we restrict
|
||
our attention to the NestC benchmark suite here.
|
||
The experiment results can be found in Table 4. From the results, we Declaration of competing interest
|
||
can see that on 1000 problem instances, OSTRICHRECL solves 82/339
|
||
more instances and is 1.02/1.92 times faster than OSTRICHRECL −NEST
|
||
/ The authors declare that they have no known competing finan-
|
||
OSTRICHRECL−COMP
|
||
respectively. As a result, while both optimizations do cial interests or personal relationships that could have appeared to
|
||
play a role in the performance improvement in solving these complex influence the work reported in this paper.
|
||
|
||
11
|
||
D. Hu and Z. Wu Journal of Systems Architecture 160 (2025) 103340
|
||
|
||
|
||
References [19] W. Gelade, M. Gyssens, W. Martens, Regular expressions with counting: Weak
|
||
versus strong determinism, SIAM J. Comput. 41 (1) (2012) 160–190, http:
|
||
//dx.doi.org/10.1137/100814196.
|
||
[1] L.M. de Moura, N. Bjørner, Z3: An efficient SMT solver, in: Tools and Algorithms
|
||
[20] H. Chen, P. Lu, Checking determinism of regular expressions with counting,
|
||
for the Construction and Analysis of Systems, 14th International Conference,
|
||
Inform. and Comput. 241 (2015) 302–320, http://dx.doi.org/10.1016/j.ic.2014.
|
||
TACAS 2008, Held As Part of the Joint European Conferences on Theory and
|
||
12.001.
|
||
Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008.
|
||
[21] B. Loring, D. Mitchell, J. Kinder, Sound regular expression semantics for dynamic
|
||
Proceedings, 2008, pp. 337–340, http://dx.doi.org/10.1007/978-3-540-78800-
|
||
symbolic execution of JavaScript, in: Proceedings of the 40th ACM SIGPLAN
|
||
3_24.
|
||
Conference on Programming Language Design and Implementation, PLDI 2019,
|
||
[2] T. Liang, A. Reynolds, C. Tinelli, C. Barrett, M. Deters, A DPLL(T) theory solver
|
||
Phoenix, AZ, USA, June 22-26, 2019, ACM, 2019, pp. 425–438, http://dx.doi.
|
||
for a theory of strings and regular expressions, in: CAV, 2014, pp. 646–662,
|
||
org/10.1145/3314221.3314645.
|
||
http://dx.doi.org/10.1007/978-3-319-08867-9_43.
|
||
[22] T. Chen, A. Flores-Lamas, M. Hague, Z. Han, D. Hu, S. Kan, A.W. Lin, P.
|
||
[3] H. Barbosa, C. Barrett, M. Brain, G. Kremer, H. Lachnitt, M. Mann, A. Mohamed, Rümmer, Z. Wu, Solving string constraints with regex-dependent functions
|
||
M. Mohamed, A. Niemetz, A. Nötzli, A. Ozdemir, M. Preiner, A. Reynolds, Y. through transducers with priorities and variables, Proc. ACM Program. Lang.
|
||
Sheng, C. Tinelli, Y. Zohar, CVC5: A versatile and industrial-strength SMT solver, 6 (POPL) (2022) 1–31, http://dx.doi.org/10.1145/3498707.
|
||
in: D. Fisman, G. Rosu (Eds.), Tools and Algorithms for the Construction and [23] M. Kaminski, N. Francez, Finite-memory automata, in: Proceedings [1990] 31st
|
||
Analysis of Systems, Springer International Publishing, Cham, 2022, pp. 415–442. Annual Symposium on Foundations of Computer Science, Vol. 2, 1990, pp.
|
||
[4] Y. Zheng, X. Zhang, V. Ganesh, Z3-str: a Z3-based string solver for web 683–688, http://dx.doi.org/10.1109/FSCS.1990.89590.
|
||
application analysis, in: ESEC/SIGSOFT FSE, 2013, pp. 114–124, http://dx.doi. [24] L. D’Antoni, T. Ferreira, M. Sammartino, A. Silva, Symbolic register automata, in:
|
||
org/10.1145/2491411.2491456. I. Dillig, S. Tasiran (Eds.), Computer Aided Verification, Springer International
|
||
[5] Y. Zheng, V. Ganesh, S. Subramanian, O. Tripp, J. Dolby, X. Zhang, Effective Publishing, Cham, 2019, pp. 3–21.
|
||
search-space pruning for solvers of string equations, regular expressions and [25] M.L. Minsky, Computation: Finite and Infinite Machines, in: Prentice-Hall Series
|
||
length constraints, in: Computer Aided Verification - 27th International Confer- in Automatic Computation, Prentice-Hall, 1967.
|
||
ence, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I, [26] F. Klaedtke, H. Rueß, Monadic second-order logics with cardinalities, in: J.C.M.
|
||
Springer, 2015, pp. 235–254, http://dx.doi.org/10.1007/978-3-319-21690-4_14. Baeten, J.K. Lenstra, J. Parrow, G.J. Woeginger (Eds.), Automata, Languages and
|
||
[6] M. Berzish, V. Ganesh, Y. Zheng, Z3str3: A string solver with theory-aware Programming, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003, pp. 681–696.
|
||
heuristics, in: 2017 Formal Methods in Computer Aided Design, FMCAD 2017, [27] L. Turoňová, L. Holík, O. Lengál, O. Saarikivi, M. Veanes, T. Vojnar, Regex
|
||
Vienna, Austria, October 2-6, 2017, 2017, pp. 55–59, http://dx.doi.org/10. matching with counting-set automata, Proc. ACM Program. Lang. 4 (OOPSLA)
|
||
23919/FMCAD.2017.8102241. (2020) http://dx.doi.org/10.1145/3428286.
|
||
[7] M. Berzish, Z3str4: A Solver for Theories over Strings (Ph.D. thesis), University [28] L. Holík, J. Síc, L. Turonová, T. Vojnar, Fast matching of regular patterns with
|
||
of Waterloo, Ontario, Canada, 2021, URL https://hdl.handle.net/10012/17102. synchronizing counting, in: O. Kupferman, P. Sobocinski (Eds.), FoSSaCS 2023,
|
||
in: Lecture Notes in Computer Science, vol. 13992, Springer, 2023, pp. 392–412.
|
||
[8] M. Berzish, J.D. Day, V. Ganesh, M. Kulczynski, F. Manea, F. Mora, D. Nowotka,
|
||
[29] A. Le Glaunec, L. Kong, K. Mamouras, Regular expression matching using bit
|
||
Towards more efficient methods for solving regular-expression heavy string
|
||
vector automata, Proc. ACM Program. Lang. 7 (OOPSLA1) (2023) http://dx.doi.
|
||
constraints, Theoret. Comput. Sci. 943 (2023) 50–72.
|
||
org/10.1145/3586044.
|
||
[9] Diep Bui and contributors, Z3-trau, 2019, https://github.com/diepbp/z3-trau.
|
||
[30] D. Hu, Z. Wu, String constraints with regex-counting and string-length solved
|
||
[10] P.A. Abdulla, M.F. Atig, Y.-F. Chen, B.P. Diep, J. Dolby, P. Janků, H.-H. Lin, L. more efficiently, in: H. Hermanns, J. Sun, L. Bu (Eds.), Dependable Software
|
||
Holík, W.-C. Wu, Efficient handling of string-number conversion, in: Proceedings Engineering. Theories, Tools, and Applications, Springer Nature Singapore,
|
||
of the 41st ACM SIGPLAN Conference on Programming Language Design and Singapore, 2024, pp. 1–20.
|
||
Implementation, in: PLDI 2020, Association for Computing Machinery, New York, [31] J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages and
|
||
NY, USA, 2020, pp. 943–957, http://dx.doi.org/10.1145/3385412.3386034. Computation, Addison-Wesley, 1979.
|
||
[11] T. Chen, M. Hague, A.W. Lin, P. Rümmer, Z. Wu, Decision procedures for path [32] H. Seidl, T. Schwentick, A. Muscholl, P. Habermehl, Counting in trees for free, in:
|
||
feasibility of string-manipulating programs with complex operations, PACMPL 3 Automata, Languages and Programming: 31st International Colloquium, ICALP
|
||
(POPL) (2019) http://dx.doi.org/10.1145/3290362. 2004, Turku, Finland, July 12-16, 2004. Proceedings, 2004, pp. 1136–1149,
|
||
[12] H.-E. Wang, S.-Y. Chen, F. Yu, J.-H.R. Jiang, A symbolic model checking http://dx.doi.org/10.1007/978-3-540-27836-8_94.
|
||
approach to the analysis of string and length constraints, in: Proceedings of the [33] K.N. Verma, H. Seidl, T. Schwentick, On the complexity of equational Horn
|
||
33rd ACM/IEEE International Conference on Automated Software Engineering, clauses, in: CADE, 2005, pp. 337–352.
|
||
in: ASE 2018, ACM, 2018, pp. 623–633, http://dx.doi.org/10.1145/3238147. [34] C. Haase, A survival guide to Presburger arithmetic, ACM SIGLOG News 5 (3)
|
||
3238189. (2018) 67–82, http://dx.doi.org/10.1145/3242953.3242964.
|
||
[13] C. Chapman, K.T. Stolee, Exploring regular expression usage and context in [35] R. Cavada, A. Cimatti, M. Dorigatti, A. Griggio, A. Mariotti, A. Micheli, S.
|
||
Python, in: A. Zeller, A. Roychoudhury (Eds.), Proceedings of the 25th Interna- Mover, M. Roveri, S. Tonetta, The nuXmv symbolic model checker, in: Computer
|
||
tional Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Aided Verification - 26th International Conference, CAV 2014, Held As Part
|
||
Germany, July 18-20, 2016, ACM, 2016, pp. 282–293, http://dx.doi.org/10. of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18-22, 2014.
|
||
1145/2931037.2931073. Proceedings, 2014, pp. 334–342.
|
||
[14] J.C. Davis, C.A. Coghlan, F. Servant, D. Lee, The impact of regular expression [36] K. Thompson, Programming techniques: Regular expression search algorithm,
|
||
denial of service (ReDoS) in practice: An empirical study at the ecosystem Commun. ACM 11 (6) (1968) 419–422, http://dx.doi.org/10.1145/363347.
|
||
scale, in: Proceedings of the 2018 26th ACM Joint Meeting on European 363387.
|
||
Software Engineering Conference and Symposium on the Foundations of Software [37] J.C. Davis, L.G. Michael IV, C.A. Coghlan, F. Servant, D. Lee, Why aren’t
|
||
Engineering, in: ESEC/FSE 2018, Association for Computing Machinery, New regular expressions a Lingua Franca? An empirical study on the re-use and
|
||
York, NY, USA, 2018, pp. 246–256. portability of regular expressions, in: Proceedings of the 2019 27th ACM Joint
|
||
Meeting on European Software Engineering Conference and Symposium on
|
||
[15] P. Wang, K.T. Stolee, How well are regular expressions tested in the wild? in:
|
||
the Foundations of Software Engineering, in: ESEC/FSE 2019, Association for
|
||
G.T. Leavens, A. Garcia, C.S. Pasareanu (Eds.), Proceedings of the 2018 ACM
|
||
Computing Machinery, New York, NY, USA, 2019, pp. 443–454, http://dx.doi.
|
||
Joint Meeting on European Software Engineering Conference and Symposium on
|
||
org/10.1145/3338906.3338909.
|
||
the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena
|
||
[38] T. Chen, Y. Chen, M. Hague, A.W. Lin, Z. Wu, What is decidable about string
|
||
Vista, FL, USA, November 04-09, 2018, ACM, 2018, pp. 668–678.
|
||
constraints with the ReplaceAll function, PACMPL 2 (POPL) (2018) 3:1–3:29,
|
||
[16] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, D. Song, A symbolic
|
||
http://dx.doi.org/10.1145/3158091.
|
||
execution framework for JavaScript, in: 2010 IEEE Symposium on Security and [39] M. Berzish, M. Kulczynski, F. Mora, F. Manea, J.D. Day, D. Nowotka, V. Ganesh,
|
||
Privacy, 2010, pp. 513–528, http://dx.doi.org/10.1109/SP.2010.38. An SMT solver for regular expressions and linear arithmetic over string length, in:
|
||
[17] T. Liang, N. Tsiskaridze, A. Reynolds, C. Tinelli, C.W. Barrett, A decision A. Silva, K.R.M. Leino (Eds.), Computer Aided Verification, Springer International
|
||
procedure for regular membership and length constraints over unbounded Publishing, Cham, 2021, pp. 289–312.
|
||
strings, in: C. Lutz, S. Ranise (Eds.), Frontiers of Combining Systems - 10th [40] L. D’Antoni, Automatark: Automata benchmark, 2018, URL https://github.com/
|
||
International Symposium, FroCoS 2015, Wroclaw, Poland, September 21-24, lorisdanto/automatark.
|
||
2015. Proceedings, in: Lecture Notes in Computer Science, vol. 9322, Springer, [41] D. Blotsky, F. Mora, M. Berzish, Y. Zheng, I. Kabir, V. Ganesh, StringFuzz: A
|
||
2015, pp. 135–150. fuzzer for string solvers, in: H. Chockler, G. Weissenbacher (Eds.), Computer
|
||
[18] T. Chen, M. Hague, J. He, D. Hu, A.W. Lin, P. Rümmer, Z. Wu, A decision Aided Verification, Springer International Publishing, Cham, 2018, pp. 45–51.
|
||
procedure for path feasibility of string manipulating programs with integer data [42] M. Kulczynski, F. Manea, D. Nowotka, D.B. Poulsen, ZaligVinder: A generic test
|
||
type, in: D.V. Hung, O. Sokolsky (Eds.), Automated Technology for Verification framework for string solvers, J. Software: Evol. Process. 35 (4) (2023) e2400,
|
||
and Analysis - 18th International Symposium, ATVA 2020, Hanoi, Vietnam, http://dx.doi.org/10.1002/smr.2400, arXiv:https://onlinelibrary.wiley.com/doi/
|
||
October 19-23, 2020, Proceedings, in: Lecture Notes in Computer Science, 12302, pdf/10.1002/smr.2400.
|
||
Springer, 2020, pp. 325–342, http://dx.doi.org/10.1007/978-3-030-59152-6_18.
|
||
|
||
|
||
12
|
||
|