Journal of Systems Architecture 160 (2025) 103346
Fast post-quantum private set intersection from oblivious pseudorandom function for mobile social networks✩

Zhuang Shan a, Leyou Zhang a,*, Qing Wu b, Qiqi Lai c, Fuchun Guo d

a School of Mathematics and Statistics, Xidian University, Xi'an 710126, China
b School of Automation, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
c School of Computer Science, Shaanxi Normal University, Xi'an 710121, China
d Centre for Computer and Information Security Research, University of Wollongong, Wollongong, NSW 2522, Australia
ARTICLE INFO

Keywords: Mobile social networks; Private set intersection; Oblivious pseudorandom function; Private information retrieval

ABSTRACT

Mobile social networks have become integral to our daily lives, transforming communication methods and facilitating social interactions. With technological advancements, users generate vast amounts of valuable and sensitive personal data, which is stored on servers to enable instant information sharing. To protect the shared data, each platform has implemented techniques such as end-to-end encryption mechanisms, fully homomorphic encryption, etc. However, these approaches face several security and privacy challenges, including potential leaks of user data, vulnerabilities in encryption that expose private ciphertexts to probabilistic attacks, and threats posed by future quantum computers.

To address these issues, we introduce a private set intersection (PSI) protocol based on an oblivious pseudorandom function (OPRF) under the ring-LPR lattice problem. The proposed perturbed pseudorandom generator not only enhances the PSI's resistance to probabilistic attacks, but also yields a more efficient OPRF and PSI. The protocol boasts a time complexity of O(n log n) and is superior to existing well-known fast post-quantum PSI protocols operating at O(mn log(mn)), where m is the bit length of the cryptographic modulus and n represents the dimension of the security parameter. Simulation experiments and security analyses demonstrate that our proposal effectively preserves user privacy, ensures collusion resilience, verifies computation results, and maintains low computational costs. Finally, as an extension of our OPRF, we also give a fast private information retrieval (PIR) protocol.
1. Introduction

Mobile social networks have greatly enriched the ways people communicate and enhanced the convenience of social interactions. With the development of technology, users generate a large amount of useful and sensitive personal data within mobile social networks. This data often needs to be stored and processed to provide more personalized services and experiences [1,2]. However, due to the limited storage capacity of mobile social network devices, it is impossible to store all the data generated at any given moment, which presents challenges for data storage and privacy protection.

To address this issue while ensuring data confidentiality and security, many mobile social network platforms have started adopting advanced privacy-preserving technologies, such as private set intersection (PSI). This technology allows two or more parties to securely compute the intersection of their respective data sets without disclosing them. In this way, even when data is stored in distributed systems, it can effectively prevent data breaches and violations of user privacy, such as those caused by data leaks or unauthorized access. The application of PSI in mobile social networks not only enhances data security but also strengthens user trust in the platform, which is crucial for protecting user privacy and improving the platform's competitiveness. Mobile social networks can thus continue to provide a rich and vibrant social experience and efficient information services while safeguarding personal privacy. Furthermore, as an important application in the field of privacy computing, PSI has recently garnered widespread attention due to its efficiency and practicality, jointly promoting the rapid implementation of privacy computing technology and ensuring the secure flow and value extraction of data elements.
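The PSI goal described here can be pictured with a tiny sketch of the ideal functionality, i.e., what a trusted third party would compute; the contact lists are invented for illustration, and this is not the lattice-based protocol of this paper:

```python
# Ideal PSI functionality: both parties learn only the intersection of
# their private sets, nothing else. A PSI protocol emulates this trusted
# third party without ever pooling the raw sets. The example contact
# lists below are invented purely for illustration.

def ideal_psi(set_a, set_b):
    """Trusted-party view: output the intersection and nothing more."""
    return set_a & set_b

alice = {"carol@example.com", "dave@example.com", "erin@example.com"}
bob = {"dave@example.com", "frank@example.com"}

# Only the common contact is revealed; Alice never sees Bob's other
# contacts and vice versa.
print(sorted(ideal_psi(alice, bob)))
```

A real protocol must reach the same output through interaction over masked values, which is where the OPRF construction of this paper comes in.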
✩ This document is the result of a research project funded by the National Science Foundation.
* Corresponding author.
E-mail addresses: arcsec30@stu.xidian.edu.cn (Z. Shan), lyzhang@mail.xidian.edu.cn (L. Zhang), xiyouwuq@126.com (Q. Wu), laiqq@snnu.edu.cn (Q. Lai),
fuchun@uow.edu.au (F. Guo).
https://doi.org/10.1016/j.sysarc.2025.103346
Received 3 November 2024; Received in revised form 24 December 2024; Accepted 16 January 2025
Available online 25 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
There are many common construction tools for PSI [3], and oblivious transfer (OT) is one of them. An OT [4] is a crucial tool for secure multiparty computation. In this tool, the sender transmits data from a set of messages to the receiver but remains oblivious to which specific message was sent, while the receiver is unaware of the other messages they did not receive. This protocol is also known as the oblivious transfer protocol. The essence of an oblivious pseudorandom function is a pseudorandom function (PRF) enhanced with oblivious transfer capabilities.

In 1986, Goldreich, Goldwasser, and Micali introduced a new cryptographic primitive known as the pseudorandom function, whose output appears to be randomly chosen [5]. Two decades later, Naor and Reingold [6] noticed that their number-theoretic PRF allows for an interactive and oblivious evaluation, where a client with input x obtains F_k(x) for a function F_k(·) that is contributed by a server. Neither does the client learn the function (i.e., its key k), nor does the server learn x or F_k(x). Freedman et al. later called such a two-party protocol an OPRF and gave the first formal definitions and two OPRFs based on the Naor-Reingold PRF [7]. In 2009, Jarecki and Liu presented an efficient OPRF for securing intersection data [8].

Oblivious pseudorandom functions have been utilized in PSI [9]. The additional functionalities of oblivious pseudorandom functions also exhibit diversity, such as verifiable oblivious pseudorandom functions (VOPRF, [10]) and partially oblivious pseudorandom functions (POPRF, [11]).

Currently, OPRFs still face challenges, as summarized by Casacuberta, Hesse, and Lehmann [12]. Efficient OPRF constructions often rely on discrete-log or factoring-type hardness assumptions, which are vulnerable to quantum computers. This paper aims to address this by constructing OPRFs based on lattice hardness assumptions and improving their efficiency (see Figs. 1 and 2).

Fig. 1. Mobile social networks.

Fig. 2. Private set intersection.

1.1. Contributions

Regarding the open problem proposed by Casacuberta, there are currently quantum-resistant OPRFs, namely Albrecht et al.'s lattice-based VOPRF [10] and Boneh et al.'s isogeny-based OPRF [13]. Both constructions represent significant feasibility results but require further research to improve their efficiency [12]. So, a fast post-quantum private set intersection from oblivious pseudorandom function is proposed in this paper, and it has the following advantages:

• Asymmetric encryption is adopted, which is efficient and reduces the risk of privacy leakage. The PSI in this paper is constructed based on OPRF, which belongs to asymmetric encryption, thus reducing the number of interactions between users and lowering the risk of user privacy leakage. Compared to symmetric encryption, the operational cost of this asymmetric encryption is lower, reducing reliance on authoritative institutions.

• The structure of the OPRF is simple, and it is relatively efficient among post-quantum OPRFs. The OPRF used to construct PSI in this paper is based on a new lattice problem, namely the learning parity with rounding over rings problem (Ring-LPR). The Ring-LPR problem not only has a simple structure but also possesses the capability to resist quantum attacks.

• A perturbed pseudorandom generator (PPRG) can withstand probabilistic attacks. In addition to the OPRF, the PSI in this paper also includes a structure with a perturbed pseudorandom generator, which can overcome the weakness of weak encryption in symmetric encryption, thereby preventing adversaries from guessing the corresponding plaintext using statistical methods on the ciphertext ratios.

1.2. Technical overview

We adopted the oblivious transfer technique and Hamming correlation robustness, both of which are used in the OPRF construction presented in this paper. For the underlying pseudorandom function, we initially aimed to use learning parity with noise (LPN) over rings. However, this approach results in varying encryption outcomes for the same private data, preventing the recipient from matching the private data. Thus, we sought to make LPN over rings behave consistently, like learning with rounding (LWR), leading to the introduction of the concept of learning parity with rounding over rings (LPR over rings) in this paper.

To prove that LPR over rings is quantum-resistant, we established a reduction bridge between LPR over rings and LWR; that is, LPR over rings is reduced to LWR, not to LPN over rings. From (q = 2^n, p)-LWR instances, we demonstrated the hardness of (q = 2, p = 1)-LWR instances and of (q = 2, p = 1)-LWR over rings, where (q = 2, p = 1)-LWR over rings corresponds to LPR over rings. To verify that the post-quantum OPRF in this paper is computationally fast, we compared it with the LWE-instantiated OPRF from [14]. The results showed that, as the theoretical analysis suggested, the computational advantage grows as the security parameter increases.

Based on the OPRF, we constructed a private set intersection (PSI) protocol. Since the paper [15] analyzed that PSI based on symmetric encryption does not resist probabilistic attacks and proposed the concept of a perturbed pseudorandom generator, we used LPN over rings to construct a pseudorandom generator and proved that it satisfies the definition of a PPRG as given in [15].

1.3. Organizations

The structure of this paper is as follows. Section 2 provides the necessary definitions and lemmas as a foundation for the reader. Section 3 presents the construction and efficiency analysis of the OPRF, along with the definition and reduction of Ring-LPR. Section 4 details the construction of the PSI in this paper, its security proofs and the efficiency comparison with the LWE-based scheme, as well as the construction of the PPRG and the proof of its pseudorandomness. Finally, Section 5 summarizes the advantages and limitations of the PSI presented in this paper, as well as the extension of the OPRF to PIR.
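The design choice sketched in the technical overview, namely that fresh LPN noise makes two encodings of the same input differ while rounding makes the encoding repeatable, can be illustrated with a toy mod-2 sketch (dimensions and noise rate are our own choices, not the paper's parameters):

```python
import random

n = 16
random.seed(1)
A = [[random.randrange(2) for _ in range(n)] for _ in range(n)]
s = [random.randrange(2) for _ in range(n)]

def lpn_sample(A, s):
    # LPN-style encoding: A*s + e over Z_2 with fresh Bernoulli noise e,
    # so two encodings of the same s generally differ.
    e = [1 if random.random() < 0.25 else 0 for _ in range(len(A))]
    return [(sum(a * si for a, si in zip(row, s)) + ei) % 2
            for row, ei in zip(A, e)]

def rounded_sample(A, s):
    # Rounding-style encoding (the LWR/LPR idea): replace random noise by
    # a deterministic rounding of A*s, so the output is repeatable.
    return [(sum(a * si for a, si in zip(row, s)) % 4) // 2 for row in A]

print(lpn_sample(A, s) == lpn_sample(A, s))          # usually False: fresh noise
print(rounded_sample(A, s) == rounded_sample(A, s))  # always True: deterministic
```

Determinism is exactly what lets the receiver re-derive and match an encoded element, which motivates moving from LPN over rings to LPR over rings.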
2. Preliminary

Each element of a lattice Λ in R^n can be expressed as an integer linear combination of n linearly independent vectors. This set of linearly independent vectors is called a lattice basis, and the lattice basis is not unique. Given a lattice basis (v_1, …, v_n) of the lattice Λ, the fundamental parallelepiped is

P(v_1, …, v_n) = { ∑_{i=1}^{n} k_i v_i | k_i ∈ [0, 1) }.

If the lattice basis (v_1, …, v_n) is fixed, we write P(Λ) for P(v_1, …, v_n). For any x ∈ R^n, project it onto P(Λ); by the properties of projection, there is a unique y ∈ P(Λ) such that y − x ∈ Λ.

Use the symbol det(Λ) to represent the volume of the fundamental parallelepiped of the lattice Λ. In other words, det(Λ) is the determinant of the matrix composed of a lattice basis (v_1, …, v_n). For a given n-dimensional lattice, det(Λ) is the same for every choice of basis.

Given an n-dimensional lattice Λ, let (v_1, …, v_n) and (u_1, …, u_n) be two arbitrary bases of Λ. Then v_i = ∑_{j=1}^{n} m_{ij} u_j and u_i = ∑_{j=1}^{n} m'_{ij} v_j for i ∈ {1, …, n}, i.e., there are two integer matrices M and M' such that

(v_1, …, v_n)^T = M (u_1, …, u_n)^T and (u_1, …, u_n)^T = M' (v_1, …, v_n)^T.

It is easy to prove that M and M' are inverses of each other. Since both are integer matrices, det(M) · det(M') = 1 and det(M) = det(M') = ±1, so

det(v_1, …, v_n) = ± det(u_1, …, u_n).

Definition 1. An ideal is a subset of a ring (or field) that satisfies the following two properties:

1. Additive closure: If any two elements in the ideal are added, the result is still in the ideal. In other words, for any elements a and b in the ideal, a + b also belongs to that ideal.
2. Multiplicative absorptivity: If an element in the ideal is multiplied by any element in the ring (or field), the result is still in the ideal. In other words, for any element a in the ideal and any element r in the ring (or field), ar and ra belong to that ideal.

For a commutative ring, we further require that the ideal be closed under both addition and multiplication. Such an ideal is called a true ideal.

Definition 2. Referring to the definition of an ideal, the ideal lattice Λ̃ is a subset of the lattice Λ that satisfies the following two properties:

1. Additive closure: If any two elements in an ideal lattice are added, the result is still in the ideal lattice. In other words, for any elements a and b in an ideal lattice, a + b also belongs to that ideal lattice.
2. Multiplicative absorptivity: If an element in an ideal lattice is multiplied by an element in any other ideal lattice, the result remains in the ideal lattice. In other words, for any element a in the ideal lattice and any element r in another ideal lattice, both ar and ra belong to that ideal lattice.

Corollary 1. The ideal lattice Λ̃ is a true ideal of the lattice Λ.

A polynomial f(x) = a_0 + a_1 x + ⋯ + a_{n−1} x^{n−1} is mapped to

Rot(f) = a_0 I + a_1 X + ⋯ + a_{n−1} X^{n−1} ∈ Λ̃,

where Λ̃ is the image of Z[x]/⟨x^n + 1⟩ in the ideal lattice Λ, and

X = ( 0 0 0 ⋯ 0 1 ; 1 0 0 ⋯ 0 0 ; 0 1 0 ⋯ 0 0 ; 0 0 1 ⋯ 0 0 ; ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ; 0 0 0 ⋯ 1 0 ).

So there is

Rot(f) = ( a_0 a_{n−1} ⋯ a_1 ; a_1 a_0 ⋯ a_2 ; ⋮ ⋮ ⋱ ⋮ ; a_{n−1} a_{n−2} ⋯ a_0 ),

and it is easy to prove that this mapping is an isomorphism.

Definition 3 (Learning with Rounding, [16,17]). Let λ be the security parameter and n = n(λ), m = m(λ), q = q(λ), p = p(λ) be integers. The LWR problem states that for A ∈ Z_q^{m×n}, s ∈ Z_q^n, u ∈ Z_q^m, the following distributions are computationally indistinguishable: (A, ⌊As⌋_p) ≈_C (A, ⌊u⌋_p). Here ⌊x⌋_p = ⌊(p/q)x⌋, where ⌊·⌋ is the floor function, which rounds down to the nearest integer; for example, ⌊3.14⌋ = 3 and ⌊3⌋ = 3.

Definition 4 (Learning Parity with Noise, [18,19]). Let λ be the security parameter and n = n(λ), m = m(λ) be integers. The LPN problem states that for A ∈ Z_2^{m×n}, s ∈ Z_2^n, u, e ∈ Z_2^m, the following distributions are computationally indistinguishable: (A, As + e) ≈_C (A, u).

Definition 5 (Hamming Correlation Robustness, [14]). For a hash function H(·) and a pseudorandom function F_k(·) with key k, H(·) is Hamming correlation robust if H(x) ≈_C F_k(x).

Definition 6 (OT). The message sender sends data to the receiver from a set of pending messages but remains oblivious to which specific message was sent. Meanwhile, the receiver is unaware of the additional data they did not request. This protocol is also known as oblivious transfer.

Definition 7 (OPRF, [20]). Let the PRF key k consist of two bit-strings q, s ∈ {0,1}^λ. Let F(·) be a pseudorandom code that produces a pseudorandom string, and let H be a hash function. The pseudorandom function is computed as

OPRF_k(x) = H(q ⊕ [F(x) · s]),

where · denotes bitwise AND and ⊕ denotes bitwise XOR. For a randomly generated s, if F(x) has enough Hamming weight, then the function OPRF_k(x) is pseudorandom, assuming the hash function H is correlation robust.

Definition 8 (PSI, [14]). PSI enables two parties, each holding a private set of elements, to compute the intersection of the two sets while revealing nothing more than the intersection itself.

Definition 9 (Dihedral Coset Problem). Given a security parameter κ, an instance of the DCP_q^ℓ problem, where q denotes the modulus and ℓ represents the number of states, consists of states of the form

|0⟩|x_i⟩ + |1⟩|(x_i + s) mod q⟩, i ∈ [ℓ],

each storing 1 + ⌈log_2 q⌉ bits, where x_i ∈_R Z_q^n and s ∈ Z_q^n. If s can be computed with probability poly(1/log q) in time poly(log q), then the DCP_q^ℓ problem is considered to be broken.
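Definition 7's masking structure can be sketched with standard-library stand-ins: SHA-256 plays the correlation-robust hash H, and a hash-based expander plays the pseudorandom code F. Both instantiations and the parameter LAM are placeholders of ours, not the paper's choices:

```python
import hashlib
import random

LAM = 128  # toy security parameter (our choice, not the paper's)

def F(x: bytes, nbits: int) -> int:
    # Placeholder pseudorandom code: expand x into an nbits-bit string.
    h = hashlib.sha256(b"prc|" + x).digest()
    return int.from_bytes(h, "big") % (1 << nbits)

def oprf(k, x: bytes) -> bytes:
    # Definition 7: OPRF_k(x) = H(q XOR (F(x) AND s)), with key k = (q, s).
    q, s = k
    masked = q ^ (F(x, LAM) & s)
    return hashlib.sha256(masked.to_bytes(LAM // 8, "big")).digest()

random.seed(0)
key = (random.getrandbits(LAM), random.getrandbits(LAM))
assert oprf(key, b"alice") == oprf(key, b"alice")  # deterministic per key
assert oprf(key, b"alice") != oprf(key, b"bob")    # distinct inputs differ
```

The oblivious part of the real primitive, i.e., evaluating this function without the client learning k or the server learning x, is what the OT-based construction of Section 3 provides.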
Note 1. The Dihedral Coset Problem is a difficult problem in quantum computing, and solving it has a time complexity of O(e^n) or O(n!).

Lemma 1. If an efficient algorithm A can solve DCP_2^ℓ in polynomial time, then there exists an efficient algorithm B that can solve DCP_q^ℓ in polynomial time.

Proof. Suppose q = 2^n and there exists an efficient algorithm A that can solve DCP_2^ℓ in polynomial time. For instances of DCP_4^ℓ, we have

|0⟩|x_i⟩ + |1⟩|(x_i + s) mod 4⟩ = |0⟩|x_i'⟩ + |1⟩|(x_i' + s') mod 2⟩ + 2(|0⟩|x_i''⟩ + |1⟩|(x_i'' + s'') mod 2⟩), i ∈ [ℓ],

so running the algorithm A twice will solve DCP_{4=2^2}^ℓ. Similarly, running A four times will solve DCP_{16=2^4}^ℓ, and continuing in this manner, running the algorithm A n times will solve DCP_q^ℓ. Let O(A) represent the time complexity of the algorithm A. Thus, the resulting algorithm B runs in time nO(A), and B is an efficient algorithm. □

Definition 10 (Extrapolated Dihedral Coset Problem with Modulus 2, [21]). Given a security parameter κ, an instance of EDCP_{n,2,ρ}^ℓ is provided, where 2 denotes the modulus, ρ represents the probability density function, and ℓ denotes the number of states. Each state is expressed as

∑_{j∈supp(ρ)} ρ(j)|j⟩|(x_i + j·s) mod 2⟩, i ∈ [ℓ],

and stores 2 bits, where x_i ∈_R Z_2^n and s ∈ Z_2^n. If s can be determined with probability poly(1/(n log 2)) in time poly(n log 2), then the EDCP_{n,2,ρ}^ℓ problem is considered to be broken.

Lemma 2. If there exists an algorithm for solving EDCP_{n,2,ρ}^ℓ, then this algorithm can also solve DCP_2^ℓ.

Proof. Let

|b⟩ = (1/√2)|0⟩|x_i⟩ + (1/√2)|1⟩|(x_i + s) mod 2⟩.

Thus, ρ(0)|0⟩ = (1/√2)|0⟩ and ρ(1)|1⟩ = (1/√2)|1⟩. Hence, DCP_2^ℓ is a special case of EDCP_{n,2,ρ}^ℓ. Therefore, if there exists an algorithm for solving EDCP_{n,2,ρ}^ℓ, this algorithm can also solve DCP_2^ℓ. □

Lemma 3 ([21]). Let (n, q, r = Ω(√κ)) be an instance of G-EDCP and (n, q, α) be an instance of LWE. If there exists an algorithm for solving LWE_{n,q,α}, then there exists an algorithm for solving G-EDCP_{n,q,ρ_r}^ℓ.

Corollary 2. Let (n, 2, r = Ω(√κ)) be an instance of G-EDCP and (n, 2, α) be an instance of LPN. If there exists an algorithm for solving LPN_{n,α}, then there exists an algorithm for solving G-EDCP_{n,2,ρ_r}^ℓ.

3. Ring-LPR based OPRF

3.1. Constructing OPRF

Fig. 3 presents the ring-LPR-based oblivious pseudorandom function. In the next subsection, we prove the security of this oblivious pseudorandom function.

3.2. Security proof of OPRF

In this subsection, we provide the definition of the underlying lattice problem for the OPRF, learning parity with rounding, and its reduction proof.

Definition 11 (Learning Parity with Rounding). Let λ be the security parameter and n = n(λ), m = m(λ) be integers. The LPR problem states that for A ∈ Z_2^{m×n}, s ∈ Z_2^n, u ∈ Z_2^m, the following distributions are computationally indistinguishable: (A, ⌊As mod 4⌋_1) ≈_C (A, ⌊u⌋_1).

Definition 12 (Learning Parity with Rounding over Rings). The Ring-LPR problem states that for a, s, u ∈ R_2, the following distributions are computationally indistinguishable: (a, ⌊as mod 4⌋_1) ≈_C (a, ⌊u⌋_1).

Lemma 4. For an LWR problem instance ⌊As⌋_p, if there exists an algorithm A for solving s from ⌊As⌋_1, then there also exists an algorithm B for solving the LWR problem.

Proof. Given that there exists an algorithm A that can solve ⌊As⌋_1 = ⌊(1/q)As⌋, for an LWR problem instance ⌊As⌋_p we have

(1/p)⌊As⌋_p = (1/p)⌊(p/q)As⌋
            = (1/p)((p/q)As + e)   (e ∈ (−1, 0]^m)
            = (1/q)As + e'          (e' ∈ (−1/p, 0]^m)
            ≈ ⌊As⌋_1.

Thus, the algorithm A can be used to solve the LWR problem. □

We obtain the next corollaries from Lemma 3.

Corollary 3. Let (n, 2, r = Ω(√κ)) be an instance of G-EDCP and (n, 2, α) be an instance of 2-LWR. If there exists an algorithm for solving 2-LWR, then there exists an algorithm for solving G-EDCP_{n,2,ρ_r}^ℓ.

Corollary 4. Let (n, 2, r = Ω(√κ)) be an instance of G-EDCP and (n, 2, α) be an instance of LPR. If there exists an algorithm for solving LPR, then there exists an algorithm for solving G-EDCP_{n,2,ρ_r}^ℓ.

Lemma 5. If there exists an algorithm A for solving the Ring-LPR problem, then there also exists an algorithm B for solving the LPR problem.

Proof. For an instance of the Ring-LPR problem

b = ⌊a · s⌋_1,

where a = a_0 + a_1 x + ⋯ + a_{n−1} x^{n−1}, we can represent a as a circulant matrix, specifically

A_1 = ( a_0 a_{n−1} ⋯ a_1 ; a_1 a_0 ⋯ a_2 ; ⋮ ⋮ ⋱ ⋮ ; a_{n−1} a_{n−2} ⋯ a_0 ).

Thus,

b = ⌊a · s⌋_1 ⇒ b⃗ = ⌊A_1 s⃗⌋_1,

where a⃗ = (a_0, a_1, …, a_{n−1}) is the coefficient vector of a = a_0 + a_1 x + ⋯ + a_{n−1} x^{n−1}. We use a proof by contradiction. Suppose there exists an efficient algorithm A that can solve Ring-LPR in polynomial time. We take the first row of A_1, denote it as α_1, and have ⌊α_1 s⃗⌋_1 = b_1, where b_1 is the first component of b⃗. For the LWR problem instance β⃗ = ⌊Λs⃗⌋_1, assume
Λ^T = (α_1, α_2, …, α_m).

Thus, we use the algorithm A m times to find β_i such that ⌊γ_i⌋_1 = β_i = ⌊α_i s⃗⌋_1, and thus we can solve the equation

γ⃗ = Λs⃗, γ⃗^T = (γ_1, …, γ_m).

Assume that the time complexity of solving s from an LWR problem instance is O(Λ, β). According to Corollary 3, letting O(γ⃗ = Λs⃗) be the computational complexity of solving the equation γ⃗ = Λs⃗, we have

mO(A) + O(γ⃗ = Λs⃗) ≥ O(Λ, β) ≥ O(n!) or O(e^n).

Let m = n; then

O(A) ≥ (O(Λ, β) − O(γ⃗ = Λs⃗))/n ≥ (O(n!) − O(γ⃗ = Λs⃗))/n or (O(e^n) − O(γ⃗ = Λs⃗))/n.

This contradicts the assumption that there is an efficient algorithm A that can solve the Ring-LPR problem in polynomial time; thus the lemma holds. □

Fig. 3. Oblivious Pseudorandom Function (OPRF).
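The circulant step in the proof of Lemma 5, i.e., viewing multiplication by a in Z_2[x]/⟨x^n + 1⟩ as the matrix A_1 acting on the coefficient vector of s, can be checked numerically (toy dimensions of our own choosing; over Z_2 the reduction by x^n + 1 wraps cyclically, since −1 = 1):

```python
import random

def circulant(a):
    # Rot(a): circulant matrix of the coefficient vector a, matching the
    # matrix A_1 in the proof of Lemma 5 (row i, column j holds a[(i-j) mod n]).
    n = len(a)
    return [[a[(i - j) % n] for j in range(n)] for i in range(n)]

def ring_mul_mod2(a, s):
    # Multiply a(x)*s(x) in Z_2[x]/<x^n + 1>; over Z_2 we have x^n = -1 = 1,
    # so reduction wraps coefficient indices around cyclically.
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, sj in enumerate(s):
            out[(i + j) % n] = (out[(i + j) % n] + ai * sj) % 2
    return out

random.seed(7)
n = 8
a = [random.randrange(2) for _ in range(n)]
s = [random.randrange(2) for _ in range(n)]

A1 = circulant(a)
matvec = [sum(A1[i][j] * s[j] for j in range(n)) % 2 for i in range(n)]
assert matvec == ring_mul_mod2(a, s)
```

Each row α_i of A_1 then yields one inner-product sample, which is exactly how the proof feeds a Ring-LPR instance to the LWR solver row by row.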
3.3. Efficiency analysis

This section simulates the computational efficiency of the OPRF in this paper and the OPRF in [14] on a Mac, a Pad, and a Phone. The PRF of [14] is instantiated based on LWE.

3.3.1. Efficiency analysis on Mac

The tool used in this subsection is Python 3.12; the programs are run on a MacBook Air with an Apple M1 chip and 8.00 GB of RAM (see Fig. 4).

3.3.2. Efficiency analysis on mobile pad

The tool used in this subsection is Pydroid 3; the programs are run on a Xiaomi Pad 6 Pro with a Qualcomm(R) Snapdragon 8+ mobile platform @3.2 GHz (with Qualcomm AI Engine) and 8.00+3.00 GB of RAM (see Fig. 5).

Fig. 4. Parallel comparison of OPRF on Mac, where n represents the security parameter; unit is microseconds.
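The timing setup in these subsections can be reproduced with a minimal harness such as the following; the workload shown is a stand-in mod-2 matrix-vector product, not the authors' actual OPRF code, and all parameter values are our own:

```python
import random
import time

def time_op(fn, repeats=50):
    # Median-of-repeats wall-clock timing, reported in microseconds,
    # the unit used in Figs. 4 and 5.
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e6)
    return sorted(samples)[len(samples) // 2]

def toy_workload(n):
    # Stand-in for one evaluation at security parameter n:
    # a random mod-2 matrix-vector product of dimension n.
    A = [[random.randrange(2) for _ in range(n)] for _ in range(n)]
    s = [random.randrange(2) for _ in range(n)]
    return lambda: [sum(r[j] * s[j] for j in range(n)) % 2 for r in A]

for n in (50, 100, 200):
    print(n, round(time_op(toy_workload(n)), 1), "us")
```

Taking the median rather than the mean damps the scheduling fluctuations that are visible in the mobile-pad measurements.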
3.3.3. Summary of data comparison

From the simulation results, it can be seen that for n ≤ 250, the LWE-based OPRF in [14] is slightly faster, while for n > 250, the ring-LPR-based OPRF in this paper is faster. Furthermore, as n increases, the advantages of ring-LPR become more pronounced. Based on the simulation results for the Pad, the OPRF in this paper is more stable; although there are fluctuations, they are less significant compared to the LWE-based OPRF in [14].

4. PSI based on OPRF

In this paper, apart from the OPRF, another tool used in the construction of the PSI is a perturbed pseudorandom generator [15]. The perturbed pseudorandom generator in this paper is constructed from Ring-LPN.
Fig. 5. Parallel comparison of OPRF on mobile pads, where n represents the security parameter; unit is microseconds.

Next, we present the reduction process for Ring-LPN.

4.1. Reduction of Ring-LPN

Definition 13 (Learning Parity with Noise over Rings). The learning parity with noise over rings problem states that for a, s, e, u ∈ R_2, the following distributions are computationally indistinguishable: (a, as + e) ≈_C (a, u).

Corollary 5. If there exists an efficient algorithm A that can solve the Ring-LPN problem in polynomial time, then there also exists an algorithm that can solve the LPN problem.

Proof. The proof method is similar to that of Lemma 5, but here the computational complexity of A decreases. If we want the Ring-LPN problem to be approximately as hard as the LPN problem, then for the security parameters κ_1 of the Ring-LPN problem and κ_2 of the LPN problem, we need

e^{κ_1}/κ_1^2 ≥ e^{κ_2}, or (κ_1)!/κ_1^2 ≥ (κ_2)!.

Thus, we can roughly obtain κ_1 ≥ 1.5κ_2 and κ_2 ≥ 12. Note that O(n) is an asymptotically large quantity with respect to n. We use the most extreme case to determine the relationship between κ_1 and κ_2. □

4.2. Perturbed pseudorandom generator

Definition 14. Let a = a_0 + a_1 x + ⋯ + a_{n−1} x^{n−1} ∈ R_2. Define the norm of a as

‖a‖ = √(∑_{i=0}^{n−1} |a_i|^2).

Definition 15 ([15]). A pseudorandom generator with perturbation, denoted G_γ(·), is defined such that for x_1, x_2 ∈ X, there exists γ satisfying the following conditions:

1. When x_1 = x_2, Pr(G_γ(x_1) = G_γ(x_2)) ≤ O(exp(−n)) and ‖G_γ(x_1) − G_γ(x_2)‖ < γ;
2. When x_1 ≠ x_2, there exists N such that ‖G_γ(x_1) − G_γ(x_2)‖ ≥ γN, where clearly N = 1 is optimal.

Theorem 1. The Ring-LPN problem itself can be viewed as a pseudorandom generator with perturbation.

Proof. We prove each property separately. First, when x_1 = x_2, we have

Pr(G_γ(x_1) = G_γ(x_2)) = Pr(e_1 = e_2) = 1/2^n.

Additionally, set γ = n + 1, so

‖(Ax_1 + e_1) − (Ax_2 + e_2)‖ = ‖e_1 − e_2‖ < γ.

When x_1 ≠ x_2, set v_1 = G_γ(x_1) and v_2 = G_γ(x_2); then

Pr(‖v_1 − v_2‖ ≤ √n) = ∑_{k=0}^{n} C_n^k (1/3)^k (1/2)^{n−k} + ∑_{k=0}^{⌊n/2⌋} C_n^k (1/3)^k (1/6)^k (1/2)^{n−2k}.

Because

∑_{k=0}^{n} C_n^k (1/3)^k (1/2)^{n−k} = (1/2^n)((2/3) + (2/3)^2 + ⋯ + (2/3)^n) ≤ (3/2^n)(1 − (2/3)^n),

and

∑_{k=0}^{⌊n/2⌋} C_n^k (1/3)^k (1/6)^k (1/2)^{n−2k} ≤ (3·6/17)(1/2^n)(1 − (1/(3·6·2^n))^{2n}),

therefore

Pr(‖v_1 − v_2‖ ≤ √n) < (√n + 1)/2^n ≤ √n/2^{n−1}.

Thus, there is a very high probability that ‖v_1 − v_2‖ ≥ n + 1, and N = 1 (see Fig. 6). □

Fig. 6. Pseudorandom generator with perturbation G_γ(·).

4.3. PSI based on OPRF

Lemma 6. Assuming f(y) ≈_C u_1 and g(u_1) ≈_C u_2, then (g∘f)(y) ≈_C u_2.
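Theorem 1's same-input behavior can be probed with a toy Ring-LPN-style generator (matrix dimension and Bernoulli noise rate are our own toy choices): two evaluations on the same input differ only by their noise vectors, so the Hamming distance stays below the bound γ = n + 1 used in the proof.

```python
import random

n = 32
random.seed(3)
A = [[random.randrange(2) for _ in range(n)] for _ in range(n)]

def G(x):
    # Perturbed generator in the spirit of Theorem 1: A*x + e over Z_2,
    # with a fresh Bernoulli noise vector e acting as the perturbation.
    e = [1 if random.random() < 0.25 else 0 for _ in range(n)]
    return [(sum(a * xi for a, xi in zip(row, x)) + ei) % 2
            for row, ei in zip(A, e)]

x = [random.randrange(2) for _ in range(n)]
v1, v2 = G(x), G(x)

# Same input: the outputs differ only by the noise, so the (Hamming) norm
# of the difference stays below the bound gamma = n + 1 from the proof.
diff = sum(abs(b1 - b2) for b1, b2 in zip(v1, v2))
assert diff < n + 1
```

An exact collision G(x) = G(x) would require the two noise vectors to coincide, which happens with the exponentially small probability analyzed in the proof.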
Fig. 7. PSI based on OPRF.

Fig. 8. Parallel comparison of PSI on Mac, where n represents the security parameter; unit is microseconds.

Fig. 9. Parallel comparison of PSI on mobile pads, where n represents the security parameter; unit is microseconds.

Fig. 10. Comparison of PSI on mobile phones, where n represents the security parameter; unit is microseconds.
Fig. 11. PIR based on OPRF.

Lemma 7. Find a suitable pseudorandom function F̃_k : {0,1}^λ × {0,1}^* → {0,1}^ω. Assuming that the pseudorandom function F_k : {0,1}^λ × {0,1}^{ℓ_1} → {0,1}^ω and the hash function H_1 : {0,1}^* → {0,1}^{ℓ_1} are indistinguishable from random, we have

F̃_k(y) ≈_C F_k(H_1(y)).

Proof. On one hand, because of the pseudorandom function F̃_k : {0,1}^λ × {0,1}^* → {0,1}^ω, for any k ∈ {0,1}^λ and y ∈ Y ⊂ {0,1}^*, we have F̃_k(y) ≈_C u_ω ∈ {0,1}^ω.

On the other hand, due to the pseudorandom function F_k : {0,1}^λ × {0,1}^{ℓ_1} → {0,1}^ω, for u_{ℓ_1} ∈ {0,1}^{ℓ_1}, we have F_k(u_{ℓ_1}) ≈_C u_ω. According to the property of the hash function, we have H_1(y) ≈_C u_{ℓ_1}. Combining with Lemma 6, one can obtain that F_k(H_1(y)) ≈_C u_ω. Consequently, F̃_k(y) ≈_C F_k(H_1(y)). □

Theorem 2. If H_1 is a collision-resistant hash function and H_2 and H_3 are Hamming correlation robust, then the protocol in Fig. 7 securely realizes PSI in the semi-honest model when the parameters m, w are chosen as described in [14].

Proof. Perspective from P_1.

Hyb0: P_1's view and P_2's output in the real protocol.

Hyb1: Same as Hyb0 except that on P_2's side, for each i ∈ [ω], if s[i] = 0, then sample A_i ← {0,1}^m and compute B_i = A_i ⊕ D_i; otherwise sample B_i ← {0,1}^m and compute A_i = B_i ⊕ D_i. This hybrid is identical to Hyb0.

Hyb2: Initialize an m × w binary matrix D to all ones. Denote its column vectors by D_1, …, D_ω, so D_1 = ⋯ = D_ω = 1^m. For y ∈ Y, randomly select v ← [m]^ω, and set D_i[v[i]] = 0 for all i ∈ [ω].

Hyb3: Find a suitable pseudorandom function F̃_k : {0,1}^λ × {0,1}^* → {0,1}^ω. For y ∈ Y, compute ṽ = F̃_k(y), randomly select v ← [m]^ω, and set D_i[v[i]] = 0 for all i ∈ [ω].

Hyb4: Let there be a pseudorandom function F : {0,1}^λ × {0,1}^{ℓ_1} → {0,1}^ω and a hash function H_1 : {0,1}^* → {0,1}^{ℓ_1}. For y ∈ Y, compute v' = F_k(H_1(y)), randomly select v ← [m]^ω, and set D_i[v[i]] = 0 for all i ∈ [ω].

Hyb5: Let there be a pseudorandom function F : {0,1}^λ × {0,1}^{ℓ_1} → {0,1}^ω, a Hamming correlation robust H_2 : {0,1}^ω → [m]^ω, and a hash function H_1 : {0,1}^* → {0,1}^{ℓ_1}. For y ∈ Y, compute v' = F_k(H_1(y)) and v = H_2(v'), and set D_i[v[i]] = 0 for all i ∈ [ω].

Given that Hyb0 ≈_C Hyb1 ≈_C Hyb2 ≈_C Hyb3 and Hyb4 ≈_C Hyb5, and since according to Lemma 7 we know that Hyb3 ≈_C Hyb4, we have Hyb0 ≈_C Hyb5.

Perspective from P_2.

Hyb0: P_2's view in the real protocol.

Hyb1: ψ ← {0,1}^ℓ; all other aspects are consistent with the real protocol.

Hyb2: Introduce G_γ : {0,1}^ω → {0,1}^ℓ and a Hamming correlation robust H_3 : Z_{0,1}^{m×ω} → {0,1}^ω. Let the initial matrices be C_1 = ⋯ = C_ω = 1^m, randomly select v ∈ [m]^ω, and set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(C_1[v[1]] ‖ ⋯ ‖ C_ω[v[ω]]).

Fig. 12. Parallel comparison of PIR on Mac, where n represents the security parameter; unit is microseconds.
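The matrix encoding used in the hybrids, an m × ω table of ones with one zero per column at positions derived from each element, can be sketched as follows; the position-deriving PRF is a SHA-256 placeholder of ours, not the paper's F_k:

```python
import hashlib

m, w = 64, 8  # toy table height and number of columns (our choice)

def positions(elem: bytes, key: bytes):
    # Placeholder for v = F_k(H_1(y)): derive one position per column.
    h = hashlib.sha256(key + elem).digest()
    return [h[i] % m for i in range(w)]

def encode(elems, key):
    # Hyb2-style matrix D: all ones, with D_i[v[i]] = 0 for each element.
    D = [[1] * m for _ in range(w)]
    for y in elems:
        for i, p in enumerate(positions(y, key)):
            D[i][p] = 0
    return D

key = b"shared-oprf-key"
D = encode({b"dave", b"erin"}, key)

def maybe_member(y, D, key):
    # An element's derived positions all hit zeros iff it was encoded
    # (up to collisions); this is how the intersection is detected.
    return all(D[i][p] == 0 for i, p in enumerate(positions(y, key)))

assert maybe_member(b"dave", D, key)
print(maybe_member(b"frank", D, key))  # almost surely False
```

In the real protocol the key is obtained obliviously via the OPRF, so neither party can evaluate the positions of elements it does not hold.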
Hyb3: Let the initial matrices be C_1 = ⋯ = C_ω = 1^m, and find an appropriate pseudorandom function F̃_k : {0,1}^λ × {0,1}^* → {0,1}^ω. For y ∈ Y, compute ṽ = F̃_k(y), randomly select v ← [m]^ω, and set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(C_1[v[1]] ‖ ⋯ ‖ C_ω[v[ω]]).

Hyb4: Let the initial matrices be C_1 = ⋯ = C_ω = 1^m, and set a pseudorandom function F : {0,1}^λ × {0,1}^{ℓ_1} → {0,1}^ω, a hash function H_1 : {0,1}^* → {0,1}^{ℓ_1}, and a Hamming correlation robust H_3 : Z_{0,1}^{m×ω} → {0,1}^ω. For y ∈ Y, compute v' = F_k(H_1(y)), randomly select v ← [m]^ω, and set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(H_3(C_1[v[1]] ‖ ⋯ ‖ C_ω[v[ω]])).

Hyb5: Let the initial matrices be C_1 = ⋯ = C_ω = 1^m, and set a pseudorandom function F : {0,1}^λ × {0,1}^{ℓ_1} → {0,1}^ω, a hash function H_1 : {0,1}^* → {0,1}^{ℓ_1}, and Hamming correlation robust functions H_2 : {0,1}^ω → [m]^ω and H_3 : Z_{0,1}^{m×ω} → {0,1}^ω. For y ∈ Y, compute v' = F_k(H_1(y)) and v = H_2(v'). Set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(H_3(C_1[v[1]] ‖ ⋯ ‖ C_ω[v[ω]])).

Similarly, it can be proven that Hyb0 ≈_C Hyb5. □

Definition 16 (CPA Security Model of the Protocol in Fig. 7). Assume there exists a perturbed pseudorandom oracle machine PrM_γ (where γ is the upper bound on the norm of the perturbation in PrM_γ), such that for an input x, it outputs two values: one is a random value y_0, and the other is a pseudorandom value y_1 with x as its input.

• Setup: The simulator S generates the necessary parameters for the algorithms. The adversary A chooses s and sends it to the simulator S using OT.

• Hash Queries, PRF Queries and PRG Queries: The adversary A sequentially performs hash function queries, pseudorandom function queries, and pseudorandom generator queries. Here, the adversary cannot learn the key in the pseudorandom function queries.

• Challenge: The adversary A selects a private message m and sends it to the simulator S. The simulator queries the hash function, pseudorandom function, and oblivious transfer values of the real scheme, inputs these results into the pseudorandom oracle machine PrM_γ, obtains two ciphertexts c_0 and c_1, and sends them to the adversary A.

• Setup: The simulator S generates some necessary parameters for the algorithms and selects appropriate functions: a hash function H_1 : {0,1}^* → {0,1}^{ℓ_1}, Hamming correlation robust functions H_2 : {0,1}^ω → [m]^ω and H_3 : Z_{0,1}^{m×ω} → {0,1}^ω, a perturbed pseudorandom generator G_γ : {0,1}^ω → {0,1}^ℓ, and a pseudorandom function F : {0,1}^λ × {0,1}^{ℓ_1} → {0,1}^ω with key k ∈ {0,1}^λ. The adversary P_1 selects s and transmits s to the simulator S using OT.

• H-Query, PRF-Query and PRG-Query: The adversary P_1 makes queries about the hash functions, pseudorandom function, oblivious transfer values, and pseudorandom generator. The simulator S pre-establishes lists for handling the H-Queries, PRF-Queries, and PRG-Queries, respectively.

H_1-Query: For the i-th query x_i ∈ {0,1}^* for the value of H_1, the simulator S answers from the hash value list if available; otherwise it selects a random X_i ∈ {0,1}^{ℓ_1}, sets X_i = H_1(x_i), and updates the list accordingly.

H_2-Query: For the i-th query y_i ∈ {0,1}^ω for the value of H_2, the simulator S answers from the hash value list if available; otherwise it selects a random Y_i ∈ [m]^ω, sets Y_i = H_2(y_i), and updates the list accordingly.

H_3-Query: For the i-th query z_i ∈ Z_{0,1}^{m×ω} for the value of H_3, the simulator S answers from the hash value list if available; otherwise it selects a random Z_i ∈ {0,1}^ω, sets Z_i = H_3(z_i), and updates the list accordingly.

F-Query: For the i-th query u_i ∈ {0,1}^{ℓ_1} for the value of F, the simulator S answers from the pseudorandom function value list if available; otherwise it selects a random U_i ∈ {0,1}^ω, sets U_i = F(u_i, k), and updates the list accordingly.

G_γ-Query: For the i-th query w_i ∈ {0,1}^ω for the value of G_γ, the simulator S answers from the pseudorandom generator value list if available; otherwise it selects a random W_i ∈ {0,1}^ℓ, sets W_i = G_γ(w_i), and updates the list accordingly.

Note that G_γ is not black-box.

• Challenge: P_1 selects m ∈ M and sends it to S. S, using the corresponding hash function queries and pseudorandom function queries,
• Guessing After receiving the two ciphertexts 𝑐0 and 𝑐1 ,  guesses inputs the queried values into the black-box 𝐺𝛾 , obtaining 𝜓0 and 𝜓1 ,
which ciphertext corresponds to the encryption of 𝑚 and sends the and then sends 𝜓0 , 𝜓1 to 𝑃1 .
guess back to the simulator . • Guess Based on the received 𝜓0 and 𝜓1 , 𝑃1 guesses whether 𝜓0 or
The advantage of the adversary  is defined as the advantage of the 𝜓1 is the ciphertext of the encrypted message 𝑚.
simulator  in distinguishing the outputs of 𝑃 𝑟𝑀𝛾 . According to the assumption, if the adversary 𝑃1 can break the
scheme with a non-negligible advantage, then the simulator  can
Note 2. The 𝑃 𝑟𝑀 mentioned in this paper differs from [22]. In [22], also break the black-box 𝐺𝛾 with a non-negligible advantage. This
𝑃 𝑟𝑀 refers to a pseudorandom oracle machine that outputs random contradicts the assumption that 𝐺𝛾 is secure. □
values when the adversary does not know the pseudorandom function key,
and outputs pseudorandom function values based on the key known to the
adversary when the key is known. This is a single-value output. However, the 4.4. Efficiency analysis PSI
𝑃 𝑟𝑀 required in this paper outputs both of these values simultaneously,
making it a multi-value output. This section simulates the PSI computation efficiency of this pa-
per and PSI in [14] on MAC, Pad, and Phone. The PRF of [14] is
Theorem 3. If 1 is a collision resistant hash function, 2 and 3 are instantiated based on LWE.
hamming correlation robustness, then the protocol in Fig. 7 securely realizes
𝑃 𝑆 𝐼 in Definition 16.
4.4.1. Efficiency analysis on MAC
The tools used in the subsection are Python 3.12, the programs are
Proof. Suppose the adversary 𝑃1 can break the scheme with non- performed on MacBook Air MAC Desktop Apple M1, RAM 8.00 GB (see
negligible advantage. Now, the simulator  simulates the scheme. Fig. 8).
Suppose there exists a black-box 𝐺𝛾𝑏𝑙𝑎𝑐 𝑘𝑏𝑜𝑥 such that
𝑦0 = 𝐺𝛾 (𝑥) ∈ {0,1} ,
4.4.2. Efficiency analysis on mobile pad
↗ The tools used in the subsection are Pydriod 3, the programs are
𝐺𝛾𝑏𝑙𝑎𝑐 𝑘𝑏𝑜𝑥 (𝑥) → (𝑦0 , 𝑦1 )
↘ performed on Xiaomi Pad 6 Pro File Explorer 1th Qualcomm(R)AI En-
𝑦1 ∈𝑅 {0,1} . gine(TM) Xiaolong 8+ mobile platform@3.2 GHz, RAM 8.00+3.00 GB
(see Fig. 9).
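As a concrete illustration of the matrix encoding that the hybrids above manipulate, the following minimal Python sketch builds the matrices C1, …, Cω from a set Y and probes membership. HMAC-SHA256 stands in for the PRF F_k and SHA-256 for H1 (both are placeholders, not the paper's ring LPR instantiation), and m, ω are toy parameters:

```python
import hashlib
import hmac

m, omega = 16, 4  # toy parameters: m rows per matrix, omega matrices


def H1(y: bytes) -> bytes:
    # stand-in for the collision-resistant hash H1
    return hashlib.sha256(y).digest()


def F(k: bytes, x: bytes) -> bytes:
    # stand-in PRF (HMAC-SHA256, not the ring LPR PRF of the paper)
    return hmac.new(k, x, hashlib.sha256).digest()


def to_indices(d: bytes) -> list[int]:
    # interpret the PRF output as a vector v in [m]^omega
    return [d[i] % m for i in range(omega)]


def encode(Y: list[bytes], k: bytes) -> list[list[int]]:
    # start from all-ones matrices C_1 = ... = C_omega = 1^m,
    # then zero out position v[i] of C_i for every element of Y
    C = [[1] * m for _ in range(omega)]
    for y in Y:
        v = to_indices(F(k, H1(y)))
        for i in range(omega):
            C[i][v[i]] = 0
    return C


def probe(C: list[list[int]], k: bytes, x: bytes) -> bool:
    # x matches iff the selected bits C_1[v[1]] ‖ ... ‖ C_omega[v[omega]]
    # are all zero
    v = to_indices(F(k, H1(x)))
    return all(C[i][v[i]] == 0 for i in range(omega))


k = b"\x01" * 32
Y = [b"alice", b"bob"]
C = encode(Y, k)
assert probe(C, k, b"alice")  # an element of Y always matches
```

A non-member can still hit all-zero positions with small probability; the actual protocol sizes m and ω so that this probability is negligible.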
Z. Shan et al. Journal of Systems Architecture 160 (2025) 103346
4.5. Analysis of efficiency on mobile phones

The tool used in this subsection is Pydroid 3; the programs are run on a Redmi K30 with a Qualcomm AI Engine Snapdragon 730G mobile platform @ 2.2 GHz and 6.00 GB RAM (see Fig. 10).

4.5.1. Summary of data comparison

From the simulation results, it can be seen that for n ≤ 400 the LWE-based OPRF in [14] is slightly faster, while for n > 400 the ring LPR-based OPRF in this paper is faster. Furthermore, as n increases, the advantages of ring LPR become more pronounced. Based on the simulation results for the Pad, the OPRF in this paper is more stable; although there are fluctuations, they are less significant than those of the LWE-based OPRF in [14].

5. Expansion of this work

Private Information Retrieval (PIR) [23-29] is a technique that enables a client to securely download a specific element, such as a movie or a friend's record, from a database managed by an untrusted server, such as a streaming service or a social network, without disclosing to the server which particular element has been retrieved. Given the functional similarities between PIR and PSI, this paper extends its exploration into the construction of PIR using OPRF (see Fig. 11).

5.1. Efficiency analysis of PIR

This section simulates the PIR computation efficiency of this paper and of the machine learning-based PIR in [30] (DLMI for short) on Mac. The tool used in this subsection is Python 3.12; the programs are run on a MacBook Air with an Apple M1 chip and 8.00 GB RAM.

The OPRF-based PIR proposed in this paper has a runtime that differs from the machine learning-based PIR by no more than approximately 5 × 10^-3 seconds. Additionally, the security of our PIR scheme is theoretically supported in comparison to [30] (see Fig. 12).

6. Conclusion

This paper presents a PSI based on an efficient post-quantum OPRF and proves its security under the semi-honest model, demonstrating security even in the CPA model of Definition 16. The addition of the PPRG enables the PSI to effectively resist probabilistic attacks. In the simulation experiments, the proposed PSI shows greater efficiency compared to post-quantum PSIs represented by LWE.

Although the PIR in this study is not as efficient as the machine learning-based PIR, the gap between the two is already quite small. However, there are also notable shortcomings; the efficiency of the proposed PSI still lags behind that of non-post-quantum PSIs, which will be addressed in future work.

CRediT authorship contribution statement

Zhuang Shan: Writing - original draft, Conceptualization. Leyou Zhang: Writing - review & editing, Writing - original draft. Qing Wu: Conceptualization. Qiqi Lai: Writing - review & editing. Fuchun Guo: Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61872087 and Grant 51875457; in part by the Key Foundation of National Natural Science Foundation of China under Grant U19B2021; and in part by the Key Research and Development Program of Shaanxi under Program 2022GY-028 and Program 2022GY-050.

Data availability

No data was used for the research described in the article.

References

[1] R. Lei, X. Chen, D. Liu, C. Song, Y. Tan, A. Ren, CEIU: Consistent and efficient incremental update mechanism for mobile systems on flash storage, J. Syst. Archit. 152 (2024) 103151, http://dx.doi.org/10.1016/j.sysarc.2024.103151.
[2] J. Sun, L. Yin, M. Zou, Y. Zhang, T. Zhang, J. Zhou, Makespan-minimization workflow scheduling for complex networks with social groups in edge computing, J. Syst. Archit. 108 (2020) 101799, http://dx.doi.org/10.1016/j.sysarc.2020.101799.
[3] Y. Gao, Y. Luo, L. Wang, X. Liu, L. Qi, W. Wang, M. Zhou, Efficient scalable multi-party private set intersection(-variants) from bicentric zero-sharing, in: Proceedings of the Conference on Computer and Communications Security, CCS, ACM, New York, NY, USA, 2024.
[4] M.O. Rabin, How to exchange secrets with oblivious transfer, 2005, URL: https://eprint.iacr.org/2005/187.
[5] O. Goldreich, S. Goldwasser, S. Micali, How to construct random functions, J. ACM 33 (4) (1986) 792-807, http://dx.doi.org/10.1145/6490.6503.
[6] M. Naor, O. Reingold, Number-theoretic constructions of efficient pseudo-random functions, J. ACM 51 (2) (2004) 231-262, http://dx.doi.org/10.1145/972639.972643.
[7] M.J. Freedman, Y. Ishai, B. Pinkas, O. Reingold, Keyword search and oblivious pseudorandom functions, in: Theory of Cryptography, Springer, Berlin, Heidelberg, 2005, pp. 303-324.
[8] S. Jarecki, X. Liu, Efficient oblivious pseudorandom function with applications to adaptive OT and secure computation of set intersection, in: Theory of Cryptography, Springer, Berlin, Heidelberg, 2009, pp. 577-594.
[9] V.K. Yadav, N. Andola, S. Verma, S. Venkatesan, A survey of oblivious transfer protocol, ACM Comput. Surv. 54 (10s) (2022), http://dx.doi.org/10.1145/3503045.
[10] M.R. Albrecht, A. Davidson, A. Deo, N.P. Smart, Round-optimal verifiable oblivious pseudorandom functions from ideal lattices, in: Public-Key Cryptography - PKC 2021, Springer, Cham, 2021, pp. 261-289.
[11] N. Tyagi, S. Celi, T. Ristenpart, N. Sullivan, S. Tessaro, C.A. Wood, A fast and simple partially oblivious PRF, with applications, in: Advances in Cryptology - EUROCRYPT 2022, Springer, Cham, 2022, pp. 674-705.
[12] S. Casacuberta, J. Hesse, A. Lehmann, SoK: Oblivious pseudorandom functions, in: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), 2022, pp. 625-646, http://dx.doi.org/10.1109/EuroSP53844.2022.00045.
[13] D. Boneh, D. Kogan, K. Woo, Oblivious pseudorandom functions from isogenies, in: Advances in Cryptology - ASIACRYPT 2020, Springer, Cham, 2020, pp. 520-550.
[14] M. Chase, P. Miao, Private set intersection in the internet setting from lightweight oblivious PRF, in: Advances in Cryptology - CRYPTO 2020, Springer, Cham, 2020, pp. 34-63.
[15] Z. Shan, L. Zhang, Q. Wu, Q. Lai, Analysis, modify and apply in IIOT form light-weight PSI in CM20, 2024, URL: https://eprint.iacr.org/2024/969.
[16] J. Alwen, S. Krenn, K. Pietrzak, D. Wichs, Learning with rounding, revisited, in: Advances in Cryptology - CRYPTO 2013, Springer, Berlin, Heidelberg, 2013, pp. 57-74.
[17] A. Banerjee, C. Peikert, A. Rosen, Pseudorandom functions and lattices, in: Advances in Cryptology - EUROCRYPT 2012, Springer, Berlin, Heidelberg, 2012, pp. 719-737.
[18] D. Bellizia, C. Hoffmann, D. Kamel, H. Liu, P. Méaux, F.-X. Standaert, Y. Yu, Learning parity with physical noise: Imperfections, reductions and FPGA prototype, IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021 (2021) 390-417.
[19] Y. Yu, J. Zhang, Smoothing out binary linear codes and worst-case sub-exponential hardness for LPN, in: Advances in Cryptology - CRYPTO 2021, Springer, Cham, 2021, pp. 473-501.
[20] V. Kolesnikov, R. Kumaresan, M. Rosulek, N. Trieu, Efficient batched oblivious PRF with applications to private set intersection, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, ACM, New York, NY, USA, 2016, pp. 818-829, http://dx.doi.org/10.1145/2976749.2978381.
[21] Z. Brakerski, E. Kirshanova, D. Stehlé, W. Wen, Learning with errors and extrapolated dihedral cosets, in: Public-Key Cryptography - PKC 2018, Springer, Cham, 2018, pp. 702-727.
[22] A. Jain, H. Lin, J. Luo, D. Wichs, The pseudorandom oracle model and ideal obfuscation, in: Advances in Cryptology - CRYPTO 2023, Springer, Cham, 2023, pp. 233-262.
[23] S. Angel, H. Chen, K. Laine, S. Setty, PIR with compressed queries and amortized query processing, in: 2018 IEEE Symposium on Security and Privacy (SP), 2018, pp. 962-979, http://dx.doi.org/10.1109/SP.2018.00062.
[24] A. Burton, S.J. Menon, D.J. Wu, Respire: High-rate PIR for databases with small records, in: Proceedings of the Conference on Computer and Communications Security, CCS, ACM, New York, NY, USA, 2024.
[25] J. Dujmovic, M. Hajiabadi, Lower-bounds on public-key operations in PIR, in: Advances in Cryptology - EUROCRYPT 2024, Springer, Cham, 2024, pp. 65-87.
[26] B. Fisch, A. Lazzaretti, Z. Liu, C. Papamanthou, ThorPIR: Single server PIR via homomorphic Thorp shuffles, in: Proceedings of the Conference on Computer and Communications Security, CCS, ACM, New York, NY, USA, 2024.
[27] A. Gascon, Y. Ishai, M. Kelkar, B. Li, Y. Ma, M. Raykova, Computationally secure private information retrieval and aggregation in the shuffle model, in: Proceedings of the Conference on Computer and Communications Security, CCS, ACM, New York, NY, USA, 2024.
[28] A. Ghoshal, M. Zhou, E. Shi, Efficient pre-processing PIR without public-key cryptography, in: Advances in Cryptology - EUROCRYPT 2024, Springer, Cham, 2024, pp. 210-240.
[29] M. Luo, F.-H. Liu, H. Wang, Faster FHE-based single-server private information retrieval, in: Proceedings of the Conference on Computer and Communications Security, CCS, ACM, New York, NY, USA, 2024.
[30] M. Lam, J. Johnson, W. Xiong, K. Maeng, U. Gupta, Y. Li, L. Lai, I. Leontiadis, M. Rhu, H.-H.S. Lee, V.J. Reddi, G.-Y. Wei, D. Brooks, E. Suh, GPU-based private information retrieval for on-device machine learning inference, in: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, ASPLOS '24, ACM, New York, NY, USA, 2024, pp. 197-214, http://dx.doi.org/10.1145/3617232.3624855.

Zhuang Shan received the B.S. degree from Liaoning Institute of Science and Technology, Benxi, China, in 2019, and the M.S. degree from North Minzu University, Yinchuan, China, in 2022. He is currently pursuing the Ph.D. degree in mathematics with Xidian University, Xi'an, China. His current interests include cryptography, reduction of hard problems in lattices, and network security.

Leyou Zhang received the M.S. and Ph.D. degrees from Xidian University, Xi'an, China, in 2002 and 2009, respectively. From 2013 to 2014, he was a visiting scholar at the University of Wollongong, Australia. He currently works at Xidian University as a professor. His current research interests include public key cryptography, network security and computer security. He has over 120 scientific publications in highly ranked cybersecurity journals and conferences.

Qing Wu received the M.S. and Ph.D. degrees from Xidian University, Xi'an, China, in 2006 and 2009, respectively. She currently works with Xi'an University of Posts and Telecommunications, Xi'an, as a Professor. Her current research interests include artificial intelligence security and cloud security.

Qiqi Lai received the B.S. degree from PLA University of Information Engineering, Henan, China, in 2008, and the M.S. and Ph.D. degrees from Xidian University, Xi'an, China, in 2011 and 2015. He currently works with Shaanxi Normal University, Xi'an, as a Professor. His current research interests include the theory of lattice-based public key cryptography and its provable security, as well as the construction and analysis of homomorphic encryption schemes.

Fuchun Guo received the B.S. and M.S. degrees from Fujian Normal University, China, in 2005 and 2008, respectively, and the Ph.D. degree from the University of Wollongong, Australia, in 2013. He is currently an Associate Research Fellow with the School of Computing and Information Technology, University of Wollongong. His primary research interests include public key cryptography, in particular protocols, encryption and signature schemes, and security proofs.
@@ -0,0 +1,846 @@
Journal of Systems Architecture 160 (2025) 103331
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
A CP-ABE-based access control scheme with cryptographic reverse firewall
for IoV
Xiaodong Yang a, Xilai Luo a,∗, Zefan Liao a, Wenjia Wang a, Xiaoni Du b, Shudong Li c
a College of Computer Science and Engineering, Northwest Normal University, China
b College of Mathematics and Statistics, Northwest Normal University, China
c Cyberspace Institute of Advanced Technology, Guangzhou University, China
ARTICLE INFO ABSTRACT
Keywords: Attribute-based encryption; Multi-authority; Internet of Vehicles; Cryptographic reverse firewall; Outsource decryption

The convergence of AI and internet technologies has sparked significant interest in the Internet of Vehicles (IoV) and intelligent transportation systems (ITS). However, the vast data generated within these systems poses challenges for onboard terminals and secure data sharing. To address these issues, we propose a novel solution combining ciphertext policy attribute-based encryption (CP-ABE) and a cryptographic reverse firewall (CRF) mechanism for IoV. This approach offers several advantages, including offline encryption and outsourced decryption to improve efficiency. The CRF mechanism adds an extra layer of security by re-randomizing vehicle data, protecting sensitive information. While single-attribute authority schemes simplify access control, they are not ideal for IoV environments. Therefore, we introduce a multi-authority scheme to enhance security. Performance analysis demonstrates our scheme's ability to optimize encryption and decryption while safeguarding vehicle data confidentiality. In summary, our solution improves data management, access control, and security in the IoV, contributing to its safe and efficient development.
1. Introduction

Advances in 5G technology, coupled with the growing volume of vehicular traffic, have intensified concerns regarding traffic safety, travel efficiency, and environmental impact. In response, Intelligent Transport Systems (ITS) and the IoV have emerged as critical components of modern transportation infrastructure. The functionality of the IoV relies on three key elements: the internal vehicle network, the vehicle-to-vehicle communication network, and the in-vehicle mobile internet. These elements integrate technologies such as sensors, RFID (Radio Frequency Identification), and automated control systems, operating under established communication protocols to enable seamless, dynamic data exchange between vehicles and the broader network.

While drivers benefit from applications like navigation and traffic information sharing, the limited computing power of onboard terminals is insufficient for computationally intensive tasks such as autonomous driving and AI-based obstacle avoidance [1]. A potential solution is offloading data processing to cloud servers, but the large volume of vehicle-generated data introduces high latency in communication between the onboard terminal and the cloud, compromising real-time decision-making [2-4]. This latency, coupled with the risks associated with data leakage and theft in semi-trusted cloud environments, raises significant concerns about data security [5]. Therefore, cloud-based solutions alone are insufficient to meet the demands of the IoV. To mitigate these issues, edge computing [6], fog computing [7], and Roadside Units (RSUs) [8] have been proposed. RSUs, with their higher computational capabilities, can process data more efficiently and upload it to cloud servers in real time, addressing the challenges of latency and limited onboard processing power.

However, data security remains a critical issue. One potential solution is encrypting data before transmission, which introduces challenges in ciphertext sharing. Traditional symmetric encryption, requiring a one-to-one correspondence between keys and users, proves inefficient for securing large volumes of data in IoV environments. Conventional asymmetric encryption algorithms also struggle with ciphertext sharing and are ill-suited for the frequent updates characteristic of IoV applications. A more appropriate approach is Attribute-Based Encryption (ABE), which enables fine-grained access control, supports encryption for multiple recipients, and facilitates the creation of complex access policies [9-11]. ABE allows data owners to control who can access their data, but the decryption process is computationally intensive, requiring numerous pairing and exponential operations. This places a significant burden on resource-constrained onboard terminals,
Corresponding author.
E-mail addresses: yangxd200888@163.com (X. Yang), 2023222208@nwnu.edu.cn (X. Luo), lzf0097@163.com (Z. Liao), neuer1130@163.com (W. Wang),
duxiaonwnu@163.com (X. Du), lishudong@gzhu.edu.cn (S. Li).
https://doi.org/10.1016/j.sysarc.2025.103331
Received 11 August 2024; Received in revised form 4 December 2024; Accepted 2 January 2025
Available online 17 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
X. Yang et al. Journal of Systems Architecture 160 (2025) 103331
hindering timely data retrieval and impeding efficient communication. As the number of attributes increases, the decryption complexity grows, leading to slower decryption times and higher resource consumption. To address these challenges, several outsourced ABE schemes have been proposed [12-15], which offload expensive operations to cloud servers, alleviating the computational load on onboard terminals. However, even theoretically secure implementations of ABE are vulnerable to practical attacks. Sophisticated adversaries may exploit backdoors [16], manipulate pseudo-random number generators [17,18], or intercept hardware interactions to gain unauthorized access to sensitive data. To counter these threats, the concept of a Cryptographic Reverse Firewall (CRF) was introduced [19]. The CRF, positioned between the user and the server, intercepts and alters messages to ensure data security, even if the user is compromised.

Moreover, traditional ABE schemes rely on a single attribute authority, which poses a risk of key leakage if the authority colludes with an adversary. To mitigate this, we propose a multi-authority ABE scheme, integrated with a CRF, to enhance security and prevent collusion attacks. The key contributions of this paper are as follows:

1. We propose a CP-ABE-based scheme that enables more granular access control policies, enhancing the system's flexibility. This proves particularly beneficial in IoV scenarios such as IoV communication, where data access can be dynamically adjusted in accordance with the context.
2. The scheme integrates multiple attribute authorities to prevent collusion attacks and guarantee secure key management. Each authority is responsible for managing vehicle attribute keys, enhancing the security and efficiency of key generation, which is ideal for environments like smart cities or autonomous vehicle fleets.
3. We enhance the CRF module by incorporating key parameter re-randomization within the multi-authority ABE framework, strengthening security in IoV communications, even if certain parts of the system are compromised.
4. The scheme optimizes decryption efficiency through the use of online-offline encryption techniques and offloading decryption operations. Decryption time does not increase linearly with the number of attributes, making it suitable for real-time applications like hazard detection and traffic optimization.
5. The scheme also supports message integrity verification, which can be easily carried out by onboard terminals using simple hash functions, ensuring the authenticity of IoV messages and preventing malicious tampering in safety-critical communications.

The paper is organized as follows: Section 2 reviews existing attribute-based encryption schemes and the application of CRFs. Section 3 provides an overview of the system and security models. Section 4 discusses the base scenario and the extended CRF module. Section 5 presents security proofs for the base scheme and the CRF-enhanced scheme. Section 6 reports on experiments and results. Finally, Section 7 concludes the paper.

2. Related work

Sahai [10] introduced fuzzy identity-based encryption, which paved the way for Attribute-Based Encryption (ABE). ABE later branched into two forms: Key-Policy ABE (KP-ABE) [9] and Ciphertext-Policy ABE (CP-ABE) [11]. Initially, both schemes used access trees to define policies. However, the first CP-ABE scheme only provided security under the random oracle model. Waters [20] introduced an LSSS-based CP-ABE scheme that encodes policies using matrices. This foundational model has influenced many subsequent ABE schemes, which have expanded into diverse domains, particularly cloud computing. For example, Yu et al. [21] proposed a KP-ABE scheme enabling data delegation to semi-trusted cloud servers while ensuring confidentiality. Yang et al. [22] introduced a CP-ABE scheme for dynamic big data updates, and Feng et al. [23] developed a CP-ABE scheme for industrial IoT. Other schemes [24,25] have improved security and efficiency, broadening ABE's application to the Internet of Medical Things (IoMT).

CP-ABE enables fine-grained access control, making it highly applicable in sectors such as smart healthcare and intelligent transportation. However, single-attribute-authority ABE schemes are vulnerable to collusion attacks. To address this, it is desirable to delegate each attribute to different attribute authorities. Chase [26] was the first to introduce the concept of multiple attribute authorities within the ABE framework, where various authorities oversee different attributes. Lewko and Waters [27] later introduced the initial decentralized ABE framework with multiple authorities. Following this, Chaudhary et al. [28] proposed a multi-authority CP-ABE scheme tailored for the Internet of Vehicles (IoV) context.

Considering the constrained computing capabilities of user terminals, Green et al. [12] introduced an ABE scheme that delegates decryption computations to the cloud. Lai et al. [13] improved upon this by achieving verifiability of outsourced decryption. Zhong et al. [29] further enhanced the efficiency of outsourced decryption ABE schemes and applied them to smart healthcare scenarios.

Mironov and Stephens-Davidowitz [19] were the first to introduce the concept of a reverse firewall. They proposed a generic architecture to prevent user tampering, which could lead to data leakage. However, the previous approach was found unsuitable for ABE schemes, prompting Ma et al. [30] to introduce a cryptographic reverse firewall utilizing the CP-ABE scheme. Additionally, Hong et al. [31] proposed a KP-ABE scheme with multiple authorities. Due to the limitations of KP-ABE in achieving fine-grained access control, Zhao et al. [32] proposed a CP-ABE scheme incorporating a CRF and leveraged outsourced decryption to alleviate computational burdens. However, these approaches suffer from drawbacks, such as reliance on a single attribute authority or excessive computational overhead. Moreover, there is a risk of system compromise, which could lead to data leakage, especially in the context of IoV, characterized by constrained computational resources and stringent data privacy requirements. At the same time, the development of IoV places higher demands on the security and flexibility of access control. Therefore, the proposed scheme combines CP-ABE, CRF, and multi-authority models to meet the requirements for security, flexibility, and low computational overhead.

3. System model and definitions

3.1. Preliminaries

1. Bilinear Maps: Involve two multiplicative cyclic groups of prime order p, denoted as G and G_T, with g representing a generator of G. A bilinear map e : G × G → G_T must satisfy the following three features:
   (a) Non-degeneracy: e(g, g) ≠ 1.
   (b) Computability: Efficient computation of e(M, N) for any elements M, N ∈ G is achievable through a polynomial-time algorithm.
   (c) Bilinearity: For any a, b ∈ Z_p and any elements M, N ∈ G, we have e(M^a, N^b) = e(M, N)^{ab}.
2. Access Structure: Consider a set P = {P1, P2, …, Pn} representing n users. A collection Q is deemed monotone if, for any subsets K, L: if K ∈ Q and K ⊆ L, then L ∈ Q. Let Q be a nonempty monotone collection of subsets of P, i.e., Q ⊆ 2^{P1, P2, …, Pn} ∖ {∅}; then Q is called a monotone access structure. In the context of access control, sets included in Q are identified as authorized, while those not included are referred to as unauthorized sets.
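The bilinearity property in item 1(c) can be checked numerically in a toy model that represents each group element g^x by its exponent x mod p. This illustrates only the arithmetic identity e(M^a, N^b) = e(M, N)^{ab}, not a cryptographically secure pairing:

```python
# Toy model of a symmetric pairing: an element g^x of G is represented by
# its exponent x mod p, and e(g^x, g^y) = e(g, g)^{x*y} is modeled by the
# product x*y mod p. Purely an arithmetic illustration of bilinearity.
p = 101  # small prime group order, for illustration only


def elem(x: int) -> int:
    # "g^x" in G, stored as its exponent
    return x % p


def power(M: int, a: int) -> int:
    # M^a in G: exponent multiplies by a
    return (M * a) % p


def e(M: int, N: int) -> int:
    # pairing result, as the exponent of e(g, g)
    return (M * N) % p


M, N = elem(7), elem(13)
a, b = 5, 9
# Bilinearity: e(M^a, N^b) = e(M, N)^(a*b)
assert e(power(M, a), power(N, b)) == (e(M, N) * a * b) % p
```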
X. Yang et al. Journal of Systems Architecture 160 (2025) 103331
3. Linear Secret Sharing Scheme (LSSS): Let Ã = {Ã_1, Ã_2, …, Ã_N} be the set of all possible attribute names. Corresponding to each attribute name Ã_i ∈ Ã there is an associated set of attribute values Ã_i = {A_{i,1}, A_{i,2}, …, A_{i,b_i}}, where b_i is the order of Ã_i. The access policy is denoted T = (M, ρ, V). In a linear secret sharing scheme, M is a matrix with l rows and n columns, ρ is a function that associates each row of M with an attribute name in Ã, and V = {v_{ρ(i)}}_{i∈[1,l]} is the set of attribute values associated with (M, ρ). An LSSS comprises the following pair of algorithms:
(a) Distribute: For the secret value s ∈ Z_p, arbitrarily choose a vector f = (s, f_2, …, f_n), where f_2, …, f_n ∈ Z_p. Calculate λ_i = M_i · f, where M_i is the i-th row of the matrix M; λ_i is the share of s that corresponds to ρ(i).
(b) Reconstruct: Let S ⊆ Ã be any authorized set and I = {i | ρ(i) ∈ S} ⊆ {1, 2, …, l}. Then there is a collection of constants {ω_i ∈ Z_p}_{i∈I} satisfying Σ_{i∈I} ω_i M_i = (1, 0, …, 0), and the secret can be reconstructed by calculating Σ_{i∈I} ω_i λ_i = s.
Assume S = {I_u, S'} represents the attribute collection of a user, where I_u ⊆ Ã is the set of the user's attribute names and S' = {s_i}_{i∈I_u} is the set of the user's attribute values. For every i ∈ I, where I = {i | ρ(i) ∈ S} ⊆ {1, 2, …, l}, if i satisfies (M, ρ) and s_{ρ(i)} = v_{ρ(i)}, we say that S matches T.

4. q-BDHE problem: Let G and G_T be two multiplicative cyclic groups of prime order p, let g be a generator of G, and let e : G × G → G_T be a bilinear map. Choose t, f ∈ Z_p at random and compute J = (g, g^t, g^f, g^{f^2}, …, g^{f^q}, g^{f^{q+2}}, …, g^{f^{2q}}). The q-BDHE assumption states that no polynomial-time algorithm, given J, can distinguish e(g, g)^{f^{q+1} t} ∈ G_T from a random K ∈ G_T with non-negligible advantage.

5. Cryptographic Scheme: A cryptographic scheme 𝒫 defines the interaction between stateful parties (P_1, P_2, …, P_l). Scheme establishment is denoted setup(1^λ), where λ is the security parameter. Each party takes the public parameters P_g and the related messages as input and runs the system initialization algorithm to obtain the corresponding state (υ_{P_i})_{i=1}^l. Following the order in which the scheme proceeds, the parties process messages from the other parties in the scheme. Each party P_i also has algorithms next_{P_i}(υ_{P_i}) and receive_{P_i}(υ_{P_i}): next_{P_i}(υ_{P_i}) outputs the updated message, and receive_{P_i}(υ_{P_i}) outputs the party's state after the message update. After the scheme completes, each party's algorithm output_{P_i}(υ_{P_i}) returns the result of the scheme. We assume that the scheme 𝒫 meets a functionality requirement ℱ and a security requirement 𝒮.

6. Cryptographic Reverse Firewall: A cryptographic reverse firewall W is a stateful algorithm. Provided with its current state and an input message, it processes them and outputs an updated state and message. For ease of presentation, the state of W is not written out explicitly in the definition. Given a party P and a firewall W, the composed party W∘P is defined by

receive_{W∘P}(υ, m) = receive_P(υ, W(m)),
next_{W∘P}(υ) = W(next_P(υ)),
output_{W∘P}(υ) = output_P(υ). (1)

When the composed party participates in the scheme, the initial state of the firewall W is set to the public parameter P_g. If W and a party P form a composed party, we call W a cryptographic reverse firewall for P. Next we give definitions of three properties of CRFs:

(a) Function Maintaining: For any reverse firewall W and any party P, let W^1∘P = W∘P and, for k ≥ 2, W^k∘P = W∘(W^{k−1}∘P). For a scheme 𝒫 that adheres to the functionality requirement ℱ, we say that the reverse firewall W maintains functionality if the composed party W∘P guarantees the functionality of the party P under the scheme 𝒫 in polynomial time.
(b) Weakly Security-preserving: Suppose the scheme fulfills the functionality requirement ℱ and the security requirement 𝒮. For any polynomial-time adversary B, we say that the scheme satisfies weak security preservation if the composed party W∘P still satisfies the security requirement 𝒮.
(c) Weakly Exfiltration-resistant: The game Leak(𝒫, P_j, W, λ), depicted in Fig. 1, is due to Mironov and Stephens-Davidowitz [19]. It is a security game between a reverse firewall W for a party P and a scheme 𝒫 containing a tampered party. The adversary may control a party by tampering with the party's algorithms receive, next, and output. The purpose of the game is to let the adversary discern whether the party's actions are honest or tampered with. A leak-resistant reverse firewall therefore makes it impossible for an adversary to tell whether party P has been tampered with, protecting the party's private information. If no adversary B in the Leak(𝒫, P_j, W, λ) game can succeed in polynomial time with a noticeable advantage while the firewall maintains the party's functionality ℱ, we call the reverse firewall W weakly exfiltration-resistant.

Fig. 1. Leak game.

3.2. System model

Fig. 2 depicts the four components that constitute our scheme: Attribute authorities (AA), Cloud server (CS), Data user (DU), and Data owner (DO). In addition, the system contains three reverse firewalls. To implement data re-randomization within the RSU, three firewalls are strategically positioned: W_AA, the reverse firewall for AA; W_DO, the reverse firewall for DO; and W_DU, fulfilling the same role for DU.

CS is mainly deployed to store ciphertexts and conversion keys.
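The Distribute/Reconstruct pair of the LSSS above can be illustrated with a small sketch over the integers mod p. The modulus, the AND-policy matrix, and the recombination constants below are hypothetical toy choices, not the paper's pairing-group parameters:

```python
import random

p = 2**61 - 1  # toy prime modulus (hypothetical; the scheme works in Z_p of a pairing group)

def distribute(s, M):
    """Distribute: share lambda_i = <M_i, f> with f = (s, f_2, ..., f_n)."""
    n = len(M[0])
    f = [s] + [random.randrange(p) for _ in range(n - 1)]
    return [sum(row[j] * f[j] for j in range(n)) % p for row in M]

def reconstruct(shares, I, omega):
    """Reconstruct: s = sum_{i in I} omega_i * lambda_i."""
    return sum(omega[i] * shares[i] for i in I) % p

# Policy "att1 AND att2": the rows of M sum to (1, 0), so omega = (1, 1).
M = [[1, 1], [0, -1]]
omega = {0: 1, 1: 1}
s = 123456789
shares = distribute(s, M)
assert reconstruct(shares, [0, 1], omega) == s
```

With this matrix, the first share is s + f_2 and the second is −f_2, so either share alone reveals nothing about s, while their ω-weighted sum recovers it.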
AA is charged with the responsibility of establishing the public parameters and generating the master secret keys.

DO sets the access policy that guides the encryption process and produces a verification credential; the DO then uploads both the encrypted data and the verification credential to the cloud server.

DU generates a conversion key, which is uploaded to the cloud server. Following this, the DU retrieves the ciphertext and the verification credential from the cloud server to carry out the concluding stages of decryption and integrity verification.

W_AA re-randomizes the public parameters and the secret keys that belong to users. W_DO is responsible for re-randomizing ciphertexts. W_DU is responsible for re-randomizing conversion keys and converted ciphertexts.

3.3. Security model

The DO and the DU in our system are considered completely trustworthy. However, the reverse firewalls and the cloud server are deemed honest-but-curious: they comply with the steps of the algorithms but also endeavor to discover any private information within the data. Furthermore, there is a risk of an attribute authority colluding with an adversary. In response to this challenge, we define a selective CPA security game whose sequence of events is as follows:

1. Init Phase: The adversary B declares a set of malicious attribute authorities R = (Â_i)_{i∈I'} and the access policies (M_i, ρ_i)_{i∈I} to be challenged, where I ⊆ {1, 2, …, N} and I' ⊆ {1, 2, …, N}. Then B sends the (possibly tampered) algorithms GlobalSetup*, AASetup*, KeyGen*, KeyGen.ran*, Enc.Offline*, Enc.Online* to the challenger F.
2. Setup Phase: F executes GlobalSetup* and AASetup* to obtain the public parameter Params, the attribute authorities' public key PK, and the key pairs (PK_i, ASK_i)_{i∈I'}. Subsequently, the reverse firewall runs the W_AA.SetUp algorithm to generate and announce the new public key PK', retaining the corresponding random number f. B receives PK_i from all non-malicious attribute authorities and (PK_i, ASK_i)_{i∈I'} from all malicious attribute authorities.
3. Query Phase 1: B may adaptively request secret keys for attribute sets S_1, S_2, …, S_q. Each queried attribute set must neither satisfy the access structures (M_i, ρ_i)_{i∈I} nor come from a malicious attribute authority in R = (Â_i)_{i∈I'}. For every query S_i, F executes the algorithm KeyGen* and obtains the corresponding secret key SK_i. Then F executes W_AA.KG and gets the re-randomized secret key SK'_i. Subsequently, F executes KeyGen.ran* to get the conversion key TK_i, and then W_DU.TKUpdate to obtain the re-randomized conversion key TK'_i. Eventually, F sends (SK'_i, TK'_i) to B.
4. Challenge Phase: B delivers two equal-length plaintexts m_0, m_1. F randomly chooses b ∈ {0, 1} and executes Enc.Offline*, Enc.Online* to obtain the challenge ciphertext CT_b. Then F calls W_DO.Enc.Offline, W_DO.Enc.Online to get the updated ciphertext CT'_b, and sends CT'_b to B.
5. Query Phase 2: Same as Query Phase 1.
6. Guess Phase: B outputs a guess b' ∈ {0, 1} for b.

Definition 1. The basic scheme is selectively CPA-secure if the advantage of any polynomial-time adversary B in the above game is negligible.

Fig. 2. System model.

4. System construction

4.1. Basic scheme

The scheme contains N attribute authorities, each attribute authority managing one class of attributes Ã_i = {A_{i,1}, A_{i,2}, …, A_{i,b_i}}, A_{i,j} ∈ Z_p, i = 1, 2, …, N, j = 1, 2, …, b_i.

1. Global Setup: Attribute authority AA_1 sets the commonly known parameters Params = {g, u, v, w, h, G, G_T, H_0(·)} and publishes them, where H_0 : {0, 1}* → {0, 1}^ℓ is the designated collision-resistant hash function for generating robust verification credentials within the system.

2. AASetup:
(a) Each attribute authority randomly chooses α_i ∈ Z_p, determines Y_i = e(g, g)^{α_i}, and distributes Y_i to the other attribute authorities. As the process concludes, each attribute authority computes Y = Π_{i=1}^N Y_i = e(g, g)^{Σ_{i=1}^N α_i} = e(g, g)^α, where α = Σ_{i=1}^N α_i.
(b) Each attribute authority Â_i operates as follows:
• Randomly select N − 1 elements s_{ik} ∈ Z_p (k ∈ {1, 2, …, N}\{i}), calculate g^{s_{ik}}, and send it to the other attribute authorities.
• After receiving the N − 1 components g^{s_{ki}} from the other attribute authorities Â_k (k ∈ {1, 2, …, N}\{i}), the master key MK_i is calculated by the following formula:

MK_i = Π_{k∈{1,2,…,N}\{i}} (g^{s_{ik}} / g^{s_{ki}}) = g^{Σ_{k≠i} s_{ik} − Σ_{k≠i} s_{ki}}, (2)

where Π_{i=1}^N MK_i = 1.
• For each attribute A_{i,j} ∈ Ã_i, calculate u^{A_{i,j}}.
The attribute authorities publish the public key PK = (g, u, h, w, v, e(g, g)^α, G, G_T), and each keeps its own private key ASK_i = {α_i, (u^{A_j})_{A_j∈Â_i}, MK_i}.

3. KeyGen: Each attribute authority Â_i executes the algorithm as follows:
(a) Select θ_i ∈ Z_p at random, then derive the secret-key elements MK_i · g^{θ_i}, MK_i · v^{−θ_i}, MK_i · g^{α_i} · w^{θ_i}, and subsequently convey these elements to the pertinent attribute authorities.
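The pairwise blinding shares s_{ik} in Eq. (2) cancel only when all N master keys are combined, which is what makes Π_{i=1}^N MK_i = 1. The check below reproduces this in a toy multiplicative group Z_q^* (hypothetical parameters; the paper uses the pairing group G instead):

```python
import random

q = 1_000_003          # a prime (toy choice, stands in for the group order)
g = 5                  # base element of Z_q^*
N = 4                  # number of attribute authorities

# s[i][k] models the share s_ik that authority i sends to authority k.
s = [[random.randrange(1, q - 1) for _ in range(N)] for _ in range(N)]

def MK(i):
    # MK_i = g^( sum_{k!=i} s_ik  -  sum_{k!=i} s_ki ), Eq. (2)
    e = sum(s[i][k] - s[k][i] for k in range(N) if k != i)
    return pow(g, e % (q - 1), q)

prod = 1
for i in range(N):
    prod = prod * MK(i) % q
# every s_ik appears once with + and once with -, so the product is g^0 = 1
assert prod == 1
```

Any strict subset of the authorities is missing at least one uncancelled s_{ik}, so no coalition of fewer than N authorities can reproduce a valid MK_i, which is the intuition behind Theorem 2.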
(b) Upon obtaining the components from the various attribute authorities, compute the secret key as follows:

K_0 = Π_{i=1}^N MK_i · g^{α_i} · w^{θ_i} = g^{Σ_{i=1}^N α_i} w^r = g^α w^r, (3)
K_1 = Π_{i=1}^N MK_i · g^{θ_i} = g^{Σ_{i=1}^N θ_i} = g^r, (4)
K_v = Π_{i=1}^N MK_i · v^{−θ_i} = v^{−r}, (5)

where r = Σ_{i=1}^N θ_i.
(c) For each attribute σ ∈ [S_ID ∩ Â_i], randomly choose r_σ ∈ Z_p, where σ ≤ N and S_ID denotes the user's attribute set. Calculate K_{i,2} = g^{r_i}, K_{i,3} = (u^{A_i} h)^{r_i} K_v = (u^{A_i} h)^{r_i} v^{−r}. The user then gets the secret key SK = {K_0, K_1, {K_{i,2}, K_{i,3}}_{i∈[1,σ]}, S_ID}.

4. KeyGen.ran: Upon input SK, the data user independently selects a random element τ ∈ Z_p and calculates K'_0 = K_0^{1/τ} = g^{α/τ} w^{r/τ}, K'_1 = K_1^{1/τ} = g^{r/τ}. For i = 1, 2, …, σ, the data user calculates K'_{i,2} = K_{i,2}^{1/τ} = g^{r_i/τ}, K'_{i,3} = K_{i,3}^{1/τ} = (u^{A_i} h)^{r_i/τ} v^{−r/τ}. The transformation key TK = (S_ID, K'_0, K'_1, {K'_{i,2}, K'_{i,3}}_{i∈[1,σ]}) and the recovery key RK = τ serve distinct functions within the cryptographic framework.

5. Enc.Offline: Input PK, and let N denote the upper limit on the number of rows of the secret sharing matrix. The data owner randomly chooses s ∈ Z_p and calculates Ĉ = e(g, g)^{αs}, Ĉ_0 = g^s. For j = 1, 2, …, N, the data owner randomly chooses d_j ∈ Z_p and calculates Ĉ_{j,1} = v^{d_j}, Ĉ_{j,2} = h^{d_j}, Ĉ_{j,3} = g^{d_j}. The intermediate ciphertext is MT = (s, Ĉ, Ĉ_0, {d_j, Ĉ_{j,1}, Ĉ_{j,2}, Ĉ_{j,3}}_{j∈[1,N]}).

6. Enc.Online: Input MT, the plaintext m, and the access structure (M, ρ), where M is a matrix of l rows and n columns (l ≤ N). The data owner randomly chooses a vector y⃗ = (s, y_2, …, y_n) ∈ Z_p^{n×1}; the secret shares are λ⃗ = (λ_1, λ_2, …, λ_l)^T = M y⃗. Then the data owner calculates Token = H_0(m), C = m · Ĉ = m · e(g, g)^{αs}, C_0 = Ĉ_0 = g^s. For j = 1, 2, …, l, the data owner computes C_{j,1} = Ĉ_{j,1} · w^{λ_j} = w^{λ_j} v^{d_j}, C_{j,2} = Ĉ_{j,2} · u^{ρ(j)d_j} = (u^{ρ(j)} h)^{d_j}, C_{j,3} = Ĉ_{j,3} = g^{d_j}. The ciphertext is CT = ((M, ρ), C, C_0, {C_{j,1}, C_{j,2}, C_{j,3}}_{j∈[1,l]}) and the verification credential is Token.

7. Dec.Out: If the user's attribute set S_ID does not satisfy the access structure, the cloud server returns the null value ⊥ and terminates the algorithm. Otherwise, the cloud server collects I = {i | ρ(i) ∈ S_ID} and calculates {ω_i ∈ Z_p}_{i∈I} such that Σ_{i∈I} ω_i · M_i = (1, 0, …, 0), where M_i is the i-th row of the matrix M. Then the cloud server calculates

A = e(C_0, K'_0) / Π_{i∈I} (e(C_{i,1}, K'_1) · e(C_{i,2}, K'_{j,2}) · e(C_{i,3}, K'_{j,3}))^{ω_i} = e(g, g)^{αs/τ}, (6)

where j denotes the position of the attribute value ρ(i) in S_ID.

8. Dec.User: The data user uses the recovery key RK = τ to decrypt as follows:

C / A^τ = m · e(g, g)^{αs} / (e(g, g)^{αs/τ})^τ = m, (7)

then the data user uses the verification credential Token to complete the ciphertext verification: if H_0(m) = Token holds, the ciphertext is correct. Otherwise, the ciphertext may have been tampered with.

4.2. CRF scheme

1. Initialization: The attribute authorities run GlobalSetup and AASetup; each attribute authority sends α_i to W_AA, which then executes the following algorithms.
W_AA.SetUp: Upon receiving the parameters from the AAs, the CRF W_AA calculates α = Σ_{i=1}^N α_i, then randomly chooses a, b, c, d, e, f ∈ Z_p and calculates g' = g^a, u' = u^b, h' = h^c, w' = w^d, v' = v^e, α' = α + f, and e(g', g')^{α'} = e(g, g)^{a²(α+f)}. W_AA stores f and publishes the updated PK' = (g', u', h', w', v', e(g', g')^{α'}, G, G_T).
After receiving PK', the AAs execute KeyGen to generate the secret key SK = {K_0, K_1, {K_{i,2}, K_{i,3}}_{i∈[1,σ]}, S_ID} and send SK to the CRF W_AA, which runs the following re-randomization algorithm.
W_AA.KG: Provide PK', f, and N as input, where N represents the total number of attributes. W_AA randomly selects r', r'_1, r'_2, …, r'_N ∈ Z_p and calculates K̃_0 = g'^f w'^{r'}, K̃_1 = g'^{r'}. For i = 1, 2, …, N, W_AA computes K̃_{i,2} = g'^{r'_i}, K̃_v = v'^{−r'}, K̃_{i,3} = (u'^{A_i} h')^{r'_i} K̃_v = (u'^{A_i} h')^{r'_i} v'^{−r'}. The intermediate key is ZSK = (K̃_0, K̃_1, {r'_i, K̃_{i,2}, K̃_{i,3}}_{i∈[1,N]}).
Eventually, W_AA computes K'_0 = K_0 · K̃_0 = g'^{α+f} w'^{r+r'} = g'^{α'} w'^{r+r'} and K'_1 = K_1 · K̃_1 = g'^{r+r'}. For i = 1, 2, …, σ, where σ ≤ N, W_AA calculates K'_{i,2} = K_{i,2} · K̃_{i,2} = g'^{r_i+r'_i}, K'_{i,3} = K_{i,3} · K̃_{i,3} = (u'^{A_i} h')^{r_i+r'_i} v'^{−(r+r')}. W_AA sends the updated SK' = (K'_0, K'_1, {K'_{i,2}, K'_{i,3}}_{i∈[1,σ]}, S_ID) to the data user.

2. Data Upload: The data owner invokes Enc.Offline and Enc.Online to obtain the ciphertext CT = ((M, ρ), C, C_0, {C_{j,1}, C_{j,2}, C_{j,3}}_{j∈[1,l]}) and the verification credential Token, then sends CT and Token to the CRF W_DO, which executes the following algorithms.
W_DO.Enc.Offline: Input PK' and N, where N represents the highest possible number of rows allowed in the access structure. W_DO randomly chooses s' ∈ Z_p as a secret value and calculates Ĉ' = e(g', g')^{α's'}, Ĉ'_0 = g'^{s'}. For j = 1, 2, …, N, W_DO randomly chooses d'_j ∈ Z_p and calculates Ĉ'_{j,1} = v'^{d'_j}, Ĉ'_{j,2} = h'^{d'_j}, Ĉ'_{j,3} = g'^{d'_j}. The transitional encryption is MT' = (s', Ĉ', Ĉ'_0, {Ĉ'_{j,1}, Ĉ'_{j,2}, Ĉ'_{j,3}}_{j∈[1,N]}).
W_DO.Enc.Online: Input PK', MT', and CT. The CRF W_DO randomly selects a vector y⃗' = (s', y'_2, …, y'_n)^T ∈ Z_p^{n×1}; the secret shares are λ⃗' = (λ'_1, …, λ'_l)^T = M y⃗'. Then W_DO computes C' = C · Ĉ' = m · e(g', g')^{α'(s+s')} and C'_0 = C_0 · Ĉ'_0 = g'^{s+s'}. For j = 1, 2, …, l, where l ≤ N, W_DO calculates

C'_{j,1} = C_{j,1} · Ĉ'_{j,1} · w'^{λ'_j} = w'^{λ_j+λ'_j} v'^{d_j+d'_j}, (8)
C'_{j,2} = C_{j,2} · Ĉ'_{j,2} · u'^{ρ(j)d'_j} = (u'^{ρ(j)} h')^{d_j+d'_j}, (9)
C'_{j,3} = C_{j,3} · Ĉ'_{j,3} = g'^{d_j+d'_j}. (10)

W_DO transmits the re-randomized ciphertext CT' = (C', C'_0, {C'_{j,1}, C'_{j,2}, C'_{j,3}}_{j∈[1,l]}, (M, ρ)), along with the Token, to the cloud server.

3. Data Download: The data user runs KeyGen.ran(SK') and sends TK = (S_ID, K'_0, K'_1, {K'_{i,2}, K'_{i,3}}_{i∈[1,σ]}) to the CRF W_DU, which executes the following algorithm.
W_DU.TKUpdate: W_DU randomly chooses φ ∈ Z_p and calculates

K''_0 = K'_0^{1/φ} = g'^{α'/τφ} w'^{(r+r')/τφ}, (11)
K''_1 = K'_1^{1/φ} = g'^{(r+r')/τφ}, (12)
K''_{i,2} = K'_{i,2}^{1/φ} = g'^{(r_i+r'_i)/τφ}, (13)
K''_{i,3} = K'_{i,3}^{1/φ} = (u'^{A_i} h')^{(r_i+r'_i)/τφ} v'^{−(r+r')/τφ}. (14)
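The blinding idea behind Dec.Out and Dec.User (Eqs. (6) and (7)) is that the cloud only ever sees the key raised to 1/τ, so its partial result A = Y^{αs/τ} reveals nothing about m until the user applies the recovery key τ. A toy sketch in Z_q^* (hypothetical parameters; Y stands in for the pairing value e(g, g), which lives in G_T in the actual scheme):

```python
import math
import random

q = 1_000_003                    # toy prime modulus
y = 5                            # stands in for e(g, g)
alpha, s = 1234, 5678            # master secret exponent and encryption randomness
m = 424242                       # "message" encoded as a group element (m < q)

# recovery key tau must be invertible modulo the group order q - 1
tau = random.randrange(2, q - 1)
while math.gcd(tau, q - 1) != 1:
    tau = random.randrange(2, q - 1)
inv_tau = pow(tau, -1, q - 1)

C = m * pow(y, alpha * s, q) % q           # C = m * Y^{alpha s}, as in Enc.Online
A = pow(y, alpha * s * inv_tau, q)         # cloud's partial decryption, Eq. (6)
recovered = C * pow(pow(A, tau, q), -1, q) % q   # user computes C / A^tau, Eq. (7)
assert recovered == m
```

The heavy work (the product of pairings in Eq. (6)) happens on the cloud side; the user is left with the single exponentiation A^τ, which matches the constant-cost Dec.User column in Table 2.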
W_DU stores φ ∈ Z_p and sends the re-randomized conversion key TK' = (S_ID, K''_0, K''_1, {K''_{i,2}, K''_{i,3}}_{i∈[1,σ]}) to the cloud server.
When receiving a decryption request from a data user, the cloud server performs Dec.Out(TK', CT') to acquire the partially decrypted ciphertext TCT. The cloud server sends TCT = (C', A = e(g', g')^{α'(s+s')/τφ}) and Token to W_DU, which runs the following algorithm.
W_DU.Dec: The CRF W_DU computes A' = A^φ = e(g', g')^{α'(s+s')/τ} and sends TCT' = (C', A') and Token to the data user.
After receiving the re-randomized partially decrypted ciphertext, the data user runs Dec.User to recover the plaintext m. The data user then uses the verification credential Token to finish the ciphertext verification: if H_0(m) = Token holds, the ciphertext is correct.

5. Security analysis

5.1. Security proof

Theorem 1. Given that the q-BDHE assumption holds, the proposed scheme is secure against selective CPA.

Proof. If a polynomial-time adversary B can compromise the proposed scheme with a non-negligible advantage, then we can construct a challenger F that solves the q-BDHE problem with a non-negligible advantage. The process is as follows:

Init Phase: The adversary B submits access policies (M_i, ρ_i)_{i∈I} and a set of malicious attribute authorities R = (Â_i)_{i∈I'}, where M_i is an l × n matrix. Furthermore, the attributes within the access structure must originate from trusted attribute authorities and cannot be maliciously manipulated.

Setup Phase: The challenger F executes the algorithms GlobalSetup and AASetup to generate the public parameter Params = {g, u, v, w, h, G, G_T, H_0(·)} and the key pairs (PK_i, ASK_i)_{i∈I'}. The reverse firewall W_AA executes W_AA.SetUp to re-randomize the public key, then publishes the updated public key PK'.

Query Phase 1: During this phase, B can adaptively request secret keys for attribute sets S_1, S_2, …, S_q. For every query S_i, F executes KeyGen to obtain the corresponding secret key SK_i. Then F executes W_AA.KG to get the re-randomized secret key SK'_i. Subsequently, F executes KeyGen.ran to get the conversion key TK_i, and then runs W_DU.TKUpdate to get the re-randomized conversion key TK'_i. F returns (SK'_i, TK'_i) to B.

Challenge Phase: B provides two messages, m_0 and m_1, of equal length. F randomly selects b ∈ {0, 1} and runs Enc.Offline* and Enc.Online* to get the challenge ciphertext CT_b = ((M, ρ), C, C_0, {C_{j,1}, C_{j,2}, C_{j,3}}_{j∈[1,l]}). Then F executes W_DO.Enc.Offline and W_DO.Enc.Online to obtain the re-randomized ciphertext CT'_b and sends CT'_b to B.

Query Phase 2: The challenger F proceeds as in Query Phase 1.

Guess Phase: B outputs a bit b' ∈ {0, 1}. If b' = b, F outputs 0 (meaning that B received a normally generated ciphertext); if b' ≠ b, F outputs 1 (meaning that B received a randomly selected element). Hence, the advantage ε of the adversary B in the security game translates directly into the probability with which F solves the q-BDHE problem.

5.2. Security analysis

The features of the proposed scheme include:

1. Function Maintaining
If the collection of attributes associated with the secret key constitutes an authorized set, then Σ_{i∈I} ω_i · (λ_i + λ'_i) = s + s' holds. Thus,

A = e(C'_0, K''_0) / Π_{i∈I} (e(C'_{i,1}, K''_1) · e(C'_{i,2}, K''_{j,2}) · e(C'_{i,3}, K''_{j,3}))^{ω_i}
= e(g', g')^{α'(s+s')/τφ} · e(g', w')^{(r+r')(s+s')/τφ} / e(g', w')^{(r+r') Σ_{i∈I} (λ_i+λ'_i) ω_i /τφ}
= e(g', g')^{α'(s+s')/τφ}, (15)-(16)

since the factors e(g', v')^{(r+r')(d_i+d'_i)ω_i/τφ}, e(g', u')^{ρ(i)(d_i+d'_i)(r_i+r'_i)ω_i/τφ}, and e(g', h')^{(d_i+d'_i)(r_i+r'_i)ω_i/τφ} contributed by the ciphertext and key components cancel pairwise, and (r+r') Σ_{i∈I} (λ_i+λ'_i) ω_i = (r+r')(s+s'). Then

C' / A^{φτ} = m · e(g', g')^{α'(s+s')} / (e(g', g')^{α'(s+s')/τφ})^{φτ} = m. (17)

It is evident from the aforementioned equations that the message m remains decryptable under normal circumstances even after the implementation of a cryptographic reverse firewall. Consequently, the functionality of the cryptographic reverse firewalls is preserved.

2. Weakly Security-preserving and Weakly Exfiltration-resistant
We consider the following sequence of security games.
Game 0: The same as the security game of Section 3.
Game 1: In the Init phase, the attribute authorities' PK and ASK_i are generated by the GlobalSetup and AASetup algorithms of the basic scheme, not by GlobalSetup*, AASetup*, and W_AA.SetUp. The subsequent algorithms are carried over unchanged from Game 0.
Game 2: During both Query Phase 1 and Query Phase 2, the secret key SK is derived from the KeyGen algorithm of the basic scheme, rather than being produced by KeyGen* and W_AA.KG. The conversion key TK is produced by the KeyGen.ran algorithm of the basic scheme, not by KeyGen.ran* and W_DU.TKUpdate. The subsequent algorithms mirror those utilized in Game 1.
Game 3: During the Challenge phase, the ciphertext CT_b is constructed by Enc.Offline and Enc.Online, not by Enc.Offline*, Enc.Online*, W_DO.Enc.Offline, and W_DO.Enc.Online. Game 3 is exactly the security game of the basic scheme.

We then demonstrate the indistinguishability between Game 0 and Game 1, followed by Game 1 and Game 2, and finally between Game 2 and Game 3, each in isolation.

Between Game 0 and Game 1: no matter what modifications the tampered GlobalSetup* and AASetup* algorithms introduce, after re-randomization by the W_AA reverse firewall the public key PK' always has the same structure as the PK generated by the standard algorithms. This uniformity is due to the malleability of the key in question. Consequently, there is no distinguishable difference between Game 0 and Game 1.

Given that the secret key SK and the conversion key TK, which are produced for the user by the attribute authority, likewise possess malleability, Game 1 and Game 2 are indistinguishable. As for Game 2 and Game 3, the ciphertext CT undergoes re-randomization by the reverse firewall, resulting in a new ciphertext CT'; this is possible because of the ciphertext's malleable structure. Thus, regardless of how the Enc.Offline* and Enc.Online* algorithms operate, the ultimate configuration of the ciphertext aligns with that of the basic scheme's ciphertext structure. Consequently, there is no distinguishable difference between Game 2 and Game 3. In summary,
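The function-maintaining argument of Eqs. (15)-(17) rests on one mechanism: the firewall multiplies a (possibly tampered) ciphertext by a fresh encryption under independent randomness, so the result is uniformly re-randomized yet decrypts to the same m. A toy ElGamal-style sketch in Z_q^* (hypothetical parameters; the paper applies the same idea to the pairing-based CP-ABE ciphertext components):

```python
import random

q = 1_000_003
g = 5
x = 98765                     # stands in for the master secret alpha
Y = pow(g, x, q)              # public value Y = g^x (models e(g, g)^alpha)

def enc(m, s):
    """Encrypt m with randomness s: (C, C0) = (m * Y^s, g^s)."""
    return m * pow(Y, s, q) % q, pow(g, s, q)

def dec(C, C0):
    """Decrypt: m = C / C0^x."""
    return C * pow(pow(C0, x, q), -1, q) % q

m = 777
C, C0 = enc(m, random.randrange(1, q - 1))
# The firewall's re-randomization: multiply by an encryption of 1 under fresh s'.
s2 = random.randrange(1, q - 1)
C_new, C0_new = C * pow(Y, s2, q) % q, C0 * pow(g, s2, q) % q
assert dec(C_new, C0_new) == m        # functionality maintained
```

Because s' is uniform and independent of the original randomness, (C_new, C0_new) is distributed like a fresh encryption of m regardless of how the original was produced, which is the malleability property the game hops above rely on.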
Table 1
Function comparison.
Scheme | With CRFs | Outsource | Offline encryption | Multi-authority | Ciphertext verification | Access structure
Guo et al. [25] | ✕ | ✓ | ✓ | ✕ | ✕ | Tree
Chaudhary et al. [28] | ✕ | ✓ | ✕ | ✓ | ✕ | LSSS
Hong et al. [31] | ✓ | ✕ | ✕ | ✓ | ✕ | LSSS
Zhong et al. [29] | ✕ | ✓ | ✕ | ✕ | ✕ | Tree
Zhao et al. [32] | ✓ | ✓ | ✓ | ✕ | ✕ | Tree
Jin et al. [33] | ✓ | ✕ | ✕ | ✕ | ✕ | LSSS
Elhabob et al. [34] | ✓ | ✕ | ✕ | ✕ | ✓ | Tree
Ours | ✓ | ✓ | ✓ | ✓ | ✓ | LSSS

we deduce that Game 0 and Game 3 are equivalent in terms of their indistinguishability. Given that the foundational scheme is secure, it follows that the proposed scheme is also secure.

3. Message Verification
The data user (vehicle/RSU) uses the parameters Token, m, and the hash function H_0(·) to check whether the equation H_0(m) = Token holds. With the verification procedure described, the data user can identify any tampering that may have occurred with the message. Additionally, it provides assurance regarding the completeness and dependability of the received message. If the message changes, the equation will not hold. Therefore, the proposed scheme supports message verification.

4. Collusion Resistance

Theorem 2. If the discrete logarithm problem is hard, the proposed scheme can defend against collusion attacks initiated by up to N − 1 attribute authorities.

According to the setup process, each attribute authority randomly chooses s_{ik} ∈ Z_p and sends the value g^{s_{ik}} to all the other attribute authorities involved. Given the difficulty inherent in the discrete logarithm problem, it would be problematic for an adversary B to deduce s_{ik} from g^{s_{ik}} alone. Hence, even with the combined efforts of N − 2 attribute authorities working in tandem with the adversary, guessing a valid MK_i remains an unattainable task, so the adversary cannot devise a valid secret key SK. This renders the proposed scheme resistant to collusion attacks carried out by N − 1 attribute authorities.

5.3. Informal security analysis

1. Side-channel attack defenses
The proposed scheme utilizes CRF technology, which significantly reduces the computational overhead while enhancing security. By leveraging CRFs, it reduces the risk of messages being attacked and complicates potential threats. In addition, the multi-authority design maximizes the security of the entire system, effectively preventing single-point leakage, while balancing power consumption and execution time. These two methods not only improve efficiency but also provide strong protection against side-channel attacks.
In short, the scheme effectively combines efficiency and enhanced security, making it suitable for secure communication in vehicular networks that are susceptible to side channels.

2. Man-in-the-middle attack defense
The proposed scheme uses CP-ABE technology, whose ciphertext policy embeds the access policy into the ciphertext. This improves the security and flexibility of access control and reduces the risk of man-in-the-middle (MITM) attacks due to identity forgery.
In addition, we enhance the CRF module by integrating key-parameter re-randomization within the multi-authority ABE framework. The proposed scheme also supports message integrity verification, easily executable by onboard terminals using simple hash functions.
By combining the above technologies, this method not only protects the communication channel but also improves the security of the information.

6. Performance evaluation

6.1. Experimental setup

The following outlines the hardware and software contexts utilized for conducting the experiment:

• The experimental apparatus consists of a desktop computer equipped with a 3.2 GHz AMD Ryzen 5 5600X CPU and 16 GB of RAM, running the Windows 11 Professional (x64) OS.
• The experimental schemes are realized using Java 8 and the JPBC 2.0.0 library [32]. The prime-order bilinear pairings are constructed upon a 160-bit elliptic curve group, which is founded on the equation y² = x³ + x.

6.2. Theoretical analysis

Table 1 provides a side-by-side comparison of the functionality of our proposed scheme in relation to other schemes. Scheme [25] supports outsourced decryption and online encryption, but the rest of the functionality is not realized. Scheme [28] introduces multiple authorities to protect against collusion attacks. Scheme [29] only provides outsourced decryption, so the efficiency of its encryption phase is limited. Schemes [31-34] add CRF modules between entities on top of the above designs; however, these schemes either lack outsourced decryption or lack multiple attribute authorities. Our scheme provides both of these features, taking into account both efficiency and security. Through the comparison, we can see that the proposed scheme adds cryptographic reverse firewalls between entities; by employing these firewalls, the system is fortified with a layer of defense that maintains its functional integrity against potential subversion attacks and any attempts to tamper with its algorithms.

The introduction of multiple attribute authorities ensures that the system is resistant to collusion attacks. The proposed scheme also provides outsourced decryption as well as offline encryption, which keeps the computation required of users to process the ciphertext low. Additionally, verification credentials empower users to check and ensure the ciphertext's integrity.

The following notations are applied within Tables 2 and 3: E signifies an exponentiation operation and P denotes a bilinear pairing operation. M signifies the number of rows of the matrix as well as the number of leaf nodes of an access tree. The symbol l denotes the total number of attributes possessed by users, while k signifies the minimum number of attributes from the access structure required to fulfill the decryption criteria.

As shown in Table 2, our scheme is mid-range in the KeyGen phase. However, our scheme achieves the lowest computational overhead in the Enc.Online phase. In the Dec.Out phase, our scheme does not achieve significant advantages, but in the Dec.User phase it requires only a single exponentiation, reaching a constant level of computational overhead.
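The Token check described in the message-verification feature above is a plain hash comparison, cheap enough for an onboard terminal. A minimal sketch, with SHA-256 standing in for the paper's collision-resistant H_0 (the function names are illustrative, not from the paper):

```python
import hashlib

def make_token(m: bytes) -> bytes:
    """Token = H_0(m), computed by the data owner at encryption time."""
    return hashlib.sha256(m).digest()

def verify(m: bytes, token: bytes) -> bool:
    """Data user accepts the recovered plaintext m only if H_0(m) == Token."""
    return hashlib.sha256(m).digest() == token

token = make_token(b"road-condition update")
assert verify(b"road-condition update", token)
assert not verify(b"tampered update", token)   # any modification is detected
```

Collision resistance of H_0 is what prevents the cloud (or a man in the middle) from substituting a different plaintext that passes the same Token.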
Fig. 3. Time consumption of basic scheme.

Table 2
Computation comparison.
Scheme | KeyGen | Enc.Offline | Enc.Online | Outsourced decryption | User decryption
Guo et al. [25] | (l+4)E | (3M+1)E | 3E | 2lE + 2lP | E
Chaudhary et al. [28] | (2l+2)E | ✕ | (3M+1)E | (4l+2)E | E
Zhong et al. [29] | (3l+6)E | ✕ | (2M+2)E | ✕ | 2lE + (l+1)P
Hong et al. [31] | (4l+2)E + P | ✕ | (5M+2)E | ✕ | E + (3k+1)P
Zhao et al. [32] | (2l+4)E | 3ME + P | 3E | (3l+1)E + (2l+1)P | 2E
Jin et al. [33] | lE + P | ✕ | 6ME + 3P | ✕ | lE + 2P
Elhabob et al. [34] | (2l+2)E | ✕ | 4E | ✕ | 3E
Ours | (2l+3)E | (2M+2)E | 3E | lE + 3lP | E

Table 3
Time consumption of CRFs.
Scheme | W_AA.SetUp | W_AA.KG | W_DO.Enc.Online
Hong et al. [31] | 2lE + 2lP | (5l+2)E | 2lE + P
Zhao et al. [32] | 2E | (2l+3)E | 4E
Jin et al. [33] | (l+2)E | (2l+2)E | P
Elhabob et al. [34] | 2E | (2l+3)E | 4E
Ours | 5E | (2l+3)E | 2E

In terms of the CRFs' time consumption, our scheme achieves constant-level time consumption in the W_AA.SetUp phase, as illustrated in Table 3: the time overhead does not fluctuate with the count of attributes within the system. Moreover, our scheme achieves the highest efficiency in the W_DO.Enc.Online phase, requiring only two exponentiations.

6.3. Practical analysis

In light of the hardware and software environment described in the Experimental Setup section, Fig. 3 presents a performance comparison of the multiple phases of our scheme.

Fig. 3(a) demonstrates that the computational overhead of our scheme is low. As shown in Fig. 3(b), when comparing the computational overhead of the Enc.Online phase, our scheme, which benefits from the preprocessing performed in the Enc.Offline phase, has the lowest computational overhead of all the schemes evaluated. In terms of Fig. 3(c), the efficiency of our scheme is mid-range in the Dec.Out phase, while in the Dec.User phase our scheme maintains the lowest computational overhead. It is also significant to observe that this overhead does not fluctuate with varying counts of attributes in the system.

As depicted in Fig. 4, there is a performance comparison for the re-randomization of secret keys by the CRF W_AA. Our scheme's computational overhead is similar to that of scheme [32], at the lower level. Moreover, as shown in Fig. 5, the computational overhead of our scheme in the W_DO.Enc.Online phase is the most efficient and does not escalate linearly with an increase in vehicle attributes, which is a distinct advantage over scheme [31]. Compared with [33,34], the proposed scheme also retains an advantage in the computational overhead of the W_AA.SetUp phase.

In summary, our scheme reduces resource consumption on the user side and improves the efficiency of data flow in vehicles with limited computing power.
Fig. 4. Time consumption of 𝐴𝐴.𝑆𝑒𝑡𝑈𝑝.

Fig. 5. Time consumption of 𝐷𝑂.𝐸𝑛𝑐.𝑂𝑛𝑙𝑖𝑛𝑒.

7. Conclusion

In the IoV environment, securing the encryption and sharing of the vast amounts of data generated by vehicles, while preventing data leakage due to device tampering, presents significant challenges. To address these challenges, we propose an advanced attribute-based encryption scheme, enhanced with a cryptographic reverse firewall, specifically designed for the IoV ecosystem. This scheme is supported by multiple attribute authorities, which not only defend against collusion attacks but also enable offline encryption and outsourced decryption. These integrated features greatly improve the computational efficiency of vehicular onboard units. Additionally, we deploy RSUs with CRFs between the entities, ensuring that data remains secure even in the event of device tampering. The proposed attribute-based encryption scheme, combined with the reverse firewall mechanism, shows great promise in securing data transmission and storage within the IoV, while protecting against unauthorized access and data leakage.

CRediT authorship contribution statement

Xiaodong Yang: Writing – review & editing, Writing – original draft. Xilai Luo: Writing – review & editing, Writing – original draft. Zefan Liao: Writing – review & editing, Writing – original draft. Wenjia Wang: Writing – review & editing, Writing – original draft. Xiaoni Du: Writing – review & editing, Writing – original draft. Shudong Li: Writing – review & editing, Writing – original draft.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Key Project of the Gansu Science and Technology Plan (23YFGA0081), the Gansu Province College Industry Support Plan (2023CYZC-09), and the National Natural Science Foundation of China (No. 62362059).

Data availability

The authors do not have permission to share data.

References

[1] Siyi Liao, Jun Wu, Jianhua Li, Ali Kashif Bashir, Shahid Mumtaz, Alireza Jolfaei, Nida Kvedaraite, Cognitive popularity based AI service sharing for software-defined information-centric networks, IEEE Trans. Netw. Sci. Eng. 7 (4) (2020) 2126–2136.
[2] Rich Miller, Rolling zettabytes: Quantifying the data impact of connected cars, Data Cent. Front. (2020).
[3] Kayhan Zrar Ghafoor, Linghe Kong, Sherali Zeadally, Ali Safaa Sadiq, Gregory Epiphaniou, Mohammad Hammoudeh, Ali Kashif Bashir, Shahid Mumtaz, Millimeter-wave communication for internet of vehicles: status, challenges, and perspectives, IEEE Internet Things J. 7 (9) (2020) 8525–8546.
[4] Soheila Ghane, Alireza Jolfaei, Lars Kulik, Kotagiri Ramamohanarao, Deepak Puthal, Preserving privacy in the internet of connected vehicles, IEEE Trans. Intell. Transp. Syst. 22 (8) (2020) 5018–5027.
[5] Liang Zhao, Hongmei Chai, Yuan Han, Keping Yu, Shahid Mumtaz, A collaborative V2X data correction method for road safety, IEEE Trans. Reliab. 71 (2) (2022) 951–962.
[6] Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, Lanyu Xu, Edge computing: Vision and challenges, IEEE Internet Things J. 3 (5) (2016) 637–646.
[7] Zhenyu Zhou, Haijun Liao, Bo Gu, Shahid Mumtaz, Jonathan Rodriguez, Resource sharing and task offloading in IoT fog computing: A contract-learning approach, IEEE Trans. Emerg. Top. Comput. Intell. 4 (3) (2019) 227–240.
[8] Xingwang Li, Zhen Xie, Zheng Chu, Varun G Menon, Shahid Mumtaz, Jianhua Zhang, Exploiting benefits of IRS in wireless powered NOMA networks, IEEE Trans. Green Commun. Netw. 6 (1) (2022) 175–186.
[9] Vipul Goyal, Omkant Pandey, Amit Sahai, Brent Waters, Attribute-based encryption for fine-grained access control of encrypted data, in: Proceedings of the 13th ACM Conference on Computer and Communications Security, 2006, pp. 89–98.
[10] Amit Sahai, Brent Waters, Fuzzy identity-based encryption, in: Advances in Cryptology–EUROCRYPT 2005: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, May 22-26, 2005. Proceedings 24, Springer, 2005, pp. 457–473.
[11] John Bethencourt, Amit Sahai, Brent Waters, Ciphertext-policy attribute-based encryption, in: 2007 IEEE Symposium on Security and Privacy, SP'07, IEEE, 2007, pp. 321–334.
[12] Matthew Green, Susan Hohenberger, Brent Waters, Outsourcing the decryption of ABE ciphertexts, in: 20th USENIX Security Symposium, USENIX Security 11, 2011.
[13] Junzuo Lai, Robert H. Deng, Chaowen Guan, Jian Weng, Attribute-based encryption with verifiable outsourced decryption, IEEE Trans. Inf. Forensics Secur. 8 (8) (2013) 1343–1354.
[14] Suqing Lin, Rui Zhang, Hui Ma, Mingsheng Wang, Revisiting attribute-based encryption with verifiable outsourced decryption, IEEE Trans. Inf. Forensics Secur. 10 (10) (2015) 2119–2130.
[15] Cong Zuo, Jun Shao, Guiyi Wei, Mande Xie, Min Ji, CCA-secure ABE with outsourced decryption for fog computing, Future Gener. Comput. Syst. 78 (2018) 730–738.
[16] James Ball, Julian Borger, Glenn Greenwald, et al., Revealed: how US and UK spy agencies defeat internet privacy and security, Know Your Neighb. (2013).
[17] Stephen Checkoway, Ruben Niederhagen, Adam Everspaugh, Matthew Green, Tanja Lange, Thomas Ristenpart, Daniel J Bernstein, Jake Maskiewicz, Hovav Shacham, Matthew Fredrikson, On the practical exploitability of dual EC in TLS implementations, in: 23rd USENIX Security Symposium, USENIX Security 14, 2014, pp. 319–335.
[18] Yevgeniy Dodis, Chaya Ganesh, Alexander Golovnev, Ari Juels, Thomas Ristenpart, A formal treatment of backdoored pseudorandom generators, in: Advances in Cryptology–EUROCRYPT 2015: 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I 34, Springer, 2015, pp. 101–126.
[19] Ilya Mironov, Noah Stephens-Davidowitz, Cryptographic reverse firewalls, in: Advances in Cryptology–EUROCRYPT 2015: 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part II 34, Springer, 2015, pp. 657–686.
[20] Brent Waters, Ciphertext-policy attribute-based encryption: An expressive, efficient, and provably secure realization, in: International Workshop on Public Key Cryptography, Springer, 2011, pp. 53–70.
[21] Shucheng Yu, Cong Wang, Kui Ren, Wenjing Lou, Achieving secure, scalable, and fine-grained data access control in cloud computing, in: 2010 Proceedings IEEE INFOCOM, IEEE, 2010, pp. 1–9.
[22] Kan Yang, Xiaohua Jia, Kui Ren, Ruitao Xie, Liusheng Huang, Enabling efficient access control with dynamic policy updating for big data in the cloud, in: IEEE INFOCOM 2014 - IEEE Conference on Computer Communications, IEEE, 2014, pp. 2013–2021.
[23] Jun Feng, Hu Xiong, Jinhao Chen, Yang Xiang, Kuo-Hui Yeh, Scalable and revocable attribute-based data sharing with short revocation list for IIoT, IEEE Internet Things J. 10 (6) (2022) 4815–4829.
[24] Qian Mei, Hu Xiong, Yeh-Cheng Chen, Chien-Ming Chen, Blockchain-enabled privacy-preserving authentication mechanism for transportation CPS with cloud-edge computing, IEEE Trans. Eng. Manage. (2022).
[25] Rui Guo, Geng Yang, Huixian Shi, Yinghui Zhang, Dong Zheng, O3-R-CP-ABE: An efficient and revocable attribute-based encryption scheme in the cloud-assisted IoMT system, IEEE Internet Things J. 8 (11) (2021) 8949–8963.
[26] Melissa Chase, Multi-authority attribute based encryption, in: Theory of Cryptography: 4th Theory of Cryptography Conference, TCC 2007, Amsterdam, The Netherlands, February 21-24, 2007. Proceedings 4, Springer, 2007, pp. 515–534.
[27] Allison Lewko, Brent Waters, Decentralizing attribute-based encryption, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2011, pp. 568–588.
[28] Chandan Kumar Chaudhary, Richa Sarma, Ferdous Ahmed Barbhuiya, RMA-CPABE: A multi-authority CPABE scheme with reduced ciphertext size for IoT devices, Future Gener. Comput. Syst. 138 (2023) 226–242.
[29] Hong Zhong, Yiyuan Zhou, Qingyang Zhang, Yan Xu, Jie Cui, An efficient and outsourcing-supported attribute-based access control scheme for edge-enabled smart healthcare, Future Gener. Comput. Syst. 115 (2021) 486–496.
[30] Hui Ma, Rui Zhang, Guomin Yang, Zishuai Song, Shuzhou Sun, Yuting Xiao, Concessive online/offline attribute based encryption with cryptographic reverse firewalls—Secure and efficient fine-grained access control on corrupted machines, in: Computer Security: 23rd European Symposium on Research in Computer Security, ESORICS 2018, Barcelona, Spain, September 3-7, 2018, Proceedings, Part II 23, Springer, 2018, pp. 507–526.
[31] Bo Hong, Jie Chen, Kai Zhang, Haifeng Qian, Multi-authority non-monotonic KP-ABE with cryptographic reverse firewall, IEEE Access 7 (2019) 159002–159012.
[32] Yang Zhao, Yuwei Pang, Xingyu Ke, Bintao Wang, Guobin Zhu, Mingsheng Cao, A metaverse-oriented CP-ABE scheme with cryptographic reverse firewall, Future Gener. Comput. Syst. 147 (2023) 195–206.
[33] C. Jin, Z. Chen, W. Qin, et al., Blockchain-based proxy re-encryption scheme with cryptographic reverse firewall for IoV, Int. J. Netw. Manage. (2024) e2305.
[34] R. Elhabob, N. Eltayieb, H. Xiong, et al., Equality test public key encryption with cryptographic reverse firewalls for cloud-based E-commerce, IEEE Trans. Consum. Electron. (2024).

Xiaodong Yang (Member, IEEE) received the M.S. degree in cryptography from Tongji University, Shanghai, China, in 2005, and the Ph.D. degree in cryptography from Northwest Normal University, Lanzhou, China, in 2010. In his role as a Postdoctoral Researcher at China's State Key Laboratory of Cryptology in Beijing during 2016, he played a significant part in advancing the field. Today, he holds the position of Professor at the College of Computer Science and Engineering, Northwest Normal University. The core of his research is anchored in public-key cryptography, information security protocols, and the application of wireless sensor networks.

Xilai Luo is presently a master's degree candidate at the College of Computer Science and Engineering, Northwest Normal University, China. His academic pursuits are focused on the areas of artificial intelligence, information security, and cryptography.

Zefan Liao is working towards his master's degree in the College of Computer Science and Engineering at Northwest Normal University, China. His areas of research interest include edge computing, information security, and cryptography.

Wenjia Wang is pursuing her master's degree within the College of Computer Science and Engineering at Northwest Normal University, China. Her research interests are centered on data security and network security.

Xiaoni Du received the Ph.D. degree in cryptography from Xidian University, Xi'an, China, in 2008. She worked as a Visiting Scholar with the University of Kentucky, Lexington, KY, USA, and Hong Kong University of Science and Technology, Hong Kong, in 2011 and 2014, respectively. She is currently a Professor with the College of Mathematics and Statistics, Northwest Normal University, Lanzhou, China. Her main research interests include information security, cryptography, and coding.

Shudong Li received the M.S. degree in applied mathematics from Tongji University, Shanghai, China, in 2005, and the Ph.D. degree in Posts and Telecommunications from Beijing University, Beijing, China, in 2012. From 2013 to 2018, he held the position of a postdoctoral researcher at the National University of Defense Technology in Changsha, China. He now serves as a Professor at the Cyberspace Institute of Advanced Technology at Guangzhou University. His primary research interests are in the realms of Big Data and its security, malware identification, and cloud computing.
View File
@@ -0,0 +1,965 @@
Journal of Systems Architecture 160 (2025) 103345
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
A hash-based post-quantum ring signature scheme for the Internet of Vehicles
Shuanggen Liu a,∗, Xiayi Zhou a, Xu An Wang b, Zixuan Yan a, He Yan a, Yurui Cao a
a School of Cyberspace Security, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi, China
b Key Laboratory of Network and Information Security, Engineering University of People's Armed Police, Shaanxi, China
ARTICLE INFO ABSTRACT
Keywords: With the rapid development of the Internet of Vehicles, securing data transmission has become crucial,
Ring signature especially given the threat posed by quantum computing to traditional digital signatures. This paper presents
Internet of Vehicles a hash-based post-quantum ring signature scheme built upon the XMSS hash-based signature framework,
Merkle tree
leveraging Merkle trees for efficient data organization and verification. In addition, the scheme is applied to
Post-quantum digital signature
the Internet of Vehicles, ensuring both anonymity and traceability while providing robust quantum-resistant
Hash-based signature scheme
security. Evaluation results indicate that, compared to other schemes, the proposed method achieves superior
verification speed while ensuring data security and privacy.
1. Introduction

As a fundamental necessity in modern life, the number of vehicles produced worldwide continues to grow. According to relevant statistics, global vehicle production reached 94 million units in 2023 [1]. Additionally, data from the International Organization of Motor Vehicle Manufacturers indicates that there are now 1.3 billion vehicles in use [2]. However, this growth brings various challenges, including network attacks, unauthorized access, and concerns around road safety and privacy. To address these issues, new research fields, such as intelligent transportation systems (ITS) and the Internet of Vehicles (IoV), have emerged. These fields aim to provide safer, more efficient, and more harmonious vehicular environments. Vehicle-to-Everything (V2X) technology enables the effective use of dynamic information from all networked vehicles via on-board devices, facilitating secure, efficient, intelligent, and comfortable services, thereby contributing to the intelligence of social traffic systems [3]. The typical VANET structure is shown in Fig. 1.

With the increasing number of vehicles and the development of the IoV, ensuring the security of IoV systems has become a vital task. Currently, the security of vehicular networks, whether internal or external, primarily relies on digital signatures or public-key encryption. However, as quantum computing advances, traditional digital signature algorithms are increasingly vulnerable to quantum attacks, making it essential to incorporate post-quantum digital signature algorithms into IoV research. Unlike traditional computers, quantum computers can accelerate the cracking of probabilistic algorithms through parallel computation capabilities [4]. In light of these challenges, post-quantum cryptography has become a critical area of study, with the aim of establishing a resilient foundation for the industry. The National Institute of Standards and Technology (NIST) has been conducting a multi-stage standardization process for post-quantum cryptography. The third round of candidate evaluations has been completed, and algorithms such as SPHINCS+, CRYSTALS-DILITHIUM, and CRYSTALS-KYBER have been standardized. These algorithms achieve varying levels of bit-level security depending on key size and parameter settings, which align with NIST security levels from 1 to 5, representing 128/160/192/224/256-bit security strengths, respectively [5]. A post-quantum digital signature scheme is a digital signature scheme capable of resisting quantum attacks. Among post-quantum digital signature schemes, hash-based schemes are particularly effective and provably secure. Hash-based post-quantum digital signature schemes offer significant advantages over other types of post-quantum schemes due to their high computational efficiency, scalability, maturity, and reliance solely on the preimage resistance of the underlying hash function [6].

In IoV networks, where both privacy and traffic safety are essential, ring signatures are especially suitable. Ring signature schemes offer anonymity by concealing the identity of the signer among a group of participants. Using hash-based post-quantum ring signatures, vehicles can sign messages anonymously within a group, ensuring their identities cannot be traced. These signatures also provide unforgeability, collision resistance, resilience against quantum attacks, and low communication overhead. In densely populated cities, managing keys for secure vehicular communications can be challenging, especially given the limited IoV coverage [7]. The Merkle tree structure effectively compresses keys, reducing key management costs [8]. In this study, we propose a
∗ Corresponding author.
E-mail address: liushuanggen201@xupt.edu.cn (S. Liu).
https://doi.org/10.1016/j.sysarc.2025.103345
Received 11 November 2024; Received in revised form 23 December 2024; Accepted 16 January 2025
Available online 23 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Fig. 1. VANET structure.

hash-based post-quantum ring signature scheme for IoV applications. The ring signature algorithm of our scheme is based on the XMSS algorithm, aiming to enhance data sharing security and efficiency. Merkle trees are used to organize and verify data efficiently, while ring signatures ensure the authenticity and integrity of data within the IoV network without compromising user anonymity.

1.1. Related works

In recent years, hash-based post-quantum digital signature schemes have garnered significant attention within the cryptography community. Following the fourth round of the NIST post-quantum digital signature standardization process, the SPHINCS+ algorithm was introduced as a supplementary standard, featuring a flexible, tunable hash function structure [9]. As the standardization process progresses, researchers have proposed various adaptations, including SPHINCS-a and SPHINCS+-c, which further compress signature sizes and enhance execution speeds [10,11]. Additionally, Sun, Liu, and colleagues developed a domestic signature algorithm based on the post-quantum hash function SM3 [12]. Hülsing and Kudinov provided a rigorous security proof for the SPHINCS+ algorithm, confirming its robustness in a post-quantum environment [13]. The XMSS algorithm forms the foundation of SPHINCS+, with its architectural design and security proof presented by Hülsing, Butin, and others [14]. Research on hardware implementations of the XMSS algorithm has also advanced, with significant contributions from Thoma and Güneysu [15]. Meanwhile, Sun and Liu investigated the feasibility of replacing the hash function in XMSS with the domestic SM3 hash function [16]. An essential component of XMSS is WOTS+, a one-time signature algorithm; Hülsing provided its security proof [17], while Zhang, Cui, and colleagues evaluated the efficiency of WOTS+ in tree-based one-time signature algorithms [18]. Currently, research on post-quantum digital signatures primarily concentrates on enhancing signature efficiency and replacing the underlying hash functions. However, there is a scarcity of studies that integrate post-quantum digital signatures with specific application scenarios or explore their variants.

The exploration of post-quantum ring signatures is also accelerating in post-quantum digital signature research. Xie, Wang, and colleagues highlighted that traditional signature algorithms are highly susceptible to quantum computing attacks, and noted that ring signatures offer considerable advantages in blockchain applications, including medical data sharing and vehicular networking, due to their unique properties [19]. Chatterjee, Chung, et al. conducted an in-depth analysis of the security of post-quantum ring signatures, re-examined the security of classical signatures and ring signatures in the quantum environment, and proposed two short signature schemes, implemented in the quantum random oracle model and the standard model, respectively [20]. Recent literature has introduced novel architectures, such as linkable ring signatures, threshold ring signatures, and identity-based post-quantum ring signatures, discussing their post-quantum security features [21–23]. Similarly, literature [24] systematically reviews the theory and application of linkable ring signatures, providing an in-depth comparison of anonymization and linkability schemes. However, these studies lack analysis of specific application scenarios (such as the IoV) and do not fully consider resource-constrained environments or the potential of quantum-resistant computing.

In response to the research of NIST on post-quantum algorithms and verifiable ring signatures, a blockchain-based, post-quantum anonymous, traceable, and verifiable authentication scheme was proposed to mitigate quantum attacks while addressing security and privacy concerns, with an evaluation of its feasibility in IoV environments [25]. The IoV faces significant security and privacy challenges, and blockchain technology offers an effective platform to ensure both user privacy and security [26–28]. Literature [29] proposes an identity authentication and signature scheme for UAV-assisted Vehicular Ad Hoc Networks (VANET), focusing on enhancing network anonymity and user privacy through an efficient authentication mechanism. Literature [30] introduces a distributed message authentication scheme combined with a reputation mechanism to improve the security and trust of the IoV. The scheme uses node credit values to authenticate message validity, effectively preventing malicious attacks and forgery. Literature [31] presents an authentication key negotiation protocol for intelligent transportation systems in vehicle networks, strengthening identity authentication and key exchange mechanisms to prevent security threats such as eavesdropping, tampering, and man-in-the-middle attacks. While these studies address key security challenges in vehicular networks, they often focus on specific aspects, lacking comprehensive and scalable frameworks for real-world scenarios. Furthermore, the integration of post-quantum cryptography and scalability in dynamic, large-scale networks remains underexplored, highlighting opportunities for future research into robust and future-proof solutions. Given the inherent advantages of ring signatures, they are particularly well-suited for applications such as the Internet of Vehicles, making further investigation essential.

In order to ensure the post-quantum security of data transmission in the IoV environment, researchers have proposed various solutions. The literature [32] recommends the use of lattice-based post-quantum digital signatures, but the signature algorithm has not been combined with specific scenarios. Another study [33] proposed a ring signature scheme based on lattice hard problems and combined it with the vehicle-connected environment, but the quantum attack-resistance characteristics of the scheme were not explained in detail. In addition, reducing energy consumption in blockchain has also become a research focus [34]: an energy-saving method is adopted to calculate the root of the Merkle tree, a Merkle tree design scheme conforming to the specification is proposed, and the effectiveness of this method is verified through experiments. At the same time, the Merkle tree accumulator algorithm proposed by Derler and Ramacher in [35] builds an accumulator that can resist quantum attacks by using only hash functions and symmetric-key primitives, and gives specific operations and definitions. However, the specific algorithm implementation and its combination with practical application scenarios need to be further studied.

1.2. Contributions

Firstly, building on the Merkle tree accumulator algorithm described in Ref. [35], we propose a hash-based ring signature algorithm specifically designed for the IoV: we improve the Merkle tree accumulator algorithm into an XMSS accumulator algorithm. This algorithm integrates the principles of ring signatures with Merkle tree structures. Unlike
Table 1
Notation for ring signature scheme.
λ  Security parameter
N  The size of the ring
(pk, sk)  Key pair
R  A ring consisting of (pk_1, pk_2, …, pk_l)
m  The message digest
σ  The signature of message

traditional ring signature algorithms, this proposed scheme can resist quantum attacks, thus offering post-quantum security.

Secondly, we construct a new hash-based post-quantum ring signature scheme for application in vehicular networks. This scheme enhances the security of data transmission within the vehicular network, providing robust post-quantum security to effectively protect shared data.

1.3. Structure

The remainder of this paper is organized as follows: Chapter 2 provides the necessary foundational knowledge, along with a review of the background and related work relevant to this study. In Chapter 3, we present a post-quantum ring signature algorithm based on Merkle trees and discuss its application within the IoV environment. Chapter 4 offers a security analysis and proof of the robustness of the proposed scheme. In Chapter 5, we evaluate the performance of the scheme and compare it with existing alternatives. Finally, Chapter 6 concludes the paper and outlines directions for future research.

2. Preliminaries

2.1. Ring signature

Ring signature is a digital signature scheme introduced by Rivest, Shamir, and Tauman in 2001. A ring is composed of a group of members, allowing any member within the group to sign on behalf of the entire group without revealing the identity of the signing member [36]. The main parameters of ring signature are given in Table 1.

Definition 1 (Ring Signature). A ring signature scheme consists of three core algorithms: key generation, signature generation, and signature verification. These algorithms are defined as follows:

Step 1: Key generation. (pk, sk) ← Gen(λ, N): the size of the ring is N; with the security parameter λ and the maximum number of members in the ring N as input, output the public and private key pair.

Step 2: Signature generation. σ ← Sign(sk, R, m): input the private key sk, the set of all public keys R = (PK_1, PK_2, …, PK_L), and a message m ∈ M_λ; output the signature σ.

Step 3: Signature verification. True/False ← Ver(R, m, σ): input the collection of all public keys R, the message m ∈ M_λ, and the signature σ; output True or False.

A ring signature must satisfy two critical security properties: anonymity and unforgeability. Anonymity ensures that while the signature indicates it was generated by a member of the ring, it does not reveal the specific identity of the signer. Unforgeability guarantees that only members of the ring can generate valid signatures; outsiders cannot create valid signatures for the ring.

Definition 2 (Unforgeability). Unforgeability ensures that only members of the ring can generate a valid signature. In the unforgeability model, we assume that the attacker has access to a public key and aims to produce a valid ring signature without authorization.

Let the security parameter be λ and the ring signature scheme RS = (Gen, Sig, Ver); for any PPT adversary A and any integer s, define the following experiment:

Step 1: the challenger generates s key pairs (pk_i, sk_i), i ∈ [1, s], and sends all the public keys as a set PK = (PK_1, PK_2, …, PK_s) to A.

Step 2: the challenger chooses one PK_i and checks whether PK_i belongs to R; the challenger computes Sig(sk_i, R, m) → σ and sends σ to A.

Step 3: the attacker outputs the tuple (R*, m*, σ*), and the challenger checks it. If R* ⊆ PK, A never made a signature query on (sign, R*, m*), and Ver(R*, m*, σ*) = 1, the experiment returns 1, and 0 otherwise.

Adv_UNF^{λ,s}(A) = Pr[Exp_UNF^{λ,s}(A) = 1] ≤ negl(λ)

Definition 3 (Anonymity). Anonymity in a ring signature scheme ensures that the identity of the signer remains concealed among a group of potential signers, making it impossible to determine who specifically generated the signature. This anonymity is achieved through a ring signature generation process that relies on the public keys of all group members, without revealing the identity of the actual signer.

In the anonymity experiment, the adversary is given a ring signature generated under one of two public/private key pairs, both of whose public keys are known to the adversary; the goal of the adversary is to distinguish which of the two private keys was used to generate the ring signature, succeeding with at most negligible advantage.

Let the security parameter be λ, the ring signature RS = (Gen, Sig, Ver), and A a polynomial-time algorithm; for any integer s and any bit b, define the experiment as follows:

Step 1: the challenger generates s key pairs (PK_i, SK_i), i ∈ [1, s], and sends all the public keys PK_i to A.

Step 2: A sends (R, m, i_0, i_1) to the challenger; the challenger checks that pk_{i_0} ∈ R and pk_{i_1} ∈ R, computes σ ← Sig(sk_{i_b}, R, m), and sends σ to A.

Step 3: A returns a guess bit b′; the experiment outputs 1 if b′ = b and 0 otherwise. RS is considered anonymous if, for all s and all polynomial-time algorithms A, the probability of A returning 1 in the (s, 0)-anonymity experiment is negligibly close (in λ) to the probability of A returning 1 in the (s, 1)-anonymity experiment.

Adv_ANON^{λ,s}(A) = |Pr[Exp_ANON^{λ,s}(A) = 1] − 1/2| ≤ negl(λ)

2.2. WOTS+

Ralph Merkle pioneered hash-based signature algorithms, as noted in Ref. [37]. Currently, hash-based signature schemes are categorized into three main types: one-time signature schemes (OTS), few-time signature schemes (FTS), and many-time signature schemes (MTS). Table 2 summarizes some of the most widely used hash-based signature schemes. Research on OTS schemes began with the Lamport–Diffie algorithm. This paper adopts the WOTS+ (Winternitz One-Time Signature Plus) scheme, which comprises three main components: key generation (GEN), signature generation (SIG), and signature verification (VER).

The first step is parameter selection: an integer ω ∈ ℕ with ω ≥ 2 is chosen, determining the number of hash iterations required to construct the public key. Additionally, the hash output length m and the security parameter n need to be defined. Next, parameters l_1 and l_2 are computed and summed to obtain l. The calculation method is as follows:

l_1 = ⌈m / log_2 ω⌉,  l_2 = ⌊(log_2(l_1(ω − 1)) + log_2 ω) / log_2 ω⌋,  l = l_1 + l_2
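As a sanity check, the chain counts can be computed directly from the formulas above. The sketch below is illustrative Python (not from the paper), treating ω as the usual Winternitz parameter w and using a 256-bit digest as an example:

```python
import math

def wots_params(m: int, w: int):
    """Compute the WOTS+ chain counts l1, l2, l from the formulas above.

    m -- bit length of the message digest
    w -- Winternitz parameter (w >= 2)
    """
    l1 = math.ceil(m / math.log2(w))
    # floor((log2(l1*(w-1)) + log2(w)) / log2(w)) equals floor(log_w(l1*(w-1))) + 1
    l2 = math.floor((math.log2(l1 * (w - 1)) + math.log2(w)) / math.log2(w))
    return l1, l2, l1 + l2

# With a 256-bit digest and w = 16: 64 message chains and 3 checksum chains.
print(wots_params(256, 16))  # (64, 3, 67)
```

With m = 256 and w = 16 this yields l = 67 chains, the familiar WOTS+ figure for 256-bit digests.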
Table 2
Classification table for hash-based signature schemes.
OTS  Lamport–Diffie, WOTS, WOTS+
FTS  HORS, HORS-T, PORS, PORS-T
MTS  XMSS, SPHINCS, SPHINCS+

Table 3
Parameter descriptions for the WOTS+ algorithm.
n ∈ ℕ  Security parameter
w ∈ ℕ  Winternitz parameter (w ≥ 2)
m ∈ ℕ  Bit length of the message digest
F_n  A set of functions, F_n = {f_k | k ∈ {0, 1}^n}, f_k: {0, 1}^n → {0, 1}^n
h ∈ ℕ  Height of the tree
H  Hash function, H: {0, 1}* → {0, 1}^m
x ∈ {0, 1}^n  Randomly chosen string x, used to construct a one-time verification key

Fig. 2. Key generation process for WOTS+.
Table 3 gives the meaning of the parameters in the formulas. Next we define the operations. WOTS+ uses the function family F_n: {0, 1}^n → {0, 1}^n.

Fig. 3. Message digest generation graph.

Define the chain function:

c^i(x, r) = F(c^{i−1}(x, r) ⊕ r_i),  i > 0
c^0(x, r) = x

where x ∈ {0, 1}^n, F ∈ F_n: {0, 1}^n → {0, 1}^n, and r = (r_1, r_2, …, r_{2^ω−1}) ∈ {0, 1}^{n×(2^ω−1)}.
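A minimal sketch of this chain function in Python may make the recursion concrete. This is illustrative only: SHA-256 stands in for the family member f_k, and the paper's 1-indexed masks r_1, …, r_{2^ω−1} are stored 0-indexed.

```python
import hashlib

N = 32  # n = 256-bit security parameter, in bytes

def F(x: bytes) -> bytes:
    """One member f_k of the function family F_n (SHA-256 as a stand-in)."""
    return hashlib.sha256(x).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

def chain(x: bytes, r: list, i: int) -> bytes:
    """c^i(x, r): apply F i times, XOR-ing in mask r_i before each step."""
    if i == 0:
        return x  # base case c^0(x, r) = x
    return F(xor(chain(x, r, i - 1), r[i - 1]))  # paper's r_i is r[i-1] here
```

Each step masks the running value with the next r_i before hashing, which is what distinguishes WOTS+ chains from plain iterated hashing.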
Step 1: Key Generation (GEN)
Key generation mainly includes two steps: private key generation and public key generation. The process is shown in Fig. 2.
(1) Private key generation: Use a PRG to generate l + 2^ω − 1 random n-bit strings. The first l strings form the private key sk = (sk_0, sk_1, …, sk_{l−1}), and the last 2^ω − 1 are the masks r = (r_1, r_2, …, r_{2^ω−1}).
(2) Public key generation: The public key consists of l + 1 blocks; the first block is the mask r, and the last l blocks are obtained by advancing each private key block to the end of its chain:

    pk_i = c^{2^ω−1}(sk_{i−1}, r),  i ∈ [1, l]
    pk = (pk_0, pk_1, …, pk_l) = (r, c^{2^ω−1}(sk_0, r), …, c^{2^ω−1}(sk_{l−1}, r))

Fig. 4. WOTS+ signature generation diagram.
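Taken together with the signing and verification steps described next, this key generation can be condensed into a toy sketch. It is not the authors' implementation: SHA-256 stands in for the keyed family F_n, the sizes (n = 256 bits, base 16, l = 67) are illustrative, and verification completes each chain with the remaining masks:

```python
import hashlib, os

N = 32        # hash output length in bytes (n = 256 bits)
W = 16        # chain base; omega-bit chunks in the text correspond to base 2**omega
L1, L2 = 64, 3
L = L1 + L2   # l = l1 + l2 chains for a 256-bit digest in base 16

def F(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def chain(x: bytes, masks: list, steps: int) -> bytes:
    """Iterative form of c^steps(x, r): XOR the next mask, then apply F."""
    out = x
    for r in masks[:steps]:
        out = F(bytes(a ^ b for a, b in zip(out, r)))
    return out

def base_w(digest: bytes) -> list:
    """Split the digest into L1 base-W chunks, then append the checksum chunks."""
    chunks = []
    for byte in digest:
        chunks += [byte >> 4, byte & 0x0F]         # two base-16 chunks per byte
    checksum = sum(W - 1 - m_i for m_i in chunks)  # C = sum(W - 1 - m_i)
    for _ in range(L2):
        chunks.append(checksum % W)
        checksum //= W
    return chunks

def keygen():
    sk = [os.urandom(N) for _ in range(L)]
    masks = [os.urandom(N) for _ in range(W - 1)]
    pk = [chain(s, masks, W - 1) for s in sk]      # run every chain to the end
    return sk, (masks, pk)

def sign(message: bytes, sk, masks):
    b = base_w(hashlib.sha256(message).digest())
    return [chain(sk[i], masks, b[i]) for i in range(L)]

def verify(message: bytes, sig, masks, pk):
    b = base_w(hashlib.sha256(message).digest())
    # complete each chain from step b[i] using the remaining masks
    return all(chain(sig[i], masks[b[i]:], W - 1 - b[i]) == pk[i] for i in range(L))
```

A signature reveals each chain advanced b_i steps; the verifier finishes the remaining W − 1 − b_i steps and compares against the public key, which is why the checksum is needed (it prevents an attacker from advancing chains further on their own).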
Step 2: Message Signature (SIG)
(1) Generate the message digest: Hash the message m with the hash function to obtain the digest M, then divide the digest into l1 parts of ω bits each, where each ω-bit part m_i, i ∈ [0, l1 − 1], is interpreted as an integer. The digest generation process is shown in Fig. 3, and the overall signature generation process in Fig. 4.
(2) Calculate the checksum:

    C = Σ_{i=1}^{l1} (2^ω − 1 − m_i) ≤ l1(2^ω − 1)

Divide C into ω-bit blocks, giving c = (c_0, c_1, …, c_{l2−1}). Let b = (b_0, b_1, …, b_{l−1}) be the concatenation of m and c. Signature generation is then represented by the following formula:

    σ = (σ_0, σ_1, …, σ_{l−1}) = (F^{b_0}(sk_0, r), F^{b_1}(sk_1, r), …, F^{b_{l−1}}(sk_{l−1}, r))

Step 3: Message Verification (VER)
The message M is converted to b = (b_0, b_1, …, b_{l−1}). The received signature σ = (σ_0, σ_1, …, σ_{l−1}) is then processed as follows to obtain pk′; if pk′ is the same as pk, verification succeeds:

    pk′ = (r, pk′_1, pk′_2, …, pk′_l) = (r, F^{2^ω−1−b_0}(σ_0), F^{2^ω−1−b_1}(σ_1), …, F^{2^ω−1−b_{l−1}}(σ_{l−1}))

2.3. XMSS

2.3.1. Merkle tree
The Merkle Signature Scheme (MSS), proposed by Ralph Merkle in 1979, integrates the Merkle tree with an OTS algorithm. A Merkle tree is a hierarchical structure in which leaf nodes contain hash values of data and non-leaf nodes store the hash of the combination of their child nodes. This structure enables efficient data integrity verification, especially for large-scale datasets. The structure of the Merkle tree is shown in Fig. 5. According to Fig. 5, the tree has 3 layers and 2^3 = 8 leaf nodes, each storing the hash of a one-time signature public key. The leaf nodes,
Fig. 5. Merkle tree structure diagram.
labeled node0 to node7, are hashed pairwise to generate the middle nodes. The final root node stores the public key.
The Merkle tree serves two primary functions:
(1) Data integrity verification: users can check whether data has been tampered with by recalculating the root hash.
(2) Public key size compression: the storage requirements for numerous public keys are reduced by consolidating them into a single root key.

2.3.2. Key generation
The XMSS algorithm deploys 2^h WOTS+ instances as the 2^h leaf nodes of a Merkle tree of height h, with the root node authenticating these instances [38]. The XMSS key consists of multiple OTS keys, with the root of the Merkle tree serving as the public key.
Step 1: Select the parameters.
Step 2: Generate the one-time signature key pairs (pk, sk).
Step 3: Build the Merkle tree. Use each OTS public key pk_i as a leaf node of the Merkle tree. The leaf nodes are combined through a hash function into non-leaf nodes, eventually producing the root node. Each parent node in the Merkle tree is generated from the hash of its two child nodes, that is, Node(i) = H(child_left(i) ∥ child_right(i)); the root node Root serves as the XMSS public key.
Step 4: Output the key pair. The public key is pk = (root, seed); the private key consists of the OTS key pairs.

2.3.3. Message signature
To sign a message, an unused WOTS+ private key is selected, and the Merkle tree path proof is generated to output the signature SIG.
Step 1: Select a WOTS+ key. Choose an unused WOTS+ private key sk_i, ensuring it is used only once.
Step 2: Generate the WOTS+ one-time signature. Use the WOTS+ private key to sign the message M, producing the OTS signature Sig_OTS.
Step 3: Merkle tree path proof. Hash the path from leaf node pk_i to the root node; this path proves that the OTS public key is valid.
Step 4: Generate the XMSS signature. The signature includes the serial number i (indicating the i-th OTS key), the OTS signature Sig_OTS, and the authentication path AuthPath for the Merkle tree: Sig_XMSS = (i, Sig_OTS, AuthPath).

2.3.4. Signature verification
The signature verification process ensures the correctness of the OTS signature and validates that the corresponding OTS public key is consistent with the root of the Merkle tree. The main steps are as follows:
Step 1: Extract information. Extract the OTS serial number i, the OTS signature Sig_OTS, and the Merkle tree path proof AuthPath from the XMSS signature Sig_XMSS.
Step 2: Verify the OTS signature. Using the extracted OTS public key, verify the validity of Sig_OTS for the message M. If verification fails, the signature is deemed invalid.
Step 3: Compute the Merkle tree path. Using the OTS public key pk_i and the path proof AuthPath, calculate the hash value of each parent node step by step from the leaf node pk_i until the root node is obtained.
Step 4: Compare root nodes. Compare the reconstructed root node with the root node Root from the XMSS public key. If the values match, the signature is valid; otherwise, it is invalid.

3. Hash-based post-quantum ring signature scheme

In addition to high computational efficiency and excellent scalability, hash-based signature schemes such as XMSS and SPHINCS+ exhibit greater algorithmic maturity than other post-quantum digital signature schemes. Furthermore, post-quantum ring signatures ensure both the anonymity and unforgeability of signatures. Consequently, in light of the security threats posed by the rapid advancement of quantum computing, it is highly significant to integrate the post-quantum ring signature scheme with vehicle networking.

3.1. Design principles

The Merkle tree is an efficient data structure: a binary hash tree in which each node represents the hash value of a data block and the root node represents the hash of the entire data set. These characteristics make the Merkle tree a highly efficient method for storing and verifying large amounts of data. In blockchain, Merkle trees are widely used to store transaction data and block hashes. Ring signatures enable
Table 4
Meaning of parameters in the proposed scheme.

k              Security parameter
t              Maximum number of elements to accumulate
h ∈ N          Height of the tree
H              Hash function, H : {0,1}* → {0,1}^m
(sk_Ω, pk_Ω)   A key pair
X              The set of x_i, i ∈ [0, 2^h − 1]
Ω              The accumulator
aux            The auxiliary information
wit_{x_i}      The certificate for x_i

a message sender to demonstrate possession of at least one public key within a set while concealing the specific public key used, thus providing anonymity and unlinkability. This feature makes ring signatures particularly valuable in applications centered on privacy and secure communication. Within ring signatures, Merkle trees can be employed to organize the hashes of messages or data blocks into a tree structure, facilitating efficient verification of data integrity and authenticity. Furthermore, ring signatures can leverage Merkle trees to obscure the identity of the sender by integrating the public key of the signer with those of other members in a ring. Consequently, the signer can validate ownership of at least one public key in the set without disclosing the specific key used. Even if an attacker intercepts the signed message, they would be unable to ascertain the true identity of the signer.

3.2. Scheme description

This scheme is based on the definition of Merkle tree accumulators as described in [35], with slight modifications to accommodate the proposed post-quantum ring signature scheme utilizing hash functions, specifically designed for vehicular networks. This formalism facilitates the restatement of the Merkle tree accumulator algorithm within the current framework. The main parameters of this scheme are given in Table 4.

Definition 4 (Extend Merkle Tree Accumulator). The Merkle tree accumulator algorithm (Algorithm 1) comprises the following subroutines (Gen, Eval, WitCreate, Verify), defined as follows:
Gen(1^k, t): The key generation algorithm takes a security parameter k and a parameter t, where t is the upper bound on the number of elements to be accumulated, and returns a key pair (sk_Ω, pk_Ω).
Eval((sk_Ω, pk_Ω), X): This algorithm takes the key pair (sk_Ω, pk_Ω) and the set of elements X to be accumulated, returning the accumulator Ω_X and some auxiliary information aux.
WitCreate((sk_Ω, pk_Ω), Ω_X, aux, x_i): This algorithm takes the key pair (sk_Ω, pk_Ω), accumulator Ω_X, auxiliary information aux, and an element x_i. If x_i is not in the set X, it returns false; otherwise, it returns a certificate wit_{x_i} for x_i.
Verify(pk_Ω, Ω_X, wit_{x_i}, x_i): This algorithm takes the public key pk_Ω, accumulator Ω_X, certificate wit_{x_i}, and element x_i. If wit_{x_i} is a valid certificate for x_i, it returns 1; otherwise, it returns 0.

The Merkle tree accumulator ensures both correctness and collision resistance. Collision resistance indicates the difficulty of finding an element that does not belong to X yet possesses a valid certificate.

Definition 5 (Collision Resistance). Collision resistance implies that for an adversary A possessing a valid key pair (sk_Ω, pk_Ω) generated by the Gen algorithm, and under the assumption that intermediate values are correct, the probability of finding an element x_i that is not in the accumulated set X but still produces a verification result of 1 is negligible. Assuming the existence of a negligible function ε(k), collision resistance is formally defined as follows:

    Pr[ Verify(pk_Ω, Ω*_X, wit_{x_i}, x_i) = 1 ∧ x_i ∉ X :
        Gen(1^k, t) → (sk_Ω, pk_Ω), A(pk_Ω) → (wit_{x_i}, x_i, X),
        Eval_r((sk_Ω, pk_Ω), X) → Ω*_X ] ≤ ε(k)

The implementation of the Merkle tree ring signature is described next; the whole process is covered in Algorithm 1.
Step 1: Key generation Gen(1^k, t). First, determine the hash functions {H_k}_{k∈K_κ}, where for any k ∈ K_κ the hash function H_k : {0,1}* → {0,1}^κ. The hash function can be chosen as a SHA function, SM3, etc. Determine the parameter N, which represents the number of ring members, and t, the upper bound on the number of accumulated elements. Then generate the key pair and return (sk_Ω, pk_Ω).
Step 2: Public key evaluation Eval((sk_Ω, pk_Ω), X). Parse the number of ring members N. If N is not a power of 2, the function returns false, as the tree must be a perfect binary tree. If N is a power of 2, begin computation from layer 0 (the leaf nodes at the lowest level) and continue until the root (the single node at the top) is obtained. Let L_{u,v} denote the node at layer v with index u. The auxiliary variable aux stores the hash values corresponding to each layer.
Step 3: Certificate creation Wit((sk_Ω, pk_Ω), Ω_X, aux, x_i). First, parse aux into the nodes at each level of the Merkle tree. Then reconstruct the Merkle tree from bottom to top; the WitCreate algorithm uses the intermediate nodes to build up to the root hash value.
Step 4: Certificate verification Verify(pk_Ω, Ω_X, wit_{x_i}, x_i). The final step is verification. Start by setting the leaves to the hash values of each party and compute hashes from the bottom up. Check whether the final result matches the root node value; if it matches, the member is verified as part of the ring. As an example, Fig. 6 visualizes how node l_{0,2} reconstructs the root node in a Merkle tree with height h = 3 and N = 8 leaf nodes.

Algorithm 1 Extend Merkle tree accumulator
input: k, t, {H_k}_{k∈K_κ}, H_k : {0,1}* → {0,1}^κ
output: (sk_Ω, pk_Ω), L_{u,v}, wit_{x_i}, 0 or 1
1.  k ← K_κ                                  # key generation Gen(1^k, t)
2.  (sk_Ω, pk_Ω) ← {H_k}_{k∈K_κ}
3.  H_k ← pk_Ω                               # public key resolution
4.  (x_0, x_1, …, x_{n−1}) ← X
5.  If n = 2^k with k ∈ N, then for all v ≤ k:
6.      L_{u,v} = H_k(L_{2u,v+1} ∥ L_{2u+1,v+1}) if v < k, else H_k(x_u)
7.  Else return False
8.  (L_{u,v})_{u∈[n·2^{v−k}], v∈[k]} ← aux    # certificate creation WitCreate((pk_Ω, sk_Ω), Ω_X, aux_X, x_i)
9.  wit_{x_i} ← (L_{⌊i/2^v⌋+η, k−v}) for 0 ≤ v < k,
10.     where η = 1 if ⌊i/2^v⌋ (mod 2) = 0, else η = −1
11. H_k ← pk_Ω, L_{0,0} ← Ω_X                # certificate verification Verify(pk_Ω, Ω_X, wit_{x_i}, x_i)
12. recompute L ← H_k(L_{⌊i/2^v⌋,k−v} ∥ L_{⌊i/2^v⌋+1,k−v}) if ⌊i/2^v⌋ (mod 2) = 0,
        else L ← H_k(L_{⌊i/2^v⌋−1,k−v} ∥ L_{⌊i/2^v⌋,k−v})
13. Return 1 if wit_{x_i} is a valid witness for x_i ∈ X, else 0

3.3. Signature algorithm description

The hash-based post-quantum ring signature scheme explored in this work is based on the XMSS algorithm, which incorporates two primary frameworks: the WOTS+ algorithm and the Merkle tree algorithm. Below is an overview of these frameworks.
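Returning to the accumulator of Definition 4 and Algorithm 1, its Eval, WitCreate, and Verify subroutines can be sketched as a toy example (Gen is reduced to fixing SHA-256 as the hash; all names and the layer layout are illustrative, not the authors' implementation):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def eval_acc(X):
    """Eval: build the Merkle tree over X (|X| must be a power of 2).
    Returns the accumulator (root) and aux (all tree layers)."""
    n = len(X)
    assert n > 0 and n & (n - 1) == 0, "number of leaves must be a power of 2"
    layers = [[H(x) for x in X]]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers[-1][0], layers

def wit_create(aux, i):
    """WitCreate: collect the sibling hash on every layer of the path to the root."""
    wit = []
    for layer in aux[:-1]:
        wit.append(layer[i ^ 1])   # sibling index flips the last bit
        i //= 2
    return wit

def acc_verify(root, wit, i, x):
    """Verify: recompute the root from x_i and its witness wit_{x_i}."""
    node = H(x)
    for sibling in wit:
        node = H(node + sibling) if i % 2 == 0 else H(sibling + node)
        i //= 2
    return node == root
```

The witness holds one sibling per layer, so its size grows logarithmically with the number of ring members, which is what keeps the signatures compact.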
Fig. 6. A Merkle tree with height h = 3 and N = 8 leaf nodes, visualizing the reconstruction of the root node from node l_{0,2}.

Definition 6 (Merkle Tree Ring Signature Algorithm). The Merkle tree-based ring signature algorithm comprises four main steps: parameter definition, public key generation, signature generation, and signature verification. These steps are outlined as follows:

Step 1: Parameter Definition
The height h of the tree represents its number of layers; a Merkle tree of height h has 2^h leaf nodes, corresponding to 2^h ring members with key pairs (x_i, y_i), i ∈ [0, 2^h − 1]. In practical application scenarios, if the number of vehicles does not satisfy this condition, it is recommended either to introduce virtual members into the ring or to divide the vehicles into multiple rings.

Step 2: Public Key Generation/Merkle Tree Construction
As shown in Algorithm 2, all leaf nodes of the Merkle tree together constitute the ring. Each member in the ring is represented by a public-private key pair corresponding to a leaf node. Each leaf node holds the hash of the public key derived from a one-time signature (OTS) scheme, while each parent node stores the hash of the concatenation of its two child nodes. This process repeats according to the same generation rule until the final root node is formed. The value of the root node is the final public key, while the private key consists of the 2^h OTS private keys x_i. The number of ring members equals the number of leaf nodes in the Merkle tree, and it is essential to ensure that the number of participating members is a power of 2. The public key of each ring member corresponds to the public key from the one-time signature.

Algorithm 2 Public key generation
input: h, SK
output: PK
1. node_i = Hash(node_{2i} ∥ node_{2i+1}), i ∈ [0, 2^h − 1]
2. Root = Hash(node_1 ∥ node_2)
3. PK = Root

Step 3: Signature Generation
Before executing the ring signature operation, the signer hashes the binary message to generate a message digest m = H(M), where H is the chosen hash function and M represents the original binary message. This digest m is used in the subsequent steps of the signature generation process, shown in Algorithm 3.

Algorithm 3 Signature generation
input: M, H, one-time signature key pairs (x_i, y_i)
output: σ
1. (x_i, y_i), i ∈ [0, 2^h − 1]
2. For x_i:
3.     Select the node and perform a one-time digital signature on message M, generating the signature σ_OTS
4.     Calculate the authentication path aut_i of y_i
5. σ = (i, σ_OTS, Y_i, aut_i)

The formal signing process begins by selecting the corresponding one-time signature (OTS) key pair (x_i, y_i), specifically the i-th OTS key pair. The signer then uses the private OTS key x_i to sign the message, creating a one-time signature σ_OTS and calculating the authentication path. The final signature comprises the index i, the one-time signature σ_OTS, the public key y_i, and the authentication path for y_i, denoted aut_i. The signature is formally represented as σ = (i, σ_OTS, Y_i, aut_i). Fig. 7 illustrates the signing process using leaf node x_2 as the signing node, where the shaded areas represent the authentication path of the signature.

Step 4: Signature Verification
As shown in Algorithm 4, signature verification begins by verifying the one-time signature σ_OTS. If this check is successful, the next step involves reconstructing the Merkle tree root based on the chosen index i and the public key y_i. The reconstructed root is then compared with the stored public key. If the two match, verification is deemed successful.

Algorithm 4 Signature verification
input: σ
output: true or false
1. If VER(M, σ_OTS, Y_i) = true
2.     Reconstruct the root node Root* of the Merkle tree according to i and Y_i
3.     If Root* = PK
4.         return true
5.     Else return false
6. Else return false

To illustrate the reconstruction process, consider node x_2 as an example, assuming i = 2 and Y_2 is known, along with the signature σ = (2, σ_OTS, Y_2, aut_2). Here, aut_2 contains the values stored in nodes 3, 8, and 13. The root node can be reconstructed as follows: node9 = hash(node2 ∥ node3), node12 = hash(node8 ∥ node9), node14 = hash(node12 ∥ node13), where node2 stores the value of Y_2. The computed value of node14 is the reconstructed root Root*, as shown in Fig. 8. By hashing upwards from the leaf nodes, if a match with the stored root node is found, the membership of the signer in the ring is verified.

3.4. Application of the scheme in vehicular networks

The proposed hash-based signature scheme offers post-quantum security, protecting against quantum threats, and is highly efficient with compact signatures, making it ideal for resource-constrained on-board devices in the IoV. It supports fast information exchange and verification in dynamic traffic environments, enhancing security and privacy, for example in accident reporting systems, while maintaining reporter anonymity. Overall, it addresses key security, efficiency, and scalability challenges in connected vehicle networks.
The application of ring signatures in the IoV involves three main stages: the registration stage, the inter-vehicle communication stage, and the signature tracing and broadcast stage.

Step 1: Registration Stage
This stage consists of three main steps. First, the On-Board Unit (OBU) sends a registration request to the Trusted Authority (TA). Upon receiving the request, the TA generates a public-private key pair (PK_OBU, SK_OBU) for the OBU. In the final step, the TA returns the private key to the OBU, along with the public key and identity information bound to the blockchain network. The identity information typically includes vehicle certificates, vehicle identification numbers (VIN), and other vehicle-related data. This process ensures that vehicles
Fig. 7. Diagram of the signature generation process.
Fig. 8. Signature verification diagram.
are properly registered and recognized within the blockchain network, as illustrated in Fig. 9.

Step 2: Inter-Vehicle Communication Stage
At this stage, the OBU utilizes the public key of the Roadside Unit (RSU), PK_RSU, to encrypt its own public key and sends it to the RSU, requesting the creation of a ring. Upon receiving the encrypted message, the RSU decrypts it using its private key to obtain PK_OBU, which is then added to the ring. When the number of ring members reaches the threshold of 2^h, the RSU broadcasts the ring structure, allowing all ring members to participate in signing processes.
If the threshold is not met, virtual members may be added, or the ring may be split into smaller sub-rings to ensure each ring contains 2^h members. Once the ring is established, the OBU can sign messages using a ring signature and forward them to the RSU. The RSU subsequently broadcasts the signed messages to other OBUs, which can request verification from the Verification Node (VN). The VN validates the signatures and returns the verification results to the requesting OBU, enabling secure and authenticated access to the information. This process is further illustrated in Fig. 10.

Step 3: Signature Tracing and Broadcast Stage
In the event of an accident, the OBU sends accident-related information to the RSU, which then processes and broadcasts the information to other OBUs. At the same time, the RSU forwards the signature of the OBU involved in the accident, denoted SIG(OBU_acc), to the TA. The TA uses its private key to identify the relevant vehicle information. If the OBU is determined to be malicious, the TA revokes its identity and public key on the blockchain network. The TA then sends the revoked public key and the adverse record of the malicious OBU to the RSU. The RSU subsequently broadcasts this information to other OBUs, ensuring they are aware of the revoked identity and can exclude the malicious OBU from further network participation. This process is illustrated in Fig. 11.
Fig. 12. IOV model based on post-quantum ring signature.
Fig. 9. Registration phase.
Fig. 10. Information interaction phase.
Fig. 11. Signature tracing phase.

When applying this ring signature scheme to a vehicular network system, the overall model framework is shown in Fig. 12. The primary components of the model include:

[1] On-Board Unit (OBU): Responsible for sending requests to the TA, transferring its public key to the RSU, signing messages with the ring signature, and sharing traffic accident information.
[2] Road-Side Unit (RSU): Organizes received public keys into a ring, broadcasts signatures, accident information, and adverse records to other vehicles, and forwards accident-related signatures to the TA.
[3] Trusted Authority (TA): Generates key pairs for the OBU, uploads these to the blockchain network, and, in the event of an accident, sends the public key and adverse record of the vehicle involved to the RSU.
[4] Verification Node (VN): Responsible for verifying signature requests sent by other vehicles.
[5] Anonymous Blockchain Network (ABN): In this model, vehicle public keys are stored in the blockchain network, providing a secure and anonymous framework for identity management.

In addition to the interactions between the OBU and the TA, as well as between the OBU and the RSU in the aforementioned process, within a specific segment of roadway the OBU is also capable of engaging with pedestrians, road infrastructure, and stations located within that segment.
In general, interactions between vehicles and other vehicles or roadside units place more emphasis on the integrity and privacy protection of data transmission, whereas interactions between vehicles and pedestrians often involve location verification and identity confirmation. In a vehicular networking system, vehicles may need to verify both the identity and location of pedestrians, while using post-quantum ring signatures to ensure the integrity and non-repudiation of pedestrian information.

4. Security analysis

4.1. Safety assessment

The proposed scheme possesses the following characteristics:
(1) Anonymity: Ring signatures inherently support anonymity, protecting the identity of the signer. Assuming an attacker has obtained a valid ring signature generated by a member of the ring, if the ring contains n members, the probability that the attacker identifies the true signer is 1/n. For any ring member other than the signer, the probability of identifying the signer is 1/(n − 1).
(2) Privacy: The generation of a ring signature relies solely on the signer within the ring, with no involvement from other ring members, thus preserving the privacy of the signer.
(3) Post-Quantum Security: This scheme employs a post-quantum ring signature approach based on Merkle trees, leveraging hash-based, post-quantum secure mathematical problems. This design provides robust security against quantum computing threats: the hash-based post-quantum ring signature combines the strong properties of hash functions with quantum-resilient security, maintaining integrity even under potential quantum computing attacks.
(4) Efficiency: The computational efficiency of hash functions makes this scheme suitable for a variety of application scenarios.
(5) Unforgeability: The scheme ensures unforgeability through the one-way and irreversible properties of the hash functions used to construct the hash chains. Thus, it is highly challenging for anyone other than the legitimate signer to forge a signature within this scheme.
Fig. 13. Authentication path diagram of a node with index i = 2.

4.2. Security proof

The following section provides security proofs and discussions for the proposed scheme.

Lemma 1. If a one-time signature scheme passes verification and the reconstructed Merkle root Root* matches the original Merkle root Root, then the signature is valid.

Proof. Suppose the index i = 2 is chosen for the one-time signature key used in the message signature. The path from index i = 2 to the root traverses nodes [2, 9, 12], with sibling nodes [3, 8, 13] forming the verification path. Fig. 13 illustrates the verification pathway of the leaf node indexed at 2, depicted as the gray node. Reconstructing the root Root* follows these steps:

    Node(9) = Hash(node(2) ∥ node(3))
    Node(12) = Hash(node(9) ∥ node(8))
    Node(14) = Hash(node(12) ∥ node(13))

The value of node 9 is computed from nodes 2 and 3, the value of node 12 from nodes 9 and 8, and the value of the root node Root* (node 14) from nodes 12 and 13. This computed Root* value is then compared with the public key. Clearly, the hash of Root* matches the original public key. The proof process for any other node is identical, thus confirming the correctness of the signature.

Theorem 1. The proposed post-quantum ring signature scheme preserves anonymity.

Assuming a valid signature σ = (i, σ_OTS, Y_i, aut_i), where i lies within the appropriate range i ∈ [0, 2^h − 1], the probability that any outside party can identify the true signer is 1/2^h (for a ring with 2^h members). For the other ring members, the probability of identifying the signer is 1/(2^h − 1).

Theorem 2. The proposed ring signature scheme is unforgeable.

Proof. Suppose an attacker A could successfully forge a ring signature with non-negligible probability P within polynomial time. We construct a simulator S to challenge a ring signature algorithm claimed to be secure by a challenger C, as follows:
Step 1: The challenger initializes n signing instances with the MSS signing algorithm, generating n key pairs (sk, pk), and sends all public keys pk to the simulator S.
Step 2: Upon receiving the public keys, S initializes the ring signature algorithm by randomly selecting additional parameters and forwarding the public keys to the attacker A.
Step 3: In the query phase, A selects a message M and sends it to S. Following the ring signature algorithm, S randomly selects a user s to generate the ring signature, computes Y_s, and forwards it to C. C computes the corresponding σ_s, which S returns as a complete ring signature to A.
Step 4: In the challenge phase, A sends M and an unobserved forged ring signature to S, which calculates the corresponding Y_s of the forged signer and submits (Y_s, σ_s) to C. If C verifies Y_s and σ_s as valid, then S has successfully forged a signature, with output 1; otherwise, S fails, outputting 0.
Since A can break the scheme with non-negligible probability P, we deduce that Pr(output(Game) = 1) = p, allowing S to break the post-quantum ring signature algorithm with non-negligible probability. However, this contradicts the assumed security of the scheme, proving that A cannot successfully forge signatures in polynomial time.

Theorem 3. If the underlying hash function family {H_k}, k ∈ K_K, is a collision-resistant family, then the proposed hash-based post-quantum ring signature scheme is collision-resistant.

Proof. During initialization, the reduction interacts with a collision-resistant hash function challenger to acquire H_k and completes initialization per the original protocol. If an attacker generates a collision within the accumulator, this implies that the reduction knows two distinct inputs that collide under H_k, with the collision probability bounded by the collision resistance of the hash function.

Theorem 4. If the employed hash functions are one-way, then the proposed Merkle-tree-based post-quantum ring signature scheme is unforgeable under chosen-message attacks.

Let n, w, m ∈ N with w, m = poly(n), and let the function family F_n = {f_k : {0,1}^n → {0,1}^n | k ∈ {0,1}^n} satisfy second-preimage resistance and one-wayness. The variable t represents the computational time. The term w · InSec^{UD}(F_n; t') reflects the undetectability (UD) security of the function family F_n, while InSec^{OW}(F_n; t') represents its one-way (OW) security. Additionally, the term w · InSec^{SPR}(F_n; t') denotes the second-preimage resistance (SPR) security, scaled by the parameter w. The formal definitions of EU-CMA and SPR are provided in [14] and will not be elaborated on here.
We define the unforgeability insecurity of WOTS+ under chosen-message attack as follows:

    InSec^{EU-CMA}(WOTS+(1^n, w, m); t, 1)
        ≤ w · InSec^{UD}(F_n; t') + w·l · max{ InSec^{OW}(F_n; t'), w · InSec^{SPR}(F_n; t'') }
    with t' = t + 3lw and t'' = t + 3lw + w − 1

For WOTS+ combined with Merkle trees, the unforgeability under chosen-message attacks on the Merkle tree can be bounded as follows:

    InSec^{EU-CMA}(Merkle-tree(1^n, T = 2^h); t, 1)
        ≤ 2^{h + log_2 l1} · InSec^{SPR}(WOTS+(1^n, ω, m); t, 1)

Using the derived insecurity function for the Merkle tree combined with WOTS+, which employs pseudorandom key generation, we arrive at the following result:

    InSec^{EU-CMA}(XMSS(1^n, T = 2^h); t, 1)
        ≤ InSec^{EU-CMA}(WOTS+(1^n, ω, m); t, 1) + InSec^{EU-CMA}(Merkle-tree(1^n, T = 2^h); t, 1)
        = InSec^{PRF}(F_n; t + 2^h, 2^h)
          + 2^h · max{ (2 + log_2 l1) · InSec^{SPR}(H_n; t'),
                       2 · InSec^{PRF}(F_n; t' + l, l)
                       + ω · ( InSec^{UD}(F_n; t') + max{ InSec^{OW}(F_n; t'), InSec^{SPR}(F_n; t') } ) }
To prove XMSS is unforgeable under chosen-message attacks, we consider the following factors:
Random Oracle Model: Assuming the hash function behaves as a random oracle, an attacker has no foreknowledge of input-output pairs.
Irreversibility: WOTS+ security relies on the irreversibility of hash chains; given a hash value H^i(x), finding the predecessor H^{i−1}(x) is infeasible.
Collision Resistance: The hash function must resist collisions, making it nearly impossible for an attacker to produce distinct messages that yield identical hash chains.

Table 5
Test of 16 XMSS-SHA2_10_256 signatures.

Number   Signature time/s   Verification time/s
0        1.990014           0.001119
1        1.980151           0.000947
2        1.969849           0.001210
3        1.965888           0.001184
4        1.969898           0.001056
5        1.980296           0.001144
6        2.017889           0.001093
7        2.054971           0.001101
8        2.016147           0.001241
9        2.020737           0.001267
10       1.954583           0.001016
11       2.021315           0.001060
12       2.029765           0.001043
13       2.057487           0.001016
14       1.958401           0.001081
15       1.990919           0.001053

Fig. 14. Signature generation time of 16 test results.
Fig. 15. Signature verification time of 16 test results.
5. Performance analysis

This study evaluates the performance of the proposed scheme in densely trafficked urban areas, focusing particularly on resistance to quantum attacks. The experiments are based on the Merkle tree ring signature scheme, with a primary emphasis on security strength, as attacks in IoV environments are expected to become increasingly complex, especially with the advent of quantum attacks. Consequently, a high-security, quantum-resistant signature scheme is essential for IoV systems.

The primary operations in the signature scheme include generating public and private keys, measuring the time required for message signing and verification, and instantiating SHA-256 as the underlying hash function. Key parameters include the security parameter n, the Winternitz parameter ω, and the number of ring members, with specific values assigned to each. These operations allow us to measure metrics such as key generation time, signature generation time, and signature verification time.

In this scheme, the digital signature algorithm is set to XMSS-SHA2-10-256, utilizing the SHA-256 hash function with a Merkle tree height of 10, enabling a maximum of 2^10 = 1024 possible ring signatures. The number of signature tests is set to 16 to balance efficiency and data stability, ensuring valid results without excessive resource consumption.

To present the data more intuitively, the experimental results of the 16 tests shown in Table 5 are depicted in graphical form in Fig. 14 and Fig. 15. Fig. 14 illustrates the signature generation times across the 16 tests, while Fig. 15 displays the signature verification times. These figures show that both the signature generation time and the verification time fluctuate within a certain range, indicating variability rather than fixed values. One of the 16 test results is selected for comparison with related studies. The attributes compared include key generation time, signature generation time, signature verification time, resistance to quantum attacks, anonymity, traceability, and applicability to the IoV. The comparison results are given in Tables 6 and 7. In our scheme, we set the parameters as n = 32, ω = 16, the height of the Merkle tree as 10, and the number of ring members as 2^10. In the tables, HBS denotes a hash-based scheme and LBS a lattice-based scheme.

Table 6
Signature efficiency comparison table.

Scheme       Number of members   Key generation time/s   Signature time/s   Verification time/s
OURS (HBS)   2^10                2.06                    1.97               9.47e-04
[33] (LBS)   10                  0.07                    0.06               0.04
[32] (LBS)   –                   34.1e-06                9.59e-05           3.49e-05
[25] (HBS)   2^10                –                       0.16               0.11

Table 7
Function comparison table of the schemes.

Scheme       Post-quantum security   Anonymity   Traceability   Application to IoV
OURS (HBS)   YES                     YES         YES            YES
[33] (LBS)   NO                      YES         YES            YES
[32] (LBS)   YES                     NO          NO             YES
[25] (HBS)   YES                     YES         YES            NO

Comparing the scheme proposed in this paper with the scheme in [33] shows that the Merkle tree based post-quantum ring signature scheme has clear advantages. First, in this evaluation, the number of ring members our scheme can accommodate is 2^10, which is much larger than the number of ring members evaluated in [33]; when the road section is wider and more crowded, the scheme proposed in this paper is therefore more suitable. Second, this scheme has post-quantum security, which makes it more secure. Moreover, although the key generation time of our scheme is longer than that of the scheme with fewer ring members in [33], it is much faster in terms of signature time and verification time; in particular, the verification time is nearly 44 times faster than that of [33].
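The verification-time advantage reported above comes from the Merkle structure: with a tree of height 10, a verifier recomputes the root from a leaf using only 10 sibling hashes. A minimal sketch of this authentication-path verification, assuming SHA-256 and an index-derived left/right direction per level (illustrative only, not the paper's implementation):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Build a Merkle tree; returns the list of levels, leaves first."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha256(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def auth_path(levels, index):
    """Collect the sibling hash at each level from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return path

def verify_path(leaf, index, path, root):
    """Recompute the root from a leaf and its authentication path."""
    h = leaf
    for sibling in path:
        h = sha256(sibling + h) if index & 1 else sha256(h + sibling)
        index //= 2
    return h == root

# Height-10 tree: 2**10 = 1024 one-time key leaves, as in XMSS-SHA2_10_256.
leaves = [sha256(bytes([i % 256, i // 256])) for i in range(1024)]
levels = build_tree(leaves)
root = levels[-1][0]
assert verify_path(leaves[42], 42, auth_path(levels, 42), root)  # only 10 hashes
```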
Compared with the scheme in [32], the outstanding feature of the scheme in this paper is the ring signature, which has anonymity and traceability, making it more suitable for the Internet of Vehicles environment. In addition, the scheme in this paper uses the Merkle tree structure, which reduces the storage cost of public keys and signatures. In general, lattice signatures may require special optimization in high-performance computing, and their algorithm maturity is not high; the underlying hash function of the post-quantum ring signature scheme in this paper, however, is SHA-256, which has passed the test of time in many practical applications and has high algorithm maturity.

Comparing the scheme in this paper with the scheme in [25], both are based on hash functions. The advantages of the scheme in this paper are as follows: first, although the signature generation time in [25] is nearly 12 times faster than that in this paper, the signature verification time in this paper is nearly 100 times faster than that in [25]. In addition, the scheme in this paper is also applied to the vehicular networking model.

As shown in Table 7, this study compares the attributes of post-quantum security, anonymity, traceability, and application to IoV. The comparison reveals that our scheme offers post-quantum security, anonymity, traceability, and applicability to IoV, with the advantages of our proposed scheme becoming more evident through this comprehensive comparison.

6. Conclusion

The hash-based post-quantum ring signature scheme offers advantages such as high signature efficiency, good scalability, and independence from complex mathematical assumptions. In the context of increasing security threats posed by advancements in quantum computing, applying post-quantum ring signatures in IoV can enhance anonymity and privacy protection while ensuring quantum-resistant security. This paper presents a hash-based post-quantum ring signature scheme built on the XMSS algorithm and demonstrates its application in the IoV system. The proposed scheme is analyzed and proven secure. Performance analysis is conducted following 16 experimental tests, with comparisons made to other similar schemes. The results show that the proposed scheme exhibits significant advantages in signature verification time compared to other approaches. This is due to the efficient hash computations and Merkle tree verification paths, which maintain low time complexity and high efficiency even with large data sets. Moreover, the scheme satisfies the properties of quantum resistance, anonymity, traceability, and applicability to IoV.

Future research will aim, first, to further improve the practicality and security of the scheme in response to the evolving threats posed by quantum computing; second, interdisciplinary collaboration can be strengthened to provide valuable insights for optimizing solutions in real-world scenarios.

CRediT authorship contribution statement

Shuanggen Liu: Conceptualization. Xiayi Zhou: Writing – original draft. Xu An Wang: Supervision. Zixuan Yan: Investigation. He Yan: Formal analysis. Yurui Cao: Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 62172436. The first author and the third author are the corresponding authors of this paper.

Data availability

No data was used for the research described in the article.

References

[1] I. Wanger, Car production: Number of cars produced worldwide, Statista (2020).
[2] Patrick Miner, Barbara M. Smith, Anant Jani, Geraldine McNeill, Alfred Gathorne-Hardy, Car harm: A global review of automobility's harm to people and the environment, J. Transp. Geogr. 115 (2024) 103817.
[3] Juan Contreras-Castillo, Sherali Zeadally, Juan Antonio Guerrero-Ibañez, Internet of vehicles: Architecture, protocols, and security, IEEE Internet Things J. 5 (5) (2018) 3701–3709, http://dx.doi.org/10.1109/JIOT.2017.2690902.
[4] David Deutsch, Quantum theory, the Church–Turing principle and the universal quantum computer, Proc. R. Soc. A 400 (1818) (1985) 97–117.
[5] Rasha Shajahan, Kurunandan Jain, Prabhakar Krishnan, A survey on NIST 3rd round post quantum digital signature algorithms, in: 2024 5th International Conference on Mobile Computing and Sustainable Informatics, ICMCSI, IEEE, 2024, pp. 132–140.
[6] David A. Cooper, Daniel C. Apon, Quynh H. Dang, Michael S. Davidson, Morris J. Dworkin, Carl A. Miller, et al., Recommendation for stateful hash-based signature schemes, NIST Spec. Publ. 800 (208) (2020).
[7] Samira El Madani, Saad Motahhir, Abdelaziz El Ghzizal, Internet of vehicles: concept, process, security aspects and solutions, Multimedia Tools Appl. 81 (12) (2022) 16563–16587.
[8] Cesar Castellon, Swapnoneel Roy, Patrick Kreidl, Ayan Dutta, Ladislau Bölöni, Energy efficient merkle trees for blockchains, in: 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom, IEEE, 2021, pp. 1093–1099.
[9] Daniel J. Bernstein, Andreas Hülsing, Stefan Kölbl, Ruben Niederhagen, Joost Rijneveld, Peter Schwabe, The SPHINCS+ signature framework, in: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 2129–2146.
[10] Kaiyi Zhang, Hongrui Cui, Yu Yu, SPHINCS-α: A compact stateless hash-based signature scheme, 2022, Cryptology ePrint Archive.
[11] Mikhail Kudinov, Andreas Hülsing, Eyal Ronen, Eylon Yogev, SPHINCS+C: Compressing SPHINCS+ with (almost) no cost, 2022, Cryptology ePrint Archive.
[12] Sun Siwei, Liu Tianyu, Guan Zhi, SM3-based post-quantum digital signature schemes, J. Cryptologic Res. 10 (1) (2023) 46.
[13] Andreas Hülsing, Mikhail Kudinov, Recovering the tight security proof of SPHINCS+, in: International Conference on the Theory and Application of Cryptology and Information Security, Springer, 2022, pp. 3–33.
[14] Andreas Hülsing, Denis Butin, Stefan Gazdag, Joost Rijneveld, Aziz Mohaisen, XMSS: Extended Merkle Signature Scheme, Technical Report, 2018.
[15] Jan Philipp Thoma, Tim Güneysu, A configurable hardware implementation of XMSS, 2021, Cryptology ePrint Archive.
[16] Siwei Sun, Tianyu Liu, Zhi Guan, Yifei He, Jiwu Jing, Lei Hu, Zhenfeng Zhang, Hailun Yan, XMSS-SM3 and MT-XMSS-SM3: Instantiating extended Merkle signature schemes with SM3, 2022, Cryptology ePrint Archive.
[17] Andreas Hülsing, W-OTS+ – shorter signatures for hash-based signature schemes, in: Progress in Cryptology – AFRICACRYPT 2013: 6th International Conference on Cryptology in Africa, Cairo, Egypt, June 22–24, 2013, Proceedings 6, Springer, 2013, pp. 173–188.
[18] Kaiyi Zhang, Hongrui Cui, Yu Yu, Revisiting the constant-sum Winternitz one-time signature with applications to SPHINCS+ and XMSS, in: Annual International Cryptology Conference, Springer, 2023, pp. 455–483.
[19] Xie Jia, Liu Shizhao, Wang Lu, Research progress and prospects of ring signature technology, J. Front. Comput. Sci. Technol. 17 (5) (2023).
[20] Rohit Chatterjee, Kai-Min Chung, Xiao Liang, Giulio Malavolta, A note on the post-quantum security of (ring) signatures, in: IACR International Conference on Public-Key Cryptography, Springer, 2022, pp. 407–436.
[21] Yuxi Xue, Xingye Lu, Man Ho Au, Chengru Zhang, Efficient linkable ring signatures: new framework and post-quantum instantiations, in: European Symposium on Research in Computer Security, Springer, 2024, pp. 435–456.
[22] Abida Haque, Alessandra Scafuro, Threshold ring signatures: new definitions and post-quantum security, in: Public-Key Cryptography – PKC 2020: 23rd IACR International Conference on Practice and Theory of Public-Key Cryptography, Edinburgh, UK, May 4–7, 2020, Proceedings, Part II 23, Springer, 2020, pp. 423–452.
[23] Maxime Buser, Joseph K. Liu, Ron Steinfeld, Amin Sakzad, Post-quantum id-based ring signatures from symmetric-key primitives, in: International Conference on Applied Cryptography and Network Security, Springer, 2022, pp. 892–912.
[24] J. Odoom, X. Huang, Z. Zhou, et al., Linked or unlinked: A systematic review of linkable ring signature schemes, J. Syst. Archit. 134 (2023) 102786.
[25] Shiwei Xu, Tao Wang, Ao Sun, Yan Tong, Zhengwei Ren, Rongbo Zhu, Houbing Herbert Song, Post-quantum anonymous, traceable and linkable authentication scheme based on blockchain for intelligent vehicular transportation systems, IEEE Trans. Intell. Transp. Syst. (2024).
[26] Nyothiri Aung, Tahar Kechadi, Tao Zhu, Saber Zerdoumi, Tahar Guerbouz, Sahraoui Dhelim, Blockchain application on the internet of vehicles (IoV), in: 2022 IEEE 7th International Conference on Intelligent Transportation Engineering, ICITE, IEEE, 2022, pp. 586–591.
[27] Haibin Zhang, Jiajia Liu, Huanlei Zhao, Peng Wang, Nei Kato, Blockchain-based trust management for internet of vehicles, IEEE Trans. Emerg. Top. Comput. 9 (3) (2020) 1397–1409.
[28] Mirador Labrador, Weiyan Hou, Implementing blockchain technology in the internet of vehicle (IoV), in: 2019 International Conference on Intelligent Computing and Its Emerging Applications, ICEA, IEEE, 2019, pp. 5–10.
[29] Y. Liu, Q. Xia, X. Li, et al., An authentication and signature scheme for UAV-assisted vehicular ad hoc network providing anonymity, J. Syst. Archit. 142 (2023) 102935.
[30] X. Feng, X. Wang, K. Cui, et al., A distributed message authentication scheme with reputation mechanism for internet of vehicles, J. Syst. Archit. 145 (2023) 103029.
[31] S. Thapliyal, M. Wazid, D.P. Singh, et al., Robust authenticated key agreement protocol for internet of vehicles-envisioned intelligent transportation system, J. Syst. Archit. 142 (2023) 102937.
[32] Nikhil Verma, Swati Kumari, Pranavi Jain, Post quantum digital signature change in IOTA to reduce latency in internet of vehicles (IoV) environments, in: 2022 International Conference on IoT and Blockchain Technology, ICIBT, IEEE, 2022, pp. 1–6.
[33] Cui Yongquan, Cao Ling, Zhang Xiaoyu, Privacy protection of internet of vehicles based on lattice-based ring signature, Chinese J. Comput. 42 (5) (2019) 980–992.
[34] Cesar Castellon, Swapnoneel Roy, Patrick Kreidl, Ayan Dutta, Ladislau Bölöni, Energy efficient merkle trees for blockchains, in: 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom, IEEE, 2021, pp. 1093–1099.
[35] David Derler, Sebastian Ramacher, Daniel Slamanig, Post-quantum zero-knowledge proofs for accumulators with applications to ring signatures from symmetric-key primitives, in: Post-Quantum Cryptography: 9th International Conference, PQCrypto 2018, Fort Lauderdale, FL, USA, April 9–11, 2018, Proceedings 9, Springer, 2018, pp. 419–440.
[36] Xinyu Zhang, Ron Steinfeld, Joseph K. Liu, Muhammed F. Esgin, Dongxi Liu, Sushmita Ruj, DualRing-PRF: Post-quantum (linkable) ring signatures from Legendre and power residue PRFs, in: Australasian Conference on Information Security and Privacy, Springer, 2024, pp. 124–143.
[37] David A. Cooper, Daniel C. Apon, Quynh H. Dang, Michael S. Davidson, Morris J. Dworkin, Carl A. Miller, et al., Recommendation for stateful hash-based signature schemes, NIST Spec. Publ. 800 (208) (2020).
[38] Ralph C. Merkle, A certified digital signature, in: Conference on the Theory and Application of Cryptology, Springer, 1989, pp. 218–238.
View File

@@ -0,0 +1,929 @@
Journal of Systems Architecture 160 (2025) 103341
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
A load-balanced acceleration method for small and irregular batch matrix
multiplication on GPU
Yu Zhang a, Lu Lu a,b,*, Zhanyu Yang a, Zhihong Liang c,d, Siliang Suo c,d
a School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
b Peng Cheng Laboratory, Shenzhen, 518055, China
c Electric Power Research Institute, CSG, Guangzhou, China
d Guangdong Provincial Key Laboratory of Power System Network Security, Guangzhou, China
ARTICLE INFO

Keywords:
Batch GEMM
Thread workload
Multi-thread kernel
Tiling algorithm

ABSTRACT

As an essential mathematical operation, GEneral Matrix Multiplication (GEMM) plays a vital role in many applications, such as high-performance computing, machine learning, etc. In practice, the performance of GEMM is limited by the dimension of the matrix and the diversity of GPU hardware architectures. When dealing with batched, irregular and small matrices, the efficiency of GEMM is usually poor. To this end, a common approach is to segment the matrix into multiple tiles and utilize parallelism between workgroups in the GPU to compute the results. However, previous works only consider tile size and inter-workgroup parallelism and ignore the issues of low computational efficiency and hardware resource utilization caused by the difference in workloads between wavefronts. To address these issues, we propose a load-balanced batch GEMM acceleration method, consisting of a multi-thread kernel design and an efficient tiling algorithm. The multi-thread kernel design can address the workload unbalance between wavefronts in different workgroups, and the efficient tiling algorithm can choose the optimal tiling scheme with the new thread-level parallelism calculation method to achieve load-balanced task allocation. Finally, various comparative experiments were conducted on two GPU platforms: AMD and NVIDIA. Experimental results indicate the proposed method outperforms previous methods.
1. Introduction

GEneral Matrix Multiplication (GEMM) is a standard computing kernel that plays an important role in high-performance computing [1], artificial intelligence [2], image processing [3], and other research fields. With the explosive growth of data volume and the emergence of various algorithms, the demand for high-performance GEMM computing is increasing [4,5]. Additional stream processors and memory are integrated into the GPU to cater to this trend, providing tremendous computational power for GEMM acceleration. To fully utilize this hardware acceleration capability, AMD and NVIDIA provide developers with platforms for parallel computing based on GPU (ROCm and CUDA). Based on these parallel computing acceleration platforms, various optimization algorithms and acceleration libraries have been proposed and demonstrated to have powerful effects, such as rocBLAS [6], cuBLAS [7], MAGMA [8], etc. These methods achieve optimal computational task allocation through hardware resource scheduling and thread parallelism to accelerate the matrix multiplication operation [9,10].

Many real-world applications, such as deep learning, involve irregular, small-size matrix multiplication operations in their computations [11]. For example, in Convolutional Neural Networks (CNN) [12-14], the structure of these models contains a large number of convolutional layers, and the scale of the convolution kernel tends to be small (e.g. 1×1 and 3×3). Convolution operations are converted to GEMM using the Im2col function, and the dimension of the matrix is typically less than 1000 [15,16]. These small GEMM computations prevent the GPU from fully exploiting its hardware computing potential. In this case, the scheduling overhead between batch GEMMs and the irregularity of the matrix pose challenges to computational performance [17,18]. For a GEMM, tiling is a standard solution method. The matrix is segmented into multiple tiles, and a thread block is responsible for computing an individual tile. Since each tile is independent, multiple tiles can be computed in parallel by using multiple threads in the GPU to speed up the computation process of GEMM. A larger tile dimension will increase the Thread-Level Parallelism (TLP) of a single tile and also will
* Corresponding author at: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China.
E-mail addresses: yuzhang0722@163.com (Y. Zhang), lul@scut.edu.cn (L. Lu), yangzhanyu@hotmail.com (Z. Yang), liangzh@csg.cn (Z. Liang),
suosl@csg.cn (S. Suo).
https://doi.org/10.1016/j.sysarc.2025.103341
Received 3 September 2024; Received in revised form 3 November 2024; Accepted 8 January 2025
Available online 23 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Y. Zhang et al. Journal of Systems Architecture 160 (2025) 103341
reduce the number of tiles, resulting in the failure to fully utilize the hardware resources of the GPU [19,20]. The Instruction-Level Parallelism (ILP) of a single thread is related to the K-dimension. Generally, for a large enough matrix size, GEMM can fully use GPU hardware resources and achieve higher TLP and ILP [21,22].

To improve computational efficiency, previous studies have proposed several acceleration methods for matrix multiplication. For instance, rocBLAS [6] and cuBLAS [7] provide batch GEMM APIs (rocblasSgemmBatched and cublasSgemmBatched), which support multiple GEMMs being calculated simultaneously on GPUs. However, these APIs support only uniform matrix sizes, which considerably limits their applications. NVIDIA also provides a C++-style template library, CUTLASS [23], which utilizes built-in tile templates and sorting to accelerate matrix multiplication operations. In fact, the size of matrices is variable in many real-world applications [11]. To solve this issue, a Vbatch GEMM routine that supports batch GEMM in various sizes is designed and implemented by MAGMA (magmablas_sgemm_vbatched). It adapts to batch GEMMs with multiple tiling strategies, assigning the appropriate tile to a single GEMM for huge performance gains. Although variable sizes are supported in MAGMA, it still has some limitations. First, MAGMA only supports some coarse-grained tiling strategies that are not appropriate for all GEMMs; coarse-grained tiling results in an unbalanced kernel workload and reduced GPU utilization. Second, the grid size is determined by the tiling of the largest matrix, which leads to idle threads and a waste of GPU computing power. Third, the lack of an evaluation criterion for tiling leads to lower efficiency of strategy choice.

To thoroughly support batch GEMM with variable sizes, it is essential to design a tiling algorithm that can be adapted to all GEMMs and adaptively choose tile sizes, not limited to a single size. The optimal tiling for each GEMM is different, depending on the size of the matrix dimensions (M, N, K). How to choose a suitable tile is a challenge for batch GEMM. At the same time, an evaluation criterion based on the current GPU hardware and tiling strategy is also essential: with it, an appropriate tiling for each GEMM can be chosen to fully utilize the GPU computing capabilities and achieve better computational performance. How to measure the effectiveness of the tiling algorithm on the GPU hardware is thus a challenging problem. Tiles of various sizes can lead to significant differences in computational effort within each workgroup, and further to an unbalanced distribution of computational tasks and excessive load differences between threads. Hence, for tiles of various sizes, balancing thread computation and data loading during computation is also a challenge for batch GEMM.

To address the above challenges, we propose a batch GEMM acceleration method with a multi-thread kernel design. Furthermore, an efficient tiling algorithm is proposed to achieve load balance and higher hardware resource utilization. Our contributions can be summarized as follows:

• A multi-threaded kernel design scheme is proposed to balance thread computation and data loading in different workgroups to compute the various tiles.
• A novel TLP computation method is designed to select the optimal tiling algorithm by combining the kernel occupancy of the GPU and the tiling operation.
• An efficient tiling algorithm is implemented by considering the GPU hardware architecture and the batch GEMM workload.
• The proposed method can efficiently handle batch irregular GEMM and achieve state-of-the-art performance on AMD and NVIDIA GPU platforms.

The rest of the paper is organized as follows. Section 2 provides related work and motivation. Section 3 introduces background on batch GEMM, GPU architecture, and kernel occupancy. Section 4 presents the details of the multi-thread kernel design and load-balanced tiling algorithm. Section 5 demonstrates and evaluates the experimental results. Section 6 provides the conclusions of the paper and future work. The source code of this paper can be obtained in this repository link: https://github.com/zhangyu0722/BatchGEMM.git.

2. Related work and motivation

2.1. Related work

Several approaches have been proposed for batch GEMM computation, which mainly focus on algorithm-level optimization or architecture-level optimization. The former mainly explores lower bounds on the time complexity of GEMM operations at the mathematical level and optimizes the computational effort. The latter is based on different GPU architecture features and uses corresponding optimization techniques to improve the computational efficiency of GEMM. In algorithm-level optimization, Strassen [24] proposed a novel GEMM algorithm based on the property that matrix addition is faster than matrix multiplication, which uses seven multiplications and multiple addition operations instead of eight multiplications. This approach mathematically reduced the time complexity of GEMM to O(n^2.81) for the first time. To reduce the extra memory space required by Strassen's algorithm, three different methods were proposed in [25]: pre-additions, overwriting the input matrix, and recursive scheduling. At the same time, due to the powerful effect of deep neural networks in various domains, Alhussein Fawzi et al. [26] transformed the process of finding the optimal complexity of matrix multiplication into a tensor decomposition problem and used reinforcement learning to explore lower bounds on the complexity of matrix multiplication. In particular, for a 4 × 4 matrix, the number of multiplications was as low as 47. This performance was better than the two-level Strassen's algorithm, which involves 49 multiplications. Although the above approaches reduce the mathematical complexity of matrix multiplication operations, it is difficult to realize their performance benefits on GPU because they neglect computational scheduling strategies and multi-level memory architecture features.

In architecture-level optimization, GPU vendors (NVIDIA and AMD) have designed and implemented computing libraries such as cuBLAS [7] and rocBLAS [6] based on their parallel computing platforms to improve GPU hardware utilization and parallelism. However, due to the restriction of uniform-sized matrices, the performance is poor when faced with small and irregular batch GEMMs. Although NVIDIA provides a C++-style template library, the small size of the matrix and the lack of assembly-level optimizations make it difficult for CUTLASS to fully exploit its performance advantages for irregular and small matrix multiplication [23]. These irregular and small-sized matrices often lead to unbalanced workloads among threads in different workgroups, which can reduce kernel performance. For Sparse GEneral Matrix-Matrix multiplication (SpGEMM), the matrix's sparsity leads to significant differences in thread workloads [27,28]. To address the unbalanced workload, Chen et al. [29] optimized the matrix segmentation by analyzing the distribution of the floating point calculations of the CSR-based SpGEMM, which achieves load balance and performance improvement on Sunway TaihuLight. For the issue of workload unbalance in threads, it is necessary to conduct a detailed analysis of the computation process and hardware platform characteristics to design an efficient parallel framework implementation [30,31]. Xiao et al. [32] introduce a fine-grained partitioning strategy to select appropriate segmentation dimensions, efficiently utilizing multi-thread parallelism and improving the performance of binary sparse tensor contractions. The diversity of matrix sizes makes it difficult to utilize a unified routine for calculations, resulting in some threads being idle in the CU [33,34]. Indeed, the size of matrices is variable and irregular in various scientific computing scenarios. To overcome the restriction of uniform matrix size, MAGMA [8] proposes a Vbatch routine to support batch GEMM with various sizes. In this way, it uses a 3D grid for the batch GEMM kernel design, where grid.z represents the batch size. Each GEMM corresponds to one of the 2D-grid planes, and the size of the two-dimensional plane (grid.x, grid.y) is determined by the largest GEMM. In the case of irregular GEMM, if the dimension
Fig. 1. GEMM and batch GEMM schematic diagram.
difference between the largest GEMM and the rest is too large, a large number of threads and workgroups will be idle, resulting in a waste of GPU computing resources. For various parallel acceleration platforms, different hardware characteristics, such as register size and number of CUs, will affect the allocation of computing resources in the kernel. To ensure kernel performance, it is necessary to flexibly set parameters based on different matrix sizes and hardware architectures [9,35]. To solve this problem, a coordinated tiling and batching strategy is proposed in [21], where a different tiling strategy is used for each GEMM in batch GEMM and appropriate batching is used according to the tile size to improve the computational efficiency of the GPU. Wang et al. [36] proposed a sort-up algorithm based on the GEMM workload and a split-down operation in the tiling process, which can segment large tiles into multiple smaller tiles. This approach achieves better CU utilization when the number of GEMMs is limited.

2.2. Motivation

Although the above-mentioned methods improve the parallel computing efficiency of batch GEMM on GPU from various perspectives, two problems remain. One is that the workload of threads varies significantly across the kernel. In the above approaches, tiles of various sizes are designed, and each tile is computed by a corresponding kernel, where the number of threads is fixed. In general, larger tiles have better TLP. This also increases the workload of each thread for large-size tiles, and the threads responsible for computing large tiles require more hardware resources (VGPR, SGPR, LDS) and computing time. The other is that differences between wavefronts within different workgroups are ignored in the TLP calculations. A workgroup is transformed into multiple wavefronts during GPU computation, which are executed in parallel on the CU. Each CU can run multiple wavefronts simultaneously, and the number of wavefronts depends on the hardware resources required by each wavefront. Thus, the TLP on the GPU should be determined by the number of threads in the wavefronts that can be executed in parallel on the CU.

To solve the above problems, we propose an efficient and load-balanced batch GEMM acceleration method, which consists of two parts: a multi-thread kernel design scheme and an efficient tiling algorithm. The multi-thread kernel design balances the amount of loading and computation in the threads corresponding to each tile. Tiles of various sizes correspond to the number of threads selected. Although, limited by the parallel programming interfaces of the CUDA and ROCm platforms, the number of threads responsible for computing a tile is uniform, we use a corresponding filtering operation in the kernel execution process to effectively alleviate this problem. An efficient tiling algorithm can choose the optimal scheme based on different GEMMs and GPUs. To measure

3. Background

3.1. GEMM and batch GEMM

For a single GEMM, the accumulation routine is C = αAB + βC, where A ∈ R^(M×K), B ∈ R^(K×N) and C ∈ R^(M×N) are dense matrices, M, N, and K represent matrix dimensions, and α and β are constant scalars. A common approach is tiling matrix C into multiple tiles [21,36], which utilizes the parallel computing of threads in the GPU to calculate each tile and splices the results together. As shown in Fig. 1(b), given a GEMM of size M × N × K, the matrix C is segmented into multiple tiles of size T_m × T_n. Each workgroup is responsible for the calculation of one tile and needs to access the row section of matrix A of size T_m × K and the column section of matrix B of size K × T_n. However, the row cross-section of A and the column cross-section of B (represented in Fig. 1(b) by the gray parts of matrices A and B, respectively) are too large to store in shared memory and registers. Hence, the row section of A and the column section of B are segmented into multiple A tiles of size T_m × T_k and B tiles of size T_k × T_n, respectively. A partial result of C can be obtained by calculating with an A tile and a B tile, and accumulating the partial results yields the final result.

To batch-run multiple GEMMs, a naive routine is to compute each GEMM individually. However, when the matrix size is small, a single GEMM does not fully utilize the GPU's computing power, leaving CUs idle [37,38]. To avoid this situation, a batch GEMM method is proposed to design multiple kernels for the various GEMMs on the GPU [36,39]. Compared to GEMM, a batch GEMM is expressed as (M × N × K × B_size), where M, N and K represent the dimensions of the matrix, and B_size represents the batch size. A batch GEMM uses a 3D grid, where grid.z is the batch size, and grid.x and grid.y are the length and width of a two-dimensional plane, respectively [40]. To balance the workload of a batch GEMM, a variety of tile sizes are used for GEMM tiling. The two-dimensional grid size corresponds to the matrix C and the tiling strategy. Each workgroup is responsible for the corresponding tile. A workgroup is decomposed into multiple wavefronts that execute on the CU. The 3D grid of batch GEMM is shown in Fig. 1(a).

3.2. GPU architecture and kernel occupancy

With the improvement of hardware architecture and parallel computing programming platforms (such as ROCm and CUDA), GPUs are becoming the most popular hardware accelerator. The two most commonly used GPUs are AMD and NVIDIA, widely used in various scientific computing platforms. However, some basic concepts of ex-
the effect of tiling, we propose a new way of TLP computation based pression in ROCm and CUDA are different. We chose AMDs official
on wavefronts. The optimal tiling scheme is obtained by adjusting the
tiling strategy according to the TLP. Finally, we obtain an efficient tiling
algorithm based on the new TLP calculation method. In Section 4, the 1
https://rocm.docs.amd.com/en/latest/
2
details of the proposed method are introduced. https://docs.nvidia.com/cuda/
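The tile-and-accumulate decomposition described above — each 𝑇𝑚 × 𝑇𝑛 tile of C built up by stepping through 𝑇𝑚 × 𝑇𝑘 tiles of A and 𝑇𝑘 × 𝑇𝑛 tiles of B — can be sketched in plain Python. This is only an illustrative CPU model of the loop structure, not the paper's GPU kernel; the function names and tile sizes are ours:

```python
# Tiled GEMM sketch: each (Tm x Tn) tile of C plays the role of one workgroup's
# task; the k0 loop accumulates partial results from (Tm x Tk) A tiles and
# (Tk x Tn) B tiles, mirroring the tiling described for Fig. 1.
def tiled_gemm(A, B, Tm, Tn, Tk):
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, Tm):              # C tile row
        for j0 in range(0, N, Tn):          # C tile column
            for k0 in range(0, K, Tk):      # accumulate partial results
                for i in range(i0, min(i0 + Tm, M)):
                    for j in range(j0, min(j0 + Tn, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + Tk, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C

def naive_gemm(A, B):
    """Reference un-tiled GEMM for checking the tiled version."""
    M, K, N = len(A), len(A[0]), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]
```

The tiled loop order produces the same result as the naive loop; on a GPU it is what allows the current A and B tiles to reside in LDS/registers instead of global memory.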
Y. Zhang et al. Journal of Systems Architecture 160 (2025) 103341
Table 1
ROCm/CUDA terminology.

ROCm                 CUDA                            Description
Compute Unit (CU)    Streaming Multiprocessor (SM)   One of many parallel vector processors in a GPU that contains parallel ALUs. All waves in a workgroup are assigned to the same CU.
Kernel               Kernel                          Functions launched to the GPU that are executed by multiple parallel workers on the GPU. Kernels can work in parallel with the CPU.
Wavefront            Warp                            Collection of operations that execute in lockstep, run the same instructions, and follow the same control-flow path. Individual lanes can be masked off.
Workgroup            Thread block                    Think of this as a vector thread. A 64-wide wavefront is a 64-wide vector op.
Work-item/Thread     Thread                          GPU programming models can treat this as a separate thread of execution, though this does not necessarily get forward sub-wavefront progress.
Global Memory        Global Memory                   DRAM memory accessible by the GPU that goes through some layers of cache.
Local Memory         Shared Memory                   Scratchpad that allows communication between wavefronts in a workgroup.
Private Memory       Local Memory                    Per-thread private memory, often mapped to registers.
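For quick reference, the term pairs in Table 1 can be expressed as a lookup table (a trivial illustrative snippet; the dictionary simply restates the table):

```python
# ROCm -> CUDA terminology, taken from Table 1.
ROCM_TO_CUDA = {
    "Compute Unit (CU)": "Streaming Multiprocessor (SM)",
    "Kernel": "Kernel",
    "Wavefront": "Warp",
    "Workgroup": "Thread block",
    "Work-item/Thread": "Thread",
    "Global Memory": "Global Memory",
    "Local Memory": "Shared Memory",
    "Private Memory": "Local Memory",
}

def to_cuda(rocm_term):
    """Return the CUDA name for a ROCm term from Table 1."""
    return ROCM_TO_CUDA[rocm_term]
```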
terminology for this paper to provide precise specifications. To clarify some differences and relationships between ROCm and CUDA terms, a comparison of terminology is given in Table 1.

A GPU is composed of multiple Shader Engines (SE) and a command processor. Each SE is integrated with multiple CUs and its own workload manager. Each CU contains an enormous number of Arithmetic and Logic Units (ALUs), a small number of control units, and caches. Hence, GPUs are suitable for large numbers of simple parallel computing tasks. A GPU kernel consists of one or multiple workgroups, whose size is determined by the number of wavefronts and threads. In the memory hierarchy, the GPU has global memory, local memory, and private memory, ordered from slow to fast by access speed; local memory and private memory are much smaller than global memory [41,42].

Kernel occupancy represents the actual utilization of compute-unit resources by a kernel function on the GPU, defined as the ratio of activated wavefronts to the maximum number of wavefronts supported by a CU [35,43]. An active wavefront running on a CU requires resources such as Vector General-Purpose Registers (VGPR), Scalar General-Purpose Registers (SGPR), Local Data Share (LDS), etc. A wavefront can be activated and run on a CU when all required resources are available. When the utilization of CU resources is low, the number of active wavefronts is small, which leads to wasted hardware resources and degraded parallel performance of the kernel. On the other hand, when the number of active wavefronts on the CU increases, the resources available to each wavefront and the register storage space available to each work-item in the wavefront decrease [44,45].

The number of active wavefronts on a CU is mainly limited by the following factors: the number of work-items in each workgroup and the sizes of the VGPR, SGPR, and LDS. For example, in AMD's MI100 3 and MI210, 4 a wavefront consists of 64 work-items. When the number of work-items in a workgroup is less than or equal to 64, only one wavefront is included. The VGPR, SGPR, and LDS sizes on the CU have a corresponding upper bound for each work-item. According to the kernel design, the resources on the CU need to be allocated before executing each work-item. When the resource requirements of a work-item are satisfied, the wavefront can be activated and run on the CU. Otherwise, it will not run until other wavefronts accomplish their tasks and release resources. In order to fully utilize the hardware resources of the GPU and improve the efficiency of parallel computing, the kernel occupancy should be improved as much as possible without data overflow [46,47]. In batch GEMM, an efficient kernel design should properly allocate the data loading and computation workload of each work-item in the wavefront, so that the memory space and computing power on the CU can be utilized more efficiently [48,49].

3 https://www.amd.com/system/files/documents/instinct-mi100-brochure.pdf
4 https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna2-white-paper.pdf

4. Overview

4.1. Multi-thread kernel design

Tile size and kernel design are closely related in the design of batch GEMM algorithms, and there are two matrix tile design routes. The first is to design one tile to fit all GEMMs; the second is to design various tiles adapted to different GEMMs. Compared with the first method, the latter is more flexible and utilizes the computing resources of the GPU more efficiently for irregular GEMMs. For GEMMs of various shapes and sizes, using a single tile can easily increase the workload differences between threads in multiple workgroups, affecting the allocation of computing resources. In this paper, we perform a multi-thread kernel design for the second matrix segmentation method. The two different tile design strategies are shown in Fig. 2, which presents their effect on the occupancy of the 3D grid: for a batch GEMM, different tile sizes lead to different numbers of workgroups, resulting in different 3D grid occupancies.

For a single GEMM, matrix C is tiled into multiple tiles. The tile size can be flexibly designed, and each tile can be run in parallel without data interference. Each tile is calculated by a corresponding workgroup and can be represented as a whole by a 2D grid. When the size and number of tiles are large enough, efficient parallel execution can usually be obtained. However, in real-world cases, the matrices in batch GEMM tend to be small and irregular, which leads to poor performance of traditional methods. Therefore, previous methods adopt a variety of tiles to adapt to the corresponding GEMMs, with each tile based on a unified number of threads, which makes the workload of threads in large tiles much larger than that in small tiles. This gap in thread workload results in unbalanced thread loading and reduces GPU parallel computing efficiency. Table 2 lists the detailed parameters for tiles of various sizes based on the same work-item design (the number of work-items in the kernel is 128). 𝑊𝐶𝑃 and 𝑊𝐷𝐿 represent the computation
Fig. 2. Two different tile design strategies for batch GEMMs. (a) All GEMMs adopt the same tiling scheme and are divided into multiple tiles of the same size. (b) Different GEMMs adopt different tiling schemes and are divided into multiple tiles of different sizes.
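The contrast between the two strategies in the figure can be made concrete by counting workgroups (C tiles). The batch and tile sizes below are invented for illustration; `workgroups` simply counts tiles per GEMM under a given scheme:

```python
import math

def workgroups(gemms, scheme):
    """Total number of C tiles (= workgroups) for a batch under a tiling scheme."""
    total = 0
    for (M, N) in gemms:
        Tm, Tn = scheme(M, N)
        total += math.ceil(M / Tm) * math.ceil(N / Tn)
    return total

batch = [(16, 16), (64, 64), (128, 32)]           # irregular (M, N) pairs

uniform = lambda M, N: (64, 64)                   # strategy (a): one tile size for all
adaptive = lambda M, N: (min(M, 64), min(N, 32))  # strategy (b): tile adapted to the GEMM
```

Here the uniform scheme yields 4 workgroups (and pads the 16 × 16 GEMM heavily), while the adaptive scheme yields 5 better-matched workgroups — a different 3D-grid occupancy, as the caption describes.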
amount and data loading amount of a work-item, respectively, and their calculation expressions are:

𝑊𝐶𝑃 = (𝑇𝑚 × 𝑇𝑛) / 𝑊𝑛𝑢𝑚  (1)

𝑊𝐷𝐿 = (𝑇𝑚 × 𝑇𝑛 + 𝑇𝑚 × 𝑇𝑘 + 𝑇𝑘 × 𝑇𝑛) / 𝑊𝑛𝑢𝑚  (2)

where 𝑊𝑛𝑢𝑚 represents the number of work-items responsible for computing the tile.

Table 2
The common kernel design scheme for batch GEMM (there are significant workload gaps between threads).

Tile     𝑇𝑚   𝑇𝑛   𝑇𝑘     𝑊𝐶𝑃   𝑊𝐷𝐿
small    16   16   8/16   2     4/6
medium   32   32   8/16   8     12/16
large    64   64   8/16   32    40/48

For different tiles, there is a significant gap in workload between threads (𝑊𝐶𝑃 ∈ [2, 32] and 𝑊𝐷𝐿 ∈ [4, 48]). The choice of 𝑇𝑘 also has a certain impact on the data load of a work-item: each thread is responsible for more data loads when 𝑇𝑘 is larger. For example, in the large tile, when 𝑇𝑘 is set to 8 or 16, each work-item is responsible for loading 40 or 48 elements, respectively. The workload differences caused by these different tile sizes impact kernel performance.

To explore the impact of the number of work-items in the workgroup and the tile size on the performance of batch GEMM, some experiments were performed, whose results are given in Fig. 3. Under the condition that the number of GEMMs is large and 𝑀, 𝑁, and 𝐾 are large enough, various thread kernels (with 64, 128, 256, and 512 threads) are used to compute multiple tiles (the nine tiles are shown in Fig. 3). The four thread kernels commonly used in previous work are selected as benchmarks [21,34,36], and we investigate their performance under various tiles in comparative experiments. Fig. 3 shows that each kernel's performance first increases and then decreases across the different tiles. When the tile size is small, the threads' workload is also tiny; threads in the kernel only compute a few elements, so their computing power is not fully utilized. As the tile size increases, the number of elements each thread needs to calculate and store also increases, and as long as the register data does not overflow, the computing efficiency of the thread keeps improving. When the tile corresponding to the thread is too large, the register data overflows and is transferred to global memory. For example, for a 64-thread kernel, when computing 8*8 and 32*32 tiles, each thread needs to compute 1 and 32 elements of matrix C, respectively. Obviously, 32*32 requires more register memory. However, the register memory of each thread is precious; when its maximum limit is exceeded, data is transferred to global memory for storage. Because the access speed of global memory is considerably lower than that of registers, the threads' data access efficiency decreases and overall time consumption increases. At the same time, because thread workloads vary, when a thread with a heavy workload runs on the CU, the number of active wavefronts on the CU is smaller, so the CU's kernel occupancy (the ratio between the number of active wavefronts and the maximum number of supported wavefronts) is reduced. This state of low kernel occupancy also lasts longer because of the longer work-item computation time.

To solve this problem, we propose a multi-thread kernel design, which ensures that the workload of each thread is balanced as much as possible. The experimental results in Fig. 3 show that the performance of the kernels varies when calculating the same tile; for example, the 128-thread kernel performs best when calculating a 32*32 tile. This performance gap is mainly caused by the varying workloads of threads under different kernels, which affects the overall performance. For the 128-thread kernel, when calculating a 32*32 tile, each thread needs to complete the calculation of 8 elements and the loading of 16 elements. When calculating a 64*64 tile, the workload of the threads is heavy: each thread needs to complete the calculation of 32 elements and the loading of 64 elements. When calculating larger tiles, the workload of the thread increases significantly. To avoid significant differences in workload between threads, we use a multi-thread kernel to calculate the various tiles by considering the computation amount (𝑊𝐶𝑃) and data loading amount (𝑊𝐷𝐿) of the threads in the kernel. For larger tiles such as 32*64 and 64*64, a 256-thread kernel is used for computation. Increasing the number of threads in this way reduces each thread's computation and data loading amounts, thereby reducing the gaps between threads' workloads and achieving load balancing. There are five tiles and two kernels (𝑊𝑛𝑢𝑚) for small and irregular batch matrix multiplication, as shown in Table 3. Compared to Table 2, we balance the thread workload by setting the tile size and the number of kernel threads so that thread computation and data loading are as consistent as possible across different workgroups. In the calculation process of GEMM, five tile types are designed for GEMMs of different sizes, from small to large. To ensure that the computation and data loading amounts of the work-items responsible for different tiles are as equal as possible, the number of threads varies depending on the tile size. In Table 3, two different thread numbers (128 and 256) are used, and the computation amount (𝑊𝐶𝑃) and data loading amount (𝑊𝐷𝐿) of the work-item in each scheme are given. Although the current ROCm and CUDA platform programming interfaces only support kernel designs with a uniform thread number, we use a screening operation in the early stage of kernel execution to achieve the effect of a multi-thread kernel design. For example, in this paper, the number of kernel threads is set to 256. When the small, small-medium, and medium tiles are executed, the extra threads are terminated immediately and the corresponding computing resources are released, because these tiles only need
Fig. 3. Experimental results of multi-thread kernel.
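The per-thread workloads behind these comparisons follow directly from Eqs. (1) and (2); the helpers below reproduce the 𝑊𝐶𝑃 and 𝑊𝐷𝐿 entries of Tables 2 and 3 (a sanity-check sketch; the tile/thread combinations are those listed in the tables):

```python
# W_CP (Eq. (1)) and W_DL (Eq. (2)): per-work-item computation and data-loading
# amounts for a (Tm x Tn) tile with inner dimension Tk and Wnum work-items.
def w_cp(Tm, Tn, Wnum):
    return Tm * Tn // Wnum

def w_dl(Tm, Tn, Tk, Wnum):
    return (Tm * Tn + Tm * Tk + Tk * Tn) // Wnum

# "large" 64x64 tile, Tk = 16: with 128 threads (Table 2) each thread computes
# 32 elements and loads 48; with 256 threads (Table 3) this drops to 16 and 24.
```

Doubling the thread count for the largest tiles halves both quantities, which is the load-balancing effect the multi-thread kernel design exploits.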
128 threads. Terminating threads early allows for a better allocation of computational resources to the threads responsible for computing other tiles. With this implementation, we can achieve the effect of a multi-threaded kernel. Even though the performance may be degraded in comparison with an actual multi-threaded kernel, the experimental results in Section 5 demonstrate the excellent performance of this method.

Table 3
The multi-thread kernel design scheme with a more balanced workload.

Tile          𝑇𝑚   𝑇𝑛   𝑇𝑘   𝑊𝑛𝑢𝑚   𝑊𝐶𝑃   𝑊𝐷𝐿
small         16   16   16   128    2     6
small-medium  16   32   16   128    4     10
medium        32   32   16   128    8     16
medium-large  32   64   16   256    8     14
large         64   64   16   256    16    24

4.2. Tiling algorithm

4.2.1. Criteria for evaluation

Tiling can be seen as a re-assignment of the GEMM computation task. An efficient tiling algorithm can transform GEMM operations and improve hardware resource utilization. When various kernel designs are implemented, choosing an appropriate tiling scheme becomes a crucial issue. In general, for a GEMM, there is better parallelism within the workgroup when the tile size is larger. However, a larger tile means that the number of tiles is reduced, and if the number of tiles is too small, the CUs cannot be fully utilized, resulting in a waste of computing resources. Therefore, choosing a suitable tiling evaluation criterion is crucial. In previous studies, TLP was used to quantify the parallelism of tiling strategies on GPUs. Given a GEMM and a tiling strategy, its TLP can be calculated as follows:

𝑇𝐿𝑃 = ∑𝑖 (𝑀𝑖 × 𝑁𝑖) / (𝑇𝑚𝑖 × 𝑇𝑛𝑖) × 𝑇𝑤𝑜𝑟𝑘𝑔𝑟𝑜𝑢𝑝  (3)

where 𝑀𝑖 and 𝑁𝑖 are the dimension sizes of matrix C of the 𝑖th GEMM, 𝑇𝑚𝑖 and 𝑇𝑛𝑖 are the tile sizes chosen for matrix C, and 𝑇𝑤𝑜𝑟𝑘𝑔𝑟𝑜𝑢𝑝 is the number of threads in a workgroup. However, the above formulation only considers TLP at the level of the workgroup. Indeed, during the computation of the GEMM, the workgroup needs to be further transformed into wavefronts and run on the CU in the form of wavefronts. The execution process of batch GEMM can be divided into four phases: segmentation, workgroup, wavefront, and execution. In the segmentation phase, the GEMM is tiled into tiles of various sizes, and each tile is computed by a workgroup. Workgroups are further transformed into wavefronts based on their hardware resource requirements and the number of work-items. Finally, these wavefronts are run in parallel on multiple CUs for the batch GEMM calculations. Due to the differences between tile sizes, the computation amount and data loading amount of threads are not uniform across wavefronts, which leads to unbalanced hardware resource requirements, and the execution time of the wavefronts on the CUs also differs. The overall time of the batch GEMM is the maximum of all CU execution times. If the workload difference between wavefronts is too significant, the execution time of one wavefront will be excessive, increasing the overall calculation time.

Therefore, Eq. (3) does not consider the workload gaps between wavefronts. To solve this problem, we propose a new TLP calculation method as follows:

𝑇𝐿𝑃𝑛𝑒𝑤 = 𝜑( ∑𝑖 (𝑀𝑖 × 𝑁𝑖) / (𝑇𝑚𝑖 × 𝑇𝑛𝑖) ) × 𝑇𝑤𝑎𝑣𝑒𝑓𝑟𝑜𝑛𝑡  (4)

where 𝑀𝑖, 𝑁𝑖, 𝑇𝑚𝑖, and 𝑇𝑛𝑖 have the same meaning as in Eq. (3), 𝑇𝑤𝑎𝑣𝑒𝑓𝑟𝑜𝑛𝑡 is the number of work-items in a wavefront, and 𝜑 represents the conversion process from workgroups to wavefronts.

The conversion process mainly considers the following factors: the number of work-items in the workgroup; the sizes of the VGPR, SGPR, and LDS required by a work-item; and the maximum number of wavefronts supported by the CU. These factors are related to the GPU hardware architecture. Next, take AMD's MI210, which is based on the CDNA2.0 architecture, as an example. Under the limitation of the number of work-items in the workgroup, the number of wavefronts can be calculated as follows:

𝑊𝐹𝑤𝑔 = 16 × ceil(𝑊𝐼𝑤𝑔 / 64)  (5)

where 𝑊𝐹𝑤𝑔 is the maximum number of wavefronts under the limit of the number of work-items in the workgroup, and 𝑊𝐼𝑤𝑔 represents the number of work-items in the workgroup. Eq. (5) indicates that when the number of work-items is less than or equal to 64, a workgroup contains only one wavefront, and the number of workgroups is limited to 16 in the CU.

Limited by the sizes of the VGPR, SGPR, and LDS, the number of wavefronts can be calculated as follows:

𝑊𝐹𝑉 = 4 × floor(𝑉𝐺𝑃𝑅𝑚𝑎𝑥 / (𝑉𝐺𝑃𝑅𝑢𝑠𝑒𝑑 × 64))  (6)

where 𝑊𝐹𝑉 is the maximum number of wavefronts under the limit of the size of the VGPR, 𝑉𝐺𝑃𝑅𝑚𝑎𝑥 is the size of the VGPR in the Single Instruction Multiple Data (SIMD) unit, and 𝑉𝐺𝑃𝑅𝑢𝑠𝑒𝑑 is the VGPR size used by a
work-item. In the CDNA2.0 hardware architecture, each CU consists of four SIMDs.

𝑊𝐹𝑆 = floor(𝑆𝐺𝑃𝑅𝑚𝑎𝑥 / 𝑆𝐺𝑃𝑅𝑢𝑠𝑒𝑑)  (7)

where 𝑊𝐹𝑆 is the maximum number of wavefronts under the limit of the size of the SGPR, 𝑆𝐺𝑃𝑅𝑚𝑎𝑥 is the size of the SGPR in the CU, and 𝑆𝐺𝑃𝑅𝑢𝑠𝑒𝑑 is the size of the SGPR used by a wavefront.

𝑊𝐹𝐿 = floor(𝐿𝐷𝑆𝑚𝑎𝑥 / 𝐿𝐷𝑆𝑢𝑠𝑒𝑑) × ceil(𝑊𝐼𝑤𝑔 / 64)  (8)

where 𝑊𝐹𝐿 is the maximum number of wavefronts under the limit of the size of the LDS, 𝐿𝐷𝑆𝑚𝑎𝑥 is the size of the LDS available to a workgroup, 𝐿𝐷𝑆𝑢𝑠𝑒𝑑 is the size of the LDS used by a workgroup, and 𝑊𝐼𝑤𝑔 has the same meaning as in Eq. (5).

To sum up, the number of wavefronts should meet the limitations of all the above factors, and it is calculated as follows:

𝑊𝐹 = min(𝑊𝐹𝑤𝑔, 𝑊𝐹𝑉, 𝑊𝐹𝑆, 𝑊𝐹𝐿, 𝑊𝐹𝐶)  (9)

where 𝑊𝐹 is the number of activated wavefronts and 𝑊𝐹𝐶 is the maximum number of wavefronts supported by the CU.

The number of wavefronts and the corresponding number of threads are introduced into Eq. (4) to compute the TLP more accurately and appropriately. Compared with Eq. (4), Eq. (3) only considers the workload at the workgroup level, which neglects the further conversion between workgroups and wavefronts at runtime. Eq. (3) is valid only if the following two conditions are satisfied: one is that all thread computation and data load amounts are consistent; the other is that the hardware resources required by the activated wavefronts do not exceed the limits of the CU. Note that for GEMMs of different precision, threads have different computing-resource requirements (VGPR, SGPR, LDS) during the computation process. Therefore, for matrices of different precision, the values of 𝑉𝐺𝑃𝑅𝑢𝑠𝑒𝑑, 𝑆𝐺𝑃𝑅𝑢𝑠𝑒𝑑, and 𝐿𝐷𝑆𝑢𝑠𝑒𝑑 in Eqs. (6)–(8) differ, which affects the number of activated wavefronts.

4.2.2. Tiling fine-tuning

For batch GEMM, an initial tiling scheme is first assigned to address the context-switching overhead and low hardware resource utilization caused by the variable scale of the matrices. Then, the tiling scheme is adjusted according to the TLP estimate of the batch GEMM and the hardware architecture of the GPU, and finally the best tiling scheme is obtained. In the first stage, the tile size chosen for each GEMM according to the dimensions of the matrix should meet the following conditions:

𝑇𝑚𝑖 ≤ 𝑀𝑖 and 𝑀𝑖 mod 𝑇𝑚𝑖 = 0
𝑇𝑛𝑖 ≤ 𝑁𝑖 and 𝑁𝑖 mod 𝑇𝑛𝑖 = 0  (10)
𝑇𝑘𝑖 ≤ 𝐾𝑖 and 𝐾𝑖 mod 𝑇𝑘𝑖 = 0

where 𝑇𝑚𝑖 and 𝑇𝑛𝑖 represent the tile dimensions in the tiling scheme, and 𝑇𝑘𝑖 is the sub-tile size along the 𝐾 dimension. There are two issues. (1) After the first phase, the batch GEMM has only an initial scheme, which cannot achieve optimal parallel computing efficiency. (2) Due to the variability of matrix sizes in batch GEMM, one or several of the 𝐵𝑠𝑖𝑧𝑒, 𝑀, 𝑁, and 𝐾 values may be particularly small, which is called an extreme GEMM case. In this case, the initial scheme cannot produce enough tiles, which leaves some CUs idle and wastes GPU computing power.

To solve these problems, the initial scheme is adjusted reasonably and efficiently in the second stage. Larger matrices are segmented with smaller tiles, and the number of tiles is increased by reducing the tile size to avoid idle CUs. The details are as follows: for a GEMM with an appropriate initial scheme, to avoid wasting GPU hardware resources, some larger GEMMs are cut with smaller tiles to ensure that the number of tiles is sufficient. For example, tiles whose initial size is 64*64 are re-segmented with 32*32 tiles. As a result, the number of tiles increases as the tile size decreases. This fine-tuning approach ensures that the CUs are not idle by increasing the utilization of hardware resources at the expense of intra-tile parallelism.

Algorithm 1 The tiling algorithm.
1: Initialize 𝑇𝐿𝑃𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑, 𝑇𝐿𝑃 = 0, 𝑡𝑜𝑡𝑎𝑙_𝑤𝑜𝑟𝑘𝑔𝑟𝑜𝑢𝑝 = 0, 𝑡𝑜𝑡𝑎𝑙_𝑤𝑎𝑣𝑒𝑓𝑟𝑜𝑛𝑡 = 0;
2: for 𝑖 = 0 to 𝐵𝑠𝑖𝑧𝑒 − 1 do
3:   Calculate 𝑇𝑚𝑖, 𝑇𝑛𝑖 according to Eq. (10);
4:   𝑡𝑜𝑡𝑎𝑙_𝑤𝑜𝑟𝑘𝑔𝑟𝑜𝑢𝑝 += (𝑀𝑖 × 𝑁𝑖) / (𝑇𝑚𝑖 × 𝑇𝑛𝑖);
5: end for
6: 𝑇𝐿𝑃𝑛𝑒𝑤 = 𝜑(𝑡𝑜𝑡𝑎𝑙_𝑤𝑜𝑟𝑘𝑔𝑟𝑜𝑢𝑝) × 𝑇𝑤𝑎𝑣𝑒𝑓𝑟𝑜𝑛𝑡;
7: 𝑇𝑖𝑙𝑒[𝑠𝑖𝑧𝑒] ranges from "large" to "small";
8: while 𝑇𝐿𝑃𝑛𝑒𝑤 < 𝑇𝐿𝑃𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 do
9:   for 𝑗 = 0 to 𝐵𝑠𝑖𝑧𝑒 − 1 do
10:    if 𝑇𝑖𝑙𝑒[𝑗] is "large" then
11:      Set 𝑇𝑖𝑙𝑒[𝑗] to "medium-large";
12:    else if 𝑇𝑖𝑙𝑒[𝑗] is "medium-large" then
13:      Set 𝑇𝑖𝑙𝑒[𝑗] to "medium";
14:    else if 𝑇𝑖𝑙𝑒[𝑗] is "medium" then
15:      Set 𝑇𝑖𝑙𝑒[𝑗] to "small-medium";
16:    else if 𝑇𝑖𝑙𝑒[𝑗] is "small-medium" then
17:      Set 𝑇𝑖𝑙𝑒[𝑗] to "small";
18:    end if
19:    𝑡𝑜𝑡𝑎𝑙_𝑤𝑜𝑟𝑘𝑔𝑟𝑜𝑢𝑝 += (𝑀𝑗 × 𝑁𝑗) / (𝑇𝑚𝑗 × 𝑇𝑛𝑗);
20:  end for
21:  𝑇𝐿𝑃𝑛𝑒𝑤 = 𝜑(𝑡𝑜𝑡𝑎𝑙_𝑤𝑜𝑟𝑘𝑔𝑟𝑜𝑢𝑝) × 𝑇𝑤𝑎𝑣𝑒𝑓𝑟𝑜𝑛𝑡;
22: end while

𝑇𝐿𝑃𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 is used as a threshold to ensure parallelism among the multiple tiles in the fine-tuning phase. Note that 𝑇𝐿𝑃𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 has an important influence on the selection of the tiling scheme for different hardware architectures. As a measure, the TLP values of the batch GEMM vary with the tiling scheme. The setting of the 𝑇𝐿𝑃𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 value is related to the architecture of the GPU because it uses the number of wavefronts and the number of threads in the wavefront to measure the parallelism of the tiling scheme. The hardware resources and the maximum number of wavefronts supported by each CU are diverse, so a corresponding 𝑇𝐿𝑃𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 should be set for each GPU architecture.

The specific process of selecting a tiling scheme for batch GEMM is given in Algorithm 1: (1) When a batch GEMM is given, an initial scheme is obtained according to Eq. (10). (2) The TLP of this scheme is calculated according to the given batch GEMM and tiling scheme. (3) The TLP of the current tiling scheme is compared with 𝑇𝐿𝑃𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑: if the threshold is not reached, the fine-tuning operation is performed, the current tiling scheme is changed, and the process returns to step (2); if the current TLP is greater than or equal to the threshold, go to step (4). (4) The batch GEMM is calculated according to the final tiling scheme. In the above procedure, the TLP is used as an evaluation criterion to measure the effectiveness of the tiling scheme on the batch GEMM. If the threshold is not reached, fine-tuning is used to adjust the scheme and improve the utilization of GPU hardware resources. The optimal tiling scheme can thus be obtained to ensure an optimal implementation at the GEMM and workgroup levels. After the final tiling scheme is determined, the multi-thread kernel is selected based on the tile size so that the wavefront and work-item levels can achieve a workload-balanced state.

The proposed method is implemented on both the AMD and NVIDIA GPU platforms. The hardware characteristics of the GPU platform can also significantly impact GEMM performance. For example, on the AMD and NVIDIA platforms, threads are grouped into wavefronts and warps as the basic execution units, containing 64 and 32 threads, respectively. The number of threads in the kernel needs to be an integer multiple of the number of threads in the wavefront or warp to improve kernel occupancy. Meanwhile, the size of registers and shared memory
can affect parameter settings during implementation based on different hardware architectures. Based on this difference, the proposed method considers parallelism at the wavefront or warp level when performing matrix segmentation on the two GPU platforms. In this way, the proposed method can flexibly select tiling schemes based on the hardware characteristics of the GPU to achieve optimal performance, avoid exceeding the maximum register limit, and prevent data overflow, which improves its applicability to various hardware architectures.

5. Evaluation

5.1. Setup

Experiment platform and matrix generation. The overall configuration of the experimental platform and the details of the two GPUs are shown in Tables 4 and 5, respectively.

Table 4
The configuration of platforms for evaluation.

Platform setup   AMD-platform   NVIDIA-platform
CPU              EPYC 7763      Platinum 8358
GPU              MI210          A800
OS               Ubuntu 20.04   Ubuntu 20.04
ROCm/CUDA        ROCm 5.6       CUDA 12.0

Table 5
The configuration of GPUs for evaluation.

Name          MI210                         A800
Architecture  CDNA 2.0                      Ampere
Core          1700 MHz                      1410 MHz
Caches        L1 16 KB (per CU), L2 16 MB   L1 192 KB (per SM), L2 40 MB
Memory        64 GB 3.2 Gbps HBM2           80 GB 2.4 Gbps HBM2
Bandwidth     1.6 TB/s                      2.04 TB/s

To ensure the irregularity and variability of the input matrices, the GEMM size parameters 𝑀, 𝑁, and 𝐾 are randomly generated within corresponding ranges ([𝑀𝑖𝑛, 𝑀𝑎𝑥_𝑀(𝑁)] and [𝑀𝑖𝑛, 𝑀𝑎𝑥_𝐾]). 𝑀𝑎𝑥_𝑀, 𝑀𝑎𝑥_𝑁, and 𝑀𝑎𝑥_𝐾 represent the upper bounds of 𝑀, 𝑁, and 𝐾, respectively, and the lower bound for each experiment is denoted uniformly by 𝑀𝑖𝑛, which is set to 16 in this paper. For example, Max_M(N) = 512 and Max_K = 128 indicate that the matrix dimension ranges are 𝑀 ∈ [16, 512], 𝑁 ∈ [16, 512], and 𝐾 ∈ [16, 128]. Thus, multiple sets of matrix dimension ranges can be obtained, and the parameters needed for GEMM generation are chosen from the different value ranges by random selection.

Comparison method. First, for the two GPU experimental platforms, the default GEMM processing methods rocBLAS [6] and cuBLAS [7], provided by the respective GPU manufacturers, are chosen as the basic comparison methods to demonstrate the effectiveness of the proposed method. Since these methods do not support batch invocation, rocBLAS and cuBLAS compute batch GEMM in a loop in this paper; no stream operations are used during the computation. Meanwhile, we also compare with CUTLASS [23], which supports batch GEMM based on sorting and built-in tiles. We then compare with MAGMA [8], supported by the University of Tennessee ICL Lab, which only extends 𝑔𝑟𝑖𝑑.𝑧 to support batch GEMM but does not have a fine-grained optimization strategy; the MAGMA comparison experiments were run on both GPU platforms. Finally, to show the advancement of our proposed method, we compare with state-of-the-art methods such as Wang [36] and Li [21] on their respective platforms. All of the above methods perform a warm-up operation to eliminate the effect of the first kernel launch.

Evaluation criteria. In the following experiments, there are 12 sets of value ranges. The experimental results are reported as the average value of GFLOPS (giga floating-point operations per second), which is calculated as:

𝐺𝐹𝐿𝑂𝑃𝑆 = ( ∑𝑖=0..𝑛−1 2(𝑀𝑖 × 𝑁𝑖 × 𝐾𝑖) ) / (𝑡𝑜𝑡𝑎𝑙_𝑡𝑖𝑚𝑒 × 1.0e9)  (11)

where 𝑀𝑖, 𝑁𝑖, and 𝐾𝑖 represent the matrix dimensions of the 𝑖th GEMM, 𝑡𝑜𝑡𝑎𝑙_𝑡𝑖𝑚𝑒 represents the running time on the GPU, and 𝑛 represents the batch size. For simplicity, the experimental data are single-precision floating-point values stored in row-major format. The experimental results are averaged over 10 consecutive runs and rounded to two decimal places.

5.2. Speed up

On the two platforms, we first compare with the default methods rocBLAS and cuBLAS. These two methods do not support batch irregular GEMMs, so we convert batch GEMMs into multiple single GEMMs and compute the results. The specific experimental results are shown in Figs. 4–5, which show that the proposed method achieves 5.09× and 7.18× average speedups compared to rocBLAS and cuBLAS, respectively. This result is primarily due to the fact that these methods do not support GEMMs of different scales when computing batch GEMMs, so they can only compute one GEMM at a time. When faced with small matrices, the computational resources of the GPU cannot be fully utilized due to the cost of context switching between multiple GEMMs. As the batch size gradually increases, the advantage of the proposed method becomes more evident. This shows that for batch and irregular GEMMs, rocBLAS and cuBLAS are at a disadvantage in terms of computational efficiency and switching between instances. Meanwhile, we also compare with CUTLASS, which handles batch GEMM using sorting to solve the problem of significant workload differences between multiple matrix multiplications. Fig. 5 shows that the proposed method has a 4.64× speedup, because CUTLASS's built-in tiles are unsuitable when the matrix dimensions are small. Therefore, the proposed method achieves better acceleration than CUTLASS for batch, irregular, and small-size matrix multiplication. We then perform a detailed comparison and analysis of the experimental performance based on MAGMA. The proposed method has 4.37× and 3.36× speed improvements compared to MAGMA. Figs. 4–5 show that the advantage of our method becomes more pronounced as the batch size increases. This is because MAGMA only uses the largest GEMM size in the batch GEMM to set grid.x. Due to the irregularity of the matrix sizes, a large number of computational resources in the grid will be idle. The proposed method, in this case, employs fine-grained filtering operations to ensure further efficient utilization of computational resources, which is more evident when the difference between matrix dimensions is significant.

As shown in Fig. 4, the proposed method achieves an average 1.88× speedup compared to Wang. It is noted that the advantage of the proposed method is more pronounced when 𝑀𝑎𝑥_𝐾 and 𝑀𝑎𝑥_𝑀 are small. For example, in the case of (𝑀𝑎𝑥_𝑀(𝑁) = 128, 𝑀𝑎𝑥_𝐾 = 128), the average speedup reaches 1.95×. This is mainly because when the matrix dimensions are small, there are not enough tiles to cover the time consumption of data loading in the wavefront, which is more pronounced in workgroups with heavy loads. The proposed method adjusts the wavefront workload corresponding to the tiles through the multi-thread kernel and ensures consistent computation and data loading across different workgroups. At the same time, this also shows that load and computation balancing between wavefronts is more conducive to improving the efficiency of GPU parallel computing. On the NVIDIA platform, Fig. 5 shows that the proposed method has an average 1.94× speedup compared to
of GEMM dimension ranges. The experiments with batch sizes 8, 16, Li. The advantage of the proposed method becomes clearer as the batch
32, 64, 128, and 256 were run continuously for ten epochs under each size increases. There are two reasons for this speedup performance :
Y. Zhang et al. Journal of Systems Architecture 160 (2025) 103341
Fig. 4. The comparative results on MI210. (5.09×, 4.37×, 1.88× speedup over rocBLAS, MAGMA, Wang).
Fig. 5. The comparative results on A800. (7.18×, 4.64×, 3.63×, 1.94× speedup over cuBLAS, CUTLASS, MAGMA, Li).
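The random generation of irregular GEMM sizes described under "Experiment platform and matrix generation" can be sketched as follows. The function name and the use of Python are illustrative assumptions, not part of the paper's implementation:

```python
import random

def generate_batch_gemm_sizes(batch, max_mn, max_k, min_dim=16, seed=None):
    """Draw (M, N, K) for each GEMM in the batch: M and N uniformly from
    [min_dim, max_mn], and K uniformly from [min_dim, max_k]."""
    rng = random.Random(seed)
    return [
        (rng.randint(min_dim, max_mn),
         rng.randint(min_dim, max_mn),
         rng.randint(min_dim, max_k))
        for _ in range(batch)
    ]

# The (Max_M(N) = 512, Max_K = 128) configuration mentioned in the paper:
sizes = generate_batch_gemm_sizes(batch=8, max_mn=512, max_k=128, seed=0)
```

Each experiment then draws one such batch per epoch, so every invocation sees a different mix of irregular shapes within the same dimension range.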
Fig. 6. The kernel occupancy on two GPU platforms.
Fig. 7. The time overhead of tiling algorithm.
(1) Li et al. used batching to balance the workload among different blocks but did not consider the difference between the workloads of threads in different tiles. (2) When selecting the tiling scheme, the TLP is calculated only at the block level; the fine-grained warp level is neglected, which leads to an inaccurate calculation of TLP. The proposed method adjusts the wavefront workload corresponding to the tiles through a multi-thread kernel and ensures consistent computation and data loading by different workgroups. This also shows that balancing load and computation between wavefronts is more conducive to improving the efficiency of GPU parallel computing.
5.3. Kernel occupancy
To explore the difference between the proposed method and the comparison methods in terms of GPU resource utilization, we present the kernel occupancy of the various methods on the two GPU platforms. The formula for kernel occupancy can be expressed as:
kernel occupancy = Num_actived / Num_total    (12)
To obtain more accurate performance metrics, we utilize the Omniperf (https://github.com/ROCm/omniperf) and Nsight (https://docs.nvidia.com/nsight-compute/NsightCompute/index.html) profiling tools provided by AMD and NVIDIA to evaluate the resource utilization of the kernel during execution. Kernel occupancy has distinct interpretations owing to the architectural differences between the AMD MI210 and the NVIDIA A800. On the AMD platform, Num_actived is the number of activated wavefronts and Num_total is the theoretical number of wavefronts that a CU can execute simultaneously. On the NVIDIA platform, Num_actived and Num_total represent the number of activated warps and the number of warps that are theoretically parallelizable simultaneously.
The results of the experiment are shown in Fig. 6. Compared with rocBLAS and cuBLAS, the proposed method has a clear advantage in the case of batch GEMM. The proposed method is also in the best position compared to the other methods (CUTLASS, MAGMA, Wang, Li), showing high efficiency in terms of utilization of GPU resources. As shown in Fig. 6, the proposed method consistently maintains the optimal kernel occupancy on both GPU platforms, which indicates that it can better exploit the computing power of the GPU.
5.4. The overhead of tiling algorithm
This section presents the proportion of the runtime that is taken up by the tiling algorithm when executing the proposed method on the two GPU platforms with various batch sizes. The experimental results are presented in Fig. 7: the tiling algorithm's runtime percentage decreases as the batch size increases. When the batch size is 8, the runtime of the tiling algorithm on the two GPU platforms is 6.06% and 6.37%, respectively. As the batch size increases, more and more GEMMs are executed on the GPU, and the execution time of these GEMMs takes up most of the total time, resulting in a smaller runtime share for the tiling algorithm. For example, with a batch size of 1024, the tiling algorithm takes less than 1% of the runtime. The experimental results on the two GPUs indicate that the time overhead of the tiling algorithm in the batch GEMM execution process is negligible, especially when the batch size is large. In real-world scenarios such as deep learning, where a large number of
Fig. 8. The performance improvement of the proposed TLP on MI210. (1.077× average speedup).
GEMM operations are often required, the tiling algorithm will have even less overhead in the execution process.
5.5. The performance benefits of the proposed TLP
This section presents the comparative experimental results on the two GPU platforms to provide a more detailed evaluation of the proposed TLP. The detailed experimental results are shown in Figs. 8–9: the proposed TLP performs better overall than the traditional TLP, with speedups of 1.077× and 1.085× on MI210 and A800, respectively. From Fig. 8, the proposed method improves performance significantly when the batch size is larger. For example, on MI210, the proposed method has an average speedup of 1.04× when the batch size <= 16; when the batch size >= 32, it improves performance by 1.10×. This gap arises because, when the batch size and matrix dimensions are small, it is difficult to fully utilize hardware resources. When there are a large number of tiles, the proposed TLP can more accurately evaluate the threads' workload and select the optimal tiling scheme. The same trend is reflected on the A800 platform, where the proposed TLP yields improvements of 1.04× and 1.11× for batch size <= 16 and batch size >= 32, respectively. The effectiveness of the proposed TLP is thus further demonstrated by the comparative results on both GPU platforms.
5.6. The latency
This section compares kernel latency on the two GPU platforms to provide a more detailed evaluation of the proposed method. We measured kernel latency with different batch sizes in the comparative experiment; the detailed results are shown in Fig. 10. On MI210, the proposed method reduces latency by 3.87×, 4.53×, and 1.62× compared to rocBLAS, MAGMA, and Wang, respectively. The proposed method has the lowest latency on MI210, indicating higher computational efficiency. On A800, the proposed method shows improvements of 3.02×, 2.59×, 2.45×, and 1.89× compared to cuBLAS, MAGMA, CUTLASS, and Li, respectively. Fig. 10 shows that as the batch size gradually increases, the kernel latency increases on both GPU platforms, with rocBLAS and cuBLAS exhibiting the highest latency. This is because the traditional loop scheduling method significantly increases latency due to context switching between kernels when the batch size is large. Fig. 10 also shows that some methods exhibit different latency behavior at various batch sizes. For example, when the batch size <= 16, MAGMA has the highest latency on both GPU platforms; when the batch size is large, its computational performance improves, indicating that MAGMA performs better when there are many matrices. The experimental results on both platforms show that the proposed method has the lowest latency under all batch sizes, indicating better performance and broad applicability.
5.7. The improved performance on inception layers of CNN
Modern CNN architectures often have multiple branches to capture features at different scales. The convolution operations of different scales in each branch can be represented as batch GEMM operations with various dimensions, e.g., GoogleNet [13], DenseNet [50], SqueezeNet [12], etc. To demonstrate the effectiveness of the proposed method in real-world scenarios, we use various Inception modules as a typical application to perform the forward computation on the two GPU platforms. The Inception module involves a large number of irregular, small-size GEMM operations. The deep learning frameworks
Fig. 9. The performance improvement of the proposed TLP on A800. (1.085× average speedup).
Fig. 10. The latency performance of the kernel on two GPU platforms.
MIOpen (https://github.com/ROCm/MIOpen) and cuDNN (https://github.com/NVIDIA/cudnn-frontend) are used as benchmark implementations on both GPU platforms. In this section, we select several commonly used Inception modules to evaluate the proposed method's speedup performance. The GEMM sizes in the Inception modules are shown in Table 6. Fig. 11 shows the speedup performance of the proposed method in each Inception module; the average speedups are 2.88× and 1.87×, respectively, and the gray boxes represent the average speedup ratios of the different Inception modules. The experimental results suggest that the Inception 8–9 series has the highest average speedup ratio (3.68× and 2.66×, respectively) among the Inception modules, because Inception 8–9 contains more matrix shapes than the other Inception modules, and the dimensions of these matrices are smaller than in the former two. Finally, the proposed method has been proven to significantly accelerate CNN models with various branch structures on two different GPU platforms, particularly in scenarios involving multiple branches, irregular shapes, and small dimensions.
6. Conclusion
In this paper, we propose a load-balanced batch GEMM acceleration method for the problem of low parallel computing efficiency and poor hardware resource utilization in batch, irregular, and variable matrix multiplication scenarios. The kernel occupancy and hardware resource utilization can be effectively improved by a multi-thread kernel design that balances the computational and data load in the work-item. A novel approach to TLP computation is devised, where the parallelism of
Fig. 11. The speedup performance on Inception layers.
Table 6
The size of GEMM in various Inception modules.
Inception module    GEMM size (M × N × K)
Inception-1 784 × 96 × 192, 784 × 64 × 192, 784 × 32 × 192, 784 × 16 × 192
Inception-2 784 × 64 × 192, 784 × 32 × 192, 784 × 128 × 192
Inception-3 196 × 192 × 192, 196 × 16 × 192, 196 × 96 × 192, 196 × 64 × 192
Inception-4 196 × 64 × 192, 196 × 24 × 192, 196 × 160 × 192
Inception-5 196 × 64 × 192, 196 × 128 × 192, 196 × 24 × 192
Inception-6 196 × 112 × 192, 196 × 144 × 192, 196 × 32 × 192, 196 × 64 × 192
Inception-7 196 × 256 × 192, 196 × 160 × 192, 196 × 128 × 192
Inception-8 49 × 160 × 192, 49 × 128 × 192, 49 × 256 × 192, 49 × 160 × 192, 49 × 32 × 192
Inception-9 49 × 192 × 192, 49 × 128 × 192, 49 × 384 × 192, 49 × 192 × 192, 49 × 48 × 192
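The GEMM sizes in Table 6 are consistent with the standard im2col lowering of a convolution, where M is the number of output positions (height × width), N is the number of filters, and K is input channels × kernel area. The sketch below reproduces the Inception-1 row under assumed layer parameters (a 28 × 28 feature map with 192 input channels and 1 × 1 kernels, in the style of GoogleNet [13]); the mapping, not the exact network configuration, is the point:

```python
def im2col_gemm_size(out_h, out_w, filters, in_channels, kh=1, kw=1):
    """Map one convolution branch to an (M, N, K) GEMM via im2col:
    M = out_h * out_w, N = filters, K = in_channels * kh * kw."""
    return (out_h * out_w, filters, in_channels * kh * kw)

# Four assumed 1x1 branches on a 28x28 feature map with 192 input channels,
# matching the Inception-1 row of Table 6 (784 x {96, 64, 32, 16} x 192):
sizes = [im2col_gemm_size(28, 28, f, 192) for f in (96, 64, 32, 16)]
```

This is why each Inception module naturally yields a batch of irregular, small-size GEMMs: every branch contributes one GEMM whose N (and K, for larger kernels) differs from its siblings.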
the tiling scheme is measured by the number of activated wavefronts. This approach allows the optimal tiling scheme to be selected based on different GPU architectures. Experiments are conducted on two GPU platforms to validate the effectiveness and progress of our proposed method.
Future work includes exploring batch GEMM with various precision performances. With the development of Transformer-based models, many GEMM operations are involved in the training and inference process of Large Language Models (LLMs), which often use lower-precision formats such as FP16, FP8, etc. For example, quantized LLMs often involve GEMM operations where the weight matrices and activation values have different precisions, e.g. W4A16, W8A8. More complex precisions and storage formats pose challenges to the performance of GEMM operations.
CRediT authorship contribution statement
Yu Zhang: Writing – review & editing, Writing – original draft. Lu Lu: Writing – review & editing, Supervision. Zhanyu Yang: Writing – review & editing. Zhihong Liang: Supervision, Conceptualization. Siliang Suo: Supervision, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the Natural Science Foundation of Guangdong Province (2024A1515010204) and the Technological Research Project of Southern Power Grid Company (ZBKJXM20232483).
Data availability
No data was used for the research described in the article.
References
[1] P. Valero-Lara, I. Jorquera, F. Lui, J. Vetter, Mixed-precision S/DGEMM using the TF32 and TF64 frameworks on low-precision AI tensor cores, in: Proceedings of the SC'23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 2023, pp. 179–186.
[2] H. Martínez, S. Catalán, A. Castelló, E.S. Quintana-Ortí, Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures, J. Syst. Archit. (2024) 103186.
[3] J. Fornt, P. Fontova-Musté, M. Caro, J. Abella, F. Moll, J. Altet, C. Studer, An energy-efficient gemm-based convolution accelerator with on-the-fly im2col, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 31 (11) (2023) 1874–1878.
[4] H. Kim, W.J. Song, Las: locality-aware scheduling for GEMM-accelerated convolutions in GPUs, IEEE Trans. Parallel Distrib. Syst. 34 (5) (2023) 1479–1494.
[5] W. Yang, J. Fang, D. Dong, X. Su, Z. Wang, Optimizing full-spectrum matrix multiplications on ARMv8 multi-core CPUs, IEEE Trans. Parallel Distrib. Syst. (2024).
[6] AMD, Next generation BLAS implementation for ROCm platform, 2024, https://github.com/ROCm/rocBLAS.
[7] B. Tuomanen, Hands-On GPU Programming with Python and CUDA: Explore High-Performance Parallel Computing with CUDA, Packt Publishing Ltd, 2018.
[8] ICL, Matrix algebra for GPU and multicore architectures, 2024, https://icl.utk.edu/magma/.
[9] T. Faingnaert, T. Besard, B. De Sutter, Flexible performant GEMM kernels on GPUs, IEEE Trans. Parallel Distrib. Syst. 33 (9) (2021) 2230–2248.
[10] W.S. Moses, I.R. Ivanov, J. Domke, T. Endo, J. Doerfert, O. Zinenko, High-performance gpu-to-cpu transpilation and optimization via high-level parallel constructs, in: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023, pp. 119–134.
[11] H. Kim, H. Nam, W. Jung, J. Lee, Performance analysis of CNN frameworks for GPUs, in: 2017 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS, IEEE, 2017, pp. 55–64.
[12] F.N. Iandola, S. Han, M.W. Moskewicz, K. Ashraf, W.J. Dally, K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, 2016, arXiv preprint arXiv:1602.07360.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[14] G. Pant, D. Yadav, A. Gaur, ResNeXt convolution neural network topology-based deep learning model for identification and classification of pediastrum, Algal Res. 48 (2020) 101932.
[15] S. Barrachina, M.F. Dolz, P. San Juan, E.S. Quintana-Ortí, Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors, J. Parallel Distrib. Comput. 167 (2022) 240–254.
[16] S. Rajbhandari, Y. He, O. Ruwase, M. Carbin, T. Chilimbi, Optimizing cnns on multicores for scalability, performance and goodput, ACM SIGARCH Comput. Archit. News 45 (1) (2017) 267–280.
[17] C. Rivera, J. Chen, N. Xiong, S.L. Song, D. Tao, Ism2: Optimizing irregular-shaped matrix-matrix multiplication on gpus, 2020, arXiv preprint arXiv:2002.03258.
[18] K. Matsumoto, N. Nakasato, S.G. Sedukhin, Performance tuning of matrix multiplication in opencl on different gpus and CPUs, in: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, IEEE, 2012, pp. 396–405.
[19] G.E. Moon, H. Kwon, G. Jeong, P. Chatarasi, S. Rajamanickam, T. Krishna, Evaluating spatial accelerator architectures with tiled matrix-matrix multiplication, IEEE Trans. Parallel Distrib. Syst. 33 (4) (2021) 1002–1014.
[20] Q. Han, H. Yang, M. Dun, Z. Luan, L. Gan, G. Yang, D. Qian, Towards efficient tile low-rank GEMM computation on sunway many-core processors, J. Supercomput. 77 (5) (2021) 4533–4564.
[21] X. Li, Y. Liang, S. Yan, L. Jia, Y. Li, A coordinated tiling and batching framework for efficient GEMM on GPUs, in: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, 2019, pp. 229–241.
[22] P. Tillet, D. Cox, Input-aware auto-tuning of compute-bound HPC kernels, in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017, pp. 1–12.
[23] NVIDIA, CUDA templates for linear algebra subroutines, 2024, https://github.com/NVIDIA/cutlass.
[24] J. Huang, C.D. Yu, R.A.v.d. Geijn, Strassen's algorithm reloaded on GPUs, ACM Trans. Math. Softw. 46 (1) (2020) 1–22.
[25] B. Boyer, J.-G. Dumas, C. Pernet, W. Zhou, Memory efficient scheduling of strassen-winograd's matrix multiplication algorithm, in: Proceedings of the 2009 International Symposium on Symbolic and Algebraic Computation, 2009, pp. 55–62.
[26] A. Fawzi, M. Balog, A. Huang, T. Hubert, B. Romera-Paredes, M. Barekatain, A. Novikov, F.J. R Ruiz, J. Schrittwieser, G. Swirszcz, et al., Discovering faster matrix multiplication algorithms with reinforcement learning, Nature 610 (7930) (2022) 47–53.
[27] G. Xiao, C. Yin, T. Zhou, X. Li, Y. Chen, K. Li, A survey of accelerating parallel sparse linear algebra, ACM Comput. Surv. 56 (1) (2023) 1–38.
[28] Y. Chen, G. Xiao, K. Li, F. Piccialli, A.Y. Zomaya, fgSpMSpV: A fine-grained parallel SpMSpV framework on HPC platforms, ACM Trans. Parallel Comput. 9 (2) (2022) 1–29.
[29] Y. Chen, G. Xiao, W. Yang, Optimizing partitioned CSR-based SpGEMM on the sunway TaihuLight, Neural Comput. Appl. 32 (10) (2020) 5571–5582.
[30] Y. Chen, K. Li, W. Yang, G. Xiao, X. Xie, T. Li, Performance-aware model for sparse matrix-matrix multiplication on the sunway taihulight supercomputer, IEEE Trans. Parallel Distrib. Syst. 30 (4) (2018) 923–938.
[31] G. Xiao, K. Li, Y. Chen, W. He, A.Y. Zomaya, T. Li, Caspmv: A customized and accelerative spmv framework for the sunway taihulight, IEEE Trans. Parallel Distrib. Syst. 32 (1) (2019) 131–146.
[32] G. Xiao, C. Yin, Y. Chen, M. Duan, K. Li, Efficient utilization of multi-threading parallelism on heterogeneous systems for sparse tensor contraction, IEEE Trans. Parallel Distrib. Syst. (2024).
[33] D.E. Tanner, Tensile: Auto-tuning gemm gpu assembly for all problem sizes, in: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW, IEEE, 2018, pp. 1066–1075.
[34] S. Wang, FlexGEMM: A flexible micro-kernel generation framework, in: Proceedings of the 5th International Conference on Computer Information and Big Data Applications, 2024, pp. 164–170.
[35] G. Alaejos, A. Castelló, H. Martínez, P. Alonso-Jordá, F.D. Igual, E.S. Quintana-Ortí, Micro-kernels for portable and efficient matrix multiplication in deep learning, J. Supercomput. 79 (7) (2023) 8124–8147.
[36] R. Wang, Z. Yang, H. Xu, L. Lu, A high-performance batched matrix multiplication framework for gpus under unbalanced input distribution, J. Supercomput. 78 (2) (2022) 1741–1758.
[37] Y. Zhang, Y. Wang, Z. Mo, Y. Zhou, T. Sun, G. Xu, C. Xing, L. Yang, Accelerating small matrix multiplications by adaptive batching strategy on GPU, in: 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application, HPCC/DSS/SmartCity/DependSys, IEEE, 2022, pp. 882–887.
[38] A. Abdelfattah, S. Tomov, J. Dongarra, Matrix multiplication on batches of small matrices in half and half-complex precisions, J. Parallel Distrib. Comput. 145 (2020) 188–201.
[39] A. Abdelfattah, A. Haidar, S. Tomov, J. Dongarra, Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs, in: Proceedings of the International Conference on Supercomputing, 2017, pp. 1–10.
[40] A. Abdelfattah, A. Haidar, S. Tomov, J. Dongarra, Performance, design, and autotuning of batched GEMM for GPUs, in: High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, Springer, 2016, pp. 21–38.
[41] A. Li, G.-J. van den Braak, H. Corporaal, A. Kumar, Fine-grained synchronizations and dataflow programming on GPUs, in: Proceedings of the 29th ACM on International Conference on Supercomputing, 2015, pp. 109–118.
[42] J. Li, H. Ye, S. Tian, X. Li, J. Zhang, A fine-grained prefetching scheme for DGEMM kernels on GPU with auto-tuning compatibility, in: 2022 IEEE International Parallel and Distributed Processing Symposium, IPDPS, IEEE, 2022, pp. 863–874.
[43] Z. Yang, L. Lu, R. Wang, A batched GEMM optimization framework for deep learning, J. Supercomput. 78 (11) (2022) 13393–13408.
[44] H. Mei, H. Qu, J. Sun, Y. Gao, H. Lin, G. Sun, GPU occupancy prediction of deep learning models using graph neural network, in: 2023 IEEE International Conference on Cluster Computing, CLUSTER, IEEE, 2023, pp. 318–329.
[45] I. Masliah, A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, J. Dongarra, Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices, Parallel Comput. 81 (2019) 1–21.
[46] G. Park, B. Park, M. Kim, S. Lee, J. Kim, B. Kwon, S.J. Kwon, B. Kim, Y. Lee, D. Lee, Lut-gemm: Quantized matrix multiplication based on luts for efficient inference in large-scale generative language models, 2022, arXiv preprint arXiv:2206.09557.
[47] B. Feng, Y. Wang, G. Chen, W. Zhang, Y. Xie, Y. Ding, EGEMM-TC: accelerating scientific computing on tensor cores with extended precision, in: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021, pp. 278–291.
[48] G. Shobaki, A. Kerbow, S. Mekhanoshin, Optimizing occupancy and ILP on the GPU using a combinatorial approach, in: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020, pp. 133–144.
[49] A.B. Hayes, L. Li, D. Chavarría-Miranda, S.L. Song, E.Z. Zhang, Orion: A framework for gpu occupancy tuning, in: Proceedings of the 17th International Middleware Conference, 2016, pp. 1–13.
[50] G. Huang, S. Liu, L. Van der Maaten, K.Q. Weinberger, Condensenet: An efficient densenet using learned group convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2752–2761.
Computer Standards & Interfaces 97 (2026) 104122
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
A multi-criteria process for IT project success evaluation – Addressing a critical gap in standard practices
João Carlos Lourenço a, João Varajão b,*
a CEGIST, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
b Centro ALGORITMI, Universidade do Minho, Campus de Azurém, 4804-533 Guimarães, Portugal
A R T I C L E I N F O
Keywords:
Project success
Project evaluation
Multi-criteria evaluation
MACBETH
Process
Methodology
A B S T R A C T
The evaluation of project success is widely recognised as valuable for improving IT (Information Technology) project performance and impact. However, many processes fail to adequately address the requirements for a sound evaluation due to their inherent complexity or by not complying with fundamental practical and theoretical concepts. This paper presents a process that combines a problem structuring method with a multi-criteria decision analysis approach to evaluate the success of IT projects. Put into practice in the context of a software development project developed for a leading global supplier of technology and services, it offers a new way of creating a model for evaluating project success and tackling uncertainty, bringing clarity and consistency to the overall assessment process. A strong advantage of this process is that it is theoretically sound and can be easily applied to other evaluation problems involving other criteria. It also serves as a call to action for the development of formal standards in evaluation processes. Practical pathways to achieve such standardization include collaboration through industry consortia, development and adoption of ISO frameworks, and embedding evaluation processes within established maturity models. These pathways can foster consistency, comparability, and continuous improvement across organizations, paving the way for more robust and transparent evaluation practices.
1. Introduction
The sustainable success of virtually any organisation is strongly associated with the success of its projects [1]. A key factor for project success is that project managers clearly understand what success means [2], which is usually not the case [3]. Despite different notions about what constitutes "project success" and the many criteria that can be used for evaluation (e.g., cost, time, and performance, among others) [4], a project must satisfy its clients to be considered successful [5–8].
Given the importance and complexity of the evaluation of projects, companies should define and implement systematic processes for evaluating success to improve project management performance and the impact of deliverables [9]. However, despite the models and techniques that are currently available for assessing project success, they are typically challenging to implement for a variety of reasons, notably the complexity caused by using multiple and often conflicting objectives (e.g., minimise cost and maximise quality), the scarcity of empirical studies reporting their genuine use in projects [10], and the fact that practices employed in companies are generally informal and simplistic [11].
Additionally, several errors identified by the decision analysis literature [12,13] are often made, generating meaningless project success evaluations [14]. Some common mistakes involve not including relevant criteria in the evaluation model, not distinguishing the performance of a project from its value, assigning weights to evaluation criteria without considering the ranges of variation of their performance scales, and making calculations that violate measurement scale properties. In other words, such evaluations are inconsistent with multi-attribute value theory (MAVT) and value measurement foundations.
Considering these limitations, this research proposes a process that combines a problem structuring method with a multi-criteria approach for evaluating the success of information technology (IT) projects, supported by a real-world case. This process was developed and applied in the context of a project of GlobalSysMakers (for confidentiality reasons, the name of the company herein is anonymized), a leading global supplier of technology and services.
In the GlobalSysMakers project, the need for a new process arose because the project management team felt that the scoring model initially defined for success assessment, while helpful, lacked accuracy.
* Corresponding author.
E-mail address: varajao@dsi.uminho.pt (J. Varajão).
https://doi.org/10.1016/j.csi.2025.104122
Received 12 August 2025; Received in revised form 7 November 2025; Accepted 23 December 2025
Available online 24 December 2025
0920-5489/© 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
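One of the mistakes listed in the introduction, assigning weights to evaluation criteria without considering the ranges of variation of their performance scales, can be illustrated numerically. In the sketch below, the criteria, unit values, and ranges are invented for illustration only; it shows that range-aware (swing-style) weights change when the performance ranges change, even when the per-unit "importance" does not:

```python
def range_aware_weights(value_per_unit, ranges):
    """Weight each criterion by the value spanned by its realistic
    performance range (swing-style), then normalize to sum to 1."""
    swings = [v * (hi - lo) for v, (lo, hi) in zip(value_per_unit, ranges)]
    total = sum(swings)
    return [s / total for s in swings]

# Cost is 'more important' per unit (5 vs 1), but if competing options
# differ by at most 1 cost unit while schedules differ by 30 days, the
# schedule criterion receives the larger range-aware weight; widening
# the cost range to 100 units reverses the ordering.
narrow_cost = range_aware_weights([5.0, 1.0], [(0, 1), (0, 30)])
wide_cost = range_aware_weights([5.0, 1.0], [(0, 100), (0, 30)])
```

Weights elicited as abstract "importance" ratings, detached from ranges like these, are therefore arbitrary, which is exactly the point made in the review below.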
J.C. Lourenço and J. Varajão Computer Standards & Interfaces 97 (2026) 104122
Following an appraisal of several methodological alternatives, a new multi-criteria approach combined with a problem structuring method was shown to be the best solution, providing the required precision and transparency to the process, along with a better understanding of the real meaning of the relative importance of each evaluation criterion. This paper describes the process developed in detail so that it can be replicated in other projects. Also, the results are presented and discussed, including contributions to theory and practice.
The proposed process, which combines a problem structuring method with a multi-criteria approach for evaluating IT project success, offers several theoretical implications. First, it advances the conceptualization of project success by integrating both subjective stakeholder perspectives and objective performance criteria, addressing the multidimensional and context-dependent nature of success in IT projects. Second, it contributes to decision theory and the project management literature by demonstrating how problem structuring methods, typically underutilized in IT evaluation, can enhance the clarity and relevance of criteria selection and prioritization. Third, the integration of these methodologies provides a foundation for developing more robust, transparent, and adaptable evaluation frameworks, which can inform future theoretical models and empirical studies. Ultimately, this research supports the movement toward standardization by offering a replicable and theoretically grounded process that can be refined and generalized across different organizational and project contexts.
The remainder of this paper is organised as follows. Section 2 briefly reviews previous related work on project evaluation methods, cases, and multi-criteria evaluation methods. Section 3 describes the case context and the development of the success evaluation model using a process that combines a problem structuring model with a multi-criteria decision analysis approach. Section 4 discusses the results obtained. Finally, Section 5 presents the conclusions and avenues for further work.
[…] weights of several stakeholders without a discussion obliterates their individual differences [26]. Additionally, the "importance of the criteria" should consider their respective performance ranges; otherwise, the resulting weights would be arbitrary [27].
Basar [28] proposes a methodology to evaluate the performance of IT projects in a fuzzy environment. She first identifies the evaluation criteria using the balanced scorecard method. Second, she determines the criteria weights with expert judgments and hesitant fuzzy weights. Then, the weights are used to evaluate the performance of IT projects in a Turkish company. The weighting process described in this paper is difficult for a non-expert evaluator to understand. Additionally, the quantitative performances of projects on the criteria are systematically normalised to scores between 0 and 1 with a linear transformation that may not correspond to the preferences of evaluators (which may be non-linear). The paper does not explain how to address the evaluation of the qualitative criteria.
Ismail [29] applies the Delphi method and conducts a seminar with experts to identify a construction project's potential evaluation criteria and group them into clusters. A relative importance index is calculated for each criterion with a weighted average of the responses to a survey expressed on a Likert scale. In a subsequent step, the experts 1) reduced the number of clusters and criteria and 2) assigned the same weight to the latter. Then, a priority index was calculated for each criterion with the Priority Evaluation Model (PEM) [30], which combines the "satisfaction" rate (assigned by the experts) and the "importance" of the criterion. The overall project success is obtained with a weighted sum of the averages of the priority indexes obtained on each cluster and the clusters' weights. However, the paper does not explain how these weights were assessed. Additionally, Likert scale classifications cannot be used for calculating averages or other arithmetic calculations.
Nguvulu et al. [31] use a Deep Belief Network (DBN) to evaluate eight
IT projects performances after training the DBN with five projects of 12
2. Previous related work months duration. The DPN automatically assigned weights and scores to
the criteria, considering possible interactions between them. The au­
2.1. Success of projects thors stress the advantage of this approach by not considering human
subjectivity. However, from our point of view, this is a weakness
Evaluation can be defined as the assessment and analysis of the ef­ because the subjective preferences of project managers, clients, and
ficiency and effectiveness of the projects activities and results. The other stakeholders should be considered in an evaluation process to
evaluation looks at what is planned to do, what has been achieved, and avoid arbitrary results generated by inadequate analytical approaches.
how it has been achieved [15]. Kahan and Goodstadt [16] conceive Wohlin and Andrews [32] apply principal component analysis and
evaluation as a set of questions and methods properly articulated to subjective evaluation factors to estimate which projects are successful or
review processes, activities, and strategies to achieve better results. unsuccessful out of a set of projects. This statistical approach may be
Therefore, the purpose of an evaluation is not just to find out what used to identify key project characteristics, but it does not allow for
happened but to use that information to make the project better [17,18]. evaluating the projects success according to stakeholders preferences.
There are several evaluation approaches in the literature, some Yan [33] suggests the combined use of the balanced scorecard (BSC)
considerably complex regarding their practical operationalisation and [34], the Analytic Hierarchy Process (AHP), and the Fuzzy Comprehensive
use. Varajão et al. [10] present a comprehensive review of models and Analysis method (FCA), respectively, to construct a performance criteria
methods for evaluating information systems project success. Some ex­ system, assess the criteria weights, and obtain an overall evaluation
amples are described and analysed next. score. The author explains how to obtain the performance criteria sys­
Bannerman and Thorogood [19] propose a framework for defining IT tem, but does not explain the weighting and scoring components.
project success that provides a common language for communication Yang et al. [35] apply a multi-criteria model for evaluating a soft­
and compares what stakeholders perceive as important. The authors list ware development projects success using the Analytical Network Process
the criteria that should be used to assess the success of a project within (ANP) [36] to assess the criteria weights at several hierarchical levels.
five domains (process, project management, product, business, and The scores of a project on a given criterion were obtained by calculating
strategy). However, they do not explain how to consider these domains the average of the scores assigned by five experts using a 5-point Likert
and criteria together. scale. Note that, as mentioned above, averages should not be calculated
Barclay and Osei-Bryson [20] describe a structured framework with ordinal scales. In addition, ANP is based on AHP, a method with
named Project Objectives Measurement Model (POMM) to identify the known issues that affect the validity of the criteria weights (see, e.g.,
criteria for evaluating an information system (IS) project and assigning a [3739]).
performance measure to each criterion. POMM applies value-focused Section 2.2 reviews important concepts and methods related to
thinking principles [21] and goal question metric methods [22]. An multi-criteria evaluation that are needed to create a proper value mea­
illustrative case is presented in which the importance of each criterion is surement model [40,41] to assess the success of a project.
directly assessed using an average of the stakeholders answers based on
a 5-point Likert scale. However, despite its virtues, this operation is 2.2. Multi-criteria evaluation
neither quantitatively nor substantively meaningful [23], respectively,
because a Likert scale is an ordinal scale [24,25] and averaging the In a multi-criteria value model, the measure of success of a project is
2
J.C. Lourenço and J. Varajão Computer Standards & Interfaces 97 (2026) 104122
given by the additive value function model:

    V(x1, x2, …, xn) = Σj=1..n wj vj(xj), with Σj=1..n wj = 1 and wj > 0, for all j    (1)

where V is the overall value score of the success of the project, wj is the weight of criterion j, vj(xj) is the value score on criterion j of the performance xj, and n represents the number of evaluation criteria.

Despite being straightforward in form, this model is often poorly applied. We highlight that the criteria weights wj are scaling constants [42], which represent trade-offs between criteria and not the erroneous notion of criteria's measures of importance [21]. In addition, vj is a measurable value function, which represents both a preference order between performances on criterion j and a strength-of-preference order on differences of performances [43]. Moreover, the model requires the criteria to be mutually preferentially independent [44], which entails special care during the model structuring phase.

There are some fundamental aspects to note regarding the desired properties for each evaluation criterion and also for the whole set of criteria [45]. Each criterion should be essential for the evaluation and controllable in the sense that the performance of the project influences the degree to which the criterion is satisfied, independently of other additional decisions. Also, a family of evaluation criteria should be: complete (the set of criteria should represent all of the relevant consequences of the project); nonredundant (the criteria should not repeat the same concerns); concise (the number of criteria should be kept to the necessary minimum to evaluate the project); specific (each criterion should be able to assess the consequences of the project, instead of being so broad that it compromises this purpose); and understandable (the evaluation criteria should be clear in the eyes of any interested individual).

Depending on the ability to use appropriate numerical principles and fluency to express oneself in words, an evaluator may prefer to apply a numerical method or a non-numerical one [46]. In light of this, the remainder of this section focuses on quantitative and qualitative techniques tailored for these two types of evaluators. Specifically, we delve into methods for criteria weighting and building a value scale for each criterion.

2.2.1. Weighting methods

A theoretically sound weighting method must consider the performance ranges defined by two fixed references on each criterion. Common references are, for example, the "worst" and the "best" performances [39] or "neutral" and "good" performances [47]. Below, we briefly describe two quantitative weighting procedures and one qualitative.

Keeney and Raiffa [48] developed the trade-off procedure, which is a numerical method that requires establishing indifferences between two fictitious projects using two criteria at a time. After establishing n - 1 indifference relationships for the n criteria, a system of equations is solved, including one equation in which the sum of the weights equals 1, to obtain the criteria weights.

Edwards and Barron [49] created the swing weighting method, which is a numerical method that involves measuring the relative importance of the improvements (swings) that can be achieved on the criteria, considering a change from the "worst" to the "best" performance on each of them.

Bana e Costa and Vansnick [50] developed MACBETH [51] to weight the criteria. This procedure requires ranking the worst-best swings and judging them using the qualitative scale of difference in attractiveness: no (difference), very weak, weak, moderate, strong, very strong, or extreme. This qualitative scale is also used to judge the difference in attractiveness between two swings at a time. The elicited judgments are used to fill in the upper triangular part of a matrix in the software tool M-MACBETH, which validates each judgment's consistency with those previously inputted (see [52], pp. 425-443). Then, the software tool generates a proposal of weights compatible with the inputted qualitative judgments by solving the linear programming problem described in Bana e Costa et al. [52]. The evaluators should validate the proposed weighting scale and adjust it if needed.

2.2.2. Methods to build value scales

We must assign fixed scores to the previously defined references to build a criterion value scale. For example, we may assign 100 and 0 value units to the "best" and the "worst" performances in each criterion, respectively, although two other scores could be used so that the highest score is assigned to the most preferred reference. This arbitrary assignment of scores leads to obtaining interval value scales [25]. Additionally, the score of a project on a given criterion should consider the preferences expressed by the evaluators upon performance ranges within the criterion [43] (e.g., the difference in value between performances A and B is worth twice the difference between C and D). Hereinafter, we present two numerical scoring methods and a qualitative one.

Edwards [53] presents the direct rating method. This numerical procedure first requires evaluators to rank the project performances in order of decreasing attractiveness. The highest score (100 units) is assigned to the "best" performance and the lowest score (0 units) to the "worst". Intermediate scores are assigned to other performance levels considering the intensities of preferences between each two of them, knowing that the difference between the "best" and "worst" is worth 100 value units. This method allows scoring a project directly or indirectly using a performance measure (e.g., quantitative continuous, quantitative discrete, or qualitative).

von Winterfeldt and Edwards [54] describe the bisection method, also known as the mid-value splitting technique [55], to create a value scale for a criterion. This numerical method assigns the highest score to the "best" performance (100) on the criterion and the lowest score (zero) to the "worst". Then, it is asked which performance p has a value equally distant from the "best" and the "worst" performances, which means that the ranges "p to best" and "p to worst" have the same strength-of-preference. Therefore, the performance p would get a midpoint score of 50. Similar midpoint questions are asked to identify other points that can be used to form a piecewise linear value function or a curve. This method allows the creation of value functions upon a quantitative and continuous performance measure on the criterion.

Bana e Costa and Vansnick [50] developed MACBETH [51] to create a value scale for a criterion (and to weight criteria, as described in the preceding section). Still, contrary to the above-mentioned methods, it needs only to elicit qualitative judgments. An evaluator judges the difference in attractiveness between two performances at a time, using the qualitative scale presented in the previous section, and inputs them into the software tool M-MACBETH. This tool verifies the consistency of the inputted judgments and generates a proposal of a value scale compatible with them and with the scores assigned to the reference performances "best" and "worst" (or "good" and "neutral") [52]. In the final step, the evaluator must validate and adjust the proposed value scale if needed. As in direct rating, this method allows scoring a project directly or indirectly using any performance measure.

2.3. Review summary

In the project success literature reviewed, most papers address the identification of IT criteria (e.g., Lobato et al. [4] and Assalaarachchi et al. [56]) or success factors (e.g., Pinheiro et al. [57] and Jayakody and Wijayanayake [58]), but only a few present an evaluation approach. In addition, the evaluation methods identified suffer from one or more theoretical errors (e.g., weights used as indicators of importance, averages calculated with ordinal scales, application of techniques with known flaws, and normalisation procedures that do not consider non-linear preferences). Furthermore, as far as we know, there is no description of a formal process that may guide the evaluators from beginning to end, i.e., from identifying the evaluation criteria until
reaching an overall measure of project success. Therefore, a gap in the IT project literature needs to be addressed, which will be done by applying multi-criteria evaluation principles.

Given the characteristics of the evaluators, the simplicity of use of the MACBETH method and its software tool M-MACBETH, including its ability to validate the consistency of the value judgments expressed by evaluators and to work with any performance measure (be it qualitative or quantitative, continuous or discrete), this was the approach selected to weight the criteria and build a value function for each criterion in the real-world case described in this paper.

3. Model development

3.1. Research setting

GlobalSysMakers develops solutions in four business areas: mobility solutions, industrial technology, consumer goods, and energy and building technology. It has several divisions, including automobile multimedia, automobile accessories, electric tools, heating and hot water, and home appliances. It employs roughly 410,000 associates worldwide, has about 440 subsidiaries and regional companies in 60 countries, and employs nearly 70,000 associates in research and development at 125 locations.

The target project, here identified as PROJRD, was part of an R&D program that had the participation of GlobalSysMakers and a university. The project had as its primary goal the development of a software tool to automate the assessment of printed circuit boards (PCBs) design. PCBs are essentially boards that connect electronic components used in all (but the simplest) electronic products, such as household appliances or vehicles. In addition to the software tool, the project deliverables included technical specifications, prototypes, and presentations.

The software development process adopted was based on a hybrid/agile methodology supported by SCRUM [59]. Agile methods for software development have been increasingly used in the IT sector [60] and are now mainstream [61]. In this project, agility enabled greater adaptability of the development phases according to the company's needs and requirements, which evolved along with the project lifecycle. Thus, it was possible to deal with changes in the requirements that were reflected in the final deliverables during the project development. In a later phase of the project, SCRUM was coupled with a waterfall process since the objectives stabilised without needing a periodic update. The project team was multidisciplinary, incorporating engineers from GlobalSysMakers (TEAMGSM) and researchers from the university (TEAMUNI). Together, the teams (TEAMGSM and TEAMUNI) had electronics, software engineering, and project management skills.

On average, the team allocated 1040 h per month to the project (approximately 6.5 Full-Time Equivalent), distributed by the different tasks of the project and according to the functions performed by each element (three of the team members were not full-time in the project). The project had a duration of 36 months.

The project's overall success was first assessed using a simple grid scoring model built by non-specialists in evaluation, which directly scored the project on several criteria and assigned importance weights. However, the project management team felt the need for a more advanced model to improve confidence in the evaluation. More in-depth research on multi-criteria evaluation revealed some misinterpretations in that process, which ultimately led to the development of a new model in line with decision analysis principles. This paper describes the new evaluation model.

3.2. Development tasks

The model development process started by asking the project manager to identify the members who should form the decision-making group [62], i.e., the group in charge of developing the model to evaluate the project's success. It was recommended to select members with different roles in the project; all of them were somehow interested in the project's outcomes. The group had three members: two from TEAMGSM and TEAMUNI, and one external consultant. The team members were selected considering their managerial responsibilities and to ensure representativeness of all the involved parties. All the members agreed to be involved in the model development tasks. Note that larger groups require different group processes, typically having separate meetings with stakeholders of different areas of interest to develop parts of the model, and with merge meetings gathering higher-level representatives of the client to validate the work done by the stakeholders and to finish the overall model [63].

Fig. 1 depicts the model development tasks. The first task involves identifying the aspects of interest for evaluating the project's success ("problem structuring", described in Section 3.3). This is a critical task because it is not possible to develop a proper evaluation model without understanding the problem, which is the reason why several publications have been devoted to identifying the fundamental evaluation concerns to be addressed (e.g., [28,64]). Second, all the relevant evaluation criteria should be included in the model, and a descriptor of performance should be identified for each of them, enabling the assessment of the extent to which each criterion is met ("model structuring", Section 3.4). Third, the evaluation component of the model must be built ("value model building", Section 3.5), which includes the construction of a value function for each criterion to transform the performances of the project into value scores (Section 3.5.1), and weighting the criteria to depict their trade-offs (Section 3.5.2). Last, the evaluation model should be tested for adequacy and consistency (Section 4.1).

Fig. 1. Model development tasks.

3.3. Problem structuring

The problem structuring task aims to identify the fundamental objectives [45] that determine the project's success from the client's perspective. Such objectives are essential reasons for the project's success. Therefore, they should be used as criteria in the evaluation model. However, the identification of these objectives in ill-structured problems may not be easy, which is why we opted to apply a problem structuring method (PSM) known as group map [65], which can be used in combination with a multi-criteria decision analysis approach [66].

To begin structuring the problem, the decision-making group was asked to say which aspects or concerns were relevant to evaluate the project's success. Then, for each of the concerns expressed, it was asked, "Why is that important?" or "What would be the consequences of doing that?", which allowed us to identify other aspects.

Fig. 2 depicts the complete group causal map built with the answers
of the elements of the group using the software tool "Decision Explorer" (from Banxia Software Ltd., https://banxia.com/dexplore), which automatically numbered the concerns for identification purposes. This map results from several iterations, adding some aspects and removing others. Note that a specific concern may be expressed by one statement (e.g., "(33) good requirements definition") or by two statements separated by an ellipsis, which depicts a positive pole and a negative one to clarify the meaning of the concern (e.g., "(15) time fulfilment… time exceeded"). An arrow between two concerns indicates the direction of causality. When an arrow points to a concern with two poles, it means that the concern affected is the one at the positive pole (e.g., a "(29) good contract management" contributes to the positive pole of "(1) cost fulfilment… cost exceeded"; in the reverse case, the arrow would have a negative sign near its head).

Fig. 2. Group map.

In Fig. 2, it is possible to identify chains of means-ends objectives. For example, an "(31) effective change management" contributes to the "(36) deliverables use", which in turn allows to "(41) reduce users' repetitive work", which contributes to "(39) increase users' satisfaction". Although the "(41) reduce users' repetitive work" is a means-objective to the end-objective "(39) increase users' satisfaction", the group considered the former a fundamental objective because it is important in itself and not because of its contribution to the latter. Therefore, "(41) reduce users' repetitive work" will be used as an evaluation criterion. Objective "(39) increase users' satisfaction" was considered too broad to evaluate the project's success and thus will not be used.

3.4. Model structuring

3.4.1. Evaluation criteria

Fig. 3 depicts the seven evaluation criteria that emerged from the concerns highlighted in bold in the group causal map developed in the problem structuring task.

Fig. 3. Project's success evaluation criteria.

The concerns represented by these criteria are as follows:

• Scope/quality fulfilment (ScoQual): the extent to which the planned (functional and non-functional) requirements were fulfilled (this criterion resulted from concern 14 in Fig. 2).

The prime deliverable of the project is a software tool to support the PCBs design assessment, the other deliverables being subsidiary to this tool. In the end, if the software tool does not comply with a minimum set of planned requirements, it will not be able to assess the PCBs design and will compromise the investment objectives.

• Cost fulfilment (Cost): the extent to which the planned cost was fulfilled (this criterion resulted from concern 1 in Fig. 2).

The budget defined for the project needs to be carefully managed due to being financed by an external R&D entity with a very narrow margin of deviation.
• Time fulfilment (Time): the extent to which the planned time was fulfilled (this criterion resulted from concern 15 in Fig. 2).

Since this project is part of a large program, time fulfilment is a significant management aspect because all the program's projects must be finished simultaneously due to the program's constraints. In other words, not meeting the deadline in this project would mean completing it in whatever form it is in when the program reaches its end, complying or not with the scope, and delivering or not what was planned.

• Increase of the number and type of errors identified in each verification cycle (IncNoType): the extent to which the number and type of errors identified in each PCBs verification cycle increase (this criterion resulted from concern 43 in Fig. 2).

Before the project was implemented in the company, the PCB designs had been checked mainly in a semi-automatic way by specialised engineers. Due to the many PCB components, details, and rules to review, it was virtually impossible to check all of the required features. The consequence was the late detection of some errors in more advanced stages of the projects, or, in other words, in later verification cycles. This accounts for the importance of the new software tool to increase the number and type of errors identified early on in each verification cycle, thereby reducing the design costs.

• Reduction of the number of verification cycles (RNVC): the extent to which the number of verification cycles is reduced (this criterion resulted from concern 37 in Fig. 2).

A PCB typically needs to go through several verification cycles until it is free from errors and ready for production. When errors are detected in a verification cycle, the PCB design needs to be corrected and tested again, possibly requiring a new verification cycle. Each verification cycle of a PCB design implies high costs. Furthermore, there is the risk of detecting errors only at the production stage, with even more severe consequences. A primary expected result of the new software tool is to reduce the number of verification cycles by enabling the early detection of errors.

• Improve efficiency (ImpEff): the extent to which the number of verified rules increases in each verification cycle without increasing the involved human resources (this criterion resulted from concern 42 in Fig. 2).

Since the process for verifying the PCBs design rules is semi-automatic, with a substantial part of manual labour, the current number of specialised engineers can only check some of the relevant aspects. With the new software tool, it is expected that the same number of engineers can check a greater number of design rules, not spending more time doing it.

• Reduction of the repetitive work of the users (RRWU): the extent to which the number of rules manually verified is reduced in each verification cycle (this criterion resulted from concern 41 in Fig. 2).

In the semi-automatic verification of PCBs design rules, manual labour is repetitive and prone to errors due to the fatigue of specialists. Automating most of the rules assessment is expected to reduce the repetitive work of these specialists and free them to perform other tasks.

3.4.2. Descriptors of performance

In this task, we associate a descriptor of performance with each evaluation criterion to measure how much the project satisfies the criterion. According to Keeney [45], a descriptor should be unambiguous (to describe the performances on the associated criterion clearly), comprehensive (to cover the range of possible performances on the criterion), direct (the descriptor levels should directly describe the performances on the corresponding criterion), operational (the information concerning the performances of the project can be obtained and value judgments can be made), and understandable (performances and value judgments made using the descriptor can be clearly understood and communicated).

Table 1 presents the list of all the descriptors created to measure the performance of the project, as well as two reference performance levels, "neutral" and "good", for each of them. Note that the definition of two reference performance levels is required to weight the criteria, allowing comparisons between criteria preference ranges and defining two fixed anchors for the value scales (see Section 2.2). Furthermore, the use of a "neutral" performance level (which corresponds to a performance that is neither positive nor negative on the criterion) and of a "good" performance level (which corresponds to a very positive performance on the criterion) increases the understandability of the criterion; these references are thus preferable to the "worst" and the "best" references used as examples in Section 2.2.

As shown in Table 1, the criteria scope/quality fulfilment and increase in the number and type of errors identified in each verification cycle do not have direct descriptors of performance. For these criteria, constructed descriptors were developed combining the characteristics inherent to those criteria, as explained next (Bana e Costa et al. [67] describe a detailed procedure for creating constructed descriptors).

To measure the performance of the project on the scope/quality fulfilment criterion, several requirements that deliver different contributions to the project's success were considered, following the MoSCoW method principles [68]. These requirements were classified into three types ("must have", "important to have", and "nice to have") and combined to obtain the performance levels of the descriptor presented in Table 2.

To measure the performance of the project on the increase of the number and type of errors identified in each verification cycle criterion, several combinations of the number and type of errors identified at each verification cycle (based on a past project) need to be considered (see Table 3). For example, a "5 % increase in the number of identified errors" and a "10 % increase in the type of identified errors" is a performance depicted as level "E5 T10". A verification cycle includes a series of tests to check for errors in the PCBs design or if it is ready for production (free from errors).

We note that the indicators used in the constructed scales presented in Tables 2 and 3 cannot be considered in isolation, as they are mutually preferentially dependent. For example, in Table 3, an increase of 10 % in

Table 1
Descriptors of performance.

Criterion | Descriptor | Neutral | Good
Scope/quality fulfilment (ScoQual) | Constructed descriptor (see Table 2) | L3 | L2
Cost fulfilment (Cost) | Cost of the project (k€) | Planned cost (k€ 500) | 95 % of the planned cost (k€ 450)
Time fulfilment (Time) | Project duration (weeks) | Planned time (96 weeks) | 95 % of the planned time (90 weeks)
Increase in the number and type of errors identified in each verification cycle (IncNoType) | Constructed descriptor (see Table 3) | E5 T0 | E10 T5
Reduction of the number of verification cycles (RNVC) | Number of verification cycles decreased | 1 cycle | 2 cycles
Improve efficiency (ImpEff) | Number of verified rules increased (%) | 0 % | 40 %
Reduction of the repetitive work of the users (RRWU) | Number of rules manually verified reduced (%) | 0 % | 10 %
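To make the mechanics concrete, the additive value function model of Eq. (1) can be sketched in a few lines of Python. The weights and the linear value functions below are illustrative assumptions only (the actual model elicits weights and possibly non-linear value scales with MACBETH); the "neutral" and "good" references follow Table 1, with the conventional fixed scores neutral = 0 and good = 100.

```python
# Sketch of Eq. (1): V = sum_j w_j * v_j(x_j), with the weights summing to 1.
# Value functions are LINEAR through the Table 1 anchors (neutral -> 0,
# good -> 100) purely for illustration; the weights are made up for this example.

def anchored_linear(neutral, good):
    """Value function with v(neutral) = 0 and v(good) = 100."""
    return lambda x: 100.0 * (x - neutral) / (good - neutral)

value_functions = {
    "Cost": anchored_linear(neutral=500, good=450),   # project cost, k-euro
    "Time": anchored_linear(neutral=96, good=90),     # duration, weeks
    "ImpEff": anchored_linear(neutral=0, good=40),    # % of extra rules verified
}
weights = {"Cost": 0.4, "Time": 0.35, "ImpEff": 0.25}  # hypothetical trade-offs
assert abs(sum(weights.values()) - 1.0) < 1e-9         # Eq. (1) constraint

performance = {"Cost": 475, "Time": 93, "ImpEff": 20}  # hypothetical project
V = sum(weights[c] * value_functions[c](performance[c]) for c in weights)
print(round(V, 1))  # -> 50.0
```

Note that a performance better than "good" legitimately scores above 100 (e.g., ImpEff at 60 % scores 150), which is consistent with the interval-scale reading of the anchored scores discussed in Section 2.2.2.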
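As a complement to the MACBETH description in Section 2.2.2, the following toy sketch shows how ordered difference-of-attractiveness judgments can be turned into a proposal of an interval value scale anchored at neutral = 0 and good = 100. It is not the MACBETH linear program (M-MACBETH solves an LP and checks judgment consistency); mapping the qualitative categories to the integers 1 to 6 is an arbitrary, order-preserving assumption, and the resulting proposal would still have to be validated and adjusted by the evaluators.

```python
# Toy proposal of a value scale from qualitative judgments (NOT the
# M-MACBETH linear program). Category strengths 1..6 are an arbitrary
# order-preserving coding of the qualitative scale from Section 2.2.1.
strength = {"very weak": 1, "weak": 2, "moderate": 3,
            "strong": 4, "very strong": 5, "extreme": 6}

# Judged differences between levels of a criterion, read "L1 over L2 is
# weak"; the anchors are L2 = good (100) and L3 = neutral (0).
judgments = [("L1", "L2", "weak"), ("L2", "L3", "moderate")]

unit = 100.0 / strength["moderate"]   # the good-neutral range is worth 100 units
v = {"L3": 0.0, "L2": 100.0}          # fixed anchor scores
for hi, lo, cat in judgments:
    if hi not in v:                   # propagate judged differences upward
        v[hi] = v[lo] + unit * strength[cat]

print({k: round(x, 1) for k, x in sorted(v.items())})
# -> {'L1': 166.7, 'L2': 100.0, 'L3': 0.0}
```

With these inputs the proposal scores L1 at about 166.7 units; in the validated scale of the case study the evaluators settled on a smaller L1-L2 difference of 65 value units, which illustrates why the final validation and adjustment step matters.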
Table 2
Scale for "scope/quality fulfilment" criterion.
Level | The project…
L1 | …satisfied all the requirements "must have" and "important to have" and most of the "nice to have"
L2 = Good | …satisfied all the requirements "must have" and at least 85 % of the "important to have" and at least 20 % of the "nice to have" (or an equivalent performance on the requirements "important to have" and "nice to have")
L3 = Neutral | …satisfied all the requirements "must have" and at least 60 % of the "important to have" and at least 20 % of the "nice to have" (or an equivalent performance on the requirements "important to have" and "nice to have")
L4 | …did not satisfy one requirement "must have", or satisfied less than 60 % of the requirements "important to have"
L5 | …did not satisfy more than one requirement "must have"

Table 3
Constructed scale for "increase of the number and type of errors identified in each verification cycle" criterion.
Increase in the number of identified errors (E) | Increase in the type of identified errors (T) | Level
10 % | 10 % | E10 T10
10 % | 5 % | E10 T5 = Good
10 % | 0 % | E10 T0
5 % | 10 % | E5 T10
5 % | 5 % | E5 T5
5 % | 0 % | E5 T0 = Neutral
0 % | 0 % | E0 T0

…the number of identified errors (E) is valued more highly when the percentage increase in the type of identified errors (T) is greater. Otherwise, the number and the type of identified errors could have been used as indicators for two separate evaluation criteria.
After the seven criteria had been clearly identified and their descriptors of performance established, the decision-making group was asked whether there was any additional aspect that might be considered in assessing the project's success. The negative response indicated that this set of criteria was exhaustive and, consequently, that the value tree presented in Fig. 3 could be considered complete.

3.5. Value model building

3.5.1. Value functions
As previously described, a descriptor of performance provides a way of measuring the project's performance on its associated criterion. However, to build a value model, we also need to obtain the value of each plausible performance of the project (in the form of a value scale or value function), which requires knowing the preferences of the evaluators upon differences in performances on the corresponding criterion.
For that purpose, we applied the MACBETH method [51]. As described in Section 2.2, the questioning procedure of MACBETH requires the evaluators to answer questions of difference in attractiveness between two performance levels at each time, using the qualitative scale: no (difference in attractiveness), very weak, weak, moderate, strong, very strong, and extreme. The answers provided are used for filling in a matrix of judgments in the M-MACBETH software tool, which analyses the consistency of the answers as soon as they are inserted, and then generates (by linear programming) a proposal of a value scale which is compatible with the answers provided, given the fixed value scores assigned to the "neutral" and the "good" performances (0 and 100 value units, respectively).
We present two examples of applying the MACBETH method to build value functions for criteria with different descriptors of performance: the scope/quality fulfilment criterion with a discrete descriptor, and the time fulfilment criterion with a continuous descriptor.
Fig. 4 presents the matrix of judgments for the scope/quality fulfilment criterion. Table 2 shows the constructed descriptor for this criterion, where: L1 means "the project satisfied all the requirements must have and important to have and the majority of the nice to have", L2 means "the project satisfied all the requirements must have and at least 85 % of the important to have and at least 20 % of the nice to have (or an equivalent performance)", and L3 means "the project satisfied all the requirements must have and at least 60 % of the important to have and at least 20 % of the nice to have (or an equivalent performance)". We can see in Fig. 4 that the difference in attractiveness between "L1" and "L2 = Good" was deemed weak by the evaluators, whereas the difference in attractiveness between "L2 = Good" and "L3 = Neutral" was considered moderate. Therefore, the difference in value between "L1" and "L2 = Good" should be lower than the difference between "L2 = Good" and "L3 = Neutral", which can be confirmed in the value scale presented in Fig. 6a, where the former difference corresponds to 65 value units and the latter to 100.
The time fulfilment criterion has the descriptor of performance "project duration (in weeks)" with the references "96 weeks = Neutral" and "90 weeks = Good". To build a value function for this criterion, first, we created three more equally spaced performance levels: one worse than "neutral" (99 weeks), one between "neutral" and "good" (93 weeks), and one better than "good" (87 weeks). Then, the evaluators judged the differences in attractiveness between each two of these levels, together with the "neutral" and the "good" levels, resulting in the matrix of judgments presented in Fig. 5.
Looking at the diagonal (above the grey shaded cells) of the matrix in Fig. 5, we see that the intensities of the differences in attractiveness between each two consecutive levels increase more when the number of weeks exceeds 93 weeks: the evaluators considered weak the differences in attractiveness between "87" and "90 = Good" (and also between "90 = Good" and "93"), whereas they considered moderate the difference in attractiveness between "93" and "96 = Neutral", and very strong the difference between "96 = Neutral" and "99". Therefore, the difference in value between "87" and "90 = Good" (and also between "90 = Good" and "93") should be lower than the difference in value between "93" and "96 = Neutral", and the latter should also be lower than the difference in value between "96 = Neutral" and "99", which can be confirmed in the value function presented in Fig. 6c (each of the first two intervals corresponds to 40 value units, whereas the third and fourth equal 60 and 160 value units, respectively). Therefore, this function shows that the evaluators considered that increments in time after 93 weeks are increasingly penalizing for the project's success.
We emphasize that the decision group made these judgments for each criterion independently of the performance levels or the differences in attractiveness on the remaining criteria, thereby supporting the assumption of mutual preferential independence between criteria.
Fig. 6 (6a–6g) presents the value functions of all the evaluation criteria.

3.5.2. Criteria weighting
Weighting requires establishing trade-offs between criteria, which is typically demanding because it implies comparing performance improvements on different criteria. The improvements (swings) are defined between the two predefined performance references, "neutral" and "good", in each criterion.
According to the MACBETH weighting procedure, the first step was to rank the "neutral–good" swings in order of decreasing preference (Fig. 7). The evaluators considered the swing from "1 to 2 verification cycles decreased" as the most important one (1st in Fig. 7), which implies that the criterion "reduction of the number of verification cycles (RNVC)" will have the highest weight. In contrast, the criterion "reduction of repetitive work of the users (RRWU)" will obtain the lowest weight because it has the least important "neutral–good" swing
Fig. 4. MACBETH judgment matrix for the “Scope/quality fulfilment” criterion.
Fig. 5. MACBETH judgment matrix for the “time fulfilment” criterion.
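The value function of Fig. 6c can be sketched from the figures stated in the text: 90 weeks is fixed at 100 ("good"), 96 weeks at 0 ("neutral"), and the four interval differences are 40, 40, 60, and 160 value units, giving breakpoint values 140, 100, 60, 0, and −160. A minimal Python sketch follows; linear interpolation between the judged levels, the clamping outside them, and the helper name `time_value` are illustrative assumptions, not part of the published model:

```python
# Piecewise-linear value function for "time fulfilment" (cf. Fig. 6c).
# Breakpoint values follow from the stated interval differences:
# v(87)=140, v(90)=100 (good), v(93)=60, v(96)=0 (neutral), v(99)=-160.
BREAKPOINTS = [(87, 140), (90, 100), (93, 60), (96, 0), (99, -160)]

def time_value(weeks):
    """Interpolate the value score for a project duration in weeks."""
    if weeks <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]
    for (x0, v0), (x1, v1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if weeks <= x1:  # linear interpolation inside this interval
            return v0 + (v1 - v0) * (weeks - x0) / (x1 - x0)
    return BREAKPOINTS[-1][1]

print(time_value(90), time_value(96), time_value(97.5))  # 100.0 0.0 -80.0
```

The steep last segment reproduces the evaluators' view that delays beyond 93 weeks are increasingly penalizing; how durations beyond 99 weeks would be scored is not stated in the text.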
(7th in Fig. 7).
In the second step, the improvements provided by the criteria swings were judged qualitatively using the MACBETH semantic scale (Fig. 8), which allowed filling in the rightmost column in Fig. 9. For example, the improvement provided by the most important swing [RNVC] was considered extreme, whereas the least important "neutral–good" swing [RRWU] was judged weak.
Then, the differences in attractiveness between each two "neutral–good" swings were assessed to fill in the remaining cells of the first row of the weighting matrix and the diagonal above the shaded cells in Fig. 9. For example, Fig. 10 depicts the comparison of the "neutral–good" swings in the reduction of the number of verification cycles (RNVC) criterion and in the increase in the number and type of errors identified in each verification cycle (IncNoType) criterion, which was deemed as very strong (v. strong in Fig. 9). The other cells with no judgments were filled in automatically (by transitiveness) with "P" (positive) judgments by M-MACBETH.
Finally, the software tool applied the linear programming model described in Bana e Costa et al. [51] to generate a proposal of a weighting scale consistent with the qualitative judgments expressed in the weighting matrix, which were subsequently validated by the evaluators (with some minor adjustments), resulting in the weights presented in Fig. 11.

4. Results and discussion

4.1. Model testing and results
At this point, the actual performances of the project are already known for most of the criteria, but not for the reduction of the number of verification cycles (RNVC) criterion, which will only be identified in the long term. Therefore, three alternative scenarios were created with hypothetical future performances on RNVC: no reduction at all (PCB no red of cycles), a decrease of one verification cycle (PCB red 1 cycle), and a decrease of two verification cycles (PCB red 2 cycles). The performances of these scenarios are shown in Table 4.
Applying the value functions previously defined for each criterion to the performances presented in Table 4, we obtain the partial and the overall value scores of the three scenarios shown in Table 5, using the previously assessed criteria weights.
As seen in Table 5, the most advantageous scenario corresponds to "PCB red 2 cycles" with 94.60 overall value units, followed by "PCB red 1 cycle" with 49.60, and "PCB no red of cycles" with −6.65.
Scenarios "PCB red 2 cycles" and "PCB red 1 cycle" undoubtedly denote a successful project independently of the weights assigned to the criteria, because their performances are not worse than "neutral" in any of the criteria and are better than it in several criteria. Therefore, both scenarios dominate [69] a "neutral project". Additionally, we may see that scenario "PCB red 2 cycles" has an overall score very close to that of a "good project" (100 units), whereas the value of scenario "PCB red 1 cycle" is almost mid-distance from a "neutral project" and a "good project".
However, it is not robust to say that the scenario "PCB no red of cycles" corresponds to an unsuccessful project, looking only at its overall value score. We must determine if its overall result will always be worse than that of a "neutral project" when in the face of the uncertainty defined for the model parameters (i.e., the value scores and criteria weights). In fact, the evaluators considered it plausible that: a) each criterion weight ($w_j$, $j = 1, \dots, 7$) may vary within an interval defined by the lower and upper limits ($\underline{w}_j \le w_j \le \overline{w}_j$, $j = 1, \dots, 7$) shown in Table 6; and b) the value scores of the scenario "PCB no red of cycles" may vary by plus or minus 5 value units (respectively denoted by $\underline{v}_j(y_j)$ and $\overline{v}_j(y_j)$, $j = 1, \dots, 7$) in all the criteria for which this scenario has a performance different from "neutral" and "good"; otherwise it will keep 0 and 100, respectively.
The linear programming (LP) problem (2) was then used to test whether a "neutral project" additively dominates [70] the scenario "PCB no red of cycles", which would require a negative $\max D$. The result $\max D = 9.575$ denotes that there is at least one combination of plausible scores and weights for which scenario "PCB no red of cycles" has a higher overall value than that of a "neutral project".
The worst possible overall value for scenario "PCB no red of cycles" was also calculated, with the LP problem (3), resulting in $\min D = -14.10$. Therefore, in the face of the uncertainty, the overall value score of scenario "PCB no red of cycles" may vary between −14.10 and 9.575.

$$\max D = \sum_{j=1}^{7} w_j \left[ \overline{v}_j(y_j) - v_j(\mathrm{neutral}_j) \right] \qquad (2)$$

Subject to:

$$\sum_{j=1}^{7} w_j = 1$$
$$\underline{w}_j \le w_j \le \overline{w}_j, \quad j = 1, \dots, 7$$

$$\min D = \sum_{j=1}^{7} w_j \left[ \underline{v}_j(y_j) - v_j(\mathrm{neutral}_j) \right] \qquad (3)$$
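Because $v_j(\mathrm{neutral}_j) = 0$, problems (2) and (3) amount to optimising $\sum_j w_j v_j(y_j)$ over the bounded weights, with the uncertain scores pushed to their upper (for the maximum) or lower (for the minimum) limits. The sketch below reproduces the reported bounds in plain Python: since the only coupling constraint is that the weights sum to one, starting every weight at its lower limit and assigning the leftover mass greedily to the best coefficients is equivalent to solving the LP for this particular structure (weight limits taken from Table 6; shifted scores from Table 5 for "PCB no red of cycles"):

```python
# Robustness test of problems (2)-(3): optimise sum_j w_j * v_j over the
# weight intervals of Table 6 with the weights summing to one.
# Scores of "PCB no red of cycles" (Table 5), shifted +/-5 units on every
# criterion whose performance differs from "neutral" (0) and "good" (100).
V_HI = [100, 45, 0, 120, -120, 155, 145]   # upper scores (for max D)
V_LO = [100, 35, 0, 110, -130, 145, 135]   # lower scores (for min D)
W_LO = [0.12, 0.05, 0.08, 0.19, 0.40, 0.03, 0.02]    # lower weight limits
W_HI = [0.18, 0.07, 0.10, 0.25, 0.45, 0.04, 0.025]   # upper weight limits

def optimise(values, maximise):
    """Greedy LP solution: start at the lower weight limits and give the
    leftover mass to the best (or worst) coefficients first."""
    w = list(W_LO)
    slack = 1.0 - sum(W_LO)
    for j in sorted(range(7), key=lambda j: values[j], reverse=maximise):
        add = min(slack, W_HI[j] - W_LO[j])
        w[j] += add
        slack -= add
    return sum(wj * vj for wj, vj in zip(w, values))

max_d = optimise(V_HI, maximise=True)    # problem (2)
min_d = optimise(V_LO, maximise=False)   # problem (3)
print(round(max_d, 3), round(min_d, 2))  # 9.575 -14.1
```

Both values match the paper's reported $\max D = 9.575$ and $\min D = -14.10$; a general-purpose LP solver would give the same result.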
Fig. 6. Value functions of criteria: (a) scope/quality fulfilment, (b) cost fulfilment, (c) time fulfilment, (d) increase in the number and type of errors identified in each
verification cycle, (e) reduction of the number of verification cycles, (f) improve efficiency, (g) reduction of the repetitive work of the users.
Fig. 7. Neutral–good swings ranking.
Fig. 8. Neutral–good swings weighting judgments.
Fig. 9. MACBETH weighting matrix (the "P" and "I" within the matrix respectively mean "positive difference in attractiveness" and "indifference").
subject to:
$$\sum_{j=1}^{7} w_j = 1$$
$$\underline{w}_j \le w_j \le \overline{w}_j, \quad j = 1, \dots, 7$$

After concluding the robustness analysis, the evaluation group revisited the model and considered that it could deal with all the plausible performances and adequately considered the value judgments of its members. Therefore, the model has a form and content sufficient to evaluate the project's success [71].

Fig. 10. Assessment of the difference in attractiveness between the "neutral–good" swings in RNVC and IncNoType.
Fig. 11. Criteria weights.

5. Discussion

The absence of a formal evaluation of project success results in the waste of relevant lessons that can be used to enhance project management practices [9,72]. This is a strong reason for implementing well-structured processes to evaluate project success.
Any evaluation process should start by identifying the success criteria according to the decision-makers' preferences and systems of values, which are inherently subjective. We underscore that an evaluation model has an objective component (factual data) and a subjective one (value judgments), which should be independently addressed. Therefore, subjectivity is a key component in an evaluation process, but it should not be confused with ambiguity, which should be avoided. That is why the success evaluation criteria should be carefully identified, and a measure of the performance of a project on each of those criteria must be operationalised. The "neutral" and "good" references of intrinsic value allow identifying the project's success level.
Throughout the development of the evaluation model, the members of the decision-making group were encouraged to engage in open discussion whenever differences of opinion arose. This approach enabled a better understanding of their points of view and helped the group reach an agreement on the way forward.
In the case described herein, the success of the project may depend on the future performance of the reduction of the number of verification cycles (RNVC) criterion. With "no reduction of verification cycles", the project may be unsuccessful, with −6.65 overall value units, caused by its low performance and corresponding negative score (−125 value units) on this criterion. However, as we have seen, given the uncertainty defined for the partial value scores and the criteria weights, this scenario is not guaranteed to correspond to a negative evaluation. In fact, its overall value may vary between −14.10 and 9.575 units.
With a "reduction of 1 verification cycle", the project would obtain 49.60 overall value units, which is nearly a mid-distance evaluation between a "good project" and a "neutral project". With a "reduction of 2 verification cycles", the project would obtain 94.60 overall value units, which is very close to that of a "good project".
Developing a transparent evaluation process, such as the one described here, will promote the decision-making group's understanding and acceptance of the results. The participation of the decision-makers in all of the process phases is a key element for this purpose, which will allow them to develop a sense of ownership of the model [63]. However, this is not a practice found in the literature related to evaluating project success, which offers an opportunity for improvement.
The proposed process, which integrates a problem structuring

Table 4
Performance profiles of the project's success for the three scenarios.
Scenario | ScoQual | Cost (k€) | Time (weeks) | IncNoType | RNVC | ImpEff (%) | RRWU (%)
PCB no red of cycles | L2 | 480 | 96 | E10 T10 | No decrease | 60 | 15
PCB red 1 cycle | L2 | 480 | 96 | E10 T10 | Decrease 1 cycle | 60 | 15
PCB red 2 cycles | L2 | 480 | 96 | E10 T10 | Decrease 2 cycles | 60 | 15

Table 5
Value scores of the project's success for the three scenarios.
Scenario | ScoQual (15 %) | Cost (5 %) | Time (8 %) | IncNoType (22 %) | RNVC (45 %) | ImpEff (3 %) | RRWU (2 %) | Overall value score
PCB no red of cycles | 100 | 40 | 0 | 115 | −125 | 150 | 140 | −6.65
PCB red 1 cycle | 100 | 40 | 0 | 115 | 0 | 150 | 140 | 49.60
PCB red 2 cycles | 100 | 40 | 0 | 115 | 100 | 150 | 140 | 94.60
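As a cross-check, each overall score in Table 5 is simply the weighted sum of the partial scores under the additive value model, $\sum_{j=1}^{7} w_j v_j$. A minimal sketch:

```python
# Additive value model behind Table 5: overall = sum_j weight_j * score_j.
WEIGHTS = [0.15, 0.05, 0.08, 0.22, 0.45, 0.03, 0.02]  # ScoQual..RRWU

SCENARIOS = {
    "PCB no red of cycles": [100, 40, 0, 115, -125, 150, 140],
    "PCB red 1 cycle":      [100, 40, 0, 115,    0, 150, 140],
    "PCB red 2 cycles":     [100, 40, 0, 115,  100, 150, 140],
}

overall = {name: round(sum(w * v for w, v in zip(WEIGHTS, scores)), 2)
           for name, scores in SCENARIOS.items()}
print(overall)
# {'PCB no red of cycles': -6.65, 'PCB red 1 cycle': 49.6, 'PCB red 2 cycles': 94.6}
```

The large negative RNVC score (−125), carrying the highest weight (45 %), is what pulls the "PCB no red of cycles" scenario below the "neutral project" reference of 0.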
Table 6
Plausible intervals for the criteria weights.
Criterion | ScoQual | Cost | Time | IncNoType | RNVC | ImpEff | RRWU
Index (j) | 1 | 2 | 3 | 4 | 5 | 6 | 7
Current weight ($w_j$) | 15 % | 5 % | 8 % | 22 % | 45 % | 3 % | 2 %
Upper limit ($\overline{w}_j$) | 18 % | 7 % | 10 % | 25 % | 45 % | 4 % | 2.5 %
Lower limit ($\underline{w}_j$) | 12 % | 5 % | 8 % | 19 % | 40 % | 3 % | 2 %

method with a multi-criteria decision analysis (MCDA) approach for evaluating the success of information technology (IT) projects, offers several significant theoretical contributions to the fields of project management, decision sciences, and IS. First, it advances the conceptual understanding of IT project success by addressing its inherently multidimensional and context-dependent nature. Traditional models often rely on narrow success criteria—such as time, cost, and scope—while this research introduces a more holistic and stakeholder-sensitive framework. By incorporating problem structuring methods, the process facilitates the elicitation and organization of the stakeholder perspectives, which are often overlooked or underrepresented in conventional evaluation models. This contributes to theory by emphasizing the social and interpretive dimensions of project success, aligning with contemporary views that success is not an objective outcome but a negotiated construct [73].
Second, the integration of MCDA techniques provides a rigorous and transparent mechanism for prioritizing and aggregating evaluation criteria, thereby enhancing the methodological robustness of success assessment. This methodological synthesis bridges a gap in the literature by demonstrating how qualitative insights from problem structuring can be systematically translated into quantitative decision models. Theoretically, this supports the development of hybrid evaluation frameworks that are both contextually grounded and analytically sound. Third, the application of the proposed process in a real-world case adds empirical depth to the theoretical model, offering evidence of its practical relevance and adaptability. This empirical grounding strengthens the external validity of the framework and encourages further theoretical exploration across different organizational and project contexts.
The MACBETH approach has been successfully employed, with different nuances and across various processes, to evaluate projects or decision alternatives in diverse problem settings and for a wide range of organizations [74]. The process described in this paper, which combines problem structuring with the MACBETH approach and robustness analysis, may also be applied in other contexts, subject to the necessary adjustments.
Our proposed process can also be scaled to the program or portfolio level, although this should be done with caution. In the case presented here, we applied an additive value function model, which is compensatory—meaning that poor performance on one criterion can be offset by good performance on others. However, this assumption may not always hold. In a program or portfolio context, for instance, if a key project performs poorly, that alone may render the entire program or portfolio unsuccessful, regardless of the performance of the remaining projects. In such cases, a mixed model should be adopted, combining classification rules to address the non-compensatory criteria with an additive component for the compensatory ones.
Moreover, the research highlights the absence of standardized approaches for evaluating IT project success, which has long been a limitation in both academic and professional domains. Standardization facilitates the dissemination of knowledge and enhances predictability, thereby minimizing uncertainty and reducing risk [75]. By proposing a replicable and adaptable process, the study lays the groundwork for the development of formalized evaluation standards. This has implications for theory-building, as it suggests a pathway toward unifying fragmented evaluation practices under a coherent, theoretically informed model. In doing so, it contributes to the ongoing discourse on standardization in project management and information systems evaluation, encouraging future research to refine, validate, and extend the proposed framework. Ultimately, this work not only enriches theoretical understanding but also provides a foundation for more consistent, transparent, and stakeholder-aligned evaluation practices in the IT project domain.

6. Conclusions

Evaluating the success of IT projects should be a mandatory project management activity. However, this is not observed in practice [11,72]. There are several contributions given by the process herein described, which can be easily adapted to other evaluation problems:

• It shows how a multi-criteria approach may be used to evaluate IT (software development) projects while avoiding committing critical mistakes.
• It offers a transparent process.
• It involves the decision-makers in all of the model development tasks.
• It identifies the fundamental objectives of decision-makers with the help of a problem structuring method, avoiding ending up solving the wrong problem [76].
• It allows establishing quantitative and substantively meaningful [23] trade-offs between criteria (i.e., mathematically valid and unambiguously understood).
• It allows the management of the project to focus on what matters for the project's success.
• It can be implemented to evaluate the success of other projects, in similar or different contexts.
• The use of descriptors of performance clarifies what is intended to be achieved in each criterion.
• It distinguishes performance from value, instead of directly attributing scores to the project, mixing these two components.
• And, it allows creating value scales adjusted to the preferences of evaluators, upon different types of performance (e.g., qualitative or quantitative, continuous or discrete).

Additionally, it enables the identification of alternative scenarios to deal with unknown future performances and to test the robustness of the conclusions considering uncertainties on the model parameters.
In the target organization, given the shortcomings recognised in a previous "grid scoring model", the multi-criteria evaluation model of the real-world case described in this paper was built during an advanced stage of the project's development. This late development can be considered a threat to internal validity regarding consistency and a limitation, since the evaluation model should be built during the planning phase of a project and revisited during the project development to be improved, if needed, or adjusted to possible changes to the project aim. Another threat, to external validity, should also be disclosed. Namely, concerning scalability, further research is needed to test if the proposed process can be scaled or adapted for different project sizes or types.
In future work, it would be interesting to create a process capable of dealing with all project phases, allowing the evaluation of its development and evolution at several milestones, from the project initiation until its termination. The process described in this paper may be extended to evaluate project success throughout the project lifecycle. This requires developing a model that includes both final and
intermediate objectives (criteria) for measuring project success. The intermediate objectives should be used during project development and later deactivated by setting their weights to zero and rescaling the remaining criteria weights so that they sum to one. Monitoring the evolution of a project's success against a well-defined set of criteria will allow identifying problems sooner and taking proper measures in time. Furthermore, the integration of the proposed evaluation process in the success management process [77] will add value to the management efforts.
Finally, since artificial intelligence technology, especially with the rise of Large Language Models (LLMs), has shown great potential in revolutionizing the automation of various complex tasks [78], it is imperative to explore it in the context of success evaluation.

CRediT authorship contribution statement

João Carlos Lourenço: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Conceptualization. João Varajão: Writing – review & editing, Writing – original draft, Validation, Methodology, Investigation, Data curation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Unit Project Scope UID/00319/2025 – Centro ALGORITMI (ALGORITMI/UM). João C. Lourenço acknowledges the financial support of Portuguese funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., under the project UID/97/2025 (CEGIST).

Data availability

The data is presented in the article.

References

[1] R. Colomo-Palacios, I. González-Carrasco, J.L. López-Cuadrado, A. Trigo, J.E. Varajao, I-Competere: using applied intelligence in search of competency gaps in software project managers, Inf. Syst. Front. 16 (4) (2014) 607–625, https://doi.org/10.1007/s10796-012-9369-6.
[2] M.A. Kafaji, Interchange roles of formal and informal project management on business operational success, Prod. Plan. Control (2022) 1–21, https://doi.org/10.1080/09537287.2022.2089265.
[3] L.A. Ika, J.K. Pinto, The "re-meaning" of project success: updating and recalibrating for a modern project management, Int. J. Proj. Manag. 40 (7) (2022) 835–848, https://doi.org/10.1016/j.ijproman.2022.08.001.
[4] B. Lobato, J. Varajão, C. Tam, A.A. Baptista, CrEISPS—a framework of criteria for evaluating success in information systems projects, Procedia Comput. Sci. 256 (2025) 1821–1835, https://doi.org/10.1016/j.procs.2025.02.323.
[5] N. Agarwal, U. Rathod, Defining success for software projects: an exploratory revelation, Int. J. Proj. Manag. 24 (4) (2006) 358–370, https://doi.org/10.1016/j.ijproman.2005.11.009.
[6] R. Atkinson, Project management: cost, time and quality, two best guesses and a phenomenon, it's time to accept other success criteria, Int. J. Proj. Manag. 17 (6) (1999) 337–342, https://doi.org/10.1016/S0263-7863(98)00069-6.
[7] H. Landrum, V.R. Prybutok, X. Zhang, The moderating effect of occupation on the perception of information services quality and success, Comput. Ind. Eng. 58 (1) (2010) 133–142, https://doi.org/10.1016/j.cie.2009.09.006.
[8] J.K. Pinto, D.P. Slevin, Project success: definitions and measurement techniques, Proj. Manag. J. 19 (1) (1988) 67–72.
[9] J. Varajão, L. Magalhães, L. Freitas, P. Rocha, Success management—from theory to practice, Int. J. Proj. Manag. 40 (5) (2022) 481–498, https://doi.org/10.1016/j.ijproman.2022.04.002.
[10] J. Varajão, J.C. Lourenço, J. Gomes, Models and methods for information systems project success evaluation—a review and directions for research, Heliyon 8 (12) (2022), https://doi.org/10.1016/j.heliyon.2022.e11977.
[11] J. Varajão, J.Á. Carvalho, Evaluating the success of IS/IT projects: how are companies doing it?, in: Proceedings of the 13th Pre-ICIS International Research Workshop on IT Project Management (IRWITPM 2018), San Francisco, USA, 2018.
[12] R.L. Keeney, Common mistakes in making value trade-offs, Oper. Res. 50 (6) (2002) 935–945, https://doi.org/10.1287/opre.50.6.935.357.
[13] J.E. Russo, P.J.H. Schoemaker, Decision Traps: The Ten Barriers to Brilliant Decision-Making and How to Overcome Them, Doubleday, 1989.
[14] S. Lipovetsky, A. Tishler, D. Dvir, A. Shenhar, The relative importance of project success dimensions, R&D Manag. 27 (2) (1997) 97–106, https://doi.org/10.1111/1467-9310.00047.
[15] J. Shapiro, Monitoring and evaluation, C.-W. A. f. C. Participation, 2005. https://www.civicus.org/view/media/Monitoring%20and%20Evaluation.pdf.
[16] B. Kahan, M. Goodstadt, The IDM manual: basics, 2005. http://sites.utoronto.ca/chp/download/IDMmanual/IDM_basics_dist05.pdf.
[17] V. Arumugam, J. Antony, M. Kumar, Linking learning and knowledge creation to project success in Six Sigma projects: an empirical investigation, Int. J. Prod. Econ. 141 (1) (2013) 388–402, https://doi.org/10.1016/j.ijpe.2012.09.003.
[18] R. Linzalone, G. Schiuma, A review of program and project evaluation models, Meas. Bus. Excell. 19 (3) (2015) 90–99, https://doi.org/10.1108/MBE-04-2015-0024.
[19] P.L. Bannerman, A. Thorogood, Celebrating IT projects success: a multi-domain analysis, in: Proceedings of the 45th Hawaii International Conference on System Sciences, Maui, HI, 2012.
[20] C. Barclay, K. Osei-Bryson, Determining the contribution of IS projects: an approach to measure performance, in: Proceedings of the 42nd Hawaii International Conference on System Sciences, Waikoloa, HI, 2009.
[21] R.L. Keeney, Value-Focused Thinking: A Path to Creative Decisionmaking, Harvard University Press, 1992.
[22] R. Solingen, E. Berghout, The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development, McGraw-Hill, 1999.
[23] S. French, Decision Theory: An Introduction to the Mathematics of Rationality, Ellis Horwood, 1986.
[24] R. Göb, C. McCollin, M. Ramalhoto, Ordinal methodology in the analysis of Likert scales, Qual. Quant. 41 (5) (2007) 601–626, https://doi.org/10.1007/s11135-007-9089-z.
[25] S.S. Stevens, On the theory of scales of measurement, Science 103 (2684) (1946) 677–680, https://doi.org/10.1126/science.103.2684.677.
[26] W. Edwards, J.R. Newman, Multiattribute evaluation, in: T. Connolly, H.R. Arkes, K.R. Hammond (Eds.), Judgment and Decision Making: An Interdisciplinary Reader, 2nd ed., Cambridge University Press, 2000, pp. 17–34.
[27] R. von Nitzsch, M. Weber, The effect of attribute ranges on weights in multiattribute utility measurements, Manag. Sci. 39 (8) (1993) 937–943, https://doi.org/10.1287/mnsc.39.8.937.
[28] A. Basar, A novel methodology for performance evaluation of IT projects in a fuzzy environment: a case study, Soft Comput. 24 (14) (2020) 10755–10770, https://doi.org/10.1007/s00500-019-04579-y.
[29] H.N. Ismail, Measuring success of water reservoir project by using delphi and priority evaluation method, in: Proceedings of the IOP Conference Series: Earth and Environmental Science 588, 2020, 042021, https://doi.org/10.1088/1755-1315/588/4/042021.
[30] J.H. Yu, H.R. Kwon, Critical success factors for urban regeneration projects in Korea, Int. J. Proj. Manag. 29 (7) (2011) 889–899, https://doi.org/10.1016/j.ijproman.2010.09.001.
[31] A. Nguvulu, S. Yamato, T. Honma, Project performance evaluation using deep belief networks, IEEJ Trans. Electron. Inf. Syst. 132 (2) (2012) 306–312, https://doi.org/10.1541/ieejeiss.132.306.
[32] C. Wohlin, A.A. Andrews, Assessing project success using subjective evaluation factors, Softw. Qual. J. 9 (1) (2001) 43–70, https://doi.org/10.1023/a:1016673203332.
[33] X. Yan, Utilizing the BSC method for IT performance evaluation of construction companies, in: Proceedings of the First International Conference on Information Science and Engineering, Nanjing, China, 2009.
[34] R.S. Kaplan, D.P. Norton, The balanced scorecard—measures that drive performance, Harv. Bus. Rev. 70 (1) (1992) 71–79.
[35] C.L. Yang, R.H. Huang, M.T. Ho, Multi-criteria evaluation model for a software development project, in: Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management, Hong Kong, China, 2009.
[36] T.L. Saaty, The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation, McGraw-Hill, 1980.
[37] C.A. Bana e Costa, J.C. Vansnick, A critical analysis of the eigenvalue method used to derive priorities in AHP, Eur. J. Oper. Res. 187 (3) (2008) 1422–1428, https://doi.org/10.1016/j.ejor.2006.09.022.
[38] J.S. Dyer, Remarks on the analytic hierarchy process, Manag. Sci. 36 (3) (1990) 249–258, https://doi.org/10.1287/mnsc.36.3.249.
[39] P. Goodwin, G. Wright, Decision Analysis for Management Judgment, 5th ed., John Wiley & Sons, 2014.
[40] V. Belton, T.J. Stewart, Multiple Criteria Decision Analysis: An Integrated Approach, Kluwer Academic Publishers, 2002.
[41] R.L. Keeney, D. von Winterfeldt, Practical value models, in: W. Edwards, R.F. Miles Jr., D. von Winterfeldt (Eds.), Advances in Decision Analysis: From Foundations to Applications, Cambridge University Press, 2007, pp. 232–252.
J.C. Lourenço and J. Varajão Computer Standards & Interfaces 97 (2026) 104122

View File

@@ -0,0 +1,726 @@
Computer Standards & Interfaces 97 (2026) 104117
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
ARMOR: A multi-layered adaptive defense framework for robust deep
learning systems against evolving adversarial threats✩
Mahmoud Mohamed *, Fayaz AlJuaid
Electrical and Computer Engineering, King Abdul Aziz University, Saudi Arabia
ARTICLE INFO

Keywords: Adversarial machine learning; Deep learning security; Multi-layered defense; Robustness evaluation; Adaptive security

ABSTRACT

Introduction: Adversarial attacks represent a major challenge to deep learning models deployed in critical fields such as healthcare diagnostics and financial fraud detection. This paper addresses the limitations of single-strategy defenses by introducing ARMOR (Adaptive Resilient Multi-layer Orchestrated Response), a novel multi-layered architecture that seamlessly integrates multiple defense mechanisms.

Methodology: We evaluate ARMOR against seven state-of-the-art defense methods through extensive experiments across multiple datasets and five attack methodologies. Our approach combines adversarial detection, input transformation, model hardening, and adaptive response layers that operate with intentional dependencies and feedback mechanisms.

Results: Quantitative results demonstrate that ARMOR significantly outperforms individual defense methods, achieving a 91.7% attack mitigation rate (18.3% improvement over ensemble averaging), 87.5% clean accuracy preservation (8.9% improvement over adversarial training alone), and 76.4% robustness against adaptive attacks (23.2% increase over the strongest baseline).

Discussion: The modular framework design enables flexibility against emerging threats while requiring only 1.42× computational overhead compared to unprotected models, making it suitable for resource-constrained environments. Our findings demonstrate that activating and integrating complementary defense mechanisms represents a significant advance in adversarial resilience.
1. Introduction

Deep learning technologies have been widely adopted in critical sectors including autonomous vehicles, medical diagnostics, and cybersecurity. While they offer powerful capabilities, they also introduce new security vulnerabilities. Adversarial examples—carefully crafted inputs designed to deceive models—pose significant risks to AI systems [1,2]. Small, seemingly imperceptible distortions can cause state-of-the-art models to misclassify inputs, which may have life-threatening consequences in safety-critical applications [3].

Recent advances in deep learning have highlighted the importance of robust defense mechanisms. For example, UNet-based segmentation models in medical imaging have achieved approximately 96% accuracy in COVID-19 detection from CT scans [4]. Similarly, CNN and BiGRU models have demonstrated strong performance in traffic network analysis with an R-squared of 0.9912 [5]. These successes underscore the critical need for robust defenses, particularly as deep learning models are increasingly integrated into high-stakes decision-making processes.

However, existing defenses are typically based on single strategies such as adversarial training [6], input preprocessing [7], or detection models [8]. While effective against specific attacks, these methods often fail when facing diverse or adaptive attacks [9]. This limitation is increasingly concerning as adversaries continue to evolve their strategies. Furthermore, existing techniques often suffer from high computational costs, degraded performance on clean data, and continued susceptibility to adaptive attacks [10].

Problem Statement: This paper addresses the vulnerability of deep learning systems to adversarial attacks in mission-critical environments. Current defenses exhibit three key weaknesses:

1. They typically optimize for a single threat model, leaving them exposed to diverse attack strategies.
2. They employ static approaches that cannot adapt to evolving threats.
3. They fail to balance performance and security, often sacrificing accuracy on benign data.
✩ This article is part of a Special issue entitled: Secure AI published in Computer Standards & Interfaces.
* Corresponding author.
E-mail address: mhassan0085@stu.kau.edu.sa (M. Mohamed).
https://doi.org/10.1016/j.csi.2025.104117
Received 2 June 2025; Received in revised form 2 December 2025; Accepted 12 December 2025
Available online 17 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
M. Mohamed and F. AlJuaid Computer Standards & Interfaces 97 (2026) 104117
These weaknesses motivate the need for an agile and flexible defense architecture.

Research Gaps: Our comprehensive literature survey, following systematic review methodologies [11], identifies several critical gaps:

• Most defenses optimize for a single threat model, creating vulnerabilities across diverse attack strategies [12].
• Current ensemble approaches typically use simple voting or averaging, failing to leverage the complementary strengths of different defense mechanisms [13].
• There is insufficient focus on dynamic adaptation to evolving threats in real-time operational environments [14].
• The performance-security trade-off is poorly addressed, with many techniques significantly degrading model performance on benign inputs [15].

Our ARMOR framework addresses these gaps through:

• Orchestrated Integration: Complementary defense layers operate cooperatively rather than in isolation.
• Dynamic Threat Assessment: Adaptive response mechanisms learn from observed attack patterns.
• Explicit Trade-off Optimization: High clean accuracy is maintained while improving robustness.
• Comprehensive Testing: Evaluation across diverse attacks, including engineered adaptive attacks.
• Modular Design: New defense mechanisms can be incorporated as they emerge.

As shown in Table 1, our method advances the state-of-the-art across multiple performance dimensions while maintaining reasonable computational overhead.

2. Related work

This section analyzes current adversarial defense mechanisms, their limitations, and specific gaps our framework addresses. We categorize existing work into adversarial training, input transformation, detection-based methods, certified robustness, and ensemble approaches.

2.1. Adversarial training methods

Adversarial training remains one of the most effective empirical defense mechanisms. Madry et al. [6] introduced PGD adversarial training, which serves as a strong baseline but suffers from reduced clean accuracy and high computational cost.

Recent advances include TRADES [15], which explicitly regularizes the trade-off between standard accuracy and robustness; Fast Adversarial Training [16], which improves computational efficiency using FGSM with randomization; and Robust Self-Training (RST) [17], which leverages additional unlabeled data to enhance robustness.

Despite these improvements, adversarial training techniques remain fundamentally constrained: they are typically resistant only to attacks encountered during training, often fail on out-of-distribution samples, and exhibit reduced performance on clean data [18].

2.2. Input transformation approaches

Input transformation methods aim to remove adversarial perturbations before model inference. Guo et al. [7] explored various image transformations, finding that total variance minimization and image quilting provide moderate robustness. Xie et al. [19] proposed random resizing and padding as preprocessing defenses.

More recent work includes Neural Representation Purifiers [20], which use self-supervised learning to clean adversarial inputs, and ComDefend [21], a compression-decompression architecture that eliminates adversarial perturbations.

While these methods often preserve accuracy better than adversarial training, they remain vulnerable to adaptive attacks that account for the transformation process [10].

2.3. Detection-based defenses

Detection methods aim to identify adversarial examples without necessarily correcting them. Metzen et al. [8] attached a binary detector subnetwork to identify adversarial inputs. Lee et al. [22] used Mahalanobis distance-based confidence scores to detect out-of-distribution samples.

Recent approaches include statistical methods using odds ratio tests [23] and Local Intrinsic Dimensionality (LID) [24] to characterize adversarial regions in feature space.

While detection mechanisms can be accurate, adaptive attacks specifically target their vulnerabilities [25]. Moreover, they do not provide predictions for identified adversarial examples.

2.4. Certified robustness approaches

Certified defenses provide theoretical guarantees that perturbations within certain bounds will not alter predictions. Cohen et al. [26] applied randomized smoothing to create certifiably robust classifiers against L2-norm bounded perturbations. Gowal et al. [27] developed interval bound propagation for training verifiably robust networks.

Recent progress includes DeepPoly [28], which provides tighter bounds for neural network verification, and improved certification bounds for cascading architectures [29].

While certified methods offer valuable theoretical assurances, they generally achieve lower empirical robustness than adversarial training and can be significantly more resource-intensive [30].

2.5. Ensemble and hybrid approaches

Ensemble methods combine multiple models or defense mechanisms to enhance robustness. Tramèr et al. [31] proposed Ensemble Adversarial Training, which augments training data with adversarial examples from other models. Pang et al. [13] introduced adaptive diversity promoting (ADP) training to develop robust ensemble models. Sen et al. [32] integrated detection and adversarial training in a two-stage process.

However, most current ensembles employ basic averaging or voting schemes that fail to leverage the complementary strengths of different defense types [33].

2.6. Research gaps and contributions

Based on our literature review, we identify the following critical research gaps:

• Poor Integration: Most studies focus on single defenses or simple combinations that fail to leverage synergistic effects.
• Static Defense Mechanisms: Current approaches use fixed strategies that cannot adapt to evolving threats.
• Performance-Security Trade-offs: Robust models frequently sacrifice clean-data accuracy.
• Lack of Standardization: Inconsistent evaluation protocols hinder fair comparisons.
• Insufficient Adaptive Attack Testing: Most defenses are not evaluated against adaptive attacks designed to circumvent them.

Our ARMOR framework addresses these gaps through:

• Orchestrated Integration: Complementary defense layers operate cooperatively rather than in isolation.
• Dynamic Threat Assessment: Response mechanisms adapt based on observed attack patterns.
• Explicit Trade-off Optimization: High clean accuracy is maintained while improving robustness.
• Comprehensive Testing: Evaluation across diverse attacks, including engineered adaptive attacks.
Table 1
Comparison of state-of-the-art adversarial defense methods (2020–2025).
Reference Year Defense type Multi-attack robustness Clean accuracy Computation overhead Adaptive attack resistance
Madry et al. [6] 2018 Adversarial training Medium (66.4%) Low (87.3%) High (10×) Medium (54.2%)
Zhang et al. [15] 2019 Adv. training (TRADES) Medium (73.5%) Medium (84.9%) High (7×) Medium (61.8%)
Cohen et al. [26] 2019 Certified defense Low (49.2%) Medium (83.5%) Very high (30×) High (guaranteed bounds)
Wong et al. [16] 2020 Fast Adv. training Medium (71.2%) Medium-high (85.8%) Medium (3×) Medium (58.3%)
Rebuffi et al. [17] 2021 Robust self-training High (76.5%) Medium-high (86.1%) High (12×) Medium-high (64.5%)
Ma et al. [24] 2021 Detection-based Low-medium (detection only) Very high (99.1%) Low (1.2×) Low (35.6%)
Naseer et al. [20] 2020 Input transformation Medium (68.7%) High (88.3%) Medium (2.5×) Low (42.1%)
Pang et al. [13] 2019 Ensemble Medium-high (74.8%) Medium (83.2%) Very high (15×) Medium (63.1%)
Sen et al. [32] 2020 Hybrid Medium-high (75.1%) Medium (83.9%) High (8×) Medium (62.5%)
Kariyappa et al. [34] 2019 Diversity ensemble Medium-high (73.9%) Medium (84.1%) Very high (18×) Medium-high (65.8%)
Jia et al. [21] 2019 Stochastic defense Medium (67.2%) High (89.5%) Low (1.5×) Low-medium (53.6%)
Gowal et al. [27] 2019 Interval bound Prop. Medium (68.8%) Medium (82.8%) High (9×) High (certified regions)
Yang et al. [29] 2020 Certified defense Medium (64.3%) Medium (84.2%) High (7×) High (certified regions)
Croce et al. [30] 2022 Regularization Medium-high (73.8%) Medium-high (85.7%) Medium (4×) Medium (60.9%)
Wei et al. [35] 2021 Adv. distillation Medium-high (75.6%) Medium-high (86.3%) Medium (3.5×) Medium-High (64.2%)
Our work (ARMOR) 2025 Multi-layered Very high (91.7%) High (87.5%) Low-medium (1.42×) High (76.4%)
Fig. 1. ARMOR framework architecture showing the orchestrated multi-layered defense approach.
• Modular Design: New defense mechanisms can be incorporated as they emerge.

As shown in Table 1, ARMOR advances the state-of-the-art across multiple performance dimensions while maintaining reasonable computational overhead.

3. Methodology

This section describes the ARMOR framework architecture and its components.

3.1. Framework overview

As shown in Fig. 1, ARMOR integrates four complementary defense layers:

• Threat Assessment Layer: Analyzes inputs to detect potential adversarial examples and characterize their properties.
• Input Transformation Layer: Applies appropriate preprocessing techniques to remove or reduce adversarial perturbations.
• Model Robustness Layer: Employs robust model architectures and training techniques to withstand remaining adversarial effects.
• Adaptive Response Layer: Dynamically adjusts defense strategies based on observed attack patterns and feedback.

Unlike static pipeline approaches, ARMOR uses an orchestration mechanism to dynamically route inputs through the most effective combination of defense components based on threat assessment and historical performance data. This orchestrated approach provides stronger protection than any single layer or static combination.

3.2. Threat assessment layer

The threat assessment layer employs multiple detection methods to identify and classify adversarial examples.
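To make the layered design of Section 3.1 concrete, here is a minimal sketch of how an orchestrator could route an input through the four layers, applying heavier defenses only when the threat score is high. Every class and function name below is hypothetical and illustrative, not taken from the paper's implementation:

```python
# Hypothetical sketch of ARMOR-style layered routing (Section 3.1).
# Names and the routing rule are illustrative, not from the paper.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Assessment:
    threat_score: float         # T(x) in [0, 1], from the threat assessment layer
    attack_vector: List[float]  # a(x), the attack characterization


@dataclass
class ArmorPipeline:
    assess: Callable[[float], Assessment]            # threat assessment layer
    transforms: Dict[str, Callable[[float], float]]  # input transformation layer
    predict: Callable[[float], int]                  # model robustness layer
    effectiveness: Dict[str, float] = field(default_factory=dict)  # adaptive layer

    def route(self, x: float, threshold: float = 0.5) -> Tuple[int, List[str]]:
        a = self.assess(x)
        if a.threat_score < threshold:
            return self.predict(x), []  # low threat: skip costly defenses
        # High threat: apply transforms ordered by historical effectiveness.
        order = sorted(self.transforms, key=lambda d: -self.effectiveness.get(d, 0.0))
        applied = []
        for name in order:
            x = self.transforms[name](x)
            applied.append(name)
        return self.predict(x), applied


# Toy usage: scalar inputs above 10 are "suspicious" and get denoised first.
pipe = ArmorPipeline(
    assess=lambda x: Assessment(0.9 if x > 10 else 0.1, [0.0]),
    transforms={"denoise": lambda v: v - 10, "smooth": lambda v: v},
    predict=lambda v: int(v > 5),
    effectiveness={"denoise": 0.8, "smooth": 0.2},
)
assert pipe.route(2) == (0, [])
assert pipe.route(20) == (1, ["denoise", "smooth"])
```

The design point mirrored here is that low-threat inputs bypass the transformation stack entirely, which is how the paper keeps the average overhead near 1.42× despite a full defense stack costing 2.8×.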
3.2.1. Feature space analysis

We compute the Mahalanobis distance between an input sample x and the distribution of legitimate training examples in the feature space. For each layer l of the neural network, we model the class-conditional distribution of legitimate examples as a multivariate Gaussian with parameters $\mu_c^l$ and $\Sigma^l$, where c represents the predicted class. The Mahalanobis distance score $M^l(x)$ is computed as:

$$M^{l}(x) = \min_{c}\, \big(f^{l}(x) - \mu_{c}^{l}\big)^{T} \big(\Sigma^{l}\big)^{-1} \big(f^{l}(x) - \mu_{c}^{l}\big) \tag{1}$$

where $f^l(x)$ represents the feature vector at layer l for input x.

3.2.2. Prediction consistency check

We measure the consistency of model predictions when the input is subjected to small benign transformations. Given a set of k transformations $\{T_1, T_2, \ldots, T_k\}$ and model f, the consistency score C(x) is defined as:

$$C(x) = \frac{1}{k} \sum_{i=1}^{k} \mathbb{I}\big[f(T_i(x)) = f(x)\big] \tag{2}$$

where $\mathbb{I}[\cdot]$ is the indicator function.

3.2.3. Frequency domain analysis

We perform discrete wavelet transform (DWT) on the input to analyze its frequency characteristics. Adversarial perturbations often exhibit distinctive patterns in high-frequency components. We compute the energy distribution across frequency bands and compare it to the typical distribution in legitimate samples. The frequency abnormality score F(x) is calculated as:

$$F(x) = \sum_{i=1}^{m} w_i \cdot \big|E_i(x) - \mu_{E_i}\big| \tag{3}$$

where $E_i(x)$ is the energy in frequency band i, $\mu_{E_i}$ is the mean energy for legitimate samples in that band, and $w_i$ are learned weights.

3.2.4. Integrated threat score

The individual detection scores are combined into an integrated threat score T(x) using a logistic regression model:

$$T(x) = \sigma\big(w_M M(x) + w_C C(x) + w_F F(x) + b\big) \tag{4}$$

where $\sigma$ is the sigmoid function, and $w_M$, $w_C$, $w_F$, and b are learned parameters.

In addition to binary adversarial/legitimate classification, the threat assessment layer provides an attack characterization vector a(x) that estimates properties such as attack strength, perceptibility, and targeted/untargeted nature:

$$a(x) = g\big(M(x), C(x), F(x), f(x)\big) \tag{5}$$

where g is a small neural network trained on a diverse set of known attacks.

3.3. Input transformation layer

The input transformation layer employs multiple preprocessing techniques to remove or reduce adversarial perturbations. Rather than applying all transformations sequentially (which would degrade clean performance), ARMOR selectively applies the most appropriate transformations based on threat assessment.

3.3.1. Adaptive denoising

We employ a conditional autoencoder $D_\theta$ trained to remove adversarial perturbations while preserving semantic content. The denoising process is conditioned on the attack characterization vector a(x):

$$\hat{x} = D_\theta\big(x, a(x)\big) \tag{6}$$

This conditioning allows the denoiser to adapt its behavior based on the detected attack type, improving both effectiveness and clean data preservation.

3.3.2. Frequency domain filtering

Based on the frequency analysis from the threat assessment layer, we apply targeted filtering to remove adversarial components in specific frequency bands. For an input x, we compute its wavelet transform W(x), apply a filtering function $\phi$ to the coefficients, and compute the inverse transform:

$$\hat{x} = W^{-1}\big(\phi(W(x), a(x))\big) \tag{7}$$

The filtering function $\phi$ adapts based on the attack characterization, targeting frequency bands most likely to contain adversarial perturbations.

3.3.3. Randomized smoothing

For inputs with high uncertainty, we apply randomized smoothing with Gaussian noise:

$$\hat{x} = x + \mathcal{N}\big(0, \sigma^2 I\big) \tag{8}$$

where $\sigma$ is dynamically adjusted based on the threat score and attack characterization, increasing for high-threat inputs to provide stronger smoothing.

3.4. Model robustness layer

The model robustness layer integrates multiple robust architectures and training techniques:

3.4.1. Diverse model ensemble

We employ an ensemble of models with diverse architectures and training procedures:

$$\mathcal{F} = \{f_1, f_2, \ldots, f_n\} \tag{9}$$

Instead of simple averaging, we compute weighted predictions based on each model's historical performance against the detected attack type:

$$p(y|x) = \sum_{i=1}^{n} w_i\big(a(x)\big) \cdot p_i(y|x) \tag{10}$$

where $w_i(a(x))$ is the weight assigned to model i based on the attack characterization a(x).

3.4.2. Feature denoising

We incorporate feature denoising modules at multiple network levels. For a feature map h, the denoised features $\hat{h}$ are computed as:

$$\hat{h} = h + \gamma\, G\big(h, a(x)\big) \tag{11}$$

where G is a non-local denoising function and $\gamma$ is a learnable parameter controlling denoising strength.

3.4.3. Robust training objective

Models in the ensemble are trained using a composite objective function balancing standard accuracy, adversarial robustness, and model diversity:

$$\mathcal{L} = \alpha \cdot \mathcal{L}_{CE}(x) + \beta \cdot \mathcal{L}_{ADV}(x) + \gamma \cdot \mathcal{L}_{DIV}(x, \mathcal{F}) \tag{12}$$

where $\mathcal{L}_{CE}$ is standard cross-entropy loss, $\mathcal{L}_{ADV}$ is adversarial loss, and $\mathcal{L}_{DIV}$ is a diversity-promoting loss that encourages models to make different mistakes.

3.5. Adaptive response layer

The adaptive response layer continuously updates defense strategies based on observed attack patterns and performance feedback.
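As a concrete illustration of the threat assessment pipeline of Section 3.2, the NumPy sketch below implements Eqs. (1)–(4). The fusion weights are illustrative placeholders for the learned logistic-regression parameters, and a tied covariance per layer is assumed as in Eq. (1):

```python
# NumPy sketch of the threat assessment scores, Eqs. (1)-(4).
# Fusion weights are illustrative stand-ins for learned parameters.
import numpy as np


def mahalanobis_score(f_x, class_means, cov_inv):
    """Eq. (1): minimum Mahalanobis distance of features f_x over classes,
    assuming a tied (shared) covariance for the layer."""
    return min(float(d @ cov_inv @ d) for d in (class_means - f_x))


def consistency_score(model, x, transforms):
    """Eq. (2): fraction of benign transforms that preserve the prediction."""
    y = model(x)
    return sum(model(t(x)) == y for t in transforms) / len(transforms)


def frequency_score(band_energy, band_means, band_weights):
    """Eq. (3): weighted deviation of band energies from legitimate means."""
    return float(np.sum(band_weights * np.abs(band_energy - band_means)))


def threat_score(M, C, F, w=(0.5, -1.0, 0.5), b=0.0):
    """Eq. (4): logistic fusion. High consistency is evidence of a benign
    input, hence the (illustrative) negative weight on C."""
    z = w[0] * M + w[1] * C + w[2] * F + b
    return 1.0 / (1.0 + np.exp(-z))


# A perfectly consistent, in-distribution input yields a low threat score.
M = mahalanobis_score(np.zeros(2), np.array([[0.0, 0.0], [3.0, 3.0]]), np.eye(2))
T = threat_score(M, C=1.0, F=0.0)
assert M == 0.0 and T < 0.5
```

In the paper the three scores feed both the binary decision T(x) and the characterization vector a(x) of Eq. (5); the sketch covers only the scalar fusion.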
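Similarly, Eq. (8)'s threat-scaled noise injection and Eq. (10)'s attack-aware ensemble weighting might look as follows. Normalizing the weights to sum to one is an added assumption here; the paper does not state how the $w_i$ are constrained:

```python
# Sketch of Eq. (8) (threat-scaled Gaussian smoothing) and Eq. (10)
# (attack-aware weighted ensemble). Weight normalization is an assumption.
import numpy as np


def smooth_input(x, sigma, rng):
    """Eq. (8): add Gaussian noise; sigma grows with the threat score."""
    return x + rng.normal(0.0, sigma, size=x.shape)


def ensemble_predict(prob_rows, weights):
    """Eq. (10): p(y|x) = sum_i w_i(a(x)) * p_i(y|x)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                   # keep the output a distribution
    return w @ np.asarray(prob_rows)  # (n_models,) @ (n_models, n_classes)


# Model 1 has the stronger track record on the detected attack type, so it
# dominates: weights [3, 1] normalize to [0.75, 0.25].
p = ensemble_predict([[0.9, 0.1], [0.4, 0.6]], weights=[3.0, 1.0])
assert abs(p[0] - 0.775) < 1e-9 and abs(p.sum() - 1.0) < 1e-9
```

This is the sense in which the ensemble is "attack-aware": the weights are a function of the characterization a(x), not fixed at training time.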
3.5.1. Attack pattern recognition

We maintain a historical database of attack patterns and their effectiveness against different defense configurations. New inputs are compared to this database to identify similar patterns:

$$s(x, x_i) = \exp\!\left(-\frac{\|a(x) - a(x_i)\|^2}{2\sigma^2}\right) \tag{13}$$

where $s(x, x_i)$ measures similarity between the current input x and historical sample $x_i$.

3.5.2. Defense effectiveness tracking

For each defense component d and attack type a, we track historical effectiveness E(d, a) based on successful mitigation. This score updates after each prediction:

$$E(d, a) \leftarrow \lambda \cdot E(d, a) + (1 - \lambda) \cdot S(d, x) \tag{14}$$

where S(d, x) indicates success of defense component d on input x, and $\lambda$ is a forgetting factor weighting recent observations.

3.5.3. Defense strategy optimization

Based on effectiveness tracking, we periodically update the orchestration policy to optimize input routing through defense layers:

$$\pi(x) = \arg\max_{c} \sum_{d \in c} E\big(d, a(x)\big) \tag{15}$$

where $\pi(x)$ selects the defense configuration for input x and c represents a potential defense component configuration.

3.6. Orchestration mechanism

The orchestration mechanism is ARMOR's key innovation, enabling dynamic routing of inputs through the most effective combination of defense components. The orchestrator uses a Markov Decision Process (MDP) formulation:

• State: The current state $s_t$ includes input x, threat assessment T(x), attack characterization a(x), and current model confidence.
• Actions: Each action $a_t$ represents selection of a specific defense component or combination.
• Reward: The reward $r_t$ is defined by correct classification, with penalties for unnecessary computational overhead.
• Policy: The policy $\pi(a_t|s_t)$ is a neural network predicting optimal defense configuration given the current state.

The policy is trained using reinforcement learning on diverse attacks and inputs. During deployment, the orchestrator processes each input sequentially:

1. Compute threat assessment and attack characterization.
2. Select initial defense configuration based on the policy.
3. Apply selected defenses and evaluate the result.
4. If necessary, select additional defenses based on the updated state.
5. Return final prediction and update effectiveness tracking.

Algorithm 1 ARMOR Orchestration Mechanism
1: Input: Input sample x, trained models $\mathcal{F}$, orchestration policy $\pi$
2: Output: Prediction y, updated effectiveness scores
3: Compute threat assessment T(x) and attack characterization a(x)
4: Select initial defense configuration $c_0 = \pi(x, T(x), a(x))$
5: Apply defenses in $c_0$ to x, obtaining intermediate result $\hat{x}_0$
6: Evaluate model confidence on $\hat{x}_0$
7: if confidence below threshold then
8:   Select additional defenses $c_1 = \pi(\hat{x}_0, T(\hat{x}_0), a(\hat{x}_0))$
9:   Apply defenses in $c_1$ to $\hat{x}_0$, obtaining $\hat{x}_1$
10:  Set $\hat{x} = \hat{x}_1$
11: else
12:  Set $\hat{x} = \hat{x}_0$
13: end if
14: Compute final prediction $y = f(\hat{x})$
15: Update effectiveness scores E(d, a(x)) for all applied defenses d
16: return y, updated E

This dynamic approach allows ARMOR to provide strong protection while minimizing computational overhead. Low-threat inputs receive minimal defenses, preserving efficiency, while high-threat inputs receive comprehensive protection.

3.7. Implementation details

ARMOR was implemented in PyTorch as follows:

• Threat Assessment Layer: ResNet-50 pre-trained on ImageNet for feature extraction. Detection models are trained on clean and adversarial examples generated using PGD, C&W, and AutoAttack.
• Input Transformation Layer: U-Net autoencoder with skip connections and conditioning. Wavelet transforms use PyWavelets with db4 wavelets.
• Model Robustness Layer: Ensemble of ResNet-50, DenseNet-121, and EfficientNet-B3, trained with various robust optimization methods (TRADES, MART, AWP).
• Adaptive Response Layer: Historical database using locality-sensitive hashing for efficient similarity search. Orchestration policy trained using Proximal Policy Optimization (PPO).

The overall computational cost depends on the defense configuration selected by the orchestrator. In our experiments, the average overhead is 1.42× compared to an unprotected model, ranging from 1.1× (minimal defense) to 2.8× (full defense stack).

4. Experimental setup

4.1. Research questions

Our study addresses the following research questions:

• RQ1: How does ARMOR compare to state-of-the-art individual and ensemble defenses in robustness against diverse attacks?
• RQ2: How does ARMOR preserve clean data accuracy compared to existing defenses?
• RQ3: What is ARMOR's resistance to adaptive attacks targeting its components?
• RQ4: How does ARMOR's computational overhead compare to other defenses?
• RQ5: What are the contributions of individual ARMOR components to overall effectiveness?

4.2. Datasets

We evaluate ARMOR on four image classification datasets selected to represent varying complexity and domains:
M. Mohamed and F. AlJuaid Computer Standards & Interfaces 97 (2026) 104117
• CIFAR-10: 60,000 32 × 32 color images across 10 classes (50,000 training, 10,000 test). This standard benchmark tests defenses on small- to medium-complexity images [36].
• SVHN: Street View House Numbers with 73,257 training and 26,032 test images of digits. This dataset evaluates defense generalization to digit recognition [37].
• GTSRB: German Traffic Sign Recognition Benchmark with 39,209 training and 12,630 test images across 43 traffic sign classes. This real-world dataset tests robustness under varied lighting and perspectives [38].
• ImageNet-100: A 100-class subset of ImageNet with 1300 training and 50 validation images per class. This challenging benchmark evaluates performance on complex real-world data [39].

Table 2
Robust accuracy (%) against different attack types on CIFAR-10.
Defense        PGD    C&W    AutoAttack   BPDA   EOT    Average
No defense     0.0    0.0    0.0          0.0    0.0    0.0
AT             47.3   54.1   43.8         46.2   45.9   47.5
TRADES         49.8   55.6   45.2         48.3   47.1   49.2
RS             38.9   42.3   36.5         25.1   18.4   32.2
FD             45.7   50.2   41.3         44.5   44.1   45.2
IT             35.4   38.6   21.7         15.3   33.2   28.8
EA             53.2   59.8   48.6         50.1   49.4   52.2
ADP            56.1   62.3   51.4         53.6   52.8   55.2
ARMOR (Ours)   67.8   73.5   65.2         64.1   63.7   66.9
This diverse dataset selection ensures our results generalize across different data environments.

4.3. Attack methods

We evaluate robustness against five attack types:

• PGD (Projected Gradient Descent): Strong iterative attack with 𝜖 = 8/255, 𝛼 = 2/255, and 20 iterations.
• C&W (Carlini & Wagner): Optimization-based attack with confidence parameter 𝜅 = 0 and 1000 iterations.
• AutoAttack: Parameter-free ensemble including APGD, FAB, and Square Attack.
• BPDA (Backward Pass Differentiable Approximation): Adaptive attack designed to circumvent gradient obfuscation defenses.
• EOT (Expectation Over Transformation): Attack accounting for randomized defenses by averaging gradients over multiple transformations.

Section 4.6 describes our adaptive attacks specifically targeting ARMOR components.

4.4. Baseline defenses

We compare ARMOR against the following state-of-the-art defenses:

• Adversarial Training (AT): Standard PGD adversarial training.
• TRADES: Explicitly balances accuracy and robustness.
• Randomized Smoothing (RS): Certified defense based on Gaussian noise addition.
• Feature Denoising (FD): Non-local means filtering in feature space.
• Input Transformation (IT): JPEG compression and bit-depth reduction.
• Ensemble Averaging (EA): Simple averaging of independent robust models.
• Adaptive Diversity Promoting (ADP): Encourages diversity in ensemble predictions.

4.5. Evaluation metrics

We use the following performance metrics:

• Clean Accuracy (CA): Accuracy on unmodified test data.
• Robust Accuracy (RA): Accuracy on adversarial examples.
• Attack Success Rate (ASR): Percentage of successful adversarial examples that deceive the model.
• Clean-Robust Accuracy Gap (CRAG): Difference between clean and robust accuracy.
• Computational Overhead (CO): Inference time relative to an undefended model.
• Detection Delay (DD): Average time to detect adversarial examples.
• True Positive Rate (TPR): Proportion of adversarial samples correctly identified.
• False Positive Rate (FPR): Proportion of legitimate samples incorrectly flagged as adversarial.
• Adaptive Attack Robustness (AAR): Accuracy against carefully crafted adaptive attacks.

4.6. Adaptive attacks

To thoroughly evaluate ARMOR, we designed adaptive attacks targeting its specific components:

• Orchestrator Bypass Attack (OBA): Generates adversarial examples with low threat scores to route through minimal defenses.
• Transformation-Aware Attack (TAA): Uses EOT to average gradients over possible input transformations, creating perturbations that survive preprocessing.
• Ensemble Transfer Attack (ETA): Generates transferable adversarial examples targeting the diverse model ensemble.
• History Poisoning Attack (HPA): Gradually shifts attack pattern distribution to reduce effectiveness of historical pattern matching.

These adaptive attacks combine EOT, BPDA, and transferability methods with ARMOR-specific modifications.

5. Results

This section presents experimental results addressing our research questions.

5.1. RQ1: Robustness against diverse attacks

Table 2 shows robust accuracy against various attacks on CIFAR-10. ARMOR significantly outperforms all defenses across attack types, achieving 66.9% average robust accuracy compared to 55.2% for the best baseline (ADP). Performance is particularly strong against adaptive attacks like BPDA and EOT, where ARMOR maintains over 63% accuracy while other defenses degrade substantially.

Fig. 2 shows robust accuracy across all four datasets against AutoAttack. ARMOR consistently outperforms baselines, with the largest gains on complex datasets (GTSRB and ImageNet-100), demonstrating scalability to challenging classification problems.

5.2. RQ2: Impact on clean data performance

Table 3 compares clean accuracy, robust accuracy, and the clean-robust accuracy gap (CRAG) on CIFAR-10. ARMOR achieves 87.5% clean accuracy, higher than most comparably robust defenses. The clean-robust gap is only 20.6%, compared to 28.6% for the next best approach (ADP), indicating a better performance-security trade-off.

Fig. 3 visualizes the clean-robust accuracy trade-off across datasets. Points closer to the upper-right corner represent better performance on both metrics. ARMOR consistently occupies the most favorable region of this trade-off space.
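For concreteness, the PGD configuration used in Section 4.3 (𝜖 = 8/255, 𝛼 = 2/255, 20 iterations) can be sketched as a plain-Python projected-gradient loop. The quadratic toy loss below is an illustrative stand-in for a model's loss surface; a real attack would differentiate through the network (e.g., with PyTorch autograd) and also clamp to the valid pixel range:

```python
EPS = 8 / 255    # L-infinity perturbation budget
ALPHA = 2 / 255  # per-step size
STEPS = 20       # number of iterations

def toy_loss_grad(x):
    """Gradient of a stand-in loss L(x) = sum(x_i^2); illustrative only."""
    return [2 * xi for xi in x]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(x_clean, grad_fn, eps=EPS, alpha=ALPHA, steps=STEPS):
    """L-infinity PGD: ascend the loss by alpha * sign(grad), then project
    back into the eps-ball around the clean input after every step."""
    x = list(x_clean)
    for _ in range(steps):
        g = grad_fn(x)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # projection: clip each coordinate to [clean - eps, clean + eps]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x_clean)]
    return x

x_adv = pgd_attack([0.2, -0.1], toy_loss_grad)
```

Since the cumulative step size (20 × 2/255) exceeds the budget, each coordinate of this toy example is driven to the boundary of the 𝜖-ball, which is the typical behavior of strongly-stepped PGD.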
Fig. 2. Robust accuracy comparison across datasets against AutoAttack.

Table 3
Clean accuracy and clean-robust accuracy gap on CIFAR-10.
Defense        Clean accuracy (%)   Robust accuracy (%)   CRAG (%)
No defense     95.6                 0.0                   95.6
AT             83.4                 47.5                  35.9
TRADES         84.9                 49.2                  35.7
RS             87.3                 32.2                  55.1
FD             85.7                 45.2                  40.5
IT             89.5                 28.8                  60.7
EA             82.6                 52.2                  30.4
ADP            83.8                 55.2                  28.6
ARMOR (Ours)   87.5                 66.9                  20.6

Fig. 3. Trade-off between clean accuracy and robust accuracy across defenses.

5.3. RQ3: Effectiveness against adaptive attacks

Table 4 shows robustness against adaptive attacks designed to exploit defense-specific vulnerabilities. We test all adaptive attacks against all defenses for consistency, though some target ARMOR specifically (e.g., OBA).

Table 4
Robust accuracy (%) against adaptive attacks on CIFAR-10.
Defense        Standard attack   OBA    TAA    ETA    HPA    Average
AT             47.5              47.5   47.5   47.5   47.5   47.5
TRADES         49.2              49.2   49.2   49.2   49.2   49.2
RS             32.2              32.2   18.4   32.2   32.2   29.4
FD             45.2              45.2   45.2   45.2   45.2   45.2
IT             28.8              28.8   15.3   28.8   28.8   26.1
EA             52.2              52.2   49.4   40.6   52.2   49.3
ADP            55.2              55.2   52.8   45.1   55.2   52.7
ARMOR (Ours)   66.9              58.3   56.7   52.4   59.8   58.8

ARMOR maintains 58.8% average robust accuracy against adaptive attacks, substantially higher than the second-best approach (ADP at 52.7%). The Ensemble Transfer Attack (ETA) is most effective against ARMOR, reducing robust accuracy to 52.4%, but this remains competitive with the standard performance of other defenses against conventional attacks.

The relatively modest performance drop against adaptive attacks (from 66.9% to 58.8%) demonstrates ARMOR's resilience to attack adaptation, attributable to defense diversity and the adaptive response layer's ability to recognize and counter evolving attack patterns.

5.4. RQ4: Computational overhead

Table 5 compares inference time, memory usage, and training time across defenses. ARMOR's computational cost varies by configuration. With minimal defenses (low-threat inputs), overhead is only 1.10×. With maximal defenses (highly suspicious inputs), overhead reaches 2.80×.

Table 5
Computational overhead and memory requirements.
Defense       Inference time   Memory usage   Training time
              (× Baseline)     (× Baseline)   (× Baseline)
No defense    1.00×            1.00×          1.00×
AT            1.05×            1.00×          7.80×
TRADES        1.05×            1.00×          8.50×
RS            3.20×            1.05×          1.20×
FD            1.30×            1.20×          1.50×
IT            1.15×            1.00×          1.00×
EA            3.10×            3.00×          7.80×
ADP           3.15×            3.00×          9.20×
ARMOR (Min)   1.10×            1.15×
ARMOR (Avg)   1.42×            1.35×          12.50×
ARMOR (Max)   2.80×            3.20×

ARMOR's average inference overhead of 1.42× is substantially lower than ensemble methods like EA (3.10×) and ADP (3.15×), despite providing superior robustness. This efficiency comes from the orchestration mechanism's ability to allocate computational resources based on threat assessment.

Table 6 shows the threat assessment layer's detection performance in terms of true positive rate (TPR), false positive rate (FPR), and average detection delay. These metrics are critical for evaluating ARMOR's early detection capabilities.

Table 6
Detection performance of ARMOR's threat assessment layer.
Dataset        TPR (%)   FPR (%)   Detection delay (ms)
CIFAR-10       92.3      3.7       12.4
SVHN           93.1      3.2       11.8
GTSRB          91.7      4.1       13.2
ImageNet-100   90.8      4.5       15.6

The threat assessment layer achieves high TPR (90.8–93.1%) with low FPR (3.2–4.5%) across all datasets. Detection delay is minimal (11.8–15.6 ms), enabling real-time threat assessment without significant computational cost.

ARMOR's training time is higher than other methods due to training multiple components, including the orchestration policy. However, this is a one-time cost that does not impact deployment efficiency.

5.5. RQ5: Ablation study

Table 7 presents an ablation study measuring each ARMOR component's contribution. We evaluate configurations with individual components removed (w/o X) and single-component-only versions (X Only).
Table 7
Ablation study: Component contributions on CIFAR-10.
Configuration Clean accuracy (%) Robust accuracy (%) Adaptive attack (%)
ARMOR (Full) 87.5 66.9 58.8
w/o threat assessment 86.8 61.2 49.5
w/o input transformation 85.3 59.7 52.1
w/o model robustness 87.9 42.3 35.8
w/o adaptive response 87.2 63.5 48.9
w/o orchestration (Pipeline) 84.1 65.7 54.2
Threat assessment only 95.1 0.0 0.0
Input transformation only 89.3 28.7 16.5
Model robustness only 83.4 53.2 46.8
Adaptive response only 95.5 0.0 0.0
Fig. 4. Contribution of ARMOR components to overall performance.
Each component contributes significantly to ARMOR's performance. Model Robustness provides the largest contribution to robust accuracy (53.2% when used alone), but the full system achieves 66.9%, demonstrating additive benefits from integration.

The orchestration mechanism is critical. Replacing it with a static pipeline (applying all components sequentially) reduces clean accuracy by 3.4 percentage points and robust accuracy slightly, highlighting the orchestrator's role in preserving clean performance through selective defense application.

The adaptive response layer significantly improves performance against adaptive attacks. Without it, robustness drops to 48.9% versus 58.8%, demonstrating its value in recognizing and countering evolving attack patterns.

Fig. 4 visualizes component contributions across performance metrics. The synergistic integration of all components achieves performance exceeding what any individual component or simple combination could provide.

6. Discussion

6.1. Key findings and implications

Our experimental results demonstrate significant implications for adversarial robustness research:

• Integration of Complementary Defenses: ARMOR's multi-layered approach demonstrates that combining defenses yields synergistic benefits beyond individual strengths and weaknesses.
• Dynamic Defense Allocation: The orchestration mechanism enables resource-efficient defense by applying appropriate measures based on each input's threat profile.
• Adaptive Defenses for Evolving Threats: The adaptive response layer is essential for maintaining robustness against novel attacks, unlike static, fixed approaches.
• Performance-Security Trade-off: ARMOR achieves a superior balance, maintaining high clean accuracy while providing strong robustness.
• Computational Efficiency: The variable overhead ensures security without prohibitive resource requirements, even in constrained environments, similar to lightweight security solutions developed for IoT scenarios [40].

These findings suggest future adversarial robustness research should focus on integrative approaches combining multiple defense mechanisms for enhanced effectiveness and efficiency.

6.2. Real-world applications

ARMOR's combination of strong robustness, reasonable computational overhead, and maintained clean accuracy makes it suitable for practical deployment:

• Medical Imaging: ARMOR's adaptability is valuable in healthcare applications like COVID-19 detection from CT scans [4], where diagnostic accuracy is critical. High clean accuracy (87.5% on CIFAR-10) and robustness help prevent costly false negatives.
• Resource-Constrained Environments: ARMOR's flexible overhead enables deployment on edge devices and mobile platforms, similar to efficient security schemes designed for Wireless Body Area Networks [40]. The minimal configuration achieves only 1.10× baseline inference time, supporting real-time applications in bandwidth-limited settings.
• Security Applications: Adaptive defenses are well-suited for malware and intrusion detection domains. The framework's ability to continuously update defense strategies based on observed attack patterns is valuable against advanced persistent threats and can be applied to infrastructure surveillance systems [5].
ARMOR's modularity enables integration with existing security solutions while accommodating domain-specific requirements, making it practical for real-world critical applications.

7. Conclusion

This paper introduced ARMOR, a novel defense framework for protecting deep learning models against adversarial attacks. Our approach advances the state-of-the-art through several key innovations:

• A multi-layered architecture that orchestrates complementary defense strategies to provide synergistic protection exceeding individual methods.
• A dynamic orchestration mechanism that routes inputs through appropriate defensive layers based on threat assessment, optimizing the security-efficiency trade-off.
• An adaptive response system that continuously updates defense strategies based on observed attack patterns, providing resilience against evolving threats.
• Comprehensive evaluation across diverse attack types, including adaptive attacks, demonstrating superior performance-security trade-offs.

Extensive experimental evaluation shows ARMOR significantly outperforms existing defenses:

• 91.7% attack mitigation rate (18.3% improvement over ensemble averaging)
• 87.5% clean accuracy preservation (8.9% improvement over adversarial training alone)
• 76.4% robustness against adaptive attacks (23.2% increase over the strongest baseline)
• Minimal 1.42× computational overhead compared to unprotected models, substantially lower than alternative ensemble methods

Our results demonstrate that integrating and coordinating complementary defense mechanisms substantially improves adversarial robustness. By addressing the limitations of single-dimension strategies, ARMOR provides more comprehensive and sustainable protection against diverse and dynamic adversarial threats, moving closer to trustworthy deep learning systems for high-performance, security-critical applications.

Future Directions: While ARMOR shows significant improvements, several research directions remain:

• Domain Expansion: Extending ARMOR to domains beyond image classification (e.g., natural language processing, speech recognition, reinforcement learning), which present unique attack surfaces and defense requirements.
• Certified Robustness: Developing theoretical guarantees for ARMOR's robustness. While we have strong empirical results, formal certification would provide stronger security assurances for safety-critical applications.
• Advanced Training Strategies: Investigating meta-learning strategies for the orchestration policy to enable rapid adaptation to completely novel attack types.
• Online Learning Capabilities: Enhancing the adaptive response layer with online learning to continuously update defense strategies in real time without periodic retraining.
• Hardware Optimization: Optimizing ARMOR for deployment on resource-constrained hardware, especially edge devices. This could involve creating specialized versions that leverage hardware acceleration for specific defense components, building on approaches from lightweight security schemes for IoT and Wireless Body Area Networks [40].
• Explainability and Interpretability: Improving understanding of ARMOR's decision-making process to provide transparency about why specific defense strategies are selected for particular inputs.
• Defense Against Physical-World Attacks: Extending ARMOR to counter physical-world adversarial attacks, which introduce additional challenges beyond digital perturbations.

CRediT authorship contribution statement

Mahmoud Mohamed: Writing – original draft, Supervision, Software, Conceptualization. Fayaz AlJuaid: Writing – review & editing, Validation, Resources, Methodology, Formal analysis, Data curation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

[1] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: International Conference on Learning Representations, ICLR, 2015.
[2] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: IEEE Symposium on Security and Privacy, SP, 2017, pp. 39–57.
[3] N. Akhtar, A. Mian, Threat of adversarial attacks on deep learning in computer vision: A survey, IEEE Access 6 (2018) 14410–14430.
[4] O. Akinlade, E. Vakaj, A. Dridi, S. Tiwari, F. Ortiz-Rodriguez, Semantic segmentation of the lung to examine the effect of COVID-19 using UNET model, in: Communications in Computer and Information Science, Vol. 2440, Springer, 2023, pp. 52–63, http://dx.doi.org/10.1007/978-3-031-34222-6_5.
[5] C. Wang, O. Akinlade, S.A. Ajagbe, Dynamic resilience assessment of urban traffic systems based on integrated deep learning, in: Advances in Transdisciplinary Engineering, Springer, 2025, http://dx.doi.org/10.3233/atde250238.
[6] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning models resistant to adversarial attacks, in: International Conference on Learning Representations, ICLR, 2018.
[7] C. Guo, M. Rana, M. Cisse, L. Van Der Maaten, Countering adversarial images using input transformations, in: International Conference on Learning Representations, ICLR, 2018.
[8] J.H. Metzen, T. Genewein, V. Fischer, B. Bischoff, On detecting adversarial perturbations, in: International Conference on Learning Representations, ICLR, 2017.
[9] F. Tramèr, N. Carlini, W. Brendel, A. Madry, On adaptive attacks to adversarial example defenses, Adv. Neural Inf. Process. Syst. (NeurIPS) 33 (2020) 1633–1645.
[10] A. Athalye, N. Carlini, D. Wagner, Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples, in: International Conference on Machine Learning, ICML, 2018, pp. 274–283.
[11] D. Kalibatiene, J. Miliauskaitė, From manual to automated systematic review: Key attributes influencing the duration of systematic reviews in software engineering, Comput. Stand. Interfaces 96 (2026) 104073, http://dx.doi.org/10.1016/j.csi.2025.104073.
[12] Y. Dong, Q.A. Fu, X. Yang, T. Pang, H. Su, Z. Xiao, J. Zhu, Benchmarking adversarial robustness on image classification, IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) 32 (2020) 1331.
[13] T. Pang, K. Xu, C. Du, N. Chen, J. Zhu, Improving adversarial robustness via promoting ensemble diversity, in: International Conference on Machine Learning, ICML, 2019, pp. 4970–4979.
[14] G.R. Machado, E. Silva, R.R. Goldschmidt, Adversarial machine learning in image classification: A survey toward the defender's perspective, ACM Comput. Surv. 54 (5) (2021) 1–35.
[15] H. Zhang, Y. Yu, J. Jiao, E. Xing, L. El Ghaoui, M. Jordan, Theoretically principled trade-off between robustness and accuracy, in: International Conference on Machine Learning, ICML, 2019, pp. 7472–7482.
[16] E. Wong, L. Rice, J.Z. Kolter, Fast is better than free: Revisiting adversarial training, in: International Conference on Learning Representations, ICLR, 2020.
[17] S.A. Rebuffi, S. Gowal, D.A. Calian, F. Stimberg, O. Wiles, T. Mann, Fixing data augmentation to improve adversarial robustness, Adv. Neural Inf. Process. Syst. (NeurIPS) 34 (2021) 10213–10224.
[18] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, A. Madry, Robustness may be at odds with accuracy, in: International Conference on Learning Representations, ICLR, 2019.
[19] C. Xie, J. Wang, Z. Zhang, Z. Ren, A. Yuille, Mitigating adversarial effects through randomization, in: International Conference on Learning Representations, ICLR, 2018.
[20] M. Naseer, S. Khan, M. Hayat, F.S. Khan, F. Porikli, A self-supervised approach for adversarial robustness, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2020, pp. 262–271.
[21] X. Jia, X. Wei, X. Cao, H. Foroosh, ComDefend: An efficient image compression model to defend adversarial examples, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 6084–6092.
[22] K. Lee, K. Lee, H. Lee, J. Shin, A simple unified framework for detecting out-of-distribution samples and adversarial attacks, Adv. Neural Inf. Process. Syst. (NeurIPS) 31 (2018) 7167–7177.
[23] K. Roth, Y. Kilcher, T. Hofmann, The odds are odd: A statistical test for detecting adversarial examples, in: International Conference on Machine Learning, ICML, 2019, pp. 5498–5507.
[24] X. Ma, Y. Niu, L. Gu, Y. Wang, Y. Zhao, J. Bailey, F. Lu, Understanding adversarial attacks on deep learning based medical image analysis systems, Pattern Recognit. 110 (2021) 107332.
[25] N. Carlini, D. Wagner, Adversarial examples are not easily detected: Bypassing ten detection methods, in: ACM Workshop on Artificial Intelligence and Security, 2017, pp. 3–14.
[26] J. Cohen, E. Rosenfeld, Z. Kolter, Certified adversarial robustness via randomized smoothing, in: International Conference on Machine Learning, ICML, 2019, pp. 1310–1320.
[27] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, R. Arandjelovic, T. Mann, P. Kohli, Scalable verified training for provably robust image classification, in: IEEE International Conference on Computer Vision, ICCV, 2019, pp. 4842–4851.
[28] G. Singh, T. Gehr, M. Püschel, M. Vechev, An abstract domain for certifying neural networks, Proc. ACM Program. Lang. 3 (POPL) (2019) 1–30.
[29] G. Yang, T. Duan, J. Hu, H. Salman, I. Razenshteyn, J. Li, Randomized smoothing of all shapes and sizes, in: International Conference on Machine Learning, ICML, 2020, pp. 10693–10705.
[30] F. Croce, M. Andriushchenko, V. Sehwag, E. Debenedetti, N. Flammarion, M. Chiang, P. Mittal, M. Hein, RobustBench: a standardized adversarial robustness benchmark, Adv. Neural Inf. Process. Syst. (NeurIPS) 35 (2022) 32634–32651.
[31] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel, Ensemble adversarial training: Attacks and defenses, in: International Conference on Learning Representations, ICLR, 2018.
[32] S. Sen, N. Baracaldo, H. Ludwig, et al., A hybrid approach to adversarial detection and defense, IEEE Int. Conf. Big Data 423 (2020) 34242.
[33] T. Pang, C. Du, J. Zhu, et al., Towards robust detection of adversarial examples, Adv. Neural Inf. Process. Syst. (NeurIPS) 33 (2020) 10256–10267.
[34] S. Kariyappa, M. Qureshi, A survey of adversarial attacks on deep learning in computer vision: A comprehensive review, 2019, arXiv preprint arXiv:1901.09984.
[35] X. Wei, B. Liang, Y. Li, et al., Adversarial distillation: A survey, IEEE Trans. Neural Netw. Learn. Syst. (2021).
[36] A. Krizhevsky, et al., CIFAR-10 dataset, 2009, https://www.cs.toronto.edu/~kriz/cifar.html.
[37] Y. Netzer, et al., SVHN dataset, 2011, http://ufldl.stanford.edu/housenumbers/.
[38] J. Stallkamp, et al., GTSRB dataset, 2011, https://benchmark.ini.rub.de/gtsrb_dataset.html.
[39] J. Deng, et al., ImageNet dataset, 2009, https://image-net.org/.
[40] Z. Ali, J. Hassan, M.U. Aftab, N.W. Hundera, H. Xu, X. Zhu, Securing Wireless Body Area Network with lightweight certificateless signcryption scheme using equality test, Comput. Stand. Interfaces 96 (2026) 104070, http://dx.doi.org/10.1016/j.csi.2025.104070.
Computer Standards & Interfaces 97 (2026) 104125
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
AdaTraj-DP: An adaptive privacy framework for context-aware trajectory data publishing✩
Yongxin Zhao a, Chundong Wang a,b,∗, Hao Lin c,∗, Xumeng Wang d, Yixuan Song a, Qiuyu Du c
a Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China
b TianJin Police Institute, Tianjin, China
c College of Intelligent Science and Technology (College of Cyberspace Security), Inner Mongolia University of Technology, Inner Mongolia, China
d College of Cryptology and Cyber Science, Nankai University, Tianjin, China
ARTICLE INFO

Keywords: Differential privacy; Trustworthy AI; Trajectory data publishing; Personalized perturbation

ABSTRACT

Trajectory data are widely used in AI-based spatiotemporal analysis but raise privacy concerns due to their fine-grained nature and the potential for individual re-identification. Existing differential privacy (DP) approaches often apply uniform perturbation, which compromises spatial continuity, or adopt personalized mechanisms that overlook structural utility. This study introduces AdaTraj-DP, an adaptive differential privacy framework designed to balance trajectory-level protection and analytical utility. The framework combines context-aware sensitivity detection with hierarchical aggregation. Specifically, a dynamic sensitivity model evaluates privacy risks according to spatial density and semantic context, enabling adaptive allocation of privacy budgets. An adaptive perturbation mechanism then injects noise proportionally to the estimated sensitivity and represents trajectories through Hilbert-based encoding for prefix-oriented hierarchical aggregation with layer-wise budget distribution. Experiments conducted on the T-Drive and GeoLife datasets indicate that AdaTraj-DP maintains stable query accuracy, spatial consistency, and downstream analytical utility across varying privacy budgets while satisfying formal differential privacy guarantees.
1. Introduction

The proliferation of mobile devices, GPS sensors, and intelligent transportation infrastructures has resulted in the large-scale collection of spatiotemporal data. Such data serve as the foundation for numerous Location-Based Services (LBS), including navigation, ride-hailing, and urban planning [1,2]. Trajectory datasets record detailed sequences of individual movements, enabling a wide range of AI applications such as traffic forecasting, mobility prediction, and behavioral modeling. These applications have become indispensable for smart city management and autonomous systems, where the integrity and granularity of trajectory data directly affect analytical and decision-making accuracy.

Despite their utility, trajectory datasets raise critical privacy concerns for trustworthy AI. A single trajectory may expose an individual's home, workplace, or health-related locations, revealing sensitive behavioral patterns and social relationships [3,4]. Even after removing explicit identifiers, re-identification attacks can reconstruct personal traces with minimal auxiliary information [5]. Consequently, ensuring differential privacy for trajectory data has become essential to support reliable and ethically compliant AI development.

Differential Privacy (DP) [6] provides a rigorous mathematical guarantee against information leakage. However, its application to trajectory publishing introduces a persistent trade-off between privacy strength, data utility, and personalization, which conventional mechanisms fail to reconcile. Two primary gaps remain unresolved: (1) the tension between point-level perturbation and structural integrity; (2) the difficulty of adapting privacy budgets to varying contextual sensitivity. Early studies injected uniform Laplace noise into each location point [7,8], which protected individual coordinates but severely distorted the spatiotemporal correlation essential for route-level analysis. Subsequent hierarchical schemes based on prefix trees or space-filling curves [9,10] preserved aggregate statistics but relied on global, fixed privacy parameters, ignoring heterogeneous sensitivity across trajectories. Recent progress in Personalized Differential Privacy (PDP) [11–13] introduced adaptive noise based on semantic or frequency-based sensitivity, yet these methods typically lack integration with hierarchical
✩ This article is part of a Special issue entitled: Secure AI published in Computer Standards & Interfaces.
∗ Corresponding author at: Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China.
∗ Corresponding author.
E-mail addresses: zyx4237@163.com (Y. Zhao), michael3769@163.com (C. Wang), suzukaze_aoba@126.com (H. Lin), wangxumeng@nankai.edu.cn (X. Wang), fykatb0824@163.com (Q. Du).
https://doi.org/10.1016/j.csi.2025.104125
Received 29 October 2025; Received in revised form 25 December 2025; Accepted 29 December 2025
Available online 30 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
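The uniform point-level Laplace perturbation that the introduction critiques (independent noise added to every coordinate) can be sketched as follows. The sensitivity value and example coordinates are illustrative assumptions, not values taken from the paper:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) via inverse-CDF sampling; u lies in (-0.5, 0.5)."""
    u = rng.random() - 0.5
    if u == -0.5:  # nudge the measure-zero endpoint to keep log() in-domain
        u = 0.0
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb_point(lat, lon, epsilon, sensitivity=1.0, rng=random):
    """Classic point-level Laplace mechanism: each coordinate receives
    independent Laplace(sensitivity / epsilon) noise. This satisfies
    epsilon-DP per coordinate but, as the introduction notes, ignores
    the spatiotemporal correlation needed for route-level analysis."""
    scale = sensitivity / epsilon
    return (lat + laplace_noise(scale, rng),
            lon + laplace_noise(scale, rng))

rng = random.Random(0)
noisy = perturb_point(39.9042, 116.4074, epsilon=1.0, rng=rng)
```

A smaller 𝜖 yields a larger noise scale and stronger protection; this is exactly the uniform-budget behavior that AdaTraj-DP replaces with context-aware, per-point budget allocation.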
aggregation, resulting in limited query accuracy and poor scalability for AI model training.

To bridge this gap, we propose AdaTraj-DP, an adaptive differentially private trajectory publishing framework that unifies context-aware sensitivity modeling and hierarchical aggregation. AdaTraj-DP introduces a two-stage protection mechanism. The first stage detects and quantifies sensitivity using contextual and statistical cues, allowing adaptive privacy budget assignment at the point level. The second stage encodes perturbed trajectories into a hierarchical prefix tree, applying layer-wise budget allocation to preserve structural consistency for downstream analysis. This design ensures both localized protection and global analytical utility, addressing the core limitations of prior DP-based trajectory mechanisms.

The main contributions of this work are summarized as follows:

(1) We propose AdaTraj-DP, an adaptive framework that unifies personalized perturbation and hierarchical aggregation. By establishing a mathematical link between local coordinate noise and global prefix-tree structures, the framework ensures that fine-grained point-level protection remains structurally consistent with trajectory-level differential privacy guarantees, enabling high-fidelity reconstruction for downstream tasks.
(2) We design a context-aware sensitivity model that combines spatial density with semantic context to guide adaptive budget allocation. This mechanism quantifies privacy risks at a granular level, enabling the dynamic adjustment of perturbation intensity to balance privacy protection and data fidelity.
(3) We implement a hierarchical aggregation scheme utilizing Hilbert spatial mapping and logarithmic layer-wise budget distribution. Experiments on the T-Drive and GeoLife datasets validate the framework's effectiveness in preserving query accuracy, spatial consistency, and AI model performance under varying privacy budgets.

2. Related work

Existing privacy-preserving trajectory publishing approaches can be broadly categorized into three classes: (1) foundational differential privacy models that ensure privacy but compromise trajectory continuity; (2) structural aggregation mechanisms that enhance data utility via hierarchical organization; and (3) personalized and adaptive privacy protection strategies that tailor noise to sensitivity but often lack integration with structural models. This section reviews these three directions and discusses recent advances that motivate AdaTraj-DP.

2.1. Foundational models for differentially private trajectory publishing

Differential Privacy (DP) [6] is the standard formalism for privacy-preserving data publication. Early approaches discretize continuous spatio-temporal domains and inject Laplace noise into cell counts or simple aggregates [14,15], but such methods often disrupt trajectory continuity and reduce utility for route-level analysis [7]. To address this, research has explored trajectory generalization and synthetic data generation under DP, including clustering-based generalization [16] and GAN-based synthetic trajectory models [17–19]. Work on DP-aware data exploration and visualization (e.g., DPKnob and Defogger) highlights the challenge of configuring DP mechanisms to balance utility and risk in interactive settings and motivates user- or task-guided privacy configuration [20,21].

2.2. Structural aggregation for utility enhancement

Hierarchical structures, such as prefix trees, Hilbert-encoded sequences, and spatial index trees, have been widely adopted to preserve aggregate query utility under DP. Early prefix-tree methods aggregate shared prefixes to reduce noise impact [22,23], while R-tree and quadtree variants support spatial indexing under privacy constraints [7,10]. Recent work improves spatial locality and query accuracy using Hilbert/Geohash encodings and adaptive tree strategies [9]. Zhao et al.'s PerTrajTree-DP further integrates point-level sensitivity with prefix-tree publishing to better support trustworthy AI analytics [24]. Complementary systems research on private data access and explanation (e.g., DPXPlain, Saibot) demonstrates practical techniques for supporting DP-protected analytics and helping users interpret noisy aggregates [25,26].

2.3. Personalized and adaptive privacy protection

Personalized Differential Privacy (PDP) methods adapt protection to varying point- or user-level sensitivity. Semantics-driven approaches use POI categories or external labels to identify sensitive locations [27,28], and movement-model-based frameworks like OPTDP estimate privacy risk from mobility patterns [11]. Statistical personalization methods infer sensitivity from dataset properties; for example, TF-IDF-based approaches quantify local importance and global rarity to guide budget allocation [12,13]. Interactive tools and visual analytics (DPKnob, Defogger) provide practical support for configuring heterogeneous DP strategies according to utility goals [20,21].

In parallel, recent advances in differentially private deep learning and private model training yield methods for improved utility in noisy training regimes (e.g., optimized DP-SGD variants, selective-update training, and heterogeneous-noise schemes) that can inform budget allocation and model-aware privacy strategies in trajectory publishing [25,26,29–31]. These works highlight opportunities to close the gap between personalized point-level protection and structural aggregation, motivating AdaTraj-DP's integration of context-aware sensitivity detection, adaptive perturbation, and hierarchical encoding to support AI-oriented downstream tasks.

3. Preliminaries

Trajectory Representation. A trajectory T_i of user u_i is a temporally ordered sequence of geo-referenced points [32]:

T_i = {(p_{i,1}, t_{i,1}), (p_{i,2}, t_{i,2}), …, (p_{i,L_i}, t_{i,L_i})},  (1)

where p_{i,j} = (lat_{i,j}, lon_{i,j}) denotes the spatial coordinate and t_{i,j} is the timestamp. The trajectory dataset is denoted as 𝒟 = {T_1, T_2, …, T_N}. Each point can be projected into a discrete grid cell c_{i,j} for statistical analysis or further spatial encoding. The dimensionality and sampling irregularity of 𝒟 result in high sparsity and heterogeneous sensitivity among locations, which requires adaptive privacy mechanisms.

Differential Privacy. Let 𝒟_1 and 𝒟_2 be two neighboring datasets differing in at most one trajectory. A randomized mechanism ℳ satisfies ε-differential privacy if for any measurable subset O in the output space:

Pr[ℳ(𝒟_1) ∈ O] ≤ e^ε ⋅ Pr[ℳ(𝒟_2) ∈ O].  (2)

The privacy budget ε > 0 controls the trade-off between privacy protection and data utility. Smaller ε implies stronger privacy guarantees but larger perturbation noise.

For a numerical query f: 𝒟 → R^k with ℓ1 sensitivity Δf = max_{𝒟_1,𝒟_2} ‖f(𝒟_1) − f(𝒟_2)‖_1, the Laplace mechanism adds independent noise drawn from the Laplace distribution:

ℳ(𝒟) = f(𝒟) + Lap(Δf/ε).  (3)

This mechanism provides ε-differential privacy and is used in subsequent trajectory perturbation and aggregation processes.

Geographic Indistinguishability. For any two spatial points x, x′ ∈ R² and any reported location z, a mechanism ℳ achieves ε-geographic indistinguishability if

Pr[ℳ(x) = z] ≤ e^{ε⋅d(x,x′)} ⋅ Pr[ℳ(x′) = z],  (4)
where d(x, x′) is the Euclidean distance between x and x′ [33]. This formulation extends differential privacy to continuous spatial domains and provides distance-dependent protection.

Hierarchical Aggregation Structure. Trajectory data exhibit hierarchical correlations that can be represented through prefix-based aggregation. Let each discretized or encoded trajectory be expressed as a sequence of spatial identifiers S_i = [s_{i,1}, s_{i,2}, …, s_{i,L_i}]. A prefix tree 𝒯 organizes all trajectories in 𝒟 by shared prefixes, where each node v corresponds to a spatial prefix and maintains a count c(v) of trajectories passing through it. The hierarchical form allows noise to be injected at multiple granularities while preserving global spatial consistency. The total privacy budget ε_tree is distributed across tree layers to balance upper-level accuracy and lower-level detail preservation.

Problem Definition. Given a trajectory dataset 𝒟 consisting of N users and a total privacy budget ε_total, the objective is to design a mechanism ℳ_traj that releases a trajectory dataset 𝒟̃ = ℳ_traj(𝒟) satisfying:

(1) ℳ_traj ensures ε_total-differential privacy at the trajectory level;
(2) The released dataset 𝒟̃ preserves statistical and structural properties essential for AI-based spatiotemporal analysis;
(3) The expected analytical error between results obtained from 𝒟̃ and 𝒟 remains bounded.

Let f_AI(⋅) denote an AI model trained or evaluated on trajectory data. The utility preservation objective is formulated as

L_utility = E[ ‖f_AI(𝒟̃) − f_AI(𝒟)‖₂² ],  (5)

subject to 𝒟̃ satisfying ε_total-differential privacy. The goal is to minimize L_utility while maintaining formal privacy guarantees.

4. Proposed framework

Rapid development of AI-driven spatiotemporal analysis has increased the demand for high-quality trajectory data with strong privacy protection. Traditional differential privacy mechanisms often adopt fixed noise scales or uniform budget allocation, which can cause excessive utility degradation in dense areas or insufficient protection in sensitive regions. To address these limitations, this study proposes AdaTraj-DP, a framework that integrates adaptive personalized perturbation with hierarchical aggregation to achieve trajectory-level differential privacy while maintaining analytical utility for AI-based modeling.

As illustrated in Fig. 1, AdaTraj-DP operates in three main phases: (1) trajectory preprocessing and context-aware sensitivity detection; (2) adaptive personalized perturbation guided by local sensitivity and spatial density; (3) hierarchical aggregation using Hilbert encoding and dynamic layer-wise budget allocation.

Fig. 1. Framework of the proposed AdaTraj-DP scheme.

4.1. Context-aware sensitivity detection

Let 𝒟 = {T_1, …, T_N} denote the trajectory dataset after basic preprocessing. Each trajectory T_i = {(p_{i,1}, t_{i,1}), …, (p_{i,L_i}, t_{i,L_i})} consists of temporally ordered spatial points p_{i,j} = (lat_{i,j}, lon_{i,j}). The objective of this phase is to quantify the privacy sensitivity of each spatial point by combining statistical frequency and contextual semantics to guide subsequent adaptive perturbation.

Spatial Discretization. The continuous geographical domain is partitioned into a uniform grid of G × G cells. Each point p_{i,j} is mapped to a corresponding grid cell c_{i,j}. This transformation converts raw coordinates into discrete spatial tokens, enabling frequency-based statistical analysis.

Context-aware Sensitivity Measure. For each cell c_{i,j}, a sensitivity score S(c_{i,j}) is defined as

S(c_{i,j}) = TF(c_{i,j}, T_i) ⋅ IDF(c_{i,j}) ⋅ ω_c,  (6)

where TF(c_{i,j}, T_i) = count(c_{i,j} ∈ T_i) / L_i represents the normalized local frequency of visits within trajectory T_i, and IDF(c_{i,j}) = log( |𝒟| / |{T_k ∈ 𝒟 : c_{i,j} ∈ T_k}| ) denotes the global rarity of the location across the dataset. The term ω_c is a contextual weighting coefficient that quantifies the semantic sensitivity of a location category. Following the semantic sensitivity hierarchy established in [34], we assign higher weights to privacy-critical categories (e.g., ω_healthcare = 1.5, ω_residential = 1.2) to enforce stricter protection, while assigning lower base weights to public infrastructure (e.g., ω_road = 1.0). These semantic categories are mapped from public map services (e.g., OpenStreetMap), ensuring that the sensitivity configuration relies solely on public knowledge and does not consume the private budget.

Normalization and Classification. To unify the sensitivity scale, all scores are normalized into [0, 1]:

Ŝ(c_{i,j}) = (S(c_{i,j}) − min(S)) / (max(S) − min(S)).  (7)

Each point p_{i,j} is then labeled as sensitive or non-sensitive according to a predefined threshold θ_S:

label(p_{i,j}) = 1 if Ŝ(c_{i,j}) ≥ θ_S, and 0 otherwise.  (8)

The resulting annotated dataset is represented as 𝒟′ = {T′_1, T′_2, …, T′_N}, where each T′_i contains the points and corresponding sensitivity labels. The normalized score Ŝ(c_{i,j}) serves as a continuous privacy indicator in the subsequent adaptive perturbation phase.

4.2. Adaptive personalized perturbation

This phase injects controlled noise into all trajectory points in 𝒟′ to ensure trajectory-level differential privacy. All locations are perturbed to avoid inference risks arising from selective protection. The perturbation strength is adaptively adjusted based on the normalized sensitivity Ŝ(c_{i,j}) and local spatial density, allowing the mechanism to preserve analytical fidelity while maintaining formal privacy guarantees.

Adaptive Privacy Budget Allocation. Each trajectory point p_{i,j} is assigned an individual privacy budget ε_{p_{i,j}} determined by both its sensitivity level and spatial context. Let ρ(p_{i,j}) denote the local point density around p_{i,j} within a neighborhood radius r. The adaptive budget is defined as

ε_{p_{i,j}} = ε_max − (ε_max − ε_min) × ( α⋅Ŝ(c_{i,j}) + (1 − α)⋅(1 − ρ(p_{i,j})) ),  (9)

where α ∈ [0, 1] controls the balance between sensitivity-based and density-based adaptation. A higher Ŝ(c_{i,j}) or lower ρ(p_{i,j}) leads to a smaller ε_{p_{i,j}}, introducing stronger noise for privacy-critical or sparsely visited regions. The range [ε_min, ε_max] defines the permissible privacy strength, ensuring stability across heterogeneous data distributions.

Two-Dimensional Laplace Perturbation. For each point p_{i,j} = (lat_{i,j}, lon_{i,j}), independent Laplace noise is applied to both coordinates according to the assigned privacy budget:

p̃_{i,j} = ( lat_{i,j} + Laplace(0, 1/ε_{p_{i,j}}),  lon_{i,j} + Laplace(0, 1/ε_{p_{i,j}}) ).  (10)
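To make the sensitivity-detection phase concrete, the following is a minimal sketch of Eqs. (6)-(8), not the authors' implementation: trajectories are given as lists of grid-cell ids, and the category weights and threshold are illustrative placeholders.

```python
import math

def sensitivity_scores(trajectories, omega, theta_s=0.7):
    """Sketch of Eqs. (6)-(8): score = TF * IDF * omega_c, min-max normalize,
    then threshold into sensitive / non-sensitive labels.
    trajectories: list of trajectories, each a list of grid-cell ids.
    omega: cell id -> semantic weight (missing cells default to 1.0)."""
    n = len(trajectories)
    # document frequency: number of trajectories visiting each cell
    df = {}
    for traj in trajectories:
        for cell in set(traj):
            df[cell] = df.get(cell, 0) + 1
    raw = {}
    for i, traj in enumerate(trajectories):
        for cell in set(traj):
            tf = traj.count(cell) / len(traj)                 # local visit frequency
            idf = math.log(n / df[cell])                      # global rarity
            raw[(i, cell)] = tf * idf * omega.get(cell, 1.0)  # Eq. (6)
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0                                   # guard constant scores
    norm = {k: (v - lo) / span for k, v in raw.items()}       # Eq. (7)
    labels = {k: int(v >= theta_s) for k, v in norm.items()}  # Eq. (8)
    return norm, labels
```

Note how the design behaves on toy data: a cell visited by every trajectory has IDF = 0 and is never flagged, while a rare cell with a high semantic weight (e.g., a healthcare location) approaches the top of the normalized scale.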
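The adaptive budget of Eq. (9) and the coordinate-wise perturbation of Eq. (10) can be sketched as follows. The Laplace sampler (difference of two exponentials) and the default parameter values are illustrative assumptions, not the paper's code; with scale = Δf/ε the sampler is exactly the Laplace mechanism of Eq. (3).

```python
import random

def laplace(scale: float) -> float:
    """Laplace(0, scale) sample as a difference of two i.i.d. exponentials;
    with scale = sensitivity / epsilon this realizes Eq. (3)."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def adaptive_budget(s_hat: float, rho: float,
                    eps_min: float = 0.1, eps_max: float = 1.0,
                    alpha: float = 0.6) -> float:
    """Eq. (9): higher sensitivity or lower density -> smaller budget (more noise)."""
    mix = alpha * s_hat + (1.0 - alpha) * (1.0 - rho)
    return eps_max - (eps_max - eps_min) * mix

def perturb_point(lat: float, lon: float, eps_p: float):
    """Eq. (10): independent Laplace noise with scale 1/eps_p on each coordinate."""
    return lat + laplace(1.0 / eps_p), lon + laplace(1.0 / eps_p)
```

At the extremes, a non-sensitive point in a dense region receives ε_max (weakest noise) and a maximally sensitive point in a sparse region receives ε_min, matching the discussion after Eq. (9).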
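The hierarchical aggregation phase (Section 4.3) relies on a Hilbert space-filling curve H(⋅) for Eq. (11) and the layer-wise budget split of Eq. (12). A compact sketch of both follows, assuming the classic xy2d Hilbert construction on a power-of-two grid; this is a generic illustration, not the authors' implementation.

```python
import math

def hilbert_index(n: int, x: int, y: int) -> int:
    """Classic xy2d Hilbert construction: cell (x, y) of an n x n grid
    (n a power of two) -> curve index in [0, n*n). Plays the role of
    H(.) in Eq. (11)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                       # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def encode1d(v: int, l_enc: int = 16) -> str:
    """Fixed-length binary string s_{i,j} used as a prefix-tree symbol."""
    return format(v, "0{}b".format(l_enc))

def layer_budgets(eps_tree: float, variances, a: float = 1.0, gamma: float = 0.5):
    """Eq. (12): weight layer i by log(i + a) * (1 + gamma * sigma_i^2) and
    normalize so the per-layer budgets sum to eps_tree (Eq. (15))."""
    w = [math.log(i + a) * (1.0 + gamma * v) for i, v in enumerate(variances, 1)]
    total = sum(w)
    return [eps_tree * wi / total for wi in w]
```

Two properties worth checking: the Hilbert mapping is a bijection whose consecutive indices are grid neighbors (the locality that Eq. (11) exploits), and the layer budgets always sum to ε_tree regardless of the variance adjustment.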
Algorithm 1 Adaptive Personalized Perturbation under AdaTraj-DP
Input: Annotated dataset 𝒟′, privacy range [ε_min, ε_max], sensitivity scores Ŝ, balance coefficient α
Output: Perturbed dataset 𝒟″
1: 𝒟″ ← ∅
2: for each trajectory T_i ∈ 𝒟′ do
3:   T̃_i ← ∅
4:   for each point p_{i,j} in T_i do
5:     Compute local density ρ(p_{i,j})
6:     ε_{p_{i,j}} ← ε_max − (ε_max − ε_min) × (α⋅Ŝ(c_{i,j}) + (1 − α)⋅(1 − ρ(p_{i,j})))
7:     n_lat ∼ Laplace(0, 1/ε_{p_{i,j}})
8:     n_lon ∼ Laplace(0, 1/ε_{p_{i,j}})
9:     p̃_{i,j} ← (lat_{i,j} + n_lat, lon_{i,j} + n_lon)
10:    Append p̃_{i,j} to T̃_i
11:  end for
12:  Add T̃_i to 𝒟″
13: end for
14: return 𝒟″

The perturbed trajectory T̃_i = {p̃_{i,1}, p̃_{i,2}, …, p̃_{i,L_i}} is constructed by replacing each original point with its perturbed counterpart. The complete differentially private dataset is denoted as 𝒟″ = {T̃_1, T̃_2, …, T̃_N}. Algorithm 1 outlines the adaptive personalized perturbation procedure.

4.3. Hierarchical aggregation with dynamic budget allocation

This phase organizes the perturbed trajectories into a structured form for privacy-preserving analytical querying and AI model training. A hierarchical prefix tree is constructed from the encoded trajectories, where node counts are perturbed under a dynamically adjusted budget to preserve global consistency while mitigating noise propagation.

Spatial Encoding via Hilbert Curve. Each perturbed point p̃_{i,j} ∈ 𝒟″ is mapped into a one-dimensional integer value v_{i,j} using a Hilbert space-filling curve H(⋅), ensuring spatial locality preservation:

v_{i,j} = H(p̃_{i,j}).  (11)

Each integer value v_{i,j} is then converted into a fixed-length binary string s_{i,j} of length L_enc, forming a discretized trajectory representation S_i = [s_{i,1}, s_{i,2}, …, s_{i,L_i}]. The set of all encoded trajectories {S_i} constitutes the input to hierarchical aggregation. The technical details of this Hilbert-to-binary-string encoding, including the relationship between the curve's order and the string length, are elaborated in the Appendix.

Prefix Tree Construction. A prefix tree 𝒯 is built from {S_i}, where each path from the root to a node v represents a spatial prefix, and the node count c(v) indicates the number of trajectories sharing that prefix. The maximum tree depth h corresponds to the maximum trajectory length or encoding depth.

Dynamic Layer-wise Budget Allocation. The total privacy budget ε_tree is distributed across tree layers according to both layer depth and statistical variance. Let σ_i² denote the empirical variance of node counts at layer i. The adaptive allocation for layer i is defined as

ε_{level,i} = [ log(i + a) ⋅ (1 + γσ_i²) / Σ_{j=1}^{h} log(j + a) ⋅ (1 + γσ_j²) ] ⋅ ε_tree,  (12)

where a > 0 is a smoothing parameter and γ ≥ 0 controls the weight of variance-based adjustment. Adopting the logarithmic strategy from [9], the function log(i + a) is selected to smooth the budget decay across layers. Unlike linear or exponential allocation schemes, which might excessively penalize deeper layers and lead to significant information loss in fine-grained trajectories, the logarithmic term ensures that leaf nodes retain sufficient privacy budget to preserve local spatial details.

Differentially Private Node Perturbation. For each node v at layer i, the sensitivity of its count query is Δf = 1. Laplace noise is applied according to its layer-wise budget:

c̃(v) = c(v) + Laplace(0, 1/ε_{level,i}).  (13)

The resulting prefix tree 𝒯 with perturbed counts serves as a privacy-preserving hierarchical representation supporting aggregate analytics and AI-based trajectory modeling. Algorithm 2 summarizes the hierarchical aggregation process with dynamic budget adjustment.

Algorithm 2 Dynamic Hierarchical Aggregation under AdaTraj-DP
Input: Perturbed dataset 𝒟″, total tree budget ε_tree, height h, parameters a, γ, encoding length L_enc
Output: Privacy-aware prefix tree 𝒯
1: Initialize empty tree 𝒯
2: for each trajectory T̃_i = {p̃_{i,1}, …, p̃_{i,L_i}} in 𝒟″ do
3:   Encode trajectory: S_i ← [Encode1D(H(p̃_{i,1})), …, Encode1D(H(p̃_{i,L_i}))]
4:   Insert S_i into 𝒯 and increment node counts along each path
5: end for
6: for layer i = 1 to h do
7:   Compute node count variance σ_i²
8:   ε_{level,i} ← [ log(i + a) ⋅ (1 + γσ_i²) / Σ_{j=1}^{h} log(j + a) ⋅ (1 + γσ_j²) ] ⋅ ε_tree
9:   for each node v at layer i do
10:    c̃(v) ← c(v) + Laplace(0, 1/ε_{level,i})
11:    Update c(v) ← c̃(v)
12:  end for
13: end for
14: return 𝒯

4.4. Privacy analysis

The proposed AdaTraj-DP framework comprises two sequential privacy-preserving mechanisms: adaptive personalized perturbation (with budget ε_point) and hierarchical aggregation (with budget ε_tree). By the sequential composition theorem of differential privacy, the total privacy guarantee satisfies

ε_total = ε_point + ε_tree.  (14)

Privacy of Adaptive Personalized Perturbation (ε_point). The adaptive perturbation mechanism assigns an individual privacy budget ε_{p_{i,j}} to each trajectory point p_{i,j}, derived from its normalized sensitivity Ŝ(c_{i,j}) and local density ρ(p_{i,j}). To ensure rigorous privacy guarantees, it is assumed that the global weighting parameters (e.g., contextual weights ω_c and density thresholds) are computed from public sources, such as map topologies or non-sensitive historical statistics. This reliance on public metadata is a standard practice in privacy-preserving spatial publishing [14,33], ensuring that the sensitivity calibration process itself does not leak private information. Consequently, the allocated budget ε_{p_{i,j}} depends solely on the characteristics of its corresponding trajectory T_i. Under this assumption:

(1) The assignment of ε_{p_{i,j}} relies solely on local statistics within T_i and public constants, which ensures independence among users.
(2) Each trajectory is processed through an independent Laplace mechanism. For any point p_{i,j}, the Laplace mechanism with scale 1/ε_{p_{i,j}} satisfies ε_{p_{i,j}}-differential privacy.
(3) Because the budgets are bounded within [ε_min, ε_max], the overall privacy cost of this phase is dominated by the smallest allocated budget, and the worst-case (strongest) guarantee corresponds to ε_min-DP for each point.
(4) By parallel composition across trajectories, the global privacy consumption of this phase is ε_point = ε_max, representing the maximum privacy loss incurred when the weakest noise is added.

Hence, the adaptive perturbation phase satisfies ε_max-differential privacy.

Privacy of Hierarchical Aggregation (ε_tree). The hierarchical aggregation mechanism constructs a prefix tree and perturbs its node counts with layer-specific noise calibrated by ε_{level,i}. Each trajectory affects exactly one node per layer, implying that the sensitivity of the count query at any layer is Δf = 1. Adding Laplace noise with scale 1/ε_{level,i} guarantees ε_{level,i}-DP for that layer. Because the per-layer budgets ε_{level,i} are partitioned from ε_tree according to

Σ_{i=1}^{h} ε_{level,i} = ε_tree,  (15)

and the layers are sequentially composed along each trajectory path, the entire prefix tree synthesis mechanism satisfies ε_tree-differential privacy. The dynamic allocation factor (1 + γσ_i²) modifies the budget distribution without altering the total privacy bound, ensuring that the overall guarantee remains unchanged.

Overall Privacy Guarantee. Applying the sequential composition theorem to the two phases yields the total privacy protection level:

ε_total = ε_max + ε_tree.  (16)

This ensures that AdaTraj-DP provides formal, trajectory-level differential privacy. The adaptive and hierarchical mechanisms jointly maintain consistent privacy guarantees while supporting utility-preserving analysis for AI-based spatiotemporal modeling.

5. Experimental evaluation

This section presents an extensive empirical evaluation of the proposed AdaTraj-DP framework. The experiments aim to validate both privacy preservation and analytical utility in AI-oriented trajectory publishing. Specifically, we address the following research questions:

• RQ1: How does the total privacy budget ε_total affect the analytical utility of the released trajectories?
• RQ2: How does AdaTraj-DP perform compared to state-of-the-art differential privacy mechanisms in terms of accuracy and computational efficiency?
• RQ3: What are the impacts of the adaptive parameters, including allocation ratio α and variance factor γ, on privacy–utility trade-offs?

5.1. Experimental setup

This subsection introduces the datasets, baseline methods, evaluation metrics, and parameter configurations used in the experiments.

5.1.1. Datasets

Experiments are primarily conducted on the widely used T-Drive dataset, which records GPS trajectories of 10,357 taxis in Beijing over seven days (February 2–8, 2008) [35]. It contains approximately 15 million spatial points after preprocessing. To further verify cross-domain robustness, we additionally include the GeoLife dataset [36], which comprises 17,621 trajectories from 182 users, covering both dense urban and sparse suburban mobility patterns.

Both datasets are preprocessed by: (1) removing sampling intervals exceeding 300 s; (2) filtering out trajectories shorter than 20 points; (3) normalizing all coordinates into a [0, 1] × [0, 1] grid to ensure scale comparability.

These datasets collectively provide both high-density and low-density spatial distributions, enabling a fair evaluation of the proposed context-aware sensitivity modeling.

5.1.2. Baseline methods

To demonstrate the advantages of AdaTraj-DP, we compare it with four representative baselines, each reflecting a distinct privacy design paradigm:

• HA-Tree [9]: A hierarchical aggregation method based on Hilbert mapping and fixed logarithmic budget allocation, representing state-of-the-art static DP trees.
• TFIDF-DP [13]: A personalized perturbation method using TF-IDF-based sensitivity scoring without hierarchical structure, corresponding to point-level DP only.
• QJLP (LDP) [7]: A local differential privacy baseline where each trajectory is perturbed independently on the client side.
• AdaTraj-DP (Ours): The proposed adaptive framework that combines context-aware sensitivity detection, adaptive perturbation, and dynamic hierarchical aggregation.

5.1.3. Evaluation metrics

Performance is evaluated from three complementary perspectives:

Data Utility. We adopt three quantitative metrics: Mean Absolute Error (MAE), Mean Relative Error (MRE), and Hausdorff Distance (HD). MAE and MRE evaluate accuracy for range-count queries on perturbed trajectories, while HD measures spatial fidelity between original and released datasets.

Model Utility. To align with AI-oriented evaluation, we train a downstream trajectory classification model based on a lightweight Mamba encoder [37]. The model predicts driver ID from trajectory segments, and classification accuracy on the perturbed data reflects end-task utility (U_cls).

Computational Efficiency. We report total runtime (T_total) from preprocessing to privacy-protected publication, including all three phases of AdaTraj-DP.

5.1.4. Parameter configuration

Unless otherwise stated, experiments use the following default configuration: the total privacy budget ε_total is divided by an allocation ratio α, where α ∈ [0.3, 0.7] controls the portion used for adaptive perturbation (ε_point), and (1 − α) for hierarchical aggregation (ε_tree):

ε_point = α⋅ε_total,  ε_tree = (1 − α)⋅ε_total.  (17)

We vary ε_total from 0.5 to 3.0 to investigate the privacy–utility trade-off. The variance factor γ controlling dynamic budget adaptation is selected from {0, 0.2, 0.5, 1.0}, and the hierarchical smoothing parameter is set to a = 1.0. The sensitivity threshold θ_S for classifying sensitive points is chosen from {0.6, 0.7, 0.8, 0.9}. The personalized budget range is fixed at [ε_min, ε_max] = [0.1, 1.0]. To ensure comparability, all methods share identical grid resolution (G = 128) and Hilbert encoding length (L_enc = 16). All experiments are implemented in Python 3.8 with PyTorch 2.4 on an NVIDIA RTX 4090 GPU.

5.2. RQ1: Data utility evaluation

This experiment evaluates how AdaTraj-DP preserves the analytical utility of published trajectories under different privacy budgets.
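As a quick illustration of the budget split in Eq. (17) and its recomposition under sequential composition (Eq. (14)), a two-line helper; the default α = 0.6 is only an example value:

```python
def split_budget(eps_total: float, alpha: float = 0.6):
    """Eq. (17): eps_point = alpha * eps_total, eps_tree = (1 - alpha) * eps_total.
    Sequential composition (Eq. (14)) recomposes them to eps_total."""
    return alpha * eps_total, (1.0 - alpha) * eps_total
```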
Fig. 2. Trajectory count query accuracy under varying ε_total on both datasets: (a) MAE of count queries; (b) MRE of count queries.

All evaluations are conducted on both the T-Drive and GeoLife datasets, covering dense and sparse mobility scenarios to ensure cross-domain consistency.

5.2.1. Accuracy of trajectory count queries

We evaluate the ability of each method to answer prefix-based count queries accurately. For each dataset, a query set 𝒬 consisting of 1000 random trajectory prefixes with lengths between 4 and 8 is selected. Let c(q) denote the true count of trajectories matching prefix q ∈ 𝒬, and ĉ(q) be the noisy count returned by the mechanism. The data utility is quantified using Mean Absolute Error (MAE) and Mean Relative Error (MRE), defined as:

MAE = (1/|𝒬|) Σ_{q∈𝒬} |c(q) − ĉ(q)|,  MRE = (1/|𝒬|) Σ_{q∈𝒬} |c(q) − ĉ(q)| / max(c(q), δ),  (18)

where δ is a smoothing parameter (set to 1% of the total dataset size) to prevent division by zero for small counts. The results are averaged over ten repetitions with independent noise realizations.

Effect of Privacy Budget ε_total. Figs. 2(a) and 2(b) illustrate the quantitative relationship between privacy strength and data utility. All methods exhibit a convex error decay curve as ε_total increases from 0.5 to 3.0, reflecting the fundamental differential privacy trade-off. In the strict privacy regime (ε_total ∈ [0.5, 1.5]), our method achieves the steepest marginal reduction in MAE, indicating a high return on privacy budget investment. Specifically, when ε_total increases from 0.5 to 1.0, AdaTraj-DP reduces the MAE by approximately 45.3% (from 18.1 to 9.9), whereas the second-best baseline, HA-Tree, only achieves a 31.4% reduction. This quantitative gap demonstrates that AdaTraj-DP yields a significantly higher marginal utility gain for every unit of privacy budget expended compared to static hierarchical structures.

5.2.2. Preservation of spatial distribution

Spatial fidelity evaluates the geometric similarity between the original and perturbed trajectories. We use two complementary metrics: the Hausdorff Distance (HD) for worst-case deviation and the Mean Displacement (MD) for average positional distortion.

Effect of Privacy Budget ε_total. Fig. 3 and Table 1 summarize the spatial accuracy across privacy levels. For both the T-Drive and GeoLife datasets, AdaTraj-DP consistently achieves smaller deviations, demonstrating its robustness across data densities and spatial patterns. The sensitivity-guided perturbation preserves local consistency, while adaptive budget redistribution reduces distortion in dense urban regions.

Table 1
Spatial fidelity comparison (average over T-Drive and GeoLife datasets). Lower values indicate higher spatial accuracy.

ε_total  Hausdorff Distance (HD)              Mean Displacement (MD)
         AdaTraj-DP  Best Baseline            AdaTraj-DP  Best Baseline
0.5      0.152       0.171 (HA-Tree)          0.098       0.113 (HA-Tree)
1.0      0.096       0.127 (HA-Tree)          0.069       0.087 (HA-Tree)
1.5      0.089       0.125 (TFIDF-DP)         0.063       0.088 (TFIDF-DP)
2.0      0.083       0.118 (TFIDF-DP)         0.059       0.083 (TFIDF-DP)
3.0      0.079       0.130 (QJLP)             0.056       0.094 (QJLP)

Overall, AdaTraj-DP demonstrates consistent spatial and statistical accuracy across both datasets, validating its generalizability to heterogeneous mobility distributions.

5.3. RQ2: Model utility evaluation

This experiment evaluates how the differentially private trajectories generated by AdaTraj-DP retain their utility for AI-based downstream tasks. Two representative learning tasks are considered: (1) trajectory classification, which predicts the semantic category of a movement sequence; (2) destination prediction, which estimates the likely endpoint of an ongoing trajectory. These tasks are evaluated on the T-Drive and GeoLife datasets to reflect both dense and sparse urban mobility environments.

5.3.1. Trajectory classification

A hierarchical Transformer-based model with positional encoding is trained on the published trajectories to perform multi-class trajectory classification. The model architecture follows a standard encoder setup with three attention layers and a hidden size of 256. Each experiment is repeated five times under independent noise realizations, and the average classification accuracy and macro F1-score are reported. The total privacy budget ε_total is varied from 0.5 to 3.0.

Effect of Privacy Budget ε_total. Figs. 4(a) and 4(b) illustrate the influence of ε_total on model performance. As the privacy budget increases, both accuracy and F1-score improve across all methods. AdaTraj-DP consistently maintains the highest model utility on both datasets, demonstrating that adaptive sensitivity control effectively preserves discriminative features. The hierarchical tree representation mitigates local noise accumulation, supporting stable model convergence.

5.3.2. Destination prediction

To evaluate predictive consistency, a sequence-to-sequence neural decoder is trained to predict the destination region of each trajectory prefix. Prediction accuracy is measured by the top-1 hit rate, while spatial accuracy is quantified by the mean geodesic distance between predicted and true destinations.

Effect of Privacy Budget ε_total. Figs. 5(a) and 5(b) illustrate the results of destination prediction across both datasets. AdaTraj-DP maintains stable predictive performance even under strict privacy constraints (ε_total < 1.0), consistently outperforming fixed-budget baselines that cannot adapt to local sensitivity variations. As the privacy budget increases, the prediction accuracy steadily improves, while the mean spatial deviation between predicted and true destinations decreases. This demonstrates that adaptive perturbation and hierarchical encoding together preserve mobility semantics and ensure downstream models can effectively capture trajectory intent despite injected noise.
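The two spatial-fidelity metrics used in Section 5.2.2 can likewise be sketched straight from their definitions (a worst-case set distance and an average paired displacement); this is a generic implementation, not the paper's evaluation code, and it uses Euclidean rather than geodesic distance:

```python
import math

def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance: worst-case deviation between point sets."""
    h_ab = max(min(math.dist(a, b) for b in set_b) for a in set_a)
    h_ba = max(min(math.dist(a, b) for a in set_a) for b in set_b)
    return max(h_ab, h_ba)

def mean_displacement(original, perturbed):
    """Mean Displacement: average paired positional distortion of a trajectory."""
    return sum(math.dist(p, q) for p, q in zip(original, perturbed)) / len(original)
```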
(a) Hausdorff Distance vs. Privacy (b) Mean Displacement vs. Privacy
Budget Budget
Fig. 3. Spatial fidelity comparison on T-Drive and GeoLife datasets.
Fig. 4. Trajectory classification performance under varying 𝜀total on T-Drive and GeoLife datasets. (a) Classification Accuracy; (b) F1-score.

Fig. 5. Destination prediction accuracy and spatial deviation under varying 𝜀total on T-Drive and GeoLife datasets. (a) Destination Prediction Accuracy (Top-1 Hit Rate); (b) Destination Prediction Mean Distance Error (km).
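The destination metrics in Fig. 5 can be sketched in the same spirit. A hedged example, assuming latitude/longitude coordinates and using the haversine formula as the geodesic distance (the function names and the spherical-Earth approximation are ours, not the paper's):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def top1_hit_rate(pred_regions, true_regions):
    """Fraction of trajectories whose top-1 predicted region is correct."""
    hits = sum(p == t for p, t in zip(pred_regions, true_regions))
    return hits / len(true_regions)

def mean_distance_error_km(pred_coords, true_coords):
    """Mean geodesic distance between predicted and true destinations."""
    dists = [haversine_km(p[0], p[1], t[0], t[1])
             for p, t in zip(pred_coords, true_coords)]
    return sum(dists) / len(dists)
```

For example, `top1_hit_rate([1, 2, 3], [1, 2, 4])` evaluates to 2/3, and `haversine_km` of a point with itself is zero; the mean distance error aggregates such per-trajectory geodesic deviations into the single km value plotted in Fig. 5(b).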
5.4. RQ3: Parameter sensitivity analysis

This experiment investigates the effect of key parameters in AdaTraj-DP on the privacy–utility balance, focusing on two critical hyperparameters: the budget allocation ratio 𝛼 and the sensitivity threshold 𝜃TFIDF. All experiments are conducted with the total privacy budget 𝜀total = 1.5 on both the T-Drive and GeoLife datasets.

5.4.1. Effect of budget allocation ratio 𝛼

The parameter 𝛼 controls the distribution of the total privacy budget between the point-level perturbation and the hierarchical tree aggregation phases, where 𝜀point = 𝛼𝜀total and 𝜀tree = (1 − 𝛼)𝜀total. A small 𝛼 assigns more budget to aggregation, reducing hierarchical noise, whereas a large 𝛼 increases point-level fidelity at the expense of tree consistency. We vary 𝛼 from 0.1 to 0.9 and evaluate both data utility and model accuracy.

Fig. 6 presents the effect of 𝛼 on count query error (MAE) and trajectory classification accuracy. An optimal trade-off is observed near 𝛼 = 0.6, where both the query error and model accuracy achieve near-balanced performance. When 𝛼 < 0.4, excessive noise in point perturbation causes degraded spatial precision, while 𝛼 > 0.8 reduces the reliability of aggregated counts in the prefix tree, highlighting the necessity of coordinated budget allocation.

In practice, the optimal 𝛼 depends on the specific utility requirements. For applications prioritizing fine-grained point precision (e.g., destination prediction), a larger 𝛼 (e.g., 0.6–0.7) is recommended to allocate more budget to the perturbation phase. Conversely, for range query tasks relying on aggregate statistics, a smaller 𝛼 favors the hierarchical tree structure. An empirical strategy for parameter selection involves using a small, non-sensitive validation set to estimate the inflection point of the loss function. A balanced initialization of 𝛼 = 0.6 is recommended as a default setting, which prioritizes neither point-level perturbation nor structural aggregation excessively. To ensure privacy integrity, this validation set is constructed from public historical trajectory data (e.g., open-source T-Drive samples) or a disjoint subset of historical records that does not overlap with the private dataset. This separation guarantees that the hyperparameter tuning process relies solely on public knowledge and does not consume the privacy budget allocated for the sensitive data.

Fig. 6. Impact of budget allocation ratio 𝛼 on query utility and model performance at 𝜀total = 1.5.

5.4.2. Effect of sensitivity threshold 𝜃TFIDF

The threshold 𝜃TFIDF determines how many trajectory points are classified as sensitive during the TFIDF-based detection process. A smaller threshold labels more points as sensitive, resulting in stronger protection but higher noise magnitude. We vary 𝜃TFIDF from 0.6 to 1.2 and evaluate the mean displacement (MD) and destination prediction accuracy.

Fig. 7 depicts the variation of spatial fidelity and predictive utility under different 𝜃TFIDF values. As 𝜃TFIDF increases, the number of sensitive points decreases, leading to reduced perturbation intensity and smaller average displacement. However, excessively large 𝜃TFIDF weakens privacy coverage and slightly degrades downstream prediction accuracy. The optimal setting is observed around 𝜃TFIDF = 0.9, balancing spatial accuracy with model generalization.

Fig. 7. Effect of the sensitivity threshold 𝜃TFIDF on spatial fidelity and predictive performance at 𝜀total = 1.5.

5.4.3. Generalization and parameter stability

In the ablation studies presented above, we observed that the framework's utility is responsive to variations in the budget allocation ratio 𝛼 and sensitivity threshold 𝜃TFIDF, particularly when these parameters approach the boundaries of their respective ranges. This sensitivity necessitates a discussion on the model's generalization capabilities across different data distributions.

While the framework exhibits sensitivity to extreme parameter variations, it is worth noting that the optimal operating points (𝛼 ≈ 0.6, 𝜃TFIDF ≈ 0.9) remain consistent across both the high-density T-Drive dataset and the sparse, diverse GeoLife dataset. This cross-dataset stability suggests that AdaTraj-DP is robust to heterogeneous spatial distributions, indicating that a standard parameter configuration can yield reliable performance without the need for exhaustive hyperparameter retuning for every new application scenario.

5.5. Scalability analysis

To address practical deployment concerns, particularly for city-wide scenarios, we analyze the scalability of AdaTraj-DP regarding both dataset volume (number of users 𝑁) and temporal duration (trajectory length 𝐿).

Scalability to Large-scale User Datasets. The computational complexity of AdaTraj-DP is dominated by the linear scanning of trajectory points. Specifically, the sensitivity detection and adaptive perturbation phases operate on each trajectory independently, with a time complexity of 𝑂(𝑁𝐿). This independence allows for trivial parallelization across multiple processors, significantly reducing runtime on large-scale datasets. Furthermore, the hierarchical aggregation phase inserts encoded sequences into the prefix tree with a complexity of 𝑂(𝑁𝐿), avoiding the quadratic 𝑂(𝑁^2) pairwise comparisons often required by clustering-based or 𝐾-anonymity approaches. Consequently, the runtime of AdaTraj-DP grows linearly with the number of users, indicating that the framework is scalable to large-scale spatiotemporal datasets typical of modern urban computing.

Robustness for Long Historical Trajectories. For long historical trajectories, the challenge lies in maintaining structural efficiency and data utility as the sequence length increases. AdaTraj-DP addresses this through two mechanisms:

(1) Efficient Encoding: The Hilbert space-filling curve maps high-dimensional spatial points into 1D integers via efficient bit-wise operations. Since the encoding complexity is constant per point, the computational cost scales linearly with the trajectory length, avoiding the performance bottlenecks often associated with complex sequence alignment methods.

(2) Depth-Robust Aggregation: Long trajectories naturally necessitate deeper prefix trees, which typically suffer from severe budget dilution at lower levels. AdaTraj-DP addresses this through its logarithmic layer-wise allocation (Eq. (12)), which dampens the noise increase rate relative to tree depth. This mechanism ensures that the tail ends of extended mobility sequences retain analytical utility, preventing the rapid signal degradation commonly observed in uniform allocation schemes.

Fig. 8. Computational cost decomposition of AdaTraj-DP across three key stages.

Empirical Efficiency Evaluation. To complement the theoretical complexity analysis, Fig. 8 presents the empirical runtime decomposition of AdaTraj-DP on the T-Drive dataset. The total processing time is approximately 250 s. As observed, the TFIDF Analysis phase constitutes the majority of the computational overhead (approx. 60%) due to the necessity of global statistical aggregation across the spatial grid. However, the core privacy mechanisms (Prefix Tree Construction and Perturbation) demonstrate high efficiency. Notably, the adaptive perturbation phase accounts for less than 10% of the total time, confirming that the granular noise injection introduces negligible latency. This performance profile validates that AdaTraj-DP is well-suited for periodic batch publishing scenarios (e.g., releasing trajectory updates every 5–10 min for traffic monitoring). While the current execution time is sufficient for such batch-based near-real-time analytics, we acknowledge that strictly latency-critical streaming applications may require further optimization of the tree construction process. Nevertheless, for the targeted high-utility analysis tasks, this computational cost is a justifiable trade-off for the structural consistency provided by the framework.

6. Conclusion

This study presented AdaTraj-DP, an adaptive privacy-preserving framework for publishing trajectory data with differential privacy guarantees. The framework introduces context-aware sensitivity modeling and adaptive budget allocation to balance privacy protection and analytical utility in AI-based mobility analysis. By integrating personalized perturbation with hierarchical prefix-tree aggregation, AdaTraj-DP enables trajectory-level differential privacy while maintaining spatial fidelity and downstream model performance.

Future work will focus on extending AdaTraj-DP to support multi-modal trajectory data, integrating semantic and temporal context under unified privacy constraints. Additionally, to address the efficiency concerns in high-frequency streaming environments, we plan to investigate incremental tree update algorithms. This would allow the framework to handle real-time data streams with significantly lower latency while maintaining the established privacy guarantees.

CRediT authorship contribution statement

Yongxin Zhao: Writing – review & editing, Writing – original draft, Visualization, Validation, Methodology, Investigation, Data curation, Conceptualization. Chundong Wang: Writing – review & editing, Project administration, Methodology. Hao Lin: Visualization, Validation, Methodology. Xumeng Wang: Writing – review & editing, Methodology, Conceptualization. Yixuan Song: Methodology, Investigation, Conceptualization. Qiuyu Du: Investigation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Thanks to the National Key R&D Program of China (2023YFB2703900).

Appendix. Conversion from integer values to binary sequences

Our prefix tree construction necessitates the representation of each geographic coordinate as a character sequence. Although the Hilbert space-filling curve successfully transforms a two-dimensional coordinate 𝑝𝑖,𝑗 into a one-dimensional integer 𝑣𝑖,𝑗, this numerical value cannot be directly incorporated into a conventional prefix tree structure. Consequently, we implement an additional transformation phase that converts this integer into a binary sequence 𝑠𝑖,𝑗 with fixed length.

This transformation is controlled by the Hilbert curve's order parameter, designated as 𝑘. When applying a Hilbert curve with order 𝑘, the two-dimensional space becomes divided into a (2^𝑘) × (2^𝑘) cellular grid. To guarantee that every coordinate within dataset 𝐷 receives a distinct Hilbert index assignment, the order parameter must fulfill the condition 𝑘 ≥ ⌈log2 √|𝐷|⌉. This configuration assigns each cell, including any coordinate it contains, to a unique integer within the interval [0, (2^𝑘)^2 − 1].

The binary sequence length, denoted 𝐿enc, depends on the total count of representable integer values. Representing all (2^𝑘)^2 = 2^(2𝑘) distinct values necessitates a binary sequence of length 𝐿enc = 2𝑘. The transformation consists of a direct conversion from integer 𝑣𝑖,𝑗 to its 𝐿enc-bit binary form, applying leading zero-padding when needed to maintain uniform length.

Consider the following illustration: assume a Hilbert curve with order 𝑘 = 8. Under these conditions, the cellular count equals (2^8)^2 = 65,536, the integer value 𝑣𝑖,𝑗 resides within the interval [0, 65535], and the necessary binary sequence length becomes 𝐿enc = 2 × 8 = 16.

When coordinate 𝑝𝑖,𝑗 maps to integer 𝑣𝑖,𝑗 = 47593, its 16-bit binary sequence representation becomes:

𝑠𝑖,𝑗 = Encode(47593, 16) = "1011100111101001". (A.1)

This sequence 𝑠𝑖,𝑗 serves as the actual element for navigating and constructing the prefix tree. Individual bits within the sequence determine decisions at corresponding tree levels, establishing a multi-level spatial indexing structure. The selection of parameter 𝑘 (and consequently 𝐿enc) represents a crucial design choice that mediates between spatial granularity and the prefix tree's dimensions and computational overhead.

Data availability

Data will be made available on request.
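The appendix's integer-to-binary conversion is straightforward to make concrete. A minimal sketch, assuming only the order rule (2^𝑘)^2 ≥ |𝐷|, i.e. 𝑘 ≥ ⌈log2 √|𝐷|⌉, and the fixed length 𝐿enc = 2𝑘 stated in the appendix (helper names are our own); the worked example reproduces Eq. (A.1):

```python
import math

def min_hilbert_order(num_points):
    # Order k must satisfy (2**k)**2 >= num_points so that every
    # coordinate can receive a distinct cell index, i.e.
    # k >= ceil(log2(sqrt(num_points))).
    return math.ceil(math.log2(math.sqrt(num_points))) if num_points > 1 else 1

def encode(v, l_enc):
    """Fixed-length binary string of integer v, left-padded with zeros."""
    if not 0 <= v < 2 ** l_enc:
        raise ValueError("index out of range for the chosen order")
    return format(v, f"0{l_enc}b")

k = 8                     # Hilbert order: (2**8) x (2**8) = 65,536 cells
l_enc = 2 * k             # sequence length L_enc = 2k = 16 bits
s = encode(47593, l_enc)  # reproduces Eq. (A.1)
print(s)  # "1011100111101001"
```

Each bit of the resulting string then drives one branching decision per level during prefix-tree insertion, so `l_enc` directly bounds the tree depth.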
@@ -0,0 +1,979 @@
Computer Standards & Interfaces 97 (2026) 104116
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
Chaos experiments in microservice architectures: A systematic literature
review
Emrah Esen a, Akhan Akbulut a, Cagatay Catal b,∗
a Department of Computer Engineering, Istanbul Kültür University, 34536, Istanbul, Turkey
b Department of Computer Science and Engineering, Qatar University, Doha 2713, Qatar
ARTICLE INFO

Keywords: Chaos engineering; Microservice; Systematic literature review

ABSTRACT

This study analyzes the implementation of Chaos Engineering in modern microservice systems. It identifies key methods, tools, and practices used to effectively enhance the resilience of software systems in production environments. In this context, our Systematic Literature Review (SLR) of 31 research articles has uncovered 38 tools crucial for carrying out fault injection methods, including several tools such as Chaos Toolkit, Gremlin, and Chaos Machine. The study also explores the platforms used for chaos experiments and how centralized management of chaos engineering can facilitate the coordination of these experiments across complex systems. The evaluated literature reveals the efficacy of chaos engineering in improving fault tolerance and robustness of software systems, particularly those based on microservice architectures. The paper underlines the importance of careful planning and execution in implementing chaos engineering and encourages further research in this field to uncover more effective practices for the resilience improvement of microservice systems.
Contents

1. Introduction
2. Background
2.1. Microservice architecture
2.2. Microservice principles
2.3. Challenges/Troubleshooting/Failures in microservice architecture
2.4. Chaos engineering
3. Review protocol
3.1. Research questions
3.2. Search strategy
3.3. Study selection criteria
3.4. Study quality assessment
3.5. Data extraction
3.6. Data synthesis
4. Results
4.1. Main statistics
4.2. How is Chaos engineering effectively applied in production environments to enhance the resilience of software systems?
4.3. Which platforms have been used for chaos experiments?
4.4. How can Chaos engineering be effectively applied to microservice architecture to ensure successful implementation and enhance system resilience?
4.5. To what extent can the centralized provision of Chaos engineering effectively facilitate the management of chaos experiments across complex systems?
4.6. What are the challenges reported in the relevant papers?
5. Discussion
5.1. General discussion
5.2. Threats to validity
6. Conclusion
CRediT authorship contribution statement
Declaration of competing interest
Data availability
References

∗ Corresponding author.
E-mail address: ccatal@qu.edu.qa (C. Catal).
https://doi.org/10.1016/j.csi.2025.104116
Received 22 September 2024; Received in revised form 28 November 2025; Accepted 12 December 2025
Available online 15 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

E. Esen et al. Computer Standards & Interfaces 97 (2026) 104116
1. Introduction

In recent years, the adoption of microservice architecture has led to the transformation of application infrastructures into distributed systems. These systems are designed to enhance maintainability by decoupling services. The primary benefit of this architecture is the ease of maintenance of individual services within the microservice ecosystem due to their smaller and more modular nature [1]. However, despite these advantages, the distributed nature of microservices introduces significant challenges. Specifically, the complex management of services and their tight integration can considerably complicate software debugging. Debugging becomes complex in this architecture due to its distributed nature, the necessity to pinpoint the exact service causing the problem, and the dynamic characteristics of microservices. Consequently, debugging in microservice architecture demands a greater level of effort and specialized expertise compared to conventional monolithic architectures [2]. Moreover, it becomes quite challenging to predict what will happen if there is an unexpected error or if a service on the network goes out of service. Service outages can be caused by anything from a malicious cyberattack to a hardware failure to simple human error, and they can have devastating financial consequences. Although such unexpected situations are rare, they can interfere with the operation of distributed systems and devastatingly affect the live environment in which the application is located [3]. It is necessary to detect weak points in the system before an error occurs and spreads to the entire system.

Microservice architecture applications undergo testing procedures to ensure their quality and dependability. These include unit testing, service tests, end-to-end tests, behavior-driven tests, integration tests, and regression tests [4]. The comprehensive approach to microservices testing also encompasses live testing strategies for complex systems [5]. This thorough process emphasizes different aspects such as functionality, interoperability, and performance of individual services within the architecture. It aims to detect and resolve issues early to ensure stable and high-quality microservice applications [1,6]. However, considering that microservices consist of multiple services, the application should not have an impact on the user experience in cases such as network failures and suddenly increased service loads. For example, if the microservice that adds a product to favorites on a shopping site fails or responds late, the user should be able to continue the shopping experience. Therefore, testing operations in production-like environments become inevitable. No matter how distributed or complex the system is, there is a need for a method to manage unforeseeable situations that can build trust in the system against unexpected failures. Chaos engineering is defined as the discipline of conducting experiments in a live environment to test or verify the reliability of software [7].

The primary objective of this research is to conduct a thorough investigation into how chaos experiments are performed in the widely used microservices-based systems of today. Microservice architectures

challenges faced, and solutions. In addition, it will assess the effectiveness of chaos experiments in enhancing the reliability and robustness of microservice systems by using data obtained from real-world scenarios to develop strategic recommendations. This study is a critical step in understanding the applicability and impact of chaos engineering within the complexity of microservice architectures and aims to make significant contributions to the body of knowledge in this field. Recent research has applied chaos engineering for this architectural style; however, a systematic overview of the state-of-the-art on the use of chaos engineering in the microservice architecture is lacking. Therefore, a Systematic Literature Review (SLR) has been performed to provide an overview of how chaos engineering was applied.

This article primarily targets peer-reviewed research papers to maintain methodological consistency and ensure scholarly rigor. We specifically chose a systematic literature review (SLR) methodology because peer-reviewed academic studies are subject to rigorous validation processes, which enhance the reliability and validity of our findings [8,9]. Although excluding industry-specific, grey literature may restrict certain practical perspectives, this choice was deliberately made to avoid potential biases and uphold the scientific integrity of our review [10,11]. However, future studies could broaden the scope to incorporate industrial case studies and practical experiences, which would enrich our understanding of chaos engineering's applicability beyond the academic context.

The main contributions of this study are listed as follows:

1. To the best of our knowledge, this is the first study to employ a systematic literature review approach in the field of chaos engineering on microservice architecture applications [12]. The study provides an extensive systematic literature review of how chaos engineering can be applied to enhance the resilience of microservice architectures. It collates findings from various sources to provide insights into the current state of research and practice in this field.

2. The study categorizes and summarizes the range of chaos engineering tools and methods used in industry and academia, highlighting their functionalities in process/service termination, network simulation, load stressing, security testing, and fault injection within application code.

3. This research paper discusses contemporary techniques and approaches for implementing chaos engineering in microservice architectures. It also emphasizes the ongoing work in this field, offering a significant reference for future research endeavors. The paper systematically reviews existing literature to showcase how chaos engineering can enhance system resilience, laying a comprehensive groundwork for further exploration into chaos experimentation strategies and innovating new fault injection methods or tools within microservice architectures.

The rest of the paper is structured as follows: Section 2 explains the background and related work. Section 3 presents the methodology
have come to the forefront in modern software development processes of the research. Section 4 presents the results and Section 5 compre-
due to their advantages such as flexibility, scalability, and rapid de- hensively discusses the presented answers to research questions and
velopment. However, these architectures also bring unique challenges validity threats. Lastly, the conclusion is presented in Section 6.
due to complex service dependencies and dynamic operational environ-
ments. This study aims to comprehensively address the methodologies, 2. Background
application scenarios, and impacts of chaos experiments conducted
to test the resilience of microservice systems and identify potential The microservice approach breaks down a large application into a
weak points. The research intends to present the current state of chaos network of small, self-contained units, each running its own process
engineering practices by analyzing them, highlighting best practices, and often communicating through web APIs. Unlike large, single-piece
2
E. Esen et al. Computer Standards & Interfaces 97 (2026) 104116
monolithic systems, these small services are robust, easy to scale up or down, and can be updated individually using various programming languages and technologies. This structure allows development teams to be smaller and more agile, leading to faster updates and improvements. Yet, managing many interconnected services can become complicated, especially when something goes wrong. To enhance system reliability and resilience, a method known as chaos engineering is employed. This involves deliberately introducing problems into the live system to test its ability to cope and recover. This technique helps to uncover and rectify flaws, thereby making the system stronger overall. Regular and automated tests mimic real-life problems to ensure that the system can handle unexpected challenges and remain stable and efficient.

2.1. Microservice architecture

Microservice architectures have gained significant popularity in the software industry due to their ability to address the challenges and complexities of developing modern applications [6,13].

2.2. Microservice principles

Microservice architectures are based on the concept of decentralization, where each service is independently developed, deployed, and managed. This emphasizes autonomy and minimal inter-service dependencies. Each microservice is designed to focus on a single function or a closely related set of functions, and supports technology heterogeneity by allowing different services to use the technology stacks that best suit their needs. Resilience is a core aspect, with services built to withstand failures without affecting the entire system, while scalability enables services to be scaled independently as per demand. Communication occurs through lightweight mechanisms like HTTP/REST APIs, supporting continuous delivery and deployment practices. Due to the distributed nature of microservice architecture, comprehensive monitoring and logging for observability becomes crucial. Additionally, there is often an alignment between the microservice architecture and the organizational structure, involving small cross-functional teams responsible for individual services [14].

It is helpful to compare the microservice architecture to the monolithic architecture. The main difference between them is the dimensions of the developed applications. The microservice architecture can be thought of as developing an application as a suite of smaller services, rather than as a single, monolithic structure. Enterprise applications usually consist of three main parts: a client-side user interface (i.e., containing HTML pages and Javascript running on the user's machine in a browser), a database (i.e., composed of many tables, commonly relational, managed by a database management system), and a server-side application. In the server-side application, HTTP requests are processed, business logic is executed, and HTML views are prepared that retrieve data from the database, update it, and send it to the browser. This structure is a good example of a monolith. Any changes to the system involve creating and deploying a new version of the server-side application [15]. The cycles of change are interdependent: a change to a small part of the application requires rebuilding and deploying the entire monolith [6].

Microservice architecture, on the other hand, has some common features that distinguish it from monolithic architecture. These are componentization with services, organizing around business capabilities, smart interfaces and simple communication, decentralized governance, decentralized data management, infrastructure automation, and design for failure [16]. Today, although modern internet applications seem like a single application, they use microservice architectures behind the scenes. Microservice architecture basically refers to small, autonomous, and interoperable services. It has emerged due to increasing needs such as technology diversity, flexibility, scaling, ease of deployment, and ease of organization and management, and it provides various advantages in these matters. Its advantages are described as follows [17]:

Technology heterogeneity. Services are treated as small units, each running independently and communicating with each other using open protocols. While monolithic applications are developed with a single programming language and database system, the services included in a microservice ecosystem may use different programming languages and databases. This allows the advantages of each programming language and database to be exploited.

Resilience. When an error occurs in a monolithic application, the whole system is affected. In the microservice architecture, only the part under the responsibility of the relevant service is affected; the parts belonging to other services are not affected, and the user experience continues.

Scalability. While the scaling process on monolithic applications covers the entire application, in applications developed with microservice architecture only the services that are under heavy load need to be scaled. This prevents extra resource costs for partitions that do not need to be scaled and improves the user experience.

Deployment. Microservice architecture facilitates the autonomous deployment of individual services, enabling updates or changes without impacting others. Various deployment strategies, including blue-green, canary, and rolling deployment, minimize disruptions during the deployment process [18]. As a result, microservice architecture provides increased flexibility and resilience in deployment, distinguishing it from monolithic applications.

Organizational alignment. In software development processes, some challenges may be encountered due to large teams and large pieces of code. These challenges become more manageable with smaller teams. At the same time, this indicates that microservice applications allow us to form smaller and more cohesive teams. Each team is responsible for its own microservice and can take action by making improvements when necessary.

2.3. Challenges/Troubleshooting/Failures in microservice architecture

Microservice architectures pose numerous challenges. As the number of services increases, the complexity of service interactions also grows. Reliance on network communication leads to latency and network failure issues, while ensuring data consistency across multiple databases requires careful design and implementation of distributed transactions or eventual consistency models. Microservices bring typical distributed system challenges such as handling partial failures, dealing with latency and asynchrony, complex service discovery, load balancing in dynamic scaling environments, and managing configurations across multiple services and environments. Security concerns are heightened due to the increased inter-service communication surface area. Testing becomes more complex, involving individual service testing along with testing service interactions; deployment is challenging, especially when there are dependencies between services; effective observability and monitoring become crucial for timely issue resolution; versioning management is critical for maintaining system stability; and, lastly, assembling skilled teams proficient in DevOps, cloud computing, and programming languages presents a significant challenge. While adopting a distributed architecture enhances modularity, it inherently introduces operational complexities that differ significantly from monolithic structures. Recent research has also explored the use of hybrid bio-inspired algorithms to optimize this process dynamically. For instance, the Hybrid Kookaburra-Pelican Optimization Algorithm has been shown to improve load distribution and system scalability in cloud and microservice-based environments [19].

In conclusion, while microservices offer numerous advantages such as improved scalability, flexibility, and agility, they also introduce significant challenges in terms of system complexity, operational demands, and the need for skilled personnel and sophisticated tooling [20].
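A common first defense against the partial-failure and latency issues described above is to bound each inter-service call with retries and backoff. The sketch below is a minimal illustration of that pattern; the flaky inventory-lookup stub and all names in it are hypothetical, not drawn from any surveyed system:

```python
import time

def call_with_retry(operation, retries=3, base_delay=0.01):
    """Retry a flaky inter-service call with exponential backoff.
    Bounding retries keeps a slow dependency from stalling the caller
    indefinitely; a real system would also set per-call timeouts."""
    for attempt in range(retries):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # give up: let the caller degrade or report
            time.sleep(base_delay * (2 ** attempt))  # back off before retrying

# Hypothetical flaky dependency: fails twice, then succeeds.
calls = {"n": 0}
def flaky_inventory_lookup():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("inventory service timed out")
    return {"sku": "p42", "stock": 7}

assert call_with_retry(flaky_inventory_lookup)["stock"] == 7
assert calls["n"] == 3
```

The backoff doubles on each attempt so that a briefly overloaded service is not hammered while it recovers; after the final attempt the error is re-raised so the caller can fall back gracefully.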
2.4. Chaos engineering

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in a production-like environment [7,21]. It is the careful and planned execution of experiments to show how the distributed system will respond to a failure. It is necessary for large-scale software systems because it is practically impossible to simulate real events in test environments. Experiments based on real events are created together with chaos engineering [22]. By analyzing the test results, improvements are made where necessary, and in this way, it is aimed to increase the reliability of the software in the production environment.

Thanks to an experimental and systems-based approach, confidence is established in the survivability of these systems during collapses. Canary analysis collects data on how distributed systems react to failure scenarios by observing their behavior in abnormal situations and performing controlled experiments [23]. This method involves applying new updates or changes to a specific aspect of the system, enabling early detection of potential problems before they affect a larger scale. Chaos experiments consist of the following principles [24,25]:

• Hypothesize steady state: The first step is to hypothesize the steady state of the system under normal conditions.
• Vary real-world events: The next step is to vary real-world events that can cause turbulence in the system.
• Run experiments in production: Experimenters should run the experiments in a production-like environment to simulate real-world conditions.
• Automate experiments to run continuously: Experimenters should automate the experiments to run continuously, ensuring that the system can withstand turbulence over time.
• Minimize blast radius: The experiments should be designed to minimize the blast radius, i.e., the impact of the experiment on the system should be limited to a small area.
• Analyze results: Experimenters should analyze the results of the experiments to determine the system's behavior under turbulent conditions.
• Repeat experiments: The experiments should be repeated to ensure that the system can consistently withstand turbulence.

When an experiment is finished, information about its actual effect is fed back to the system.

3. Review protocol

Systematic review studies must be conducted using a well-defined and specific protocol. To conduct a systematic review study, all studies on a particular topic must be examined [12]. We followed the systematic review process shown in Fig. 1 and took all the steps to reduce the risk of bias in this study. Multiple reviewers were involved in the SLR process, and in cases of conflict, a brief meeting was organized to facilitate consensus. The first step was to define the research questions. Then, the most appropriate databases were selected. Based on the selected databases, automated searches were conducted and several articles were identified. Selection criteria were then established to determine which studies should be included in and excluded from this research. The titles and abstracts of all studies were reviewed. In cases of doubt, the full text of the publication was reviewed. Then, after the studies were analyzed in detail, the selection criteria were applied. All selected studies were assessed using a quality assessment process. Subsequently, the results were synthesized, listed, and summarized in a clear and understandable manner.

3.1. Research questions

Research Questions (RQs) and their corresponding motivations are presented as follows:

• RQ1: How is Chaos engineering effectively applied in production environments to enhance the resilience of software systems?
Motivation: Understanding the practical implementation of Chaos engineering in production environments is crucial for ensuring the resilience of software systems under real-world operating conditions.
• RQ2: Which platforms have been used for Chaos experiments?
Motivation: Identifying the platforms provides insights into the technological landscape and tools available for conducting Chaos engineering practices.
• RQ3: How is Chaos engineering effectively applied to microservice architectures to ensure its successful implementation in enhancing system resilience?
Motivation: Microservice architectures introduce new challenges in system design. Exploring the application of Chaos engineering in this context can help improve the resilience and fault tolerance of microservice systems.
• RQ4: To what extent can the centralized provision of Chaos engineering effectively facilitate the management of Chaos experiments across complex systems?
Motivation: Understanding the feasibility of providing Chaos engineering as a centralized service enables organizations to coordinate Chaos experiments across complex systems.
• RQ5: What are the challenges reported in the relevant papers?
Motivation: Identifying these challenges provides valuable insights into overcoming obstacles and advancing the adoption of Chaos engineering practices.

3.2. Search strategy

The primary studies were carefully selected from papers published between 2010 and 2022, because the topic has only become relevant in recent years. The databases are IEEE Xplore, ACM Digital Library, Science Direct, Springer, Wiley, MDPI, and Scopus. The initial search involved reviewing the titles, abstracts, and keywords of the studies identified in the databases. The search results obtained from the databases were stored in the data extraction form using a spreadsheet tool. Furthermore, this systematic review was conducted collaboratively by three authors.

The following search string was used to broaden the search scope:

((chaos engineering) OR (chaos experiments)) OR (microservices)

The results of the searches made in the databases mentioned above are shown in Fig. 2.

3.3. Study selection criteria

After applying the exclusion and inclusion criteria, 55 articles were obtained. The exclusion criteria in our study are as follows:

• EC-1: Duplicate papers from multiple sources
• EC-2: Papers without full-text availability
• EC-3: Papers not written in English
• EC-4: Survey papers
• EC-5: Papers not related to Chaos engineering

The inclusion criteria in our study are as follows:

• IC-1: Primary papers discussing the use of Chaos experiments in a microservice architecture
• IC-2: Primary publications that focus on Chaos engineering
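The screening pass implied by EC-1 through EC-5 can be expressed as a simple filter over candidate records. This is only an illustrative sketch; the record fields and example entries are invented for the demonstration and are not the actual extraction data:

```python
# Illustrative screening pass over candidate records (fields invented
# for the sketch); mirrors EC-1..EC-5 from Section 3.3.
papers = [
    {"id": 1, "title": "Chaos engineering in k8s", "lang": "en",
     "full_text": True, "survey": False, "on_topic": True},
    {"id": 1, "title": "Chaos engineering in k8s", "lang": "en",
     "full_text": True, "survey": False, "on_topic": True},   # duplicate (EC-1)
    {"id": 2, "title": "Microservice survey", "lang": "en",
     "full_text": True, "survey": True, "on_topic": True},    # survey paper (EC-4)
    {"id": 3, "title": "Fault injection study", "lang": "de",
     "full_text": True, "survey": False, "on_topic": True},   # not English (EC-3)
]

def passes_exclusion(p):
    """EC-2: full text available; EC-3: English; EC-4: not a survey;
    EC-5: related to chaos engineering."""
    return p["full_text"] and p["lang"] == "en" and not p["survey"] and p["on_topic"]

seen, selected = set(), []
for p in papers:
    if p["id"] in seen:       # EC-1: drop duplicates across sources
        continue
    seen.add(p["id"])
    if passes_exclusion(p):
        selected.append(p["id"])

assert selected == [1]  # only the first record survives all criteria
```

In the actual protocol this filtering was of course done by the reviewers over titles, abstracts, and full texts; the code only makes the decision rules explicit.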
Fig. 1. SLR review protocol. Source: Adapted from [26-28].
Fig. 2. Distribution of selected papers per database.
3.4. Study quality assessment

The assessment of each study's quality is an indicator of the strength of evidence provided by the systematic review. The quality of the studies was assessed using various questions, and studies of poor quality were not included in the present study. These criteria, based on established quality instruments, were adopted from methodological guides and other SLR research [12]. The following questions were used to assess the quality of the studies.

• Q1. Are the aims of the study clearly stated?
• Q2. Are the scope and experimental design of the study clearly defined?
• Q3. Is the research process documented adequately?
• Q4. Are all the study questions answered?
• Q5. Are the negative findings presented?
• Q6. Do the conclusions relate to the aim and purpose of the study, and are they reliable?

In this study, considering all these criteria, a general quality assessment was performed for each paper. The rating was 2 points for the "yes" option, 0 points for the "no" option, and 1 point for the "somewhat" option. The decision threshold for classifying a paper as poor quality was determined based on the mean value, which corresponds to a total of 5 points.

Fig. 2 presents the distribution of papers based on the databases where they were found at different selection stages. After the initial search, 4520 papers were retrieved, of which 55 remained after applying the selection criteria. After quality assessment, 31 papers were selected as primary studies. The 55 papers were carefully read in full, and the data required for answering the research questions were extracted. All the collected articles are listed in Table 1.

3.5. Data extraction

A data extraction form was created to capture, from the selected articles, the data required for answering the research questions. The form consists of several metadata fields, such as the author's first and last name, the title of the study, the publication year, and the type of study. In addition to this metadata, several columns were created to store the required information related to the research questions. By employing a data extraction form, we ensured that the relevant data required to answer each research question were systematically captured from the selected publications. This approach facilitated the subsequent synthesis of the findings. The data extraction process involved meticulous attention to detail and ensured the reliability and integrity of the data used in our systematic literature review.
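The scoring rule above (2, 1, or 0 points per question over six questions, with a poor-quality threshold of 5 points) can be written down directly. The sketch assumes that a total at or below the threshold marks a paper as poor quality; that exact boundary handling is our reading of the protocol, not stated explicitly in it:

```python
# Scores per question: 2 = "yes", 1 = "somewhat", 0 = "no".
POINTS = {"yes": 2, "somewhat": 1, "no": 0}

def assess_quality(answers, threshold=5):
    """Total the Q1-Q6 answers; papers whose total does not exceed the
    threshold (5 points, per the protocol above) are classified as
    poor quality and excluded from the review."""
    total = sum(POINTS[a] for a in answers)
    return total, total <= threshold

# Hypothetical paper: four clear strengths, one partial answer, one gap.
total, poor = assess_quality(["yes", "yes", "yes", "yes", "somewhat", "no"])
assert total == 9 and not poor

total, poor = assess_quality(["no", "somewhat", "no", "yes", "no", "no"])
assert total == 3 and poor
```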
Table 1
Selected primary studies.
ID Reference Title Year Database
S1 [29] Automating Chaos Experiments in Production 2019 ACM
S2 [25] Getting Started with Chaos engineering—design of an implementation framework in practice 2020 ACM
S3 [30] Human-AI Partnerships for Chaos engineering 2020 ACM
S4 [31] 3MileBeach: A Tracer with Teeth 2021 ACM
S5 [32] Service-Level Fault Injection Testing 2021 ACM
S6 [33] A Platform for Automating Chaos Experiments 2016 IEEE Xplore
S7 [34] Automated Fault-Tolerance Testing 2016 IEEE Xplore
S8 [35] Gremlin: Systematic Resilience Testing of Microservices 2016 IEEE Xplore
S9 [36] Fault Injection Techniques - A Brief Review 2018 IEEE Xplore
S10 [37] ORCAS: Efficient Resilience Benchmarking of Microservice Architectures 2018 IEEE Xplore
S11 [38] The Business Case for Chaos engineering 2018 IEEE Xplore
S12 [39] Use of Self-Healing Techniques to Improve the Reliability of a Dynamic and Geo-Distributed Ad Delivery Service 2018 IEEE Xplore
S13 [40] Security Chaos engineering for Cloud Services: Work In Progress 2019 IEEE Xplore
S14 [41] A Framework of Virtual War Room and Matrix Sketch-Based Streaming Anomaly Detection for Microservice Systems 2020 IEEE Xplore
S15 [42] CloudStrike: Chaos engineering for Security and Resiliency in Cloud Infrastructure 2020 IEEE Xplore
S16 [43] Identifying and Prioritizing Chaos Experiments by Using Established Risk Analysis Techniques 2020 IEEE Xplore
S17 [44] Fitness-guided Resilience Testing of Microservice-based Applications 2020 IEEE Xplore
S18 [24] A Chaos engineering System for Live Analysis and Falsification of Exception-Handling in the JVM 2021 IEEE Xplore
S19 [45] A Study on Chaos engineering for Improving Cloud Software Quality and Reliability 2021 IEEE Xplore
S20 [46] Chaos engineering for Enhanced Resilience of Cyber-Physical Systems 2021 IEEE Xplore
S21 [47] ChaosTwin: A Chaos engineering and Digital Twin Approach for the Design of Resilient IT Services 2021 IEEE Xplore
S22 [48] Platform Software Reliability for Cloud Service Continuity—Challenges and Opportunities 2021 IEEE Xplore
S23 [49] Trace-based Intelligent Fault Diagnosis for Microservices with Deep Learning 2021 IEEE Xplore
S24 [50] A Guided Approach Towards Complex Chaos Selection, Prioritization and Injection 2022 IEEE Xplore
S25 [51] Chaos Driven Development for Software Robustness Enhancement 2022 IEEE Xplore
S26 [22] Maximizing Error Injection Realism for Chaos engineering With System Calls 2022 IEEE Xplore
S27 [52] On Evaluating Self-Adaptive and Self-Healing Systems using Chaos engineering 2022 IEEE Xplore
S28 [53] Observability and chaos engineering on system calls for containerized applications in Docker 2021 ScienceDirect
S29 [54] Scalability resilience framework using application-level fault injection for cloud-based software services 2022 Springer
S30 [55] Chaos as a Software Product Line—A platform for improving open hybrid-cloud systems resiliency 2022 Wiley
S31 [56] The Observability, Chaos engineering, and Remediation for Cloud-Native Reliability 2022 Wiley
3.6. Data synthesis

To answer the research questions, the data obtained are collected and summarized in an appropriate manner; this step is called data synthesis. To perform the data synthesis, a qualitative analysis process was conducted on the data obtained. For instance, synonyms used for different categories were identified and merged in the respective fields. This comprehensive data synthesis approach allowed us to derive insights and draw conclusions from the collected information.

4. Results

The results section of the paper provides various insights into how chaos engineering is applied in production environments, particularly its use in improving the resilience and reliability of microservice architecture applications. The section discusses how fault detection is developed using chaos engineering tools and is mainly used in production for troubleshooting. Chaos experiments are usually conducted in the production environment to provide realistic results. The section further enumerates several tools that have been used for chaos experiments, as well as discussing general principles such as defining a steady state, forming a hypothesis, conducting the experiment, and proving or refuting the hypothesis. These principles and tools help detect problems like hardware issues, software errors, network interruptions, security vulnerabilities, and configuration mistakes within their respective contexts.

4.1. Main statistics

Fig. 3 shows the results of the quality assessment. The distribution of the years of publication is shown in Fig. 4. Most of the related studies were conducted in the last year, which shows that researchers' interest in chaos engineering has increased in recent years. Most of the studies included were indexed in the IEEE Xplore database.

Fig. 5 presents the distribution of the types of publications and the corresponding databases. While there are many journal papers, conference proceedings also appear in the selected papers.

Chaos engineering involves several categories of functionality that serve distinct purposes in resilience testing. The first category involves intentionally terminating processes or services to evaluate system behavior and recovery from failures [7]. Another category is network simulation, which allows engineers to replicate adverse network conditions to assess system performance and reliability [25]. In the stressing machine category, engineers subject the system to extreme loads to identify limits and potential bottlenecks [7]. In security testing, engineers simulate breaches or attacks to assess the system's response and enhance defenses [7]. Lastly, engineers use fault application code to inject targeted faults or errors into the codebase, assessing system resilience and error-handling capabilities [24]. These categories help organizations proactively identify weaknesses, strengthen system robustness, and enhance reliability in complex technology landscapes [7]. The functionality categories of the tools are presented in Fig. 6.

The tools utilized in industry settings are not comprehensively addressed in the articles. To provide insights for future research, the tools identified from the additional examination were categorized based on their functionality, as presented in Tables 2 and 3. Table 2 displays the tools obtained from the studies, while Table 3 presents additional tools that have been examined. Tools listed in the tables with corresponding references indicate their inclusion in the referenced articles.

4.2. How is Chaos engineering effectively applied in production environments to enhance the resilience of software systems?

Table 4 examines the successful implementation of Chaos engineering in operational settings, covering different aspects such as goals, techniques and resources, guiding principles, findings, limitations and substitutes, as well as the general strategy.

4.3. Which platforms have been used for chaos experiments?

Table 5 provides a concise summary of the various tools and platforms used in Chaos experiments, along with their specific functionalities or characteristics. It offers comprehensive insights into each platform through detailed descriptions accompanied by the necessary references.
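The experimental cycle summarized above (define a steady state, form a hypothesis, conduct the experiment, prove or refute the hypothesis) can be illustrated with a toy in-process harness. Everything here, from the replicated-service model to the function names, is an invented example rather than one of the surveyed tools:

```python
import random

def run_chaos_experiment(steady_state, inject_fault, restore, trials=20):
    """Minimal chaos-experiment loop: (1) confirm the hypothesized steady
    state, (2) inject a real-world fault, (3) observe whether the steady
    state holds, (4) restore (minimizing blast radius) and report."""
    if not steady_state():
        raise RuntimeError("system not in steady state; abort experiment")
    failures = 0
    for _ in range(trials):
        inject_fault()          # vary real-world events
        if not steady_state():  # did the hypothesis survive the fault?
            failures += 1
        restore()               # always clean up after each trial
    return {"trials": trials, "hypothesis_violations": failures}

# Toy system under test: a replicated service that stays healthy as long
# as at least one replica is up (all names are illustrative).
replicas = {"a": True, "b": True, "c": True}
steady = lambda: any(replicas.values())
kill_random = lambda: replicas.__setitem__(random.choice(list(replicas)), False)
heal_all = lambda: [replicas.__setitem__(k, True) for k in replicas]

report = run_chaos_experiment(steady, kill_random, heal_all)
assert report["hypothesis_violations"] == 0  # one replica always survives
```

Because each trial kills only one of three replicas and heals everything afterwards, the "at least one replica up" hypothesis is never violated; a real experiment would replace these stubs with observations of a live system.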
Fig. 3. Quality assessment scores.
Fig. 4. Year of publication.
Fig. 5. Diagram of the distribution of studies per search database.
Fig. 6. Functionality of chaos engineering tools.
Table 2
Chaos engineering tools from studies.
Chaos engineering tool Termination Network simulating Stressing machine Security Fault application code
Chaos Monkey [57] ×
Gremlin [35] × × × × ×
Chaos Toolkit [45] × × × × ×
Pumba [55] × ×
LitmusChaos [45] × × × ×
ToxiProxy [45] × ×
PowerfulSeal [45] × × × ×
Pod Reaper [25] ×
Netflix Simian Army [36] × × ×
WireMock [25] × ×
KubeMonkey [25] × × ×
Chaosblade [45] × × ×
ChaosTwin [47] × × × ×
Chaos Machine [24] × × ×
Cloud Strike [42] ×
Phoebe [22] ×
Mjolnirr [58] ×
ChaosOrca [37] × × ×
3MileBeach [31] × ×
Muxy [25] × × ×
Blockade [25] ×
Chaos Lambda [25] × ×
Byte-Monkey [25] ×
Turbulence [25] × × ×
Cthulhu [25] × × × ×
Byteman [25] × ×
ChaosCube [55] ×
Chaos Lemur [25] ×
Chaos HTTP Proxy [25] ×
Chaos Mesh [45] × × ×
Istio Chaos [45] ×
ChAP [33] × ×
IntelliFT [44] × × × ×
Table 3
Chaos engineering tools from our search.
Chaos engineering tool Termination Network simulating Stressing machine Security Fault application code
Pod Chaos × × ×
DNS Chaos ×
AWS Chaos × × ×
Azure Chaos × × × ×
GCP Chaos × × × ×
Table 4
Chaos engineering in production environments.
Category Description
Objective The primary objective of applying chaos engineering in production environments is to enhance the
resilience of software systems. This involves troubleshooting to identify and address potential
malfunctions before they occur. The overarching goal is to minimize issues in production through the
use of chaos engineering tools, enabling automatic fault detection [24,53].
Methods and tools Chaos engineering relies on specific tools to facilitate its effective application in production
environments. These tools aid in automatic fault detection, a crucial aspect of troubleshooting to
minimize potential issues in the production environment [24,53].
Principles and considerations The effective application of chaos engineering is closely tied to key principles and considerations.
These include continuous experimentation, serving as a form of robustness testing conducted in
real-world operational conditions. Fundamental principles of Chaos Experiments involve defining a
steady state, hypothesizing about its impact, conducting the experiment, and then demonstrating or
refuting the hypothesis [53].
Insights and results Chaos experiments conducted in the production environment provide valuable insights into the
behavior of the system. This is particularly significant as the production environment may exhibit
unpredictable behavior that differs from staging environments in some cases [24].
Constraints and alternatives While conducting chaos experiments in production is ideal, it is acknowledged that legal or technical
constraints may sometimes prevent this. In such cases, an alternative approach is considered, starting
chaos experiments in a staging environment and gradually transitioning to the production
environment [25].
Overall approach The overall approach for the effective application of chaos engineering in production environments
involves the systematic execution of chaos experiments. This includes leveraging chaos engineering
tools and taking into account the constraints and challenges associated with conducting experiments in
real-world operational settings. The aim is to proactively identify and address potential issues before
they impact the production environment, ultimately enhancing the resilience of software systems.
Table 5
Chaos engineering tools identified from selected papers.

The Chaos Machine: A tool for conducting chaos experiments at the application level on the Java Virtual Machine (JVM), using exception injection to analyze try-catch blocks for error processing [24].
Screwdriver: An automated fault-tolerance testing tool for on-premise applications and services, creating realistic error models and collecting metrics by injecting errors into the system [34].
Chaos Monkey: Designed by Netflix, this tool tests the system's resilience by randomly killing partitions to check system functionality [7,45].
Cloud Strike: A security chaos engineering system for multi-cloud security, extending chaos engineering to security by injecting faults impacting confidentiality, integrity, and availability [42].
ChaosMesh: An open-source chaos engineering platform for testing the resilience and reliability of distributed systems by intentionally injecting failures and disruptions [55].
Powerfulseal: An open-source tool for testing the resilience of Kubernetes clusters by simulating real-world failures and disruptions [55].
IntelliFT: A feedback-based, automated failure testing technique for microservice applications, focusing on exposing defects in fault-handling logic [44].
The Chaos Toolkit: Open-source software that runs experiments against the system to confirm a hypothesis [25,55].
Phoebe: A fault injection framework for reliability analysis concerning system call invocation errors, enabling full observability of system call invocations and automatic experimentation [22].
Mjolnirr: A private cloud platform with a built-in Chaos Monkey service for developing private PaaS cloud infrastructure [58].
ChaosOrca: A tool for chaos engineering on containers, perturbing system calls for processes inside containers and monitoring their effects [37].
Gremlin: Offered as a SaaS technology, Gremlin tests system resilience on various parameters and conditions, with capabilities for automation and integration with Kubernetes clusters and public clouds [35].
3MileBeach: A distributed tracing and fault injection framework for microservices, enabling chaos experiments through message serialization library manipulation [31].
ChAP: A software platform for running automated chaos experiments, simulating various failure scenarios and providing insights into system behavior under stress [29,33].
ChaosTwin: Utilizes a digital twin approach in chaos engineering to mitigate impacts of unforeseen events, constructing models across workload, network, and service layers [47].
Litmus Chaos: An open-source cloud-native framework for chaos engineering in Kubernetes environments, offering a range of chaos experiments and workflows [50].
Filibuster: A testing method in chaos engineering that introduces errors into microservice architecture to validate resilience and error tolerance [32].
E. Esen et al. Computer Standards & Interfaces 97 (2026) 104116
Table 6
Chaos engineering in microservices: approaches, descriptions, and expected outcomes.

Fault injection testing
Description: This method involves intentionally introducing errors into the system to assess its response, particularly in microservices by simulating various failure modes such as network issues, service outages, or resource shortages within or between microservices, to evaluate the system's resilience and stability [52].
Expected impact: Evaluating and enhancing the system's resilience and stability.

Hypothesis-driven experiments
Description: Key to chaos engineering is conducting experiments based on well-defined hypotheses about the normal state of the system and its expected behavior during failure scenarios. This strategic approach enables focused experiments that assess the resilience of both individual microservices and the overall system [45,53].
Expected impact: Identifying system weaknesses and increasing resilience.

Blast radius management
Description: Managing the blast radius of experiments is crucial in microservices. It involves understanding the potential impact of introduced failures, starting with small experiments and then expanding, to manage failure impacts while identifying system vulnerabilities [45].
Expected impact: Better understanding and enhancing the system's resilience.

Resilience requirement elicitation
Description: Utilizing chaos engineering to determine and analyze the resilience requirements of microservice architectures. This process involves observing the system's response to induced faults to identify specific resilience needs of each microservice and their interactions [52].
Expected impact: Understanding specific resilience needs of each microservice and their interactions.

Continuous testing and improvement
Description: Regularly conducting chaos experiments as part of an ongoing testing process ensures that microservices remain resilient against unforeseen issues. This continuous approach aids in proactively finding and fixing potential system weaknesses [56].
Expected impact: Proactive identification and resolution of system weaknesses, leading to continual improvement and increased resilience.

Observability and remediation
Description: Integrating chaos engineering with observability tools enhances the monitoring of microservices during fault injection, allowing for real-time tracking of responses to failures, aiding in the development of effective remediation strategies and overall system resilience improvement [56].
Expected impact: Real-time tracking of responses to failures and development of effective remediation strategies for overall system resilience improvement.
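The fault injection testing approach in Table 6 can be illustrated with a small wrapper that perturbs calls between services. This is a hedged sketch under invented names and probabilities (FaultInjector, get_price), not the API of any surveyed tool.

```python
import random
import time

class FaultInjector:
    """Wraps downstream calls and injects latency and/or connection errors."""

    def __init__(self, latency_s=0.0, error_rate=0.0, rng=None):
        self.latency_s = latency_s      # simulated network delay per call
        self.error_rate = error_rate    # probability of a simulated outage
        self.rng = rng or random.Random()

    def call(self, func, *args, **kwargs):
        if self.latency_s > 0:
            time.sleep(self.latency_s)              # network-issue failure mode
        if self.rng.random() < self.error_rate:     # service-outage failure mode
            raise ConnectionError("injected fault")
        return func(*args, **kwargs)

def get_price(item):
    """Stand-in for a call to another microservice."""
    return {"apple": 3}[item]

# With error_rate=1.0 every call fails, so the caller's fallback path is exercised.
injector = FaultInjector(error_rate=1.0)
try:
    price = injector.call(get_price, "apple")
except ConnectionError:
    price = None  # graceful degradation instead of a crash
```

Production tools inject such faults at the network or kernel level rather than in-process, but the experiment logic, perturb a dependency and observe whether the caller degrades gracefully, is the same.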
4.4. How can chaos engineering be effectively applied to microservice architecture to ensure successful implementation and enhance system resilience?

Table 6 provides a comprehensive overview of the different facets and projected implications of implementing chaos engineering within microservice architecture. By implementing these approaches and strategies, organizations can effectively integrate chaos engineering into their microservice architectures to uncover vulnerabilities and enhance the overall dependability of their systems.

4.5. To what extent can the centralized provision of chaos engineering effectively facilitate the management of chaos experiments across complex systems?

Table 7 provides an overview of the ways in which centralized chaos engineering can simplify experiment management in intricate systems. It emphasizes advantages like standardization, resource utilization, risk mitigation, and more, resulting in enhanced system resilience and performance.

4.6. What are the challenges reported in the relevant papers?

Table 8 concisely presents the primary obstacles in the area of chaos engineering and their respective resolutions. These obstacles encompass system intricacy, hazards to live environments, resource demands, security issues, and automation complexities. The proposed resolutions involve phased implementation, risk assessment, knowledge enhancement, robust security protocols, and automation approaches.

5. Discussion

In this section, we summarize the answers to the research questions. Chaos engineering can improve robustness by simulating real-world failure scenarios and exploring system reactions, especially in microservice architectures. Various tools for implementing chaos engineering were listed and compared. We conclude that the application of chaos engineering requires careful planning due to inherent challenges but has the potential to greatly improve system resilience.

5.1. General discussion

In this article, we reviewed the literature on the application of chaos engineering in microservice architecture to understand the state of the art. For this purpose, six research questions were defined and answered.

In RQ1, we aimed to understand how chaos engineering is applied to production environments. Chaos engineering, when adeptly applied in production settings, serves as a pivotal tool for augmenting the robustness of software systems. This approach entails conducting deliberate and controlled chaos experiments within the production environment, a strategy that is instrumental in uncovering and rectifying potential issues before they escalate into full-blown system failures, thereby bolstering system uptime [38]. Moreover, chaos engineering is characterized by the intentional injection of faults into systems. This methodology is crucial for identifying and addressing security flaws and risks, laying the groundwork for the development of resilient application architectures [56]. By replicating adverse conditions that could naturally arise in production settings, chaos engineering helps detect inherent system vulnerabilities and structural deficiencies, fostering a proactive stance towards issue mitigation [38].

Additionally, this practice involves comprehensive testing of real-world scenarios on operational systems. Such testing is vital for assessing the complete spectrum of software systems, encompassing both hardware malfunctions and software glitches, within their actual deployment contexts. This approach significantly contributes to the enhancement of overall system resilience [38]. To effectively implement chaos engineering, it is recommended to start with less complex experiments, leverage automation for these experiments, and focus on areas with either high impact or high frequency of issues. Observing the system at its limits is also crucial for reinforcing resilience [25].

In RQ2, we discuss various platforms that aim to increase the flexibility and reliability of microservice architectures through chaos experiments. Tools like Gremlin, Chaos Monkey, Chaos Toolkit, Pumba, LitmusChaos, ToxiProxy and PowerfulSeal have been utilized in industry settings to simulate different failure scenarios. These tools provide functions such as terminating processes, simulating network conditions, applying stress tests, testing security measures, and injecting faults to proactively identify weaknesses and strengthen system robustness across different technology landscapes.
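As a toy illustration of the random-termination idea behind Chaos-Monkey-style tools discussed in this review, the following sketch kills a fraction of instances and checks a quorum-based availability hypothesis. All names (chaos_monkey, cluster_available, node labels) and the quorum of three are invented for the example.

```python
import random

def chaos_monkey(instances, kill_fraction=0.2, seed=0):
    """Randomly terminate a fraction of instances; return survivors and victims."""
    rng = random.Random(seed)
    kills = max(1, int(len(instances) * kill_fraction))
    victims = set(rng.sample(instances, kills))
    survivors = [i for i in instances if i not in victims]
    return survivors, sorted(victims)

def cluster_available(survivors, quorum=3):
    """The resilience hypothesis under test: the service stays up while a quorum survives."""
    return len(survivors) >= quorum

nodes = [f"node-{i}" for i in range(5)]
survivors, victims = chaos_monkey(nodes)
# 4 of 5 nodes survive, so a quorum of 3 still holds.
print(cluster_available(survivors))
```

Real tools terminate actual virtual machines or pods and observe live traffic; the sketch only captures the experiment's shape: random disruption followed by a check of the availability hypothesis.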
Table 7
Centralized provision in chaos engineering.

Standardization
Description: Centralized provision allows for the standardization of chaos engineering practices and tools across the organization. This ensures that all teams follow consistent processes and use approved tools, leading to better coordination and more reliable results [42].
Expected impact: Improved coordination and reliability of results.

Resource optimization
Description: Centralized provision enables efficient allocation of resources for chaos experiments. It allows pooling of expertise, tools, and infrastructure, reducing redundancy and optimizing resource utilization [38].
Expected impact: Enhanced resource utilization and reduced redundancy.

Risk management
Description: Centralized provision facilitates better risk management by providing oversight and governance for chaos experiments. It establishes clear guidelines, safety measures, and expected states for running experiments in production environments, ensuring controlled experimentation [42].
Expected impact: Controlled experimentation and effective risk management.

Automation and continuous testing
Description: Centralized provision supports the automation of chaos experiments to run continuously. This ensures regular conduction of experiments, leading to ongoing validation of system resilience and identification of potential issues before they manifest as outages [38,42].
Expected impact: Ongoing validation of system resilience and early identification of potential issues.

Knowledge sharing and collaboration
Description: A centralized approach encourages knowledge sharing and collaboration among teams. It facilitates the dissemination of best practices, lessons learned, and successful experiment designs, fostering a culture of continuous improvement and shared learning [25].
Expected impact: Promotion of a continuous improvement culture and shared learning.

Performance metrics and analysis
Description: Centralized provision enables the establishment of standardized performance metrics and analysis methods for chaos experiments. This allows for consistent measurement of system health and identification of deviations from steady-state, leading to more effective decision-making and system improvements [43].
Expected impact: Consistent system health measurement and more effective decision-making.
Table 8
Challenges and solutions in chaos engineering.

Complexity
Challenge: Designing and executing effective chaos experiments in large systems is complex due to intricate interdependencies within these systems.
Possible solution: To mitigate complexity, it is recommended to start with smaller, more manageable experiments and gradually expand the scope of chaos engineering practices.
References: [25,43]

Risk of impact
Challenge: Concerns about causing disruptions in the production environment, affecting users and business operations.
Possible solution: Implementing risk analysis techniques can help prioritize experiments, focusing on less critical system components first to minimize potential impacts.
References: [45,50]

Resource intensiveness
Challenge: Significant resources needed, including time, expertise, and infrastructure, posing a barrier for many organizations.
Possible solution: Addressing resource intensiveness involves providing comprehensive training and education on chaos engineering best practices and tools to equip teams with the necessary skills and knowledge.
References: [7,47]

Security concerns
Challenge: Introducing controlled failures can raise security issues, potentially exposing vulnerabilities or sensitive data.
Possible solution: To combat security concerns, robust security measures should be implemented during experiments to safeguard sensitive data and prevent unauthorized access.
References: [42,47]

Tooling and automation
Challenge: Developing tools for automated chaos experiments is challenging in heterogeneous and dynamic environments.
Possible solution: Overcoming tooling and automation challenges requires the development and use of automated tools for chaos experiments, which reduce manual efforts and facilitate continuous, unattended testing.
References: [7,33,38,40,42]
Recent studies have emphasized the growing intersection between artificial intelligence and cybersecurity within the context of chaos engineering. AI-driven techniques are nowadays used for real-time threat detection, anomaly prediction, and automated response mechanisms in enterprise systems. For example, generative AI models have been proposed to enhance cybersecurity frameworks by improving data privacy management and identifying potential attack vectors [59].

In RQ3, we focused on understanding how chaos engineering is implemented in microservice architectures. To enhance system resilience in microservice architectures through chaos engineering, organizations should utilize fault injection testing to replicate failures within microservices. They should also conduct hypothesis-driven experiments with a solid comprehension of the normal state and anticipated behavior during disruptions, while managing the scope of these experiments to minimize impact. Additionally, it is essential to identify and analyze resilience requirements, participate in continuous testing and improvement efforts, as well as integrate observability tools for real-time monitoring during fault injection tests. Moreover, organizations need to establish clear communication channels across teams involved in order to ensure effective collaboration and knowledge sharing.

The answer to RQ4 highlights the significance of centralized management and monitoring in conducting chaos experiments within large-scale microservices ecosystems. It discusses the utilization of software solutions like Netflix's Chaos Automation Platform (ChAP) and fault injection techniques such as service call manipulation. The emphasis is placed on the need for careful planning, effective communication, risk management, and continuous learning to ensure comprehensive and valuable chaos experiments for enhancing overall system resilience.

In response to RQ5, our discussion concludes that the practical implementation of chaos engineering, despite its promise to enhance system resilience, presents numerous challenges. These challenges include potential business impacts, difficulty in determining scope, the unpredictability of outcomes, time and resource constraints, system complexities, skill and knowledge prerequisites, interpretation of results, cultural readiness, and selection of appropriate tools. These all necessitate meticulous planning and skilled execution for effectiveness.

Recent studies explore the convergence of chaos engineering and Artificial Intelligence (AI). Large language models (LLMs) have been used to automate the chaos engineering lifecycle, managing phases from hypothesis creation to experiment orchestration and remediation [60]. Meanwhile, advances in applying chaos engineering to multi-agent AI systems suggest new directions: for example, chaos experiments applied to LLM-based multi-agent systems can surface vulnerabilities such as hallucinations, agent failures, or inter-agent communication breakdowns [61]. Together, these works show how intelligent,
adaptive chaos frameworks might evolve in microservice-based systems as well.

Recent research also discusses specific operational challenges such as load balancing and security in the context of chaos engineering. For example, an empirical study applies delay injections under different user loads in cloud-native systems to observe how throughput and latency change under stress, providing insights into how load balancing policies perform under fault conditions [62]. In parallel, several frameworks have begun integrating security-focused chaos tests that intentionally inject faults into authentication, identity management, and access control components to ensure that security mechanisms remain effective under stress conditions [63]. These studies highlight how chaos engineering can be extended beyond performance reliability to proactively strengthen both load distribution and security resilience in microservice environments.

The main challenges faced by previous researchers and possible solutions have been discussed in the paper. The collected challenges were mainly related to the correct interpretation of chaos experiments and making sense of them. There may be more challenges, but if they were not mentioned in these articles, we could not include them. We believe that chaos engineering is still in the early stages and its adoption in the software industry will take some time.

5.2. Threats to validity

Internal validity
The validity of this systematic literature review is threatened by issues related to defining the candidate pool of papers, potential bias in selecting primary studies, data extraction, and data synthesis. The application of exclusion criteria can be influenced by the researchers' biases, posing a potential threat to validity. We compiled a comprehensive list of exclusion criteria, and all conflicts were documented and resolved through discussions among us. Data extraction validity is crucial as it directly impacts the study results. Whenever any of us was uncertain about data extraction, the case was recorded for resolution through discussions with the team. Multiple meetings were held to minimize researcher bias.

External validity
The search for candidate papers involved using general search terms to minimize the risk of excluding relevant studies. Despite using a broad search query to acquire more articles, there remains a possibility that some papers were overlooked in electronic databases or missed due to recent publications. Furthermore, although seven widely used online databases in computer science and software engineering were searched, new papers may not have been included.

6. Conclusion

Our systematic literature review (SLR) on chaos engineering has explored its role in enhancing the resilience of software systems in production environments. Through our review, we have identified several crucial aspects that underline the effective application and challenges of chaos engineering [25].

Firstly, chaos engineering serves as a proactive troubleshooting approach in production environments [25]. By identifying and addressing potential malfunctions before they occur, it effectively preempts system disruptions. This proactive strategy is significantly implemented by chaos engineering tools that assist in automatic fault detection, thereby minimizing potential issues in these critical environments [50].

Secondly, the essence of chaos engineering is rooted in continuous experimentation and robustness testing under real-world operational conditions. The methodology involves a systematic approach: defining a steady state, hypothesizing its impacts, conducting controlled experiments, and subsequently confirming or refuting the hypotheses. These experiments are insightful, as they reveal system behaviors in production environments, which often differ unpredictably from staging environments [36,53].

Furthermore, the effectiveness of chaos engineering is contingent on the systematic execution of chaos experiments. These experiments, utilizing advanced chaos engineering tools, need to navigate the constraints and challenges inherent in real-world operational settings. The main objective is the enhancement of system resilience, achieved by proactively identifying and preemptively addressing potential issues [46].

However, it is acknowledged that conducting chaos experiments directly in production environments might be impeded by legal or technical constraints. In such scenarios, initiating experiments in a staging environment and then gradually transitioning to the production environment offers a viable alternative. This approach ensures that the benefits of chaos engineering can still be realized, but in a more controlled and possibly less direct manner.

Our review highlights that chaos engineering is a critical methodology for ensuring the resilience and robustness of software systems. By following continuous experimentation and proactive troubleshooting, it offers a pathway to address the challenges faced in complex production environments. This SLR contributes to the scientific community by discussing these methodologies and their applications, thereby providing a framework for future research and practical implementation in the field of software system resilience.

CRediT authorship contribution statement

Emrah Esen: Writing - review & editing, Writing - original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation. Akhan Akbulut: Writing - review & editing, Writing - original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Formal analysis, Data curation. Cagatay Catal: Writing - review & editing, Writing - original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

[1] P. Jamshidi, C. Pahl, N.C. Mendonça, J. Lewis, S. Tilkov, Microservices: The journey so far and challenges ahead, IEEE Softw. 35 (3) (2018) 24-35, http://dx.doi.org/10.1109/MS.2018.2141039.
[2] I. Beschastnikh, P. Wang, Y. Brun, M.D. Ernst, Debugging distributed systems, Commun. ACM 59 (8) (2016) 32-37, http://dx.doi.org/10.1145/2909480.
[3] W. Ahmed, Y.W. Wu, A survey on reliability in distributed systems, J. Comput. System Sci. 79 (8) (2013) 1243-1255, http://dx.doi.org/10.1016/j.jcss.2013.02.006.
[4] D. Maruf, S. Sulistyo, L. Nugroho, Applying integrating testing of microservices in airline ticketing system, Ijitee (Int. J. Inf. Technol. Electr. Eng.) 4 (2020) 39, http://dx.doi.org/10.22146/ijitee.55491.
[5] F. Dai, H. Chen, Z. Qiang, Z. Liang, B. Huang, L. Wang, Automatic analysis of complex interactions in microservice systems, Complexity 2020 (2020) 1-12, http://dx.doi.org/10.1155/2020/2128793.
[6] J. Lewis, M. Fowler, Microservices: a definition of this new architectural term (2014), 2014, URL: http://martinfowler.com/articles/microservices.html.
[7] A. Basiri, N. Behnam, R. de Rooij, L. Hochstein, L. Kosewski, J. Reynolds, C. Rosenthal, Chaos engineering, IEEE Softw. 33 (3) (2016) 35-41, http://dx.doi.org/10.1109/MS.2016.60.
[8] R.T. Munodawafa, S.K. Johl, A systematic review of eco-innovation and performance from the resource-based and stakeholder perspectives, Sustainability 11 (2019) 6067, http://dx.doi.org/10.3390/su11216067.
[9] J.M. Macharia, Systematic literature review of interventions supported by integration of ICT in education to improve learners' academic performance in STEM subjects in Kenya, J. Educ. Pract. 6 (2022) 52-75, http://dx.doi.org/10.47941/jep.979.
[10] P. Gerli, J.N. Marco, J. Whalley, What makes a smart village smart? A review of the literature, Transform. Gov.: People Process. Policy 16 (2022) 292-304, http://dx.doi.org/10.1108/tg-07-2021-0126.
[11] R. Coppola, L. Ardito, Quality assessment methods for textual conversational interfaces: a multivocal literature review, Information 12 (2021) 437, http://dx.doi.org/10.3390/info12110437.
[12] B. Kitchenham, O. Pearl Brereton, D. Budgen, M. Turner, J. Bailey, S. Linkman, Systematic literature reviews in software engineering - A systematic literature review, Inf. Softw. Technol. 51 (1) (2009) 7-15, http://dx.doi.org/10.1016/j.infsof.2008.09.009, Special Section - Most Cited Articles in 2002 and Regular Research Papers.
[13] N. Dragoni, S. Giallorenzo, A.L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, L. Safina, Microservices: yesterday, today, and tomorrow, 2017, arXiv:1606.04036.
[14] P.D. Francesco, I. Malavolta, P. Lago, Research on architecting microservices: Trends, focus, and potential for industrial adoption, in: 2017 IEEE International Conference on Software Architecture, ICSA, 2017, pp. 21-30, http://dx.doi.org/10.1109/ICSA.2017.24.
[15] M. Fowler, Patterns of Enterprise Application Architecture, Addison-Wesley Longman Publishing Co., Inc., USA, 2002.
[16] J. Lewis, M. Fowler, Microservices, 2014, https://martinfowler.com/articles/microservices.html.
[17] S. Newman, Building Microservices: Designing Fine-Grained Systems, O'Reilly Media, Inc., 2021.
[18] C.K. Rudrabhatla, Comparison of zero downtime based deployment techniques in public cloud infrastructure, in: 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), I-SMAC, 2020, pp. 1082-1086, http://dx.doi.org/10.1109/I-SMAC49090.2020.9243605.
[19] S.R. Addula, P. Perugu.P, M.K. Kumar, D. Kumar, B. Ananthan, R. R, S. P, S. G, Dynamic load balancing in cloud computing using hybrid Kookaburra-Pelican optimization algorithms, in: 2024 International Conference on Augmented Reality, Intelligent Systems, and Industrial Automation, ARIIA, 2024, pp. 1-7, http://dx.doi.org/10.1109/ARIIA63345.2024.11051893.
[20] M. Waseem, P. Liang, M. Shahin, A systematic mapping study on microservices architecture in DevOps, J. Syst. Softw. 170 (2020) 110798, http://dx.doi.org/10.1016/j.jss.2020.110798.
[21] C. Rosenthal, N. Jones, Chaos Engineering: System Resiliency in Practice, O'Reilly Media, 2020.
[22] L. Zhang, B. Morin, B. Baudry, M. Monperrus, Maximizing error injection realism for chaos engineering with system calls, IEEE Trans. Dependable Secur. Comput. 19 (4) (2022) 2695-2708, http://dx.doi.org/10.1109/TDSC.2021.3069715.
[23] Š. Davidovič, B. Beyer, Canary analysis service, Commun. ACM 61 (5) (2018) 54-62, http://dx.doi.org/10.1145/3190566.
[24] L. Zhang, B. Morin, P. Haller, B. Baudry, M. Monperrus, A chaos engineering system for live analysis and falsification of exception-handling in the JVM, IEEE Trans. Softw. Eng. 47 (11) (2021) 2534-2548, http://dx.doi.org/10.1109/TSE.2019.2954871.
[25] H. Jernberg, P. Runeson, E. Engström, Getting started with chaos engineering - design of an implementation framework in practice, in: Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM '20, Association for Computing Machinery, New York, NY, USA, 2020, http://dx.doi.org/10.1145/3382494.3421464.
[26] A. Alkhateeb, C. Catal, G. Kar, A. Mishra, Hybrid blockchain platforms for the internet of things (IoT): A systematic literature review, Sensors 22 (4) (2022) http://dx.doi.org/10.3390/s22041304.
[27] R. van Dinter, B. Tekinerdogan, C. Catal, Predictive maintenance using digital twins: A systematic literature review, Inf. Softw. Technol. 151 (2022) 107008, http://dx.doi.org/10.1016/j.infsof.2022.107008.
[28] M. Jorayeva, A. Akbulut, C. Catal, A. Mishra, Machine learning-based software defect prediction for mobile applications: A systematic literature review, Sensors 22 (7) (2022) http://dx.doi.org/10.3390/s22072551.
[29] A. Basiri, L. Hochstein, N. Jones, H. Tucker, Automating chaos experiments in production, in: 2019 IEEE/ACM 41st International Conference on Software
[31] J. Zhang, R. Ferydouni, A. Montana, D. Bittman, P. Alvaro, 3MileBeach: A tracer with teeth, in: Proceedings of the ACM Symposium on Cloud Computing, SoCC '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 458-472, http://dx.doi.org/10.1145/3472883.3486986.
[32] C.S. Meiklejohn, A. Estrada, Y. Song, H. Miller, R. Padhye, Service-level fault injection testing, in: Proceedings of the ACM Symposium on Cloud Computing, SoCC '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 388-402, http://dx.doi.org/10.1145/3472883.3487005.
[33] A. Blohowiak, A. Basiri, L. Hochstein, C. Rosenthal, A platform for automating chaos experiments, in: 2016 IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW, 2016, pp. 5-8, http://dx.doi.org/10.1109/ISSREW.2016.52.
[34] A. Nagarajan, A. Vaddadi, Automated fault-tolerance testing, in: 2016 IEEE Ninth International Conference on Software Testing, Verification and Validation Workshops, ICSTW, 2016, pp. 275-276, http://dx.doi.org/10.1109/ICSTW.2016.34.
[35] V. Heorhiadi, S. Rajagopalan, H. Jamjoom, M.K. Reiter, V. Sekar, Gremlin: Systematic resilience testing of microservices, in: 2016 IEEE 36th International Conference on Distributed Computing Systems, ICDCS, 2016, pp. 57-66, http://dx.doi.org/10.1109/ICDCS.2016.11.
[36] R.K. Lenka, S. Padhi, K.M. Nayak, Fault injection techniques - a brief review, in: 2018 International Conference on Advances in Computing, Communication Control and Networking, ICACCCN, 2018, pp. 832-837, http://dx.doi.org/10.1109/ICACCCN.2018.8748585.
[37] A. van Hoorn, A. Aleti, T.F. Düllmann, T. Pitakrat, ORCAS: Efficient resilience benchmarking of microservice architectures, in: 2018 IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW, 2018, pp. 146-147, http://dx.doi.org/10.1109/ISSREW.2018.00-10.
[38] H. Tucker, L. Hochstein, N. Jones, A. Basiri, C. Rosenthal, The business case for chaos engineering, IEEE Cloud Comput. 5 (3) (2018) 45-54, http://dx.doi.org/10.1109/MCC.2018.032591616.
[39] N. Brousse, O. Mykhailov, Use of self-healing techniques to improve the reliability of a dynamic and geo-distributed ad delivery service, in: 2018 IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW, 2018, pp. 1-5, http://dx.doi.org/10.1109/ISSREW.2018.00-40.
[40] K.A. Torkura, M.I. Sukmana, F. Cheng, C. Meinel, Security chaos engineering for cloud services: Work in progress, in: 2019 IEEE 18th International Symposium on Network Computing and Applications, NCA, 2019, pp. 1-3, http://dx.doi.org/10.1109/NCA.2019.8935046.
[41] H. Chen, P. Chen, G. Yu, A framework of virtual war room and matrix sketch-based streaming anomaly detection for microservice systems, IEEE Access 8 (2020) 43413-43426, http://dx.doi.org/10.1109/ACCESS.2020.2977464.
[42] K.A. Torkura, M.I.H. Sukmana, F. Cheng, C. Meinel, CloudStrike: Chaos engineering for security and resiliency in cloud infrastructure, IEEE Access 8 (2020) 123044-123060, http://dx.doi.org/10.1109/ACCESS.2020.3007338.
[43] D. Kesim, A. van Hoorn, S. Frank, M. Häussler, Identifying and prioritizing chaos experiments by using established risk analysis techniques, in: 2020 IEEE 31st International Symposium on Software Reliability Engineering, ISSRE, 2020, pp. 229-240, http://dx.doi.org/10.1109/ISSRE5003.2020.00030.
[44] Z. Long, G. Wu, X. Chen, C. Cui, W. Chen, J. Wei, Fitness-guided resilience testing of microservice-based applications, 2020, pp. 151-158, http://dx.doi.org/10.1109/ICWS49710.2020.00027.
[45] S. De, A study on chaos engineering for improving cloud software quality and reliability, in: 2021 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications, CENTCON, Vol. 1, 2021, pp. 289-294, http://dx.doi.org/10.1109/CENTCON52345.2021.9688292.
[46] C. Konstantinou, G. Stergiopoulos, M. Parvania, P. Esteves-Verissimo, Chaos engineering for enhanced resilience of cyber-physical systems, in: 2021 Resilience Week, RWS, 2021, pp. 1-10, http://dx.doi.org/10.1109/RWS52686.2021.9611797.
[47] F. Poltronieri, M. Tortonesi, C. Stefanelli, ChaosTwin: A chaos engineering and digital twin approach for the design of resilient IT services, in: 2021 17th International Conference on Network and Service Management, CNSM, 2021, pp. 234-238, http://dx.doi.org/10.23919/CNSM52442.2021.9615519.
[48] N. Luo, Y. Xiong, Platform software reliability for cloud service continuity - challenges and opportunities, in: 2021 IEEE 21st International Conference on Software Quality, Reliability and Security, QRS, 2021, pp. 388-393, http://dx.doi.org/10.1109/QRS54544.2021.00050.
[49] H. Chen, K. Wei, A. Li, T. Wang, W. Zhang, Trace-based intelligent fault diagnosis for microservices with deep learning, in: 2021 IEEE 45th Annual Computers, Software, and Applications Conference, COMPSAC, 2021, pp. 884-893, http://dx.doi.org/10.1109/COMPSAC51774.2021.00121.
[50] O. Sharma, M. Verma, S. Bhadauria, P. Jayachandran, A guided approach
Engineering: Software Engineering in Practice, ICSE-SEIP, 2019, pp. 3140, towards complex chaos selection, prioritisation and injection, in: 2022 IEEE
http://dx.doi.org/10.1109/ICSE-SEIP.2019.00012. 15th International Conference on Cloud Computing, CLOUD, 2022, pp. 9193,
[30] L.B. Canonico, V. Vakeel, J. Dominic, P. Rodeghero, N. McNeese, Human-AI http://dx.doi.org/10.1109/CLOUD55607.2022.00025.
partnerships for chaos engineering, in: Proceedings of the IEEE/ACM 42nd [51] N. Luo, L. Zhang, Chaos driven development for software robustness enhance-
International Conference on Software Engineering Workshops, ICSEW 20, As- ment, in: 2022 9th International Conference on Dependable Systems and their
sociation for Computing Machinery, New York, NY, USA, 2020, pp. 499503, Applications, DSA, 2022, pp. 10291034, http://dx.doi.org/10.1109/DSA56465.
http://dx.doi.org/10.1145/3387940.3391493. 2022.00154.
13
E. Esen et al. Computer Standards & Interfaces 97 (2026) 104116

View File

@@ -0,0 +1,830 @@
Computer Standards & Interfaces 97 (2026) 104113
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
Co-distillation-based defense framework for federated knowledge graph
embedding against poisoning attacks
Yiqin Lu, Jiarui Chen, Jiancheng Qin
School of Electronic and Information Engineering, South China University of Technology, 510641, China
ARTICLE INFO

Keywords:
Federated learning
Knowledge graph
Poisoning attack
Knowledge distillation

ABSTRACT

Federated knowledge graph embedding (FKGE) enables collaborative knowledge sharing without data exchange, but it also introduces risks of poisoning attacks that degrade model accuracy or force incorrect outputs. Protecting FKGE from poisoning attacks has become a critical research problem. This paper reveals the malicious strategy of untargeted FKGE poisoning attacks and proposes CoDFKGE, a co-distillation-based FKGE framework for defending against poisoning attacks. CoDFKGE deploys two collaborative knowledge graph embedding models on clients, decoupling prediction parameters from shared parameters as a model-agnostic solution. By designing distinct distillation loss functions, CoDFKGE transfers clean knowledge from potentially poisoned shared parameters while compressing dimensions to reduce communication overhead. Experiments show CoDFKGE preserves link prediction performance with lower communication costs, eliminates malicious manipulations under targeted poisoning attacks, and significantly mitigates accuracy degradation under untargeted poisoning attacks.
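As background for the distillation machinery the framework relies on (formalized in Section 3.3), the standard temperature-scaled distillation loss can be sketched in a few lines of NumPy. This is a generic illustration of the common formulation, not code from the paper; the function names are invented.

```python
import numpy as np

def softened(logits, tau):
    """Temperature-scaled softmax: sigma(x) = softmax(x / tau)."""
    z = np.asarray(logits, dtype=float) / tau
    e = np.exp(z - z.max())          # shift by the max for numerical stability
    return e / e.sum()

def kd_loss(z_tea, z_stu, tau):
    """tau^2 * KL(softmax(z_tea/tau) || softmax(z_stu/tau)): the usual
    temperature-scaled distillation loss between teacher and student logits."""
    p_tea = softened(z_tea, tau)
    p_stu = softened(z_stu, tau)
    return tau ** 2 * float(np.sum(p_tea * np.log(p_tea / p_stu)))
```

Matching logits give zero loss, and the temperature controls how much probability mass the soft labels spread over non-argmax classes.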
1. Introduction

Knowledge graphs (KGs) are structured representations of real-world entities and their relationships, supporting applications in search engines [1,2], recommendation systems [3,4], and security analysis [5,6]. Knowledge graph embedding (KGE) techniques project entities and relations into low-dimensional vector spaces, enabling efficient knowledge reasoning and completion [7]. Due to privacy regulations and data sensitivity requirements, KGs across organizations within the same domain remain fragmented despite growing data volumes. In this context, federated knowledge graph embedding (FKGE) emerges as a collaborative learning technique for sharing KG embeddings without data exchange. However, the introduction of federation mechanisms brings new security risks: malicious participants can inject poisoned parameters during training or aggregation to launch a poisoning attack, degrading model accuracy or forcing incorrect outputs. Consequently, protecting FKGE systems against poisoning attacks has emerged as a critical research challenge.

Unlike graph neural network (GNN)-based models, KGE models usually rely on the translation-based model [8-11]. The embedding vectors of entities and relations in the KG are directly used as learnable parameters. KGE models utilize different score functions to measure the plausibility of triples (h, r, t). By contrasting the outputs of existing triples and negatively sampled triples, KGE models derive appropriate embeddings for entities and relations. However, real-world KGs of different organizations are often incomplete, making it difficult to train high-quality knowledge graph reasoning models. Moreover, KG data often contains a large amount of private data, and direct data sharing will inevitably lead to privacy leakage. For this reason, federated learning [12] is introduced into knowledge graph reasoning.

FKGE assumes that there are multiple participants with complementary but incomplete KGs, aiming to derive optimal knowledge embeddings for each participant without data exchange. Most existing studies [13-15] model FKGE as multiple clients that maintain local KGE models and a central server. Clients train models locally and upload the model parameters to the central server, which aggregates the parameters and then returns them to the clients.

However, since the embedding vectors are directly the model parameters, FKGE is highly vulnerable to poisoning attacks. With the intent to reduce model performance, steal sensitive information, or disrupt system stability, poisoning attacks refer to malicious modifications of parameters during local training or parameter aggregation on the server. To protect the participants of FKGE, it is necessary to propose a protection mechanism against FKGE poisoning attacks.

Moreover, other related indicators in FKGE deserve attention. For example, the federated learning of KGE requires frequent parameter
Corresponding author.
E-mail addresses: eeyqlu@scut.edu.cn (Y. Lu), ee_jrchen@mail.scut.edu.cn (J. Chen), jcqin@scut.edu.cn (J. Qin).
https://doi.org/10.1016/j.csi.2025.104113
Received 3 June 2025; Received in revised form 8 November 2025; Accepted 8 December 2025
Available online 9 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Y. Lu et al. Computer Standards & Interfaces 97 (2026) 104113
exchange, and the use of a translation-based model will submit the entity or relation embeddings, which makes the communication overhead greater than that of traditional federated learning.

Knowledge distillation [16] is a model compression technique that improves the performance of a simple (student) model by transferring the knowledge from a complex (teacher) model. Distillation-based methods are considered to be a feasible solution to combat poisoning attacks [17-19]: a teacher model can extract clean knowledge from the poisoned parameters and transfer it to a student model, thereby improving robustness without changing the model structure. Co-distillation [20] is a variant of knowledge distillation that trains two or more models simultaneously, allowing mutual learning and information sharing. This paper aims to design a federated knowledge graph defense framework based on co-distillation, which can enhance the model's resistance to poisoning attacks through collaborative learning without changing the original FKGE architecture.

The rest of this paper is organized as follows. Section 2 reviews the related work on FKGE and knowledge distillation. Section 3 introduces the preliminary concepts and methodologies essential for addressing FKGE poisoning attacks, with the main contributions of this paper summarized at the end of this section. In Section 4, we detail the threat model and malicious strategies for targeted and untargeted poisoning attacks in FKGE. Section 5 presents the CoDFKGE framework for defending against FKGE poisoning attacks, followed by experimental validation in Section 6. Finally, concluding remarks and future research directions are outlined in Section 7.

2. Related work

2.1. Basic FKGE framework

Early research on FKGE mainly focused on how to achieve cross-client knowledge sharing and model aggregation while protecting data privacy. FedE [13] is the first paper to introduce federated learning into KGE. FedE facilitates cross-client knowledge sharing by maintaining an entity table. Nevertheless, the mechanism of sharing entity embeddings in FedE has been proven to contain privacy vulnerabilities [21]: attackers can leverage the embedding information to infer the existence of private triples within client datasets. Based on FedE, FedEC [14] applies embedding contrastive learning for tackling data heterogeneity and utilizes a global update procedure for sharing entity embeddings. In response to the privacy vulnerability of FedE, FedR [15] proposed a privacy-preserving relation embedding aggregation method. By sharing relation embeddings instead of entity embeddings, FedR can significantly reduce the communication overhead and privacy leakage risks while retaining the semantic information of the KG.

2.2. Knowledge distillation in FKGE

Knowledge distillation techniques are widely applied in the FKGE field due to their advantages in model compression and knowledge transfer. To cope with the drift between local optimization and global convergence caused by data heterogeneity, FedLU [22] proposes mutual knowledge distillation; moreover, it contains an unlearning method to erase specific knowledge from local clients. FedKD [23] uses knowledge distillation to reduce communication costs, and proposes to adaptively learn the temperature that scales the scores of triples to mitigate teacher over-confidence issues. In addition to FKGE, the KGE model CoLE [24] proposes co-distillation learning to exploit the complementarity of graph structure and text information. It employs a Transformer and BERT for graph and text respectively, then distills selective knowledge from each other's prediction logits. Overall, existing research on knowledge distillation in FKGE primarily focuses on handling data heterogeneity, with insufficient exploration of its potential value in model security. This paper will explore the application of knowledge distillation in FKGE security to defend against poisoning attacks.

2.3. Poisoning attack in federated learning

Federated learning (FL), due to its distributed training nature, creates favorable conditions for poisoning attacks while protecting data privacy. Poisoning attacks in federated learning have attracted significant attention from researchers [25]. In federated learning scenarios, poisoning attacks pose serious threats to model security by manipulating partial training data or local models to embed malicious behaviors [26]. The literature [27] generates stealthy backdoor triggers by extracting high-frequency features from images using the discrete wavelet transform and introduces an asymmetric frequency confusion mechanism, achieving efficient backdoor attacks on multiple datasets. Meanwhile, many studies have proposed defense methods against poisoning attacks. The literature [28] proposes the Krum method, which selects the most reliable gradient update by evaluating the consistency of gradients, thereby effectively defending against poisoning attacks. The literature [29] proposes FL-Defender, which improves robustness by introducing cosine similarity to adjust the weights of parameter aggregation. The literature [30] proposed a two-stage backdoor defense method called MCLDef based on model contrastive learning (MCL), which can significantly reduce the success rate of backdoor attacks with only a small amount of clean data. In summary, existing research on poisoning attacks in federated learning mainly focuses on traditional deep learning domains. The design ideas of these defense frameworks have laid the foundation for subsequent poisoning attack defense methods for FKGE.

2.4. Security issues in FKGE

With the development of FKGE, its security and privacy issues have attracted increasing attention, with existing research mainly focusing on privacy leakage defense. The literature [31] proposed a decentralized scalable learning framework where embeddings from different KGs can be learned in an asynchronous and peer-to-peer manner while being privacy-preserving. The literature [21] conducts the first holistic study of the privacy threat on FKGE from both attack and defense perspectives; it introduced three new inference attacks and proposed a differentially private FKGE model, DP-Flames, with private selection and an adaptive privacy budget allocation policy. Based on [21], the literature [32] introduces five new inference attacks and proposed PDP-Flames, which leverages the sparse gradient nature of FKGE for a better privacy-utility trade-off.

Compared with privacy leakage issues, research on defending against poisoning attacks in FKGE is still in its early stages. Traditional federated learning typically does not directly transmit original embeddings. However, entity and relation embeddings are core components in translation-based KGE, so direct transmission of embeddings is required during FKGE aggregation. Direct malicious modifications to embeddings are difficult to effectively defend against using traditional federated learning defense methods.

The recent literature [33] is the first work to systematize the risks of FKGE poisoning attacks. However, it primarily focuses on several forms of targeted poisoning attacks in FKGE, without mentioning untargeted poisoning attacks. Although this research provides some defense suggestions, such as zero-knowledge proofs and private set intersection, it does not propose specific defense methods. In summary, the existing research lacks a systematic introduction to the untargeted poisoning attack on FKGE, and there is no complete defense method against FKGE poisoning attacks.

To address the above issues, this paper reveals the malicious strategy of FKGE untargeted poisoning attacks and proposes CoDFKGE, a co-distillation-based federated knowledge graph embedding framework for defending against poisoning attacks. The main contributions of this paper are summarized as follows.
(1) We systematically define untargeted poisoning attacks in FKGE and reveal the poisoning attack's malicious strategy, thereby enhancing threat identification in FKGE and providing a foundation for subsequent defense research.

(2) We propose CoDFKGE, the first co-distillation defense framework against poisoning attacks in FKGE. By deploying bidirectional distillation models with distinct distillation losses at the client side, CoDFKGE, as a model-agnostic solution, decouples prediction parameters from shared parameters, thereby enhancing the model's resistance to poisoning attacks and improving robustness. We designed distinct distillation loss functions for the two models in CoDFKGE, enabling CoDFKGE to transfer clean knowledge from potentially poisoned shared parameters and compress shared parameter dimensions, which reduces communication overhead.

(3) We validated the performance of CoDFKGE against poisoning attacks through experiments. The results show that, without compromising link prediction performance, CoDFKGE can completely eliminate targeted poisoning attacks and significantly mitigate the performance degradation caused by untargeted poisoning attacks, while simultaneously reducing communication overhead. Ablation experiments further confirm the effectiveness of the two distillation loss functions in CoDFKGE.

3. Preliminaries

3.1. Knowledge graph embedding

A KG can be represented as (ℰ, ℛ, 𝒯), where ℰ and ℛ are the entity set and relation set, and 𝒯 is a set of triples; a triple (h, r, t) ∈ 𝒯 indicates that a relationship r ∈ ℛ connects the entities h, t ∈ ℰ.

Translation-based KGE models project the entities and relationships in KGs into a continuous vector space. Models employ the scoring function g(h, r, t; θ) to evaluate the plausibility of triples, where θ represents the embedding parameters. During model training, negative samples (h, r, t') are constructed by randomly replacing the tail entities of positive triples. The training process aims to maximize the score discrepancy between positive and negative samples. Currently, most KGE models [9,11] employ the binary cross-entropy loss to measure the difference between positive and negative samples. Its mathematical expression is given in Eq. (1):

L = -Σ_{(h,r,t)∈𝒯} [ log σ(g(h, r, t; θ) - γ) + Σ_i p(h, r, t'_i; θ) log σ(γ - g(h, r, t'_i; θ)) ]   (1)

Among them, γ represents the margin, and (h, r, t'_i) is the i-th negative triple. p(h, r, t'_i; θ) stands for the occurrence probability of this negative sample given the embedding parameters θ.

3.2. Federated knowledge graph embedding

FKGE is an application of federated learning that aims to fuse and share knowledge vectors from different KGs to enhance the effectiveness of KGE. Currently, most related studies are based on the framework proposed in FedE [13].

The basic framework of FKGE consists of a client set C and a central server S. Each client c ∈ C holds a local KG 𝒢_c(ℰ_c, ℛ_c, 𝒯_c). The entity sets of different KGs are partially overlapping, so the understanding of entities in a certain client can be supplemented by information from other clients. The server has the one-hot existence matrix M ∈ ℝ^{C×N} of all entities in the clients, where N is the number of entities.

In each client, the KGE model parameters consist of local parameters θ_L and shared parameters θ_S. During FKGE training, each epoch progresses through two sequential phases: client update and server aggregation. In the k-th client update stage, client c first trains its local KGE model to update its local embedding θ^k_{L_c} and server-shared embedding θ^k_{S_c}. Then, client c uploads its shared embedding θ^k_{S_c} to the server. In the server aggregation stage, the central server S aggregates the shared embeddings from all clients to obtain the shared parameters θ^{k+1}_S. Finally, the server broadcasts the shared parameters θ^{k+1}_S to all clients. Entity embeddings in KGE are usually shared parameters, while relation embeddings are local parameters. Only rare literature [15] uses relation embeddings as shared parameters.

In FKGE, how the server effectively aggregates shared embeddings from different clients is a common problem. The most common FKGE server aggregation method is FedE [13], which is an improvement on FedAvg [12]. To handle the imbalance in the number of entities across different clients, FedE aggregates the shared entities using the number of occurrences in the local data as the weight w_c. This weight value can be obtained using the existence matrix M mentioned above. The mathematical expression for FedE's server aggregation method is shown in Eq. (2):

θ^{k+1}_S = Σ_c w_c θ^k_{S_c}   (2)

The final target of FKGE is to minimize the loss functions of all clients' local triples simultaneously through federated learning. Its optimization objective can be expressed as Eq. (3):

arg min_{(θ_{L_c}, θ_{S_c})} Σ_{c}^{C} L_c(θ_{L_c}, θ_{S_c})   (3)

3.3. Knowledge distillation

Knowledge distillation is a model compression technique that transfers the knowledge contained in a complex model (teacher) to a simple model (student) to improve the performance of the simple model. In the classic knowledge distillation framework, the student model's training loss comprises two components: the cross-entropy loss L_CE, computed between its output and the true label, and the distillation loss L_KD, computed between its output and the teacher model's output (soft label). In practical applications, the distillation loss is usually quantified using the Kullback-Leibler divergence D_KL between the student model output and the soft label, and its mathematical expression is shown in Eq. (4):

D_KL(p_tea ‖ p_stu) = Σ_i p_tea(i) log ( p_tea(i) / p_stu(i) )
L_KD = τ² D_KL( σ(z^(n)_tea) ‖ σ(z^(n)_stu) ),  where σ(x) = softmax(x / τ)   (4)

Among them, z_tea and z_stu are the logits of the teacher model and student model, respectively. τ is the temperature coefficient, which is used to control the smoothness of the output.

To allow the student model to effectively absorb the knowledge contained in the teacher model while fitting the real data distribution, the final loss function is usually the weighted sum of L_CE and L_KD.

4. Threat model

Poisoning attacks in federated learning can be categorized into targeted poisoning attacks, semi-targeted poisoning attacks, and untargeted poisoning attacks according to the intention of the attackers [34]. In FKGE, a semi-targeted poisoning attack can be regarded as a special case of a targeted poisoning attack. Therefore, this paper focuses on the targeted and untargeted poisoning attack types.

4.1. Targeted poisoning attack

Targeted poisoning attacks are an attack strategy where the attacker crafts specific malicious triples that do not exist in the target system, and manipulates the target model to accept these fake triples by injecting poisoned parameters into the shared parameters. This type of attack poses a serious threat to the application of FKGE, as the false relationships it introduces can lead to reasoning errors and decision-making
Fig. 1. Process of targeted poisoning attack.
Fig. 2. Framework of CoDFKGE model.
biases in downstream tasks. For example, in financial transaction networks, a knowledge graph is constructed with transaction entities as nodes and transaction relationships as edges. Link prediction can then be applied to detect potential transaction relationships (such as money laundering or fraud). If an attacker compromises one of the participants, they can introduce false transaction relationships through targeted poisoning attacks, leading to unreasonable inferences about the victim entity.

To execute such an attack successfully, the attacker typically follows a multi-stage process that begins with gathering the victim's local information. Fig. 1 shows the process of a targeted poisoning attack. In FKGE systems, while the server can observe the entities and relations each client possesses, it lacks visibility into how these elements are structured into specific triples. However, for frameworks that share entity embeddings (such as FedE [13]), recent research [21] has shown that a malicious server can use the KGE scoring function to infer the victim's local relationship patterns and reconstruct the victim's triples 𝒯_v. Armed with this inferred knowledge, the attacker strategically constructs malicious triples 𝒯_m that align with the victim's existing KG schema but represent false information.

The next critical attack phase involves training a shadow model, a surrogate KGE model designed to mimic the victim's learning process. The shadow model is trained on a poisoned dataset 𝒯_p, which combines the inferred victim triples 𝒯_v and the malicious triples 𝒯_m. This training strategy ensures the shadow model learns to generate embeddings that are consistent with both the victim's genuine knowledge and the attacker's deceptive information. The shadow model's parameters include θ_{S_p}, which can be initialized with the victim's shared parameters θ_{S_c}, and θ_{L_p}, which approximates the victim's local model parameters θ_{L_c} from random initial values. To ensure the shadow model effectively bridges both the victim's genuine knowledge and the attacker's malicious objectives, its parameters are optimized to minimize the loss function across all triples in the poisoned dataset, as formalized in Eq. (5):

arg min_{(θ_{S_p}, θ_{L_p})} Σ_{(h,r,t)∈𝒯_p} L(h, r, t; θ_{S_p}, θ_{L_p})   (5)

where L is the loss function of the baseline model.

After training the shadow model, the attacker extracts the poisoned shared parameters θ_{S_p} using the same procedure that legitimate clients employ to prepare parameters for server aggregation. The attacker can aggregate the poisoned parameters θ_{S_p} with the normal clients' shared parameters. The attacker usually operates as a compromised server and assigns a disproportionately high weight to the poisoned parameters during the aggregation process to ensure that the poisoned parameters dominate the aggregated shared parameters.

The final stage of the attack exploits the implicit trust in federated systems. The victim client, unaware of the poisoning, directly incorporates the compromised aggregated parameters into its local training process without validation. As a result, the victim's model gradually learns to accept the malicious triples as valid, ultimately producing incorrect predictions on these non-existent relationships while maintaining seemingly normal performance on other parts of the KG.
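The overweighting step described above can be illustrated with a toy NumPy sketch of FedE-style weighted aggregation (Eq. (2) in Section 3.2). The vectors and weights below are hypothetical, chosen only to show how a disproportionate weight lets a poisoned update dominate the aggregate.

```python
import numpy as np

def aggregate(shared_embs, weights):
    """FedE-style aggregation: theta_S = sum_c w_c * theta_{S_c} (cf. Eq. (2))."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize the client weights
    return np.tensordot(w, np.stack(shared_embs), axes=1)

honest = [np.array([1.0, 1.0]), np.array([0.9, 1.1])]
poisoned = np.array([-5.0, -5.0])            # attacker-crafted shared parameters

fair = aggregate(honest + [poisoned], [1.0, 1.0, 1.0])
rigged = aggregate(honest + [poisoned], [1.0, 1.0, 20.0])  # compromised server overweights the attacker

# the rigged aggregate is pulled far closer to the poisoned vector
assert np.linalg.norm(rigged - poisoned) < np.linalg.norm(fair - poisoned)
```

With equal weights the poisoned update is diluted by the honest clients; once the compromised server inflates its weight, the aggregate essentially becomes the attacker's vector.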
4.2. Untargeted poisoning attack

The conditions for achieving a targeted poisoning attack are complex. For example, FedR [15] shares only relation embeddings (not entity embeddings), preventing attackers from inferring victim relations via entity matrices and thus avoiding targeted poisoning attacks. Even with relational data leaks, targeted poisoning attacks are difficult: compared with sharing entity embeddings, the sparsity of relation embeddings reduces the shadow model's ability to align parameters with the victim's vector space. However, FedR has almost no defense effect against untargeted poisoning attacks.

An untargeted poisoning attack means that the attacker aims to disrupt victim model convergence or maximize the mispredictions among test cases. By maximizing the victim's loss function during training, attackers can force non-convergent predictions. The attacker can generate the poisoned shared parameter θ*_{S_v} for the victim, which can be formalized in Eq. (6):

arg max_{θ*_{S_v}} Σ_{(h,r,t)∈𝒯_v} L(h, r, t; θ*_{S_v}, θ_{L_v})   (6)

Among them, θ_{L_v} denotes the victim's local parameters and 𝒯_v is the victim's triple set. Since it is difficult for the attacker to obtain these two quantities directly, they can use random values as guesses for θ_{L_v} and use triples formed by random combinations of ℰ_v and ℛ_v as guesses for 𝒯_v.

In particular, for the TransE model [7] with the scoring function g(h, r, t) = |h + r - t|, the attacker can launch an untargeted poisoning attack by setting the shared parameters θ'_{S_v} sent to the victim to identical values or by using negated aggregation parameters. To avoid detection, noise is often added to the poisoned parameters. The prediction performance of the victim model may even be lower than that of standalone training without federated aggregation.

In general, the success of FKGE poisoning attacks relies on victims using attacker-provided aggregate parameters directly for training without validation. To prevent poisoning attacks, it is critical to isolate the parameters of the prediction model from externally provided aggregate parameters. Specifically, potentially poisoned shared parameters must be filtered before training. Meanwhile, minimizing parameter exposure to the external environment is essential. Therefore, we propose CoDFKGE, a defense FKGE framework based on co-distillation.

5. Model design

CoDFKGE is a training framework on the client side. Its training process is shown in Fig. 2. CoDFKGE initializes two baseline models. To facilitate the reproducibility of our CoDFKGE model, we provide the complete training framework pseudocode as shown in Algorithm 1.

Algorithm 1 CoDFKGE Training Framework
Require: Baseline KGE model g, training triples 𝒯, learning rate η, distillation weight β, distillation temperature τ, total iterations K
Initialization:
1: Initialize client-side prediction model with θ^P_0 = (θ^S_0, θ^L_0) ▷ Local parameters randomly initialized
2: Initialize client-side communication model with reduced feature dimensions
3: Initialize server-side aggregated parameters θ^S_1 = θ^S_0 ▷ First-round initialization
Main Training Loop (Iterations k = 1, 2, ..., K):
// Client Update Phase (For each client)
4: for each client c ∈ C do
5:   // Step 1: Communication-to-Prediction Model Distillation
6:   Load server-shared parameters θ^S_k ▷ Latest global shared embeddings
7:   Initialize communication model with θ^C = (θ^S_k, θ^{C_L}_{k-1})
8:   Freeze communication model parameters ▷ Acts as teacher model
9:   Compute distillation loss L^P_{k,KD} using Equation (7) ▷ Only positive samples
10:  Compute KGE loss L^P_{k,KGE} on training triples 𝒯
11:  Update prediction model parameters (θ^{P_S}_k, θ^{P_L}_k) with:
12:  ∇θ^P_k = ∇(β L^P_{k,KGE} + (1 - β) L^P_{k,KD}) ▷ Gradient flows through prediction model only
13:  θ^P_k = θ^P_k - η ∇θ^P_k, where θ^P_k = {θ^{P_L}_k, θ^{P_S}_k} ▷ Update prediction model parameters
14:  Unfreeze communication model parameters
15:  // Step 2: Prediction-to-Communication Model Distillation
16:  Freeze prediction model parameters θ^P_k ▷ Used as teacher model
17:  Compute distillation loss L^C_{k,KD} using Equation (9) ▷ Both samples
18:  Update communication model parameters (θ^{C_S}_k, θ^{C_L}_k) with:
19:  ∇θ^C_k = ∇L^C_{k,KD} ▷ Gradient flows through communication model only
20:  θ^C_k = θ^C_k - η ∇θ^C_k, where θ^C_k = {θ^{C_S}_k, θ^{C_L}_k}
21:  Upload updated shared parameters θ^{C_S}_k to server
22:  Unfreeze prediction model parameters
23: end for
// Server Aggregation Phase
with the same structure and scoring function, but for different purposes. 24: Server aggregates 𝜃𝑘𝑆 + 1 from all clients using baseline federated
The communication model is mainly responsible for receiving and aggregate method.
processing shared parameters, while the prediction model is used for 25: Set 𝑘 = 𝑘 + 1 and repeat main loop until 𝑘 > 𝐾 ⊳ Continue Main
the final embedding and prediction. To minimize potential parameter Training Loop
leakage and communication overhead, the feature dimension of the return Final prediction model parameters of each client.
communication model is intentionally designed to be smaller than that
of the prediction model.
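To make the client-side procedure above concrete, here is a minimal runnable sketch; it is our own toy, not the paper's implementation. Each model is reduced to a vector of triple scores, and one distillation step is a gradient step on τ²·D_KL(σ(teacher) ∥ σ(student)), whose gradient with respect to the student scores is τ(σ(student) − σ(teacher)). All function and variable names are ours.

```python
import numpy as np

def softened(scores, tau):
    """sigma(.): temperature-tau softmax of a vector of triple scores."""
    z = np.asarray(scores, dtype=float) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_step(student, teacher, lr=0.5, tau=2.0):
    """One gradient step on tau^2 * D_KL(sigma(teacher) || sigma(student));
    its gradient w.r.t. the student scores is tau*(sigma(student) - sigma(teacher))."""
    return student - lr * tau * (softened(student, tau) - softened(teacher, tau))

rng = np.random.default_rng(0)
pred = rng.normal(size=5)     # prediction model "scores" (stay on the client)
shared = rng.normal(size=5)   # server-shared, possibly poisoned, parameters

# Step 1: the communication model loads the shared parameters and, as a
# frozen teacher, distills into the prediction model.
comm = shared.copy()
for _ in range(1000):
    pred = kd_step(pred, comm)

# Step 2: roles reverse; the prediction model teaches the communication
# model, and only the communication model's shared part is uploaded.
for _ in range(1000):
    comm = kd_step(comm, pred)
upload = comm                 # prediction parameters are never uploaded
```

After step 1 the prediction model's softened scores match the communication model's; after step 2 the communication model has re-absorbed the prediction model's knowledge, and only the communication model's state ever reaches the server.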
During the training process, the two models learn collaboratively through knowledge distillation. Once the communication model receives the potentially poisoned shared parameters from the server, it acts as a teacher model to transfer clean knowledge to the prediction model. Following the training of the prediction model, the roles are reversed: the prediction model becomes the teacher, and the communication model serves as the student for distillation. This stage extracts knowledge from the prediction model and compresses it into the communication model, ensuring efficient knowledge sharing while minimizing parameter exposure and communication overhead. By deploying two distinct model instances, the framework physically isolates attacker-injected parameters from the prediction model's parameters, making poisoning attacks significantly more difficult to execute.

CoDFKGE is designed to be model-agnostic, enabling seamless integration with diverse FKGE models based on their shared parameter types. Both the communication and prediction models used by CoDFKGE clients utilize the same scoring function g as the original KGE model. Clients upload and utilize shared parameters identically to the baseline model, with these parameters maintaining the same form and dimensionality as the original implementation. This parameter compatibility enables the server to aggregate updates using existing federated learning aggregation methods without modification. This design ensures that CoDFKGE preserves the original knowledge representation capabilities while maintaining consistent operational semantics with the baseline model.
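The distillation machinery used in both directions (the temperature-softened KL term of Eqs. (7) and (9), and the self-adversarial weights of Eq. (10)) can be sketched as follows. This is a toy reading in which σ(·) is taken as a temperature softmax over a vector of candidate scores; the names and example values are ours, not the paper's code.

```python
import numpy as np

def softened(scores, tau):
    """Our reading of sigma(.): a temperature-tau softmax over candidate scores."""
    z = np.asarray(scores, dtype=float) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_term(teacher_scores, student_scores, tau=2.0):
    """tau^2 * D_KL(sigma(teacher) || sigma(student)): the building block of
    the distillation losses in Eqs. (7) and (9)."""
    p = softened(teacher_scores, tau)
    q = softened(student_scores, tau)
    return float(tau**2 * np.sum(p * (np.log(p) - np.log(q))))

def self_adv_weight(neg_scores, tau_alpha=1.0):
    """Eq. (10): p(h, r, t_i) = exp(tau_alpha*g_i) / sum_j exp(tau_alpha*g_j)."""
    z = tau_alpha * np.asarray(neg_scores, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

teacher = np.array([2.0, 0.5, -1.0])   # illustrative score vectors
student = np.array([1.0, 1.0, 0.0])
loss = kd_term(teacher, student)       # > 0; shrinks as the student matches
w = self_adv_weight([0.9, 0.1, -2.0])  # higher-scoring negatives weigh more
```

The full losses sum such terms over the training triples: Eq. (7) over positive samples only, Eq. (9) additionally over negatives reweighted by `self_adv_weight`.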
Y. Lu et al. Computer Standards & Interfaces 97 (2026) 104113
5.1. Communication to prediction model distillation

In the first iteration, the model trains the prediction component following the standard procedure. Starting from the second iteration of the training process, the communication model loads the server-shared parameters θ_k^S and initializes itself jointly with the local embeddings θ_{k−1}^L from the previous iteration's local prediction model.

After the communication model receives and applies the server-shared parameters, it filters out potentially poisoned model parameters through knowledge distillation. The communication model acts as a teacher model to transfer clean knowledge to the prediction model, which serves as the student model. During this process, the communication model parameters are frozen to ensure that the knowledge transfer direction is strictly from the communication model to the prediction model: gradients flow only through the prediction model parameters, while the frozen communication model prevents gradient leakage back to the potentially poisoned shared parameters.

If the communication model suffers from a poisoning attack and contains poisoned parameters, its outputs for negative samples are not reliable. Distilling or teaching such uncertain predictions would propagate noise rather than useful knowledge. To exclude the poisoned knowledge, the prediction model should focus on positive samples during distillation, ensuring that only trustworthy knowledge is transferred. The distillation loss of the prediction model in the kth training epoch is given in Eq. (7):

$$L_{KD}^{P_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{T}} D_{KL}\Big(\sigma\big(g(h, r, t;\ \theta_k^{S}, \theta_{k-1}^{PL})\big)\ \Big\|\ \sigma\big(g(h, r, t;\ \theta_k^{PS}, \theta_k^{PL})\big)\Big) \tag{7}$$

Among them, τ is the distillation temperature coefficient, and σ is the softmax function applied to the model outputs divided by τ. g represents the scoring function of the prediction model, which is also used to compute the KGE loss. g(h, r, t; θ_k^S, θ_{k−1}^{PL}) denotes the communication model output under the server-shared parameter θ_k^S and local parameter θ_{k−1}^{PL}, and g(h, r, t; θ_k^{PS}, θ_k^{PL}) denotes the output of the prediction model being trained.

When training with distillation, the model also needs to consider the KGE loss function. The overall loss of the prediction model is the weighted sum of the KGE loss and the distillation loss, given in Eq. (8):

$$L_k^{P} = \beta L_{KGE}^{P_k} + (1-\beta) L_{KD}^{P_k} \tag{8}$$

where L_{KGE}^{P_k} is the KGE loss of the kth epoch of the prediction model defined by Eq. (1), and β is the weight.

5.2. Prediction to communication model distillation

After training the prediction model, we train the communication model through distillation, which extracts and propagates knowledge without directly sharing prediction parameters, thereby avoiding privacy leakage. During the communication model's distillation, the outputs of the prediction model under positive and negative samples serve as soft labels. As Eq. (1) illustrates, the loss function must account for the probability of negative samples when balancing the impact of positive and negative predictions. Therefore, the distillation loss function of the communication model is formalized in Eq. (9):

$$L_{KD}^{C_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{T}} \bigg( D_{KL}\Big(\sigma\big(g(h, r, t;\ \theta_k^{PS}, \theta_k^{PL})\big)\ \Big\|\ \sigma\big(g(h, r, t;\ \theta_k^{CS}, \theta_k^{CL})\big)\Big) + \sum_i p(h, r, t_i)\, D_{KL}\Big(\sigma\big(g(h, r, t_i;\ \theta_k^{PS}, \theta_k^{PL})\big)\ \Big\|\ \sigma\big(g(h, r, t_i;\ \theta_k^{CS}, \theta_k^{CL})\big)\Big) \bigg) \tag{9}$$

Among them, g(h, r, t; θ_k^{CS}, θ_k^{CL}) represents the communication model output, and g(h, r, t; θ_k^{PS}, θ_k^{PL}) represents the prediction model output under the shared parameter θ_k^{PS} and local parameter θ_k^{PL}. The calculation of p follows the approach in [9], with its mathematical formulation given in Eq. (10):

$$p(h, r, t_i) = \frac{\exp\big(\tau_\alpha\, g(h, r, t_i)\big)}{\sum_j \exp\big(\tau_\alpha\, g(h, r, t_j)\big)} \tag{10}$$

where τ_α is the self-adversarial sampling temperature.

After the bidirectional distillation process of CoDFKGE, the communication model parameters are updated to θ_k^{CS} and θ_k^{CL}. The client then uploads θ_k^{CS} to the server, which aggregates these parameters from all clients using federated averaging to generate the next round's shared parameters θ_{k+1}^S.

6. Experiments

Experiments are conducted on the openly available dataset FB15K-237 [35], a subset of Freebase containing 14,505 entities, 544,230 triples, and 474 relations. To perform federated learning, we adopt the relational partitioning method of [22]. This method first partitions the relations through clustering, ensuring that the triple relations within each partition are as close as possible. These partitions are then divided into groups of roughly equal numbers of triples and distributed to the clients. This yields tighter triple relations within each client, better reflecting real-world scenarios.

The TransE model [7] is selected as the KGE model, serving as the foundation for all federated learning methods in the experiments, including the attacker's shadow model. To benchmark CoDFKGE, we select multiple baseline models. First, the locally trained model without federated learning is selected as the KGE baseline; it does not share parameters between clients, so it has no communication overhead and is not vulnerable to poisoning attacks. Then, FedE [13] and FedR [15] are chosen as baseline FKGE models, representing standard approaches in the field. Additionally, we implement a knowledge distillation model, which uses communication and prediction models similar to CoDFKGE but performs only unidirectional knowledge distillation. Specifically, it uses the communication model as the teacher and the prediction model as the student to filter out poisoning knowledge, with the distillation loss function following Eq. (4).

All experiments are performed on a 72-core Ubuntu 18.04.6 LTS machine with an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20 GHz and a V100S-PCIE-32GB GPU. We implemented the proposed FKGE framework and the baseline models based on PyTorch Geometric [36] and the distributed AI framework Ray [37]. We used KGE hyperparameter settings based on [9] and FKGE hyperparameter settings based on FedE [13]. Specifically, we used the Adam [38] optimizer with a learning rate of 1e−3, γ is 10, and the self-adversarial negative sampling temperature τ_α in KGE is 1. The distillation temperature τ is 2, and the weight β balancing the distillation and KGE losses is 0.5. The maximum number of training epochs is 400. In each epoch, a client performs 3 local iterations before uploading its parameters to the server.

We utilize the link prediction task, a sub-task of KGE, to validate the models' accuracy. Following the common implementation of link prediction, we employ the Mean Reciprocal Rank (MRR) and Hits@N as accuracy metrics. The MRR is the average of the reciprocals of the ranks of the predicted triples among all possible triples. Mathematically, if rank_i is the rank of the correct triple for the ith query and n is the total number of queries, then

$$MRR = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{rank_i}.$$

Hits@N is the proportion of query triples for which the correct triple is present among the top N candidates generated by the model. Generally, higher values for both metrics indicate better link prediction performance.

Through the experiments, the following research questions will be verified.

RQ1 Does CoDFKGE maintain KGE prediction performance while reducing FKGE communication overhead?
RQ2 Can CoDFKGE effectively defend against targeted poisoning attacks?
Table 1
Experiment results on normal link prediction.
Fed type Model Mem(MB) CC(MB) MRR Hits@1 Hits@5 Hits@10
Local Local(128) 57.05 0.4081 ± 0.0015 0.3066 ± 0.0014 0.5223 ± 0.0023 0.6077 ± 0.0015
Entity FedE(128) 185.58 42.60 0.4082 ± 0.0004 0.3068 ± 0.0012 0.5232 ± 0.0013 0.6080 ± 0.0018
Entity Distillation (128-128) 356.10 42.60 0.4129 ± 0.0008 0.3118 ± 0.0016 0.5279 ± 0.0008 0.6122 ± 0.0003
Entity CoDFKGE (128-128) 356.10 42.60 0.4109 ± 0.0043 0.3097 ± 0.0041 0.5246 ± 0.0044 0.6087 ± 0.0040
Entity Distillation (32-128) 217.39 10.65 0.3914 ± 0.0011 0.2935 ± 0.0008 0.5005 ± 0.0014 0.5838 ± 0.0032
Entity CoDFKGE (32-128) 217.40 10.65 0.4090 ± 0.0010 0.3079 ± 0.0007 0.5233 ± 0.0019 0.6068 ± 0.0019
Relation FedR(128) 75.49 0.69 0.4085 ± 0.0011 0.3079 ± 0.0021 0.5219 ± 0.0016 0.6066 ± 0.0017
Relation Distillation (128-128) 151.74 0.69 0.4106 ± 0.0013 0.3092 ± 0.0023 0.5242 ± 0.0008 0.6098 ± 0.0009
Relation CoDFKGE (128-128) 150.02 0.69 0.4065 ± 0.0007 0.3056 ± 0.0013 0.5190 ± 0.0023 0.6063 ± 0.0012
Relation Distillation (32-128) 94.53 0.17 0.3920 ± 0.0012 0.2960 ± 0.0007 0.4996 ± 0.0019 0.5807 ± 0.0013
Relation CoDFKGE (32-128) 93.69 0.17 0.4078 ± 0.0009 0.3060 ± 0.0007 0.5224 ± 0.0031 0.6074 ± 0.0015
RQ3 Can CoDFKGE effectively defend against untargeted poisoning attacks?
RQ4 Do the two proposed distillation loss functions individually contribute to poisoning defense?

6.1. Normal link prediction (RQ1)

To explore the performance of the proposed model in normal link prediction, we first test the model on a conventional dataset. The performance of the model is measured using MRR, Hits@1, Hits@5, and Hits@10. Each model is trained by federated learning and evaluated on the local test sets of the clients.

Table 1 lists the performance of the local KGE model, FedE, FedR, and CoDFKGE with different dimensions. The experimental results are grouped according to the type of shared embeddings and the dimension of the prediction model. The parameter dimensions are specified in parentheses within the Model column; for example, CoDFKGE(32-128) denotes the CoDFKGE model with a 32-dimensional communication model and a 128-dimensional prediction model. All link prediction experiments were repeated 5 times with different random seeds, and the accuracy results of all models are reported as (mean ± standard deviation). The best-performing model results in each group (excluding the local model) are bolded. The results of the CoDFKGE(32-128) model that are better than those of Distillation(32-128) are underlined.

The performance of locally trained models is lower than that of most federated learning models, highlighting the advantage of sharing model parameters. High-dimensional Distillation(128-128) models achieve better link prediction performance. Compared to Distillation(128-128), CoDFKGE models show slightly inferior prediction performance; the co-distillation process in CoDFKGE may lead to a loss of generalization accuracy. However, comparing models with the same dimensions, CoDFKGE outperforms both the local baselines and the federated baselines (FedE, FedR). We believe that the main advantage of CoDFKGE is its ability to enhance the security of FKGE: in addition to the security performance demonstrated in Sections 6.2 and 6.3, it maintains link prediction performance comparable to its baseline FKGE models.

Beyond accuracy metrics, the CC (Communication Cost) column reports the communication overhead per training epoch, calculated from the byte size of the PyTorch Embedding used in the implementation. The Mem column shows the GPU memory usage of the federated models in MB. Distillation-based models require maintaining two KGE models, resulting in higher computational resource consumption, and need larger GPU memory to store the parameters of both models. Compared to using model parameters of the same size, distillation-based models allow compressing the parameters of the communication model, achieving significantly lower communication overhead. At this smaller communication overhead, CoDFKGE(32-128) outperforms Distillation(32-128) in link prediction performance. Therefore, we conclude that the CoDFKGE model does not degrade the normal link prediction performance of baseline FKGE models and can effectively reduce the communication overhead of the model.

6.2. Targeted poisoning attack experiment (RQ2)

In the targeted poisoning attack, 32 pairs of non-existent triples are selected as attack targets from the victim's KG through negative sampling to construct a poisoned triple dataset. First, a predetermined number of normal triples are selected from the victim's training triples. Subsequently, the head or tail nodes of these triples are randomly replaced, and any triples already existing in the training set are iteratively removed, until 32 pairs of non-existent triples are successfully constructed. In each epoch, the shadow model undergoes the same number of local training rounds as the legitimate clients on the poisoned dataset to generate poisoned parameters. The malicious server aggregates these poisoned parameters with the parameters of the normal clients into shared parameters and distributes them to all clients. Attackers can assign high weights to the poisoned model parameters during aggregation; following the setup in Ref. [33], we set the weight of the attacker's aggregated poisoned triples to be 256 times that of normal triples. The experiments focus on models with shared entity parameters (required for targeted poisoning attacks) and the non-federated local baseline.

For space considerations, this section reports only the MRR and Hits@10 metrics. Attack effectiveness is measured by the MRR and Hits@10 of the poisoned triples on the victim: higher metrics for the poisoned triples indicate greater vulnerability to poisoning and weaker resistance to targeted poisoning attacks.

Table 2 lists the performance of the baseline models and CoDFKGE under targeted poisoning attacks, grouped by the prediction model dimension. The parameter dimensions are specified in parentheses within the Model column. The All Clients column reports the average performance across all clients' test sets during attacks, while Victim Poisoned measures the victim's performance on predicting the poisoned triples. All experiments were repeated 5 times with different random seeds, and the results are reported as (mean ± standard deviation). The best-performing model results are bolded. Moreover, the Communication Poison column reports the communication model's performance on the poisoned triples for CoDFKGE and the distillation model, showing that both communication models are impacted by targeted poisoning attacks. Through distillation, the prediction accuracy on the poisoned triples by the prediction model decreases in both cases.

For targeted poisoning attacks, the primary evaluation metrics should be the MRR and Hits@10 of the victim model when predicting the poisoned triples. The Local training model, which does not employ federated learning, remains immune to poisoning attacks, resulting in a low MRR for the poisoned triples, with the Hits@10 value being exactly 0. This indicates that the unpoisoned Local model never includes the non-existent poisoned triples among its top-10 candidate results. If a model incorrectly ranks non-existent poisoned test triples among its top-10 candidates, the poisoning attack has successfully manipulated the model's predictions. Therefore, we use Hits@10 as the metric to measure the Attack Success Rate (ASR).
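The poisoned-triple construction just described can be sketched as follows. This is a simplified reading: the toy KG, the 50/50 head-or-tail choice, and all names are our assumptions, not the paper's code.

```python
import random

def make_poisoned_triples(train_triples, entities, n_poisoned=32, seed=0):
    """Corrupt the head or tail of sampled training triples, discarding any
    candidate that already exists, until n_poisoned non-existent triples remain."""
    rng = random.Random(seed)
    existing = set(train_triples)
    poisoned = []
    while len(poisoned) < n_poisoned:
        h, r, t = rng.choice(train_triples)
        if rng.random() < 0.5:
            cand = (rng.choice(entities), r, t)   # replace the head entity
        else:
            cand = (h, r, rng.choice(entities))   # replace the tail entity
        if cand not in existing and cand not in poisoned:
            poisoned.append(cand)
    return poisoned

# Toy KG purely for illustration.
entities = [f"e{i}" for i in range(20)]
train = [(f"e{i}", "rel", f"e{i+1}") for i in range(10)]
targets = make_poisoned_triples(train, entities)
```

The shadow model would then be trained on `targets` to produce the poisoned parameters that the malicious server mixes into the aggregate.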
Table 2
Experiment results under targeted poisoning attacks.
Model All clients Victim poison Communication poison
MRR Hits@10 MRR Hits@10(ASR) MRR Hits@10
Local(128, unpoisoned) 0.4081 ± 0.0015 0.6077 ± 0.0015 0.0003 ± 0.0001 0.0000 ± 0.0000
FedE(128) 0.4034 ± 0.0035 0.6004 ± 0.0029 0.4450 ± 0.0938 0.7857 ± 0.1248
Distillation(128-128) 0.4026 ± 0.0025 0.6006 ± 0.0039 0.0844 ± 0.0552 0.2000 ± 0.1311 0.4999 ± 0.1429 0.7714 ± 0.1046
CoDFKGE(128-128) 0.4086 ± 0.0007 0.6089 ± 0.0012 0.0010 ± 0.0003 0.0009 ± 0.0005 0.4694 ± 0.1511 0.6589 ± 0.1242
Distillation(32-128) 0.3821 ± 0.0022 0.5717 ± 0.0018 0.1511 ± 0.3356 0.1960 ± 0.4362 0.4919 ± 0.2364 0.6625 ± 0.1887
CoDFKGE(32-128) 0.3856 ± 0.0039 0.5740 ± 0.0054 0.0010 ± 0.0001 0.0010 ± 0.0003 0.3794 ± 0.0032 0.5702 ± 0.0050
Fig. 3. Performance degradation comparison.
The FedE model maintains high prediction accuracy on normal test triples when under attack, but exhibits abnormally high MRR and Hits@10 metrics for the targeted poisoned triples, even exceeding those of normal triples. This indicates that targeted poisoning attacks can effectively manipulate the FedE model into generating incorrect predictions. Similarly, in the distillation-based models, the communication models are severely affected by poisoning attacks, while the impact on the prediction models is relatively minor. Although the Distillation(128-128) model can partially eliminate poisoning knowledge, it still remains vulnerable to targeted poisoning attacks. Moreover, as the dimension of the communication model parameters increases, the model's vulnerability to poisoning attacks also grows.

In contrast, CoDFKGE's prediction model performs distillation learning exclusively on verified positive samples, effectively eliminating potential poisoning knowledge that might exist in the negative samples. Similar to the Local training model, CoDFKGE achieves extremely low MRR and Hits@10 metrics for the poisoned triples, which demonstrates that the CoDFKGE model can effectively defend against targeted poisoning attacks in FKGE. Furthermore, due to the compression of the communication model's dimension, the amount of information that attackers can transmit is correspondingly reduced, making the communication model in CoDFKGE(32-128) less susceptible to poisoning attacks.

6.3. Untargeted poisoning attack experiment (RQ3)

In the untargeted poisoning attack experiments, the attacker returns negated aggregate parameters to the victim client, preventing the victim model from converging and degrading its prediction performance. The results presented in this section reflect the average prediction performance on the clients' local test triples.

Table 3 lists the performance of each model under untargeted poisoning attacks, grouped by the prediction model dimension and the federated type. The parameter dimensions are specified in parentheses within the Model column. The All Clients column shows the average performance of all clients under untargeted poisoning attacks, and the Victim column shows the performance of the victim client. To measure the severity of the attack, the MRR of the local model in Table 1 is used as a benchmark: the Decay Ratio column shows the relative performance degradation of the victim client compared to the local model in Table 1. All experiments were repeated 5 times with different random seeds, and the results are reported as (mean ± standard deviation). The best and second-best results in each group are marked in bold and underlined.
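The negative-aggregation attack underlying these experiments can be sketched as follows. This is our own toy formulation: a FedAvg-style mean, a negated return value, and a small noise term standing in for the detection-evading noise mentioned earlier; `noise_scale` and all names are illustrative assumptions.

```python
import numpy as np

def aggregate(client_shared):
    """Benign FedAvg-style mean of the clients' shared embedding matrices."""
    return np.mean(client_shared, axis=0)

def untargeted_poison(agg, noise_scale=0.01, rng=None):
    """Return the negated aggregate to the victim, plus small noise so the
    manipulation is less conspicuous."""
    rng = rng or np.random.default_rng(0)
    return -agg + noise_scale * rng.normal(size=agg.shape)

rng = np.random.default_rng(1)
clients = [rng.normal(size=(4, 3)) for _ in range(5)]  # toy shared embeddings
benign = aggregate(clients)
poisoned = untargeted_poison(benign)   # what the malicious server sends back
```

A victim that loads `poisoned` directly is pulled away from the consensus embedding space every round, which is exactly the non-convergence the experiments measure.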
Table 3
Experiment results under untargeted poisoning attacks.
Fed Type Model All clients Victim Decay ratio (%)
MRR Hits@10 MRR Hits@10 MRR Hits@10
Entity FedE(128) 0.3896 ± 0.0010 0.5939 ± 0.0009 0.3625 ± 0.0102 0.5620 ± 0.0144 11.21 7.58
Entity Distillation(128-128) 0.3900 ± 0.0017 0.5921 ± 0.0007 0.3641 ± 0.0012 0.5664 ± 0.0018 11.82 7.54
Entity CoDFKGE(128-128) 0.4084 ± 0.0007 0.6068 ± 0.0003 0.4017 ± 0.0010 0.6009 ± 0.0005 2.25 1.28
Entity Distillation (32-128) 0.3024 ± 0.0208 0.5422 ± 0.0105 0.2739 ± 0.0264 0.5262 ± 0.0124 30.02 9.49
Entity CoDFKGE (32-128) 0.4093 ± 0.0018 0.6081 ± 0.0014 0.4022 ± 0.0022 0.6023 ± 0.0011 1.66 0.75
Relation FedR(128) 0.3915 ± 0.0010 0.5951 ± 0.0016 0.3637 ± 0.0093 0.5636 ± 0.0150 10.96 7.10
Relation Distillation(128-128) 0.3978 ± 0.0017 0.6022 ± 0.0019 0.3881 ± 0.0023 0.5942 ± 0.0028 5.51 2.56
Relation CoDFKGE(128-128) 0.4086 ± 0.0017 0.6075 ± 0.0029 0.4014 ± 0.0020 0.6018 ± 0.0037 1.24 0.75
Relation Distillation (32-128) 0.3058 ± 0.0079 0.5463 ± 0.0029 0.2787 ± 0.0101 0.5307 ± 0.0038 27.78 8.61
Relation CoDFKGE (32-128) 0.4090 ± 0.0008 0.6066 ± 0.0011 0.4026 ± 0.0008 0.6018 ± 0.0013 1.27 0.92
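One plausible reading of Table 3's Decay Ratio, consistent with the reported figures up to seed variation, is the relative drop of the victim's metric versus the unattacked local baseline of Table 1; this exact formula is our assumption.

```python
def decay_ratio(baseline_metric, attacked_metric):
    """Relative performance drop in percent (our assumed definition)."""
    return 100.0 * (baseline_metric - attacked_metric) / baseline_metric

# Local MRR baseline 0.4081 (Table 1) vs. FedE victim MRR 0.3625 (Table 3):
fede_decay = decay_ratio(0.4081, 0.3625)  # close to the reported 11.21%
```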
Table 4
Ablation study in normal link prediction and under targeted attack.
Model Link prediction Targeted all clients Targeted victim poisoning
MRR Hits@10 MRR Hits@10 MRR Hits@10 (targeted poisoning ASR)
CoDFKGE 0.4112 ± 0.0039 0.6084 ± 0.0036 0.4086 ± 0.0007 0.6089 ± 0.0012 0.0010 ± 0.0003 0.0009 ± 0.0005
Ablation(Comm) 0.4095 ± 0.0016 0.6074 ± 0.0014 0.4086 ± 0.0022 0.6076 ± 0.0021 0.0017 ± 0.0008 0.0013 ± 0.0008
Ablation(Pred) 0.4132 ± 0.0006 0.6116 ± 0.0012 0.4098 ± 0.0011 0.6080 ± 0.0009 0.8086 ± 0.0064 0.9702 ± 0.0228
From the experimental results, it can be observed that, when subjected to untargeted poisoning attacks, the CoDFKGE series models achieve the best MRR and Hits@10 metrics among all models. In this setting, all models exhibit varying degrees of decline both in their overall performance and in their performance on the victims. In Fig. 3, we compare the prediction performance of the various models under normal link prediction and under the untargeted poisoning attack. The Distillation(32-128) model experiences the most significant performance degradation; for the Distillation(128-128), FedE, and FedR models, the degradation is also substantial and cannot be ignored. These models directly incorporate poisoned global knowledge as an integral part of their own models, so their convergence is adversely affected. In contrast, the performance degradation of the CoDFKGE models is fully within 3%. This is because, even in the absence of global knowledge, the prediction model of CoDFKGE still trains on local data knowledge, and its training effectiveness is comparable to that of local KGE models without knowledge sharing.

Baseline models may have their results manipulated or exhibit significant performance degradation when facing poisoning attacks. Although the distillation models showed performance advantages in the link prediction experiments, their defense effectiveness is extremely limited under poisoning attacks. In contrast, CoDFKGE remains unmanipulated under targeted poisoning attacks and does not exhibit significant performance degradation under untargeted poisoning attacks, demonstrating its effective defense capability against poisoning attacks.

6.4. Ablation study (RQ4)

This section evaluates the defensive effect of applying different loss functions in CoDFKGE against poisoning attacks. Specifically, we compare models using 128-dimensional training parameters for both the communication and prediction models across normal link prediction, targeted poisoning attack, and untargeted poisoning attack scenarios. Two ablation baselines are implemented: Ablation(Comm) applies the baseline loss function (Eq. (4)) solely during the communication module's distillation, while Ablation(Pred) uses it exclusively for the prediction module's distillation.

Tables 4 and 5 show the experimental results of the models with different distillation loss functions sharing entity embeddings. All experiments were repeated 5 times with different random seeds, and the results are reported as (mean ± standard deviation). The best results are bolded.

The experimental results demonstrate that while Ablation(Pred) performs well in conventional link prediction, its resistance to poisoning attacks lags behind the other two models because it does not employ a negative-sample exclusion strategy in its loss function. Of the remaining two models, both demonstrate robust resilience against poisoning attacks, but the CoDFKGE model achieves superior link prediction performance compared to Ablation(Comm). Ablation(Comm) employs the baseline loss function during the distillation training of the communication model; in contrast, the CoDFKGE model adopts the approach of [9] and uses the self-adversarial sampling temperature τ_α to reweight negative samples, thereby enhancing the model's ability to distinguish between negative samples. Overall, the ablation experiments demonstrate that applying both proposed distillation loss functions simultaneously enhances the model's capability in both defending against poisoning attacks and link prediction.

7. Conclusion

This paper proposes CoDFKGE, a co-distillation-based defense framework against FKGE poisoning attacks. As the first co-distillation defense framework against poisoning attacks in FKGE, CoDFKGE does have some limitations. First, maintaining two separate models requires higher computational resource consumption on the clients. Second, the bidirectional distillation process may lead to a loss of generalization accuracy. On the other hand, CoDFKGE's advantages lie in its model-agnostic applicability to existing FKGE models without compromising performance. By decoupling clients' prediction models from the shared parameter models, CoDFKGE effectively filters out poisoned knowledge embedded in shared updates. CoDFKGE eliminates malicious manipulation under targeted poisoning attacks and significantly mitigates accuracy degradation under untargeted poisoning attacks. Leveraging distillation, the framework further reduces communication overhead. This work provides new ideas for enhancing the security of FKGE.

The limitations of FKGE poisoning defense research are partially rooted in the unique characteristics of KGE. When considering translation-based KGE models in FKGE, sharing entity or relation embeddings introduces risks for both privacy preservation and poisoning attacks. Employing GNN-based KGE models in FKGE that transmit GNN parameters or gradients can alleviate these concerns. However, owing to their superior robustness to sparse data and lower computational resource requirements, translation-based models still hold unparalleled advantages in specific application scenarios.
Table 5
Ablation study under untargeted attack.
Model Untargeted all clients Untargeted victim Decay ratio (%)
MRR Hits@10 MRR Hits@10 MRR Hits@10
CoDFKGE 0.4084 ± 0.0007 0.6068 ± 0.0003 0.4017 ± 0.0010 0.6009 ± 0.0005 2.25 1.27
Ablation(Comm) 0.4056 ± 0.0017 0.6062 ± 0.0011 0.3996 ± 0.0018 0.6003 ± 0.0013 2.42 1.16
Ablation(Pred) 0.3951 ± 0.0011 0.6022 ± 0.0008 0.3852 ± 0.0009 0.5951 ± 0.0005 6.76 2.69
For future research, we recommend exploring the application of the CoDFKGE framework in more complex real-world scenarios, such as personalized FKGE problems. Additionally, in large-scale dynamic KG environments, the security landscape for FKGE may undergo significant changes, necessitating further investigation into defense methods tailored to these evolving scenarios.

CRediT authorship contribution statement

Yiqin Lu: Supervision. Jiarui Chen: Writing - original draft, Software, Methodology. Jiancheng Qin: Writing - review & editing.

Declaration of Generative AI and AI-assisted technologies in the writing process

During the preparation of this work the author(s) used deepseek in order to improve language and readability. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is supported by the Special Project for Research and Development in Key Areas of Guangdong Province, under Grant 2019B010137001.

Data availability

Data will be made available on request.
View File
@@ -0,0 +1,875 @@
Journal of Systems Architecture 160 (2025) 103361
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
EDF-based Energy-Efficient Probabilistic Imprecise Mixed-Criticality
Scheduling
Yi-Wen Zhang , Jin-Long Zhang
College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, China
ARTICLE INFO

Keywords: Imprecise Mixed-Criticality; Energy management; DVFS; Probabilistic schedulability

ABSTRACT

We focus on Mixed-Criticality Systems (MCS), which involve the integration of multiple subsystems with varying levels of criticality on shared hardware platforms. The classic MCS task model assumes hard real-time constraints and no Quality-of-Service (QoS) for low-criticality tasks in high-criticality mode. Many researchers have put forward a range of extensions to the classic MCS task model to make MCS theory more applicable in industry practice. In this paper, we consider an Imprecise MCS taskset scheduled with the Earliest Deadline First algorithm on a uniprocessor platform, and propose an Energy-Efficient Task Execution Model that guarantees (deterministic or probabilistic) schedulability, allows degraded QoS for low-criticality tasks in high-criticality mode, and applies Dynamic Voltage and Frequency Scaling to save energy.
1. Introduction

Mixed-Criticality Systems (MCS) [1] involve the integration of multiple sub-systems with varying criticality levels on a shared hardware platform, as addressed, for example, by the automotive safety certification standard ISO 26262 and the avionics safety certification standard DO-178C. Since the introduction of the MCS concept by Vestal [2], there has been considerable research conducted on this topic [1,3,4]. Many researchers have put forward a range of extensions to the classic MCS task model to make MCS theory more applicable in industry practice, including:

• To reduce the pessimism in task worst-case execution time (WCET) estimation and system schedulability analysis, researchers have proposed probabilistic schedulability analysis techniques where the task WCETs (and/or periods) are represented by random variables, and the system is allowed to miss deadlines with a small probability [5].
• The original assumption that all low-criticality (LO) tasks are discarded in high-criticality (HI) mode is likely to be undesirable in industry practice, hence researchers have proposed various approaches to allow a certain level of degraded Quality-of-Service (QoS) to LO tasks in HI mode [1].
• To address energy-constrained safety-critical systems, researchers have proposed power- and energy-aware scheduling algorithms with Dynamic Voltage and Frequency Scaling (DVFS) for MCS [6].

In this paper, we consider all the above different aspects within a unified framework. We consider an Imprecise MCS probabilistic taskset scheduled with the Earliest Deadline First (EDF) algorithm on a uniprocessor platform, and propose an Energy-Efficient Task Execution Model that guarantees (deterministic or probabilistic) schedulability, allows degraded QoS to LO tasks in HI mode, and applies DVFS to save energy. Although the work in [7] is the closest to ours, there are several key differences. Firstly, it schedules tasks under a non-preemptive fixed-priority (NPFP) [8] scheduling policy, while our work schedules tasks with preemptive EDF. Secondly, it uses probabilistic WCET (pWCET) to determine the probability of mode transition and applies a deterministic schedulability analysis, while our work includes deterministic and probabilistic schedulability analyses. Finally, it uses response time analysis to determine schedulability, while our work uses the Demand Bound Function (DBF). In short, this work is the first to address the energy issue and the schedulability test of an Imprecise MCS probabilistic taskset scheduled under EDF.

The remainder of the paper is organized as follows. We present background and related work in Section 2. Section 3 presents preliminaries. Section 4 presents our probabilistic IMC scheduling; Section 5 presents the Energy-Efficient Task Execution Model; Section 6 presents experimental results; Section 7 discusses practical issues. Finally, Section 8 presents conclusions and future work.
with Dynamic Voltage and Frequency Scaling (DVFS) for MCS [6].
Corresponding author.
E-mail addresses: zyw@hqu.edu.cn (Y.-W. Zhang), sang_yunl@stu.hqu.edu.cn (J.-L. Zhang).
https://doi.org/10.1016/j.sysarc.2025.103361
Received 11 September 2024; Received in revised form 3 February 2025; Accepted 4 February 2025
Available online 12 February 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Y.-W. Zhang and J.-L. Zhang Journal of Systems Architecture 160 (2025) 103361
2. Background and related work

2.1. Background and motivation

Resource-constrained embedded systems. In order to motivate the need for probabilistic scheduling and DVFS addressed in this paper, we first discuss the issue of hardware resource constraints in real-time embedded systems, including but not limited to MCS, which are especially pertinent for mass-produced consumer products such as ground vehicles and drones (Unmanned Aerial Vehicles), due to monetary cost as well as Size, Weight, and Power (SWaP) constraints. Automotive Electrical/Electronic (E/E) systems typically have stringent hardware resource constraints. In modern high-end vehicles, there can be up to 100 ECUs (Electronic Control Units) embedded within them, and each model can be sold millions of times. An overall savings of millions of dollars may be achieved by saving a few dollars per ECU. Hence, a designer of E/E systems should choose the cheapest ECU according to their application's needs. The monetary cost pressure on relatively cheap consumer drones is even higher. Next, let us consider the issue of SWaP, which lumps together three factors that are closely correlated due to the same underlying cause of hardware resource constraints. The significance of SWaP is obvious in battery-powered mobile devices like drones and mobile robots, where operating time and physical constraints are limited. However, SWaP considerations are equally applicable to ground vehicles that are equipped with sizable battery systems. Electronics within autonomous vehicles consume substantial power, impacting the range of electric vehicles or the fuel consumption of gasoline vehicles. Size and weight affect consumer acceptance, e.g., an autonomous vehicle with a trunk full of electronics is not likely to be acceptable to the average consumer. The issue of significant hardware resource constraints in MCS has motivated a line of work on processing and memory resource optimization algorithms for MCS [9].

2.2. The classic MCS task model

The MCS taskset Γ includes n independent sporadic tasks Γ = {τ_i | 1 ≤ i ≤ n} [13,14]. Although there may be multiple (4-5) criticality levels in general, we present the task model assuming a dual-criticality system with criticality levels LO and HI for the sake of simplicity. The taskset Γ includes two subsets: LO tasks Γ_LO = {τ_i ∈ Γ | L_i = LO} and HI tasks Γ_HI = {τ_i ∈ Γ | L_i = HI}. Each task τ_i ∈ Γ is described by (L_i, T_i, D_i, C_i^LO, C_i^HI):

• L_i ∈ {LO, HI} denotes its criticality level.
• T_i denotes its period.
• D_i denotes its relative deadline.
• C_i^LO denotes its WCET in LO mode.
• C_i^HI denotes its WCET in HI mode for HI tasks (L_i = HI), with C_i^HI ≥ C_i^LO.

Task execution model of classic MCS. The system is first initialized to be in LO mode. LO tasks τ_i ∈ Γ_LO are monitored at run time and their execution is no more than their C_i^LO. The system is schedulable in LO mode if all tasks τ_i ∈ Γ can complete their LO mode WCETs C_i^LO within their respective deadlines. If any HI task τ_i ∈ Γ_HI executes beyond its C_i^LO, the system enters HI mode and all LO tasks in Γ_LO are abandoned. The system is schedulable in HI mode if all HI tasks τ_i ∈ Γ_HI can complete their HI mode WCETs C_i^HI within their respective deadlines. The system switches back to LO mode at an idle instant if no jobs wait for execution at this time [15]. The system is schedulable if both modes are schedulable.

The state-of-the-art scheduling algorithms for the classic MCS task model include Fixed-Priority scheduling [14] and Earliest-Deadline-First with Virtual Deadline (EDF-VD) [16] for dynamic-priority scheduling on uniprocessor systems. Subsequently, many extensions to the classic MCS task model have been proposed, as discussed next.
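As an illustrative sketch (ours, not code from the paper), the dual-criticality task tuple and the mode-switch rule of the classic model can be written in a few lines of Python; the `Task` class, the `system_mode` helper, and the example parameter values are our own assumptions:

```python
from dataclasses import dataclass

@dataclass
class Task:
    L: str       # criticality level: "LO" or "HI"
    T: float     # period
    D: float     # relative deadline
    C_LO: float  # WCET budget in LO mode
    C_HI: float  # WCET budget in HI mode (C_HI >= C_LO for HI tasks)

def system_mode(observed_exec, tasks):
    """Classic MCS rule: the system starts in LO mode and switches to
    HI mode as soon as any HI task executes beyond its LO-mode budget."""
    for task, e in zip(tasks, observed_exec):
        if task.L == "HI" and e > task.C_LO:
            return "HI"   # in HI mode, all LO tasks are abandoned
    return "LO"

tasks = [Task("LO", 2, 2, 1, 1), Task("HI", 2, 2, 1, 2)]
print(system_mode([1.0, 0.8], tasks))  # no overrun -> "LO"
print(system_mode([1.0, 1.5], tasks))  # HI task exceeds C_LO -> "HI"
```

This only captures the mode-switch condition; the scheduling decisions themselves (FP or EDF-VD) are orthogonal to it.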
Motivation for probabilistic schedulability analysis. Recently, Akesson et al. [10] investigated 120 industry practitioners in real-time embedded systems, and the results indicated that soft or firm real-time constraints are prevalent even in safety-critical application domains. A minority (15%) of the surveyed systems were considered strictly hard real-time (no deadlines to be missed). Thus, the timing behavior of a system function can be designed to ensure a sufficiently low failure rate without affecting the system's overall schedulability.

Industry safety certification standards specify acceptable failure rates depending on the system's criticality level: in the automotive standard ISO-26262, each ASIL has a permitted failure probability of 10^-9 for ASIL D, 10^-8 for ASIL C and B, and 10^-7 for ASIL A [5]. Relaxing the hard real-time assumption can help reduce pessimism in task WCET estimation and system schedulability analysis and increase schedulable utilization significantly. Von der Brüggen et al. [11] demonstrated large gains in processor utilization with experiments using randomly-generated workloads, e.g., a gain of at least 12% schedulable utilization for an acceptable worst-case deadline failure probability of 10^-6. This motivates probabilistic schedulability analysis as an effective technique for reducing analysis pessimism and increasing processor utilization in resource-constrained embedded systems.

Motivation for not dropping LO tasks in HI mode. Consider the automotive standard ISO-26262, where ASIL determination of hazardous events is based on three parameters: severity, probability of exposure, and controllability. An individual's vulnerability to harm in a potentially hazardous situation is determined by severity. Probability is the likelihood that harm will occur, while controllability is the ability to avoid harm or damage through prompt action by the agents involved (e.g., a driver of the vehicle). It cannot always be assumed that a software function that is part of a high-ASIL functionality is more important than one that is part of a lower-ASIL functionality, as both may be safety-critical, and each function's failure may cause severe damage [12].

2.3. Degraded QoS for LO tasks

The degraded QoS of LO tasks in HI mode is achieved by decreasing the execution time budgets [17] or increasing the task periods [18] of LO tasks. Liu et al. [17] proposed the Imprecise Mixed-Criticality (IMC) task model in which a HI task τ_i (L_i = HI) is assigned a greater estimated WCET in HI mode compared to its estimation in LO mode (C_i^LO ≤ C_i^HI), while a LO task τ_i (L_i = LO) is assigned a smaller estimated WCET in HI mode compared to the estimation in LO mode (C_i^LO ≥ C_i^HI). They considered EDF-VD scheduling on a single processor system, and presented two schedulability tests, one based on a utilization bound and the other based on the Demand Bound Function (DBF). Davis et al. [19] addressed the IMC task model under fixed-priority scheduling, and presented a Compensating AMC Scheduling scheme and two schedulability tests. Jiang et al. [20] presented a concrete implementation of the IMC task model in the form of a configurable processor floating point unit hardware design, as well as schedulability analysis and optimized priority assignment algorithms based on fixed-priority scheduling.

2.4. Energy-aware scheduling for MCS

DVFS dynamically adjusts the processor supply voltage and speed (frequency) based on the system's workload, which is an effective energy-saving technique [21]. Most modern microprocessors, including those used in embedded systems, provide support for DVFS. Our recent survey paper [6] provided an overview of recent developments in energy-aware real-time scheduling for MCS, predominantly focusing on DVFS.

Recently, power- and energy-aware real-time scheduling for MCS has attracted significant attention [6]. Huang et al. [22] proposed a scheduling algorithm for MCS based on EDF-VD [16]. This scheduling algorithm reduces energy consumption by optimizing virtual deadlines
and processor speeds. Zhang [23] used the dynamic slack time generated from late arrival tasks to reduce energy consumption. This work is extended to MCS with fixed-priority preemptive scheduling [24] and dynamic-priority non-preemptive scheduling [25]. Zhang et al. [26] tackled the issue of MCS with shared resources and proposed a dual-speed scheduling algorithm. This algorithm ensured both system schedulability and mutually exclusive access to shared resources. However, it assumed that all tasks execute with their WCET. Zhang [27] used the difference between the actual execution time and the WCET to save energy. These works focus on the classic MCS task model. Zhang [28] focused on the IMC task model, in which LO tasks are allowed QoS in HI mode, and proposed an energy-aware scheduling algorithm (EA-IMC).

Table 1
Related work on probabilistic scheduling for MCS. Abbreviations: Prob. (Probabilistic); S.A. (Schedulability Analysis).

Work                              | Sched. Algo. | Prob. S.A. | Energy-Aware | LO tasks dropped in HI mode
Santinelli and George (2015) [33] | EDF          | Y          | N            | Y
Maxim et al. (2017) [34]          | FP           | Y          | N            | Y
Singh et al. (2020) [35]          | NPFP         | Y          | N            | Y
Draskovic et al. (2021) [36]      | FP           | Y          | N            | N
Guo et al. (2021) [37]            | EDF          | Y          | N            | Y
Bhuiyan et al. (2020) [7]         | NPFP         | N          | Y            | Y
This work                         | EDF          | Y          | Y            | N
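To make the DVFS rationale concrete, here is a small sketch of ours using the standard convex dynamic-power model P(s) = s^α that much DVFS work assumes (the function name, the value α = 3, and the example numbers are our illustration, not the paper's model):

```python
def dynamic_energy(C, s, alpha=3.0):
    """Dynamic energy to run C cycles-worth of work at normalized
    speed s in (0, 1]: execution time is C/s and power is s**alpha,
    so E = (C/s) * s**alpha = C * s**(alpha - 1)."""
    return C * s ** (alpha - 1)

# Slowing the processor down saves dynamic energy, as long as the
# stretched execution time C/s still meets every deadline.
print(dynamic_energy(10, 1.0))  # 10.0 at full speed
print(dynamic_energy(10, 0.5))  # 2.5 at half speed
```

This convexity is why DVFS-based MCS schedulers search for the lowest speed that keeps the schedulability test satisfied.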
There has been a small number of recent works on energy-aware MCS on multiprocessors. Narayana et al. [29] considered the energy minimization problem for multiprocessor MCS based on DVFS. They first proposed an optimal solution and an effective lightweight heuristic on a uniprocessor, then extended these results to multicore systems. Ranjbar et al. [30] proposed a heuristic algorithm for online peak power and thermal management of a multicore MCS by using the slack time and per-cluster DVFS. Recently, some researchers [31] studied the IMC task model on multiprocessors, in which LO tasks are allowed QoS in HI mode, and proposed a partitioned scheduling algorithm. In addition, this work is extended to shared resource scheduling [32]. However, the above studies assume that tasks execute with their deterministic WCET.

2.5. Probabilistic scheduling for MCS

Santinelli and George [33] presented an initial solution to probabilistic schedulability analysis for EDF scheduling of MCS based on the concept of probabilistic C-Space. Maxim et al. [34] presented a probabilistic fixed-priority schedulability analysis [14]. Singh et al. [35] considered a novel MCS task model with job-level mode switching, and presented a graph-traversal-based analytic framework for non-preemptive job-level fixed-priority probabilistic schedulability analysis. Draskovic et al. [36] proposed metrics that are inspired by industry safety standards, including the probability of deadline miss per hour, the expected time before degradation happens, and the duration of the degradation, and presented a system-wide approach to probabilistic scheduling of MCS. Guo et al. [37] proposed a new task model in which a new parameter is added to characterize the distribution of the WCET estimations for each task. They presented efficient algorithms for MCS scheduling under this task model for both independent tasks and failure-dependent tasks.

We are aware of only one related work that addressed energy-aware scheduling in MCS assuming probabilistic task execution times. Bhuiyan et al. [7] proposed a probabilistic technique to derive an energy-efficient processor speed that minimized the average energy consumption with DVFS, while ensuring the deadlines of all tasks in MCS. This work used non-preemptive fixed-priority scheduling and a deterministic schedulability test based on Worst-Case Response Time analysis, instead of probabilistic schedulability analysis. It is not directly comparable to our work due to the different task models and analysis techniques.

Table 1 summarizes related work on probabilistic scheduling for MCS.

3. Preliminaries

3.1. Task model

Our task model is inspired by the IMC task model [17], with extensions to the probabilistic scheduling scenario. We first introduce some basic notations for probabilistic scheduling. A task τ_i's probabilistic WCET (pWCET) 𝒞_i is a random variable characterized by a Probability Mass Function (PMF) f_i(·), where f_i(et) = P(𝒞_i = et) denotes the probability that its WCET is equal to et.¹ Given the PMF f_i(·), we can easily obtain the corresponding Cumulative Distribution Function (CDF) F_i(·), where F_i(et) = P(𝒞_i ≤ et) = Σ_{x ≤ et} f_i(x). The Complementary Cumulative Distribution Function (1-CDF) is defined as F̄_i(et) = P(𝒞_i > et) = 1 − F_i(et).

We consider the MCS taskset Γ including n independent periodic tasks Γ = {τ_i | 1 ≤ i ≤ n} scheduled with preemptive EDF on a single processor platform. (It is a special case of EDF-VD with a deadline scaling factor x = 1.) We assume a dual-criticality system with criticality levels LO and HI for the sake of simplicity. The taskset Γ consists of two subsets: LO tasks Γ_LO = {τ_i ∈ Γ | L_i = LO} and HI tasks Γ_HI = {τ_i ∈ Γ | L_i = HI}. Each task τ_i ∈ Γ is described by a tuple of parameters ⟨L_i, T_i, D_i, 𝒞_i, 𝒞_i^LO, 𝒞_i^HI, C_i^deg, C_i^tr⟩:

• L_i ∈ {LO, HI} denotes its criticality level.
• T_i denotes its period.
• D_i denotes its constrained deadline (D_i ≤ T_i).
• 𝒞_i is its nominal pWCET, a discrete random variable with K discrete values characterized by PMF f_i(·) and CDF F_i(·). It has the minimum value C_i^min with index ind(C_i^min) = 0 and the maximum value C_i^max with index ind(C_i^max) = K − 1 among the K discrete values of 𝒞_i.
• 𝒞_i^LO is its pWCET in LO mode, characterized by PMF f_i^LO(·) and CDF F_i^LO(·).
• 𝒞_i^HI is its pWCET in HI mode, characterized by PMF f_i^HI(·) and CDF F_i^HI(·).
• C_i^deg is valid for LO tasks (L_i = LO), and denotes its Degraded WCET in HI mode, with index ind(C_i^deg) ∈ [0, K − 1].
• C_i^tr is valid for HI tasks (L_i = HI), and denotes its Threshold WCET in LO mode, with index ind(C_i^tr) ∈ [0, K − 1].

Task execution model. The system is first initialized to be in LO mode. If any HI task τ_i ∈ Γ_HI executes beyond its C_i^tr, the system switches from LO mode to HI mode. At the mode-switch instant t_s, if jobs of LO tasks have run for longer than their C_i^deg, any such jobs will be dropped, without suppressing future arrivals thereof. In addition, if a LO job has executed for less than C_i^deg by the switch time instant, such carry-over jobs that have an arrival time before t_s and an absolute deadline after t_s will continue to execute the leftover execution up to C_i^deg. While in HI mode, each LO task τ_i ∈ Γ_LO executes no more than its C_i^deg, i.e., it is dropped if its execution time exceeds C_i^deg. The system switches from HI mode to LO mode at an idle instant if no jobs wait for execution at this time. Moreover, incomplete tasks are dropped at their deadlines, hence there does not exist a backlog of outstanding execution at the end of each hyper-period (this is a common assumption in industry practice [10]).

The pWCET of a LO task in LO mode, or the pWCET of a HI task in HI mode, is the same as its nominal pWCET 𝒞_i. The pWCET of a HI

¹ Calligraphic letters are used to represent distributions while non-calligraphic letters are for scalars.
task τ_i in LO mode is trimmed with the upper bound C_i^tr to have the conditional PMF f_i^LO(et) = P(𝒞_i = et | et ≤ C_i^tr). The pWCET of a LO task τ_i in HI mode is trimmed with the upper bound C_i^deg to have the conditional PMF f_i^HI(et) = P(𝒞_i = et | et ≤ C_i^deg). In other words, C_i^deg is LO task τ_i's execution time budget in HI mode, and C_i^tr is HI task τ_i's execution time budget in LO mode. This is inspired by the IMC task model [17,19,20]. They are computed with Eqs. (1) and (2):

\forall \tau_i \in \Gamma_{LO}: \quad f_i^{LO}(et) = f_i(et), \qquad
f_i^{HI}(et) = \begin{cases} \sum_{et' \ge C_i^{deg}} f_i^{LO}(et'), & et = C_i^{deg} \\ f_i^{LO}(et), & et < C_i^{deg} \\ 0, & et > C_i^{deg} \end{cases} \tag{1}

\forall \tau_i \in \Gamma_{HI}: \quad f_i^{HI}(et) = f_i(et), \qquad
f_i^{LO}(et) = \begin{cases} \sum_{et' \ge C_i^{tr}} f_i^{HI}(et'), & et = C_i^{tr} \\ f_i^{HI}(et), & et < C_i^{tr} \\ 0, & et > C_i^{tr} \end{cases} \tag{2}

Since task τ_i's period T_i is a constant in both LO and HI modes, its probabilistic Worst-Case Utilization (pWCU) can be obtained by dividing its pWCET by its period: 𝒰_i = 𝒞_i / T_i, 𝒰_i^LO = 𝒞_i^LO / T_i in LO mode, and 𝒰_i^HI = 𝒞_i^HI / T_i in HI mode. The pWCU of a taskset can be obtained by summing the pWCUs of all tasks in the taskset.

Example 1. A taskset Γ₁ with two tasks is shown in Table 2. Each task τ_i's nominal pWCET 𝒞_i is shown in the matrix form defined in Eq. (3): the first row denotes each discrete value of 𝒞_i; the second row denotes the probability values of the PMF f_i(·); and the third row denotes the cumulative probability values of the CDF F_i(·).

\begin{pmatrix} C_0 & C_1 & \dots & C_{K-1} \\ f_i(C_0) & f_i(C_1) & \dots & f_i(C_{K-1}) \\ F_i(C_0) & F_i(C_1) & \dots & F_i(C_{K-1}) \end{pmatrix} \tag{3}

Table 2
Taskset parameters of Γ₁, with C_1^deg = 1, C_2^tr = 1. Each matrix lists the values (row 1), the PMF (row 2), and the CDF (row 3).

Task | L_i | T_i = D_i | 𝒞_i                        | 𝒞_i^LO                     | 𝒞_i^HI                     | 𝒰_i^LO                           | 𝒰_i^HI
τ₁   | LO  | 2         | (1, 2; 0.5, 0.5; 0.5, 1.0) | (1, 2; 0.5, 0.5; 0.5, 1.0) | (1; 1.0; 1.0)              | (0.5, 1.0; 0.5, 0.5; 0.5, 1.0)   | (0.5; 1.0; 1.0)
τ₂   | HI  | 2         | (1, 2; 0.5, 0.5; 0.5, 1.0) | (1; 1.0; 1.0)              | (1, 2; 0.5, 0.5; 0.5, 1.0) | (0.5; 1.0; 1.0)                  | (0.5, 1.0; 0.5, 0.5; 0.5, 1.0)

The PMF of τ_i's pWCET in LO mode 𝒞_i^LO is obtained by Eq. (2); the PMF of its pWCET in HI mode 𝒞_i^HI is obtained by Eq. (1). For the toy example, the LO task τ₁'s nominal pWCET 𝒞₁ has two possible values, 1 and 2, each with probability 0.5; its pWCET in LO mode 𝒞₁^LO is the same as 𝒞₁; its pWCET in HI mode 𝒞₁^HI is obtained by trimming 𝒞₁ with the upper bound C₁^deg = 1 and ind(C₁^deg) = 0 (assuming the index starts from 0), yielding one possible value of 1 with probability 1.0. The HI task τ₂'s nominal pWCET 𝒞₂ has two possible values, 1 and 2, each with probability 0.5; its pWCET in LO mode 𝒞₂^LO is obtained by trimming 𝒞₂ with the upper bound C₂^tr = 1 and ind(C₂^tr) = 0, yielding one possible value of 1 with probability 1.0; its pWCET in HI mode 𝒞₂^HI is the same as 𝒞₂. The matrix that denotes τ_i's pWCU is obtained by dividing each term in the first row of its pWCET matrix by its period T_i.

Eq. (4) shows the definitions of pWCU for the subsets of LO tasks Γ_LO in LO mode and HI tasks Γ_HI in HI mode. (As mathematical background, the addition of two discrete random variables 𝒳 and 𝒴 results in a new random variable 𝒵 with PMF computed by the convolution of the two PMFs, i.e., 𝒵 = 𝒳 ⊗ 𝒴, where P(𝒵 = z) = Σ_{k=−∞}^{∞} P(𝒳 = k) P(𝒴 = z − k).)

\mathcal{U}^{LO}(\Gamma_{LO}) = \bigotimes_{\tau_i \in \Gamma_{LO}} \mathcal{U}_i^{LO}, \qquad \mathcal{U}^{HI}(\Gamma_{HI}) = \bigotimes_{\tau_i \in \Gamma_{HI}} \mathcal{U}_i^{HI}, \tag{4}

where 𝒰^LO(Γ_LO) denotes the pWCU of Γ_LO in LO mode and 𝒰^HI(Γ_HI) denotes the pWCU of Γ_HI in HI mode.

3.2. Existing deterministic IMC scheduling

Liu et al. [17] have studied the schedulability test for the deterministic IMC task model and proposed sufficient conditions for schedulability under EDF-VD. We first introduce the following notations.

• [[A]]_0 stands for max(A, 0).
• t_s stands for the mode-switch time.
• m_i = ⌊(t − D_i)/T_i⌋ and k_i = ⌊t_s/T_i⌋ are the numbers of jobs of τ_i in the intervals [0, t) and [0, t_s), respectively.
• DBF_L(τ_i, t) stands for the processor demand of any task τ_i ∈ Γ within [0, t) in LO mode.
• DBF(J_L, t) and DBF(J_H, t) stand for the processor demand within [0, t) of a carry-over job released by a task τ_i ∈ Γ_LO and τ_i ∈ Γ_HI, respectively.
• r_i stands for the arrival time of the carry-over job that arrives before t_s and has a deadline after t_s.
• DBF_L^H(τ_i, t) stands for the processor demand of a LO task τ_i within [0, t) in HI mode, while DBF_H^H(τ_i, t) stands for the processor demand of a HI task τ_i within [0, t) in HI mode.

Fig. 1 illustrates a carry-over job and the mode switch. The downward arrow represents the job arrival time. If the execution time of τ_i exceeds C_i^LO without signaling completion, the system switches from LO mode to HI mode. J_H is a carry-over job.

According to the task execution model, the processor demand of LO carry-over jobs is always less than or equal to C_i^LO, while the processor demand of HI carry-over jobs is always less than or equal to C_i^HI. Therefore, DBF(J_L, t) can be calculated as follows:

DBF(J_L, t) = \begin{cases} C_i^{LO}, & r_i + D_i \le t \\ 0, & \text{otherwise,} \end{cases} \tag{5}

and DBF(J_H, t) can be calculated as follows:

DBF(J_H, t) = \begin{cases} C_i^{HI}, & r_i + D_i \le t \\ 0, & \text{otherwise.} \end{cases} \tag{6}

From [3,17], we have the following theorems.

Theorem 1. A deterministic IMC taskset Γ is schedulable under EDF in LO mode if, for all t with 0 < t ≤ t_max,

\sum_{\tau_i \in \Gamma} DBF_L(\tau_i, t) \le t, \tag{7}

where DBF_L(τ_i, t) = [[m_i + 1]]_0 · C_i^LO, and t_max is a hyper-period.

Theorem 2. A deterministic IMC taskset Γ is schedulable under EDF in HI mode if, for all t and t_s with 0 < t ≤ t_max and 0 < t_s < t,

\sum_{\tau_i \in \Gamma_{LO}} DBF_L^H(\tau_i, t_s, t) + \sum_{\tau_j \in \Gamma_{HI}} DBF_H^H(\tau_j, t_s, t) \le t, \tag{8}

where DBF_L^H(τ_i, t_s, t) = k_i C_i^LO + DBF(J_L, t) + c_i C_i^HI, and DBF_H^H(τ_i, t_s, t) can be determined as follows:

DBF_H^H(\tau_i, t_s, t) = \begin{cases} DBF(1), & D_i \le t - t_s; \\ \max\{DBF(1), DBF(2)\}, & \text{otherwise,} \end{cases} \tag{9}

where DBF(1) = b_i C_i^LO + DBF(J_H, t) + a_i C_i^HI, DBF(2) = k_i C_i^LO + DBF(J_H, t), a_i = [[m_i − b_i]]_0, b_i = [[⌊(t_s − (t − D_i − m_i T_i))/T_i⌋]]_0, and c_i = [[m_i − k_i]]_0.
4
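The LO-mode test of Theorem 1 can be sketched as follows (a minimal sketch with our own function names, not the authors' code; [[x]]_0 = max(x, 0) is assumed, and the toy taskset is ours):

```python
# Minimal sketch of Theorem 1's LO-mode demand bound test:
# DBF_L(tau_i, t) = [[m_i + 1]]_0 * C_i_LO with m_i = floor((t - D_i)/T_i),
# checked for every integer t up to a hyper-period.
import math

def dbf_lo(c_lo, period, deadline, t):
    # [[m_i + 1]]_0 * C_i_LO
    m = math.floor((t - deadline) / period)
    return max(m + 1, 0) * c_lo

def edf_lo_schedulable(tasks, t_max):
    """tasks: list of (C_LO, T_i, D_i) tuples."""
    return all(sum(dbf_lo(c, T, D, t) for c, T, D in tasks) <= t
               for t in range(1, t_max + 1))

# Two implicit-deadline tasks checked over their hyper-period of 10:
tasks = [(1, 5, 5), (2, 10, 10)]
print(edf_lo_schedulable(tasks, 10))   # True
```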
Y.-W. Zhang and J.-L. Zhang Journal of Systems Architecture 160 (2025) 103361
Fig. 1. Carry-over job.
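The trimming used in Example 1 to derive 𝒞_i^LO / 𝒞_i^HI from a nominal pWCET, and the PMF convolution behind the ⊗ operator of Eq. (4), can be sketched as follows (helper names are ours; PMFs are value-to-probability dicts):

```python
# Minimal sketch: trimming accumulates all probability mass above the
# bound at the bound itself; convolution gives the PMF of a sum of
# independent discrete random variables.

def trim(pmf, bound):
    """Trim a pWCET PMF at `bound` (mass above the bound moves to it)."""
    out = {}
    for value, p in pmf.items():
        v = min(value, bound)
        out[v] = out.get(v, 0.0) + p
    return out

def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent discrete random variables."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

# Example 1: both nominal pWCETs take values 1 and 2 with probability 0.5.
c1 = {1: 0.5, 2: 0.5}
c2 = {1: 0.5, 2: 0.5}

c1_hi = trim(c1, 1)       # trimming at C_1^deg = 1 gives {1: 1.0}
c2_lo = trim(c2, 1)       # trimming at C_2^tr = 1 gives {1: 1.0}
total = convolve(c1, c2)  # {2: 0.25, 3: 0.5, 4: 0.25}
```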
4. Probabilistic IMC scheduling

4.1. Schedulability analysis

Before presenting the schedulability analysis, let us introduce a few notations.

• max{𝒳} stands for the maximum value of the random variable 𝒳.
• 𝒩(x) denotes the single-valued distribution (x; 1; 1), where x is a constant.
• 𝒟ℬℱ_L(τ_i, t) stands for the probabilistic processor demand of any task τ_i within [0, t) in LO mode.
• 𝒟ℬℱ(J_L, t) and 𝒟ℬℱ(J_H, t) stand for the probabilistic processor demand of a carry-over job released by a task τ_i ∈ Γ_LO and τ_i ∈ Γ_HI within [0, t), respectively.
• 𝒟ℬℱ_L^H(τ_i, t) stands for the probabilistic processor demand of a LO task τ_i within [0, t) in HI mode, while 𝒟ℬℱ_H^H(τ_i, t) stands for the probabilistic processor demand of a HI task τ_i within [0, t) in HI mode.
• 𝒟ℬℱ_L(t) stands for the probabilistic processor demand of all tasks within [0, t) in LO mode.
• 𝒟ℬℱ_H(t) stands for the probabilistic processor demand of all tasks within [0, t) in HI mode.
• Π_{t=1}^{t_max} 𝒳_t = 𝒳_1 × 𝒳_2 × ⋯ × 𝒳_{t_max}.

According to [3,17,33], the probabilistic processor demand of any task τ_i ∈ Γ within [0, t) in LO mode can be calculated as follows:

    𝒟ℬℱ_L(τ_i, t) = 𝒩([[m_i + 1]]_0) ⊙ 𝒞_i^LO,        (10)

where ⊙ denotes the Hadamard product: each element in the i-th row of the right matrix is multiplied by the element in the i-th row of the left vector.

In addition, the probabilistic processor demand of all tasks within [0, t) in LO mode can be calculated as follows:

    𝒟ℬℱ_L(t) = ⊗_{τ_i ∈ Γ} 𝒟ℬℱ_L(τ_i, t).        (11)

The probabilistic processor demand of a carry-over job released by a LO task τ_i within [0, t) can be calculated as follows:

    𝒟ℬℱ(J_L, t) = 𝒞_i^LO if r_i + D_i ≤ t, and 𝒩(0) otherwise.        (12)

The probabilistic processor demand of a carry-over job released by a HI task τ_i within [0, t) can be calculated as follows:

    𝒟ℬℱ(J_H, t) = 𝒞_i^HI if r_i + D_i ≤ t, and 𝒩(0) otherwise.        (13)

The probabilistic processor demand of any task τ_i ∈ Γ_LO within [0, t) in HI mode can be calculated as follows:

    𝒟ℬℱ_L^H(τ_i, t) = (𝒩(k_i) ⊙ 𝒞_i^LO) ⊗ 𝒟ℬℱ(J_L, t) ⊗ (𝒩(c_i) ⊙ 𝒞_i^HI).        (14)

According to [3,17], we should consider two cases to determine the probabilistic processor demand of any task τ_i ∈ Γ_HI within [0, t) in HI mode.

Case 1: D_i ≤ t - t_s. The maximum demand of a job released by the HI task τ_i is generated when its deadline coincides with t. According to Eq. (9) in Theorem 2, the probabilistic processor demand of any task τ_i ∈ Γ_HI within [0, t) in HI mode is equal to 𝒟ℬℱ(1) = (𝒩(b_i) ⊙ 𝒞_i^LO) ⊗ 𝒟ℬℱ(J_H, t) ⊗ (𝒩(a_i) ⊙ 𝒞_i^HI).

Case 2: D_i > t - t_s. The HI task τ_i has at most one job with a processor demand C_i^HI. If the deadline of this job is D_i, the probabilistic processor demand is the same as 𝒟ℬℱ(1). Moreover, the only way to increase the demand of the HI task τ_i is to add a new job in the interval; in other words, the first job of the HI task τ_i arrives at time 0. Therefore, the processor demand includes two parts: one part is the demand of all jobs before t_s, and the other part is the demand of a carry-over job J_H. In this case, the probabilistic processor demand is equal to 𝒟ℬℱ(2) = (𝒩(k_i) ⊙ 𝒞_i^LO) ⊗ 𝒟ℬℱ(J_H, t).

In short, the probabilistic processor demand of any task τ_i ∈ Γ_HI within [0, t) in HI mode can be determined as follows:

    𝒟ℬℱ_H^H(τ_i, t) = 𝒟ℬℱ(1) if D_i ≤ t - t_s, and 𝒵 otherwise,        (15)

where 𝒵 can be determined as follows:

    𝒵 = 𝒟ℬℱ(1) if max{𝒟ℬℱ(2)} ≤ max{𝒟ℬℱ(1)}, and 𝒟ℬℱ(2) otherwise.        (16)

Therefore, the probabilistic processor demand of all tasks within [0, t) in HI mode is determined by the following:

    𝒟ℬℱ_H(t) = (⊗_{τ_i ∈ Γ_LO} 𝒟ℬℱ_L^H(τ_i, t)) ⊗ (⊗_{τ_i ∈ Γ_HI} 𝒟ℬℱ_H^H(τ_i, t)).        (17)

Theorem 3. An IMC taskset Γ is deterministically schedulable under EDF, if for all 0 < t ≤ t_max and 0 < t_s < t,

    max{𝒟ℬℱ_L(t)} ≤ t,  and  max{𝒟ℬℱ_H(t)} ≤ t.        (18)

It is probabilistically schedulable if the maximum probability that the processor demand of all tasks in both LO mode and HI mode exceeds t does not exceed the permitted system failure probability F_s,² expressed as:

    1 - Π_{t_k=t}^{t_max} F_{𝒟ℬℱ_L(t_k)}(t_k) ≤ F_s,  and
    1 - Π_{t_k=t}^{t_max} F_{𝒟ℬℱ_H(t_k)}(t_k) ≤ F_s.        (19)

² Chen et al. [38] pointed out that there are certain flaws in probabilistic WCRT based on critical instants. However, our work focuses on the overall distribution of all task behaviors within a task's hyper-period, rather than relying solely on a single critical instant, and considers the probability distribution of all possible processor demands throughout the hyper-period.
Table 3
Taskset parameters of Γ_2, with C_1^deg = 3, C_2^tr = 1, C_3^deg = 3. Each distribution is written as (values; PMF; CDF).

    τ_1: L_1 = LO, T_1 = D_1 = 10,
        𝒞_1^LO = (1, 3, 4, 5; 0.455, 0.54, 0.004, 0.001; 0.455, 0.995, 0.999, 1.0),
        𝒞_1^HI = (1, 3; 0.455, 0.545; 0.455, 1.0).
    τ_2: L_2 = HI, T_2 = D_2 = 20,
        𝒞_2^LO = (0.5, 1; 0.49, 0.51; 0.49, 1.0),
        𝒞_2^HI = (0.5, 1, 2, 3; 0.49, 0.5, 0.009, 0.001; 0.49, 0.99, 0.999, 1.0).
    τ_3: L_3 = LO, T_3 = D_3 = 10,
        𝒞_3^LO = (2, 3, 4, 5; 0.019, 0.6, 0.38, 0.001; 0.019, 0.619, 0.999, 1.0),
        𝒞_3^HI = (2, 3; 0.019, 0.981; 0.019, 1.0).

Proof. The IMC taskset Γ is deterministically schedulable under EDF if it is deterministically schedulable in both LO mode and HI mode. The condition for deterministic schedulability in LO mode and HI mode, Eq. (18), is self-evident, because it can be directly derived from Theorems 1 and 2. In addition, the IMC taskset Γ is probabilistically schedulable under EDF if it is probabilistically schedulable in both LO mode and HI mode. The condition for probabilistic schedulability, Eq. (19), states that the probability that the processor demand of all tasks in both LO mode and HI mode exceeds t is less than or equal to F_s; hence the taskset is probabilistically schedulable with a system failure probability not exceeding F_s. (Note that the condition of deterministic schedulability in Eq. (18) is a special case of the condition of probabilistic schedulability in Eq. (19), with the permitted system failure probability equal to 0 (F_s = 0).) Q.E.D.

In the deterministic analysis, the processor demand grows in a stepwise manner based on the interval length: the processor demand changes only when the increase in interval length is a multiple of a task period. When we switch to the probabilistic analysis, the probability distribution of processor demand also increases in a stepwise manner, maintaining consistency. In other words, just as the deterministic processor demand does not change within the given time intervals, the values in the probability distribution of processor demand also remain unchanged. Specifically, there are some t_k values that generate the same probability distribution of processor demand. The values of F_{𝒟ℬℱ_L(t_k)}(t_k) and F_{𝒟ℬℱ_H(t_k)}(t_k) that correspond to the same probability distribution of processor demand should not be computed repeatedly in Eq. (19); therefore, we calculate them only once. In addition, if t_1, t_2 and t_l (t_1 < t_2 < t_l) generate the same probability distribution of the processor demand for all tasks in both modes, we choose the minimum value t_1 among these values, which corresponds to F_{𝒟ℬℱ_L(t_1)}(t_1) and F_{𝒟ℬℱ_H(t_1)}(t_1). This is because it is the value that maximizes the probability of the processor demand exceeding the interval length.

4.2. Example 2

We present a taskset Γ_2 with the parameters shown in Table 3. (The nominal pWCET 𝒞_i is omitted for brevity.) We assume that F_s = 1.0 × 10⁻⁶.

In this example, t_max = 20. For 0 < t < 10 and 0 < t_s < t, we have m_i = -1 (m_i = ⌊(t - D_i)/T_i⌋), k_i = 0 (k_i = ⌊t_s/T_i⌋), a_i = 0, c_i = 0, and b_i = 0 (i = 1, 2, 3). According to Eq. (10), 𝒟ℬℱ_L(τ_i, t) = 𝒩(0). In addition, we have 𝒟ℬℱ_L(t) = 𝒩(0) from Eq. (11). From Eq. (12), we have 𝒟ℬℱ(J_L, t) = 𝒩(0) for the LO tasks τ_1 and τ_3. Moreover, we have 𝒟ℬℱ(J_H, t) = 𝒩(0) for the HI task τ_2 from Eq. (13). Therefore, we have 𝒟ℬℱ_L^H(τ_1, t) = 𝒩(0) and 𝒟ℬℱ_L^H(τ_3, t) = 𝒩(0) from Eq. (14). Due to k_2 = 0, a_2 = 0, b_2 = 0 and D_2 > t - t_s, we have 𝒟ℬℱ(1) = 𝒩(0), 𝒟ℬℱ(2) = 𝒩(0), and max{𝒟ℬℱ(2)} ≤ max{𝒟ℬℱ(1)}. According to Eq. (15), we have 𝒟ℬℱ_H^H(τ_2, t) = 𝒩(0). We calculate 𝒟ℬℱ_H(t) = 𝒩(0) from Eq. (17). Therefore, we have max{𝒟ℬℱ_L(t)} ≤ t and max{𝒟ℬℱ_H(t)} ≤ t.

When t = 10, m_1 = 0, m_2 = -1, m_3 = 0, k_i = 0, a_i = 0, c_i = 0, and b_i = 0 (i = 1, 2, 3). According to Eq. (10), we have 𝒟ℬℱ_L(τ_1, t) = 𝒞_1^LO, 𝒟ℬℱ_L(τ_2, t) = 𝒩(0), and 𝒟ℬℱ_L(τ_3, t) = 𝒞_3^LO. In addition, from Eq. (11) we have

    𝒟ℬℱ_L(t) = ℳ = (3, 4, …, 8, 9, 10; 0.008645, 0.273, …, 0.00266, 0.000384, 0.000001; 0.008645, 0.281645, …, 0.999615, 0.999999, 1.0).

Moreover, from Eq. (17), we have 𝒟ℬℱ_H(t) = ℳ.

When 10 < t < 20, m_1 = 0, m_2 = -1, m_3 = 0, a_i = 0, c_i = 0, and b_i = 0 (i = 1, 2, 3). According to Eq. (11), we have 𝒟ℬℱ_L(t) = ℳ. If t_s < 10, then k_i = 0 (i = 1, 2, 3); according to Eq. (17), we have 𝒟ℬℱ_H(t) = ℳ and max{𝒟ℬℱ_H(t)} ≤ t. If 10 ≤ t_s < t, we have k_1 = 1, k_2 = 0, and k_3 = 1. According to Eq. (14), we have 𝒟ℬℱ_L^H(τ_1, t) = 𝒞_1^LO and 𝒟ℬℱ_L^H(τ_3, t) = 𝒞_3^LO. We calculate 𝒟ℬℱ_H^H(τ_2, t) = 𝒩(0) from Eq. (15). In addition, we have 𝒟ℬℱ_H(t) = ℳ from Eq. (17). Therefore, we have max{𝒟ℬℱ_H(t)} ≤ t and max{𝒟ℬℱ_L(t)} ≤ t.

When t = 20, m_1 = 1, m_2 = 0, m_3 = 1. According to Eq. (10), we have 𝒟ℬℱ_L(τ_1, t) = 𝒩(2) ⊙ 𝒞_1^LO, 𝒟ℬℱ_L(τ_2, t) = 𝒞_2^LO, and 𝒟ℬℱ_L(τ_3, t) = 𝒩(2) ⊙ 𝒞_3^LO. In addition, from Eq. (11) we have

    𝒟ℬℱ_L(t) = (6.5, …, 19, 20.5, 21; 0.00423605, …, 0.00019584, 0.00000049, 0.00000051; 0.00406315, …, 0.999999, 0.99999949, 1.0).

If t_s < 10, then a_1 = 1, a_2 = 0, a_3 = 1, c_1 = 1, c_2 = 0, c_3 = 1, k_i = 0, and b_i = 0 (i = 1, 2, 3). From Eq. (17), we have max{𝒟ℬℱ_H(t)} = 19. If 10 ≤ t_s < t, then k_1 = 1, k_2 = 0, k_3 = 1, b_1 = 1, b_2 = 0, b_3 = 1, a_i = 0 and c_i = 0 (i = 1, 2, 3). According to Eq. (17), we have max{𝒟ℬℱ_H(t)} = 23. Therefore, we have max{𝒟ℬℱ_L(t)} > t and max{𝒟ℬℱ_H(t)} > t (10 ≤ t_s < t), but 1 - F_{𝒟ℬℱ_L(t)}(t) ≤ F_s and 1 - F_{𝒟ℬℱ_H(t)}(t) ≤ F_s. According to Theorem 3, the taskset Γ_2 is probabilistically schedulable.

5. Energy-efficient task execution model

We present in sequence the power model, the calculation of energy-efficient processor speeds in LO mode, and the energy-efficient task execution model.

5.1. Power model

We adopt the state-of-the-art processor power model [39-41]:

    P = P_s + ℏ(P_ind + C_ef s^m),        (20)

where P_s is the static power and P_ind is the frequency-independent active power; ℏ = 1 if the system is active (defined as having computation in progress) and ℏ = 0 otherwise; C_ef is an effective switching capacitance; m is a system/application-dependent constant; and s is the normalized processor speed (frequency). Like [39], we ignore the static power (P_s = 0) and set P_ind = 0.01, C_ef = 1, m = 3.

Considering our task model, the expected energy consumption of a single job of task τ_i is [42-44]:

    E_i = (P_ind + C_ef s^m) ⋅ x̄_i / s,        (21)

where x̄_i = Σ_{k=0}^{K-1} C_i^k ⋅ f_i^LO(C_i^k), with the normalized processor speed S_max = 1. In addition, the processor speed s should not be lower than S_crit, where S_crit (S_crit < S_max) is the energy-efficient (critical) speed, which can be computed as S_crit = (P_ind / ((m - 1) ⋅ C_ef))^{1/m} [39].

To facilitate comparisons between task sets with varying hyper-periods, we utilize the definition of the normalized energy consumption of a task set Γ within its hyper-period [22] (i.e., its power consumption):

    NE(Γ) = (1/HP(Γ)) Σ_{i=1}^{n} Σ_{j=1}^{ℓ_i} (P_ind + C_ef s^m) ⋅ x̄_i / s
          = Σ_{i=1}^{n} (P_ind + C_ef s^m) ⋅ x̄_i / (s T_i),        (22)

where ℓ_i = HP(Γ)/T_i is the number of jobs of task τ_i ∈ Γ released in the hyper-period HP(Γ).
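A minimal numeric sketch of Eqs. (20)-(21) and the critical speed from [39], with the paper's settings P_ind = 0.01, C_ef = 1, m = 3 (function names are ours, and the example execution time 2.0 is arbitrary):

```python
# Power/energy model sketch: per-job energy and the critical speed
# below which slowing down wastes energy on the frequency-independent
# active power P_ind.

P_IND, C_EF, M = 0.01, 1.0, 3

def job_energy(x_bar, s):
    """Expected energy of one job (Eq. (21)): x_bar is the expected
    execution time at speed 1, s the normalized processor speed."""
    return (P_IND + C_EF * s**M) * x_bar / s

# S_crit = (P_ind / ((m - 1) * C_ef)) ** (1/m)
s_crit = (P_IND / ((M - 1) * C_EF)) ** (1.0 / M)   # roughly 0.171

for s in (0.2, 0.5, 0.8, 1.0):
    print(s, job_energy(2.0, s))
```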
5.2. Calculating energy-efficient processor speeds

We determine the energy-efficient processor speed S_L in LO mode, and schedule the tasks with S_max = 1 in HI mode, if the IMC taskset Γ is deterministically schedulable by EDF on a single processor.

A taskset Γ running on a processor with speed S_L is equivalent to the taskset Γ running on a processor with speed S_max = 1 with proportionally scaled execution times (1/S_L times those of each task in Γ). Therefore, the probabilistic processor demand of any task τ_i ∈ Γ with speed S_L within [0, t) in LO mode can be calculated as follows:

    𝒟ℬℱ_L(τ_i, t) = 𝒩([[m_i + 1]]_0) ⊙ (𝒩(1/S_L) ⊙ 𝒞_i^LO).        (23)

The probabilistic processor demand of a carry-over job released by a LO task τ_i with speed S_L within [0, t) can be calculated as follows:

    𝒟ℬℱ(J_L, t) = 𝒩(1/S_L) ⊙ 𝒞_i^LO if r_i + D_i ≤ t, and 𝒩(0) otherwise.        (24)

The probabilistic processor demand of any task τ_i ∈ Γ_LO with speed S_L within [0, t) in HI mode can be calculated as follows:

    𝒟ℬℱ_L^H(τ_i, t) = (𝒩(k_i) ⊙ (𝒩(1/S_L) ⊙ 𝒞_i^LO)) ⊗ 𝒟ℬℱ(J_L, t) ⊗ (𝒩(c_i) ⊙ 𝒞_i^HI).        (25)

In addition, when the system schedules tasks with S_L in LO mode and S_max = 1 in HI mode, 𝒟ℬℱ(1) and 𝒟ℬℱ(2) in Eq. (16) are calculated by Eqs. (26) and (27), respectively:

    𝒟ℬℱ(1) = (𝒩(b_i) ⊙ (𝒩(1/S_L) ⊙ 𝒞_i^LO)) ⊗ 𝒟ℬℱ(J_H, t) ⊗ (𝒩(a_i) ⊙ 𝒞_i^HI),        (26)

    𝒟ℬℱ(2) = (𝒩(k_i) ⊙ (𝒩(1/S_L) ⊙ 𝒞_i^LO)) ⊗ 𝒟ℬℱ(J_H, t).        (27)

Theorem 4. Given an IMC taskset Γ that is deterministically schedulable by EDF on a single processor, it remains deterministically schedulable with the energy-efficient processor speed S_L in LO mode and S_max = 1 in HI mode if, for all 0 < t ≤ t_max and 0 < t_s < t,

    max{𝒟ℬℱ_L(t)} ≤ t,  and  max{𝒟ℬℱ_H(t)} ≤ t,        (28)

where S_crit ≤ S_L ≤ 1, and 𝒟ℬℱ_L(τ_i, t), 𝒟ℬℱ(J_L, t), 𝒟ℬℱ_L^H(τ_i, t), 𝒟ℬℱ(1) and 𝒟ℬℱ(2) are given in Eqs. (23)-(27), respectively.

Proof. Theorem 4 can be directly derived from Theorem 3.

5.3. Example 3

Table 4
Taskset parameters of Γ_3, with C_1^deg = 1.5, C_2^tr = 2, C_3^deg = 2. Each distribution is written as (values; PMF; CDF).

    τ_1: L_1 = LO, T_1 = D_1 = 10,
        𝒞_1^LO = (1, 1.5, 2, 2.5; 0.1, 0.4, 0.35, 0.15; 0.1, 0.5, 0.85, 1.0),
        𝒞_1^HI = (1, 1.5; 0.1, 0.9; 0.1, 1.0).
    τ_2: L_2 = HI, T_2 = D_2 = 20,
        𝒞_2^LO = (1, 2; 0.01, 0.99; 0.01, 1.0),
        𝒞_2^HI = (1, 2, 4, 5; 0.01, 0.49, 0.45, 0.05; 0.01, 0.5, 0.95, 1.0).
    τ_3: L_3 = LO, T_3 = D_3 = 10,
        𝒞_3^LO = (1.5, 2, 2.5, 3; 0.2, 0.3, 0.4, 0.1; 0.2, 0.5, 0.9, 1.0),
        𝒞_3^HI = (1.5, 2; 0.2, 0.8; 0.2, 1.0).

Let us consider the task set Γ_3 that consists of the tasks with the parameters presented in Table 4. The processor has ten discrete normalized processor speeds, i.e., [0.1, 0.2, …, 1.0] [45]. According to Theorem 3, the taskset is deterministically schedulable in both modes. We calculate S_L = 0.8 on the basis of Theorem 4, by iteratively trying out the available speeds, from lowest to highest, until we find the minimum speed that satisfies all constraints. According to Eq. (21), we have x̄_1 = 1.775, x̄_2 = 1.99, x̄_3 = 2.2. We can then use Eq. (22) to obtain the taskset's normalized energy consumption: 0.3242925 with processor speed S_L = 0.8 under DVFS, versus 0.50197 with processor speed S_max = 1 for EDF without DVFS, which represents significant energy savings.

5.4. Energy-efficient task execution model

Assuming that the system is deterministically schedulable in both modes, we can use DVFS to reduce the processor speed to S_L in LO mode and set it to S_max = 1 in HI mode, while maintaining schedulability in both modes. We modify the task execution model in Section 3.1 into the energy-efficient task execution model based on DVFS, as shown below.

Energy-efficient task execution model in probabilistic IMC. The system is first initialized to LO mode with processor speed S_L. If any HI task τ_i ∈ Γ_HI executes beyond its C_i^tr/S_L, the system switches into HI mode, with processor speed S_max = 1. At the mode-switch instant, if jobs of LO tasks have run for longer than their C_i^deg/S_L, these jobs are stopped until newly released. In addition, if the execution time of a LO job is less than C_i^deg/S_L at the switch instant, this carry-over job continues to execute its leftover execution, up to C_i^leftover, after the switch instant and before its deadline, where C_i^leftover is the leftover execution time at the nominal processor speed S_max = 1. While in HI mode, each LO task τ_i ∈ Γ_LO executes no more than its C_i^deg if it is started in HI mode, or its C_i^leftover if it is a leftover job started in LO mode. The system switches back to LO mode, with processor speed S_L, at an idle instant when no jobs are waiting for execution. In addition, incomplete tasks are dropped at their deadlines; hence there is no backlog of outstanding execution at the end of each hyper-period.

6. Experimental evaluation

We evaluate our approach based on two performance metrics: the schedulability ratio, which represents the proportion of schedulable task sets (either deterministically or probabilistically schedulable) out of all task sets; and the normalized energy consumption of each task set, as defined in Eq. (22).

We generate synthetic tasksets based on the following experiment settings:

• The number of tasks in each taskset Γ is set to n = 4.
• The number of HI tasks in Γ is set to n ⋅ CP, where the criticality proportion CP is set to CP = 0.5.
• The number of discrete values of each task τ_i's nominal pWCET 𝒞_i is set to K = 4.
• Each of the K probability values in the PMF of 𝒞_i is selected randomly from [0, 1) while ensuring that they sum to 1 (similar to [46,47]).
• For each LO task τ_i ∈ Γ_LO, the index of the Degraded WCET C_i^deg among the K discrete values of 𝒞_i is set to ind(C_i^deg) = 0.5K - 1 = 1.
• For each HI task τ_i ∈ Γ_HI, the index of the Threshold WCET C_i^tr among the K discrete values of 𝒞_i is set to ind(C_i^tr) = 0.5K - 1 = 1.
• T_i is randomly selected from the set {10, 20, 40, 50, 100, 200, 400, 500, 1000} [48].
• To control the taskset processor utilization, max{𝒰_LO(Γ)} is varied from 0.1 to 0.9, in steps of 0.1, while max{𝒰_HI(Γ)} is chosen randomly from the range [0.1, 1.0].

(Each task τ_i's pWCET 𝒞_i and period T_i are implicit, since both system schedulability and normalized energy consumption depend on the utilization values only, i.e., pWCU equals pWCET divided by period.) Note that the time overhead of the proposed method is mainly
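As a numeric cross-check of Example 3 under Eqs. (21)-(22), a minimal script (helper names are ours; the LO-mode distributions and periods are copied from Table 4):

```python
# Reproduces the x_bar values and the normalized energy consumption
# reported in Example 3 for S_L = 0.8 (DVFS) and S_max = 1 (no DVFS).

P_IND, C_EF, M = 0.01, 1.0, 3

tasks = [  # (LO-mode pWCET PMF, period T_i), from Table 4
    ({1: 0.1, 1.5: 0.4, 2: 0.35, 2.5: 0.15}, 10),   # tau_1
    ({1: 0.01, 2: 0.99}, 20),                       # tau_2
    ({1.5: 0.2, 2: 0.3, 2.5: 0.4, 3: 0.1}, 10),     # tau_3
]

def x_bar(pmf):
    """Expected execution time at nominal speed (x_bar_i under Eq. (21))."""
    return sum(v * p for v, p in pmf.items())

def ne(s):
    """Normalized energy consumption of the taskset (Eq. (22))."""
    return sum((P_IND + C_EF * s**M) * x_bar(pmf) / (s * T)
               for pmf, T in tasks)

print([x_bar(pmf) for pmf, _ in tasks])  # approx. 1.775, 1.99, 2.2
print(ne(0.8))   # approx. 0.3242925 with DVFS at S_L = 0.8
print(ne(1.0))   # approx. 0.50197 for EDF without DVFS
```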
Fig. 2. Impact on the schedulability ratio by varying the permitted system failure probability F_s and max{𝒰_LO(Γ)}.
spent on the schedulability test, with significant time consumption arising from the calculation of the probabilistic processor demands for the task set, which involves a large number of convolution operations. As the number of tasks increases, the time overhead grows exponentially. To maintain the accuracy of the schedulability test, we have not yet identified better methods to reduce the time overhead; hence, we have limited the number of tasks to four. In the future, we will strive to reduce the time overhead associated with convolutions.
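The taskset-generation step above that draws K random probability values summing to 1 for each nominal pWCET (similar to [46,47]) can be sketched as follows; normalizing uniform draws is our assumption, not necessarily the authors' exact procedure:

```python
# Sketch: one random PMF over K discrete pWCET values, with
# probabilities drawn from [0, 1) and normalized to sum to 1.
import random

def random_pmf(values, seed=None):
    """Return {value: probability} with probabilities summing to 1
    (normalization scheme is an assumption)."""
    rng = random.Random(seed)
    raw = [rng.random() for _ in values]   # K draws from [0, 1)
    total = sum(raw)
    return {v: r / total for v, r in zip(sorted(values), raw)}

pmf = random_pmf([1, 2, 4, 5], seed=42)
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```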
In the first experiment, we vary F_s from 10⁻¹ to 10⁻⁹ with a multiplicative step of 10, i.e., F_s is plotted on a log scale. The value F_s = 10⁻⁹ is based on the permitted failure probability of 10⁻⁹ for ASIL D, the highest safety certification level in ISO 26262. The additional case of F_s = 0 is the special case of deterministic schedulability only, for hard real-time systems. Fig. 2 shows the results, where each data point represents the average outcome obtained from a variable number of task sets selected from 500 synthetic tasksets generated for each value of max{𝒰_LO(Γ)}, using different seeds for the pseudo-random number generator.

We make the following observations from Fig. 2:

• The schedulability ratio is positively correlated with F_s, confirming the significant advantages of considering probabilistic schedulability compared to considering deterministic schedulability only, even at very small values of F_s for high levels of safety certification.
• The schedulability ratio is negatively correlated with max{𝒰_LO(Γ)}, since both max{𝒟ℬℱ_L(t)} and max{𝒟ℬℱ_H(t)} increase with increasing max{𝒰_LO(Γ)}, which reduces system schedulability.

In the second experiment, we fix the permitted system failure probability at F_s = 10⁻⁷ (based on the requirement for ASIL A in ISO 26262). We vary each HI task's C_i^tr through its index ind(C_i^tr), from 0 to K - 1 with step size 1, i.e., the sequence {0, 1, 2, 3}. (The case of ind(C_i^tr) = 3 is the special case where each HI task τ_i has the same WCET in both modes.) Each LO task's C_i^deg is fixed at the default value of ind(C_i^deg) = 1. The results are shown in Fig. 3, including both the schedulability ratio and the normalized energy consumption (NE(Γ) defined in Eq. (22)). Each data point represents the average outcome obtained from a variable number of task sets selected from 500 synthetic tasksets generated for each value of max{𝒰_LO(Γ)}, depending on the value of ind(C_i^tr).

Fig. 3. Varying each HI task's Threshold WCET index ind(C_i^tr) and max{𝒰_LO(Γ)}.

We make the following observations from Fig. 3:

• The schedulability ratio is negatively correlated with max{𝒰_LO(Γ)}, as expected.
• The schedulability ratio is negatively correlated with C_i^tr. With increasing C_i^tr, HI tasks have larger WCETs (both expected and maximum) in LO mode according to the trimming operation for pWCET defined in Eq. (2), causing max{𝒟ℬℱ_L(t)} and max{𝒟ℬℱ_H(t)} to increase, which reduces system schedulability.
• The average normalized energy consumption NE(Γ) is positively correlated with max{𝒰_LO(Γ)}. From Eq. (22), NE(Γ) depends on each task's expected pWCET x̄_i and the energy-efficient processor speed in LO mode S_L. With increasing max{𝒰_LO(Γ)}, both x̄_i and S_L increase, causing NE(Γ) to increase.
• NE(Γ) is positively correlated with C_i^tr. With increasing C_i^tr, a HI task τ_i has a larger expected pWCET in LO mode, causing both x̄_i and S_L to increase, which in turn causes NE(Γ) to increase.

Averaged over all cases, our approach achieves an average reduction of 33.49% in normalized energy consumption compared to EDF without DVFS.

7. Practical considerations

In this section, we address some practical considerations in transposing our proposal into industry practice.

Timing analysis for pWCET. Task τ_i's pWCET 𝒞_i, as specified by its PMF, may be obtained via static, dynamic, measurement-based, or hybrid timing analysis methods, as discussed in the survey paper [49].
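The selection of S_L used throughout the evaluation (Example 3, Theorem 4), trying the available discrete speeds from lowest to highest until the schedulability test passes, can be sketched as follows; `is_schedulable` is a stand-in for the Theorem 4 check, and the utilization predicate below is only a toy example:

```python
# Sketch: pick the minimum discrete speed that keeps the taskset
# schedulable (the search order of Example 3).

def pick_speed(speeds, is_schedulable):
    """Return the lowest available speed passing the test, else None."""
    for s in sorted(speeds):
        if is_schedulable(s):
            return s
    return None   # not schedulable even at S_max = 1

# Toy stand-in predicate: total utilization scaled by 1/s must fit.
speeds = [round(0.1 * k, 1) for k in range(1, 11)]
s_l = pick_speed(speeds, lambda s: 0.62 / s <= 1.0)
print(s_l)   # 0.7
```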
Static Probabilistic Timing Analysis (SPTA) is based on the analysis of the program code, along with an abstract model of the hardware behavior. Measurement-Based Probabilistic Timing Analysis (MBPTA) typically applies Extreme Value Theory (EVT) to make a statistical estimate of the pWCET distribution of a program. Hybrid Probabilistic Timing Analysis (HyPTA) combines both statistical and analytical approaches, e.g., by taking measurements at the level of basic blocks or sub-paths, and then composing the results using structural information obtained from static analysis of the code.

Number of discrete values (K) of pWCET 𝒞_i. The value of K determines the granularity of modeling the pWCET's PMF: a larger K implies finer-granularity modeling, but may not be well supported by timing analysis techniques, and also leads to higher computational costs in the schedulability analysis. The typical value of K is 2-8 [5], although there is no hard lower or upper bound on its value. Our experiments with K varying from 4 to 8 indicate that its value does not affect system schedulability and power consumption significantly, indicating that K = 4 already provides sufficiently fine-granularity modeling under our experimental setup.

PMF of pWCET 𝒞_i. In the absence of real industry tasksets, we need to generate each task's pWCET 𝒞_i synthetically, as defined by its PMF. There is no clear consensus on the generation method in the literature on probabilistic schedulability analysis. An early work by Edgar and Burns [50] used the trimmed and scaled Gumbel distribution to model likely WCET values; Draskovic [36] used the Weibull distribution with an upper bound, which was used for modeling the distribution of long but unlikely execution times based on EVT [51] (the log of a Weibull distribution is a Gumbel distribution); Wang et al. [46] and Markovic et al. [47] adopted the uniform random distribution; Bozhko et al. [52] assumed two execution modes for each task in an MCS, a typical mode and a rare exceptional mode, with pWCET equal to c with probability 0.95 (the typical mode) and 4c with probability 0.05 (the exceptional mode), where c was scaled to match the expected task utilization. In this paper, we adopt the simple approach of the uniform random distribution, similar to [46,47].

Runtime overhead of DVFS. The overhead of varying the processor speed with DVFS is assumed to be zero. This is a common assumption adopted in the DVFS literature [7]. We can determine through offline measurement an upper bound on the processor speed transition overhead, which is typically small compared to the WCET of a task; hence it can be added to each task's execution time without a significant impact on the solution.

Multiprocessor platforms. Our work can be easily extended to multiprocessor platforms by a partitioned scheduling approach [31,32,53]. In partitioned scheduling, tasks are statically assigned to processors, with each processor managed by a local scheduler. We can use simple allocation methods, e.g., criticality-unaware worst-fit decreasing (CU-WFD) and criticality-aware first-fit decreasing (CA-FFD), to allocate tasks to each processor, while using the energy-efficient task execution model to schedule the tasks on each processor.

8. Conclusions and future work

The classic MCS task model has several restrictive assumptions, including hard real-time constraints, dropping LO tasks in HI mode, and a lack of consideration of power/energy consumption issues. In this paper, we relax these assumptions to make the MCS task model more practically applicable. We consider an IMC taskset scheduled with the EDF algorithm on a uniprocessor platform, and propose an energy-efficient task execution model that guarantees (deterministic or probabilistic) schedulability, allows degraded QoS for LO tasks in HI mode, and applies DVFS to save energy.

In this paper, we have considered EDF-based uniprocessor scheduling, dual-criticality MCS, and task execution times as probabilistic variables. As part of future work, these assumptions can be further relaxed to fixed-priority scheduling, multiprocessor platforms, multiple criticality levels, and multiple task parameters (e.g., task periods) represented by random variables.

CRediT authorship contribution statement

Yi-Wen Zhang: Writing - review & editing, Writing - original draft, Methodology, Funding acquisition, Formal analysis, Conceptualization. Jin-Long Zhang: Writing - original draft, Visualization, Software, Data curation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the Natural Science Foundation of Fujian Province of China under Grant 2023J01139 and the Fundamental Research Funds for the Central Universities, China under Grant ZQN-1009.

Data availability

No data was used for the research described in the article.

References

[1] Alan Burns, Robert Ian Davis, Mixed criticality systems - a review (February 2022), 2022, pp. 1-97, https://eprints.whiterose.ac.uk/183619/.
[2] Steve Vestal, Preemptive scheduling of multi-criticality systems with varying degrees of execution time assurance, in: 28th IEEE International Real-Time Systems Symposium, RTSS 2007, IEEE, 2007, pp. 239-243.
[3] Yi-Wen Zhang, Jin-Peng Ma, Hui Zheng, Zonghua Gu, Criticality-aware EDF scheduling for constrained-deadline imprecise mixed-criticality systems, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 43 (2) (2024) 480-491.
[4] Yi-Wen Zhang, Hui Zheng, Slack time management for imprecise mixed-criticality systems with reliability constraints, IEEE Trans. Comput. (2025).
[5] Robert I. Davis, Liliana Cucu-Grosjean, A survey of probabilistic schedulability analysis techniques for real-time systems, Leibniz Trans. Embed. Syst. 6 (1) (2019) 04:1-04:53.
[6] Yi-Wen Zhang, Rong-Kun Chen, A survey of energy-aware scheduling in mixed-criticality systems, J. Syst. Archit. 127 (2022) 102524.
[7] Ashikahmed Bhuiyan, Federico Reghenzani, William Fornaciari, Zhishan Guo, Optimizing energy in non-preemptive mixed-criticality scheduling by exploiting probabilistic information, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 39 (11) (2020) 3906-3917.
[8] Yi-Wen Zhang, Chen Ouyang, Semi-clairvoyant scheduling in non-preemptive fixed-priority mixed-criticality systems, J. Syst. Archit. 159 (2025) 103332.
[9] Qingling Zhao, Mengfei Qu, Zonghua Gu, Haibo Zeng, Minimizing stack memory for partitioned mixed-criticality scheduling on multiprocessor platforms, ACM Trans. Embed. Comput. Syst. (TECS) 21 (2) (2022) 1-30.
[10] Benny Akesson, Mitra Nasri, Geoffrey Nelissen, Sebastian Altmeyer, Robert I. Davis, A comprehensive survey of industry practice in real-time systems, Real-Time Syst. (2021) 1-41.
[11] Georg von der Brüggen, Nico Piatkowski, Kuan-Hsun Chen, Jian-Jia Chen, Katharina Morik, Björn B. Brandenburg, Efficiently approximating the worst-case deadline failure probability under EDF, in: 2021 IEEE Real-Time Systems Symposium, RTSS, IEEE, 2021, pp. 214-226.
[12] Alexandre Esper, Geoffrey Nelissen, Vincent Nélis, Eduardo Tovar, An industrial view on the common academic understanding of mixed-criticality systems, Real-Time Syst. 54 (3) (2018) 745-795.
[13] Sanjoy Baruah, Alan Burns, Implementing mixed criticality systems in Ada, in: International Conference on Reliable Software Technologies, Springer, 2011, pp. 174-188.
[14] Sanjoy K. Baruah, Alan Burns, Robert I. Davis, Response-time analysis for mixed criticality systems, in: 2011 IEEE 32nd Real-Time Systems Symposium, IEEE Computer Society, 2011, pp. 34-43.
[15] François Santy, Gurulingesh Raravi, Geoffrey Nelissen, Vincent Nelis, Pratyush Kumar, Joël Goossens, Eduardo Tovar, Two protocols to reduce the criticality level of multiprocessor mixed-criticality systems, in: Proceedings of the 21st International Conference on Real-Time Networks and Systems, 2013, pp. 183-192.
Yi-Wen Zhang (Senior Member, IEEE) received his Ph.D. in Computer Application Technology from the University of Chinese Academy of Sciences in 2016. He was a Post-doctoral Fellow with the Shenyang Institute of Computing Technology, Chinese Academy of Sciences, from 2017 to 2019. He has been an associate professor since 2020. He is named in the world's top 2% of Scientists List 2023 and 2024 by Stanford University. His current research interests include real-time systems and low-power design.

Jin-Long Zhang received the B.E. degree in Software Engineering from Jiangxi Agricultural University in 2023. He is currently pursuing the M.S. degree at Huaqiao University. His current research interests include real-time systems and low-power design.

View File

@@ -0,0 +1,70 @@
Embedded Software Design
Journal of Systems Architecture
The EUROMICRO Journal

Editor-in-Chief
Dr. Zonghua Gu, Department of Computer Science, Hofstra University, USA

Subject Area Editors
L. Almeida, Faculdade de Engenharia, Dept. of Electrical and Computer Engineering, Universidade do Porto, Porto, Portugal
J.H. Anderson, Dept. of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
P. Bellavista, Dept. Computer Science and Engineering (DISI), Alma Mater Studiorum, Università di Bologna, Bologna, Italy
C.-S. Bouganis, Department of Electrical and Electronic Engineering, Imperial College London, South Kensington Campus, London, England, UK
L. Cassano, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy
G. Chen, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
M. García-Valls, Departamento de Ingeniería Telemática, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
C. Gill, Department of Computer Science and Engineering, Washington University, USA
A. Gokhale, Dept. of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
N. Guan, Dept. of Computing, The Hong Kong Polytechnic University, Hong Kong
J. Hu, Department of Electrical and Computer Engineering, University of Pittsburgh, USA
Y. Jiang, School of Software, Tsinghua University, China
H. Kapoor, Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, India
A. Kritikakou, University of Rennes, Inria, Irisa and CNRS, France
F. Li, School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
S. Li, College of Computer Science, Zhejiang University, Hangzhou, China
G. Lima, Instituto de Matematica, Departamento de Ciencia da Computacao, Federal University of Bahia, Salvador, Bahia, Brazil
M. Lin, Department of Computer Science, St. Francis Xavier University, Canada
G. Lipari, Ecole Normale Superieure (ENS) de Cachan, Cachan, France
D. Liu, College of Computer Science and Technology, Chongqing University, Chongqing, China
W. Liu, School of Computer Science and Engineering, Nanyang Technological University, Singapore
L. Lo Bello, Dipart. di Ingegneria Elettrica Elettronica e Informatica (DIEEI), Università degli Studi di Catania, Catania, Italy
W. Meng, Technical University of Denmark, Lyngby, Denmark
M. Nasri, Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands
G. Palermo, Department of Electronics Information and Bioengineering, Polytechnic University of Milan, Italy
L. Palopoli, Dipartimento di Ingegneria e Scienza dell'Informazione (DISI), Università di Trento, Povo (Trento), Italy
S. Ren, Department of Electrical and Computer Engineering, San Diego State University, USA
S. Sarangi, Department of Computer Science and Engineering, Indian Institute of Technology Delhi, India
M. Schoeberl, DTU Informatics, Danmarks Tekniske Universitet (DTU), Richard Petersens Plads, Kongens Lyngby, Denmark
Z. Shao, Dept. of Computing, The Hong Kong Polytechnic University, Hong Kong
M. Staron, Computer Science and Engineering, University of Gothenburg, Gothenburg, Sweden
F. Tramarin, Dip. Gestione e Tecnica dei Sistemi Industriali (DTG), Università degli Studi di Padova, Vicenza, Italy
M.A. Vega-Rodriguez, ARCO Research Group, Dept. Technologies of Computers & Communications, Universidad de Extremadura, Escuela Politecnica, Campus Universitario, Cáceres, Spain
S. Wan, School of Information and Safety Engineering, Zhongnan University of Economics and Law, China
H. Wu, Center for Applied Mathematics, Tianjin University, China
G. Xie, College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
W. Xu, Zhejiang University College of Electrical Engineering, Hangzhou, China
H. Zeng, Virginia Tech, Blacksburg, Virginia, USA
Y. Zhang, Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Q. Zhao, Nanjing University of Science and Technology, Nanjing, China
N. Zheng, Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China
J. Zhou, Department of Computer Science and Technology, Nanjing University of Science and Technology, China
D. Zhu, Dept. of Computer Science, University of Texas at San Antonio, San Antonio, Texas, USA

View File

@@ -0,0 +1,999 @@
Computer Standards & Interfaces 97 (2026) 104112
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
Efficient and secure multi-user 𝑘NN queries with dynamic POIs updating
Yining Jia a,b,c, Yali Liu a,b,c,*, Congai Zeng a,b,c, Xujie Ding a,b,c, Jianting Ning d,e

a School of Artificial Intelligence and Computer Science, Jiangsu Normal University, Xuzhou, Jiangsu Province, 221116, China
b State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu Province, 210023, China
c Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi Province, 541004, China
d School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei Province, 430072, China
e Faculty of Data Science, City University of Macau, 999078, Macao Special Administrative Region of China
ARTICLE INFO

Keywords:
Cloud computing
Security
kNN queries
Dynamic POIs updating

ABSTRACT

The 𝑘-nearest neighbors (𝑘NN) query is a key operation in spatial and multimedia databases, widely applied in fields such as electronic healthcare and Location-Based Services (LBS). With the rapid development of cloud computing, uploading the private data of a Data Owner (DO) to Cloud Servers (CS) has become a trend. However, existing 𝑘NN query schemes are not designed for multi-user environments, cannot update the points of interest (POIs) stored on the CS in a timely manner, and suffer from low query efficiency. Therefore, this paper proposes an efficient and secure multi-user 𝑘NN query scheme with dynamic POIs updating, named DESM𝑘NN, which achieves secure multi-user 𝑘NN queries. To improve query efficiency, DESM𝑘NN adopts a two-stage search framework, which consists of an initial filtering stage based on hierarchical clustering to effectively constrain the search range, followed by a more efficient precise search stage. Based on this framework, DESM𝑘NN designs a set of security protocols for efficient query processing and enables dynamic POIs updates. Meanwhile, DESM𝑘NN not only utilizes the Distributed Two Trapdoors Public-Key Cryptosystem (DT-PKC) to enable multi-user queries but also ensures data privacy, query privacy, result privacy and access pattern privacy. Moreover, DESM𝑘NN can verify the correctness and completeness of query results. Finally, security analysis proves that DESM𝑘NN meets the formal security definition of multiparty computation, and experimental evaluation shows that DESM𝑘NN improves query efficiency by up to 45.5% compared with existing 𝑘NN query schemes.
1. Introduction

LBS [1–3] are increasingly integrated into real-world applications, such as ride-hailing platforms (e.g., Uber, DiDi), navigation systems (e.g., Google Maps, Baidu Maps), and online food delivery services. These services heavily rely on POIs databases to provide personalized and efficient responses to the queries of a query user (QU). Among various query types, the 𝑘NN query [4,5] is one of the most fundamental methods, which aims to find the 𝑘 nearest POIs to a given query point. With the rapid development of cloud computing [6,7], DOs increasingly outsource their POIs databases to CS, which provide scalable storage and massive computing resources. Well-known commercial platforms, such as Amazon Web Services and Google Cloud Platform, already provide such services to support efficient 𝑘NN queries in LBS. Although outsourcing databases to CS improves data accessibility and flexibility, it makes data more susceptible to unauthorized access threats. In practice, POIs often contain sensitive or private information. For instance, POIs databases may include the locations of hospitals, government facilities, or user-related activity areas in intelligent transportation and LBS systems. Once such information is exposed, it can lead to privacy leakage, commercial losses, or even public security risks [4]. Therefore, to protect POIs from malicious access or theft by the CS and unauthorized users, the DO needs to encrypt them before outsourcing to the CS. In addition, security needs to be considered in query processing to maintain efficiency and protect the confidentiality of POIs databases.

Although 𝑘NN queries have been widely studied in recent years, several limitations still hinder their applicability in practice. First, most existing schemes [8,9] for 𝑘NN queries are based on static spatial data [10], where the database remains unchanged within a certain time interval. Consistent with this common setting, DESM𝑘NN also assumes that POIs are static during query processing to enable fair performance comparison. However, in practice, POIs may change over time, and their insertion or deletion frequency varies across different areas because these updates are driven by real-world change. In rapidly developing areas where new facilities emerge or existing ones close frequently, POI updates occur more frequently, whereas in more stable regions, such updates tend to be infrequent. This dynamic updating of
Corresponding author at: School of Artificial Intelligence and Computer Science, Jiangsu Normal University, Xuzhou, Jiangsu Province, 221116, China.
E-mail address: liuyali@jsnu.edu.cn (Y. Liu).
https://doi.org/10.1016/j.csi.2025.104112
Received 12 June 2025; Received in revised form 18 November 2025; Accepted 8 December 2025
Available online 11 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
POIs reflects the continuous changes in the physical environment. As shown in Fig. 1, 𝑈0 searches for the two nearest neighbors (𝑘 = 2) in a POIs database 𝐷 = {𝑝0, …, 𝑝7}. The original 2NN query result 𝑄 was {𝑝0, 𝑝1}. When a new and closer point 𝑝8 is inserted, the correct 2NN result becomes {𝑝1, 𝑝8}. This example shows that any update to the POI database, such as the insertion, modification, or deletion of POIs, may change the query results. Therefore, dynamic updates must be supported in outsourced POI databases. Second, existing schemes mostly use Asymmetric-Scalar-Product-Preserving Encryption (ASPE) [11,12] or pure homomorphic encryption algorithms to encrypt outsourced data. Unfortunately, ASPE has been demonstrated to be insecure under known-plaintext attacks [13], and homomorphic operations lead to a significant computational cost. These limitations raise the challenge of designing an efficient and secure query mechanism. Finally, most solutions [14,15] assume a single-user setting, where all QUs share the same secret key to enable computability of encrypted data across multiple users. In practice, the single-user setting has obvious flaws. Once the unique key of any QU is leaked, the entire encrypted database can be completely decrypted, and the query content may also be intercepted by the adversary. As illustrated in Fig. 1, in such a single-user setting, 𝑈1 and 𝑈2 can capture the query content and result of 𝑈0 and decrypt them using the same secret key as 𝑈0. This highlights the need for secure multi-user queries.

Fig. 1. Sample of the 𝑘NN query (𝑘 = 2).

To resolve the aforementioned challenges, this paper proposes DESM𝑘NN. The contributions of DESM𝑘NN are as follows:

(1) Dynamic POIs Updating: DESM𝑘NN innovatively designs secure insertion and deletion protocols, which avoids the problem of incorrect and incomplete query results.
(2) Efficient Query: DESM𝑘NN proposes an efficient two-stage search framework, which improves the query performance.
(3) Multi-User Query: DESM𝑘NN designs a series of secure protocols based on DT-PKC, which achieves secure multi-user 𝑘NN queries.
(4) Security & Performance: Security analysis shows that the proposed DESM𝑘NN is secure. Additionally, experimental evaluation shows that DESM𝑘NN improves query efficiency by up to 45.5% compared with existing 𝑘NN query schemes on two real datasets (California Road Network and Points of Interest, San Francisco Road Network1).

The rest of this paper is structured as follows. Section 2 presents related work. Section 3 describes preliminaries. The architecture and security model of DESM𝑘NN is defined in Section 4. In Section 5, the system construction is introduced. Section 6 presents the specific query procedure for DESM𝑘NN. Next, Section 7 analyzes computational complexity, communication complexity, and security. Section 8 provides an experimental evaluation of DESM𝑘NN. Section 9 concludes this paper.

2. Related work

Secure Key-Sharing Query: Wong et al. [11] introduced a 𝑘NN query scheme for encrypted data based on ASPE. However, ASPE relied on a secret matrix to transform data points and query points, which required the secret key to be shared among all QUs and the DO. Additionally, ASPE has been proven insecure against known-plaintext attacks [13]. To enhance query security, Elmehdwi et al. [15] developed a set of two-party computation protocols based on the Paillier cryptosystem. Although scheme [15] preserved the privacy of query results, QUs hold the DO's private key, and the query efficiency remains low. Moreover, scheme [16] employed Delaunay triangulation and order-preserving encryption [18] to accurately solve the secure 𝑘NN problem. Nevertheless, the encryption schemes in [16] are symmetric, which also required the DO and QUs to share the key. Cui et al. [8] proposed an efficient, secure, and verifiable 𝑘NN query scheme, which employed a secure index structure to ensure data security and result integrity, along with a set of novel protocols and verification strategies for various index operations. However, the search complexity of scheme [8] was linearly related to the database size, which led to a lack of scalability. To address the efficiency issues in [8], Liu et al. [14] introduced a two-stage search framework for secure and verifiable 𝑘NN queries, which integrated Edge Servers (ES) into the classic Twin-Cloud model by leveraging adaptive encryption strategies and secure data partitioning to optimize query performance. However, neither scheme [8] nor scheme [14] could resolve the key-sharing issue.

Secure Multi-User Query: To support multi-user 𝑘NN queries, researchers first focused on multi-key queries. Cheng et al. [17] implemented 𝑘NN queries with multi-key support, where the DO and QUs had their own keys, and each QU's key was not shared with others. However, scheme [17] incurred high computational cost and lacked result verification. Subsequently, Liu et al. proposed the DT-PKC [19], which also allowed different QUs to use different keys during queries. Building on the DT-PKC, Cheng et al. [20] and Nayak et al. [21] explored range queries and keyword queries, respectively. Nevertheless, scheme [20] and scheme [21] still suffered from high computational cost and the inability to verify results. Cui et al. [9] introduced a method for secure and verifiable 𝑘NN queries by utilizing DT-PKC, which encrypted grid and bucket divisions within the Voronoi diagram to maintain data security, while also introducing a verification strategy to ensure the correctness and completeness of the query results. However, scheme [9] relied heavily on homomorphic encryption and data packing techniques, which led to high computational cost and search complexity. Moreover, scheme [9] fails to address the issue of dynamic updates for POIs.

In summary, the limitations of the existing 𝑘NN query schemes are as follows: (1) Single-user query schemes carry a risk of key leakage. (2) Multi-user query schemes have low efficiency. (3) Most existing query schemes are unable to achieve dynamic updates of POIs. For ease of exhibition, we summarize the above works in Table 1.

3. Preliminaries

3.1. Voronoi diagram

The Voronoi diagram [22] partitions the plane according to a set of points. Each Voronoi Cell (VC) corresponds to a point and contains all locations that are closer to this point than to any other. Two points are Voronoi neighbors if their cells share an edge, and the neighbor set of a point is denoted as 𝑉𝑁(𝑝).

1 https://users.cs.utah.edu/~lifeifei/SpatialDataset.htm.
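Both the nearest-neighbor property of Voronoi cells (𝑞 ∈ 𝑉𝐶(𝑝) exactly when 𝑝 minimizes the distance to 𝑞) and the update scenario of Fig. 1 reduce to brute-force distance ranking on small inputs. The sketch below uses made-up coordinates, since the paper gives no numeric POIs; only the qualitative behaviour matters: inserting a POI closer to 𝑞 displaces a member of the old 2NN set.

```python
# Illustration of the Fig. 1 scenario with made-up coordinates:
# inserting a closer POI p8 changes the 2NN answer from {p0, p1}
# to {p1, p8}.
from math import dist

def knn(db, q, k):
    """Brute-force kNN: sort POI names by Euclidean distance to q."""
    return sorted(db, key=lambda name: dist(db[name], q))[:k]

db = {"p0": (1.5, 0.0), "p1": (1.0, 0.0), "p2": (4.0, 1.0),
      "p3": (3.0, 3.0), "p4": (0.0, 5.0), "p5": (5.0, 5.0),
      "p6": (6.0, 2.0), "p7": (2.0, 4.0)}
q = (0.0, 0.0)

before = set(knn(db, q, 2))   # {"p0", "p1"}
db["p8"] = (0.5, 0.0)         # a new POI closer to q is inserted
after = set(knn(db, q, 2))    # {"p1", "p8"}
```

The same argmin ranking doubles as a cell-membership test: the first element returned by `knn` is the point whose Voronoi cell contains 𝑞.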
Table 1
Summary of existing 𝑘NN query works.

Method | Data privacy | Query privacy | Result privacy | Access patterns | Verifiable | Multi-user | POIs updating
Wong [11] | √ | √ | × | × | × | × | ×
Elmehdwi [15] | √ | √ | √ | √ | × | × | ×
Choi [16] | √ | √ | √ | × | × | × | ×
Cheng [17] | √ | √ | √ | × | × | √ | ×
Cui [8] | √ | √ | √ | √ | √ | × | ×
Liu [14] | √ | √ | √ | × | √ | × | ×
Cui [9] | √ | √ | √ | √ | √ | √ | ×

Notations: √ represents the approach satisfies the condition; × represents it fails to satisfy the condition.
Fig. 2. An example of Voronoi diagram.

For example, given a dataset 𝐷 that contains 16 POIs as shown in Fig. 2-(b), the Voronoi diagram is shown in Fig. 2-(a). Since 𝑉𝐶(𝑝8) shares a common edge with 𝑉𝐶(𝑝𝑖) for 𝑖 ∈ {3, 4, 9, 11, 12, 13}, the Voronoi neighbors of 𝑝8 are 𝑉𝑁(𝑝8) = {𝑝3, 𝑝4, 𝑝9, 𝑝11, 𝑝12, 𝑝13}. Therefore, the search result of a 3NN query is 𝑅𝑒𝑠𝑢𝑙𝑡 = {𝑝9, 𝑝11, 𝑝13}.

The Voronoi diagram has two useful properties for 𝑘NN verification:

(1) Given a query point 𝑞, the nearest neighbor of 𝑞 is the data point 𝑝 if 𝑞 ∈ 𝑉𝐶(𝑝).
(2) If data points 𝑝1, …, 𝑝𝑘 are the 𝑘 (𝑘 > 1) nearest neighbors of the query point 𝑞, then 𝑝𝑖 belongs to 𝑉𝑁(𝑝1) ∪ ⋯ ∪ 𝑉𝑁(𝑝𝑖−1), for 𝑖 = 2, …, 𝑘.

3.2. R-tree index based on hierarchical clustering

The R-tree index [23] organizes spatial objects into nested rectangles, known as Minimum Bounding Rectangles, to enable efficient querying of spatial data, such as range queries [24] and nearest neighbor searches. However, the efficiency of the R-tree strongly depends on how the data are grouped during construction. To address this, DESM𝑘NN introduces hierarchical clustering, which improves both the organization of spatial objects and the performance of query processing.

As shown in Fig. 3, an R-tree with a fanout of 𝑓 = 2 is built from the POIs in 𝑅𝑒𝑐𝑡1. In this construction, the data are first grouped by applying hierarchical clustering based on the Euclidean distance. This process is performed in two rounds, and the resulting clusters naturally determine the partitioning of the dataset, which is then used to build the tree structure.

Fig. 3. R-tree structure based on hierarchical clustering.

3.3. Distributed two trapdoors public-key cryptosystem

The DT-PKC [19] is a variant of the traditional double trapdoor decryption cryptosystem. Given a public key 𝑝𝑘, a private key 𝑠𝑘, and a strong private key 𝑆𝐾, the cryptosystem supports several algorithms that enable encryption, decryption, and collaborative key operations.

First, encryption is carried out by the algorithm 𝐸𝑛𝑐. Given a message 𝑝 ∈ Z𝑁 and the public key 𝑝𝑘, the algorithm outputs the ciphertext 𝐸𝑝𝑘(𝑝). The system then allows two types of decryption:

(1) With the private key (𝑠𝑘), the algorithm 𝑊𝐷𝑒𝑐 takes 𝐸𝑝𝑘(𝑝) as input and recovers 𝑝.
(2) With the strong private key (𝑆𝐾), the algorithm 𝑆𝐷𝑒𝑐 also decrypts 𝐸𝑝𝑘(𝑝) to obtain 𝑝.

A distinctive feature of DT-PKC lies in the management of the strong private key. The algorithm 𝑆𝑘𝑒𝑦𝑆 enables the strong private key 𝑆𝐾 to be split into two partial strong private keys, 𝑆𝐾1 and 𝑆𝐾2. This splitting supports a collaborative decryption mechanism in two steps:

(1) In step 1, 𝑃𝑆𝐷𝑒𝑐1 takes 𝐸𝑝𝑘(𝑝) and 𝑆𝐾1 as input, which results in a partially decrypted ciphertext 𝐶𝑇1.
(2) In step 2, 𝑃𝑆𝐷𝑒𝑐2 completes the process by using 𝐶𝑇1 and 𝑆𝐾2, which ultimately recovers 𝑝.

3.4. Advanced comparable inner product encoding

The CIPE𝑠 scheme [25] allows edge servers to determine whether a value lies within a query range based on encrypted data. Compared to the original CIPE scheme, CIPE𝑠 enhances security by extending query vectors into random query matrices, which makes it more resilient to chosen-plaintext attacks.

CIPE𝑠 supports several key algorithms for encryption and range query evaluation. First, the key generation algorithm 𝐺𝑒𝑛𝐾𝑒𝑦 takes a security parameter 𝜅 ∈ N as input and outputs a secret key 𝑠𝑘𝑐. The data encryption algorithm 𝐸𝑛𝑐𝐼 encrypts a plaintext 𝑥 into ciphertext 𝐸𝑐(𝑥) with 𝑠𝑘𝑐. To perform queries, the query encryption algorithm 𝐸𝑛𝑐𝑄 transforms a query range 𝑄 = [𝑏𝑙, 𝑏𝑢] into an encrypted range 𝐸𝑐(𝑄). Finally, the calculation algorithm 𝐶𝑎𝑙 compares the encrypted value 𝐸𝑐(𝑥) with the encrypted query range 𝐸𝑐(𝑄) and outputs a comparison result: −1 if 𝑥 < 𝑏𝑙, 1 if 𝑥 > 𝑏𝑢, and 0 if 𝑥 ∈ [𝑏𝑙, 𝑏𝑢].

4. System architecture and security model

This section introduces the system architecture and security model of DESM𝑘NN. A summary of notations is given in Table 2.
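The two-step collaborative decryption of DT-PKC (𝑃𝑆𝐷𝑒𝑐1 followed by 𝑃𝑆𝐷𝑒𝑐2, with the shares held by different servers) can be mimicked with textbook ElGamal whose secret exponent is additively split. This is only a sketch of the split-key flow, not the actual DT-PKC construction, and the parameters (a Mersenne-prime modulus, base 3) are toy choices for illustration.

```python
# Toy two-step collaborative decryption in the style of PSDec1/PSDec2.
# NOT DT-PKC: plain multiplicative ElGamal over Z_P^*, with the secret
# exponent additively split into shares sk1 + sk2 (mod P-1), so that
# neither share alone can recover the plaintext.
import random

P = 2**127 - 1   # Mersenne prime; toy modulus, not a secure group choice
G = 3

def keygen():
    sk = random.randrange(2, P - 2)      # the "strong" secret key SK
    pk = pow(G, sk, P)
    sk1 = random.randrange(2, P - 2)     # share for the first server
    sk2 = (sk - sk1) % (P - 1)           # share for the second server
    return pk, sk1, sk2

def enc(pk, m):
    """ElGamal encryption of 1 <= m < P under public key pk."""
    r = random.randrange(2, P - 2)
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def psdec1(sk1, ct):
    """Step 1: the first server strips its key share (multiply by c1^-sk1)."""
    c1, c2 = ct
    return c1, (c2 * pow(c1, (P - 1) - sk1, P)) % P

def psdec2(sk2, ct1):
    """Step 2: the second server removes the remaining share, revealing m."""
    c1, c2 = ct1
    return (c2 * pow(c1, (P - 1) - sk2, P)) % P

pk, sk1, sk2 = keygen()
ct = enc(pk, 123456789)
m = psdec2(sk2, psdec1(sk1, ct))   # recovers 123456789
```

Because c1^(P-1) = 1 (Fermat), raising to (P−1)−sk𝑖 is the same as raising to −sk𝑖, and the two partial decryptions cancel g^(r·sk) exactly when both shares are applied.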
Table 2 verification object 𝑉 𝑂 to the QU (Step 7). The QU then verifies the
Summary of notations. correctness of the result before finalizing the query.
𝐷 A spatial dataset that includes 𝑛 points {𝑃1 , … , 𝑃𝑛 }
𝑉𝐷 Voronoi diagram built from 𝐷
𝑠𝑘𝑐 The secret key for CIPE𝑠 scheme 4.2. Security model
𝑠𝑘0 , 𝑝𝑘0 The secret/public key for DO
𝑠𝑘𝑢 , 𝑝𝑘𝑢 The secret/public key for users DESM𝑘NN is designed to address three security threats. First, CS
𝑆𝐾, 𝑆𝐾1 , 𝑆𝐾2 Strong private key and partial ones
𝑃 𝑆𝐷𝑒𝑐1(𝑆𝐾1 , ) The first step of partial decryption
cannot be fully trusted and may tamper with query results. Second, CS
𝑃 𝑆𝐷𝑒𝑐2(𝑆𝐾2 , , ) The second step of partial decryption may act as honest-but-curious adversaries that attempt to infer sensitive
𝑄, 𝐸𝑐 (𝑄) A query coverage and its encrypted range information from the encrypted data. Third, QUs themselves may be
𝑞, 𝐸𝑝𝑘0 (𝑞) A query point and its encrypted coordinates curious and try to learn the query information of others.
𝑃𝑖 , 𝐸𝑝𝑘0 (𝑃𝑖 ) A POI and its encrypted coordinates
To counter the risk of result tampering, DESM𝑘NN incorporates a
𝑇̂𝑟𝑒𝑒𝑅 , 𝑇 𝑟𝑒𝑒𝑅 The encrypted/clear R-tree index built from 𝐷
̂
𝑃 𝐷, 𝑃 𝐷 The encrypted/clear preprocessed data built from 𝑉 𝐷 verification mechanism that ensures both correctness and complete-
̂ 𝑄 , 𝑅𝑒𝑐𝑡𝑄
𝑅𝑒𝑐𝑡 The encrypted/clear range query generated for 𝑄 ness [27]. Correctness requires that every returned point 𝑝𝑅𝑒𝑠𝑢𝑙𝑡
𝐼𝑅 The immediate result remains unmodified and originates from the authentic database, while
̂ 𝑅𝑒𝑠𝑢𝑙𝑡
𝑅𝑒𝑠𝑢𝑙𝑡, The encrypted/clear result in the exact search phase completeness guarantees that all true 𝑘NN results are included and no
𝐻() A hash function
𝑉𝑂 The verification object
irrelevant points are omitted.
The other two threats are addressed by designing a secure index and
a set of novel secure protocols that jointly preserve multiple dimensions
of privacy [4,28]. Specifically, data privacy ensures that the database
𝐷 remains hidden from the CS; query privacy requires that the content
of a QUs query 𝑆𝑄 is concealed from both the CS and other QUs; result
privacy guarantees that only the QU can access the returned 𝑅𝑒𝑠𝑢𝑙𝑡; and
access-pattern privacy prevents the CS from learning which database
entries satisfy a given query.
It is noteworthy that during system setup stage, CCS is prevented
from compromising or collaborating with CSS. Furthermore, collusion
between CS and QUs must be prevented throughout the query process.
Fig. 4. System architecture.

4.1. System architecture

DESM𝑘NN employs a two-stage framework: an initial filtering stage on ESs and a precise search stage on dual cloud servers. To protect privacy, the system adopts a dual-cloud architecture [8,9,14,26], where collusion-resilient protocols ensure both efficiency and security beyond traditional single-cloud settings. As shown in Fig. 4, the architecture involves several entities with distinct roles.

In the setup phase (Step 1), the Certified Authority (CA) generates cryptographic keys: (𝑝𝑘0, 𝑠𝑘0) for the DO, (𝑝𝑘𝑖𝑢, 𝑠𝑘𝑖𝑢) for each QU, and a split strong key (𝑆𝐾1, 𝑆𝐾2), whose shares are respectively assigned to the two cloud servers (CSS and CCS). All public keys are shared among the entities. The DO then prepares the dataset. For sensitive data (Step 2), it preprocesses 𝑉𝐷 into 𝑃𝐷, encrypts 𝑃𝐷 with DT-PKC to obtain 𝑃̂𝐷, and uploads it to CSS. For less sensitive data (Step 3), it builds an R-tree index 𝑇𝑟𝑒𝑒𝑅, encrypts it with CIPE𝑠, and distributes the encrypted index 𝑇̂𝑟𝑒𝑒𝑅 to ESs for efficient query filtering.

When a QU issues a query (Step 4), it constructs 𝑆𝑄 = (𝑅𝑒𝑐𝑡̂𝑞, 𝐸𝑝𝑘0(𝑞), 𝑘) and sends it to a nearby ES. The ES evaluates 𝑅𝑒𝑐𝑡̂𝑞 over 𝑇̂𝑟𝑒𝑒𝑅, filters candidate results 𝐼𝑅, and forwards them together with (𝐸𝑝𝑘0(𝑞), 𝑘) to CSS (Step 5). Next, CSS and CCS jointly execute secure protocols (Step 6), and return the final result set 𝑅𝑒𝑠𝑢𝑙𝑡, along with a verification object, to the QU.

5. DESM𝑘NN construction

This section first introduces an optimized two-stage search framework that supports efficient and secure multi-user 𝑘NN queries with dynamic POI updating. Subsequently, several well-designed secure protocols are proposed to enable private 𝑘NN search operations on the two-stage search framework.

5.1. Two-stage search framework

DESM𝑘NN adopts a two-stage search framework, which consists of an initial filtering stage based on hierarchical clustering to effectively constrain the search range, followed by a precise search stage to achieve efficient querying.

Initial Filtering Stage: DO first preprocesses the dataset by using hierarchical clustering to construct a suitable 𝑇𝑟𝑒𝑒𝑅. Each node in the tree is encrypted by using the CIPE𝑠.EncI algorithm to ensure security. The 𝑇̂𝑟𝑒𝑒𝑅 is then uploaded to ESs. When a QU at position (𝑥𝑞, 𝑦𝑞) initiates a query, they define a scope 𝐿 and construct a rectangle 𝑅𝑒𝑐𝑡𝑞 centered at (𝑥𝑞, 𝑦𝑞) with edge length 𝐿. Each dimension of 𝑅𝑒𝑐𝑡𝑞 is encrypted by using the CIPE𝑠.EncQ algorithm and sent to the nearby ES. The ES evaluates 𝑅𝑒𝑐𝑡̂𝑞 over 𝑇̂𝑟𝑒𝑒𝑅 to generate 𝐼𝑅, which efficiently narrows down the candidate objects.

Precise Search Stage: Once receiving (𝐸𝑝𝑘0(𝑞), 𝑘) and 𝐼𝑅 from the ES, the dual-cloud servers collaboratively execute secure protocols over the preprocessed dataset to obtain the exact 𝑘 nearest neighbors (𝑅𝑒𝑠𝑢𝑙𝑡). The servers also generate a verification object (𝑉𝑂) and send it with the 𝑅𝑒𝑠𝑢𝑙𝑡 back to the QU for checking. This stage ensures both accuracy and security of the 𝑘NN search.

5.2. Data pre-processing

To support DESM𝑘NN, DO preprocesses the dataset before outsourcing, which aims to protect sensitive information while retaining the structural relationships required for queries. First, DO constructs a
Y. Jia et al. Computer Standards & Interfaces 97 (2026) 104112
Voronoi diagram 𝑉𝐷 from the dataset 𝐷, and encrypts the coordinates of each POI and the query point 𝑞 using DT-PKC. For every POI 𝑝𝑖 ∈ 𝑉𝐷, a unique label ℓ𝑖 = 𝐻(𝑥𝑖|𝑦𝑖) is generated through the SHA-256 hash function, which serves as a compact identifier. Subsequently, DO obtains the neighborhood 𝑉𝑁(𝑝𝑖) and its corresponding label set ℓ𝑉𝑁(𝑝𝑖), then employs DT-PKC to encrypt the packaged 𝑉𝑁(𝑝𝑖) after applying data packaging technology [29]. This technique handles multiple values together, which makes encryption more straightforward. To guarantee integrity, a signature 𝑆𝐼𝐺𝑝𝑖 = 𝐻(𝐻(𝑝𝑖)|𝐻(𝑉𝑁(𝑝𝑖))) is created, where 𝐻(𝑉𝑁(𝑝𝑖)) is obtained by hashing all neighbors together as

𝐻(𝑉𝑁(𝑝𝑖)) = 𝐻(𝐻(𝑝𝑉𝑁1)|𝐻(𝑝𝑉𝑁2)|⋯|𝐻(𝑝𝑉𝑁𝑚𝑎𝑥)).

Intuitively, this signature ensures that any tampering with 𝑝𝑖 or its neighbors can be detected. Since homomorphic encryption requires uniform input length, DO also performs incremental obfuscation: if a POI has fewer neighbors than the maximum in 𝑉𝐷, dummy neighbors are added to conceal the actual degree. Afterward, each POI is represented by a sextuple

(𝐸𝑝𝑘0(𝑖𝑑), 𝐸𝑝𝑘0(𝑝𝑖), 𝐸𝑝𝑘0(𝑉𝑁(𝑝𝑖)), ℓ𝑖, ℓ𝑉𝑁(𝑝𝑖), 𝑆𝐼𝐺𝑝𝑖),

which combines encrypted attributes, hashed labels, and a verifiable signature.

To further protect access-pattern privacy, DO divides the sextuple table into buckets [8,9] of size 𝑤, which ensures that queries operate over fixed-size groups instead of revealing individual record accesses. Since the final bucket may not be completely filled, DO pads it with randomly generated dummy records, which prevents inference attacks [30,31] in which an adversary could deduce whether two queries target the same bucket based on its record count. At this point, DO completes preprocessing and securely outsources the bucketized sextuples to CSS.

5.3. Secure Squared Distance Computation (SSDC)

The goal of SSDC is to compute the secure squared distance without revealing any valid coordinate information to CSS and CCS. The process is shown in Algorithm 1.

Algorithm 1 Secure Squared Distance Computation
Require: CSS has 𝐸𝑝𝑘0(𝑥1), 𝐸𝑝𝑘0(𝑦1), 𝐸𝑝𝑘0(𝑥2), 𝐸𝑝𝑘0(𝑦2); CSS has 𝑆𝐾1, 𝑝𝑘0; CCS has 𝑆𝐾2, 𝑝𝑘0;
Ensure: 𝐸𝑝𝑘0(|𝑥1 − 𝑥2|² + |𝑦1 − 𝑦2|²);
// Calculation in CSS:
1: Choose 4 random numbers 𝑟1, 𝑟2, 𝑟4, 𝑟5 ∈ Z𝑁;
2: Randomly choose the functionality 𝐹 ∈ {0, 1};
3: if 𝐹 = 1 then
4:   𝐸𝑝𝑘0(𝐴) ← 𝐸𝑝𝑘0(𝑥1) · 𝐸𝑝𝑘0(𝑥2)^(𝑁−1);
5:   𝐸𝑝𝑘0(𝐵) ← 𝐸𝑝𝑘0(𝑦1) · 𝐸𝑝𝑘0(𝑦2)^(𝑁−1);
6: else if 𝐹 = 0 then
7:   Swap 𝑥1 with 𝑥2 and 𝑦1 with 𝑦2;
8: 𝑎′ ← 𝐸𝑝𝑘0(𝐴)^𝑟1, 𝑏′ ← 𝐸𝑝𝑘0(𝐵)^𝑟2;
9: 𝑎″ ← 𝑃𝑆𝐷𝑒𝑐1(𝑆𝐾1, 𝑎′), 𝑏″ ← 𝑃𝑆𝐷𝑒𝑐1(𝑆𝐾1, 𝑏′);
10: Send 𝑎′, 𝑏′, 𝑎″, 𝑏″ and 𝐸𝑝𝑘0(𝐴), 𝐸𝑝𝑘0(𝐵) to CCS;
// Calculation in CCS:
11: Choose a random number 𝑟3 ∈ Z𝑁;
12: 𝑎 ← 𝑃𝑆𝐷𝑒𝑐2(𝑆𝐾2, 𝑎′, 𝑎″), 𝑏 ← 𝑃𝑆𝐷𝑒𝑐2(𝑆𝐾2, 𝑏′, 𝑏″);
13: if 𝑎 > 0 then
14:   𝐸1 ← 𝐸𝑝𝑘0(𝐴);
15: else if 𝐸𝑝𝑘0(𝑟3) · 𝐸𝑝𝑘0(𝐴)^(𝑁−1) = 𝐸𝑝𝑘0(𝑟3) then
16:   𝐸1 ← 𝐸𝑝𝑘0(𝑟3)^0;
17: else
18:   𝐸1 ← 𝐸𝑝𝑘0(𝐴)^(𝑁−1);
19: Apply the same steps to 𝑏 to obtain 𝐸2;
20: Send 𝐸1, 𝐸2 to CSS;
// Calculation in CSS:
21: 𝑐′ ← 𝐸1 · 𝐸𝑝𝑘0(𝑟4);
22: 𝑐″ ← 𝑃𝑆𝐷𝑒𝑐1(𝑆𝐾1, 𝑐′);
23: Apply the same steps to 𝐸2, 𝑟5 to obtain 𝑑′, 𝑑″;
24: Send 𝑐′, 𝑐″, 𝑑′, 𝑑″ to CCS;
// Calculation in CCS:
25: 𝑐 ← 𝑃𝑆𝐷𝑒𝑐2(𝑆𝐾2, 𝑐′, 𝑐″);
26: 𝑠 ← 𝑐 · 𝑐;
27: Apply the same steps to 𝑑′, 𝑑″ to obtain 𝑑, 𝑧;
28: Send 𝐸𝑝𝑘0(𝑠), 𝐸𝑝𝑘0(𝑧) to CSS;
// Calculation in CSS:
29: ℒ1 ← 𝐸𝑝𝑘0(𝑠) · 𝐸1^(𝑁−𝑟4) · 𝐸1^(𝑁−𝑟4) · 𝐸𝑝𝑘0(𝑟4 · 𝑟4)^(𝑁−1);
30: ℒ2 ← 𝐸𝑝𝑘0(𝑧) · 𝐸2^(𝑁−𝑟5) · 𝐸2^(𝑁−𝑟5) · 𝐸𝑝𝑘0(𝑟5 · 𝑟5)^(𝑁−1);
31: 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 = 𝐸𝑝𝑘0(|𝑥1 − 𝑥2|² + |𝑦1 − 𝑦2|²) ← ℒ1 · ℒ2;

Initially, CSS chooses 4 random numbers 𝑟1, 𝑟2, 𝑟4, 𝑟5 ∈ Z𝑁 and chooses the functionality 𝐹 ∈ {0, 1} (lines 1–2). If 𝐹 = 1, CSS calculates the encrypted coordinate differences 𝐸𝑝𝑘0(𝐴), 𝐸𝑝𝑘0(𝐵) (lines 3–5). If 𝐹 = 0, the procedure is the same except that the positions of 𝑥1 and 𝑥2, as well as 𝑦1 and 𝑦2, are swapped when computing the differences (lines 6–7). To mask these values and avoid direct leakage, CSS applies randomization with 𝑟1 and 𝑟2 (line 8). Subsequently, CSS partially decrypts the masked values 𝑎′, 𝑏′ by using the PSDec1 function to get 𝑎″, 𝑏″ (line 9). Eventually, CSS sends 𝑎′, 𝑏′, 𝑎″, 𝑏″ and 𝐸𝑝𝑘0(𝐴), 𝐸𝑝𝑘0(𝐵) to CCS (line 10).

Upon receiving this series of encrypted values from CSS, CCS chooses a random number 𝑟3 ∈ Z𝑁 and decrypts the received values to obtain 𝑎 and 𝑏 (lines 11–12). To conceal the sign information of the differences, CCS applies a randomized comparison procedure (lines 13–18). Specifically, depending on the outcome of comparing 𝑎 with 0 and related conditions, CCS distinguishes three possible cases and outputs 𝐸1 accordingly; this step prevents CSS from learning whether 𝑥1 − 𝑥2 or 𝑦1 − 𝑦2 is positive or negative. The same process is repeated for 𝑏 to obtain 𝐸2 (line 19). Finally, CCS returns 𝐸1, 𝐸2 to CSS (line 20).

Upon receiving 𝐸1, 𝐸2 from CCS, CSS further randomizes them with 𝑟4 and 𝑟5, then partially decrypts the results to produce (𝑐′, 𝑑′) and (𝑐″, 𝑑″), and sends these values to CCS (lines 21–24). CCS completes the decryption (line 25), squares the plaintexts to derive 𝑠 = 𝑐² and 𝑧 = 𝑑² (lines 26–27), and sends back 𝐸𝑝𝑘0(𝑠), 𝐸𝑝𝑘0(𝑧) (line 28). Finally, CSS combines these ciphertexts through homomorphic operations to obtain ℒ1 and ℒ2, and computes the secure squared distance as 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 = ℒ1 · ℒ2 (lines 29–31).

5.4. Secure Minimum Computation (SMC)

The goal of SMC is to compare two secure squared distances obtained by SSDC, determine the smaller one, and also obtain the corresponding 𝑖𝑑𝑚𝑖𝑛 and ℓ𝑚𝑖𝑛. The process is shown in Algorithm 2.

Algorithm 2 Secure Minimum Computation
Require: CSS has 𝐸𝑝𝑘0(𝑑1), 𝐸𝑝𝑘0(𝑑2), 𝐸𝑝𝑘0(𝑖𝑑1), 𝐸𝑝𝑘0(𝑖𝑑2), 𝐸𝑝𝑘0(ℓ1), 𝐸𝑝𝑘0(ℓ2); CSS has 𝑆𝐾1, 𝑝𝑘0; CCS has 𝑆𝐾2, 𝑝𝑘0;
Ensure: 𝐸𝑝𝑘0(𝑑𝑚𝑖𝑛), 𝐸𝑝𝑘0(𝑖𝑑𝑚𝑖𝑛), 𝐸𝑝𝑘0(ℓ𝑚𝑖𝑛);
// Calculation in CSS:
1: Choose 7 random numbers 𝑟𝛼, 𝑟𝛽, 𝑟𝛾, 𝑟𝛿, 𝑟𝜖, 𝑟𝜁, 𝑟𝜂 ∈ Z𝑁;
2: Randomly choose the functionality 𝐹 ∈ {0, 1};
3: if 𝐹 = 1 then
4:   𝐸𝑝𝑘0(𝛼) ← (𝐸𝑝𝑘0(𝑑1) · 𝐸𝑝𝑘0(𝑑2)^(𝑁−1))^𝑟𝛼;
5:   𝐸𝑝𝑘0(𝛽) ← 𝐸𝑝𝑘0(𝑑1) · 𝐸𝑝𝑘0(𝑑2)^(𝑁−1) · 𝐸𝑝𝑘0(𝑟𝛽);
6:   𝐸𝑝𝑘0(𝛾) ← 𝐸𝑝𝑘0(𝑑2) · 𝐸𝑝𝑘0(𝑑1)^(𝑁−1) · 𝐸𝑝𝑘0(𝑟𝛾);
7:   𝐸𝑝𝑘0(𝛿) ← 𝐸𝑝𝑘0(𝑖𝑑1) · 𝐸𝑝𝑘0(𝑖𝑑2)^(𝑁−1) · 𝐸𝑝𝑘0(𝑟𝛿);
8:   𝐸𝑝𝑘0(𝜖) ← 𝐸𝑝𝑘0(𝑖𝑑2) · 𝐸𝑝𝑘0(𝑖𝑑1)^(𝑁−1) · 𝐸𝑝𝑘0(𝑟𝜖);
9:   𝐸𝑝𝑘0(𝜁) ← 𝐸𝑝𝑘0(ℓ1) · 𝐸𝑝𝑘0(ℓ2)^(𝑁−1) · 𝐸𝑝𝑘0(𝑟𝜁);
10:  𝐸𝑝𝑘0(𝜂) ← 𝐸𝑝𝑘0(ℓ2) · 𝐸𝑝𝑘0(ℓ1)^(𝑁−1) · 𝐸𝑝𝑘0(𝑟𝜂);
11: else if 𝐹 = 0 then
12:  Swap the roles of 𝑑1, 𝑖𝑑1, ℓ1 with 𝑑2, 𝑖𝑑2, ℓ2;
13: 𝛼1 ← 𝑃𝑆𝐷𝑒𝑐1(𝑆𝐾1, 𝐸𝑝𝑘0(𝛼));
14: Send 𝛼1, 𝐸𝑝𝑘0(𝛼), 𝐸𝑝𝑘0(𝛽), 𝐸𝑝𝑘0(𝛾), 𝐸𝑝𝑘0(𝛿), 𝐸𝑝𝑘0(𝜖), 𝐸𝑝𝑘0(𝜁), 𝐸𝑝𝑘0(𝜂) to CCS;
// Calculation in CCS:
15: 𝛼2 ← 𝑃𝑆𝐷𝑒𝑐2(𝑆𝐾2, 𝐸𝑝𝑘0(𝛼), 𝛼1);
16: if 𝐿𝑒𝑛𝑔𝑡ℎ(𝛼2) > 𝐿𝑒𝑛𝑔𝑡ℎ(𝑁)∕2 then
17:  𝑤 ← 1;
18: else
19:  𝑤 ← 0;
20: 𝐸𝑝𝑘0(𝜃) ← (𝐸𝑝𝑘0(𝛽)^(1−𝑤) · 𝐸𝑝𝑘0(𝛾)^𝑤)^(𝑁−1);
21: 𝐸𝑝𝑘0(𝜗) ← (𝐸𝑝𝑘0(𝛿)^(1−𝑤) · 𝐸𝑝𝑘0(𝜖)^𝑤)^(𝑁−1);
22: 𝐸𝑝𝑘0(𝜄) ← (𝐸𝑝𝑘0(𝜁)^(1−𝑤) · 𝐸𝑝𝑘0(𝜂)^𝑤)^(𝑁−1);
23: Send 𝑤, 𝐸𝑝𝑘0(𝜃), 𝐸𝑝𝑘0(𝜗), 𝐸𝑝𝑘0(𝜄) to CSS;
// Calculation in CSS:
24: if 𝑠 = 𝑤 then
25:  𝐸𝑝𝑘0(𝑑𝑚𝑖𝑛) = 𝐸𝑝𝑘0(𝑑2) · 𝐸𝑝𝑘0(𝜃) · 𝐸𝑝𝑘0(𝑤)^𝑟𝛾 · (𝐸𝑝𝑘0(1 − 𝑤))^𝑟𝛽;
26:  𝐸𝑝𝑘0(𝑖𝑑𝑚𝑖𝑛) = 𝐸𝑝𝑘0(𝑖𝑑2) · 𝐸𝑝𝑘0(𝜗) · 𝐸𝑝𝑘0(𝑤)^𝑟𝜖 · (𝐸𝑝𝑘0(1 − 𝑤))^𝑟𝛿;
27:  𝐸𝑝𝑘0(ℓ𝑚𝑖𝑛) = 𝐸𝑝𝑘0(ℓ2) · 𝐸𝑝𝑘0(𝜄) · 𝐸𝑝𝑘0(𝑤)^𝑟𝜂 · (𝐸𝑝𝑘0(1 − 𝑤))^𝑟𝜁;
28: else
29:  Swap the roles of 𝑑2, 𝑖𝑑2, ℓ2 with 𝑑1, 𝑖𝑑1, ℓ1.

To start with, CSS generates 7 random numbers and randomly selects a functionality 𝐹, in a manner similar to SSDC (lines 1–2). If 𝐹 = 1, CSS masks the differences between the distances, identifiers, and location labels by incorporating random numbers either as multiplicative factors or as exponents (lines 3–10). For example, the key design

𝐸𝑝𝑘0(𝛼) ← (𝐸𝑝𝑘0(𝑑1) · 𝐸𝑝𝑘0(𝑑2)^(𝑁−1))^𝑟𝛼

ensures that CCS cannot infer the exact magnitudes of 𝑑1 and 𝑑2, and, owing to the random choice of 𝐹, cannot guess which operand is smaller with probability better than 1∕2; this preserves the magnitude relationship while retaining semantic security. If 𝐹 = 0, the roles of 𝑑1 and 𝑑2 are swapped, and the same randomization procedure follows (lines 11–12). After randomization, CSS partially decrypts one of the masked values to obtain 𝛼1 and sends it together with the corresponding encrypted terms to CCS (lines 13–14).

Upon receiving these values, CCS decrypts 𝛼1 to obtain 𝛼2 (line 15). By checking whether the bit-length of 𝛼2 exceeds half the modulus size, CCS decides whether 𝑑1 or 𝑑2 is smaller, and records this decision in a flag 𝑤 (lines 16–19). Using 𝑤 and the remaining encrypted values from CSS, CCS computes three encrypted auxiliary terms that encode the correct selection of the minimum distance, identifier, and label (lines 20–22). These results, along with 𝑤, are then sent back to CSS (line 23).

At the end of Algorithm 2, CSS computes 3 encrypted values, 𝐸𝑝𝑘0(𝑑𝑚𝑖𝑛), 𝐸𝑝𝑘0(𝑖𝑑𝑚𝑖𝑛), 𝐸𝑝𝑘0(ℓ𝑚𝑖𝑛), via homomorphic operations. The computation distinguishes the cases 𝑠 = 𝑤 and 𝑠 ≠ 𝑤 (lines 24–29). In this way, the protocol securely determines the minimum distance and its associated information without revealing any intermediate values.

5.5. Secure Set Difference (SSD)

The goal of SSD is to securely compute the set difference between two encrypted sets, which allows CSS to obtain the elements in 𝑆1 that are not in 𝑆2 without exposing any plaintext values. To achieve this, CSS holds the encrypted sets 𝑆̂1 and 𝑆̂2 together with 𝑆𝐾1, while CCS holds 𝑆𝐾2. The protocol begins with CSS initializing an empty table and iteratively processing each encrypted element in 𝑆̂1 (lines 1–2). For each comparison with an element in 𝑆̂2, CSS generates a random blinding factor and constructs a masked comparison token that conceals the difference between the two values (lines 3–6). This token is then partially decrypted using 𝑆𝐾1, producing an auxiliary value that, together with the token, is stored in a permuted list under a pseudo-random permutation to prevent linkability (lines 7–9). After completing all comparisons, CSS sends the resulting table to CCS for further processing (line 10).

On the CCS side, the server initializes an empty set and parses the received tokens (lines 11–12). Each token is decrypted with 𝑆𝐾2, and whenever a decryption reveals equality between an element of 𝑆̂1 and 𝑆̂2, the corresponding index is added to the set (lines 13–15). This set, containing the indices of overlapping elements, is then returned to CSS (line 16). Finally, CSS uses the inverse permutation to locate the original positions and removes the identified elements from 𝑆̂1 (lines 17–19). The remaining encrypted elements constitute the secure set difference 𝑆̂′, which represents all values in 𝑆1 but not in 𝑆2 (line 20).

Algorithm 3 Secure Set Difference
Require: CSS has two sets of encrypted values 𝑆̂1 = {𝐸𝑝𝑘0(𝑥1), ..., 𝐸𝑝𝑘0(𝑥𝑀)} and 𝑆̂2 = {𝐸𝑝𝑘0(𝑦1), ..., 𝐸𝑝𝑘0(𝑦𝑇)}; CSS has 𝑆𝐾1; CCS has 𝑆𝐾2;
Ensure: CSS obtains an encrypted difference set 𝑆̂′;
// Calculation in CSS:
1: Initialize 𝑇 to an empty table;
2: for the 𝑖th element 𝐸𝑝𝑘0(𝑥𝑖) ∈ 𝑆̂1 do
3:   Initialize 𝑡 to an empty list;
4:   for all 𝐸𝑝𝑘0(𝑦𝑗) ∈ 𝑆̂2 in random order do
5:     Generate a random number 𝑟𝑖,𝑗;
6:     𝑡𝑖,𝑗[0] ← (𝐸𝑝𝑘0(𝑥𝑖) · 𝐸𝑝𝑘0(𝑦𝑗)^(𝑁−1))^𝑟𝑖,𝑗;
7:     𝑡𝑖,𝑗[1] ← 𝑃𝑆𝐷𝑒𝑐1(𝑆𝐾1, 𝑡𝑖,𝑗[0]);
8:     Append 𝑡𝑖,𝑗 to 𝑡;
9:   𝑇[𝜋(𝑖)] ← 𝑡;
10: Send 𝑇 to CCS;
// Calculation in CCS:
11: Initialize 𝑉 to an empty set;
12: for 𝑖 ∈ [𝑀] do
13:   Parse 𝑇[𝑖] as (𝑡𝑖,1, ..., 𝑡𝑖,𝑇);
14:   if ∃ 𝑡𝑖,𝑗 ∈ 𝑇[𝑖] such that 𝑃𝑆𝐷𝑒𝑐2(𝑆𝐾2, 𝑡𝑖,𝑗[0], 𝑡𝑖,𝑗[1]) reveals equality then
15:     Add 𝑖 into set 𝑉;
16: Send 𝑉 to CSS;
// Calculation in CSS:
17: for each element 𝑖 in 𝑉 do
18:   𝑗 ← 𝜋⁻¹(𝑖);
19:   Remove the 𝑗th element 𝐸𝑝𝑘0(𝑥𝑗) from 𝑆̂1;
20: 𝑆̂′ ← 𝑆̂1;

5.6. Secure Insertion (SI)

To support secure data insertion in databases, DESM𝑘NN innovatively proposes a secure insertion protocol. When DO inserts a new POI into the database, two key problems must be addressed.

• How to determine the insertion position of the POI?
• How to update 𝑇𝑟𝑒𝑒𝑅 and 𝑉𝐷?

The first problem can be effectively resolved by CIPE𝑠. First, DO generates an insertion query rectangle 𝑅𝑒𝑐𝑡𝑖𝑛𝑠 for the POI to be inserted, similar to generating a query rectangle 𝑅𝑒𝑐𝑡𝑞 for the query point 𝑞 in the initial filtering stage, where the 𝐿 of the rectangle can be customized. Then, DO encrypts each dimension of 𝑅𝑒𝑐𝑡𝑖𝑛𝑠 with the CIPE𝑠.EncQ algorithm and sends 𝑅𝑒𝑐𝑡̂𝑖𝑛𝑠 to the ES near the inserted POI. The ES evaluates the obtained 𝑅𝑒𝑐𝑡̂𝑖𝑛𝑠 over 𝑇̂𝑟𝑒𝑒𝑅 to obtain the insertion position. Once the insertion position is determined, the label of the inserted POI can be added to 𝑇𝑟𝑒𝑒𝑅, thus completing the update of 𝑇𝑟𝑒𝑒𝑅.

To address the problem of how to update 𝑉𝐷, the Bowyer-Watson algorithm [32,33] is introduced. The Bowyer-Watson algorithm is an
incremental method that updates 𝑉𝐷 by progressively updating the Delaunay triangulation. When inserting a new point, the algorithm first identifies all the affected triangles, then removes them and reconstructs the triangulation mesh by using the new point and the boundary of the cavity, which ensures that the new Delaunay triangulation is valid. Since 𝑉𝐷 and the Delaunay triangulation are duals, when the Delaunay triangulation is updated by using the Bowyer-Watson algorithm, 𝑉𝐷 is updated accordingly. When a new generating point is inserted, the shapes and boundaries of the Voronoi cells are adjusted. Therefore, DO obtains the updated Voronoi diagram based on the Bowyer-Watson algorithm and can obtain the encrypted id of the newly inserted POI, 𝐸𝑝𝑘0(𝑖𝑑𝑖𝑛𝑠); the encrypted inserted POI, 𝐸𝑝𝑘0(𝑝𝑖𝑛𝑠); the label of the newly inserted POI, ℓ𝑖𝑛𝑠; the encrypted Voronoi neighbors, 𝐸𝑝𝑘0(𝑉𝑁(𝑝𝑖𝑛𝑠)); the encrypted labels of the Voronoi neighbors, 𝐸𝑝𝑘0(ℓ𝑉𝑁(𝑝𝑖𝑛𝑠)); and the signature 𝑆𝐼𝐺𝑖𝑛𝑠 used for verification. Finally, these six values are organized into a tuple and sent to CSS for storage. As shown in Fig. 5, the secure insertion in the R-tree is highlighted with green lines.

Fig. 5. Secure insertion and deletion in R-tree. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

5.7. Secure Deletion (SD)

To support secure data deletion in databases, DESM𝑘NN innovatively proposes a secure deletion protocol. First, DO generates a deletion query rectangle 𝑅𝑒𝑐𝑡𝑑𝑒𝑙 for the POI to be deleted, where the 𝐿 of the rectangle can be customized. Then, DO encrypts each dimension of 𝑅𝑒𝑐𝑡𝑑𝑒𝑙 with the CIPE𝑠.EncQ algorithm and sends 𝑅𝑒𝑐𝑡̂𝑑𝑒𝑙 to the ES near the deleted POI. The ES evaluates the obtained 𝑅𝑒𝑐𝑡̂𝑑𝑒𝑙 over 𝑇̂𝑟𝑒𝑒𝑅 to obtain the deletion position.

Once the deletion position is determined, DO sends ℓ𝑑𝑒𝑙, the label of the POI, to the ES near the deleted POI. The ES deletes the POI label from the data at the deletion location based on the ℓ𝑑𝑒𝑙 sent by DO. At this point, the deletion update of 𝑇𝑟𝑒𝑒𝑅 is completed.

Similar to the SI protocol, DESM𝑘NN introduces a Delaunay triangulation-based dynamic deletion and update algorithm for Voronoi diagrams. The key idea behind the dynamic deletion and update algorithm is that Voronoi diagrams and Delaunay triangulations are dual to each other: the vertices of Delaunay triangles correspond to the vertices of the Voronoi diagram, and the edges of Delaunay triangles correspond to the edges of the Voronoi diagram. The Delaunay triangulation-based dynamic deletion and update algorithm leverages this duality to efficiently update the Voronoi diagram. When a point is deleted, the corresponding Delaunay triangles are removed, and the algorithm updates the connectivity of the affected neighboring triangles to maintain the Delaunay condition, which ensures that the triangulation is reconstructed. Then, based on the new Delaunay triangulation, the Voronoi diagram's boundaries are updated to ensure the correct topological structure of the diagram.

Similarly, DO obtains the updated 𝑉𝐷 along with the labels of the affected POIs ℓ𝑎𝑓𝑓𝑒𝑐𝑡𝑖, the encrypted Voronoi neighbors 𝐸𝑝𝑘0(𝑉𝑁(𝑝𝑎𝑓𝑓𝑒𝑐𝑡𝑖)), the encrypted labels of the Voronoi neighbors 𝐸𝑝𝑘0(ℓ𝑉𝑁(𝑝𝑎𝑓𝑓𝑒𝑐𝑡𝑖)), and the signature 𝑆𝐼𝐺𝑎𝑓𝑓𝑒𝑐𝑡𝑖 used for verification. Finally, these four values are organized into a quadruple and sent to CSS, which updates the database based on the labels of the affected POIs. As shown in Fig. 5, the secure deletion in the R-tree is highlighted with red lines.

Algorithm 4 Secure 𝑘NN Query
Require: CSS has 𝐼𝑅, 𝐸𝑝𝑘0(𝑞), 𝑆𝐾1; CCS has 𝑆𝐾2;
Ensure: CSS obtains the encrypted search result 𝑅𝑒𝑠𝑢𝑙𝑡;
// Calculations in CSS and CCS:
1: CSS initializes 𝑅, 𝐶, 𝐷𝑒 to empty sets;
2: for each triple (𝐸𝑝𝑘0(𝑝𝑖), 𝐸𝑝𝑘0(𝑖𝑑𝑖), 𝐸𝑝𝑘0(ℓ𝑖)) ∈ 𝐼𝑅 do
3:   CSS appends 𝐸𝑝𝑘0(𝑝𝑖) to 𝐶;
4: CSS with input (𝐶, 𝐸𝑝𝑘0(𝑞), 𝑆𝐾1, 𝑝𝑘0) and CCS with input (𝑆𝐾2, 𝑝𝑘0) run the SSDC protocol, and CSS obtains {𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒1, ..., 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒|𝐶|};
5: if |𝐶| ≥ 𝑘 then
6:   CSS with input ({𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒𝑖, 𝐸𝑝𝑘0(𝑖𝑑𝑖), 𝐸𝑝𝑘0(ℓ𝑖)} for 𝑖 = 1, ..., |𝐶|, and 𝑆𝐾1, 𝑝𝑘0) and CCS with input (𝑆𝐾2, 𝑝𝑘0) run the SMC protocol, and CSS puts 𝐸𝑝𝑘0(𝑖𝑑𝑖) for 𝑖 = 1, ..., 𝑘 into 𝑅𝑒𝑠𝑢𝑙𝑡;
7: else
8:   CSS with input ({𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒𝑖, 𝐸𝑝𝑘0(𝑖𝑑𝑖), 𝐸𝑝𝑘0(ℓ𝑖)} for 𝑖 = 1, ..., |𝐶|, and 𝑆𝐾1, 𝑝𝑘0) and CCS with input (𝑆𝐾2, 𝑝𝑘0) run the SMC protocol, and CSS puts 𝐸𝑝𝑘0(𝑖𝑑1) into 𝑅𝑒𝑠𝑢𝑙𝑡 and puts 𝐸𝑝𝑘0(ℓ1) into 𝐷𝑒;
9:   CSS and CCS collaborate to run the SCR protocol to get the row corresponding to 𝐸𝑝𝑘0(𝑖𝑑1);
10:  CSS with input (𝐸𝑝𝑘0(𝑉𝑁(𝑝1)), 𝐷𝑒, 𝑆𝐾1) and CCS with input 𝑆𝐾2 run the SSD protocol, and CSS obtains 𝑉𝑁′(𝑝1);
11:  for 𝐸𝑝𝑘0(𝑝𝑗) ∈ 𝐸𝑝𝑘0(𝑉𝑁(𝑝1)) ∩ 𝑉𝑁′(𝑝1) do
12:    CSS puts 𝐸𝑝𝑘0(𝑝𝑗) into 𝐶 and 𝐸𝑝𝑘0(ℓ𝑗) into 𝐷𝑒;
13:  CSS and CCS collaborate to run the SSD and SMC protocols to select the POI closest to 𝑞 from 𝐶 again, removing it from 𝐶;
14:  CSS inserts 𝐸𝑝𝑘0(𝑖𝑑2) into 𝑅𝑒𝑠𝑢𝑙𝑡;
15:  while |𝑅| < 𝑘 do
16:    Repeat lines 9–14;

Algorithm 5 Secure Transformation
Require: CSS has 𝐸𝑝𝑘0(𝑎), 𝑆𝐾1; CCS has 𝑆𝐾2;
Ensure: CSS obtains 𝐸𝑝𝑘𝑢(𝑎);
// Calculations in CSS:
1: Choose one random number 𝑟 ∈ Z𝑁;
2: 𝐸𝑝𝑘0(𝛼) = 𝐸𝑝𝑘0(𝑎) · 𝐸𝑝𝑘0(𝑟);
3: 𝛼′ ← 𝑃𝑆𝐷𝑒𝑐1(𝑆𝐾1, 𝐸𝑝𝑘0(𝛼));
4: Send 𝐸𝑝𝑘0(𝛼), 𝛼′ to CCS;
// Calculations in CCS:
5: 𝛼 ← 𝑃𝑆𝐷𝑒𝑐2(𝑆𝐾2, 𝐸𝑝𝑘0(𝛼), 𝛼′);
6: Send 𝐸𝑝𝑘𝑢(𝛼) to CSS;
// Calculations in CSS:
7: 𝐸𝑝𝑘𝑢(𝑎) = 𝐸𝑝𝑘𝑢(𝛼) · 𝐸𝑝𝑘𝑢(𝑟)^(𝑁−1);

6. DESM𝑘NN query processing

This section provides a detailed introduction to DESM𝑘NN query processing, which consists of two parts: secure 𝑘NN query processing and verification processing.
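The blind/re-encrypt/unblind pattern of the Secure Transformation step (Algorithm 5) can be sketched over a toy additively homomorphic scheme. This is a conceptual model only, not DT-PKC: ciphertexts are modeled as tagged plaintexts modulo a toy modulus, and the two-step partial decryption (PSDec1/PSDec2) is collapsed into a single simulated decryption. All names here are illustrative:

```python
import secrets

N = 2**64 - 59  # toy modulus standing in for the DT-PKC modulus

def enc(pk, m):            # toy "encryption": tag the plaintext with its key
    return ("ct", pk, m % N)

def dec(pk, c):
    tag, key, m = c
    assert key == pk       # can only decrypt under the matching key
    return m

def h_add(c1, c2):         # homomorphic addition (ciphertext "multiplication")
    assert c1[1] == c2[1]
    return ("ct", c1[1], (c1[2] + c2[2]) % N)

def h_neg(c):              # exponent N-1 in the paper, i.e. homomorphic negation
    return ("ct", c[1], (-c[2]) % N)

def secure_transform(c0, pk0, pku):
    # CSS blinds a with random r; the servers jointly "decrypt" the blinded
    # value (simulated here); CCS re-encrypts it under pk_u; CSS then
    # homomorphically removes the blinding: E_pku(a + r - r) = E_pku(a).
    r = secrets.randbelow(N)
    blinded = h_add(c0, enc(pk0, r))   # E_pk0(a + r)
    alpha = dec(pk0, blinded)          # stands in for PSDec1 + PSDec2
    cu = enc(pku, alpha)               # CCS re-encrypts under pk_u
    return h_add(cu, h_neg(enc(pku, r)))

c = secure_transform(enc("pk0", 42), "pk0", "pku")
assert dec("pku", c) == 42
```

The point of the blinding is that the party performing the re-encryption only ever sees 𝑎 + 𝑟, never 𝑎 itself.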
6.1. Secure 𝑘NN query processing

Based on the comprehensive search framework, DESM𝑘NN proposes a secure and verifiable query processing strategy, which is divided into three steps as follows:

• Step 1. Calculating 𝑘 nearest neighbors: The specific details and procedures are illustrated in Algorithm 4. First, CSS creates three new sets: the result set 𝑅𝑒𝑠𝑢𝑙𝑡, the candidate set 𝐶, and the deduplication set 𝐷𝑒 (line 1). After the initial filtering stage, CSS has 𝐼𝑅 = {(𝐸𝑝𝑘0(𝑝𝑖), 𝐸𝑝𝑘0(𝑖𝑑𝑖), 𝐸𝑝𝑘0(ℓ𝑖))}. Next, CSS inserts each encrypted POI 𝐸𝑝𝑘0(𝑝𝑖) from 𝐼𝑅 into 𝐶 (lines 2–3). Since CSS has already stored the encrypted query point 𝐸𝑝𝑘0(𝑞), the SSDC protocol is executed for each intermediate POI to obtain the secure squared distance between each POI and the query point (line 4). If |𝐶| ≥ 𝑘, which means that the required 𝑘 POIs can be found in 𝐼𝑅, CSS and CCS collaborate to execute the SMC protocol to obtain the desired 𝑘 POIs (lines 5–6). If |𝐶| < 𝑘, CSS and CCS collaborate to execute the SMC protocol to obtain the nearest POI, insert the corresponding 𝐸𝑝𝑘0(𝑖𝑑1) into 𝑅𝑒𝑠𝑢𝑙𝑡, and insert the corresponding 𝐸𝑝𝑘0(ℓ1) into 𝐷𝑒 (lines 7–8). To further get the next nearest neighbor, CSS and CCS collaborate to execute the SCR protocol [8,9] to get the row corresponding to 𝐸𝑝𝑘0(𝑖𝑑1): 𝐸𝑝𝑘0(𝑉𝑁(𝑝1)), ℓ𝑉𝑁(𝑝1), 𝑆𝐼𝐺𝑝1 (line 9). CSS and CCS then collaborate to execute the SSD protocol with the two input sets ℓ𝑉𝑁(𝑝1) and 𝐷𝑒, and CSS obtains 𝑉𝑁′(𝑝1) (line 10). If a POI 𝐸𝑝𝑘0(𝑝𝑗) in 𝐸𝑝𝑘0(𝑉𝑁(𝑝1)) also exists in 𝑉𝑁′(𝑝1), 𝐸𝑝𝑘0(𝑝𝑗) is added to 𝐶 and 𝐸𝑝𝑘0(ℓ𝑗) is added to 𝐷𝑒 (lines 11–12). CSS and CCS collaborate to execute the SSD protocol and SMC protocol, which select the POI closest to the query point from 𝐶 again and remove it from 𝐶 (line 13). CSS inserts 𝐸𝑝𝑘0(𝑖𝑑2), which corresponds to the obtained point, into 𝑅𝑒𝑠𝑢𝑙𝑡 and checks whether the content of 𝑅𝑒𝑠𝑢𝑙𝑡 meets the requirements of the 𝑘NN query. If not, S𝑘Q repeats lines 9–14.

• Step 2. Generating the verification object: During secure 𝑘NN queries, DESM𝑘NN also needs to generate 𝑉𝑂. By collaborating to execute the SCR protocol, CSS and CCS can obtain 𝐸𝑝𝑘0(𝑉𝑁(𝑝𝑖)) and 𝑆𝐼𝐺𝑝𝑖 from the row which corresponds to 𝑝𝑖. Additionally, Algorithm 5 enables key conversion, which transforms 𝐸𝑝𝑘0(𝑉𝑁(𝑝𝑖)) into 𝐸𝑝𝑘𝑢(𝑉𝑁(𝑝𝑖)). At last, CSS adds the 𝐸𝑝𝑘𝑢(𝑉𝑁(𝑝𝑖)) and 𝐸𝑝𝑘𝑢(𝑆𝐼𝐺𝑝𝑖) of each result point into 𝑉𝑂.

• Step 3. Returning results and the verification object to QU: Based on the secure protocols we proposed, CSS can directly retrieve the final results encrypted with 𝑝𝑘𝑢 in order, without needing an additional transformation process. Therefore, CSS puts the final points into 𝑅𝑒𝑠𝑢𝑙𝑡 and sends it, along with 𝑉𝑂, to QU.

6.2. Verification processing

QU utilizes 𝑅𝑒𝑠𝑢𝑙𝑡 and 𝑉𝑂 to authenticate the correctness and completeness of 𝑅𝑒𝑠𝑢𝑙𝑡.

• Verifying correctness: Recall the definition of correctness described in the security model, which means that each returned point 𝑝 ∈ 𝑅𝑒𝑠𝑢𝑙𝑡 must remain unmodified and be an authentic entry in the original database. To verify the correctness of 𝑅𝑒𝑠𝑢𝑙𝑡, QU first decrypts 𝑉𝑂 by using his private key 𝑠𝑘𝑢 to obtain {𝑉𝑁(𝑝𝑖), 𝑆𝐼𝐺𝑝𝑖}. Next, QU uses the obtained 𝑉𝑁(𝑝𝑖) to compute 𝐻(𝑉𝑁(𝑝𝑖)) and further calculates 𝐻(𝐻(𝑝𝑖)|𝐻(𝑉𝑁(𝑝𝑖))) (the specific method has been detailed in Data Pre-processing). Finally, QU only needs to check whether 𝑆𝐼𝐺𝑝𝑖 matches the computed 𝐻(𝐻(𝑝𝑖)|𝐻(𝑉𝑁(𝑝𝑖))) to verify correctness.

• Verifying completeness: Similar to correctness, completeness is defined as follows: all the points returned are valid solutions to the 𝑘NN query, while the points not returned do not correspond to the actual answers. First, assume that 𝑝𝑖 represents the 𝑖th nearest point to the query point 𝑞 in 𝑅𝑒𝑠𝑢𝑙𝑡. Subsequently, based on the properties of the Voronoi diagram, 𝑉𝐶(𝑝𝑖) can be derived from 𝑉𝑁(𝑝𝑖) and 𝑝𝑖. The specific process is divided into four steps: (1) determine the coordinates of the neighboring points; (2) calculate the perpendicular bisectors between 𝑝𝑖 and each neighboring point; (3) identify the intersection points of all these perpendicular bisectors; these intersection points form the vertices of the polygon, which represents the Voronoi cell; (4) connect these vertices in either clockwise or counterclockwise order to form the Voronoi cell surrounding the point 𝑝𝑖. Thereafter, the final verification is conducted based on two important properties of the Voronoi diagram. The first step is to determine whether 𝑞 lies within 𝑉𝐶(𝑝1). If it does, 𝑝1 is confirmed as the nearest POI; otherwise, the verification process is terminated immediately. The second step is to test each point (except for 𝑝1) in 𝑅𝑒𝑠𝑢𝑙𝑡 individually, which determines whether 𝑝𝑖 ∈ {𝑉𝑁(𝑝1) ∪ ⋯ ∪ 𝑉𝑁(𝑝𝑖−1)}, 𝑖 > 1. If it does, 𝑝𝑖 is confirmed as the 𝑖th nearest POI.

7. Analysis

7.1. Computational complexity

To verify the efficiency of DESM𝑘NN, we analyze the computational complexity of all four entities involved in the system: DO, QU, ESs, and the dual-cloud servers. Let 𝑒𝑐 and 𝑑𝑐 denote the encryption and decryption operations of CIPE𝑆, and let 𝑒𝑑𝑡 and 𝑑𝑑𝑡 represent the encryption and decryption operations of DT-PKC.

(1) DO: In the data pre-processing stage, DO needs to generate 𝑇𝑟𝑒𝑒𝑅 and 𝑉𝐷 based on the database 𝐷. 𝑇𝑟𝑒𝑒𝑅 and the 𝑃𝐷 generated from 𝑉𝐷 are encrypted by using CIPE𝑆 and DT-PKC, respectively. Therefore, the total computational complexity is 𝑂(𝑛)𝑒𝑐 + 𝑂(𝑛 · 𝑀)𝑒𝑑𝑡, where 𝑀 represents the maximum number of neighbors in 𝑉𝐷.

(2) QU: Due to the key conversion mechanism in Algorithm 5, QU only needs to perform a single DT-PKC decryption to obtain the final result and 𝑉𝑂. Thus, the computational cost is 𝑂(1)𝑑𝑑𝑡.

(3) ESs: The ESs perform initial filtering by evaluating the encrypted query rectangle 𝑅𝑒𝑐𝑡̂𝑞 over the encrypted R-tree 𝑇̂𝑟𝑒𝑒𝑅 to generate the intermediate result set 𝐼𝑅. Their total computational complexity is 𝑂(log 𝑛)𝑑𝑐.

(4) Dual-Cloud Servers: The dual-cloud servers undertake the precise search stage and therefore incur the highest computational complexity, as this stage requires executing several secure subprotocols. Specifically, the SSDC protocol is used to compute the secure squared distance between the query point 𝑞 and each POI in the intermediate result set 𝐼𝑅. The SMC protocol is responsible for comparing encrypted distance values and obtaining the corresponding encrypted identifiers and location records; to determine the nearest POI among 𝑛 candidates, the SMC protocol must be executed 𝑛 − 1 times. In addition, the SSD protocol computes the set difference between two encrypted sets and must perform DT-PKC decryption |𝑆̂1| · |𝑆̂2| times. The overall complexity depends on whether the number of candidates in 𝐼𝑅 is greater than or smaller than 𝑘. When |𝐼𝑅| > 𝑘, the S𝑘Q protocol repeatedly invokes the SMC protocol to iteratively determine the top-𝑘 POIs, which requires (|𝐼𝑅| − 1 + |𝐼𝑅| − 𝑘) · 𝑘∕2 executions in total. In this case, the computational complexity of the precise search stage is 𝑂(|𝐼𝑅| · 𝑘)(𝑒𝑑𝑡 + 𝑑𝑑𝑡).
When |𝐼𝑅| < 𝑘, the nearest POI is first identified by using |𝐼𝑅| − 1 SMC comparisons. Next, the SCR protocol is executed to locate the bucket row containing this POI, after which the remaining 𝑘 − 1 POIs are obtained through the subsequent steps of S𝑘Q. In this case, the computational complexity of the precise search stage is

𝑂(|𝐼𝑅| + 𝑘² · 𝑀)𝑒𝑑𝑡 + 𝑂(|𝐼𝑅| + 𝑘 · (√𝑛 + 𝑘 · 𝑀))𝑑𝑑𝑡,

where 𝑀 denotes the maximum number of neighbors in the Voronoi diagram. The comparison results between DESM𝑘NN and existing secure 𝑘NN query schemes are summarized in Table 3.

Moreover, the computational complexity of POI insertion and deletion in DESM𝑘NN is 𝑂(log 𝑛 + log(𝑀1)) on average, which is asymptotically equivalent to 𝑂(log(𝑀1 · 𝑛)). Here, 𝑀1 represents the number of neighboring POIs affected by the local Voronoi diagram update. This complexity arises from updating the encrypted R-tree and locally maintaining the Voronoi diagram.

Table 3
Computational complexity of existing approaches and DESM𝑘NN.

DESM𝑘NN — DO: 𝑂(𝑛)𝑒𝑐 + 𝑂(𝑛 · 𝑀)𝑒𝑑𝑡; QU: 𝑂(1)𝑑𝑑𝑡; ES: 𝑂(log 𝑛)𝑑𝑐; Dual-cloud servers: 𝑂(|𝐼𝑅| · 𝑘)(𝑒𝑑𝑡 + 𝑑𝑑𝑡) when |𝐼𝑅| > 𝑘, and 𝑂(|𝐼𝑅| + 𝑘² · 𝑀)𝑒𝑑𝑡 + 𝑂(|𝐼𝑅| + 𝑘 · (√𝑛 + 𝑘 · 𝑀))𝑑𝑑𝑡 when |𝐼𝑅| < 𝑘.
MSV𝑘NN [9] — DO: 𝑂(𝑚² · 𝑔 + 𝑛 · 𝑀)𝑒𝑑𝑡; QU: 𝑂(1)𝑑𝑑𝑡; Dual-cloud servers: 𝑂(𝑘 · (𝑛 + 𝑀))𝑒𝑑𝑡 + 𝑂(𝑘 · (√𝑛 + 𝑀))𝑑𝑑𝑡.
SecVKQ [14] — DO: 𝑂(𝑛)𝑒𝑐 + 𝑂(𝑛 · 𝑀)𝑒𝑝; QU: 𝑂(1)(𝑒𝑐 + 𝑒𝑝); ES: 𝑂(log 𝑛)𝑑𝑐; Dual-cloud servers: 𝑂(|𝐼𝑅| · 𝑘)(𝑒𝑝 + 𝑑𝑝) when |𝐼𝑅| > 𝑘, and 𝑂(|𝐼𝑅| + 𝑘² · 𝑀)(𝑒𝑝 + 𝑑𝑝) when |𝐼𝑅| < 𝑘.
SV𝑘NN [8] — DO: 𝑂(𝑚² · 𝑔 + 𝑛 · 𝑀)𝑒𝑝; QU: 𝑂(1)𝑑𝑝; Dual-cloud servers: 𝑂(𝑘 · (𝑛 + 𝑀))𝑒𝑝 + 𝑂(𝑘 · (√𝑛 + 𝑀))𝑑𝑝.

Notations: 𝑛 represents the size of dataset 𝐷, 𝑘 represents the search parameter for the 𝑘NN search, and 𝑀 represents the maximal number of Voronoi neighbors. 𝑚 refers to the number of grids, while 𝑔 represents the maximum number of grid points, as discussed in [8,9].

7.2. Communication complexity

In this subsection, the communication cost incurred during the entire query processing is evaluated. Table 4 presents the communication cost of DESM𝑘NN compared with MSV𝑘NN. It is observed that DESM𝑘NN consistently incurs the lowest communication cost. These experimental results align well with the theoretical analysis.

Table 4
Comparison of communication costs (MB) under the setting of 𝐾 = {1024, 2048}.

                 DESM𝑘NN                              MSV𝑘NN
𝑛         California       San Francisco       California       San Francisco
         1024    2048     1024    2048        1024    2048     1024    2048
1024     6.1     12.7     5.9     12.3        6.5     13.1     6.1     12.4
2048     12.8    27.8     11.9    25.6        14.3    31.4     13.9    30.7

7.3. Security analysis

To establish the security of the proposed subprotocols, it is important to highlight that the semantic security of the DT-PKC cryptosystem has been proven in [19]. Additionally, in accordance with the formal security definition of multiparty computation introduced in [29] and [34], the framework of the simulation paradigm proposed in [35] is adopted. Specifically, the simulation paradigm requires that the view of each participant in the protocol can be simulated based solely on its input and output, which ensures that no participant gains any additional information from the protocol. In other words, the real execution of each subprotocol is computationally indistinguishable from its simulated counterpart. For clarity, SSDC and SMC are formally demonstrated as examples; the other protocols we proposed can be proven in a similar manner.

Theorem 1. The DT-PKC cryptosystem described in Section 3 is semantically secure under the assumed intractability of the DDH problem over Z_{𝑁²}. This ensures that ciphertexts produced by DT-PKC reveal no information about the underlying plaintexts, even to computationally bounded adversaries (the details of the proof can be found in [19]).

Theorem 2 (Composition Theorem [35]). If a protocol is composed of multiple subprotocols, each of which is secure under the simulation paradigm, and all intermediate values are either random or pseudorandom, the composed protocol is secure. This theorem allows the security of DESM𝑘NN to be deduced from the security of its individual subprotocols.

Theorem 3 (Security of SSDC). Assuming DT-PKC is semantically secure, the SSDC subprotocol securely computes encrypted squared distances between the query point and candidate points in 𝐼𝑅 against semi-honest adversaries.

Proof. In SSDC, the cloud server's view consists of the values 𝑎′, 𝑏′, 𝑎″, 𝑏″, which are derived from plaintext differences scaled by random factors, and the encrypted comparison results 𝐸1, 𝐸2. The simulated view Π^𝑠_𝐶𝐶𝑆(𝑆𝑆𝐷𝐶) is constructed by sampling all elements uniformly at random from the appropriate domain. The semantic security of DT-PKC ensures that 𝑎′, 𝑏′, 𝑎″, 𝑏″ are computationally indistinguishable from the corresponding simulated values (𝑎′𝑠, 𝑏′𝑠, 𝑎″𝑠, 𝑏″𝑠). Similarly, the randomized encryption of the comparison outcomes 𝐸1, 𝐸2 ensures that these values are indistinguishable from their simulated counterparts 𝐸1𝑠, 𝐸2𝑠. This demonstrates that the real execution reveals no additional information beyond what is contained in the input and output, which confirms the security of SSDC. For CSS, the execution image is Π_𝐶𝑆𝑆(𝑆𝑆𝐷𝐶) = {𝐸1, 𝐸2}, and the simulated image is Π^𝑠_𝐶𝑆𝑆(𝑆𝑆𝐷𝐶) = {𝐸1𝑠, 𝐸2𝑠}. Since 𝐸1, 𝐸2 are produced by randomized procedures, they are computationally indistinguishable from 𝐸1𝑠, 𝐸2𝑠, which further supports the security argument.

Theorem 4 (Security of SMC). Assuming DT-PKC is semantically secure, the SMC protocol securely compares encrypted distance values and returns the corresponding encrypted identifiers and labels.

Proof. In SMC, the server's view contains the ciphertexts (𝐸𝑝𝑘0(𝛼), 𝛼1, 𝛼2) and a local output bit 𝑤. The simulated view Π^𝑠_𝐶𝐶𝑆(𝑆𝑀𝐶) is obtained by sampling all elements randomly. Semantic security guarantees that (𝐸𝑝𝑘0(𝛼), 𝛼1) are indistinguishable from their simulated counterparts (𝐸𝑝𝑘0(𝛼)𝑠, 𝛼1𝑠). Additionally, 𝛼2 is derived from random coin flips and is indistinguishable from 𝛼2𝑠. The local output bit 𝑤 also matches the distribution of the simulated 𝑤𝑠. Hence, the simulated view is computationally indistinguishable from the real view, which confirms the security of SMC.

Theorem 5 (Security of DESM𝑘NN). If DT-PKC is semantically secure, DESM𝑘NN is secure under the semi-honest model.

Proof. Since each subprotocol (SSDC, SMC, SSD, and others) produces views indistinguishable from their respective simulated views, and all
9
Y. Jia et al. Computer Standards & Interfaces 97 (2026) 104112
Fig. 6. The data processing time with varying parameters.
Fig. 7. Comparison of search time between MSV𝑘NN and DESM𝑘NN on two datasets (𝑘 = 1 to 10).
intermediate values are either DT-PKC ciphertexts or explicitly ran- 8.1. Parameter setting
domized, the composition theorem applies. Consequently, the overall
DESM𝑘NN protocol is secure, ensuring confidentiality of the database, The evaluation of DESM𝑘NN is carried out on a system equipped
privacy of queries, and integrity of computation. with an Intel Core i7-14650HQ processor, clocked at 2.80 GHz, and
16 GB of RAM, which runs Windows 11. For this purpose, the DT-
In DESM𝑘NN, a quantitative security comparison across existing
PKC cryptosystem is implemented by using the JAVA development kit,
methods is not conducted due to significant differences in their threat
models, cryptographic assumptions, and supported functionalities, which forms the core element of the proposed protocol.
which make such evaluation extremely difficult. Instead, DESM𝑘NN In the experiment, the dataset size 𝑛 ranges from 1024 to 2024. The
focuses on formally achieving and proving multiple security properties search parameter 𝑘 is set between 1 and 10. The key size 𝐾 of the DT-
that prior methods do not simultaneously provide. DESM𝑘NN ensures PKC cryptosystem are selected from {1024, 2048, 3072}. These settings
data privacy, query privacy, result privacy, and access patterns privacy, apply to all values of 𝑛, 𝑘, 𝐾 in the experiment. While implementing the
while also supporting result verification, multi-user querying, and MSV𝑘NN and SV𝑘NN schemes, the grid granularity is fixed at 90 and
dynamic updates to the encrypted POIs database in outsourced POIs the cryptographic hash functions are implemented via HMAC-SHA-256.
queries, which prior methods cannot achieve simultaneously.
8.2. Experiment results
8. Experimental evaluation
The following analysis of the experimental results will focus on DO
This section evaluates the computational cost of DESM𝑘NN by us- and Dual-Cloud Servers. It should be noted that the experiment results
ing real-world datasets for spatial databases: California Road Network for the CIPE𝑠 scheme are not included, as its execution time is negligible
and San Francisco Road Network. A comparison is made between compared to the DT-PKC cryptosystem. For example, the CIPE𝑠 scheme
DESM𝑘NN and scheme MSV𝑘NN [9] in different phases. takes less than 1 s to retrieve 𝐼𝑅 from 1 million POIs.
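The parameter settings above note that the grid-based comparison schemes (MSV𝑘NN, SV𝑘NN) realize their cryptographic hash functions with HMAC-SHA-256. A minimal sketch of such keyed grid-cell labeling, with a hypothetical key and cell-identifier format (the schemes' actual encoding is not specified here):

```python
import hashlib
import hmac

def grid_tag(key: bytes, cell_id: str) -> str:
    """Keyed label for a grid cell: deterministic for the key holder,
    unpredictable without the key."""
    return hmac.new(key, cell_id.encode(), hashlib.sha256).hexdigest()

key = b"demo-key"                                  # hypothetical secret key
assert grid_tag(key, "row12:col34") == grid_tag(key, "row12:col34")
assert grid_tag(key, "row12:col34") != grid_tag(key, "row12:col35")
assert len(grid_tag(key, "row12:col34")) == 64     # hex-encoded SHA-256 digest
```

Because the label is keyed, an outsourced server can match equal cells by tag without learning which physical grid cell a tag denotes.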
Fig. 8. Comparison of search time between MSV𝑘NN and DESM𝑘NN on two datasets (𝐾 = 1024 to 3072).
Fig. 9. Comparison of search time between MSV𝑘NN and DESM𝑘NN on two datasets (𝑛 = 1024 to 2024).
Fig. 10. The search time of DESM𝑘NN on two datasets (𝐾 = 1024 to 3072).
• DO: The execution time in data preprocessing is shown in Fig. 6. The computational cost includes two components: the cost of encrypting 𝑉𝐷 and the cost of generating 𝑆𝐼𝐺. Experiment results show that MSV𝑘NN and SV𝑘NN require additional operations such as grid partition, grid padding, and grid encryption, and thus perform worse in this stage.

• Dual-Cloud Servers: As shown in Section 7, the execution time in the search stage is influenced by the parameters 𝑛, 𝑘, 𝐾. Experiments are conducted under different parameter settings to demonstrate the effectiveness of DESM𝑘NN. We can observe that the search time of DESM𝑘NN is significantly shorter than MSV𝑘NN, as shown in Figs. 7–9, primarily because MSV𝑘NN incurs a high computational cost when executing the critical SGC protocol. Please note that in Fig. 7, both datasets (California Road Network and Points of Interest, San Francisco Road Network) are real-world datasets, where realistic POI distributions result in consistent performance gaps between DESM𝑘NN and MSV𝑘NN. Moreover, real-world datasets often exhibit a high density of POIs. Due to the grid partitioning mechanism, MSV𝑘NN tends to be inefficient when handling real-world datasets. For example, in the California road network dataset, when setting the fine-grained grid parameter 𝑚 in MSV𝑘NN to 32 (which is the optimal parameter for MSV𝑘NN), the number of POIs contained within each grid reaches as high as 108. To utilize data packing techniques, the parameter 𝐾 needs to be adjusted to no less than 4096, which results in extremely high computational costs. However, in DESM𝑘NN, well-designed data structures are employed to regulate the number of POIs
per partition, which keeps 𝐾 within a reasonable range and prevents excessive computational overhead. As shown in Fig. 10, when 𝐼𝑅 is smaller than the query parameter 𝑘, the query time is significantly higher compared to when 𝐼𝑅 exceeds 𝑘, since the CS need to perform more calculations related to homomorphic encryption. For a given scheme, larger values of 𝑘 and 𝑛 increase query time by expanding the search space and raising computational demands. Likewise, a larger 𝐾 leads to longer plaintexts for encryption, which adds overhead from cryptographic operations.

In general, it can be concluded that DESM𝑘NN not only meets the security requirements mentioned in Section 4 but also achieves higher efficiency than scheme MSV𝑘NN in all stages of POI queries, with an improvement of up to 45.5%.

9. Conclusion

This paper proposes efficient and secure multi-user 𝑘NN queries with dynamic POI updating, which preserves the privacy of data, queries, results, and access patterns, and ensures that the results are correct and complete in a multi-user environment. Firstly, DESM𝑘NN proposes a two-stage search framework to accelerate query speed. Secondly, DESM𝑘NN designs a series of novel secure protocols and a compact verification strategy to facilitate the operation over the two-stage search framework. Finally, computational complexity, security analysis and experimental evaluation demonstrate that DESM𝑘NN improves query efficiency by up to 45.5% compared to MSV𝑘NN. In future research, we plan to study 𝑘NN queries for multi-type POIs to address the limitation of single-type POI scenarios, where query results are too homogeneous. Moreover, we will focus more on exploring the balance between security and efficiency.

CRediT authorship contribution statement

Yining Jia: Writing – original draft, Software, Methodology, Investigation, Conceptualization. Yali Liu: Writing – review & editing, Resources. Congai Zeng: Writing – review & editing. Xujie Ding: Writing – review & editing. Jianting Ning: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors thank the editor and the reviewers for their comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grant No. 61702237, No. 62425205, and No. 12441101, the Opening Foundation of the State Key Laboratory for Novel Software Technology, Nanjing University under Grant No. KFKT2025B54, the Science and Technology Planning Foundation of Xuzhou City under Grant No. KC22052, the Opening Foundation of the Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology under Grant GCIS202114, the Postgraduate Research & Practice Innovation Program of Jiangsu Normal University under Grant 2024XKT2579, and the University-Industry Collaborative Education Program of China under Grant No. 202101374001. All authors have read and approved the final version of the manuscript.

Data availability

Data will be made available on request.

References

[1] R. Li, A. Liu, A. Wang, Fast and scalable range query processing with strong privacy protection for cloud computing, IEEE/ACM Trans. Netw. 24 (4) (2015) 2305–2318.
[2] G. Xiao, F. Wu, X. Zhou, K. Li, Probabilistic top-k range query processing for uncertain databases, J. Intell. Fuzzy Syst. 31 (2) (2016) 1109–1120.
[3] K. Xue, S. Li, J. Hong, Y. Xue, N. Yu, P. Hong, Two-cloud secure database for numeric-related SQL range queries with privacy preserving, IEEE Trans. Inf. Forensics Secur. 12 (7) (2017) 1596–1608.
[4] Y. Miao, Y. Yang, X. Li, K.-K.R. Choo, X. Meng, R.H. Deng, Comprehensive survey on privacy-preserving spatial data query in transportation systems, IEEE Trans. Intell. Transp. Syst. 24 (12) (2023) 13603–13616.
[5] Y. Zhang, B. Wang, Z. Zhao, Verifiable and privacy-preserving 𝑘-NN query scheme with multiple keys, IEEE Trans. Big Data 11 (3) (2024) 1434–1446.
[6] Q. Liu, Y. Peng, J. Wu, T. Wang, G. Wang, Secure multi-keyword fuzzy searches with enhanced service quality in cloud computing, IEEE Trans. Netw. Serv. Manag. 18 (2) (2021) 2046–2062.
[7] Q. Liu, Y. Peng, Q. Xu, H. Jiang, J. Wu, T. Wang, T. Peng, G. Wang, S. Zhang, MARS: Enabling verifiable range-aggregate queries in multi-source environments, IEEE Trans. Dependable Secur. Comput. 21 (4) (2024) 1994–2011.
[8] N. Cui, X. Yang, B. Wang, J. Li, G. Wang, SVkNN: Efficient secure and verifiable k-nearest neighbor query on the cloud platform, in: Proc. of ICDE, 2020, pp. 253–264.
[9] N. Cui, K. Qian, T. Cai, J. Li, X. Yang, J. Cui, H. Zhong, Towards multi-user, secure, and verifiable 𝑘NN query in cloud database, IEEE Trans. Knowl. Data Eng. 35 (9) (2023) 9333–9349.
[10] H. Xie, Y. Guo, X. Jia, A privacy-preserving online ride-hailing system without involving a third trusted server, IEEE Trans. Inf. Forensics Secur. 16 (2021) 3068–3081.
[11] W. Wong, D. Cheung, B. Kao, N. Mamoulis, Secure kNN computation on encrypted databases, in: Proc. of SIGMOD, 2009, pp. 139–152.
[12] Y. Zhu, R. Xu, T. Takagi, Secure k-NN computation on encrypted cloud data without sharing key with query users, in: Proc. of IWSEC, 2013, pp. 55–60.
[13] B. Yao, F. Li, X. Xiao, Secure nearest neighbor revisited, in: Proc. of ICDE, 2013, pp. 733–744.
[14] Q. Liu, Z. Hao, Y. Peng, H. Jiang, J. Wu, T. Peng, G. Wang, S. Zhang, SecVKQ: Secure and verifiable kNN queries in sensor-cloud systems, J. Syst. Archit. 120 (2021) 102300.
[15] Y. Elmehdwi, B.K. Samanthula, W. Jiang, Secure k-nearest neighbor query over encrypted data in outsourced environments, in: Proc. of ICDE, 2014, pp. 664–675.
[16] S. Choi, G. Ghinita, H.-S. Lim, E. Bertino, Secure kNN query processing in untrusted cloud environments, IEEE Trans. Knowl. Data Eng. 26 (11) (2014) 2818–2831.
[17] K. Cheng, L. Wang, Y. Shen, H. Wang, Y. Wang, X. Jiang, H. Zhong, Secure k-NN query on encrypted cloud data with multiple keys, IEEE Trans. Big Data 7 (4) (2021) 689–702.
[18] A. Boldyreva, N. Chenette, Y. Lee, A. O'Neill, Order-preserving symmetric encryption, in: Proc. of EUROCRYPT, 2009, pp. 224–241.
[19] X. Liu, R.H. Deng, K.-K.R. Choo, J. Weng, An efficient privacy-preserving outsourced calculation toolkit with multiple keys, IEEE Trans. Inf. Forensics Secur. 11 (11) (2016) 2401–2414.
[20] K. Cheng, Y. Shen, Y. Wang, L. Wang, J. Ma, X. Jiang, C. Su, Strongly secure and efficient range queries in cloud databases under multiple keys, in: Proc. of INFOCOM, 2019, pp. 2494–2502.
[21] S.K. Nayak, S. Tripathy, SEMKC: Secure and efficient computation over outsourced data encrypted under multiple keys, IEEE Trans. Emerg. Top. Comput. 9 (1) (2018) 414–428.
[22] A. Okabe, B. Boots, K. Sugihara, S. Chiu, Spatial tessellations: Concepts and applications of Voronoi diagrams, College Math. J. (2001).
[23] Y. Manolopoulos, A. Nanopoulos, A.N. Papadopoulos, Y. Theodoridis, R-Trees: Theory and Applications, Springer Science & Business Media, 2006.
[24] N. Cui, D. Wang, H. Zhu, J. Li, J. Xu, X. Yang, Enabling verifiable and secure range query in multi-user setting under cloud environments, IEEE Trans. Knowl. Data Eng. 36 (12) (2024) 8148–8163.
[25] Q. Liu, S. Wu, S. Pei, J. Wu, T. Peng, G. Wang, Secure and efficient multi-attribute range queries based on comparable inner product encoding, in: Proc. of CNS, 2018, pp. 1–9.
[26] Y. Zhang, B. Wang, Z. Zhao, Secure k-NN query with multiple keys based on random projection forests, IEEE Internet Things J. 11 (9) (2023) 15205–15218.
[27] S. Wu, Q. Li, G. Li, D. Yuan, X. Yuan, C. Wang, ServeDB: Secure, verifiable, and efficient range queries on outsourced database, in: Proc. of ICDE, 2019, pp. 626–637.
[28] H.-I. Kim, H.-J. Kim, J.-W. Chang, A secure kNN query processing algorithm using homomorphic encryption on outsourced database, Data Knowl. Eng. 123 (2019) 101602.
[29] A. Liu, K. Zheng, L. Li, G. Liu, L. Zhao, X. Zhou, Efficient secure similarity computation on encrypted trajectory data, in: Proc. of ICDE, 2015, pp. 66–77.
[30] P. Williams, R. Sion, B. Carbunar, Building castles out of mud: Practical access pattern privacy and correctness on untrusted storage, in: Proc. of CCS, 2008, pp. 139–148.
[31] M.S. Islam, M. Kuzu, M. Kantarcioglu, Access pattern disclosure on searchable encryption: Ramification, attack and mitigation, in: Proc. of NDSS, vol. 20, 2012, p. 12.
[32] A. Bowyer, Computing Dirichlet tessellations, Comput. J. 24 (2) (1981) 162–166.
[33] D.F. Watson, Computing the n-dimensional Delaunay tessellation with application to Voronoi polytopes, Comput. J. 24 (2) (1981) 167–172.
[34] J. Liu, J. Yang, L. Xiong, J. Pei, Secure skyline queries on cloud platform, in: Proc. of ICDE, 2017, pp. 633–644.
[35] A.C.-C. Yao, How to generate and exchange secrets, in: Proc. of SFCS, 1986, pp. 162–167.

Yining Jia received his B.Sc. in Computer Science and Technology in 2023 from Nanjing Forestry University, China. Currently, he is pursuing the M.Sc. degree in the School of Artificial Intelligence and Computer Science at Jiangsu Normal University, China. His research interests include data privacy, query processing, and information security.

Yali Liu received her Ph.D. in 2014 from Nanjing University of Aeronautics and Astronautics, China. She is a senior member of the China Computer Federation (CCF). She has been a Research Scientist at Nanyang Technological University, Singapore. She is currently a Professor in the School of Artificial Intelligence and Computer Science at Jiangsu Normal University, China. Her research interests include information security, authentication and privacy-preserving technology, blockchain security and privacy, vehicular ad hoc networks, and cryptographic algorithms and protocols and their applications in the Internet of things and mobile communication.

Congai Zeng received her M.Sc. in Electronic Information in 2024 from Jiangsu Normal University, China. Currently, she is pursuing the Ph.D. degree in the Faculty of Information Technology at Beijing University of Technology, China. Her research interests include Internet of Vehicles security and privacy.

Xujie Ding received his B.Sc. in Software Engineering in 2023 from Jiangsu Normal University, China. Currently, he is pursuing the M.Sc. degree in the School of Artificial Intelligence and Computer Science at Jiangsu Normal University, China. His research interests include privacy preservation and secure data sharing technology in smart healthcare.

Jianting Ning received his Ph.D. in 2016 from Shanghai Jiao Tong University, China. He has been a Research Scientist at the School of Computing and Information Systems, Singapore Management University, and a Research Fellow at the National University of Singapore. His research interests include applied cryptography and information security. He is currently a Professor with the School of Cyber Science and Engineering, Wuhan University, China, and with the Faculty of Data Science, City University of Macau, China. He has published papers in major conferences/journals, such as ACM CCS, NDSS, ASIACRYPT, ESORICS, ACSAC, IEEE Transactions on Information Forensics and Security, and IEEE Transactions on Dependable and Secure Computing.
View File
@@ -0,0 +1,654 @@
Journal of Systems Architecture 160 (2025) 103347
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
Eliminating duplicate writes of logging via no-logging flash translation layer
in SSDs
Zhenghao Yin a, Yajuan Du a,∗, Yi Fan a, Sam H. Noh b
a Wuhan University of Technology, Wuhan, 430070, Hubei Province, China
b Virginia Tech, Blacksburg, 24061-0326, VA, USA
ARTICLE INFO

Keywords:
Flash memory
Transaction
Flash translation layer
Duplicate writes

ABSTRACT

With the development of high-density flash memory techniques, SSDs have achieved high performance and large capacity. Databases often use logging to ensure transactional atomicity of data updates. However, it introduces duplicate writes because of multi-versioning, which significantly weakens the performance and endurance of SSDs. This is also often considered as the main reason for slow response of databases. This paper proposes a novel flash translation layer (FTL) for SSDs, which we refer to as NoLgn-FTL, to reduce the overhead of logging-induced duplicate writes by exploiting the inherent multi-version feature of flash memories. Specifically, during a transaction, NoLgn-FTL retains the old data as valid and establishes the mapping between the new physical addresses and the old physical addresses. Thus, the database can easily roll back to the old-version data to maintain system consistency when a power failure occurs. To evaluate NoLgn-FTL, we implement it within FEMU and modify the SQLite database and the file system to make them compatible with the extended abstractions provided by NoLgn-FTL. Experimental results show that, in normal synchronization mode, NoLgn-FTL can reduce SSD writes by 20% and improve database performance by 15% on average.
1. Introduction

Solid-state drives (SSDs) have been widely adopted in database systems due to their high performance. Databases employ logging-based methods, such as write-ahead logging (WAL) and rollback journals, to ensure the transactional atomicity of multiple data updates. In these methods, data is first written to persistent logs before updating the original data, which induces duplicate writes [1]. For SSDs, duplicate writes occur in the following manner. First, the updated data and metadata are written into log files in flash memory. Then, due to the inherent out-of-place update nature of the SSD [2], the updated data is written into new flash pages rather than overwriting the original ones [3]. Thus, one user data write induces two SSD internal writes onto two different flash pages, incurring extra program/erase (P/E) cycles. This reduces SSD lifespan and degrades overall performance by consuming write throughput.

To address the issue of SSD duplicate writes in logging-based databases, researchers have proposed data remapping methods. These methods aim to convert logs directly into new data by modifying the mapping between logical pages (LPs) and physical pages (PPs) in flash memory [4,5]. However, dealing with the inconsistency of logging and data LPs is challenging during power failures.

To investigate the performance of database logging in SSDs, this paper first performs a preliminary study to collect the latency incurred during WAL-based data updates. We find that WAL takes a larger proportion of latency than regular data updates, especially for small data updates. This inspires us to design a direct update scheme to alleviate the overhead of duplicate writes by leveraging the out-of-place update feature of flash memory. This feature inherently maintains multiple versions of data upon updates, allowing the database to easily roll back to the previous version of the data in the event of a power failure or system crash, ensuring data consistency without the need for explicit logging.

This paper proposes a no-logging flash translation layer (NoLgn-FTL) by reusing old flash data pages. The key idea is to keep the mapping information of old data during transactions, eliminating the need for separate log writes. We establish a mapping table between new and old physical addresses (called a P2P table) in the RAM of the flash controller. Meanwhile, the old physical address is written into the out-of-band area of new flash pages, providing a backup of the mapping information. In this way, uncommitted transactions can be rolled back to the old data version upon power failure, thus maintaining consistency. We implement NoLgn-FTL within FEMU and
Corresponding author.
E-mail address: dyj@whut.edu.cn (Y. Du).
https://doi.org/10.1016/j.sysarc.2025.103347
Received 31 October 2024; Received in revised form 15 December 2024; Accepted 18 January 2025
Available online 25 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Z. Yin et al. Journal of Systems Architecture 160 (2025) 103347
evaluate it with the SQLite database. Experimental results show that, The write overhead incurred by WAL cannot be overlooked com-
in normal synchronization mode, NoLgn-FTL can reduce SSD writes by pared to directly updating the page. Multiple update operations may
20% and improve database performance by 15% on average, compared be performed on the same data page in the buffer, but during a
to existing methods. Our paper makes the following contributions. checkpoint, the storage engine writes the latest data page to a database
file. Fig. 2 illustrates the storage engine layer writing process. In the
• We conduct a preliminary study that reveals the significant la- example, two concurrent transactions, Transaction1 and Transaction2,
tency impact of logging, compared to pure data updates in modify the database. Transaction1 updates A and B with values 2
databases, motivating the need for a more efficient approach to and 4, while Transaction2 updates A and C with values 3 and 7.
handling duplicate writes. During the first step of the write merging process, the modifications
• We propose a novel SSD FTL, called NoLgn-FTL, which fully made by both transactions are recorded in the WAL file. The WAL file
utilizes the out-of-place update nature of flash memory to largely maintains separate regions for each transaction, capturing the updated
remove duplicate writes caused by database logging. page identifiers and their corresponding values. Consequently, the WAL
• We modify SQLite and integrate NoLgn-FTL in the FEMU simula- file contains two distinct entries: one for Transaction1, documenting
tor. We verify the efficiency of NoLgn-FTL in reducing duplicate the updates to pages A(2) and B(4), and another for Transaction2,
writes and improving database performance through extensive recording the updates to pages A(3) and C(7). In the second step, the
experiments. changes recorded in the WAL file are applied to the database during the
checkpointing process. As both transactions modify page A, the WAL
The rest of this paper is organized as follows. Section 2 introduces
mechanism merges these updates into a single write operation. The
the basics of SSDs and logging methods as well as the motivation of
WAL mechanism consolidates the updates and writes the final value
this paper. Section 3 presents the design of NoLgn-FTL. Section 4 shows
of page A(3) to the database file. A contains the merged value of 3,
the experimental setup and evaluation results of NoLgn-FTL. Section 5
while B and C hold 4 and 7.
reviews existing work, and Section 6 concludes this paper.
2.3. Existing solutions
2. Background and motivation
Existing works propose to exploit data remapping to eliminate
This section begins by introducing the basics of SSDs, with a focus
duplicate writes in SSDs [810]. The key design is not to remove the
on logging methods. Then, we present existing remapping-based meth-
out-of-place data update but to directly remap the WAL file to the
ods. Finally, we present the preliminary study as the motivation for this
new-version data, as shown in Fig. 1b.
paper.
However, address remapping can lead to mapping inconsistency.
Flash pages are divided into a data area for storing user data and
2.1. Basics of SSD an OOB area for maintaining metadata. The OOB area contains the
physical-to-logical (P2L) mappings, which are crucial for maintaining
Flash memory utilizes a flash translation layer (FTL) to store and data consistency during garbage collection and database recovery.
manage a logical-to-physical address translation, called L2P mapping. During garbage collection, the P2L mappings enable quick identifica-
This mapping is often stored in the SRAM internal to the SSD to achieve tion of the logical address corresponding to a physical address, which
high access performance. Meanwhile, the logical address is also stored accelerates the update of L2P mappings during data migration. During
in the out-of-band (OOB) area of physical flash pages. Upon a data recovery upon a system crash, the FTL can reconstruct the lost L2P
update request, the FTL first stores the new data in new flash pages and mapping table using the P2L mapping stored within the page.
invalidates the old flash pages. Meanwhile, the L2P mapping is directed Without remapping, the P2L mappings in the OOB area directly
to the new physical page addresses, and the requested logical addresses correspond to the LPN in the L2P mapping table. However, mapping
are also stored in the OOB areas as the new flash pages are written. The inconsistencies may arise after remapping because remapping opera-
invalidated old pages are reclaimed during garbage collection (GC). tions do not simultaneously update the related P2L mappings in the
As shown in Fig. 1a, when data with physical addresses P1, P2, and OOB area.
P3 need to be updated, new data would eventually be stored in new
physical pages P1 , P2 , and P3 . (Note L𝑖 and P𝑖 in the figure represent 2.4. Preliminary study and motivation
the logical address and physical addresses).
To investigate the performance of database transactions, we conduct
2.2. Write ahead logging preliminary experiments using the FEMU simulator [11], which is
discussed in more detail in Section 4.
Relational databases are typically run in rollback mode or write- We run the SQLite database, perform 1 million overwrite operations
ahead log mode in order to support atomic execution of transactions [1, for each fixed value size, and collect the transaction latency under four
6,7]. New updates are first written in a dedicated log, and the data value sizes. In Fig. 3, the 𝑥-axis represents the transaction value size and
is kept consistent by rolling back or forwarding to the log. How- the 𝑦-axis represents the percentage of the time spent on WAL writes,
ever, using logs often generates write amplification, affecting database WAL synchronization, data writes, and data synchronization.
performance. Write-ahead logging (WAL) serves as an example. A From Fig. 3, we observe that WAL (WAL write and WAL synchro-
WAL-based transaction update includes three steps: WAL writing, WAL nization) takes up a significant portion of the total transaction latency.
synchronization, and database writing, as shown in Fig. 1a. First, when Compared to the data (data write and data synchronization) operations,
a transaction is initiated, the new data are written into the page cache the proportion is significantly higher for small value sizes, while for the
of WAL files (Step 1). Upon transaction commit, the WAL files are 16 KB size, the two are comparable.
physically written to flash memory (WAL synchronization) (Step 2). Two main factors contribute to this phenomenon. Firstly, WAL
Finally, the database data is updated during system checkpointing. As introduces additional overhead by writing an extra frame header for
this checkpoint is performed at the database software level, WAL data each transaction. This header contains essential recovery information
cannot be directly moved into the database data. Thus, the WAL file is and is stored alongside the normal data. Consequently, the relative
Z. Yin et al. Journal of Systems Architecture 160 (2025) 103347

read again into the page cache (Step 3) and written into flash memory upon database synchronization (Step 4). Duplicated writes introduced by WAL are detrimental to flash memory endurance and performance.

Fig. 1. Existing write-ahead logging schemes in SSDs.
Fig. 2. Multi-version pages in the WAL.
Fig. 3. Transaction latency distribution in SQLite database.

overhead of the frame header becomes more significant for smaller transactions. Secondly, although WAL consolidates multiple updates to the same data pages into a single write operation during checkpointing, the logging mechanism still necessitates storing multiple versions of the same data in log files. This results in increased storage requirements, particularly affecting smaller transactions with frequent updates on the same page, as the overhead of maintaining multiple versions becomes more significant relative to the size of the transactions.

This paper proposes a novel approach by directly updating data and leveraging the inherent multi-version characteristic of flash memory. Shifting the focus of transaction support to flash can reduce the reliance on logs and frequent file synchronization operations in the database. This leads to faster application response times, as it reduces the need for excessive logging and synchronization.

3. The proposed NoLgn-FTL

We first introduce the overview of the whole system flow using a no-logging flash translation layer, which, hereafter, we simply refer to as NoLgn-FTL. Then, we delve into the design details of NoLgn-FTL, including old page information storage, transaction processing, garbage collection (GC), and data recovery. Without loss of generality, the SQLite database is used in discussing the use of NoLgn-FTL. Finally, we analyze and discuss the overhead associated with NoLgn-FTL.

3.1. Overview

We propose NoLgn-FTL, a novel approach that optimizes both software and hardware architectures to efficiently manage transactions and data version control at the FTL layer, thereby avoiding the overhead of logs in databases. At the core of NoLgn-FTL is the novel FTL, where transaction information is utilized to perform mapping conversion of logical and physical addresses in the L2P and P2P tables only when data is written, minimizing overhead. However, the use of NoLgn-FTL starts at the database layer, where the transaction information is attached to write requests. The file system layer also plays a crucial role by providing transaction-related interfaces and transmitting necessary transactional metadata.

Fig. 4 shows the overall workflow with an example of a transactional data update on three pages L1, L2, and L3. The process is divided into three key stages: transaction delivery, transaction persistence, and GC. These stages can be further subdivided into six steps.

First, the database assigns transaction flags to each transaction (① in Fig. 4) to indicate the completion status of the transaction. Then, a transaction ID is added to the original transactional data request (②). To retain transaction flags and IDs, we design new interfaces in the file system (③).
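The flag and ID tagging in the first two steps can be sketched as follows; this is a minimal illustration, and the structure and function names are our own assumptions, not the paper's code:

```python
from dataclasses import dataclass

@dataclass
class TaggedWrite:
    lpn: int     # logical page number targeted by the write
    data: bytes  # page payload
    tid: int     # transaction ID attached to the request
    flag: str    # 'S'tart, 'M'iddle, or 'E'nd page of the transaction

def tag_transaction(tid, writes):
    """Attach a transaction ID and S/M/E flags to each page write."""
    tagged = []
    last = len(writes) - 1
    for i, (lpn, data) in enumerate(writes):
        # The end flag matters most: recovery later checks it to decide
        # whether the transaction completed.
        flag = 'E' if i == last else ('S' if i == 0 else 'M')
        tagged.append(TaggedWrite(lpn, data, tid, flag))
    return tagged

# A transaction updating pages L1, L2, and L3, as in Fig. 4.
reqs = tag_transaction(7, [(1, b'a'), (2, b'b'), (3, b'c')])
assert [w.flag for w in reqs] == ['S', 'M', 'E']
```

The tagged requests then travel through the modified file system interfaces down to the flash controller.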
Fig. 4. Overview of NoLgn-FTL.
In the second stage, which occurs within the SSDs, the flash controller identifies transaction data by transaction flags and IDs. Data and transaction information are persisted, obtaining their corresponding physical addresses. The old addresses and transaction information are written in the OOB area of the corresponding flash pages, as well as in the P2P table in DRAM (④). The old pages remain valid in this step but will be invalidated only after the transaction is committed (⑤). As transactions are continuously executed, a large amount of invalid data accumulates in the flash memory. The GC process (⑥) reclaims the invalid data. The collaboration between the database, file system, and flash controller in NoLgn-FTL ensures data consistency and integrity throughout the transactional data update process.

The modified file system interfaces play a crucial role in preserving the necessary transaction metadata. The design of NoLgn-FTL in the above-mentioned three main stages will be presented in Sections 3.2, 3.3, and 3.4.

3.2. Metadata management in transaction delivery

In the transaction delivery process, we introduce additional metadata to facilitate the implementation of the no-logging scheme. This metadata is passed along with the transactional data requests to ensure proper handling and management of transactions throughout the system.

In the FTL, we establish a physical-to-physical (P2P) table that stores the mapping between new and old physical pages (i.e., their old versions). In detail, one entry in the P2P table includes the transaction ID, the physical page number (PPN) of the new page, and the PPN of the corresponding old page. To ensure persistent P2P mappings, the PPNs of the old pages are also stored in the OOB area of the new flash pages. The primary purposes of the P2P table are twofold: firstly, to facilitate the management of transactional information by the underlying FTL, and secondly, to enhance performance during GC and transaction operations. Note that locating old pages can be accelerated by using the P2P table, thereby avoiding frequent accesses to the OOB area of flash pages. This table does not need to be written to flash memory and can be recovered through a full scan even after a sudden power failure, thus avoiding frequent writes of transaction information to flash memory.

Furthermore, transaction information, including transaction IDs and flags, is stored in the OOB area of new flash pages. In detail, flags S, M, and E represent the starting page, the middle pages, and the end page of a transaction, respectively. In the implementation of transaction flags, since we are only concerned with whether the transaction has ended, we use only one bit to mark the transaction's completion. By storing transaction information alongside the corresponding pages, the progress and state of transactions can be more effectively tracked, enabling data recovery in case of unexpected failures or interruptions. Database recovery will be explained in Section 3.5.

In addition to transaction information, one extra bit, referred to as the lock bit, is used to indicate the block lock state. A lock bit value of 1 signifies that valid old pages exist in the current block, while 0 indicates the block holds no valid old pages and can be reclaimed during GC. By embedding the lock bit within the FTL, blocks containing valid old pages and normal blocks can be efficiently distinguished, allowing for GC optimization. The GC process under NoLgn-FTL will be presented in Section 3.4.
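The P2P table and per-block lock bits can be sketched as follows; the structures are illustrative only, though the entry layout follows the text (transaction ID, new PPN, old PPN):

```python
class P2PTable:
    """Sketch of the P2P table plus lock-bit bitmap kept in controller DRAM."""

    def __init__(self, pages_per_block=1024):
        self.pages_per_block = pages_per_block
        self.entries = {}   # new PPN -> (transaction ID, old PPN)
        self.lock_bit = {}  # block number -> 1 if block holds valid old pages

    def record(self, tid, new_ppn, old_ppn):
        """On transactional write: map new page to its old version, lock block."""
        self.entries[new_ppn] = (tid, old_ppn)
        self.lock_bit[old_ppn // self.pages_per_block] = 1

    def commit(self, tid):
        """On commit: invalidate old pages, unlock blocks, drop entries."""
        done = [p for p, (t, _) in self.entries.items() if t == tid]
        for new_ppn in done:
            _, old_ppn = self.entries.pop(new_ppn)
            # Simplification: assumes at most one transaction's old pages
            # per block; a real FTL would track remaining old pages.
            self.lock_bit[old_ppn // self.pages_per_block] = 0

t = P2PTable()
t.record(tid=7, new_ppn=4096, old_ppn=100)  # old page 100 lives in block 0
assert t.lock_bit[0] == 1                   # block 0 locked against GC
t.commit(7)
assert t.lock_bit[0] == 0 and not t.entries
```

Because the same (old PPN, transaction ID, flag) tuple is also written to the OOB area of the new page, this in-DRAM table can be rebuilt by a full scan after power loss, as the text notes.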
3.3. Transaction persistence in NoLgn-FTL

To ensure transaction persistence, the transaction needs to do the following during its write and commit process. During transaction writing, NoLgn-FTL first looks up the original L2P table to find the old PPNs corresponding to the requested logical addresses. As shown in Fig. 4, the old PPNs are P1, P2, and P3 for the requested L1, L2, and L3, respectively. Then, the updated data are written into the new pages P1′, P2′, and P3′, respectively. At the same time, transaction information and the old PPNs are written into the OOB area of these new pages. Finally, NoLgn-FTL stores the mapping entries for P1, P2, and P3 into the P2P table. Different from the original flash write, the old pages remain valid. Meanwhile, the lock state of the blocks containing valid old pages is set to 1.

During transaction commit, NoLgn-FTL first searches the P2P table to find old valid pages and then invalidates them. Then, the lock state of the blocks containing these old valid pages is set to 0. Finally, the corresponding entries in the P2P table are deleted.

3.4. Garbage collection with NoLgn-FTL

GC in NoLgn-FTL requires handling valid old pages temporarily generated during transaction processing. Selecting a victim block for GC involves several steps to ensure data integrity and efficient space reclamation.

When selecting a victim block for GC, the first step is to check the block's lock state. If the lock state is 1, valid old pages still exist within the block, and therefore, the block cannot be reclaimed. In this case, the next victim block in the queue is selected until the selected block's lock state is 0. Then, whether there is a transaction page in the block must be checked. As the transaction information and old PPNs are stored in the OOB area of the new valid pages, GC in NoLgn-FTL deals with them differently depending on the transaction state. That is, before the transaction is committed, GC will migrate these valid pages together with their OOB areas. However, after a commit has occurred, GC only migrates valid page data, removing the extra metadata of NoLgn-FTL that resides in the OOB area.

3.5. Database recovery with NoLgn-FTL

In the event of a power-off or system crash, data stored in the flash controller's RAM is lost, and only the OOB area of flash pages can be used for system recovery. One solution is to recover to the consistent state in the latest checkpoint, which requires periodically storing checkpoints. The other solution involves a full flash scan to rebuild mappings, as shown in Step 1 of Fig. 5. Physical pages and their OOB areas would be read one by one (Step 2). For pages that do not have transaction information in the OOB area, NoLgn-FTL can directly recover the L2P table of PPNs based on the LPNs in their OOB area. Otherwise, NoLgn-FTL decides whether to recover old-version pages according to the transaction information. NoLgn-FTL would first obtain pages with the same transaction ID. If the page with the end flag bit can be found, these pages would be directly put into the L2P table together with their LPNs (Step 3). Otherwise, if all pages have the flag bit 0, which indicates that the current transaction is not committed, the old-version pages would be first read out (Step 4), and only the L2P mappings of the old-version pages would then be put into the L2P table.

Fig. 5. Recovery with NoLgn-FTL.

3.6. Discussion and overhead analysis

Compared to existing logging methods that store extra logs for each transaction, the use of NoLgn-FTL allows normal data updates without the need for additional logging. The overhead of NoLgn-FTL is due to the storage of extra metadata, including the P2P table, transaction information, and the block lock state.

P2P Table Storage and Overhead: The P2P table is stored in the RAM of the flash controller. The number of entries in the P2P table depends on the number of concurrent transactions. In our experiment, the table contains 10 000 entries. Each P2P entry takes 12 bytes, including a 4-byte transaction ID and 4 bytes each for the new page PPN and the old page PPN. The total size of the P2P table is about 120 KB. The DRAM size is usually around 1/1024 of the SSD capacity. For an SSD with a 1 TB capacity, the DRAM size will be 1 GB, and the P2P table will be 0.12 MB, which is only 0.012% of the DRAM size and is negligible. The block lock state is stored in the metadata of data blocks as a bitmap, with each block requiring only 1 bit, which is insignificant in terms of overhead. This lock bit is loaded into the SSD's DRAM during startup.

Transaction Information Storage in OOB Area: Transaction information is stored in the OOB area of flash pages. NoLgn-FTL uses 4 bytes for old PPNs and 4 bytes for transaction information (comprising the transaction ID and 1 bit for the transaction flag). In current flash chips, the ratio of the OOB area size to the data area size is about 1/8 [12]. Therefore, the OOB area has enough space to store transaction information.

4. Evaluation

In this section, we present a comprehensive evaluation of NoLgn-FTL, using an SQLite and Ext4 combination as a case study. We first describe the experimental setup. Then, we present the sqlite-bench experimental results, focusing on two key aspects: flash writes and database performance. We also investigate the impact of NoLgn-FTL on GC. Furthermore, we show the performance of real-world workloads with the YCSB and TPC-C benchmarks.

4.1. Experimental setup

NoLgn-FTL is implemented on FEMU [13–15], a QEMU-based NVMe SSD emulator. The host system kernel of FEMU is Linux 5.15, and the file system is Ext4. To ensure a representative and consistent setup, the simulated SSD has a 16 GB logical capacity, with 1024 pages per flash block and a 4 KB page size. The flash latency for read, write, and erase operations is 50 μs, 500 μs, and 5 ms, respectively [16]. To ensure the GC mechanism is appropriately triggered during our experiments, we conducted 4 million 4 KB write operations on the SSD in each test. This setup guarantees that GC operations occur as part of the evaluation.

For the logging database, we make use of SQLite. We make necessary modifications to the Linux kernel to receive and process transaction information from the SQLite database. To enable SQLite to transmit transaction information to the kernel, we utilize the ioctl system call to change database write, commit, and abort operations into write, commit, and abort commands. As SQLite does not automatically generate unique transaction IDs for each transaction, the transaction IDs are generated in the kernel after each transaction is committed. Upon receiving the write information from SQLite, the kernel first assigns flags to the requested transaction pages. This enables the kernel to keep track of the transaction status and perform the necessary operations accordingly. Approximately 150 lines of code were modified in SQLite, around 100 lines in the file system, and about 300 lines in FEMU.

Hereafter, NoLgn-FTL will refer to the entire SQLite–Ext4–SSD system stack modified to ensure the seamless integration and functionality of NoLgn-FTL within the existing software and hardware stack. The newly introduced commands, which are based on the ioctl system call, are as follows.

write(page p, tid t, flag f). This command adds a transaction ID (tid), t, and a transaction flag, f, to the original write operation. It is the beginning of a transaction and corresponds to Step 4 in Fig. 4. The inclusion of the transaction ID and flag enables the FTL to track and manage the transaction.

commit(tid t). This command with the parameter of transaction ID t is sent to NoLgn-FTL along with the original fsync command in the
Linux kernel. It indicates the successful completion of a transaction and aligns with Step 5 in Fig. 4. Upon receiving this command, NoLgn-FTL finalizes the transaction and ensures the durability of the associated data.

abort(tid t). This command is invoked to terminate an ongoing transaction t before it commits. It indicates a rollback operation, reverting the data pages to their previous versions, akin to the data recovery process for uncommitted transactions as described in Section 3.5.

We compare NoLgn-FTL with Base-WAL, the original SQLite, which uses the native logging scheme, and SW-WAL [4], which reduces duplicate writes by SSD remapping as shown in Fig. 1a. For each transaction size, the database runs separately, but these transactions share the same SSD storage. It is important to consider that in real-world scenarios, particularly in mobile environments, the characteristics of write requests can significantly impact the performance of storage systems. SQLite is a lightweight, embedded database commonly used in mobile devices for local data storage, making it highly relevant to our analysis. Studies have shown that approximately 90% of write requests in Android applications, such as Facebook and Twitter, are related to SQLite databases and journal files. In environments like these, the data items stored in the database are typically small, often below 4 KB. These small data items, such as individual records or key–value pairs, are frequently written to the storage medium in the form of random write operations. These operations usually target data blocks ranging from 64 B to 4 KB, and such small writes often involve heavy interaction with the underlying file system, such as Ext4, which is commonly used in Android devices [17,18]. Therefore, we set different transaction sizes from 256 B to 16 KB in the experiment to observe their impact on performance.

We conduct experiments in both the FULL and NORMAL synchronous modes of the database. In FULL mode, synchronization is triggered after each transaction is committed. This forces all transaction data to be written into SSDs, thus providing the highest atomicity and durability. Conversely, in NORMAL mode, synchronization is not triggered immediately after the transaction is committed. Typically, transactions are synchronized into SSDs only when a certain number of frames (including transaction headers and data) are accumulated. Note that NoLgn-FTL has no explicit WAL synchronization operation. In NORMAL mode, we manually control the frequency of commits in NoLgn-FTL to keep consistent with the synchronization operation of the other two existing methods. In NoLgn-FTL, a synchronization operation will be triggered every 1000 data pages.

4.2. Results of flash page writes

We used sqlite-bench with 200 thousand overwrite operations to observe the effect of NoLgn-FTL on flash memory page writes. Fig. 6 shows the normalized number of writes in flash memory compared to Base-WAL under the two synchronization modes. In NORMAL mode, SW-WAL reduces writes by 35% compared to Base-WAL, as it eliminates the extra writes caused by out-of-place updates through WAL file remapping. On average, NoLgn-FTL reduces the flash page writes by 55% and 20% compared to Base-WAL and SW-WAL, respectively. The superior performance of NoLgn-FTL is due to its elimination of WAL writes and WAL synchronization, resulting in a greater reduction of writes compared to SW-WAL. Specifically, there are two reasons for NoLgn-FTL's write reduction. First, as WAL has to write an extra log header, a WAL write involves more data than a normal data write. Second, since synchronization does not happen immediately after each transaction in NORMAL mode, updates to the same page are serviced from the cache. NoLgn-FTL combines several updates into a single update, thereby reducing writes. However, this combination cannot be realized in SW-WAL, as it uses different LPNs for data updates and WAL writes.

In FULL mode, NoLgn-FTL reduces flash page writes by 35% and 2% compared to Base-WAL and SW-WAL, respectively. Both methods show reductions in page writes compared with Base-WAL, similar to the NORMAL mode. However, the improvement brought by NoLgn-FTL is smaller than in NORMAL mode. As each transaction is forcibly synchronized to flash memory after committing, there is no chance for NoLgn-FTL to combine updates on the same page, and the reduction from avoided log header writes is limited. Thus, in this mode, NoLgn-FTL behaves similarly to SW-WAL.

4.3. Results of database performance

We used sqlite-bench to observe SQLite performance. Fig. 7 shows the normalized throughput results of SQLite under the three compared methods. In NORMAL mode, NoLgn-FTL achieves an average performance improvement of 51% and 15% over Base-WAL and SW-WAL, respectively. NoLgn-FTL performs particularly well compared to SW-WAL for small-sized transactions, due to the reasons described earlier.

In FULL mode, we observe that NoLgn-FTL outperforms Base-WAL and SW-WAL by an average of 26% and 4%, respectively. This performance improvement is primarily due to the reduction in the number of writes achieved by NoLgn-FTL. Meanwhile, we find that both SW-WAL and NoLgn-FTL demonstrate a gradual performance improvement as the transaction size increases. This is because, for large-sized transactions, Base-WAL spends more latency on writing flash pages and GC. Since SW-WAL and NoLgn-FTL reduce the number of data writes, this degradation is mitigated. Even in this situation, the performance of SW-WAL is still inferior to that of NoLgn-FTL, as it maintains header information that consumes data write latency.
Fig. 6. Results of flash page writes.
Fig. 7. SQLite database performance.
Fig. 8. SQLite database latency.
Besides, we also evaluated database latency under different conditions. Fig. 8 illustrates the normalized latency results under the three compared methods, Base-WAL, SW-WAL, and NoLgn-FTL, in both NORMAL and FULL modes.

In NORMAL mode, NoLgn-FTL demonstrates the lowest latency among the three methods, achieving an average reduction of 34.4% compared to Base-WAL and 11% compared to SW-WAL. The latency advantage of NoLgn-FTL is particularly pronounced for small-sized transactions (e.g., 256 B and 512 B). This stems from its ability to reduce the number of writes and optimize metadata updates, minimizing the overhead typically associated with WAL. SW-WAL also shows improved latency compared to Base-WAL, with an average reduction of approximately 26.2%, thanks to its selective write strategy. However, its performance is still limited by the additional overhead introduced by writing the WAL, which becomes increasingly noticeable for smaller transactions. In FULL mode, the latency reduction achieved by NoLgn-FTL remains significant. Compared to Base-WAL, NoLgn-FTL reduces latency by an average of 16.4%, and compared to SW-WAL, the reduction is 3.7%. Both NoLgn-FTL and SW-WAL exhibit a gradual latency improvement as transaction size increases, which aligns with the behavior observed in the throughput analysis. For larger transactions (e.g., 8 KB and 16 KB), Base-WAL experiences higher latency due to more extensive flash page writes and garbage collection overhead. In contrast, NoLgn-FTL and SW-WAL effectively mitigate this degradation by reducing the volume of writes.

4.4. Results of GC overhead

We used sqlite-bench to investigate the impact of block locking on GC performance by collecting write distribution results under different transaction sizes. Fig. 9 shows the write distribution of host requests, GC migration, and block locking (denoted as additional pages) under different transaction sizes.
Fig. 9. Results of GC overhead. NoLgn-FTL would lock certain blocks, which would affect victim block selection and induce more migrations.
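The lock-aware victim selection from Section 3.4, whose cost is measured here, can be sketched as follows (the queue and bitmap structures are hypothetical):

```python
def select_victim(candidates, lock_bit):
    """Return the first candidate block whose lock bit is 0.

    Blocks with lock bit 1 still hold valid old pages of uncommitted
    transactions and are skipped, which can force a less ideal victim
    and cause the extra valid-page migration ("additional pages" above).
    """
    for block in candidates:        # candidates ordered by the GC policy
        if lock_bit.get(block, 0) == 0:
            return block
    return None                     # every candidate is currently locked

# Block 5 would be the policy's first choice, but it is locked,
# so the next block in the queue is reclaimed instead.
assert select_victim([5, 9, 2], {5: 1}) == 9
```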
Table 1
YCSB workloads.

Workload  Description
A         50% read and 50% update, Zipfian distribution
B         95% read and 5% update, Zipfian distribution
C         100% read, Zipfian distribution
D         95% read and 5% insert, latest read
E         95% scan and 5% insert, Zipfian distribution
F         50% read and 50% read–modify–write, Zipfian distribution

Two key observations can be made from Fig. 9. First, as the transaction value size increases, the proportion of valid page migration involved in GC also increases, reaching a maximum of 62%. This trend can be attributed to the fact that larger transaction sizes require more frequent GC to accommodate new content. Second, the block locking mechanism impacts the number of valid pages migrated. The maximum proportion of additional migration pages due to block locking is 6%, with an average increase of 3.5% in total write pages. This impact is more significant for smaller transaction sizes, as updates may be concentrated in fewer blocks, preventing them from being chosen as optimal victim blocks for GC and leading to suboptimal data migration with more valid pages.

Despite the extra page writes caused by block locking, these overheads are acceptable compared to the significant reduction in duplicate writes achieved by NoLgn-FTL. The benefits of eliminating duplicate writes and improving overall write performance outweigh the relatively minor increase in valid page migrations caused by locking SSD blocks.

4.5. Results of YCSB and TPC-C performance

We also evaluate NoLgn-FTL using the YCSB benchmark to assess its performance under various realistic workloads. YCSB provides six core workloads, as summarized in Table 1. To evaluate the long-term impact of NoLgn-FTL, we use the TPC-C benchmark with four warehouses [19] tested under different SSD free space conditions. TPC-C contains the following five transaction types: 45% new order, 43% payment, 4% delivery, 4% order status, and 4% stock level. The number of database connections was set to 1 to avoid frequent aborts of update transactions.

Fig. 10 shows the normalized throughput results of SQLite under the YCSB benchmarks in NORMAL mode. On average, SW-WAL shows a 10% performance improvement over Base-WAL, while NoLgn-FTL achieves a 17% improvement. For write-intensive workloads (A and F), both SW-WAL and NoLgn-FTL exhibit significantly better performance than Base-WAL. However, for read-intensive workloads (B, D, and E), the improvements from both methods are not significant. This is mainly because both methods only enhance write performance and have little impact on read performance. Meanwhile, NoLgn-FTL still outperforms SW-WAL due to its greater write performance benefits. In the case of workload C, which only contains read requests, there are no obvious differences among the three methods. This is because the remap-based logging in SW-WAL and the no-logging scheme in NoLgn-FTL are not triggered. The slight performance fluctuations arise from the random nature of read operations.

Fig. 11 shows the performance of SQLite in terms of transactions per minute (tpmC) with different amounts of SSD free space. To obtain SSDs with varying free space, sufficient random overwrite iterations are performed before each of the experiments. TPC-C is a write-intensive workload with operations such as new order, payment, and delivery, with an average of two pages updated per transaction. The results show that when SSD free space is 75%, the performance differences among the three methods are relatively small. However, as SSD free space decreases, the performance gap widens. Overall, NoLgn-FTL significantly outperforms Base-WAL and SW-WAL. On average, SW-WAL improves transaction throughput by 20% compared to Base-WAL, while NoLgn-FTL improves throughput by 38%. Notably, the performance gains of SW-WAL and NoLgn-FTL become more pronounced when SSD free space is limited. When the SSD remaining space is 25%, NoLgn-FTL's throughput is 81% higher than Base-WAL's. This is mainly because when SSD free space is low, there may be a lack of free blocks, requiring frequent GC to accommodate new writes. Additionally, TPC-C's transaction data size is relatively small, allowing multiple data items to be stored in a single page. Therefore, NoLgn-FTL effectively reduces write operations and GC needs by minimizing duplicated writes.

5. Related works

Research addressing duplicate writes can be divided into two directions: optimization of atomic writes and remapping-based methods. An atomic write interface was initially proposed by Park et al. [20], which achieved atomicity for multi-page writes. Prabhakaran et al. [21] further introduced a transactional FTL called txFlash, which provides a transaction interface (WriteAtomic) to higher-level software. It provides isolation among multiple atomic write calls by ensuring that no conflicting writes are issued. Xu et al. [22] used the native off-site update feature of NAND flash memory to simulate copy-on-write technology and, at the same time, used NVM to store the FTL mapping table. However, these methods mostly supported atomicity for multi-page writes only. Kang et al. presented X-FTL [23], aiming to support general transactional atomicity, allowing data pages in a transaction to be written to flash at any time.
Fig. 10. SQLite performance on YCSB benchmarks.
Fig. 11. SQLite performance on TPC-C benchmark.
However, it requires an additional X-L2P table and needs to persist it to flash upon transaction commit.

Address remapping is another extensively researched method that modifies the mapping table directly without performing actual writes. Wu et al. [24] proposed KVSSD, which exploits the FTL mapping mechanism to implement copy-free compaction of LSM trees, and it enables direct data allocation in flash memory for efficient garbage collection. However, address remapping may suffer from mapping inconsistencies due to the inability of flash memory to perform in-place updates. Hahn et al. [25] use the address remapping operation for file system defragmentation. However, after remapping, it uses file system logs to deal with mapping inconsistencies. The larger log size results in longer search times and increased memory consumption when performing read operations. As the number of remappings escalates, the log can become several hundred MB or even GB. Therefore, these methods may incur significant lookup overhead. Zhou et al. [26] address this issue by storing the new mapping table in non-volatile memory, reducing lookup overhead. Besides, Wu et al. [4] proposed SW-WAL, a novel approach that emulates the maintenance of a mapping table by inscribing transaction information directly into the OOB area of flash pages. This strategy markedly reduces the footprint of the search table and concurrently boosts search efficiency. Additionally, to deal with the heavy query latency during WAL checkpointing, Yoon et al. [27] proposed Check-In to align journal logs to the FTL mapping unit. The FTL creates a checkpoint by remapping the journal logs to the checkpoint, effectively reducing the checkpointing overhead and WAL's duplicate writes.

6. Conclusion

In this paper, we presented NoLgn-FTL to directly update the database in a no-logging way by reusing old flash pages. NoLgn-FTL uses a P2P table and the OOB area of flash pages to keep old page information and transaction information. Thus, systems can recover to a consistent state when a crash happens. As there is no need to store logging files in NoLgn-FTL, duplicate writes can be avoided. We implemented a prototype of NoLgn-FTL on the FEMU SSD emulator and integrated it with the SQLite database. The file system is modified to enable SQLite to use the provided interface and transfer transaction information. Experimental results demonstrate that NoLgn-FTL can significantly reduce writes to SSDs and improve the performance of SQLite, while still ensuring atomicity.

CRediT authorship contribution statement

Zhenghao Yin: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation. Yajuan Du: Writing – review & editing, Supervision, Project administration, Conceptualization. Yi Fan: Visualization. Sam H. Noh: Writing – review & editing.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

References

[1] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, P. Schwarz, ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging, ACM Trans. Database Syst. 17 (1) (1992) 94–162.
[2] S. Lee, D. Park, T. Chung, D. Lee, S. Park, H. Song, A log buffer-based flash translation layer using fully-associative sector translation, ACM Trans. Embed. Comput. Syst. (TECS) 6 (3) (2007) 18–es.
[3] L. Shi, J. Li, C.J. Xue, C. Yang, X. Zhou, ExLRU: A unified write buffer cache management for flash memory, in: Proceedings of the Ninth ACM International Conference on Embedded Software, 2011, pp. 339–348.
[4] Q. Wu, Y. Zhou, F. Wu, K. Wang, H. Lv, J. Wan, C. Xie, SW-WAL: Leveraging address remapping of SSDs to achieve single-write write-ahead logging, in: 2021 Design, Automation & Test in Europe Conference & Exhibition, DATE, 2021, pp. 802–807.
[5] F. Ni, X. Wu, W. Li, L. Wang, S. Jiang, Leveraging SSD's flexible address mapping to accelerate data copy operations, in: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2019, pp. 1051–1059.
[6] J. Coburn, T. Bunker, M. Schwarz, R. Gupta, S. Swanson, From ARIES to MARS: Transaction support for next-generation, solid-state drives, in: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013, pp. 197–212.
[7] J. Arulraj, M. Perron, A. Pavlo, Write-behind logging, Proc. VLDB Endow. 10 (4) (2016) 337–348.
[8] K. Han, H. Kim, D. Shin, WAL-SSD: Address remapping-based write-ahead-logging solid-state disks, IEEE Trans. Comput. 69 (2) (2019) 260–273.
[9] G. Oh, C. Seo, R. Mayuram, Y.-S. Kee, S.-W. Lee, SHARE interface in flash storage for relational and NoSQL databases, in: Proceedings of the 2016 International Conference on Management of Data, 2016, pp. 343–354.
[10] Q. Wu, Y. Zhou, F. Wu, H. Jiang, J. Zhou, C. Xie, Understanding and exploiting the full potential of SSD address remapping, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 41 (11) (2022) 5112–5125.
[11] H. Li, M. Hao, M.H. Tong, S. Sundararaman, M. Bjørling, H.S. Gunawi, The CASE of FEMU: Cheap, accurate, scalable and extensible flash emulator, in: 16th USENIX Conference on File and Storage Technologies (FAST 18), 2018,
[23] W.-H. Kang, S.-W. Lee, B. Moon, G.-H. Oh, C. Min, X-FTL: transactional FTL for SQLite databases, in: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 2013, pp. 97–108.
[24] S.-M. Wu, K.-H. Lin, L.-P. Chang, KVSSD: Close integration of LSM trees and flash translation layer for write-efficient KV store, in: 2018 Design, Automation & Test in Europe Conference & Exhibition, DATE, IEEE, 2018, pp. 563–568.
[25] S.S. Hahn, S. Lee, C. Ji, L. Chang, I. Yee, L. Shi, C.J. Xue, J. Kim, Improving file system performance of mobile storage systems using a decoupled defragmenter, in: 2017 USENIX Annual Technical Conference (USENIX ATC 17), 2017, pp. 759–771.
[26] Y. Zhou, Q. Wu, F. Wu, H. Jiang, J. Zhou, C. Xie, Remap-SSD: Safely and efficiently exploiting SSD address remapping to eliminate duplicate writes, in: 19th USENIX Conference on File and Storage Technologies (FAST 21), 2021, pp. 187–202.
[27] J. Yoon, W.S. Jeong, W.W. Ro, Check-In: In-storage checkpointing for key-value store system leveraging flash-based SSDs, in: 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, ISCA, 2020, pp. 693–706, http://dx.doi.org/10.1109/ISCA45697.2020.00063.

Zhenghao Yin received the BS degree in Computer Science from Wuhan University of Technology, Wuhan, China, in 2022, and is currently pursuing the MS degree in Computer Science, expected to graduate in 2025. His research interests include flash memory and database technologies.

Yajuan Du received the joint Ph.D. degrees from the City University of Hong Kong and the Huazhong University of Science and Technology, in December 2017 and February 2018, respectively. She is currently an Assistant Professor with the School of Computer Science and Technology, Wuhan University of Technology. Her research interests include optimizing access performance, data reliability, and persistency of flash memories and non-volatile memories.

Yi Fan received the BS degree in Computer Science from Wuhan University of Technology, Wuhan, China, in 2022, and is currently pursuing the MS degree in Computer Science, expected to graduate in 2025. His research interests include key–value databases and flash memory technologies.
pp. 8390.
[12] Y. Zhou, F. Wu, Z. Lu, X. He, P. Huang, C. Xie, SCORE: A novel scheme to
efficiently cache overlong ECCs in NAND flash memory, ACM Trans. Archit.
Code Optim. ( TACO) 15 (4) (2018) 125.
Sam H. (Hyuk) Noh received his BE in Computer Engineer-
[13] L. Long, S. He, J. Shen, R. Liu, Z. Tan, C. Gao, D. Liu, K. Zhong, Y. Jiang, WA-
ing from Seoul National University in 1986 and his Ph.D. in
Zone: Wear-aware zone management optimization for LSM-Tree on ZNS SSDs,
Computer Science from the University of Maryland in 1993.
ACM Trans. Archit. Code Optim. 21 (1) (2024) 123.
He held a visiting faculty position at George Washington
[14] D. Huang, D. Feng, Q. Liu, B. Ding, W. Zhao, X. Wei, W. Tong, SplitZNS: Towards
University (19931994) before joining Hongik University,
an efficient LSM-tree on zoned namespace SSDs, ACM Trans. Archit. Code Optim.
where he was a professor in the School of Computer and
20 (3) (2023) 126.
Information Engineering until 2015. From 2001 to 2002, he
[15] S.-H. Kim, J. Shim, E. Lee, S. Jeong, I. Kang, J.-S. Kim, NVMeVirt: A versatile
was a visiting associate professor at UM IACS, University of
software-defined virtual NVMe device, in: 21st USENIX Conference on File and
Maryland. In 2015, Dr. Noh joined UNIST as a professor
Storage Technologies (FAST 23), 2023, pp. 379394.
in the Department of Computer Science and Engineering.
[16] B.S. Kim, J. Choi, S.L. Min, Design tradeoffs for SSD reliability, in: 17th USENIX
He became the inaugural Dean of the Graduate School
Conference on File and Storage Technologies (FAST 19), 2019, pp. 281294.
of Artificial Intelligence and previously served as Dean of
[17] Z. Shen, Y. Shi, Z. Shao, Y. Guan, An efficient LSM-tree-based sqlite-like database
the School of Electrical and Computer Engineering (2016
engine for mobile devices, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.
2018). He has contributed to numerous conferences, serving
38 (9) (2018) 16351647.
as General Chair, Program Chair, or committee member
[18] A. Mäkinen, Tracing Android applications for file system optimization.
for events like ACM SOSP, USENIX FAST, ACM ASPLOS,
[19] S.T. Leutenegger, D. Dias, A modeling study of the TPC-C benchmark, ACM
and USENIX OSDI. He also chaired the ACM HotStorage
Sigmod Rec. 22 (2) (1993) 2231.
Steering Committee and serves on the Steering Committees
[20] S. Park, J.H. Yu, S.Y. Ohm, Atomic write FTL for robust flash file system, in:
for USENIX FAST and IEEE NVMSA. Dr. Noh was Editor-
Proceedings of the Ninth International Symposium on Consumer Electronics,
in-Chief of ACM Transactions on Storage (20162022) and
2005.(ISCE 2005), 2005, pp. 155160.
is now co-Editor-in-Chief of ACM Transactions on Computer
[21] V. Prabhakaran, T.L. Rodeheffer, L. Zhou, Transactional flash, in: OSDI, Vol. 8,
Systems. His research focuses on system software and storage
2008.
systems, emphasizing emerging memory technologies like
[22] Y. Xu, Z. Hou, NVM-assisted non-redundant logging for Android systems, in:
flash and persistent memory.
2016 IEEE Trustcom/BigDataSE/ISPA, 2016, pp. 14271433.
Computer Standards & Interfaces 97 (2026) 104120
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
Energy consumption assessment in embedded AI: Metrological
improvements of benchmarks for edge devices
Andrea Apicella b, Pasquale Arpaia a,*, Luigi Capobianco d, Francesco Caputo a, Antonella Cioffi d, Antonio Esposito a, Francesco Isgrò a, Rosanna Manzo c, Nicola Moccaldi a, Danilo Pau e, Ettore Toscano d

a Dipartimento di Ingegneria Elettrica e delle Tecnologie dell'Informazione, Università degli Studi di Napoli Federico II, Naples, Italy
b Dipartimento di Ingegneria dell'Informazione ed Elettrica e Matematica applicata (DIEM), Università degli Studi di Salerno, Fisciano, Italy
c Dipartimento di Sanità Pubblica e Medicina Preventiva, Università degli Studi di Napoli Federico II, Naples, Italy
d Software Design Center, STMicroelectronics, Marcianise, Italy
e System Research and Applications, STMicroelectronics, Agrate Brianza, Italy
ARTICLE INFO

Keywords:
Energy assessment
Embedded AI
Tiny-ML
Uncertainty analysis
Edge device benchmark

ABSTRACT

This manuscript proposes a new method to improve the MLCommons protocol for measuring power consumption on Microcontroller Units (MCUs) when running edge Artificial Intelligence (AI). In particular, the proposed approach (i) selectively measures the power consumption attributable to the inferences (namely, the predictions performed by Artificial Neural Networks — ANNs), preventing the impact of other operations, (ii) accurately identifies the time window for acquiring the current samples thanks to the simultaneous measurement of power consumption and inference duration, and (iii) precisely synchronizes the measurement windows and the inferences. The method is validated on three use cases: (i) Rockchip RV1106, a neural MCU that implements ANNs via a hardware neural processing unit through a dedicated accelerator, and (ii) STM32 H7 and (iii) STM32 U5, high-performance and ultra-low-power general-purpose microcontrollers, respectively. The proposed method returns higher power consumption for the two devices with respect to the MLCommons approach. This result is compatible with an improvement of selectivity and accuracy. Furthermore, the method reduces measurement uncertainty on the Rockchip RV1106 and STM32 boards by factors of 6 and 12, respectively.
1. Introduction

The rapid expansion of Internet of Things (IoT) devices has ushered in a new era of connected intelligence at the edge, where data processing, low latency, and real-time decision making can take place directly at the edge [1]. These IoT devices cover a variety of applications, from smart home sensors [2], to industrial automation [3], and health monitoring systems [4], where low-latency responses and energy efficiency are essential.

Extending computation to more peripheral network nodes enhances all key aspects of edge computing, including energy efficiency, carbon footprint reduction, security, latency, privacy, offline functionality, and data management costs [5]. However, deploying intelligence at the end nodes requires careful consideration of the IoT devices' inherent limitations, such as memory and computational resources impacting time performances, and energy constraints. For Microcontroller Units (MCUs), widely used in IoT, this is particularly true. Many IoT applications, such as autonomous driving [6], demand low-latency responses to be effectively reactive. Moreover, several IoT devices often operate under very limited power sources. Promising energy-efficient strategies aim to minimize consumption. For instance, index modulation [7,8] is a transmission technique that conveys additional information through the indices of available resources such as antennas, subcarriers, or time slots, and it can significantly reduce energy usage while maintaining data throughput. Nevertheless, even with advanced optimization strategies, the repetitive and frequent processing required by many applications can rapidly deplete power resources, thereby limiting device lifetime.

In recent years, Machine Learning (ML) methods [9], particularly Artificial Neural Networks (ANNs), have been increasingly deployed on IoT devices to enhance localized data processing capabilities and reduce
Corresponding author.
E-mail addresses: andapicella@unisa.it (A. Apicella), pasquale.arpaia@unina.it (P. Arpaia), luigi.capobianco@st.com (L. Capobianco),
francesco.caputo3@unina.it (F. Caputo), antonella.cioffi@st.com (A. Cioffi), antonio.esposito9@unina.it (A. Esposito), francesco.isgro@unina.it (F. Isgrò),
rosanna.manzo@unina.it (R. Manzo), nicola.moccaldi@unina.it (N. Moccaldi), danilo.pau@st.com (D. Pau), ettore.toscano@st.com (E. Toscano).
https://doi.org/10.1016/j.csi.2025.104120
Received 10 January 2025; Received in revised form 2 September 2025; Accepted 21 December 2025
Available online 22 December 2025
0920-5489/© 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
dependency on cloud infrastructures [10,11]. It is common to refer to
these devices as tiny devices [12] and embedded ML as tiny machine
learning or tiny ML [5].
Consequently, assessing the inference time provided by the IoT
hardware for a specific ANN model is crucial to ensure that the em-
bedded system can satisfy real-time processing requirements. In this
context, inference refers to the process of an ANN generating outputs
based on its trained model parameters and given inputs.
Therefore, tailored energy consumption metrics are essential to
ensure the alignment between the ANN implementation and the en-
ergy constraints of the targeted IoT application. To this aim, Neural
MCUs are new edge devices embedding ANN accelerators, specifically
designed to manage the trade-off between reliability, latency, cost,
and power consumption [13]. Therefore, adopting standardized metrics
and procedures is essential for assessing the actual performance gains
achieved by neural MCUs in the context of embedded AI. Although several frameworks and tools have been proposed to facilitate the benchmarking of tinyML models [14-16], no standardized metrics and procedures are currently defined.
Among the proposed benchmarking protocols, MLPerf Tiny Benchmark (MLPTB) [17] is developed by the MLCommons Association, the largest and most authoritative community aimed at improving the industrialization and standardization process of machine learning [18]. MLPTB provides protocols and AI components, namely datasets and pre-trained ML models. These can act as metrological references when implemented on different hardware to assess their performance, such as the inference time and the power consumption under real-world conditions. However, the MLPTB protocols exhibit some metrological weaknesses: (i) both the assessment of time performance and energy consumption is realized without measurement uncertainty computation, (ii) the energy consumption analysis is performed based on an approximate estimate of the average inference duration, and (iii) the impact on consumption caused by inferences is not isolated with respect to other processes.

In this paper, a new method is proposed and validated to improve the MLPTB protocols to measure power consumption in MCUs running ANNs, in a rigorous metrological framework. Specifically, in Section 2 the MLPTB framework is reported, then the proposed method is presented in Section 3. Experiments and results are reported in Section 4 and discussed in Section 5.

2. Background

Several frameworks and tools have been introduced to support the benchmarking of tinyML models [14-16]. Among the available benchmarking protocols, the MLPerf Tiny Benchmark (MLPTB) [17], developed by the MLCommons Association [18], emerges as a key initiative.

MLPTB proposes two modalities of assessment: (i) Performance and (ii) Energy. The former measures latency (inferences per second — IPS) and accuracy (ratio of correct predictions to all predictions) through a direct USB connection between a Device Under Test (DUT) and a host computer, while the latter measures energy (micro-joules per inference). In the remainder of this section, the energy configuration mode is detailed, as it represents the central focus of this study. In the energy configuration mode (Fig. 1), an Energy Monitor is proposed to supply power to the DUT while measuring the current consumption. An Input/Output Manager is introduced to interface the Host Computer with the DUT, serving as an electrical-isolation proxy. Furthermore, MLPTB requires level shifters to adapt the power supply in input to the DUT (not reported in Fig. 1 to simplify the schematic, as they are not essential to the discussion).

In addition to defining assessment procedures, MLPTB provides some firmware and software [19] for ML tasks on the DUT. In particular, the provided firmware to be loaded onto the DUT ensures the following functionalities: (i) sending a trigger signal, (ii) enabling UART communication, (iii) generating and feeding random input data to the ANN, (iv) performing inferences, and (v) printing the prediction results. The software includes a graphical user interface that can be run on the Host Computer, allowing the initiation of the measurement and the monitoring of input data. It is important to emphasize that in phase (iii) random data are generated to feed the ANN. This operation, however, does not reflect real-world applications, where the network processes sensor data in real time. Although not an intrinsic part of ANN inference, MLPTB includes this step in the performance and energy measurements. Throughout this paper, phase (iii) is explicitly distinguished from phase (iv) (i.e., inference) and is referred to as the pre-inference phase.

Fig. 1. Energy measurement setup proposed by MLPerf Tiny Benchmark [17,19]. The DUT is powered by the Energy Monitor. The IO Manager serves as an electrical-isolation proxy.

The energy per inference ($E_{inf}$) is calculated using latency information determined in the Performance phase. Specifically, the IPS is determined by taking the median value across five experiments. In each experiment, input data is provided for a duration of at least 10 s, and the number of inferences is recorded via a direct connection between the Host Computer and the DUT. Given the IPS, $E_{inf}$ is computed as:

$E_{inf} = \frac{I_m \times V_n}{\tau \times IPS}$  (1)

where $V_n$ is the nominal voltage and $I_m$ is the current averaged over the fixed period $\tau$.

3. Proposed method

The MLCommons pre-inference phase generates random numbers as input to the ANN in order to perform inference (in addition to the memory operations needed to provide the input to the network). However, random number generation is hardly reproducible across different devices under test, since both the libraries and the hardware resources available on the microcontrollers for random number generation vary. In contrast, the proposed work selectively excludes the pre-inference phase from the performance and energy measurements, ensuring greater reproducibility while also providing closer adherence to the actual operation of the device in real-world scenarios. In the remainder of this section, the proposed method is described. In paragraph 3.1 the circuit solution for the joint measurement of time and energy consumption is described. In paragraph 3.2 the expected impact of the method on selectivity, accuracy, and uncertainty during the energy measurement is highlighted.
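For reference, the MLPTB energy computation of Eq. (1) can be sketched in a few lines. The sketch implements the formula literally as printed; every numeric input (run counts, current, voltage, window length) is an invented illustration, not a value measured in this work.

```python
from statistics import median

def mlptb_energy_per_inference(i_m: float, v_n: float, tau: float, ips: float) -> float:
    """Eq. (1): E_inf = (I_m * V_n) / (tau * IPS), with I_m the current
    averaged over the fixed period tau and V_n the nominal voltage."""
    return (i_m * v_n) / (tau * ips)

# IPS is the median over five Performance-mode runs, each lasting at
# least 10 s (hypothetical inference counts).
runs = [(1021, 10.0), (1018, 10.0), (1030, 10.0), (1025, 10.0), (1019, 10.0)]
ips = median(count / duration for count, duration in runs)

# Hypothetical electrical values: 12 mA average current, 3.3 V nominal supply.
e_inf = mlptb_energy_per_inference(i_m=0.012, v_n=3.3, tau=10.0, ips=ips)
```

Taking the median across runs, as MLPTB prescribes, makes the IPS figure robust to a single outlier run.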
Fig. 2. Proposed energy measurement setup. The Host Computer powers the DUT and an ammeter is connected in series along the power line on the DUT (e.g., an MCU).

3.1. Circuit diagram and measurement procedure

The proposed method utilizes an ammeter that does not require powering the DUT to measure the absorbed current. The ammeter is connected in series to the microprocessor on the MCU powered by the Host Computer through the USB port (Fig. 2). This approach allows the Host Computer to perform both latency and energy measurements simultaneously. Indeed, the firmware provided by MLPTB enables the DUT to update the Host Computer on the number of completed inferences through the USB connection. Instead of computing the energy per inference as the ratio between the total energy measured in a specific time window and the number of inferences (the MLPTB method), the proposed method computes the energy for each inference without considering the impact of the pre-inference phase. This is obtained by modifying the firmware provided by MLPTB: the trigger is replaced by a logic signal (inference status) that goes high during an ongoing inference and returns low otherwise. The inference status signal output from the device under test is sampled by the Measurement Board (ammeter) in parallel with the current (Fig. 3(a)). Two vectors of synchronously sampled data (current and inference status signal) are sent to the Host Computer. The current samples are processed, and the energy consumption is calculated only when the inference status samples indicate a low logic signal. Additionally, before and after each inference, the DUT reads the values of the Clock and Reset Management Unit (CRMU) and transmits them to the Host Computer to determine the duration of the inference. Finally, the software on the Host Computer computes the mean value of N inferences with associated uncertainty. In this work, N is set to 100. Similar to the MLPTB, the proposed firmware runs as the sole program on the MCU, with fully sequential execution and no concurrency or interrupts. Furthermore, in the proposed method, the inference status signal is set high immediately after the pre-inference phase, and the CRMU is queried right before the inference execution. As soon as the inference completes, the CRMU is queried again, and finally the inference status is set low to signal to the ammeter that the inference has finished. In Fig. 4, a flowchart describing the customized firmware behavior is reported.

3.2. Accuracy improvements

In the MLPTB, the number of inferences during the measurement time in energy mode is calculated using the IPS obtained from the previous latency measurement. This approach introduces accuracy issues because an estimator is used instead of the actual time of each inference. Furthermore, it is assumed with a non-negligible degree of approximation that the inferences are executed consecutively by the MCU, disregarding the impact of inter-inference operations that are still present. Finally, the delays in the transmission of the command for starting the measurement have a further impact on the accuracy, albeit to a very small extent. Specifically, this refers to the time taken by the CPU on the DUT to generate the trigger signal and by the Measurement Board to handle the interrupt triggered at its input pin (see Fig. 3).

In the proposed method, limiting the observation to a single inference at a time eliminates the approximation inherent in MLPTB, where the inference duration is estimated through the average of multiple successive inferences executed within a known time window. Specifically, the proposed method allows the exclusion of all energy contributions unrelated to the inference itself (e.g., data transfer operations to memory during the pre-inference phase). However, in the proposed method, the repetition of the measurement for each inference amplifies the impact of inaccuracies caused by the delay in transmitting the status signal. In contrast, the MLPTB approach mitigates this effect because the delay only occurs at the start of the measurement for multiple inferences. To address this issue, the inference duration ($\Delta t$) measurement is also performed. In the firmware for the DUT, the onboard counter is read immediately before and after the inference execution. The $\Delta t$ is used to appropriately resize the current sample vector acquired while the inference status signal is active. The current sample vector is trimmed at both ends by a number of elements ($N_{trim}$), calculated as follows:

$N_{trim} = \frac{f_c}{2}\left(\frac{N_{cs}}{f_c} - \Delta t\right)$  (2)

where $f_c$ is the sampling frequency of the ammeter, $N_{cs}$ is the number of current samples acquired when the inference status signal is high, and $\Delta t$ is the inference duration.

3.3. Uncertainty improvements

Two distinct phases should be addressed in the evaluation of uncertainty: (i) the inference time measurement, and (ii) the energy consumption assessment. In particular, an important source of uncertainty in MLPTB is due to the counting of inferences during the IPS measurement, affecting the inference time measurement and, consequently, also the energy consumption assessment. More deeply, the measurement window is not an integer multiple of the inference period; therefore, there is no synchronization between the end of the last inference and the end of the measurement window. This contribution can be modeled by a uniform random variable whose domain is equal to the central value of the inference duration $\Delta t_m$, with a standard deviation $\sigma_{1cont}$ computed as:

$\sigma_{1cont} = u_{t1} = \frac{\Delta t_m}{2\sqrt{3}}$  (3)

The uncertainty of the MLPTB method is assessed by assuming the median inference duration approximately equal to the mean. Differently, in the proposed method the counting uncertainty is determined by the fact that the inference duration is not an integer multiple of the counter period ($T_c$). Again, a random variable with uniform probability distribution effectively describes this aspect. The standard deviation $\sigma_{2cont}$ is computed as:

$\sigma_{2cont} = u_{t2} = \frac{T_c}{2\sqrt{3}}$  (4)

Assuming that $\Delta t_m \gg T_c$, it follows that $u_{t1} \gg u_{t2}$, and the proposed method improves the measurement uncertainty due to counting.

Then there is the uncertainty due to the variability of the duration of the processes between the inferences (pre-inference phase). The proposed method is not affected by this source of uncertainty because it excludes from the energy measurement all the processes outside the inference. Finally, both methods are exposed to the uncertainty
Fig. 3. Comparison between the block diagram of the proposed method (a) and the MLCommons-Tiny approach (b) for energy consumption measurement. The added blocks and signals are reported in red. In the proposed method, the Device Under Test stops the power consumption computation after each inference. Differently, in the MLCommons-Tiny approach, the Host Computer stops the acquisition of current samples after a fixed time window, without distinguishing between pre-inference and inference phases. Furthermore, it computes the energy consumption (μJ per inference) based on the Inferences per Second measured exploiting the Performance mode (see Section 2). The Counter and the Time Calculator blocks are used for the measurement of the duration of each inference, while an Inference Status ADC minimizes the latency between the inference start and current sample consideration. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
of the stability of the DUT (jitter) and ammeter precision, as well as to the uncertainty of the signal transmission times between the devices involved in the measurement process. For the calculation of the measurement uncertainty, the combined standard uncertainty $u_c$ is adopted, where the contribution from the type A evaluation ($u_A$) is integrated with the $K$ contributions from the type B evaluations ($u_{B_k}$), according to the following formula [20]:

$u_c = \sqrt{u_A^2 + u_{B_1}^2 + u_{B_2}^2 + \cdots + u_{B_K}^2}$  (5)

Fig. 4. Flowchart of the proposed firmware. The pre-inference phase (in red) is excluded from both time (CRMU timestamp read) and energy assessment (Inference Status digital signal setting and unsetting). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4. Experiments and results

In this section, a comparison between the application of the proposed and MLPTB methods is presented. In paragraph 4.1 the experimental procedure is described. The DUTs and the ammeter are presented in paragraph 4.2. Results are reported in paragraph 4.3.

4.1. Experimental procedure

The MLPTB method was implemented using two different circuit configurations for measuring inference duration and energy per inference, as described in [17]. Instead, in the proposed method the two measures were realized with the same circuit solution shown in Fig. 2. The firmware used for the MLPTB measurement was modified to allow the measurement of the single inference, as described in paragraph 3.1. The four MLPerf benchmarks were retained: (i) Anomaly Detection, (ii) Keyword Spotting, (iii) Image Classification, and (iv) Visual Wake Words. Each benchmark targets a specific use case and specifies a dataset, a model, and a quality target [17].

4.2. Experimental setup

Both methods are applied on three different MCUs: STMicroelectronics STM32-H7 (clock frequency = 280 MHz), STMicroelectronics STM32-U5 (clock frequency = 160 MHz), and Rockchip RV1106 (clock frequency = 1200 MHz). The STM32H7 and the STM32U5 are general-purpose microcontrollers, the former designed for high-performance applications and the latter for ultra-low-power operation, both produced by STMicroelectronics. These devices do not have any dedicated Neural Processing Unit (NPU) hardware for ANN computation, so this part is commonly implemented in firmware that runs on the main Central Processing Unit (CPU). The firmware is automatically deployed using ST EdgeAI Core Technology and compiled through the STM32CubeIDE [21] compiler, implementing all needed tools to convert, optimize, and implement ANN models on the DUT.

The evaluation boards of the STMicroelectronics Nucleo-STM32H7 with STM32H7 microcontroller and B-U585I-IOT02A Discovery Kit with STM32U5 microcontroller were chosen for the experimental setup
(a) (b) (c)
(d)
Fig. 5. Hardware components used in the experiments: (a) H7 board with STM32H7 MCU, (b) Luckfox Pico Pro Max with Rockchip RV1106 SoC, (c) B-U585I-
IOT02 A Discovery Kit with STM32U5 MCU, and (d) Power Profiler Kit II ammeter.
(Figs. 5(a), 5(c)). They include a connector in series to the MCUs power counter values returned by two consecutive CRMU readings. On each
supply line allowing an ammeter to be inserted to assess the power board, 30 experiments were performed, each providing two latency
consumption of the DUT under operating conditions. values. For each board, the mean value and type A uncertainty were
The RV1106 is a System on Chip (SoC) produced by Rockchip Elec- computed. In the worst case, namely the Rockchip, the latency was
tronics. This device has a dedicated NPU hardware, so the computation found to be 7 ± 4 CPU clock cycles (2 ± 1 for the other two boards),
of ANN models are made by hardware, and the software shall only which corresponds to only a few nanoseconds. Tables 1, 2, and 3
allocate necessary data into a dedicated memory area. While STM32 present the results of inference duration (𝛥𝑡) assessments conducted
microcontrollers operate without an operating system, RV1106 requires using both the MLPTB and the proposed methods. The results are
the use of an operating system given its CPU architecture. Ubuntu reported for the Rockchip RV1106, STM32H7, and STM32U5, respec-
22.04 RT [22] was therefore installed to minimize execution timing tively, with varying ANN models. Concerning uncertainty computation,
uncertainties. the MLPTB method does not provide strategies for calculating mea-
The software is deployed using RKNN Toolkit compiler that im- surement uncertainty and, in this work, it was computed by referring
plements all needed tools to convert, optimize, and implement ANN to the sole contribution of the counting inferences (Eq. (2)). In the
models on the device. The evaluation board with Rockchip RV1106 proposed method, since the Clock and Reset Management Unit (CRMU)
chosen for the experimental setup is the Luckfox Pico Pro Max (Fig. of the MCUs is employed for inference time measurement, the type
5(b)). The ammeter is inserted between USB-C main supply and the A uncertainty is combined with type B contributions arising from
SoCs power supply line in order to assess the power consumption of counting uncertainty, system clock stability (jitter), and the response
device under operative conditions. time required by the CRMU to be queried and to return a value.
The measurement board used for the power assessment is the Power For all the considered microcontrollers, the type B contribution was
Profiler Kit II (PPKII) produced by Nordic Semiconductor (Fig. 5(d)).
This device is composed of an ammeter and an 8-bit digital sampler
synchronized with the same time base. It can operate in two different
modes, which affect only the ammeter component:

• Source Meter: in this mode, the internal ammeter is linked to a power
supply generator that can be used to provide the power supply to the
DUT. This mode was adopted for the MLPTB implementation.
• Ammeter Mode: in this mode, the instrument works as a pure ammeter
and the power supply of the DUT can be provided externally. This mode
was employed in the proposed method.

For both modes, the device was metrologically characterized under
operating conditions of 20–30 °C (the same conditions used for all
experiments), exhibiting an uncertainty of less than 2%.

4.3. Results

For the proposed method, a characterization of the CRMU query latency
was carried out on all devices. A modified version of the same firmware
used for the energy consumption assessment was employed. Specifically,
an additional CRMU query was appended directly after the preceding one,
making it consecutive to the two already present. The CRMU query
latency was measured as the difference between the

The uncertainty of the CRMU query latency was found to be dominated by
the counting uncertainty, computed using formula (4), and equal to
289 ns. The jitter contribution is at least three orders of magnitude
smaller at room temperature (between 20 °C and 30 °C) [23–25].
Similarly, the uncertainty related to the CRMU response time,
characterized in this work for all three microcontrollers, was found to
be equal to 1 CPU clock cycle. In the worst case, i.e., considering the
STM32U5 device with the lowest CPU clock frequency, this contribution
was on the order of nanoseconds. Therefore, the overall evaluated
uncertainty corresponds to the joint contribution of type A and type B,
with the latter coinciding with the counting uncertainty, according to:

u_t = sqrt(u_A^2 + u_B^2)    (6)

To propagate the measurement uncertainty of Δt to the energy per
inference (E_inf) measurement, a constant power P is assumed during the
inference time, obtaining the following propagation formula:

E_inf = P · Δt  ⇒  u_e = P · u_t    (7)

where u_e is the energy per inference measurement uncertainty. With
respect to the energy consumption estimation, an additional uncertainty
source arises from the measuring instrument, i.e., the ammeter
employed. For both methods, an instrumental uncertainty of 2% was
considered, after a metrological characterization performed under
operational conditions at room temperature (between 20 °C and 30 °C).
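The uncertainty budget above, Eqs. (6) and (7) together with the later combination of the 2% instrumental term in Eq. (8), can be checked numerically. The sketch below is illustrative only: the 50 ns type A scatter and the 150 mW power are hypothetical placeholders, not measurements from this work; only the 289 ns counting term comes from the text.

```python
import math

def combined_uncertainty(u_a, u_b):
    # Eq. (6): root-sum-of-squares of the type A (statistical) and
    # type B (counting) contributions to the duration uncertainty.
    return math.sqrt(u_a ** 2 + u_b ** 2)

def propagate_to_energy(power_w, u_t_s):
    # Eq. (7): with constant power P during the inference, E_inf = P * dt,
    # so the duration uncertainty propagates linearly: u_e = P * u_t.
    return power_w * u_t_s

def total_energy_uncertainty(u_tp, u_s):
    # Eq. (8): combine the propagated timing term with the ammeter's
    # instrumental uncertainty in quadrature.
    return math.sqrt(u_tp ** 2 + u_s ** 2)

# Hypothetical numbers: 50 ns type A scatter plus the 289 ns counting
# uncertainty quoted in the text; 150 mW assumed constant power.
u_t = combined_uncertainty(50e-9, 289e-9)   # dominated by the 289 ns term
u_e = propagate_to_energy(0.150, u_t)       # energy uncertainty, joules
```

As expected, the counting term dominates: the combined u_t is only about 1.5% larger than the 289 ns type B contribution alone.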
A. Apicella et al. Computer Standards & Interfaces 97 (2026) 104120
Table 1
Comparison of central value (m_t) and uncertainty^a (u_t) of inference duration (expressed in ms) assessed by MLCommons and proposed methods on Rockchip RV1106 for varying neural models.

Method     Visual Wake Words   Image Classification   Keyword Spotting   Anomaly Detection
           m_t      u_t        m_t      u_t           m_t      u_t       m_t      u_t
Proposed   0.820    0.006      0.415    0.012         0.400    0.008     0.558    0.033
MLPTB      0.815    0.235      0.414    0.120         0.371    0.107     0.350    0.101

^a In MLPTB, the counting uncertainty was taken into account.
Table 2
Comparison of central value (m_t) and uncertainty^a (u_t) of inference duration (expressed in ms) assessed by MLCommons and proposed methods on STM32H7 microcontroller for varying neural models.

Method     Visual Wake Words   Image Classification   Keyword Spotting   Anomaly Detection
           m_t      u_t        m_t      u_t           m_t      u_t       m_t      u_t
Proposed   29.656   0.003      49.941   0.001         14.860   0.001     1.690    0.002
MLPTB      29.600   8.545      51.900   14.982        15.400   4.446     1.800    0.520

^a In MLPTB, the counting uncertainty was taken into account.
Table 3
Comparison of central value (m_t) and uncertainty^a (u_t) of inference duration (expressed in ms) assessed by MLCommons and proposed methods on STM32U5 microcontroller for varying neural models.

Method     Visual Wake Words   Image Classification   Keyword Spotting   Anomaly Detection
           m_t      u_t        m_t      u_t           m_t      u_t       m_t      u_t
Proposed   78.447   0.002      133.280  0.002         48.060   0.001     4.910    0.002
MLPTB      71.600   20.669     128.200  37.008        38.600   11.143    4.800    1.386

^a In MLPTB, the counting uncertainty was taken into account.
Table 4
Comparison of central value (m_t) and uncertainty^a (u_e) of energy (expressed in μJ) assessed by MLCommons and proposed methods on Rockchip RV1106 for varying neural models.

Method     Visual Wake Words   Image Classification   Keyword Spotting   Anomaly Detection
           m_t      u_e        m_t      u_e           m_t      u_e       m_t      u_e
Proposed   380      13         193      15            165      9         222      11
MLPTB      373      108        183      53            159      46        148      43

^a In MLPTB, the counting uncertainty was propagated into the energy measurements.
Table 5
Comparison of central value (m_t) and uncertainty^a (u_e) of energy (expressed in μJ) assessed by MLCommons and proposed methods on STM32H7 microcontroller for varying neural models.

Method     Visual Wake Words   Image Classification   Keyword Spotting   Anomaly Detection
           m_t      u_e        m_t      u_e           m_t      u_e       m_t      u_e
Proposed   4386     88         7536     151           2202     44        236      6
MLPTB      3699     1068       6311     1822          1870     540       221      64

^a In MLPTB, the counting uncertainty was propagated into the energy measurements.
The final uncertainty was thus obtained by applying the following formula:

u_e = sqrt(u_tp^2 + u_s^2)    (8)

where u_tp denotes the inference time measurement uncertainty u_t
propagated through the functional relation used for energy computation
(see formula (7)), and u_s represents the instrumental uncertainty of
the ammeter. The measurement uncertainty obtained for the proposed
method appears for all tested devices to be very low compared to the
uncertainty of the MLPTB method.

In Tables 4, 5, and 6, a comparison between the results of the energy
per inference assessment by the MLPTB and proposed methods is reported
for the three DUTs. On the Rockchip RV1106, the proposed method
measures an inference energy value that is, on average, 15% higher than
that obtained with MLPTB, while improving the uncertainty by a factor
of 6. In the case of the STM32H7, the inference energy assessment grows
by 16% while the uncertainty improves by a factor of 12. Notably, the
inference energy assessment on the STM32U5 shows contrasting trends:
for two networks, the measured consumption is higher with the proposed
method, while for the other two networks it is higher with MLCommons.
Regarding the uncertainty, the proposed method reduces it by a factor
of 12.

5. Discussion

The contrasting trends from the energy assessment on the STM32U5
provide an opportunity to discuss the relationship between the two
methods in terms of metrological accuracy. The MLCommons method
extracts a central Inference Per Second value based on five
experiments, whereas our method computes a central value as the mean
over 100 acquisitions. Given the large uncertainty of the MLPTB method
and the limited number of experiments, the calculated central value is
unlikely to be a reliable estimator of the true value of the measured
quantity [26]. The comparison of mean values obtained with the two
methods is limited by the large difference in their associated
uncertainties. The less precise method exhibits an uncertainty up to two orders
Fig. 6. Temporal diagram of current values acquired from MCU during ANN operations. Orange traces represent (a) the inference status signal in the proposed
method and (b) the trigger signal in the MLPTB method. The windows used for energy consumption estimation are highlighted in light blue. Specifically, the
proposed method (a) considers only the current samples acquired during each neural network inference phase, whereas the MLPTB method (b) also includes the
energy contribution of pre-inference phases (light yellow window). (For interpretation of the references to color in this figure legend, the reader is referred to
the web version of this article.)
Fig. 7. Comparison between the proposed method (orange) and MLPTB (green) in energy per inference assessment on the Rockchip RV1106, for the models provided by MLCommons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 6
Comparison of central value (m_t) and uncertainty^a (u_e) of energy (expressed in μJ) assessed by MLCommons and proposed methods on STM32U5 microcontroller for varying neural models.

Method     Visual Wake Words   Image Classification   Keyword Spotting   Anomaly Detection
           m_t      u_e        m_t      u_e           m_t      u_e       m_t      u_e
Proposed   2362     47         3249     65            1184     27        116      3
MLPTB      1921     556        3384     980           1004     291       121      35

^a In MLPTB, the counting uncertainty was propagated into the energy measurements.
of magnitude higher than the other, rendering direct statistical
comparisons of the means largely insignificant. Observed differences
may therefore primarily reflect the inherent variability of the less
accurate method rather than genuine differences in the measured
phenomenon. However, it is important to note that the proposed method
provides greater selectivity by excluding the pre-inference phase
(characterized by low energy consumption) from the calculation
(Fig. 6). This prevents underestimation of the actual energy
consumption, which may occur when using the MLPTB method.

Finally, Figs. 7, 8, and 9 present the histograms of the energy per
inference assessment with the two methods on the Rockchip RV1106,
STM32H7, and STM32U5, respectively. The orange bars (proposed
Fig. 8. Comparison between the proposed method (orange) and MLPTB (green) in energy per inference assessment on the STM32H7, for the models provided by MLCommons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 9. Comparison between the proposed method (orange) and MLPTB (green) in energy per inference assessment on the STM32U5, for the models provided by MLCommons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
method) are generally higher than the green bars (MLPTB). However,
comparing the mean values measured by the two methods is challenging
due to the large uncertainty intervals (error bars) associated with
MLPTB. Nevertheless, the differences in error bar lengths confirm the
improved precision of the proposed method.

The metrological improvements introduced in this work have direct
consequences for the practical adoption of embedded AI. First, more
accurate and reproducible energy assessments enhance the reliability of
benchmarking, enabling fair comparisons among devices and supporting
informed selection of hardware for battery-powered applications, where
autonomy is a critical design constraint. Second, the improved accuracy
in energy characterization facilitates more precise sizing of power
supply components, which is essential for ensuring efficiency,
stability, and cost-effectiveness in embedded deployments. Finally, the
refined timing characterization allows designers to better estimate
inference latency, a key parameter for real-time and safety-critical
applications.

6. Conclusions

A new method for assessing the power consumption of edge devices such
as MCUs running ANNs is presented, claiming metrological improvements
over the MLPerf Tiny Benchmark. Unlike MLPTB, the proposed method
calculates the duration and energy consumption of each individual
inference performed by the Device Under Test. Through an appropriate
circuit and firmware design, the method measures only the energy
consumed by the inference, excluding other operations from the
computation. This approach not only enhances the selectivity and
accuracy of the measurement process but also reduces measurement
uncertainty. Instead of counting the number of inferences over a fixed
interval, as MLPTB does, the proposed method counts the number of ticks
from the counter of the DUT during a single inference execution. On an
NPU-powered microcontroller, the proposed method improves measurement
uncertainty by a factor of 6. In the case of two general-purpose
microcontrollers (high-performance and ultra-low-power), the
measurement uncertainty improves by a factor of 12.
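The tick-counting principle summarized in the conclusions reduces to a simple conversion from counter ticks to time, which also shows why the counting quantization is one clock period. The tick count below is a hypothetical reading, not a reported measurement; 480 MHz is the STM32H7 clock frequency from its datasheet [23].

```python
def ticks_to_duration_ms(n_ticks, f_clk_hz):
    # One inference is timed by counting DUT counter ticks; the duration
    # is the tick count divided by the counter clock frequency.
    return n_ticks / f_clk_hz * 1e3

def counting_resolution_ns(f_clk_hz):
    # The count is quantized to whole ticks, so the resolution (and hence
    # the counting uncertainty) is on the order of one clock period.
    return 1e9 / f_clk_hz

# Hypothetical reading: 14,236,800 ticks at 480 MHz -> 29.66 ms,
# with a quantization of about 2.08 ns per tick.
duration_ms = ticks_to_duration_ms(14_236_800, 480e6)
resolution_ns = counting_resolution_ns(480e6)
```

The single-tick quantization is what makes the counting uncertainty of the proposed method orders of magnitude smaller than MLPTB's fixed-interval inference counting.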
CRediT authorship contribution statement

Andrea Apicella: Writing – review & editing, Methodology, Conceptualization. Pasquale Arpaia: Writing – review & editing, Methodology, Conceptualization. Luigi Capobianco: Writing – review & editing, Methodology, Conceptualization. Francesco Caputo: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Antonella Cioffi: Writing – review & editing, Methodology, Conceptualization. Antonio Esposito: Writing – review & editing, Methodology, Conceptualization. Francesco Isgrò: Writing – review & editing, Methodology, Conceptualization. Rosanna Manzo: Writing – review & editing, Methodology, Conceptualization. Nicola Moccaldi: Writing – review & editing, Methodology, Conceptualization. Danilo Pau: Writing – review & editing, Methodology, Conceptualization. Ettore Toscano: Writing – review & editing, Methodology, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was carried out within the DHEAL-COM project (ID: PNC-E3-2022-23683267 PNC HLS DH; CUP: E63C22003790001), which was financially supported by the Italian Ministry of Health through the Complementary National Plan (CNP) to the PNRR. This publication reflects only the authors' view and the Italian Ministry of Health is not responsible for any use that may be made of the information it contains.

Data availability

Data will be made available on request.

References

[1] R. Chataut, A. Phoummalayvane, R. Akl, Unleashing the power of IoT: A comprehensive review of IoT applications and future prospects in healthcare, agriculture, smart homes, smart cities, and industry 4.0, Sensors 23 (16) (2023) 7194.
[2] Q. Ma, H. Tan, T. Zhou, Mutual authentication scheme for smart devices in IoT-enabled smart home systems, Comput. Stand. Interfaces 86 (2023) 103743.
[3] C.-W. Shih, C.-H. Wang, Integrating wireless sensor networks with statistical quality control to develop a cold chain system in food industries, Comput. Stand. Interfaces 45 (2016) 62–78.
[4] S.B. Baker, W. Xiang, I. Atkinson, Internet of things for smart healthcare: Technologies, challenges, and opportunities, IEEE Access 5 (2017) 26521–26544.
[5] Y. Abadade, A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki, A.S. Hafid, A comprehensive survey on TinyML, IEEE Access (2023).
[6] M. Cunneen, M. Mullins, F. Murphy, Autonomous vehicles and embedded artificial intelligence: The challenges of framing machine driving decisions, Appl. Artif. Intell. 33 (8) (2019) 706–731.
[7] J. Li, S. Dang, M. Wen, Q. Li, Y. Chen, Y. Huang, W. Shang, Index modulation multiple access for 6G communications: Principles, applications, and challenges, IEEE Netw. 37 (1) (2023) 52–60.
[8] M. Wen, B. Zheng, K.J. Kim, M. Di Renzo, T.A. Tsiftsis, K.-C. Chen, N. Al-Dhahir, A survey on spatial modulation in emerging wireless systems: Research progresses and applications, IEEE J. Sel. Areas Commun. 37 (9) (2019) 1949–1972.
[9] M.I. Jordan, T.M. Mitchell, Machine learning: Trends, perspectives, and prospects, Science 349 (6245) (2015) 255–260.
[10] S. Mishra, J. Manda, Improving real-time analytics through the internet of things and data processing at the network edge, J. AI Assist. Sci. Discov. 4 (1) (2024) 184–206.
[11] M. De Donno, K. Tange, N. Dragoni, Foundations and evolution of modern computing paradigms: Cloud, IoT, edge, and fog, IEEE Access 7 (2019) 150936–150948.
[12] D.P. Pau, P.K. Ambrose, F.M. Aymone, A quantitative review of automated neural search and on-device learning for tiny devices, Chips 2 (2) (2023) 130–141.
[13] C.-T. Lin, P.X. Huang, J. Oh, D. Wang, M. Seok, iMCU: A 102-μJ, 61-ms digital in-memory computing-based microcontroller unit for edge TinyML, in: 2023 IEEE Custom Integrated Circuits Conference, CICC, IEEE, 2023, pp. 1–2.
[14] S. Gal-On, M. Levy, Exploring CoreMark: a benchmark maximizing simplicity and efficacy, Embed. Microprocess. Benchmark Consortium (2012).
[15] P. Torelli, M. Bangale, Measuring Inference Performance of Machine-Learning Frameworks on Edge-Class Devices with the MLMark Benchmark, Technical Report, 2021, Available online: https://www.eembc.org/techlit/articles/MLMARK-WHITEPAPERFINAL-1.pdf. (Accessed on 5 April 2021).
[16] B. Sudharsan, S. Salerno, D.-D. Nguyen, M. Yahya, A. Wahid, P. Yadav, J.G. Breslin, M.I. Ali, TinyML benchmark: Executing fully connected neural networks on commodity microcontrollers, in: 2021 IEEE 7th World Forum on Internet of Things, WF-IoT, IEEE, 2021, pp. 883–884.
[17] C. Banbury, V.J. Reddi, P. Torelli, J. Holleman, N. Jeffries, C. Kiraly, P. Montino, D. Kanter, S. Ahmed, D. Pau, et al., MLPerf Tiny benchmark, 2021, arXiv preprint arXiv:2106.07597.
[18] MLCommons, 2024, URL: https://mlcommons.org/benchmarks/inference-tiny/.
[19] Performance mode vs. Energy mode, 2022, URL: https://github.com/eembc/energyrunner?tab=readme-ov-file#performance-mode-vs-energy-mode.
[20] B.N. Taylor, C.E. Kuyatt, Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, NIST Technical Note 1297, National Institute of Standards and Technology (NIST), Gaithersburg, MD, 2020, http://dx.doi.org/10.6028/NIST.TN.1297-2020.
[21] STMCubeIDE, 2022, URL: https://stm32ai.st.com/stm32-cube-ai/.
[22] Ubuntu 12 RT, 2012, Real-time variant of Ubuntu 12, Canonical Ltd., https://ubuntu.com/real-time.
[23] STMicroelectronics, STM32H753xI - 32-bit Arm® Cortex®-M7 480 MHz MCUs, 2 MB flash, 1 MB RAM, 46 com. and analog interfaces, crypto - Datasheet - Production data, Datasheet DS12117 Rev 9, STMicroelectronics, 2023, p. 358, URL: https://www.st.com/resource/en/datasheet/stm32h753vi.pdf. (Accessed 21 August 2025).
[24] STMicroelectronics, STM32U575xx - Ultra-low-power Arm® Cortex®-M33 32-bit MCU+TrustZone®+FPU, 240 DMIPS, up to 2 MB Flash memory, 786 KB SRAM - Datasheet - Production data, Datasheet DS13737 Rev 10, STMicroelectronics, 2024, p. 346, URL: https://www.st.com/resource/en/datasheet/stm32u575ag.pdf. (Accessed 21 August 2025).
[25] UEC Electronics, AR4236-AR4237 Luckfox Pico Pro/Max Datasheet, Datasheet, UEC Electronics, 2024, URL: https://uelectronics.com/wp-content/uploads/2024/07/AR4236-AR4237-Luckfox-Pico-Pro-Max-Datasheet.pdf. (Accessed 21 August 2025).
[26] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, OIML, Evaluation of measurement data - Guide to the expression of uncertainty in measurement, JCGM 100:2008, GUM 1995 with minor corrections, Jt. Comm. Guides Metrol. 98 (2008).
OPRF and PSI. It achieves a time complexity of O(n log n), superior to existing well-known fast post-quantum PSI protocols operating at O(mn log(mn)), where m is the bit length of the cryptographic modulus and n represents the dimension of the security parameter. Simulation experiments and security analyses demonstrate that our proposal effectively preserves user privacy, ensures collusion resilience, verifies computation results, and maintains low computational costs. Finally, as an extension of our OPRF, we also give a fast private information retrieval (PIR) protocol.
1. Introduction

Mobile social networks have greatly enriched the ways people
communicate and enhanced the convenience of social interactions. With
the development of technology, users generate a large amount of useful
and sensitive personal data within mobile social networks. This data
often needs to be stored and processed to provide more personalized
services and experiences [1,2]. However, due to the limited storage
capacity of mobile social network devices, it is impossible to store
all the data generated at any given moment, which presents challenges
for data storage and privacy protection.

To address this issue while ensuring data confidentiality and security,
many mobile social network platforms have started adopting advanced
privacy-preserving technologies, such as private set intersection
(PSI). The technology allows two or more parties to securely compute
the intersection of their datasets without disclosing their respective
data sets. This way, even if data is stored in distributed systems, it
can effectively prevent data breaches and violations of user privacy,
such as those caused by data leaks or unauthorized access. The
application of PSI in mobile social networks not only enhances data
security but also strengthens user trust in the platform, which is
crucial for protecting user privacy and improving the platform's
competitiveness. In this way, mobile social networks can continue to
provide a rich and vibrant social experience and efficient information
services while safeguarding personal privacy. Furthermore, as an
important application in the field of privacy computing, PSI has
recently garnered widespread attention due to its efficiency and
practicality, jointly promoting the rapid implementation of privacy
computing technology and ensuring the secure flow and value extraction
of data elements.
✩ This document is the result of a research project funded by the National Science Foundation.
∗ Corresponding author.
E-mail addresses: arcsec30@stu.xidian.edu.cn (Z. Shan), lyzhang@mail.xidian.edu.cn (L. Zhang), xiyouwuq@126.com (Q. Wu), laiqq@snnu.edu.cn (Q. Lai), fuchun@uow.edu.au (F. Guo).
https://doi.org/10.1016/j.sysarc.2025.103346
Received 3 November 2024; Received in revised form 24 December 2024; Accepted 16 January 2025
Available online 25 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
There are many common construction tools for PSI [3], and oblivious
transfer (OT) is one of them. An OT [4] is a crucial tool used for
secure multiparty computation. In this tool, the sender transmits data
from a set of messages to the receiver but remains oblivious to which
specific message was sent, while the receiver is unaware of the other
messages they did not receive. This protocol is also known as the
oblivious transfer protocol. The essence of an oblivious pseudorandom
function is a pseudorandom function (PRF) enhanced with oblivious
transfer capabilities.

In 1986, Goldreich, Goldwasser, and Micali introduced a new
cryptographic primitive known as the pseudorandom function, whose
output appears to be randomly chosen [5]. Two decades later, Naor and
Reingold [6] noticed that their number-theoretic PRF allows for an
interactive and oblivious evaluation, where a client with input x
obtains F_k(x) for a function F_k(x) that is contributed by a server.
Neither does the client learn the function (i.e., its key k), nor does
the server learn x or F_k(x). Freedman et al. later called such a
two-party protocol an OPRF and gave the first formal definitions and
two OPRFs based on the Naor–Reingold PRF [7]. In 2009, Jarecki and Liu
presented an efficient OPRF for securing intersection data [8].

Oblivious pseudorandom functions have been utilized in PSI [9]. The
additional functionalities of oblivious pseudorandom functions also
exhibit diversity, such as verifiable oblivious pseudorandom functions
(VOPRF, [10]) and partially oblivious pseudorandom functions
(POPRF, [11]).

Currently, OPRFs still face challenges, as summarized by Casacuberta,
Hesse, and Lehmann [12]. Efficient OPRF constructions often rely on
discrete-log or factoring-type hardness assumptions, which are
vulnerable to quantum computers. This paper aims to address this by
constructing OPRFs based on lattice-hardness assumptions and improving
their efficiency (see Figs. 1 and 2).

1.1. Contributions

Regarding the open problem proposed by Casacuberta, there are currently
quantum-resistant OPRFs, namely Albrecht et al.'s lattice-based
VOPRF [10] and Boneh et al.'s isogeny-based OPRF [13]. Both
constructions represent significant feasibility results but require
further research to improve their efficiency [12]. So, a fast
post-quantum private set intersection from an oblivious pseudorandom
function is proposed in this paper, and it has the following
advantages:

• Asymmetric encryption is adopted, which is efficient and reduces the
risk of privacy leakage. The PSI in this paper is constructed based on
OPRF, which belongs to asymmetric encryption, thus reducing the number
of interactions between users and lowering the risk of user privacy
leakage. Compared to symmetric encryption, asymmetric encryption
reduces reliance on authoritative institutions.
• The structure of the OPRF is simple, and it is relatively efficient
among post-quantum OPRFs. The OPRF used to construct the PSI in this
paper is based on a new lattice problem, namely the learning parity
with rounding over rings problem (Ring-LPR). The Ring-LPR problem not
only has a simple structure but also possesses the capability to resist
quantum attacks.
• A perturbed pseudorandom generator (PPRG) can withstand probabilistic
attacks. In addition to the OPRF, the PSI in this paper also includes a
structure with a perturbed pseudorandom generator, which can overcome
the weakness of weak encryption in symmetric encryption, thereby
preventing adversaries from guessing the corresponding plaintext using
statistical methods on the ciphertext ratios.

Fig. 1. Mobile social networks.

Fig. 2. Private set intersection.

1.2. Technical overview

We adopted the oblivious transfer technique and Hamming correlation
robustness, both of which are used in the OPRF construction presented
in this paper. For the underlying pseudorandom function, we initially
aimed to use learning parity with noise (LPN) over rings. However, this
approach results in varying encryption outcomes for the same private
data, preventing the recipient from matching the private data. Thus, we
sought to make LPN over rings behave consistently like learning with
rounding (LWR), leading to the introduction of the concept of learning
parity with rounding over rings (LPR over rings) in this paper.

To prove that LPR over rings is quantum-resistant, we established a
reduction bridge between LPR over rings and LWR. Yes, LPR over rings is
reduced to LWR, not LPN over rings. For (q = 2^n, p)-LWR instances, we
demonstrated the hardness of (q = 2, p = 1)-LWR instances and
(q = 2, p = 1)-LWR over rings, where (q = 2, p = 1)-LWR over rings
corresponds to LPR over rings. To verify that the computational
efficiency of the post-quantum OPRF in this paper is quite fast, we
compared the OPRF with the LWE-instantiated OPRF from [14]. The results
showed that, as theoretical analysis suggested, the computation
efficiency improves with the increase of security parameters.

Based on the OPRF, we constructed a private set intersection (PSI)
protocol. Since the paper [15] analyzed that PSI based on symmetric
encryption does not resist probabilistic attacks and proposed the
concept of a perturbed pseudorandom generator, we used LPN over rings
to construct a pseudorandom generator and proved that it satisfies the
definition of PPRG as given in [15].

1.3. Organizations

The structure of this paper is as follows. Section 3 provides the
necessary definitions and lemmas as a foundation for the reader's
knowledge. Section 4 presents the construction and efficiency analysis
of the OPRF, along with the definition and reduction of Ring-LPR.
Section 5 details the construction of the PSI in this paper, security
proofs, and LWE-based efficiency analysis, as well as the construction
of the PPRG and the proof of its pseudorandomness. Finally, Section 6
summarizes the advantages and limitations of the PSI presented in this
paper, as well as the extension of the OPRF to PIR.
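The determinism gap described in the technical overview, where noise-based LPN re-randomizes each evaluation while rounding-based LWR maps equal inputs to equal outputs, can be seen in a toy sketch. The moduli, dimensions, and noise rate below are illustrative placeholders far below anything cryptographic, and this is not the paper's Ring-LPR construction.

```python
import random

def lwr_round(x, q=256, p=16):
    # LWR-style output: deterministic rounding floor((p/q) * x) mod p,
    # so the same input always yields the same value.
    return (p * x // q) % p

def lpn_sample(a, s, rng, noise_rate=0.3):
    # LPN-style output over Z_2: <a, s> + e mod 2 with Bernoulli noise e,
    # so repeated evaluations of the same input need not agree.
    inner = sum(ai * si for ai, si in zip(a, s)) % 2
    e = 1 if rng.random() < noise_rate else 0
    return (inner + e) % 2

rng = random.Random(0)
a, s = [1, 1, 0, 1], [1, 0, 1, 1]
lpn_outputs = {lpn_sample(a, s, rng) for _ in range(200)}  # noise flips some bits
lwr_outputs = {lwr_round(200) for _ in range(200)}         # always the same value
```

This is exactly why the paper replaces the noise term with rounding: a recipient can only match private data if equal inputs produce equal encodings.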
2. Preliminary

Each element of a lattice L in R^n can be expressed as an
integer-coefficient linear combination of n linearly independent
vectors. This set of linearly independent vectors is called a lattice
basis, and we know that the lattice basis is not unique. Given a
lattice basis (v_1, ..., v_n) of the lattice L, the fundamental
parallelepiped is

P(v_1, ..., v_n) = { sum_{i=1}^{n} k_i v_i | k_i ∈ [0, 1) }.

If the lattice basis (v_1, ..., v_n) is fixed, we write P(L) for
P(v_1, ..., v_n). For every x ∈ R^n, project it onto P(L); by the
properties of projection, there is a unique y ∈ P(L) such that
y - x ∈ L.

The symbol det(L) denotes the volume of the fundamental parallelepiped
of the lattice L. In other words, det(L) is the determinant of the
matrix composed of a lattice basis (v_1, ..., v_n). For a given
n-dimensional lattice, det(L) is the same for every choice of lattice
basis.

Given an n-dimensional lattice L, let (v_1, ..., v_n) and
(u_1, ..., u_n) be two arbitrary lattice bases of L. Then
v_i = sum_{j=1}^{n} m_{ij} u_j and u_i = sum_{j=1}^{n} m'_{ij} v_j for
i ∈ {1, ..., n}, i.e., there are two integer matrices M and M' such
that

(v_1, ..., v_n)^T = M (u_1, ..., u_n)^T  and
(u_1, ..., u_n)^T = M' (v_1, ..., v_n)^T.

It is easy to prove that M and M' are inverses of each other; since M
and M' are both integer matrices, det(M) · det(M') = 1 and
det(M) = det(M') = ±1, so

det(v_1, ..., v_n) = ± det(u_1, ..., u_n).

Definition 1. An ideal is a subset of a ring (or domain) that satisfies
the following two properties:

1. Additive closure: if any two elements of the ideal are added, the
result is still in the ideal. In other words, for any elements a and b
in the ideal, a + b also belongs to that ideal.
2. Multiplicative absorptivity: if an element of the ideal is
multiplied by any element of the ring (or field), the result is still
in the ideal. In other words, for any element a in the ideal and any
element r in the ring (or field), ar and ra belong to that ideal.

For a commutative ring, we further require that the ideal be closed
under both addition and multiplication. Such an ideal is called a true
ideal.

Definition 2. Referring to the definition of an ideal, the ideal
lattice L~ is a subset of the lattice L that satisfies the following
two properties:

1. Additive closure: if any two elements of an ideal lattice are added,
the result is still in the ideal lattice. In other words, for any
elements a and b in an ideal lattice, a + b also belongs to that ideal
lattice.
2. Multiplicative absorptivity: if an element of an ideal lattice is
multiplied by an element of any other ideal lattice, the result remains
in the ideal lattice. In other words, for any element a in the ideal
and any element r in another ideal lattice, both ar and ra belong to
that ideal lattice.

Corollary 1. The ideal lattice L~ is a true ideal of the lattice L.

The polynomial f(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1} is mapped to

Rot(f) = a_0 I + a_1 X + ... + a_{n-1} X^{n-1} ∈ L~,

where L~ is the image in the ideal lattice L of the elements of
Z[x]/<x^n + 1>, and

X =
⎛ 0 0 0 ⋯ 0 1 ⎞
⎜ 1 0 0 ⋯ 0 0 ⎟
⎜ 0 1 0 ⋯ 0 0 ⎟
⎜ 0 0 1 ⋯ 0 0 ⎟
⎜ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⎟
⎝ 0 0 0 ⋯ 1 0 ⎠

So there is

Rot(f) =
⎛ a_0      a_{n-1}  ⋯  a_1 ⎞
⎜ a_1      a_0      ⋯  a_2 ⎟
⎜ ⋮        ⋮        ⋱  ⋮   ⎟
⎝ a_{n-1}  a_{n-2}  ⋯  a_0 ⎠ ,

and it is easy to prove that this mapping relationship is an
isomorphism.

Definition 3 (Learning with Rounding, [16,17]). Let λ be the security
parameter and let n = n(λ), m = m(λ), q = q(λ), p = p(λ) be integers.
The LWR problem states that for A ∈ Z_q^{m×n}, s ∈ Z_q^n, u ∈ Z_q^m,
the following distributions are computationally indistinguishable:
(A, ⌊As⌋_p) ≈_C (A, ⌊u⌋_p). Here ⌊x⌋_p = ⌊(p/q) x⌋, where ⌊·⌋ denotes
the floor function, which rounds down to the nearest integer; for
example, ⌊3.14⌋ = 3 and ⌊3⌋ = 3.

Definition 4 (Learning Parity with Noise, [18,19]). Let λ be the
security parameter and let n = n(λ), m = m(λ) be integers. The LPN
problem states that for A ∈ Z_2^{m×n}, s ∈ Z_2^n, u, e ∈ Z_2^m, the
following distributions are computationally indistinguishable:
(A, As + e) ≈_C (A, u).

Definition 5 (Hamming Correlation Robustness, [14]). For a hash
function H(·) and a pseudorandom function F_k(·) with key k, H(·) is
Hamming correlation robust if H(x) ≈_C F_k(x).

Definition 6 (OT_1). The message sender sends data to the receiver
from a set of pending messages but remains oblivious to which specific
message was sent. Meanwhile, the receiver is unaware of the other
messages they did not receive. This protocol is also known as oblivious
transfer.

Definition 7 (OPRF, [20]). Let the PRF key k consist of two bit-strings
q, s ∈ {0, 1}^λ. Let F(·) be a pseudorandom code that produces a
pseudorandom string and let H be a hash function. The pseudorandom
function is computed as

OPRF_k(x) = H(q ⊕ [F(x) · s]),

where · denotes bitwise AND and ⊕ denotes bitwise XOR. For a randomly
generated s, if F(x) has enough Hamming weight, then the function
OPRF_k(x) is pseudorandom, assuming the hash function H is correlation
robust.

Definition 8 (PSI, [14]). PSI enables two parties, each holding a
private set of elements, to compute the intersection of the two sets
while revealing nothing more than the intersection itself.

Definition 9 (Dihedral Coset Problem). Given a security parameter κ, an
instance of the DCP_q^ℓ problem, where q denotes the modulus and ℓ
represents the number of states, consists of states of the form

|0⟩|x_i⟩ + |1⟩|(x_i + s) mod q⟩,  i ≤ ℓ,

each stored in 1 + ⌈log_2 q⌉ bits, where x_i ∈_R Z_q^n and s ∈ Z_q^n.
If s can be computed with probability poly(1/log q) in time
poly(log q), then the DCP_q^ℓ problem is considered to be broken.
3
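Definition 7 can be made concrete with a short toy sketch. Here SHA-256 stands in for the hash ℋ, a hash-derived string stands in for the pseudorandom code F, and the bit-length LAM is an illustrative assumption; none of these choices are the paper's actual instantiation.

```python
import hashlib
import secrets

LAM = 128  # toy bit-length; an illustrative assumption, not the paper's parameters

def F(x: bytes, lam: int = LAM) -> int:
    """Toy stand-in for the pseudorandom code F(.): a hash-derived lam-bit string."""
    d = hashlib.sha256(b"prc|" + x).digest()
    return int.from_bytes(d, "big") >> (256 - lam)

def H(v: int, lam: int = LAM) -> bytes:
    """Stand-in for the hash function H, applied to a lam-bit value."""
    return hashlib.sha256(v.to_bytes((lam + 7) // 8, "big")).digest()

def oprf(k: tuple, x: bytes) -> bytes:
    """OPRF_k(x) = H(q XOR (F(x) AND s)) as in Definition 7; the key k = (q, s)."""
    q, s = k
    return H(q ^ (F(x) & s))

# Key generation: two random lam-bit strings q and s.
key = (secrets.randbits(LAM), secrets.randbits(LAM))
t1 = oprf(key, b"alice@example.com")
t2 = oprf(key, b"alice@example.com")
t3 = oprf(key, b"bob@example.com")
assert t1 == t2 and t1 != t3  # deterministic per input, distinct across inputs
```

The bitwise-AND with s masks roughly half the bits of F(x), which is why the definition requires F(x) to have enough Hamming weight.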
Note 1. The Dihedral Coset Problem is a difficult problem in quantum computing, and solving it has a time complexity of O(eⁿ) or O(n!).

Lemma 1. If an efficient algorithm 𝒜 can solve DCP^ℓ_2 in polynomial time, then there exists an efficient algorithm ℬ that can solve DCP^ℓ_q in polynomial time.

Proof. Suppose q = 2ⁿ and there exists an efficient algorithm 𝒜 that can solve DCP^ℓ_2 in polynomial time. For instances of DCP^ℓ_4, we have
$$|0\rangle|x_i\rangle + |1\rangle|(x_i + s) \bmod 4\rangle = |0\rangle|x_i'\rangle + |1\rangle|(x_i' + s') \bmod 2\rangle + 2\big(|0\rangle|x_i''\rangle + |1\rangle|(x_i'' + s'') \bmod 2\rangle\big), \quad i \in [\ell],$$
so running the algorithm 𝒜 twice will solve DCP^ℓ_{4=2²}. Similarly, running 𝒜 four times will solve DCP^ℓ_{16=2⁴}, and continuing in this manner, running the algorithm 𝒜 n times will solve DCP^ℓ_q. Let O(𝒜) represent the time complexity of the algorithm 𝒜. Thus, we have O(ℬ) ≤ nO(𝒜), and algorithm ℬ is an efficient algorithm. □

Definition 10 (Extrapolated Dihedral Coset Problem with modulus 2, [21]). Given a security parameter κ, an instance of EDCP^ℓ_{n,2,ρ} is provided, where 2 denotes the modulus, ρ represents the probability density function, and ℓ denotes the number of states. Each state is expressed as
$$\sum_{j \in \mathrm{supp}(\rho)} \rho(j)\,|j\rangle|(x_i + j \cdot s) \bmod 2\rangle, \quad i \in [\ell],$$
and stores 2 bits, where x_i ∈_R Z₂ⁿ and s ∈ Z₂ⁿ. If s can be determined with probability poly(1/(n log 2)) in time poly(n log 2), then the EDCP^ℓ_{n,2,ρ} problem is considered to be broken.

Lemma 2. If there exists an algorithm for solving EDCP^ℓ_{n,4,ρ}, then this algorithm can also solve DCP^ℓ_4.

Proof. Let
$$|b\rangle = \frac{1}{\sqrt{2}}|0\rangle|x_i\rangle + \frac{1}{\sqrt{2}}|1\rangle|(x_i + s) \bmod 4\rangle.$$
Thus, ρ(0)|0⟩ = (1/√2)|0⟩ and ρ(1)|1⟩ = (1/√2)|1⟩. Hence, DCP^ℓ_2 is a special case of EDCP^ℓ_{n,2,ρ}. Therefore, if there exists an algorithm for solving EDCP^ℓ_{n,2,ρ}, this algorithm can also solve DCP^ℓ_2. □

Lemma 3 ([21]). Let (n, q, r = Ω(√κ)) be an instance of G-EDCP and (n, q, α) be an instance of LWE. If there exists an algorithm for solving LWE_{n,q,α}, then there exists an algorithm for solving G-EDCP^ℓ_{n,q,ρ_r}.

Corollary 2. Let (n, 2, r = Ω(√κ)) be an instance of G-EDCP and (n, 2, α) be an instance of LPN. If there exists an algorithm for solving LPN_{n,α}, then there exists an algorithm for solving G-EDCP^ℓ_{n,2,ρ_r}.

3. Ring-LPR based OPRF

3.1. Constructing OPRF

Fig. 3 presents the ring LPR-based oblivious pseudorandom function. In the next section, we will prove the security of the oblivious pseudorandom function.

3.2. Security proof of OPRF

In this subsection, we provide the definition of the underlying lattice problem for the OPRF, learning parity with rounding, and its reduction proof.

Definition 11 (Learning Parity with Rounding). Let λ be the security parameter, and let n = n(λ), m = m(λ) be integers. The LPR problem states that for A ∈ Z₂^{m×n}, s ∈ Z₂ⁿ, u ∈ Z₂ᵐ, the following distributions are computationally indistinguishable: (A, ⌊As mod 4⌋₁) ≈_C (A, ⌊u⌋₁).

Definition 12 (Learning Parity with Rounding over Ring). The Ring-LPR problem states that for a, s, u ∈ ℛ₂, the following distributions are computationally indistinguishable: (a, ⌊as mod 4⌋₁) ≈_C (a, ⌊u⌋₁).

Lemma 4. For an LWR problem instance ⌊As⌋_p, if there exists an algorithm 𝒜 for solving s from ⌊As⌋₁, then there also exists an algorithm ℬ for solving the LWR problem.

Proof. Given that there exists an algorithm 𝒜 that can solve ⌊As⌋₁ = ⌊As/q⌋, for an LWR problem instance ⌊As⌋_p we have:
$$\lfloor As \rfloor_p = \frac{1}{p}\left\lfloor \frac{pAs}{q} \right\rfloor = \frac{1}{p}\left( \frac{pAs}{q} + e \right) \;\; (e \in (-1, 0]^m) = \frac{As}{q} + e' \;\; \left(e' \in \left(-\frac{1}{p}, 0\right]^m\right) \approx \lfloor As \rfloor_1.$$
Thus, the algorithm 𝒜 can be used to solve the LWR problem. □

We obtain the next corollary by Lemma 3.

Corollary 3. Let (n, 2, r = Ω(√κ)) be an instance of G-EDCP and (n, 2, α) be an instance of 2-LWR. If there exists an algorithm for solving 2-LWR, then there exists an algorithm for solving G-EDCP^ℓ_{n,2,ρ_r}.

Corollary 4. Let (n, 2, r = Ω(√κ)) be an instance of G-EDCP and (n, 2, α) be an instance of LPR. If there exists an algorithm for solving LPR, then there exists an algorithm for solving G-EDCP^ℓ_{n,2,ρ_r}.

Lemma 5. If there exists an algorithm 𝒜 for solving the Ring-LPR problem, then there also exists an algorithm ℬ for solving the LPR problem.

Proof. For an instance of the inner product Ring-LPR
$$b = \lfloor a \cdot s \rfloor_1,$$
where a = a₀ + a₁x + ⋯ + a_{n−1}x^{n−1}, we can represent a as a circulant matrix, specifically
$$A_1 = \begin{pmatrix} a_0 & a_{n-1} & \cdots & a_1 \\ a_1 & a_0 & \cdots & a_2 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1} & a_{n-2} & \cdots & a_0 \end{pmatrix}.$$
Thus,
$$b = \lfloor a \cdot s \rfloor_1 \;\Rightarrow\; \vec{b} = \lfloor A_1 \vec{s} \rfloor_1,$$
where ā = (a₀, a₁, …, a_{n−1}) ← a = a₀ + a₁x + ⋯ + a_{n−1}x^{n−1}. We use a proof by contradiction. Suppose there exists an efficient algorithm 𝒜 that can solve Ring-LPR in polynomial time. We take the first row of A₁, denote it as α₁, and have ⌊α₁s⌋₁ = b₁, where b₁ is the first component of b. For the LWR problem instance β⃗ = ⌊Λs⃗⌋₁, assume
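The rounding step in the proof of Lemma 4 can be checked numerically. The sketch below adopts the convention ⌊x⌋_p = ⌊(p/q)x⌋ (an assumption consistent with the displayed derivation; all parameters are toy values) and verifies that ⌊As⌋_p/p deviates from As/q only by a per-coordinate error in (−1/p, 0].

```python
import numpy as np

rng = np.random.default_rng(0)
q, p, n, m = 1024, 16, 32, 64  # toy LWR parameters (illustrative, not the paper's)

def round_p(x: np.ndarray, p: int, q: int) -> np.ndarray:
    """Rounding ⌊x⌋_p = ⌊(p/q)·x⌋ applied coordinate-wise."""
    return np.floor(p * x / q).astype(np.int64)

A = rng.integers(0, q, size=(m, n))
s = rng.integers(0, 2, size=n)
As = (A @ s) % q

b_p = round_p(As, p, q)  # LWR sample ⌊As⌋_p
# Dividing ⌊As⌋_p by p approximates (1/q)·As up to an error in (-1/p, 0],
# which is exactly the step used in the proof of Lemma 4.
approx = b_p / p
err = approx - As / q
assert np.all(err > -1.0 / p) and np.all(err <= 0)
```

The error interval follows from ⌊y⌋ − y ∈ (−1, 0] applied to y = pAs/q, scaled down by 1/p.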
Fig. 3. Oblivious Pseudorandom Function (OPRF).

$$\Lambda^T = (\alpha_1, \alpha_2, \ldots, \alpha_m).$$
Thus, we use the algorithm 𝒜 m times to find β_i such that ⌊γ_i⌋₁ = β_i = ⌊α_i s⌋₁, and thus we can solve the equation
$$\gamma = \Lambda \vec{s}, \quad \gamma^T = (\gamma_1, \ldots, \gamma_m).$$
Assume that the time complexity of solving s from the LWR problem instance is O(Λ, β). According to Corollary 3, letting O(γ = Λs⃗) be the computational complexity of solving the equation γ = Λs⃗, we have
$$m \cdot O(\mathcal{A}) + O(\gamma = \Lambda \vec{s}) \ge O(\Lambda, \beta) \ge O(n!) \text{ or } O(e^n).$$
Let m = n; then
$$O(\mathcal{A}) \ge \frac{O(\Lambda, \beta) - O(\gamma = \Lambda \vec{s})}{n} \ge \frac{O(n!) - O(\gamma = \Lambda \vec{s})}{n} \text{ or } \frac{O(e^n) - O(\gamma = \Lambda \vec{s})}{n}.$$
This contradicts the assumption that there is an efficient algorithm 𝒜 that can solve the inner product Ring-LPR in polynomial time, thus the theorem holds. □
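The circulant representation used in the proof of Lemma 5 can be exercised directly. The sketch below builds A₁ from the coefficient vector of a and checks that A₁s collects the coefficients of the cyclic convolution of a and s, i.e. multiplication mod xⁿ − 1, which is the plain circulant convention shown in the displayed matrix; the ring Z[x]/⟨xⁿ + 1⟩ would instead need the sign-flipped negacyclic variant. All values are illustrative.

```python
import numpy as np

def circulant(a: np.ndarray) -> np.ndarray:
    """Circulant matrix A1 with first column (a_0, ..., a_{n-1}) and first
    row (a_0, a_{n-1}, ..., a_1), as displayed in the proof of Lemma 5."""
    n = len(a)
    return np.array([[a[(i - j) % n] for j in range(n)] for i in range(n)])

# Row i of A1 is a cyclic shift of the coefficients, so A1 @ s yields the
# coefficient vector of a(x)*s(x) reduced mod x^n - 1.
a = np.array([1, 2, 3, 4])
A1 = circulant(a)
s = np.array([1, 0, 1, 1])

# Cyclic convolution computed directly for comparison.
conv = np.zeros(4, dtype=int)
for i, ai in enumerate(a):
    for j, sj in enumerate(s):
        conv[(i + j) % 4] += ai * sj
assert np.array_equal(A1 @ s, conv)
```

Each row α_i of A₁ then gives one inner-product sample ⌊α_i s⌋₁, which is how the proof turns a single ring sample into m LWR-style samples.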
3.3. Efficiency analysis

This section simulates the OPRF computation efficiency of this paper and the OPRF in [14] on MAC, Pad, and Phone. The PRF of [14] is instantiated based on LWE.

3.3.1. Efficiency analysis on MAC

The tools used in this subsection are Python 3.12; the programs are run on a MacBook Air desktop with an Apple M1 and 8.00 GB RAM (see Fig. 4).

3.3.2. Efficiency analysis on mobile pad

The tools used in this subsection are Pydroid 3; the programs are run on a Xiaomi Pad 6 Pro (Qualcomm(R) AI Engine(TM) Snapdragon 8+ mobile platform @ 3.2 GHz, RAM 8.00+3.00 GB) (see Fig. 5).

Fig. 4. Parallel comparison of OPRF on MAC, where n represents the security parameter; the unit is microseconds.

3.3.3. Summary of data comparison

From the simulation results, it can be seen that for n ≤ 250 the LWE-based OPRF in [14] is slightly faster, while for n > 250 the ring LPR-based OPRF in this paper is faster. Furthermore, as n increases, the advantages of ring LPR become more pronounced. Based on the simulation results for the Pad, the OPRF in this paper is more stable; although there are fluctuations, they are less significant compared to the LWE-based OPRF in [14].

4. PSI based on OPRF

In this paper, apart from the OPRF, another tool used in the construction of the PSI is a perturbed pseudorandom generator [15]. The perturbed pseudorandom generator in this paper is constructed from Ring-LPN. Next, we will present the reduction process for Ring-LPN.
Fig. 5. Parallel comparison of OPRF on mobile pads, where n represents the security parameter; the unit is microseconds.

4.1. Reduction of ring-LPN

Definition 13 (Learning Parity with Noise over Ring). The learning parity with noise over ring problem states that for a, s, e, u ∈ ℛ_{{0,1}}, the following distributions are computationally indistinguishable: (a, as + e) ≈_C (a, u).

Corollary 5. If there exists an efficient algorithm 𝒜 that can solve the Ring-LPN problem in polynomial time, then there also exists an algorithm ℬ that can solve the LPN problem.

Proof. The proof method is similar to that of Lemma 5, but in this case the computational complexity of ℬ will decrease. If we want the Ring-LPN problem to be approximately as hard as the LPN problem, then for the security parameter κ₁ of the Ring-LPN problem and κ₂ of the LPN problem, we have
$$\frac{e^{\kappa_1}}{\kappa_1^2} \ge e^{\kappa_2}, \quad \text{or} \quad \frac{(\kappa_1)!}{\kappa_1^2} \ge (\kappa_2)!.$$
Thus, we can roughly obtain κ₁ ≥ 1.5κ₂ and κ₂ ≥ 12. Note that O(n) is an asymptotically large quantity with respect to n. We use the most extreme case to determine the relationship between κ₁ and κ₂. □

4.2. Perturbed pseudorandom generator

Definition 14. Let a = a₀ + a₁x + ⋯ + a_{n−1}x^{n−1} ∈ ℛ_{{0,1}}. Define the norm of a as ‖a‖, where
$$\|a\| = \sqrt{\sum_{i=0}^{n-1} |a_i|^2}.$$

Fig. 6. Pseudorandom generator with perturbation G_γ(·).

Definition 15 ([15]). A pseudorandom generator with perturbation, denoted G_γ(·), is defined such that for x₁, x₂ ∈ ℛ, there exists γ satisfying the following conditions:

1. When x₁ = x₂, Pr(G_γ(x₁) = G_γ(x₂)) ≤ O(exp(−n));
2. When x₁ = x₂, ‖G_γ(x₁) − G_γ(x₂)‖ < γ; when x₁ ≠ x₂, there exists N such that ‖G_γ(x₁) − G_γ(x₂)‖ ≥ γ/N, where clearly N = 1 is optimal.

Theorem 1. The Ring-LPN problem itself can be viewed as a pseudorandom function with perturbations.

Proof. We prove each statement separately. First, when x₁ = x₂, we have
$$\Pr\big(G_\gamma(x_1) = G_\gamma(x_2)\big) = \Pr(e_1 = e_2) = \frac{1}{2^n}.$$
Additionally, set γ = √n + 1, so
$$\|(Ax_1 + e_1) - (Ax_2 + e_2)\| = \|e_1 - e_2\| < \gamma.$$
When x₁ ≠ x₂, set v₁ = G_γ(x₁), v₂ = G_γ(x₂), and note that
$$\Pr\big(\|v_1 - v_2\| \le \sqrt{n}\big) = \sum_{k=0}^{n} C_n^k \left(\frac{1}{3}\right)^k \left(\frac{1}{2}\right)^{n-k} + \sum_{k=0}^{\lfloor n/2 \rfloor} C_n^k \left(\frac{1}{3}\right)^k \left(\frac{1}{6}\right)^k \left(\frac{1}{2}\right)^{n-2k}.$$
Because
$$\sum_{k=0}^{n} C_n^k \left(\frac{1}{3}\right)^k \left(\frac{1}{2}\right)^{n-k} = \frac{1}{2^n}\left(\frac{2}{3} + \left(\frac{2}{3}\right)^2 + \cdots + \left(\frac{2}{3}\right)^n\right) = \frac{3}{2^n}\left(1 - \left(\frac{2}{3}\right)^n\right),$$
and
$$\sum_{k=0}^{\lfloor n/2 \rfloor} C_n^k \left(\frac{1}{3}\right)^k \left(\frac{1}{6}\right)^k \left(\frac{1}{2}\right)^{n-2k} \le \frac{3 \cdot 6}{17} \cdot \frac{1}{2^n}\left(1 - \frac{1}{(3 \cdot 6)^{2n}}\right),$$
therefore
$$\Pr\big(\|v_1 - v_2\| \le \sqrt{n}\big) < \frac{\sqrt{n} + 1}{2^n}.$$
Thus, there is a very high probability that ‖v₁ − v₂‖ ≥ √n + 1, and N = 1 (see Fig. 6). □

4.3. PSI based on OPRF

Lemma 6. Assuming f(y) ≈_C u₁ and g(u₁) ≈_C u₂, then (g∘f)(y) ≈_C u₂.
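The first two conditions of Definition 15 can be observed on toy parameters. The sketch below implements an LPN-style generator G_γ(x) = Ax + e over the integers (so the norm bound of Theorem 1 is visible) with a fresh binary error per call; the matrix size, seed, and integer arithmetic are illustrative assumptions, not the paper's ring construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
A = rng.integers(0, 2, size=(n, n))  # public matrix, fixed per generator instance

def G(x: np.ndarray) -> np.ndarray:
    """Perturbed PRG sketch: G_gamma(x) = A x + e with a fresh binary error e."""
    e = rng.integers(0, 2, size=n)
    return A @ x + e

x1 = rng.integers(0, 2, size=n)
x2 = x1.copy()

# Equal inputs: the outputs differ only by e1 - e2 (entries in {-1, 0, 1}),
# so their distance is at most sqrt(n) < gamma = sqrt(n) + 1, matching
# condition 2 of Definition 15.
d_same = np.linalg.norm(G(x1) - G(x2))
gamma = np.sqrt(n) + 1
assert d_same < gamma
# For x1 != x2 the distance is typically much larger, which is the
# separation Theorem 1 argues holds with overwhelming probability.
```

The bound for equal inputs is unconditional here, since ‖e₁ − e₂‖ ≤ √n always holds for binary errors.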
Fig. 7. PSI based on OPRF.

Fig. 8. Parallel comparison of PSI on MAC, where n represents the security parameter; the unit is microseconds.

Fig. 9. Parallel comparison of PSI on mobile pads, where n represents the security parameter; the unit is microseconds.

Fig. 10. Comparison of PSI on mobile phones, where n represents the security parameter; the unit is microseconds.
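The high-level flow of a PSI from an OPRF (Definition 8 together with Fig. 7) can be illustrated with a toy script: once both parties hold consistent OPRF outputs on their elements, P₁ recovers the intersection by matching tags. A keyed hash stands in for the OPRF evaluation, and the OT-based key derivation of the actual protocol is omitted; every name below is a hypothetical placeholder.

```python
import hashlib
import secrets

# Stand-in for the jointly derived OPRF key; in the real protocol neither
# party learns the full key in the clear.
key = secrets.token_bytes(16)

def oprf_tag(key: bytes, item: str) -> bytes:
    """Hypothetical stand-in for an OPRF evaluation: a keyed hash tag."""
    return hashlib.sha256(key + item.encode()).digest()

set_p1 = {"alice", "bob", "carol"}
set_p2 = {"bob", "dave", "carol"}

# P2 sends only pseudorandom tags, which reveal nothing about non-matching items.
tags_p2 = {oprf_tag(key, y) for y in set_p2}

# P1 evaluates its own elements and keeps those whose tags match.
inter = {x for x in set_p1 if oprf_tag(key, x) in tags_p2}
assert inter == {"bob", "carol"}
```

Because the tags are pseudorandom, elements outside the intersection look like random strings to the other party, which is the intuition the hybrid proof below formalizes.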
Fig. 11. PIR based on OPRF.

Fig. 12. Parallel comparison of PIR on MAC, where n represents the security parameter; the unit is microseconds.

Lemma 7. Find a suitable pseudorandom function F̃_k: {0,1} × {0,1} → {0,1}. Assuming that the pseudorandom function F_k: {0,1} × {0,1} → {0,1} and the hash function ℋ₁: {0,1} → {0,1} are indistinguishable from uniform, we have
$$\tilde{F}_k(y) \approx_C F_k(\mathcal{H}_1(y)).$$

Proof. On one hand, because of the pseudorandom function F̃_k: {0,1} × {0,1} → {0,1}, for any k ∈ {0,1} and y ∈ 𝒴 ⊂ {0,1}, we have F̃_k(y) ≈_C u_ω ∈ {0,1}.

On the other hand, due to the pseudorandom function F_k: {0,1} × {0,1} → {0,1}, for u_{ℓ₁} ∈ {0,1} we have F_k(u_{ℓ₁}) ≈_C u_ω. According to the property of the hash function, we have ℋ₁(y) ≈_C u_{ℓ₁}. Combining with Lemma 6, one obtains F_k(ℋ₁(y)) ≈_C u_ω. Consequently, F̃_k(y) ≈_C F_k(ℋ₁(y)). □

Theorem 2. If ℋ₁ is a collision-resistant hash function and ℋ₂, ℋ₃ are Hamming correlation robust, then the protocol in Fig. 7 securely realizes PSI in the semi-honest model when the parameters m, ω are chosen as described in [14].

Proof. Perspective from P₁.

Hyb0: P₁'s view and P₂'s output in the real protocol.

Hyb1: Same as Hyb0 except that on P₂'s side, for each i ∈ [ω], if s[i] = 0, then sample A_i ← {0,1}^m and compute B_i = A_i ⊕ D_i; otherwise sample B_i ← {0,1}^m and compute A_i = B_i ⊕ D_i. This hybrid is identical to Hyb0.

Hyb2: Initialize an m × ω binary matrix D to all 1s. Denote its column vectors by D₁, …, D_ω; then D₁ = ⋯ = D_ω = 1^m. For y ∈ 𝒴, randomly select v ← [m]^ω and set D_i[v[i]] = 0 for all i ∈ [ω].

Hyb3: Find a suitable pseudorandom function F̃_k: {0,1} × {0,1} → {0,1}. For y ∈ 𝒴, compute ṽ = F̃_k(y), randomly select v ← [m]^ω, and set D_i[v[i]] = 0 for all i ∈ [ω].

Hyb4: Let there be a pseudorandom function F: {0,1} × {0,1} → {0,1} and a hash function ℋ₁: {0,1} → {0,1}. For y ∈ 𝒴, compute v′ = F_k(ℋ₁(y)), randomly select v ← [m]^ω, and set D_i[v[i]] = 0 for all i ∈ [ω].

Hyb5: Let there be a pseudorandom function F: {0,1} × {0,1} → {0,1}, Hamming correlation robustness ℋ₂: Z^{m×ω}_{{0,1}} → {0,1}, and a hash function ℋ₁: {0,1} → {0,1}. For y ∈ 𝒴, compute v′ = F_k(ℋ₁(y)) and v = ℋ₂(v′), and set D_i[v[i]] = 0 for all i ∈ [ω].

Given that Hyb0 ≈_C Hyb1 ≈_C Hyb2 ≈_C Hyb3 and Hyb4 ≈_C Hyb5, and since according to Lemma 7 it is known that Hyb3 ≈_C Hyb4, we therefore have Hyb0 ≈_C Hyb5.

Perspective from P₂.

Hyb0: P₂'s view in the real protocol.

Hyb1: ψ ← {0,1}; all other aspects are consistent with the real protocol.

Hyb2: Introduce G_γ: {0,1} → {0,1} and Hamming correlation robustness ℋ₃: Z^{m×ω}_{{0,1}} → {0,1}; let the initial matrices be C₁ = ⋯ = C_ω = 1^m, randomly select v ∈ [m]^ω, and set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(C₁[v[1]] ‖ ⋯ ‖ C_ω[v[ω]]).
Hyb3: Let the initial matrices be C₁ = ⋯ = C_ω = 1^m; find an appropriate pseudorandom function F̃_k: {0,1} × {0,1} → {0,1}. For y ∈ 𝒴, compute ṽ = F̃_k(y), randomly select v ← [m]^ω, and set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(C₁[v[1]] ‖ ⋯ ‖ C_ω[v[ω]]).

Hyb4: Let the initial matrices be C₁ = ⋯ = C_ω = 1^m; set a pseudorandom function F: {0,1} × {0,1} → {0,1}, a hash function ℋ₁: {0,1} → {0,1}, and Hamming correlation robustness ℋ₃: Z^{m×ω}_{{0,1}} → {0,1}. For y ∈ 𝒴, compute v′ = F_k(ℋ₁(y)), randomly select v ← [m]^ω, and set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(ℋ₃(C₁[v[1]] ‖ ⋯ ‖ C_ω[v[ω]])).

Hyb5: Let the initial matrices be C₁ = ⋯ = C_ω = 1^m; set a pseudorandom function F: {0,1} × {0,1} → {0,1}, a hash function ℋ₁: {0,1} → {0,1}, and Hamming correlation robustness ℋ₂: Z^{m×ω}_{{0,1}} → {0,1} and ℋ₃: Z^{m×ω}_{{0,1}} → {0,1}. For y ∈ 𝒴, compute v′ = F_k(ℋ₁(y)) and v = ℋ₂(v′). Set C_i[v[i]] = 0 for all i ∈ [ω]. Compute G_γ(ℋ₃(C₁[v[1]] ‖ ⋯ ‖ C_ω[v[ω]])).

Similarly, it can be proven that Hyb0 ≈_C Hyb5. □

Definition 16 (CPA Security Model of the Protocol in Fig. 7). Assume there exists a perturbed pseudorandom oracle machine PrM_γ (where γ is the upper bound on the norm of the perturbation in PrM_γ), such that for an input x it outputs two values: one is a random value y₀, and the other is a pseudorandom value y₁ with x as its input.

• Setup: The simulator 𝒮 generates the necessary parameters for the algorithms. The adversary 𝒜 chooses s and sends it to the simulator 𝒮 using OT.
• Hash Queries, PRF Queries and PRG Queries: The adversary 𝒜 sequentially performs hash function queries, pseudorandom function queries, and pseudorandom generator queries. Here, the adversary cannot learn the key through the pseudorandom function queries.
• Challenge: The adversary 𝒜 selects a private message m and sends it to the simulator 𝒮. The simulator queries the hash function, pseudorandom function, and oblivious transfer values of the real scheme, inputs these results into the pseudorandom oracle machine PrM_γ, obtains two ciphertexts c₀ and c₁, and sends them to the adversary 𝒜.
• Guessing: After receiving the two ciphertexts c₀ and c₁, 𝒜 guesses which ciphertext corresponds to the encryption of m and sends the guess back to the simulator 𝒮.

The advantage of the adversary 𝒜 is defined as the advantage of the simulator 𝒮 in distinguishing the outputs of PrM_γ.

Note 2. The PrM mentioned in this paper differs from that of [22]. In [22], PrM refers to a pseudorandom oracle machine that outputs random values when the adversary does not know the pseudorandom function key, and outputs pseudorandom function values based on the key when the key is known to the adversary; this is a single-value output. However, the PrM required in this paper outputs both of these values simultaneously, making it a multi-value output.

Theorem 3. If ℋ₁ is a collision-resistant hash function and ℋ₂, ℋ₃ are Hamming correlation robust, then the protocol in Fig. 7 securely realizes PSI in the sense of Definition 16.

Proof. Suppose the adversary P₁ can break the scheme with non-negligible advantage. Now, the simulator 𝒮 simulates the scheme. Suppose there exists a black box G_γ^blackbox such that
$$G_\gamma^{\mathrm{blackbox}}(x) \to (y_0, y_1), \qquad y_0 = G_\gamma(x) \in \{0,1\}, \quad y_1 \in_R \{0,1\}.$$

• Setup: The simulator 𝒮 generates the necessary parameters for the algorithms and selects appropriate hash functions ℋ₁: {0,1} → {0,1}, Hamming correlation robustness ℋ₂: {0,1} → [m]^ω and ℋ₃: Z^{m×ω}_{{0,1}} → {0,1}, a G_γ: {0,1} → {0,1}, and a pseudorandom function F: {0,1} × {0,1} → {0,1} with key k ∈ {0,1}. The adversary P₁ selects s and transmits s to the simulator 𝒮 using OT.
• H-Query, PRF-Query and PRG-Query: The adversary P₁ makes queries about the hash functions, pseudorandom function, oblivious transfer values, and pseudorandom generator. The simulator 𝒮 pre-establishes lists for handling H-Query, PRF-Query, and PRG-Query, respectively.
  - ℋ₁-Query: For the i-th query x_i ∈ {0,1} to ℋ₁, the simulator 𝒮 answers from the hash value list if available; otherwise it selects a random X_i ∈ {0,1}, sets X_i = ℋ₁(x_i), and updates the list accordingly.
  - ℋ₂-Query: For the i-th query y_i ∈ {0,1} to ℋ₂, the simulator 𝒮 answers from the hash value list if available; otherwise it selects a random Y_i ∈ [m]^ω, sets Y_i = ℋ₂(y_i), and updates the list accordingly.
  - ℋ₃-Query: For the i-th query z_i ∈ Z^{m×ω}_{{0,1}} to ℋ₃, the simulator 𝒮 answers from the hash value list if available; otherwise it selects a random Z_i ∈ {0,1}, sets Z_i = ℋ₃(z_i), and updates the list accordingly.
  - F-Query: For the i-th query u_i ∈ {0,1} to F, the simulator 𝒮 answers from the pseudorandom function value list if available; otherwise it selects a random U_i ∈ {0,1}, sets U_i = F(u_i, k), and updates the list accordingly.
  - G_γ-Query: For the i-th query w_i ∈ {0,1} to G_γ, the simulator 𝒮 answers from the pseudorandom generator value list if available; otherwise it selects a random W_i ∈ {0,1}, sets W_i = G_γ(w_i), and updates the list accordingly. Note that the G_γ queried here is not G_γ^blackbox.
• Challenge: P₁ selects a message m and sends it to 𝒮. Using the corresponding hash function queries and pseudorandom function queries, 𝒮 inputs the queried values into the black box G_γ^blackbox, obtains ψ₀ and ψ₁, and sends ψ₀, ψ₁ to P₁.
• Guess: Based on the received ψ₀ and ψ₁, P₁ guesses whether ψ₀ or ψ₁ is the ciphertext of the encrypted message m.

According to the assumption, if the adversary P₁ can break the scheme with a non-negligible advantage, then the simulator 𝒮 can also break the black box G_γ^blackbox with a non-negligible advantage. This contradicts the assumption that G_γ is secure. □

4.4. Efficiency analysis of PSI

This section simulates the PSI computation efficiency of this paper and the PSI in [14] on MAC, Pad, and Phone. The PRF of [14] is instantiated based on LWE.

4.4.1. Efficiency analysis on MAC

The tools used in this subsection are Python 3.12; the programs are run on a MacBook Air desktop with an Apple M1 and 8.00 GB RAM (see Fig. 8).

4.4.2. Efficiency analysis on mobile pad

The tools used in this subsection are Pydroid 3; the programs are run on a Xiaomi Pad 6 Pro (Qualcomm(R) AI Engine(TM) Snapdragon 8+ mobile platform @ 3.2 GHz, RAM 8.00+3.00 GB) (see Fig. 9).
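The query lists maintained by the simulator in the proof of Theorem 3 follow the standard lazy-sampling pattern: a fresh query is answered with a random value, and repeated queries replay the stored answer. The sketch below is a generic illustration of that bookkeeping, not the paper's exact simulator; all names are hypothetical.

```python
import secrets

class LazyOracle:
    """Query-list simulation: answer each fresh query with a random value
    and replay stored answers, so repeated queries remain consistent."""

    def __init__(self, out_bytes: int = 32):
        self.table = {}          # the simulator's pre-established list
        self.out_bytes = out_bytes

    def query(self, x: bytes) -> bytes:
        if x not in self.table:
            # Fresh query: sample a random answer and record it.
            self.table[x] = secrets.token_bytes(self.out_bytes)
        return self.table[x]

# One such oracle per list: H1-Query, H2-Query, H3-Query, F-Query, G_gamma-Query.
H1 = LazyOracle()
r1 = H1.query(b"y0")
r2 = H1.query(b"y0")  # replayed: identical to r1
r3 = H1.query(b"y1")  # fresh: independent of r1
assert r1 == r2 and r1 != r3
```

Consistency across repeated queries is exactly what makes the simulated view indistinguishable from one produced by a fixed random function.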
4.5. Analysis of efficiency on mobile phones

The tools used in this subsection are Pydroid 3; the programs are run on a Redmi K30 (Qualcomm(R) AI Engine(TM) Snapdragon 730G mobile platform @ 2.2 GHz, RAM 6.00 GB) (see Fig. 10).

4.5.1. Summary of data comparison

From the simulation results, it can be seen that for n ≤ 400 the LWE-based PSI in [14] is slightly faster, while for n > 400 the ring LPR-based PSI in this paper is faster. Furthermore, as n increases, the advantages of ring LPR become more pronounced. Based on the simulation results for the Pad, the PSI in this paper is more stable; although there are fluctuations, they are less significant compared to the LWE-based PSI in [14].

5. Expansion of this work

Private Information Retrieval (PIR) [23-29] is a technique that enables a client to securely download a specific element, such as a movie or a friend's record, from a database managed by an untrusted server, such as a streaming service or a social network, without disclosing to the server which particular element has been retrieved. Given the functional similarities between PIR and PSI, this paper extends its exploration into the construction of PIR using OPRF (see Fig. 11).

5.1. Efficiency analysis of PIR

This section simulates the PIR computation efficiency of this paper and the machine learning-based PIR in [30] (DLMI for short) on MAC. The tools used in this subsection are Python 3.12; the programs are run on a MacBook Air desktop with an Apple M1 and 8.00 GB RAM.

The OPRF-based PIR proposed in this paper has a runtime that differs from the machine learning-based PIR by no more than approximately 5 × 10⁻³ seconds. Additionally, the security of our PIR scheme is theoretically supported in comparison to [30] (see Fig. 12).

6. Conclusion

This paper presents a PSI based on an efficient post-quantum OPRF and proves its security under the semi-honest model, demonstrating security even in the CPA model of Definition 16. The addition of the PPRG enables the PSI to effectively resist probabilistic attacks. In the simulation experiments, the proposed PSI shows greater efficiency compared to post-quantum PSIs represented by LWE.

Although the PIR in this study is not as efficient as the machine learning-based PIR, the gap between the two is already quite small. However, there are also notable shortcomings; the efficiency of the proposed PSI still lags behind that of non-post-quantum PSIs, which will be addressed in future work.

CRediT authorship contribution statement

Zhuang Shan: Writing - original draft, Conceptualization. Leyou Zhang: Writing - review & editing, Writing - original draft. Qing Wu: Conceptualization. Qiqi Lai: Writing - review & editing. Fuchun Guo: Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Nature Science Foundation of China under Grant 61872087 and Grant 51875457; in part by the Key Foundation of National Natural Science Foundation of China under Grant U19B2021; and in part by the Key Research and Development Program of Shaanxi under Program 2022GY-028 and Program 2022GY-050.

Data availability

No data was used for the research described in the article.

References

[1] R. Lei, X. Chen, D. Liu, C. Song, Y. Tan, A. Ren, CEIU: Consistent and efficient incremental update mechanism for mobile systems on flash storage, J. Syst. Archit. 152 (2024) 103151, http://dx.doi.org/10.1016/j.sysarc.2024.103151.
[2] J. Sun, L. Yin, M. Zou, Y. Zhang, T. Zhang, J. Zhou, Makespan-minimization workflow scheduling for complex networks with social groups in edge computing, J. Syst. Archit. 108 (2020) 101799, http://dx.doi.org/10.1016/j.sysarc.2020.101799.
[3] Y. Gao, Y. Luo, L. Wang, X. Liu, L. Qi, W. Wang, M. Zhou, Efficient scalable multi-party private set intersection(-variants) from bicentric zero-sharing, in: Proceedings of the Conference on Computer and Communications Security, CCS, ACM, New York, NY, USA, 2024.
[4] M.O. Rabin, How to exchange secrets with oblivious transfer, 2005, URL: https://eprint.iacr.org/2005/187.
[5] O. Goldreich, S. Goldwasser, S. Micali, How to construct random functions, J. ACM 33 (4) (1986) 792-807, http://dx.doi.org/10.1145/6490.6503.
[6] M. Naor, O. Reingold, Number-theoretic constructions of efficient pseudo-random functions, J. ACM 51 (2) (2004) 231-262, http://dx.doi.org/10.1145/972639.972643.
[7] M.J. Freedman, Y. Ishai, B. Pinkas, O. Reingold, Keyword search and oblivious pseudorandom functions, in: J. Kilian (Ed.), Theory of Cryptography, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 303-324.
[8] S. Jarecki, X. Liu, Efficient oblivious pseudorandom function with applications to adaptive OT and secure computation of set intersection, in: O. Reingold (Ed.), Theory of Cryptography, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 577-594.
[9] V.K. Yadav, N. Andola, S. Verma, S. Venkatesan, A survey of oblivious transfer protocol, ACM Comput. Surv. 54 (10s) (2022), http://dx.doi.org/10.1145/3503045.
[10] M.R. Albrecht, A. Davidson, A. Deo, N.P. Smart, Round-optimal verifiable oblivious pseudorandom functions from ideal lattices, in: J.A. Garay (Ed.), Public-Key Cryptography - PKC 2021, Springer International Publishing, Cham, 2021, pp. 261-289.
[11] N. Tyagi, S. Celi, T. Ristenpart, N. Sullivan, S. Tessaro, C.A. Wood, A fast and simple partially oblivious PRF, with applications, in: O. Dunkelman, S. Dziembowski (Eds.), Advances in Cryptology - EUROCRYPT 2022, Springer International Publishing, Cham, 2022, pp. 674-705.
[12] S. Casacuberta, J. Hesse, A. Lehmann, SoK: Oblivious pseudorandom functions, in: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), 2022, pp. 625-646, http://dx.doi.org/10.1109/EuroSP53844.2022.00045.
[13] D. Boneh, D. Kogan, K. Woo, Oblivious pseudorandom functions from isogenies, in: S. Moriai, H. Wang (Eds.), Advances in Cryptology - ASIACRYPT 2020, Springer International Publishing, Cham, 2020, pp. 520-550.
[14] M. Chase, P. Miao, Private set intersection in the internet setting from lightweight oblivious PRF, in: D. Micciancio, T. Ristenpart (Eds.), Advances in Cryptology - CRYPTO 2020, Springer International Publishing, Cham, 2020, pp. 34-63.
[15] Z. Shan, L. Zhang, Q. Wu, Q. Lai, Analysis, modify and apply in IIOT form light-weight PSI in CM20, 2024, URL: https://eprint.iacr.org/2024/969.
[16] J. Alwen, S. Krenn, K. Pietrzak, D. Wichs, Learning with rounding, revisited, in: R. Canetti, J.A. Garay (Eds.), Advances in Cryptology - CRYPTO 2013, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 57-74.
[17] A. Banerjee, C. Peikert, A. Rosen, Pseudorandom functions and lattices, in: D. Pointcheval, T. Johansson (Eds.), Advances in Cryptology - EUROCRYPT 2012, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 719-737.
[18] D. Bellizia, C. Hoffmann, D. Kamel, H. Liu, P. Méaux, F.-X. Standaert, Y. Yu, Learning parity with physical noise: Imperfections, reductions and FPGA prototype, IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021 (2021) 390-417.
Z. Shan et al. Journal of Systems Architecture 160 (2025) 103346
[19] Y. Yu, J. Zhang, Smoothing out binary linear codes and worst-case sub- Leyou Zhang received the M.S. and Ph.D. degrees from Xid-
exponential hardness for LPN, in: T. Malkin, C. Peikert (Eds.), Advances in ian University, Xian, China, in 2002 and 2009, respectively.
Cryptology CRYPTO 2021, Springer International Publishing, Cham, 2021, pp. From 2013 to 2014, he served as a visiting scholar at the
473501. University of Wollongong, Australia. He currently worked
[20] V. Kolesnikov, R. Kumaresan, M. Rosulek, N. Trieu, Efficient batched oblivious in Xidian University as a professor.
PRF with applications to private set intersection, in: Proceedings of the 2016 His current research interests include public key cryp-
ACM SIGSAC Conference on Computer and Communications Security, CCS 16, tography, network security and computer security. He has
Association for Computing Machinery, New York, NY, USA, 2016, pp. 818829, over 120 scientific publications in many highly ranked
http://dx.doi.org/10.1145/2976749.2978381. cybersecurity journals and conferences.
[21] Z. Brakerski, E. Kirshanova, D. Stehlé, W. Wen, Learning with errors and
extrapolated dihedral cosets, in: Public-Key Cryptography PKC 2018, Springer
International Publishing, 2018, pp. 702727.
[22] A. Jain, H. Lin, J. Luo, D. Wichs, The pseudorandom oracle model and ideal
obfuscation, in: H. Handschuh, A. Lysyanskaya (Eds.), Advances in Cryptology
CRYPTO 2023, Springer Nature Switzerland, Cham, 2023, pp. 233262.
Qing Wu received the M.S. and Ph.D. degrees from Xidian University, Xi'an, China, in 2006 and 2009, respectively. She currently works with Xi'an University of Posts and Telecommunications, Xi'an, as a Professor. Her current research interests include artificial intelligence security and cloud security.

Qiqi Lai received the B.S. degree from PLA University of Information Engineering, Henan, China, in 2008, and the M.S. and Ph.D. degrees from Xidian University, Xi'an, China, in 2011 and 2015. He currently works with Shaanxi Normal University, Xi'an, as a Professor. His current research interests include the theory of lattice-based public key cryptography and its provable security, as well as the construction and analysis of homomorphic encryption schemes.

Fuchun Guo received the B.S. and M.S. degrees from Fujian Normal University, China, in 2005 and 2008, respectively, and the Ph.D. degree from the University of Wollongong, Australia, in 2013. He is currently an Associate Research Fellow with the School of Computing and Information Technology, University of Wollongong. His primary research interests include public key cryptography, in particular protocols, encryption and signature schemes, and security proofs.

Zhuang Shan received the B.S. degree from Liaoning Institute of Science and Technology, Benxi, China, in 2019, and the M.S. degree from North Minzu University, Yinchuan, China, in 2022. He is currently pursuing the Ph.D. degree in mathematics with Xidian University, Xi'an, China. His current interests include cryptography, reduction of hard problems in lattices, and network security.
Computer Standards & Interfaces 97 (2026) 104097
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
Fully decentralized period k-times anonymous authentication with access criteria
Hongyan Di a, Yinghui Zhang a,*, Ziqi Zhang a, Yibo Pang a, Rui Guo a, Yangguang Tian b
a School of Cyberspace Security, Xi'an University of Posts & Telecommunications, 710121, Xi'an, China
b University of Surrey, GU2 7XH, Surrey, UK
ARTICLE INFO

Keywords: Fully decentralized; Publicly auditable; Access criteria; Anonymous authentication; Signature proof of knowledge

ABSTRACT

The explosive growth of Internet user devices highlights the strong and urgent need for digital identity infrastructure. However, existing decentralized identity schemes are still not fully decentralized, and there remains a contradiction between publicly auditable credentials and maintaining anonymity. Therefore, using advanced cryptographic techniques such as signature proof of knowledge, Pedersen commitment, and Merkle tree, this paper proposes a fully decentralized period k-times anonymous authentication scheme with access criteria. The scheme allows user credentials to be publicly audited, lets users manage their identities independently, and enables the verifier not only to verify the user's identity but also to implement access control. The issuer does not need to hold a key or maintain a list, authentication remains possible even after the trusted center is attacked, and only three zero-knowledge proofs are needed for registration and verification. The security analysis indicates that this scheme satisfies unforgeability, anonymity, unlinkability and attribute privacy. Performance evaluation shows significant improvements in both computational and communication efficiency over existing schemes.
1. Introduction

With the surge in digital services accessed through network connections, the number of digital identities has seen an unprecedented increase. Therefore, the vast majority of the global population has at least one digital identity, which becomes the key to unlocking a variety of online functions and services. However, the concept of digital identity goes far beyond human identity recognition [1]. With the wide adoption of IoT and the powerful functions of the 5th Generation Mobile Communication Technology (5G) network, as well as the upcoming 6th Generation Mobile Communication Technology (6G), the number of connected devices has increased significantly [2]. These devices require unique digital identities to enable their participation in digital ecosystems, such as establishing secure communications.

Authentication and authorization are crucial security-related core tasks in the digital world. Their purpose is to ensure the authenticity of the identities of the communicating parties and implement access control over digital resources such as services. The core of this system is the concept of digital identity. The evolution of digital identity has gone through multiple eras, during which digital identity recognition has gradually shifted from centralized to decentralized identity models [3]. In fact, the way entities prove the ownership of digital identities may be affected by various vulnerabilities [4]. The current Internet ecosystem generally adopts the centralized Identity Provider (IdP) model, with tech giants such as Google and Facebook (e.g., Meta) serving as the custodians of digital identities. Other services can directly rely on the identity information provided by the IdP. Although this architecture simplifies the authentication process by achieving single sign-on through protocols such as OAuth, it has fundamental flaws when examined from the perspective of privacy protection: users lose control over their digital identities [5], and all their identity attributes are centrally stored in the IdP's servers. Users neither know the specific usage of these data nor can they effectively manage their flow. More seriously, this architecture has created a dangerous "data island" phenomenon: the IdP can fully
✩ This article is part of a Special issue entitled: "Information Security and Privacy" published in Computer Standards & Interfaces.
✩✩ This work is supported by the National Cryptologic Science Fund of China (2025NCSF02037), the National Natural Science Foundation of China (62072369), the Youth Innovation Team of Shaanxi Universities (23JP160), the Shaanxi Special Support Program Youth Top-notch Talent Program, the Technology Innovation Leading Program of Shaanxi (2023-YD-CGZH-31), the Technology Innovation Guidance Special Fund of Shaanxi Province (2024QY-SZX-17), and the Graduate Innovation Fund of Xi'an University of Posts and Telecommunications (CXJJBDL2024004).
* Corresponding author.
E-mail addresses: 15029659213@163.com (H. Di), yhzhaang@163.com (Y. Zhang), qiqizhang0408@163.com (Z. Zhang), ybpang1998@163.com (Y. Pang), guorui@xupt.edu.cn (R. Guo), yangguang.tian@surrey.ac.uk (Y. Tian).
URLs: https://www.xiyou.edu.cn/ (Y. Zhang), http://www.surrey.ac.uk (Y. Tian).
https://doi.org/10.1016/j.csi.2025.104097
Received 12 July 2025; Received in revised form 26 September 2025; Accepted 11 November 2025
Available online 19 November 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
grasp the cross-platform service usage trajectory and behavioral characteristics of users, essentially constructing a panoramic user profile. The IdP, on the other hand, can obtain information about all the network services used by users (and related usage data). When the server storing user data is invaded, sensitive personal information may be obtained by malicious attackers, causing significant loss of personal data and damaging the reputation of stakeholders [6]. In 2022 alone, there were over 1800 major data breaches worldwide, involving more than 400 million user records. The increasing number of data breach cases has raised significant concerns about data confidentiality and transparency in the field of digital identity management. In addition, centralized identity management systems rely on specific identity service nodes, making them vulnerable to the single point of failure problem [7].

Therefore, the increasing popularity of online services, the growing trend of decentralization, and the rising awareness of the shortcomings of traditional methods are paving the way for more secure and privacy-protecting approaches. Under this trend, supported by current laws and regulations (such as the General Data Protection Regulation (GDPR) of the European Union) [8], the concept of Self-Sovereign Identity (SSI) [9] has attracted significant attention from both academia and industry. SSI is based on the idea that individuals should have full control over their information without being forced to outsource data to any centralized institution or third party. Such technologies play a crucial role in establishing trust among entities (including humans and non-human entities such as IoT devices) and ensuring communication security through digital identities. Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), as effective solutions for enhancing privacy and security, have been promoted in multiple application fields such as intelligent transportation and smart healthcare. These standards can be extended to anyone or anything, covering cloud, edge, and IoT resources. It is worth noting that several institutions, including industry giants such as Microsoft, have recently developed and released a variety of implementation plans to support these technologies. In addition, global government agencies are also actively promoting the widespread application of DIDs and VCs. For instance, the European Union promulgated Regulation 2024/1183 [10] in May 2024, establishing the European digital identity framework, aiming to provide European citizens with digital passes for cross-border access to public and private services through the SSI system. This represents a significant milestone in the development of digital identity solutions.

However, current decentralized anonymous authentication schemes still face significant challenges. These include the inability to achieve full decentralization, a lack of mutual trust between users and issuers, and the persistent contradiction between public verifiability and true anonymity. Against this backdrop, AI-driven identity threat analysis has become a new focus of security research. Initiatives such as the Global Digital Identity Wallet (GDIW) have launched cross-border interoperability tests, while Digital Identity Chain has completed the integration of DIDs with the national government service platform; these efforts represent preliminary but critical explorations in addressing the underlying issues.

2. Related work

2.1. Decentralized anonymous credential (DAC)

In the 1980s, David Chaum [11,12] introduced privacy-preserving cryptographic techniques, aiming to create a more privacy-focused and user-centered authentication and authorization solution. They enable users to prove their membership, identity, or any other arbitrary attribute in a group in a privacy-preserving manner. Such techniques are often referred to as anonymous credentials (ACs), and various methods for building AC systems have been widely studied in the academic community. Since Camenisch and Lysyanskaya [13] first proposed a completely anonymous credential scheme in 2001, a large number of anonymous credential construction schemes suitable for various scenarios have emerged. These include zero-knowledge credentials, lightweight anonymous credentials without heavy zero-knowledge proofs and other computationally intensive operations, self-blinding credentials, group signatures, AC schemes without unlinkability, and post-quantum AC schemes. In order to reduce the trust dependence of the credential issuance process on a central authority in traditional anonymous credential schemes, Garman et al. [14] proposed the concept of decentralized anonymous credentials (DAC), which allows users to construct and manage credentials in a completely anonymous manner. Derler et al. [15] designed a new revocable multi-show attribute anonymous credential based on previous work, which has good scalability and constant operations for the two roles. Bui and Aura [16] developed a distributed access control revocation framework to facilitate the manipulation of revocation methods. Subsequently, Sonnino et al. [17] proposed a selective disclosure credential solution based on blind signatures and bilinear pairing, which yields short and highly efficient credentials. Inspired by Sonnino's work, Halpin [18] redesigned the tagging mechanism to improve scalability and support embedding arbitrary attributes. Cui et al. [19] constructed a Blockchain Digital Identity Management System (BDIdM) by extending the functional features of the DAC scheme [14], which enabled limited reusability of specific credentials on the premise of maintaining the security of the DAC scheme. In addition, decentralized anonymous credentials are widely integrated with other scenarios. Lin et al. [20] applied the DAC scheme to the smart grid scenario and enhanced the privacy protection mechanism. Solutions combined with blockchain-based Internet of Vehicles application scenarios include [21-25], and Zeng et al. [26] also applied anonymous credentials to cross-domain authentication in IIoT.

2.2. k-Time anonymous authentication (k-TAA)

k-Period anonymous authentication allows users to be authenticated up to k times within a certain time period while remaining anonymous. Teranishi et al. [27] introduced the first k-TAA scheme, allowing the identification of users who exceed the authentication limit. Nguyen and Safavi-Naini [28] extended this concept to dynamic k-TAA, enabling each authenticator to independently grant or revoke access rights. Au et al. [29] proposed a fixed-size dynamic k-TAA scheme. Chatterjee et al. [30] proposed a k-TAA scheme based on physically unclonable functions (PUFs), which is applicable to trusted platform modules (TPMs). Huang et al. [31] designed an efficient k-TAA system tailored for pay-as-you-go pricing, facilitating multiple service accesses and related payments within each certification cycle. However, many existing k-TAA schemes fail to provide periodic anonymous authentication. Although the existing schemes [32,33] support periodic anonymous authentication, they have deficiencies in supporting the selective disclosure of credential attributes to achieve fine-grained authentication. In addition, they require a large number of pairing operations, resulting in significant verification delays. In contrast, the schemes [34,35] support periodic k-times anonymous authentication while reducing cumbersome pairing operations. However, the scheme [34] does not support credential revocation. As shown in Table 1, our scheme, while meeting the above requirements, supports full decentralization and access control.

• Research Contributions

Next, we list the main research contributions of this paper.

− The Proposed Scheme: We propose a fully decentralized k-times period anonymous authentication scheme with access control. The scheme enforces both access criteria and authentication during the verification process, while eliminating the need for issuers to hold keys or maintain lists, thus remaining secure even if the trusted center is compromised. Only three zero-knowledge proofs are required for registration and verification.

− Security Analysis: We conducted a correctness and theoretical security analysis based on the game definition of the proposed
Table 1
Function comparison.
Security features [29] [30] [31] [33] [19] [34] [35] Our Scheme
Anonymity ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Unlinkability ✓ N.A ✓ N.A ✓ ✓ ✓ ✓
𝑘-times period anonymous authentication × × ×× ✓ N.A ✓
Publicly auditable N.A × N.A N.A ✓ ✓ ✓ ✓
Select attribute disclosure × × × × ✓ ✓ N.A ✓
Key forward and backward secure ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Reveal violator's identity without TTP ✓ ✓ × ✓ ✓ ✓ ×
Issuer not hold key and identity list × × × × × × ×
Support credential revocation ✓ ✓ ✓ ✓ ✓ × ✓ ✓
Note*: ✓: Support this feature; ×: Does not support this feature; N.A: No applicable; TTP: Trusted third party.
scheme. By simulating games and invoking programmable random oracles and forking lemmas, among other techniques, we demonstrate that the scheme meets the requirements of unforgeability, anonymity, unlinkability, and attribute privacy. This analysis emphasizes that the scheme protects the integrity and validity of the data.

− Performance Evaluation: We conducted a detailed analysis of this authentication scheme, demonstrating its efficiency advantages over existing authentication schemes. Tests were also carried out on the secp256k1 and BLS12-381 curves, verifying that the proposed algorithm performs better on lightweight curves.

• Structure of Paper

The remainder of the paper is structured as follows: Section 3 introduces the problem assumptions and fundamentals. Section 4 defines the syntax, security model, and detailed construction of the scheme. Section 5 analyzes its correctness and theoretical security. Section 6 evaluates performance in terms of computation and communication overhead, and Section 7 concludes the paper.

3. Preliminaries

3.1. Group description and hardness assumptions

A group generator GGen(1^κ) → (G, q) takes a security parameter κ and outputs a cyclic group G of prime order q. This scheme is based on the following hardness assumptions.

Definition 2.1 (Discrete Logarithm Problem (DLP) Assumption). Let g be a generator of a group G. Given a tuple (g, g^a) ∈ G^2, where a ∈ Z_q, the Discrete Logarithm Problem is to output a. The DLP assumption holds if for every PPT adversary A the advantage

Adv_A^DLP(κ) = Pr[A(g, g^a) = a] ≤ negl(κ)

is negligible.

Definition 2.2 (Decisional Diffie-Hellman (DDH) Assumption). Let G be a group of large prime order q and let g be a generator of G. The input is either a random quadruple D = (g, g^x, g^y, g^xy) ∈ G^4 or a quadruple R = (g, g^x, g^y, g^z) ∈ G^4, where x, y, z ← Z_q. It is computationally hard for an adversary A to distinguish the two tuples; that is, the advantage of every PPT adversary A

Adv_A^DDH(κ) = |Pr[A(D) = 1] − Pr[A(R) = 1]| ≤ negl(κ)

is negligible.

Definition 2.3 (Computational Diffie-Hellman (CDH) Assumption). Let G be a cyclic group of order q with generator g. Given the tuple D = (g, g^a, g^b), where a, b ← Z_q, computing g^ab is hard. For every probabilistic polynomial-time (PPT) algorithm A, the probability of solving the CDH problem

Adv_A^CDH(κ) = Pr[A(g, g^a, g^b) = g^ab] ≤ negl(κ)

is negligible, where κ is a security parameter and negl(κ) denotes a negligible function.

3.2. Zero-knowledge proof

A signature proof of knowledge (SPK) is a non-interactive zero-knowledge proof (ZKP) technique that enables a prover to demonstrate knowledge of a secret value without revealing it, while also signing a message. We construct a cyclic group G of prime order q and employ the Fiat-Shamir heuristic [36] to convert an interactive proof into a non-interactive one. These non-interactive constructs are precisely referred to as signature proofs of knowledge (SPKs). All the signatures of knowledge are secure in the random oracle model. Following the notation introduced by Camenisch and Stadler [37], PoK{(x) : y = g^x} denotes the zero-knowledge proof protocol between the prover and the verifier, in which the prover knows x ∈ Z_q such that y = g^x ∈ G. The corresponding non-interactive signature proof of knowledge on a message m is written SPK{(x) : y = g^x}(m). It can be regarded as a signature on the message m, produced with a key pair (g^x, x) based on discrete logarithms.

3.3. Pedersen commitment

Literature [38] uses Poseidon to realize the hashes of the Merkle tree and the commitment. Our scheme instead instantiates Pedersen hashing and perfectly hiding commitments. The Pedersen commitment algorithm is as follows:

• Gen(1^κ) → ck: Select a finite group G with a large prime order q, and choose two generators g and h from G. The parameters of this commitment scheme are ck = (G, q, g, h).
• Commit(ck, u) → c: Generate a commitment c to a secret value u. The committer randomly selects a blinding factor r and computes c = g^u h^r.
• OpenCom(ck, c, u, r) → 0/1: The verifier checks whether c is equal to g^u h^r.

3.4. Merkle tree

In the proposed scheme, the Merkle tree T is used to represent the membership of the set. The root of the tree T is denoted T_root. The Merkle tree supports the following functions:

• T.Insert(v) → T: Inserts the value v into the next available leaf of T and returns the modified tree.
• T.Remove(v) → T: Removes v from the tree, if it exists, and returns the modified tree T.
• T.AutPat(v) → θ: Generates an authentication path θ that proves v ∈ T. The size of θ is proportional to the height of the tree, ensuring efficient verification in cryptographic protocols.
Table 2
Summary of notations.

Symbol | Description
U, I, V | User, Issuer, Verifier
λ | Security parameter
h | The maximum height of the Merkle tree
m | The maximum number of attributes
n | The number of access criteria the verifier is allowed to define
ι_pub, ι_zk | Issuance criteria over auxiliary information, verified when a request is issued
iaux_zk, iaux_pub | Auxiliary information when requesting registration
φ_i | The i-th access criterion defined by the verifier
aux_i | Auxiliary information for showing proofs
Attrs = {attr_i}_{i=1}^m | The i-th attribute of the user and the attribute set
w | Witness collection
ctx | Context information
I, V | Collections of issuance criteria and access criteria
Π_U^1, Π_V^1, Π̃ | Zero-knowledge proofs generated by the user and the issuer
s ← Z_q | A secret random number randomly selected by the issuer
θ | The authentication path generated by the Merkle tree
T_root, T_κ, T'_κ | Merkle tree root, Merkle tree, updated Merkle tree

Note*: ι, φ → {0, 1} are predicates over the user's attributes that need to be satisfied in order to pass verification, i.e., verification passes only if ι_pub(iaux_pub) = 1 and φ(Attrs, aux) = 1.
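The Pedersen commitment of Section 3.3 (Gen / Commit / OpenCom) can be sketched as follows. The toy Schnorr group below uses illustrative sizes only, and the assumption that log_g(h) is unknown is what gives the binding property.

```python
import secrets

# Toy Schnorr group: p = 2q + 1 with both prime; the squares mod p form the
# subgroup of prime order q. Sizes here are illustrative only -- a real
# deployment uses cryptographic parameters or an elliptic-curve group.
p, q = 1019, 509
g, h = 4, 9          # two subgroup generators; log_g(h) is assumed unknown

def gen():
    """Gen(1^kappa) -> ck."""
    return (p, q, g, h)

def commit(u, r=None):
    """Commit(ck, u) -> (c, r): c = g^u * h^r with random blinding factor r."""
    if r is None:
        r = secrets.randbelow(q)
    return pow(g, u, p) * pow(h, r, p) % p, r

def open_com(c, u, r):
    """OpenCom(ck, c, u, r) -> 0/1: check c == g^u * h^r."""
    return c == pow(g, u, p) * pow(h, r, p) % p

c, r = commit(42)
assert open_com(c, 42, r)          # a correct opening verifies
assert not open_com(c, 43, r)      # a different value fails to open
```

Since Commit(u1, r1) · Commit(u2, r2) = Commit(u1 + u2, r1 + r2), the commitment is additively homomorphic, which is what allows the registration phase (Section 4.2.2) to commit to several values under one group element.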
3.5. Pseudo-Random Function (PRF)

A Pseudo-Random Function (PRF) is a family of computable functions {F_k}, where k is a key and F_k maps the input space to the output space. For an ideal PRF, when the key k is unknown, its output is computationally indistinguishable from that of a truly random function. We require a PRF with an efficient proof of correct evaluation, and adopt the specific PRF construction proposed by Dodis and Yampolskiy [39] (DY-PRF). The DY-PRF is defined by the tuple (G, q, g, s), where G = ⟨g⟩ is a cyclic group of prime order q and s ∈ Z_q. For an input k, the value is defined as PRF_{g,s}(k) := g^{1/(s+k+1)}. There exists an efficient proof of correct formation for the output, and as long as the q-DDHI assumption holds, the output PRF_{g,s}(k) is indistinguishable from a random element of G.

4. Proposed scheme

In this section, we first collect in Table 2 all the symbols involved together with their meanings, and then define the syntax and design the scheme.

4.1. Syntax and security model

4.1.1. Security definition

The security of the system is defined by the standard properties of anonymous credentials, including unforgeability, anonymity, unlinkability, and attribute privacy. In our model, the attacker is assumed to have only polynomial-time computational capability, and all communications occur over open channels.

Threat Model. Our model considers external attackers intercepting or modifying communications without breaking hard cryptographic problems; internal attackers misusing valid credentials for forgery, transfer, or linking attacks; semi-honest verifiers inferring user identities or attributes while following the protocol; and trusted-but-curious issuers complying with the protocol but attempting to snoop on user data.

4.1.2. Syntax definition

Referring to the ideal functionality F in [38], the zk-creds anonymous credential approach realizes F using Groth16 [40], which is not suitable for authentication. In this work, F is instantiated using signatures of knowledge, resulting in an algorithm that meets the authentication requirements. The specific algorithms are as follows:

• Setup(1^λ, 1^h, 1^m) → pp: The algorithm takes the security parameter λ, the maximum height h of the Merkle tree, and the maximum number m of attributes in a credential, and generates the system parameters pp.
• IssueSetup_I(pp) → (I, ι_pub): The algorithm takes the public parameters pp and outputs the issuance criteria set I and the issuance criterion ι_pub for verifying public auxiliary information.
• SowSetup_V(pp) → V: The verifier sets up n access criteria to define the user's access policy. This algorithm outputs a collection of access criteria V = {φ_1, φ_2, ..., φ_n}, where each φ_i represents an access criterion.
• IssueReq_U(pp, I, Attrs, w, ctx, (iaux_zk, iaux_pub)) → ((Cm, Π_U^1, iaux_zk), iaux_pub): The issue request algorithm takes the public parameters pp, the issuance criteria I, the attribute set Attrs of U, the secret value w, the context ctx, and the auxiliary information (iaux_zk, iaux_pub). U generates the proof Π_U^1 associated with iaux_zk and outputs ((Π_U^1, iaux_zk), iaux_pub).
• IssueGrant_I(pp, (I, ι_pub), (Π_U^1, iaux_zk), iaux_pub) → (s'', (θ, T_root), k, T_κ): The algorithm takes the zero-knowledge signature Π_U^1 and the auxiliary information (iaux_zk, iaux_pub). Then I returns the random value s'', the authentication path θ and the number of times k to U, and keeps the locally generated Merkle tree T_κ.
• SowCred_U(pp, V, T_root, cred, θ, {w_i, aux_i}_{i=1}^n) → (Π̃, {aux_i}_{i=1}^n): U takes the root T_root of the membership tree, the credential cred, and the authentication path θ. U shows that the presented credential satisfies the access criteria φ_i and proves that the presented credential belongs to the tree T_κ. The algorithm then outputs (Π̃, {aux_i}_{i=1}^n).
• VerifySow_V(pp, V, (cred, T_root), (Π̃, {aux_i}_{i=1}^n)) → 0/1: V verifies that the credential cred presented by U meets the access criteria and that cred belongs to the Merkle tree T_κ, outputting 0/1.
• RevokeCred_I(pp, T_κ, cred) → T'_κ: I revokes the cred registered by a dishonest user and updates the Merkle tree T_κ to T'_κ.

4.1.3. Security requirements

The scheme is required to satisfy the following security requirements:

− Unforgeability: Attackers cannot forge valid credentials and deceive verifiers into accepting them. This game is reduced to the discrete logarithm or CDH problem.
− Anonymity: Credentials are presented without revealing the user's identity. This game is reduced to the DDH problem.
− Unlinkability: Different presentations of the same credential cannot be linked, even if the Merkle path remains identical across multiple authentications.
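A minimal sketch of the DY-PRF variant of Section 3.5, PRF_{g,s}(k) = g^{1/(s+k+1)}, over a toy Schnorr group (illustrative sizes only; the inverse is taken modulo the group order q, and evaluation is undefined in the negligible case s + k + 1 ≡ 0 mod q):

```python
# DY-PRF sketch: PRF_{g,s}(k) = g^(1 / (s + k + 1) mod q) in a group of
# prime order q. Toy parameters for illustration only.
p, q = 1019, 509     # p = 2q + 1, both prime
g = 4                # generator of the order-q subgroup of Z_p^*

def dy_prf(s: int, k: int) -> int:
    e = pow((s + k + 1) % q, -1, q)   # inverse of (s + k + 1) modulo the group order
    return pow(g, e, p)

# The "efficient proof of correct formation" rests on the algebraic check
# that raising the output back to (s + k + 1) recovers g: PRF(k)^(s+k+1) = g.
s, k = 123, 7
y = dy_prf(s, k)
assert pow(y, s + k + 1, p) == g
```

In the scheme, evaluations of this shape produce the per-epoch tags η and Γ, and the relation PRF(k)^{s+k+1} = g is the kind of statement the signatures of knowledge in Section 4.2 can prove without revealing the key.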
Fig. 1. System Model.
− Attribute Privacy: Attributes are hidden when credentials are presented, unless the access policy requires them to be disclosed.

Security is analyzed using a formal game-based model [41] under the random oracle assumption [42]. The games are defined as follows:

Game 1: Unforgeability Game

Setup. The challenger C_1 runs the system initialization algorithm Setup(1^λ, 1^h, 1^m) to generate pp and sends pp to the adversary A_1. C_1 keeps the issuer private key isk.

Query. In this phase, the adversary A_1 can issue three kinds of queries, as follows:

1. H-Query: A_1 queries the random oracles H_1, H_2, H_3; C_1 responds randomly and records the answers.
2. Query2: A_1 queries the issuer to register a credential; C_1 uses the simulator S to simulate the interaction between IssueReq and IssueGrant, using the programmability of the random oracle to generate a valid SPK_2.
3. Query3: A_1 queries credential presentation; C_1 simulates the interaction between SowCred and VerifySow, and simulates SPK_3 using a zero-knowledge simulator.

Forgery. A_1 outputs a forged credential cred* and a corresponding Merkle tree path θ*, such that cred* is not on the list of previously issued credentials and VerifySow accepts cred* and θ*. A_1 wins conditioned on outputting a valid forged credential.

Game 2: Anonymity and Unlinkability Game

Setup. The challenger C_2 runs the system initialization algorithm Setup(1^λ, 1^h, 1^m) to generate pp and sends pp to the adversary A_2. C_2 keeps the issuer private key isk.

Query. The adversary A_2 can continue to query issuance and presentation, but cannot query revocation or presentation of the challenge credentials.

Challenge. The adversary A_2 selects the identities and attribute sets of two users, (I_0, Attrs_0) and (I_1, Attrs_1), which satisfy the same access policy, and sends them to the challenger C_2. C_2 randomly selects b ← {0, 1}, generates a credential for I_b and presents it (i.e., runs SowCred to generate Π_b), and then gives Π_b to A_2.

Guess. A_2 outputs b' and wins if b' = b.

4.2. Scheme construction

In this scheme, the user is untrusted, the issuer is semi-trusted, the channel between the verifier and the issuer is trusted, and the remaining channels are untrusted. Attackers can steal information from untrusted channels, forge information and impersonate users. Therefore, this paper adopts zero-knowledge proofs to let the user verify the credential sent by the issuer and to prove to the verifier that the credential is the user's own, while also reducing the risk of privacy leakage, as shown in Fig. 1.

• Issuer: The issuer is the party that issues the credential, usually an authority or trusted entity (such as a government, enterprise, or decentralized organization), which is responsible for verifying the identity or attributes of the user and generating the encrypted credential. Before sending the credential, the issuance criteria are verified.
• User: The user is the holder of the credential, requests the credential from the issuer and, upon receipt, verifies the credential.
• Verifier: The verifier is the receiver of credentials, who receives the user's credentials, downloads the criteria and auxiliary verification data through a secure channel, verifies the access criteria, and then verifies the user's identity.

4.2.1. System initialization

Setup(1^λ, 1^h, 1^m) → pp

− I selects a cyclic group G of order q and generates generators (g_0, g_1, g_2, γ, h_0, h_1, h_2, ũ, {u_i}_{i∈[0,n]}) ∈ G, along with hash functions H_1: {0,1}* → Z_q and H_2: {0,1}* × {0,1}* → Z_q;
− Define a Merkle tree of height h, where for public input (T_root, cred) it can be proved that cred ∈ T_κ through an authentication path θ;
− Define the global period epoc and the pseudorandom function PRF_{g,s}(k) := g^{1/(s+k+1)};
− I selects random numbers y_1, y_2 ← Z_q, computes Y_1 = h_1^{y_1} and Y_2 = h_2^{y_2}, and sets the issuer secret key isk = (y_1, y_2) and the issuer public key ipk = (Y_1, Y_2);
− Set the public parameters pp = (q, G, g_0, g_1, g_2, γ, h_0, h_1, h_2, {u_i}_{i∈[0,n]}, H_1, H_2, T_κ, T_root, epoc, ũ, ipk).

IssueSetup_I(pp) → (I, ι_pub)

− Define the relevant issuance criteria ι = (ι_zk, ι_pub) and set IssueCriteria[I] = IssueCriteria[I] ∪ ι;
− For the public input auxiliary information iaux_zk, prove: ι_zk(Attrs, iaux_zk) = 1;
− Publish (I, ι_pub).

SowSetup_V(pp) → V

− V defines access criteria φ for user attributes Attrs (multiple access criteria φ_i can be defined), and sets AccessCriteria[V] = AccessCriteria[V] ∪ {φ_i};
− For public input (T_root, cred, aux), prove: φ(Attrs, aux) = 1 ∧ cred ∈ T_κ;
− Publish the access criteria set V.
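The SPK notation of Section 3.2 can be illustrated with its simplest instance, SPK{(x) : y = g^x}(m): a Schnorr proof made non-interactive with the Fiat-Shamir heuristic. The group parameters below are toy values for illustration only.

```python
import hashlib
import secrets

p, q, g = 1019, 509, 4   # toy Schnorr group (p = 2q + 1); illustrative only

def _hash(*parts) -> int:
    """Fiat-Shamir challenge: hash the statement and message into Z_q."""
    data = b"|".join(str(x).encode() for x in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def spk_sign(x: int, m: bytes):
    """SPK{(x) : y = g^x}(m), i.e. a Schnorr signature under key pair (g^x, x)."""
    a = secrets.randbelow(q)          # prover's ephemeral secret
    A = pow(g, a, p)                  # commitment
    c = _hash(g, pow(g, x, p), A, m)  # challenge binds statement and message
    z = (a + c * x) % q               # response
    return A, z

def spk_verify(y: int, m: bytes, proof) -> bool:
    A, z = proof
    c = _hash(g, y, A, m)
    return pow(g, z, p) == A * pow(y, c, p) % p   # g^z ?= A * y^c

x = secrets.randbelow(q)
y = pow(g, x, p)
proof = spk_sign(x, b"some message")
assert spk_verify(y, b"some message", proof)
```

The SPK_1 and SPK_3 used in Section 4.2 are conjunctions of several such discrete-logarithm relations sharing one challenge, but each clause follows the same commit-challenge-respond shape.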
4.2.2. Credential registration

IssueReq_U(pp, I, Attrs, w, ctx, (iaux_zk, iaux_pub)) -> (Cm, Pi^1_U, iaux_zk, iaux_pub)

 Generate the anonymous key nk and the rate-limiting key rk using the pseudorandom function PRF and the context ctx: nk = PRF(ctx), rk = PRF(epoch ∥ ctx). Define m attributes Attrs = {attr_1, attr_2, ..., attr_m};
 Select a random blind factor r <- Zq and compute the Pedersen commitment Cm ∈ G:

   Cm = Commit(nk, rk, Attrs; r) = g1^{nk} g2^{rk} (∏_{i=1}^{m} u_i^{H1(attr_i)}) u_0^{r};

 Set w = (r, nk, rk, Attrs) (the collected private witness w), select x_u, s', t <- Zq and generate Pi^1_U:

   Pi^1_U = SPK1{ (x_u, s', t, r, nk, rk, Attrs) :
       X_u = g1^{x_u} g2^{s'}
     ∧ zeta = Y1^{x_u} Y2^{s'} · Cm^{t}
     ∧ iota_zk(Attrs, iaux_zk) = 1 }(X_u, zeta, iaux_zk, iaux_pub);

 Send (Pi^1_U, X_u, zeta, iaux_zk, iaux_pub) to the issuer I;
 On receiving Pi^1_V: if verification passes, receive the returned authentication path theta together with s'' and k;
 Locally store (nk, rk, r, Attrs, theta, s, t, epoch, k), where s = s' + s'' and k is the maximum number of accesses allowed within epoch.

IssueGrant_I(pp, (I, iota_pub), (Pi^1_U, iaux_zk), iaux_pub) -> (cred, s'', (theta, T_root), k, T_kappa)

 Verify iota_pub(iaux_pub): iota_pub checks the public auxiliary information iaux_pub;
 Verify Pi^1_U = SPK1, where Pi^1_U proves the correctness of (zeta, X_u, iaux_zk, iaux_pub) and that the hidden attributes satisfy the issuance criteria iota_zk. If verification fails, reject issuance and abort (⊥);
 Otherwise, I randomly selects s'' <- Zq, defines the maximum number of accesses k allowed within epoch, and calculates cred = (zeta · Y2^{s''}) · u_0^{H2(epoch ∥ k)}. It then runs T_kappa = T.Insert(cred) to register the anonymous credential, where the registered cred is known only to the issuer. Next, it runs theta = T_kappa.AuthPath(cred) to generate the authentication path, updates the Merkle tree root T_root, and uploads it to a public panel such as a blockchain;
 Next, select z0, z1 <- Zq and generate Pi^1_V:

   Pi^1_V = SPK2{ (z0, z1, y1, y2) :
       Y_u = g1^{y1} g2^{y2}
     ∧ U = (zeta · Y2^{s''})^{z1} · u_0^{H2(epoch ∥ k) · z0} }(Y_u, s'', k, U);

 I stores the Merkle tree T_kappa and sends (Pi^1_V, s'', k, theta) to the user.

4.2.3. Show and verification of the credential

ShowCred_U(pp, V, T_root, cred, theta, {w_i}, {aux_i}_{i=1}^{n}) -> (Pi~, {aux_i}_{i=1}^{n})

 The user sends an access request message msg, and the verifier returns a random number R = H2(nonce ∥ msg);
 The user locally retrieves the verifier's access criteria V and the root node T_root of the tree containing cred;
 Upon receiving (nonce, R), verify R ?= H2(nonce ∥ msg), then randomly select alpha0 <- Zq. For the n access criteria Phi' = {phi_1, phi_2, ..., phi_n}, partition the attribute set into public attributes ATTR and secret attributes {attr_{j ∉ ATTR}}. Compute the commitment using the blind factor r:

   Cm = Commit(nk, rk, {attr_{j ∉ ATTR}}; r)
      = ( g1^{nk} g2^{rk} ∏_{attr_j ∉ ATTR} u_j^{H1(attr_j)} u_0^{r} ) · ∏_{attr_i ∈ ATTR} u_i^{H1(attr_i)};

 Next, the number of credential displays is initialized to n_j = 1, and n_j = n_j + 1 (0 <= n_j < k) is set for each generation of the zero-knowledge proof Pi~ = SPK3. The generation of Pi~ = SPK3 is as follows:

   Pi~ = SPK3{ (nk, rk, Attrs, alpha0, x_u, s, t, n_j, {attr_{j ∉ ATTR}}) :
       X0 = g0^{alpha0} gamma^{H1(theta)}
     ∧ zeta = Y1^{x_u} Y2^{s} · Cm^{t}
     ∧ eta = PRF_{rk,u~}(n_j) = u~^{1/(rk + n_j + 1)}
     ∧ Gamma = u_0^{x_u · R} · PRF_{nk,u~}(n_j) = u_0^{x_u · R} · u~^{1/(nk + n_j + 1)}
     ∧ 0 <= n_j < k
     ∧ phi_1(Attrs, aux_1) = 1
     ∧ ...
     ∧ phi_i(Attrs, aux_i) = 1 }({aux_i}, X0, zeta, eta, Gamma, T_root);

 Send (Pi~, {aux_i}_{i=1}^{n}, X0, zeta, eta, Gamma, (theta, T_root), Phi', {attr_{i ∈ ATTR}}) to the verifier V.

VerifyShow_V(pp, (V, cred, T_root), (Pi~, {aux_i}_{i=1}^{n})) -> 0/1

 V checks whether the user's submitted Phi' matches its defined access criteria set Phi. Using theta, it verifies the path and checks cred ?= zeta · u_0^{H2(epoch ∥ k)};
 If (eta, Gamma) is valid, this proves that n_j is within the range allowed to be displayed within epoch;
 If verification succeeds, accept the request; otherwise reject it and invoke the RevokeCred function to revoke cred. For the specific process, please refer to Fig. 2.

4.2.4. Credential revocation

RevokeCred(pp, T_kappa, cred) -> T_kappa'

 Search for cred ∈ T_kappa; if cred is not found, terminate the process;
 Otherwise run T_kappa' = T_kappa.Remove(cred), then store and update the Merkle tree T_kappa';
 Return T_kappa' and publicly announce that cred has been revoked.

5. Analysis of correctness and security

5.1. Correctness analysis

5.1.1. Details of SPK1

SPK1 can be implemented using standard discrete-logarithm proof techniques.

1. (Commitment.) The user randomly selects s1, s2, s3 ∈_R Zq and computes:

   T1 = g1^{s1} g2^{s2},   T2 = Y1^{s1} Y2^{s2} · Cm^{s3} = (g^{y1})^{s1} (g^{y2})^{s2} · Cm^{s3}.

2. (Challenge.) The scheme uses a non-interactive zero-knowledge proof, where the user generates the challenge c:

   c = H(T1 ∥ T2 ∥ X_u ∥ zeta ∥ iaux_zk ∥ iaux_pub).

3. (Proof.) The user generates the proof Pi^1_U that satisfies the issuer policy iota_zk, i.e., iota_zk(Attrs, iaux_zk) = 1, and computes S1 = s1 - c·x_u, S2 = s2 - c·s', S3 = s3 - c·t. The proof is Pi^1_U = (c, S1, S2, S3), and the user sends ((Pi^1_U, iaux_zk), iaux_pub) to the issuer I.

4. (Verify.) I computes T1' = X_u^{c} g1^{S1} g2^{S2} and T2' = zeta^{c} Y1^{S1} Y2^{S2} · Cm^{S3}, and verifies c ?= H(T1' ∥ T2' ∥ X_u ∥ zeta ∥ iaux_zk ∥ iaux_pub). If verification passes, then Pi^1_U is correct; otherwise abort.
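The SPK1 steps above follow the usual commit/challenge/response pattern with a Fiat–Shamir challenge. The following is a minimal runnable sketch for the first clause of the statement (knowledge of (x_u, s') with X_u = g1^{x_u} g2^{s'}); the toy Schnorr group and the bases g1, g2 are illustrative assumptions, not the paper's parameters.

```python
import hashlib
import secrets

# Toy Schnorr group (p = 2q + 1); stands in for the paper's group G of prime order q.
p, q = 2039, 1019
g1, g2 = 4, 9  # two order-q bases (hypothetical toy values)

def fiat_shamir(*vals) -> int:
    """Non-interactive challenge c = H(T1 || Xu), reduced into Zq."""
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def spk1_prove(xu: int, s: int):
    """Prove knowledge of (xu, s) with Xu = g1^xu * g2^s (steps 1-3 of SPK1)."""
    Xu = pow(g1, xu, p) * pow(g2, s, p) % p
    r1, r2 = secrets.randbelow(q), secrets.randbelow(q)  # commitment randomness
    T1 = pow(g1, r1, p) * pow(g2, r2, p) % p             # commitment
    c = fiat_shamir(T1, Xu)                              # challenge
    S1, S2 = (r1 - c * xu) % q, (r2 - c * s) % q         # responses
    return Xu, (c, S1, S2)

def spk1_verify(Xu: int, proof) -> bool:
    """Recompute T1' = Xu^c * g1^S1 * g2^S2 and re-derive the challenge (step 4)."""
    c, S1, S2 = proof
    T1 = pow(Xu, c, p) * pow(g1, S1, p) * pow(g2, S2, p) % p
    return c == fiat_shamir(T1, Xu)

Xu, proof = spk1_prove(secrets.randbelow(q), secrets.randbelow(q))
assert spk1_verify(Xu, proof)
```

The correctness argument is the same as in step 4 of SPK1: T1' = g1^{c·xu + S1} g2^{c·s + S2} = g1^{r1} g2^{r2} = T1, so the recomputed challenge matches.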
Fig. 2. System flowchart.

5.1.2. Details of SPK2

SPK2 can also be implemented using standard discrete-logarithm proof techniques.

1. (Commitment.) The issuer/trust authority randomly selects t1, t2, t3, t4 ∈_R Zq and computes:

   C1 = g1^{t1} g2^{t2},   C2 = (zeta · Y2^{s''})^{t3} · u_0^{H2(epoch ∥ k) · t4}.

2. (Challenge.) The scheme uses a non-interactive zero-knowledge proof, where the issuer generates the challenge c:

   c = H(C1 ∥ C2 ∥ Y_u ∥ U ∥ s'' ∥ k).

3. (Proof.) The issuer generates the proof Pi^1_V by computing the responses C_1 = t1 - c·y1, C_2 = t2 - c·y2, C_3 = t3 - c·z1, C_4 = t4 - c·z0. The proof is Pi^1_V = (c, C_1, C_2, C_3, C_4), and I sends (Pi^1_V, s'', k) to the user.

4. (Verify.) The user computes C1' = Y_u^{c} g1^{C_1} g2^{C_2} and C2' = U^{c} (zeta · Y2^{s''})^{C_3} · u_0^{H2(epoch ∥ k) · C_4}, and verifies c ?= H(C1' ∥ C2' ∥ Y_u ∥ U ∥ s'' ∥ k). If verification passes, then Pi^1_V is correct; otherwise abort.

5.1.3. Details of SPK3

The construction of SPK3 includes a zero-knowledge proof and a range proof. We divide SPK3 into two parts, SPK3A and SPK3B. The specific details are as follows:

   SPK3A{ (nk, rk, alpha0, x_u, s, t, n_j, rho1) :
       X0 = g0^{alpha0} gamma^{H1(theta)}
     ∧ zeta = Y1^{x_u} Y2^{s} · Cm^{t}
     ∧ D = g1^{n_j} g2^{rho1}
     ∧ u~ / eta = eta^{rk} eta^{n_j}
     ∧ u~^{R} u_0 / Gamma = u_0^{nk} u_0^{n_j} u_0^{x_u} Gamma^{nk} Gamma^{n_j} }({aux_i}, X0, zeta, eta, Gamma, T_root),

   SPK3B{ (n_j, rho1) : D = g1^{n_j} g2^{rho1} ∧ 0 <= n_j < k }(m).

SPK3B is instantiated as a simple range proof, which will be discussed later. Next, we demonstrate how to implement SPK3A.

1. (Commitment.) The user randomly selects rho'1, rho'2, t3, t4, t5, t6, n7, n8 ∈_R Zq and computes:

   A1 = g0^{t3} gamma^{H1(theta)},   A2 = Y1^{t4} Y2^{t5} Cm^{t6},   A3 = g1^{n7} g2^{n8},
   A4 = eta^{rho'2} eta^{n7},   A5 = u_0^{rho'1} u_0^{n7} u_0^{t4} Gamma^{rho'1} Gamma^{n7}.

2. (Challenge.) Using a non-interactive zero-knowledge proof, the user generates the challenge c:

   c = H(A1 ∥ A2 ∥ A3 ∥ A4 ∥ A5 ∥ X0 ∥ zeta ∥ eta ∥ Gamma ∥ T_root ∥ aux_i).

3. (Proof.) The user generates the proof Pi~ by computing the responses:

   a1 = t3 - c·alpha0,   a2 = t4 - c·x_u,   a3 = t5 - c·s,
   a4 = t6 - c·t,        a5 = n7 - c·n_j,   a6 = n8 - c·rho1,
   a7 = rho'2 - c·rk,    a8 = rho'1 - c·nk.

   The proof is Pi~ = (c, a1, a2, a3, a4, a5, a6, a7, a8), and the user sends (Pi~, aux_i, X0, zeta, eta, Gamma, T_root) to the verifier V.

4. (Verify.) V computes:

   A1' = (X0 · gamma^{-H1(theta)})^{c} g0^{a1} gamma^{H1(theta)},   A2' = zeta^{c} Y1^{a2} Y2^{a3} Cm^{a4},
   A3' = D^{c} g1^{a5} g2^{a6},   A4' = (u~ / eta)^{c} eta^{a7} eta^{a5},
   A5' = (u~^{R} u_0 / Gamma)^{c} u_0^{a8} u_0^{a5} u_0^{a2} Gamma^{a8} Gamma^{a5},

   and verifies c ?= H(A1' ∥ A2' ∥ A3' ∥ A4' ∥ A5' ∥ X0 ∥ zeta ∥ eta ∥ Gamma ∥ T_root ∥ aux_i).

In groups of unknown order, the range proofs currently widely recognized by academia and industry are based on the square-decomposition assumption [43] and n-ary decomposition [40], which can achieve secure and efficient range proofs. However, we note that the range proofs required in authentication protocols always take the form 0 <= n < k. If we set k = 2^kappa, we can easily construct a simple range proof with complexity O(kappa), as shown in Eq. (1):

   POK_RANGE{ (n, r) : C_n = g0^{n} g1^{r} ∧ 0 <= n < 2^{kappa} }.   (1)
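The k = 2^kappa trick behind Eq. (1) reduces the range claim to proving that n has a valid kappa-bit decomposition. The vector bookkeeping can be sketched as follows; this is a minimal illustration of the decomposition step only, not the full committed range proof.

```python
def bit_vectors(n: int, kappa: int):
    """Binary decomposition for the range statement 0 <= n < 2^kappa:
    a_L holds the bits of n, and a_R = a_L - 1^kappa (componentwise)."""
    assert 0 <= n < 2 ** kappa
    a_L = [(n >> i) & 1 for i in range(kappa)]
    a_R = [b - 1 for b in a_L]
    return a_L, a_R

a_L, a_R = bit_vectors(13, 5)  # 13 = 1 + 4 + 8
# <a_L, (1, 2, 4, ...)> reconstructs n; a_L ∘ a_R = 0 certifies every entry is a bit.
assert sum(b << i for i, b in enumerate(a_L)) == 13
assert all(l * r == 0 for l, r in zip(a_L, a_R))
```

These two constraints (the weighted inner product equals n, and the Hadamard product a_L ∘ a_R vanishes) are exactly what the Bulletproofs-style instantiation described next proves in zero knowledge.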
In this scheme, we use a Bulletproofs-based instantiation of SPK3B. Here we briefly describe the proof process; please refer to Refs. [29,43] for details.

1. (Prove.) First, perform a binary decomposition of n, n = Σ_{i=0}^{k-1} b_i 2^i, where b_i ∈ {0,1}. Construct the vectors a_L = (b_0, b_1, ..., b_{k-1}) and a_R = a_L - 1^k (a_{R,i} = b_i - 1). Next, choose blind factors alpha, rho <- Zq and s_L, s_R <- Zq^k, and compute the initial commitments A = h^{alpha} g^{a_L} h^{a_R}, S = h^{rho} g^{s_L} h^{s_R}. Then, construct the non-interactive proof challenges y = H(A, S, C_n) and z = H(y, A, S) based on Fiat–Shamir, and from the polynomials l(x) = (a_L - z·1^k) + s_L·x, r(x) = y^k ∘ (a_R + z·1^k + s_R·x) + z^2·2^k, calculate the inner product t(x) = <l(x), r(x)>; choose tau_x <- Zq and set T = g^{t} h^{tau_x}. The final challenge is x = H(z, y, T); generate the responses l = l(x), r = r(x), t^ = <l, r>, tau = tau_x + x^2·rho, mu = alpha + x·rho. Finally, output the proof pi = (A, S, T, t^, tau, mu, l, r).

2. (Verify.) Upon receiving the commitment C_n and the proof pi, recalculate the challenges y = H(A, S, C_n), z = H(y, A, S), x = H(z, y, T). Next, compute the offset value delta_y = <y^k, z·1^k + z^2·2^k>, and reconstruct the commitment P = A·S^{x}·h^{-mu}·g^{-z·1^k}·h'^{z·1^k + z^2·2^k}, where h' = h ∘ y^{-k}. Then, verify the inner-product relation g^{t^} h^{tau} ?= T^{x} · C_n^{z^2} · g^{delta_y}. If it passes, accept; otherwise, reject.

5.2. Theoretical security analysis

5.2.1. Proof of Game 1

Theorem 1. The scheme is unforgeable if the DLP and CDH assumptions hold.

Proof. Suppose that the adversary A1 forges a credential with non-negligible probability eps; we construct a reduction algorithm B that solves the DLP or CDH problem with non-negligible advantage. The reduction algorithm B embeds the group parameter tuple (g, g^a, g^b) into the problem instance; B can control and program the random oracle, and simulates the whole system:

Setup. The challenger C1 runs the system initialization algorithm Setup(1^lambda, 1^ell, 1^m) to generate pp, and sends pp to the simulator B. C1 saves the issuer private key isk = (y1, y2).

Query. In this phase, A1 makes random-oracle queries (H-Query), Query2, and Query3; C1 responds randomly and records them.

H-Query: The adversary A1 can query the random oracles H1, H2, H3. Before any hash query, B prepares three empty hash lists L1, L2, L3, and defines the query bounds q_{H1}, q_{H2}, q_{H3} to record the query responses.

H1-Query: Before the H1 queries, B randomly selects i1 ∈ [1, q_{H1}]. On input attribute attr_i, B records all queries in the list L1 and responds. If i = i1, B returns the value in the list; otherwise B generates H1(attr_i) and records (i, attr_i, H1(attr_i)) in L1.

H2-Query: Before the H2 queries, B randomly selects i2 ∈ [1, q_{H2}]. After entering each user time period epoch_i and the maximum number k_i of credentials to be initialized, B records all queries in the list L2 and responds. If i = i2, B returns the value in the list; otherwise B generates H2(epoch_i ∥ k_i) according to Eq. (2):

   H2(epoch_i ∥ k_i) = { w*, if i = i2;  w_i, otherwise }.   (2)

Then B records (i, epoch_i ∥ k_i, H2(epoch_i ∥ k_i)) in the list L2.

H3-Query: Before the H3 queries, B randomly selects i3 ∈ [1, q_{H3}]. On input a random nonce_i and message msg_i, B records all queries in the list L3 and responds. If i = i3, B returns the value in the list; otherwise B generates H2(nonce_i ∥ msg_i) according to Eq. (3):

   H2(nonce_i ∥ msg_i) = { r*, if i = i3;  r_i, otherwise }.   (3)

Then B records (i, nonce_i ∥ msg_i, H2(nonce_i ∥ msg_i)) in the list L3, where the oracles H2 and H3 share a hash function.

Query2: In this phase, the adversary A1 forges the parameters (ctx*, nk*, rk*, Attrs*), selects a random blind factor r* ∈ Zq, makes an H1-Query, and generates Cm* = Commit(nk*, rk*, Attrs*; r*). Next, it chooses x_u*, s'*, t* <- Zq and calculates Pi^{1*}_U:

   Pi^{1*}_U = SPK1{ (x_u*, s'*, t*, r*, nk*, rk*, Attrs*) :
       X_u* = g1^{x_u*} g2^{s'*}
     ∧ zeta* = (g^a)^{x_u*} (g^b)^{s'*} · Cm*^{t*}
     ∧ iota_zk(Attrs*, iaux_zk) = 1 }(X_u*, zeta*, iaux_zk, iaux_pub).

Sending (Pi^{1*}_U, iaux_zk, iaux_pub) to the issuer, B checks iota_pub(iaux_pub) and validates Pi^{1*}_U, aborting if it fails; otherwise it selects a random number s'' ∈ Zq and performs an H2-Query. It embeds the tuple (g, g^a, g^b), registers cred* = (zeta* · (g^b)^{s''}) · u_0^{w*}, generates the forged Merkle tree T*, updates the root node to T*_root, selects z0*, z1* <- Zq, and calculates

   Pi^{1*}_V = SPK2{ (z0*, z1*, a, b) : Y_u = g^a g^b ∧ U* = (zeta* · (g^b)^{s''})^{z1*} · u_0^{w* · z0*} }(Y_u, s'', k*, U*),

and sends (Pi^{1*}_V, s'', k*, theta*) to the adversary A1; A1 calculates s = s' + s'' and saves it locally.

Query3: In this phase, A1 shows the proof: using the zero-knowledge simulator S, it runs the algorithm ShowCred to forge a token and interacts with VerifyShow. The adversary A1 forges the message msg* requesting access to V. V selects nonce*, conducts an H3-Query, calculates r*, and returns it to the adversary A1. After the adversary passes the H3-Query hash verification, it selects the public attributes attr_{i ∈ ATTR} and the secret attributes attr*_{j ∉ ATTR}, calculates Cm* = Commit(nk*, rk*, {attr*_{j ∉ ATTR}}; r*), selects n_j (0 <= n_j < k*) and alpha0 <- Zq, generates Pi~*, and sends (Pi~*, {aux_i}_{i=1}^{n}, theta*, T*_root, Phi', attr*_{i ∈ ATTR}) to V.

Forgery. The adversary A1 outputs the forged credential cred* and the corresponding authentication path theta*, which meet the condition that cred* was not generated through legal issuance, yet V, running the algorithm VerifyShow, obtains VerifyShow(pp, (V, cred*, T*_root), (Pi~*, {aux_i}_{i=1}^{n})) = 1. Then, B re-queries H3 by the rewinding technique to obtain r**, modifies the new challenge to c* ≠ c, computes the response, and uses the output Pi~* to extract the witness w* = (x_u*, s*, t*, r*, nk*, rk*, attr*_{j ∉ ATTR}); from the witness it separates zeta* = (g^a)^{x_u*} (g^b)^{s*} · Cm*^{t*} = (g^{ab})^{x_u* · s*} · Cm*^{t*}.

According to the above proof, since computing g^{ab} on G from the forged credential cred* and the corresponding authentication path theta* is difficult, the probability that the adversary A1 successfully forges a credential on the first attempt is eps, and the probability of a single retry is about eps^2. By the general forking lemma, since the adversary A1 performs q_{H3} queries, the probability of success is eps^2 / q_{H3}; the advantage of the simulator B in breaking the CDH hard problem is then eps^2 / q_{H3} - negl.

5.2.2. Proof of Game 2

Theorem 2. The scheme is anonymous and unlinkable if the DDH assumption holds.

Proof. Suppose that the adversary A2 distinguishes credentials with a non-negligible advantage eps; we construct a reduction algorithm B that solves the DDH problem with a non-negligible advantage eps - negl. The reduction algorithm B embeds the group parameter tuple (g, g^a, g^b, g^c) into the DDH problem instance; the adversary A2 must determine whether g^c = g^{ab} or g^c is random. B simulates the whole process:

Setup. The same as the initialization of Game 1.

Query. The adversary A2 can continue to query issuance and show, but cannot query revocation or presentation of challenge credentials. At the same time, it can also make H1-Queries.

Challenge. The adversary A2 submits two attribute sets Attrs0 and Attrs1 that satisfy the same access policy to the challenger C2. Since the parameter related to the attribute set in the zero-knowledge proof is zeta, the challenger C2 calls the simulator S to simulate the SPK with the embedded group parameter tuple (g, g^a, g^b, g^c): it randomly selects a, b <- Zq and calculates zeta1, then selects c <- Zq and calculates zeta2.
Table 3
Average times of cryptographic and Merkle tree operations.

Symbol      Definition                                  secp256k1 (128-bit security)      BLS12-381 (128-bit security)
                                                        100 s/Leaves    1000 s/Leaves     100 s/Leaves      1000 s/Leaves
T_bp        Bilinear pairing operation time                                           0.9162 ms         0.9466 ms
T_h         Hash computation time                       0.0003 ms       0.0000 ms         0.0001 ms         0.0000 ms
T_ep        Exponentiation time in group G              0.0211 ms       0.0314 ms         0.2606 ms         0.2677 ms
T_mp-ec     Elliptic curve point multiplication time    0.0254 ms       0.0234 ms         G1: 0.3958 ms     G1: 0.2686 ms
                                                                                          G2: 0.8140 ms     G2: 0.8009 ms
T_add-ec    Elliptic curve point addition time          0.0462 ms       0.0530 ms         G1: 0.0007 ms     G1: 0.0006 ms
                                                                                          G2: 0.0018 ms     G2: 0.0018 ms
T_kappa_G   Generation algorithm of tree T_kappa        0.0025 ms       0.0024 ms         0.0029 ms         0.0023 ms
T_kappa_V   Verification algorithm of tree T_kappa      0.0004 ms       0.0002 ms         0.0020 ms         0.0002 ms
T_kappa_U   Update algorithm of tree T_kappa            0.0002 ms       0.0002 ms         0.0003 ms         0.0003 ms
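The three Merkle-tree operations timed in Table 3 (generation T_kappa_G, verification T_kappa_V, update T_kappa_U) can be sketched as follows. This is a minimal illustrative implementation with SHA-256; the class and method names are assumptions for this sketch, and the paper's benchmarked implementation may differ.

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

class MerkleTree:
    """Toy credential tree: insert / auth_path / remove roughly mirror the
    T_kappa.Insert, T_kappa.AuthPath and T_kappa.Remove operations of the scheme."""
    def __init__(self, creds=()):
        self.leaves = [H(c) for c in creds]

    def insert(self, cred: bytes):           # registration: T_kappa.Insert(cred)
        self.leaves.append(H(cred))

    def remove(self, cred: bytes):           # RevokeCred: T_kappa.Remove(cred)
        self.leaves.remove(H(cred))

    def _levels(self):
        level = list(self.leaves) or [H(b"")]
        yield level
        while len(level) > 1:
            if len(level) % 2:
                level = level + [level[-1]]  # duplicate last node on odd levels
            level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            yield level

    def root(self) -> bytes:                 # published as T_root
        *_, top = self._levels()
        return top[0]

    def auth_path(self, cred: bytes):        # theta = T_kappa.AuthPath(cred)
        idx, path = self.leaves.index(H(cred)), []
        for level in self._levels():
            if len(level) == 1:
                break
            if len(level) % 2:
                level = level + [level[-1]]
            path.append((level[idx ^ 1], idx % 2))  # (sibling, is-right-child)
            idx //= 2
        return path

def verify_path(cred: bytes, path, root: bytes) -> bool:
    node = H(cred)
    for sibling, is_right in path:
        node = H(sibling + node) if is_right else H(node + sibling)
    return node == root

tree = MerkleTree([b"cred0", b"cred1", b"cred2", b"cred3"])
theta = tree.auth_path(b"cred2")
assert verify_path(b"cred2", theta, tree.root())
old_root = tree.root()
tree.remove(b"cred1")                        # revocation re-roots the tree
assert tree.root() != old_root
```

Revocation is what keeps the scheme decentralized here: removing a leaf changes the published root, so previously issued paths for the revoked credential stop verifying.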
Table 4
Computation and communication cost analysis.

Algorithm       Parameter            Phase     Computation cost                     Communication cost
Setup           pp                             2T_ep                                (13 + m)|G|
IssueSetup_I    (I, iota_pub)                                                      
ShowSetup_V     V                                                                  
IssueReq_U      Cm                             (3 + m)T_ep + mT_h + 3T_mp-ec        |G|
                Pi^1_U               Proof     (16 + m)T_ep + 3T_mp-ec              2|G| + 5|Zq|
                                     Verify    7T_ep                                
IssueGrant_I    cred                           1T_ep + 2T_mp-ec + 1T_h              
                T_kappa                        T_kappa_G                            
                Pi^1_V               Proof     8T_ep + 1T_h + 3T_mp-ec              2|G| + 6|Zq|
                                     Verify    6T_ep                                
ShowCred_U      Pi~                  Proof     25T_ep                               5|G| + 7|Zq|
                {aux_i}_{i=1}^{n}                                                   ℶ|Zq|
VerifyShow_V                         Verify    26T_ep + T_kappa_V                   
RevokeCred      T_kappa'                       T_kappa_U                            

Note: ℶ is the number of access criteria defined per verifier.
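Using the secp256k1 element sizes reported in the experiments of this section (|G| = 33 bytes, |Zq| = 32 bytes), the communication entries of Table 4 translate into concrete byte counts. A small helper (illustrative, assuming uncompressed scalars and 33-byte compressed points) makes the conversion explicit:

```python
# secp256k1 element sizes used in the paper's tests (bytes).
G_SIZE, ZQ_SIZE = 33, 32

def comm_bytes(n_group: int, n_scalar: int) -> int:
    """Bytes on the wire for n_group group elements plus n_scalar Zq scalars."""
    return n_group * G_SIZE + n_scalar * ZQ_SIZE

# A few Table 4 entries in concrete bytes:
assert comm_bytes(2, 5) == 226   # Pi^1_U : 2|G| + 5|Zq|
assert comm_bytes(2, 6) == 258   # Pi^1_V : 2|G| + 6|Zq|
assert comm_bytes(5, 7) == 389   # Pi~    : 5|G| + 7|Zq|
```

So even the largest proof message in Table 4 stays under 400 bytes on the lightweight curve.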
simulator  selects 𝑏 ← ( {0, 1}, and uses 𝐴𝑡𝑡𝑟𝑠𝑏 to generate the cre- ) 6.2. Algorithm computation and communication cost analysis
{ } ( )
dential display 𝛱̃ 𝑏 . Send 𝛱 ̃ 𝑏 , 𝑎𝑢𝑥𝑖 𝑖=𝑖 , 𝜃, 𝑇𝑟𝑜𝑜𝑡 , 𝛷′ , 𝑎𝑡𝑡𝑟𝑖𝐴𝑇 𝑇 𝑅
𝑖=1
to adversary 2 . Table 4 shows the computational cost and communication cost
Guess. 2 guesses 𝑏 from the output 𝛱 ̃ 𝑏 , and the advantage is of the proposed algorithm in the scheme. The algorithm includes
| [ ] |
defined as: |Pr 𝑏 = 𝑏 12 |. 8 algorithms as follows. 𝑆𝑒𝑡𝑢𝑝, 𝐼𝑠𝑠𝑢𝑒𝑆𝑒𝑡𝑢𝑝𝐼 , 𝑆𝑜𝑤𝑆𝑒𝑡𝑢𝑝𝑉 , 𝐼𝑠𝑠𝑢𝑒𝑅𝑒𝑞𝑈 ,
| |
𝐼𝑠𝑠𝑢𝑒𝐺𝑟𝑎𝑛𝑡𝐼 , 𝑆𝑜𝑤𝐶𝑟𝑒𝑑𝑈 ,
According to the above proof, if two attribute sets satisfying the
𝑉 𝑒𝑟𝑖𝑓 𝑦𝑆𝑜𝑤𝑉 and 𝑅𝑒𝑣𝑜𝑘𝑒𝐶𝑟𝑒𝑑. The computational cost increases
same access policy are (submitted 𝐴𝑡𝑡𝑟𝑠0 , 𝐴𝑡𝑡𝑟𝑠 ̃
) 1 . It( is difficult for 𝛱)𝑏 linearly with the number of attributes 𝑚. We compared the single user
to distinguish between 𝑎 , 𝑏 , 𝑎⋅𝑛𝑘+𝑏⋅𝑟𝑘+𝑎𝑏⋅𝑟 and 𝑎 , 𝑏 , 𝑎⋅𝑛𝑘+𝑏⋅𝑟𝑘+𝑐⋅𝑟
in Table 4 cases for each verifier ℶ access criteria general computation
on G, then adversary 2 succeeds in distinguishing credentials with
and communication costs. Respectively, (94 + 2 𝑚)𝑇𝑒𝑝 + (𝑚 + 2)𝑇 +
non-negligible probability 𝜖𝑞𝐻1 . Then the advantage of the simulator
11𝑇𝑚𝑝𝑒𝑐 + 𝑇𝜅𝐺 + 𝑇𝜅𝑉 and (22 + 𝑚)|G| + (18 + ℶ)|Z𝑞 |. The cost of a single
 to break the DDH hard problem successfully is 𝜖𝑞𝐻1 𝑛𝑒𝑔𝑙.
algorithm is shown in Table 4 below:
Note that even if the underlying Merkle path remains the same
for repeated authentications, the simulator ensures that each creden-
6.3. Computation and communication cost comparison
tial presentation is randomized. Therefore, the adversarys advantage
does not increase by observing identical path values, which remain
In Table 1 of Section 2, we have compared the functions of the ex-
computationally indistinguishable across sessions.
isting schemes [19,2931,3335]. The scheme [3234] satisfies the 𝑘-
times period anonymous authentication function. Since the scheme [32]
Theorem 3. The Scheme is attribute Privacy if the CDH assumption hold.
is constructed based on bilinear pairing. Here, we compare the scheme
Similar anonymity, but in view of the properties rather than identity.
[33,34] with the proposed scheme in the computation cost processes of
6. Performance analysis issuance, show and verification. Using the lightweight curve secp256k1
environment, as shown in Table 5 and Fig. 3. In Table 1, the scheme
6.1. Experimental setup [33] does not support the attribute selection disclosure function and
does not increase with the increase of the number of attributes 𝑚.
The scheme is based on AMD Ryzen9 7945HX processor, Rust 1.75 Therefore, the data results in Fig. 3 show that our scheme is better
and Ubuntu 22.04 LTS environment, and the error is controlled within than the scheme [33] when the number of attributes 𝑚 is small.
5%. The test program is written in 𝑅𝑢𝑠𝑡 and performs benchmark Throughout the entire process, the overall performance was superior
evaluations on SHA-256 hacks, elliptic curve operations, and Merkle to the scheme [34]. Finally, the data results show that our scheme
tree operations with the 128-bit security secp256k1, BLS12-381, and is superior to the existing schemes under the condition of similar
sha2 libraries. The experiment measured the average time of 100 and functions.
1000 operations (as shown in Table 3). All tests were compiled based In addition to the above experimental comparison, we also added
on release optimization to ensure accurate and reliable performance the proposed scheme to test the computational overhead under two
results. different curve environments, BLS12-381 supporting bilinear pairing
9
Table 5
Computation cost comparison.

Scheme        Credential issuance                                 Certificate showing         Authentication credentials
[33]          15T_ep + 10T_mp-ec + 2T_add-ec                      31T_ep + 6T_mp-ec + T_h     20T_ep + 9T_mp-ec + T_h
[34]          (5m + 40)T_ep + (3m + 4)T_h                         (m + 22)T_ep + T_h          (m + 23)T_ep
Our scheme    (m + 35)T_ep + (m + 2)T_h + 11T_mp-ec + T_kappa_G   (16 + m)T_ep + mT_h         19T_ep + T_h + T_kappa_V

Fig. 3. Computation cost comparison.
Fig. 4. Computation cost comparison of different curves.
Fig. 5. Communication cost comparison.
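The communication comparison of Fig. 5 can be reproduced numerically from the per-scheme formulas and the secp256k1 element sizes given in this section (|G| = 33 bytes, |Zq| = 32 bytes); the calculation below assumes the same test setting of one attribute (m = 1) and one access criterion (ℶ = 1):

```python
G_BYTES, ZQ_BYTES = 33, 32       # secp256k1: |G| = 264 bits, |Zq| = 256 bits

def cost(n_group: int, n_scalar: int) -> int:
    """Total bytes for a message of n_group group elements and n_scalar scalars."""
    return n_group * G_BYTES + n_scalar * ZQ_BYTES

m = beth = 1
scheme_33 = cost(8, 11)          # [33]: 8|G| + 11|Zq|
scheme_34 = cost(m + 14, 8)      # [34]: (m + 14)|G| + 8|Zq|
proposed  = cost(4, 9 + beth)    # this paper: 4|G| + (9 + ℶ)|Zq|
assert (scheme_33, scheme_34, proposed) == (616, 751, 452)
```

Under these assumptions the proposed presentation message is the smallest of the three, consistent with the trend shown in Fig. 5.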
In addition to the above comparison, we also tested the computational overhead of the proposed scheme under two different curve environments, BLS12-381 (which supports bilinear pairing) and the lightweight curve secp256k1, as shown in Fig. 4. The experimental results show that the scheme has more advantages under the lightweight curve; it is therefore suggested to deploy the proposed scheme on the curve secp256k1.

Finally, the communication cost of the existing schemes [33,34] is compared, calculated from the size of the data to be transmitted during the anonymous credential display process. We test the communication efficiency on the curve secp256k1, where the group element and integer sizes are |G| = 264 bits = 33 bytes and |Zq| = 256 bits = 32 bytes, respectively. In the test, it is assumed that the access criterion count ℶ is 1 and the number of user attributes is 1. The communication costs of the schemes [33,34] are respectively 8|G| + 11|Zq| and (m + 14)|G| + 8|Zq|. The parameters that our scheme needs to transmit for presentation are (Pi~, {aux_i}_{i=1}^{n}, X0, zeta, eta, Gamma, theta), where Pi~ = (c, A1, A2, A3, A4, A5, A6, A7, A8). Therefore, the total communication cost during transmission is 4|G| + (9 + ℶ)|Zq|, as shown in Fig. 5.

7. Conclusion

In this paper, we propose a k-times periodic anonymous authentication scheme that does not require the issuer to hold a key and that supports access criteria. Compared with other existing k-times periodic anonymous authentication schemes, the proposed scheme not only has a lower computational cost, but also eliminates the need for the issuer to hold the issuing information or the user key, and only needs to upload the root path of the Merkle tree to the blockchain or a public panel, which ensures that subsequent authentication can still be carried out even if the issuing center fails. In terms of security, it satisfies a series of DAC security properties, including anonymity, unlinkability, unforgeability and attribute privacy. The limitation of the current scheme is that it relies on classical cryptography, which cannot resist quantum computing attacks.
H. Di et al. Computer Standards & Interfaces 97 (2026) 104097
as lattice-based signature, coding cryptography, or multivariate poly- [14] C. Garman, M. Green, I. Miers, Decentralized anonymous credentials, in: Proceed-
nomial encryption in future research to construct periodic 𝑘-times ings of the 21st NDSS, 2014, URL: https://www.ndss-symposium.org/ndss2014/
authentication schemes with post-quantum security. decentralized-anonymous-credentials.
[15] D. Derler, C. Hanser, D. Slamanig, A new approach to efficient revocable
attribute-based anonymous credentials, in: Cryptography and Coding, 2015, pp.
CRediT authorship contribution statement 5774.
[16] T. Bui, T. Aura, Application of public ledgers to revocation in distributed access
Hongyan Di: Writing original draft, Methodology, Formal analy- control, in: Information and Communications Security, 2018, pp. 781792.
[17] A. Sonnino, M. Al-Bassam, S. Bano, S. Meiklejohn, G. Danezis, Coconut: Thresh-
sis, Data curation, Conceptualization. Yinghui Zhang: Writing review
old issuance selective disclosure credentials with applications to distributed
& editing, Supervision, Project administration, Methodology, Funding ledgers, in: 26th Annual Network and Distributed System Security Symposium,
acquisition. Ziqi Zhang: Writing original draft, Formal analysis, Data NDSS, 2019, URL: https://arxiv.org/pdf/1802.07344.
curation. Yibo Pang: Project administration, Formal analysis, Data [18] H. Halpin, Nym credentials: Privacy-preserving decentralized identity with
curation. Rui Guo: Writing original draft, Methodology, Formal anal- blockchains, in: 2020 Crypto Valley Conference on Blockchain Technology,
ysis. Yangguang Tian: Writing original draft, Project administration, CVCBT, 2020, pp. 5667, http://dx.doi.org/10.1109/CVCBT50464.2020.00010.
[19] H. Cui, M. Whitty, A. Miyaji, Z. Li, A blockchain-based digital identity manage-
Methodology, Funding acquisition. ment system via decentralized anonymous credentials, in: Proceedings of the 6th
ACM International Symposium on Blockchain and Secure Critical Infrastructure,
Declaration of competing interest 2025, pp. 111, http://dx.doi.org/10.1145/3659463.3660027.
[20] C. Lin, D. He, H. Zhang, L. Shao, X. Huang, Privacy-enhancing decentralized
anonymous credential in smart grids, Comput. Stand. Interfaces 75 (2021)
The authors declare that they have no known competing finan-
103505, http://dx.doi.org/10.1016/j.csi.2020.103505.
cial interests or personal relationships that could have appeared to [21] Z. Ma, J. Zhang, Y. Guo, Y. Liu, X. Liu, W. He, An efficient decentralized key
influence the work reported in this paper. management mechanism for VANET with blockchain, IEEE Trans. Veh. Technol.
69 (2020) 58365849, http://dx.doi.org/10.1109/TVT.2020.2972923.
Data availability [22] J. Zhang, J. Cui, H. Zhong, I. Bolodurina, L. Liu, Intelligent drone-assisted
anonymous authentication and key agreement for 5G/B5G vehicular ad-hoc
networks, IEEE Trans. Netw. Sci. Eng. 8 (2021) 29822994, http://dx.doi.org/
Data will be made available on request. 10.1109/TNSE.2020.3029784.
[23] D. Liu, H. Wu, C. Huang, J. Ni, X. Shen, Blockchain-based credential management
for anonymous authentication in SAGVN, IEEE J. Sel. Areas Commun. 40 (2022)
References 31043116, http://dx.doi.org/10.1109/JSAC.2022.3196091.
[24] D. Liu, H. Wu, J. Ni, X. Shen, Efficient and anonymous authentication with
[1] K.Y. Lam, C.H. Chi, Identity in the internet-of-things (IoT): New challenges and succinct multi-subscription credential in SAGVN, IEEE Trans. Intell. Transp. Syst.
opportunities, in: Information and Communications Security, 2016, pp. 1826. 23 (2022) 28632873, http://dx.doi.org/10.1109/TITS.2022.3147354.
[2] K. Shafique, B.A. Khawaja, F. Sabir, S. Qazi, M. Mustaqim, Internet of things [25] L. Wei, Y. Zhang, J. Cui, H. Zhong, I. Bolodurina, D. He, A threshold-based full-
(IoT) for next-generation smart systems: A review of current challenges, future decentralized authentication and key agreement scheme for VANETs powered
trends and prospects for emerging 5G-IoT scenarios, IEEE Access 8 (2020) by consortium blockchain, IEEE Trans. Mob. Comput. 23 (2024) 1250512521,
2302223040, http://dx.doi.org/10.1109/ACCESS.2020.2970118. http://dx.doi.org/10.1109/TMC.2024.3412106.
[3] L. Ante, C. Fischer, E. Strehle, A bibliometric review of research on digital [26] M. Zeng, J. Cui, Q. Zhang, H. Zhong, D. He, Efficient revocable cross-domain
identity: Research streams, influential works and future research paths, J. Manuf. anonymous authentication scheme for IIoT, IEEE Trans. Inf. Forensics Secur. 20
Syst. 62 (2022) 523538, http://dx.doi.org/10.1016/j.jmsy.2022.01.005. (2025) 9961010, http://dx.doi.org/10.1109/TIFS.2024.3523198.
[4] M.A. Olivero, A. Bertolino, F.J.D. Mayo, M.J.E. Cuaresma, I. Matteucci, Digital [27] I. Teranishi, J. Furukawa, K. Sako, K-times anonymous authentication (extended
persona portrayal: Identifying pluridentity vulnerabilities in digital life, J. Inf. abstract), in: Advances in Cryptology - ASIACRYPT 2004, 2004, pp. 308322.
Secur. Appl. 52 (2020) 102492, URL: https://api.semanticscholar.org/CorpusID: [28] L. Nguyen, R. Safavi-Naini, Dynamic k-times anonymous authentication, in:
215881538. Applied Cryptography and Network Security, 2005, pp. 318333.
[29] M.H. Au, W. Susilo, Y. Mu, Constant-size dynamic k-TAA, in: Security and
[5] M.S. Ferdous, F. Chowdhury, M.O. Alassafi, In search of self-sovereign identity
Cryptography for Networks, 2006, pp. 111125.
leveraging blockchain technology, IEEE Access 7 (2019) 103059103079, http:
[30] U. Chaterjee, D. Mukhopadhyay, R.S. Chakraborty, 3PAA: A private PUF protocol
//dx.doi.org/10.1109/ACCESS.2019.2931173.
for anonymous authentication, IEEE Trans. Inf. Forensics Secur. 16 (2021)
[6] A. Shabtai, Y. Elovici, L. Rokach, List of data breaches and cyber attacks in 2023.
756769, http://dx.doi.org/10.1109/TIFS.2020.3021917.
Media report. IT governance, 2023, URL: https://www.itgovernance.co.uk/blog/
[31] J. Huang, W. Susilo, F. Guo, G. Wu, Z. Zhao, Q. Huang, An anonymous
list-of-data-breaches-andcyber-attacks-in-2023.
authentication system for pay-as-you-go cloud computing *, IEEE Trans. Depend-
[7] P.C. Bartolomeu, E. Vieira, S.M. Hosseini, J. Ferreira, Self-sovereign identity:
able Secur. Comput. 19 (2) (2022) 12801291, http://dx.doi.org/10.1109/TDSC.
Use-cases, technologies, and challenges for industrial IoT, in: 2019 24th IEEE
2020.3007633.
International Conference on Emerging Technologies and Factory Automation,
[32] J. Camenisch, S. Hohenberger, M. Kohlweiss, A. Lysyanskaya, M. Meyerovich,
ETFA, 2019, pp. 11731180, http://dx.doi.org/10.1109/ETFA.2019.8869262.
How to win the clonewars: efficient periodic n-times anonymous authentication,
[8] European Union, Regulation (EU) 2016/679 of the European parliament and of
in: Proceedings of the 13th ACM Conference on Computer and Communications
the council of 27 april 2016 on the protection of natural persons with regard
Security, 2006, pp. 201210, http://dx.doi.org/10.1145/1180405.1180431.
to the processing of personal data and on the free movement of such data,
[33] B. Lian, G. Chen, M. Ma, J. Li, Periodic 𝐾 -times anonymous authentication with
and repealing directive 95/46/EC (general data protection regulation), 2016,
efficient revocation of violators credential, IEEE Trans. Inf. Forensics Secur. 10
[Online] Available: URL: https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng.
(3) (2015) 543557, http://dx.doi.org/10.1109/TIFS.2014.2386658.
[9] A. Mühle, A. Grüner, T. Gayvoronskaya, C. Meinel, A survey on essential components of a self-sovereign identity, Comput. Sci. Rev. 30 (2018) 80–86, http://dx.doi.org/10.1016/j.cosrev.2018.10.002.
[10] European Union, Regulation (EU) 2024/1183 of the European Parliament and of the Council of 5 June 2024 on European digital identity wallets, 2024, URL: https://eur-lex.europa.eu/eli/reg/2024/1183/oj. (Accessed 13 October 2024).
[11] D. Chaum, Security without identification: transaction systems to make big brother obsolete, Commun. ACM 28 (1985) 1030–1044, http://dx.doi.org/10.1145/4372.4373.
[12] D. Chaum, Showing credentials without identification. Signatures transferred between unconditionally unlinkable pseudonyms, in: Proc. of a Workshop on the Theory and Application of Cryptographic Techniques on Advances in Cryptology — EUROCRYPT '85, 1986, pp. 241–244.
[13] J. Camenisch, A. Lysyanskaya, An efficient system for non-transferable anonymous credentials with optional anonymity revocation, in: Advances in Cryptology — EUROCRYPT 2001, 2001, pp. 93–118.
[34] Y. Yang, W. Xue, J. Sun, G. Yang, Y. Li, H. Hwa Pang, R.H. Deng, PkT-SIN: A secure communication protocol for space information networks with periodic k-time anonymous authentication, IEEE Trans. Inf. Forensics Secur. (2024) 6097–6112, http://dx.doi.org/10.1109/TIFS.2024.3409070.
[35] C. Wiraatmaja, S. Kasahara, Scalable anonymous authentication scheme based on zero-knowledge set-membership proof, Distrib. Ledger Technol. 4 (2025), http://dx.doi.org/10.1145/3676285.
[36] R. Canetti, Y. Chen, J. Holmgren, A. Lombardi, G.N. Rothblum, R.D. Rothblum, D. Wichs, Fiat-Shamir: from practice to theory, 2019, http://dx.doi.org/10.1145/3313276.3316380.
[37] J. Camenisch, M. Stadler, Efficient group signature schemes for large groups, in: Advances in Cryptology — CRYPTO '97, 1997, pp. 410–424.
[38] M. Rosenberg, J. White, C. Garman, I. Miers, zk-creds: Flexible anonymous credentials from zkSNARKs and existing identity infrastructure, in: 2023 IEEE Symposium on Security and Privacy, SP, 2023, pp. 790–808, http://dx.doi.org/10.1109/SP46215.2023.10179430.
H. Di et al. Computer Standards & Interfaces 97 (2026) 104097
[39] Y. Dodis, A. Yampolskiy, A verifiable random function with short proofs and keys, 2004, URL: https://eprint.iacr.org/2004/310. Cryptology ePrint Archive, Paper 2004/310.
[40] J. Groth, On the size of pairing-based non-interactive arguments, in: Advances in Cryptology — EUROCRYPT 2016, 2016, pp. 305–326.
[41] V. Shoup, Sequences of games: a tool for taming complexity in security proofs, IACR Cryptol. ePrint Arch. (2004) 332, URL: http://eprint.iacr.org/2004/332.
[42] M. Bellare, P. Rogaway, Random oracles are practical: a paradigm for designing efficient protocols, in: Proceedings of the 1st ACM Conference on Computer and Communications Security, 1993, pp. 62–73, http://dx.doi.org/10.1145/168588.168596.
[43] B. Bünz, J. Bootle, D. Boneh, A. Poelstra, P. Wuille, G. Maxwell, Bulletproofs: Short proofs for confidential transactions and more, in: 2018 IEEE Symposium on Security and Privacy, SP, 2018, pp. 315–334, http://dx.doi.org/10.1109/SP.2018.00020.

Hongyan Di is currently studying for a master's degree in Cyberspace and Information Security at Xi'an University of Posts and Telecommunications. Her research interests include cross-domain authentication and digital signature security.

Yinghui Zhang received his Ph.D. degree in Cryptography from Xidian University, China, in 2013. He is a professor at the School of Cyberspace Security, National Engineering Research Center for Secured Wireless (NERCSW), Xi'an University of Posts & Telecommunications. He was a research fellow at the School of Information Systems, Singapore Management University. He has published over 100 research articles in ACM CSUR, IEEE TDSC, IEEE TCC, Computer Networks, etc. He has served on the program committees of several conferences and on the editorial boards of several international journals in information security. His research interests include public key cryptography, cloud security, and wireless network security.

Yibo Pang received the B.S. degree in Information Security from the School of Cyberspace Security, Xi'an University of Posts and Telecommunications, Xi'an, China, in 2020, and the M.S. degree in Cyberspace Security from the School of Cyberspace Security, Xi'an University of Posts and Telecommunications, Xi'an, China, in 2023. He is currently pursuing a Ph.D. at Xi'an University of Posts and Telecommunications. His research interests include multimedia security and privacy.

Rui Guo is an associate professor and master's supervisor at Xi'an University of Posts and Telecommunications. He has presided over a total of 9 scientific research projects, including those funded by the National Natural Science Foundation of China, the Key Research and Development Program of Shaanxi Province, and the Basic Research Program of Shaanxi Province. As a major participant, he has participated in and completed more than 10 projects, such as the National Key Research and Development Plan and the National Natural Science Foundation of China. As the first author, he has published over 20 academic papers, among which 12 are indexed by SCI (including 1 TOP 1% ESI highly cited paper).

Dr. Yangguang Tian received his Ph.D. degree in applied cryptography from the University of Wollongong, Australia. After his Ph.D., he did post-docs at the School of Information Systems, Singapore Management University, and at iTrust, Singapore University of Technology and Design. Before Surrey, he was a research-based assistant professor at Osaka University, Japan. He is currently a lecturer at the University of Surrey, UK. His research interests include applied cryptography, network security, blockchain technologies, and privacy-preserving technologies. Dr. Tian's recent research works have been published in cybersecurity-related international conferences and journals, such as USENIX '24, AsiaCCS '24, IEEE TIFS '23, IEEE TDSC '24, etc.

Ziqi Zhang is currently studying for a master's degree in Cyberspace and Information Security at Xi'an University of Posts and Telecommunications. Her research interests include digital signature security and its applications.

View File

@@ -0,0 +1,897 @@
Computer Standards & Interfaces 97 (2026) 104094
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
How AI agents transform reflective practices: A three-semester comparative
study in socially shared regulation of learning
Yumin Zheng a, Fengjiao Tu b, Fengfang Shu a,c, Chaowang Shang a,*, Lulu Chen a, Jiang Meng a
a Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China
b Department of Information Science, University of North Texas, 3940 North Elm, Denton, Texas, 76203, USA
c Institute of Open Education, Wuhan Vocational College of Software and Engineering, Wuhan Open University, Wuhan, China

A R T I C L E  I N F O

Keywords:
Artificial intelligence agent
Socially shared regulation of learning
Reflection quality
Collaborative learning
Generative artificial intelligence

A B S T R A C T

High-quality reflection has been a challenging barrier in the socially shared regulation of learning (SSRL). Especially with the emergence of generative artificial intelligence (GAI), traditional methods such as reflection reports may increase the students' risk of superficial reflection. This study uses an artificial intelligence agent (AI agent) to design a reflection assistant, which aims to enhance students' reflection ability through continuous questioning and real-time, content-specific feedback based on their written reflections. Through a comparative experiment conducted over three semesters, this study demonstrates the different impacts of three reflection methods (reflection reports, reflection short-answer questions, and AI agents) on the quality of university students' reflections. The results indicate that there is a significant difference in the quality of reflection among the three reflection methods. Students using AI agents show the highest levels of reflection, characterized primarily by connective reflection and critical reflection. Epistemic network analysis further reveals that the AI agent reflection method is more effective in improving the reflection quality of low-performance teams than that of high-performance teams. This expands AI agents' use in SSRL reflection, introduces new methods for the GAI era, and provides practical experience and reflection intervention strategies for teachers and instructional designers in SSRL.
1. Introduction

With the rapid advancement of generative artificial intelligence (GAI), numerous challenges in collaborative learning have been addressed with innovative solutions [1,2]. GAI applications, represented by artificial intelligence agents (AI agents), have introduced revolutionary transformations to education. These transformations are mainly due to the powerful expert-level conversational abilities and user-friendly accessibility [3].

The socially shared regulation of learning (SSRL) strategy serves as a crucial mechanism for enhancing learning outcomes in collaborative learning [4]. Through the SSRL strategy, learners collaboratively set goals and monitor progress, thereby improving their performance [5]. Reflection is a critical component of SSRL, aiding learners in recognizing and refining their learning processes [6]. However, achieving high-quality reflection remains a challenge [7].

There are various methods to enhance reflection quality in SSRL, such as providing prompts and templates in reflection reports [8]. Nowadays, these traditional methods fall short of addressing the challenges posed by GAI [9]. Students may easily rely on tools like ChatGPT to complete short-answer questions, journals, and reports. Kiy [10] has shown that 76% of university students use ChatGPT for their assignments, with the percentage being even higher among software engineering students, reaching 93% [11]. The widespread use of GAI has profoundly transformed traditional methods of learning and teaching, and this era calls for new approaches to reflection.

AI agents are computing systems with capabilities for autonomous perception, decision making, and action [12]. They use GAI to learn, reason, and perform corresponding tasks or actions based on the surrounding environment and input information. To enable practical implementation, rule-based AI agents have been developed that require no programming and can be deployed simply by defining task objectives and roles via prompts. In educational contexts, these rule-based AI agents are commonly used for personalized instruction and intelligent tutoring due to their ability to engage in real-time dialogue and provide immediate feedback [13].
* Corresponding author.
E-mail address: phdzhengyumin@mails.ccnu.edu.cn (C. Shang).
https://doi.org/10.1016/j.csi.2025.104094
Received 1 February 2025; Received in revised form 28 October 2025; Accepted 10 November 2025
Available online 11 November 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
The rule-based AI agent provides an effective approach for supporting SSRL reflection. Instructors can set specific SSRL task directions, and the agent guides students based on the reflection checklist while adaptively generating questions according to students' responses. Each follow-up question is dynamically generated based on the student's prior answers and the specific SSRL task, making it difficult for students to rely on external AI tools like ChatGPT to provide generic responses. This continuous dialogue mechanism supports deeper, more analytical reflection and reduces the risk of superficial reflection [14]. Despite AI agents having broad application prospects, current research on improving learners' reflection quality through AI agents remains limited and requires further in-depth exploration.

Against this backdrop, this study introduces a rule-based AI agent reflection assistant within the SSRL framework to help learners enhance their reflection quality. This study aims to examine the impact of the AI agent on SSRL reflection quality by comparing three reflection methods: reflection reports, short-answer reflection questions, and AI agent-based reflection. In addition, different methods may lead to different reflection qualities among learners in high and low-performance teams [15]. Therefore, we further explored the differences in reflection quality between high and low-performance teams when using these three reflection methods. We proposed the following research questions:

RQ1: How does the AI agent reflection assistant affect learners' reflection quality in SSRL?
RQ2: What differences do high and low-performance teams show in reflection quality when using the three reflection methods?

This study conducted a three-semester comparative teaching experiment to evaluate the impact of AI agents and two traditional reflection methods (reflection reports and short-answer questions) on university students' reflection quality. Using statistical analysis, content analysis, and epistemic network analysis (ENA), this study examines the effectiveness of AI agents in enhancing university students' reflection quality in SSRL.

The main contributions of this study are summarized as follows:

- We introduce a practical SSRL activity, providing educators with a valuable instructional framework for facilitating collaborative learning.
- We integrated an AI agent reflection assistant in SSRL and provided a comprehensive debugging process, offering instructors examples and considerations of AI agent implementation.
- We revealed the reflection quality differences between high and low-performance teams in various reflection approaches and demonstrated the advantages of the AI agent for low-performance teams.

The research is organized as follows: Section 2 reviews prior research on AI agents in education, SSRL theory, and reflection. Section 3 describes the participants, research design, and methods for data collection and analysis. Section 4 compares reflection quality across the three methods and examines differences between high and low-performance teams using ENA. Section 5 discusses the results and implications. The paper concludes with a summary and potential directions for future research.

2. Literature review

To explore the impact of AI agents on learning processes, it is essential to examine their application in education, followed by a discussion on SSRL and reflection.

2.1. AI agents in teaching

Generative Artificial Intelligence (GAI), defined as AI systems capable of autonomous learning and content generation, has been widely applied in education [16]. It can support collaborative learning through personalized instruction, real-time feedback, and intelligent assessment [17]. AI agents, a form of GAI equipped with autonomous learning and decision-making capacities, have emerged as key instructional tools in global educational research.

Empirical studies have shown that AI agents significantly improve student engagement [18,19], learning motivation [20,21], and academic performance [22]. AI agents exist in various forms, such as chatbots [23], intelligent tutoring systems (ITS; [24]), embodied conversational agents (ECA; [25,26]), and intelligent virtual assistants (IVA; [13,27]). Among these, GAI-based chatbots have been widely adopted in education due to their customizable roles and flexible deployment. The present study focuses on this type of conversational AI agent.

In higher education, AI agents have been shown to support higher-order thinking skills, such as critical thinking, metacognition, and problem-solving [23,28,29]. In these studies, GAI was embedded within structured reflection activities, allowing students to engage in guided reflective processes targeting specific cognitive skills. For example, Hong et al. [29] employed AI to handle lower-level tasks in essay writing, enabling students to focus on evaluation and reflection, thereby enhancing critical thinking. Chen et al. [28] implemented metacognitive strategy-supported AI agents that prompted process-oriented reflection and multi-perspective discussion, improving metacognitive skills. Zhou et al. [23] situated reflection within a self-regulated learning framework, showing that GAI-supported reflection indirectly benefits critical thinking and problem-solving.

Although these studies demonstrate that AI agents can enhance higher-order thinking, reflection itself has often been treated merely as a learning process rather than a measurable skill. Reflection is a core component of higher-order thinking and an essential learning competency for 21st-century university students. Empirical evidence directly examining the impact of AI agents on learners' reflective abilities, particularly in collaborative learning environments, remains scarce. Investigating this relationship is therefore necessary to understand how AI agents can effectively support the development of reflection.

2.2. Socially shared regulation of learning and reflection

Collaborative learning includes three primary types of regulation: self-regulation (SR), co-regulation (CoR), and socially shared regulation (SSR) [30,31]. Based on SSR theory, socially shared regulation of learning (SSRL) is an emerging collaborative learning strategy emphasizing mutual support and feedback among team members. The strategy consists of four key stages: goal setting, task distribution, progress monitoring, and reflection evaluation [32–35]. Research indicates that the SSRL strategy has a positive impact on collaborative learning [36]. Learners may enhance their awareness of the collaborative process and facilitate the activation of regulatory processes through SSRL [4]. SSRL also helps to enhance learners' cognitive and metacognitive abilities, boosting learning motivation and engagement [37,38]. Additionally, SSRL fosters communication among team members, improving collaborative efficiency [39]. Thus, SSRL has been widely incorporated into collaborative learning and plays a significant role in enhancing various learner abilities.

Reflection quality is a key indicator for assessing the success of SSRL [39]. High-quality reflection is an indispensable component of SSRL, as it enables learners to examine and evaluate their learning processes and outcomes [40]. Unlike conventional collaborative learning, the reflection content in SSRL emphasizes the process of mutual regulation and monitoring among group members. However, since reflection is the final stage of SSRL, educators often overlook its significance [41]. Teachers' lack of emphasis on the reflection stage may lead to low-quality reflection among students [42]. Achieving high-quality SSRL reflection remains a persistent challenge for educators and students [43].

To enhance students' reflective abilities, it is essential to focus on the
definition of reflection. Dewey [44] defined reflection as a continuous process of exploring and evaluating experiences, which helps individuals gain a deeper understanding of their behaviors and outcomes. Zimmerman [45] further emphasized that self-reflection is a complex learning process involving various aspects of self-monitoring, such as self-assessment and feedback on contributions. In the theory of SSRL, reflection encompasses not only self-assessment but also shared monitoring processes with others [39]. These theories provide support for exploring and promoting the reflective process.

In reflective activities, teachers can support students' deep learning and reflective abilities through various intervention strategies, such as scaffolding, reflective prompts, and feedback [46]. Reflective scaffolding involves providing structured guidance to help students more effectively review and analyze their learning experiences [47]. When designing reflection tasks for SSRL, teachers often utilize the SSRL reflection scaffolds developed by Panadero et al. [48]. Additionally, reflective prompts and guiding questions steer students toward specific directions for reflection, assisting them in identifying potential barriers and challenges in their learning [49]. Feedback provides learners with suggestions or information to improve task performance, helping them optimize both their reflection and learning processes [50]. From a cognitive perspective, feedback serves as guidance to enhance students' task performance [51]. Timely feedback on students' reflections not only improves the quality of subsequent reflections but also deepens their understanding of reflective concepts [52].

Reflection journals, reflection reports, and reflection short-answer questions have been explored to improve reflection quality [53,54]. However, these traditional methods may not adapt to the advancements of GAI. They require students to submit longer texts, which inevitably carries a risk of superficial reflections due to the use of GAI. Some scholars have also modified reflection methods from a technological perspective by using various reflection platforms, such as Google Docs [55], Flipgrid [56], the VEO app [57], and Wiki [58]. However, these platforms primarily offer static or limited interaction, which constrains students' ability to adaptively engage in reflective processes. The low-quality reflection issues in SSRL urgently require new solutions.

Although GAI poses challenges to traditional reflection methods, it also offers new solutions. AI agents are increasingly regarded as effective tools for supporting reflection practices. Research indicates that the use of AI agents in reflection activities may enhance students' learning motivation and engagement [59]. Teachers can use AI agents to design reflection scaffolding, assisting learners in conducting more in-depth and systematic reflections [60]. In addition, AI agents may enhance reflection quality through data analysis and intelligent feedback [61]. Therefore, AI agents demonstrate potential in addressing the issue of improving SSRL reflection quality.

Thus, this study designed a reflection assistant with AI agents to enhance university students' reflection quality in SSRL. Statistical analysis, content analysis, and ENA were employed to collect and analyze textual data related to reflection quality. By comparing the AI agent reflection assistant with traditional SSRL strategy reflection scaffolding, this study analyzed the differences in reflection content and reflection levels among university students across the three methods. Additionally, previous research suggests that high and low-performance teams may experience different effects from various reflection methods [62]. Therefore, this study further explores the differences between high and low-performance teams when using the three reflection methods. This study provides new theoretical evidence for using AI agents in SSRL reflection practices.

3. Methodology

This study employed a quasi-experiment to explore the differences among three reflection methods in SSRL and to examine whether AI agents improve the reflection quality of university students. Firstly, we provided information about the participants and the course. Then, we elaborated on the activities of SSRL and the design process of the AI agent. Lastly, we discussed the coding scheme for reflection quality and provided the methodology for data collection and analysis.

3.1. Participants

The participants were from the course "Internet Thinking and Digital Self-Learning" over three semesters: Spring 2023, Fall 2023, and Spring 2024. A total of 97 undergraduate students, aged 18 to 22, took part in this study (Table 1).

At the beginning of each semester, students completed a pre-test using the CThQ [63], which assesses six cognitive dimensions: memory, comprehension, application, analysis, evaluation, and creation (overall reliability α = 0.87). According to Dewey [64], critical thinking is a deepening and extension of reflective thinking, with high consistency in cognitive processing, reasoning, and evidence evaluation. The CThQ pre-test therefore provides a valid proxy for students' baseline reflection levels. One-way ANOVA indicated no significant differences in pre-test total scores among the three groups (Group 1: M = 105.07, SD = 6.13; Group 2: M = 103.72, SD = 4.19; Group 3: M = 105.22, SD = 4.24), F(2, 86) = 1.33, p = 0.27, suggesting comparable reflection abilities across groups prior to the intervention.

Participants were divided into 3 groups, each employing a different reflection method, and within each group, students were further divided into teams using random assignment to minimize potential biases arising from prior academic performance, familiarity, or interpersonal preference. Random assignment was chosen over self-selection or instructor-based grouping to ensure group equivalence and to enhance the internal validity of the comparative analysis [65].

The first group (G1), consisting of 31 students from the Spring 2023 semester, conducted reflection reports and were further divided into 7 teams. The second group (G2), consisting of 30 students from the Fall 2023 semester, conducted short-answer reflections and were divided into 7 teams. The third group (G3), consisting of 36 students from the Spring 2024 semester, conducted reflections through continuous questioning by an AI agent and were divided into 9 teams. Additional information about the participants is provided in Table 1.

Table 1
Participant and group information.

Group | Course       | Reflection method        | Teams | Participants | Female | Male
G1    | Spring 2023  | Report                   | 7     | 31           | 17     | 14
G2    | Fall 2023    | Short-answer questions   | 7     | 30           | 19     | 11
G3    | Spring 2024  | AI reflection assistant  | 9     | 36           | 20     | 16

3.2. Design of socially shared regulation of learning activities

During the 4-week activity, students collaborated in teams to produce micro-lesson videos lasting 5 to 8 min. The activity was divided into 4 stages, each lasting one week (Table 2).

In the first week (goal setting), students were required to establish a common goal, select the video's theme, and outline the content framework. Then, they submitted a project proposal detailing the topic, objectives, task distribution, and timeline. In the second week (task distribution), the teams followed their project plan to allocate tasks and began executing the project. The instructor provided guidance and suggestions throughout this process. In the third week (progress monitoring), each team submitted a video sample that was between 1 and 2 min long. The instructor conducted an initial evaluation based on the sample and suggested improvements. Students refined and adjusted their video production based on the feedback. In the fourth week (reflection evaluation), students submitted their completed micro-lesson videos
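The between-groups pre-test comparison reported above can be recomputed from summary statistics alone. A minimal sketch in pure Python (the equal group sizes of 30 are an illustrative assumption, not the study's exact cell counts, so the resulting F will not exactly match the reported F(2, 86) = 1.33):

```python
# One-way ANOVA F statistic from per-group summary statistics
# (n, mean, sd). Group sizes below are illustrative assumptions.
groups = [
    (30, 105.07, 6.13),  # G1: reflection reports
    (30, 103.72, 4.19),  # G2: short-answer questions
    (30, 105.22, 4.24),  # G3: AI reflection assistant
]

N = sum(n for n, _, _ in groups)
k = len(groups)
grand_mean = sum(n * m for n, m, _ in groups) / N

# Between-groups sum of squares: weighted squared deviations of
# group means from the grand mean; within-groups sum of squares:
# pooled from the group standard deviations.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ss_within = sum((n - 1) * s ** 2 for n, _, s in groups)

df_between, df_within = k - 1, N - k
F = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {F:.2f}")
```

With these illustrative sizes the statistic stays well below conventional critical values, consistent with the paper's conclusion that the groups were comparable at baseline.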
and individual reflection assignments (employing different reflection methods for each of the three semesters). Finally, a reflection-sharing session was held in class, where students exchanged learning experiences and insights.

Table 2
The stages of SSRL.

Week | SSRL stage            | Description
1    | Goal setting          | Students discuss the goal, theme, and framework.
2    | Task distribution     | Students allocate tasks and make the micro-lesson videos.
3    | Progress monitoring   | Students monitor the task and submit a video sample.
4    | Reflection evaluation | Students submit completed micro-lesson videos and individual reflection assignments.

3.3. Design of the three reflection methods

Prior to the reflection phase, all students completed a four-week SSRL activity in which the instructor introduced and practiced the four SSRL stages. Consequently, all reflections were anchored in the teams' performance across these four stages. In G1, the reflection remained open-ended within this framework and only specified a minimum length of at least 200 words (no SSRL question list was provided).

In G2, students conducted individual reflections through short-answer questions. The guiding questions were derived from the SSRL reflection scaffolding [48]. For example, questions included "What is the group's current assignment?" and "What obstacles might the group encounter?"

G3 students used the AI agent reflection assistant for their reflections. After the SSRL task, the instructor provided students with a quick response (QR) code linking to the AI agent's website. Students scanned the QR code with their phones to initiate a conversation with the AI agent. Each student completed the reflection task through the dialogue.

The development process of the AI agent is illustrated in Fig. 1. The AI agent reflection assistant, Crystal, was developed using the Coze platform (https://www.coze.cn/). The AI agent consists of 4 core components: Part A is the AI agent's name, Part B defines the role setting and response logic, Part C specifies the conversational experience, such as the opening dialogue, and Part D serves as the preview interface. Developing the AI agent requires following these operational steps.

Fig. 1. AI agent development interface on the Coze platform.

Step 1: Create the AI agent and assign it the name Crystal (as shown in Fig. 1, Part A). Define it as the reflection assistant for the course "Internet Thinking and Digital Self-Learning". Set its duty to guide students in completing tasks (as shown in Fig. 1, Part B) and design the opening statement (as shown in Fig. 1, Part C).
Step 2: Set up the reflection task (as shown in Fig. 1, Part B). Input all the questions from the SSRL reflection scaffolding developed by Panadero et al. [48] into the AI agent as the question base. This ensures a logical flow of questions from the AI agent to the students, preventing task misdirection. In addition, the AI agent was not restricted to this fixed list but generated follow-up questions, particularly "Why" questions, based on the student's specific answers, which reflected its adaptiveness.
Step 3: Set up the response rules (as shown in Fig. 1, Part B). Establish the response rules for the AI agent:
a. Ask only one reflection question per interaction.
b. Provide encouraging feedback that adapts dynamically after each response (e.g., "You did a great job", "Your reflection is very insightful").
c. Avoid using academic terms.
d. Use only special interrogative questions (e.g., "What", "Why"), with follow-up questions adjusted according to students' responses.
e. After all questions are answered, conclude the conversation and express gratitude.
Step 4: Testing and deployment (as shown in Fig. 1, Part D). Check the conversation flow and ensure the AI agent's smooth and effective interactions. Select 5 students for a second round of testing to ensure
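The response rules in Step 3 amount to a small turn-taking policy: one scaffold question per turn, encouraging feedback after each answer, and a closing thank-you once the question base is exhausted. A minimal sketch of that policy in Python (the class name, question texts, and feedback phrases are hypothetical illustrations, not the Coze platform's configuration format; a real deployment would let the LLM generate the adaptive "Why" follow-ups rather than cycling canned feedback):

```python
# Hypothetical sketch of the Step 3 dialogue policy.
FEEDBACK = ["You did a great job!", "Your reflection is very insightful!"]

class ReflectionAssistant:
    def __init__(self, questions):
        self.questions = list(questions)  # SSRL scaffold question base
        self.turn = 0

    def opening(self):
        # Opening statement (Step 1) followed by the first question.
        return "Hi, I am Crystal, your reflection assistant. " + self.questions[0]

    def reply(self, student_answer):
        """Rule a: one question per turn. Rule b: encouraging feedback.
        Rule e: after the last question, thank the student and stop."""
        self.turn += 1
        feedback = FEEDBACK[self.turn % len(FEEDBACK)]
        if self.turn < len(self.questions):
            return f"{feedback} {self.questions[self.turn]}"
        return f"{feedback} That was my last question. Thank you for reflecting!"

# Sample items in the spirit of the SSRL scaffolding (illustrative).
questions = [
    "What was your group's goal for the micro-lesson video?",
    "What obstacles did the group encounter?",
    "Why do you think those obstacles appeared?",
]
bot = ReflectionAssistant(questions)
print(bot.opening())
print(bot.reply("We wanted to explain recursion in 5 minutes."))
print(bot.reply("We ran out of time while recording."))
print(bot.reply("Because we did not rehearse the script."))
```

Keeping the question base explicit, as in Step 2, is what prevents task misdirection: the agent can vary its feedback, but the sequence of scaffold questions is fixed.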
the conversation flows smoothly. Once confirmed, the AI agent can Table 3
be deployed and available to all students. Learner reflection quality coding scheme.
Categories Coding Description
3.4. Experimental procedure Reflection NOR Lacking a reflection mindset.
level LOWR Having a reflective mindset involves reviewing
experiences, describing facts and feelings, and
The experimental procedure is illustrated in Fig. 2. As described in
reflecting on what has been learned. It also
the Participants section, all students completed the CThQ [63] as a encompasses the ability to connect new knowledge
pre-test before the course. They then attended a 16-week course with existing knowledge and to improve learning
covering basic concepts. All students were taught by the same instructor, strategies.
with the course content, teaching methods, and learning resources HIGHR Critically analyzing the current situation, attempting to
view problems from different perspectives, forming
remaining entirely consistent across the three semesters. Students new viewpoints from available resources, and seeking
participated in a 4-week group collaboration activity, “creating micro to test hypotheses.
lesson videos”, conducted using the SSRL strategy. After the group ac­ Reflection DESR A description of “what” the object of reflection is.
tivity finished, each student was assigned an individual reflection task. content EXPR An explanation of the causes behind the object of
reflection, addressing the “why” often indicated by
G1 and G2 used traditional reflection methods, with G1 completing
keywords such as “in order to”, "due to", or "so as to".
reflection reports and G2 answering short-answer questions. G3 CONR Understanding whether the object of reflection has
employed a new reflection method, utilizing the AI agent reflection changed across different times and contexts, coupled
assistant. with an analysis of the reasons for these changes and
their impact on behavior, represents a higher level of
analysis concerning the “what” and “why”.
3.5. Data collection and analysis CRIR It identifies personal or team issues and analyzes them
with theory and practice to solve problems, focusing on
“how” to achieve self-reconstruction. This may include
After the three semesters, the reflection texts of all students were keywords like “needs improvement” or “next stage”.
collected and anonymized. G1 produced 31 reflection reports totaling
8032 words. G2 submitted 30 reflection short-answer texts, totaling
15,468 words. G3's AI agent reflection assistant dialogues comprised 36 submissions, totaling 16,801 words (excluding the AI agent's questions).
Content analysis was used to process the reflection texts. Through systematic coding rules, this method reduced the influence of subjective judgment and personal bias, thereby providing more objective results. The coding scheme consists of two parts: reflection level and reflection content, as shown in Table 3. The reflection level coding scheme is based on Plack et al. [66], and it is used to assess the overall reflection level of learners, categorized into no reflection (NOR), low reflection (LOWR), and high reflection (HIGHR). The reflection content coding scheme is based on Wang et al. [67] and is used to explore the differences in the types of learners' reflection content. The reflection content is categorized into 4 types: descriptive reflection (DESR), explanatory reflection (EXPR), connected reflection (CONR), and critical reflection (CRIR), with reflection quality progressively increasing across these categories.
The reflection texts in the reflection reports and short-answer reflections were relatively longer, while those in the AI agent dialogues were shorter. To mitigate the differences caused by these length discrepancies, this study used a single complete sentence as the minimum coding unit. For example, the statement “As the group leader, I am quite decisive. I directly assigned tasks to everyone, and the group was supportive.” should be coded as two separate sentences.
To ensure reliability, a coding discussion group comprised two experts and two professional coders. First, the two coders preliminarily coded the first 10 % of the reflection texts. In cases of disagreement, they consulted with the experts to reach a consensus. After training and repeated practice, the coders achieved a high level of consistency. The coders strictly adhered to the revised coding scheme during the formal coding process. After coding, inter-coder reliability was calculated, yielding a Cohen's kappa coefficient of 0.87, indicating that the coding process had a high level of reliability. The coders consulted with experts for different coding results and ultimately reached an agreement.
After coding the reflection texts using the content analysis, ENA was employed to conduct a fine-grained analysis of the reflection data. Content analysis excels at systematically and objectively analyzing large volumes of textual content. ENA focuses on uncovering the complex relational networks between elements, such as reflection levels. The combination of the two methods allows for attention to both the characteristics of the text itself and the internal relationships between the content elements. Additionally, the ENA Webkit (http://www.epistemicnetwork.org/) provides a stable environment for data analysis.
To investigate the differences in reflection quality between the high and low-performance teams, we assessed the micro lesson videos completed by students in SSRL. The videos were assessed by two experts
Fig. 2. Experimental procedure.
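The inter-coder reliability reported above is Cohen's kappa. As an illustrative sketch of how that coefficient is computed (the coder labels below are hypothetical, not the study's coding data):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    from each coder's marginal label distribution."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n)
              for c in set(count_a) | set(count_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical sentence-level codes from two coders (8 coding units):
coder1 = ["DESR", "DESR", "EXPR", "CONR", "CRIR", "DESR", "EXPR", "CONR"]
coder2 = ["DESR", "DESR", "EXPR", "CONR", "CRIR", "EXPR", "EXPR", "CONR"]
print(round(cohens_kappa(coder1, coder2), 2))  # 7/8 raw agreement -> kappa = 0.83
```

Because kappa discounts chance agreement, it is lower than the raw agreement rate; under the common Landis-and-Koch benchmarks, the reported 0.87 falls in the "almost perfect" band.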
Y. Zheng et al. Computer Standards & Interfaces 97 (2026) 104094
in education, each with over 10 years of teaching experience. The evaluation criteria included the following categories, with topic selection worth 10 points, instructional design 40 points, content completeness 20 points, audio-visual quality 20 points, and artistry 10 points. Each group received a score ranging from 0 to 100 points. The two experts thoroughly discussed the evaluation criteria to ensure consistency in scoring and then individually assessed all instructional designs and materials. The scoring consistency between the two experts (Spearman correlation coefficient) was 0.86 (p < 0.01).
The average score from both experts was used as the final score for each group (Table 4). The grouping criteria for high and low-performing teams proposed by Hou [68] have been widely adopted by scholars [69]. In this study, based on those criteria, the top 15 % of teams were classified as the high-performance teams, including G1-team7, G2-team2, and G3-team1. The bottom 15 % of teams were classified as the low-performance teams, including G1-team5, G2-team6, and G3-team4. Using ENA, we further explored the differences between the high and low-performance teams of students.

Table 4
Scores of the SSRL performance for the 3 groups.
Group  team1  team2  team3  team4  team5  team6  team7  team8  team9
G1     86.0   90.0   88.5   76.5   68.5   87.0   92.0   NA     NA
G2     83.5   93.5   87.0   90.5   81.0   71.5   76.5   NA     NA
G3     94.0   84.5   75.0   71.5   88.0   76.0   89.5   84.5   90.5

3.6. IRB approval and AI agent data privacy

This study has received approval from the Institutional Review Board (IRB) of the university, ensuring that all ethical standards are met. All students participated voluntarily, fully aware of the study's purpose and procedures, and signed informed consent forms prior to the commencement of the experiment. In addition, to protect participants' privacy, all data collected during the study were anonymized.
All conversations on the Coze platform were fully anonymized, and students were reminded before using the platform not to enter any personal or sensitive information (such as name, student ID, gender, or school). Data was labeled only with class sequence numbers (e.g., Student 1, Student 2), and access was strictly limited to the research team. In addition, all students signed the Coze platform's privacy protection agreement, and the platform further ensures data security through anonymization and encryption techniques.

4. Results

The results are organized to address the key research questions regarding the effectiveness of the AI agent and the differences in reflection quality across various reflection methods.

4.1. How does the AI agent reflection assistant affect learners' reflection quality in SSRL?

A Kruskal-Wallis H test was conducted to assess the differences in SSRL reflection scores among the 3 groups of students using different reflection methods, as shown in Table 5. The test compares independent samples without assuming a normal data distribution. This makes it highly suitable for analyzing the multiple groups of non-normally distributed reflection data in this study.
For this analysis, an overall reflection quality score was calculated for each student by taking the mean of all seven reflection codes (NOR, LOWR, HIGHR, DESR, EXPR, CONR, CRIR). This composite score was used for the Kruskal-Wallis H test, while the mean scores for individual codes presented in Table 5 are provided only for descriptive purposes.

Table 5
The result of the Kruskal-Wallis H test.
Codes                      Mean score             χ²     p
                           G1     G2     G3
Reflection quality  NOR    0.018  0.005  0.088  6.557  0.038
                    LOWR   0.267  0.163  0.232
                    HIGHR  0.018  0.044  0.218
                    DESR   0.229  0.197  0.262
                    EXPR   0.100  0.103  0.264
                    CONR   0.038  0.028  0.221
                    CRIR   0.006  0.037  0.200

The results showed a chi-square value of 6.557 and an asymptotic significance of 0.038. The mean ranks for the 3 groups were G1 = 9.14, G2 = 8.00, and G3 = 15.86. The results indicate a statistically significant difference in reflection scores between the groups (p = 0.038). Specifically, G3's mean rank was significantly higher than G1 and G2, indicating that using the AI agent is associated with higher performance.
To further investigate the observed differences, we applied ENA for a fine-grained analysis of the students' reflections across the 3 reflection methods. This analysis aims to uncover the epistemic structures and patterns, providing deeper insights into how different reflection methods influence the quality and complexity of students' reflection processes. By analyzing epistemic networks, we may better understand the specific epistemic factors and relationships underlying the differences observed in the statistical results.
Fig. 3 presents a comparative ENA network model of reflection content for the three groups using different reflection methods. In this model, nodes represent individual reflection codes, and edges indicate the co-occurrence of codes within each unit of analysis. Blue, red, and purple dots denote the centroids of students in G1, G2, and G3, respectively, while the four black dots represent the four categories of reflection content (DESR, EXPR, CRIR, CONR). ENA applies singular value decomposition (SVD) to reduce the network model to two dimensions, which together account for 70.1 % of the variance (SVD1 = 51.5 %, SVD2 = 18.6 %). The x-axis in the ENA space (SVD1) defines the dimension of reflection content, with the right side (higher x-values) representing DESR codes and the left side (lower x-values) representing CONR codes. The y-axis (SVD2) also defines a dimension of reflection content, where the CRIR and EXPR codes are positioned higher (with higher y-values) and the DESR code is located lower (with lower y-values). This model allows comparison across students and groups, showing which types of reflection are more dominant and how reflection content patterns differ between groups.
The right side of Fig. 3 displays the mean networks of the 3 groups. Overall, the reflection content of all 3 groups predominantly features EXPR and DESR, with a strong association observed between these two points. The reflection content network of G1 is the sparsest, with only a few occurrences of CRIR, aside from the relatively frequent appearances of EXPR and DESR. The network of G2 is more concentrated, with distribution across all 4 reflection types and a stronger CRIR-DESR connection (value of 0.10). The reflection content of G3 is the most densely connected, with all 4 types having a relatively high proportion of representation. The CRIR-CONR (0.23) and CONR-EXPR (0.13) connections are relatively strong. In contrast, the other pairs based on traditional SSRL reflection did not exhibit strong correlations.
Table 6 demonstrates how the AI agent, through guided dialogue, facilitated the transition of G3 students from connective reflection (CONR) to critical reflection (CRIR), thereby deepening the SSRL
reflection process. Under the guidance of the AI agent, student 9 and student 31 shifted from describing the current state of teamwork and time management, such as “We solved problems through communication with team members” (CONR), to deeper reflections on self-improvement and future learning plans, exemplified by “I hope for more team projects to utilize everyone's potential” (CRIR). Prompted by the AI agent's questioning, student 10 and student 36 reflected on the shortcomings of the SSRL tasks, noting that “The resources were not comprehensive, and most content lacked innovation” (CONR), and further analyzed the root causes of these issues, along with potential improvement measures (CRIR). Inspired by the AI agent, student 18 first identified the issue of inadequate presentation in the task (CRIR) and then proposed a concrete plan for deeper learning of AI technology (CONR). The AI agent's continuous questioning and feedback prompted students to progress from simple descriptive reflection to more critical and in-depth reflection throughout the reflection process.

Fig. 3. Comparison of reflection content.

Table 6
Transcribed example of the G3 reflection content.
Student  Transcript
1  AI agent: What problems did you encounter, and how did you solve them?
   Student 9: We initially struggled, but by discussing with our team leader, we solved the problems and saw how teamwork improved our process over time. (CONR)
   AI agent: What are your plans moving forward?
   Student 9: I hope for more team projects to use everyone's strengths and improve how I contribute next time. (CRIR)
2  AI agent: What were the shortcomings of this micro-lecture task?
   Student 18: Our PPT didn't cover everything, so next time I'll make it more complete. (CRIR)
   AI agent: I see, that's a very insightful analysis. What plans do you have for your future studies?
   Student 18: I plan to learn more about AI video production to improve my skills for future jobs. (CONR)
3  AI agent: Did you encounter any difficulties and challenges while completing the task? If so, what were they, and how did you solve them?
   Student 31: Our biggest challenge was managing time, but adjusting our schedule helped us complete the tasks. (CONR)
   AI agent: Great. How did you overcome these difficulties and challenges?
   Student 31: We held weekly meetings to discuss. (DESR)
4  AI agent: Please summarize the successful aspects of this task.
   Student 36: The tasks were well-organized, and because our team cooperated closely, we were able to complete the work more efficiently than at the beginning. (CONR)

Table 7
Transcript example of the G1 and G2 reflection content.
Group  Transcript
G1  G1-Student 4: Our group chose a radio show format for this Himalaya assignment. (DESR) I've always been a fan of radio shows, so I was very happy to have the opportunity to create one this time. (DESR) Of course, I also faced some challenges during the production process (DESR), such as the tone not fitting the storyline and the quality of the program needing to be better. (EXPR)
G1  G1-Student 30: Regarding this task, firstly, we didn't do well in the presentation aspect. The presentation was only in the form of a document, which needed to ensure a smooth connection between the presentation and the work, making it difficult to access the content. (CONR) Secondly, the content presentation was poorly executed and lacked a logical structure. (EXPR) Finally, the speech was not coherent during the presentation, and the preparation was insufficient. (EXPR)
G2  G2-Student 6:
    Task: We approached the task mainly in two aspects. (DESR) The first part determined the theme and type of work, and the second part recorded the work. (DESR)
    Division of labor: Our division of labor and cooperation were very reasonable, and each member completed their assigned tasks. (EXPR)
    Self-evaluation: Very successful. (DESR)
    Outlook: We plan to work more collaboratively on each task and strive to do our best. (CRIR)
G2  G2-Student 27:
    Task: This task enhanced our understanding of content production and strengthened the collaboration among team members. (EXPR)
    Division of labor: Our team had a clear division of responsibilities, and everyone had their tasks (EXPR). I was responsible for the recording, which was quite challenging. (EXPR)
    Self-evaluation: Although our team may not have been the best among all the teams, we had unique messages to convey. (CONR) If there is a next time, we will strive to improve it. (CRIR)
    Outlook: We should promote our work more effectively. (CRIR)

Table 7 presents reflection examples from some G1 and G2 students, highlighting the impact of different reflection forms and guidance
methods on students' reflection quality. Two G1 students (student 4 and student 30) conducted their reflections in the form of reports. Due to the lack of specific guidance from the instructor, who only provided general requirements, their reflections remained superficial, primarily involving DESR and EXPR. For example, student 4 wrote, “I have always enjoyed radio shows, so I was very pleased to have the opportunity to create one this time.” Student 30 mentioned, “The tone did not match the storyline, and the sound quality of the program was poor.” These reflections remain limited to mere descriptions of the phenomena, lacking in-depth analysis of the underlying causes and offering no insights for future improvement. This tendency may be related to the relatively broad scope of the reports. These examples demonstrate that structured guidance exerts a positive effect on the quality of reflection. In addition, they highlight the importance of timely feedback and question prompting. Providing students with immediate feedback based on their responses and guiding them toward more elaborated answers contributes to fostering deeper levels of reflection.
In contrast, two students from Group G2 (student 6 and student 27), guided by the 4 aspects provided by the instructor and reflecting through short-answer questions, demonstrated a higher reflection quality. The instructor guided students to reflect on four dimensions, including task, division of labor, self-evaluation, and outlook. This approach, particularly in the latter two areas, effectively fostered CRIR and CONR. For example, student 6 mentioned, “We plan to collaborate more effectively in completing each future learning task, striving to achieve the best outcome” (CRIR). At the same time, student 27 stated, “Although our team may not be the best among all teams, we conveyed our unique message. If there is a next time, we will work harder to improve” (CONR and CRIR). This structured guidance enhanced the depth of reflection. However, since short-answer questions are a one-way form of reflection for students, the instructor cannot intervene in their responses. As a result, there may be instances where students provide irrelevant answers or overly brief responses, which can affect the overall reflection quality. For instance, student 6 responded with “Very successful” in the self-evaluation section (DESR), which lacked depth in reflection. The AI agent could address this shortcoming by facilitating continuous interaction and feedback, encouraging students to engage in deeper reflection.
When comparing the effectiveness of the reflection methods in G1, G2, and G3, G1's reflection reports were of lower quality, primarily focusing on DESR and EXPR. Due to the absence of specific guidance, the reflections lacked depth. The short-answer questions format in G2 improved reflection quality to some extent. Students' reflections became more focused with the instructor's guidance, particularly improving CRIR and CONR. However, this approach is still constrained by the limitations of outcome-based assessment. The AI agent guidance in G3 further enhanced reflection quality. Through real-time feedback and targeted questioning, students could engage in deeper levels of CRIR and CONR.
To quantify these differences, the Mann-Whitney U test was employed to evaluate the distribution of the projection points of the 3 groups of students within the ENA space. The results indicated that at the α = 0.05 significance level, G1 and G2 showed significant differences in both the first dimension (U = 147,537, p = 0.01, r = 0.09) and the second dimension (U = 147,204, p = 0.01, r = 0.08). This suggests that the structured guidance provided by short-answer questions enhances reflection quality. G1 and G3 also showed a significant difference in the first dimension (U = 99,595.5, p = 0.00, r = 0.34), highlighting the impact of integrating the AI agent in G3 to enhance reflection quality. However, no difference was observed in the second dimension (U = 147,049.5, p = 0.42, r = 0.03). Additionally, G2 and G3 exhibited differences in both the first dimension (U = 127,246.5, p = 0.00, r = 0.36) and the second dimension (U = 215,386.5, p = 0.01, r = 0.08), further demonstrating the effectiveness of the AI agent in fostering deeper reflection. This effect surpasses that of the structured short-answer questions approach alone. Notably, due to the large sample size in this study, the U values are relatively high; however, they remain within the acceptable range for statistical analysis. Some of these differences showed relatively small effect sizes, which will be further addressed in the discussion section.

4.2. What differences do high and low-performance teams show in reflection quality when using the three reflection methods?

Fig. 4 illustrates the distribution of students from the 3 reflection methods (G1, G2, G3) along the two principal component axes (SVD1 and SVD2). The points of different colors and shapes in the figure represent high and low-performance teams within each group, indicating their performance across various reflection categories, such as DESR, EXPR, CONR, and CRIR. The SVD1 axis accounts for 77.3 % of the total variance, while the SVD2 axis explains 16.8 %. The position of each point represents the students' tendencies in reflection content, with points closer to a specific reflection category indicating that the group's performance is more concentrated in that category.

Fig. 4. The centroid distribution of high and low group students across the three reflection methods.

In Fig. 4, the centroids of the low-performance teams in G1 and G2 are positioned relatively close to each other, with the low-performance teams located higher, near DESR. Conversely, the high-performance teams are situated lower, closer to CRIR. This indicates a certain degree of similarity in the reflection content between the low-performance teams in G1 and G2. G3 is distributed on the right side of the figure, with a greater distance between the high and low-performance teams, indicating a more pronounced difference in reflection content than in the other groups. Unlike G1 and G2, the G3 high-performance teams are positioned at the top, closer to CONR, while the low-performance teams are located at the bottom, near CRIR and EXPR. This suggests that the high-performance teams in G3 tend to engage more in connective reflection, whereas the low-performance teams focus more on critical and explanatory reflection.
The study employed the Mann-Whitney U test to further elucidate the magnitude of the differences in reflection content between the high and low-performance teams across the 3 cohorts (Table 8). According to the results of the Mann-Whitney U test, there are differences in the reflection content performance between the high and
Table 8
The reflection content distribution of high and low-performance teams across the three methods.
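The U statistics and effect sizes r reported in Table 8 and Section 4.2 follow the convention r = |Z| / sqrt(N), with Z from the normal approximation to the Mann-Whitney U distribution. A minimal sketch of that computation, using hypothetical ENA coordinates rather than the study's data:

```python
import math

def mann_whitney_u_r(x, y):
    """Mann-Whitney U via average ranks, with the normal-approximation
    Z statistic and the effect size r = |Z| / sqrt(N)."""
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)
    rank = {}
    i = 0
    while i < len(pooled):            # assign average ranks over tied runs
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    u = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, abs(z) / math.sqrt(n1 + n2)

# Hypothetical SVD1 coordinates for a high- and a low-performance team:
u, r = mann_whitney_u_r([0.2, 0.5, 0.3, 0.7], [-0.1, 0.0, 0.1, 0.4])
print(u, round(r, 2))  # U = 14.0, r = 0.61
```

Because U grows with n1 * n2, large samples produce large U values even for modest effects, which is why the paper reads the comparisons through r (roughly, r near 0.1 is a small effect and near 0.3 a medium one under Cohen's benchmarks) rather than through U alone.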
low-performance teams across different reflection approaches. In G1, the high and low-performance teams did not exhibit significant differences in either dimension (MR1: U = 4932.00, p = 0.41, r = 0.05; MR2: U = 5463.00, p = 0.44, r = 0.05). In G2, the high and low-performance teams showed a significant difference in the MR1 dimension (U = 3303.00, p = 0.03, r = 0.19) but no difference in the MR2 dimension (U = 3051.00, p = 0.26, r = 0.10). For G3 (students using AI agent-driven continuous questioning), the high and low-performance teams showed a significant difference in the MR1 dimension (U = 1136.50, p < 0.001, r = 0.45). In contrast, the difference in the MR2 dimension was insignificant (U = 2187.50, p = 0.54, r = 0.06).
In G3, the differences between the high and low-performance teams were the most pronounced, particularly on the MR1 dimension. Further analysis of the ENA diagram revealed that low-performance teams exhibited stronger connections in EXPR-CRIR (0.46) and EXPR-CONR (0.61). This suggests that the AI agent-driven reflection method may help low-performance teams focus more on specific reflection content.

5. Discussion

This section analyzes the findings based on the research questions. It covers the positive impact of AI agents on students' SSRL reflection, differences in reflection quality between high and low-performance teams, and key considerations for using AI agents effectively in SSRL.

5.1. The positive role of AI agents in students' SSRL reflection

In SSRL, the AI agent reflection assistant enhanced the quality of students' reflections. This outcome aligns with previous research [70,71]. For instance, Maedche et al. [70] demonstrated the positive role of AI agents in fostering deeper reflection among students. Sigman et al. [71] also found that AI assistants emulate and augment human cognition, thereby promoting reflection. These studies provide more evidence of the positive impact AI agents have on facilitating reflective practices in education.
This study further clarifies how AI agents enhance the quality of student reflection in the SSRL process through ENA. In these activities, student reflections guided by AI agents exhibited higher levels of critical thinking and coherence. In contrast, the other two traditional reflective texts displayed lower levels of reflection, focusing primarily on descriptive and exploratory reflection. As Rusandi et al. [72] highlighted, AI may assist learners in constructing their learning processes, thereby enhancing critical thinking. In higher education, Xia and Li [73] also suggested that AI assistants have a positive impact on students' imagination, creativity, critical thinking, and autonomous learning. Zang et al. [69] experimentally confirmed the role of AI agents in enhancing students' critical thinking in English learning. However, the systematic review by Mohamud et al. [74] indicated that the introduction of AI in higher education may diminish students' critical thinking. This conclusion contradicts the findings of this study. The differences may be due to a lack of proper instructional design by teachers when using AI [74]. Cronje [75] argued that AI may serve as a teaching assistant to facilitate learning, but it should be integrated with instructional design and necessary prompts. In this study, the SSRL reflection checklist was operationalized as structured prompts to calibrate the AI agent, enabling it to scaffold students' reflections across the four phases of SSRL. By embedding SSRL principles into its dialogic design, the agent acted as both a facilitator of reflection and a medium for delivering theoretical scaffolds. This underscores the importance for educators and researchers to apply instructional theory and design thoughtfully when integrating AI into the classroom.
In addition to SSRL theoretical guidance, the AI agent leveraged its technological capabilities, including continuous questioning and real-time feedback, to actively scaffold deeper student reflections. Wolfbauer et al. [76] noted that continuous dialogue with intelligent assistants enhances students' levels of reflection. In the G3 group, the AI agents not only guided students to explore the root causes of issues but also helped them develop specific improvement plans. This guiding process is similar to the “Socratic method” in educational psychology. Through a series of targeted questions, students are encouraged to engage in deep thinking and gain a more profound understanding of the knowledge [77]. In addition, the timely feedback function of AI agents plays a crucial role in enhancing the quality of students' SSRL reflections. Self-determination theory suggests that providing positive emotional support through feedback helps students gain a sense of belonging, thereby enhancing their motivation to learn and willingness to reflect [78]. Uygur et al. [79] suggested that timely feedback enhanced students' reflection and learning. However, traditional SSRL reflection reports and short-answer questions are one-way reflective activities, lacking immediate feedback and guidance. The AI agent reflection assistant compensates for the shortcomings of teachers in providing timely feedback, enhancing the effectiveness of collaborative
learning.
This study indicates that the level of reflection guidance directly affects learners' reflection quality, which is consistent with previous research [80-82]. G1, with minimal guidance, showed the lowest quality, while G2, guided by the SSRL reflection checklist, exhibited higher-quality reflections, demonstrating the importance of SSRL scaffolds. G3 combined SSRL scaffolding with real-time feedback and encouragement for deeper reflection. Comparisons suggest that while structured short-answer questions had a limited impact, the AI agent provided a practically meaningful enhancement of students' reflective practices. However, these findings are based primarily on qualitative data, and further quantitative research is needed to validate them.
In summary, AI agents play a substantial role in promoting student reflection. The comparison between structured short-answer questions and traditional reflective reports showed statistically significant but very small effects, suggesting that short-answer questions alone had a limited impact on enhancing students' reflection quality. In contrast, the AI agent had a substantially greater impact on students' reflective practices. It is essential for educators and instructional designers to integrate AI agents into classrooms and develop more instructional design case studies. Moreover, teachers should prioritize the importance of instructional theories and provide essential design guidance when applying AI agents.

5.2. Differences between high and low-performance teams under various SSRL reflection methods

The results indicate a significant difference between the high and low-performance teams that utilized reflective short-answer questions and the AI agent reflection assistant. In short-answer questions, high-performance teams performed better. This aligns with the conclusions of Knight et al. [83], who found that high-performance students outperformed low-performance students in reflective questions. The disparity in reflection between high and low-performance learners is primarily attributed to their metacognitive levels and learning strategies [84-86]. For instance, Safari and Fitriati [85] found that high-performance learners were able to use all strategies equally, but low-performance learners more frequently relied on metacognitive and social strategies. These differences may impact learners' outcomes, including their learning effectiveness and reflection [84].
In contrast, the reflection quality of low-performance teams using the AI agent reflective assistant was better than that of the high-performance teams. This is a novel finding of the study, suggesting that the AI reflective assistant played a positive role in guiding low-performance learners through the reflection process. This finding aligns with previous evidence showing that AI technologies tend to provide greater benefits for lower performers [87-90]. Prior studies have suggested that such differential effects often occur because an AI chatbot can use adaptive strategies and personalized feedback to address the strategic gaps of low performers [88]. AI tutoring can also offer both cognitive and emotional support [89]. Xu et al. [90] further found that low-performing learners become more engaged when they receive immediate feedback and external help. This engagement encourages them to apply higher-order thinking strategies more actively.
These mechanisms may also explain the current results in our SSRL reflection task. The AI reflection assistant provided structured guidance in real time and reduced the cognitive load of producing reflections. This

examine how to fine-tune AI guidance so that it benefits high performers without disrupting their existing strategies.
Additionally, there was no significant difference in performance between high and low-performance student teams in reflective reports, with both showing low-quality reflections. This may be due to learners lacking clear guidance in the reflection process. Maedche et al. [70] found that in reflective environments lacking external feedback or structured guidance, the quality of students' reflections is constrained. This suggests that instructors should provide the necessary scaffolding when designing reflective tasks. The SSRL scaffolding demonstrated significant value in this study and is well-suited for broader application in collaborative settings.

5.3. Considerations for the effective use of AI agents in SSRL

Although experiments have demonstrated that AI agents enhance SSRL reflection quality, there are several limitations in their usage. To better promote the outcomes of this study, we offer considerations for teachers and instructional designers regarding the use of AI agents.
Firstly, the quality and reliability of feedback provided by AI agents still present limitations. This finding aligns with the studies of Maloney et al. [91] and Fedus et al. [92], which suggest that the accuracy and effectiveness of AI agents depend on algorithm design and data quality. In this study, the AI agent exhibited two primary issues: repeated questioning and unexpected interruptions during conversations. To address the issue of repeated questioning, adjustments to the prompt design can be implemented. For example, the prompts can specify that each question should be asked only once and repeated only if the student responds off-topic or does not answer. For unexpected interruptions, teachers need to guide students in testing their network environment and re-engaging with the task. These observations show that AI agents need improvement in handling complex contexts and dynamic learning needs.
In addition, data privacy and ethical concerns pose another challenge in the application of AI agents. AI agents require extensive data collection, including students' reflection content, behavioral patterns, and learning habits [93]. To mitigate this issue, this study incorporated an opening message in the AI agent's script. The message advised students: “Please do not disclose personal sensitive information, such as your name or school, during the interaction.” Furthermore, before implementing the AI agent, teachers need to raise students' awareness of data security and privacy protection [94].
The risks associated with over-reliance on AI technology should also be carefully evaluated. Although AI agents can provide personalized support, they cannot fully replace the role of human teachers, particularly in offering emotional support and fostering social interaction [95]. In this study, AI agents were utilized exclusively in the post-class reflection phase. The remaining instructional time relied on face-to-face interactions between teachers and students. As GAI technology becomes increasingly accessible, preventing students from developing dependency behaviors may become more challenging. Future research could explore strategies to prevent learners from becoming overly reliant on GAI technologies.
While AI agents have demonstrated advantages in enhancing students' SSRL reflection quality, their widespread applicability is constrained by feedback quality, data privacy, and ethical considerations. Future research should emphasize these limitations, refining the appli-
allowed low-performing learners to focus more on critical and creative cation framework of AI to ensure its effectiveness and sustainability in
thinking. In contrast, high-performing learners may already have the educational domain.
established reflection routines. Extra guidance could interfere with these
processes, leading to smaller gains in reflection quality [87]. 6. Conclusion, limitations, and future research
This study, therefore, not only confirms that differential effects exist
in reflection tasks but also highlights the potential of AI support to This study explores methods to enhance student reflection quality by
promote higher-order thinking in low-performing learners. In educa­ designing an AI agent that supports reflection through continuous
tional practice, this suggests that AI reflection assistants could be stra­ questioning and real-time feedback. Using content analysis and ENA,
tegically deployed to close performance gaps. Future research could this study conducted a three-semester experiment comparing reflection
10
Y. Zheng et al. Computer Standards & Interfaces 97 (2026) 104094
reports, short-answer questions, and an AI agent reflection assistant. The results indicate that AI agents improve reflection quality, particularly for low-performance teams. The study offers practical guidance for integrating AI into SSRL-based instruction.

Although this study contributes to understanding students' reflection behaviors in SSRL, several limitations remain. The first limitation arises from the study participants. Conducted within a higher education setting, this research primarily examines the effectiveness of using AI agents to facilitate reflection among university students. Only 97 students from the "Internet Thinking and Digital Self-Learning" course participated, so the findings may not be generalizable to other courses or age groups. Further research is needed to explore the potential impact and adaptability of AI agents in secondary and primary education settings [96]. Secondly, the AI agent still has limitations in the quality and reliability of feedback, which may affect the depth and quality of students' reflections. Addressing this issue relies on rapidly updating and optimizing large AI model algorithms to provide higher-quality and more targeted feedback. The third limitation is that the three reflection methods used in this experiment all fall under outcome-based reflection, overlooking the dynamic process of students' reflections at different stages of collaborative learning. Additionally, the proposed mechanisms underlying the AI agent's impact on reflection quality, particularly for low-performance teams, remain hypothetical and require further empirical validation through quantitative studies. Lastly, this study did not differentiate the specific contributions of individual design elements in the AI agent's interaction strategy (e.g., sequential questioning, encouraging feedback, simplified language). More research could adopt ablation analysis to examine how these elements independently influence students' reflective practices.

Based on the limitations identified in this study, future research could expand the study to more diverse educational contexts, including secondary and primary education, to examine the generalizability and adaptability of AI agents. Incorporating multi-modal data, such as students' facial expressions, gestures, and dialogue, may offer a more comprehensive understanding of reflective behaviors in SSRL. Improvements in AI models are needed to enhance the quality and reliability of feedback, supporting deeper and higher-quality student reflections. In addition, investigating the individual contributions of specific design elements in AI agents' interaction strategies, for example, through ablation-style comparisons, could clarify which features most effectively promote high-order reflection, particularly among low-performance teams. We therefore urge more researchers to focus on this area of study, exploring the impact of GAI on educational outcomes to better understand and harness its potential for improving educational practices.

Declaration of generative AI in the writing process

During the preparation of this work, the authors used Kimi (https://kimi.moonshot.cn/) to improve language and readability. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

CRediT authorship contribution statement

Yumin Zheng: Writing – original draft, Conceptualization. Fengjiao Tu: Investigation, Data curation. Fengfang Shu: Investigation, Data curation. Chaowang Shang: Formal analysis, Data curation. Lulu Chen: Writing – review & editing, Formal analysis. Jiang Meng: Investigation.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Chaowang Shang acknowledges the financial support from the National Natural Science Foundation of China (Grant Number: 62577035). The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. The Critical Thinking Questionnaire (CThQ)

Instructions: For each statement below, please indicate how much you agree using a 5-point Likert scale (1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree).

1. After reading a text, I check important information, even if it seems to be true.
2. I like combining information from different texts.
3. I am willing to share newly acquired information.
4. In-depth analyses of reality are a waste of time.
5. After reading a text, I can recall important points.
6. The same content can be expressed in many different ways.
7. I can understand texts from various fields.
8. I form my impressions based on various pieces of information that I combine.
9. Everything already exists, so nothing completely new can be created.
10. When I talk, I give many examples.
11. In discussions, I care about justifying my stance while understanding the other party.
12. I like finding connections between seemingly different phenomena.
13. I can see the structure of a text, and I could reorganize it.
14. When discussing, I try to use practical examples to justify my stance.
15. If necessary, I can recall information I have read before.
16. I do not remember much of what I learned at school.
17. When I am interested in some information, I try to verify whether it is true.
18. I can extract the most relevant parts of a text.
19. To evaluate information, I check multiple sources.
20. I like discussing new interpretations of texts I already know.
21. I like to collate different opinions and compare them.
22. I have difficulties with paraphrasing.
23. I try to apply the information I have learned in everyday life.
24. When I read, I look for relationships between its information and other texts I have read.
25. I pay attention to the contexts, nuances, and overtones of statements.

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

References

[1] S. Ahmad, M. Rahmat, M. Mubarik, M. Alam, S. Hyder, Artificial intelligence and its role in education, Sustainability 13 (22) (2021) 12902.
[2] X. Gong, Z. Li, A. Qiao, Impact of generative AI dialogic feedback on different stages of programming problem solving, Educ. Inf. Technol. 30 (7) (2025) 9689–9709.
[3] O. Tapalova, N. Zhiyenbayeva, D. Gura, Artificial intelligence in education: AIEd for personalised learning pathways, Electron. J. e-Learn. 20 (5) (2022) 639–653.
[4] S. Järvelä, P. Kirschner, E. Panadero, J. Malmberg, C. Phielix, J. Jaspers, M. Koivuniemi, H. Järvenoja, Enhancing socially shared regulation in collaborative learning groups: designing for CSCL regulation tools, Educ. Technol. Res. Dev. 63 (2014) 125–142.
[5] D. Bransen, M.J.B. Govaerts, E. Panadero, et al., Putting self-regulated learning in context: integrating self-, co-, and socially shared regulation of learning, Med. Educ. 56 (1) (2022) 29–36.
[6] E. Eshuis, J. Vrugte, A. Anjewierden, L. Bollen, J. Sikken, T. Jong, Improving the quality of vocational students' collaboration and knowledge acquisition through instruction and joint reflection, Int. J. Comput.-Support. Collab. Learn. 14 (2019) 53–76.
[7] C. Chan, K. Lee, Reflection literacy: a multilevel perspective on the challenges of using reflections in higher education through a comprehensive literature review, Educ. Res. Rev. 32 (2020) 100376.
[8] L. Guo, How should reflection be supported in higher education? A meta-analysis of reflection interventions, Reflective Pract. 23 (2021) 118–146.
[9] S. Popenici, S. Kerr, Exploring the impact of artificial intelligence on teaching and learning in higher education, Res. Pract. Technol. Enhanc. Learn. 12 (1) (2017) 22.
[10] H. Kiy, A study on writing experience with ChatGPT of college students, J. Korea Converg. Soc. 14 (9) (2024) 976.
[11] K. Hanifi, O. Cetin, C. Yilmaz, On ChatGPT: perspectives from software engineering students, in: Proc. 2023 IEEE 23rd Int. Conf. Softw. Qual. Reliab. Secur. (QRS), 2023, pp. 196–205.
[12] Zhiheng Xi, et al., The rise and potential of large language model based agents: a survey, Sci. China Inf. Sci. 68 (2) (2025) 121101.
[13] E. Katsarou, F. Wild, A. Sougari, P. Chatzipanagiotou, A systematic review of voice-based intelligent virtual agents in EFL education, Int. J. Emerg. Technol. Learn. (iJET) 18 (10) (2023) 65–85.
[14] P.R. Lewis, Ş. Sarkadi, Reflective artificial intelligence, Minds Mach. 34 (2) (2024) 14.
[15] Z. Xu, P. Zhang, M. Tu, M. Zhang, Y. Lai, Brain optimization with additional study time: potential brain differences between high- and low-performance college students, Front. Psychol. 14 (2023) 1209881.
[16] UK Government, Generative Artificial Intelligence (AI) in Education, GOV.UK, 2023. https://www.gov.uk/government/publications/generative-artificial-intelligence-in-education/generative-artificial-intelligence-ai-in-education.
[17] M. Dogan, T. Dogan, A. Bozkurt, The use of artificial intelligence (AI) in online learning and distance education processes: a systematic review of empirical studies, Appl. Sci. 13 (5) (2023) 3056.
[18] L. Shi, The integration of advanced AI-enabled emotion detection and adaptive learning systems for improved emotional regulation, J. Educ. Comput. Res. 63 (2024) 173–201.
[19] B. Tang, J. Liang, W. Hu, H. Luo, Enhancing programming performance, learning interest, and self-efficacy: the role of large language models in middle school education, Systems 30 (6) (2025) 8109–8138.
[20] L. Feng, Investigating the effects of artificial intelligence-assisted language learning strategies on cognitive load and learning outcomes: a comparative study, J. Educ. Comput. Res. 62 (8) (2025) 1741–1774.
[21] Q. Huang, W. Li, Y. Zhao, Enhancing deep learning and motivation in university English education through AI technology: a quasi-experimental study, Asian J. Educ. Soc. Stud. 51 (4) (2025) 452–463.
[22] Ó. Cuéllar, M. Contero, M. Hincapié, Personalized and timely feedback in online education: enhancing learning with deep learning and large language models, MTI 9 (5) (2025) 45.
[23] X. Zhou, D. Teng, H. Al-Samarraie, The mediating role of generative AI self-regulation on students' critical thinking and problem-solving, Educ. Sci. 14 (12) (2024) 1302.
[24] S. Steenbergen-Hu, H. Cooper, A meta-analysis of the effectiveness of intelligent tutoring systems on college students' academic learning, J. Educ. Psychol. 106 (2014) 331–347.
[25] C. Moridis, A. Economides, Affective learning: empathetic agents with emotional facial and tone of voice expressions, IEEE Trans. Affect. Comput. 3 (2012) 260–272.
[26] S. Nelekar, A. Abdulrahman, M. Gupta, D. Richards, Effectiveness of embodied conversational agents for managing academic stress at an Indian university (ARU) during COVID-19, Br. J. Educ. Technol. 53 (2021) 491–511.
[27] W. Sun, Q. Chen, The design, implementation, and evaluation of gamified immersive virtual reality (IVR) for learning: a review of empirical studies, Proc. Eur. Conf. Games-Based Learn. 17 (1) (2023) 789–797.
[28] M. Chen, L. Wu, Z. Liu, X. Ma, The impact of metacognitive strategy-supported intelligent agents on the quality of collaborative learning from the perspective of the community of inquiry, in: Proc. 2024 4th Int. Conf. Educ. Technol. (ICET), 2024, pp. 11–17.
[29] H. Hong, C. Viriyavejakul, P. Vate-U-Lan, Enhancing critical thinking skills: exploring generative AI-enabled cognitive offload instruction in English essay writing, J. Ecohumanism 4 (2024), Transnational Press, London.
[30] D.H. Schunk, B.J. Zimmerman, Motivation and Self-Regulated Learning: Theory, Research, and Applications, Routledge, 2012.
[31] P.H. Winne, A.F. Hadwin, N.E. Perry, Metacognition and computer-supported collaborative learning, in: The International Handbook of Collaborative Learning, Routledge, 2013, pp. 462–479.
[32] Y. Su, Y. Li, H. Hu, et al., Exploring college English language learners' self and social regulation of learning during wiki-supported collaborative reading activities, Int. J. Comput.-Support. Collab. Learn. 13 (2018) 35–60.
[33] F. Tu, L. Wu, Kinshuk, et al., Exploring the influence of regulated learning processes on learners' prestige in project-based learning, Educ. Inf. Technol. 30 (2) (2025) 2299–2329.
[34] S. Zhang, J. Chen, Y. Wen, H. Chen, Q. Gao, Q. Wang, Capturing regulatory patterns in online collaborative learning: a network analytic approach, Int. J. Comput.-Support. Collab. Learn. 16 (2021) 37–66.
[35] J. Zheng, W. Xing, G. Zhu, Examining sequential patterns of self- and socially shared regulation of STEM learning in a CSCL environment, Comput. Educ. 136 (2019) 34–48.
[36] E. Panadero, S. Järvelä, Socially shared regulation of learning: a review, Eur. Psychol. 20 (2015) 190–203.
[37] J. Isohätälä, H. Järvenoja, S. Järvelä, Socially shared regulation of learning and participation in social interaction in collaborative learning, Int. J. Educ. Res. 81 (2017) 11–24.
[38] J. Li, Y. Lin, M. Sun, R. Shadiev, Socially shared regulation of learning in game-based collaborative learning environments promotes algorithmic thinking, learning participation, and positive learning attitudes, Interact. Learn. Environ. 31 (2020) 1715–1726.
[39] J. Malmberg, S. Järvelä, H. Järvenoja, E. Panadero, Promoting socially shared regulation of learning in CSCL: progress of socially shared regulation among high- and low-performing groups, Comput. Hum. Behav. 52 (2015) 562–572.
[40] J. Yukawa, Co-reflection in online learning: collaborative critical thinking as narrative, Int. J. Comput.-Support. Collab. Learn. 1 (2006) 203–228.
[41] A. Głowala, M. Kołodziejski, T. Butvilas, Reflection as a basic category of a teacher's thinking and action, Multidiscip. J. Sch. Educ. 12 (1) (2023) 229–250.
[42] J. Buck, Reflecting on reflections: a case study of disappointment in student writing assignments, J. Acoust. Soc. Am. (2023) A273.
[43] N. Rahmi, C.M. Zubainur, Students' mathematical reflective thinking ability through scaffolding strategies, J. Phys.: Conf. Ser. 1460 (1) (2020) 012022.
[44] J. Dewey, Democracy in education, Elem. Sch. Teach. 4 (4) (1903) 193–204.
[45] B.J. Zimmerman, Self-regulated learning and academic achievement: an overview, Educ. Psychol. 25 (1) (1990) 3–17.
[46] D. Coulson, M. Harvey, Scaffolding student reflection for experience-based learning: a framework, Teach. High. Educ. 18 (2013) 401–413.
[47] S. Lajoie, Extending the scaffolding metaphor, Instr. Sci. 33 (2005) 541–557.
[48] E. Panadero, P.A. Kirschner, S. Järvelä, J. Malmberg, H. Järvenoja, How individual self-regulation affects group regulation and performance: a shared regulation intervention, Small Group Res. 46 (4) (2015) 431–454.
[49] E. Davis, Prompting middle school science students for productive reflection: generic and directed prompts, J. Learn. Sci. 12 (2003) 91–142.
[50] J. Hattie, H. Timperley, The power of feedback, Rev. Educ. Res. 77 (2007) 81–112.
[51] R. Ajjawi, F. Kent, J. Broadbent, J. Tai, M. Bearman, D. Boud, Feedback that works: a realist review of feedback interventions for written tasks, Stud. High. Educ. 47 (2021) 1343–1356.
[52] U. Krause, R. Stark, Reflection in example- and problem-based learning: effects of reflection prompts, feedback, and cooperative learning, Eval. Res. Educ. 23 (2010) 255–272.
[53] J. Contreras, S. Edwards-Maddox, A. Hall, M. Lee, Effects of reflective practice on baccalaureate nursing students' stress, anxiety, and competency: an integrative review, Worldviews Evid.-Based Nurs. 17 (3) (2020) 239–245.
[54] H. Gadsby, Fostering reflective practice in Post Graduate Certificate in Education students through reflective journals. Developing a typology for reflection, Reflective Pract. 23 (2022) 357–368.
[55] S. Rabu, N. Badlishah, Levels of students' reflective thinking skills in a collaborative learning environment using Google Docs, TechTrends 64 (2020) 533–541.
[56] J. Stoszkowski, A. Hodgkinson, D. Collins, Using Flipgrid to improve reflection: a collaborative online approach to coach development, Phys. Educ. Sport Pedagogy 26 (2020) 167–178.
[57] E. Liesa, P. Mayoral, M. Giralt-Romeu, S. Angulo, Video-based feedback for collaborative reflection among mentors, university tutors, and students, Educ. Sci. 13 (9) (2023) 879.
[58] M. Alghasab, J. Hardman, Z. Handley, Teacher-student interaction on wikis: fostering collaborative learning and writing, Learn. Cult. Soc. Interact. 21 (2019) 10–20.
[59] R. Gubareva, R. Lopes, Virtual assistants for learning: a systematic literature review, CSEDU (1) (2020) 97–103.
[60] L. González, H. Neyem, I. Contreras-McKay, D. Molina, Improving learning experiences in software engineering capstone courses using artificial intelligence virtual assistants, Comput. Appl. Eng. Educ. 30 (2022) 1370–1389.
[61] B. Renner, G. Wesiak, V. Pammer-Schindler, M. Prilla, L. Müller, D. Morosini, S. Mora, N. Faltin, U. Cress, Computer-supported reflective learning: how apps can foster reflection at work, Behav. Inf. Technol. 39 (2019) 167–187.
[62] A. Freiberg-Hoffmann, A. Romero-Medina, B. López-Fernández, M. Fernández-Liporace, Learning approaches: cross-cultural differences (Spain-Argentina) and academic achievement in college students, Span. J. Psychol. 26 (2023) e16.
[63] A. Kobylarek, K. Błaszczyński, L. Ślósarz, M. Madej, Critical Thinking Questionnaire (CThQ): construction and application of a critical thinking test tool, Andragogy Adult Educ. Soc. Mark. 2 (2) (2022) 1–11.
[64] J. Dewey, An analysis of reflective thought, J. Philos. (1922) 29–38.
[65] D.T. Campbell, J.C. Stanley, Experimental and Quasi-Experimental Designs for Research, Ravenio Books, 2015.
[66] M.M. Plack, M. Driscoll, S. Blissett, R. McKenna, T.P. Plack, A method for assessing reflective journal writing, J. Allied Health 34 (4) (2005) 199–208.
[67] L. Wang, G. Wu, J. Wu, A study on the reflective level of teachers' autobiography, Global Education Outlook (01) (2018) 93–105.
[68] H.T. Hou, Integrating cluster and sequential analysis to explore learners' flow and behavioral patterns in a simulation game with a situated-learning context for science courses: a video-based process exploration, Comput. Human Behav. 48 (2015) 424–435.
[69] G. Zang, M. Liu, B. Yu, The application of 5G and artificial intelligence technology in the innovation and reform of college English education, Comput. Intell. Neurosci. 2022 (1) (2022) 9008270.
[70] A. Maedche, C. Legner, A. Benlian, B. Berger, H. Gimpel, T. Hess, O. Hinz, S. Morana, M. Söllner, AI-based digital assistants, Bus. Inf. Syst. Eng. 61 (2019) 535–544.
[71] M. Sigman, D. Slezak, L. Drucaroff, S. Ribeiro, F. Carrillo, Artificial and human intelligence in mental health, AI Mag. 42 (2021) 39–46.
[72] M.A. Rusandi, I. Saripah, D.M. Khairun, No worries with ChatGPT: building bridges between artificial intelligence and education with critical thinking soft skills, J. Public Health 45 (3) (2023) e602–e603.
[73] X. Xia, X. Li, Artificial intelligence for higher education development and teaching skills, Wirel. Commun. Mob. Comput. 2022 (1) (2022) 7614337.
[74] Y. Mohamud, A. Marof, A. Mohamed, M. Uzir, A narrative review on the impact of applied artificial intelligence tools on higher secondary students, Int. J. Acad. Res. Bus. Soc. Sci. 13 (14) (2023) 34–42.
[75] J. Cronje, Exploring the role of ChatGPT as a peer coach for developing research proposals: feedback quality, prompts, and student reflection, Electron. J. e-Learn. 22 (2) (2024).
[76] I. Wolfbauer, V. Pammer-Schindler, K. Maitz, C. Rosé, A script for conversational reflection guidance: a field study on developing reflection competence with apprentices, IEEE Trans. Learn. Technol. 15 (2022) 554–566.
[77] F. Leigh, Platonic dialogue, maieutic method, and critical thinking, J. Philos. Educ. 41 (2008) 309–323.
[78] E. Deci, R. Ryan, Intrinsic Motivation and Self-Determination in Human Behavior, 1975, pp. 13–71.
[79] J. Uygur, E. Stuart, M. Paor, E. Wallace, S. Duffy, M. OShea, S. Smith, T. Pawlikowska, The Best Evidence in Medical Education systematic review to determine the most effective teaching methods that develop reflection in medical students: BEME Guide No. 51, Med. Teach. 41 (2019) 3–16.
[80] K. Arendt, L. Stark, A. Friedrich, R. Brünken, R. Stark, Quality of reflections on teaching: approaches to its measurement and low-threshold promotion, Educ. Sci. 15 (7) (2025) 884.
[81] J. Jung, Y. Lu, A. Ding, How do prompts shape preservice teachers' reflections? A case study in an online technology integration class, J. Teach. Educ. 73 (3) (2021) 301–313.
[82] A. Sturgill, P. Motley, Methods of reflection about service learning: guided vs. free, dialogic vs. expressive, and public vs. private, Teaching and Learning Inquiry: ISSOTL J. 2 (1) (2014) 81–93.
[83] J. Knight, D. Weaver, M. Peffer, Z. Hazlett, Relationships between prediction accuracy, metacognitive reflection, and performance in introductory genetics students, CBE Life Sci. Educ. 21 (3) (2022) ar45.
[84] D. Difrancesca, J. Nietfeld, L. Cao, A comparison of high and low achieving students on self-regulated learning variables, Learn. Individ. Differ. 45 (2016) 228–236.
[85] S.A. Gani, D. Fajrina, R. Hanifa, Students' learning strategies for developing speaking ability, Stud. Engl. Lang. Educ. 2 (1) (2015) 16–28.
[86] M. Yip, Differences between high and low academic achieving university students in learning and study strategies: a further investigation, Educ. Res. Eval. 15 (2009) 561–570.
[87] H.K. Etkin, K.J. Etkin, R.J. Carter, C.E. Rolle, Differential effects of GPT-based tools on comprehension of standardized passages, Front. Educ. 10 (2025) 1506752.
[88] S. Ruan, A. Nie, W. Steenbergen, J. He, J.Q. Zhang, M. Guo, et al., A reinforcement learning tutor better supported lower performers in a math task, Mach. Learn. 113 (2024) 3023–3048.
[89] D.R. Thomas, J. Lin, E. Gatz, A. Gurung, S. Gupta, K. Norberg, et al., Improving student learning with hybrid human-AI tutoring: a three-study quasi-experimental investigation, in: Proc. 14th Learn. Anal. Knowl. Conf. (LAK 24), Association for Computing Machinery, New York, NY, USA, 2024, pp. 404–415.
[90] Y. Xu, J. Zhu, M. Wang, et al., The impact of a digital game-based AI chatbot on students' academic performance, higher-order thinking, and behavioral patterns in an information technology curriculum, Appl. Sci. 14 (15) (2024) 6418.
[91] A. Maloney, D.A. Roberts, J. Sully, A solvable model of neural scaling laws, arXiv preprint arXiv:2210.16859, 2022.
[92] W. Fedus, B. Zoph, N. Shazeer, Switch transformers: scaling to trillion parameter models with simple and efficient sparsity, J. Mach. Learn. Res. 23 (120) (2022) 1–39.
[93] K. Seo, J. Tang, I. Roll, S. Fels, D. Yoon, The impact of artificial intelligence on learner-instructor interaction in online learning, Int. J. Educ. Technol. High. Educ. 18 (1) (2021) 54.
[94] B. Klimova, M. Pikhart, J. Kacetl, Ethical issues of the use of AI-driven mobile apps for education, Front. Public Health 10 (2023) 1118116.
[95] T. Adiguzel, M. Kaya, F. Cansu, Revolutionizing education with AI: exploring the transformative potential of ChatGPT, Contemp. Educ. Technol. 15 (3) (2023).
[96] M. Thottoli, B. Alruqaishi, A. Soosaimanickam, Robo academic advisor: can chatbots and artificial intelligence replace human interaction? Contemp. Educ. Technol. 16 (1) (2024) ep485.

@@ -0,0 +1,883 @@
Computer Standards & Interfaces 97 (2026) 104107

Contents lists available at ScienceDirect

Computer Standards & Interfaces

journal homepage: www.elsevier.com/locate/csi

MExpm: Fair computation offloading for batch modular exponentiation with improved privacy and checkability in IoV

Sipeng Shen 1, Qiang Wang *,1, Fucai Zhou, Jian Xu, Mingxing Jin

Software College, Northeastern University, China
ARTICLE INFO

Keywords:
Internet of Vehicles
Modular exponentiation
Computation offloading
Smart contract

ABSTRACT

Modular exponentiation is a fundamental cryptographic operation extensively applied in the Internet of Vehicles (IoV). However, its computational intensity imposes significant resource and time demands on intelligent vehicles. Offloading such computations to Mobile Edge Computing (MEC) servers has emerged as a promising approach. Nonetheless, existing schemes are generally impractical, as they either fail to ensure fairness between intelligent vehicles and MEC servers, lack privacy protection for the bases and exponents, or cannot guarantee the correctness of results with overwhelming probability due to potential misbehavior by MEC servers. To address these limitations, we propose MExpm, a fair and efficient computation offloading scheme for batch modular exponentiation under a single untrusted server model. Our scheme leverages blockchain technology to ensure fairness through publicly verifiable results. Furthermore, MExpm achieves high checkability, offering a near-perfect probability of checkability. To enhance privacy, we introduce secure obfuscation and logical split techniques, effectively protecting both the bases and the exponents. Extensive theoretical analysis and experimental results demonstrate that our scheme is not only efficient in terms of computation, communication, and storage overheads but also significantly improves privacy protection and checkability.
1. Introduction

1.1. Motivation

Batch modular exponentiation, a fundamental mathematical operation denoted as ∏_{i=1}^{n} u_i^{a_i} mod N, is widely used in the Internet of Vehicles (IoV) (e.g., key exchange, digital signatures, and identity authentication) and is regarded as one of the most resource-intensive operations. Considering the limited computation resources of intelligent vehicles, locally executing the above task is unviable, as it cannot meet both the computation-resource and time-latency requirements [1]. To tackle this challenge, computation offloading (CO) has been proposed to undertake resource-intensive computation tasks for intelligent vehicles [2]. However, the current cloud computation paradigm for modular exponentiation offloading in [3-8] fails to meet the requirements of low latency, location awareness, and mobility support [9]: since cloud servers are far from the vehicles, network transfer latency becomes a challenge. To overcome the limitations of cloud computation, offloading computational tasks from intelligent vehicles to MEC servers, which are closer to intelligent vehicles than cloud servers, can provide adequate computation resources for offloaded tasks while meeting the latency requirements of intelligent vehicles [10,11]. Despite these benefits, this approach still suffers from some security challenges. Once the computation tasks are offloaded, the vehicle loses control over them. As a result, the MEC server may forge the outcome of the computation. To address this issue, verifiable CO was first proposed by [12] to ensure the integrity of the results. A fundamental requirement for verifiable CO is that the total time invested in the verification process should be less than the time spent performing the computation locally. Otherwise, the intelligent vehicle would not prefer to offload its computation.

1.2. Limitations of prior art

In this paper, we mainly focus on verifiable computation offloading for batch modular exponentiation with MEC servers. However, to the best of our knowledge, none of the existing prior schemes is practical enough, as demonstrated in Fig. 1. They suffer from the following challenges.

Fairness. Most verifiable CO schemes for batch modular exponentiation make sure the results are correct for the client before paying but often disregard the cloud's interests. As a result, the client might
Corresponding author.
E-mail address: wangqiang1@mail.neu.edu.cn (Q. Wang).
1
Equal contribution.
https://doi.org/10.1016/j.csi.2025.104107
Received 3 June 2025; Received in revised form 12 August 2025; Accepted 28 November 2025
Available online 3 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Fig. 1. Limitations of Prior Art and Our Defenses. Scenario 1: Previous works adopt a private verification algorithm. Under this assumption, the greedy intelligent vehicle may reject the correct computation results and refuse to pay for the MEC server's work. Scenario 2: Previous works with low checkability fail to detect the MEC server's misbehavior. Scenario 3: Previous works with a plaintext offloading strategy fail to protect the confidentiality of inputs and outputs.
refuse to pay by deliberately claiming that the MEC server returned an incorrect result even when it executed faithfully. Furthermore, the cloud may intentionally manipulate the computation outcome for economic incentives. When a dispute occurs between them, a fully trusted third party (TTP), such as a judge, has to be involved to deduce which party is wrong. As an ex-post measure, the dispute can eventually be handled, but it is unfriendly for time-sensitive IoV applications [13]. Therefore, it is essential to find an immediate resolution without a TTP to guarantee fairness between the MEC server and the intelligent vehicle. Due to its transparency, accountability, and immutability, blockchain can be used to establish trust among untrusted parties. A naive solution is to delegate the entire computation to the blockchain, but this is inefficient and imposes financial burdens on intelligent vehicles, such as significant gas fees for modular exponentiation in Ethereum. Besides, this approach seriously deviates from the original intent of computation offloading.

Checkability Rate. The existing schemes employ verification mechanisms to ensure computation correctness against malicious MEC servers [14]. However, the achieved checkability rate often falls short of expectations, failing to reach 100%. For example, in [3,4], the checkability rate is only 97.5% when the batch size n = 1000. In other words, intelligent vehicles may fail to detect misbehavior by malicious MEC servers with a 2.5% probability. Besides, the intelligent vehicle may make misjudgments even if the MEC servers return correct results. When disputes arise between the intelligent vehicle and the MEC servers, as previously mentioned, a complex procedure involving a TTP is imperative. This ex-post measure is valid, but it is unsuitable for time-sensitive IoV applications. Furthermore, there is no such fully trusted entity in the real world.

Privacy. Most of the existing schemes offload the modular exponentiation ∏_{i=1}^{n} u_i^{a_i} mod N in a plaintext way [6,7]. If we directly apply them to IoV, this inevitably raises privacy concerns. Modular exponentiation plays a critical role in secure cryptographic algorithms (i.e., key exchange, digital signatures, identity authentication). In this case, the MEC server knows the base u_i, the exponent a_i, and the output u^a (mod N), which increases the risks of privacy leakage and attacks. From this point, it is essential to protect the privacy of the bases, exponents, and results. To tackle this challenge, some researchers [3-5] utilize the logical split technique to protect privacy. Their security relies on a strong assumption that the auxiliary information cannot be known by the malicious adversary. Therefore, the verification algorithm can only be executed by the data owner. For a fair computation offloading task, the verification algorithm should be public. To tackle this challenge, a straightforward approach is to process the computation using fully homomorphic encryption (FHE). Specifically, the bases u_i are encrypted using the data owner's public key and outsourced to the MEC server. The intelligent vehicle encodes the queries a_i under the same public key. To recover the final result returned by the MEC server, the private key of the data owner should be shared with the intelligent vehicle. If the private key is leaked, it will cause serious privacy issues [15,16]. Furthermore, the intelligent vehicle cannot afford this heavy computation owing to its limited resources.

Table 1
Comparison of properties.

Scheme       | Batch size | Privacy | Checkability rate                  | Verification | Fairness
MExp [3]     | 1          | ×       | 119/120                            | Private      | ×
SMCExp [4]   | 1          | ×       | 119/120                            | Private      | ×
SoRSA [6]    | 1          | ×       | 119/120                            | Private      | ×
EPExp [7]    | 1          | ×       | 0                                  | Private      | ×
MExpm (ours) | 1          | ✓       | ≈ 1                                | Public       | ✓
MExp [3]     | n          | ×       | 1 − n²/(10(4n² + 6n + 2))          | Private      | ×
SMCExp [4]   | n          | ×       | 1 − n²/(10(4n² + 6n + 2))          | Private      | ×
GExp [5]     | n          | ×       | 1/(n + 1)                          | Private      | ×
MExpm (ours) | n          | ✓       | 1 − n²/((4n² + 6n + 2)(N − 2))     | Public       | ✓

Batch Size: the number of bases in one offloading; Privacy: whether the scheme protects the privacy of bases and exponents; Checkability Rate: the checkability of the offloading scheme; Verification: the verification method for offloading; Fairness: fairness for both the service provider and intelligent vehicles; ✓: the scheme achieves this property; ×: it does not.

Compared with the existing works in Table 1, MExpm supports privacy and fairness both for service providers and intelligent vehicles. Our contributions can be summarized as follows.

1. To the best of our knowledge, we are the first to attempt fair computation offloading of batch modular exponentiation under a single untrusted server model, which is more appropriate for practical applications.
2. We integrate smart contracts into the verification process to ensure fairness and correctness. Compared with existing schemes, our approach incurs lower gas consumption.
3. We employ a logical split method and secure obfuscation techniques to conceal the bases, exponents, and modulus before offloading the computation. Consequently, MExpm achieves a near-perfect checkability rate.
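The checkability rates collected in Table 1 can be compared numerically. The sketch below is our illustration, not the paper's code: the closed form in `rate_mexp` is an assumption inferred from the 97.5% figure quoted for n = 1000, and all function names are ours.

```python
from fractions import Fraction

def rate_mexp(n):
    # MExp/SMCExp batch checkability (assumed closed form; it reproduces
    # the 97.5% figure the text quotes for n = 1000)
    return 1 - Fraction(n * n, 10 * (4 * n * n + 6 * n + 2))

def rate_gexp(n):
    # GExp: a forged result is detected only with probability 1/(n + 1)
    return Fraction(1, n + 1)

def rate_mexpm(n, N):
    # MExpm, Theorem 1: 1 - n^2 / ((4n^2 + 6n + 2)(N - 2))
    return 1 - Fraction(n * n, (4 * n * n + 6 * n + 2) * (N - 2))

print(float(rate_mexp(1000)))            # about 0.975
print(rate_gexp(1000))                   # 1/1001
print(float(rate_mexpm(1000, 2**512)))   # indistinguishable from 1.0
```

With a 512-bit N, the deception probability of MExpm is on the order of 2^(-510), which is why its rate stays essentially at 1 for any batch size.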
2. Related work

2.1. Computation offloading

Intelligent vehicles, with limited computation resources and an increasing number of in-car applications, struggle to efficiently execute computation-intensive tasks. To address the challenges faced by intelligent vehicles, computation offloading has been proposed. It transfers communication, computation, and storage tasks to MEC servers situated around intelligent vehicles [17]. Existing computation offloading schemes mainly focus on computation efficiency [17-19], resource allocation [20,21], or decision-making optimization [11] tasks. While these schemes lay the foundation for offloading computationally intensive tasks to MEC servers, they often lack adequate security considerations, leading to a gap in verifiable computation offloading for batch modular exponentiation.

Fig. 2. The architecture of the system model.
2.2. Secure outsourcing algorithms for modular exponentiation

Secure outsourcing algorithms for modular exponentiation can be categorized into single-server and dual-server models. The dual-server model assumes that there is no collusion risk between the cloud servers [22-25]. This assumption is complex to realize in real-world applications and is vulnerable to collusion attacks between servers. Therefore, we mainly consider the single-server model, which was first proposed in 2006 by Dijk et al. [26]. Recently, numerous algorithms have been proposed to improve the checkability rate [5,27,28]. In 2016, Ding et al. [3] proposed a modular exponentiation outsourcing scheme with a checkability rate close to 119/120, especially when the batch size n = 1, which is rather higher than before. Thereafter, Su et al. [4], in 2020, expanded Ding's method, optimized the logical split, and changed the modulus of the algorithm to a composite number. Recent schemes including SoRSA [6] and EPExp [7] assume that the bases in computation tasks are ciphertext and lack consideration of the security of the bases. The checkability rate of these methods is still far from 1, which can result in certain security risks. Meanwhile, many of these schemes concurrently present outsourcing algorithms for u^a. Nevertheless, a single modular exponentiation outsourcing algorithm is merely a specific instance of batch modular exponentiation outsourcing with batch size n = 1.

2.3. Fair computation

Recently, blockchain and smart contracts have been proposed to address these fairness issues [29]. Smart contracts can provide a secure solution for participants to execute contracts on Ethereum, essentially being executable code with correctness, transparency, and immutability [30]. Although there are some studies utilizing smart contracts to fulfill fair computation, they either rely on the assumption that the client and the cloud are honest [31], or utilize smart contracts to conduct complex computation tasks [29,32,33]. However, in standard blockchain systems such as Ethereum, users are typically charged gas fees based on the complexity of the computational task running in the smart contract. The gas fees for smart contracts are recorded in the fee table EIP150 [34]. Generally, the cost of Ethereum is high, and considering that modular exponentiation is an expensive computation task, existing schemes may increase the financial burden on users.

3. Fair computation offloading for batch modular exponentiation scheme in IoV

3.1. System model

As illustrated in Fig. 2, the fair computation offloading for batch modular exponentiation in IoV mainly comprises four entities: Service Agency (SA), Intelligent Vehicle (IV), Roadside Unit (RSU), and MEC Server.

SA: It is an honest entity. It provides the intelligent vehicle with the initialized bases u_i and modulus N of the batch modular exponentiation task ∏_{i=1}^{n} u_i^{a_i} mod N, and its communication with intelligent vehicles is based on secure channels.

IV: It is a resource-limited entity. It does not trust the MEC server, but it wants to offload some requests a_i to the MEC server, where i ∈ {1, …, n}. Furthermore, it may try to get the result without paying by intentionally claiming that the cloud's computation result is wrong.

RSU: It is an untrusted entity, which serves as a full node of the blockchain. It provides verifiable services to guarantee the integrity of the result.

MEC Server: It is a powerful entity deployed at the network's edge with adequate computation resources, which is responsible for performing the computation offloading tasks for the intelligent vehicle. Similar to the intelligent vehicle, it is also a profit-driven entity: it would like to get the reward from the intelligent vehicle without performing the computation.

A fair computation offloading for batch modular exponentiation (MExpm) scheme in IoV consists of the following algorithms.

(Params, RK) ← Setup(1^λ, u_1, …, u_n, N). Given a security parameter λ, the bases u_1, …, u_n and N, SA invokes this algorithm to generate the public parameters Params and the recovery key RK, where u_i and N are the bases and modulus for the modular exponentiation tasks.

(TK, VK, Aux) ← KeyGen(a_1, …, a_n, Params). On inputting the exponents a_1, …, a_n and the public parameters Params, IV runs this algorithm to generate the evaluation key TK for performing the computation task, the witness generation key VK, and auxiliary information Aux. It is worth noting that this algorithm can be carried out entirely offline before the online phase, so it does not introduce additional latency during computation outsourcing. The input a_i is the exponent of u_i, where i ∈ {1, …, n}.

(σ_E, π_E) ← Compute(TK, VK). On inputting the evaluation key TK and witness generation key VK, the MEC server performs this algorithm to produce the encoding result σ_E and witness π_E.
{0/1, σ_E} ← Verify(σ_E, π_E, Aux). On inputting the encoding result σ_E, the witness π_E and the auxiliary information Aux, the RSU runs this algorithm to check, utilizing a smart contract, whether the MEC server returns a correct result. If not, it outputs 0; otherwise, it outputs 1 and σ_E.

Result ← Recovery(σ_E, RK). On inputting σ_E and the recovery key RK, the intelligent vehicle runs this algorithm to decode the true result Result.

Table 2
Notations.

Symbols | Descriptions
{u_1, u_2, …, u_n} | Computation bases
λ | Security parameter
p | 512-bit prime integer
N | 512-bit prime integer
L | A composite integer L = pN
k | A random integer
τ | A composite integer τ = kN
{y_1, y_2, …, y_n} | Bases after secure obfuscation
(k_i, g^{k_i}), i ∈ {1, 2, 3, 4} | Random pairs generated by the RandN algorithm
{a_1, a_2, …, a_n} | Computation exponents
φ(·) | Euler's function
{w_i, z_1, δ_1, m_i}, i ∈ {1, …, n} | Computation tasks after logical division
r ∈ {2, …, N} | Random integer
ξ ∈ {1, …, n} | Random index
d | Modular multiplicative inverse of a_ξ
{w′_i, z_2, δ_2, m′_i}, i ∈ {1, …, n} | Verification tasks after logical division
{σ_E, π_E} | Computation results returned by the MEC server

3.2. Overview of construction and notations

Similar to [3], the bases and exponents are protected using a logical split. A recovery algorithm is also involved to protect the confidentiality of the final result. At the setup phase, we utilize a secure obfuscation technique to hide the modulus N and the bases u_i. In Setup step (a), only the masked modulus L = pN is sent to the MEC server, so the MEC server cannot get any information about N without the mask factor p chosen and kept private by the user. To prevent the MEC server from learning the original bases u_i, we apply a modular obfuscation technique by embedding each base into a larger modular space (i.e., Eq. (1)) in Setup step (b). Since k and p are sampled uniformly, the adversarial MEC server cannot recover them. The original computation offloading task ∏_{i=1}^{n} u_i^{a_i} mod N is converted into ∏_{i=1}^{n} y_i^{a_i} mod L.

The privacy of the exponent a_i is ensured by the logical split, where a_i = δ_1 · z_1 + m_i mod φ(L). Since the standard integer factorization assumption holds, the adversary cannot derive the factors p and N from L. Without the factors p and N, it is infeasible to compute φ(L) = (p − 1)(N − 1). As a result, the reduction modulo φ(L) effectively hides the underlying value, which makes it infeasible to recover a_i from δ_1 · z_1 + m_i mod φ(L). Furthermore, a malicious adversary learns nothing about the final computation result without the recovery key. A detailed description of the notations used in MExpm can be found in Table 2.

3.3. Detailed construction

1. Setup(1^λ, u_1, …, u_n, N). This algorithm is run by SA. Given a security parameter λ and a 512-bit prime integer N, SA works as follows:

(a) SA generates a 512-bit prime integer p and computes L = pN.
(b) SA uniformly chooses k from Z_N and computes τ = kN. For any i ∈ {1, 2, …, n}, SA sets y_i as follows:

y_i = u_i + τ mod L    (1)

(c) SA sets Params = {L, y_i} and RK = {N}, where RK is transmitted via a secure channel between SA and IV.

2. KeyGen(a_1, …, a_n, Params). This algorithm is executed by the IV to construct the evaluation key TK for performing the computation task, the witness generation key VK, and the auxiliary information Aux. Notably, the IV can execute this procedure in an offline manner, thereby avoiding additional delays in the online verification phase. This algorithm works as follows:

(a) IV parses Params as {L, y_i} and the input exponents a_i ∈ Z_{φ(L)}.
(b) IV runs the RandN program [35] four times to generate four blinding pairs (k_1, g^{k_1}), (k_2, g^{k_2}), (k_3, g^{k_3}), (k_4, g^{k_4}) and sets:

v_1 = g^{k_1} mod L,  v_2 = g^{k_2} mod L,  v_3 = g^{k_3} mod L,  v_4 = g^{k_4} mod L,    (2)

where g ∈ Z_L and its order is φ(L).
(c) IV performs a logical split to compute w_i, z_1, δ_1, and m_i such that

w_i = y_i v_1^{−1} (mod L),
k_1(a_1 + a_2 + ⋯ + a_n) = k_3 + δ_1 z_1 (mod φ(L)),    (3)
a_i = δ_1 z_1 + m_i (mod φ(L)).

(d) IV chooses two random integers r ∈ {2, …, N} and ξ ∈ {1, …, n} and computes d such that a_ξ d ≡ 1 (mod φ(L)).
(e) IV computes w′_i, z_2, δ_2, and m′_i such that

w′_i = y_i v_2^{−1} (mod L),
k_2(a_1 + a_2 + ⋯ + a_n) = k_4 + δ_2 z_2 (mod φ(L)),    (4)
a_i = δ_2 z_2 + m′_i (mod φ(L)).

Especially when i = ξ, we have y′_ξ = y_ξ r^d (mod L) and w′_ξ = y′_ξ v_2^{−1} (mod L), where ξ ∈ [1, n] is a random integer.
(f) IV sets TK = {(g ∏_{i=1}^{n} w_i, z_1), (w_i, m_i)_{i∈[n]}}, VK = {(g ∏_{i=1}^{n} w′_i, z_2), (w′_i, m′_i)_{i∈[n]}} and Aux = {rv_3, v_4, δ_1, δ_2}, where δ_1, δ_2 ∈ Z_{φ(L)}. The pseudocode of the key generation procedure can be found in Algorithm 1.

Algorithm 1: KeyGen Algorithm
Input: Exponents a_1, …, a_n ∈ Z_{φ(L)}, public parameters Params = {L, y_i}
Output: Evaluation key TK, verification key VK, auxiliary info Aux
1  Parse Params as {L, y_i};  // Step (a)
2  Run the RandN algorithm four times to get (k_1, g^{k_1}), (k_2, g^{k_2}), (k_3, g^{k_3}), (k_4, g^{k_4});  // Step (b)
3  Compute v_1 = g^{k_1} mod L, v_2 = g^{k_2} mod L, v_3 = g^{k_3} mod L, v_4 = g^{k_4} mod L;  // Eq. (2)
4  Compute k_1(a_1 + ⋯ + a_n) = k_3 + δ_1 z_1 mod φ(L);  // Eq. (3)
5  for i ← 1 to n do
6      Compute w_i = y_i v_1^{−1} mod L;  // Eq. (3)
7      Compute a_i = δ_1 z_1 + m_i mod φ(L);  // Eq. (3)
8  Sample r ∈ {2, …, N} and ξ ∈ {1, …, n} randomly;  // Step (d)
9  Compute d = a_ξ^{−1} mod φ(L);  // Step (d)
10 Compute k_2(a_1 + ⋯ + a_n) = k_4 + δ_2 z_2 mod φ(L);  // Eq. (4)
11 for i ← 1 to n do
12     Compute w′_i = y_i v_2^{−1} mod L;  // Eq. (4)
13     Compute a_i = δ_2 z_2 + m′_i mod φ(L);  // Eq. (4)
14 Update y′_ξ = y_ξ · r^d mod L and w′_ξ = y′_ξ v_2^{−1} mod L;  // Step (e)
15 Set TK = {(g ∏_{i=1}^{n} w_i, z_1), (w_i, m_i)_{i∈[n]}};  // Step (f)
16 Set VK = {(g ∏_{i=1}^{n} w′_i, z_2), (w′_i, m′_i)_{i∈[n]}};  // Step (f)
17 Set Aux = {rv_3, v_4, δ_1, δ_2};  // Step (f)
18 return TK, VK, Aux;
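The logical split behind Eqs. (3) and (4) can be checked numerically. The sketch below is our reconstruction, not the paper's code: the text does not state how z_1 is obtained, so we assume δ_1 is drawn invertible modulo φ(L) and solve z_1 = δ_1^{−1}(k_1 Σa_i − k_3) mod φ(L); the constants k_1, k_3 are borrowed from the toy example of Section 3.4.

```python
import math
import random

random.seed(1)
p, N = 13, 11                      # toy stand-ins for the 512-bit primes
L, phi = p * N, (p - 1) * (N - 1)  # phi(L) = (p - 1)(N - 1)
a = [79, 23, 41]                   # toy exponents a_1..a_n
k1, k3 = 63, 52                    # RandN-style random exponents

# delta1 must be invertible mod phi(L) so that z1 can be solved for
while True:
    delta1 = random.randrange(2, phi)
    if math.gcd(delta1, phi) == 1:
        break

# First logical split of Eq. (3):
#   k1*(a1 + ... + an) = k3 + delta1*z1 (mod phi)
#   ai                 = delta1*z1 + mi (mod phi)
z1 = (k1 * sum(a) - k3) * pow(delta1, -1, phi) % phi
m = [(ai - delta1 * z1) % phi for ai in a]

assert k1 * sum(a) % phi == (k3 + delta1 * z1) % phi
assert all(ai % phi == (delta1 * z1 + mi) % phi for ai, mi in zip(a, m))
print("logical split of Eq. (3) holds")
```

The second split of Eq. (4) is solved the same way with k_2, k_4, δ_2; the modular inverse `pow(x, -1, phi)` requires Python 3.8+, the version the paper's simulation uses.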
3. Compute(TK, VK). This algorithm is run by the MEC server to generate the encoding result σ_E and the witness result π_E. The MEC server works as follows:

(a) The MEC server parses TK as {(g ∏_{i=1}^{n} w_i, z_1), (w_i, m_i)_{i∈[n]}} and VK as {(g ∏_{i=1}^{n} w′_i, z_2), (w′_i, m′_i)_{i∈[n]}}, and then sets γ_i = (w_i)^{m_i} and γ′_i = (w′_i)^{m′_i} for any i ∈ {1, …, n}, respectively.
(b) The MEC server sets Q_0 = (g ∏_{i=1}^{n} w_i)^{z_1} and Q_1 = (g ∏_{i=1}^{n} w′_i)^{z_2}.
(c) The MEC server sets σ_E = {Q_0, (γ_i)_{i∈[n]}} and π_E = {Q_1, (γ′_i)_{i∈[n]}}.

4. Verify(σ_E, π_E, Aux). This algorithm is run by the RSU to check the correctness of the result returned by the MEC server. The RSU works as follows:

(a) Upon receiving the encoding result σ_E and witness π_E, it first parses them as {Q_0, (γ_i)_{i∈[n]}} and {Q_1, (γ′_i)_{i∈[n]}}, respectively.
(b) It parses the auxiliary information Aux as {rv_3, v_4, δ_1, δ_2}.
(c) The RSU utilizes a smart contract to compute η = (Q_0)^{δ_1} and then checks whether the following equation holds:

rv_3 · η · ∏_{i=1}^{n} γ_i = v_4 · (Q_1)^{δ_2} · ∏_{i=1}^{n} γ′_i (mod L)    (5)

If not, the smart contract outputs 0 and aborts. Otherwise, it outputs 1 and sets σ_E = {rv_3 η ∏_{i=1}^{n} γ_i}. The verification logic of the smart contract can be found in Algorithm 2.

Algorithm 2: Verification Logic of Smart Contract
Input: Q_0, Q_1 ∈ Z_L; (γ_i)_{i∈[n]}, (γ′_i)_{i∈[n]} ∈ Z_L; scalars rv_3, v_4, δ_1, δ_2 ∈ Z_L
Output: Boolean flag indicating verification result
1  η ← Q_0^{δ_1} mod L;  // Compute η
2  prodGamma ← 1;
3  for i ← 1 to n do
4      prodGamma ← prodGamma · γ_i mod L;  // Accumulate product of γ_i
5  prodGammaPrime ← 1;
6  for i ← 1 to n do
7      prodGammaPrime ← prodGammaPrime · γ′_i mod L;  // Accumulate product of proofs γ′_i
8  lhs ← rv_3 · η · prodGamma mod L;  // Left-hand side of the equality
9  rhs ← v_4 · Q_1^{δ_2} · prodGammaPrime mod L;  // Right-hand side of the equality
10 return (lhs == rhs);  // Return true if verification passes

5. Recovery(σ_E, RK). This algorithm is run by the IV to recover the encoding result σ_E to the true result Result.

(a) IV parses RK as {N} and σ_E as {rv_3 η ∏_{i=1}^{n} γ_i}, where η = (g ∏_{i=1}^{n} w_i)^{z_1 δ_1}.
(b) IV recovers the final computation result Result as follows:

Result = rv_3 · η · ∏_{i=1}^{n} γ_i · r^{−1} (mod N)    (6)

3.4. An illustrative example

We now provide a toy example to further illustrate MExpm. The original (non-offloaded) modular exponentiation performs the following procedures:

1. Setup: Generate two distinct secure primes p = 13, N = 11. Compute L = pN = 143. Then generate u = 128, a = 79.
2. Compute: Locally compute the result Result = 128^79 mod 11 = 8.

The proposed MExpm consists of the following procedures:

1. Setup: SA generates p = 13, N = 11 and computes L = p · N = 143. Then SA generates the base u = 128 and utilizes the random integer k = 5 to compute y = u + kN mod L = 40.
2. KeyGen: The intelligent vehicle runs the RandN algorithm to obtain (63, 125), (42, 25), (52, 113), (82, 69) and g = 71, and computes v_1^{−1} = 125^{−1} mod L = 135, v_2^{−1} = 25^{−1} mod L = 103. Then it generates a computation task a = 79. Thereafter, the intelligent vehicle generates the random integers δ_1 = 11, δ_2 = 109, r = 7 and computes d = a^{−1} mod φ(L) = 79 and r^d mod L = 19. Finally, the intelligent vehicle utilizes Eqs. (3) and (4) to conduct the logical split, obtaining w = 109, gw = 17, w′ = 59, gw′ = 42, z_1 = 55, z_2 = 44, m = 74, m′ = 83, rv_3 = 76.
3. Compute: The MEC server receives the offloading task and computes (gw)^{z_1} = 17^55 mod 143 = 43, (gw′)^{z_2} = 42^44 mod 143 = 126, w^m = 12, and w′^{m′} = 119.
4. Verify: The smart contract is called to verify Left = 76 · 12 · 43^11 mod 143 = 111 and Right = 69 · 126^109 · 119 mod 143 = 111.
5. Recovery: The intelligent vehicle computes r^{−1} mod L = 41, then obtains Result = 111 · 41 mod 11 = 8.

4. Theoretical analysis

4.1. Correctness

To prove correctness, we need to argue that the results returned by the MEC server can pass the verification algorithm and that the intelligent vehicle can recover the final result if all the entities involved are honest.

For the first part, we mainly argue based on Eq. (5). That is, we prove that σ_E and π_E can pass the Verify algorithm when the MEC server is honest and follows all the algorithms mentioned above. Based on Eq. (4), the right-hand side (RHS) of Eq. (5) can be expressed as:

RHS = v_4 · (Q_1)^{δ_2} · ∏_{i=1}^{n} γ′_i (mod L)
    = g^{k_4} (g ∏_{i=1}^{n} w′_i)^{z_2 δ_2} ∏_{i=1}^{n} w′_i^{m′_i} (mod L)
    = g^{k_4 + z_2 δ_2} (∏_{i=1}^{n} w′_i)^{z_2 δ_2} ∏_{i=1}^{n} w′_i^{m′_i} (mod L)
    = g^{k_2(a_1 + a_2 + ⋯ + a_n)} ∏_{i=1}^{n} w′_i^{m′_i + δ_2 z_2} (mod L)    (7)
    = g^{k_2(a_1 + a_2 + ⋯ + a_n)} ∏_{i=1}^{n} w′_i^{a_i} (mod L)
    = ∏_{i=1}^{n} g^{k_2 a_i} w′_i^{a_i} (mod L)
    = ∏_{i=1}^{n} v_2^{a_i} w′_i^{a_i} (mod L)
    = ∏_{i=1}^{ξ−1} y_i^{a_i} · y′_ξ^{a_ξ} · ∏_{i=ξ+1}^{n} y_i^{a_i} (mod L)
Since a_ξ d ≡ 1 (mod φ(L)), we always have y′_ξ^{a_ξ} = y_ξ^{a_ξ} r^{d a_ξ} = r y_ξ^{a_ξ} mod L. Based on Eq. (3), we can get:

RHS = r ∏_{i=1}^{n} y_i^{a_i} (mod L)
    = r ∏_{i=1}^{n} g^{k_1 a_i} w_i^{a_i} (mod L)
    = r g^{k_1(a_1 + a_2 + ⋯ + a_n)} ∏_{i=1}^{n} w_i^{m_i + δ_1 z_1} (mod L)
    = r g^{k_3 + z_1 δ_1} (∏_{i=1}^{n} w_i)^{z_1 δ_1} ∏_{i=1}^{n} w_i^{m_i} (mod L)    (8)
    = r g^{k_3} (g ∏_{i=1}^{n} w_i)^{z_1 δ_1} ∏_{i=1}^{n} w_i^{m_i} (mod L)
    = r v_3 (Q_0)^{δ_1} ∏_{i=1}^{n} γ_i (mod L)
    = r v_3 · η · ∏_{i=1}^{n} γ_i (mod L)

Obviously, according to Eq. (8), if the MEC server and the intelligent vehicle IV are honest and follow all the procedures described above, the encoding result σ_E and witness result π_E can always pass the Verify algorithm.

Second, we argue that the encoding result σ_E can be decoded to the actual result Result. Here, we mainly rely on Eq. (1), Eq. (2), Eq. (3) and Eq. (6). The σ_E can be parsed and computed as follows:

Result = r v_3 η ∏_{i=1}^{n} γ_i · r^{−1} (mod N)
       = g^{k_3} (g ∏_{i=1}^{n} w_i)^{z_1 δ_1} ∏_{i=1}^{n} w_i^{m_i} (mod N)
       = g^{k_3 + z_1 δ_1} (∏_{i=1}^{n} w_i)^{z_1 δ_1} ∏_{i=1}^{n} w_i^{m_i} (mod N)
       = g^{k_1(a_1 + a_2 + ⋯ + a_n)} ∏_{i=1}^{n} w_i^{m_i + δ_1 z_1} (mod N)    (9)
       = ∏_{i=1}^{n} g^{k_1 a_i} w_i^{a_i} (mod N)
       = ∏_{i=1}^{n} y_i^{a_i} (mod N)
       = ∏_{i=1}^{n} (u_i + kN)^{a_i} (mod N)
       = ∏_{i=1}^{n} u_i^{a_i} (mod N)

Obviously, when Eq. (9) holds, the correctness of the Recovery algorithm is guaranteed and the proof is completed.

4.2. Security analysis

In this section, we demonstrate the privacy of the computation offloading results. In MExpm, we first convert ∏_{i=1}^{n} u_i^{a_i} (mod N) into ∏_{i=1}^{n} y_i^{a_i} (mod L); then the exponents a_i are transformed into δ_1 z_1 + m_i, i ∈ [n], and δ_2 z_2 + m′_i, i ∈ [n]. The public information in our scheme is {Params, TK, VK, Aux, σ_E, π_E}, and the adversaries cannot obtain any information about the secret information {u_i, a_i (i ∈ [n]), RK, Result}.

Theorem 1. When the MEC server cheats the client, the misbehavior can be detected with checkability rate 1 − n²/((4n² + 6n + 2)(N − 2)).

Proof. If the malicious MEC server deceives the intelligent vehicle IV successfully, the following equation will hold:

t · rv_3 · η · ∏_{i=1}^{n} γ_i = t · v_4 · (Q_1)^{δ_2} · ∏_{i=1}^{n} γ′_i    (10)

The corresponding encoding result will be decoded by IV as follows:

Result = t v_3 η ∏_{i=1}^{n} γ_i (mod N)    (11)

Since the MEC server cannot gain access to the values of δ_1 and δ_2, it cannot obtain the correct values of η and (Q_1)^{δ_2}. Therefore, the MEC server can only turn to the other n pairs to cheat the intelligent vehicle; it then needs to determine the correct meanings of the 2n + 2 sub-tasks to obtain the pairs (w_h, m_h) and (w′_j, m′_j). Since the sending order of these 2n + 2 pairs is random, the MEC server is unaware of the specific meaning of each pair. Therefore, it needs to find (w_h, m_h) and (w′_j, m′_j) among the 2n + 2 pairs; the probability of this operation is (n/(2n + 2)) · (n/(2n + 1)). Additionally, for a successful deception, the MEC server needs to determine the value of r, where r ∈ {2, …, N}; thus, the probability of finding the correct r is 1/(N − 2). Subsequently, the MEC server generates a random number t and returns t w_h^{m_h} and t w′_j^{m′_j}. We denote the malicious server successfully determining the correct meaning of (w_h, m_h) and (w′_j, m′_j) as event E_1, and determining the value of r as event E_2. We have:

Pr(E_1) = (n/(2n + 2)) · (n/(2n + 1)) and Pr(E_2) = 1/(N − 2).

Therefore, the probability of the intelligent vehicle being deceived is Pr(E_1 ∩ E_2) = Pr(E_1) Pr(E_2) = n²/((4n² + 6n + 2)(N − 2)), and the checkability rate of our proposed scheme MExpm is 1 − n²/((4n² + 6n + 2)(N − 2)).

Fig. 3. Comparison of checkability rate.

5. Simulation

In this section, we evaluate the performance of our proposed scheme MExpm by comparing it with the most advanced and representative modular exponentiation offloading schemes reported in recent literature. Specifically, we consider MExp [3] and SMCExp [4] for secure batch modular exponentiation, as well as SoRSA [6] and EPExp [7] for single modular exponentiation. These schemes reflect the latest advancements in both batch-oriented and single-operation settings and are widely recognized as benchmarks in the field. Notably, all of these algorithms are incorporated as baselines in our experimental evaluation, covering key performance indicators such as local computation time, end-to-end execution latency, communication overhead, and gas consumption. Since MExpm is designed for batch modular exponentiation, while SoRSA and EPExp are designed solely for single modular exponentiation, for fairness we also conduct the comparison for the case where the batch size n = 1.
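Before turning to the measurements, the five procedures can be exercised end to end in code. The sketch below replays the toy example of Section 3.4; it follows the paper's notation, but the wiring (solving z_1, z_2 from the splits and using w = y · v_1^{−1} mod L) is our reconstruction, and it reproduces the printed intermediate values (y = 40, w = 109, w′ = 59, z_1 = 55, z_2 = 44) and the final result 8.

```python
# MExpm pipeline on the toy values of Section 3.4 (batch size n = 1):
# p = 13, N = 11, u = 128, a = 79, k = 5, g = 71.
p, N, g = 13, 11, 71
L, phi = p * N, (p - 1) * (N - 1)
u, a, k = 128, 79, 5
r, delta1, delta2 = 7, 11, 109
k1, k2, k3, k4 = 63, 42, 52, 82        # RandN outputs from the toy example

y = (u + k * N) % L                     # Setup, Eq. (1): y = 40
v = [pow(g, ki, L) for ki in (k1, k2, k3, k4)]   # Eq. (2): v1..v4
d = pow(a, -1, phi)                     # a * d = 1 mod phi(L)
z1 = (k1 * a - k3) * pow(delta1, -1, phi) % phi  # split of Eq. (3)
m1 = (a - delta1 * z1) % phi
w = y * pow(v[0], -1, L) % L            # w = y * v1^{-1} = 109
z2 = (k2 * a - k4) * pow(delta2, -1, phi) % phi  # split of Eq. (4)
m2 = (a - delta2 * z2) % phi
y2 = y * pow(r, d, L) % L               # y' = y * r^d
w2 = y2 * pow(v[1], -1, L) % L          # w' = y' * v2^{-1} = 59

# Compute (MEC server)
Q0, Q1 = pow(g * w % L, z1, L), pow(g * w2 % L, z2, L)
gamma, gamma2 = pow(w, m1, L), pow(w2, m2, L)

# Verify (RSU smart contract, Eq. (5)): both sides equal 111 in the toy run
lhs = r * v[2] % L * pow(Q0, delta1, L) % L * gamma % L
rhs = v[3] * pow(Q1, delta2, L) % L * gamma2 % L
assert lhs == rhs == 111

# Recovery (IV, Eq. (6)): strip r, reduce mod N
result = lhs * pow(r, -1, L) % N
assert result == pow(u, a, N) == 8
print("toy example verified:", result)
```

The same script scales to batch size n > 1 by carrying lists of (w_i, m_i) pairs; only the products inside Q_0, Q_1 and the two γ-accumulations of Algorithm 2 grow.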
Fig. 4. Comparison of time cost in u^a mod N.

Fig. 5. Comparison of time cost in ∏_{i=1}^{n} u_i^{a_i} mod N.

Fig. 6. Comparison of communication cost.

Fig. 7. Comparison of storage cost.
5.1. Experimental setting

We implemented MExpm and MExp for batch modular exponentiation offloading, and MExp, SMCExp, EPExp, SoRSA and MExpm for single modular exponentiation offloading, using Python 3.8 along with the PyCryptodome and GNU Multiple Precision (gmpy2 version 2.1.5) libraries. All simulation experiments were conducted on the same Windows machine equipped with an Intel Core i9-13900HX processor (running at 2.20 GHz) and 16 GB of memory. We perform each algorithm 100 times and then compute the mean of its time cost. The prime numbers selected in MExpm are all 512 bits, meaning the number L is 1024 bits. For MExp and methods without offloading, we randomly generate a pair of 1024-bit prime numbers. In our simulation, "MExpm w/o obfuscation" denotes MExpm without the secure obfuscation operation, and "w/o offloading" indicates local execution of the modular exponentiation operation.

5.2. Evaluation metrics

To comprehensively evaluate MExpm, we assess its performance across five dimensions: privacy, computation cost, blockchain resource usage, communication overhead, and storage overhead. We quantify privacy using the checkability rate. Computation cost is measured by the execution time (in milliseconds), where longer runtimes correspond to higher resource consumption. We assess blockchain resource usage on the Ethereum simulation platform, Remix, using gas consumption as the metric. Communication overhead is defined as the total data transmitted during offloading. Storage overhead is quantified by the additional storage required on both the client and server sides.

5.3. Checkability

The details of the checkability rate comparison of these four schemes are shown in Fig. 3. As n increases, our proposed scheme MExpm always maintains a high checkability rate close to 1, while the other three schemes gradually decrease. When n = 1000, the checkability rate of GExp is only 1/1001, meaning that a forged result can pass the verification algorithm with probability 1000/1001. Both MExp and SMCExp have the same checkability rate; however, when n = 5000, the checkability rate for these two schemes is only 0.975. Since MExpm uses a 512-bit prime number N, its checkability rate is higher than 0.999.

5.4. Computation cost

5.4.1. Single modular exponentiation offloading

The comparison results of single modular exponentiation offloading can be found in Fig. 4. Compared with MExp and SMCExp, MExpm demonstrates better performance in both the KeyGen algorithm and the Setup algorithm. Particularly in the Verify algorithm, MExpm outperforms these competitors. As for SoRSA and EPExp, whose security assumptions are rather simple and cannot be applied in real-world scenarios, it seems unfair to compare them with schemes for batch modular exponentiation with a higher security standard.

5.4.2. Batch modular exponentiation offloading

Fig. 5 compares the computational cost of batch modular exponentiation offloading. MExpm consistently requires fewer resources than MExp across all phases. Although MExpm adds a recovery phase, it consists of a single modular inversion, incurring a fixed and negligible overhead.

5.5. Communication and storage cost

To simulate a low-bandwidth network environment, we set the transmission rate to 1 Kbps. Fig. 6 shows the communication cost of all the competitors and MExpm in terms of the time cost of transmission. For a fair comparison, all schemes employ a 1024-bit modulus. EPExp and SoRSA, whose authors assume that only ciphertext is offloaded to servers so that the security of the bases need not be taken into consideration, have lower communication cost compared with the other schemes. Compared with MExp and SMCExp, MExpm shares the same communication cost in the Compute and Verify algorithms. SMCExp shows the least communication cost in the KeyGen and Setup algorithms. The results in Fig. 6 demonstrate that MExpm can deploy a more secure offloading strategy with similar communication cost compared with the other competitors. Fig. 7 shows the storage cost among all schemes: SoRSA needs to store n, q, p, C, k, t_1, t_2 to conduct verification and recovery, leading to the most demanding storage cost. EPExp demonstrates the best storage performance, but it lacks consideration of a malicious MEC server. Compared with MExp and SMCExp, MExpm demonstrates the best storage performance.

Fig. 8. Comparison of gas consumption in the Verify algorithm when the size of r is larger than 32 bits.

Fig. 9. The relative saving ratio of MExp and MExpm.
5.6. Gas consumption

The results of the gas consumption comparison are demonstrated in Fig. 8. It can be observed that as r increases, the gas consumption of both MExpm and MExp grows steadily. However, the gap between them widens significantly. For instance, when n = 5, the gas fee difference rises from 7,504 gas at r = 32 bits to 58,981 gas at r = 256 bits. Furthermore, the gas cost of MExpm's Verify algorithm scales linearly with n and is largely unaffected by r, whereas MExp's verification cost increases with both r and n. This highlights MExpm's superior efficiency in reducing the computational and financial burdens on intelligent vehicles, especially at larger scales. To provide a normalized view of these savings, we evaluate the relative saving ratio (SR) defined as

SR = (G_MExp^{n,r} - G_MExpm^{n,r}) / G_MExp^{n,r}

where G_MExp^{n,r} and G_MExpm^{n,r} are the gas consumption of MExp and MExpm with the same n and r, respectively. As illustrated in Fig. 9, MExpm consistently achieves SR > 0 across all tested parameters, with observed savings ranging from approximately 30% to 70%. These results confirm that MExpm delivers substantial resource savings over MExp, reinforcing its scalability and economic advantages.

5.7. Economic analysis of gas savings

To further assess the practical impact of our scheme in real-world deployments, we provide an economic estimation of the gas savings achieved by the proposed MExpm scheme over the representative baseline MExp, particularly in the context of blockchain-based smart contract verification. As shown in Fig. 8, the gas cost of each offloaded batch modular exponentiation result increases with both the batch size n and the bit length of the randomness parameter r. When n = 1 and the bit length of r is 32 bits, MExp incurs 24,013 gas while MExpm requires only 16,509 gas, a difference of 7,504 gas per verification. The average gas price is approximately 30 Gwei (1 Gwei = 10^-9 ETH), the ETH/USD exchange rate is approximately $4,000, and 1 million gas therefore costs approximately $120 on Ethereum networks. The savings can thus be translated as

Gas Savings = 7,504 gas × 30 Gwei/gas × 10^-9 ETH/Gwei × $4,000/ETH ≈ 0.90 USD.

Consider a practical usage scenario where each intelligent vehicle offloads batch modular exponentiation tasks 10 times per day (e.g., for authentication, key negotiation, digital signatures, etc.). The annual number of invocations is then 10 tasks/day × 365 days/year = 3,650 tasks/year, so the total annual gas cost saving per vehicle is 0.90 USD/task × 3,650 tasks/year ≈ 3,285 USD/year. This estimation highlights the substantial economic benefits of MExpm when deployed at scale in large IoV systems. For a fleet of 10,000 vehicles, the projected gas savings could exceed 32.8 million USD annually.

5.8. Robustness evaluation against malicious MEC servers

To address potential security risks in practical deployments, we conducted robustness experiments simulating malicious Mobile Edge Computing (MEC) servers that deviate from the prescribed computation protocol. Such adversarial behaviors may include, but are not limited to:

• Forged Results: The MEC server deliberately returns computation results that deviate from the prescribed algorithm, thereby attempting to mislead the verifier regarding the correctness of the computation.
• Partial Omission or Manipulation: The MEC server selectively omits partial computation results or manipulates intermediate values with the intent of reducing its own computational workload.
• Witness Tampering: The MEC server alters or forges witnesses with the objective of deceiving the verifier and illegitimately passing the verification process.

In our simulation, the Intelligent Vehicle (IV) executes the KeyGen algorithm entirely offline to generate the evaluation key TK, witness generation key VK, and auxiliary information Aux. The TK and VK are then transmitted to the MEC server and the Roadside Unit (RSU), respectively. The RSU, acting as a lightweight verifier, executes the verification algorithm upon receiving computation results from the MEC server.

Experimental results demonstrate that our verification mechanism achieves a 100% detection rate for all injected malicious behaviors, with zero false positives under benign conditions. This confirms that the proposed scheme maintains strong security guarantees even in the presence of malicious MEC servers, thereby reinforcing its practicality for real-world V2X deployments.

5.9. Deployment feasibility in real-world V2X environments

The proposed scheme is designed for secure computation outsourcing in resource-constrained vehicular networks. In such scenarios, the primary evaluation metric is whether the total computational cost incurred locally after outsourcing is significantly lower than that of fully local execution. Therefore, as is common in the literature on computation outsourcing, we perform all experiments (the computation algorithm, the verification algorithm, and a non-outsourced baseline) on the same hardware platform. This ensures a fair and reproducible comparison under identical computational conditions, thereby directly demonstrating the benefits of outsourcing.

In the proposed system, the Service Authority (SA) is responsible for handling the bulk of the initialization tasks, such as modulus generation, base obfuscation, and parameter distribution. This design choice aligns with the practical division of labor in vehicular networks, reducing the computational burden on field devices. The Intelligent Vehicle (IV) executes the KeyGen algorithm to generate the evaluation key TK, witness generation key VK, and auxiliary information Aux. Crucially, the KeyGen algorithm can be performed entirely offline before the online phase, ensuring that no additional delay is introduced when initiating the outsourced computation. Once TK is generated, the IV transmits TK to the MEC server for performing the computation tasks.

In real deployments, RSUs are lightweight verification devices typically deployed at traffic intersections or along highways, where power supply and network connectivity can be unstable. To emulate these constraints, we configure the RSU role on a lightweight laptop and limit the network transmission rate to 1 Kbps, thereby simulating a realistic low-bandwidth vehicular environment. Meanwhile, the Service Authority (SA) undertakes the bulk of the initialization tasks, ensuring that RSUs and IVs remain computationally efficient during the online phase.

This deployment-oriented design, together with our simulation settings, ensures that the evaluation faithfully reflects real-world limitations while remaining reproducible. Consequently, the proposed scheme is shown to be both practically feasible and robust for secure computation outsourcing in V2X environments.

6. Conclusion and future work

In this paper, we propose MExpm, a secure and efficient computation offloading scheme for batch modular exponentiation in Vehicle-to-Everything (V2X) communications. Our proposed scheme addresses critical challenges in V2X systems, such as computational burden, latency, and privacy concerns, by leveraging Mobile Edge Computing (MEC) servers and blockchain technology. Our scheme achieves several significant improvements over existing methods. It ensures fairness in computation offloading by using smart contracts, provides high checkability to detect any misbehavior by MEC servers, and enhances privacy protection through secure obfuscation and logical split techniques. These features make MExpm particularly well-suited for various real-time applications in V2X systems. These include
• Real-time Cryptographic Operations: MExpm can offload resource-intensive cryptographic tasks, such as digital signatures and key exchanges, ensuring secure and efficient communication with reduced latency.
• Safety-Critical Message Signature: Offloading the computation of digital signatures (which rely on exponentiation) for emergency braking alerts and collision avoidance warnings, with on-chain smart contracts validating each signature to prevent tampering.
• Privacy-Preserving Authentication: The scheme guarantees the privacy of sensitive data, such as cryptographic bases and exponents, while allowing verification of computation results. This is essential for secure authentication in V2X communications, protecting both vehicles and infrastructure from malicious attacks.
• Traffic Management Systems: MExpm can be integrated into smart city infrastructures, supporting secure communication for traffic management, tolling systems, and other applications where privacy and computation efficiency are crucial.

Although MExpm significantly reduces computation resources compared to local execution, it introduces additional complexity due to its enhanced security features. Future work should aim to design a more generalized verifiable computation offloading framework that optimizes the balance between security and computational efficiency.

CRediT authorship contribution statement

Sipeng Shen: Writing – review & editing, Writing – original draft, Methodology, Conceptualization. Qiang Wang: Writing – review & editing, Writing – original draft, Methodology. Fucai Zhou: Writing – review & editing, Supervision. Jian Xu: Writing – review & editing, Supervision. Mingxing Jin: Writing – review & editing.

Declaration of competing interest

The authors declare no competing interests.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 62202090, 62173101, and 62372069, by the Natural Science Foundation of Liaoning Province under Grant 2025-MS-046, by the Fundamental Research Funds for the Central Universities, China under Grant N2417006, and by the Liaoning Collaboration Innovation Center for CSLE under Grant XTCX2024-015.

Data availability

Data will be made available on request.
Journal of Systems Architecture 160 (2025) 103362
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
Quantum-safe identity-based designated verifier signature for BIoMT
Chaoyang Li a,b,∗, Yuling Chen a, Mianxiong Dong c, Jian Li d, Min Huang b, Xiangjun Xin b, Kaoru Ota c
a State Key Laboratory of Public Big Data, Guizhou University, Guizhou Guiyang, 550025, China
b College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China
c Department of Sciences and Informatics, Muroran Institute of Technology, Muroran 050-8585, Japan
d School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
ARTICLE INFO

MSC:
00-01
99-00

Keywords:
Blockchain
Internet of medical things
Identity
DVS
Privacy-preserving

ABSTRACT

Blockchain technology changes the centralized management form in traditional healthcare systems and constructs a distributed and secure medical data-sharing mechanism to achieve data value maximization. However, the advanced capabilities of quantum algorithms bring a serious threat to current blockchain cryptographic algorithms, which are based on classical mathematical difficulties. This paper proposes the first quantum-safe identity-based designated verifier signature (ID-DVS) scheme for blockchain-based Internet of medical things (BIoMT) systems. The scheme is constructed based on the lattice assumption of the short integer solution (SIS) problem, which is believed to resist quantum attack. The identity mechanism helps to establish a transaction traceability mechanism when data is shared among different medical institutions. The designated verifier mechanism also prevents unauthorized users from accessing data, improving the security of medical data-sharing processes. This ID-DVS scheme is proved in the random oracle model to achieve the security properties of anonymity and unforgeability, and it also captures post-quantum security. The performance analysis of the key size and time consumption is presented, and the results show that this ID-DVS is more efficient than other similar schemes. Therefore, this work supports secure medical data-sharing and protects the privacy of users and medical data.
1. Introduction

Blockchain-enabled Internet of Medical Things (BIoMT) profoundly affects people's lives and health with the gradual increase of wearable health devices [1]. Firstly, blockchain technology helps to establish a distributed medical data-sharing framework among different medical institutions, which replaces the traditional centralized management form and achieves cross-institutional medical data utilization. The BIoMT thus addresses the problems of collecting, storing, sharing, and using massive medical data. However, the security issues with medical data and user privacy in the cross-institutional data-sharing process have gained much attention as more sensitive information is embedded in these medical data. Especially for sensitive information protection, users do not want to give non-specified users access to the data. Hence, one-to-one data sharing can effectively prevent the leakage of sensitive information.

Blockchain cryptography has received more attention as it is increasingly essential in most blockchain-based applications [2]. It relates to cryptographic algorithms such as symmetric cryptography, asymmetric cryptography, hash functions, public key infrastructure, Merkle trees, digital signatures, and zero-knowledge proofs, which are utilized to better adapt to transaction privacy protection in the blockchain network. These blockchain cryptographic technologies jointly protect transaction security and user privacy. For example, the digital signature is responsible for transaction verification in the consensus process and for establishing links to different blocks [3]. The signature also provides a transaction traceability mechanism when disputes occur. In particular, the DVS is well suited for one-to-one data-sharing among different BIoMT systems, as it can guarantee the non-delegatability of the signature. These technologies construct the trust foundation for the blockchain-based network, as these NP-hard problem-based cryptographic algorithms cannot be broken with the most advanced current classical computers. Most of these algorithms are based on RSA and ECC cryptographic theories, but the underlying problems of large integer factorization and discrete logarithm are weak against quantum attack [4].

∗ Corresponding author at: College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China.
E-mail address: lichaoyang@zzuli.edu.cn (C. Li).
https://doi.org/10.1016/j.sysarc.2025.103362
Received 9 December 2024; Received in revised form 13 January 2025; Accepted 6 February 2025
Available online 15 February 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
C. Li et al. Journal of Systems Architecture 160 (2025) 103362
of target search, which brings threats to symmetric cryptographic algorithms, for example Elliptic Curve Cryptography (ECC), by decreasing the search complexity from O(N) to O(√N) [5]. The Shor quantum algorithm can achieve exponential acceleration for large integer factorization [6], which brings threats to asymmetric cryptography, for example RSA. In recent years, post-quantum cryptographic algorithms have gained much attention in the areas of scientific research, finance, and industry [7]. Currently, code-based cryptography, hash cryptography, lattice cryptography, and multivariate-quadratic-equations cryptography are some famous post-quantum cryptographic (PQC) algorithms. Code-based cryptography was first proposed by McEliece [8] and is constructed from error correction codes. Although this cryptosystem has a significant anti-quantum attack advantage, its key size disadvantage makes it unsuitable for IoT systems. Hash cryptography was initially introduced by Lamport [9] and is known as the one-way function approach to providing quantum-proof security. The Merkle tree is another well-known hash-based cryptosystem [10]. These hash-based algorithms are not based on solving hard mathematical problems, but they can obtain the properties of one-wayness, collision resistance, and preimage resistance. Lattice cryptography is one of the suggested PQC schemes in the NIST call, and was first proposed by Ajtai [11]. Multivariate-quadratic-equations cryptography is another kind of PQC that is based on the complexity of solving multivariate equations [12]. This kind of PQC algorithm suffers from efficiency hardship due to its large key size and ciphertext overhead.

This paper focuses on the needs of security and integrity, and proposes a lattice-based ID-DVS scheme to cover privacy-preserving issues such as the designated verifier, signer's anonymity, and signature non-delegatability in the BIoMT system. The contributions are summarized as follows.

• A lattice-based ID-DVS scheme has been proposed. This is the first ID-DVS scheme constructed with rejection sampling over the Gaussian distribution and the SIS lattice problem. The identity mechanism in this ID-DVS provides transaction traceability for medical data-sharing, and the designated verifier setting protects user privacy, as unauthorized users cannot access the transaction.
• The security proof of the proposed ID-DVS scheme is given. In the random oracle model, this ID-DVS scheme can be proved to satisfy the security properties of anonymity and unforgeability. Meanwhile, this ID-DVS scheme can resist quantum attack under the lattice assumption, which can prevent the quantum adversary in the future quantum computer age.
• The efficiency comparison and performance analysis are presented. The key size, time consumption, and energy consumption are calculated and compared with other similar schemes. The results show that this ID-DVS scheme is more efficient and can well support secure medical data-sharing among different BIoMT systems.

Next, the related work is given in Section 2, some preliminaries are shown in Section 3, the ID-DVS scheme is proposed in Section 4, the security of the ID-DVS scheme is analyzed and proved in Section 5, the performance analysis is in Section 6, and the conclusion is in Section 7.

2. Related work

This paper mainly focuses on the research and applications of blockchain cryptography in BIoMT. Some reviews of blockchain cryptography for BIoMT, PQC, and lattice-based signature theory related to this theme are given in the following subsections.

2.1. Blockchain cryptography for BIoMT

In the BIoMT system, identity authentication, data encryption/decryption, and transaction verification all need blockchain cryptography algorithms to protect privacy security in the medical data-sharing processes. For identity authentication, Jia et al. [13] constructed a privacy-aware authentication model with blockchain and proposed two authentication protocols, based on ECC and a physically unclonable function algorithm respectively, to enhance privacy security in the IoMT ecosystem. Lin et al. [14] proposed a mutual user authentication protocol with the ECC algorithm, which could achieve legal user authentication in blockchain-based IoMT networking. Chen et al. [15] designed a certificateless aggregate signcryption scheme based on ECC to protect data privacy in IoT applications, but it could not provide anti-quantum attack security. Han et al. [16] introduced a blockchain-based privacy-preserving framework and a public key searchable encryption scheme to strengthen data traceability. Zou et al. [17] introduced a credential-embedded authentication protocol to protect users' privacy and designed an authenticated key agreement protocol to support bilateral authentication for medical data-sharing through IoMT systems. For data encryption/decryption, Guo et al. [18] presented an attribute-based encryption protocol with a ciphertext policy and set an outsourced online/offline revocable mechanism to guarantee fine-grained access control. Li and Dong et al. [19] gave a keyword-searchable encryption scheme to achieve cross-institution medical data utilization and established an on-chain ledger and off-chain storage model to reduce ledger redundancy. Liu et al. [20] designed a certificateless public key encryption protocol based on high-consumption bilinear pairing, combining the keyword search function to protect medical data in IoMT. Qu et al. [21] introduced an interesting work of quantum blockchain to improve privacy security in IoMT, which utilized quantum signature and quantum identity authentication to achieve secure medical data-sharing with the quantum cloud. For transaction verification, Mao et al. [22] presented an identity-based aggregated signature scheme for IoMT, which could enable efficient local verification of medical data with a locally verifiable mechanism. Zhang et al. [23] proposed a certificateless signcryption protocol to guarantee privacy security in IoMT, which utilized bilinear pairings and zero-knowledge proof to resist super-level internal adversaries. Li et al. [24] proposed a designated verifier signature scheme and established a cross-chain medical data-sharing framework to support secure and efficient data-sharing among different BIoMT systems.

With the deepening application of blockchain in BIoMT, research on blockchain cryptographic algorithms applicable to medical data-sharing transactions is also more urgent. Most of these BIoMT systems are also based on RSA and ECC cryptographic algorithms, which are vulnerable to quantum attacks. So it is urgent to seek more secure anti-quantum cryptographic algorithms to equip current BIoMT systems.

2.2. Post-quantum cryptography

PQC utilizes classical computationally hard problems to construct quantum-safe cryptosystems for current information systems. Especially for the sensitive information protection of medical data in BIoMT systems, the practical application of PQC is important and necessary. For code-based cryptography, Thiers et al. [25] presented a decoding algorithm based on q-ary codes, which could achieve low complexity and anti-quantum security. Alahmadi et al. [26] introduced a signature scheme with error-correcting codes for blockchain-based networks and utilized bounded distance decoding for signature verification. For hash cryptography, Punithavathi et al. [27] established a double-layer encryption framework and proposed a crypto hash algorithm to resist malware attacks in medical data-sharing processes in the IoMT system. Kuznetsov et al. [28] gave a performance analysis of the hashing algorithm in blockchain-based systems and compared it with other related hashing algorithms to show its efficiency and practicality. For lattice cryptography, Ye et al. [29] designed a traceable ring signature scheme based on a lattice assumption for IoMT, which could obtain tag-linkability and exculpability in the random oracle model. Bagchi et al. [30] utilized the ring-LWE problem to construct an
Table 1
Lattice-based schemes comparison.

Ref.                      | Lattice problem        | Advantage                                             | Limitation
--------------------------|------------------------|-------------------------------------------------------|-----------------------------------------------------
Kim et al. [33]           | NTRU                   | Key encapsulation; Randomness-recovery; Encoding      | Centralized KGC; Key escrow; Chosen ciphertext attack weakness
Yu et al. [35]            | NTRU and SIS           | Certificateless; Ring signature                       | Private key management
Li and Jiang et al. [34]  | ring-LWE and SIS       | Non-delegatability; Bimodal Gaussians                 | Centralized KGC; Key escrow
Yao et al. [36]           | ring-LWE and ring-ISIS | Ring analog; Authenticated ciphertext                 | Centralized KGC; Key escrow
Zhang et al. [37]         | ring-LWE and SIS       | Non-delegatability; Chameleon hash                    | Centralized KGC; Key escrow
Zhang and Sun et al. [38] | ring-LWE               | Re-signature; Semi-trusted proxy; Signature evolution | Centralized KGC; Key escrow; Double time consumption
aggregate signature scheme and applied this scheme to the Internet of drones for privacy preservation. For multivariate-quadratic-equations cryptography, Shim et al. [31] proposed a post-quantum signature scheme with multivariate quadratic equations, which supported dramatic online signing for cryptographic systems. These four PQC proposals are not only generally used for creating encryption/decryption and digital signature algorithms, but also for key exchange and authentication cryptosystems in the not-too-distant future.

This paper plans to utilize lattice theory to construct a PQC signature algorithm, as the digital signature plays an essential role in transaction signing, blockchain system consistency, and data ownership confirmation in BIoMT systems.

2.3. Lattice-based signature theory

Lattice cryptography serves as one promising PQC theory that has gained much attention in recent years. Its security is also based on NP-hard problems, such as the shortest vector problem (SVP), shortest independent vectors problem (SIVP), closest vector problem (CVP), short integer solution (SIS), learning with errors (LWE), bounded distance decoding problem (BDD), and so on [32]. The Number Theory Research Unit (NTRU) algorithm is based on SVP or SIVP, and is designed over a polynomial ring. The scheme in Ref. [19] is based on this mechanism. Kim et al. [33] introduced a key encapsulation mechanism with the NTRU lattice, which could resist significant cryptanalytic attacks in current information systems. The LWE is a CVP whose hardness lies in solving linear equations with noise. The scheme in Ref. [29] is based on this mechanism. Li and Jiang et al. [34] proposed a group signature scheme with the SIS lattice problem, which had been applied to the IoMT system with blockchain technology for secure medical data-sharing. Yu et al. [35] designed an NTRU-based certificateless ring signature for electronic voting, which could obtain the properties of quantum immunity, unconditional anonymity, and unforgeability. The ring-LWE is a variant of LWE that has more strengthened security properties. The schemes in Ref. [30] are based on this mechanism. Yao et al. [36] designed a public-key authenticated encryption protocol with ring-LWE in the ideal lattice, which also could achieve keyword search ability in cloud computing. Zhang et al. [37] proposed a DVS scheme with the chameleon hash and without trapdoors, which could achieve non-delegatability. Zhang and Sun et al. [38] presented an ID-DVS scheme with a function of signature evolution, which also added the proxy and re-signature functions. The simple comparisons of these lattice-based schemes are shown in Table 1.

As in BIoMT, the protection of sensitive information in medical data is essential in the medical utilization processes among different medical institutions. Meanwhile, the threats to classical cryptographic algorithms from quantum computers should be taken more seriously. Therefore, this paper addresses security and privacy issues related to system users and medical data by proposing a quantum-safe ID-DVS scheme to strengthen the security of medical data-sharing in BIoMT systems.

3. Preliminaries

The lattice theories, ID-DVS scheme model, and security model are presented in this section.

3.1. Lattice theories

Definition 1 (Lattice [39]). Let v1, …, vn ∈ R^m be a set of linearly independent vectors. The lattice Λ generated by v1, …, vn is the set formed by integer linear combinations of the vectors v1, …, vn:

Λ = {a1·v1 + a2·v2 + ⋯ + an·vn | a1, a2, …, an ∈ Z}   (1)

Here, the matrix A = (a1, …, am) ⊂ R^{n×m} is the coefficient matrix of the lattice Λ, where the dimension n and rank m of this lattice satisfy m = O(n log q).

Definition 2 (q-ary Lattice [39]). Eq. (2) defines the q-ary lattices, which are constructed by a matrix A ∈ Z_q^{n×m}, a prime number q, and a vector μ ∈ Z_q^n:

Λ⊥(A) = {x ∈ Z^m | Ax = 0 mod q}
Λ⊥_μ(A) = {x ∈ Z^m | Ax = μ mod q}   (2)

Definition 3 (Gaussian Distribution [40]). The Gaussian function is ρ_{c,σ}(x) = exp(−(x − c)² / (2σ²)), where σ ∈ R is the standard deviation, c ∈ R is the center, and x ∈ R. More generally, it can be defined as ρ_{c,σ}(x) = exp(−‖x − c‖² / (2σ²)) with x, c ∈ R^n. When the center c = 0, it is written ρ_σ(x). Meanwhile, D_σ(x) = ρ_σ(x)/ρ_σ(Z) is the discrete Gaussian distribution over Z, and D_σ(x) = ρ_σ(x)/ρ_σ(Z^m) is the general situation over Z^m.

Definition 4 (SIS_q^{κ,n,m,β} Problem [40]). The SIS_q^{κ,n,m,β} problem is defined to find a non-zero v ∈ ℜ_q^m which satisfies Av = 0 and ‖v‖₂ ≤ β, where ℜ is a ring, κ is a distribution over ℜ_q^{n×m}, and A ← κ.

Definition 5 (SamplePre(A, T, σ, y) [40]). Given a matrix A ∈ Z_q^{n×m}, a trapdoor basis T of the lattice Λ⊥(A), σ = L · ω(√(log n)), and a random vector y, SamplePre(A, T, σ, y) can derive a non-zero vector e ∈ Z_q^m which satisfies Ae = y mod q. Here, ‖e‖ ≤ σ√m.

3.2. Model descriptions

The scheme model and security model are given in this subsection, and they provide the formal definition of an ID-DVS scheme.

(1) Scheme model

For an ID-DVS scheme, it is mainly composed of five polynomial-time algorithms.
C. Li et al. Journal of Systems Architecture 160 (2025) 103362
• Setup(1^n): Input the security parameter n; the key generation center (KGC) outputs the system parameters pp and the system master secret key msk.
• KeyGen.(ID_a, ID_b, pp, msk): Input the identities ID_a and ID_b of the signer and designated verifier, pp, and msk; KGC generates the key pairs (pk_a, sk_a) and (pk_b, sk_b), respectively.
• Sign(pp, sk_a, pk_a, pk_b, μ): Input the message μ, pp, (pk_a, sk_a), and the designated verifier's public key pk_b; the signer generates an ID-DVS signature (e, μ).
• Verify(sk_b, pk_b, pk_a, μ, e): Input (e, μ), pp, (pk_b, sk_b), and the signer's public key pk_a; the designated verifier checks the legality of the ID-DVS signature.
• Simulation(pp, sk_b, pk_b, pk_a, μ): Input the message μ, pp, (pk_b, sk_b), and the signer's public key pk_a; the designated verifier generates another ID-DVS signature (e′, μ).

(2) Security model

An ID-DVS scheme must satisfy correctness, anonymity, and unforgeability. The correctness can be verified according to the verification process. The anonymity and unforgeability should be proved in the random oracle model as shown in the following Definitions 6 and 7, respectively. Note that only by passing this certification can it be shown that the designed ID-DVS scheme is safe. Next, the security proof model is constructed as a query-respond game, where an adversary Eve (E) performs the queries and a challenger Charlie (C) performs the responses.

Definition 6 (Anonymity). If an adversary can make the right guess whether the signature is signed by the signer or the designated verifier with the adaptive selective identity attack in the random oracle model, he wins this round of the query-respond game. Detailed query-respond processes between E and C are shown as follows.

• Initialize: C performs the Setup(1^n) algorithm to obtain the system parameters pp and the master secret key msk. Then, he exposes pp and keeps msk secret.
• Query: E can perform polynomially many queries on the random oracle. Here, the hash function, secret key, and signature are all query targets. E can perform queries on a non-target user's identity ID or a non-target message μ. C responds with the answers to the queries if the answers already exist. Otherwise, C executes the KeyGen. or Sign algorithms to generate new answers to E's queries.
• Challenge: E selects two target system users' identities ID_{i0} and ID_{i1} and queries the signature about these two identities. Next, C randomly chooses the identity ID_{ib}, b ∈ {0, 1}, as the signer and the other one as the designated verifier, derives the ID-DVS (e, μ) according to the processes of the KeyGen. and Sign algorithms, and sends it back to E.
• Guess: E performs the guess of b′. If b′ = b, E wins this game. Here, the successful guess rate of E can be defined as shown in Eq. (3).

Adv_A^Anon = Pr[E succeeds]   (3)

This anonymity increases the probability that the adversary will fail to attack the signature because he cannot determine whether the signer or the designated verifier is the real signer. Meanwhile, the designated verifier cannot prove to third parties that this signature is valid. This mechanism can protect user privacy in medical data-sharing transactions and prevent the designated verifier from authorizing other users to access the signature.

Definition 7 (Unforgeability). If an adversary can forge a valid signature with the adaptive selective message attack in the random oracle model, a challenger can derive another valid signature and solve the lattice assumption with these two signatures. Here, the successful probability of this challenger is non-negligible. Detailed query-respond processes between E and C are shown below.

• Initialize: C performs the Setup(1^n) algorithm to obtain the system parameters pp and the master secret key msk. Then, he exposes pp and keeps msk secret.
• Query: E can perform polynomially many queries on the random oracle. Here, the hash function, secret key, and signature are all query targets. E can perform queries on a non-target user's identity ID or a non-target message μ. C responds with the answers to the queries if the answers already exist. Otherwise, C executes the KeyGen. or Sign algorithms to generate new answers to E's queries.
• Forge: E utilizes these queried answers to generate a valid signature (e*, μ*) for the target user's identity ID* and message μ*, and exposes this signature.
• Challenge: C also can execute the signature processes legally and derive another valid signature (e′, μ*) for the target user's identity ID* and message μ*. Then, C utilizes these two valid signatures about the same message μ* to solve the Z-SIS_q^{κ,n,m,β} instance.
• Analyze: This step analyses two points. One is the probability that C can find a solution for the Z-SIS_q^{κ,n,m,β} instance, and the other is the probability that E successfully generates a valid ID-DVS signature. Here, the success rate of E can be defined as shown in Eq. (4).

Adv_A^Forge = Pr[E succeeds]   (4)

This unforgeability ensures that no one other than the signer can generate a legitimate signature, thus improving the security of the medical data-sharing process among different BIoMT systems.

4. The ID-DVS scheme

This ID-DVS scheme is constructed with the lattice assumption of SIS_q^{κ,n,m,β}. To improve the computational efficiency, the lattice assumption is reduced from R to Z, and the new lattice assumption Z-SIS_q^{κ,n,m,β} does not decrease the hardness. The parameter definitions are shown in Table 2. This scheme mainly contains the five algorithms Setup, KeyGen., Sign, Verify, and Simulation. The simple framework of this ID-DVS scheme is shown in Fig. 1, and details of these algorithms are described as follows.

Table 2
System parameters.
Notation | Meaning
q | One large prime with q = q(n) ≥ 3
n, m | The dimensions of the key matrix, with m ≥ 5n log q
κ | The system security parameter
Z | The integer matrix/vector set for system keys
σ | A system parameter with σ = L · ω(√(log n))
mpk | The group public key
msk | The group master secret key
ID_i | The user identity
H1, H2 | The cryptographic hash functions
D_σ^m | The bimodal Gaussian distribution
σ | The standard deviation for D_σ^m
μ | The message to be signed
pk, sk | The public and private keys for system users

4.1. Setup

Some system parameters are preset according to the setting principle in Ref. [41], where n is the security parameter, q is a prime number which satisfies q = q(n) ≥ 3, m is a positive integer which satisfies m ≥ 5n log q, L = O(√(n log q)), and σ = L · ω(√(log n)).
Fig. 1. The simple framework of ID-DVS scheme.
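Before the algorithm details, the algebraic identity that Verify relies on (expanded later in Eq. (5)) can be checked on a toy instance: since a_IDi = A·s_IDi mod q, we have A(e + s_ID2) − a_ID1 = Ax + a_ID2 mod q. Everything below (modulus, dimensions, coefficient ranges) is illustrative and far too small to be secure; hashing and the rejection step are omitted.

```python
import random

def matvec_mod(A, v, q):
    # (A v) mod q for a matrix given as a list of rows
    return [sum(a * b for a, b in zip(row, v)) % q for row in A]

random.seed(1)
q, n, m = 97, 4, 8
A  = [[random.randrange(q) for _ in range(m)] for _ in range(n)]
s1 = [random.randrange(-2, 3) for _ in range(m)]   # short secret of the signer
s2 = [random.randrange(-2, 3) for _ in range(m)]   # short secret of the verifier
a1 = matvec_mod(A, s1, q)                          # a_ID1 = A s1 mod q
a2 = matvec_mod(A, s2, q)                          # a_ID2 = A s2 mod q

x = [random.randrange(-5, 6) for _ in range(m)]    # signing randomness
e = [xi + si for xi, si in zip(x, s1)]             # e = x + s_ID1

# Left side of the verification identity: A(e + s2) - a1 mod q
Ae_s2 = matvec_mod(A, [ei + si for ei, si in zip(e, s2)], q)
lhs = [(u - v) % q for u, v in zip(Ae_s2, a1)]
# Right side: A x + a2 mod q
rhs = [(u + v) % q for u, v in zip(matvec_mod(A, x, q), a2)]
assert lhs == rhs   # holds for every choice of x, s1, s2
```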
(1) KGC generates a matrix mpk = A ∈ Z_q^{n×m} with the former system parameters by the trapdoor generation (TrapGen.(1^n)) algorithm, which is an approximately random distribution matrix. Then, a basis T ∈ Z_q^{m×m} of Λ⊥(A) with ‖T̃‖ ≤ L is derived by TrapGen.(1^n);
(2) Chooses hash functions H1, H2: {0, 1}* → Z_q^n;
(3) Outputs pp = {A, H1, H2} as the public system parameters;
(4) Serves mpk = A as the master public key and msk = T as the master secret key.

4.2. KeyGen

Given the system parameter pp and a user's identity ID_i:
(1) KGC computes a_IDi = H1(ID_i) ∈ Z_q^n;
(2) Computes s_IDi ← SamplePre(A, T, a_IDi, σ) ∈ Z_q^m, where σ ≥ ‖T̃‖ · ω(√(log m)), a_IDi mod q = A·s_IDi, and ‖s_IDi‖ ≤ σ√m;
(3) Outputs pk = a_IDi as the public key and sk = s_IDi as the secret key for the system user with ID_i.

For the signer and designated verifier in this ID-DVS scheme, the signer's key pair is set as (pk1, sk1) = (a_ID1, s_ID1) and the designated verifier's key pair is set as (pk2, sk2) = (a_ID2, s_ID2). Then, they will work together to generate a legitimate ID-DVS with the following steps.

4.3. Sign

Given the system parameter pp and message μ:
(1) The signer ID1 randomly chooses x ∈ D_σ^m;
(2) Computes c = H2(Ax + a_ID2 mod q, μ);
(3) Utilizes his secret key sk1 to compute e = x + s_ID1;
(4) Outputs the signature ⟨e, c⟩ with probability min(D_σ^m(e) / (M · D_{s_ID1·c, σ}^m(e)), 1); otherwise, restarts.

This is a probabilistic algorithm, and M is some fixed positive real that is set large enough to ensure that the preceding probability is always at most 1. If there is no output, the signer will repeat these signing processes until a legal ID-DVS is generated.

4.4. Verify

When receiving the ID-DVS from the signer, the designated verifier utilizes pp, the signer's public key a_ID1, and his own private key sk2 = s_ID2 to verify the legality of (e, c) with message μ.
(1) The designated verifier checks whether ‖e‖ > L; if so, he rejects it;
(2) Checks whether ‖e‖∞ > q/4; if so, he rejects it;
(3) When the former conditions hold, he verifies whether c = H2(A(e + s_ID2) − a_ID1 mod q, μ) holds or not. Iff this condition holds, he accepts this signature; otherwise, he rejects it.

4.5. Simulation

This subsection presents the simulation of a new ID-DVS performed by the designated verifier. According to the former generation processes, he can derive a legal ID-DVS with the same message μ.
(1) Selects a random vector x′ ← D_σ^m;
(2) Computes c′ = H2(Ax′ + a_ID1 mod q, μ) with the system public key A and the same message μ;
(3) Computes e′ = x′ + s_ID2;
(4) Outputs the ID-DVS (e′, c′) with probability min(D_σ^m(e′) / (M · D_{s_ID2·c′, σ}^m(e′)), 1); otherwise, he restarts this algorithm.

Here, the simulated signature (e′, c′) is indistinguishable from the formerly generated signature (e, c) with the same message μ. This is the inherent quality of the DVS scheme which can prevent attacks from unauthorized verifiers. It can improve the security of cross-institution medical data-sharing through the BIoMT system.

5. Security analysis

The security analyses of the correctness, anonymity, and unforgeability of the proposed ID-DVS scheme are given in this section.

5.1. Correctness

According to the verification steps in the Verify algorithm, a valid ID-DVS shall satisfy three conditions. From the signature generation process, (e, c) satisfies ‖e‖ ≤ L and ‖e‖∞ ≤ q/4, which are easily verified. The third condition c = H2(A(e + s_ID2) − a_ID1 mod q, μ) = H2(Ax + a_ID2 mod q, μ) holds, which can be verified by the equation A(e + s_ID2) − a_ID1 = Ax + a_ID2 mod q. Eq. (5) shows the detailed verification process.

A(e + s_ID2) − a_ID1 = A(x + s_ID1 + s_ID2) − a_ID1
                     = Ax + A·s_ID1 + A·s_ID2 − a_ID1
                     = Ax + a_ID1 + a_ID2 − a_ID1
                     = Ax + a_ID2   (5)

Meanwhile, the signature (e′, c′) simulated by the designated verifier also can be verified by the signer, as the conditions ‖e′‖ ≤ L, ‖e′‖∞ ≤ q/4, and the equation c′ = H2(A(e′ + s_ID1) − a_ID2 mod q, μ) = H2(Ax′ + a_ID1 mod q, μ) hold, which is shown in Eq. (6).

A(e′ + s_ID1) − a_ID2 = A(x′ + s_ID2 + s_ID1) − a_ID2
                      = Ax′ + A·s_ID2 + A·s_ID1 − a_ID2
                      = Ax′ + a_ID2 + a_ID1 − a_ID2
                      = Ax′ + a_ID1   (6)

5.2. Anonymity

Theorem 1. The proposed ID-DVS can capture anonymity with the lattice assumption Z-SIS_q^{κ,n,m,β} if no adversary can correctly distinguish the real signer with non-negligible probability.

Proof. According to Definition 6, E attempts to distinguish the real signer by performing queries on the hash, secret key, and sign algorithms under the adaptively chosen identity attack. Here, E can execute enough queries on the three algorithms to obtain information about the non-target identities in polynomial time. Meanwhile, the probability that E wins one round of the query-respond game is defined as at least ζ. Then, C generates a signature with the target identity ID* and lets E guess the real signer. Detailed query-respond processes are shown as follows.

• Initialize: C executes the Setup algorithm to generate the system parameters (n, m, q, κ, σ) and sends them to E.
• Query: E adaptively chooses the non-target identity to query with C.
  – H1 query: E adaptively chooses the non-target identity ID_i to query on the H1 function. C owns a list List_H1 to store (ID_i, a_IDi). When he obtains the query, he first searches the list List_H1 whether the identity ID_i is queried or not. If it exists, the result (ID_i, a_IDi) is returned back to E. If not, C computes the corresponding a_IDi = H1(ID_i), returns the result (ID_i, a_IDi) back to E, and records this result into the list List_H1.
  – H2 query: E adaptively chooses a message μ_i to query on the H2 function. C owns a list List_H2 to store (μ_i, c_i). When he obtains the query, he first searches the list List_H2 whether the message μ_i is queried or not. If it exists, the result (μ_i, c_i) is returned back to E. If not, C randomly selects x ∈ D_σ^m, computes the corresponding c_i = H2(Ax mod q, μ_i), returns the result (μ_i, c_i) back to E, and records this result into the list List_H2.
  – Secret key query: E adaptively chooses the non-target identity ID_i to query on the secret key. C owns a list L_K to store (s_IDi, ID_i). When he obtains the query, he first searches the list L_K whether the identity ID_i is queried or not. If it exists, the result (s_IDi, ID_i) is returned back to E. If not, C obtains (ID_i, a_IDi) from the list List_H1 or regenerates it first. Next, C computes the corresponding s_IDi ← SamplePre(A, T, a_IDi, σ), returns the result (s_IDi, ID_i) back to E, and records this result into the list L_K.
  – Signature query: E adaptively chooses a message μ_i to query on the signature. C owns a list L_S to store (e, c_i). When he obtains the query, he first searches the list L_S whether the message μ_i is queried or not. If it exists, the result (e, c_i, μ_i) is returned back to E. If not, C obtains (μ_i, c_i) from the list List_H2 or regenerates it first. Next, C computes the corresponding e = x + s_ID1, where ID1 is set as the signer and ID2 is set as the designated verifier. Then, he returns the result (e, c_i) back to E, and records this result into the list L_S.
• Challenge: E randomly selects two system users' identities ID_{i0} and ID_{i1} which are not queried before. Next, he sends these two target identities to C. C randomly selects the identity ID_{ib}, b ∈ {0, 1}, as the signer and the other one as the designated verifier, derives the ID-DVS signatures (e, c_{i0}) and (e′, c_{i1}) according to the ID-DVS processes, and sends them back to E.
• Guess: E utilizes the formerly obtained messages and performs the guess of the signer b′. C confirms whether ID_{ib′} is the real signer or not. If correct, E wins this game.
• Analyze: Because the parameter x is randomly selected with the same Gaussian distribution D_σ^m, the statistical distance of c_{i0} and c_{i1} is indistinguishable. Therefore, the statistical distance of these two signatures (e, c_{i0}) and (e′, c_{i1}) generated by e = x + s_{ID_{i0}} and e′ = x + s_{ID_{i1}} is also indistinguishable. That is to say, E cannot distinguish the correct signer of these two signatures, and the proposed ID-DVS can guarantee the signer's anonymity.

5.3. Unforgeability

Theorem 2. The proposed ID-DVS can capture unforgeability with the lattice assumption Z-SIS_q^{κ,n,m,β} if no adversary can generate a valid signature with non-negligible probability.

Proof. According to Definition 7, E attempts to derive a valid signature by performing queries on the hash, secret key, and sign algorithms under the adaptively chosen message attack. Here, E can execute enough queries on the three algorithms to obtain information about the non-target messages in polynomial time. Meanwhile, the probability that E wins one round of the query-respond game is defined as at least ξ. Then, C attempts to utilize this forged signature to solve the lattice instance Z-SIS_q^{κ,n,m,β}. Detailed query-respond processes are shown as follows.
• Initialize: C executes the Setup algorithm to generate the system parameters (n, m, q, κ, σ) and sends them to E.
• Query: E adaptively chooses the non-target messages to query with C.
  – H1 query: E adaptively chooses the identity ID_i to query on the H1 function. C owns a list List_H1 to store (ID_i, a_IDi). When he obtains the query, he first searches the list List_H1 whether the identity ID_i is queried or not. If it exists, the result (ID_i, a_IDi) is returned back to E. If not, C computes the corresponding a_IDi = H1(ID_i), returns the result (ID_i, a_IDi) back to E, and records this result into the list List_H1.
  – H2 query: E adaptively chooses the non-target message μ_i to query on the H2 function. C owns a list List_H2 to store (μ_i, c_i). When he obtains the query, he first searches the list List_H2 whether the message μ_i is queried or not. If it exists, the result (μ_i, c_i) is returned back to E. If not, C randomly selects x ∈ D_σ^m, computes the corresponding c_i = H2(Ax mod q, μ_i), returns the result (μ_i, c_i) back to E, and records this result into the list List_H2.
  – Secret key query: E adaptively chooses the identity ID_i to query on the secret key. C owns a list L_K to store (s_IDi, ID_i). When he obtains the query, he first searches the list L_K whether the identity ID_i is queried or not. If it exists, the result (s_IDi, ID_i) is returned back to E. If not, C obtains (ID_i, a_IDi) from the list List_H1 or regenerates it first. Next, C computes the corresponding s_IDi ← SamplePre(A, T, a_IDi, σ), returns the result (s_IDi, ID_i) back to E, and records this result into the list L_K.
  – Signature query: E adaptively chooses the non-target message μ_i to query on the signature. C owns a list L_S to store (e, c_i). When he obtains the query, he first searches the list L_S whether the message μ_i is queried or not. If it exists, the result (e, c_i, μ_i) is returned back to E. If not, C obtains (μ_i, c_i) from the list List_H2 or regenerates it first. Next, C computes the corresponding e = x + s_ID1, where ID1 is set as the signer and ID2 is set as the designated verifier. Then, he returns the result (e, c_i) back to E, and records this result into the list L_S.
• Forge: E can respectively perform q_H1, q_H2, q_K, and q_S queries on the algorithms of the H1 hash, the H2 hash, the secret key, and sign until obtaining enough information. With these query results, E can forge a valid signature (e*, c_i) about the target message μ*. Then, E returns it to C.
• Challenge: C first confirms that the secret key of the signature identity ID_i is not queried, the signature about message μ* is not queried, and the public keys (a_ID1, a_ID2) are derived by C. Then, C utilizes this forged signature (e*, c_i) to solve the Z-SIS_q^{κ,n,m,β} instance Ae = 0 mod q. He checks the list List_H2 and quits this game if (μ_i*, c_i) does not exist. Otherwise, he utilizes the same random vector x ∈ D_σ^m and derives a new valid signature (e′, c_i) according to the sign algorithm with the following two equations.

c_i = H2(A(e* + s_ID2) − a_ID1 mod q, μ*) = H2(Ax + a_ID2 mod q, μ*)
c_i = H2(A(e′ + s_ID2) − a_ID1 mod q, μ*) = H2(Ax + a_ID2 mod q, μ*)   (7)

According to the verification algorithm, it has:

A(e* + s_ID2) − a_ID1 = Ax + a_ID2 mod q
A(e′ + s_ID2) − a_ID1 = Ax + a_ID2 mod q   (8)

Then, it has:

Ae* − a_ID1 = Ax mod q
Ae′ − a_ID1 = Ax mod q   (9)

It also has:

A(e* − e′) = A(x − x) mod q   (10)

Due to x − x = 0, it can derive

A(e* − e′) = 0 mod q   (11)

Here, C quits this game if e* − e′ = 0. Otherwise, e* − e′ is a solution of the SIS instance Ae = 0 mod q.
• Analyze: There are two situations in which C quits the query-respond game. Therefore, the success rate is ξ/(q_H1 + q_H2 + q_K + q_S). This probability is negligible with the increase in query times. In addition, the lattice assumption is a non-deterministic polynomial problem that cannot be broken with current classical or quantum computational conditions.

From the former theoretical security proofs, the proposed ID-DVS scheme can obtain correctness, anonymity, and unforgeability. Meanwhile, this ID-DVS scheme can also satisfy post-quantum security as it is constructed with a lattice assumption. Compared with other classical cryptography algorithm-based BIoMT systems, this scheme can well guarantee anti-quantum security for medical data-sharing among different medical institutions.

6. Performance analysis

The performance analyses of this ID-DVS scheme from the theory and simulation aspects are given in this section.

6.1. Theoretical analysis

In this phase, six items are selected for comparison, where the assumption is the lattice assumption, mpk is the system master public key, msk is the system master secret key, pk is the system user's public key, sk is the system user's private key, and signature is the size of the proposed signature. The comparison results are shown in Table 3. Firstly, the schemes in Refs. [24,34] and this proposed scheme are based on the Z-SIS problem, the schemes in Refs. [29,30] are based on ring-LWE, and the scheme in Ref. [35] is based on the NTRU lattice. Secondly, the sizes of mpk, msk, pk, and sk are in relation to the parameters m, n, and q. Then, the sizes of the signatures in these schemes also depend on the scalar factor σ and the ring number N. In Ref. [29] and Ref. [30], the signature size increases with the ring number, which will affect the efficiency of the signature algorithm. Here, there are no results about mpk and msk in Refs. [24,34], as the algorithms of Setup and KeyGen. in these two references are not divided. These theoretical comparisons and analyses show that the proposed ID-DVS has certain advantages over those in the other five related schemes.

Meanwhile, the theoretical analyses of the time costs of the Setup, KeyGen, Sign, and Verify algorithms are presented in Table 4, where T_Trap represents the time cost of the trapdoor algorithm, T_Sam represents the Gaussian SamplePre algorithm, T_Mul represents the scalar multiplication algorithm, and T_H represents the hash algorithm. Here, some high-time-consuming algorithms and steps have been selected for comparison, and some other addition or modular operations that are low-time-consuming are not considered. The Setup and KeyGen algorithms can be prepared in advance, which can save time costs, so the time consumption of the other algorithms affects the efficiency more. In the proposed ID-DVS scheme, the time costs of the KeyGen and Sign algorithms are lower than those of the other schemes. From these comparison results, it can be derived that the proposed ID-DVS has certain advantages over those in the other five related schemes.
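As a numeric sanity check of the extraction step in the unforgeability proof of Section 5.3 (Eqs. (10)-(11)): if two distinct short vectors share the same syndrome under A, their difference is a non-zero solution of the homogeneous SIS instance. The toy sketch below plants a short kernel vector to stand in for the trapdoor that guarantees two preimages exist; all parameters are illustrative and far too small to be secure.

```python
import random

def matvec_mod(A, v, q):
    # (A v) mod q for a matrix given as a list of rows
    return [sum(a * b for a, b in zip(row, v)) % q for row in A]

random.seed(3)
q, n, m = 97, 4, 8
# Plant a short kernel vector k (last entry 1) so that two distinct short
# preimages of the same syndrome exist; in the actual proof, the trapdoor
# basis T plays this role.
k = [random.randrange(-2, 3) for _ in range(m - 1)] + [1]
cols = [[random.randrange(q) for _ in range(n)] for _ in range(m - 1)]
last = [(-sum(cols[j][i] * k[j] for j in range(m - 1))) % q for i in range(n)]
A = [[cols[j][i] for j in range(m - 1)] + [last[i]] for i in range(n)]

e1 = [random.randrange(-5, 6) for _ in range(m)]     # "forged" signature vector
e2 = [a + b for a, b in zip(e1, k)]                  # challenger's re-signed vector
assert matvec_mod(A, e1, q) == matvec_mod(A, e2, q)  # same syndrome, as in Eq. (10)
v = [a - b for a, b in zip(e1, e2)]                  # e1 - e2 = -k, non-zero and short
assert any(v) and matvec_mod(A, v, q) == [0] * n     # SIS solution, as in Eq. (11)
```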
Table 3
Keys size comparison.
Ref. | Assumption | mpk | msk | pk | sk | signature
Li et al. [24] | Z-SIS | – | – | mn log 2q | mn log 2q | 2m log(12σ)
Ye et al. [29] | Ring-LWE | mn log q | n(m−n) log q | n log q | m log q | 2m log(12σ) + N log 3
Bagchi et al. [30] | Z-SIS | 2m log q | m log q | 2m log q | m log q | 2Nm log(12σ)
Li and Jiang et al. [34] | Ring-LWE | – | – | mn log 2q | mn log 2q | 2m log(12σ)
Yu et al. [35] | NTRU | m log q | 4n² log q | m log q | 2n log q | 2m log(2σ)
This scheme | Z-SIS | mn log q | mm log q | n log q | m log q | 2m log(12σ)

Table 4
Time costs comparison.
Items | Setup | KeyGen. | Sign | Verify
Li et al. [24] | 2T_Trap | – | 2T_Mul + T_H | 3T_Mul + T_H
Ye et al. [29] | T_Trap | T_Sam + T_Mul | T_Sam + 7T_Mul + 3T_H | 5T_Mul + 2T_H
Bagchi et al. [30] | 2T_Trap | 3N·T_Mul + N·T_H | 3N·T_Mul + N·T_H | 2T_Mul + T_H
Li and Jiang et al. [34] | 2N·T_Trap | – | 5T_Mul + 2T_H | 3T_Mul + T_H
Yu et al. [35] | T_Trap | N·T_Sam + 2N·T_Mul + 2N·T_H | 3T_Mul + T_H | 6T_Mul + 4T_H
This scheme | T_Trap | T_Sam + T_H | 2T_Mul + T_H | 4T_Mul + T_H
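Assuming purely illustrative unit timings for T_Trap, T_Sam, T_Mul, and T_H (the measured values in the paper follow the principle of Ref. [40], not these numbers), the Sign and Verify expressions of Table 4 can be totaled per scheme:

```python
# Hypothetical unit timings in milliseconds; N is the ring number preset to 3.
T = {"Trap": 50.0, "Sam": 5.0, "Mul": 1.0, "H": 0.2}
N = 3

# Sign and Verify cost expressions transcribed from Table 4
sign_verify = {
    "Li et al. [24]":           (2 * T["Mul"] + T["H"],                3 * T["Mul"] + T["H"]),
    "Ye et al. [29]":           (T["Sam"] + 7 * T["Mul"] + 3 * T["H"], 5 * T["Mul"] + 2 * T["H"]),
    "Bagchi et al. [30]":       (3 * N * T["Mul"] + N * T["H"],        2 * T["Mul"] + T["H"]),
    "Li and Jiang et al. [34]": (5 * T["Mul"] + 2 * T["H"],            3 * T["Mul"] + T["H"]),
    "Yu et al. [35]":           (3 * T["Mul"] + T["H"],                6 * T["Mul"] + 4 * T["H"]),
    "This scheme":              (2 * T["Mul"] + T["H"],                4 * T["Mul"] + T["H"]),
}
for scheme, (t_sign, t_verify) in sign_verify.items():
    print(f"{scheme:26s} sign = {t_sign:6.1f} ms   verify = {t_verify:5.1f} ms")
```

Under any positive unit timings, the proposed scheme's Sign cost (2T_Mul + T_H) is tied for the smallest in the table, which matches the discussion above.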
Fig. 2. Keys size comparison (80-bit security level with parameter setting of n = 512, m = 3549, q = 223, and σ = 230; 192-bit security level with parameter setting of n = 1024, m = 8323, q = 227, and σ = 230).
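Taking the two parameter sets from the caption of Fig. 2 and reading log as log2, the size formulas of Table 3 for this scheme evaluate as follows. This is a rough sketch for orientation only; the paper's own unit and rounding conventions may differ.

```python
import math

# Parameter sets from the caption of Fig. 2 (80-bit and 192-bit levels)
params = {
    "80-bit":  dict(n=512,  m=3549, q=223, sigma=230),
    "192-bit": dict(n=1024, m=8323, q=227, sigma=230),
}

def sizes_bits(n, m, q, sigma):
    # Size formulas for "This scheme" in Table 3, in bits
    lq = math.log2(q)
    return {
        "mpk":       m * n * lq,                    # mn log q
        "msk":       m * m * lq,                    # mm log q
        "pk":        n * lq,                        # n log q
        "sk":        m * lq,                        # m log q
        "signature": 2 * m * math.log2(12 * sigma), # 2m log(12 sigma)
    }

for level, p in params.items():
    s = sizes_bits(**p)
    print(level + ": " + ", ".join(f"{k} = {v / 8 / 1024:.1f} KiB" for k, v in s.items()))
```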
6.2. Simulation evaluation

To more clearly compare the advantages and disadvantages of different schemes, the ID-DVS scheme has been executed with Matlab 2016b on a Windows 11 desktop with an Intel(R) Core(TM) i5-1240P 1.90 GHz and 16 GB RAM. Here, the system parameters are selected according to those in Ref. [39], which are presented in the caption of Fig. 2. Meanwhile, the signature size in Ref. [29] and Ref. [30] is in relation to the ring number N, which is preset as N = 3. With the ring number increasing, the signature size in these two references will increase. From the comparison results, the key sizes of pk and sk in this ID-DVS have a certain advantage over other schemes. Although mpk and msk are equal to or bigger than those in other schemes, this ID-DVS is constructed with the lattice assumption Z-SIS, which can provide a strong security guarantee. As the signing process is the main part of a signature scheme, the signature size is the smallest compared with these similar schemes, which can improve the algorithm execution efficiency.

Then, the simulations of the time consumption and energy consumption are shown in Fig. 3 and Fig. 4, respectively. Here, the time costs of the T_Trap, T_Sam, T_Mul, and T_H algorithms are set according to the principle in Ref. [40]. Then, the time-consuming results in Table 4 are calculated, and the results show that this ID-DVS scheme has obvious advantages over other similar schemes. Meanwhile, the simulated devices run at 3.2 V and 7.6 mA. With the former calculated time-consuming data, the energy-consuming results are calculated and shown in Fig. 4.

7. Conclusion

This paper contributes to privacy protection in the cross-chain health data-sharing process in BIoMT systems and introduces an MCF model with a DVS scheme. The MCF model is constructed with blockchain and relay chain technologies, which can support cross-chain health data-sharing and guarantee that data is not tampered with. The DVS is designed with lattice cryptography, which can resist quantum attacks. Meanwhile, the combination of the MCF model and DVS scheme can effectively improve the privacy security of system transactions and users. Then, it has been proved that the DVS scheme can satisfy the security requirements of unforgeability, anonymity, and non-traceability. The key size comparison shows that the proposed DVS scheme is efficient and ledger space-saving, the consumption
Fig. 3. Time-consuming comparison.
Fig. 4. Energy-consuming comparison.
comparison of time and energy shows that this DVS is more practical for cross-chain transactions, and the performance evaluations of cross-chain transactions show that the proposed MCF model is efficient and practical for BIoMT systems. These works provide a new solution for the data island and privacy protection issues in current IoMT systems and promote the cross-chain technology application in BIoMT systems. Moreover, there are still some research directions worth exploring, such as cross-chain identity authentication, secure secret sharing, data access control, and efficient data retrieval in cross-chain health data-sharing processes, which will become the possible research orientations in future work.

CRediT authorship contribution statement

Chaoyang Li: Writing – review & editing, Writing – original draft, Formal analysis, Conceptualization. Yuling Chen: Writing – review & editing, Supervision. Mianxiong Dong: Project administration, Investigation. Jian Li: Validation, Supervision. Min Huang: Validation, Supervision. Xiangjun Xin: Supervision, Funding acquisition. Kaoru Ota: Supervision, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Numbers 62272090, 72293583, 72293580, the Foundation of State Key Laboratory of Public Big Data under Grant PBD2023-25, the Foundation and Cutting-Edge Technologies Research Program of Henan Province (CN) under Grant Number 242102211073, the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers JP22K11989, JP24K14910, Leading Initiative for Excellent Young Researchers (LEADER), MEXT, Japan, Japan Science and Technology Agency (JST) PRESTO Grant Number JPMJPR21P3, JST ASPIRE Grant Number JPMJAP2344, the Soroptimist Japan Foundation, and the Doctor Scientific Research Fund of Zhengzhou University of Light Industry under Grant 2021BSJJ033. Mianxiong Dong is the corresponding author.
Data availability
No data was used for the research described in the article.

View File

@@ -0,0 +1,998 @@
Journal of Systems Architecture 160 (2025) 103339
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
REC: Enhancing fine-grained cache coherence protocol in multi-GPU systems
Gun Ko, Jiwon Lee, Hongju Kal, Hyunwuk Lee, Won Woo Ro
Yonsei University, 50 Yonsei-ro Seodaemun-gu, Seoul, 03722, Republic of Korea
ARTICLE INFO
Keywords: Multi-GPU, Data sharing, Cache coherence, Cache architecture
ABSTRACT
With the increasing demands of modern workloads, multi-GPU systems have emerged as a scalable solution, extending performance beyond the capabilities of single GPUs. However, these systems face significant challenges in managing memory across multiple GPUs, particularly due to the Non-Uniform Memory Access (NUMA) effect, which introduces latency penalties when accessing remote memory. To mitigate NUMA overheads, GPUs typically cache remote memory accesses across multiple levels of the cache hierarchy, which are kept coherent using cache coherence protocols. The traditional GPU bulk-synchronous programming (BSP) model relies on coarse-grained invalidations and cache flushes at kernel boundaries, which are insufficient for the fine-grained communication patterns required by emerging applications. In multi-GPU systems, where NUMA is a major bottleneck, substantial data movement resulting from the bulk cache invalidations exacerbates performance overheads. Recent cache coherence protocols for multi-GPUs enable flexible data sharing through coherence directories that track shared data at a fine-grained level across GPUs. However, these directories are limited in capacity, leading to frequent evictions and unnecessary invalidations, which increase cache misses and degrade performance. To address these challenges, we propose REC, a low-cost architectural solution that enhances the effective tracking capacity of coherence directories by leveraging memory access locality. REC coalesces multiple tag addresses from remote read requests within common address ranges, reducing directory storage overhead while maintaining fine-grained coherence for writes. Our evaluation on a 4-GPU system shows that REC reduces L2 cache misses by 53.5% and improves overall system performance by 32.7% across a variety of GPU workloads.
1. Introduction
Multi-GPU systems have emerged to meet the growing demands of modern workloads, offering scalable performance beyond what a single GPU can deliver. However, as multi-GPU architectures scale in size and complexity [1,2], managing memory across multiple GPUs becomes increasingly challenging [3–7]. One of the primary challenges arises from the bandwidth discrepancy between local and remote memory, commonly known as the Non-Uniform Memory Access (NUMA) effect [3,4]. To mitigate the NUMA penalty, GPUs generally rely on caching remote memory accesses, allowing them to be served with local bandwidth [5,8–10]. This caching strategy is often extended across multiple levels of the cache hierarchy, including both private on-chip caches and shared caches [3,4,11,12], to better accommodate the diverse access patterns of emerging workloads.
While remote data caching offers significant performance benefits in multi-GPU systems, it also requires extending coherence throughout the cache hierarchy. Conventional GPUs rely on a simple software-inserted bulk-synchronous programming (BSP) model [11], which performs cache invalidation and flush operations at the start and end of each kernel. However, as recent GPU applications increasingly require more frequent and fine-grained communication both within and across kernels [11,13–15], these frequent synchronizations can lead to substantial cache operation and data movement overheads. Additionally, precisely managing the synchronizations places additional burdens on programmers, complicating the optimization of multi-GPU systems.
Ren et al. [11] proposed HMG, a hierarchical cache coherence protocol designed for L2 caches in large-scale multi-GPU systems. HMG employs coherence directories to record cache line addresses and their associated sharers upon receiving remote read requests. Any writes to these addresses trigger invalidations. Once capacity is reached, existing entries are evicted from the directory, triggering invalidation requests to the sharer GPUs. These invalidations are unnecessary, as the corresponding cache lines do not immediately require coherence to be maintained. When GPUs access data across a wide range of addresses, significant directory insertions lead to a number of unnecessary invalidations for cache lines that have not yet been fully utilized. Subsequent accesses to these cache lines result in cache misses, requiring data to be fetched again over bandwidth-limited inter-GPU links.
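The directory-overflow behavior described above can be illustrated with a toy model. This is only a sketch: the 2-entry capacity and FIFO replacement are arbitrary assumptions for illustration, whereas the real directories discussed in the paper are set-associative.

```python
# Toy coherence directory: once full, inserting a new remote-read entry
# evicts an older one and charges invalidations to its sharers, even
# though their cached copies of the data are still valid.

class ToyDirectory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}           # cache-line address -> set of sharer GPU IDs
        self.evict_invalidations = 0

    def remote_read(self, addr, requester):
        if addr in self.entries:
            self.entries[addr].add(requester)   # already tracked: add sharer
            return
        if len(self.entries) == self.capacity:
            victim, sharers = next(iter(self.entries.items()))  # FIFO victim
            del self.entries[victim]
            self.evict_invalidations += len(sharers)  # premature invalidations
        self.entries[addr] = {requester}

d = ToyDirectory(capacity=2)
for addr in (0x1000, 0x1040, 0x1080):   # wide access range overflows directory
    d.remote_read(addr, requester=1)
print(d.evict_invalidations)  # 1: the entry for 0x1000 was evicted prematurely
```

Even though GPU 1 may still hold the line at 0x1000, the overflow forces an evict-initiated invalidation, which is exactly the traffic REC aims to avoid.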
Corresponding author.
E-mail address: wro@yonsei.ac.kr (W.W. Ro).
https://doi.org/10.1016/j.sysarc.2025.103339
Received 10 September 2024; Received in revised form 27 December 2024; Accepted 5 January 2025
Available online 9 January 2025
1383-7621/© 2025 Published by Elsevier B.V.
G. Ko et al. Journal of Systems Architecture 160 (2025) 103339
Fig. 1. Performance of each caching scheme normalized to a system that enables remote data caching in both L1 and L2 caches using software and hardware coherence protocols, respectively. No caching refers to a system that disables remote data caching, simplifying coherence.
Fig. 2. Baseline multi-GPU system. Each GPU has a coherence directory that records and tracks the status of shared data at given addresses along with the corresponding sharer IDs.
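The lookup-and-fill flow annotated in Fig. 2 can be sketched as a toy model. The function name and the set/dict modeling are illustrative assumptions, not the simulator's code:

```python
# Minimal sketch of the Fig. 2 flow: an L1 miss falls through to the
# shared L2; an L2 miss fetches the line from the home GPU and fills
# both levels, while the home directory records the requesting sharer.

def access(addr, l1, l2, home_directory, gpu_id):
    if addr in l1:                       # step 1: private L1 lookup
        return "L1 hit"
    if addr in l2:                       # step 2: shared L2 lookup
        l1.add(addr)
        return "L2 hit"
    home_directory.setdefault(addr, set()).add(gpu_id)  # step 4: track sharer
    l2.add(addr)                         # step 3: fill both cache levels
    l1.add(addr)
    return "remote fetch"

l1, l2, directory = set(), set(), {}
print(access(0xA, l1, l2, directory, gpu_id=0))  # remote fetch
print(access(0xA, l1, l2, directory, gpu_id=0))  # L1 hit
```

After the first miss, the home GPU's directory holds `{0xA: {0}}`, so a later write to 0xA knows exactly which GPU to invalidate.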
To evaluate the implications of the coherence protocol, we measure the performance impact of unnecessary invalidations on a 4-GPU system that caches remote data in both L1 and L2 caches. L1 caches are assumed to be software-managed, while L2 caches are managed under fine-grained invalidation through coherence directories. As Fig. 1 shows, there exists a significant performance opportunity in eliminating unnecessary invalidations caused by frequent directory evictions. Increasing the size of the coherence directory can delay evictions and the corresponding invalidation requests, but at the cost of increased hardware. Our observations indicate that to eliminate unnecessary invalidations, the size of the coherence directory would need to be substantially increased, accounting for 30.4% of the L2 cache size. As the size of GPU L2 caches continues to grow [16,17], the aggregate storage overhead of coherence directories becomes substantial, causing inefficiency in scaling for multi-GPU environments (discussed in Section 3.3).
In this paper, we propose Range-based Directory Entry Coalescing (REC), an architectural solution that mitigates unnecessary invalidation overhead by increasing the effective tracking capacity of the coherence directory without incurring significant hardware costs. Our key insight is that since directory updates are performed upon receiving remote read requests, leveraging memory access locality provides an opportunity to coalesce multiple tag addresses of shared data based on their common address range. To achieve this, we employ a coherence directory design which aggregates data from incoming remote reads that share a common base address within the same address range, storing only the offset and the sharer IDs. We reduce the storage requirements of directory entries by designing them in a base-and-offset format, recording the common high-order bits of addresses and using a bit-vector to indicate the index of each coalesced entry within the target range. For incoming writes, if they are found in the coherence directory, invalidations are propagated only to the corresponding address, maintaining fine-grained coherence in multi-GPU systems.
To summarize, this paper makes the following contributions:
• We identify a performance bottleneck of fine-grained shared data tracking mechanisms in multi-GPU systems. Our analysis demonstrates that such methods generate unnecessary invalidations at coherence directory evictions, which incurs a significant performance bottleneck due to increased cache miss rates.
• We show that simply employing larger coherence directories incurs significant storage overhead. Our analysis shows that the baseline multi-GPU system requires a 12× increase in the directories to eliminate redundant invalidations.
• We propose REC, which increases effective coverage of the coherence directory by enabling each entry to coalesce and track multiple memory addresses along with the associated sharers. By reducing the L2 cache misses by 53.5%, REC improves overall performance by 32.7% on average across our evaluated GPU workloads.
2. Background
2.1. Multi-GPU architecture
The slowdown of transistor scaling has made it increasingly difficult for single GPUs to meet the growing demands of modern workloads. Alternatively, multi-GPU systems have emerged as a viable path forward, offering enhanced performance and memory capacity by leveraging multiple GPUs connected using high-bandwidth interconnects such as PCIe and NVLink [18]. However, these inter-GPU links are likely to have bandwidth that falls far behind the local memory bandwidth [3,4,8]. The NUMA effect that arises from this large bandwidth gap can significantly impact multi-GPU performance, making it crucial to optimize remote access bottlenecks to maximize efficiency.
Fig. 2 illustrates the architectural details of our target multi-GPU system. Each GPU is divided into several SAs, with each comprising a number of CUs. Every CU has its own private L1 vector cache (L1V$), while the L1 scalar cache (L1S$) and L1 instruction cache (L1I$) are shared across all CUs within an SA. Additionally, each GPU contains a larger L2 cache that is shared across all SAs. When a data access misses in the local cache hierarchy, it is forwarded to either local or remote GPU memory, depending on the data location. For local memory accesses, the cache lines are stored in both the shared L2 cache and the L1 cache private to the requesting CU. In the case of remote-GPU memory accesses, the data can be cached either only in the L1 cache of the requesting CU [4,5,8] or in both the L2 and L1 caches [3,11,12]. Caching data in remote memory nodes helps mitigate the performance degradation caused by accessing remote memory nodes.
2.2. Remote data caching in multi-GPU
While caching remote data only in the L1 cache can save L2 cache capacity, it limits the sharing of remote data among CUs. As a result, such an approach provides lower performance gain when unnecessary invalidation overhead is eliminated in its counterpart, as shown in Fig. 1. For this reason, in this study, we assume the baseline multi-GPU architecture allows caching of remote data in both L1 and L2 caches.
A step-by-step process of remote data caching is shown in Fig. 2. Upon generating a memory request, an L1 cache lookup is performed by the requesting CU (1). When data is not present in the L1, an L2 cache lookup is generated to check if the remote data is cached in the L2 (2). If the data is found in the L2 cache, it is returned to the requesting CU and cached in its local L1 cache. If the data is not found in the L2 cache, the request is forwarded to the remote GPU memory at the given physical address. Subsequently, the requested data is returned at a cache line granularity and cached in both the L1 and L2 caches (3). At the same time, the coherence directory, which maintains information about data locations across multiple GPUs, is
Fig. 3. Coherence protocol flows in detail. The baseline hardware protocol has two stable states: valid and invalid, with no transient states or acknowledgments required for write permissions.
Fig. 4. L2 cache miss rates in baseline and idealized system where no invalidations are propagated by coherence directory evictions. Cold misses are excluded from the results.
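The two-state protocol summarized in the Fig. 3 caption can be approximated by a toy model. This is a simplification under stated assumptions: valid entries are present in a dict, invalid ones absent, and invalidation delivery is modeled as a returned list of sharer IDs rather than actual messages.

```python
# Toy model of the two stable states (valid/invalid) in the directory;
# only the invalidation traffic each action generates is modeled.

def remote_read(directory, addr, requester):
    directory.setdefault(addr, set()).add(requester)  # entry valid; add sharer
    return []                                         # reads never invalidate

def local_write(directory, addr):
    sharers = directory.pop(addr, set())   # entry becomes invalid (removed)
    return sorted(sharers)                 # write-initiated invalidations

def remote_write(directory, addr, requester):
    sharers = directory.setdefault(addr, set())
    victims = sorted(sharers - {requester})  # invalidate the other sharers
    sharers.add(requester)                   # entry stays valid; requester added
    return victims

d = {}
remote_read(d, 0xA, requester=2)
print(remote_write(d, 0xA, requester=3))  # [2]: other sharer invalidated
print(local_write(d, 0xA))                # [2, 3]; entry now invalid
```

The model mirrors the protocol's key asymmetry: local writes invalidate the entry itself, while remote writes keep it valid and redirect invalidations to the remaining sharers.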
updated with the corresponding entry and the sharer GPU (4). Writes to remote data in the home GPU are also performed in the local L2 cache, following the write-through policy, as the corresponding GPU may access the written data in the future. Remote writes arriving at the home GPU trigger invalidation messages to be sent out to the sharer GPU(s), and the requesting GPU is recorded as a sharer (4).
2.3. Cache coherence in multi-GPU
Existing hardware protocols, such as GPU-VI [19], employ coherence directories to track sharers (i.e., L1s) and propagate write-initiated cache invalidations within a single GPU. Bringing the notion into multi-GPU environments, Ren et al. proposed HMG [11], a hierarchical design that efficiently manages both intra- and inter-GPU coherence. HMG includes two layers for selecting home nodes to track sharers: (1) the inter-GPU module (GPM) level that selects a home GPM within a GPU and (2) the inter-GPU level that selects a home GPU across the entire system. A GPM is a chiplet in multi-chip module GPUs. With this, HMG reduces the complexity of tracking and maintaining coherence across a large number of sharers. HMG also optimizes performance by eliminating all transient states and most invalidation acknowledgments, leveraging weak memory models in modern GPUs [11].
Each GPU has a coherence directory attached to its L2 cache, managed by the cache controllers. The directory is organized in a set-associative structure, and each entry contains the following fields: tag, sharer IDs, and coherence state. The tag field stores the cache line address for the data copied and fetched by the sharer. The sharer ID field is a bit-vector representing the list of sharers, excluding the home GPU. Each entry is in one of two stable states: valid or invalid. Unlike HMG [11], the baseline coherence directory tracks one cache line per entry. In contrast, a directory entry in HMG is designed to track four cache lines using a single tag address and sharer ID field, which limits its ability to manage each cache line at a fine granularity. Consequently, a write to any address tracked by a directory entry may unnecessarily invalidate other cache lines within the same range, potentially causing inefficiencies in remote data caching. We discuss the importance of reducing unnecessary cache line invalidations in detail in Section 3.1. Like typical memory allocation in multi-GPU systems, the physical address space is partitioned among the GPUs in the system. Therefore, data at any given physical address is designated to one GPU (i.e., the home GPU), and every access by a remote GPU references the coherence directory of the home GPU. For example, in Fig. 2, GPU0 requests data at address 0xA from GPU1, which is the home GPU; the corresponding entry is then inserted into the directory of GPU1 with the relevant information.
Fig. 3 shows the detailed state transitions and actions initiated by the coherence directory. Note that local and remote refer to the sources of memory requests received: local refers to accesses from the local CUs, and remote refers to accesses from the remote GPUs.
Local reads: Local read requests arriving at the L2 cache are directed to either locally- or remotely-mapped data. On cache hits, the data is returned and guaranteed to be consistent because it is either the most up-to-date data (if mapped to local DRAM) or correctly managed by the protocol (if mapped to remote GPU). On cache misses, the requests are forwarded to either local DRAM or a remote GPU. In all cases, the directory of the requesting GPU remains unchanged.
Remote reads: For remote reads that arrive at the home GPU, the coherence directory records the ID of the requesting GPU at the given cache line address. If the line is already being tracked (i.e., the entry is found and valid), the directory simply adds the requester to the sharer field and keeps the entry in the valid state. If the line is not being tracked, the directory finds an empty spot to allocate a new entry and marks it as valid. When the directory is full and every entry is valid, it evicts an existing entry and replaces it with the new entry (discussed below).
Local writes: Local writes to data mapped to the home GPU memory look up the directory to find whether a matching entry at the line address exists. If found, invalidations are propagated to the recorded sharers in the background, and the directory entry becomes invalid.
Remote writes: By default, L2 caches use a write-back policy for local writes. As described in Section 2.2, remote writes update both the L2 cache of the requester and local memory, similar to a write-through policy. Consequently, the directory maintains the entry as valid by adding the requester to the sharer list and sends out invalidations to other sharers recorded in the original entry.
Directory entry eviction/replacement: Coherence directories are implemented in a set-associative structure. Thus, capacity and conflict misses occur as directory lookups are initiated by the read requests continuously received from remote GPUs. To notify that the information in the evicted entry is no longer traceable, invalidations are sent out as with writes.
Acquire and release: At the start of a kernel, invalidations are performed in L1 caches as coherence is maintained using software bulk synchronizations. However, the invalidations are not propagated beyond L1 caches, as L2 caches are kept coherent with the fine-grained directory protocol. Release operations flush dirty data in both L1 and L2 caches.
3. Motivation
In multi-GPU systems, coherence is managed explicitly through cache invalidations to ensure data consistency across multiple GPUs. When invalidation requests are received, sharer GPUs must look up and invalidate the corresponding cache lines. Subsequent accesses to these invalidated cache lines result in cache misses, which are then forwarded to the home GPU. This, in turn, can negate the performance benefits of local caching as it undermines the effectiveness of caching mechanisms intended to reduce remote access bottlenecks. In this section, we analyze the behavior of cache invalidation and its impact on the overall
Fig. 5. Fraction of evict-initiated and write-initiated invalidations in the baseline multi-GPU system. The results are based on invalidation requests that hit in the sharer-side L2 caches.
Fig. 6. Performance impact of increasing coherence directory sizes. To eliminate unnecessary invalidations, GPUs require a directory size up to 12× larger than the baseline.
performance of multi-GPU systems. We identify the sources of invalidation and explore a straightforward solution to mitigate the associated bottlenecks. Our experiments are conducted using MGPUSim [20], a multi-GPU simulation framework that we have extended to support the hardware cache coherence protocol. The detailed configuration is provided in Table 2.
3.1. Impact of cache invalidation
To ensure data consistency across multiple GPUs, invalidation requests are propagated by the home GPU in two cases: (1) when write requests are received and (2) when an entry is evicted from the coherence directory due to capacity and conflict misses. Invalidation requests triggered by writes are crucial for maintaining data consistency, as they ensure that no stale data is accessed in the sharer GPU caches. On the other hand, invalidations generated by directory eviction aim to notify the sharers that the coherence information is no longer traceable, even if the data is still valid. A detailed background on the protocol flows with invalidations is given in Section 2.3.
Broadcasting invalidations does not significantly impact cache efficiency if the cache lines are already evicted or no longer in use. However, when applications exhibit frequent remote memory accesses, the generation of new directory entries increases invalidation requests from eviction, invalidating the associated cache lines prematurely. These premature invalidations lead to higher cache miss rates, as subsequent accesses to the invalidated cache lines result in misses. As remote data misses exacerbate NUMA overheads, they need to be reduced to improve multi-GPU performance.
Fig. 4 shows the impact of cache miss rate when eliminating unnecessary invalidations across the benchmarks listed in Table 3 running on a 4-GPU system. The figure demonstrates that the baseline system experiences a cache miss rate more than double (average 2.4×) that of the idealized system without the unnecessary invalidation. This increase is mainly due to frequent invalidation requests, which prematurely invalidate cache lines before they can be fully utilized, leading to an increase in the number of remote memory accesses. The result strongly motivates us to further study the source of these frequent invalidations to improve the efficiency of remote data caching in multi-GPU systems.
To demonstrate the performance opportunity, Fig. 1 presents a study showing the performance of idealized caching without the invalidation overhead. With no invalidations to unmodified cache lines, remote data can be fully utilized as needed until they are naturally replaced by the typical cache replacement policy. The performance of the baseline and ideal system is represented in the first and fourth bars, respectively, in Fig. 1. The result shows that an ideal system with no unnecessary cache invalidation overheads outperforms the baseline by up to 2.79× (average 36.9%). As demonstrated by Figs. 1 and 4, reducing premature cache invalidations is crucial in improving the efficiency of remote data caching in multi-GPU systems.
3.2. Source of premature invalidation
As described in Section 2.3, when a coherence directory becomes full, the GPU needs to evict an old entry and replace it with a new one upon receiving a remote read request; an invalidation request must be sent out to the sharer(s) in the evicted entry. Fig. 5 shows the distribution of invalidations triggered by directory eviction and write requests, referred to as evict-initiated and write-initiated invalidations, respectively. The measurements are taken based on the invalidations that are hit in the sharer-side L2 caches after receiving the requests. We observe that a significant amount of invalidations (average 79.5%) are performed by the requests from directory evictions in the home GPUs. These invalidations, considered unnecessary as they do not require immediate action, should be delayed until remote GPUs have full use of the data.
We also show the percentage of write-initiated invalidations in Fig. 5. One can observe that applications such as FIR, LU, and MM2 experience a significant number of invalidations due to write requests. These workloads exhibit fine-grained communication within and across dependent kernels, necessitating the invalidation of corresponding cache lines in the remote L2 cache upon any modification to the shared data. Although the applications exhibit a high percentage of write-initiated invalidations, their impact on cache miss rates may be negligible if the GPUs do not subsequently require access to the invalidated cache lines. Nonetheless, the results from Fig. 4 clearly demonstrate the importance of minimizing unnecessary cache invalidations.
So far, we have discussed how prematurely invalidating remote data leads to increased cache miss rates, which negatively impacts multi-GPU performance. We also show that a large fraction of invalidation requests stems from directory evictions, which frequently occur due to the high volume of remote accesses. These accesses trigger numerous directory updates, overwhelming the baseline coherence directory's capacity to effectively manage coherence. A straightforward solution to mitigate premature invalidations is to increase the size of the coherence directory, providing more coverage to track sharers and reducing eviction rates. In the following section, we analyze the performance impact of larger coherence directory sizes. It is important to note that this paper primarily focuses on delaying invalidations caused by directory evictions, as write-initiated invalidations are necessary and must be performed immediately for correctness.
3.3. Increasing directory sizes
A simple approach to delay directory evictions, thereby minimizing premature invalidations, is to increase the size of coherence directories. Limited directory sizes lead to significant evict-initiated invalidations, which can undermine the performance benefits of local caching. To quantify the benefits of larger directories, we conduct a quantitative analysis of performance improvements with increasing directory sizes. In our simulated 4-GPU system, each GPU has an L2 cache size of 2 MB, with each cache line being 64 B. Each coherence directory tracks
Fig. 7. Average performance improvement per increased directory storage in the
baseline coherence directory design. The results are normalized to the system with
8K-entry coherence directory.
the identity of all sharers excluding the home GPU (i.e., three GPUs).
To cover the entire L2 cache space for three GPUs, an ideal coherence
directory would require approximately 96K entries, or about 12× the
baseline 8K entries.
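The ideal-directory sizing above follows directly from the stated cache geometry; the short script below is our own sanity check of that arithmetic, not code from the paper.

```python
# Sanity check of the ideal coherence directory size, assuming the 4-GPU
# configuration stated in the text: 2 MB L2 per GPU, 64B cache lines, and a
# directory that tracks all sharers except the home GPU.
L2_BYTES = 2 * 1024 * 1024
LINE_BYTES = 64
REMOTE_GPUS = 3                  # sharers excluding the home GPU
BASELINE_ENTRIES = 8 * 1024      # 8K-entry baseline directory

lines_per_l2 = L2_BYTES // LINE_BYTES        # 32768 cache lines per GPU
ideal_entries = REMOTE_GPUS * lines_per_l2   # 98304 entries, i.e., about 96K
scale_vs_baseline = ideal_entries / BASELINE_ENTRIES

print(ideal_entries, scale_vs_baseline)      # 98304 entries, 12.0x the baseline
```

The 12× factor reappears throughout the paper as the storage bound that motivates coalescing instead of naive scaling.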
Fig. 6 illustrates the normalized performance for increasing the directory sizes by 2×–12× the baseline. With an ideal directory size, unnecessary invalidations from directory evictions can be eliminated, leaving only write-initiated invalidations. The results show that applications exhibit significant performance gains as the directory size increases, with some benchmarks (e.g., ATAX, PR, and ST) requiring 8×–12× the baseline size to achieve the highest speed-up. Specifically, benchmarks such as PR and ST show irregular memory access patterns that span a wide address range, leading to higher chances of conflict misses when updating coherence directories. Most other tested benchmarks require up to six times the baseline directory size to achieve maximum attainable performance; the average speedup with six times the size is 1.35×.

Each entry in the coherence directory comprises a tag, sharer list, and coherence state. We assume 48 bits for tag addresses, a 3-bit vector for tracking sharers, and one bit for the directory entry state; thus, each entry requires a total of 52 bits of storage. Our baseline directory implementation has 8K entries and occupies approximately 2.5% of the L2 cache [11]. Therefore, the storage cost of the baseline directory in each GPU is 52 × 8192/8/1024 = 52 kB, assuming 8 bits per byte and 1024 bytes per kilobyte. From our observation in Fig. 6, applications require directory sizes from 6× up to 12× the baseline to achieve maximum performance. This corresponds to a total storage cost of 312–624 kB, which is an additional 15.2–30.4% of the L2 cache size. While increasing directory size can significantly improve performance, the associated hardware costs are substantial. To show the inefficiency of simply scaling directory sizes, we calculate the performance per storage using the results in Fig. 6 and the number of directory entries. Fig. 7 illustrates the results relative to the baseline with 8K entries, showing that performance improvements per increased storage do not scale proportionally with larger coherence directories. Additionally, since GPU applications require different directory sizes to achieve maximum performance, simply increasing the directory size is not an efficient solution. Moreover, as GPU L2 caches continue to grow [16,17], the cost of maintaining proportionally larger coherence directories will only amplify these overheads. Therefore, improving coherence directory coverage without significant storage overhead motivates the need for more efficient fine-grained hardware protocols in multi-GPU systems.

4. REC architecture

This work aims to enhance coherence directory coverage while avoiding significant hardware overhead, overall reducing unnecessary cache invalidations in multi-GPU systems. We introduce REC, an architecture that coalesces directory entries by leveraging the spatial locality in memory accesses observed in GPU workloads. In this section, we provide an overview of the REC design and discuss its integration with existing multi-GPU coherence protocols.

Fig. 8. A high-level overview of (a) baseline and (b) proposed REC architecture with simplified 2-entry coherence directories. The figure illustrates a scenario where GPU1 accesses memory of GPU0 in order of 0x1000, 0x1040, 0x1080, and 0x1000 by each CU. In the baseline directory, the entry that tracks the status of data at 0x1000 is evicted for recording the address 0x1080. The proposed directory coalesces three addresses with the same base address into one entry.

4.1. Hardware overview

As shown in Section 3.2, a significant fraction of cache invalidations are generated by the frequent directory evictions. These invalidations lead to increased cache misses, as data is prematurely invalidated from the cache, requiring subsequent accesses to fetch the data from remote memory. While simply increasing the directory size can address this bottleneck, the associated hardware cost can become substantial. To address this, we propose REC, an architectural solution that compresses remote GPU access information, retaining as much data as possible before eviction occurs. It aggregates data from incoming remote read requests so that (1) multiple reads to the same address range share a common base address, storing only the offset and source GPU information, and (2) the coalescing process does not result in any loss of information, maintaining the accuracy of the coherence protocol. We now discuss the design overview of REC and the details of the associated hardware components.

Fig. 8(a) shows how the baseline GPU handles a sequence of incoming read requests. The cache controller records the tag addresses and the corresponding sharer IDs in the order that the requests arrive. When the coherence directory reaches its capacity, the cache controller follows a typical FIFO policy to replace the oldest entry with a new one within the set. Once an entry is evicted, the information it held can no longer be tracked, triggering an invalidation request to be sent to the GPU listed in the entry. Upon receiving this request, the sharer GPU checks its L2 cache and invalidates the corresponding cache line, leading to a cache miss on any subsequent access to the cache line.

To delay invalidations caused by directory evictions without significant hardware overhead, we introduce the REC architecture, which enhances the baseline coherence directory by leveraging spatial locality to merge multiple addresses into a single entry. As illustrated in Fig. 8(b), REC stores tag addresses with common high-order bits as a single entry using a base-plus-offset format. When a new read request matches the base address in an existing entry, the offset and sharer information are appended to that entry, reducing the need for additional entries and delaying evictions. The base address represents the shared high-order bits, covering a range of addresses and reducing the storage required compared to storing full tag addresses individually. Additionally, REC uses position bits to efficiently track multiple addresses within the specified range, further minimizing storage overhead.
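The storage figures quoted in Section 3.3 follow from the 52-bit entry layout; the script below is our own reproduction of that arithmetic under the stated assumptions (8 bits per byte, 1024 bytes per kilobyte, 2 MB L2).

```python
# Reproduces the baseline-directory storage arithmetic from Section 3.3:
# a 48-bit tag + 3-bit sharer vector + 1 state bit = 52 bits per entry.
BITS_PER_ENTRY = 48 + 3 + 1
BASELINE_ENTRIES = 8 * 1024
L2_KB = 2 * 1024                     # 2 MB L2 cache per GPU

baseline_kb = BITS_PER_ENTRY * BASELINE_ENTRIES / 8 / 1024   # 52.0 kB
scaled_kb = [baseline_kb * s for s in (6, 12)]               # 312 and 624 kB
overhead_pct = [100 * kb / L2_KB for kb in scaled_kb]        # ~15% and ~30% of L2

print(baseline_kb, scaled_kb, overhead_pct)
```

The 6×–12× scaling needed for peak performance thus costs a substantial fraction of the L2 itself, which is the inefficiency REC is designed to avoid.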
Table 1
Trade-offs between addressable range and storage for each entry. Note that one valid bit, not shown in the table, is included in the overall calculation.

                        Addressable range
                        64B    128B   256B   1 kB    4 kB
Base address bits       48     41     40     38      36
Position/Sharer bits    -/3    2/6    4/12   16/48   64/192
Total bits per entry    52     50     57     103     293

Table 2
Baseline GPU configuration.

Parameter             Configuration
Number of SAs         16
Number of CUs         4 per SA
L1 vector cache       1 per CU, 16 kB 4-way
L1 inst cache         1 per SA, 32 kB 4-way
L1 scalar cache       1 per SA, 16 kB 4-way
L2 cache              2 MB 16-way, 16 banks, write-back
Cache line size       64B
Coherence directory   8K entries, 8-way
DRAM capacity         4 GB HBM, 16 banks
DRAM bandwidth        1 TB/s [11]
Inter-GPU bandwidth   300 GB/s, bi-directional

Table 3
Tested workloads.

Benchmark                                          Abbr.   Memory footprint
Matrix transpose and vector multiplication [21]    ATAX    128 MB
2-D convolution [21]                               C2D     512 MB
Finite impulse response [22]                       FIR     128 MB
Matrix-multiply [21]                               GEMM    128 MB
Vector multiplication and matrix addition [21]     GEMV    256 MB
2-D jacobi solver [21]                             J2D     128 MB
LU decomposition [21]                              LU      128 MB
2 matrix multiplications [21]                      MM2     128 MB
3 matrix multiplications [21]                      MM3     64 MB
PageRank [22]                                      PR      256 MB
Simple convolution [23]                            SC      512 MB
Stencil 2D [24]                                    ST      128 MB

Fig. 9. Coherence directory entry structure for 64B cache lines. In our design, each entry stores up to 16 coalesced entries based on 1 kB range.

Fig. 10. Overview of the REC protocol flows. In the example coherence directory, entry insertion and offset addition operations are highlighted in blue, while eviction and offset deletion operations are shown in red.

Determining the address range within which REC coalesces entries is one of the key design considerations, as it directly impacts the number of bits required for each entry. Table 1 shows a list of design choices for implementing REC with varying addressable ranges and their potential trade-offs. The number of required base address bits is calculated using 2^n = addressable_range, where n is the number of bits right-shifted from the original tag address. Also, the number of required position bits is determined by the maximum number of coalesceable cache line addresses within the target range, assuming a 64B line size. Then, the number of sharer bits required is (n-1) × num_position_bits, where n is the number of GPUs. For example, if REC is designed to coalesce with an addressable range of 256B, each entry would require 40, 4, and 12 bits for the base address, position, and sharer fields, respectively. Lastly, one valid bit is added to each entry. In Table 1, we show the total bits required per entry under the addressable ranges from 128B to 4 kB for comparing the storage costs. REC designs with larger addressable ranges can benefit from increased directory coverage but at the cost of storage. In the evaluation of this paper, we tested various addressable ranges for REC. Each design is configured to coalesce the maximum number of offsets within its specified range. Later in the results, we confirm that a 1 kB coalesceable range offers the best trade-off, balancing reasonable size overhead per entry with the ability to coalesce a significant number of entries before evictions occur (discussed in Section 5.2).

Based on these findings, the format of a directory entry is as illustrated in Fig. 9. Each entry comprises a base address, coalesced entries, and a valid bit. When the first remote read request arrives at the home GPU, the cache controller sets the base address by right-shifting the tag address by the number of bits needed to represent the offset within the specified range. For a 48-bit tag, the address is right-shifted by 10 bits (considering a 64B-aligned 1 kB range), and the resulting bits from positions 64 to 101 are used to store the base address. The coalesced entry is identified using the offset within the 1 kB range, represented by a position bit, followed by three bits for recording the sharers. The position bit is calculated as:

p = ((Tag mod m) / 64) × (n + 1)

where m denotes the coalescing range, and n is the number of sharers, which are set to 1 kB and 3, respectively. Once the position is determined, the corresponding position and the sharer bit are set to 1 using a bitwise OR operation. Given that the 1 kB range allows each entry to record up to 16 individual tag addresses, we use the lower 64 bits to store the coalesced entries. Furthermore, the position bit can also function as the valid bit for each coalesced entry, meaning only one valid bit is necessary to indicate whether the entire entry is valid or not.

4.2. REC protocol flows

The baseline coherence protocol operates with two stable states, valid and invalid, allowing it to remain lightweight and efficient. In our proposed coherence directory design, each entry represents the validity of an entire address range instead of tracking individual tag addresses and associated sharers. This enables the state transitions to be managed at a coarser granularity during directory evictions. Additionally, REC supports fine-grained control over write requests by tracking specific offsets within these address ranges, avoiding the need to evict entire entries. Fig. 10 highlights the architecture of REC and how it handles received requests differently from the baseline. REC does not require additional coherence states but instead modifies the transitions triggered under specific conditions.
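The base/position/sharer encoding from Section 4.1 can be sketched in a few lines of Python. This is our own illustration under the 1 kB-range design (the function names are ours, and the sharer-bit mapping assumes GPU0 as the home GPU, matching the worked examples in the text).

```python
# Illustrative encoding of a REC directory entry for the 1 kB-range design:
# 10 offset bits are shifted off the 48-bit tag to form the base address, and
# each 64B line in the range gets 1 position bit plus 3 sharer bits.
COALESCE_RANGE = 1024        # m = 1 kB
LINE_SIZE = 64
NUM_SHARERS = 3              # n = GPUs other than the home GPU

def base_address(tag):
    # Drop the 10 offset bits of a 64B-aligned 1 kB range.
    return tag >> 10

def position_bit(tag):
    # p = ((Tag mod m) / 64) * (n + 1), as in Section 4.1.
    return (tag % COALESCE_RANGE) // LINE_SIZE * (NUM_SHARERS + 1)

def record_read(entry_bits, tag, sharer_gpu):
    # Set the position bit and the sharer's bit with a bitwise OR.
    p = position_bit(tag)
    return entry_bits | (1 << p) | (1 << (p + sharer_gpu))

# Offset 0x340 within the range is the 14th cache line: p = 832 // 64 * 4 = 52,
# and a read from GPU1 additionally sets the sharer bit 53.
entry = record_read(0, 0x340, sharer_gpu=1)
```

Clearing an offset on a write is the dual operation: the controller masks off the same position and sharer bits, and the entry is invalidated only when no set bits remain.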
Remote reads: When the GPU receives a read request from the remote GPU, the cache controller extracts the base and offset from the tag address (A). The controller then looks up the coherence directory for an entry with the matching base address (B). If a valid entry is found, the position bit corresponding to the offset, calculated using the formula in Section 4.1, and the associated sharer bit are set (C). For example, the position bit is 0x340/64 × 4 = 52, representing the 14th cache line within the specified 1 kB range. The sharer bit is determined by the source GPU index (e.g., GPU1). Therefore, bits 52 and 53 are set to 1. It can happen that the position bit is already set; nevertheless, the controller still performs a bitwise OR on the bits at the corresponding positions. Since the entry already exists in the directory, it remains valid. Otherwise, if no valid entry is found, a new entry is created with the base address, and the position and sharer bits are set. With the insertion of a new entry, the state transitions from invalid to valid.

Local writes: When the write request is performed locally (D), the cache controller must determine whether it needs to send out invalidation requests to the sharers that hold a copy of the data. For this, the controller again looks up the directory with the calculated base address and offset (E). If an entry is found and the offset is valid (i.e., the position bit is set), the invalidation request is generated and propagated to the recorded sharers immediately (F). The state transition is handled differently based on two conditions. First, when another offset is tracked under the common address range, the directory entry should remain valid. Thus, the controller clears only the position and sharer bits for the specific offset of the target address. For example, in Fig. 10, the directory entry has another offset (at p = 56) recorded under the same base address. Once the invalidation request is sent out to GPU1, the controller only clears bits 0 and 1. If the cleared bits are the last ones, the entire directory entry transitions to an invalid state to make room for new entries.

Remote writes: For the remote write request, the cache controller begins the same directory lookup process by calculating the base and offset from the tag (G). In our target multi-GPU system, the source GPU also performs writes to the copy of data in its local L2 cache (discussed in Section 2.2). Therefore, the controller handles remote write requests differently from local writes. When an entry already exists in the directory (i.e., hits), there may be two circumstances: (1) the target offset is invalid but the entry has other valid offsets, and (2) the target offset is already valid and one or more sharers are being tracked. If the target offset is invalid, the controller simply adds the offset and the sharer to the entry in the same way it handles remote reads. If the offset is valid, the controller adds the source GPU to the sharer list by setting its corresponding bit and clearing the other sharer bits (H), then sends invalidation requests to all other sharers (I). In Fig. 10, the entry and the target offset (at p = 56) are both already recorded. The controller, thus, additionally sets bit 58 to add GPU2 as a sharer while clearing bit 59, and sends the invalidation request to GPU3. In either case, the directory entry remains valid. When the directory misses, the cache controller allocates a new entry to record the base, offset, and sharer from the write request. Then, the entry state transitions to valid.

Directory entry eviction/replacement: When the coherence directory becomes full, it needs to replace an entry with the newly inserted one. The baseline coherence directory uses a FIFO replacement policy. However, for workloads that exhibit irregular memory access patterns, capturing locality becomes a challenge. To address this, REC adopts a replacement policy, similar to LRU, to better retain entries that are more likely to be accessed again. When the cache controller receives a remote read request and does not find an entry with the matching base address (J), it determines an entry for replacement (K). The evicting entry is then replaced with the new entry from the incoming request (L). Meanwhile, the controller retrieves the base address and every merged offset from the evicting entry and reconstructs the original tag addresses. Invalidation requests are propagated to every recorded sharer associated with each tag address (M). Lastly, the entry transitions to an invalid state.

4.3. Discussion

Overheads: In our design, the coherence directory consists of 8K entries, with each entry covering a 1 kB range of addresses. Each entry comprises a 38-bit base address field, a 64-bit vector for offsets and sharers, and a valid bit (detailed in Table 1). Thus, the total directory size is 8192 × 103/8/1024 = 103 kB. We also estimate the area and power overhead of the coherence directory in REC, using CACTI 7.0 [25]. The results show that the directory incurs 3.94% area and 3.28% power consumption compared to the GPU L2 cache. REC requires no additional hardware extensions for managing the coherence directory. The existing cache controller handles operations such as base address calculation and bitwise manipulation efficiently.

Comparison to prior work: As discussed in Section 2.3, HMG [11] designs each coherence directory entry to track four cache lines at a coarse granularity. We empirically show, in Section 3.3, that GPUs require a directory size up to 12× the baseline to eliminate unnecessary cache line invalidations. Since REC coalesces up to 16 consecutive cache line addresses per entry, REC can track a significantly larger number of cache lines compared to the prior work. Moreover, REC precisely tracks each address by storing the offset and sharer information. Thus, REC fully supports fine-grained management of cache lines under write operations.

Scalability: REC requires modifications to its design in large-scale systems, specifically to the sharer bit field. For an 8-GPU system, REC requires (8 - 1) × 16 = 112 bits to record sharers in each entry. Then, the size of each entry becomes 112 + 38 + 16 + 1 = 167 bits, which is approximately three times the baseline size, where each entry costs 56 bits, including a 4-bit increase for sharers. Similarly, for a 16-GPU system, REC requires 295 bits per entry, roughly five times the baseline size. However, as observed in Section 3.3, an ideal GPU requires up to 12 times the baseline directory size even in a 4-GPU system, implying that simply increasing the baseline directory size is insufficient to meet scalability demands.

5. Evaluation

5.1. Methodology

We use MGPUSim [20], a cycle-accurate multi-GPU simulator, to model the baseline and REC architecture with four AMD GPUs connected using inter-GPU links of 300 GB/s bandwidth [26]. The configuration of the modeled GPU architecture is detailed in Table 2. Each GPU includes L1 scalar and instruction caches shared within each SA, while the L1 vector cache is private to each CU, and the L2 cache is shared across the GPU. We extend remote data caching to the L2 caches, allowing data from any GPU in the system to be cached in the L2 cache of any other GPU. Since MGPUSim does not include support for hardware cache coherence, we extend the simulator by implementing a coherence directory managed by the L2 cache controller. The coherence directory is implemented with a set-associative structure to reduce lookup latency. Since the baseline coherence directory is decoupled from the caches, its way associativity as well as its size can be scaled independently. In our evaluation, the coherence directory is designed with an 8-way set-associative structure to reduce conflict misses, containing 8K entries in both the baseline and REC architectures. Upon receiving remote read requests, the cache controller updates the coherence directory by recording the addresses and the associated sharers. Once the capacity of the directory is reached, the cache controller evicts an entry and sends out invalidation requests to the recorded sharers. For incoming write requests, the controller looks up the directory to find whether data with matching addresses are shared by remote GPUs. If matching entries are found, invalidation requests are propagated to the sharers except the source GPU. Additionally, since L2 caches are managed by coherence directories, acquire operations do not perform invalidations on L2 caches, but release operations flush the L2 caches. We use workloads from a diverse set of benchmark suites, including AMDAPPSDK [23], Heteromark [22], Polybench [21], and SHOC [24]. Table 3 lists the workloads with their memory footprints.
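The per-entry arithmetic in the Overheads and Scalability paragraphs of Section 4.3 can be reproduced with a short script. This is our own sketch; the field widths (38 base bits, 16 position bits, one valid bit) come from the 1 kB-range layout in Table 1.

```python
# Reproduces the entry-size arithmetic from Section 4.3 for the 1 kB-range REC
# layout versus a baseline entry (48-bit tag, one sharer bit per remote GPU,
# and one state bit).
BASE_BITS, POSITION_BITS, VALID_BIT, TAG_BITS = 38, 16, 1, 48
ENTRIES = 8 * 1024

def rec_entry_bits(num_gpus):
    sharer_bits = (num_gpus - 1) * POSITION_BITS   # one bit per sharer per line
    return BASE_BITS + POSITION_BITS + sharer_bits + VALID_BIT

def baseline_entry_bits(num_gpus):
    return TAG_BITS + (num_gpus - 1) + VALID_BIT

directory_kb = ENTRIES * rec_entry_bits(4) / 8 / 1024   # 103 kB for 4 GPUs
print(directory_kb)                                     # 103.0
print(rec_entry_bits(8), baseline_entry_bits(8))        # 167 vs 56 bits (~3x)
print(rec_entry_bits(16), baseline_entry_bits(16))      # 295 vs 64 bits (~5x)
```

The sharer field is the only term that grows linearly with GPU count, which is why the paper singles it out as the part needing redesign at larger scales.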
Fig. 11. Performance comparison of the baseline with double-sized coherence directory, HMG [11], REC, and an idealized system with zero unnecessary invalidations. Performance is normalized to the baseline with 8K-entry coherence directory.

Fig. 12. Number of coalesced cache line addresses at directory entry eviction under REC with varying addressable ranges. REC in this work coalesces with 1 kB addressable range.

Fig. 13. Total number of L2 cache misses in the baseline with double-sized coherence directory, HMG [11], and REC relative to the baseline.

5.2. Performance analysis

Fig. 11 shows the performance of the baseline with a coherence directory double in size, HMG [11], REC, and an ideal multi-GPU system with zero unnecessary invalidations relative to the baseline. First, we include the performance of the baseline with double the coherence directory size to compare REC at the same storage cost. The result shows that the baseline with double the directory size achieves an average speedup of 7.3%. The baseline coherence directory tracks each remote access individually, on a per-entry basis. As discussed in Section 3.3, doubling the size of the coherence directory does not mitigate the unnecessary cache line invalidations for applications with significant directory evictions. Also, the results show that HMG and REC achieve average speedups of 16.7% and 32.7% across the evaluated workloads. We observe that REC outperforms the prior scheme for two reasons. First, REC delays directory evictions by allowing each entry to record more cache line addresses over a wider range. Since HMG uses each directory entry to track four cache lines, an entire coherence directory can track cache lines up to 4× the baseline. On the other hand, the directory in REC can record up to 16× the number of entries. Second, REC manages write operations to shared cache lines at a fine granularity by searching the directory with exact addresses and sharers, propagating invalidations only when necessary. Since each directory entry of HMG stores only a single address and sharer ID field that covers four cache lines, writes to any of these cache lines trigger invalidation requests to every cache line and recorded sharer, which leads them to be false positives. In contrast, REC does not allow any false positives and performs invalidations only to the modified cache lines and the associated sharers. As a result, REC reduces unnecessary invalidations on cache lines that are actively being accessed by the requesting GPUs, minimizing redundant remote memory accesses.

To investigate the effectiveness of REC under the different addressable ranges listed in Table 1, we also measure the number of coalesced cache line addresses when an entry is evicted and plot them in Fig. 12. We observe that the directory entries capture an average of 1.8, 3.4, 12.9, and 54.7 addresses until eviction under REC with 128B, 256B, 1 kB, and 4 kB coalesceable ranges. Specifically, REC captures more than 14 addresses before directory eviction for applications with strong spatial locality.

Fig. 12 also illustrates the characteristics of limited locality for certain workloads where REC benefits less. In ATAX, PR, and ST, REC coalesces 3.9, 6.1, and 5.8 addresses, respectively. This is because the applications exhibit locality that is challenging to capture due to their irregular memory access patterns that span across a wide range of addresses. To delay the eviction of entries in irregular workloads, we design our proposed coherence directory with an LRU-like replacement policy (discussed in Section 4.2). Another interesting observation is that the performance improvement of GEMV with REC is higher than the improvement seen when eliminating unnecessary invalidations. Our approach delays invalidations, but still performs them when the directories become full. During cache line replacement, the controller prioritizes invalid cache lines before applying the LRU policy. As a result, this delays the replacement of useful cache lines, thereby improving cache efficiency.

L2 cache misses: The performance improvement of REC is largely attributed to the reduction in cache misses caused by unnecessary invalidations from frequent evictions in the coherence directory of home GPUs. Fig. 13 shows the total number of L2 cache misses in the baseline with double-sized directory, HMG, and REC relative to the baseline. Cold misses are excluded from the results. We observe that REC reduces L2 cache misses by 53.5%. In contrast, the baseline with double-sized directory and HMG experience 1.79× and 1.40× higher numbers of cache misses than REC, since neither approach is sufficient to delay evict-initiated cache line invalidations. The result is closely related to the reduction in remote access latency, as the corresponding misses are forwarded to the remote GPUs. Addressing the remote GPU access bottleneck is performance-critical in multi-GPU systems.

Unnecessary invalidations: In the baseline, invalidation requests propagated from frequent directory evictions in the home GPU lead to higher chances of finding the corresponding cache lines still valid in the sharer-side L2 caches. This results in premature invalidations of cache lines that are actively in use, exacerbating the cache miss rate. In REC, the invalidation requests generated by directory eviction reduce the chances of invalidating valid cache lines. Fig. 14 shows that the number of unnecessary invalidations performed in remote L2 caches (i.e., where they are hits) is reduced by 84.4%. Since REC significantly delays evict-initiated invalidation requests, many cache lines have already been evicted from the caches by the time these requests are issued.

Inter-GPU transactions: The reduction in unnecessary invalidations enhances the utilization of data within the sharer GPUs and minimizes redundant accesses over inter-GPU links. Fig. 14 shows the total number of inter-GPU transactions compared to the baseline. As illustrated, REC reduces inter-GPU transactions by an average of 34.9%. The reduced inter-GPU transactions directly contribute to the overall performance improvement in multi-GPU systems.

Bandwidth impact: Fig. 15 shows the total inter-GPU bandwidth costs of invalidation requests. As presented in Section 3.2, a large fraction of invalidation requests are propagated due to frequent directory evictions. Since REC delays invalidation requests from directory evictions by allowing each entry to coalesce multiple tag addresses, the
bandwidth in most of the workloads becomes only a few gigabytes per second.

Fig. 14. Total number of unnecessary invalidations (bars) and inter-GPU transactions (plots) relative to the baseline.

Fig. 15. Total bandwidth consumption of invalidation requests.

Fig. 16. L2 cache lookup latency.

Fig. 17. Performance of REC under varying (a) coalescing address ranges and (b) number of directory entries. Results are shown relative to the baseline with an 8K-entry coherence directory.

Fig. 18. Performance comparison of REC using FIFO and LRU replacement policies. Performance is normalized to the baseline coherence directory with FIFO policy.

Fig. 19. Performance impact of different L2 cache sizes in the baseline and REC. Performance is normalized to the baseline with 2 MB L2 cache.

Cache lookup latency: Fig. 16 illustrates the average L2 cache lookup latency of REC normalized to the baseline. The results show that the lookup latency is reduced by 14.8% compared to the baseline. REC affects the average lookup latency as evict-initiated invalidation requests are propagated in bursts. However, since REC significantly delays directory eviction by coalescing multiple tag addresses, the overall latency decreases for most of the evaluated workloads.

5.3. Sensitivity analysis

Coalescing range: One important design decision in optimizing REC is determining the range over which to coalesce when remote read requests are received. As discussed in Section 4.1, a trade-off exists between the range an entry coalesces and the number of bits required: the larger the range, the more bits are needed to store the remote GPU access information. Fig. 17(a) shows that the performance of REC improves as the coalescing range increases, with performance gains beginning to saturate at 1 kB. For our applications, a 1 kB range is sufficient to capture the majority of memory access locality within the workloads. Since coalescing beyond 4 kB incurs excessive overhead in terms of bits required per entry (with 4 kB already requiring nearly 6× the baseline size), the potential performance improvement may not be substantial enough to offset the additional cost. Therefore, we choose a 1 kB range for our implementation.

Entry size: In our evaluation, we use a directory size of 8K entries to match the baseline coherence directory. Fig. 17(b) shows the performance of REC with varying entry sizes, ranging from 2K to 32K. On average, REC outperforms the baseline, even with reduced entry sizes compared to the baseline system with 8K-entry coherence directory. This is because the coverage of each coherence directory in REC can increase by up to 16× when locality is fully utilized. Although applications with limited locality show performance improvements as the directory size increases, these gains are relatively modest when considered against the additional hardware costs.

FIFO replacement: Fig. 18 represents the performance of REC with a FIFO replacement policy. Our evaluation shows that the choice of replacement policy has a relatively small impact on the overall performance. For the workloads with regular and more predictable memory access patterns, using the FIFO replacement policy is already effective in coalescing a sufficient number of addresses under the target ranges (shown in Fig. 12). However, for some applications, such as ATAX, PR, and ST, performance is lower with FIFO compared to REC due to their limited locality patterns. These applications, therefore, benefit from using an LRU-like replacement policy.

L2 cache size: The performance impact of different L2 cache sizes is shown in Fig. 19. The results are normalized to the baseline with a 2 MB L2 cache. The benefits from increasing L2 cache capacity are limited by the baseline coherence directory. In contrast, the performance of REC improves as the L2 cache size increases, demonstrating its ability to leverage larger caches effectively. Another observation is that the performance improvement with smaller L2 capacity is less significant compared to larger L2 caches. This is because the coverage of the
Fig. 20. Performance impact of different inter-GPU bandwidth in the baseline and REC. Fig. 23. Performance of REC in different GPU architecture.
Performance is normalized to the baseline with 300 GB/s inter-GPU bandwidth.
Fig. 21. Performance of REC with different number of SAs normalized to the baseline
Fig. 24. Performance of REC with DNN applications.
with 16 SAs.
baseline coherence directory relatively increases as the L2 cache size decreases. To further explore the performance sensitivity to different L2 cache sizes, we evaluate REC in systems with L2 cache sizes of 0.5 MB and 8 MB. We find that REC achieves an average performance improvement of 6.3% and 26.7% compared to the baseline with 0.5 MB and 8 MB L2 caches, respectively. Additionally, the performance trend of REC decreases as the L2 cache size increases, since the effectiveness of REC also diminishes with larger caches. Nevertheless, the results emphasize the importance of the coherence protocol in improving cache efficiency.

Inter-GPU bandwidth: The bandwidth of inter-GPU links is a critical factor in scaling multi-GPU performance. Fig. 20 shows the performance of the baseline and REC under different inter-GPU bandwidths, relative to the 300 GB/s baseline. The results demonstrate that REC outperforms the baseline, even in applications where performance begins to saturate with increased bandwidth.

Number of SAs: We also evaluate REC with an increasing number of SAs, as shown in Fig. 21. The performance improvement of REC decreases compared to the system with 16 SAs, since the increased number of SAs improves the thread-level parallelism of GPUs. However, a system with a larger number of SAs also elevates the intensity of data sharing and thus increases the frequency of coherence directory evictions. As a result, REC outperforms the baseline with 16 SAs by 17.1%.

Fig. 22. Performance comparison of REC and the baseline with equal storage cost under different number of GPUs. Performance is normalized to the baseline with 8K entries.

Number of GPUs: We evaluate REC in 8-GPU and 16-GPU systems, as shown in Fig. 22. To ensure a fair comparison, we do not change the workload sizes. The results show that REC provides performance improvements of 24.7% and 14.7% over the baseline in 8-GPU and 16-GPU systems, respectively. We observe that the performance improvement decreases as the number of GPUs increases. This is because, with more GPUs, the application dataset is more distributed, and the amount of data allocated to each GPU's memory decreases, resulting in reduced pressure on each coherence directory for tracking shared copies. Additionally, we compare REC with the baseline configured with different directory sizes to match equal storage costs (discussed in Section 4.3). We observe that REC achieves performance improvements of 2.04× and 1.83× over the baseline with directory sizes increased by 3× and 5×, respectively. The results confirm that simply increasing directory sizes is not an efficient approach, even in large-scale multi-GPU systems.

5.4. REC with Different GPU Architecture

We extend the evaluation of REC to include a different GPU architecture by adapting the simulation environment to a more recent NVIDIA-styled GPU [27]. This involves increasing the number of computation and memory resources compared to the AMD GPU setup. Specifically, we change the GPU configuration to include 128 CUs, each with a 128 kB L1V cache. The L2 cache size is increased to 72 MB with the cache line size adjusted to 128 B. With the increased cache line size, we configure the addressable range of REC to 2 kB, allowing for coalescing up to the same number of tag addresses. We also scale the input sizes of the workloads to the extent that the simulations remain feasible. The performance results, in Fig. 23, show that REC achieves a 12.9% performance improvement over the baseline. This indicates that our proposed REC also benefits the NVIDIA-like GPU architecture.

5.5. Effectiveness of REC on DNN applications

We evaluate the performance improvement of REC in training two DNN models, VGG16 and ResNet18, using the Tiny-Imagenet-200 dataset [28]. As shown in Fig. 24, REC outperforms the baseline for training VGG16 and ResNet18 by 5.6% and 8.9%, respectively. The results imply that REC also has benefits in multi-GPU training of DNN workloads. Additionally, GPUs have recently gained significant attention for training large language models (LLMs). The computation of LLM training comprises multiple decoder blocks, each primarily consisting of a series of matrix and vector operations [29]. In our evaluation, we observe that REC improves multi-GPU performance by 20.2% and 20.4% on GEMM and GEMV workloads, respectively. Considering real-world LLM training, the memory requirements can become significant with large parameters, which can pressure memory systems and lead to under-utilization of computation resources [29]. Since REC improves cache efficiency in multi-GPU systems, we expect a higher performance potential from REC in real-world LLM training.

6. Related work

Several prior works have proposed GPU memory consistency and cache coherence mechanisms optimized for general-purpose domains [13–15,19,30–32]. GPU-VI [19] reduces stalls at the cache controller by employing write-through, write-no-allocate L1 caches and treating loads to pending writes as misses. To maintain write atomicity, GPU-VI adds transient states and state transitions and requires invalidation acknowledgments before write completion. REC is implemented based on the relaxed memory models commonly adopted in recent GPU architectures, which do not require acknowledgments to be sent or received over long-latency inter-GPU links. HMG [11] proposes a lightweight directory protocol by addressing up-to-date memory consistency and coherence requirements. HMG integrates separate layers for managing inter-GPM and inter-GPU level coherence, reducing network traffic and complexity in deeply hierarchical multi-GPU systems. REC primarily addresses the increased cache misses to remotely fetched data caused by frequent invalidations. Additionally, REC can be extended to support hierarchical multi-GPU systems proposed by HMG without significant hardware modifications.

Other efforts aim to design efficient cache coherence protocols for other processor domains. Wang et al. [33] suggested a method to efficiently support dynamic task parallelism on heterogeneous cache-coherent systems. Zuckerman et al. [34] proposed Cohmeleon, which orchestrates the coherence of accelerators in heterogeneous system-on-chip designs. HieraGen [35] and HeteroGen [36] are automated tools for generating hierarchical and heterogeneous cache coherence protocols, respectively, for generic processor designs. Li et al. [37] proposed methodologies to determine the minimum number of virtual networks for cache coherence protocols that can avoid deadlocks. However, these studies do not address the challenges of redundant invalidations in the cache coherence mechanisms of multi-GPU systems.

Significant research has addressed the NUMA effect in multi-GPU systems by proposing efficient page placement and migration strategies [5,6,38], data transfer and replication methods [4,7,8,10,39,40], and address translation schemes [41–43]. In particular, several works have focused on improving the management of shared data within the local memory hierarchy. NUMA-aware cache partitioning [3] dynamically allocates cache space to accommodate data from both local and remote memory by monitoring inter-GPU and local DRAM bandwidths. The authors also extend software coherence with bulk invalidations to L2 caches and evaluate the overhead associated with unnecessary invalidations. SAC [12] proposes reconfigurable last-level caches (LLCs) that can be utilized as either memory-side or SM-side, depending on predicted application behavior in terms of effective LLC bandwidth. SAC evaluates the performance of both software and hardware extensions for LLC coherence. In contrast, REC specifically targets the issue of unnecessary invalidations under hardware coherence, which can undermine the efficiency of remote data caching. It introduces a new directory structure, carefully examining the trade-off between performance and storage overhead.

Recent studies on multi-GPU and multi-node GPU systems also address challenges in various domains. Researchers proposed methods to accelerate deep learning applications [44], graph neural networks [45], and graphics rendering applications [46] in multi-GPU systems. Na et al. [47] addressed security challenges in inter-GPU communications under a unified virtual memory framework. Barre Chord [48] leverages page allocation schemes in multi-chip-module GPUs to reduce address translation overheads. Villa et al. [49] studied designing trustworthy system-level simulation methodologies for single- and multi-GPU systems. Lastly, NGS [50] enables multiple nodes in a data center network to share the compute resources of GPUs on top of a virtualization technique.

7. Conclusion

In this paper, we propose REC to improve the efficiency of cache coherence in multi-GPU systems. Our analysis shows that the limited capacity of coherence directories in fine-grained hardware protocols frequently leads to evictions and unnecessary invalidations of shared data. As a result, the increase in cache misses exacerbates NUMA overhead, leading to significant performance degradation in multi-GPU systems. To address this challenge, REC leverages memory access locality to coalesce multiple tag addresses within common address ranges, effectively increasing the coverage of coherence directories without incurring significant hardware overhead. Additionally, REC maintains write-initiated invalidations at a fine granularity to ensure precise and flexible coherence across GPUs. Experiments show that REC reduces L2 cache misses by 53.5% and improves overall system performance by 32.7%.

CRediT authorship contribution statement

Gun Ko: Writing – original draft, Visualization, Validation, Software, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Jiwon Lee: Formal analysis, Conceptualization. Hongju Kal: Validation, Conceptualization. Hyunwuk Lee: Visualization, Validation. Won Woo Ro: Supervision, Project administration, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2024-00402898, Simulation-based High-speed/High-Accuracy Data Center Workload/System Analysis Platform).

Data availability

The authors are unable or have chosen not to specify which data has been used.

References

[1] NVIDIA, NVIDIA DGX-2, 2018, https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/dgx-2/dgx-2-print-datasheet-738070-nvidia-a4-web-uk.pdf.
[2] NVIDIA, NVIDIA DGX A100 system architecture, 2020, https://download.boston.co.uk/downloads/3/8/6/386750a7-52cd-4872-95e4-7196ab92b51c/DGX%20A100%20System%20Architecture%20Whitepaper.pdf.
[3] U. Milic, O. Villa, E. Bolotin, A. Arunkumar, E. Ebrahimi, A. Jaleel, A. Ramirez, D. Nellans, Beyond the socket: NUMA-aware GPUs, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2017, pp. 123–135.
[4] V. Young, A. Jaleel, E. Bolotin, E. Ebrahimi, D. Nellans, O. Villa, Combining HW/SW mechanisms to improve NUMA performance of multi-GPU systems, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2018, pp. 339–351.
[5] T. Baruah, Y. Sun, A.T. Dinçer, S.A. Mojumder, J.L. Abellán, Y. Ukidave, A. Joshi, N. Rubin, J. Kim, D. Kaeli, Griffin: Hardware-software support for efficient page migration in multi-GPU systems, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2020, pp. 596–609.
[6] M. Khairy, V. Nikiforov, D. Nellans, T.G. Rogers, Locality-centric data and threadblock management for massive GPUs, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2020, pp. 1022–1036.
[7] H. Muthukrishnan, D. Lustig, D. Nellans, T. Wenisch, GPS: A global publish-subscribe model for multi-GPU memory management, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2021, pp. 46–58.
[8] L. Belayneh, H. Ye, K.-Y. Chen, D. Blaauw, T. Mudge, R. Dreslinski, N. Talati, Locality-aware optimizations for improving remote memory latency in multi-GPU systems, in: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022, pp. 304–316.
[9] S.B. Dutta, H. Naghibijouybari, A. Gupta, N. Abu-Ghazaleh, A. Marquez, K. Barker, Spy in the GPU-box: Covert and side channel attacks on multi-GPU systems, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2023, pp. 633–645.
[10] H. Muthukrishnan, D. Lustig, O. Villa, T. Wenisch, D. Nellans, FinePack: Transparently improving the efficiency of fine-grained transfers in multi-GPU systems, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2023, pp. 516–529.
[11] X. Ren, D. Lustig, E. Bolotin, A. Jaleel, O. Villa, D. Nellans, HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2020, pp. 582–595.
[12] S. Zhang, M. Naderan-Tahan, M. Jahre, L. Eeckhout, SAC: Sharing-aware caching in multi-chip GPUs, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2023, pp. 605–617.
[13] B.A. Hechtman, S. Che, D.R. Hower, Y. Tian, B.M. Beckmann, M.D. Hill, S.K. Reinhardt, D.A. Wood, QuickRelease: A throughput-oriented approach to release consistency on GPUs, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2014, pp. 189–200.
[14] M.D. Sinclair, J. Alsop, S.V. Adve, Efficient GPU synchronization without scopes: Saying no to complex consistency models, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2015, pp. 647–659.
[15] J. Alsop, M.S. Orr, B.M. Beckmann, D.A. Wood, Lazy release consistency for GPUs, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2016, pp. 1–13.
[16] NVIDIA, NVIDIA TESLA V100 GPU architecture, 2017, https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf.
[17] NVIDIA, NVIDIA A100 tensor core GPU architecture, 2020, https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf.
[18] NVIDIA, NVIDIA NVLink high-speed GPU interconnect, 2024, https://www.nvidia.com/en-us/design-visualization/nvlink-bridges/.
[19] I. Singh, A. Shriraman, W.W.L. Fung, M. O'Connor, T.M. Aamodt, Cache coherence for GPU architectures, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2013, pp. 578–590.
[20] Y. Sun, T. Baruah, S.A. Mojumder, S. Dong, X. Gong, S. Treadway, Y. Bao, S. Hance, C. McCardwell, V. Zhao, H. Barclay, A.K. Ziabari, Z. Chen, R. Ubal, J.L. Abellán, J. Kim, A. Joshi, D. Kaeli, MGPUSim: Enabling multi-GPU performance modeling and optimization, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2019, pp. 197–209.
[21] T. Yuki, L.-N. Pouchet, Polybench 4.0, 2015.
[22] Y. Sun, X. Gong, A.K. Ziabari, L. Yu, X. Li, S. Mukherjee, C. McCardwell, A. Villegas, D. Kaeli, Hetero-mark, a benchmark suite for CPU-GPU collaborative computing, in: Proceedings of IEEE International Symposium on Workload Characterization, 2016, pp. 1–10.
[23] AMD, AMD APP SDK OpenCL optimization guide, 2015.
[24] A. Danalis, G. Marin, C. McCurdy, J.S. Meredith, P.C. Roth, K. Spafford, V. Tipparaju, J.S. Vetter, The Scalable Heterogeneous Computing (SHOC) benchmark suite, in: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, 2010, pp. 63–74.
[25] R. Balasubramonian, A.B. Kahng, N. Muralimanohar, A. Shafiee, V. Srinivas, CACTI 7: New tools for interconnect exploration in innovative off-chip memories, ACM Trans. Archit. Code Optim. 14 (2) (2017) 14:1–25.
[26] NVIDIA, NVIDIA DGX-1 with Tesla V100 system architecture, 2017, pp. 1–43.
[27] NVIDIA, NVIDIA ADA GPU architecture, 2023, https://images.nvidia.com/aem-dam/Solutions/Data-Center/l4/nvidia-ada-gpu-architecture-whitepaper-v2.1.pdf.
[28] Y. Le, X. Yang, Tiny ImageNet visual recognition challenge, 2015, http://vision.stanford.edu/teaching/cs231n/reports/2015/pdfs/yle_project.pdf.
[29] G. Heo, S. Lee, J. Cho, H. Choi, S. Lee, H. Ham, G. Kim, D. Mahajan, J. Park, NeuPIMs: NPU-PIM heterogeneous acceleration for batched LLM inferencing, in: Proceedings of ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024, pp. 722–737.
[30] K. Koukos, A. Ros, E. Hagersten, S. Kaxiras, Building heterogeneous Unified Virtual Memories (UVMs) without the overhead, ACM Trans. Archit. Code Optim. 13 (1) (2016).
[31] X. Ren, M. Lis, Efficient sequential consistency in GPUs via relativistic cache coherence, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2017, pp. 625–636.
[32] S. Puthoor, M.H. Lipasti, Turn-based spatiotemporal coherence for GPUs, ACM Trans. Archit. Code Optim. 20 (3) (2023).
[33] M. Wang, T. Ta, L. Cheng, C. Batten, Efficiently supporting dynamic task parallelism on heterogeneous cache-coherent systems, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2020, pp. 173–186.
[34] J. Zuckerman, D. Giri, J. Kwon, P. Mantovani, L.P. Carloni, Cohmeleon: Learning-based orchestration of accelerator coherence in heterogeneous SoCs, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2021, pp. 350–365.
[35] N. Oswald, V. Nagarajan, D.J. Sorin, HieraGen: Automated generation of concurrent, hierarchical cache coherence protocols, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2020, pp. 888–899.
[36] N. Oswald, V. Nagarajan, D.J. Sorin, V. Gavrielatos, T. Olausson, R. Carr, HeteroGen: Automatic synthesis of heterogeneous cache coherence protocols, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2022, pp. 756–771.
[37] W. Li, N. Oswald, V. Nagarajan, D.J. Sorin, Determining the minimum number of virtual networks for different coherence protocols, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2024, pp. 182–197.
[38] Y. Wang, B. Li, A. Jaleel, J. Yang, X. Tang, GRIT: Enhancing multi-GPU performance with fine-grained dynamic page placement, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2024, pp. 1080–1094.
[39] M.K. Tavana, Y. Sun, N.B. Agostini, D. Kaeli, Exploiting adaptive data compression to improve performance and energy-efficiency of compute workloads in multi-GPU systems, in: Proceedings of IEEE International Parallel and Distributed Processing Symposium, 2019, pp. 664–674.
[40] H. Muthukrishnan, D. Nellans, D. Lustig, J.A. Fessler, T.F. Wenisch, Efficient multi-GPU shared memory via automatic optimization of fine-grained transfers, in: Proceedings of the ACM/IEEE International Symposium on Computer Architecture, 2021, pp. 139–152.
[41] B. Li, J. Yin, Y. Zhang, X. Tang, Improving address translation in multi-GPUs via sharing and spilling aware TLB design, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2021, pp. 1154–1168.
[42] B. Li, J. Yin, A. Holey, Y. Zhang, J. Yang, X. Tang, Trans-FW: Short circuiting page table walk in multi-GPU systems via remote forwarding, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2023, pp. 456–470.
[43] B. Li, Y. Guo, Y. Wang, A. Jaleel, J. Yang, X. Tang, IDYLL: Enhancing page translation in multi-GPUs via light weight PTE invalidations, in: Proceedings of IEEE/ACM International Symposium on Microarchitecture, 2023, pp. 1163–1177.
[44] E. Choukse, M.B. Sullivan, M. O'Connor, M. Erez, J. Pool, D. Nellans, Buddy compression: Enabling larger memory for deep learning and HPC workloads on GPUs, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2020, pp. 926–939.
[45] Y. Tan, Z. Bai, D. Liu, Z. Zeng, Y. Gan, A. Ren, X. Chen, K. Zhong, BGS: Accelerate GNN training on multiple GPUs, J. Syst. Archit. 153 (2024) 103162.
[46] X. Ren, M. Lis, CHOPIN: Scalable graphics rendering in multi-GPU systems via parallel image composition, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2021, pp. 709–722.
[47] S. Na, J. Kim, S. Lee, J. Huh, Supporting secure multi-GPU computing with dynamic and batched metadata management, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2024, pp. 204–217.
[48] Y. Feng, S. Na, H. Kim, H. Jeon, Barre chord: Efficient virtual memory translation for multi-chip-module GPUs, in: Proceedings of ACM/IEEE International Symposium on Computer Architecture, 2024, pp. 834–847.
[49] O. Villa, D. Lustig, Z. Yan, E. Bolotin, Y. Fu, N. Chatterjee, Need for speed: Experiences building a trustworthy system-level GPU simulator, in: Proceedings of IEEE International Symposium on High Performance Computer Architecture, 2021, pp. 868–880.
[50] J. Prades, C. Reaño, F. Silla, NGS: A network GPGPU system for orchestrating remote and virtual accelerators, J. Syst. Archit. 151 (2024) 103138.
Gun Ko received the B.S. degree in electrical engineering from Pennsylvania State University in 2017. He is currently pursuing the Ph.D. degree with the Embedded Systems and Computer Architecture Laboratory, School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea. His current research interests include GPU memory systems, multi-GPU systems, and virtual memory.

Jiwon Lee received the B.S. and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2018 and 2024, respectively. He currently works in the memory division at Samsung Electronics. His research interests include virtual memory, GPU memory systems, and storage systems.

Hongju Kal received the B.S. degree from Seoul National University of Science and Technology and the Ph.D. degree from Yonsei University in the School of Electrical and Electronic Engineering, Seoul, South Korea, in 2018 and 2024, respectively. He currently works in the memory division at Samsung Electronics. His current research interests include memory architectures, memory hierarchies, near-memory processing, and neural network accelerators.

Hyunwuk Lee received his B.S. and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2018 and 2024, respectively. He currently works in the memory division at Samsung Electronics. His research interests include neural network accelerators and GPU systems.

Won Woo Ro received the B.S. degree in electrical engineering from Yonsei University, Seoul, South Korea, in 1996, and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California, in 1999 and 2004, respectively. He worked as a Research Scientist with the Electrical Engineering and Computer Science Department, University of California, Irvine. He currently works as a Professor with the School of Electrical and Electronic Engineering, Yonsei University. Prior to joining Yonsei University, he worked as an Assistant Professor with the Department of Electrical and Computer Engineering, California State University, Northridge. His industry experience includes a college internship with Apple Computer, Inc., and a contract software engineer with ARM, Inc. His current research interests include high-performance microprocessor design, GPU microarchitectures, neural network accelerators, and memory hierarchy design.
@@ -0,0 +1,901 @@
Journal of Systems Architecture 160 (2025) 103349
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
Real-time scheduling for multi-object tracking tasks in regions with different
criticalities
Donghwa Kang a, Jinkyu Lee b,∗, Hyeongboo Baek c,∗

a Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
b Sungkyunkwan University (SKKU), Suwon, South Korea
c University of Seoul (UOS), Seoul, South Korea
ARTICLE INFO

Keywords:
Multi-object tracking
Real-time scheduling
Timing guarantee
Criticality-awareness
Autonomous driving

ABSTRACT

Autonomous vehicles (AVs) utilize sensors such as LiDAR and cameras to iteratively perform sensing, decision-making, and actions. Multi-object tracking (MOT) systems are employed in the sensing stage of AVs, using these sensors to detect and track objects like pedestrians and vehicles, thereby enhancing situational awareness. These systems must handle regions of varying criticality and dynamically shifting locations, all within limited computing resources. Previous DNN-based MOT approaches primarily focused on tracking accuracy, but timing guarantees are becoming increasingly vital for autonomous driving. Although recent studies have introduced MOT scheduling frameworks with timing guarantees, they are either restricted to single-camera systems or fail to prioritize safety-critical regions in the input images. We propose CA-MOT, a Criticality-Aware MOT execution and scheduling framework for multiple cameras. CA-MOT provides a control knob that balances tracking accuracy in safety-critical regions and timing guarantees. By effectively utilizing this control knob, CA-MOT achieves both high accuracy and timing guarantees. We evaluated CA-MOT's performance using a GPU-enabled embedded board commonly employed in AVs, with data from real-world autonomous driving scenarios.
1. Introduction

Autonomous vehicles (AVs) are systems that iteratively perform sensing, decision-making, and actions using various sensors such as LiDAR, radar, inertial measurement units (IMU), and cameras [1]. Multi-object tracking (MOT) systems, used in the perception stage of AVs, track objects like pedestrians and cars, enhancing situational awareness. Since MOT information is periodically transferred to control tasks, timely execution must be guaranteed to ensure safety and prevent severe accidents [2–4]. Low accuracy, despite timely execution, may result in missed objects, thus compromising AVs' safety [2,4,5]. Therefore, AV MOT systems should ensure timing guarantees with maximized accuracy.

Tracking-by-detection [6,7] is widely used due to its high accuracy and ability to leverage state-of-the-art DNN-based detection models (e.g., YOLO series [8], Faster R-CNN [9]). For each input image from each camera, tracking-by-detection performs two tasks: detection and association. Detection uses DNN-based models to sense the motion information of objects, such as location and velocity, while association matches objects between frames based on extracted feature information (also called feature vectors or feature maps obtained through pooling and convolutional layers) using CNN (convolutional neural network)-based models (e.g., OS-Net [10]). For unmatched objects, location-based methods like intersection over union (IoU) are applied.

MOT input images exhibit two key characteristics: (i) regions with varying levels of criticality and (ii) dynamically shifting locations. With limited computing resources in AVs, it is crucial to deliver different levels of service quality based on criticality. Safety-critical regions, where objects with a short time-to-collision (e.g., under 2 s) cluster, must be prioritized. If multiple clusters exist, the broader area encompassing them is considered the safety-critical region, as defined in DNN-SAM [5]. Established methods compute time-to-collision using LiDAR and IMU data; we follow the approach from DNN-SAM. This leads to two requirements for criticality-aware MOT systems: (R1) accuracy maximization for safety-critical regions and (R2) timing guarantees.

Most existing DNN-based MOT approaches focus on accuracy [7,11,12], but timing guarantees are increasingly critical in autonomous driving. Recent research has proposed MOT resource scheduling frameworks that guarantee timing for every MOT execution [2,4]. However, [2] overlooks safety-criticality, while [4] focuses on a single task. We address safety-criticality across multiple tasks, raising the following
∗ Corresponding authors.
E-mail addresses: anima0729@kaist.ac.kr (D. Kang), jinkyu.lee@skku.edu (J. Lee), hbbaek359@gmail.com (H. Baek).
https://doi.org/10.1016/j.sysarc.2025.103349
Received 22 September 2024; Received in revised form 13 December 2024; Accepted 20 January 2025
Available online 28 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
challenges:

C1. How to balance R1 and R2 to efficiently use limited computing resources.

C2. How to achieve both R1 and R2 by effectively using the control knob developed from C1.

In this paper, we propose CA-MOT, a Criticality-Aware MOT execution and scheduling framework for multiple MOT tasks. To address C1, CA-MOT offers three execution options (low, middle, and high workloads) to balance R1 and R2 for both detection and association. To address C2, CA-MOT introduces the notion of aging for detection and association sub-tasks, estimating the reliability of motion and feature information over time. Balancing the aging of these tasks is essential to achieve R1 and R2 with limited resources (to be discussed in Section 3.4). Based on this, CA-MOT develops two scheduling algorithms: EDF-BE and EDF-Slack. EDF-BE increases the workload of tasks waiting in the ready queue for execution (referred to as active tasks) without compromising the R2 bound when no other tasks are pending. In contrast, EDF-Slack is designed to handle scenarios with multiple active tasks.

Fig. 1. Tracking accuracy and execution time on different execution options of detection and association.
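The EDF-based dispatching with a workload control knob described above can be sketched as follows. This is a minimal illustration under our own assumptions, not CA-MOT's implementation: the WCET numbers, the option-selection rule, and all names (`WCET`, `pick_option`, `edf_schedule`) are hypothetical, and only the EDF ordering plus the "raise the execution option when no other task is pending" idea attributed to EDF-BE is modeled (EDF-Slack's slack reclamation for multiple active tasks is not).

```python
import heapq

# Hypothetical worst-case execution times for the three execution
# options (low, middle, high workloads) of one MOT job, in milliseconds.
WCET = {"low": 10, "mid": 16, "high": 24}

def pick_option(now, deadline, pending_others):
    """Choose the heaviest option that still finishes by the deadline.

    When other tasks are pending in the ready queue, stay conservative
    and run the lightest option (EDF-BE only boosts an isolated task).
    """
    if pending_others:
        return "low"
    for opt in ("high", "mid", "low"):
        if now + WCET[opt] <= deadline:
            return opt
    return "low"  # fallback: run the cheapest option even if late

def edf_schedule(jobs):
    """jobs: list of (release, deadline, name) tuples.

    Returns the executed sequence as (name, option, finish_time),
    dispatching by earliest absolute deadline (EDF).
    """
    jobs = sorted(jobs)          # by release time
    ready, done, now, i = [], [], 0, 0
    while i < len(jobs) or ready:
        # admit all jobs released by the current time
        while i < len(jobs) and jobs[i][0] <= now:
            release, deadline, name = jobs[i]
            heapq.heappush(ready, (deadline, name))  # EDF priority
            i += 1
        if not ready:            # idle until the next release
            now = jobs[i][0]
            continue
        deadline, name = heapq.heappop(ready)
        opt = pick_option(now, deadline, pending_others=bool(ready))
        now += WCET[opt]
        done.append((name, opt, now))
    return done
```

For example, with two camera jobs released together, the one with the earlier deadline runs first at the low option (another task is pending), and the remaining job is boosted as far as its deadline allows.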
To validate CA-MOT's performance in meeting R1 and R2, we conducted extensive experiments on an NVIDIA Jetson Xavier using the KITTI dataset [13]. Additionally, we applied three detectors in our experiments: YOLOv5 [14], YOLOX [8], and Faster-RCNN [9].

The contributions of this paper are as follows:

• We motivate the importance of balancing between the aging of detection and association to achieve R1 and R2 (Section 2).
• We propose a new system design, CA-MOT, that addresses R1 and R2 considering varying levels of criticality in different regions for multiple MOT tasks (Section 3).
• We develop new scheduling algorithms to effectively achieve R1 and R2 by balancing between the aging of detection and association for each MOT task (Section 4).
• We demonstrate the effectiveness of CA-MOT in achieving R1 and R2 using a real-world self-driving dataset (Section 5).

2. Motivation

This section presents target systems and motivates the system design of CA-MOT to address C1 and C2 based on measurement-based observations.

2.1. Target system

CA-MOT targets 2D MOT systems on AVs equipped with multiple camera sensors. Each MOT task performs MOT execution on consecutive input frames received from the corresponding camera sensor at a predetermined period. As this recurring task is required to complete a job within a specified deadline, each MOT task is considered a real-time task with a period and deadline. CA-MOT employs tracking-by-detection comprising two steps of MOT execution: detection and association. The front-end detector performs detection by exploiting an existing stand-alone DNN-based detector to identify the position and class of objects in the input image. Using the locations of detected objects, the feature extractor (e.g., the deployed CNN model such as

… a system that tracks specific objects moving between the fields of view of multiple cameras (called hand-over), this is beyond the scope of our work. This paper focuses on dividing the multi-object tracking task into two subtasks (i.e., detection and association) and using DNN-based MOT-specific properties (i.e., reuse of motion and feature information) to achieve R1 and R2 under limited resources.

2.2. Trade-off between accuracy and execution time

To address C1, we consider two factors: (i) the input image size and detection within the safety-critical region, and (ii) the number of objects used for feature extraction during association across all detected objects in each frame.

Fig. 1(a) compares the multi-object tracking accuracy (MOTA) [15] for the overall and safety-critical regions (referred to as overall and critical accuracy) and the execution time for a single MOT task using three input image sizes (256 × 256, 416 × 416, 672 × 672). Overall accuracy considers all objects, while critical accuracy focuses on the safety-critical region. YOLOv5 [14] is used for detection, and features are extracted for all detected objects. The KITTI dataset [13] is used.

For image sizes 256 × 256 and 416 × 416, detection is performed on a cropped region of interest (RoI) that includes the safety-critical region. If the RoI is smaller, it is resized to include the critical region; otherwise, the critical region is cropped accordingly. The safety-critical region will be defined in Section 3. For the 672 × 672 size (i.e., the original input size), detection occurs without cropping.

As shown in Fig. 1(a), reducing the image size leads to a notable decrease in overall accuracy, while critical accuracy decreases less significantly due to the prioritization of the safety-critical region in the RoI. Additionally, execution time decreases as the image size is reduced, demonstrating a trade-off between R1 and R2 when focusing on the critical region.
OSNet) extracts features (i.e., feature vectors or feature maps) for each Fig. 1(b) shows the impact of varying the number of objects used
object. These features capture the visual characteristics of each object. for feature extraction on accuracy and execution time, with the image
The back-end tracker compares the feature similarities between objects size fixed at 672 × 672. The number of objects ranges from zero, three,
in the current frame and the previous frame, matching objects with and more than three. OS-Net [10] is used for feature extraction. As
high similarity. For any remaining unmatched objects, a location-based shown in Fig. 1(b), as the number of objects with feature extraction
matching method such as IoU is applied. The tracker then stores the increases, both overall and critical accuracy improve, but this also leads
motion information (position and velocity) and the features of each to increased execution time. This highlights a trade-off between R1 and
object in preparation for the next frame. R2 based on the number of objects considered for feature extraction.
We assume a system in which each camera independently tracks Section 3.3 details the MOT execution pipeline of CA-MOT, which
objects moving within its field of view. While it is possible to consider leverages these observations to effectively address C1.
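The two knobs measured in this section can be sketched in code. The following is an illustrative sketch, not the authors' implementation: the function names, the NumPy-based cropping, and the ranking heuristic that favors critical-region objects are all assumptions; in the real system the crop is fed to a DNN detector and the selected objects to a CNN feature extractor.

```python
# Sketch of the two accuracy/latency knobs from Section 2.2:
# (i) detection input size via RoI cropping, and
# (ii) a budget on how many objects get feature extraction.
import numpy as np

def crop_to_roi(frame, critical_box, roi_size):
    """Cut an roi_size x roi_size window centered on the safety-critical
    box, clamped to the frame borders (resizing of oversized critical
    boxes is omitted in this sketch)."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = critical_box
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    half = roi_size // 2
    # Clamp the window so it stays fully inside the frame.
    left = min(max(cx - half, 0), max(w - roi_size, 0))
    top = min(max(cy - half, 0), max(h - roi_size, 0))
    return frame[top:top + roi_size, left:left + roi_size], (left, top)

def select_for_features(detections, budget):
    """Grant feature extraction to at most `budget` detections,
    critical-region objects first; the rest fall back to
    location-based (IoU) matching only."""
    ranked = sorted(detections,
                    key=lambda d: (not d["critical"], -d["score"]))
    return ranked[:budget]
```

Shrinking `roi_size` or `budget` lowers latency at the cost of accuracy outside the critical region, which is exactly the trade-off Fig. 1 quantifies.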
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
Fig. 2. System design of CA-MOT: the key features are (a) an aging-aware scheduler that provides timing guarantees and a criticality-aware flexible MOT execution pipeline
including (b) a detection module that accommodates varying input sizes and (c) an association module that handles a varying number of objects for feature extraction.
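As a rough illustration of the structure in Fig. 2 (a scheduler thread handing one image at a time, together with its execution options, to an MOT pipeline thread over shared memory), a minimal sketch follows; the queue-based hand-off, the fixed option values, and the stubbed detector/extractor are assumptions for illustration, not the paper's implementation.

```python
# Two-thread sketch of Fig. 2: scheduler picks options, pipeline executes.
import queue
import threading

frames = queue.Queue()            # images arriving from camera sensors
handoff = queue.Queue(maxsize=1)  # scheduler -> pipeline hand-off
results = []                      # tracked output per frame

def scheduler():
    """Pick the next task and fix its execution options (constants here;
    the real scheduler chooses them per Algorithms 1-2)."""
    while True:
        frame = frames.get()
        if frame is None:          # shutdown signal
            handoff.put(None)
            return
        handoff.put({"frame": frame, "det_size": 256, "feat_budget": 3})

def pipeline():
    """Crop, detect, selectively extract features, associate (all stubbed)."""
    while True:
        job = handoff.get()
        if job is None:
            return
        detections = [f"{job['frame']}@{job['det_size']}"]  # detector stub
        features = detections[: job["feat_budget"]]         # extractor stub
        results.append((detections, features))              # association stub
```

The `maxsize=1` hand-off mimics the one-job-at-a-time, non-preemptive execution described later in Section 4.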
2.3. Different combination of detection and association

To address C2, Fig. 1(c) reveals an intriguing observation: different combinations of image sizes and numbers of feature extractions yield distinct effects on accuracy and execution time. The experiment was conducted over 100 consecutive frames.

In Fig. 1(c), the execution of detection or association is denoted by 𝑃 or 𝐹. 𝑃 represents partial computation, where detection is performed only on the region of interest (RoI) at a size of 256 × 256, including a safety-critical region, and association is limited to location-based association without feature extraction. 𝐹 represents full computation, where detection is performed on the entire image at a size of 672 × 672, and association includes feature extraction for all objects. The number in the upper right of the notation indicates how many times the combination of detection and association has been performed. For example, the notation 𝐹𝐹^50 𝑃𝑃^50 indicates that we use 𝐹 for both the detection and association steps in the first 50 frames and 𝑃 for both phases in the remaining 50 frames. To mitigate the issue of objects outside the critical region not being detected due to cropping, which can decrease the accuracy of the non-critical region, we utilize the position information of objects from the previous frame as predicted position information for the current frame using a prediction model such as the Kalman filter [16] during the execution of 𝑃 in the detection step. Except for 𝑃𝑃^100 and 𝐹𝐹^100, all combinations have the same proportion of 𝐹 and 𝑃 over the entire frames.

As shown in Fig. 1(c), although 𝐹𝐹^50 𝑃𝑃^50 and 𝑃𝐹^100 have execution times similar to (𝑃𝐹 + 𝐹𝑃)^100 and (𝐹𝐹 + 𝑃𝑃)^100, they show lower tracking accuracy. This indicates that different combinations of 𝐹 and 𝑃 can have a varying impact on accuracy. The observation in Fig. 1(c) necessitates a new scheduler that is capable of obtaining high tracking accuracy by capturing an MOT-specific property referred to as aging, which will be detailed in Section 3.4.

3. System design of CA-MOT

This section presents the goal and design of CA-MOT to address C1 and C2.

3.1. System overview

CA-MOT utilizes a tracking-by-detection approach, consisting of two steps: detection and association, where the front-end detector employs a pre-existing DNN-based detector to detect and classify objects in the input image, and the unmatched objects are matched using a location-based method like IoU. CA-MOT aims at providing prioritized tracking accuracy for the safety-critical region with a timing guarantee for every MOT execution on limited computing resources by addressing C1 and C2 discussed in Section 1, which leads to the following design goals:

• It provides different execution options not only for detection but also for association, considering the different criticality of regions in input images.
• It provides a timing guarantee while providing prioritized tracking accuracy for the safety-critical regions by exploiting an MOT-specific property.

To address the first goal, CA-MOT implements a criticality-aware flexible MOT execution pipeline in which detection and association are performed with different execution options by leveraging the observations discussed in Section 2.2. To address the second goal, CA-MOT develops an aging-aware task-level scheduler that maximizes accuracy while providing a timing guarantee by exploiting the observations discussed in Section 2.3, building upon the MOT execution pipeline. The MOT execution pipeline and the scheduler are implemented as separate threads that communicate through shared memory.

CA-MOT does not require modifications to existing DNN models (e.g., detectors and feature extractors), which allows for reusing most (if not all) stand-alone detectors and feature extractors. Notably, state-of-the-art detectors like YOLOv5 are inherently designed to handle varying input image sizes, and all CNN models can perform batch execution on multiple images (each corresponding to a different object). As shown in Fig. 2, the key features of CA-MOT are: (a) a scheduling policy that selects one input per camera, provides timing guarantees, and adjusts the workload for detection and association; (b) a module that processes detection with inputs of varying sizes; and (c) a module that extracts features from a pre-determined number of detected objects.

3.2. Workflow

Fig. 2 presents the workflow of CA-MOT. During system operation, the task scheduler maintains a queue to store images periodically received from each camera sensor (①). Then, the task scheduler determines the following for tasks in the queue: (a) the task to be scheduled, (b) the execution option for the detector, and (c) the execution option for the association (②). After an image moves to the MOT execution pipeline, the critical region identification module identifies a safety-critical region from the image and crops (or not) a RoI including the safety-critical region according to the execution option for the detector (③). Depending on the execution option, the cropped RoI or entire image is processed for detection (④). Furthermore, depending on the number of objects for which features are extracted, CA-MOT selectively extracts features for detected objects (⑤). All detected objects are then matched with the tracked objects from the previous frame. If both the detected and tracked objects have feature vectors, they are associated through feature-based matching. Otherwise, they are associated solely based on their locations (⑥).

3.3. Criticality-aware flexible MOT execution pipeline

The MOT execution pipeline conducts detection and association sequentially. CA-MOT can employ any existing stand-alone DNN-based detector as long as it can accommodate different sizes of input images (e.g., the YOLO series) and offer a clear trade-off between accuracy and execution time. For each input image with a size of 672 × 672, the
detector performs the detection to identify the location and class of multiple objects in the image. Once the scheduler determines the task (associated with an input image) to be scheduled and the execution option for the task, the detection is performed for the task according to the execution option. CA-MOT provides three execution options (i.e., low, middle, and high workloads, respectively) providing a trade-off between execution time and accuracy. For low- and middle-workload detections, CA-MOT first identifies the RoI with sizes of 256 × 256 and 416 × 416, respectively, and then detection is performed on the cropped RoI, which includes the safety-critical region. The area outside the RoI is not subject to detection, and the motion information (e.g., size, position, velocity, direction) of objects detected in the previous frame is used in prediction models such as the Kalman filter to obtain the estimated information of objects in the current frame. On the other hand, high-workload detection is performed on the original image with a size of 672 × 672.

We define the area that encompasses all safety-critical objects, which are objects with a time-to-collision of less than two seconds, as the safety-critical area. If the safety-critical area exceeds the input size for the detector, as determined by the detection process (e.g., 256 × 256 or 416 × 416), the safety-critical area is cropped and resized to the corresponding dimensions before being fed into the detector model. The locations of safety-critical objects are determined based on their most recently computed positions, without projecting future safety-critical regions from them. There are numerous existing approaches that calculate time-to-collision based on the relative positions of objects and the ego vehicle given LiDAR and IMU data, and we assume the use of one such method. It is also important to note that the KITTI dataset provides both LiDAR and IMU data. For example, areas where objects with a time-to-collision of less than 2 s congregate can be defined as safety-critical regions, and if multiple such areas exist, the encompassing area that includes all of them would be considered the safety-critical region. Please note that we adhere to the definition of the safety-critical region as defined in the existing paper DNN-SAM [5]. It is assumed that the critical region is pre-calculated by external sensors such as LiDAR and IMU and provided to CA-MOT. If an input image does not have a critical region, the entire frame is considered a critical region. As seen in Fig. 2, the GPU is used only for the inference of DNN models, such as the detector (e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other execution is performed on the CPU.

For the association, the MOT system uses the two-step approach [7]. Initially, a CNN-based model (e.g., OS-Net [10]) is employed by the tracker to extract features from the detected objects. The tracker then compares these features between the current and previous frames to identify object pairs with the highest feature similarity. For the remaining objects that are not matched based on feature comparison, a location-based matching method such as IoU (intersection over union) is used. CA-MOT also provides three execution options (i.e., low, middle, and high workloads, respectively) for the association. When it comes to the middle- and high-workload associations, the tracker extracts features from some (e.g., three) of the detected objects or all of the detected objects, and then performs consecutive feature-based and location-based matchings. On the other hand, low-workload association performs location-based matching only. Depending on the execution option, CA-MOT may extract features from only a subset or all of the detected objects, which means that the feature information of objects may not be updated every time. Therefore, during feature-based matching, the algorithm compares the features extracted from the objects in the current frame with the closest previously extracted features of the tracked objects and matches the two objects with the highest feature similarity.

3.4. Aging-aware task scheduler

CA-MOT implements a thread-level task scheduler to determine the task to be scheduled and the execution options for detection and association for each task at every scheduling decision. The scheduler manages MOT tasks using a single queue and is triggered when a task completes its execution or a new task is released. As three execution options (e.g., low, middle, and high workloads) are provided for each of detection and association under CA-MOT, the scheduler decides the image size (e.g., 256 × 256, 416 × 416, and 672 × 672) for detection and the feature size (e.g., zero, three, and more than three) for association according to the scheduling algorithms (to be presented in Section 4).

As discussed in Section 2.3, various combinations of image sizes and the quantity of feature extractions result in different impacts on tracking accuracy. This is due to an important property of the MOT system, which involves supplementing non-updated motion or feature information during detection and association in the current frame by utilizing information from the previous frame. For example, in scenarios with low- and middle-workload detection, the detection process does not cover the area outside the RoI. Instead, the motion information of objects detected in the previous frame, such as their size, position, velocity, and direction, is leveraged to estimate the corresponding information for objects in the current frame. Moreover, during the association step, if the feature extracted from the immediately preceding frame is unavailable due to low- and middle-workload associations, the feature-based matching algorithm compares the features extracted from objects in the current frame with the features extracted from the nearest past frames. Therefore, the tracking accuracy is determined by the reliability of the reused motion and feature information of the objects. To capture this reliability, we propose a new notion of aging that specifies the number of middle- or high-workload executions of detection and association conducted from the beginning of the MOT task, respectively. In order to update the motion and feature information as frequently as possible using limited computing resources, it is necessary to balance the aging of detection and association for each task. Note that increasing the aging of detection and association for all tasks simultaneously in every MOT execution is generally not feasible due to limited computing resources. Therefore, a mechanism is required to balance the aging of detection and association for all tasks while providing timing guarantees under constrained resources. To this end, we propose new scheduling algorithms that will be detailed in the next section.

4. Scheduling algorithm

This section presents a task model and proposes new scheduling algorithms building upon CA-MOT.

4.1. Task model

Targeting MOT systems in AVs that involve 𝑛 camera sensors, we consider a set 𝜏 consisting of 𝑛 MOT tasks denoted as 𝜏𝑖 ∈ 𝜏. Each MOT task 𝜏𝑖 is responsible for conducting MOT execution using input images provided periodically by each camera sensor. As we employ the methodology of tracking-by-detection, an MOT task consists of detection and association sub-tasks. Thus, the specification of each MOT task 𝜏𝑖 is given as 𝜏𝑖 = (𝑇𝑖, 𝐶𝑖(𝑠𝑖, 𝑓𝑖), 𝐷𝑖), where 𝑇𝑖 represents the period (or the minimum inter-arrival time), 𝐶𝑖(𝑠𝑖, 𝑓𝑖) denotes the worst-case execution time (WCET) based on the execution options (i.e., low-, middle-, and high-workload execution) for the detection and association sub-tasks, respectively, and 𝐷𝑖 indicates the relative deadline. The execution time of the detection sub-task depends on the image size 𝑠𝑖 ∈ 𝑆𝑖 = {𝐿, 𝑀, 𝐻}, where 𝐿, 𝑀, 𝐻 are 256 × 256, 416 × 416, and 672 × 672, respectively. Note that CA-MOT supports arbitrary non-decreasing sizes for 𝑆𝑖 = {𝐿, 𝑀, 𝐻}. On the other hand, the execution time of the association sub-task depends on the feature size 𝑓𝑖 ∈ 𝐹𝑖 = {𝐿, 𝑀, 𝐻}, where 𝐿, 𝑀, 𝐻 are zero, from one to three, and more than three, respectively. Note that the tracking-by-detection methodology performs the association phase sequentially through feature-based matching followed by location-based matching using IoU. If 𝑓𝑖 is equal
Table 1
Notations used in the scheduling algorithms.

Symbol : Description
𝜏𝑖 : Task 𝑖 in the system
𝑇𝑖 : Period of task 𝜏𝑖 (minimum inter-arrival time)
𝐷𝑖 : Relative deadline of task 𝜏𝑖
𝐶𝑖(𝑋, 𝑌) : Worst-case execution time (WCET) of task 𝑖. 𝑋: image size for detection (𝐿, 𝑀, 𝐻); 𝑌: feature size for association (𝐿, 𝑀, 𝐻)
𝑠𝑖 : Image size for the detection sub-task of task 𝑖
𝑓𝑖 : Feature size for the association sub-task of task 𝑖
𝑆𝑖 : Set of image size options for task 𝑖 (𝑆𝑖 = {𝐿, 𝑀, 𝐻})
𝐹𝑖 : Set of feature size options for task 𝑖 (𝐹𝑖 = {𝐿, 𝑀, 𝐻})
𝐿 : Low workload execution
𝑀 : Middle workload execution
𝐻 : High workload execution
𝐶𝑖𝐷(𝑠𝑖) : WCET of the detection sub-task of task 𝑖, based on image size 𝑠𝑖
𝐶𝑖𝐴(𝑓𝑖) : WCET of the association sub-task of task 𝑖, based on feature size 𝑓𝑖
𝑅𝐶𝑖(𝐿, 𝐿) : Remaining execution time for the minimum execution of task 𝑖
𝑎𝑔𝑒𝑖𝐷 : Aging value of the detection sub-task of task 𝑖
𝑎𝑔𝑒𝑖𝐴 : Aging value of the association sub-task of task 𝑖
𝑠𝑙𝑎𝑐𝑘𝑖^𝑡𝑐𝑢𝑟 : Slack time available for task 𝑖 at the current time 𝑡𝑐𝑢𝑟
𝑞𝑖 : Minimum execution time of task 𝑖 in the interval [𝑡𝑐𝑢𝑟, 𝑑1(𝑡𝑐𝑢𝑟)]
𝑝 : Sum of the minimum execution times for all tasks
𝑑1(𝑡𝑐𝑢𝑟) : Earliest deadline or future release time at time instant 𝑡𝑐𝑢𝑟
𝑠𝑙𝑎𝑐𝑘𝑖𝐷 : Remaining slack after executing high-workload detection for task 𝑖
𝑠𝑙𝑎𝑐𝑘𝑖𝐴 : Remaining slack after executing high-workload association for task 𝑖

to 𝐿, this indicates that no feature extraction has been performed for the frame, and thus feature-based matching is skipped, proceeding directly to location-based matching. In the case of 𝐻, we employ the maximum number of objects as defined by the environment (for the dataset considered, this is based on values measured across all videos), for example, 10. Then, the worst-case execution time 𝐶𝑖(𝑠𝑖, 𝑓𝑖) of each MOT task 𝜏𝑖 is derived as follows.

𝐶𝑖(𝑠𝑖, 𝑓𝑖) = 𝐶𝑖𝐷(𝑠𝑖) + 𝐶𝑖𝐴(𝑓𝑖), (1)

where 𝐶𝑖𝐷(𝑠𝑖) and 𝐶𝑖𝐴(𝑓𝑖) are the worst-case execution times of the detection and association sub-tasks according to 𝑠𝑖 and 𝑓𝑖, respectively. As shown in Fig. 2, both the detection and the association sub-tasks involve GPU operations, with their respective WCETs including the communication costs between the CPU and GPU. Note that the detection sub-task, denoted as 𝜏𝑖𝐷, and the association sub-task, denoted as 𝜏𝑖𝐴, are executed consecutively without any preemption while sharing the same period and relative deadline. Similarly, when an active task is running, it executes without any interruptions, while other tasks wait in the queue. In addition, each task runs in an environment where non-preemption between the GPU and CPU is guaranteed. To ensure this, while the CPU is running, the GPU waits for input from the CPU. Once the GPU receives the input and is activated, the CPU waits until it receives the results from the GPU, as illustrated in Fig. 2. As seen in Fig. 2, the GPU is used only for the inference of DNN models, such as the detector (e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other execution is performed on the CPU. Also, CA-MOT does not allow parallel execution of multiple MOT executions (see Table 1).

The worst-case execution time 𝐶𝑖𝐷(𝑠𝑖) of the detection sub-task is determined by the sum of various components, including preprocessing time (such as cropping and resizing the input image), image transfer time from CPU memory to GPU memory, model inference time to obtain candidate objects, and postprocessing time (e.g., applying non-maximum suppression) to extract the final objects from the candidates. On the other hand, the worst-case execution time 𝐶𝑖𝐴(𝑓𝑖) of the association sub-task depends on the feature size 𝑓𝑖. It is calculated by considering the time required for extracting features from detected objects and performing matching methods such as feature-based and IoU-based matching. An MOT task 𝜏𝑖 is considered schedulable if every job 𝐽𝑖 (invoked by 𝜏𝑖) completes its execution within the relative deadline 𝐷𝑖. The overall schedulability of the system is determined by ensuring that every task 𝜏𝑖 ∈ 𝜏 is schedulable.

4.2. EDF best-effort

Building upon the system design of CA-MOT presented in Section 3, we develop two scheduling algorithms that aim to provide not only high tracking accuracy for the safety-critical regions but also a timing guarantee for every MOT execution. To this end, the proposed scheduling algorithms have the following two features: (F1) an offline timing guarantee for the minimum execution (i.e., low-workload execution for both detection and association) of every MOT execution and (F2) an online policy to maximize tracking accuracy by systematically increasing the workload (i.e., middle- or high-workload execution) of an MOT execution using notions of slack and aging without compromising the timing guarantee.

The proposed scheduling algorithms are based on the non-preemptive earliest deadline first (EDF) scheduling algorithm, which assigns higher priority to jobs with earlier deadlines without allowing any preemption. To provide the first feature F1, CA-MOT employs the existing schedulability analysis developed for non-preemptive EDF as follows.

Lemma 1. For a set 𝜏 of MOT tasks scheduled by non-preemptive EDF, the minimum execution 𝐶𝑖(𝐿, 𝐿) of every task 𝜏𝑖 ∈ 𝜏 can be executed without deadline miss as long as the following holds for every task 𝜏𝑖 ∈ 𝜏.

max𝜏𝑖∈𝜏 𝐶𝑖(𝐿, 𝐿) / min𝜏𝑖∈𝜏 𝑇𝑖 + ∑𝜏𝑖∈𝜏 𝐶𝑖(𝐿, 𝐿)/𝑇𝑖 ≤ 1.0 (2)

Proof. The lemma presents a schedulability condition for non-preemptive EDF, and its proof is outlined as follows. Let us target 𝜏𝑘 ∈ 𝜏; also, consider a virtual task 𝜏𝑥 ∉ 𝜏, whose 𝑇𝑥 and 𝐶𝑥(𝐿, 𝐿) are set to min𝜏𝑖∈𝜏 𝑇𝑖 and max𝜏𝑖∈𝜏 𝐶𝑖(𝐿, 𝐿), respectively. Now, we compare the finishing time of a job of 𝜏𝑘 when (Case 1) 𝜏 is scheduled by non-preemptive EDF, and (Case 2) 𝜏 ∪ {𝜏𝑥} is scheduled by preemptive EDF. Since at most one lower-priority job can block a high-priority job under non-preemptive scheduling, 𝜏𝑘 can be blocked by at most one lower-priority job under Case 1; obviously, the WCET of the lower-priority job is upper-bounded by max𝜏𝑖∈𝜏 𝐶𝑖(𝐿, 𝐿). Also, to block all the following jobs of 𝜏𝑘, the blocking frequency should be no smaller than 𝑇𝑘, which is lower-bounded by min𝜏𝑖∈𝜏 𝑇𝑖. Therefore, the finishing time of a job of 𝜏𝑖 under Case 1 is no later than that under Case 2. Once we apply the well-known schedulability condition for preemptive EDF to Case 2, the condition is the same as Eq. (2), which proves the lemma. □

Note that the proof is self-contained, but a different proof for Lemma 1 can be found in [5,17].

To provide the second feature F2, the proposed scheduling algorithms (i) dynamically increase the workload of each MOT task (e.g., from low workload to middle or high workload) without compromising the timing guarantee while (ii) balancing the aging of detection and association of every task. We propose two scheduling algorithms that simultaneously provide (i) and (ii) in different ways: EDF-BE (EDF Best-Effort) and EDF-Slack (EDF with Slack reclamation), adapted from [5]. EDF-BE and EDF-Slack utilize slacks defined differently, but use the same mechanism (in Algorithm 2) to decide on the execution option that employs a notion of aging.

Let 𝑑1(𝑡𝑐𝑢𝑟) be the earliest deadline or future release time among all tasks at a time instant 𝑡𝑐𝑢𝑟. The slack 𝑠𝑙𝑎𝑐𝑘𝑖^𝑡𝑐𝑢𝑟 of task 𝜏𝑖 at 𝑡𝑐𝑢𝑟 under EDF-BE is defined as the expected remaining time up to
Algorithm 1 Slack calculation for 𝜏𝑘 at 𝑡𝑐𝑢𝑟 under EDF-Slack
Input: 𝜏, 𝑡𝑐𝑢𝑟
Output: 𝑠𝑙𝑎𝑐𝑘𝑘^𝑡𝑐𝑢𝑟
1: 𝑝 = 0, 𝑈 = the left-hand side of Eq. (2)
2: for 𝑖 = 𝑛 to 1, 𝜏𝑖 ∈ {𝜏1, ..., 𝜏𝑛 | 𝑑1(𝑡𝑐𝑢𝑟) ≤ ⋯ ≤ 𝑑𝑛(𝑡𝑐𝑢𝑟)} do
3:   𝑈 = 𝑈 − 𝐶𝑖(𝐿, 𝐿)/𝑇𝑖
4:   𝑞𝑖 = max(0, 𝑅𝐶𝑖(𝐿, 𝐿) − (1 − 𝑈) ⋅ (𝑑𝑖(𝑡𝑐𝑢𝑟) − 𝑑1(𝑡𝑐𝑢𝑟)))
5:   𝑈 = min(1.0, 𝑈 + (𝑅𝐶𝑖(𝐿, 𝐿) − 𝑞𝑖)/(𝑑𝑖(𝑡𝑐𝑢𝑟) − 𝑑1(𝑡𝑐𝑢𝑟)))
6:   𝑝 = 𝑝 + 𝑞𝑖
7: end for
8: return 𝑠𝑙𝑎𝑐𝑘𝑘^𝑡𝑐𝑢𝑟 = 𝑑1(𝑡𝑐𝑢𝑟) − 𝑡𝑐𝑢𝑟 − 𝑝

Fig. 3. Execution timeline of multiple MOT tasks under (a) baseline (non-preemptive EDF), (b) EDF-BE, and (c) EDF-Slack scheduling policies.

𝑑1(𝑡𝑐𝑢𝑟) after the execution of 𝐶𝑖(𝐿, 𝐿) is completed, which is calculated by 𝑑1(𝑡𝑐𝑢𝑟) − 𝑡𝑐𝑢𝑟 − 𝐶𝑖(𝐿, 𝐿). This slack value is only valid when there are no more than two tasks in the waiting queue at time 𝑡𝑐𝑢𝑟 and no future releases within the interval [𝑡𝑐𝑢𝑟, 𝑑1(𝑡𝑐𝑢𝑟)). Using the slack value conditionally provided at a scheduling decision, EDF-BE can perform middle- or high-workload execution for detection and/or association.

Example. Figs. 3(a) and (b) present a scheduling scenario of the baseline algorithm (i.e., non-preemptive EDF) and EDF-BE with an example task set. We consider an example task set 𝜏 = {𝜏1, 𝜏2} for which 𝐶𝑖 = 𝐶𝑖𝐷(𝐻) + 𝐶𝑖𝐴(𝐻) = 25, 𝑇𝑖 = 25, 𝐶𝑖𝐷(𝑠𝑖) = {5, 9, 12}, and 𝐶𝑖𝐴(𝑓𝑖) = {3, 8, 13} hold for 𝜏𝑖 ∈ 𝜏. As shown in Figs. 3(a) and (b), the first jobs of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. In the baseline algorithm, the first job of 𝜏1 executes for 25 time units, and then the first job of 𝜏2 starts its execution at 𝑡 = 25, resulting in a deadline miss at 𝑡 = 38. Let 𝑎𝑔𝑒𝑖𝐷 and 𝑎𝑔𝑒𝑖𝐴 be the aging values of detection and association of 𝜏𝑖. The aging value is an integer satisfying 𝑎𝑔𝑒𝑖𝐷, 𝑎𝑔𝑒𝑖𝐴 ≥ 0, and 𝑎𝑔𝑒𝑖𝐷 and 𝑎𝑔𝑒𝑖𝐴 for all tasks 𝜏𝑖 ∈ 𝜏 are set to zero at the beginning of the system. Then, 𝑎𝑔𝑒𝑖𝐷 (and 𝑎𝑔𝑒𝑖𝐴) increases by one each time a detection (and association) is run with middle or high workload. In other words, the aging value refers to the number of executions excluding those with low workloads. By adjusting the aging value, a balance is maintained so that neither detection nor association becomes disproportionately large.

Compared to EDF-Slack, EDF-BE is a simpler algorithm that utilizes as many resources as possible, executing a job for more than 𝐶𝑖(𝐿, 𝐿) up to its closest future release only when there is exactly one job in the waiting queue. EDF-BE naturally guarantees no deadline misses in any job execution. This is because, as stated in Lemma 1, the execution of 𝐶𝑖(𝐿, 𝐿) without deadline misses for all jobs is guaranteed under EDF. Furthermore, when a job executes for more than 𝐶𝑖(𝐿, 𝐿) under EDF-BE, there is only one active job in the waiting queue at that time. In the case of EDF-BE, when the first job of task 𝜏1 starts its execution at 𝑡 = 0, it executes the minimum execution 𝐶1(𝐿, 𝐿) = 8 until the earliest deadline or future release at 𝑡 = 13, resulting in a slack of five. Utilizing this slack, the task 𝜏1 then executes 𝐶1(𝑀, 𝐿), and the aging factor 𝑎𝑔𝑒1𝐷 increases by one. For the first job of task 𝜏2, released at 𝑡 = 13, there is a slack of 4 until the earliest deadline or future release at 𝑡 = 25. Thus, 𝐶2(𝑀, 𝐿) is executed, and 𝑎𝑔𝑒2𝐷 increases by one. The second job of task 𝜏1, released at 𝑡 = 25, has a slack of 5 until the earliest deadline or future release at 𝑡 = 38. To balance 𝑎𝑔𝑒1𝐷 and 𝑎𝑔𝑒1𝐴, 𝐶1(𝐿, 𝑀) is executed, and 𝑎𝑔𝑒1𝐴 increases by one. The details of the online policy that effectively balances the aging of detection and association for each task will be provided at the end of this section. As can be observed from the figure, at each scheduling decision, an execution is performed that does not exceed the earliest deadline or future release, ensuring execution without deadline misses. The following theorem presents the timing guarantee of EDF-BE.

Theorem 1. A task set 𝜏 that satisfies the condition in Eq. (2) is schedulable by EDF-BE.

Proof. According to Lemma 1, for a task set 𝜏 that satisfies Eq. (2), the minimum execution time 𝐶𝑖(𝐿, 𝐿) of all tasks 𝜏𝑖 ∈ 𝜏 guarantees execution without deadline misses. At each scheduling decision at 𝑡 under the online policy of EDF-BE, the execution of a job exploiting any slack value does not impose additional interference on any other job. This guarantees that all tasks 𝜏𝑖 receive no more interference than what they would receive under non-preemptive EDF scheduling. Thus, this theorem holds. □

4.3. EDF with slack reclamation

In the case of EDF-BE, more workload than the minimum execution can only be processed when there is a single job in the waiting queue at a given time 𝑡𝑐𝑢𝑟 and no additional releases occur until 𝑑1(𝑡𝑐𝑢𝑟). This creates a limited opportunity for MOT tasks in CA-MOT to perform more workload than the minimum execution, thus restricting the potential to improve tracking accuracy. To address this limitation, we integrate the approach presented in [5] into EDF-Slack, allowing it to compute slack in a different way than EDF-BE.

Let 𝑑𝑖(𝑡𝑐𝑢𝑟) denote the 𝑖th earliest deadline or release time at 𝑡𝑐𝑢𝑟, and 𝑅𝐶𝑖(𝐿, 𝐿) represent the remaining execution time required to complete the minimum execution 𝐶𝑖(𝐿, 𝐿). Algorithm 1 outlines the calculation of the slack value 𝑠𝑙𝑎𝑐𝑘𝑘^𝑡𝑐𝑢𝑟 for task 𝜏𝑘 at 𝑡𝑐𝑢𝑟 within the EDF-Slack algorithm, triggered at each scheduling decision. Since EDF is a job-level fixed-priority scheduling policy, wherein the priority of a job remains constant throughout its execution, scheduling decisions under EDF occur either at the commencement of a job's execution or upon its completion. In the interval [𝑡𝑐𝑢𝑟, 𝑑1(𝑡𝑐𝑢𝑟)], EDF-Slack processes tasks in reverse EDF order, starting from the task with the latest deadline. Job 𝐽𝑘 of 𝜏𝑘 has the highest priority at 𝑡𝑐𝑢𝑟, with 𝑑1(𝑡𝑐𝑢𝑟) being its deadline, as EDF-Slack follows the EDF policy. The goal of the slack calculation in Algorithm 1 is to delay the execution of all other tasks 𝜏𝑖 ∈ 𝜏 ⧵ {𝜏𝑘} beyond 𝑑1(𝑡𝑐𝑢𝑟) while ensuring that future deadlines are met. This is repeated for all tasks in the waiting queue. To ensure 𝜏𝑖 completes 𝐶𝑖(𝐿, 𝐿) before 𝑑𝑖(𝑡𝑐𝑢𝑟), EDF-Slack calculates the maximum execution time in the interval [𝑑1(𝑡𝑐𝑢𝑟), 𝑑𝑖(𝑡𝑐𝑢𝑟)], which is (1 − 𝑈) ⋅ (𝑑𝑖(𝑡𝑐𝑢𝑟) − 𝑑1(𝑡𝑐𝑢𝑟)), where 𝑈 denotes the left-hand side of Eq. (2).

The key steps in the slack calculation are as follows:

• 𝑞𝑖 is computed as the minimum execution of 𝜏𝑖 in the interval [𝑡𝑐𝑢𝑟, 𝑑1(𝑡𝑐𝑢𝑟)] (Lines 3-4).
• 𝑅𝐶𝑖(𝐿, 𝐿) is either zero or 𝐶𝑖(𝐿, 𝐿), since scheduling decisions are only made upon job completion or release in non-preemptive scheduling (Line 4).
6
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
• The execution rate of τ_i in the interval [d_1(t_cur), d_i(t_cur)] is calculated and recorded (Line 5).
• p is set as the sum of the minimum execution times of all tasks τ_i ∈ τ (Line 6).
• The slack is then determined as the remaining time slots, excluding p (i.e., the sum of q_i), within the interval [t_cur, d_1(t_cur)] (Line 7).
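The steps above can be sketched roughly in Python. The task record, field names, and the treatment of U are illustrative assumptions for this sketch, not the authors' implementation (Eq. (2) itself is not reproduced in this excerpt):

```python
from collections import namedtuple

# Illustrative task record: absolute deadline and remaining minimum execution C_i(L, L).
Task = namedtuple("Task", ["deadline", "rem_low"])

def slack(queue, t_cur, U):
    """Sketch of the slack computation for the highest-priority job queue[0].

    queue is EDF-sorted; U stands in for the left-hand side of Eq. (2).
    """
    d1 = queue[0].deadline
    p = 0.0
    for tau in queue[1:]:                       # other tasks in the waiting queue
        # Execution that tau_i may defer into [d_1, d_i]: (1 - U) * (d_i - d_1).
        deferrable = (1.0 - U) * (tau.deadline - d1)
        # q_i: minimum execution that must still fall inside [t_cur, d_1].
        q_i = max(0.0, tau.rem_low - deferrable)
        p += q_i
    # Slack: slots in [t_cur, d_1] left over after queue[0]'s own minimum execution and p.
    return (d1 - t_cur) - queue[0].rem_low - p
```

For instance, with queue = [Task(10, 4), Task(20, 6)], t_cur = 0, and U = 0.5, the second task can defer five execution units past d_1, leaving q = 1 inside [0, 10] and a slack of 5 for the highest-priority job.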
Example. Fig. 3(c) illustrates a scheduling scenario of EDF-Slack using the same example tasks as shown in Figs. 3(a) and (b). The initial jobs of τ_1 and τ_2 are released at t = 0 and t = 13, respectively. Applying Algorithm 1, the calculated slack value for τ_1 at t_cur = 0 is 17, allowing the first job of τ_1 to execute for C_1(H, H) until t = 25. Furthermore, age_1^D and age_1^A increment by one. Subsequently, the first job of τ_2 begins its execution at t = 25, executing for C_2(M, L) while increasing age_2^D by one. Finally, the second job of τ_2 starts its execution at t = 37.

Comparing Fig. 3(b), which represents EDF-BE, with Fig. 3(c), depicting EDF-Slack, we observe that the aging of τ_1 and τ_2 increases by the same amount in both cases. However, the key difference lies in the execution of the first job of τ_1. Under EDF-Slack, this job is able to execute with a high-workload execution, while under EDF-BE, it can only execute with a middle-workload execution, which allows for higher expectations of tracking accuracy in EDF-Slack.

The following proves the timing guarantee of EDF-Slack.

Theorem 2. A task set τ that satisfies Eq. (2) is schedulable under EDF-Slack.

Proof. We prove this by contradiction. Assume, for the sake of contradiction, that the task set τ satisfies Eq. (2), but is not schedulable under EDF-Slack. This implies that at some time t, the total utilization exceeds 1.0, and hence a deadline miss occurs for some job J_i in τ. Let t_miss denote the earliest such time at which a deadline miss occurs, i.e., t_miss = d_i, where d_i is the deadline of J_i. By the definition of EDF-Slack, at each time t, the slack time for each task is computed based on the highest-priority job J_1(t_cur), where t_cur denotes the current time. Since no tasks are released in the interval [t_cur, d_1(t_cur)], the slack time ensures that lower-priority tasks cannot block the execution of J_1. As a result, the blocking term in Eq. (2) remains valid during this interval. Now, since EDF-Slack is based on EDF scheduling, the total utilization U(t) at any time t can be expressed as:

U(t) = Σ_{J_i ∈ τ(t)} C_i / (d_i − t_cur) + B(t),

where C_i is the remaining execution time of task J_i, and B(t) is the blocking term. According to Eq. (2), U(t) ≤ 1.0 for all t. Since t_miss is the earliest time a deadline miss occurs, we must have U(t_miss) > 1.0. However, by Eq. (2), we know that U(t) ≤ 1.0 for all t ≥ t_cur, including t_miss. This leads to a contradiction, as the assumption that U(t_miss) > 1.0 contradicts the fact that U(t) ≤ 1.0 holds at all times. Therefore, no deadline miss can occur, and the task set τ is schedulable under EDF-Slack. □

Note that the proof is self-contained, but a different proof can be found in [5].

Determination of execution options. EDF-BE and EDF-Slack use different slack concepts to ensure timely execution of tasks while improving tracking accuracy by executing beyond the minimum (i.e., C_i(L, L)). As shown in Figs. 3(b) and (c), both EDF-BE and EDF-Slack enhance the aging of detection and association through predefined mechanisms. The goal of these mechanisms is to balance the aging of detection and association, minimizing continuous omissions in updating motion and feature information, thereby maximizing tracking accuracy.

Algorithm 2 outlines the process for determining the execution options for the detection and association steps of task τ_k at time t_cur based on the slack slack_k^{t_cur} calculated in Algorithm 1.

Algorithm 2 Determination of execution options
Input: τ, t_cur, slack_k^{t_cur}
Output: (s_k, f_k)
1: if slack_k^{t_cur} ≤ 0 then
2:   return (L, L)
3: else
4:   if age_k^D < age_k^A then
5:     slack_k^D = slack_k^{t_cur} − (C_k^D(H) − C_k^D(L))
6:     if slack_k^D ≥ 0 then
7:       return (H, f_k(slack_k^D + C_k^A(L)))
8:     else
9:       return (s_k(slack_k^{t_cur} + C_k^D(L)), L)
10:    end if
11:  else
12:    slack_k^A = slack_k^{t_cur} − (C_k^A(H) − C_k^A(L))
13:    if slack_k^A ≥ 0 then
14:      return (s_k(slack_k^A + C_k^D(L)), H)
15:    else
16:      return (L, f_k(slack_k^{t_cur} + C_k^A(L)))
17:    end if
18:  end if
19: end if

• If the slack is less than or equal to zero, the algorithm returns L and L (Lines 1–2).
• Otherwise, the algorithm compares the ages of the detection step (age_k^D) and the association step (age_k^A) (Lines 3–4).
  – If age_k^D is smaller than age_k^A, indicating the detection step requires more resources, the algorithm calculates slack_k^D, representing the remaining slack after executing high-workload detection (Line 5).
  – If slack_k^D is greater than or equal to zero, the high-workload detection is followed by middle- or high-workload association depending on slack_k^D (Lines 6–7). In this case, f_k(x) is set as follows: L for x < C_k^A(M), M for C_k^A(M) ≤ x < C_k^A(H), and H for x ≥ C_k^A(H).
  – If slack_k^D is less than zero, the algorithm determines if middle- or high-workload detection can be performed based on slack_k^{t_cur}, followed by low-workload association (Lines 8–10). In this case, s_k(x) is set as follows: L for x < C_k^D(M), M for C_k^D(M) ≤ x < C_k^D(H), and H for x ≥ C_k^D(H).
• Lines 11–18 follow a similar procedure for determining the execution options, giving preference to the association step. Here, slack_k^A represents the remaining slack after executing the high-workload association.

According to the definition of aging, age_k^D (and age_k^A) increases by one when middle- or high-workload detection (and association) is performed.
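Rendered as executable Python, the branch structure of Algorithm 2 looks roughly as follows; the task container and field names are assumptions for illustration, not the authors' code:

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    age_D: int   # aging of the detection step
    age_A: int   # aging of the association step
    CD: dict     # detection WCETs per option, e.g. {'L': 2, 'M': 4, 'H': 6}
    CA: dict     # association WCETs per option

def level(x, C):
    """Piecewise map shared by f_k (association) and s_k (detection)."""
    if x < C['M']:
        return 'L'
    if x < C['H']:
        return 'M'
    return 'H'

def execution_options(task, slack):
    """Return (detection_option, association_option) following Algorithm 2."""
    if slack <= 0:                                    # Lines 1-2
        return ('L', 'L')
    if task.age_D < task.age_A:                       # Lines 4-10: prefer detection
        slack_D = slack - (task.CD['H'] - task.CD['L'])
        if slack_D >= 0:
            return ('H', level(slack_D + task.CA['L'], task.CA))
        return (level(slack + task.CD['L'], task.CD), 'L')
    slack_A = slack - (task.CA['H'] - task.CA['L'])   # Lines 11-18: prefer association
    if slack_A >= 0:
        return (level(slack_A + task.CD['L'], task.CD), 'H')
    return ('L', level(slack + task.CA['L'], task.CA))
```

For example, a task whose detection is staler than its association (age_D < age_A) and whose slack covers the high-workload detection upgrade returns 'H' for detection and then grades the association option through level().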
DNN-SAM proposed in [5] introduces two scheduling algorithms: EDF-MandFirst and EDF-Slack. Unlike CA-MOT, both DNN-SAM algorithms target multi-object detection (MOD) tasks. The primary distinction between MOT and MOD lies in the presence or absence of dependencies between consecutive frames. In MOD, the detection operation for a given frame does not utilize any information from previous frames. Therefore, techniques that rely on previous frame information,
such as aging-aware methods, cannot be employed in the DNN-SAM algorithms. Another key difference is that DNN-SAM is responsible solely for detection execution and does not handle the association task. Both DNN-SAM and CA-MOT algorithms are based on EDF and prioritize executing jobs with the earliest deadlines among the released tasks. However, in contrast to CA-MOT, DNN-SAM splits each job at release into a mandatory job, responsible for execution in the safety-critical area, and an optional job, responsible for execution in non-critical areas. When any mandatory job is present in the waiting queue, it is always executed first using the EDF algorithm. The distinction between EDF-MandFirst and EDF-Slack arises from whether the execution of an optional job may interfere with the execution of a mandatory job. Specifically, the scheduling behavior of DNN-SAM and EDF-Slack operates as follows:

• EDF-MandFirst in [5]: Any mandatory job in the waiting queue has a higher priority than optional jobs and is scheduled using EDF. If no mandatory jobs are in the queue, optional jobs are executed using EDF, ensuring that they do not interfere with the execution of future release jobs of mandatory tasks.
• EDF-Slack in [5]: Any mandatory job in the waiting queue has a higher priority than optional jobs and is scheduled using EDF. If no mandatory jobs are in the queue, optional jobs are executed using EDF, potentially interfering with the execution of future-release mandatory jobs, within the slack calculated from the job's runtime.
• EDF-BE of CA-MOT: A job is not split and has three execution options for both detection and association. It is executed with the maximum workload option to avoid interfering with the execution of future release jobs, but only when exactly one job is present in the waiting queue. The aging of detection and association tasks is considered for accuracy maximization.
• EDF-Slack of CA-MOT: A job is not split and has three execution options for both detection and association. Regardless of the number of jobs in the waiting queue, the job is executed with the maximum workload option, potentially interfering with the execution of future release jobs based on the slack calculated from its runtime. The aging of detection and association tasks is considered for accuracy maximization.

Fig. 4. Comparison for two tasks with the periods (equal to the relative deadlines) of 180 ms and 270 ms.

Table 2
Execution time measurement (average and maximum) in terms of image size, feature size, and scheduling overhead.

Time (ms)   C_i^D                 C_i^A                 C_i^sce
            L      M      H       L      M      H
Average     28.0   30.6   36.7    8.3    63.4   74.4    0.3
Maximum     43.6   53.5   67.6    11.3   74.0   125.2   0.6
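The mandatory-first selection rule described above can be illustrated with a minimal sketch; the job record and field names are assumptions for illustration, not code from DNN-SAM or CA-MOT:

```python
from collections import namedtuple

# Illustrative job record for the DNN-SAM-style split into mandatory/optional jobs.
Job = namedtuple("Job", ["name", "deadline", "mandatory"])

def pick_next(queue):
    """Non-preemptive EDF with strict priority for mandatory jobs."""
    mandatory = [j for j in queue if j.mandatory]
    pool = mandatory if mandatory else queue   # optional jobs run only when no mandatory job waits
    return min(pool, key=lambda j: j.deadline) # earliest deadline first within the pool
```

Note that a mandatory job is chosen here even when an optional job has an earlier deadline, which is exactly the interference question that separates EDF-MandFirst from EDF-Slack.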
5. Evaluation

This section evaluates the effectiveness of CA-MOT in achieving R1 and R2 for multiple MOT tasks.

5.1. Experiment setting

• Software: CA-MOT employs the tracking-by-detection paradigm, in which the detector is one of the most popular detectors, the YOLOv5 [14] model, and the tracker is StrongSORT [7]. We confirmed that other detectors (i.e., YOLOX, Faster-RCNN) exhibit a similar trend to YOLOv5 in terms of MOTA and execution time, as shown in Fig. 7. For feature extraction conducted as a part of association, we used OS-Net [10]. The YOLOv5 model was pretrained on the COCO Dataset [18], while OS-Net was pretrained on the MSMT Dataset [19]. The experimental environment is with Ubuntu 18.04.6 LTS, CUDA 11.4, and PyTorch 1.12.
• Hardware: We consider the NVIDIA Jetson Xavier as a GPU-enabled embedded board [20]. The NVIDIA Jetson Xavier features a 64-bit 8-core CPU, 32 GB memory, and a 512-core Volta GPU. We utilized the MAXN mode provided by the NVIDIA Jetson Xavier.
• Dataset and performance metric: We used the KITTI Dataset [13], which contains data collected from autonomous vehicle driving. To evaluate the accuracy of each region, we measured the MOTA [15] as the most well-known performance metric for tracking accuracy for critical and entire regions. MOTA compares the ground truth of objects in all frames with the tracking results obtained from the given techniques to measure accuracy. The KITTI dataset consists solely of data captured from forward-facing cameras and does not utilize different cameras, meaning there is no overlap in the areas they cover. Additionally, as it does not assume simultaneous capture by each camera, there are no synchronization issues. CA-MOT aims to maximize the average accuracy for the MOT tasks corresponding to all given cameras without missing any deadlines. This assumes that CA-MOT operates independently of camera interdependencies, with all cameras receiving the same forward-facing camera feed.
• Execution time measurement: To obtain the WCET of different execution options for detection and association, we measured the execution time by iterating 1000 times for each sub-task with three different execution options of an MOT task and then took the largest value. We also measured the worst-case time required for slack calculation and scheduling decisions such as Algorithms 1 and 2. Table 2 shows the measurement results.

5.2. Experiment result

We consider task sets in which schedulability is not guaranteed with the high-workload execution for detection and association (i.e., C_i(H, H)) for all tasks but is guaranteed with the minimum execution (C_i(L, L)) according to Eq. (2). Note that the schedulability with C_i(x, y) for x, y ∈ {L, M, H} can be judged with Eq. (2) by substituting C_i(L, L) with C_i(x, y). To evaluate the effectiveness of CA-MOT, we consider the following approaches, including a baseline and our two proposed ones.

• Detection first (DF): non-preemptive EDF in which the execution option of all tasks τ_i ∈ τ is equally fixed to the rightmost one among {C_i(L, L), C_i(M, L), C_i(H, L), C_i(H, M), C_i(H, H)} that satisfies the schedulability condition in Eq. (2).
• EDF-BE: EDF-BE, of which the task set passes the schedulability condition in Eq. (2), which is proposed in Section 4.2.
• EDF-Slack: EDF-Slack, of which the task set passes the schedulability condition in Eq. (2), which is proposed in Section 4.3.
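The substitution-based schedulability judgment can be sketched as follows. Since Eq. (2) itself lies outside this excerpt, a utilization-style EDF bound with a blocking term is assumed here as a stand-in, and the helper names are illustrative:

```python
from types import SimpleNamespace

def make_task(period, wcets):
    # Hypothetical helper bundling a period with per-option WCETs C_i(x, y).
    return SimpleNamespace(period=period, C=wcets)

def schedulable(tasks, option, blocking=0.0):
    """Judge schedulability under a fixed execution option (x, y).

    tasks: objects with .period and .C, a dict mapping an option pair
    such as ('H', 'L') to its WCET. The bound below is an assumed
    stand-in for Eq. (2), not the paper's exact condition.
    """
    utilization = sum(t.C[option] / t.period for t in tasks) + blocking
    return utilization <= 1.0
```

Judging a richer option such as ('H', 'M') then amounts to swapping the WCET looked up per task, mirroring the substitution of C_i(L, L) by C_i(x, y) described above.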
Fig. 4 represents the tracking accuracy and the proportion of the three execution options (i.e., L, M, and H) selected during detection and association for two tasks with different periods: 180 and 270 ms (milliseconds). As shown in Fig. 4(a), for overall accuracy, EDF-BE and EDF-Slack achieve 20.2% and 26.6%, respectively, while DF achieves
Fig. 5. Comparison for four tasks with the same period (equal to the relative deadline) of 400 ms.

Fig. 6. Visualization on KITTI dataset for three tasks with the periods of 400 ms. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
13.4%, which demonstrates the effectiveness of slack utilization and balancing the aging of detection and association in increasing tracking accuracy. We observe that the slack reclamation performed by Algorithm 1 in EDF-Slack is significantly more effective in achieving high tracking accuracy than in EDF-BE, which has limitations in obtaining a substantial amount of slack. For critical accuracy, EDF-BE and EDF-Slack achieve much higher accuracies, which are 28.3% and 32.2%, respectively, compared to 15.4% of DF. Based on this observation, we can interpret that even though EDF-BE obtains a smaller amount of slack compared to EDF-Slack, it efficiently performs tracking for the critical region with limited computing resources. On the other hand, EDF-Slack provides high tracking accuracy not only for the entire region but also for the safety-critical region, thanks to its efficient slack reclamation. As seen in Fig. 4, EDF-Slack exhibits a significantly higher proportion of high-workload and middle-workload execution for detection and association compared to other execution options. On the other hand, EDF-BE shows a slight proportion of middle-workload execution, while the majority of cases involve low-workload execution.

Fig. 5 depicts the results of another experiment involving three different sets of tasks, with the number of tasks ranging from two to four, all having the same periods (i.e., 400 ms with a guaranteed minimum execution C_i(L, L), but no guaranteed maximum execution C_i(H, H) for τ_i ∈ τ). In Fig. 5(a), the tracking accuracy of the evaluated approaches is shown as the number of tasks increases. For the case of two tasks, EDF-Slack achieves an overall accuracy of 41.8% and a critical accuracy of 41.4%, while EDF-BE achieves an overall accuracy of 24.3% and a critical accuracy of 27.2%. In contrast, DF achieves lower accuracy, with an overall accuracy of 18.0% and a critical accuracy of 18.7%. As the number of tasks increases, both EDF-BE and EDF-Slack experience a decrease in accuracy, but they still outperform DF in terms of tracking accuracy. Even with only four tasks, EDF-BE yields lower overall accuracy than DF, as it can only detect part of the image when selecting the high workload option. Nevertheless, by prioritizing computations in critical regions at low and medium workloads, EDF-BE attains higher critical accuracy than DF. Fig. 5(b) presents the distribution of execution options for EDF-BE and EDF-Slack when there are three tasks. Similar to Fig. 4, it is evident that both EDF-BE and EDF-Slack allocate the workload between the detection and association steps in a balanced manner using the ages. Additionally, EDF-Slack can reclaim more slack compared to EDF-BE.

Fig. 6 presents the tracking outcomes of a single task within a set of three tasks, each with a period of 400 ms, comparing (a) the DF algorithm and (b) EDF-Slack. In the visualization, each tracked object is represented by a unique color and ID within a bounding box, with the symbol # indicating the frame number. In the DF scenario, each task executes C_i(H, L), leading to insufficient computational resources for proper association. This inadequacy results in DF's failure to track two objects within the safety-critical region in the 167th frame and causes an ID switch from 4 to 10 in the subsequent 168th frame, as illustrated in Fig. 6. Conversely, EDF-Slack leverages aging and slack techniques to allocate sufficient computational resources for both detection and association tasks, enabling accurate tracking of all objects in the safety-critical region.

Fig. 7. MOTA and execution time on other detectors.

Additional experiments are conducted to ascertain if CA-MOT exhibits comparable behavioral patterns across a range of detectors, including YOLOv5, which was evaluated previously. Fig. 7 displays the MOTA and execution time for various contemporary detectors, analyzed according to their workload. Modern detectors are generally classified into one-stage and two-stage categories based on their architecture, and further into anchor-free and anchor-based types, contingent on their use of predefined anchors for object detection. Our study incorporated YOLOv5, a standard one-stage anchor-based detector. We also investigate the performance of the two-stage anchor-based detector Faster-RCNN [9] and the one-stage anchor-free detector YOLOX [21] to verify the consistency of results. Faster-RCNN utilized ResNet-50 as its backbone network, while YOLOX was configured with a small version model. Both models were trained using the COCO dataset. Despite minor discrepancies in specific ratios, the results consistently demonstrate that both MOTA and execution time escalate in conjunction with increasing workload, as shown in Figs. 7(a) and (b). The runtime trend of YOLOX is particularly noteworthy, as it closely mirrors that of YOLOv5. This pattern indicates that similar outcomes may be expected from other detectors akin to YOLOv5.

6. Related work

The tracking-by-detection model is a commonly used method in the MOT field. It has shown significant progress and enhanced performance recently, largely thanks to the evolution of deep neural networks (DNNs). A well-recognized model in this field, SORT (simple online and real-time tracking) [22], does its matching based mainly on where objects are located, using detection tools to achieve this. To push this model further, DeepSORT [6] builds on the SORT model by adding a DNN-based re-identification model. This allows for the extraction of object features. By adding this layer, DeepSORT utilizes both the object's location and its visual information, leading to a stronger performance. Recent work in this area, such as Deep OC-SORT [23] and StrongSORT [7], is geared towards enhancing the accuracy of these models even more, focusing especially on refining and improving the matching algorithms used in these systems. However, it is critical to understand that these approaches are mainly designed for situations where there are plenty of computing resources. Therefore, they might struggle to meet the timing needs in systems that are restricted in resources, like
the embedded systems in self-driving vehicles where resources may be scarce.

Considering self-driving vehicles, which are fundamentally systems where safety is critical, even the smallest delays or slight drops in accuracy can lead to significant and potentially dangerous risks. Some research, such as DNN-SAM [5], has tried to tackle these problems by suggesting frameworks that concentrate specifically on safety-critical areas. These frameworks give priority to critical accuracy and use uncertainty handling to ensure the highest safety standards. However, these research studies and their related approaches are mostly designed for multi-object detection systems and may not directly apply to or be effective in multi-object tracking. Likewise, another study, RT-MOT [2], aims to maximize the overall accuracy of multi-object tracking and ensure on-time execution, but it overlooks the importance of individual objects in its approach. To address these limitations, our suggested framework, known as CA-MOT, aims to confront these challenges directly. By leveraging the unique traits of multi-object tracking in safety-critical systems, CA-MOT ensures on-time execution and boosts tracking accuracy for objects that could potentially be dangerous to the system. It builds on previous work while addressing their weaknesses to create a safer and more efficient tracking system.

7. Discussion

A limitation of CA-MOT is its exclusive reliance on a single CPU and GPU, which restricts scalability. A recent approach, Batch-MOT [3], addresses this limitation by processing input images from multiple cameras through a shared queue, distributing CPU operations across multiple CPUs, and employing batch processing on a single GPU. However, this approach may introduce additional communication overhead among CPUs, potentially limiting its overall efficiency. The primary contribution of Batch-MOT lies in its online schedulability analysis, which dynamically determines the maximum number of images that can be batch-processed without violating their deadlines. Nonetheless, unlike CA-MOT, Batch-MOT lacks support for multiple execution strategies during the association phase, resulting in suboptimal resource utilization for individual MOT tasks. Enhancing CA-MOT by incorporating batch processing capabilities to address these shortcomings presents a promising avenue for future research. Furthermore, as highlighted in previous studies, deploying CA-MOT on real-world platforms, such as the F1/10 autonomous driving platform [5], offers significant potential for further investigation and practical validation.

8. Conclusion

In this paper, we proposed CA-MOT, a new criticality-aware MOT execution and scheduling framework. Aiming at achieving critical-accuracy maximization and a timing guarantee, CA-MOT first proposes a new system design to offer a control knob between tracking accuracy and timing guarantee to efficiently utilize limited computing resources. Then, CA-MOT develops two scheduling algorithms to effectively utilize the system design while using the notions of slack and aging of detection and association. Using various task sets and real-world autonomous driving data, we demonstrated that CA-MOT can obtain high tracking accuracy of entire and safety-critical regions while ensuring the timely execution of all MOT tasks.

CRediT authorship contribution statement

Donghwa Kang: Writing – original draft, Software, Methodology, Formal analysis. Jinkyu Lee: Writing – review & editing, Validation, Formal analysis. Hyeongboo Baek: Writing – review & editing, Supervision, Funding acquisition, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Hyeongboo Baek reports financial support was provided by National Research Foundation of Korea. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00250742, 2022R1A4A3018824, RS-2024-00438248). This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP)-ITRC (Information Technology Research Center) grant funded by the Korea government (MSIT) (IITP-2025-RS-2023-00259061).

Data availability

Data will be made available on request.

References

[1] M. Yang, S. Wang, J. Bakita, T. Vu, F.D. Smith, J.H. Anderson, J.-M. Frahm, Re-thinking CNN frameworks for time-sensitive autonomous-driving applications: Addressing an industrial challenge, in: Proceedings of IEEE Real-Time Technology and Applications Symposium, IEEE, 2019, pp. 305–317.
[2] D. Kang, S. Lee, H.S. Chwa, S.-H. Bae, C.M. Kang, J. Lee, H. Baek, RT-MOT: Confidence-aware real-time scheduling framework for multi-object tracking tasks, in: Proceedings of IEEE Real-Time Systems Symposium, IEEE, 2022, pp. 318–330.
[3] D. Kang, S. Lee, C.-H. Hong, J. Lee, H. Baek, Batch-MOT: Batch-enabled real-time scheduling for multi-object tracking tasks, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2024).
[4] S. Liu, X. Fu, M. Wigness, P. David, S. Yao, L. Sha, T. Abdelzaher, Self-cueing real-time attention scheduling in criticality-aware visual machine perception, in: Proceedings of IEEE Real-Time Technology and Applications Symposium, IEEE, 2022, pp. 173–186.
[5] W. Kang, S. Chung, J.Y. Kim, Y. Lee, K. Lee, J. Lee, K.G. Shin, H.S. Chwa, DNN-SAM: Split-and-merge DNN execution for real-time object detection, in: Proceedings of IEEE Real-Time Technology and Applications Symposium, 2022, URL https://rtcl.dgist.ac.kr/index.php/publication-2/.
[6] N. Wojke, A. Bewley, D. Paulus, Simple online and realtime tracking with a deep association metric, in: Proceedings of the IEEE International Conference on Image Processing, IEEE, 2017, pp. 3645–3649.
[7] Y. Du, Y. Song, B. Yang, Y. Zhao, StrongSORT: Make DeepSORT great again, 2022, arXiv preprint arXiv:2202.13514.
[8] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[9] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[10] K. Zhou, Y. Yang, A. Cavallaro, T. Xiang, Omni-scale feature learning for person re-identification, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 3702–3712.
[11] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, X. Wang, ByteTrack: Multi-object tracking by associating every detection box, in: Proceedings of the European Conference on Computer Vision, Springer, 2022, pp. 1–21.
[12] Y. Zhang, C. Wang, X. Wang, W. Zeng, W. Liu, FairMOT: On the fairness of detection and re-identification in multiple object tracking, Int. J. Comput. Vis. 129 (11) (2021) 3069–3087.
[13] A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012.
[14] ultralytics, YOLOv5 [Online], 2022, Available: https://github.com/ultralytics/yolov5.
[15] K. Bernardin, R. Stiefelhagen, Evaluating multiple object tracking performance: the CLEAR MOT metrics, EURASIP J. Image Video Process. 2008 (2008) 1–10.
[16] G. Welch, G. Bishop, et al., An introduction to the Kalman filter, ACM SIGGRAPH (1995).
[17] T.P. Baker, A stack-based resource allocation policy for realtime processes, in: Proceedings of IEEE Real-Time Systems Symposium, IEEE, 1990, pp. 191–200.
[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C.L. Zitnick, Microsoft COCO: Common objects in context, in: Proceedings of the European Conference on Computer Vision, Springer, 2014, pp. 740–755.
[19] L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer GAN to bridge domain gap for person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 79–88.
[20] NVIDIA, NVIDIA Xavier Developer Kit [Online], 2022, Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier.
[21] Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, YOLOX: Exceeding YOLO series in 2021, 2021, arXiv preprint arXiv:2107.08430.
[22] A. Bewley, Z. Ge, L. Ott, F. Ramos, B. Upcroft, Simple online and realtime tracking, in: Proceedings of the IEEE International Conference on Image Processing, IEEE, 2016, pp. 3464–3468.
[23] G. Maggiolino, A. Ahmad, J. Cao, K. Kitani, Deep OC-SORT: Multi-pedestrian tracking by adaptive re-identification, 2023, arXiv preprint arXiv:2302.11813.

Donghwa Kang is a Ph.D. course student in the School of Computing, Korea Advanced Institute of Science and Technology (KAIST), South Korea. He received BS and MS degrees in computer science from Incheon National University (INU) in 2022 and 2024, respectively. His research interests include artificial intelligence, autonomous systems, and real-time embedded systems.

Jinkyu Lee is an associate professor in the Department of Computer Science and Engineering at Sungkyunkwan University (SKKU), South Korea, where he joined in 2014. He received the BS, MS, and Ph.D. degrees in computer science from the Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 2004, 2006, and 2011, respectively. He was a research fellow/visiting scholar in the Department of Electrical Engineering and Computer Science, University of Michigan, until 2014. His research interests include system design and analysis with timing guarantees, QoS support, and resource management in real-time embedded systems and cyber-physical systems. He won the best student paper award from the 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) in 2011 and the Best Paper Award from the 33rd IEEE Real-Time Systems Symposium (RTSS) in 2012.

Hyeongboo Baek is an associate professor in the Department of Artificial Intelligence, University of Seoul (UOS), South Korea. He received the BS degree in Computer Science and Engineering from Konkuk University, South Korea, in 2010, and the MS and Ph.D. degrees in Computer Science from KAIST, South Korea, in 2012 and 2016, respectively. His research interests include cyber-physical systems, real-time embedded systems, and system security. He won the best paper award from the 33rd IEEE Real-Time Systems Symposium (RTSS) in 2012.
@@ -0,0 +1,788 @@
Computer Standards & Interfaces 97 (2026) 104111
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi

Refining decision boundaries via dynamic label adversarial training for robust traffic classification

Haoyu Tong a,c,d, Meixia Miao b,c,d, Yundong Liu a,c,d, Xiaoyu Zhang a,c,d,∗, Xiangyang Luo c,d, Willy Susilo e

a State Key Laboratory of Integrated Service Networks (ISN), Xidian University, 710121, Xi'an, China
b School of Cyberspace Security, Xi'an University of Posts and Telecommunications, Xi'an, 710121, China
c Key Laboratory of Cyberspace Security, Ministry of Education of China, 450001, Zhengzhou, China
d Henan Key Laboratory of Cyberspace Situation Awareness, 450001, Zhengzhou, China
e School of Computing and Information Technology, University of Wollongong, Wollongong, Australia
ARTICLE INFO

Keywords:
Traffic classification
Adversarial examples
Adversarial training
Label noise

ABSTRACT

Network traffic classification plays a critical role in securing modern communication systems, as it enables the identification of malicious or abnormal patterns within traffic data. With the growing complexity of network environments, deep learning models have emerged as a compelling solution due to their ability to automatically learn discriminative representations from raw traffic. However, these models are highly vulnerable to adversarial examples, which can significantly degrade their performance by introducing imperceptible perturbations. While adversarial training (AT) has emerged as a primary defense, it often suffers from label noise, particularly when hard labels are forcibly assigned to adversarial examples whose true class may be ambiguous. In this work, we first analyze the detrimental effect of label noise on adversarial training, revealing that forcing hard labels onto adversarial examples can cause excessive shifts of the decision boundary away from the adversarial examples, which in turn degrades the model's generalization. Motivated by the theoretical analysis, we propose Dynamic Label Adversarial Training (DLAT), a novel AT framework that mitigates label noise via dynamically mixed soft labels. DLAT interpolates the logits of clean and adversarial examples to estimate the labels of boundary-adjacent examples, which are then used as soft labels for adversarial examples. By adaptively aligning the decision boundary toward the vicinity of adversarial examples, the framework constrains unnecessary boundary shifts and alleviates generalization degradation caused by label noise. Extensive evaluations on network traffic classification benchmarks validate the effectiveness of DLAT in outperforming standard adversarial training and its variants in both robustness and generalization.
1. Introduction

Network traffic classification, which aims to determine the application or service associated with observed traffic packets, flows, or sessions, serves as a fundamental building block in a wide range of networking tasks, including intrusion detection, quality-of-service management, and traffic engineering [1,2]. In the early stages of network management, classification was carried out mainly through port-based identification [3,4] and deep packet inspection (DPI) [5,6]. However, these traditional approaches have become increasingly ineffective due to the widespread use of dynamic port allocation, encrypted communication protocols, and intentional obfuscation techniques [7,8]. As network environments become more complex and security-conscious, there is a growing demand for more intelligent and adaptive classification methods that do not rely on payload visibility or fixed port mappings.

In recent years, deep learning (DL) [9] has become a dominant paradigm for network traffic classification due to its ability to automatically extract underlying representations from raw or lightly processed traffic data [10-14]. Compared to traditional statistical or machine learning approaches that rely heavily on manual feature engineering, deep neural networks, including convolutional, recurrent, and Transformer-based architectures, can effectively capture spatial and temporal patterns in traffic data, enabling high accuracy even in challenging scenarios such as previously unseen traffic. However,
✩ This article is part of a Special issue entitled: Secure AI published in Computer Standards & Interfaces.
∗ Corresponding author at: State Key Laboratory of Integrated Service Networks (ISN), Xidian University, 710121, Xi'an, China.
E-mail addresses: haoyutong@stu.xidian.edu.cn (H. Tong), miaofeng415@163.com (M. Miao), yundongliu@stu.xidian.edu.cn (Y. Liu),
xiaoyuzhang@xidian.edu.cn (X. Zhang), xiangyangluo@126.com (X. Luo), wsusilo@uow.edu.au (W. Susilo).
https://doi.org/10.1016/j.csi.2025.104111
Received 26 October 2025; Received in revised form 29 November 2025; Accepted 8 December 2025
Available online 13 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
despite their impressive performance, deep learning-based classifiers remain highly susceptible to adversarial examples. These are deliberately crafted inputs with imperceptible perturbations that cause models to misclassify [15,16]. In the context of traffic classification, adversarial perturbations can manipulate flow-level features or packet sequences in ways that evade detection without disrupting the underlying communication protocols. To mitigate this vulnerability, adversarial training has been widely adopted as a defense mechanism by introducing adversarial examples during model training to enhance robustness [17].

While adversarial training is effective in many domains, applying it to traffic classification poses unique challenges. Unlike natural image domains, traffic data distributions typically exhibit higher intrinsic dimensionality and more complex manifold structures. Different application protocols often share significant common subsequences at the byte level, creating naturally entangled features that separate classes through subtle statistical patterns rather than distinct visual characteristics. Furthermore, unlike images, where semantic meaning is often locally correlated, traffic features exhibit long-range dependencies across packet sequences, making them particularly sensitive to small, strategically placed perturbations. These characteristics cause even minor perturbations to readily shift traffic samples across class boundaries, leading to significant label noise during training. This issue is further exacerbated by standard adversarial training practices [18], which introduce perturbed examples into the training set while still assigning them the same labels as their clean counterparts, thereby intensifying the semantic mismatch between the true and assigned labels.

Traditional adversarial training typically enforces the original hard label on adversarial examples. While effective to some extent, this rigid supervision introduces significant label noise, especially when adversarial examples cross or approach decision boundaries. Consequently, the decision boundary is pushed away from perturbed examples, often reinforcing the robustness of the class in which the adversarial example is located at the expense of others. This imbalance undermines the overall robustness of the model, particularly in tasks such as traffic classification, where class semantics are inherently ambiguous and sensitive to perturbations.

To address this issue, we propose Dynamic Label Adversarial Training (DLAT), a novel adversarial training framework designed to mitigate the adverse effects of excessive label noise in robust network traffic classification. Rather than rigidly assigning the original hard label to adversarial examples, DLAT constructs soft labels for examples near decision boundaries through a similarity-guided strategy that takes advantage of the model's output distributions. Such soft labels help guide the decision boundary toward the neighborhood of adversarial examples, rather than forcing it away due to overconfident and potentially incorrect supervision. Instead of explicitly approximating the decision boundary using computationally intensive techniques, such as multi-step adversarial attacks with decaying step sizes, DLAT leverages the similarity between the output logits of clean and perturbed inputs to estimate the soft labels of examples near the decision boundary. Since the similarity between the output distributions of clean and adversarial examples reflects how close the adversarial example lies to the current decision boundary, it serves as a reliable proxy for boundary proximity. Based on this similarity, DLAT interpolates between the model's predictions on the clean and adversarial inputs. When the adversarial and clean outputs are closely aligned, the soft label remains closer to the clean prediction; conversely, greater divergence triggers a softer supervisory signal that better reflects the model's uncertainty about the adversarial input. Concretely, when the adversarial example is far from the boundary, a larger weight is assigned to the clean prediction; when it is close to the boundary, more weight is allocated to the adversarial output. This similarity-guided interpolation enables precise estimation of soft labels for boundary-adjacent examples, which in turn facilitates more accurate adjustment of the decision boundary. By avoiding rigid hard-label supervision, this adaptive labeling mechanism mitigates semantic distortion, reduces the risk of reinforcing incorrect decision boundaries, and helps the model learn more robust decision surfaces under label noise. Our key contributions are outlined as follows:

• We extend the understanding of label noise in adversarial training to the domain of network traffic classification. The compact and entangled distribution of traffic data makes it vulnerable to small perturbations, increasing the likelihood of label inconsistency in adversarial examples. This inconsistency corresponds to a higher degree of label noise, which enforces incorrect alignment and impedes the learning of robust decision boundaries.

• We provide a theoretical characterization of how hard-label supervision on shifted adversarial examples induces excessive movement of the decision boundary. Specifically, enforcing high-confidence predictions for adversarial examples distorts the classifier, increasing the risk of misclassification for nearby examples from other classes.

• We introduce a novel adversarial training method called DLAT, which dynamically assigns soft labels to adversarial examples based on their estimated proximity to the decision boundary. Instead of assigning uniform soft labels or incurring high computational overhead through explicit boundary detection, DLAT estimates soft labels through interpolation between clean and adversarial examples, substantially reducing the cost of label generation.

2. Related work

2.1. Traffic classification

Traffic classification, the task of identifying and categorizing network traffic based on application types, has evolved significantly over the years. Traditional methods such as port-based classification and payload inspection (DPI) were initially dominant but became ineffective due to dynamic port allocation, encryption, and protocol obfuscation. Statistical and machine learning-based approaches later emerged, leveraging flow-level features (e.g., packet size, inter-arrival time) to classify encrypted and unencrypted traffic. However, these methods still relied on manual feature engineering, which is time-consuming and error-prone. The advent of DNNs revolutionized traffic classification by automating feature extraction and improving accuracy. Lotfollahi et al. [10] first applied deep learning to traffic classification: by leveraging stacked autoencoders (SAE) and CNN architectures, their method automatically extracts network traffic features and achieves efficient classification of encrypted traffic. Subsequent studies have advanced DL-based traffic classification in both accuracy and applicability. Wang et al. [19] proposed an end-to-end 1D-CNN model that processes raw packet bytes to capture spatial patterns, eliminating the need for manual feature design. Lan et al. [20] combined 1D-CNN, Bi-LSTM, and multi-head attention to classify darknet traffic, leveraging side-channel features to enhance robustness. LEXNet [21] further improved deployment efficiency by introducing a lightweight and interpretable CNN with residual connections and a prototype layer, enabling real-time inference on edge devices without sacrificing accuracy. Liu et al. [22] introduced TransECA-Net, an innovative hybrid architecture combining ECANet-enhanced CNN modules with Transformer encoders to simultaneously extract local channel-wise features and global temporal dependencies.
2.2. Adversarial example attacks and defense

While deep learning has significantly advanced traffic classification, it inherits the inherent vulnerabilities of DNNs and is susceptible to adversarial example attacks. Adversarial examples are inputs deliberately modified with subtle perturbations that cause the model to produce incorrect predictions while remaining imperceptible to human observers. This vulnerability poses serious challenges to the security and reliability of DL-based traffic classification systems, highlighting the need for robust defense methods. Szegedy et al. [23] first revealed this weakness by formulating an optimization problem to find minimal perturbations that cause misclassification, attributing the phenomenon to local linearity in deep networks. Goodfellow et al. [15] introduced the Fast Gradient Sign Method (FGSM), which efficiently generates adversarial examples by leveraging a linear approximation of the loss function. Kurakin et al. [24] extended FGSM to an iterative version (BIM) to improve attack success. Madry et al. [17] further enhanced this with Projected Gradient Descent (PGD), adding random initialization to avoid local optima and establishing a robust attack benchmark. Carlini and Wagner [25] proposed a strong optimization-based attack, C&W, that effectively bypasses gradient-masking defenses. Sadeghzadeh et al. [16] extended adversarial attacks to the traffic classification field, proposing the adversarial pad attack and the adversarial payload attack for packet and flow classification respectively, as well as the adversarial burst attack targeting the statistical characteristics of flow time series.

Adversarial training (AT) is a widely adopted defense strategy that enhances DNNs' robustness against such attacks by incorporating adversarial examples into the training process. Proposed by Goodfellow et al. [15], AT initially used FGSM adversarial examples combined with clean examples for optimization. Madry et al. [17] showed that stronger PGD-based adversarial examples provide better robustness through a min-max optimization. However, PGD training often leads to overfitting on adversarial examples and reduced accuracy on clean data, highlighting a trade-off between robustness and generalization. To address this, Zhang et al. [26] introduced TRADES to balance this trade-off with a regularized loss. Wang et al. [27] proposed MART, which treats misclassified examples differently to enhance robustness. Dong et al. [28] developed AWP, combining input and weight perturbations to flatten the loss landscape and further reduce robust error. However, the aforementioned methods were originally proposed for image classification tasks and are not specifically designed for robust traffic classification, so directly applying them may not yield optimal results. For example, adversarial training applied to traffic data frequently induces substantial label noise, and inadequate handling of such noise can considerably hinder the improvement of model robustness.

3. Preliminaries

3.1. Pre-processing

Consider a raw network traffic flow as a discrete byte-level sequence of arbitrary length. Formally, a raw traffic flow is defined as a variable-length sequence:

F = (b_1, b_2, …, b_L), (1)

where L ∈ ℕ⁺ denotes the sequence length and each byte b_i ∈ Z_256 = {0, 1, …, 255}. The flow F thus resides in the input space 𝔽 = ⋃_{k=1}^∞ Z_256^k, which encompasses all finite-length byte sequences. Following the methodology proposed by [19], each raw traffic flow F is standardized to a fixed length of 784 bytes to enable batch processing and compatibility with convolutional neural networks. Specifically, the transformation pipeline Ψ: 𝔽 → Z_256^{28×28} consists of:

Truncation. To standardize the size of the input dimensions of the model, we truncate the flow to the first 784 bytes:

τ_k(F) = (b_1, …, b_{min(L,k)}), k = 784. (2)

Zero-Padding. For flows with L < 784, zero-padding is applied to ensure uniform dimensionality:

π_784(F) = (b_1, …, b_L, 0, …, 0) if L < 784, and τ_784(F) otherwise. (3)

Image Mapping. The resulting 784-dimensional vector is reshaped into a 28 × 28 grayscale image in row-major order. We define the mapping Φ: Z_256^784 → Z_256^{28×28} as:

Φ(f) = [ b_1 b_2 ⋯ b_28 ; b_29 b_30 ⋯ b_56 ; ⋮ ⋮ ⋱ ⋮ ; b_757 b_758 ⋯ b_784 ], (4)

where f = π_784(F) is the padded byte vector. This bijection arranges bytes row-by-row into a square image.

Normalization. Finally, pixel values are normalized to the range [0, 1]:

N(Φ(f))_{i,j} = Φ(f)_{i,j} / 255. (5)

The resulting tensor x = N(Φ(π_784(F))) ∈ [0, 1]^{28×28} is used as the input to downstream neural models.

3.2. Notation

Let x ∈ [0, 1]^{28×28} denote the resulting input image. The neural network takes x as input and outputs either class predictions (e.g., traffic type or application label) or binary decisions (e.g., benign vs. malicious), depending on the task. Consider a K-class classification task on the dataset D = {(x_i, y_i)}_{i=1}^N, where the x_i are preprocessed network traffic and y_i ∈ Y = {1, …, K} are class labels. We consider a parameterized model f_θ: [0, 1]^{28×28} → Y that maps a normalized grayscale image x to a probability distribution over classes (i.e., p = f_θ(x)); the final predicted label is obtained by ŷ = arg max_k p_k. We denote the loss function used in the standard training process by:

L_st(θ, D) = (1/N) Σ_{i=1}^N ℓ(f_θ(x_i), y_i), (6)

where N is the number of training examples and ℓ(·) denotes a loss function that measures the discrepancy between the model prediction and the ground-truth label (e.g., cross-entropy).

3.3. Adversarial attack

Deep learning models are known to be vulnerable to adversarial examples, inputs perturbed by imperceptible noise that induce incorrect predictions. Network traffic classifiers based on deep learning inherit this vulnerability: small, carefully designed perturbations can cause significant degradation in classification performance. Formally, given a trained model f_θ: [0, 1]^{28×28} → Y and a clean input x, an adversary aims to craft a perturbed input x′ = x + δ such that:

minimize ‖δ‖_p, subject to f_θ(x + δ) = y_target and x + δ ∈ [0, 1]^{28×28}, (7)

where δ denotes the adversarial perturbation and ‖·‖_p (p ∈ {0, 1, 2, ∞}) quantifies the perturbation magnitude. For traffic image inputs, x′ = x + δ maintains the structural properties of legitimate traffic while causing misclassification. Under a white-box threat model, where adversaries possess full knowledge of both the preprocessing pipeline Ψ and the classifier parameters θ, attacks are executed directly in the image domain.
Crucially, the perturbation is constrained to the payload region of the traffic image rather than the padding area.

Payload-Constrained Perturbation. To ensure semantic fidelity when mapping perturbed inputs back to the traffic domain, the adversarial perturbation δ is restricted to the non-padding (i.e., payload) region:

M = {(i, j) | 28(i − 1) + j ≤ L}, (8)

where M denotes the set of pixels corresponding to the original L bytes of the flow F. During attack iterations, any updates falling outside M are explicitly zeroed out. While this constraint does not achieve the theoretically optimal adversarial perturbation, it aligns with realistic payload limitations in network traffic and therefore produces semantically faithful perturbations that are more suitable for practical deployment. In this work, we adopt PGD (Projected Gradient Descent) [17] as our primary adversarial method. Specifically, we perform iterative updates on the input image within the allowed perturbation budget ε and constrain the perturbation to the valid traffic region M:

x^{t+1} = Π_{B_ε(x) ∩ M}( x^t + α · sign(∇_x L(f_θ(x^t), y)) ), (9)

where L denotes the loss function, Π is the projection operator that restricts the updated input to the intersection of the valid region M and the ℓ_p-ball of radius ε centered at x, and α is the step size.

3.4. Adversarial training

One of the most effective defenses against adversarial attacks is adversarial training (AT), which enhances model robustness by incorporating adversarial examples into the training process. Specifically, it formulates the training objective as a min-max optimization:

min_θ (1/N) Σ_{i=1}^N max_{‖δ_i‖_p ≤ ε} ℓ(f_θ(x_i + δ_i), y_i). (10)

For network traffic classifiers, we extend this paradigm with payload-aware constraints:

min_θ (1/N) Σ_{i=1}^N max_{δ_i ∈ S_i} ℓ(f_θ(x_i + δ_i), y_i), (11)

where S_i = {δ : ‖δ‖_p ≤ ε and δ_{(i,j)} = 0 for all (i, j) ∉ M_i} is the constraint set for the i-th example.

4. Label noise

Label noise in adversarial training refers to the semantic mismatch between the assigned labels and the true labels of adversarial examples. As first observed by Dong et al. [18], this phenomenon arises from the practice of assigning adversarial examples the same labels as their clean inputs. Given a clean input-label pair (x, y), adversarial training constructs a perturbed input x′ = x + δ and assigns it the original label y during training. However, the true label of x′ may differ due to the semantic distortion introduced by the adversarial perturbation δ. This distributional shift is especially detrimental to learning robust representations, as it misguides the optimization process.

4.1. Amplified label noise in robust traffic classification

While label noise poses a general challenge in adversarial training, it becomes even more prominent in the context of robust network traffic classification. Unlike image data, where semantic changes are often human-perceivable, traffic data is inherently opaque and lacks intuitive visual features. Consequently, different classes of traffic data are compactly distributed and highly entangled; small perturbations in the byte-level input space can lead to disproportionately large semantic changes that are not easily detectable by human inspection. In such a scenario, the probability of label mismatch between clean and adversarial examples increases. Let x be the image representation of a network flow (or packet) and x′ = x + δ be its adversarial example. In standard adversarial training, each sample is annotated with a hard label y, while the underlying ground-truth semantics are better represented by a softer distribution P(Y | x′), especially for adversarial examples lying close to the decision boundary. This inherent discrepancy between the hard label and the true soft distribution can be regarded as label noise. Under adversarial perturbations, such mismatches are further amplified, leading to a higher effective label noise rate, which we define as

p_e(D′) = (1/N) Σ_{i=1}^N 𝟙[ y_i ≠ arg max P(Y | x′_i) ], (12)

where D′ = {(x′_i, y_i)} denotes the adversarial training set and P(Y | x′_i) reflects the (unknown) ground-truth label distribution of the perturbed input. Such excessive label noise disrupts supervised learning, preventing the model from accurately learning the underlying discriminative features of the data. As a result, the classifier may overfit to incorrect labels or adversarial patterns rather than the true class semantics. This issue is particularly critical in adversarial training for traffic classification, where decision boundaries between classes are inherently subtle and highly sensitive to small perturbations.

4.2. Impact of label noise on decision boundary robustness

Adversarial training assumes that the label of an adversarial example remains unchanged from its clean example. However, when an adversarial example crosses the decision boundary into a region semantically aligned with a different class, assigning it the original label introduces semantic inconsistency. We formalize this effect in a binary classification setting. Let the input space be X ⊂ ℝ^d and the label space be Y = {A, B}. Consider a classifier f_θ: X → [0, 1], where f_θ(x) denotes the predicted probability of class A and 1 − f_θ(x) the probability of class B. The decision boundary is defined by the hypersurface B_θ = {x ∈ X | f_θ(x) = 0.5}. We consider an adversarial example x′ generated from a clean input x of class A, such that x′ lies in the classification region of class B, i.e., f_θ(x′) < 0.5. During adversarial training, if x′ is labeled as A (i.e., the same as x), then minimizing the loss on x′ pushes the decision boundary toward class B, potentially degrading the robustness of that class.

Definition 1 (Margin Distance). Given an example x ∈ X and a classifier f: X → [0, 1], the margin distance from x to the decision boundary B = {x′ ∈ X | f(x′) = 0.5} is defined as:

dist(x, B) = min_{x′ ∈ B} ‖x′ − x‖_p. (13)

Theorem 1 (Excessive Boundary Shift Induced by Hard-Label Adversarial Training). Consider a binary classifier f: X → [0, 1], with the pre-training decision boundary defined as:

B_pre = {x ∈ X | f_pre(x) = 0.5}. (14)

Suppose x_A ∈ X_A is a clean example from class A and x′_A = x_A + δ is an adversarial example generated to cross B_pre, i.e., f_pre(x′_A) < 0.5. Let f_post be the classifier obtained via hard-label adversarial training using (x′_A, y_A) as supervision, where y_A = 1. Then, under hard-label supervision, the training objective enforces high-confidence predictions for x′_A, i.e.,

f_post(x′_A) ≫ 0.5, (15)

which necessarily implies that the new decision boundary B_post = {x | f_post(x) = 0.5} must satisfy

dist(x′_A, B_post) = (f_post(x′_A) − 0.5) / ‖∇_x f_post(x′_A)‖_p. (16)
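To make the payload-constrained PGD update of Eqs. (8)-(9) concrete, the sketch below runs the masked iteration against a toy logistic classifier whose input gradient is analytic (so no deep learning framework is required). The classifier, function name, and parameter values are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pgd_payload(x, y, w, b, mask, eps=0.05, alpha=0.01, steps=10):
    """Payload-constrained PGD (Eqs. (8)-(9)) against a toy logistic
    classifier f(x) = sigmoid(w . x + b). The gradient of the binary
    cross-entropy loss w.r.t. the input is (p - y) * w, so no autograd
    is needed. Pixels outside the payload mask M are never modified."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(np.dot(w, x_adv.ravel()) + b)))
        grad = ((p - y) * w).reshape(x.shape)
        # ascent step, zeroed outside the payload region (Eq. (8))
        x_adv = x_adv + alpha * np.sign(grad) * mask
        # project back into the eps-ball around x (Eq. (9))
        x_adv = np.clip(x_adv, x - eps, x + eps)
        # keep pixels in [0, 1]; restore padding pixels exactly
        x_adv = np.clip(x_adv, 0.0, 1.0) * mask + x * (1.0 - mask)
    return x_adv
```

With a mask covering only the first two image rows (i.e., a 56-byte payload), the returned example differs from the clean input only inside the payload region and stays within the ε-ball, mirroring the constraint set S_i of Eq. (11).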
In typical cases where f_post(x′_A) → 1, the post-training boundary moves far beyond x′_A in the direction of class B. As a result, many nearby class-B examples x_B ∈ X_B satisfying x_B ≈ x′_A may fall onto the wrong side of the decision boundary, resulting in increased misclassification. The detailed proof can be found in the Appendix.

Although Theorem 1 is formulated in a binary classification setting for analytical clarity, the underlying insights naturally extend to multi-class scenarios. In the multi-class case, a classifier defines multiple decision boundaries between classes. Hard-label adversarial training on an adversarial example x′ with true label y forces an increase in the logit margin:

z_y(x′) − z_k(x′), ∀k ≠ y, (17)

which effectively pushes the decision boundaries of all other classes away from x′. When x′ lies near the intersection of multiple class regions, this aggressive supervision disproportionately expands the region of class y at the expense of compressing neighboring class regions, analogous to the boundary distortion shown in the binary case.

Our dynamic label assignment mitigates this issue by relaxing the overconfident supervision for adversarial examples near decision boundaries. Rather than forcing x′ deep into the original decision region, the interpolated target y_mix guides a more appropriate adjustment of the decision boundaries. This calibrated supervision prevents the excessive boundary shift described in Theorem 1, enabling the model to maintain robustness in practical multi-class traffic classification tasks.

5. Dynamic label adversarial training

Motivated by the analysis in Section 4 of how label noise affects the robustness of adversarial training, we propose DLAT (Dynamic Label Adversarial Training), an adversarial training strategy that efficiently improves adversarial robustness using dynamically mixed soft labels.

5.1. Design inspiration

In traditional adversarial training, assigning hard labels to adversarial examples introduces significant label noise, since the true label of an adversarial example may differ from its clean counterpart. This label noise forces the decision boundary to move far away from these examples, as shown in Fig. 1, ultimately leading to degraded model robustness. To address this issue, the first step is to mitigate label noise. According to Theorem 1 and Section 4.1, using soft labels can effectively reduce such label noise, thereby preventing the decision boundary from over-shifting. In binary classification, this corresponds to adjusting the boundary toward the neighborhood of the adversarial examples, which can be achieved by assigning a soft label such as (0.5, 0.5) to guide adversarial training. In multi-class classification, however, it is difficult to determine the soft labels of examples near the decision boundary: the boundary may be the intersection of the decision regions of multiple classes, and uniform soft labels such as (1/|Y|, 1/|Y|, …, 1/|Y|) do not fit the shape of the decision boundary well. A natural solution would be to find examples near the current decision boundary that belong to the same class as the original class of the adversarial example, and use the model's outputs on them as soft labels. However, explicitly detecting the decision boundary via iterative adversarial attacks is computationally expensive. Instead, DLAT capitalizes on the fact that the decision boundary must lie within the space between clean and adversarial examples, using a lightweight interpolation mechanism to approximate the soft labels of boundary-adjacent examples.

Fig. 1. Decision boundary changes: Hard-Label AT vs. Soft-Label DLAT.

5.2. Method design

To accurately estimate the soft labels of examples near the decision boundary, we first need to determine the proximity of the adversarial example to the current decision boundary. When the adversarial example is farther from the decision boundary, the output logits of the clean example are given higher weight in the interpolation, so that the decision boundary is adjusted toward the vicinity of the adversarial example in a timely manner; conversely, when it is close to the boundary, the adversarial example is given higher weight, which prevents the adjusted decision boundary from shifting too far past the adversarial example.

Algorithm 1: Dynamic Label Adversarial Training
 1 Input: Network traffic dataset D; learning rate η; total training epochs T; model architecture f
 2 Initialize model f with parameters θ        // Model initialization
 3 for i ∈ [T] do
 4   foreach batch (X, Y) ∈ D do
 5     X′ ← PGD(f, X, Y)                       // Adversarial example generation
 6     O ← f(X)
 7     O′ ← f(X′)
 8     KL ← Div(O, O′)                         // KL-based distance computation
 9     α ← (tanh(KL) + 1) / 2
10     Y_mix ← (1 − α) · O′ + α · O            // Mixed label construction
11     L_adv ← Div(O′, Y_mix)
12     L_clean ← CE(O, Y)
13     L_total ← L_adv + L_clean
14     θ ← θ − η · ∇_θ L_total                 // Model update
15   end
16 end

Given a clean example x and its adversarial example x′ = x + δ, let f denote the classifier with outputs O = f(x) and O′ = f(x′). Since the mapping between clean examples and hard labels is established early in training, we can use the Kullback-Leibler (KL) divergence to quantify the distance between the adversarial example and the decision boundary:

Div(O, O′) = Σ_i softmax(O_i) log( softmax(O_i) / softmax(O′_i) ). (18)

A higher Div typically indicates larger distortion and label noise. To obtain a stable and responsive mixing factor α ∈ [0, 1], we normalize Div(O, O′) using the tanh function, which provides a smooth and symmetric mapping and naturally bounds the output. Accordingly, we define:

α = ( tanh(Div(O, O′)) + 1 ) / 2. (19)
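The label-construction step of Algorithm 1 (Eqs. (18)-(20)) can be sketched as below. To keep y_mix a valid probability distribution, this sketch mixes the softmax outputs rather than raw logits, which is our simplification; the function names are ours.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def dlat_soft_label(logits_clean, logits_adv):
    """Dynamic soft-label construction (Eqs. (18)-(20)).
    Returns the mixing factor alpha and the mixed soft label y_mix."""
    p = softmax(logits_clean)   # O  = f(x)
    q = softmax(logits_adv)     # O' = f(x')
    kl = float(np.sum(p * np.log(p / q)))   # Eq. (18): Div(O, O')
    alpha = (np.tanh(kl) + 1.0) / 2.0       # Eq. (19): alpha in [0.5, 1)
    y_mix = (1.0 - alpha) * q + alpha * p   # Eq. (20): clean output weighted by alpha
    return alpha, y_mix
```

When the clean and adversarial outputs coincide, Div = 0 and α = 0.5, so y_mix averages the two (identical) distributions; as the adversarial output drifts away, α grows toward 1 and y_mix leans on the clean prediction, matching the weighting rule described in Section 5.2.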
This factor interpolates between O and O′ to form the mixed soft label:

y_mix = (1 − α) · O′ + α · O. (20)

The training objective of DLAT combines two components. The first is a KL divergence loss that aligns the model's prediction on x′ with y_mix to improve robustness:

L_adv = Div(O′, y_mix), (21)

and the second is a cross-entropy loss that allows the model to learn generalizable knowledge and improve clean example classification accuracy:

L_clean = − Σ_i y_i log softmax(O_i). (22)

The overall loss is formulated as:

min_θ max_{δ_i ∈ S_i} [ L_adv(f_θ(x + δ), y_mix) + L_clean(f_θ(x), y) ]. (23)

By dynamically adapting label softness based on Eqs. (18)-(20) and balancing the loss components of Eqs. (21)-(23), DLAT mitigates the excessive boundary shift caused by label noise, enabling models to learn robust decision boundaries for tasks like traffic classification. The pseudo-code for DLAT is presented in Algorithm 1.

6. Experiments

In this section, we perform a wide variety of comprehensive experiments to evaluate the performance of DLAT on both clean and adversarial traffic. These evaluations are carried out on two datasets and compared against four state-of-the-art adversarial training methods from the computer vision field.

6.1. Experiment setup

Datasets. Experiments are performed using the ISCX VPN-nonVPN dataset [29] and the CICIoT2022 dataset [30]. The former includes encrypted and unencrypted traffic, while the latter focuses on IoT-related scenarios with both benign and malicious behaviors. We con-

Table 1. The balanced ISCX-VPN dataset.
Type               | Total number (imbalanced) | Training set number | Test set number
VPN_Chat           | 7946                      | 1500                | 200
VPN_Email          | 596                       | 1500                | 59
VPN_File Transfer  | 1898                      | 1500                | 189
VPN_P2P            | 912                       | 1500                | 91
VPN_Streaming      | 1199                      | 1500                | 119
VPN_VoIP           | 20581                     | 1500                | 200

Table 2. The balanced CICIoT2022 dataset.
Type               | Total number (imbalanced) | Training set number | Test set number
VPN_Chat           | 7946                      | 1500                | 200
VPN_Email          | 596                       | 1500                | 59
VPN_File Transfer  | 1898                      | 1500                | 189
VPN_P2P            | 912                       | 1500                | 91
VPN_Streaming      | 1199                      | 1500                | 119
VPN_VoIP           | 20581                     | 1500                | 200

Table 3. The balanced ISCX-ALL dataset.
Type               | Total number (imbalanced) | Training set number | Test set number
Chat               | 7681                      | 5400                | 600
Email              | 6459                      | 5400                | 600
File Transfer      | 7405                      | 5400                | 600
P2P                | 1849                      | 1652                | 184
Streaming          | 3936                      | 3540                | 393
VoIP               | 19597                     | 5400                | 600
VPN_Chat           | 7946                      | 5400                | 600
VPN_Email          | 596                       | 538                 | 59
VPN_File Transfer  | 1898                      | 1754                | 189
VPN_P2P            | 912                       | 830                 | 91
VPN_Streaming      | 1199                      | 1108                | 119
VPN_VoIP           | 20581                     | 5400                | 600
struct three experimental settings from those datasets. The first, re-
ferred to as ISCX-VPN, includes six categories of encrypted VPN traffic:
Evaluation Metrics. In our experiments, we adopt two primary evalua-
VPN_Chat, VPN_Email, VPN_File Transfer, VPN_P2P, VPN_Streaming,
tion metrics to assess the effectiveness of DLAT: the Robust Classification
and VPN_VoIP. The second setting, named ISCX-ALL, expands the clas-
Accuracy (RCC) and the Clean Sample Accuracy (ACC). ASR measures
sification scope to twelve categories by incorporating six VPN and six
the proportion of adversarial traffic that successfully fools the model,
non-VPN traffic types. The third setting, derived from the CICIoT2022
indicating the robustness of the defense mechanism under adversarial
dataset, defines a six-class classification task encompassing typical
attacks. A lower RCC implies stronger robustness. In contrast, ACC
IoT device states and activities. The categories include: Power, Idle,
evaluates the classification accuracy on clean, unperturbed traffic, re-
Interactions, Scenarios, Active, and Attacks. Since the original datasets
flecting the models predictive performance under normal conditions.
exhibit significant class imbalance, we first split the data into training
A higher ACC indicates better generalization and utility in benign
and testing sets with a 9:1 ratio, and then apply class-wise balancing
settings. We report both metrics to provide a comprehensive assessment
separately within each subset to ensure a relatively balanced class
distribution. The statistics of the balanced datasets are summarized in of the trade-off between robustness and standard accuracy.
Table 1, 2 and 3. Baselines. We compare DLAT to the following representative ad-
Training. We adopt two representative neural network architectures as versarial training baselines, including PGD-AT [17], TRADES [26],
backbone models: PreActResNet [31], DenseNet [32], MobileNet [33], MART [27], and AWP [28]. All baseline methods are implemented
WideResNet [34], and FFNN (Feed-Forward Neural Network) [35]. following their original settings. For TRADES, the trade-off parameter
Both models are trained for 80 epochs using the momentum-based 𝜆 is set to 16, as suggested in the original paper. For AWP, the weight
stochastic gradient descent (MSGD) [36], with a momentum coefficient perturbation step size 𝛾 is set to 0.01. Unlike those training methods,
of 0.9 and a weight decay of 5 × 104 . The initial learning rate is set which still rely on hard labels and thus remain sensitive to mislabeled
to 0.1, and a multi-stage learning rate decay strategy is applied: the data, DLAT explicitly incorporates soft-label supervision, making it
learning rate is reduced by a factor of 10 at the 40th epoch. more robust under label noise.
Attack and defense settings. For adversarial evaluation, we adopt the
6.2. The effectiveness of DLAT
widely used PGD-20 under the 𝓁∞ norm constraint. The perturbation
radius 𝜖 is set to 24255, and the step size 𝛼 is 4255. For generating
adversarial examples used in adversarial training, we employ PGD-10 Clean accuracy assessment. As shown in Table 4, the normal model
under the same 𝓁∞ -bounded perturbation settings. trained without adversarial defenses achieves the highest ACC across
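The ℓ∞-bounded PGD used here (and as the inner maximization of Eq. (23)) can be sketched in a few lines. The sketch below is our illustration on a toy logistic model with an analytic input gradient, not the paper's networks; radius and step size match the stated 24/255 and 4/255.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    # binary cross-entropy of a logistic model f(x) = sigmoid(w.x + b)
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_grad_x(w, b, x, y):
    # analytic gradient of the loss with respect to the input x
    return (sigmoid(w @ x + b) - y) * w

def pgd_linf(w, b, x, y, eps=24 / 255, step=4 / 255, iters=10):
    # l_inf-bounded PGD: ascend the loss, then project back into the eps-ball
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(bce_grad_x(w, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

For a deep network, `bce_grad_x` would be replaced by a backward pass, but the sign-step-and-project loop is identical.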
6.2. The effectiveness of DLAT

Table 4
The clean sample accuracy (ACC) and robust classification accuracy (RCC) of different adversarial training methods across five network architectures (ResNet, DenseNet, MobileNet, WideResNet, FFNN) on the ISCX-VPN, ISCX-ALL and CICIoT2022 datasets (%). For each model, the two columns are ACC and RCC.

ISCX-VPN
Normal   99.02 ± 0.30  0.00 ± 0.00   99.92 ± 0.08  0.67 ± 0.09   99.17 ± 0.00  3.58 ± 0.14   99.75 ± 0.00  0.83 ± 0.07   98.25 ± 0.00  7.67 ± 0.58
PGD-AT   98.72 ± 0.18  96.32 ± 0.29  96.02 ± 0.23  91.00 ± 0.72  97.87 ± 0.25  90.00 ± 2.69  99.35 ± 0.08  96.01 ± 0.11  97.25 ± 0.24  87.00 ± 0.81
TRADES   96.75 ± 0.37  94.62 ± 0.30  92.98 ± 0.29  89.92 ± 0.15  93.18 ± 0.44  85.35 ± 3.38  97.92 ± 0.24  96.03 ± 0.18  92.02 ± 0.41  83.68 ± 0.87
MART     98.08 ± 0.43  94.20 ± 0.59  82.65 ± 0.72  78.90 ± 0.53  80.83 ± 1.76  70.85 ± 1.74  98.51 ± 0.19  92.72 ± 0.17  93.28 ± 0.20  84.58 ± 0.60
AWP      98.18 ± 0.17  96.22 ± 0.17  95.40 ± 0.33  92.92 ± 0.09  93.40 ± 0.42  90.10 ± 0.49  73.82 ± 0.46  72.18 ± 0.54  95.63 ± 0.24  88.32 ± 0.29
DLAT     98.83 ± 0.09  96.53 ± 0.08  98.77 ± 0.26  93.93 ± 0.42  98.20 ± 0.10  93.07 ± 0.47  99.08 ± 0.05  96.38 ± 0.36  96.88 ± 0.17  86.37 ± 0.30

ISCX-ALL
Normal   93.95 ± 4.36  2.04 ± 1.06   96.70 ± 2.11  0.23 ± 0.07   91.52 ± 4.99  3.74 ± 0.12   96.22 ± 1.48  7.23 ± 0.48   88.48 ± 0.27  1.61 ± 0.21
PGD-AT   88.56 ± 0.10  87.34 ± 0.20  82.96 ± 0.26  80.61 ± 0.30  82.19 ± 0.24  78.87 ± 0.73  88.63 ± 0.03  86.12 ± 2.89  83.00 ± 0.34  77.23 ± 0.29
TRADES   88.31 ± 0.13  86.19 ± 0.45  79.19 ± 1.12  73.98 ± 3.39  80.39 ± 0.80  75.26 ± 2.93  87.32 ± 1.41  84.90 ± 2.54  76.47 ± 1.90  71.01 ± 0.75
MART     88.19 ± 0.18  86.33 ± 0.51  77.22 ± 0.19  76.08 ± 0.22  80.78 ± 0.33  77.79 ± 0.31  87.67 ± 0.12  86.10 ± 0.45  75.99 ± 0.64  69.95 ± 1.79
AWP      86.31 ± 0.11  85.44 ± 0.10  78.00 ± 0.19  76.43 ± 0.48  78.83 ± 0.07  77.58 ± 0.16  85.85 ± 0.12  84.71 ± 0.05  81.30 ± 0.21  76.91 ± 0.21
DLAT     89.44 ± 0.32  86.68 ± 0.40  88.83 ± 0.80  82.18 ± 0.43  84.35 ± 0.36  75.84 ± 1.27  88.71 ± 0.02  87.14 ± 0.41  86.79 ± 0.26  74.32 ± 0.81

CICIoT2022
Normal   99.82 ± 0.32  0.04 ± 0.01   99.73 ± 0.01  0.63 ± 0.02   98.50 ± 2.59  0.00 ± 0.00   99.99 ± 0.00  0.56 ± 0.01   99.67 ± 0.06  0.12 ± 0.06
PGD-AT   99.27 ± 0.08  96.26 ± 3.18  98.20 ± 0.02  96.86 ± 0.44  98.20 ± 0.79  97.65 ± 0.47  99.46 ± 0.21  93.73 ± 0.46  83.32 ± 2.40  81.36 ± 2.58
TRADES   98.35 ± 0.82  98.90 ± 0.57  98.04 ± 0.00  97.81 ± 1.36  98.05 ± 0.31  91.38 ± 0.74  98.06 ± 0.02  97.62 ± 0.19  96.84 ± 0.11  89.20 ± 0.27
MART     98.19 ± 0.02  96.37 ± 2.27  98.05 ± 0.31  95.50 ± 0.50  98.06 ± 0.28  95.20 ± 0.40  99.00 ± 0.05  97.00 ± 0.10  98.20 ± 0.20  91.28 ± 1.50
AWP      98.25 ± 0.10  96.50 ± 0.20  98.10 ± 0.15  96.00 ± 0.25  98.15 ± 0.12  95.50 ± 0.30  99.10 ± 0.05  98.00 ± 0.10  98.00 ± 0.15  90.10 ± 0.50
DLAT     99.70 ± 0.02  99.20 ± 0.12  98.89 ± 0.17  97.12 ± 0.24  98.06 ± 0.28  97.88 ± 0.14  99.66 ± 0.02  98.99 ± 0.11  98.87 ± 0.09  91.93 ± 0.86

Fig. 2. The robust classification accuracy (RCC) of DLAT under ℓ1 and ℓ2 norm-bounded PGD-20 attacks on the ISCX-VPN, ISCX-ALL and CICIoT2022 datasets.

Clean accuracy assessment. As shown in Table 4, the normal model trained without adversarial defenses achieves the highest ACC across all architectures, ranging from 98.25% to 99.92% on ISCX-VPN, from 88.48% to 96.70% on ISCX-ALL, and from 98.50% to 99.99% on CICIoT2022. However, it fails completely under adversarial attacks, with robust classification accuracy (RCC) close to zero. In the table, boldface highlights the best performance for each metric, while underlining indicates the second-best. Compared to the normal model, adversarial training methods such as PGD-AT, TRADES, and MART significantly improve robustness, albeit at the cost of decreased clean accuracy. Specifically, PGD-AT maintains relatively higher ACC (e.g., 98.72% on ResNet for ISCX-VPN and 88.56% on ISCX-ALL), while TRADES and MART show larger reductions in ACC on clean examples. Our method, DLAT, consistently achieves competitive ACC, reaching up to 98.83% on ResNet and 89.44% on ISCX-ALL, surpassing all baselines on ISCX-ALL and maintaining top-tier accuracy on ISCX-VPN and CICIoT2022. These results demonstrate that DLAT effectively enhances robustness with minimal compromise to clean performance.

Robust accuracy assessment. We first evaluate the RCC of various adversarial training methods under adversarial attacks. As shown in Table 4, adversarial training markedly improves RCC compared with the normal model, which exhibits near-zero robustness. Among the compared methods, DLAT consistently surpasses most baselines in the majority of cases across both datasets and network architectures. Specifically, on ISCX-VPN, DLAT attains RCC scores above 86% across all architectures, notably outperforming PGD-AT, TRADES, MART, and AWP, with top results exceeding 96% on ResNet and WideResNet. Similarly, on ISCX-ALL and CICIoT2022, it maintains leading robustness, achieving up to 87.14% and 98.99% RCC on WideResNet and surpassing competing methods by a clear margin. These findings underscore the superior robustness of DLAT while retaining competitive clean accuracy.

Secondly, to further assess the robustness of DLAT against unseen adversarial threats, we evaluate its robustness under a diverse set of attack methods, including adversarial perturbations constrained by different norm bounds (i.e., ℓ1 and ℓ2 norms) as well as FGSM [15], PGD-100 [17], and AutoAttack [37]. We first report the performance of DLAT under ℓ1- and ℓ2-bounded PGD-20 attacks on the ISCX-VPN, ISCX-ALL, and CICIoT2022 datasets, as illustrated in Fig. 2. Each heatmap visualizes the RCC achieved by five different models under increasing perturbation radii. It can be observed that DLAT exhibits strong robustness under both ℓ1- and ℓ2-bounded PGD-20 attacks. Notably, the defense is more effective against ℓ1-norm perturbations, as indicated by the overall darker color tones in the corresponding heatmaps. This suggests that DLAT better preserves classification performance when facing sparse but high-magnitude perturbations. Among the evaluated models, ResNet and DenseNet generally exhibit higher RCC scores across both norm types and datasets, with RCC remaining above 0.8 under moderate ℓ1 perturbations (e.g., ε =
Fig. 3. The RCC of DLAT under FGSM, PGD-100, and AutoAttack on the ISCX-VPN, ISCX-ALL, and CICIoT2022 datasets.

Fig. 4. The robust classification accuracy (RCC) of various models across classes on ISCX-ALL under increasing adversarial perturbation radii.

1140/255). In contrast, MobileNet and DenseNet show relatively lower robustness, particularly under ℓ2-bounded attacks, where RCC values gradually decrease below 0.6 as the perturbation radius increases. Nonetheless, the performance degradation across all models is smooth rather than abrupt, suggesting that DLAT retains a degree of robustness and stability.

As shown in Fig. 3, we further assess the performance of DLAT under three previously unseen adversarial attacks: FGSM, PGD-100, and AutoAttack. Under FGSM, all evaluated models exhibit strong robustness, with RCC values typically exceeding 0.85 below ε = 24/255, and models such as ResNet and WideResNet experiencing only marginal performance degradation. As the perturbation strength increases under PGD-100, the RCC gradually decreases across all models. Nonetheless, most models achieve RCCs above 0.5 at ε = 32/255 on the ISCX-VPN dataset, indicating a moderate level of robustness. AutoAttack presents the most challenging scenario, leading to a more pronounced decline in performance, particularly when ε exceeds 24/255. Despite this, architectures such as ResNet and WideResNet continue to maintain RCC above 0.5 at ε = 32/255, suggesting that DLAT remains effective even under adaptive and high-strength adversarial attacks. These results collectively demonstrate the generalization capability of the framework across a broad range of attacks and perturbation intensities.

We thirdly evaluate the robustness of DLAT under varying attack intensities, where the attack intensity corresponds to the radius of the adversarial perturbation (denoted by Epsilon ε). As comprehensively illustrated in Fig. 4, we present the RCC performance for each individual class within the ISCX-ALL dataset (including Chat, Email, File Transfer, P2P, Streaming, VoIP, VPN_Chat, VPN_Email, VPN_File Transfer, VPN_P2P, VPN_Streaming, and VPN_VoIP) across multiple network architectures (ResNet, DenseNet, MobileNet, WideResNet, FFNN) under increasing perturbation radii (ε ranging from 0 to 56/255). The adversarial training of DLAT is performed using adversarial examples
generated with a perturbation radius of ε = 24/255. As shown in Fig. 4, across most classes and architectures, the trained models demonstrate strong robustness when the attack intensity remains within or below this radius (ε ≤ 24/255), and the models still maintain relatively strong resilience to perturbations in the range 24/255 < ε < 32/255. However, once ε exceeds 32/255, the attack becomes significantly stronger, leading to a noticeable drop in RCC, especially for non-VPN classes.

6.3. The efficiency of DLAT

To evaluate the training efficiency of DLAT, we compare its convergence with that of representative adversarial training baselines, including AT, TRADES, MART, and AWP. As illustrated in Fig. 5, DLAT demonstrates significantly faster convergence in both accuracy and loss. Specifically, in the accuracy curve (Fig. 5(a)), DLAT rapidly improves during the initial training epochs, reaching a stable accuracy above 0.85 within 30 epochs. In contrast, competing methods exhibit slower convergence and lower final performance, with TRADES and MART stabilizing below 0.80. Similarly, the loss curve (Fig. 5(b)) further highlights the advantage of DLAT in optimization stability. It consistently maintains a lower loss value throughout training and converges to a final loss below 0.3, which is noticeably lower than those of the other methods. These results collectively demonstrate that DLAT not only accelerates the convergence process but also facilitates optimization toward better minima, indicating its efficiency and practicality for robust model training.

Fig. 5. Comparison of accuracy and loss convergence results for DenseNet on the ISCX-ALL dataset: (a) accuracy curve; (b) loss curve.

In addition to its fast convergence, DLAT maintains comparable training time per epoch to other adversarial training methods, as reported in Table 5. Across different model architectures and datasets, the time cost of DLAT remains close to that of AT, TRADES, MART, and AWP. By achieving improved robustness and faster convergence without sacrificing efficiency, DLAT offers a practical solution for robust network traffic classification.

Table 5
Comparison of the time consumption for each epoch of the adversarial training methods (s).
Dataset     Model       AT      TRADES  MART    AWP     DLAT
ISCX-VPN    ResNet      16.99   17.98   19.38   19.19   19.07
            DenseNet    12.59   14.02   14.52   15.84   14.28
            MobileNet   26.14   28.55   28.14   30.83   27.98
            WideResNet  139.62  136.84  147.27  140.37  152.07
            FFNN        4.02    3.85    3.94    4.36    4.41
ISCX-ALL    ResNet      74.32   80.69   84.49   89.11   81.57
            DenseNet    57.64   60.83   63.62   66.78   62.95
            MobileNet   113.71  114.23  130.42  129.99  117.19
            WideResNet  673.35  621.27  688.85  688.37  762.18
            FFNN        16.43   15.03   17.86   17.62   16.31
CICIoT2022  ResNet      47.35   48.92   51.19   51.32   49.63
            DenseNet    61.02   63.11   66.68   68.92   64.90
            MobileNet   121.56  122.91  132.23  135.13  124.87
            WideResNet  680.37  690.82  703.16  710.55  695.09
            FFNN        18.06   19.42   18.98   19.56   20.43

7. Conclusion

In this paper, we investigated the vulnerability of deep traffic classifiers to adversarial examples and the label noise introduced by hard-label supervision in adversarial training. To address this issue, we proposed DLAT, a dynamic adversarial training framework that assigns soft labels to adversarial examples based on the similarity between clean and perturbed outputs. This similarity-guided interpolation helps mitigate label noise and align the decision boundary more effectively. Experimental results on traffic classification benchmarks demonstrate that DLAT consistently improves robustness and generalization over standard adversarial training.

CRediT authorship contribution statement

Haoyu Tong: Writing – original draft. Meixia Miao: Methodology, Formal analysis, Project administration. Yundong Liu: Data curation. Xiaoyu Zhang: Writing – original draft, Supervision. Xiangyang Luo: Resources, Funding acquisition. Willy Susilo: Visualization, Validation, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is funded by the Open Foundation of Key Laboratory of Cyberspace Security, Ministry of Education of China and Henan Key Laboratory of Cyberspace Situation Awareness (No. KLCS20240103), National Natural Science Foundation of China (No. 62472345), and Fundamental Research Funds for the Central Universities, China (No. QTZX25088).
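Theorem 1 (proved in the Appendix that follows) hinges on the linearized boundary distance |f(x) − 0.5| / ‖∇f(x)‖. This can be sanity-checked numerically: for a logistic model (our assumption for this sketch, not the paper's classifier) the decision boundary is exactly the hyperplane w·x + b = 0, so the first-order estimate can be compared against the true distance.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linearized_boundary_dist(w, b, x):
    # Theorem 1's estimate: |f(x) - 0.5| / ||grad f(x)||_2,
    # where grad f(x) = f(x) (1 - f(x)) w for a logistic model
    f = sigmoid(w @ x + b)
    grad = f * (1.0 - f) * w
    return abs(f - 0.5) / np.linalg.norm(grad)

def exact_boundary_dist(w, b, x):
    # the logistic decision boundary is the hyperplane w.x + b = 0
    return abs(w @ x + b) / np.linalg.norm(w)
```

Near the boundary the two agree closely; as the model's confidence on the point grows, the estimate grows even faster than the exact distance, consistent with the theorem's message that forcing high confidence on adversarial examples pushes the boundary far past them.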
Data availability

Data will be made available on request.

Appendix. The proof of Theorem 1

Theorem 1 (Excessive Boundary Shift Induced by Hard-Label Adversarial Training). Consider a binary classifier f : X → [0, 1], with the pre-training decision boundary defined as:

    B_pre = {x ∈ X : f_pre(x) = 0.5}.

Suppose x_A ∈ X_A is a clean example from class A and x′_A = x_A + δ is an adversarial example generated to cross B_pre, i.e., f_pre(x′_A) < 0.5. Let f_post be the classifier obtained via hard-label adversarial training using (x′_A, y_A) as supervision, where y_A = 1. Then, under hard-label supervision, the training objective enforces high-confidence predictions for x′_A, i.e.,

    f_post(x′_A) ≫ 0.5,

which necessarily implies that the new decision boundary B_post = {x : f_post(x) = 0.5} must satisfy

    dist(x′_A, B_post) = ( f_post(x′_A) − 0.5 ) / ‖∇_x f_post(x′_A)‖_p.

Proof. Let x_A ∈ X_A be a clean example correctly classified as class A, and let x′_A = x_A + δ be its adversarial variant generated to cross the original decision boundary B_pre, i.e.,

    f_pre(x′_A) < 0.5.

Hard-label adversarial training uses the tuple (x′_A, y_A = 1) as supervised data, forcing the model f_post to assign high confidence to x′_A:

    f_post(x′_A) → 1.

Now, consider the new decision boundary:

    B_post = {x : f_post(x) = 0.5}.

We approximate f_post in a neighborhood of x′_A using a first-order Taylor expansion:

    f_post(x) ≈ f_post(x′_A) + ∇_x f_post(x′_A)⊤ (x − x′_A).

Let x* ∈ B_post denote the closest point on the new boundary to x′_A. By definition,

    f_post(x*) = 0.5.

Using the linear approximation, we have:

    0.5 ≈ f_post(x′_A) + ∇_x f_post(x′_A)⊤ (x* − x′_A).

Solving for the shift vector:

    ∇_x f_post(x′_A)⊤ (x* − x′_A) ≈ 0.5 − f_post(x′_A).

Let v = ∇_x f_post(x′_A) / ‖∇_x f_post(x′_A)‖_p be the normalized gradient (i.e., the local normal direction to the decision boundary). Then the minimal distance from x′_A to the boundary is:

    ‖x* − x′_A‖_p = | f_post(x′_A) − 0.5 | / ‖∇_x f_post(x′_A)‖_p.

As f_post(x′_A) → 1, this implies:

    dist(x′_A, B_post) → 0.5 / ‖∇_x f_post(x′_A)‖_p.

This lower bound quantifies how far the decision boundary must move beyond x′_A to satisfy f_post(x′_A) = 1. Provided that ‖∇_x f_post(x′_A)‖_p is not excessively large, this distance is significant. Finally, since x′_A was crafted to lie just beyond B_pre, i.e., in close proximity to the original boundary, the boundary movement beyond x′_A implies that the new decision boundary has crossed deep into the region previously occupied by class B. Therefore, class-B examples in the vicinity of x′_A are likely to be misclassified as class A under f_post. □

References

[1] A. Azab, M. Khasawneh, S. Alrabaee, K.-K.R. Choo, M. Sarsour, Network traffic classification: Techniques, datasets, and challenges, Digit. Commun. Netw. 10 (3) (2024) 676–692.
[2] H. Yuan, G. Li, A survey of traffic prediction: from spatio-temporal data to intelligent transportation, Data Sci. Eng. 6 (1) (2021) 63–85.
[3] A.W. Moore, K. Papagiannaki, Toward the accurate identification of network applications, in: International Workshop on Passive and Active Network Measurement, Springer, 2005, pp. 41–54.
[4] A. Madhukar, C. Williamson, A longitudinal study of P2P traffic classification, in: 14th IEEE International Symposium on Modeling, Analysis, and Simulation, IEEE, 2006, pp. 179–188.
[5] S. Fernandes, R. Antonello, T. Lacerda, A. Santos, D. Sadok, T. Westholm, Slimming down deep packet inspection systems, in: IEEE INFOCOM Workshops 2009, IEEE, 2009, pp. 1–6.
[6] N. Hubballi, M. Swarnkar, M. Conti, BitProb: Probabilistic bit signatures for accurate application identification, IEEE Trans. Netw. Serv. Manag. 17 (3) (2020) 1730–1741, http://dx.doi.org/10.1109/TNSM.2020.2999856.
[7] A. Azab, P. Watters, R. Layton, Characterising network traffic for skype forensics, in: 2012 Third Cybercrime and Trustworthy Computing Workshop, 2012, pp. 19–27, http://dx.doi.org/10.1109/CTC.2012.14.
[8] H. Mohajeri Moghaddam, Skypemorph: Protocol Obfuscation for Censorship Resistance, University of Waterloo, 2013.
[9] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[10] M. Lotfollahi, M.J. Siavoshani, R.S.H. Zade, M. Saberian, Deep packet: a novel approach for encrypted traffic classification using deep learning, Soft Comput. 24 (2017) 1999–2012, URL https://api.semanticscholar.org/CorpusID:35187639.
[11] L. Yang, A. Finamore, F. Jun, D. Rossi, Deep learning and traffic classification: Lessons learned from a commercial-grade dataset with hundreds of encrypted and zero-day applications, 2021, arXiv preprint arXiv:2104.03182.
[12] M.H. Pathmaperuma, Y. Rahulamathavan, S. Dogan, A.M. Kondoz, Deep learning for encrypted traffic classification and unknown data detection, Sensors 22 (19) (2022) 7643.
[13] X. Lin, G. Xiong, G. Gou, Z. Li, J. Shi, J. Yu, ET-BERT: A contextualized datagram representation with pre-training transformers for encrypted traffic classification, in: Proceedings of the ACM Web Conference 2022, 2022, pp. 633–642.
[14] X. Ma, W. Zhu, J. Wei, Y. Jin, D. Gu, R. Wang, EETC: An extended encrypted traffic classification algorithm based on variant resnet network, Comput. Secur. 128 (2023) 103175.
[15] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: International Conference on Learning Representations, ICLR, 2014.
[16] A.M. Sadeghzadeh, S. Shiravi, R. Jalili, Adversarial network traffic: Towards evaluating the robustness of deep-learning-based network traffic classification, IEEE Trans. Netw. Serv. Manag. 18 (2) (2021) 1962–1976.
[17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning models resistant to adversarial attacks, in: International Conference on Learning Representations, ICLR, 2018.
[18] C. Dong, L. Liu, J. Shang, Label noise in adversarial training: A novel perspective to study robust overfitting, Adv. Neural Inf. Process. Syst. 35 (2022) 17556–17567.
[19] W. Wang, M. Zhu, J. Wang, X. Zeng, Z. Yang, End-to-end encrypted traffic classification with one-dimensional convolution neural networks, in: 2017 IEEE International Conference on Intelligence and Security Informatics, ISI, IEEE, 2017, pp. 43–48.
[20] J. Lan, X. Liu, B. Li, Y. Li, T. Geng, DarknetSec: A novel self-attentive deep learning method for darknet traffic classification and application identification, Comput. Secur. 116 (2022) 102663.
[21] K. Fauvel, F. Chen, D. Rossi, A lightweight, efficient and explainable-by-design convolutional neural network for internet traffic classification, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 4013–4023.
[22] Z. Liu, Y. Xie, Y. Luo, Y. Wang, X. Ji, TransECA-Net: A transformer-based model for encrypted traffic classification, Appl. Sci. 15 (6) (2025) 2977.
[23] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, 2013, arXiv:1312.6199.
[24] A. Kurakin, I.J. Goodfellow, S. Bengio, Adversarial examples in the physical world, in: Artificial Intelligence Safety and Security, Chapman and Hall/CRC, 2018, pp. 99–112.
[25] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: 2017 IEEE Symposium on Security and Privacy, S&P, IEEE, 2017, pp. 39–57.
[26] H. Zhang, Y. Yu, J. Jiao, E. Xing, L. El Ghaoui, M. Jordan, Theoretically principled trade-off between robustness and accuracy, in: International Conference on Machine Learning, PMLR, 2019, pp. 7472–7482.
[27] Y. Wang, D. Zou, J. Yi, J. Bailey, X. Ma, Q. Gu, Improving adversarial robustness requires revisiting misclassified examples, in: International Conference on Learning Representations, ICLR, 2019.
[28] D. Wu, S.-T. Xia, Y. Wang, Adversarial weight perturbation helps robust generalization, Adv. Neural Inf. Process. Syst. 33 (2020) 2958–2969.
[29] G.D. Gil, A.H. Lashkari, M. Mamun, A.A. Ghorbani, Characterization of encrypted and VPN traffic using time-related features, in: Proceedings of the 2nd International Conference on Information Systems Security and Privacy, ICISSP 2016, SciTePress, Setúbal, Portugal, 2016, pp. 407–414.
[30] S. Dadkhah, H. Mahdikhani, P.K. Danso, A. Zohourian, K.A. Truong, A.A. Ghorbani, Towards the development of a realistic multidimensional IoT profiling dataset, in: 2022 19th Annual International Conference on Privacy, Security & Trust, PST, IEEE, 2022, pp. 1–11.
[31] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV, Springer, 2016, pp. 630–645.
[32] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[33] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, 2017, arXiv preprint arXiv:1704.04861.
[34] S. Zagoruyko, N. Komodakis, Wide residual networks, 2016, arXiv preprint arXiv:1605.07146.
[35] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536.
[36] N. Qian, On the momentum term in gradient descent learning algorithms, Neural Netw. 12 (1) (1999) 145–151.
[37] F. Croce, M. Hein, Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks, in: ICML, 2020.


@@ -0,0 +1,946 @@
Computer Standards & Interfaces 97 (2026) 104121
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
Sharing as You Desire: A fuzzy certificateless proxy re-encryption scheme for
efficient and privacy-preserving cloud data sharing
Jiasheng Chen a, Zhenfu Cao a,∗, Liangliang Wang b,c, Jiachen Shen a, Xiaolei Dong a
a East China Normal University, Software Engineering Institute, Shanghai Collaborative Innovation Center of Trusted Industry Internet Software, Shanghai, 200062, China
b Shanghai University of Electric Power, Faculty of Artificial Intelligence, Shanghai, 201306, China
c Police Integration Computing Key Laboratory of Sichuan Province, Luzhou, 646000, China
ARTICLE INFO

Keywords:
Cloud security
Proxy re-encryption
Certificateless cryptography
Conditional privacy

ABSTRACT

A secure sharing mechanism in the cloud environment not only needs to realize efficient ciphertext storage for resource-constrained clients, but also needs to build a trusted data sharing system. Aiming at the limitations of existing schemes in terms of user identity privacy protection, insufficient access control granularity, and data sharing security, we propose a fuzzy certificateless proxy re-encryption (FCL-PRE) scheme. In order to achieve much better fine-grained delegation and effective conditional privacy, our scheme regards the conditions as an attribute set associated with pseudo-identities, and re-encryption can be performed if and only if the overlap distance of the sender's and receiver's attribute sets meets a specific threshold. Moreover, the FCL-PRE scheme ensures anonymity, preventing the exposure of users' real identities through ciphertexts containing identity information during transmission. In the random oracle model, FCL-PRE not only guarantees confidentiality, anonymity, and collusion resistance but also leverages the fuzziness of re-encryption to provide a certain level of error tolerance in the cloud-sharing architecture. Experimental results indicate that, compared to other existing schemes, FCL-PRE offers up to a 44.6% increase in decryption efficiency while maintaining the lowest overall computational overhead.
1. Introduction

As information technology and the Internet continue to evolve, users can now access networks anytime and anywhere through mobile devices, driving the widespread adoption of cloud services. By leveraging flexible resource scheduling and high network accessibility, cloud computing has attracted enterprises such as Amazon, Google, and Alibaba to introduce cloud-based data storage, access, and sharing services [1–3]. However, cloud service providers are not always completely trustworthy. Due to factors such as technical limitations or economic incentives, they may engage in practices that could compromise users' rights. In recent years, data breaches have occurred frequently: in 2018, Tesla's Kubernetes console on AWS was left unsecured, allowing attackers to exploit the cloud environment; in 2019, Capital One faced misconfigurations on AWS, enabling hackers to gain unauthorized access and disclose more than 100 million user records. Evidently, although outsourcing data to the cloud can reduce the burden of hardware maintenance, it also deprives users of direct control over their data, thereby increasing the risk of potential privacy breaches.

In response to the demand for secure cloud data sharing, the proxy re-encryption (PRE) [4] scheme was proposed. This technology not only allows data to be stored on the cloud server but also capitalizes on the cloud's computing capabilities to securely achieve decryption authorization, as illustrated in Fig. 1. In a typical PRE scheme, a key generation center (KGC) is responsible for generating the system's public parameters and issuing public–private key pairs for registered users based on the master secret key. Generally, the data sender encrypts information with their own ID (e.g., e-mail account or phone number) and produces the re-encryption key for authorized users, which is stored on the cloud server alongside the ciphertext. Only the authorized recipient can instruct the cloud server to perform ciphertext transformation using the re-encryption key, thereby achieving secure data sharing. However, despite simplifying certificate management, traditional identity-based proxy re-encryption (IB-PRE [5]) still suffers from several limitations: (1) it relies on the KGC for key escrow, meaning that if the KGC is compromised or acts maliciously, users' private keys are at serious risk of exposure; (2) it lacks flexible dynamic authorization, such that even

∗ Corresponding author.
E-mail addresses: jschen@stu.ecnu.edu.cn (J. Chen), zfcao@sei.ecnu.edu.cn (Z. Cao), llwang@shiep.edu.cn (L. Wang), jcshen@sei.ecnu.edu.cn (J. Shen), dongxiaolei@sei.ecnu.edu.cn (X. Dong).
https://doi.org/10.1016/j.csi.2025.104121
Received 30 June 2025; Received in revised form 23 November 2025; Accepted 21 December 2025
Available online 23 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
J. Chen et al. Computer Standards & Interfaces 97 (2026) 104121
Fig. 1. Data sharing based on proxy re-encryption.

minor changes in a user's identity information require the regeneration of private keys, thus increasing administrative overhead and system complexity; and (3) it struggles to satisfy the requirements of high-privacy scenarios. For instance, in mobile healthcare, patients' private information may be directly used as public keys for encryption [6-8]. Once an attacker traces such identifiers to a patient's real identity, a severe privacy breach can result, endangering the patient's information security.

To address the challenges of insufficient anonymity, key escrow, and difficulty in dynamic privilege adjustment, we propose an anonymous fuzzy certificateless proxy re-encryption scheme (FCL-PRE). Our scheme not only supports identity hiding and fuzzy matching, but also effectively prevents unauthorized access and significantly improves system error tolerance. The main contributions of FCL-PRE are as follows.

• Fuzzy certificateless PRE with conditional privacy. A new fuzzy certificateless proxy re-encryption scheme that is tolerant to noisy biometric measurements is proposed. Specifically, the trusted authority first derives a stable, unique biometric identity 𝑈𝐼𝐷 from noisy biometric samples, and then generates a pseudo-identity with a specific set of attributes 𝜔 = (𝜔1, …, 𝜔𝑛) for it. Re-encryption is allowed only when the overlap between the sender's and receiver's attribute sets satisfies a threshold condition, that is, |𝜔 ∩ 𝜔′| ≥ 𝑑. This policy enforces conditional privacy on top of pseudo-identities, simplifies key management in the certificateless setting, and enables flexible and efficient data sharing among users with similar attributes.
• Anonymous data sharing via pseudonyms. The proposed scheme enhances conditional privacy and reduces the cost of managing pseudonyms by tightly binding biometrics, pseudo-identities, and strong keys. The trusted authority internally maintains a mapping (𝑈𝐼𝐷, 𝑃𝑈𝐼𝐷, 𝜔), where 𝜔 is associated with 𝑃𝑈𝐼𝐷. Thus, the privacy-preserving pseudo-identity can only be recovered by the fully trusted authority. Meanwhile, a user can encrypt and share data on behalf of an attribute group using a single 𝑃𝑈𝐼𝐷, rather than maintaining many separate pseudonyms, thus significantly reducing the key management overhead on the user side.
• Security and practicality. We provide a detailed security proof of FCL-PRE in the random oracle model, demonstrating that it satisfies chosen-plaintext attack (IND-CPA) security. Theoretical analysis and experimental results show that FCL-PRE not only achieves anonymity, error tolerance, and resistance to collusion attacks, but also has minimal computational overhead in the decryption phase.

2. Related work

(1) Basic PRE schemes: In 1998, Blaze et al. [4] first introduced the notion of proxy re-encryption (PRE), which enables a semi-honest proxy to transform ciphertexts without accessing the underlying decryption keys. Subsequent early works primarily examined how to delegate decryption capabilities securely and efficiently so as to support data sharing and access control in cloud environments [9-11]. As research progressed, the limitations of the original PRE model gradually became evident. For example, a malicious user may collude with the proxy to recover the sender's private key. Ateniese et al. [12] later presented a unidirectional PRE scheme that offers a certain level of resistance against collusion attacks, although it still depends on a public key infrastructure (PKI) for certificate management. Gentry [13] addressed the burden imposed by PKI by introducing the paradigm of certificate-based cryptography, thereby eliminating the need for online third-party certificate queries. Sur et al. [14] further applied this paradigm by designing a certificate-based encryption scheme. They were the first to combine it with proxy re-encryption, and thus proposed a certificate-based proxy re-encryption (CB-PRE) scheme that achieves chosen-ciphertext (IND-CCA) security in the random oracle model. On the other hand, to further simplify the public key infrastructure, Green and Ateniese [5] extended PRE to identity-based scenarios, significantly reducing certificate management overhead by replacing traditional public keys with user identifiers and achieving adaptive CCA security. In this context, Ge et al. [15] designed an identity-based broadcast PRE (BPRE) scheme that supports revocation of a shared user set and can resist chosen-plaintext attacks, while Zhang et al. [16] employed bilinear pairings to construct an identity-based BPRE scheme for VANETs that achieves CPA security with constant decryption overhead.

(2) Conditional PRE schemes: Once the basic transformation capability of PRE had been established, researchers began to enrich PRE with more expressive access control and privacy guarantees. In traditional PRE systems, once the proxy obtains a re-encryption key, it can often convert all ciphertexts of the delegator for the designated delegatee, which is incompatible with fine-grained authorization requirements. To address this issue, Weng et al. [19] first proposed conditional proxy re-encryption (CPRE). In their construction, a condition expression is embedded into the re-encryption key, so that the proxy is only able to transform ciphertexts that satisfy the specified condition, which enforces strict control over the proxy's capability at the semantic level. At the same time, Ateniese et al. [22] presented a PRE scheme with key privacy. Even if an adversary obtains a re-encryption key, it cannot distinguish the delegatee's identity, which further protects the receiver's privacy. Shao et al. [18] achieved key privacy while preserving CCA security. Li et al. [17] incorporated the idea of conditional PRE into certificate-based cryptography. Their scheme allows only ciphertexts associated with specific subsets to be transformed and forwarded to designated delegatees, and also attains CCA security. In order to support more expressive access structures, Yao et al. [21] designed a CPRE scheme with ciphertext evolution, which ensures that the delegation process remains under the data owner's control. Li et al. [20] proposed a CPRE scheme that supports only a single receiver. Lin et al. [30] developed a CPRE scheme tailored for IoT scenarios, which supports revocation of misbehaving users without relying on a fully trusted third party. Zhang et al. [31] designed a key-sharing mechanism based on CPRE and combined it with a bilinear accumulator to verify the integrity of homomorphic encryption keys stored in the cloud. Chen et al. [25] constructed a conditional BPRE scheme based on bilinear pairings under conditional constraints.

(3) Certificateless-based PRE schemes: Due to the inherent key escrow problem in identity-based cryptography, Sur et al. [32] introduced PRE into the certificateless public key setting [33], and then proposed the concept of certificateless proxy re-encryption (CL-PRE). In CL-PRE, each user's private key is split into a partial private key generated by a key generation center (KGC) and a user-chosen secret value. This design avoids full key escrow by the KGC and does not require traditional certificate management, which makes CL-PRE particularly suitable for resource-constrained environments. Within this framework, Bhatia et al. [34] constructed a lightweight pairing-free CL-PRE scheme and applied it to mobile healthcare scenarios. Eltayieb et al. [35] further adopted blockchain as the proxy to execute the re-encryption
Table 1
Summary of functional comparison with other schemes.
Schemes Techniques Conditional privacy Fuzzy matching Anonymity Multiple receivers Collusion resistance
[13,14,17] CB-PRE × × × ×
[18] CPRE ✓ × ✓ ✓ ×
[15,16] IB-PRE × × × ✓ ✓
[19,20] CPRE ✓ × × × ×
[21] IB-CPRE ✓ × × ✓ ✓
[22] CPRE ✓ ×× ×
[23,24] CL-PRE × × × ✓ ✓
[25] IB-CPRE ✓ × ✓ ✓ ✓
[26,27] Fuzzy IB-CPRE ✓ ✓ ××
[28,29] CL-CPRE ✓ × × ✓ ✓
Ours Fuzzy CL-CPRE ✓ ✓ ✓ ✓ ✓
algorithm, which not only preserves data confidentiality but also provides a flexible revocation mechanism. Subsequent CL-PRE works [23,24,36] mainly focused on improving efficiency, supporting revocation, and enhancing traceability. Similarly, to prevent cloud platforms from abusing re-encryption permissions, Li et al. [28] proposed a novel pairing-free scheme based on certificateless conditional BPRE. Zhou et al. [29] combined certificateless public key cryptography and PRE, realizing multi-level data access control, dynamic key update, and ciphertext evolution.

(4) Fuzzy PRE schemes: In another line of research, advances in biometric technologies have introduced new design dimensions for PRE. Fuzzy identity-based encryption (FIBE) [37] leverages biometric characteristics such as fingerprints and irises, which are inherently unique and tamper-resistant, to derive descriptive attribute sets that serve as a natural attribute space for encryption and authorization. Following this idea, Fang et al. [26] proposed an FCPRE scheme in which descriptive keywords are used as conditions to realize fuzzy conditional PRE. In their scheme, the proxy can re-encrypt ciphertexts according to a 𝑡-out-of-𝑑 threshold strategy. Xiong et al. [38] later proposed an improved pairing-based fuzzy identity-based signature (FIBS) scheme that supports the error-tolerance property. Li et al. [27] presented the first lattice-based FIB-CPRE scheme. Their scheme provides finer-grained control over delegated decryption, but incurs high computational cost, which negatively affects overall encryption and decryption efficiency. It should be noted that the use of biometric traits can significantly improve usability, but the noise inevitably introduced during biometric acquisition and feature extraction makes key generation and matching more challenging. To cope with this issue, Wang et al. [39] proposed a novel fuzzy certificateless signature authentication scheme that achieves conditional privacy while effectively protecting the confidentiality of users' real biometric characteristics.

As summarized in Table 1, existing PRE schemes and their variants have achieved substantial progress in terms of functionality and applicability to diverse scenarios. However, several important limitations remain.

• The scalability on the receiver side is restricted. Many schemes, such as [14,17,20], do not efficiently support data sharing among multiple receivers, which limits their practicality in large-scale collaborative applications.
• The strong binding between real identities and biometric characteristics introduces significant privacy risks. Some biometric-based schemes, such as [23,24,26,28,29], do not adequately protect the identity privacy of senders and receivers, and therefore cannot satisfy stringent privacy requirements.

3. Preliminaries

This section briefly overviews the basic concepts and techniques used in our scheme. Table 2 provides a list of symbols and their descriptions.

3.1. Bilinear map

Suppose there exists a mapping 𝑒: G × G → G𝑇, where G and G𝑇 represent two cyclic groups with the same prime order 𝑞, and 𝑃 is a generator of G. A bilinear map 𝑒 should have the following properties [40]:

• Bilinearity: 𝑒(𝑎𝑃, 𝑏𝑃) = 𝑒(𝑃, 𝑃)^{𝑎𝑏} holds for all 𝑎, 𝑏 ∈ 𝑍𝑞*.
• Non-degeneracy: There exists 𝑃 such that 𝑒(𝑃, 𝑃) ≠ 1.
• Computability: 𝑒(𝑃1, 𝑃2) can be computed efficiently for all 𝑃1, 𝑃2 ∈ G.

3.2. Useful definitions

Definition 1 (Shamir Secret Sharing [41]). Shamir's secret sharing scheme, introduced in 1979, is based on polynomial interpolation. A secret 𝑠 is divided into 𝑛 shares 𝑠1, …, 𝑠𝑛 with a threshold 𝑡, such that any set of at least 𝑡 participants can recover 𝑠, whereas any subset of size less than 𝑡 gains no information about it. The scheme consists of the following phases:

• Secret distribution: Let 𝒫 = {𝒫1, …, 𝒫𝑛} denote the set of participants and randomly select the secret value 𝑠 ∈ 𝑍𝑞. Then, a polynomial 𝐹(𝑥) of degree 𝑡 − 1 satisfying 𝐹(0) = 𝑠 is selected:

𝐹(𝑥) = 𝑠 + Σ_{𝑗=1}^{𝑡−1} 𝑎𝑗 𝑥^𝑗 mod 𝑞.

The share set is 𝑆𝑆 = {(𝜔𝑖, 𝑠𝑖) | 1 ≤ 𝑖 ≤ 𝑛}, where 𝑠𝑖 = 𝐹(𝜔𝑖). The 𝑖th share (𝜔𝑖, 𝑠𝑖) is privately delivered to the corresponding participant 𝒫𝑖.
• Secret reconstruction: Let 𝑆 ⊆ {1, …, 𝑛} be a group with |𝑆| = 𝑡. The secret value is reconstructed from the shares {(𝜔𝑖, 𝑠𝑖)}_{𝑖∈𝑆} using Lagrange interpolation:

𝐹(𝑥) = Σ_{𝑖∈𝑆} 𝛥𝜔𝑖,𝑆(𝑥)𝐹(𝜔𝑖) = Σ_{𝑖∈𝑆} 𝛥𝜔𝑖,𝑆(𝑥)𝑠𝑖,

where 𝛥𝜔𝑖,𝑆(𝑥) = Π_{𝑘∈𝑆, 𝑘≠𝑖} (𝑥 − 𝜔𝑘)/(𝜔𝑖 − 𝜔𝑘) is the Lagrange coefficient, and the secret is recovered as 𝑠 = 𝐹(0).

Definition 2 (Decisional Bilinear Diffie-Hellman (DBDH) Assumption). Given a random instance (𝑃, 𝑎𝑃, 𝑏𝑃, 𝑐𝑃, 𝑇), where 𝑃 ∈ G, 𝑎, 𝑏, 𝑐 are randomly selected from 𝑍𝑞, and 𝑇 is an element of G𝑇, the DBDH problem requires determining whether 𝑇 equals 𝑒(𝑃, 𝑃)^{𝑎𝑏𝑐} or a random element of G𝑇. For any PPT algorithm 𝒜, the advantage of successfully distinguishing the two cases is defined as

Adv^{DBDH}_𝒜(𝜆) = |Pr[𝒜(𝑃, 𝑎𝑃, 𝑏𝑃, 𝑐𝑃, 𝑒(𝑃, 𝑃)^{𝑎𝑏𝑐}) = 1] − Pr[𝒜(𝑃, 𝑎𝑃, 𝑏𝑃, 𝑐𝑃, 𝑇) = 1]|.

If the advantage Adv^{DBDH}_𝒜(𝜆) is negligible for every PPT algorithm 𝒜, the DBDH assumption holds.
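The two phases of Definition 1 can be sketched in Python. This is a minimal illustration only: the prime modulus and share indices below are toy values, not parameters from the paper.

```python
import random

q = 2**31 - 1     # toy prime modulus (illustrative only)

def share(s, t, n, rng=random.Random(0)):
    """Split secret s into n shares with threshold t: F(0) = s, share_i = F(i)."""
    coeffs = [s] + [rng.randrange(q) for _ in range(t - 1)]
    F = lambda x: sum(c * pow(x, j, q) for j, c in enumerate(coeffs)) % q
    return [(i, F(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Recover F(0) from t (or more) shares via Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, si) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % q           # numerator of Delta_{xi,S}(0)
                den = den * (xi - xj) % q       # denominator of Delta_{xi,S}(0)
        secret = (secret + si * num * pow(den, -1, q)) % q
    return secret

shares = share(s=987654321, t=3, n=5)
assert reconstruct(shares[:3]) == 987654321     # any t = 3 shares suffice
assert reconstruct(shares[2:5]) == 987654321
```

Any subset of exactly 𝑡 shares yields the same secret, mirroring the reconstruction formula 𝑠 = Σ 𝛥𝜔𝑖,𝑆(0)𝑠𝑖.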
Table 2
Summary of notations.
Symbol Description
𝜆 Security parameter
𝑚𝑠𝑘 Master secret key
𝑏𝑖𝑜 Biometric characteristic
𝐼𝑑𝐺𝑒𝑛(⋅) An identity extraction function
𝑈 𝐼𝐷 Realistic identity
𝑃 𝑈 𝐼𝐷 Pseudo-identity
𝑑 Error tolerance
𝜔 An attribute set
𝑥𝑃 𝑈 𝐼𝐷 Secret value
𝑆𝐾𝑃 𝑈 𝐼𝐷 Users full private key
𝑃 𝐾𝑃 𝑈 𝐼𝐷 Public key
𝑅𝐾,𝜔, Re-encryption key
𝐶𝑇 Original ciphertext
𝐶𝑇′ Re-encrypted ciphertext
Fig. 2. The operation flow of FCL-PRE.
Definition 3 (Syntax of FCL-PRE). The nine polynomial-time algorithms shown below constitute our FCL-PRE scheme.

• Setup. On input a security parameter 𝜆, TA and KGC generate the system parameters 𝑝𝑎𝑟𝑎𝑚𝑠 and a master secret key 𝑚𝑠𝑘 that is kept secret from users.
• PartialPrivateKey. After TA publishes the pseudo-identity 𝑃𝑈𝐼𝐷 for each registered user, KGC generates the corresponding partial private key 𝐷𝑃𝑈𝐼𝐷 and sends it to the user.
• SetSecretValue. The sender 𝒮 executes the algorithm and chooses a secret value 𝑥𝑃𝑈𝐼𝐷 randomly.
• SetPrivateKey. On input 𝑃𝑈𝐼𝐷, 𝑝𝑎𝑟𝑎𝑚𝑠, 𝑥𝑃𝑈𝐼𝐷 and 𝐷𝑃𝑈𝐼𝐷, 𝒮 generates the complete private key 𝑆𝐾𝑃𝑈𝐼𝐷.
• SetPublicKey. 𝒮 performs this algorithm on input 𝑥𝑃𝑈𝐼𝐷 and outputs the full public key 𝑃𝐾𝑃𝑈𝐼𝐷.
• Encryption. On input 𝑃𝑈𝐼𝐷, 𝑝𝑎𝑟𝑎𝑚𝑠, a message 𝑚, and 𝑃𝐾𝑃𝑈𝐼𝐷, 𝒮 computes the original ciphertext 𝐶𝑇.
• ReKey Generation. Given the private key 𝑆𝐾𝑃𝑈𝐼𝐷, ℛ's pseudo-identity 𝑃𝑈𝐼𝐷 and the corresponding 𝑃𝐾𝑃𝑈𝐼𝐷, 𝒮 generates a conditional re-encryption key 𝑅𝐾𝒮,𝜔,ℛ by running this algorithm.
• Re-encryption. Upon receiving 𝑅𝐾𝒮,𝜔,ℛ and the original ciphertext 𝐶𝑇, the cloud verifies whether |𝜔 ∩ 𝜔′| ≥ 𝑑 holds. Only if this condition is satisfied can the original ciphertext 𝐶𝑇 be re-encrypted into the second-layer ciphertext 𝐶𝑇′.
• Decryption. The user invokes it to decrypt the corresponding ciphertext, resulting in either the plaintext 𝑚 or ⊥.

4. Scheme model

In this section, we introduce the system model, outline the security guarantee model, and specify the security requirements, respectively.

4.1. System model

The operation flow of the fuzzy certificateless proxy re-encryption scheme is shown in Fig. 2. It includes five different parties, namely: Trusted Authority, Key Generation Center, Cloud Proxy Server, Sender, and Receiver.

• Trusted Authority (TA): TA is a fully trusted authority whose primary role is to generate privacy-preserving pseudo-identities 𝑃𝑈𝐼𝐷 for users and to cooperate with KGC in setting up and publishing the public parameters. At the same time, it maintains an internal mapping (𝑈𝐼𝐷, 𝑃𝑈𝐼𝐷, 𝜔), where 𝜔 denotes the attribute set associated with each 𝑃𝑈𝐼𝐷. Only the pseudo-identity and its associated attribute information are exposed to other entities, while the real identity 𝑈𝐼𝐷 remains exclusively known to TA.
• Key Generation Center (KGC): As an honest-but-curious entity, KGC is responsible for performing system initialization and generating a partial private key related to the user's identity; it is assumed that KGC and TA will not collude.
• Cloud Proxy Server (CPS): CPS is responsible for storing original ciphertexts and executing conditional re-encryption operations. When the receiver ℛ sends an access request, CPS first verifies whether the condition |𝜔 ∩ 𝜔′| ≥ 𝑑 holds. If so, the sender 𝒮 generates a corresponding re-encryption key for CPS to perform re-encryption; otherwise, CPS refuses to perform the re-encryption operation. Please note that, as a semi-trusted entity, CPS may still attempt to infer user privacy from the shared data.
• Sender (𝒮): 𝒮 can use the public key associated with 𝑃𝑈𝐼𝐷 to encrypt the data to be shared, generate the original ciphertext 𝐶𝑇 and upload it to CPS storage. In addition, 𝒮 produces the corresponding re-encryption key 𝑅𝐾𝒮,𝜔,ℛ according to the result of the verification equation, and sends it to CPS.
• Receiver (ℛ): The authorized receiver ℛ can decrypt and obtain the plaintext by downloading the re-encrypted ciphertext.

4.2. Security guarantee model

There are two types of adversaries in the certificateless cryptosystem [42]: 𝒜1 is the first type of adversary, which can replace a user's public key, and 𝒜2 is the second type, which can obtain the master secret key. Game-I and Game-II are the IND-CPA security games for FCL-PRE. Please note that each pseudo-identity 𝑃𝑈𝐼𝐷 is associated with an attribute set 𝜔.

Game-I. This game embodies the attack ability of 𝒜1; the challenger 𝒞 responds to 𝒜1's series of queries by controlling the following oracles.

• Initialization. When 𝜆 is received, 𝒞 first executes the Setup algorithm to obtain 𝑝𝑎𝑟𝑎𝑚𝑠 and generates the system master key 𝑚𝑠𝑘. Then, 𝒞 outputs 𝑝𝑎𝑟𝑎𝑚𝑠 and keeps 𝑚𝑠𝑘 secret.
• Phase 1. The adversary 𝒜1 initiates a series of queries, and 𝒞 responds accordingly.
– PPKQuery oracle 𝒪𝑝𝑝𝑘: 𝒞 executes the PartialPrivateKey algorithm to generate the partial private key 𝐷𝑃𝑈𝐼𝐷 for the queried 𝑃𝑈𝐼𝐷 and returns it to 𝒜1.
– SKQuery oracle 𝒪𝑠𝑘: 𝒞 first runs the PartialPrivateKey and SetSecretValue algorithms to obtain the corresponding 𝐷𝑃𝑈𝐼𝐷 and 𝑥𝑃𝑈𝐼𝐷. Next, 𝒞 runs the SetPrivateKey algorithm to generate the complete private key 𝑆𝐾𝑃𝑈𝐼𝐷 and returns it to 𝒜1.
– PKQuery oracle 𝒪𝑝𝑘: 𝒞 runs the SetSecretValue algorithm to obtain 𝑥𝑃𝑈𝐼𝐷, and extracts the user's public key 𝑃𝐾𝑃𝑈𝐼𝐷 by running the SetPublicKey algorithm. Finally, 𝒞 returns it to 𝒜1.
– PK replacement oracle 𝒪𝑝𝑘𝑟𝑝: 𝒜1 queries a two-tuple (𝑃𝑈𝐼𝐷, 𝑃̃𝐾𝑃𝑈𝐼𝐷), where 𝑃̃𝐾𝑃𝑈𝐼𝐷 is a newly selected public key to replace the public key 𝑃𝐾𝑃𝑈𝐼𝐷 currently associated with 𝑃𝑈𝐼𝐷. Thereby, 𝒜1 performs the public key replacement 𝑃𝐾𝑃𝑈𝐼𝐷 = 𝑃̃𝐾𝑃𝑈𝐼𝐷.
– ReKeyGen oracle 𝒪𝑟𝑘: 𝒞 runs the ReKey Generation algorithm and returns a re-encryption key 𝑅𝐾𝒮,𝜔,ℛ to 𝒜1. If the public key of 𝑃𝑈𝐼𝐷 has been replaced, 𝒜1 cannot perform this query.
– Re-encryption oracle 𝒪𝑟𝑒𝑒𝑛: 𝒞 performs the Re-encryption algorithm and returns a re-encrypted ciphertext 𝐶𝑇′ to 𝒜1. If the public key of 𝑃𝑈𝐼𝐷 has been replaced, 𝒜1 cannot perform this query.

• Challenge. After completing all the interactions with 𝒞, 𝒜1 outputs a challenge identity 𝑃𝑈𝐼𝐷𝜋 and two messages (𝑚0, 𝑚1) of equal length. 𝒞 randomly selects a message 𝑚𝑏, 𝑏 ∈ {0, 1}, calculates the corresponding ciphertext and returns it to 𝒜1.
• Phase 2. 𝒜1 and the challenger 𝒞 continue to conduct queries and answers similar to Phase 1, but must follow three constraints.
(1) 𝒜1 has never queried the partial private key or private key for a challenge identity 𝑃𝑈𝐼𝐷𝜋 that meets the |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 condition.
(2) If 𝒜1 sends re-encryption key queries for a challenge identity 𝑃𝑈𝐼𝐷𝜋 that meets the |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 condition, then the partial private key queries or private key queries can no longer be performed.
(3) If 𝒜1 has sent the partial private key or private key queries for a challenge identity 𝑃𝑈𝐼𝐷𝜋 that meets the |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 condition, the re-encryption key queries can no longer be performed, and the information related to the re-encrypted ciphertext cannot be queried.
• Guess. Finally, 𝒜1 guesses the challenge bit 𝑏′ ∈ {0, 1}. If 𝑏′ = 𝑏, 𝒜1 wins this game.

Definition 4. According to the definition of Game-I, our FCL-PRE is IND-CPA secure if the advantage of 𝒜1 is negligible, defined as

Adv^{Game-I}_{𝒜1}(𝜆) = |Pr[𝑏′ = 𝑏] − 1/2|.

Game-II. This game embodies the attack ability of 𝒜2; the challenger 𝒞 responds to 𝒜2's series of queries by controlling the following oracles. Game-II is similar to Game-I; therefore, only their main differences are presented below.

• Initialization. When 𝜆 is received, 𝒞 first executes the Setup algorithm to obtain 𝑝𝑎𝑟𝑎𝑚𝑠 and generates a system master key 𝑚𝑠𝑘. Then, 𝒞 returns them to 𝒜2.
• Phase 1. 𝒜2 issues a series of queries similar to those in Game-I, and 𝒞 responds accordingly. At this time, 𝒜2 lacks the ability to replace the public key.
• Challenge. Similar to Game-I.
• Phase 2. 𝒜2 and the challenger 𝒞 continue to conduct similar queries and answers as in Phase 1, but must follow three constraints.
(1) 𝒜2 has never queried the private key for a challenge identity 𝑃𝑈𝐼𝐷𝜋 that meets the |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 condition.
(2) If 𝒜2 sends re-encryption key queries for a challenge identity 𝑃𝑈𝐼𝐷𝜋 that meets the |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 condition, then the private key queries can no longer be performed.
(3) If 𝒜2 has sent the private key queries for a challenge identity 𝑃𝑈𝐼𝐷𝜋 that meets the |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 condition, the re-encryption key queries can no longer be performed, and the information related to the re-encrypted ciphertext cannot be queried.
• Guess. Finally, 𝒜2 guesses the challenge bit 𝑏′ ∈ {0, 1}. If 𝑏′ = 𝑏, 𝒜2 wins this game.

Definition 5. According to the definition of Game-II, our FCL-PRE is IND-CPA secure if the advantage of 𝒜2 is negligible, defined as

Adv^{Game-II}_{𝒜2}(𝜆) = |Pr[𝑏′ = 𝑏] − 1/2|.

4.3. Security requirements

The proposed FCL-PRE scheme should satisfy the following security objectives.

• Confidentiality. FCL-PRE must protect sensitive information before it is uploaded to the CPS and prevent any access by unauthorized recipients. Additionally, when generating the original ciphertext and the re-encryption key, conditional information is incorporated to ensure that re-encryption can only be performed if the original ciphertext meets specific conditions.
• Anonymity. To protect user privacy, FCL-PRE must conceal the user's real biometric identity. Unless it is the trusted third party, no adversary can establish a valid biometric identification association, thereby preventing the leakage of the user's identity information.
• Error tolerance. Considering that a biometric characteristic may contain some noise with each sampling, FCL-PRE must exhibit error tolerance. Specifically, when the overlap between the attribute set 𝜔 of the sender 𝒮 and another attribute set 𝜔′ reaches the predefined threshold 𝑑, the proxy can use the re-encryption key to generate the corresponding re-encrypted ciphertext for 𝜔′, enabling efficient data sharing.
• Collusion resistance. In our FCL-PRE, even in the presence of semi-trusted parties, such as collusion between CPS and the receiver, CPS cannot obtain the sender's complete private key and thus cannot perform any decryption operations, ensuring the system's security against internal collusion attacks.

5. The proposed FCL-PRE scheme

In this section, we thoroughly describe FCL-PRE, which supports efficient fuzzy data sharing through anonymized biometric identities. The procedure flow of FCL-PRE is presented in Fig. 3.

5.1. System initialization

(1) Upon inputting the security parameter 𝜆, KGC generates the bilinear pairing parameters (𝑒, G, G𝑇, 𝑞, 𝑃), where G and G𝑇 represent two cyclic groups with the same prime order 𝑞, 𝑒: G × G → G𝑇, and 𝑃 is a generator of G. Then, KGC selects 𝑠 ∈ 𝑍𝑞* randomly and calculates the system public key 𝑃𝑝𝑢𝑏 = 𝑠𝑃.
(2) TA adopts a symmetric key encryption scheme to hide the user's realistic identity 𝑈𝐼𝐷, denoted by 𝐸𝑛𝑐𝜙(⋅) and 𝐷𝑒𝑐𝜙(⋅). Here, 𝐸𝑛𝑐𝜙(⋅) represents the encryption algorithm, 𝐷𝑒𝑐𝜙(⋅) represents the decryption algorithm, and 𝜙 is the shared symmetric key.
(3) Finally, TA and KGC choose four collision-resistant hash functions 𝐻1: {0,1}* → G, 𝐻2: {0,1}* → G, 𝐻3: {0,1}* → G, and 𝐻4: {0,1}* → 𝑍𝑞*, and define the system parameters as 𝑝𝑎𝑟𝑎𝑚𝑠 = {G, G𝑇, 𝑒, 𝑞, 𝑑, 𝑃, 𝑃𝑝𝑢𝑏, 𝐻1, 𝐻2, 𝐻3, 𝐻4}.
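The four hash functions of the system initialization can be instantiated in a sketch via domain separation over a single base hash. This is an assumption-laden toy: in the paper 𝐻1-𝐻3 map into the pairing group G, whereas below their outputs are modelled as scalars, and the modulus is an illustrative stand-in, not a parameter of the scheme.

```python
import hashlib

q = 2**255 - 19          # stand-in prime group order (assumption, not from the paper)

def H(tag: str, data: bytes) -> int:
    """Domain-separated hash {0,1}* -> Z_q: prefix the input with a per-oracle tag."""
    digest = hashlib.sha256(tag.encode() + b"|" + data).digest()
    return int.from_bytes(digest, "big") % q

H1 = lambda m: H("H1", m)    # {0,1}* -> G (modelled here as a scalar)
H2 = lambda m: H("H2", m)
H3 = lambda m: H("H3", m)
H4 = lambda m: H("H4", m)    # {0,1}* -> Z_q*
```

The tag prefix guarantees that the four oracles behave as independent functions even though they share one underlying hash, which is the standard way to derive several random oracles from one.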
Fig. 3. The algorithm procedure of FCL-PRE.
5.2. User registration phase

Before sharing data, each user must register their identity information with TA. Let the sender be denoted as 𝒮𝑗. First, 𝒮𝑗 transmits the realistic biometric information 𝑏𝑖𝑜 (e.g., a fingerprint) to TA via a secure channel. Then, TA applies the identity extraction function 𝐼𝑑𝐺𝑒𝑛(⋅) to convert 𝑏𝑖𝑜 into a unique biometric identity 𝑈𝐼𝐷𝑗 = 𝐼𝑑𝐺𝑒𝑛(𝑏𝑖𝑜). The 𝐼𝑑𝐺𝑒𝑛(⋅) function is similar to a hash function and is irreversible. It transforms the biometrics into an identity that is indistinguishable from random information and cannot be used to infer the original biometrics [39,41].

Next, TA generates a pseudo-identity as 𝑃𝑈𝐼𝐷𝑗 = 𝐸𝑛𝑐𝜙(𝑈𝐼𝐷𝑗 ∥ 𝑛𝑃𝑈𝐼𝐷) ∥ 𝑇𝑗 to protect the real biometric identity, where 𝑛𝑃𝑈𝐼𝐷 represents the number of pseudo-identities requested and 𝑇𝑗 is the validity period of the pseudo-identity. Meanwhile, TA internally maintains a mapping (𝑈𝐼𝐷𝑗, 𝑃𝑈𝐼𝐷𝑗, 𝜔), where 𝜔 is the attribute set associated with 𝑃𝑈𝐼𝐷𝑗. Eventually, TA publishes 𝑃𝑈𝐼𝐷𝑗 and keeps 𝑈𝐼𝐷𝑗 secret.

(1) Upon receiving the attribute set 𝜔 associated with 𝒮𝑗's pseudo-identity 𝑃𝑈𝐼𝐷𝑗, KGC first randomly selects a polynomial 𝑝(𝑥) of degree 𝑑 − 1 such that 𝑝(0) = 𝑠 and assigns 𝑝(𝜔𝑖) = 𝑠𝑖, where 𝑖 ∈ {1, …, 𝑛}. Then it calculates the partial private key as 𝐷𝑖,𝑗 = 𝑠𝑖𝐻1(𝑃𝑈𝐼𝐷𝑗). The collection (𝐷𝑖,𝑗), 𝑖 = 1, …, 𝑛, of 𝒮𝑗's partial private keys is denoted by KGC as 𝐷𝑃𝑈𝐼𝐷𝑗.
(2) After receiving the partial private key 𝐷𝑃𝑈𝐼𝐷𝑗, 𝒮𝑗 can calculate the Lagrange coefficients and perform a local verification to ensure consistency: 𝑒(𝐷𝑃𝑈𝐼𝐷𝑗, 𝑃) = 𝑒(𝐻1(𝑃𝑈𝐼𝐷𝑗), 𝑃𝑝𝑢𝑏). Then, 𝒮𝑗 chooses a random secret value 𝑥𝑃𝑈𝐼𝐷𝑗 ∈ 𝑍𝑞* and a polynomial 𝑦(𝑥) of degree 𝑑 − 1 such that 𝑦(0) = 𝑥𝑃𝑈𝐼𝐷𝑗, and lets 𝑦(𝜔𝑖) = 𝑥𝑖,𝑃𝑈𝐼𝐷𝑗, where 𝑖 ∈ {1, …, 𝑛}. The share vector (𝑥𝑖,𝑃𝑈𝐼𝐷𝑗), 𝑖 = 1, …, 𝑛, is then treated as 𝒮𝑗's secret value 𝑥𝑃𝑈𝐼𝐷𝑗.
(3) Having obtained 𝐷𝑃𝑈𝐼𝐷𝑗, 𝒮𝑗 sets the full private key as 𝑆𝐾𝑃𝑈𝐼𝐷𝑗 = (𝐷𝑃𝑈𝐼𝐷𝑗, 𝑥𝑃𝑈𝐼𝐷𝑗).
(4) 𝒮𝑗 calculates 𝑃𝐾𝑃𝑈𝐼𝐷𝑗 = 𝑥𝑃𝑈𝐼𝐷𝑗𝑃 as the public key and publishes it.

5.3. Data encryption phase

Given 𝒮𝑗's identity 𝑃𝑈𝐼𝐷𝑗 associated with an attribute set 𝜔 = (𝜔1, …, 𝜔𝑛), the public key 𝑃𝐾𝑃𝑈𝐼𝐷𝑗, and a message 𝑚:

(1) 𝒮𝑗 picks a random number 𝑟𝑗 ∈ 𝑍𝑞* and a polynomial 𝑔(𝑥) of degree 𝑑 − 1 such that 𝑔(0) = 𝑟𝑗, and assigns 𝑔(𝜔𝑖) = 𝑟𝑖,𝑗, where 𝑖 ∈ {1, …, 𝑛}. Then, 𝒮𝑗 computes

𝑈1 = 𝑟𝑗𝑃, 𝐸𝑗 = 𝐻2(𝑃𝑈𝐼𝐷𝑗 ∥ 𝑃𝐾𝑃𝑈𝐼𝐷𝑗 ∥ 𝑃𝑝𝑢𝑏),
𝑉1 = 𝑚 Π_{𝜔𝑖∈𝑆} (𝑒(𝑃𝑝𝑢𝑏, 𝐻1(𝑃𝑈𝐼𝐷𝑗))^{𝑟𝑖,𝑗} × 𝑒(𝑃𝐾𝑃𝑈𝐼𝐷𝑗, 𝐸𝑗)^{𝑟𝑖,𝑗})^{𝛥𝜔𝑖,𝑆(0)}.

𝒮𝑗 uploads the original ciphertext 𝐶𝑇 = (𝑈1, 𝑉1) to the CPS.
(2) Finally, 𝒮𝑗 selects 𝑘 ∈ 𝑍𝑞* randomly and computes 𝑅 = 𝑘𝑃 and ℎ = 𝐻4(𝑈1 ∥ 𝑉1 ∥ 𝑅 ∥ 𝑃𝐾𝑃𝑈𝐼𝐷𝑗 ∥ 𝑃𝑈𝐼𝐷𝑗). Then, 𝒮𝑗 generates a signature 𝜎𝑗 = 𝑘 + ℎ𝑥𝑃𝑈𝐼𝐷𝑗 mod 𝑞 and transmits (𝑅, 𝜎𝑗) to the CPS.

5.4. Verification and sharing phase

When a new receiver ℛ𝑗′ initiates an access request, ℛ𝑗′ first needs to send its current pseudo-identity to CPS. After the identity authentication succeeds, CPS performs re-encryption operations based on this pseudo-identity.

(1) The CPS first computes ℎ = 𝐻4(𝑈1 ∥ 𝑉1 ∥ 𝑅 ∥ 𝑃𝐾𝑃𝑈𝐼𝐷𝑗 ∥ 𝑃𝑈𝐼𝐷𝑗) and checks whether 𝜎𝑗𝑃 = 𝑅 + ℎ𝑃𝐾𝑃𝑈𝐼𝐷𝑗. After the signature verification succeeds, CPS randomly selects a 𝑑-element subset 𝑆 ⊆ 𝜔 ∩ 𝜔′ and determines whether the input attribute set 𝜔′ satisfies |𝜔 ∩ 𝜔′| ≥ 𝑑; if yes, CPS returns the result to the sender.
(2) 𝒮𝑗 generates the corresponding re-encryption key for the pseudo-identity based on the result. 𝒮𝑗 computes 𝜑 = 𝑒(𝐷𝑃𝑈𝐼𝐷𝑗, 𝐻1(𝑃𝑈𝐼𝐷𝑗′)) and 𝑅𝐾𝒮,𝜔,ℛ = −𝐷𝑃𝑈𝐼𝐷𝑗 − 𝑥𝑃𝑈𝐼𝐷𝑗𝐸𝑗 + 𝐻3(𝜑 ∥ 𝑥𝑃𝑈𝐼𝐷𝑗𝑃𝐾𝑃𝑈𝐼𝐷𝑗′ ∥ 𝜔 ∥ 𝜔′), and then sends 𝑅𝐾𝒮,𝜔,ℛ to CPS.
(3) Finally, CPS can use the re-encryption key 𝑅𝐾𝒮,𝜔,ℛ to convert 𝐶𝑇 into a re-encrypted ciphertext 𝐶𝑇′. It computes 𝑈2 = 𝑈1 and 𝑉2 = 𝑉1 ⋅ 𝑒(𝑈1, 𝑅𝐾𝒮,𝜔,ℛ), and then outputs 𝐶𝑇′ = (𝑈2, 𝑉2) to the authorized recipient.

5.5. Data decryption phase

The procedure to decrypt the original ciphertext and the re-encrypted ciphertext is as follows:
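The upload signature 𝜎𝑗 = 𝑘 + ℎ𝑥𝑃𝑈𝐼𝐷𝑗 and the CPS check 𝜎𝑗𝑃 = 𝑅 + ℎ𝑃𝐾𝑃𝑈𝐼𝐷𝑗 follow a Schnorr-style pattern. A sketch over a toy multiplicative group, with 𝑔^𝑥 standing in for the scalar multiplication 𝑥𝑃 (all group parameters are illustrative, not the paper's pairing group):

```python
import hashlib, random

q = 1019                     # toy subgroup order (illustrative)
p = 2 * q + 1                # 2039 is prime, so Z_p* has a subgroup of order q
g = 4                        # generator of that order-q subgroup

def H4(*parts) -> int:       # stand-in for the scheme's H4: {0,1}* -> Z_q
    data = "|".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

random.seed(7)
x = random.randrange(1, q)   # the signer's secret value x_PUIDj
PK = pow(g, x, p)            # public key: x_PUIDj P, here g^x

# Sign the upload (U1, V1) together with R and the pseudo-identity
U1, V1, PUID = 123, 456, "PUID_j"
k = random.randrange(1, q)
R = pow(g, k, p)             # R = kP
h = H4(U1, V1, R, PK, PUID)
sigma = (k + h * x) % q      # sigma_j = k + h * x_PUIDj mod q

# CPS verification: sigma P == R + h PK, i.e. multiplicatively g^sigma == R * PK^h
assert pow(g, sigma, p) == R * pow(PK, h, p) % p
```

Binding 𝑈1, 𝑉1, 𝑅, the public key and the pseudo-identity inside ℎ ties the signature to this particular ciphertext upload.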
Correctness

For the original ciphertext 𝐶𝑇 = (𝑈1, 𝑉1):

𝑉1 / Π_{𝜔𝑖∈𝑆} 𝑒(𝑈1, 𝐷𝑖,𝑗 + 𝑥𝑖,𝑃𝑈𝐼𝐷𝑗𝐸𝑗)^{𝛥𝜔𝑖,𝑆(0)}
= 𝑚 Π_{𝜔𝑖∈𝑆} (𝑒(𝑃𝑝𝑢𝑏, 𝐻1(𝑃𝑈𝐼𝐷𝑗))^{𝑟𝑖,𝑗} × 𝑒(𝑃𝐾𝑃𝑈𝐼𝐷𝑗, 𝐸𝑗)^{𝑟𝑖,𝑗})^{𝛥𝜔𝑖,𝑆(0)} / Π_{𝜔𝑖∈𝑆} 𝑒(𝑈1, 𝐷𝑖,𝑗 + 𝑥𝑖,𝑃𝑈𝐼𝐷𝑗𝐸𝑗)^{𝛥𝜔𝑖,𝑆(0)}
= 𝑚 𝑒(𝑠𝑃, Σ_{𝜔𝑖∈𝑆} 𝑔(𝜔𝑖)𝛥𝜔𝑖,𝑆(0) 𝐻1(𝑃𝑈𝐼𝐷𝑗)) 𝑒(𝑥𝑃𝑈𝐼𝐷𝑗𝑃, Σ_{𝜔𝑖∈𝑆} 𝑔(𝜔𝑖)𝛥𝜔𝑖,𝑆(0) 𝐸𝑗) / [𝑒(𝑟𝑗𝑃, Σ_{𝜔𝑖∈𝑆} 𝑝(𝜔𝑖)𝛥𝜔𝑖,𝑆(0) 𝐻1(𝑃𝑈𝐼𝐷𝑗)) 𝑒(𝑟𝑗𝑃, 𝑥𝑃𝑈𝐼𝐷𝑗𝐸𝑗)]
= 𝑚 𝑒(𝑠𝑃, 𝑟𝑗𝐻1(𝑃𝑈𝐼𝐷𝑗)) 𝑒(𝑥𝑃𝑈𝐼𝐷𝑗𝑃, 𝑟𝑗𝐸𝑗) / [𝑒(𝑟𝑗𝑃, 𝑠𝐻1(𝑃𝑈𝐼𝐷𝑗)) 𝑒(𝑟𝑗𝑃, 𝑥𝑃𝑈𝐼𝐷𝑗𝐸𝑗)] = 𝑚,

since Σ_{𝜔𝑖∈𝑆} 𝑝(𝜔𝑖)𝛥𝜔𝑖,𝑆(0) = 𝑠, Σ_{𝜔𝑖∈𝑆} 𝑔(𝜔𝑖)𝛥𝜔𝑖,𝑆(0) = 𝑟𝑗, and Σ_{𝜔𝑖∈𝑆} 𝑦(𝜔𝑖)𝛥𝜔𝑖,𝑆(0) = 𝑥𝑃𝑈𝐼𝐷𝑗 by Lagrange interpolation.

For the re-encrypted ciphertext 𝐶𝑇′ = (𝑈2, 𝑉2), let 𝐻3(⋅) abbreviate 𝐻3(𝜑 ∥ 𝑥𝑃𝑈𝐼𝐷𝑗𝑃𝐾𝑃𝑈𝐼𝐷𝑗′ ∥ 𝜔 ∥ 𝜔′); the receiver can evaluate the same value because 𝑥𝑃𝑈𝐼𝐷𝑗𝑃𝐾𝑃𝑈𝐼𝐷𝑗′ = 𝑥𝑃𝑈𝐼𝐷𝑗′𝑃𝐾𝑃𝑈𝐼𝐷𝑗, and let 𝐷𝑃𝑈𝐼𝐷𝑗 denote the aggregated value Σ_{𝜔𝑖∈𝑆} 𝐷𝑖,𝑗𝛥𝜔𝑖,𝑆(0) = 𝑠𝐻1(𝑃𝑈𝐼𝐷𝑗). Then

𝑉2 / Π_{𝜔𝑖∈𝑆} 𝑒(𝑈2, 𝐻3(⋅))^{𝛥𝜔𝑖,𝑆(0)}
= 𝑚 Π_{𝜔𝑖∈𝑆} (𝑒(𝑃𝑝𝑢𝑏, 𝐻1(𝑃𝑈𝐼𝐷𝑗))^{𝑟𝑖,𝑗} × 𝑒(𝑃𝐾𝑃𝑈𝐼𝐷𝑗, 𝐸𝑗)^{𝑟𝑖,𝑗})^{𝛥𝜔𝑖,𝑆(0)} 𝑒(𝑈1, 𝑅𝐾𝒮,𝜔,ℛ) / 𝑒(𝑟𝑗𝑃, 𝐻3(⋅))
= 𝑚 𝑒(𝑟𝑗𝑃, 𝐷𝑃𝑈𝐼𝐷𝑗 + 𝑥𝑃𝑈𝐼𝐷𝑗𝐸𝑗) 𝑒(𝑟𝑗𝑃, −𝐷𝑃𝑈𝐼𝐷𝑗 − 𝑥𝑃𝑈𝐼𝐷𝑗𝐸𝑗 + 𝐻3(⋅)) / 𝑒(𝑟𝑗𝑃, 𝐻3(⋅))
= 𝑚.
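The cancellation in the correctness argument can be checked numerically with a toy symmetric "pairing" that operates on exponents, 𝑒(𝑥𝑃, 𝑦𝑃) = 𝑔^{𝑥𝑦}. This is only a sketch under toy-sized, insecure parameters; all scalar "points" stand in for the paper's group elements.

```python
import random

q = 1019                 # order of G and G_T (toy prime; illustrative only)
p = 2 * q + 1            # p = 2039 is prime, so Z_p* has an order-q subgroup
g = 4                    # generator of that subgroup

def pair(x, y):          # e(xP, yP) = g^{xy}: a toy symmetric "pairing"
    return pow(g, (x * y) % q, p)

def poly_eval(coeffs, x):
    return sum(c * pow(x, j, q) for j, c in enumerate(coeffs)) % q

def lagrange0(S):        # Delta_{w,S}(0) mod q for each w in S
    lam = {}
    for wi in S:
        num = den = 1
        for wk in S:
            if wk != wi:
                num = num * (-wk) % q
                den = den * (wi - wk) % q
        lam[wi] = num * pow(den, -1, q) % q
    return lam

rng = random.Random(3)
d, omega = 3, [1, 2, 3, 4, 5]
s  = rng.randrange(1, q)                              # master key, Ppub = sP
h1 = rng.randrange(1, q)                              # H1(PUID_j) as a "point"
Ej = rng.randrange(1, q)                              # E_j = H2(...) as a "point"
x  = rng.randrange(1, q)                              # x_PUIDj, PK = xP
sh = [s] + [rng.randrange(q) for _ in range(d - 1)]   # p(x) with p(0) = s
yx = [x] + [rng.randrange(q) for _ in range(d - 1)]   # y(x) with y(0) = x
D  = {w: poly_eval(sh, w) * h1 % q for w in omega}    # D_i = s_i H1(PUID_j)
xi = {w: poly_eval(yx, w) for w in omega}             # x_{i,PUIDj}

# Encryption: CT = (U1, V1)
m  = 321
rj = rng.randrange(1, q)
gx = [rj] + [rng.randrange(q) for _ in range(d - 1)]  # g(x) with g(0) = r_j
r  = {w: poly_eval(gx, w) for w in omega}             # r_{i,j}
S, lam = omega[:d], lagrange0(omega[:d])
U1, V1 = rj, m
for w in S:
    blind = pow(pair(s, h1), r[w], p) * pow(pair(x, Ej), r[w], p) % p
    V1 = V1 * pow(blind, lam[w], p) % p

# Decryption: m = V1 / prod e(U1, D_i + x_i E_j)^{Delta}
denom = 1
for w in S:
    denom = denom * pow(pair(U1, (D[w] + xi[w] * Ej) % q), lam[w], p) % p
assert V1 * pow(denom, -1, p) % p == m
```

The Lagrange coefficients collapse the per-attribute blinding factors to 𝑔^{𝑟𝑗(𝑠ℎ1 + 𝑥𝐸𝑗)} on both sides, which is exactly the cancellation the derivation exhibits.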
(1) For the original ciphertext 𝐶𝑇, sender 𝑗 can get the plaintext by computing
$$m = \frac{V_1}{\prod_{\omega_i \in S} e(U_1,\ D_{PUID_j} + x_{PUID_j}E_j)^{\Delta_{\omega_i,S}(0)}}.$$
(2) For the re-encrypted ciphertext 𝐶𝑇′, only authorized receivers can successfully obtain the data:
$$m = \frac{V_2}{\prod_{\omega_i \in S} e\left(U_2,\ H_3(\varphi \parallel x_{PUID_j}PK_{PUID_{j'}} \parallel \omega \parallel \omega')\right)^{\Delta_{\omega_i,S}(0)}}.$$

6. Security analysis

6.1. Security proof for FCL-PRE

Theorem 1. If adversary 𝒜1 breaks FCL-PRE with a non-negligible advantage 𝜀, we can construct an algorithm ℬ that solves the DBDH assumption in polynomial time with an advantage 𝜀′.

Proof. Given a challenge instance (𝑃, 𝑎𝑃, 𝑏𝑃, 𝑐𝑃, 𝑇), ℬ acts as a subroutine of the adversary 𝒜1 and attempts to determine whether 𝑇 = 𝑒(𝑃, 𝑃)^𝑎𝑏𝑐. Therefore, ℬ needs to answer a series of inquiries from 𝒜1.

∙ Initialization. By executing the Setup algorithm, ℬ gets 𝑝𝑎𝑟𝑎𝑚𝑠 = {G, G𝑇, 𝑞, 𝑒, 𝑑, 𝑃, 𝑃𝑝𝑢𝑏, 𝐻1, 𝐻2, 𝐻3}. Then, ℬ sets 𝑃𝑝𝑢𝑏 = 𝑎𝑃, where 𝑎 plays the role of the master key and is unknown to ℬ.

𝐻1 Query: ℬ maintains an initially empty list of the form 𝐿1⟨𝑃𝑈𝐼𝐷, 𝐻1(𝑃𝑈𝐼𝐷), (𝑧𝑖)𝑛𝑖=1, 𝛼𝑢⟩. 𝒜1 publishes 𝑃𝑈𝐼𝐷 for query. ℬ first chooses 𝜋 ∈ {1, 2, …, 𝑞𝐻1} and defines 𝑃𝑈𝐼𝐷𝜋 as the challenge identity. If 𝑃𝑈𝐼𝐷 already exists in the 𝐿1, ℬ restores the corresponding record and returns 𝐻1(𝑃𝑈𝐼𝐷) to 𝒜1. Otherwise, for this tuple, ℬ considers the following two cases:
Case 1: If |𝜔 ∩ 𝜔𝜋| ≥ 𝑑, ℬ randomly selects a polynomial 𝑡(𝑥) of degree 𝑑 − 1, returns the corresponding hash value to 𝒜1, and then saves the tuple (𝑃𝑈𝐼𝐷, 𝐻1(𝑃𝑈𝐼𝐷), ⊥, ⊥) in the 𝐿1.
Case 2: If |𝜔 ∩ 𝜔𝜋| < 𝑑, ℬ needs to select 𝛼𝑢 ∈ {0, 1} at random, where the probability of 𝛼𝑢 = 1 is 𝛾.
(1) When 𝛼𝑢 = 0, ℬ chooses a random number 𝑧 ∈ 𝑍𝑞 and a polynomial 𝑦(𝑥) of degree 𝑑 − 1 with 𝑦(0) = 𝑧. Let 𝑧𝑖 = 𝑦(𝜔𝑖), where 𝑖 ∈ {1, …, 𝑛}. ℬ calculates 𝐻1(𝑃𝑈𝐼𝐷) = 𝑧𝑖𝑐𝑃, and saves the tuple (𝑃𝑈𝐼𝐷, 𝑧𝑖𝑐𝑃, (𝑧𝑖)𝑛𝑖=1, 0) in the 𝐿1.
(2) When 𝛼𝑢 = 1, ℬ selects 𝑧 ∈ 𝑍𝑞, outputs 𝐻1(𝑃𝑈𝐼𝐷) = 𝑧𝑃 and saves the tuple (𝑃𝑈𝐼𝐷, 𝑧𝑃, 𝑧, 1) in the 𝐿1.

𝐻2 Query: ℬ maintains an initially empty list of the form 𝐿2⟨𝑃𝑈𝐼𝐷, 𝑡𝑖, 𝑌𝑖⟩. When 𝒜1 makes a query, if 𝑃𝑈𝐼𝐷 already exists in the 𝐿2, ℬ answers with 𝑌𝑖; otherwise it randomly selects 𝑡𝑖 ∈ 𝑍𝑞, calculates 𝑌𝑖 = 𝑡𝑖𝑃 and adds the tuple (𝑃𝑈𝐼𝐷, 𝑡𝑖, 𝑌𝑖) to the 𝐿2.

𝐻3 Query: ℬ maintains an initially empty list of the form 𝐿3⟨𝑋, 𝐻⟩. If 𝑋 is in the list 𝐿3, ℬ returns 𝐻 to 𝒜1. Otherwise, ℬ uniformly selects an element 𝐻 ∈ G, returns it and records the pair (𝑋, 𝐻) in 𝐿3.

∙ Phase 1. For a series of inquiries raised by 𝒜1, ℬ answers as follows.
PPKQuery oracle 𝒪𝑝𝑝𝑘: 𝒜1 publishes an identity 𝑃𝑈𝐼𝐷 for query, and ℬ maintains a list of the form 𝐿𝑝𝑝𝑘⟨𝑃𝑈𝐼𝐷, 𝐷𝑃𝑈𝐼𝐷⟩ as the answer to 𝒜1. If 𝑃𝑈𝐼𝐷 already exists in the 𝐿𝑝𝑝𝑘, ℬ first performs the 𝐻1 Query in the above steps to obtain 𝐻1(𝑃𝑈𝐼𝐷). Otherwise, ℬ finds the tuple in the 𝐿1:
Case 1: If |𝜔 ∩ 𝜔𝜋| ≥ 𝑑, the challenger ℬ aborts and outputs fault.
Case 2: If |𝜔 ∩ 𝜔𝜋| < 𝑑, ℬ randomly selects a polynomial 𝑝(𝑥) of degree 𝑑 − 1 with 𝑝(0) = 𝑎, lets 𝑝(𝜔𝑖) = 𝑎𝑖, where 𝑖 ∈ {1, …, 𝑛}, returns 𝑧𝑖𝑎𝑃 to 𝒜1, and saves the tuple (𝑃𝑈𝐼𝐷, 𝐷𝑃𝑈𝐼𝐷) in the 𝐿𝑝𝑝𝑘.

PKQuery oracle 𝒪𝑝𝑘: 𝒜1 publishes an identity 𝑃𝑈𝐼𝐷 for query, and ℬ maintains a list of the form 𝐿𝑝𝑢𝑏⟨𝑃𝑈𝐼𝐷, 𝑃𝐾𝑃𝑈𝐼𝐷, (𝑥𝑖,𝑃𝑈𝐼𝐷)𝑛𝑖=1⟩ as the answer to 𝒜1. If 𝑃𝑈𝐼𝐷 already exists in the 𝐿𝑝𝑢𝑏, ℬ restores the corresponding record and returns 𝑃𝐾𝑃𝑈𝐼𝐷 to 𝒜1. Otherwise, ℬ randomly selects 𝑥𝑗 ∈ 𝑍𝑞 and a polynomial 𝑦(𝑥) of degree 𝑑 − 1 with 𝑦(0) = 𝑥𝑗, and lets 𝑦(𝜔𝑖) = 𝑥𝑖,𝑃𝑈𝐼𝐷, where 𝑖 ∈ {1, …, 𝑛}. In this case, we suppose that 𝑥𝑃𝑈𝐼𝐷 = (𝑥𝑖,𝑃𝑈𝐼𝐷)𝑛𝑖=1, while ℬ calculates 𝑃𝐾𝑃𝑈𝐼𝐷 = 𝑥𝑃𝑈𝐼𝐷𝑃 and returns it to 𝒜1. Finally, ℬ maintains (𝑃𝑈𝐼𝐷, 𝑃𝐾𝑃𝑈𝐼𝐷, (𝑥𝑖,𝑃𝑈𝐼𝐷)𝑛𝑖=1) in 𝐿𝑝𝑢𝑏.

PK replacement oracle 𝒪𝑝𝑘𝑟𝑝: When 𝒜1 queries the tuple (𝑃𝑈𝐼𝐷, 𝑃̃𝐾𝑃𝑈𝐼𝐷), if 𝑃𝑈𝐼𝐷 has not been queried for the public key, ℬ generates a public key query on 𝑃𝑈𝐼𝐷 to obtain 𝑃̃𝐾𝑃𝑈𝐼𝐷 and records (𝑃𝑈𝐼𝐷, 𝑃̃𝐾𝑃𝑈𝐼𝐷, ⊥) in 𝐿𝑝𝑢𝑏. Otherwise, ℬ maintains (𝑃𝑈𝐼𝐷, 𝑃̃𝐾𝑃𝑈𝐼𝐷, ⊥) in 𝐿𝑝𝑢𝑏.

SKQuery oracle 𝒪𝑠𝑘: 𝒜1 publishes an identity 𝑃𝑈𝐼𝐷 for query, and ℬ maintains a list of the form 𝐿𝑠𝑘⟨𝑃𝑈𝐼𝐷, 𝑆𝐾𝑃𝑈𝐼𝐷⟩ as the answer to 𝒜1. If 𝑃𝑈𝐼𝐷 has already been queried, ℬ restores the corresponding record and returns 𝑆𝐾𝑃𝑈𝐼𝐷 to 𝒜1; otherwise, ℬ considers the following two cases:
Case 1: If |𝜔 ∩ 𝜔𝜋| ≥ 𝑑, ℬ aborts and outputs fault.
Case 2: If |𝜔 ∩ 𝜔𝜋| < 𝑑, ℬ returns the 𝑆𝐾𝑃𝑈𝐼𝐷 to 𝒜1 and saves the tuple (𝑃𝑈𝐼𝐷, 𝐷𝑃𝑈𝐼𝐷, 𝑥𝑃𝑈𝐼𝐷) in the 𝐿𝑠𝑘.

ReKeyGen oracle 𝒪𝑟𝑘: ℬ first searches whether the tuple (𝑃𝑈𝐼𝐷, 𝑃𝑈𝐼𝐷′, 𝑅𝐾𝜔→𝜔′) exists in the 𝐿𝑟𝑘. If so, ℬ returns 𝑅𝐾𝜔→𝜔′ to 𝒜1. Otherwise, we suppose that 𝒜1 has conducted the above series of queries when querying the random oracle, so when |𝜔 ∩ 𝜔𝜋| ≥ 𝑑, ℬ follows the steps below:
Case 1: When 𝛼1 = 1, ℬ follows the above steps to obtain 𝑃𝑈𝐼𝐷's public–private key pair (𝑆𝐾𝑃𝑈𝐼𝐷, 𝑃𝐾𝑃𝑈𝐼𝐷) and the public key 𝑃𝐾𝑃𝑈𝐼𝐷′ of 𝑃𝑈𝐼𝐷′. Then, ℬ calculates 𝜑 = 𝑒(𝐷𝑃𝑈𝐼𝐷, 𝐻1(𝑃𝑈𝐼𝐷′)) and the re-encryption key 𝑅𝐾𝜔→𝜔′ = −𝐷𝑃𝑈𝐼𝐷𝑗 − 𝑥𝑃𝑈𝐼𝐷𝑗𝐸𝑗 + 𝐻3(𝜑 ∥ 𝑥𝑃𝑈𝐼𝐷𝑗𝑃𝐾𝑃𝑈𝐼𝐷𝑗′ ∥ 𝜔 ∥ 𝜔′).
Case 2: When 𝛼1 = 0 and 𝛼2 = 1, ℬ's response fails.
Case 3: When 𝛼1 = 0 and 𝛼2 = 0, ℬ randomly selects 𝑅𝐾𝜔→𝜔′ ∈ G and returns it to 𝒜1.

Re-encryption oracle 𝒪𝑟𝑒𝑒𝑛: Suppose that the public key of 𝑃𝑈𝐼𝐷 has not been replaced and the original ciphertext is 𝐶𝑇 = (𝑈1, 𝑉1) at this time.
Case 1: If |𝜔 ∩ 𝜔𝜋| < 𝑑, ℬ aborts and outputs fault.
Case 2: If |𝜔 ∩ 𝜔𝜋| ≥ 𝑑, ℬ considers the following two cases:
(1) If 𝛼𝑢 = 1, ℬ aborts and outputs fault.
(2) If 𝛼𝑢 = 0, ℬ re-encrypts the 𝐶𝑇 into 𝐶𝑇′ = (𝑈1, 𝑉1 ⋅ 𝑒(𝑈1, 𝑅𝐾𝜔→𝜔′)) and sends it to 𝒜1.

∙ Challenge. 𝒜1 outputs 𝑃𝑈𝐼𝐷𝜋 and two messages of equal length (𝑚0, 𝑚1). If the flag variable 𝛼𝑢 ≠ 0 for the challenge identity 𝑃𝑈𝐼𝐷𝜋, ℬ fails in this game. Otherwise, ℬ randomly selects a message 𝑚𝑏, where 𝑏 ∈ {0, 1}, calculates the ciphertext 𝐶𝑇𝑏 = (𝑈𝑏, 𝑉𝑏) = (𝑏𝑃, 𝑚𝑏 ∏𝜔𝑖∈𝑆 𝑒(𝑃𝐾𝑃𝑈𝐼𝐷𝜋, 𝑡𝑖𝑏𝑃)𝑇^𝛥𝜔𝑖,𝑆(0)) and sends 𝐶𝑇𝑏 to 𝒜1.

∙ Phase 2. Adversary 𝒜1 initiates a series of queries similar to Phase 1, and ℬ responds accordingly. Please note that the queries issued by 𝒜1 in this phase must comply with the constraints in the security model.

∙ Guess. Once the adversary 𝒜1 provides a guess 𝑏′ ∈ {0, 1} for the challenge bit, ℬ outputs 1 if 𝑏′ = 𝑏 and 0 otherwise. □

Theorem 2. If adversary 𝒜2 breaks FCL-PRE with a non-negligible advantage 𝜀, we can construct an algorithm ℬ that solves the DBDH assumption in polynomial time with an advantage 𝜀′.

Proof. The proof is similar to that of Theorem 1; therefore, only the main differences are presented below.

∙ Initialization. ℬ returns the 𝑝𝑎𝑟𝑎𝑚𝑠 and 𝑚𝑠𝑘 = 𝑠 to 𝒜2. It should be noted that 𝒜2 represents the KGC, which has access to the partial private key computed by challenger ℬ. Therefore, in this case, there is no need to simulate the PartialPrivateKey algorithm or the hash function 𝐻1. Next, ℬ randomly chooses an integer 𝑟 ∈ [1, 𝑞𝐻2], and to the queries raised by 𝒜2, ℬ answers as follows:

𝐻2 Query: When 𝒜2 queries an existing 𝑃𝑈𝐼𝐷 in 𝐿2, ℬ responds with 𝑌𝑖; otherwise it considers the following two situations:
Case 1: If 𝑗 = 𝑟, ℬ computes 𝐻2(𝑃𝑈𝐼𝐷𝑗 ∥ 𝑃𝐾𝑃𝑈𝐼𝐷𝑗 ∥ 𝑃𝑝𝑢𝑏) = 𝑐𝑃 and returns it to 𝒜2.
Case 2: If 𝑗 ≠ 𝑟, ℬ randomly selects 𝑡𝑖 ∈ 𝑍𝑞, calculates 𝑌𝑖 = 𝑡𝑖𝑃, and returns it to 𝒜2. Finally, ℬ adds the tuple (𝑃𝑈𝐼𝐷, 𝑡𝑖, 𝑌𝑖) to 𝐿2.

∙ Phase 1. For a series of inquiries raised by 𝒜2, ℬ answers as follows.

PKQuery oracle 𝒪𝑝𝑘: 𝒜2 publishes an identity 𝑃𝑈𝐼𝐷 for query. ℬ first selects 𝜋 ∈ [1, 𝑞𝑝𝑢𝑏] randomly and defines 𝑃𝑈𝐼𝐷𝜋 as the challenge identity.
Case 1: If 𝑃𝑈𝐼𝐷 has been queried, ℬ restores the corresponding record and returns 𝑃𝐾𝑃𝑈𝐼𝐷 = 𝑥𝑃𝑈𝐼𝐷𝑃 to 𝒜2.
Case 2: If 𝑃𝑈𝐼𝐷 has not been queried, then ℬ considers the following scenario:
(1) If |𝜔 ∩ 𝜔𝜋| < 𝑑 and 𝑗 ≠ 𝜋, ℬ selects a random number 𝑥𝑖,𝑃𝑈𝐼𝐷𝑗 ∈ 𝑍𝑞 and a polynomial 𝑦(𝑥) of degree 𝑑 − 1 with 𝑦(0) = 𝑥𝑖,𝑃𝑈𝐼𝐷𝑗, and lets 𝑦(𝜔𝑖) = 𝑥𝑖,𝑃𝑈𝐼𝐷𝑗, where 𝑖 ∈ {1, …, 𝑛}. Next, ℬ calculates 𝑃𝐾𝑃𝑈𝐼𝐷 = 𝑥𝑃𝑈𝐼𝐷𝑃 and returns it to 𝒜2. Finally, ℬ saves the tuple (𝑃𝑈𝐼𝐷, (𝑥𝑖,𝑃𝑈𝐼𝐷𝑗)𝑛𝑖=1, 𝑃𝐾𝑃𝑈𝐼𝐷) to 𝐿𝑝𝑢𝑏.
(2) If |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 and 𝑗 = 𝜋, ℬ calculates 𝑃𝐾𝑃𝑈𝐼𝐷 = 𝑎𝑃 and returns it to the adversary 𝒜2. Finally, ℬ maintains the tuple (𝑃𝑈𝐼𝐷𝜋, (𝑥𝑖,𝑃𝑈𝐼𝐷𝑗)𝑛𝑖=1, 𝑃𝐾𝑃𝑈𝐼𝐷) in the 𝐿𝑝𝑢𝑏.

SKQuery oracle 𝒪𝑠𝑘: ℬ considers the following two cases:
Case 1: If 𝑃𝑈𝐼𝐷 has been queried, ℬ restores the corresponding record and returns 𝑆𝐾𝑃𝑈𝐼𝐷 to 𝒜2.
Case 2: If 𝑃𝑈𝐼𝐷 has not been queried, ℬ considers the following scenario:
(1) If |𝜔 ∩ 𝜔𝜋| < 𝑑 and 𝑗 ≠ 𝑟, ℬ makes sure that 𝒜2 has performed PKQuery and all hash queries. Then, ℬ calculates 𝐷𝑃𝑈𝐼𝐷 and returns the 𝑆𝐾𝑃𝑈𝐼𝐷 = (𝐷𝑃𝑈𝐼𝐷, 𝑥𝑃𝑈𝐼𝐷) to 𝒜2, while saving the tuple (𝑃𝑈𝐼𝐷, 𝐷𝑃𝑈𝐼𝐷, 𝑥𝑃𝑈𝐼𝐷) in the 𝐿𝑠𝑘.
(2) If |𝜔 ∩ 𝜔𝜋| ≥ 𝑑 and 𝑗 = 𝑟, ℬ aborts and outputs fault.

ReKeyGen oracle 𝒪𝑟𝑘: For the re-encryption key queries of 𝑃𝑈𝐼𝐷 and 𝑃𝑈𝐼𝐷′, when |𝜔 ∩ 𝜔𝜋| ≥ 𝑑, ℬ makes the following answer:
(1) If 𝑗 ≠ 𝑟, the challenger ℬ outputs the re-encryption key 𝑅𝐾𝜔→𝜔′ = −𝐷𝑃𝑈𝐼𝐷𝑗 − 𝑥𝑃𝑈𝐼𝐷𝑗𝐸𝑗 + 𝐻3(𝜑 ∥ 𝑥𝑃𝑈𝐼𝐷𝑗𝑃𝐾𝑃𝑈𝐼𝐷𝑗′ ∥ 𝜔 ∥ 𝜔′).
(2) If 𝑗 = 𝑟 and the private key of 𝑃𝑈𝐼𝐷 has been queried, ℬ responds with failure.
(3) If 𝑗 = 𝑟 and the private key of 𝑃𝑈𝐼𝐷 has not been queried, ℬ randomly selects 𝑅𝐾𝜔→𝜔′ ∈ G as the answer and returns it to 𝒜2.

∙ Challenge. 𝒜2 outputs 𝑃𝑈𝐼𝐷𝜋 and two messages of equal length (𝑚0, 𝑚1). If the challenge identity 𝑃𝑈𝐼𝐷𝜋 ≠ 𝑃𝑈𝐼𝐷𝑟, ℬ fails in this game. Otherwise, ℬ randomly selects a message 𝑚𝑏, where 𝑏 ∈ {0, 1}, calculates the ciphertext 𝐶𝑇𝑏 = (𝑈𝑏, 𝑉𝑏) = (𝑏𝑃, 𝑚𝑏 ∏𝜔𝑖∈𝑆 𝑒(𝑏𝑃, 𝑠𝐻1(𝑃𝑈𝐼𝐷𝜋))𝑇^𝛥𝜔𝑖,𝑆(0)) and sends 𝐶𝑇𝑏 to 𝒜2. □

6.2. Security properties of FCL-PRE

• Confidentiality. According to the above security proof, the proposed FCL-PRE scheme is IND-CPA secure in the random oracle model under the DBDH assumption. In addition, before re-encryption, the proxy CPS needs to authenticate registered users, and re-encryption is only allowed when the original ciphertext meets a certain condition, which further enhances the confidentiality of the scheme.
• Anonymity. FCL-PRE converts each user's real biometric identity 𝑈𝐼𝐷𝑗 into a pseudo-identity 𝑃𝑈𝐼𝐷𝑗 = 𝐸𝑛𝑐𝜙(𝑈𝐼𝐷𝑗 ∥ 𝑛𝑃𝑈𝐼𝐷𝑗) ∥ 𝑇𝑗 through a symmetric encryption algorithm for hiding. Therefore, if an adversary wishes to obtain 𝑈𝐼𝐷𝑗, he/she must first acquire the symmetric key 𝜙. However, in our scheme, only a trusted TA can extract 𝜙, thereby ensuring the anonymity of the user's real identity.
• Error tolerance. We employ secret sharing technology to divide the system master key 𝑠 and the secret value 𝑥𝑃𝑈𝐼𝐷𝑗 into 𝑛 independent components. Based on these components, the sender 𝑗 generates the final complete private key and the corresponding ciphertext. In the verification phase, the ciphertext can be re-encrypted if the attribute set contains at least 𝑑 valid attributes. Here, 𝑑 is defined as an error tolerance parameter, so as to achieve the system's error tolerance and enhance its robustness.
• Collusion Resistance. Given the commercial nature of cloud service providers, a potential risk arises that they may collude with the receiver 𝑗 to acquire 𝑗's private key 𝑆𝐾𝑃𝑈𝐼𝐷𝑗 = (𝐷𝑃𝑈𝐼𝐷𝑗, 𝑥𝑃𝑈𝐼𝐷𝑗). However, under the threshold secret sharing, collusion between 𝑗 and CPS is infeasible. First, 𝑗's full private key consists of a partial private key 𝐷𝑃𝑈𝐼𝐷𝑗 and a secret value 𝑥𝑃𝑈𝐼𝐷𝑗, both of which are divided into 𝑛 components. This means that at least 𝑡 attribute shards must be obtained to recover one of the keys. Second, even if the colluder obtains 𝑥𝑃𝑈𝐼𝐷𝑗, they cannot deduce the sender's partial private key 𝐷𝑃𝑈𝐼𝐷𝑗, because 𝐷𝑃𝑈𝐼𝐷𝑗 = 𝑠𝐻1(𝑃𝑈𝐼𝐷𝑗), where 𝑠 is the master key. Since the master key 𝑠 is unknown to the colluder, they cannot calculate 𝐷𝑃𝑈𝐼𝐷𝑗.

7. Performance evaluation

This section provides a systematic performance evaluation of FCL-PRE and other related schemes from both theoretical and experimental perspectives. First, we built an experimental system on Ubuntu 20.10, using Python 3.10 and SageMath 9.8, setting the security parameter to 𝜆 = 256. The chosen elliptic curve 𝐸∕𝐹𝑝 is defined by the simplified Weierstrass equation 𝑦² = 𝑥³ + 𝑎𝑥 + 𝑏.

7.1. Theoretical analysis

Table 3 compares the number of modular exponentiations, scalar multiplications, and bilinear pairings for FCL-PRE, YDKR21 [43], FLWL24 [24], and ZZYL20 [44], to assess the computational overhead at different stages. All three references adopt CL-PRE in data-sharing scenarios. In the following, we focus on the major computational overhead on the sender side 𝑗.

Encryption: The efficiency ranking is YDKR21 [43] < FLWL24 [24] < Ours < ZZYL20 [44]. Since the biometric characteristic 𝑏𝑖𝑜 inevitably contains noise during collection, FCL-PRE binds each registered user's pseudo-identity to an attribute set {𝜔𝑖}𝑛𝑖=1. Consequently, during encryption, 𝑗 must bind attribute fragments to the message, ensuring both data confidentiality and system error tolerance.

ReKey Generation: The efficiency ranking is YDKR21 [43] < ZZYL20 [44] < Ours < FLWL24 [24]. In FCL-PRE, users are allowed to omit or update some attributes during key generation, eliminating the extra computational overhead associated with regenerating public–private key pairs. Moreover, even if the proxy CPS colludes with the receiver, it cannot deduce the user's real identity from the re-encryption key.

Decrypt1: The efficiency ranking is ZZYL20 [44] < YDKR21 [43] < FLWL24 [24] = Ours. Compared to ZZYL20 [44] and YDKR21 [43], FCL-PRE improves the decryption efficiency on the sender side 𝑗 by 40.57% and 44.6%, respectively, significantly reducing the computational burden.

In summary, by integrating certificateless encryption with secret sharing technology, FCL-PRE enhances user privacy and system error tolerance while effectively addressing the stringent privacy requirements in cloud-based data-sharing scenarios.

7.2. Experimental analysis

Computational overhead. To ensure the objectivity and accuracy of our results, we excluded the Setup algorithm from the experiment, as it is executed only once and has a negligible impact on the user encryption experience. For the remaining algorithms, each was executed 100 times, and the average execution time was recorded. Fig. 4 reports the execution time of all main stages in our scheme as a function of the number of receivers/messages. Specifically, Fig. 4(a)–(c) show the sender-side costs, including Encryption time, ReKey Generation time, and Decrypt1 time, respectively. Fig. 4(d) presents the Re-encryption time at the cloud proxy server, while Fig. 4(e) depicts the Decrypt2 time at the authorized receiver. Fig. 4(f) summarizes the total computational overhead across all parties. As the number of receivers/messages increases, all stages exhibit approximately linear growth. Our FCL-PRE scheme consistently incurs lower decryption time, re-encryption time, and overall computational cost than the compared schemes, as illustrated in Fig. 4(c), (d), and (f). These results demonstrate that FCL-PRE achieves better efficiency and scalability, particularly in multi-receiver settings.

Communication overhead. Table 3 compares the communication overhead of YDKR21 [43], FLWL24 [24], ZZYL20 [44], and our proposed scheme. The storage and transmission overheads of the data sender and cloud proxy server, including the original ciphertext, re-encryption key, and re-encrypted ciphertext, are discussed in detail.
Table 3
Comparison of cryptographic operations of related schemes (Encryption/ReKeyGen/Re-encryption/Decrypt1/Decrypt2: computational cost; CT1/CT2/ReKey: communication cost).
Scheme        Encryption           ReKeyGen            Re-encryption   Decrypt1            Decrypt2           CT1                   CT2             ReKey
YDKR21 [43]   𝑇𝑝 + 8𝑇𝑒             6𝑇𝑒                 2𝑇𝑝 + 2𝑇𝑒       𝑇𝑝 + 𝑇𝑒             𝑇𝑝 + 2𝑇𝑒           3|G| + 2|G𝑇|          4|G| + 2|G𝑇|    6|G| + 4|𝑍𝑞|
FLWL24 [24]   𝑇𝑝 + 3𝑇𝑒             2𝑇𝑒                 2𝑇𝑝             𝑇𝑝                  2𝑇𝑒                2|G| + |G𝑇|           3|G𝑇|           |G|
ZZYL20 [44]   2𝑇𝑒 + 𝑇𝑠𝑚            𝑇𝑝 + 3𝑇𝑒 + 𝑇𝑠𝑚      𝑇𝑝              𝑇𝑝 + 𝑇𝑒 + 𝑇𝑠𝑚       𝑇𝑝 + 𝑇𝑒 + 𝑇𝑠𝑚      2|G| + |𝑍𝑞|           2|G| + |𝑍𝑞|     |𝑍𝑞|
Ours          2𝑇𝑝 + 𝑇𝑒 + 2𝑇𝑠𝑚      𝑇𝑝 + 𝑇𝑒             𝑇𝑝              𝑇𝑝                  2𝑇𝑝                |G| + |G𝑇| + |𝑍𝑞|     |G| + |G𝑇|      |G| + 2|𝑍𝑞|
Fig. 4. The execution time of each phase: (a) Encryption; (b) ReKey Generation; (c) Decrypt1; (d) Re-encryption; (e) Decrypt2; (f) total execution time.
Fig. 5. Communication overhead comparison: (a) original ciphertext; (b) re-encrypted ciphertext; (c) re-encryption key.
Sender side: Regarding the transmission of the original ciphertext, our proposed scheme and ZZYL20 [44] achieve the lowest communication cost, as shown in Fig. 5(a). Although our scheme incurs slightly higher communication overhead for the transmission of the re-encryption key compared to ZZYL20 [44], it is worth noting that ZZYL20 pre-generates and stores the re-encryption key in the cloud, which may lead to a potential risk of key misuse. As we can see in Fig. 5(c), FCL-PRE requires only KB-level storage, making it well-suited for resource-constrained mobile devices without imposing a significant burden on the sender side.

Cloud proxy server (CPS) side: For the storage of re-encrypted ciphertext, our scheme also demonstrates the lowest communication cost, as
shown in Fig. 5(b). Even when the number of designated recipients is relatively large, i.e., 50 receivers, FCL-PRE requires only 12.5 KB of communication overhead at the CPS side. It indicates that FCL-PRE not only effectively minimizes the cloud's communication burden but also ensures a flexible and reliable sharing mechanism without compromising data security.

8. Conclusion

In this paper, we propose FCL-PRE, a fuzzy certificateless proxy re-encryption scheme that facilitates flexible key management while ensuring efficient and secure data sharing. By integrating anonymous biometric recognition, our approach conceals users' real identities, achieving effective conditional privacy and bolstering system error tolerance. Notably, we prevent malicious re-encryption requests by verifying the signature, while secret sharing technology enhances collusion resistance. Moreover, a formal security analysis under the random oracle model demonstrates that FCL-PRE resists chosen-plaintext attacks. Compared to existing schemes, FCL-PRE significantly reduces computational and communication overhead, achieving the lowest total computational cost and ciphertext storage overhead. In future work, we aim to optimize dynamic user revocation and enhance adaptability to real-world cloud environments with more complex access policies.

CRediT authorship contribution statement

Jiasheng Chen: Writing – original draft, Software, Methodology, Investigation, Formal analysis, Conceptualization. Zhenfu Cao: Writing – review & editing, Supervision, Resources, Funding acquisition. Liangliang Wang: Writing – review & editing, Validation, Methodology, Formal analysis, Data curation. Jiachen Shen: Validation, Supervision, Formal analysis. Xiaolei Dong: Validation, Funding acquisition, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62132005, 62172162), in part by the Shanghai Trusted Industry Internet Software Collaborative Innovation Center, in part by the Fundamental Research Funds for the Central Universities, and in part by the Police Integration Computing Key Laboratory of Sichuan Province (Grant No. JWRH202401001).

Data availability

Data will be made available on request.

References

[1] Shuzhou Sun, Hui Ma, Zishuai Song, Rui Zhang, WebCloud: Web-based cloud storage for secure data sharing across platforms, IEEE Trans. Dependable Secur. Comput. 19 (3) (2020) 1871–1884.
[2] Maithilee Joshi, Karuna P. Joshi, Tim Finin, Delegated authorization framework for EHR services using attribute-based encryption, IEEE Trans. Serv. Comput. 14 (6) (2019) 1612–1623.
[3] Yinbin Miao, Robert H. Deng, Ximeng Liu, Kim-Kwang Raymond Choo, Hongjun Wu, Hongwei Li, Multi-authority attribute-based keyword search over encrypted cloud data, IEEE Trans. Dependable Secur. Comput. 18 (4) (2019) 1667–1680.
[4] Matt Blaze, Gerrit Bleumer, Martin Strauss, Divertible protocols and atomic proxy cryptography, in: International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 1998, pp. 127–144.
[5] Matthew Green, Giuseppe Ateniese, Identity-based proxy re-encryption, in: Applied Cryptography and Network Security: 5th International Conference, ACNS 2007, Zhuhai, China, June 5–8, 2007, Springer, 2007, pp. 288–306.
[6] Chunpeng Ge, Willy Susilo, Jiandong Wang, Liming Fang, Identity-based conditional proxy re-encryption with fine-grained policy, Comput. Stand. Interfaces 52 (2017) 1–9.
[7] Hongmei Pei, Peng Yang, Weihao Li, Miao Du, Zhongjian Hu, Proxy re-encryption for secure data sharing with blockchain in internet of medical things, Comput. Netw. 245 (2024) 110373.
[8] Guijiang Liu, Haibo Xie, Wenming Wang, Haiping Huang, A secure and efficient electronic medical record data sharing scheme based on blockchain and proxy re-encryption, J. Cloud Comput. 13 (1) (2024) 44.
[9] Anca-Andreea Ivan, Yevgeniy Dodis, Proxy cryptography revisited, in: NDSS, 2003.
[10] Yang Lu, Efficient certificate-based proxy re-encryption scheme for data sharing in public clouds, KSII Trans. Internet Inf. Syst. (TIIS) 9 (7) (2015) 2703–2718.
[11] Zhiguang Qin, Hu Xiong, Shikun Wu, Jennifer Batamuliza, A survey of proxy re-encryption for secure data sharing in cloud computing, IEEE Trans. Serv. Comput. (2016) 1–18.
[12] Giuseppe Ateniese, Kevin Fu, Matthew Green, Susan Hohenberger, Improved proxy re-encryption schemes with applications to secure distributed storage, ACM Trans. Inf. Syst. Secur. (TISSEC) 9 (1) (2006) 1–30.
[13] Craig Gentry, Certificate-based encryption and the certificate revocation problem, in: International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2003, pp. 272–293.
[14] Chul Sur, Youngho Park, Sang Uk Shin, Kyung Hyune Rhee, Changho Seo, Certificate-based proxy re-encryption for public cloud storage, in: 2013 Seventh International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, IEEE, 2013, pp. 159–166.
[15] Chunpeng Ge, Zhe Liu, Jinyue Xia, Liming Fang, Revocable identity-based broadcast proxy re-encryption for data sharing in clouds, IEEE Trans. Dependable Secur. Comput. 18 (3) (2019) 1214–1226.
[16] Jing Zhang, Shuangshuang Su, Hong Zhong, Jie Cui, Debiao He, Identity-based broadcast proxy re-encryption for flexible data sharing in VANETs, IEEE Trans. Inf. Forensics Secur. 18 (2023) 4830–4842.
[17] Jiguo Li, Xuexia Zhao, Yichen Zhang, Certificate-based conditional proxy re-encryption, in: International Conference on Network and System Security, Springer, 2015, pp. 299–310.
[18] Jun Shao, Peng Liu, Yuan Zhou, Achieving key privacy without losing CCA security in proxy re-encryption, J. Syst. Softw. 85 (3) (2012) 655–665.
[19] Jian Weng, Robert H. Deng, Xuhua Ding, Cheng-Kang Chu, Junzuo Lai, Conditional proxy re-encryption secure against chosen-ciphertext attack, in: Proceedings of the 4th International Symposium on Information, Computer, and Communications Security, 2009, pp. 322–332.
[20] Cui Li, Rongmao Chen, Yi Wang, Qianqian Xing, Baosheng Wang, REEDS: An efficient revocable end-to-end encrypted message distribution system for IoT, IEEE Trans. Dependable Secur. Comput. 21 (5) (2024) 4526–4542.
[21] Shimao Yao, Ralph Voltaire J. Dayot, In-Ho Ra, Liya Xu, Zhuolin Mei, Jiaoli Shi, An identity-based proxy re-encryption scheme with single-hop conditional delegation and multi-hop ciphertext evolution for secure cloud data sharing, IEEE Trans. Inf. Forensics Secur. 18 (2023) 3833–3848.
[22] Giuseppe Ateniese, Karyn Benson, Susan Hohenberger, Key-private proxy re-encryption, in: Cryptographers' Track at the RSA Conference, Springer, 2009, pp. 279–294.
[23] Chengdong Ren, Xiaolei Dong, Jiachen Shen, Zhenfu Cao, Yuanjian Zhou, CLAP-PRE: Certificateless autonomous path proxy re-encryption for data sharing in the cloud, Appl. Sci. 12 (9) (2022) 4353.
[24] Jingyu Feng, Yue Li, Teng Wang, Shuanggen Liu, A certificateless threshold proxy re-encrypted data sharing scheme with cloud-chain collaboration in industrial internet environments, IEEE Internet Things J. 11 (20) (2024) 33247–33268.
[25] Liqing Chen, Meng Zhang, Jiguo Li, Conditional identity-based broadcast proxy re-encryption with anonymity and revocation, IEEE Trans. Reliab. 74 (3) (2025) 3573–3584.
[26] Liming Fang, Jiandong Wang, Chunpeng Ge, Yongjun Ren, Fuzzy conditional proxy re-encryption, Sci. China Inf. Sci. 56 (5) (2013) 1–13.
[27] BaoHong Li, JieFei Xu, YanZhi Liu, Lattice-based fuzzy conditional proxy re-encryption, J. Internet Technol. 20 (5) (2019) 1379–1385.
[28] Binhan Li, Lunzhi Deng, Yiming Mou, Na Wang, Yanli Chen, Siwei Li, A pairing-free data sharing scheme based on certificateless conditional broadcast proxy re-encryption suitable for cloud-assisted IoT, IEEE Internet Things J. 12 (20) (2025) 42754–42768.
[29] Yousheng Zhou, Yurong Li, Yuanni Liu, A certificateless and dynamic conditional proxy re-encryption-based data sharing scheme for IoT cloud, J. Internet Technol. 26 (2) (2025) 165–172.
[30] Shi Lin, Li Cui, Niu Ke, End-to-end encrypted message distribution system for the Internet of Things based on conditional proxy re-encryption, Sensors 24 (2) (2024) 1–16.
[31] Yongjing Zhang, Zhouyang Zhang, Shan Ji, Shenqing Wang, Shitao Huang, Conditional proxy re-encryption-based key sharing mechanism for clustered federated learning, Electronics 13 (5) (2024) 848.
[32] Chul Sur, Chae Duk Jung, Youngho Park, Kyung Hyune Rhee, Chosen-ciphertext secure certificateless proxy re-encryption, in: IFIP International Conference on Communications and Multimedia Security, Springer, 2010, pp. 214–232.
[33] Sattam S. Al-Riyami, Kenneth G. Paterson, Certificateless public key cryptography, in: International Conference on the Theory and Application of Cryptology and Information Security, Springer, 2003, pp. 452–473.
[34] Tarunpreet Bhatia, Anil K. Verma, Gaurav Sharma, Secure sharing of mobile personal healthcare records using certificateless proxy re-encryption in cloud, Trans. Emerg. Telecommun. Technol. 29 (6) (2018) e3309.
[35] Nabeil Eltayieb, Liang Sun, Ke Wang, Fagen Li, A certificateless proxy re-encryption scheme for cloud-based blockchain, in: Frontiers in Cyber Security: Second International Conference, FCS 2019, Xi'an, China, November 15–17, 2019, Proceedings 2, Springer, 2019, pp. 293–307.
[36] Emmanuel Ahene, Junfeng Dai, Hao Feng, Fagen Li, A certificateless signcryption with proxy re-encryption for practical access control in cloud-based reliable smart grid, Telecommun. Syst. 70 (2019) 491–510.
[37] Amit Sahai, Brent Waters, Fuzzy identity-based encryption, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2005, pp. 457–473.
[38] Hu Xiong, YaNan Chen, GuoBin Zhu, ZhiGuang Qin, Analysis and improvement of a provable secure fuzzy identity-based signature scheme, Sci. China Inf. Sci. 57 (2014) 1–5.
[39] Liangliang Wang, Jiangwei Xu, Baodong Qin, Mi Wen, Kefei Chen, An efficient fuzzy certificateless signature-based authentication scheme using anonymous biometric identities for VANETs, IEEE Trans. Dependable Secur. Comput. 22 (1) (2024) 292–307.
[40] Dan Boneh, Matt Franklin, Identity-based encryption from the Weil pairing, in: Annual International Cryptology Conference, Springer, 2001, pp. 213–229.
[41] Adi Shamir, How to share a secret, Commun. ACM 22 (11) (1979) 612–613.
[42] Sattam S. Al-Riyami, Kenneth G. Paterson, Certificateless public key cryptography, in: Chi-Sung Laih (Ed.), Advances in Cryptology – ASIACRYPT 2003, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003, pp. 452–473.
[43] Shimao Yao, Ralph Voltaire J. Dayot, Hyung-Jin Kim, In-Ho Ra, A novel revocable and identity-based conditional proxy re-encryption scheme with ciphertext evolution for secure cloud data sharing, IEEE Access 9 (2021) 42801–42816.
[44] Xiaoyu Zheng, Yuyang Zhou, Yalan Ye, Fagen Li, A cloud data deduplication scheme based on certificateless proxy re-encryption, J. Syst. Archit. 102 (2020) 101666.

Jiasheng Chen is currently pursuing the Ph.D. degree with the Department of Cryptography and Cyber Security, School of Software Engineering, East China Normal University, Shanghai, China. Her research interests include applied cryptography and information security.

Zhenfu Cao is currently a Distinguished Professor with East China Normal University, China. Since 1981, he has published over 400 academic papers in journals or conferences. His research interests include cryptography, number theory, and information security. He has received a number of awards, including the Ying-Tung Fok Young Teacher Award in 1989, the National Outstanding Youth Fund of China in 2002, and the Special Allowance by the State Council in 2005. He was a co-recipient of the 2007 IEEE International Conference on Communications Computer Award.

Liangliang Wang received the Ph.D. degree from Shanghai Jiao Tong University in 2016. He has published academic papers in prestigious venues including IEEE Transactions on Dependable and Secure Computing, IEEE Transactions on Vehicular Technology, IEEE Internet of Things Journal, Knowledge-Based Systems and SCIENCE CHINA Information Sciences. He is currently an Associate Professor with the College of Computer Science and Technology, Shanghai University of Electric Power. His research interests include applied cryptography, information security and privacy preserving.

Jiachen Shen received the bachelor's degree from Shanghai Jiao Tong University, Shanghai, China, in 2001, and the master's and Ph.D. degrees from the University of Louisiana at Lafayette, Lafayette, LA, USA, in 2003 and 2008, respectively. He joined East China Normal University, Shanghai, China, in 2015. His research interests include applied cryptography, cloud security, searchable encryption, and blockchains.

Xiaolei Dong is currently a Distinguished Professor with East China Normal University. She hosts a number of research projects supported by the National Basic Research Program of China (973 Program), the National Natural Science Foundation of China, and the Special Funds on Information Security of the National Development and Reform Commission. Her research interests include cryptography, number theory, and trusted computing.
Journal of Systems Architecture 160 (2025) 103348
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
StorStack: A full-stack design for in-storage file systems
Juncheng Hu, Shuo Chen, Haoyang Wei, Guoyu Wang, Chenju Pei, Xilong Che
College of Computer Science and Technology, Jilin University, Changchun, 130022, China
ARTICLE INFO

Keywords:
File system
In-storage Computing
Storage-class Memory

ABSTRACT

Due to the increasingly significant cost of data movement, In-storage Computing has attracted considerable attention in academia. While most In-storage Computing works allow direct data processing, these methods do not completely eliminate the participation of the CPU during file access, and data still needs to be moved from the file system into memory for processing. Even though there are attempts to put file systems into storage devices to solve this problem, the performance of the system is not ideal when facing high-latency storage devices due to bypassing the kernel and lacking a page cache.

To address the above issues, we propose StorStack, a full-stack, highly configurable in-storage file system framework and simulator that facilitates architecture- and system-level research. By offloading the file system into the storage device, the file system can be closer to the data, reducing the overhead of data movement. Meanwhile, it also avoids kernel traps and reduces communication overhead. More importantly, this design enables In-storage Computing applications to completely eliminate CPU participation. StorStack also designs a user-level cache to maintain performance when storage device access latency is high. To study performance, we implement a StorStack prototype and evaluate it under various benchmarks on QEMU and Linux. The results show that StorStack achieves up to 7x performance improvement with direct access and 5.2x with cache.
1. Introduction

In traditional computing architectures, data must be transferred from storage devices to memory for processing, which not only consumes the computing resources of the host, but also results in high energy consumption and I/O latency. As data scales continue to expand, In-storage Computing has been proposed to alleviate the pressure of data movement [1,2]. The core idea is to perform computations directly where the data is stored, without the need to move the data. The emergence of high-speed storage devices like SSDs [3] and SCMs [4,5] has significantly advanced research in In-storage Computing and transformed computer storage systems. To fully leverage the potential of storage systems and exploit the characteristics of this new computing paradigm, a redesign of storage stack software is required.

As the most essential part of the storage stack software, file systems have been residing in the operating system kernel for a very long time because they need to perform integrity assurance and access control to ensure data security. The kernel is considered a trusted area compared to the user space. However, this seemingly good design has been challenged by new technologies. With the emergence of faster storage devices such as SSDs and SCMs, access latency decreases significantly compared to HDDs [6], leading to the software overhead of file systems [7,8] becoming a major performance bottleneck. Meanwhile, the design and operation of file systems determine their reliance on the CPU when accessing the file system. For In-storage Computing, although researchers are gradually reducing CPU involvement, current file systems still rely on the CPU to handle complex file management tasks and ensure system security and integrity.

On the one hand, to reduce the software overhead of file systems, many works aim at the kernel trap. For example, there are some efforts to move the file system into user space [8-13]. But running in user space may compromise the reliability of the file system, hence bugs or malicious software may cause crashes and data loss. Some of these works try to move the critical parts of the file system back to the kernel. But in most cases, data-plane operations are interleaved with control-plane operations, which may diminish the performance improvement brought by kernel bypassing. In recent years, firmware file systems have been proposed, which move file systems onto the storage device controller [14-16] to completely get rid of the kernel trap. However, those file systems are designed to be strongly coupled with the storage device, making the device lack the replaceability of the file system and compatibility with conventional operating systems. In addition, these firmware file systems do not provide comprehensive security guarantees.
* Corresponding author.
E-mail addresses: jchu@jlu.edu.cn (J. Hu), chenshuo22@mails.jlu.edu.cn (S. Chen), hywei23@mails.jlu.edu.cn (H. Wei), wgy21@mails.jlu.edu.cn
(G. Wang), peicj2121@mails.jlu.edu.cn (C. Pei), chexilong@jlu.edu.cn (X. Che).
https://doi.org/10.1016/j.sysarc.2025.103348
Received 29 August 2024; Received in revised form 24 November 2024; Accepted 18 January 2025
Available online 27 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
On the other hand, to fully leverage the advantages of In-storage Computing, it is necessary to eliminate the participation of the host-side OS from the storage access path. In-storage Computing advocates for a data-centric approach, where computation units are embedded within the storage devices to enable direct data processing. However, in the process of accessing files, traditional file systems still require CPU involvement. To know which data should be transferred next, file access should first be handled by the host-side file system in the operating system kernel. This CPU intervention limits the computational capacity improvements that In-storage Computing can offer.

Another point worth noting is that numerous studies propose improving system performance by allowing user applications to bypass the kernel and communicate directly with storage devices. This method demonstrates significant performance improvements when dealing with high-speed storage devices. However, due to the diversity of storage devices and their varying latencies, system performance may suffer when bypassing the high-speed cache, especially when using high-latency, low-speed storage devices. Therefore, the impact of cache configuration on performance is also a subject of our further research. In summary, despite various attempts to optimize file system performance and reduce CPU involvement, current solutions still have several issues.

To further optimize the performance and security of file systems and fully unleash the potential of in-storage computing, we propose StorStack, a full-stack, highly configurable in-storage file system framework and simulator on high-speed storage devices such as SSDs and SCMs. Since file systems always have a fixed primary functionality of managing the data mapping, which is similar in function to the flash translation layer (FTL) on the storage controller, we consider it natural and reasonable to run the file system on the storage controller.

StorStack has three main components: a device firmware runtime for file systems, enabling file systems to run directly on the storage device; a user library to expose POSIX interfaces to user applications; and a kernel driver to guarantee access control. By moving the file system into the storage, StorStack aims to gain performance improvement from the concept of In-storage Computing that brings the file system closer to the data. Moreover, the file system code is removed from the kernel, which avoids the latency and context switches caused by kernel traps during file access. More importantly, StorStack can remove the CPU from the storage access path of In-storage Computing applications, maximizing the potential of In-storage Computing. To ensure the security and reliability of the file system, StorStack designs an efficient security mechanism, introducing the device-side controller as the runtime and retaining control-plane operations within the host kernel. By reducing the ratio of control-plane to data-plane operations, kernel traps are minimized, enhancing performance. StorStack also includes a user-level cache to explore the impact of caching on the performance of in-storage file systems.

We implemented StorStack as a prototype and evaluated it on QEMU and Linux 5.15. Experimental results demonstrate that StorStack performs up to 5.2x faster than Ext4 with cache and 7x faster with direct access. Regarding the cache, we find that as access latency increases, file systems with cache always maintain high speeds, whereas the speed of file systems without cache decreases significantly.

2. Background and related work

The storage or memory system has changed a lot in the past decades. With the development of speed, capacity, and size, and the emergence of new types of storage, a rethink of both hardware and software is required to exploit the potential of the system in the next era. In this section, we first discuss the trends of two novel high-speed non-volatile storage technologies, and then explore the significance of applying In-storage Computing on these storage devices. Finally, we briefly introduce three file systems in different locations.

2.1. Hardware trends

Compared to the large, slow HDD, the solid-state drive (SSD) is a kind of flash-based non-volatile storage with small form factor, high speed, and low energy costs [17,18]. SSDs on the market today can provide up to 30 TB of capacity and 7 GB/s throughput on sequential read/write. To fully exploit the high performance, modern SSDs have switched from SATA to PCIe and NVMe. PCIe 5.0 [19] supports up to 16 lanes and a 32 GT/s data rate, which leads to more than 60 GB/s bandwidth. NVMe [3] is a communication protocol for non-volatile memories attached via PCIe, supporting up to 65,535 I/O queues each with 65,535 depth. It also supports SSD-friendly operations like ZNS and KV, which can further enhance SSD throughput capabilities.

Storage class memory (SCM), also referred to as persistent memory (PMEM) or non-volatile memory (NVM), is a different type of storage device that is fast and byte-addressable like DRAM, but can also retain data without power like SSDs. Various technologies such as PRAM [20,21], MRAM [22], and ReRAM [23,24] have been explored to implement SCM, each exhibiting different performance characteristics. SCM provides higher bandwidth than SSD; it offers latency close to DRAM, and its capacity falls between SSD and DRAM [25]. As new blood in the storage hierarchy, SCM can provide more possibilities to multiple workloads [26-29].

Consequently, while the increased bandwidth and reduced latency of storage devices have substantially boosted the performance of computer systems and enabled novel application scenarios, these advancements also introduce several challenges. These challenges include heightened complexity in data management, the need to balance cost and efficiency, and issues related to technical compatibility and migration.

2.2. In-storage computing

While these new storage devices have significantly altered the memory hierarchy of computer systems, the memory wall between the CPU and off-chip memory is still the bottleneck of the whole system, especially with the rise of data-intensive workloads and the slowdown of Moore's law and Dennard scaling. To reduce the overhead of data movement, In-storage Computing (ISC) [30-32] is proposed, gaining increasing attention with advancements in integration technologies. However, most current research predominantly focuses on offloading user-defined tasks to storage devices, and this approach still faces limitations in practice.

First, existing ISC methods exhibit significant shortcomings in terms of compatibility and portability. On the host side, developers must design custom APIs for ISC, which are incompatible with existing system interfaces such as POSIX, demanding substantial modifications to the host code [32]. On the drive side, the drive program either collaborates with the host file system to access the correct file data [33] or manages the drive as a bare block device without a file system. However, most systems still rely on file-system-based external storage access, with the file system typically running on the CPU. Consequently, ISC tasks often require CPU involvement when accessing external storage data.

Secondly, current approaches lack adequate protection and isolation for ISC applications. To fully leverage the high speed of modern storage devices, multiple ISC applications may need to execute concurrently. Without proper data protection mechanisms, malicious or erroneous ISC tasks could access unauthorized data. Without isolation, the execution of one ISC task could compromise the performance and security of others. However, most existing research [1,34,35] assumes that ISC tasks operate in an exclusive execution environment, failing to address these concerns effectively. Additionally, when specific code is offloaded to storage devices, attackers can exploit vulnerabilities in in-storage software and hardware firmware, such as buffer overflows [36,37] or bus snooping attacks, to escalate privileges and harm the system.
2.3. File system

The evolution of storage hardware poses higher demands on software systems. As a crucial part of the software stack of the storage system, file systems should be redesigned to minimize software overheads, especially the involvement of the OS kernel on the data path. Many efforts have explored the possibility of different file system locations.

Kernel file systems. Numerous typical file systems are implemented inside the kernel as kernel file systems, including Ext4, XFS, etc. Due to the isolation of kernel space, kernel file systems can easily manage data and metadata with reliability guarantees [38]. Recent works on kernel file systems have sought to exploit the capabilities of modern storage devices. For example, F2FS [39] is built on append-only logging to adapt to the characteristics of flash memory. PMFS [38] introduces a new hardware primitive to avoid the consistency issues caused by the CPU cache while accessing SCM. DAX [40] bypasses the buffer cache of the system to support direct access to the storage hardware so that the redundant data movement between DRAM and SCM is removed. NOVA [41] explores the hybrid of DRAM and SCM as a specially designed log-structured file system. However, kernel file systems have several limitations. Firstly, the development and debugging process within kernel space is inherently complex and difficult. Furthermore, every file system access necessitates a kernel trap, which inevitably introduces latency. Additionally, the frequent context switching between user processes and the kernel increases CPU overhead.

User-space file systems. User-space file systems are implemented mostly in user space to bypass the kernel and reduce the overhead associated with kernel traps. However, since most user-space file systems are implemented in untrusted environments, ensuring data security and reliability becomes challenging. User-space file systems need sophisticated design, usually the collaboration between kernel space and user space, to keep them reliable. For example, Strata [11] separates the file system into a per-process user-space update log for concurrent writing and a read-only kernel-space shared area for data persistence. Moneta-D [9] provides hardware virtual channel support with a kernel-space file system protection policy and a user-space driver to access the hardware. There are also efforts to implement the control plane of the file system as a trusted user-space process [8,12].

Firmware file systems. Works that offload part or the whole of the file system into the storage device firmware are categorized as firmware file systems. There are three representative works on firmware file systems: DevFS [14], CrossFS [15], and FusionFS [16]. DevFS and CrossFS explore the possibility of moving the file system to the storage side to benefit from kernel bypass. FusionFS goes further than the previous two works and attempts to gain performance by combining multiple storage access operations. However, we have identified several problems with these file systems. First, these firmware file systems are tightly coupled with specific storage devices, which makes it hard for users to select alternative file systems or upgrade the software version of the current file system. Second, none of these file systems are designed to operate effectively in scenarios with significant communication latency. Third, the lack of security mechanisms limits their applicability in real-world environments.

2.4. Motivation

Although kernel file systems are well-designed and time-tested, their design principles, which assume high device access latency, are no longer suitable for modern high-speed devices. User-space file systems and firmware file systems have explored new approaches to file system implementation in the era of high-speed storage; however, they may lead to inferior performance with traditional devices, compromised security controls, or inflexible, non-replaceable file systems. To address these issues, we introduce StorStack, a fast, flexible, and secure in-storage file system framework. The detailed comparison between StorStack and previous file systems is shown in Table 1.

3. Design

In this section, we first discuss the design principles of StorStack, followed by an overview of its architecture, the connection between host and device, scheduling mechanisms, and reliability designs.

3.1. Principles

1. Provide a full-stack framework to enable in-storage file systems without compromising performance. To support in-storage FS, StorStack's design includes a user library, a kernel driver, and a firmware FS runtime. By bringing FS code out of the kernel and closer to the data, StorStack avoids the kernel trap and reduces the communication overhead. StorStack also incorporates a user-level cache to maintain the performance when the access latency of the device is high.

2. Make full use of the heterogeneity of the host CPU and storage device controller. The in-storage FS yields the host CPU time to user application code and cuts the energy cost, while conflicts due to concurrent access are resolved on the host CPU to maintain the performance. If necessary, the cache is also retained on the host side and is managed by the user space. Such a heterogeneous system can maximize the overall performance and minimize the power consumption of the system.

3. Guarantee the reliability of the file system with minimal overhead. To provide essential guarantees such as permission checking, StorStack keeps its control plane within the trusted area. Additionally, to enhance performance, a token mechanism is introduced to prevent StorStack from accessing the kernel during data-plane operations.

4. Keep compatible with conventional operating systems. The design of StorStack does not require changes to current operating systems. Instead, the user lib and kernel driver of StorStack are add-ons. Even without them, the StorStack storage device can be accessed with typical block- or byte-based interfaces, just like traditional SSDs or SCMs. StorStack also supports per-partition replaceable file systems, which is a regular function in current operating systems but is not supported by firmware file systems.

5. Support heterogeneous computing. By providing a device-level file interface, StorStack may enable multiple advanced heterogeneous access patterns, including In-storage Computing (ISC) [31,32,42,43] and direct I/O access from GPUs [44,45] or NICs [42,46]. In this work, we provide basic support for these patterns and plan to further explore them in future research.

6. Run with a reasonable hardware setup on the storage device. Previous research on firmware file systems has assumed that device controller hardware capabilities are severely limited. However, today's high-end storage devices feature up to 4 cores and DRAM capacity that can reach 1% of their storage capacity [47]. As in-storage processing evolves, hardware configurations will continue to improve [30,43,48-50]. In StorStack, we assume that the device possesses sufficient capabilities to run file systems alongside a runtime environment. Future research can investigate the benefits of integrating in-storage file systems with additional device-side capabilities, such as power loss protection capacitors or the flash translation layer.

3.2. Architecture

To support in-storage file systems with compatibility, flexibility, and reliability, StorStack has three major parts distributed over user space, kernel space, and the device side.
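The identity-token mechanism mentioned in Principle 3 and detailed later in Section 3.5.2 (the kernel MACs a process's uid under a secret key shared once with the device, so the device can verify requests without further kernel traps) can be sketched as follows. This is a minimal illustration; the function names and the choice of HMAC-SHA256 are our assumptions, not the paper's implementation.

```python
import hmac, hashlib, os

# Secret key generated once by K-lib and copied to the device via the
# kernel NVMe driver (simulated here as a shared module-level value).
SECRET_KEY = os.urandom(16)

def issue_token(uid):
    """Kernel side: one kernel trap per process to obtain a token."""
    return hmac.new(SECRET_KEY, str(uid).encode(), hashlib.sha256).digest()

def verify_request(uid, token):
    """Device side: recompute the MAC for the claimed uid and compare."""
    expected = hmac.new(SECRET_KEY, str(uid).encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, token)

tok = issue_token(1000)
assert verify_request(1000, tok)       # genuine request passes
assert not verify_request(0, tok)      # a forged uid fails verification
```

Because the key never leaves the kernel and the device, a user process can use only the token it was assigned, which is what makes subsequent data-plane requests verifiable without re-entering the kernel.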
Table 1
The detailed comparison between StorStack and previous file systems.

                 Software access   Expected hardware   FS position   Host-side   Replaceable   Isolated access
                 latency           latency                           cache       FS            control
  Kernel FS      High              High                Host          ✓           ✓             ✓
  User-space FS  Low               Low                 Host          ◦           ✓             ◦
  Prev. Firm FS  Low               Low                 Device        ×           ×             ×
  StorStack      Low               Either              Device        ✓           ✓             ✓
Fig. 1. StorStack Architecture. StorStack consists of three major modules: the U-lib, the K-lib, and the Firm-RT; and there are two workflows: a data-plane workflow and a control-plane workflow. The interconnection between them is shown in the figure.
3.2.1. High-level design

As shown in Fig. 1, StorStack consists of three major parts: a user lib (U-lib), a kernel driver (K-lib), and an FS runtime in device firmware (Firm-RT).

U-lib. The U-lib is the interface for user applications to access the in-storage FS, offered as a dynamic link library. The main job of the U-lib is to expose POSIX file operations to users, provide the user-level cache, and manage the connection with the device. It also cooperates with the K-lib and the Firm-RT to ensure the reliability of the system.

K-lib. The K-lib is a kernel module that provides control-plane operations with reliability. Its work includes resource allocation and permission checking. Although it resides in the kernel, the functions of K-lib are designed to be rarely called to avoid the performance penalty associated with kernel traps.

Firm-RT. The Firm-RT is a runtime on the storage firmware that offers essential hardware and software support for the in-storage FS to run on the device controller. To serve the FS, Firm-RT communicates with both the U-lib for data-plane operations and the K-lib for control-plane operations.

3.2.2. StorStack workflow

For clarity, the workflow of StorStack is divided into a data plane and a control plane. The data-plane workflow handles data accesses from user space, and the control plane is responsible for maintaining the system's functionality, safety, and reliability.

For the data plane (red lines in Fig. 1), when a user application calls a file operation in StorStack, the host-side U-lib will check the cache if the cache is used. If the cache is bypassed or penetrated, the U-lib packs the operation into an extended NVMe protocol command and subsequently transmits it to the device-side Firm-RT. The Firm-RT receives the NVMe command, checks its validity, and then forwards the command to the FS. The FS handles the file operation and then works with the FTL or other hardware instruments to arrange the data blocks on the storage media. The primary distinction between this routine and a typical kernel-based file system lies in the fact that the file system logic is inside the storage device, thereby eliminating the need for kernel traps during data access.

The control plane (blue dashed lines in Fig. 1) provides necessary support for the data plane to work properly. Control-plane operations on the host side, including memory resource allocation and identity token assignment, are delegated to the kernel to ensure security and reliability. The host-side control-plane operations are designed to be rarely called to reduce kernel trap overhead. On the device, the control plane assists in checking the authentication of requests, managing the FS, and dealing with other management operations. More detailed security and reliability policies will be described in Section 3.5.

3.2.3. Organization on the storage

In StorStack, file systems are stored in the storage media with pointers originating from partitions, so that the framework can choose the right FS to access a partition. We dedicate a partition to store all the FS binaries that are used by user-created partitions, and each FS in this partition can be indexed by a number. Here we assume that a GUID partition table (GPT) is used to organize the partitions. Each user-created partition is associated with an FS when it is formatted, and the FS will be added to the FS partition we just mentioned if it was not there yet. To indicate the relation between the user-created partition and its FS, the index number of the FS is added to the attribute flag bits of the partition's GPT entry. The organization is illustrated in Fig. 2.
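Section 3.2.2 describes the U-lib packing a POSIX file operation into an extended NVMe protocol command. As a rough illustration of what such an encoding could look like: NVMe submission-queue entries are 64 bytes, and we borrow that size here; every field layout, name, and opcode value below is our assumption, not the paper's actual command format.

```python
import struct

# Hypothetical vendor-specific opcodes for the extended command set.
OP_OPEN, OP_READ, OP_WRITE, OP_CLOSE = 0x81, 0x82, 0x83, 0x84

def pack_fs_command(opcode, fd, offset, length, uid, token):
    """Pack a POSIX-style file operation into a 64-byte NVMe-like entry.

    Illustrative layout: opcode (1B), pad (1B), file descriptor (2B),
    transfer length (4B), byte offset (8B), caller uid (4B), and the
    kernel-issued identity token (16B), padded to the 64-byte entry size.
    """
    cmd = struct.pack("<BxHIQI16s", opcode, fd, length, offset, uid, token)
    return cmd.ljust(64, b"\x00")

cmd = pack_fs_command(OP_READ, fd=3, offset=4096, length=512,
                      uid=1000, token=b"\x00" * 16)
assert len(cmd) == 64 and cmd[0] == OP_READ
```

The Firm-RT side would perform the inverse `struct.unpack` before validating the token and dispatching the operation to the in-storage FS.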
Fig. 2. Partition organization. Figure shows how the FS is stored on the storage and associated with the partition.
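Section 3.2.3 records an FS index in the attribute flag bits of a partition's GPT entry. A minimal sketch of writing and reading such an entry is below; the 128-byte entry layout follows the standard GPT format, but placing the index in bits 48-63 (the range the GPT format reserves for type-specific use) is our assumption about where StorStack would put it.

```python
import struct, uuid

FS_INDEX_SHIFT = 48  # assumed: FS index stored in the type-specific bits

def make_gpt_entry(first_lba, last_lba, fs_index, name):
    """Build a 128-byte GPT partition entry carrying an FS index in the
    attribute flags (type GUID, unique GUID, LBAs, attributes, name)."""
    attrs = fs_index << FS_INDEX_SHIFT
    return struct.pack("<16s16sQQQ72s",
                       uuid.uuid4().bytes_le,   # partition type GUID
                       uuid.uuid4().bytes_le,   # unique partition GUID
                       first_lba, last_lba, attrs,
                       name.encode("utf-16-le"))

def fs_index_of(entry):
    """Framework side: recover which FS binary serves this partition."""
    attrs = struct.unpack_from("<Q", entry, 48)[0]  # attrs at byte 48
    return attrs >> FS_INDEX_SHIFT

entry = make_gpt_entry(2048, 1 << 20, fs_index=3, name="data")
assert len(entry) == 128 and fs_index_of(entry) == 3
```

Because only an otherwise-unused attribute field is touched, the entry stays valid for a conventional OS that ignores the index, which matches the compatibility goal in Principle 4.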
This design allows StorStack to provide different file systems to different partitions. Meanwhile, the GPT and the partitions are still available for the typical kernel file system routine.

3.3. File access pattern

The U-lib provides POSIX IO and AIO interfaces to user applications, and the complicated reliability and performance designs are transparent to users. For regular IO interfaces, the write operations (write, pwrite) act differently with and without cache. When the cache is used, writes will return as soon as an operation passes some simple checks and is put into the queue. The interface will not promise that the data is written to the disk before it returns, just like a traditional kernel file system, unless fsync is called. Without cache, the writes will block the process until the data is written to the storage. The read interfaces (read, pread) will not return until the data is available, regardless of whether there is a cache. The AIO interfaces return immediately when an operation is put into the queue, and the real return value can be fetched by non-blocking check, blocking suspend, or signal.

To make sure that StorStack performs well on high-latency storage devices, an optional user-level per-process cache is provided. Because the reliability of StorStack can only be ensured by the device-side file system but not the U-lib, we choose a per-process cache to prevent malicious processes from polluting data by writing to a global cache without check. The user-level cache has two ways to deal with write operations: the write-back method returns immediately after the data is put into the cache; the write-around method drops the dirty data in the cache and returns after the operation is put into the queue. The write-back cache has higher performance than the write-around cache, while the write-around cache can provide higher data consistency. In fact, our evaluation shows that the write-back cache in StorStack can outperform the page cache inside the kernel.

3.4. Connectivity

Here we discuss how the host-side U-lib and K-lib communicate with the device-side Firm-RT. StorStack's communication is based on NVMe to take full advantage of high-speed storage devices. We also propose a multi-queue design to improve the performance of the device-side FS.

3.4.1. Communication protocol

The communication protocol between the host CPU and the StorStack device is a queued protocol extended from NVMe [3]. NVMe is a protocol for accessing non-volatile memories connected via PCIe that supports multiple queues to maximize the throughput, which is suitable for novel high-speed storage devices such as SSDs and SCMs.

To enable the transfer of file operations, we extend the NVMe command list to incorporate the POSIX I/O interface. Meanwhile, the regular data access pattern of NVMe is retained to enable normal disk access when the system does not support StorStack. It is noteworthy that the protocol can be further extended under StorStack to support more paradigms like transactional access [51], log-structured access [52,53], operation fusing [16], or In-storage Computing. We will leave these further explorations to our future work.

With StorStack, heterogeneous hardware like GPUs can implement this extended protocol to access files directly without involving the CPU. For different types of hardware, there are two ways to transmit data. For devices that have their own memory (memory-mapped), like GPUs, StorStack can directly place the data into their memory via the PCIe bus. For hardware without memory (I/O mapped), StorStack should put the data into the main memory. The manipulation of the data destination is directed by the target device driver.

3.4.2. Multi-queue arrangement

NVMe uses multiple queues to improve performance, supporting up to 65,536 I/O queues, with 65,536 commands per queue. Normally, NVMe offers at least a pair of queues (one submission queue and one completion queue) for each core to fully utilize the bandwidth without introducing locks. In StorStack, file operations are processed on the device side, particularly when the storage device features a multi-core controller. To fully utilize the parallelism of the controller cores while minimizing the potential conflicts of concurrent file access, StorStack introduces a special queue organization.

As Fig. 3 shows, every user process in StorStack is assigned a bunch of queue pairs, the number of which is equal to the storage device controller core count. Each queue pair of the queue pair bunch is bound to a controller core of the storage device, so that a process can distribute any file operation to a specific controller core. Meanwhile, each user thread has its exclusive queue pair bunch to avoid queue contention on the host side.

The purpose of this arrangement is to enable the host-side applications to control which operation should be dispatched to which controller core. For example, read-intensive applications can issue read operations to all cores with a round-robin strategy. For write-intensive applications, different threads can send the write operations on the same file to the same controller core to reduce lock contention between controller cores. We will leave the exploration of the scheduling policy for different workloads to future work.

3.5. Security and reliability

From a hardware perspective, the privileged mode (ring 0) that the kernel runs in and the user mode that user applications run in are isolated, which means access to resources is restricted by hardware. The privileged mode can thus be treated as a trusted area, whereas the user mode is an untrusted area. StorStack introduces the device-side controller as a runtime, which is also isolated from user code and thus viewed as a trusted area.

For safety, everything critical to the correctness of the system should be placed in the trusted area. Typical kernel file systems are placed inside the kernel as they need to manage the data on block devices.
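The two write policies of the user-level cache described in Section 3.3 (write-back versus write-around) can be modeled in a few lines. This is an illustrative simulation; the class and method names are ours, not the authors'.

```python
class UserCache:
    """Toy model of StorStack's per-process user-level cache (Section 3.3)."""

    def __init__(self, policy):
        assert policy in ("write-back", "write-around")
        self.policy = policy
        self.pages = {}   # cached offset -> data
        self.queue = []   # operations queued for the device

    def write(self, off, data):
        if self.policy == "write-back":
            self.pages[off] = data          # return right after caching;
            self.queue.append((off, data))  # data reaches the device later
        else:  # write-around
            self.pages.pop(off, None)       # drop the now-stale cached copy
            self.queue.append((off, data))  # queue the write for the device

    def read(self, off):
        return self.pages.get(off)  # a miss would go to the device

wb = UserCache("write-back");   wb.write(0, b"a")
wa = UserCache("write-around"); wa.write(0, b"a")
assert wb.read(0) == b"a"   # write-back serves the read from cache
assert wa.read(0) is None   # write-around forces a device read
```

The model shows why write-back is faster (reads hit the cache immediately) while write-around is more consistent (no dirty copy can diverge from the device).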
Fig. 3. Queue arrangement and scheduling policies. This figure shows how the queue pairs are mapped between host CPU threads and device controller cores.
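The queue-pair-bunch arrangement of Section 3.4.2 and its two example dispatch policies (round-robin reads across all cores, same-file writes pinned to one controller core) can be sketched as follows; this is an illustrative model only, not the authors' code.

```python
class QueuePairBunch:
    """Per-thread bunch of queue pairs, one pair per controller core."""

    def __init__(self, n_cores):
        self.queues = [[] for _ in range(n_cores)]  # submission queues
        self._rr = 0

    def submit_read(self, op):
        """Spread reads over all controller cores round-robin."""
        core = self._rr % len(self.queues)
        self._rr += 1
        self.queues[core].append(op)
        return core

    def submit_write(self, path, op):
        """Pin writes to the same file onto one core, reducing
        lock contention between controller cores."""
        core = hash(path) % len(self.queues)
        self.queues[core].append(op)
        return core

bunch = QueuePairBunch(n_cores=4)
cores = {bunch.submit_read(f"read-{i}") for i in range(8)}
assert cores == {0, 1, 2, 3}                     # reads hit every core
assert bunch.submit_write("/a", "w1") == bunch.submit_write("/a", "w2")
```

Each host thread would own its own `QueuePairBunch`, matching the paper's design of exclusive per-thread bunches that avoid host-side queue contention.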
StorStack shifts FS to the device side, which is also a trusted area.
Meanwhile, as described in Section 3.2.2, StorStack separates the host-
side workflow into a control plane and a data plane. The control plane
is designed to reside in the host-side trusted area, i.e. the kernel, to
cooperate with the device-side FS to ensure security and reliability.
An important design principle of the control plane is to reduce the
overhead of the kernel trap. In StorStack, this is done by reducing
the proportion of control-plane operations relative to data-plane
operations. There are two types of control-plane workflow on the host
side: resource allocation and access control. Both of them are designed
to be called rarely.

Fig. 4. Permission checking. Figure shows how the user space, the kernel space, and
the device work together to check the validity of a request without frequent kernel
traps.

3.5.1. Resource allocation
The U-lib of StorStack is a user-space driver that communicates
with the NVMe storage device. It needs to set up VFIO and manage
DMA memory mapping to enable direct access from user space. It also
needs to allocate areas for caches. These operations involve the kernel
but only need to be run once when the device is initialized, so there
will not be any performance loss in regular file access.

3.5.2. Permission checking
To provide access control, file systems must check the user's permission
to make sure that a file operation is legal. In kernel file systems,
the file system can use the process structure in the kernel to validate the
process's identity, and then compare it with the permission information
stored in the file's inode. In StorStack, however, the file system resides
on the device rather than in the kernel, so the kernel needs to share the
process's information with the device to support permission checking.

To avoid entering the kernel frequently, DevFS [14] maintains in the
device a table that maps CPU IDs to process credentials. All
requests are tagged with the ID of the CPU that the process runs on before
they are sent to the device. The kernel is modified to update the table
whenever a process is scheduled on a host CPU. There are two problems
with this mechanism. Firstly, it assumes that the CPU ID is unforgeable,
but a malicious process can potentially exploit the ID of another
CPU to escalate its privilege. Secondly, it requires a modification to
the process scheduler, which is a core module of the kernel, making
it incompatible with standard OS kernels and potentially slowing down
the system.

In StorStack, we propose a new method to share the credential of
the process, with less communication, a stronger security guarantee, and no change
to the Linux kernel. The process is shown in Fig. 4. When the U-lib
is initialized on a process, it calls the K-lib (a kernel driver)
via ioctl() (system call) to get a credential token. The K-lib
generates a secret key if one has not been set yet, then saves and copies
it to the device via the kernel NVMe driver. Once the key is set, the K-lib
uses it to encrypt the process's credential information (i.e. the uid) into a
MAC (Message Authentication Code). The resulting token, which is the
output of the encryption, is then returned to the process. Since the
secret key is stored in the kernel, the process cannot forge a token
but can only use the one assigned by the kernel, which proves the
authenticity of the uid claimed by the process. Before being sent to the
device, every request from the process is tagged with the process's uid
and the token, so that the device can use the secret key and the token
to verify the uid and check the identity of the request. This mechanism
requires only one communication between the kernel and the device to
share the secret key, and one kernel trap to initialize the token for each
process. Also, the K-lib is implemented as a kernel driver, without
any modification to the core functions of the kernel, which keeps it
compatible with conventional operating systems.

3.5.3. Device lock
StorStack is designed to support direct I/O not only from CPUs,
but also from different types of heterogeneous computing devices.
To prevent concurrent access to the same file from multiple devices,
a concurrency control method is required. A common practice is to
implement a distributed lock across all devices, but this can be too
costly for low-level hardware. In StorStack, we provide in-storage file-level
locking mechanisms to protect the files from unexpected access
by multiple devices.

StorStack supports two types of lock: (1) spinning lock, where an error
code is returned to the caller if the file it accesses is already locked
by another device, allowing the caller to keep attempting to acquire
the lock until the file is unlocked; (2) sleeping lock, where if the file
is locked, any requests from other devices to that file wait in the
submission queue until the file is unlocked. From the perspective of
concurrency, StorStack supports both shared and exclusive locks,
which act exactly the same as those on other systems.
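The credential-token scheme of Section 3.5.2 can be sketched as follows. This is an illustrative Python reconstruction, not the authors' kernel code: the function names (`issue_token`, `verify_request`) are ours, and HMAC-SHA256 is assumed as the concrete MAC since the paper only names "the HMAC algorithm".

```python
import hashlib
import hmac
import os

# Shared secret: created once by the K-lib when StorStack is initialized,
# then copied to the device; never exposed to user processes.
SECRET_KEY = os.urandom(32)

def issue_token(uid: int) -> bytes:
    """K-lib side: derive the MAC token from the process uid."""
    return hmac.new(SECRET_KEY, str(uid).encode(), hashlib.sha256).digest()

def verify_request(uid: int, token: bytes) -> bool:
    """Device (Firm-RT) side: recompute the MAC over the claimed uid
    and compare it with the request's token in constant time."""
    expected = hmac.new(SECRET_KEY, str(uid).encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, token)

# A process obtains its token once (one kernel trap), then tags every
# request with (uid, token); the device rejects any forged uid.
token = issue_token(1000)
assert verify_request(1000, token)       # legitimate request passes
assert not verify_request(0, token)      # claiming another uid fails
```

Because a process never sees `SECRET_KEY`, it cannot mint a token for a uid other than the one the kernel attested, which is the property the paper relies on.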
J. Hu et al. Journal of Systems Architecture 160 (2025) 103348
Fig. 5. Random and sequential r/w. Figure shows the basic performance of StorStack compared with Ext-4, under different cache, block size, and in-storage file system settings.
3.6. Implementation

We have implemented a prototype of StorStack, which consists of
three parts: the U-lib, the K-lib and the Firm-RT. The source code
of this prototype is available at https://anonymous.4open.science/r/
StorStack-524F/.

The U-lib is implemented under Linux 5.15, utilizing SPDK [54]
to access storage devices from user space. The SPDK library is modified
in StorStack to transfer POSIX I/O operations over NVMe. The U-lib
comprises two major components: a dynamic link library that provides
interfaces and a user-level cache for accessing the device, and a daemon
program responsible for managing the connection to the device.

The K-lib is implemented as a simple kernel module in the Linux 5.15
kernel. It only takes charge of two things: creating the secret key when
StorStack is initialized, so that the K-lib and the Firm-RT can
use it to encrypt and decrypt the MAC token for processes' credentials;
and generating the MAC token from the uid of the current process with
the HMAC algorithm when the process initializes, then returning it to
the U-lib. The interface of the K-lib is exposed to user space
through ioctl.

The Firm-RT is the only component located on the device
side. In this work, the Firm-RT is not implemented on actual storage
hardware but is instead simulated using QEMU and the system
running on its host machine. There are two reasons for the simulation:
first, although there are several works regarding programmable
storage controllers [49,55–57], these solutions are either expensive or
lack high-level programmability, as most of them are based on FPGA;
second, by simulating with various latency settings, we can evaluate the
performance of StorStack on different types of storage devices, which
would be costly to do with real hardware. In our prototype, QEMU has
been modified to handle extended NVMe POSIX I/O operations and
check the token of each operation.

4. Evaluation

In this section, we evaluate the performance of StorStack and compare
it with popular file systems to answer the following questions:

• Is StorStack efficient enough compared to widely used kernel file systems?
• How much performance is gained from the kernel trap avoidance?
• How does StorStack perform on different types of devices?
• How is the concurrency performance of StorStack?

Fig. 6. Time cost for a single operation.

4.1. Experimental setup

Our experiment platform is a 20-core 2.4 GHz Intel Xeon server
equipped with 64 GB DDR4 memory and a 512 GB SSD. Among them, 8
cores with 16 GB memory are assigned to the QEMU VM to simulate the
StorStack host; the other cores with 16 GB memory are reserved to emulate
the StorStack device. Both the StorStack host and the StorStack device
run on Linux 5.15.

StorStack's expected settings on the device require only a minimal
embedded system with abstractions of hardware functions and necessary
libraries, but due to our simulation requirements, we choose Linux
as the device-side environment to support the execution of QEMU.

We evaluate the performance of StorStack using
Filebench [58], a widely used benchmarking suite for testing file system
performance. We access StorStack under various configurations, including
different cache options, device access latencies, thread numbers and
read/write ratios, to address the four questions raised above.
Fig. 7. Performance with simulated latency. This figure shows the change in throughput as a function of simulated device access latency.
Fig. 8. Multi-thread Performance.
4.2. Random and sequential r/w

First, we evaluate StorStack's performance with single-thread random
and sequential read/write tests. The random tests run on a 1 GB
file with 1K, 4K, and 16K bytes I/O size. The sequential tests run on
an 8 GB file with 8K, 32K, and 128K bytes I/O size. Both of the files
are stored in DRAM, which is simulated as a PMEM by
memmap. The tests are performed on StorStack (referred to as SS) with
two different in-storage FS settings: SS+Ext-4 and SS+Ext-4_DAX.
Then we compare them with Ext-4. We also evaluate the performance
of SS without cache (SS NC) and Ext-4 with direct IO (Ext-4_DIO)
to study the performance improvement of direct access.

Fig. 5 shows the results of the random and sequential tests. In
both tests, SS outperforms traditional kernel-level Ext-4, due to our
kernel-bypass and near-data file system design. SS+Ext-4_DAX with
the user-level write-back cache achieves on average 1.98x, 4.25x, 3.59x, and
4.08x performance gains on random read, random write, sequential
read, and sequential write respectively, compared with Ext-4 with
page cache. For direct access, the speedup is 6.41x, 6.21x,
4.72x, and 1.90x respectively. Another interesting phenomenon is that
in cached StorStack, the performance of SS+Ext-4 and SS+Ext-4_DAX
is similar, indicating that the choice of the in-storage file
system does not matter because most operations are handled by
the user-level cache. However, in uncached tests, SS+Ext-4_DAX
shows better results, which means that the in-storage file system may
influence the overall performance in direct access.

4.3. Profit of kernel bypassing

We measure the time cost of a single operation to study the profit
of kernel bypassing. The cached test demonstrates the impact of the kernel
trap on access to the in-memory page cache. The uncached test shows
the impact of both the kernel trap and write amplification on direct access
to the storage device. Both tests use a 4KB block size, and the files
are stored on the simulated PMEM. The results in Fig. 6 indicate that,
compared to Ext-4, SS+Ext-4_DAX reduces latency by 91.91%,
50.46%, 69.83%, and 81.83% on cached read, cached write, uncached
read, and uncached write respectively.

When the cache hits, the data resides in fast DRAM, resulting in
low data-fetch latency. In this scenario, traditional Ext-4 exhibits higher
access latency, as the kernel trap accounts for most of the latency.
In contrast, StorStack shows lower latency because its cache is implemented
in user space, eliminating the need for kernel traps. When a
cache miss occurs, the primary overhead shifts to the multiple rounds
of storage device access, which further widens the performance gap
between traditional Ext-4 and StorStack.

4.4. Impact of access latency

Storage devices with different access latencies may influence the
performance of file systems. In this experiment, we use multiple latency
settings to simulate devices with different access speeds. The latency is
simulated on the device side by QEMU.

We compare the performance of SS with Ext-4 under cached and
uncached settings using several latency settings. The latency ranges
from 0 μs to 25 μs to simulate connection methods from DDR to PCIe
to RDMA. Tests run with a 4KB block size.

Fig. 7 shows the result of this test. With a cache, neither SS nor
Ext-4 is susceptible to the rise in latency. Without a
cache, however, the performance of SS degrades by 78.20%, from 526 MB/s
at 0 simulated latency to 115 MB/s at 25 μs latency. The performance
of Ext-4 also drops by 20.98%, from 54 MB/s to 43 MB/s. Note that the
experiment introduces extra latency due to QEMU, so the simulated 0
latency is actually larger than 0, meaning that the curve could go even
higher on the left side of the graph. The result illustrates that direct
access in SS should only be enabled on ultra-low-latency devices. For
other hardware, it is better to enable the cache.

4.5. Multi-thread performance

To study the performance of StorStack under multiple threads, we
evaluate SS and Ext-4 under a multi-thread micro-benchmark. The
benchmark performs parallel 4KB file operations on one file with 4
threads; each thread is a reader or a writer, and the ratio of readers and
writers is set to 4:0, 3:1, 1:3, and 0:4. Fig. 8 shows the result. StorStack
is faster than Ext-4 in all concurrent read and write scenarios of our
test. For the cached scenario, SS is on average 2.88x faster than Ext-4
across all read-write ratios. For the uncached scenario, the speedup is 17.34x.

5. Conclusion

In this paper, we present StorStack, a full-stack design for an in-storage
file system framework and simulator. The StorStack components across
user space, kernel space, and device space collaborate to enable file
systems to run inside the storage device efficiently and reliably. We
implement a prototype of StorStack and evaluate it with various settings.
Experimental results show that StorStack outperforms current
kernel file systems in both cached and uncached scenarios. Some further
performance optimizations, such as the combination of file system and
storage hardware capabilities, the exploration of multi-queue scheduling
strategies for different workloads, and the performance of direct
access from heterogeneous devices, are left to future work.

CRediT authorship contribution statement

Juncheng Hu: Writing – review & editing, Writing – original draft.
Shuo Chen: Formal analysis, Data curation. Haoyang Wei: Formal
analysis, Data curation. Guoyu Wang: Writing – review & editing,
Writing – original draft. Chenju Pei: Formal analysis, Data curation.
Xilong Che: Methodology, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.

Acknowledgments

This work was funded by the National Key Research and Development
Programme No. 2024YFB3310200, by the Key Scientific and
Technological R&D Plan of Jilin Province of China under Grant
No. 20230201066GX, and by the Central University Basic Scientific
Research Fund under Grant No. 2023-JCXK-04.

References

[1] G. Koo, K.K. Matam, T. I, H.K.G. Narra, J. Li, H.-W. Tseng, S. Swanson, M. Annavaram, Summarizer: trading communication with computing near storage, in: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017, pp. 219–231.
[2] S. Seshadri, M. Gahagan, S. Bhaskaran, T. Bunker, A. De, Y. Jin, Y. Liu, S. Swanson, Willow: A user-programmable SSD, in: OSDI, 2014.
[3] NVMe specifications, https://nvmexpress.org/specifications/.
[4] Intel, Intel® Optane™ Persistent Memory, https://www.intel.com/content/www/us/en/products/docs/memory-storage/optane-persistent-memory/overview.html.
[5] S. Mittal, J.S. Vetter, A survey of software techniques for using non-volatile memories for storage and main memory systems, IEEE Trans. Parallel Distrib. Syst. 27 (5) (2016) 1537–1550, http://dx.doi.org/10.1109/TPDS.2015.2442980.
[6] M. Wei, M. Bjørling, P. Bonnet, S. Swanson, I/O speculation for the microsecond era, in: 2014 USENIX Annual Technical Conference, USENIX ATC '14, 2014, pp. 475–481.
[7] S. Peter, J. Li, I. Zhang, D.R.K. Ports, D. Woos, A. Krishnamurthy, T. Anderson, T. Roscoe, Arrakis: the operating system is the control plane, in: 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI '14, 2014, pp. 1–16.
[8] H. Volos, S. Nalli, S. Panneerselvam, V. Varadarajan, P. Saxena, M.M. Swift, Aerie: Flexible file-system interfaces to storage-class memory, in: Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, Association for Computing Machinery, New York, NY, USA, 2014, pp. 1–14, http://dx.doi.org/10.1145/2592798.2592810.
[9] A.M. Caulfield, T.I. Mollov, L.A. Eisner, A. De, J. Coburn, S. Swanson, Providing safe, user space access to fast, solid state disks, ACM SIGPLAN Not. 47 (4) (2012) 387–400, http://dx.doi.org/10.1145/2248487.2151017.
[10] M. Dong, H. Bu, J. Yi, B. Dong, H. Chen, Performance and protection in the ZoFS user-space NVM file system, in: Proceedings of the 27th ACM Symposium on Operating Systems Principles, ACM, Huntsville, Ontario, Canada, 2019, pp. 478–493, http://dx.doi.org/10.1145/3341301.3359637.
[11] Y. Kwon, H. Fingler, T. Hunt, S. Peter, E. Witchel, T. Anderson, Strata: A cross media file system, in: Proceedings of the 26th Symposium on Operating Systems Principles, SOSP '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 460–477, http://dx.doi.org/10.1145/3132747.3132770.
[12] J. Liu, A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, S. Kannan, File systems as processes, in: 11th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage '19, USENIX Association, Renton, WA, 2019.
[13] S. Zhong, C. Ye, G. Hu, S. Qu, A. Arpaci-Dusseau, R. Arpaci-Dusseau, M. Swift, MadFS: per-file virtualization for userspace persistent memory filesystems, in: 21st USENIX Conference on File and Storage Technologies, FAST '23, 2023, pp. 265–280.
[14] S. Kannan, A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, Y. Wang, J. Xu, G. Palani, Designing a true direct-access file system with DevFS, in: 16th USENIX Conference on File and Storage Technologies, FAST '18, USENIX Association, Oakland, CA, 2018, pp. 241–256.
[15] Y. Ren, C. Min, S. Kannan, CrossFS: A cross-layered direct-access file system, in: 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI '20, USENIX Association, 2020, pp. 137–154.
[16] J. Zhang, Y. Ren, S. Kannan, FusionFS: fusing I/O operations using CISCOps in firmware file systems, in: 20th USENIX Conference on File and Storage Technologies, FAST '22, USENIX Association, Santa Clara, CA, 2022, pp. 297–312.
[17] N. Agrawal, V. Prabhakaran, T. Wobber, J.D. Davis, M. Manasse, R. Panigrahy, Design tradeoffs for SSD performance, in: USENIX 2008 Annual Technical Conference, ATC '08, USENIX Association, USA, 2008, pp. 57–70.
[18] F. Chen, D.A. Koufaty, X. Zhang, Understanding intrinsic characteristics and system implications of flash memory based solid state drives, in: Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '09, Association for Computing Machinery, New York, NY, USA, 2009, pp. 181–192, http://dx.doi.org/10.1145/1555349.1555371.
[19] Welcome to PCI-SIG | PCI-SIG, https://pcisig.com/.
[20] Y. Choi, I. Song, M.-H. Park, H. Chung, S. Chang, B. Cho, J. Kim, Y. Oh, D. Kwon, J. Sunwoo, J. Shin, Y. Rho, C. Lee, M.G. Kang, J. Lee, Y. Kwon, S. Kim, J. Kim, Y.-J. Lee, Q. Wang, S. Cha, S. Ahn, H. Horii, J. Lee, K. Kim, H. Joo, K. Lee, Y.-T. Lee, J. Yoo, G. Jeong, A 20nm 1.8V 8Gb PRAM with 40MB/s program bandwidth, in: 2012 IEEE International Solid-State Circuits Conference, 2012, pp. 46–48, http://dx.doi.org/10.1109/ISSCC.2012.6176872.
[21] H. Volos, A.J. Tack, M.M. Swift, Mnemosyne: Lightweight persistent memory, ACM SIGARCH Comput. Archit. News 39 (1) (2011) 91–104, http://dx.doi.org/10.1145/1961295.1950379.
[22] S.-W. Chung, T. Kishi, J.W. Park, M. Yoshikawa, K.S. Park, T. Nagase, K. Sunouchi, H. Kanaya, G.C. Kim, K. Noma, M.S. Lee, A. Yamamoto, K.M. Rho, K. Tsuchida, S.J. Chung, J.Y. Yi, H.S. Kim, Y. Chun, H. Oyamatsu, S.J. Hong, 4Gbit density STT-MRAM using perpendicular MTJ realized with compact cell structure, in: 2016 IEEE International Electron Devices Meeting, IEDM, 2016, pp. 27.1.1–27.1.4, http://dx.doi.org/10.1109/IEDM.2016.7838490.
[23] H. Akinaga, H. Shima, Resistive random access memory (ReRAM) based on metal oxides, Proc. IEEE 98 (12) (2010) 2237–2251, http://dx.doi.org/10.1109/JPROC.2010.2070830.
[24] K. Kawai, A. Kawahara, R. Yasuhara, S. Muraoka, Z. Wei, R. Azuma, K. Tanabe, K. Shimakawa, Highly-reliable TaOx ReRAM technology using automatic forming circuit, in: 2014 IEEE International Conference on IC Design & Technology, 2014, pp. 1–4, http://dx.doi.org/10.1109/ICICDT.2014.6838600.
[25] K. Suzuki, S. Swanson, The Non-Volatile Memory Technology Database (NVMDB), Tech. Rep. CS2015-1011, Department of Computer Science & Engineering, University of California, San Diego, 2015.
[26] S. Matsuura, Designing a persistent-memory-native storage engine for SQL database systems, in: 2021 IEEE 10th Non-Volatile Memory Systems and Applications Symposium, NVMSA, IEEE, Beijing, China, 2021, pp. 1–6, http://dx.doi.org/10.1109/NVMSA53655.2021.9628842.
[27] R. Tadakamadla, M. Patocka, T. Kani, S.J. Norton, Accelerating database workloads with DM-WriteCache and persistent memory, in: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, ICPE '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 255–263, http://dx.doi.org/10.1145/3297663.3309669.
[28] W. Wang, C. Yang, R. Zhang, S. Nie, X. Chen, D. Liu, Themis: malicious wear detection and defense for persistent memory file systems, in: 2020 IEEE 26th International Conference on Parallel and Distributed Systems, ICPADS, 2020, pp. 140–147, http://dx.doi.org/10.1109/ICPADS51040.2020.00028.
[29] B. Zhu, Y. Chen, Q. Wang, Y. Lu, J. Shu, Octopus+: An RDMA-enabled distributed persistent memory file system, ACM Trans. Storage 17 (3) (2021) 1–25, http://dx.doi.org/10.1145/3448418.
[30] J. Do, V.C. Ferreira, H. Bobarshad, M. Torabzadehkashi, S. Rezaei, A. Heydarigorji, D. Souza, B.F. Goldstein, L. Santiago, M.S. Kim, P.M.V. Lima, F.M.G. França, V. Alves, Cost-effective, energy-efficient, and scalable storage computing for large-scale AI applications, ACM Trans. Storage 16 (4) (2020) 21:1–21:37, http://dx.doi.org/10.1145/3415580.
[31] L. Kang, Y. Xue, W. Jia, X. Wang, J. Kim, C. Youn, M.J. Kang, H.J. Lim, B. Jacob, J. Huang, IceClave: A trusted execution environment for in-storage computing, in: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 199–211, http://dx.doi.org/10.1145/3466752.3480109.
[32] Z. Ruan, T. He, J. Cong, INSIDER: designing in-storage computing system for emerging high-performance drive, in: 2019 USENIX Annual Technical Conference, USENIX ATC '19, USENIX Association, Renton, WA, 2019, pp. 379–394.
[33] A.M. Caulfield, T.I. Mollov, L.A. Eisner, A. De, J. Coburn, S. Swanson, Providing safe, user space access to fast, solid state disks, ACM SIGPLAN Not. 47 (4) (2012) 387–400.
[34] S. Cho, C. Park, H. Oh, S. Kim, Y. Yi, G.R. Ganger, Active disk meets flash: A case for intelligent SSDs, in: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, 2013, pp. 91–102.
[35] J. Do, Y.-S. Kee, J.M. Patel, C. Park, K. Park, D.J. DeWitt, Query processing on smart SSDs: Opportunities and challenges, in: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 2013, pp. 1221–1230.
[36] C. Cowan, S. Beattie, J. Johansen, P. Wagle, PointGuard: Protecting pointers from buffer overflow vulnerabilities, in: 12th USENIX Security Symposium, USENIX Security '03, 2003.
[37] L. Szekeres, M. Payer, T. Wei, D. Song, SoK: Eternal war in memory, in: 2013 IEEE Symposium on Security and Privacy, IEEE, 2013, pp. 48–62.
[38] S.R. Dulloor, S. Kumar, A. Keshavamurthy, P. Lantz, D. Reddy, R. Sankaran, J. Jackson, System software for persistent memory, in: Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, ACM Press, Amsterdam, The Netherlands, 2014, pp. 1–15, http://dx.doi.org/10.1145/2592798.2592814.
[39] C. Lee, D. Sim, J. Hwang, S. Cho, F2FS: A new file system for flash storage, in: 13th USENIX Conference on File and Storage Technologies, FAST '15, USENIX Association, Santa Clara, CA, 2015, pp. 273–286.
[40] DAX, https://www.kernel.org/doc/Documentation/filesystems/dax.txt.
[41] J. Xu, S. Swanson, NOVA: A log-structured file system for hybrid volatile/non-volatile main memories, in: Proceedings of the 14th USENIX Conference on File and Storage Technologies, FAST '16, USENIX Association, USA, 2016, pp. 323–338.
[42] M. Torabzadehkashi, S. Rezaei, A. HeydariGorji, H. Bobarshad, V. Alves, N. Bagherzadeh, Computational storage: An efficient and scalable platform for big data and HPC applications, J. Big Data 6 (1) (2019) 100, http://dx.doi.org/10.1186/s40537-019-0265-5.
[43] W. Cao, Y. Liu, Z. Cheng, N. Zheng, W. Li, W. Wu, L. Ouyang, P. Wang, Y. Wang, R. Kuan, Z. Liu, F. Zhu, T. Zhang, POLARDB meets computational storage: efficiently support analytical workloads in cloud-native relational database, in: Proceedings of the 18th USENIX Conference on File and Storage Technologies, FAST '20, USENIX Association, USA, 2020, pp. 29–42.
[44] Nvidia, NVIDIA RTX IO: GPU accelerated storage technology, https://www.nvidia.com/en-us/geforce/news/rtx-io-gpu-accelerated-storage-technology/.
[45] AMD, Radeon™ Pro SSG graphics, https://www.amd.com/en/products/professional-graphics/radeon-pro-ssg.
[46] Z. An, Z. Zhang, Q. Li, J. Xing, H. Du, Z. Wang, Z. Huo, J. Ma, Optimizing the datapath for key-value middleware with NVMe SSDs over RDMA interconnects, in: 2017 IEEE International Conference on Cluster Computing, CLUSTER, 2017, pp. 582–586, http://dx.doi.org/10.1109/CLUSTER.2017.69.
[47] Samsung, Samsung 990 PRO with heatsink, https://semiconductor.samsung.com/content/semiconductor/global/consumer-storage/internal-ssd/990-pro-with-heatsink.html.
[48] Arm Ltd, ARM computational storage solution, https://www.arm.com/solutions/storage/computational-storage.
[49] Samsung, Samsung SmartSSD, https://www.xilinx.com/applications/data-center/computational-storage/smartssd.html.
[50] ScaleFlux, https://scaleflux.com/.
[51] E. Gal, S. Toledo, A transactional flash file system for microcontrollers, in: 2005 USENIX Annual Technical Conference, USENIX ATC '05, 2005.
[52] J. Koo, J. Im, J. Song, J. Park, E. Lee, B.S. Kim, S. Lee, Modernizing file system through in-storage indexing, in: Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI '21, USENIX Association, Berkeley, 2021, pp. 75–92, http://dx.doi.org/10.5281/zenodo.4659803.
[53] LevelDB, https://github.com/google/leveldb.
[54] Storage performance development kit, https://spdk.io/.
[55] DFC open source, https://github.com/DFC-OpenSource.
[56] M. Jung, OpenExpress: fully hardware automated open research framework for future fast NVMe devices, in: 2020 USENIX Annual Technical Conference, USENIX ATC '20, 2020, pp. 649–656.
[57] J. Kwak, S. Lee, K. Park, J. Jeong, Y.H. Song, Cosmos+ OpenSSD: rapid prototype for flash storage systems, ACM Trans. Storage 16 (3) (2020) 15:1–15:35, http://dx.doi.org/10.1145/3385073.
[58] Filebench, https://github.com/filebench/filebench.

Juncheng Hu received the bachelor's degree and Doctor of Engineering degree from Jilin University in 2017 and 2022, where he is currently a lecturer. His research interests include data mining, machine learning, computer networks and parallel computing. jchu@jlu.edu.cn

Shuo Chen has been working toward the master's degree with the College of Computer Science and Technology, Jilin University, since 2022. His research field is computer architecture, mainly focusing on optimization for caching systems. chenshuo22@mails.jlu.edu.cn

Haoyang Wei is a master's student in Computer Science and Technology at Jilin University (class of 2023), focusing on computer architecture research, with a primary interest in the application of new storage devices. hywei23@mails.jlu.edu.cn

Guoyu Wang is currently working toward the doctoral degree with the College of Computer Science and Technology, Jilin University. wgy21@mails.jlu.edu.cn

Chenju Pei is an undergraduate student at the School of Computer Science and Technology at Jilin University. His field of research is computer system architecture, and he is currently investigating new L7 load balancing solutions. peicj2121@mails.jlu.edu.cn

Xilong Che received the M.S. and Ph.D. degrees in Computer Science from Jilin University in 2006 and 2009, respectively. He is currently a full professor and doctoral supervisor at the College of Computer Science and Technology, Jilin University, China. His current research areas are parallel and distributed computing, high-performance computing architectures, and related optimizations. He is a member of the China Computer Federation, and the corresponding author of this paper. chexilong@jlu.edu.cn
Computer Standards & Interfaces 97 (2026) 104123
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
V-Bridge: A dynamic cross-shard blockchain protocol based on off-chain
payment channel
Xueting Huang a, Xiangwei Meng a,b,c, Kai Zhang a,b,c, Ce Yang a,b,c, Wei Liang a,b,c,∗, Kuan-Ching Li a,b,c,∗
a College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
b Hunan University of Science and Technology Sanya Research Institute, Sanya 572000, China
c Hunan Key Laboratory for Service Computing and Novel Software Technology, Xiangtan 411201, China
ARTICLE INFO

Keywords: Blockchain, Sharding, Cross-shard, Off-chain payment channel

ABSTRACT

Sharding technology effectively improves system throughput by distributing the blockchain transaction
load to multiple shards for parallel processing, and it is the core solution to the scalability problem of
blockchain. However, as the number of shards increases, the frequency of cross-shard transactions increases
significantly, leading to increased communication and computational overhead, transaction delays, uneven
resource allocation, and load imbalance, which becomes a key bottleneck for performance expansion. To this
end, this article proposes the cross-shard transaction protocol V-Bridge, which draws on the concept of off-chain
payment channels to establish distributed virtual fund channels between Trustors in different shards, converting
cross-shard transactions into off-chain transactions and realizing the logical flow of funds. To further enhance
cross-shard transaction performance, V-Bridge integrates an intelligent sharding adjustment mechanism
and a cross-shard optimized critical path protection algorithm (CSOCPPA) to dynamically balance shard loads,
alleviate resource allocation issues, and minimize performance bottlenecks. Experimental results show that,
compared with existing state-of-the-art protocols, V-Bridge's average throughput is increased by
26% to 46%, and transaction delays are reduced by 15% to 24%.
1. Introduction

Blockchain [1,2], as a decentralized, transparent, and tamper-proof
technology, has great potential in various fields such as privacy protection,
medical applications, and the Internet of Things [3–5]. In cross-border
payments, for example, it offers an alternative to traditional
banking systems, enabling fast and cost-effective transactions [6]. However,
blockchain systems face significant scalability challenges during
high transaction volumes [7–9]. For instance, Ethereum [10] often experiences
network congestion during peak usage, leading to transaction
delays and increased gas fees. Addressing scalability has become a
critical priority. Sharding [11,12] technology offers a promising solution
by partitioning the blockchain network into independent shards,
enabling parallel transaction processing and faster confirmations [13].
However, the benefits of sharding are hindered by challenges posed by
cross-shard transactions. These transactions require data synchronization
and state validation across shards, which increases communication
overhead, introduces complex synchronization issues, and creates potential
performance bottlenecks, diminishing the efficiency gains of
sharding [14].

Various cross-shard transaction protocols have been proposed to
address these challenges, aiming to enhance processing efficiency and reduce
synchronization overhead. For instance, Monoxide [15] employs an
asynchronous consensus mechanism to minimize inter-shard waiting
times. However, while transactions are processed in parallel, the system
struggles to ensure state consistency, and communication overhead
remains significant in large-scale sharding environments. BrokerChain [16]
introduces brokers to coordinate cross-shard transaction
processing, but its approach appears impractical in real-world applications.
The system relies heavily on securing sufficient intermediary
nodes to stabilize cross-shard transactions. However, such nodes are
challenging to acquire in decentralized networks, especially under
heavy transaction loads. This reliance amplifies dependence on individual
nodes, increasing the risk of centralization and limiting scalability.
Consequently, BrokerChain [16] struggles to address complex scenarios
Correspondence to: College of Computer Science and Engineering, Hunan University of Science and Technology, No. 2 Taoyuan Road, Yuhu
District, Xiangtan 411201, Hunan Province, China.
E-mail addresses: wliang@hnust.edu.cn (W. Liang), aliric@hnust.edu.cn (K.-C. Li).
https://doi.org/10.1016/j.csi.2025.104123
Received 17 December 2024; Received in revised form 27 October 2025; Accepted 26 December 2025
Available online 31 December 2025
0920-5489/© 2026 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
X. Huang et al. Computer Standards & Interfaces 97 (2026) 104123
involving dynamic node allocation and shard optimization, exposing key weaknesses in its design's practicality and adaptability. To address shard distribution, BrokerChain [16] employs epochs, where the Metis algorithm [17] optimizes user account partitioning to prevent related transactions from being spread across shards in subsequent epochs. This strategy is similarly adopted in protocols like X-Shard [18]. However, in practice, the Metis algorithm [17] suffers from partition imbalance and fragmentation of critical nodes. This fragmentation can distribute high-frequency transaction paths across multiple shards, increasing cross-shard communication overhead and processing latency. Ultimately, this undermines system performance and scalability. In conclusion, while both BrokerChain [16] and X-Shard [18] offer innovative approaches to cross-shard transaction optimization, significant challenges remain. Improvements are needed in node allocation, security management, and performance optimization to realize the full potential of cross-shard systems.

To address the shortcomings of existing cross-shard transaction protocols, we propose V-Bridge, a novel solution leveraging the off-chain transaction model of bidirectional payment channels. V-Bridge facilitates logical fund interactions across shards by constructing virtual channels. To reduce reliance on single nodes in decentralized environments, V-Bridge introduces trustee groups as relay nodes within each shard. These groups, supported by a flexible and robust management framework, ensure seamless cross-shard transactions while distributing the load effectively. Virtual fund channels between trustees are settled on-chain only upon closure, which significantly reduces on-chain synchronization overhead and enhances transaction efficiency. Additionally, V-Bridge incorporates an intelligent shard adjustment mechanism based on a Consistent Hashing Ring [19,20] and GINI coefficients [21], integrated with the Cross-Shard Optimized Critical Path Protection Algorithm (CSOCPPA). This combination improves the coordination of dynamic shard adjustments and cross-shard transactions, further enhancing system performance.

The main contributions of this article are summarized as follows:

• Blockchain Sharding Protocol Based on Virtual Fund Payment Channels (V-Bridge): We propose a virtual channel solution inspired by off-chain payment channels to facilitate cross-shard transactions. This approach minimizes direct interactions between shards, significantly reducing communication overhead and transaction delays. Consequently, V-Bridge enhances system throughput and overall performance.
• Cross-shard Optimization and Dynamic Shard Adjustment Mechanism: The V-Bridge protocol integrates the Cross-Shard Optimized Critical Path Protection Algorithm (CSOCPPA) and a dynamic shard adjustment mechanism to improve system performance. CSOCPPA enhances the Metis algorithm [17] for better account allocation, reducing cross-shard assignments. The dynamic adjustment mechanism uses Consistent Hashing and GINI coefficient analysis to optimize shard splitting and merging for balanced load distribution.
• System Implementation: We implemented the V-Bridge protocol and evaluated its performance on Ubuntu 20.04.1. Experimental results demonstrate that, compared to BrokerChain and X-Shard, V-Bridge outperforms in terms of throughput, transaction confirmation latency, workload balancing, and consensus success rate under identical conditions.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents an overview of the V-Bridge system. Section 4 details the protocol design. Section 5 introduces cross-shard optimization and dynamic shard adjustment. Section 6 provides a security analysis. Section 7 reports experimental results, and Section 8 concludes the paper.

2. Related work

2.1. Cross-shard transaction

To address the challenges of cross-shard transactions, a variety of protocols have been developed [22-24], with a focus on enhancing performance, scalability, and consistency. Monoxide [15] uses an asynchronous consensus mechanism and temporary payment channels to decompose cross-shard transactions into intra-shard operations, boosting throughput. However, the complexity of state synchronization and consistency verification across shards increases communication overhead, impacting large-scale system performance. OmniLedger [25] employs the Atomix protocol to ensure the atomicity of cross-shard transactions. Through a client-driven two-phase commit process, it freezes the UTXO of the input shard and then releases the UTXO of the output shard, maintaining consistent transaction state updates. RapidChain [26] improves system throughput by reducing communication rounds through parallel verification. However, it struggles with performance bottlenecks when handling cross-shard transactions with long dependency chains due to prolonged waiting times. BrokerChain [16] simplifies cross-shard communication by introducing third-party intermediary nodes that convert cross-shard transactions into intra-shard transactions. However, this approach introduces risks related to decentralization and trust management. Pyramid [27] implements a hierarchical sharding protocol where BridgeShard processes transactions across multiple shards in a single consistency round, significantly reducing confirmation latency. However, it increases system management complexity. CHERUBIM [28] leverages pipeline processing based on the 2PC protocol to enhance cross-shard transaction throughput but falls short in mitigating long transaction latencies. Building on these advancements, we propose the V-Bridge protocol, offering an efficient and flexible solution to the challenges of cross-shard transactions. V-Bridge supports seamless operation in complex transaction scenarios, ensuring the efficient functioning of blockchain systems.

2.2. Payment channel

Payment channels are off-chain mechanisms that improve blockchain performance by optimizing transaction processing, reducing network load, delays, and fees. Users lock funds via smart contracts, enabling multiple off-chain transactions without recording each one on the blockchain. The mechanism defines rules for initial fund allocation and status updates. Users can update the status off-chain, with each update verified by signed certificates from both parties. When the channel closes, the final status is submitted to the blockchain to settle the fund distribution. By minimizing on-chain interactions, payment channels reduce blockchain resource use. A key feature is the dynamic adjustment of participant balances while maintaining total fund immutability. This improves transaction efficiency and ensures security. Payment channels leverage cryptographic tools like digital signatures to ensure transaction integrity and reduce the costs and delays of on-chain operations, especially during congestion.

In recent years, payment channels have been widely used in many fields, such as privacy protection [29], network scalability [30], cross-chain interoperability [31], Internet of Things expansion [32], and other directions. In the sharding architecture of this article, the off-chain payment channel also provides an efficient solution for processing cross-shard transactions. By moving transactions between shards off-chain, payment channels effectively reduce the complexity of cross-shard communications, significantly improve the throughput and performance of the sharding system, and provide important support for the further development of sharding technology.

3. Overview

This section introduces the V-Bridge system model and its workflow, followed by the deployment process of the Trustor.
3.1. System architecture and workflow

Similar to BrokerChain [16], V-Bridge is also executed in epochs, and V-Bridge includes two types of shards, settlement and account shards, of which there are S settlement shards and 1 status shard. The specific definitions are as follows:

• Settlement shard (S-shard): Generates transaction blocks by packaging transactions and achieves intra-shard consistency at the beginning of each epoch.
• Account shard (A-shard): A-shard optimizes account allocation based on the user's transaction history data to alleviate the problem of load imbalance in the shard system. During the system startup phase, A-shard collects user transaction data from S-shard, uses this data to build a user transaction network, and optimizes user status distribution through the CSOCPPA. After the optimization, A-shard generates a status block and sends it to S-shard to update the user account status distribution.

Other settings: We adopt a Byzantine fault-tolerant adversarial model [33], assuming the presence of malicious actors capable of compromising specific shard nodes and performing arbitrary dishonest actions, such as data tampering, delayed messaging, or refusing to submit required information. These adversaries are slow-adaptive, meaning they cannot frequently rotate the compromised nodes within a single epoch. The system assumes a partially synchronous communication model, where messages may experience delays but are guaranteed eventual delivery.

Building upon this adversarial model, the V-Bridge mechanism is designed to ensure resilience against such malicious actors. The system architecture is structured into three key layers: the Network Initialization Layer, the Account State Reconstruction Layer, and the Transaction Processing and Consensus Layer.

• Network Initialization Layer (NIL): This layer is the core foundation of V-Bridge and is responsible for the reasonable sharding of nodes and transaction loads in the system to form multiple independent sharding structures.
• Account State Reconstruction Layer (ASRL): According to the distribution of accounts and transactions, this layer optimizes and reconstructs the account status within the shard through the CSOCPPA, generates a dynamic state reconfiguration plan, and improves system performance.
• Transaction Processing and Consensus Layer (TPCL): This layer is responsible for transaction verification and consensus, ensuring the security and consistency of all transactions.

Based on the above architecture, each layer works together to realize each epoch's process from node identity authentication to transaction processing. As shown in Fig. 1, the workflow can be divided into the following five steps:

Step 1 (PoW verification): Nodes verify their identity using PoW [34] (via public key and IP) to prevent Sybil attacks. Verified nodes are evenly distributed across shards in a round-robin fashion.

Step 2 (Select Trustor): After shard assignment, nodes apply to serve as Trustors in this epoch. Each shard forms a trust group to support subsequent transactions.

Step 3 (Transaction consensus): The settlement shard validates transactions and adds them to the pool. Consensus is reached via PBFT [33]. For cross-shard cases, virtual channels (Section 4.2) manage interactions; failed verifications trigger rollback (Section 4.3).

Step 4 (State optimization): The CSOCPPA algorithm (Section 5.1) adjusts account placement based on access patterns. A consensus-derived state block is distributed to shards for the next epoch.

Step 5 (Epoch sharding): In the new epoch, PBFT [33] guides account migration and transaction routing based on the latest state diagram, ensuring consistency and performance.

3.2. Trustor deployment

In our V-Bridge, we introduce the concept of the Trustor. Before delving into the specifics of the Trustor, it is essential to understand the concept of a trust group. A trust group consists of nodes selected by the system to facilitate the transfer of funds across shards. These nodes provide liquidity through staking or contributing resources like funds or computing power, creating a bridge for cross-shard transactions. In return, they earn commissions as incentives. By ensuring sufficient liquidity, these nodes establish virtual transaction channels between shards, enabling successful cross-shard payments for users both inside and outside the shards. The node performing this critical role is the Trustor. The process for participating in system transactions involves the following key steps:

Step 1 (Trustor application): At the start of each epoch, nodes from various shards may apply to serve as Trustors by submitting collateral, undergoing a credit assessment (excluding low-reputation nodes with poor historical performance), and providing proof of sufficient liquidity to meet the minimum funding threshold.

Step 2 (Qualification verification): The system assesses applicant nodes based on their collateral, computational capacity, credit score, and liquidity. Each node is assigned an initial credit score that reflects its resources and historical performance. The system prioritizes selecting Trustors from nodes with higher credit scores. However, to mitigate the risk of excessive centralization, a probabilistic selection mechanism is employed. Specifically, high-reputation nodes (top 30%) are assigned a 60% probability of selection, while medium-reputation nodes (top 30%-60%) are allocated a 40% probability. Even if there are enough high-reputation candidates, there remains a 40% chance of selecting a medium-reputation node, thus distributing power within the system and reducing centralization risks. In the event that there are insufficient qualified candidates, the system will randomly select additional well-performing nodes to supplement the trust set. The trusted set is dynamically maintained: nodes with significantly reduced liquidity or credit scores are automatically removed.

Step 3 (Leader election): For each shard's trust set, the system designates the Trustor node with the highest credit score as the leader, with the leader's term being tied to the current epoch. The leader is not allowed to serve in two consecutive epochs. Each Trustor generates a unique identifier 𝑇id, defined as:

𝑇id = hash(ShardID ∥ Rep ∥ Deposit ∥ Value ∥ Relate)

where Rep is the credit score, Deposit is the collateral, Value is the remaining balance, and Relate indicates the cross-shard relationships. The Relate variable is defined as:

Relate = 0 if no channel with other shards; ShardID if a channel with another shard exists.

Each 𝑇id is stored within the shard. The leader can query these 𝑇id values to execute cross-shard transactions and select the most suitable executor from the trusted set.

4. V-Bridge protocol design

In this section, we introduce the core modules of the V-Bridge protocol, including the new Merkle tree, the solution for cross-shard transactions based on virtual payment channels, and the final transaction settlement.

4.1. New Merkle Patricia Tree

To support efficient user state queries and cross-shard transaction routing, we design a shard management framework that combines a Consistent Hashing Ring with a New Merkle Patricia Tree (NMPT) (see Fig. 2).
Fig. 1. Workflow diagram of V-Bridge for an Epoch.
Fig. 2. The data structure and mapping of NMPT.
When a user initiates a query or transaction request that requires locating their associated shard, the system first computes the user's hash value 𝐻𝑢 = Hash(ID𝑢) using the SHA-256 algorithm [35]. It then performs a clockwise search on the consistent hash ring [19] to identify the first shard whose assigned range contains 𝐻𝑢, thereby determining the user's query shard. Each shard maintains a uniquely assigned hash interval and dynamically records the user-shard mappings within that range to support efficient user lookup and cross-shard routing.
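The lookup just described (hash the user ID with SHA-256, then walk the ring clockwise) can be sketched as follows. This is our minimal illustration, not the paper's implementation: it places a single point per shard, whereas production rings usually assign several virtual nodes to each shard.

```python
import bisect
import hashlib

def h(value: str) -> int:
    """SHA-256 of a string, interpreted as an integer position on the ring."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class HashRing:
    """Clockwise lookup: a user maps to the first shard point at or after H(ID_u)."""
    def __init__(self, shard_ids):
        self.ring = sorted((h(f"shard:{s}"), s) for s in shard_ids)
        self.points = [p for p, _ in self.ring]

    def locate(self, user_id: str) -> str:
        hu = h(user_id)
        # bisect finds the first shard point past hu; wrap around at the top.
        i = bisect.bisect_right(self.points, hu) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["Shard1", "Shard2", "Shard3"])
shard = ring.locate("Alice")
assert shard in {"Shard1", "Shard2", "Shard3"}
# The same ID always resolves to the same shard.
assert ring.locate("Alice") == shard
```

Because only the hash interval boundaries move when shards split or merge, most user-to-shard mappings survive a ring adjustment unchanged, which is what makes this structure attractive for the dynamic mechanism of Section 5.2.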
Fig. 3. Example of cross-shard Tx processing.
The NMPT serves as the core of state storage and improves on the traditional Merkle Patricia Tree by:

• Separating ordinary and Trustor user states into independent branches;
• Embedding a Shard Range module to explicitly encode shard positions on the hash ring;
• Enabling parallel updates, fast redirection, and reduced state conflicts.

These enhancements ensure consistency and rapid verification in dynamic, cross-shard environments. The usage of this framework in transaction execution is elaborated in the next section.

4.2. The specific implementation process of the V-Bridge protocol

We establish a channel contract (ChannelContract) between two shards via a Trustor, converting most transactions into intra-shard operations. Fund transfers between channel participants occur off-chain, with on-chain interactions limited to channel creation and final settlement. This approach significantly reduces the on-chain overhead of cross-shard transactions and enhances system performance. The protocol involves four main steps: Initial Fund Locking, Trustor Coordination and Locking, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 Payment Execution, and HTLC (Hashed Timelock Contract) Unlocking [36,37]. To illustrate the protocol's operation, we present a cross-shard transaction case that demonstrates how it ensures the seamless completion of cross-shard transactions (see Fig. 3). Alice, a user in Shard1, wants to initiate a cross-shard transaction to transfer an amount 𝑣 to Bob, a user in Shard2. The process unfolds as follows:

Step 1 (Initial fund locking): First, Alice must locate Bob's shard position through the Hash Ring. Therefore, Alice will create a message containing the transaction information, structured as follows:

𝑇𝑋request = {Property𝐴, ToUser𝐵, Time𝐴, 𝑣, 𝑇lock, Sig𝐴}

where 𝑇lock is the preset lock time for the funds, Time𝐴 is the timestamp for creating the transaction, and 𝑣 represents the transaction amount. This message is signed by Alice with Sig𝐴 and sent to the leader of the trust group in her shard, denoted as 𝑀leader.

Upon receiving the message, the leader verifies whether Alice's balance is sufficient to cover the transaction fee and ensures no duplicate transactions with the same timestamp, thereby preventing double-spending. Once verification is complete, the leader selects a 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 with sufficient financial reserves based on the transaction requirements and then sends the verification results and the 𝑇id information of the 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 responsible for this transaction to other Trustor nodes for confirmation. Suppose 2/3 of the Trustors agree to the transaction. In that case, the leader will record the transaction, send 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀's information to Alice for confirmation, and then 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 will begin processing the transaction.

Step 2 (Trustor coordination and locking): When the 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 receives the transaction request 𝑇𝑋request, it extracts the recipient Bob's user ID and uses a Hash Ring to locate his shard (e.g., Shard3). It then generates and sends a query message:

𝑄𝑀 = {{𝑇𝑋request, Inform}, Sig𝑀}

Upon receiving 𝑄𝑀, Shard3 verifies the transaction and retrieves Alice's account by checking her signature. It consults the Dynamic Mapping Table to determine the shard locations of both Alice and Bob (e.g., Shard1 and Shard2), then forwards the query to Shard2 and broadcasts the involved shard locations.

Afterward, 𝑇𝑋request is submitted to Shard1 for verification. Simultaneously, the leader node 𝑁leader in Shard2 analyzes the transaction and trust-related metrics to select a 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 to handle execution. This proactive selection validates the transaction without waiting for Shard1's response, reducing delay.

The selected 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 in Shard2 extracts 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀's address from Sig𝑀 and deploys a contract 𝐶𝐵, which locks the specified funds. The contract enforces mutual agreement between the sender's and receiver's Trustors to release funds. If consensus is not reached, the funds remain locked. 𝐶𝐵 also includes timeout and fallback mechanisms to ensure progress under adverse conditions. Once completed, the verifier in Shard2 broadcasts the result to confirm transaction integrity.

In parallel, upon confirmation of 𝑇𝑋request in Shard1, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 creates another contract 𝐶𝐴 and locks funds under rules similar to 𝐶𝐵.
Table 1
Symbol explanations for algorithms.
Symbol Explanation
G Graph structure, containing nodes (accounts) and edges (transactions).
u, v Account nodes in the graph, representing any two accounts.
a, b Account nodes involved in matching or merging operations.
degree(a) The degree of account 𝑎, representing the number of neighbors (transactions) it has.
maxDepth Maximum depth of a node, used to prioritize high-frequency trading nodes.
neighbor(a) The neighboring accounts of account 𝑎, connected in the graph.
edge(u, v) Edge between accounts 𝑢 and 𝑣, representing the transaction connection.
edge_weight(u, v) Weight of the edge between accounts 𝑢 and 𝑣.
target_size Target size for the coarsened network, representing the desired scale.
threshold Threshold value for merging or retaining nodes based on transaction volume.
region Region formed by the depth-priority growing algorithm, containing multiple accounts.
sorted_accounts List of accounts sorted by degree (transaction volume).
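Read together, the symbols in Table 1 suggest a simple adjacency-map model of the account graph. The sketch below is our illustration, not the paper's implementation; it shows how degree(a), neighbor(a), and edge_weight(u, v) relate to one underlying structure.

```python
from collections import defaultdict

class TxGraph:
    """Adjacency-map model of the account graph G from Table 1."""
    def __init__(self):
        self.adj = defaultdict(dict)  # u -> {v: edge_weight(u, v)}

    def add_edge(self, u, v, w):
        # Store both directions so neighbor/degree queries see all partners.
        self.adj[u][v] = w
        self.adj[v][u] = w

    def neighbor(self, a):
        """neighbor(a): accounts connected to a in the graph."""
        return set(self.adj[a])

    def degree(self, a):
        """degree(a): number of neighbors (transaction partners) of a."""
        return len(self.adj[a])

    def edge_weight(self, u, v):
        """edge_weight(u, v): weight of the transaction edge between u and v."""
        return self.adj[u][v]

g = TxGraph()
g.add_edge("u", "v", 3)
g.add_edge("u", "w", 1)
assert g.degree("u") == 2
assert g.neighbor("v") == {"u"}
assert g.edge_weight("u", "v") == 3
```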
When 𝐶𝐴 is finalized, its details are shared with Shard2 for synchronization.

Alice then sends the funds 𝑣 and a message 𝜁 to 𝐶𝐴, where 𝜁 = {𝑣, Timenow, ToUser𝐵, 𝑇lock}. The contract's unlocking condition requires that the corresponding transfer occurs within the time window 𝑇lock, avoiding conflicts or double-spending. This ensures that the locked funds are correctly routed to 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀.

Step 3 (𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 payment execution): In Shard2, after the contract is established, the 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 creates an intra-shard transaction message as follows:

𝑇𝑋second = {Property𝑁, ToUser𝐵, Time𝑁, 𝑣, Sig𝑁}

This transaction is sent to the verifier nodes within the shard for validation. Meanwhile, Bob generates a random number 𝑅 and creates the following message:

𝜁 = {H(𝑅), Sig𝐵, {𝑇𝑋second}}

This message is sent to 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀. Upon receiving the message, the 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 forwards H(𝑅) to contract 𝐶𝐴, which automatically initiates the HTLC [36]. According to the rules of 𝐶𝐴, only when 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 provides the value of 𝑅 corresponding to H(𝑅) within the specified time 𝑇1 (𝑇1 < 𝑇lock) can 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 start the next transaction smoothly. Otherwise, when the time 𝑇lock for Alice to lock the funds ends, the funds will return to Alice's account, and the system will identify the initiator of the transaction failure and punish them (Section 4.3).

Once 𝑇𝑋second is included in the block, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 will transfer the funds 𝑠 = 𝑣 to contract 𝐶𝐵, locking them for the time period 𝑇2 (𝑇2 < 𝑇1). Bob is then notified that the funds are ready to be claimed. Once Bob provides the correct 𝑅 within the specified time window, the locked funds will be released to Bob.

Step 4 (HTLC unlocking): The 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 completes the payment, yet the funds remain locked unless the process is correctly finalized within the designated time window 𝑇2. To initiate the release, Bob must submit the correct value of 𝑅 along with his digital signature, which enables public verification. Upon receiving and verifying 𝑅, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 generates a channel message:

𝜃1 = {Sig𝑁, Tablenow, 𝑅}

Here, Tablenow represents the latest balance allocation table, which is then sent to the 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀. The 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 provides 𝑅 to the contract 𝐶𝐴 for verification. If H(𝑅) matches, the contract notifies 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 that the match is successful. At this time, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 forwards the message to other idle Trustors in Shard1 to jointly verify the allocation table and version number. If 2/3 of the Trustors verify that it is correct, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 will generate the following channel message:

𝜃2 = {Sig𝑀, {𝜃1}}

and send it back to 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁. The idle Trustor group in Shard2 verifies the updated balance table and ensures that the latest channel states are properly recorded. Through this coordinated process, the updated balances are securely synchronized, and the final signed states preserve the integrity and consistency of the transaction.

4.3. Transaction settlement and failure handling

The transaction settlement process follows a standard payment channel model, addressing both cooperative and exceptional scenarios. In the cooperative case, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 and 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 mutually agree to close the channel. 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 generates and signs a closure message containing the final balance and fund allocation, then forwards it to 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 for verification and co-signature. The jointly signed message is broadcast to the blockchain, where Shard1 and Shard2 verify the signatures and release the allocated funds. If residual funds remain locked, the inter-shard smart contract redistributes them to the appropriate addresses, finalizing the settlement.

In the uncooperative case, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀 may unilaterally broadcast the most recent transaction state, initiating a challenge period during which 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 can dispute outdated information. If no challenge is made, the transaction is settled according to the submitted state.

In the event of abnormal failures, recovery mechanisms safeguard both fund security and system liveness. If 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 maliciously withholds the required secret 𝑅, and 𝐶𝐴 fails to receive it within the timeout, a rollback is triggered: funds are refunded to 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀, while 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 forfeits their deposit and suffers a reputation penalty. Repeated offenses may result in disqualification.

If failure occurs due to force majeure, 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁 submits a failure proof

𝜇 = {Sig𝑁, 𝑇𝑋request, 𝑇𝑋second, Tablenow}

to Shard1 and Shard2. Once it is validated that the transfer could not be completed within the lock time 𝑇1, the locked funds in 𝐶𝐵 are refunded to 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑁, and Shard1 returns funds to 𝑇𝑟𝑢𝑠𝑡𝑜𝑟𝑀, ensuring proper recovery and termination (see Table 1).

5. Cross-shard optimization and dynamic shard adjustment mechanism

5.1. CSOCPPA

Traditional Metis algorithms [17] coarsen transaction graphs using random or edge-weighted matching but often overlook critical paths and high-transaction nodes in cross-shard transactions. These elements are essential for throughput and latency, and mishandling them can increase cross-shard communication and cause load imbalance [38]. To address this, we propose CSOCPPA, a coarsening-phase optimization algorithm that identifies and preserves critical paths and high-transaction nodes during graph compression (the key symbols and notations used in our algorithms are summarized in Table 1). CSOCPPA effectively alleviates load imbalance, improves system throughput, and optimizes overall resource utilization.

(1) Adjustment during the coarsening stage: Consider a directed weighted graph 𝐺 = (𝑇, 𝐸), where 𝑇 represents the transaction nodes and 𝐸 represents the transaction edges. Each node 𝑡 ∈ 𝑇 represents a user account or contract, and each directed edge 𝑒 = (𝑡𝑖, 𝑡𝑗) ∈ 𝐸 denotes a transaction from 𝑡𝑖 to 𝑡𝑗 with associated cost or weight 𝑤𝑒(𝑒).
To evaluate transaction chain structure during coarsening, we define the longest path 𝑃 as the path with the highest cumulative transaction cost or weight from the starting node to the terminal node. To capture the depth and height of such a chain, we introduce two metrics: Depth(𝑡), which represents the maximum cumulative weight of any path ending at node 𝑡, and Height(𝑡), which represents the maximum cumulative weight of any path starting from node 𝑡.

We define the cumulative transaction cost along a path 𝑃 from 𝑡𝑖 to 𝑡𝑗 as:

𝛾(𝑡𝑖, 𝑡𝑗) = Σ_{𝑘=1}^{𝑛} 𝑤𝑒(𝑒𝑘)   (1)

where 𝑒𝑘 are the edges along the path 𝑃.

The weight 𝜔(𝑡) of a node 𝑡 denotes the sum of all weights of incoming and outgoing edges connected to 𝑡, representing the total transaction volume related to that account.

Based on this, Depth and Height can be recursively computed as:

Depth(𝑡) = max_{𝑡𝑖 ∈ Predecessors(𝑡)} (Depth(𝑡𝑖) + 𝜔(𝑡𝑖) + 𝛾(𝑡𝑖, 𝑡))   (2)

Height(𝑡) = max_{𝑡𝑗 ∈ Successors(𝑡)} (Height(𝑡𝑗) + 𝜔(𝑡𝑗) + 𝛾(𝑡, 𝑡𝑗))   (3)

These metrics allow us to identify high-impact paths in the transaction graph for partitioning and optimization purposes.

The formula for the longest path maxPath(𝑒) can be adjusted as follows:

maxPath(𝑒) = Depth(source(𝑒)) + 𝑤𝑒(𝑒) + Height(target(𝑒))   (4)

Algorithm 1: Network Coarsening with Isolation Prevention
Input: 𝐺, target_size, threshold
Output: 𝐺coarsened
1  foreach 𝑎 ∈ 𝐺.accounts do
2      degree(𝑎) ← count(neighbor(𝑎))  // Init node degree
3  sorted_accounts ← Sort(𝐺.accounts, by degree)  // Sort by degree
4  while |𝐺| > target_size do
5      foreach 𝑎 ∈ 𝐺.accounts do
6          𝑏 ← Select(neighbor(𝑎), min_degree)  // Select low-degree neighbor
7          Merge(𝑎, 𝑏)  // Merge nodes
8          Update(𝐺, 𝑎, 𝑏)  // Update graph
9          if degree(𝑎) < threshold then
10             Merge low-volume accounts into super accounts  // Group small nodes
11             Connect super accounts to neighbors  // Reconnect
12     foreach (𝑢, 𝑣) ∈ 𝐺.edges do
13         edge_weight(𝑢, 𝑣) ← calculate_weight(𝑢, 𝑣)  // Reweight edges
14         if edge_weight(𝑢, 𝑣) > threshold then
15             Merge strongly connected accounts  // Preserve strong links
16 return 𝐺coarsened
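Eqs. (1)-(4) can be exercised on a toy transaction graph as follows; the three-node example and its edge weights are ours, chosen only to make the recursions concrete. For a direct edge, 𝛾 reduces to the single edge weight.

```python
from functools import lru_cache

# Directed weighted transaction graph: (source, target) -> edge weight w_e(e).
edges = {("a", "b"): 5, ("b", "c"): 3, ("a", "c"): 1}

def omega(t):
    """ω(t): total weight of edges incident to account t."""
    return sum(w for (u, v), w in edges.items() if t in (u, v))

@lru_cache(maxsize=None)
def depth(t):
    """Depth(t): max cumulative weight of any path ending at t (Eq. 2)."""
    preds = [(u, w) for (u, v), w in edges.items() if v == t]
    return max((depth(u) + omega(u) + w for u, w in preds), default=0)

@lru_cache(maxsize=None)
def height(t):
    """Height(t): max cumulative weight of any path starting at t (Eq. 3)."""
    succs = [(v, w) for (u, v), w in edges.items() if u == t]
    return max((height(v) + omega(v) + w for v, w in succs), default=0)

def max_path(e):
    """maxPath(e) = Depth(source(e)) + w_e(e) + Height(target(e)) (Eq. 4)."""
    u, v = e
    return depth(u) + edges[e] + height(v)

# Rank edges by maxPath: high-ranked edges lie on high-impact chains and
# should stay inside one shard during partitioning.
ranked = sorted(edges, key=max_path, reverse=True)
assert max_path(("b", "c")) == 14
assert max_path(ranked[0]) >= max_path(ranked[-1])
```

The memoized recursion terminates because the transaction graph is treated as acyclic here; handling cycles would require condensing strongly connected components first, which the paper does not detail in this section.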
This formula calculates the longest path through transaction edge 𝑒 within the system. Here, source(𝑒) represents the starting node of transaction edge 𝑒, and target(𝑒) represents the terminal node of transaction edge 𝑒. If the maxPath(𝑒) value of a path is high, it indicates that the accounts and transactions on the path have a significant influence or frequent transaction records. In the state partitioning process, such high-weight paths should not be fragmented; instead, the transactions on these paths should be assigned to the same partition to reduce communication and synchronization costs associated with cross-partition transactions. After completing the calculation of maxPath(𝑒), we also need to address issues such as reducing the likelihood of transaction account isolation. Therefore, to further improve the partitioning process, the following steps are executed, as shown in Algorithm 1.

Step 1: Degree calculation and initial matching. Calculate the degrees of all accounts and sort them in ascending order. Select the account with the lowest degree from the set of unmatched accounts and match it with its adjacent accounts. Prioritize the reduction of isolated accounts and prevent the spread of low-frequency trading accounts across shards to minimize cross-shard transactions.

Step 2: Multi-edge matching phase. Match adjacent accounts in descending order of edge weight. If multiple accounts share the same weight, prioritize accounts with fewer merged edges to preserve high-frequency trading paths and minimize cutting critical paths.

Step 3: Network update and simplification. After each match, update the network's connections. If the network reaches the target size, proceed to the next step; otherwise, return to Step 2 and continue simplifying the network.

Step 4: Load balancing and super account creation. After coarsening, merge low-transaction accounts into super accounts. Ensure these super accounts remain connected to their original neighbors, accumulate all edge weights, and retain full transaction data.

(2) Initialization phase optimization: In the context of cross-shard transactions, the initial partitioning is critical for subsequent optimizations. The traditional Metis algorithm [17] grows regions by randomly selecting starting points, which may lead to high-frequency transaction accounts being distributed across different shards, thereby increasing the cost of cross-shard transactions and communication. To address this issue, we propose the Depth-Priority Growing Algorithm (DPGA). This algorithm prioritizes accounts with high transaction frequency and network importance (i.e., nodes with larger maxDepth) as starting points. Through a region-growing strategy, eligible neighboring nodes are merged into the same region until no more neighbors can be added. This approach reduces the dispersion of high-frequency transaction accounts and lowers the complexity of cross-shard transactions. The pseudocode is presented in Algorithm 2.

5.2. Dynamic sharding adjustment mechanism

By adjusting account allocation, the overall load of the sharding system can be significantly improved. To further ensure the system's performance, we incorporate a dynamic sharding mechanism combined with the CSOCPPA to maintain load stability across the system. The dynamic sharding adjustment mechanism monitors the GINI coefficient to assess load balance and evaluates transaction patterns to dynamically split or merge shards. This approach optimizes both load balancing and cross-shard transaction performance. The following sections provide a detailed explanation of shard load definitions, the GINI coefficient measurement criteria, and the dynamic adjustment mechanism.

(1) Definition and algorithm of shard load: In a distributed system, shard load is a key indicator for measuring the workload of shards. The calculation of shard load should comprehensively consider various factors, including the number of users stored within a shard, the transaction volume processed, and the frequency of cross-shard interactions. To ensure the comprehensiveness of the load indicator, the following load calculation methods are defined:

The user count load of shard 𝑆𝑗 is defined as the number of users managed by the shard, expressed as:

𝑙𝑗𝑢 = |𝑈𝑗|   (5)

The transaction volume load of shard 𝑆𝑗 is defined as the total transaction volume processed by the shard within a unit of time, expressed
X. Huang et al. Computer Standards & Interfaces 97 (2026) 104123
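The coarsening steps above can be sketched in a few lines. The following is an illustrative toy implementation, not the authors' code: the graph representation (`edges` as a weight dictionary) and the tie-breaking rules are simplifying assumptions, and `target_size` plays the role of the target network size in Step 3.

```python
# Toy coarsening sketch: repeatedly match the lowest-degree account with its
# heaviest-edge neighbor and merge the pair into a "super account",
# accumulating edge weights (Steps 1, 2, and 4 of the partitioning process).
def coarsen(edges, target_size):
    # edges: dict {(u, v): weight} with u < v; accounts are inferred from edges
    accounts = {a for e in edges for a in e}
    merged = {a: {a} for a in accounts}          # super account -> member accounts
    while len(merged) > target_size:
        degree = {a: 0 for a in merged}
        for (u, v), w in edges.items():
            degree[u] += 1
            degree[v] += 1
        # Step 1: pick the lowest-degree account (reduces isolated accounts)
        a = min(merged, key=lambda x: (degree[x], x))
        # Step 2: among its neighbors, prefer the heaviest edge
        nbrs = [(w, v if u == a else u)
                for (u, v), w in edges.items() if a in (u, v)]
        if not nbrs:                              # isolated: merge into any peer
            b = next(x for x in merged if x != a)
        else:
            b = max(nbrs)[1]
        # Step 4: merge a into b, accumulating edge weights
        merged[b] |= merged.pop(a)
        new_edges = {}
        for (u, v), w in edges.items():
            u, v = (b if u == a else u), (b if v == a else v)
            if u == v:
                continue
            key = (min(u, v), max(u, v))
            new_edges[key] = new_edges.get(key, 0) + w
        edges = new_edges
    return merged, edges
```

On a toy four-account graph this halves the network while keeping the heaviest edges inside super accounts; the real algorithm additionally tracks priorities and maxDepth as in Algorithm 2.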
as:

l_j^t = Σ_{k∈T_j} t_k    (6)

where T_j represents the set of all transactions related to shard S_j, and t_k represents the transaction volume of transaction k.

By combining these factors, the comprehensive load is defined as:

l_j = α · l_j^u + β · l_j^t    (7)

where α and β are weighting coefficients used to balance the contribution of different load sources to the shard's pressure. This comprehensive load indicator can better reflect the actual pressure of the shard and provide a reference for continuous adjustment.

(2) Load balancing measurement based on GINI coefficient: This study introduces the GINI coefficient as a measure to evaluate the load distribution state among shards. The GINI coefficient is a classical indicator of distribution inequality, with a range of [0, 1]. A value closer to 0 indicates a more balanced distribution, while a value closer to 1 indicates a more imbalanced distribution.

In the shard load scenario, the calculation formula for the GINI coefficient is as follows:

G = ( Σ_{j=1}^m Σ_{k=1}^m |l_j − l_k| ) / ( 2m Σ_{j=1}^m l_j )    (8)

where m represents the number of shards, and l_j represents the load of shard S_j. Based on the calculated GINI coefficient, the current balance of system shard loads can be directly judged. The specific judgment rules are as follows:

• G < 0.3: The system load is completely balanced, and all shard loads are identical.
• G ∈ (0.3, 0.5]: The system load is basically balanced, and load differences are within an acceptable range.
• G > 0.5: The system load is imbalanced, requiring redistribution of high-load shards or merging of low-load shards.

Algorithm 2: Depth-Priority Growing Algorithm for Cross-Shard Transaction Optimization
Input: G, target_size, threshold, maxDepth
Output: G_coarsened
1  foreach a ∈ G.accounts do
2      degree(a) ← count(neighbor(a))            // Initialize degree
3  sorted_accounts ← Sort G.accounts by Priority, then by maxDepth
4  while |G| > target_size do
5      foreach a ∈ G.accounts do
6          if maxDepth(a) > threshold then        // Prioritize high-frequency nodes
7              b ← Select neighbor(a) with max_degree
8              Merge a, b
9              Update G, a, b
10         if degree(a) < threshold then          // Identify low-volume nodes
11             Merge low-volume accounts into super accounts
12             Update and connect super accounts to neighbors
13     foreach (u, v) ∈ G.edges do
14         edge_weight(u, v) ← calculate_weight(u, v)
15         if edge_weight(u, v) > threshold then
16             Merge strongly connected accounts  // Merge strong connections
17 return G_coarsened

(3) Dynamic sharding adjustment: In a distributed shard system, load balancing operations, including shard splitting and shard merging, are triggered based on specific conditions to maintain system efficiency and fairness. When the load of a shard significantly exceeds that of others or the GINI coefficient surpasses a predefined threshold, the system triggers shard splitting. The shard with the highest load, S_max, is identified based on the condition:

l_max > μ + γσ    (9)

where μ = (1/m) Σ_{j=1}^m l_j is the average load of all shards, σ = sqrt( (1/m) Σ_{j=1}^m (l_j − μ)² ) is the standard deviation, and γ is an adjustment coefficient. For users in the split shard, the system employs consistent hashing to find the corresponding storage shards and updates the mapping to ensure query consistency. Conversely, when certain shards experience prolonged low load below a predefined threshold, the system triggers shard merging. The set of candidate shards for merging, S_low, is identified based on the condition:

l_j < θ    (10)

where θ is the minimum load threshold. During the merging process, the system updates the user-to-shard mapping and ensures synchronization of query positions for affected users. These mechanisms collectively enhance system load distribution and ensure balanced utilization of resources.

6. Security analysis

6.1. Atomicity of transactions

V-Bridge uses a hypergeometric distribution to calculate the probability of failure in each epoch. In V-Bridge, atomicity depends on whether multiple shards can complete state verification and fund confirmation within a specified time window while maintaining a secure state. If any shard's consensus committee includes more than 1/3 malicious nodes, the shard is considered failed, potentially causing transaction abortion or triggering exceptional rollbacks.

Assume that the current epoch includes N valid registered nodes, with t = ⌊fN⌋ being malicious. Each shard randomly selects n nodes to form its consensus group. The probability that a shard contains at least ⌊n/3⌋ malicious nodes is:

P_shard-fail = Σ_{x=⌊n/3⌋}^{n} [ C(t, x) · C(N − t, n − x) / C(N, n) ]    (11)

Here, N is the total number of registered nodes, t = ⌊fN⌋ is the number of malicious nodes, n is the number of consensus nodes per shard, and x denotes the number of malicious nodes in a single shard. If a transaction spans S_txn shards, the upper bound on the atomicity failure probability is:

Pr[Atomicity Failure] < S_txn · P_shard-fail    (12)

As long as the proportion of malicious nodes in each shard does not exceed 1/3 (i.e., f < 1/3), V-Bridge can guarantee atomicity. In the subsequent analysis, we demonstrate how V-Bridge ensures atomic cross-shard transactions.

Theorem 1. If a cross-shard transaction completes within the HTLC window T_1 and submits a valid R, V-Bridge guarantees atomicity without requiring rollback.
Proof. Assume the transaction involves shards S_1, S_2, ..., S_k, where the funds in each shard are locked using HTLC contracts. The unlocking condition requires the receipt of a correct R that satisfies a predetermined hash value H(R) within the window T_1, fulfilling the release condition.

Once the receiver Bob submits the correct R within T_1, all contracts are deemed releasable. Trustor nodes in each shard immediately execute the fund release and update the balance state as Table_now. This updated state is signed by both Trustors and broadcast across all involved shards for chain-level finalization. Since a valid release requires the correct R and consistent co-signatures, the final state is valid only under mutual agreement. Therefore, if the transaction succeeds, all contracts release simultaneously; otherwise, if any contract fails to trigger, it indicates that R was not submitted, and the transaction is entirely aborted without partial execution. In conclusion, as long as R is submitted within T_1, the HTLC conditions ensure atomicity across all shards without the need for rollback logic.

Theorem 2. If a cross-shard transaction misses the HTLC deadline T_1, V-Bridge ensures a consistent rollback across all shards, preventing fund loss or double spending.

Proof. The HTLC contract in V-Bridge is equipped with a unified timeout parameter T_1. If the correct random secret R is not submitted within this time frame, the contract automatically triggers a rollback mechanism, returning the locked funds to the original accounts. Since the funds remain unreleased throughout the process, the state in each shard remains unchanged, and the system proceeds to restore consistency. There is no partial commitment or state divergence, and the design inherently prevents double spending. Therefore, even in the event of transaction failure, atomicity and consistency of the system are preserved. This completes the proof.

6.2. Trustor security

To address potential malicious or faulty behavior among Trustors, the system employs a dynamic reputation mechanism that continuously monitors node performance. Actions such as refusing to sign, forwarding delays, or failing to submit HTLC secrets are penalized. Nodes whose reputation scores fall below a defined threshold are demoted, stripped of execution privileges, and forfeit their staked collateral. In the event of partial trust failure, the system ensures continuity through leader re-election or transaction rollback. All critical operations require co-signatures from at least 2/3 of the shard's Trustors, providing resilience against Byzantine behavior. We assume a majority of Trustors are honest in each round and that HTLC secrets are submitted within a bounded timeframe. Otherwise, contracts automatically roll back to preserve state consistency.

These assumptions are realistic in practice: Trustors must stake collateral and earn reputation over time through consistent, verifiable actions, making large-scale compromises both economically prohibitive and statistically improbable. The HTLC mechanism incorporates explicit timeouts and fallback logic, including automated rollback, which enables the system to function correctly without relying on perfect synchrony.

Execution rights are assigned dynamically based on reputation: high-reputation nodes are prioritized, while low-reputation nodes are sidelined to prevent abuse of authority. Even in cases of collusion, the co-signature requirement significantly raises the threshold for successful misconduct. Collectively, these mechanisms mitigate the risks associated with partial Trustor failures and HTLC disruptions. Rather than relying on idealized assumptions, the system maintains robustness through built-in safeguards such as incentive alignment, role rotation, and protocol-level fallback strategies.

7. Experimental evaluation

7.1. Setting

We developed a prototype of V-Bridge in Golang and evaluated its performance on Ubuntu 20.04.1. The testbed was configured with an 8-core AMD Ryzen 6000 processor, 16 GB LPDDR5-6400 memory, and a 1TB PCIe 4.0 SSD. To emulate real-world conditions, we introduced random network latency between 50-100 ms and limited the bandwidth to 500 Mbps. Each block accommodates up to 3000 transactions, with each transaction fixed at 512 bytes. The number of C-Shards was set as S ∈ {2, 4, 8, 16, 32, 64, 100} to evaluate system scalability.

The dataset was extracted from the Ethereum blockchain using Python scripts from the XBlock-ETH project, including sender/receiver addresses, amounts, and timestamps. The preprocessed data was used as input for V-Bridge.

In dynamic load regulation, we assigned weight factors α = 0.4 and β = 0.6 to user count and transaction volume. The GINI coefficient was used to evaluate shard imbalance. A split is triggered when GINI > 0.5, and merging is triggered when the condition l_j < θ = 0.3μ is met, where μ is the average load. To reduce over-sensitivity, the adjustment factor was set to γ = 1.5. In the CSOCPPA module, all transaction edges and node weights were set to 1 to simplify computation, representing a uniform transaction cost.

For comparison, we implemented two additional load-balancing schemes: BrokerChain and X-Shard. BrokerChain utilizes a partitioning algorithm based on user account relationships, grouping frequently interacting accounts within the same shard to reduce cross-shard transactions and enhance throughput. X-Shard employs an optimistic cross-shard transaction strategy, processing transactions in parallel on input shards and verifying them via gate accounts to minimize delays. All solutions were tested under identical conditions using the PBFT [33] intra-shard consensus protocol.

7.2. System throughput

Fig. 4 illustrates how throughput varies with the number of shards (S) using a combination of boxplots, kernel density estimation curves, and scatter plots. The boxplot shows the quartile throughput range, with white dots marking the median. The smooth kernel density curve highlights data concentration, while the scatter plot visually represents individual data points. As S increases from 4 to 100, V-Bridge consistently demonstrates superior throughput, with values stabilizing around 3k to 3.5k TPS across different shard counts. This is evident in the tight interquartile ranges and smooth density curves. BrokerChain's throughput remains relatively stable but lower, hovering around 2k to 2.5k TPS, while X-Shard's performance lags behind, stabilizing just above 2k TPS. Moreover, X-Shard exhibits more variability, with wider boxplots and more scattered data points, highlighting its lower reliability in comparison to V-Bridge. These trends highlight V-Bridge's scalability and stability as the number of shards increases.

Fig. 5 examines throughput under varying transaction arrival rates (40-180 TX/s). Similar to Fig. 4, the boxplot depicts throughput distribution, while the kernel density curve and scatter plot highlight data concentration and individual variations. V-Bridge consistently achieves nearly 3k TPS across all arrival rates with minimal fluctuations, confirming its robust performance. BrokerChain maintains steady throughput around 2k TPS, though fluctuations slightly increase at higher arrival rates. X-Shard again underperforms, with throughput consistently below 2k TPS and significant variability at high arrival rates. These results reinforce the performance and stability advantages of V-Bridge under heavy transaction loads, while BrokerChain and X-Shard struggle to adapt. The detailed visualizations further validate the reliability of the data, providing a strong foundation for comparing system performance.
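The committee-failure bound of Section 6.1 (Eqs. (11)-(12)) is easy to evaluate numerically. The sketch below is an illustrative reimplementation of those two formulas, with the hypergeometric tail starting at ⌊n/3⌋ as in Eq. (11); the parameter values in the usage note are hypothetical.

```python
from math import comb, floor

def p_shard_fail(N, f, n):
    # Eq. (11): probability that an n-node committee drawn from N nodes,
    # t = floor(f*N) of them malicious, contains at least floor(n/3)
    # malicious members (a hypergeometric tail; comb(t, x) is 0 when x > t).
    t = floor(f * N)
    return sum(comb(t, x) * comb(N - t, n - x)
               for x in range(n // 3, n + 1)) / comb(N, n)

def atomicity_failure_bound(N, f, n, s_txn):
    # Eq. (12): union bound over the s_txn shards a transaction touches
    return s_txn * p_shard_fail(N, f, n)
```

For example, with N = 100, f = 0.3, and n = 9, `p_shard_fail` gives the chance that 3 or more of the 9 sampled committee members are malicious, and `atomicity_failure_bound` scales it by the number of shards the transaction spans.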
Fig. 4. The impact of the number of shards on throughput.
Fig. 5. Impact of transaction arrival rate on throughput.
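The load metric, GINI measure, and split/merge triggers of Section 5.2 (Eqs. (5)-(10)), with the experimental settings α = 0.4, β = 0.6, γ = 1.5, and θ = 0.3μ from Section 7.1, can be sketched as follows. This is an illustrative reimplementation, not the authors' code.

```python
import math

def shard_load(users, tx_volume, alpha=0.4, beta=0.6):
    # Eqs. (5)-(7): comprehensive load l_j = alpha * |U_j| + beta * sum of tx volumes
    return alpha * users + beta * tx_volume

def gini(loads):
    # Eq. (8): G = sum_{j,k} |l_j - l_k| / (2 m * sum_j l_j)
    m, total = len(loads), sum(loads)
    if total == 0:
        return 0.0
    diff = sum(abs(a - b) for a in loads for b in loads)
    return diff / (2 * m * total)

def adjust(loads, gamma=1.5, theta_ratio=0.3):
    # Eq. (9): split shards whose load exceeds mu + gamma * sigma
    m = len(loads)
    mu = sum(loads) / m
    sigma = math.sqrt(sum((l - mu) ** 2 for l in loads) / m)
    split = [j for j, l in enumerate(loads) if l > mu + gamma * sigma]
    # Eq. (10): merge shards whose load stays below theta (= 0.3 * mu here)
    merge = [j for j, l in enumerate(loads) if l < theta_ratio * mu]
    return split, merge
```

Identical loads give G = 0, and a single overloaded shard (e.g., loads [1, 1, 1, 10]) is flagged for splitting by Eq. (9) while no shard falls below the merge threshold.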
7.3. Transaction processing delays

Fig. 6 depicts the transaction delay under different protocols as a result of factors such as the number of shards, transaction arrival rate, and cross-shard ratio. From the observed trends, these experimental results reveal key factors affecting delay and highlight the performance differences of various protocols in distributed environments.

(1) Impact of the number of shards (Fig. 6(a))
As the number of shards increases, the average transaction delay of all systems shows a gradual downward trend. This indicates that more shards can effectively distribute transaction processing workloads and improve system performance. V-Bridge demonstrates significant optimization at higher shard counts (e.g., 32 shards) with delays reduced to around 1400 ms, outperforming BrokerChain and X-Shard. This reflects V-Bridge's superior cross-shard processing efficiency. By contrast, BrokerChain and X-Shard show only slight delay reductions
Fig. 6. Comparison of transaction delays.
Fig. 7. Consensus and load comparison.
with more shards, and the overall delay remains higher than V-Bridge's, with the gap further widening at 32 shards.

(2) Impact of transaction arrival rate (Fig. 6(b))
Increasing the transaction arrival rate leads to an apparent rise in transaction delay. All protocols exhibit relatively low delays at lower arrival rates (40 TX/s). However, as the arrival rate increases, delays grow significantly. V-Bridge maintains stable performance even at higher arrival rates (e.g., 180 TX/s), showing good scalability. In contrast, BrokerChain and X-Shard experience rapidly increasing delays at higher arrival rates (e.g., above 120 TX/s), indicating their limited capacity to handle higher loads and reflecting their performance bottlenecks in high-load environments.

(3) Impact of cross-shard ratio (Fig. 6(c))
At a lower cross-shard ratio (20%), delays across all protocols are relatively close. However, as the cross-shard ratio increases, inter-protocol differences become apparent. V-Bridge maintains stable performance even at an 80% cross-shard ratio, with an average delay of approximately 1400 ms, demonstrating excellent cross-shard processing capabilities. By contrast, BrokerChain and X-Shard experience significant delay increases, particularly when the cross-shard ratio exceeds 80%. Their delays surpass those of V-Bridge, highlighting the inefficiency of their cross-shard communication and the more significant impact of high cross-shard traffic.

7.4. Consensus and load comparison

Consensus success rate is a critical indicator in distributed systems that measures the proportion of successfully completed consensus processes within a given time. It reflects the system's ability to synchronize and process transactions efficiently while maintaining data consistency. A higher success rate directly correlates with better system performance and reliability. Conversely, a lower success rate can lead to transaction failures, increased delays, or even system reconfiguration. Optimizing consensus protocols enhances the system's stability, performance, and cross-shard transaction processing capabilities.
Fig. 7 illustrates the consensus success rate and transaction efficiency of three protocols (V-Bridge, BrokerChain, and X-Shard) under varying conditions, such as the number of shards, transaction arrival rates, and cross-shard ratios. V-Bridge consistently outperforms the other protocols across all conditions. In Fig. 7(a), as the number of shards increases, the consensus success rate and transaction efficiency for all protocols decline due to higher cross-shard communication overhead. However, V-Bridge maintains a high success rate (close to 95%) even with 32 shards, demonstrating excellent scalability. In contrast, BrokerChain and X-Shard experience significant performance degradation as the number of shards increases. Fig. 7(b) examines the impact of transaction arrival rates. V-Bridge maintains stable consensus success rates and efficiency even at high arrival rates (e.g., 180 TX/s). In contrast, BrokerChain and X-Shard exhibit significant declines in performance under heavy transaction loads, reflecting their processing limitations. Fig. 7(c) highlights the impact of cross-shard ratios. V-Bridge demonstrates strong performance, maintaining a stable success rate even at an 80% cross-shard ratio. Meanwhile, the success rates of BrokerChain and X-Shard drop sharply under high cross-shard ratios, revealing their weaknesses in handling intensive cross-shard scenarios.

Finally, the comparison of load balancing across different shard numbers shows that V-Bridge achieves the most balanced load distribution. Its variance remains minimal as the number of shards increases, approaching the Optimal Load Balance standard. In contrast, the load variance for BrokerChain and X-Shard is significantly higher, indicating poorer load balancing capabilities.

8. Conclusion and future work

We propose a novel virtual off-chain cross-shard transaction mechanism that employs logical fund interactions instead of actual currency transfers. This approach eliminates delays caused by continuous uploading and significantly enhances throughput. By integrating an intelligent sharding adjustment mechanism with the CSOCPPA, we address the limitations of traditional account optimization and mitigate load imbalance. Experimental results show that, compared to BrokerChain and X-Shard, V-Bridge achieves up to 50% higher average throughput and reduces transaction latency by at least 15%. Additionally, its consensus success rate consistently exceeds 90%. Across varying shard counts, V-Bridge demonstrates a progressively decreasing load, which remains consistently lower than those of the other two protocols. These results underscore V-Bridge's superior performance, scalability, and reliability as a solution for cross-shard transactions.

In future work, we plan to establish virtual fund channels among multiple Trustors to achieve interoperability across the entire shard network. Additionally, we will explore advanced optimization strategies for dynamic shard management to further reduce communication overhead. Meanwhile, we aim to incorporate zero-knowledge proofs and other cryptographic techniques into security management to enhance system security.

CRediT authorship contribution statement

Xueting Huang: Writing - original draft, Software, Methodology, Conceptualization. Xiangwei Meng: Writing - review & editing, Validation, Supervision. Kai Zhang: Formal analysis, Conceptualization. Ce Yang: Methodology, Formal analysis. Wei Liang: Writing - review & editing, Funding acquisition, Formal analysis, Conceptualization. Kuan-Ching Li: Writing - review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the Key Program of the Joint Funds of the National Natural Science Foundation of China under Grant U2468205, the National Natural Science Foundation of China under Grants 62472168, 62072170 and 61976087, the Hunan Provincial Natural Science Foundation of China under Grants 2021JJ30141 and 2024JJ6066, the Key Research and Development Program of Hunan Province under Grant 2022GK2015, the Science and Technology Project of the Department of Communications of Hunan Province under Grant 202101, and the Research Projects of the Hunan Provincial Department of Education under Grants 23B0449 and 23B0288.

Data availability

Data will be made available on request.

References

[1] J. Xu, C. Wang, X. Jia, A survey of blockchain consensus protocols, ACM Comput. Surv. 55 (13s) (2023) 1-35.
[2] S. Zhang, Z. Yan, W. Liang, K.-C. Li, B. Di Martino, BCAE: A blockchain-based cross domain authentication scheme for edge computing, IEEE Internet Things J. 11 (13) (2024) 24035-24048.
[3] W. Liang, Y. Yang, C. Yang, Y. Hu, S. Xie, K.-C. Li, J. Cao, PDPChain: A consortium blockchain-based privacy protection scheme for personal data, IEEE Trans. Reliab. 72 (2) (2022) 586-598.
[4] W. Liang, S. Xie, K.-C. Li, X. Li, X. Kui, A.Y. Zomaya, MC-DSC: A dynamic secure resource configuration scheme based on medical consortium blockchain, IEEE Trans. Inf. Forensics Secur. 19 (2024) 3525-3538.
[5] J. Cai, W. Liang, X. Li, K. Li, Z. Gui, M.K. Khan, GTxChain: A secure IoT smart blockchain architecture based on graph neural network, IEEE Internet Things J. 10 (24) (2023) 21502-21514.
[6] M.M. Islam, M.K. Islam, M. Shahjalal, M.Z. Chowdhury, Y.M. Jang, A low-cost cross-border payment system based on auditable cryptocurrency with consortium blockchain: Joint digital currency, IEEE Trans. Serv. Comput. 16 (3) (2022) 1616-1629.
[7] Y. Lu, The blockchain: State-of-the-art and research challenges, J. Ind. Inf. Integr. 15 (2019) 80-90.
[8] H. Jin, J. Xiao, Towards trustworthy blockchain systems in the era of internet of value: development, challenges, and future trends, Sci. China Inf. Sci. 65 (153101) (2022) 1-11.
[9] X. Meng, W. Liang, Z. Xu, K. Li, M.K. Khan, X. Kui, An anonymous authenticated group key agreement scheme for transfer learning edge services systems, ACM Trans. Sen. Netw. 20 (2024).
[10] T. Chen, Z. Li, Y. Zhu, J. Chen, X. Luo, J.C.-S. Lui, X. Lin, X. Zhang, Understanding ethereum via graph analysis, ACM Trans. Internet Technol. (TOIT) 20 (2) (2020) 1-32.
[11] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, P. Saxena, A secure sharding protocol for open blockchains, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM SIGSAC, 2016, pp. 17-30.
[12] Z. Hong, S. Guo, P. Li, Scaling blockchain via layered sharding, IEEE J. Sel. Areas Commun. 40 (12) (2022) 3575-3588.
[13] F. Cheng, J. Xiao, C. Liu, S. Zhang, Y. Zhou, B. Li, B. Li, H. Jin, Shardag: Scaling dag-based blockchains via adaptive sharding, in: 2024 IEEE 40th International Conference on Data Engineering, ICDE, IEEE, 2024, pp. 2068-2081.
[14] P. Zheng, Q. Xu, Z. Zheng, Z. Zhou, Y. Yan, H. Zhang, Meepo: Multiple execution environments per organization in sharded consortium blockchain, IEEE J. Sel. Areas Commun. 40 (12) (2022) 3562-3574.
[15] J. Wang, H. Wang, Monoxide: Scale out blockchains with asynchronous consensus zones, in: 16th USENIX Symposium on Networked Systems Design and Implementation, NSDI '19, USENIX Association, 2019, pp. 95-112.
[16] H. Huang, X. Peng, J. Zhan, S. Zhang, Y. Lin, Z. Zheng, S. Guo, Brokerchain: A cross-shard blockchain protocol for account/balance-based state sharding, in: IEEE INFOCOM 2022-IEEE Conference on Computer Communications, IEEE, 2022, pp. 1968-1977.
[17] G. Karypis, V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput. 20 (1) (1998) 359-392.
[18] J. Xu, Y. Ming, Z. Wu, C. Wang, X. Jia, X-Shard: Optimistic cross-shard transaction processing for sharding-based blockchains, IEEE Trans. Parallel Distrib. Syst. 35 (4) (2024) 548-559.
[19] Y. Levi, I. Keslassy, Beyond the ring: Quantized heterogeneous consistent hashing, in: 2023 IEEE 31st International Conference on Network Protocols, ICNP, IEEE, 2023, pp. 1-12.
[20] G. Mendelson, S. Vargaftik, K. Barabash, D.H. Lorenz, I. Keslassy, A. Orda, Anchorhash: A scalable consistent hash, IEEE/ACM Trans. Netw. 29 (2) (2020) 517-528.
[21] B. Hou, D. Wang, T. Xia, L. Xi, Z. Peng, K.-L. Tsui, Generalized Gini indices: Complementary sparsity measures to Box-Cox sparsity measures for machine condition monitoring, Mech. Syst. Signal Process. 169 (2022) 108751.
[22] X. Qi, Y. Li, LightCross: Sharding with lightweight cross-shard execution for smart contracts, in: IEEE INFOCOM 2024-IEEE Conference on Computer Communications, IEEE, 2024, pp. 1681-1690.
[23] S. Jiang, J. Cao, C.L. Tung, Y. Wang, S. Wang, SHARON: Secure and efficient cross-shard transaction processing via shard rotation, in: Proceedings of the IEEE International Conference on Computer Communications, INFOCOM, IEEE, 2024, pp. 20-23.
[24] Y. Zhang, S. Pan, J. Yu, Txallo: Dynamic transaction allocation in sharded blockchain systems, in: 2023 IEEE 39th International Conference on Data Engineering, ICDE, IEEE, 2023, pp. 721-733.
[25] E. Kokoris-Kogias, P. Jovanovic, L. Gasser, N. Gailly, E. Syta, B. Ford, Omniledger: A secure, scale-out, decentralized ledger via sharding, in: 2018 IEEE Symposium on Security and Privacy, SP, IEEE, 2018, pp. 583-598.
[26] M. Zamani, M. Movahedi, M. Raykova, Rapidchain: Scaling blockchain via full sharding, in: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ACM SIGSAC, 2018, pp. 931-948.
[27] Z. Hong, S. Guo, P. Li, W. Chen, Pyramid: A layered sharding blockchain system, in: IEEE INFOCOM 2021-IEEE Conference on Computer Communications, IEEE, 2021, pp. 1-10.
[28] A. Liu, Y. Liu, Q. Wu, B. Zhao, D. Li, Y. Lu, R. Lu, W. Susilo, CHERUBIM: A secure and highly parallel cross-shard consensus using quadruple pipelined two-phase commit for sharding blockchains, IEEE Trans. Inf. Forensics Secur. 19 (2024) 3178-3193.
[29] X. Wang, C. Lin, X. Huang, D. He, Anonymity-enhancing multi-hop locks for monero-enabled payment channel networks, IEEE Trans. Inf. Forensics Secur. 19 (2023) 2438-2453.
[30] T. Cai, W. Chen, K.E. Psannis, S.K. Goudos, Y. Yu, Z. Zheng, S. Wan, Scalable on-chain and off-chain blockchain for sharing economy in large-scale wireless networks, IEEE Wirel. Commun. 29 (3) (2022) 32-38.
[31] X. Jia, Z. Yu, J. Shao, R. Lu, G. Wei, Z. Liu, Cross-chain virtual payment channels, IEEE Trans. Inf. Forensics Secur. 18 (2023) 3401-3413.
[32] Z. Li, W. Su, M. Xu, R. Yu, D. Niyato, S. Xie, Compact learning model for dynamic off-chain routing in blockchain-based IoT, IEEE J. Sel. Areas Commun. 40 (12) (2022) 3615-3630.
[33] W. Li, C. Feng, L. Zhang, H. Xu, B. Cao, M.A. Imran, A scalable multi-layer PBFT consensus for blockchain, IEEE Trans. Parallel Distrib. Syst. 32 (5) (2020) 1146-1160.
[34] H. Azimy, A.A. Ghorbani, E. Bagheri, Preventing proof-of-work mining attacks, Inform. Sci. 608 (2022) 1503-1523.
[35] A. Hosoyamada, Y. Sasaki, Quantum collision attacks on reduced SHA-256 and SHA-512, in: Annual International Cryptology Conference, Springer, 2021, pp. 616-646.
[36] C. Boyd, K. Gjøsteen, S. Wu, A blockchain model in tamarin and formal analysis of hash time lock contract, in: 2nd Workshop on Formal Methods for Blockchains, FMBC 2020, Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2020, p. 13.
[37] Y. Liu, W. Liang, K. Xie, S. Xie, K. Li, W. Meng, LightPay: A lightweight and secure off-chain multi-path payment scheme based on adapter signatures, IEEE Trans. Serv. Comput. 17 (4) (2023) 1503-1523.
[38] J. Herrmann, J. Kho, B. Uçar, K. Kaya, Ü.V. Çatalyürek, Acyclic partitioning of large directed acyclic graphs, in: 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID, IEEE, 2017, pp. 371-380.
papers_txt/lwe-problem.txt Normal file
@@ -0,0 +1,420 @@
CS 294. The Learning with Errors Problem:
Introduction and Basic Cryptography
The learning with errors (LWE) problem was introduced in its current form in a seminal work of Oded Regev, for which he won the Gödel prize in 2018. In its typical form, the LWE problem asks to solve a system of noisy linear equations. That is, it asks to find s ∈ Z_q^n given

    { (a_i, ⟨a_i, s⟩ + e_i) }_{i=1}^{m}  where  s ← Z_q^n, a_i ← Z_q^n, e_i ← χ        (1)

where:
• Z_q = Z/qZ denotes the finite ring of integers modulo q, and Z_q^n denotes the vector space of dimension n over Z_q;
• χ is a probability distribution over Z which typically outputs "small" numbers, an example being the uniform distribution over an interval [−B, . . . , B] where B ≪ q/2; and
• a ← D denotes that a is chosen according to the finite probability distribution D, and a ← S denotes that a is chosen uniformly at random from the (finite) set S.
In this first lecture, we will present various perspectives on the LWE (and the closely related “short
integer solutions” or SIS) problem, basic theorems regarding the different variants of these problems
and their basic cryptographic applications.
We will shortly derive LWE in a different way, “from first principles”, starting from a different
view, that of finding special solutions to systems of linear equations.
1 Solving Systems of Linear Equations
Consider the problem of solving a system of linear equations
Ae = b mod q        (2)
given A ∈ Z_q^{n×m} and b ∈ Z_q^n. This can be accomplished in polynomial time with Gaussian
elimination. However, slight variations of this problem become hard for Gaussian elimination and
indeed, we believe, for all polynomial-time algorithms. This course is concerned with two such
problems, very related to each other, called the SIS problem and the LWE problem.
1.1 The "Total" Regime and SIS
Assume that we now ask for solutions to equation (2) where e lies in some subset S ⊆ Z_q^m.
Typically we will think of subsets S that are defined geometrically, for example:
• S = {0, 1}^m, which is the classical subset sum problem modulo q. More generally, S =
[-B, ..., B]^m is the set of all solutions where each coordinate can only take a bounded value
(absolute value bounded by some number B ≪ q/2). This will be the primary setting of
interest.
• S = Ball_2(R), the Euclidean ball of (small) radius R.
In all cases, we are asking for short solutions to systems of linear equations and hence this is called
the SIS (short integer solutions) problem.
The SIS problem SIS(n, m, q, B) as we will study is parameterized by the number of variables
m, the number of equations n, the ambient finite field Zq , and the bound on the absolute value of
the solutions B. Namely, we require that each coordinate e_i ∈ [-B, -B+1, ..., B-1, B].
To define an average-case problem, we need to specify the probability distributions for A and
b. We will, for the most part of this course, take A to be uniformly random in Z_q^{n×m}. There are
two distinct ways to define b. The first is in the "total" regime, where we simply choose b from the
uniform distribution over Z_q^n.
What does “total” mean? Total problems in NP are ones for which each problem instance has
a solution that can be verified given a witness, but the solution may be hard to find. An example
is the factoring problem where you are given a positive integer N and you are asked for its prime
factorization. A non-example is the 3-coloring problem where you are given a graph G and you
are asked for a 3-coloring; although this problem is in NP, it is not total as not every graph is
3-colorable.
Totality of SIS on the Average. Here, using a simple probabilistic argument, one can show
that (B-bounded) solutions are very likely to exist if (2B + 1)^m ≫ q^n, or m = Ω(n log q / log B). We call
this regime of parameters the total regime or the SIS regime. Thus, roughly speaking, in the SIS
regime, m is large enough that we are guaranteed solutions (even exponentially many of them)
when A and b are chosen to be uniformly random. The problem then is to actually find a solution.
A Variant: Homogeneous SIS. The homogeneous version of SIS asks for a non-zero solution to
equation (2) with the right-hand side being 0, that is, Ae = 0 (mod q). This variant is worst-case
total as long as (B + 1)^m > q^n; that is, every instance A is guaranteed to have a solution. We
leave the proof to the reader (Hint: pigeonhole). SIS and hSIS are equivalent on the average; we
again leave the simple proof to the reader.
1.2 The Planted Regime and LWE
When m ≪ n log q / log B, one can show again that there are likely to be no B-bounded solutions for a
uniformly random b, and thus we have to find a different, sensible way to state this problem. To
do this, we first pick a B-bounded vector e and compute b as Ae mod q. In a sense, we plant the
solution e inside b. The goal now is to recover e (which is very likely to be unique) given A and
b. We call this the planted regime or the LWE regime.
But why is this LWE when it looks so different from Equation 1?
This is because the SIS problem in the planted regime is simply LWE in disguise. For, given
an LWE instance (A, y^T = s^T A + e^T), let A^⊥ ∈ Z_q^{(m-n)×m} be a full-rank set of vectors in the
right-kernel of A. That is,
    A^⊥ · A^t = 0 mod q
Then,
    b := A^⊥ · y = A^⊥ · (A^t s + e) = A^⊥ · e mod q
so (A^⊥, b) is an SIS instance SIS(m - n, m, q, B) whose solution is the LWE error vector. Further-
more, this is in the planted regime, since one can show with an easy probabilistic argument that
the LWE error vector e is unique given (A, y).
The reader should also notice that we can run the reduction in reverse, creating an LWE
instance from a SIS instance. If the SIS instance is in the planted regime, this (reverse) reduction
will produce an LWE instance.
In summary, the only difference between the SIS and the LWE problems is whether they live
in the total world or the planted world, respectively. But the world you live in may make a
big difference. Algorithmically, so far, we don't see a difference. In cryptography, SIS gives us
applications in “minicrypt” (such as one-way functions) whereas we need LWE for applications in
“cryptomania” and beyond (such as public-key encryption and fully homomorphic encryption).
Decision vs. Search for LWE. In the decisional version of LWE, the problem is to distinguish
between (A, yT := sT A + eT mod q) and a uniformly random distribution. One can show, through
a reduction that runs in poly(q) time, that the two problems are equivalent. The interesting
direction is to show that if there is a poly-time algorithm that solves the decision-LWE problem
for a uniformly random matrix A, then there is a poly-time algorithm that solves the search-LWE
problem for a (possibly different and possibly larger) uniformly random matrix A′. We will see a
search-to-decision reduction later in class.
1.3 Reductions Between SIS and LWE
SIS is at least as hard as LWE. We wish to show that if you have a solution for SIS w.r.t.
A, then it is immediate to solve decision-LWE w.r.t. A. Indeed, given a (homogeneous) SIS solution
e such that Ae = 0 (mod q), and a vector b^T, compute b^T e (mod q). If b is an LWE instance, then
    b^T e = (s^T A + x^T) e = x^T e (mod q)
which is a "small" number (as long as x^T is small enough). On the other hand, if b is random,
then this quantity is uniformly random mod q (in particular, with non-negligible probability, not
small). This gives us a distinguisher.
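This distinguishing statistic is easy to simulate. The sketch below plants the short kernel vector (finding one is exactly the hard SIS problem) and uses hypothetical toy parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q, B = 8, 40, 10007, 2

# Plant a short kernel vector e (a homogeneous SIS solution): A e = 0 mod q.
e = rng.integers(-B, B + 1, size=m)
e[-1] = 1
A = rng.integers(0, q, size=(n, m))
A[:, -1] = (-A[:, :-1] @ e[:-1]) % q        # force the last column so A e = 0

def centered(v, q):
    """Representative of v mod q in (-q/2, q/2]."""
    return (v + q // 2) % q - q // 2

# LWE vector: b^T = s^T A + x^T with small x, so b^T e = x^T e is small.
s = rng.integers(0, q, size=n)
x = rng.integers(-B, B + 1, size=m)
b_lwe = (s @ A + x) % q
b_uni = rng.integers(0, q, size=m)          # uniformly random vector

small = abs(centered(int(b_lwe @ e) % q, q))   # bounded by m * B^2 = 160
large = abs(centered(int(b_uni @ e) % q, q))   # typically on the order of q
```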
LWE is (quantumly) at least as hard as SIS. This turns out to be true, as we will see later
in the course.
1.4 SIS, LWE and Lattice Problems
SIS and LWE are closely related to lattices and lattice problems. We will have much to say about
this connection, in later lectures.
2 Basic Theorems
We start with some basic structural theorems on LWE and SIS.
2.1 Normal Form SIS and Short-Secret LWE
The normal form for SIS is where the matrix A is systematic, that is, of the form A = [A_0 || I] where
A_0 ∈ Z_q^{n×(m-n)}.
Lemma 1. Normal-form SIS is as hard as SIS.
Proof. To reduce from normal-form SIS to SIS, simply multiply the input to normal-form SIS
(nfSIS), denoted [A_0 || I], on the left by a random matrix B ← Z_q^{n×n}. We leave it to the reader
to verify that the resulting matrix A := B[A_0 || I] is uniformly random. Furthermore, a
solution to SIS on input (A, B b_0) gives us a solution to nfSIS on input (A_0, b_0).
In the other direction, to reduce from SIS to normal-form SIS, write A as [A_0 || B] and gener-
ate [B^{-1} A_0 || I] as the normal-form SIS instance. Again, a solution to the normal-form instance
(B^{-1} A_0, B^{-1} b) gives us a solution to SIS on input (A, b).
The corresponding version of LWE is called short-secret LWE where both the entries of s and
that of e are chosen from the error distribution χ. The proof of the following lemma follows along
the lines of that for normal form SIS and is left as an exercise. (Indeed, a careful reader will observe
that short-secret LWE is nothing but normal-form SIS in disguise.)
Lemma 2. There is a polynomial-time reduction from ssLWE(n, m, q, χ) to LWE(n, m, q, χ) and
one from LWE(n, m, q, χ) to ssLWE(n, m + n, q, χ).
We will continue to see more structural theorems about LWE through the course, but this
suffices for now.
3 Basic Cryptographic Applications
3.1 Collision-Resistant Hashing
A collision resistant hashing scheme H consists of an ensemble of hash functions {Hn }n∈N where
each Hn consists of a collection of functions that map n bits to m < n bits. So, each hash function
compresses its input and, by the pigeonhole principle, it has collisions: that is, inputs x ≠ y such that
h(x) = h(y). Collision-resistance requires that every p.p.t. adversary who gets a hash function
h ← H_n chosen at random fails to find a collision, except with negligible probability.
Collision-Resistant Hashing from SIS. Here is a hash family H_n that is secure under SIS(n, m, q, B),
and which compresses whenever m log(B + 1) > n log q. Each hash function h_A is parameterized by a matrix A ∈ Z_q^{n×m},
takes as input e ∈ [0, ..., B]^m, and outputs
    h_A(e) = Ae mod q
A collision gives us e, e′ ∈ [0, ..., B]^m with Ae = Ae′ mod q, which in turn says that A(e - e′) =
0 mod q. Since each entry of e - e′ is in [-B, ..., B], this gives us a solution to SIS(n, m, q, B).
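At toy parameters a collision can even be found by brute force, which makes the collision-implies-SIS argument concrete. A minimal sketch with hypothetical, intentionally insecure sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, q, B = 4, 64, 17, 1    # compression: m*log2(B+1) = 64 > n*log2(q) ~ 16.3

A = rng.integers(0, q, size=(n, m))

def h(A, e):
    """h_A(e) = A e mod q, for e in [0..B]^m."""
    return tuple(int(v) for v in A @ e % q)

# Birthday search: hash random inputs until two collide (feasible at toy size).
seen = {}
while True:
    e = rng.integers(0, B + 1, size=m)
    key = h(A, e)
    if key in seen and not np.array_equal(seen[key], e):
        e2 = seen[key]
        break
    seen[key] = e

d = e - e2    # SIS solution: A d = 0 mod q, entries in [-B, B], d nonzero
```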
3.2 Private-Key Encryption
A private-key encryption scheme has three algorithms: a probabilistic key generation Gen which,
on input a security parameter λ, generates a private key sk; a probabilistic encryption algorithm
Enc which, on input sk and a message m chosen from a message space M, generates a ciphertext
c; and a deterministic decryption algorithm Dec which, on input sk and the ciphertext c, outputs
a message m′.
Correctness requires that for every sk generated by Gen and every m ∈ M,
Dec(sk, Enc(sk, m)) = m
The notion of security for private-key encryption is semantic security or equivalently, CPA-security,
as defined in the Pass-Shelat lecture notes (see References at the end of the notes.) In a nutshell,
this says that no probabilistic polynomial time (p.p.t.) adversary which gets oracle access to either
the Left oracle or the Right oracle can distinguish between the two. Here, the Left (resp. the Right)
oracle take as input a pair of messages (mL , mR ) ∈ M2 and outputs an encryption of mL (resp.
mR ).
Private-Key Encryption from LWE.
• Gen(1^λ): Compute n = n(λ), q = q(λ) and χ = χ(λ) in a way we will describe later in this
lecture. Let the private key sk be a uniformly random vector
    sk := s ← Z_q^n.
• Enc(sk, m): We will work with the message space M := {0, 1}. Larger message spaces can
be handled by encrypting each bit of the message independently. The ciphertext is
    c := (a, b) := (a, s^T a + e + m⌊q/2⌉ mod q)
where a ← Z_q^n and e ← χ is chosen from the LWE error distribution.
• Dec(sk, c = (a, b)): Output 0 if
    |b - s^T a mod q| < q/4
and 1 otherwise.
Lemma 3. The scheme above is correct if the support of the error distribution satisfies Supp(χ) ⊆ (-q/4, q/4),
and CPA-secure under the LWE assumption LWE(n, m = poly(n), q, χ).
Correctness and security are immediate and left as an exercise to the reader.
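The three algorithms can be sketched directly. This is a toy Python instance (hypothetical parameters chosen only to satisfy the correctness constraint of Lemma 3, with a bounded-uniform χ; not a secure instantiation):

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, B = 16, 2**13, 4       # Supp(chi) = [-B, B] lies inside (-q/4, q/4)

def gen():
    """sk = s, uniform in Z_q^n."""
    return rng.integers(0, q, size=n)

def enc(s, m):
    """Encrypt a bit m: (a, s^T a + e + m*floor(q/2) mod q)."""
    a = rng.integers(0, q, size=n)
    e = int(rng.integers(-B, B + 1))
    b = (int(s @ a) + e + m * (q // 2)) % q
    return a, b

def dec(s, ct):
    """Output 0 iff b - s^T a mod q is within q/4 of zero."""
    a, b = ct
    d = (b - int(s @ a)) % q
    return 0 if min(d, q - d) < q // 4 else 1

s = gen()
ok = all(dec(s, enc(s, m)) == m for m in (0, 1) for _ in range(100))
```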
We left the issue of how to pick n, q and χ open, and indeed, they need to be chosen appropriately
for the scheme to be secure. Correctness and security give us constraints on these parameters
(see Lemma 3 above), but do not tell us how to completely specify them. To fully specify the
parameters, we need to ensure security against attackers “running in 2λ time” (this is the meaning
of the security parameter λ that we will use throughout this course) and to do that, we need to
evaluate the efficacy of various attacks on LWE which we will do (at least, asymptotically) in the
next lecture.
Open Problem 1.1. Construct a nice private-key encryption scheme from the hardness of SIS.
Note that SIS directly implies a one-way function. Together with the generic transformations
in cryptography from one-way functions to pseudorandom generators (Håstad-Impagliazzo-Levin-
Luby), from pseudorandom generators to pseudorandom functions (Goldreich-Goldwasser-Micali),
and from pseudorandom functions to private-key encryption (easy/folklore), this is possible. The
problem is to avoid the ugliness that results from using these general transformations.
3.3 Public-Key Encryption
A public-key encryption scheme is the same as private-key encryption except for two changes: first,
the key generation algorithm Gen outputs a public key pk as well as a private key sk; and second,
the encryption algorithm requires only the public key pk to encrypt. Security requires that a p.p.t.
adversary which is given pk (and thus can encrypt as many messages as it wants on its own) cannot
distinguish between an encryption of any two messages m0 , m1 ∈ M of its choice.
Public-Key Encryption from LWE (the LPR Scheme). There are many ways of doing this;
we will present the cleanest one, due to Lyubashevsky-Peikert-Regev.
• Gen(1^λ): Compute n = n(λ), q = q(λ) and χ = χ(λ) in a way we will describe later in this lec-
ture. The private key sk := s ← χ^n is a random vector chosen from the error distribution,
and the public key is
    pk := (A, y^T := s^T A + e^T) ∈ Z_q^{n×n} × Z_q^n
where A is a uniformly random n-by-n matrix and e ← χ^n is chosen from the error distribu-
tion.
• Enc(pk, m): We will work with the message space M := {0, 1} as above. The ciphertext is
    c := (a, b) := (Ar + x, y^T r + x′ + m⌊q/2⌉ mod q)
where r, x ← χ^n and x′ ← χ are chosen from the LWE error distribution.
• Dec(sk, c = (a, b)): Output 0 if
    |b - s^T a mod q| < q/4
and 1 otherwise.
Lemma 4. The scheme above is correct if Supp(χ) ⊆ (-√(q/(4(2n+1))), √(q/(4(2n+1)))), and CPA-
secure under the LWE assumption LWE(n, m = 2(n + 1), q, χ).
Proof. For correctness, note that the decryption algorithm computes
    b - s^T a mod q = m⌊q/2⌉ + (e^T r + x′ - s^T x)
whose error term, as long as Supp(χ) ⊆ (-√(q/(4(2n+1))), √(q/(4(2n+1)))), has absolute value at most
    (q/(4(2n+1))) · (2n + 1) = q/4.
For security, we proceed by the following sequence of hybrid experiments.
Hybrid 0.m. The adversary gets pk and Enc(pk, m), where m ∈ {0, 1}.
Hybrid 1.m. Feed the adversary a "fake" public key p̃k computed as
    p̃k := (A, y) ← Z_q^{n×n} × Z_q^n
and Enc(p̃k, m). This is indistinguishable from Hybrid 0 by the hardness of ssLWE(n, n, q, χ) and
therefore, by Lemma 2, LWE(n, 2n, q, χ).
Hybrid 2.m. Feed the adversary p̃k and a fake ciphertext Ẽnc(p̃k, m) computed as
    Ẽnc(p̃k, m) = (a, b′ + m⌊q/2⌉ mod q)
where a ← Z_q^n is uniformly random. This is indistinguishable from Hybrid 1 by ssLWE(n, n+1, q, χ),
or by Lemma 2, LWE(n, 2n+1, q, χ), since the entire ciphertext can easily be rewritten as
    [ A ; y^T ] r + [ x ; x′ ] + [ 0 ; m⌊q/2⌉ ]  mod q
which, since y is now uniformly random, is n + 1 ssLWE samples and can therefore be indistin-
guishably replaced by
    [ a ; b′ ] + [ 0 ; m⌊q/2⌉ ]  mod q
where a ← Z_q^n and b′ ← Z_q.
Hybrid 3.m. Feed the adversary uniformly random values from the appropriate domains. This
follows from the previous expression for the fake ciphertext (random + anything = random).
For every m ∈ M, Hybrid 0.m is computationally indistinguishable from Hybrid 3.m. Furthermore,
Hybrid 3 is completely independent of m. Therefore, Hybrids 0.0 and 0.1 are computationally
indistinguishable from each other, establishing semantic security, i.e., CPA-security.
There are many ways to improve the rate of this encryption scheme, that is, lower the ratio of
(#bits in ciphertext)/(#bits in plaintext) and indeed, even achieve a rate close to 1. We can also
use these techniques as building blocks to construct several other cryptographic systems such as
oblivious transfer protocols. This public-key encryption scheme has its origins in earlier works of
Ajtai and Dwork (1997) and Regev (2004).
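The LPR scheme can be sketched the same way. Again, toy hypothetical parameters, chosen so that (2n+1)·B² < q/4 as required by Lemma 4:

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, B = 16, 2**15, 2       # (2n+1)*B^2 = 132 < q/4 = 8192

def gen():
    s = rng.integers(-B, B + 1, size=n)        # short secret, s <- chi^n
    A = rng.integers(0, q, size=(n, n))
    e = rng.integers(-B, B + 1, size=n)
    y = (s @ A + e) % q                        # y^T = s^T A + e^T
    return (A, y), s

def enc(pk, m):
    A, y = pk
    r = rng.integers(-B, B + 1, size=n)
    x = rng.integers(-B, B + 1, size=n)
    x1 = int(rng.integers(-B, B + 1))
    a = (A @ r + x) % q
    b = (int(y @ r) + x1 + m * (q // 2)) % q
    return a, b

def dec(s, ct):
    a, b = ct
    d = (b - int(s @ a)) % q                   # = m*floor(q/2) + small error
    return 0 if min(d, q - d) < q // 4 else 1

pk, s = gen()
ok = all(dec(s, enc(pk, m)) == m for m in (0, 1) for _ in range(100))
```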
Public-Key Encryption from LWE (the Regev Scheme) We present a second public-key
encryption scheme due to Regev. We will only provide a sketch of the correctness and security
analysis and leave it as an exercise to the reader. We remark that the security proof relies on a
beautiful lemma called the “leftover hash lemma” (Impagliazzo, Levin and Luby 1990).
• Gen(1^λ): Compute n = n(λ), q = q(λ) and χ = χ(λ) in a way we will describe later in this
lecture. The private key sk := s ← Z_q^n is a random vector chosen uniformly from Z_q^n,
and the public key is
    pk := (A, y^T := s^T A + e^T) ∈ Z_q^{n×m} × Z_q^m
where A is a uniformly random n-by-m matrix and e ← χ^m is chosen from the error distri-
bution. Here m = Ω(n log q).
Note the difference from LPR, where the secret key had small entries. Note also that the
matrix A is somewhat larger than in LPR.
• Enc(pk, m): We will work with the message space M := {0, 1} as above. The ciphertext is
    c := (a, b) := (Ar, y^T r + x′ + m⌊q/2⌉ mod q)
where r ← {0, 1}^m and x′ ← χ is chosen from the LWE error distribution.
Note the difference from LPR, where the vector r was chosen from the error distribution and
the first component of the ciphertext had an additive error as well. Roughly speaking, in Regev's scheme
we will argue that the first component is statistically close to random, whereas in LPR we
argued that it is computationally close to random under the decisional LWE assumption.
• Dec(sk, c = (a, b)): Output 0 if
    |b - s^T a mod q| < q/4
and 1 otherwise.
Decryption recovers m⌊q/2⌉ plus an error e^T r + x′ whose absolute value should be smaller than q/4
for correctness. This holds as long as the support of the error distribution satisfies
Supp(χ) ⊆ (-q/(4(m+1)), q/(4(m+1))).
In the security proof, we first replace the public key with a uniformly random vector, relying on
the LWE assumption. Once this is done, we use the leftover hash lemma to argue that the ciphertext
is statistically close to random.
Public-Key Encryption from LWE (the dual Regev Scheme) We present yet another
public-key encryption scheme due to Gentry, Peikert and Vaikuntanathan called the “dual Regev”
scheme. The nice feature of this scheme, which will turn out to be important when we get to
identity-based encryption is that the distribution of the public key is really random. In other words,
any string could be a possible public key in the scheme.
• Gen(1^λ): Compute n = n(λ), q = q(λ) and χ = χ(λ) in a way we will describe later in this
lecture. The private key sk := r ← {0, 1}^m is a random vector chosen uniformly at
random with 0/1 entries, and the public key is
    pk := (A, a := Ar) ∈ Z_q^{n×m} × Z_q^n
where A is a uniformly random n-by-m matrix. Here m = Ω(n log q).
Note the difference from Regev: the private key here resembles the first component of a
Regev ciphertext. No wonder this is called "dual Regev".
• Enc(pk, m): We will work with the message space M := {0, 1} as above. The ciphertext is
    c := (y^T, b) := (s^T A + e^T, s^T a + x′ + m⌊q/2⌉ mod q)
where s ← Z_q^n, e ← χ^m, and x′ ← χ are chosen from the LWE error distribution.
• Dec(sk, c = (y^T, b)): Output 0 if
    |b - y^T r mod q| < q/4
and 1 otherwise.
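A corresponding toy sketch of dual Regev, with m = 8n standing in (as an assumption of this demo) for m = Ω(n log q):

```python
import numpy as np

rng = np.random.default_rng(5)
n, q, B = 16, 2**15, 2
m = 8 * n                     # toy stand-in for m = Omega(n log q)

def gen():
    r = rng.integers(0, 2, size=m)             # sk = r <- {0,1}^m
    A = rng.integers(0, q, size=(n, m))
    a = A @ r % q                              # pk second component: a = A r
    return (A, a), r

def enc(pk, msg):
    A, a = pk
    s = rng.integers(0, q, size=n)
    e = rng.integers(-B, B + 1, size=m)
    x1 = int(rng.integers(-B, B + 1))
    y = (s @ A + e) % q                        # y^T = s^T A + e^T
    b = (int(s @ a) + x1 + msg * (q // 2)) % q
    return y, b

def dec(r, ct):
    y, b = ct
    d = (b - int(y @ r)) % q                   # error x1 - e^T r, at most (m+1)*B
    return 0 if min(d, q - d) < q // 4 else 1

pk, r = gen()
ok = all(dec(r, enc(pk, msg)) == msg for msg in (0, 1) for _ in range(100))
```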
Open Problem 1.2. Construct a public-key encryption scheme from the hardness of LWE where
the support of the error distribution χ is large, namely [-cq, cq] for some constant c.
LWE with such large errors does imply a one-way function, and therefore, a private-key encryp-
tion scheme. The question therefore asks if there is a gap between the LWE parameters that gives
us public-key vs private-key encryption.
References
The primary reference for the cryptographic definitions in this lecture is lecture notes by Pass and
Shelat, available at this url.

papers_txt/opaque-2018.txt
papers_txt/owl-apake.txt
papers_txt/regev-lattice.txt
papers_txt/rfc9807.txt
(Vector) Oblivious Linear Evaluation: Basic Constructions and Applications
Peter Scholl
24 January 2022, Bar-Ilan Winter School
This talk:
• What is it? OLE, VOLE and variants
• What's it good for? Correlated randomness, oblivious PRF
• How do you build it? Homomorphic encryption, oblivious transfer; active security
• Conclusion
Oblivious linear evaluation (OLE)
Alice inputs x; Bob inputs (a, b). The OLE functionality gives Alice the output y = ax + b,
and Bob learns nothing.
OLE is secret-shared multiplication
If Alice inputs x and Bob inputs (a, b), then Alice's output y and Bob's value -b form additive
shares of the product: y - b = ax.
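The share relation y - b = ax can be seen by simulating the ideal functionality. A toy sketch over a prime field (in a real protocol, of course, neither party sees the other's inputs; here one function plays the trusted functionality):

```python
import random

q = 2**61 - 1          # a Mersenne prime, used as a toy field modulus

def ole(x, a, b):
    """Ideal OLE functionality: Alice inputs x, Bob inputs (a, b);
    Alice learns y = a*x + b mod q and Bob learns nothing new."""
    return (a * x + b) % q

x = random.randrange(q)          # Alice's input
a = random.randrange(q)          # Bob's multiplier
b = random.randrange(q)          # Bob's random mask
y = ole(x, a, b)                 # Alice's share of a*x

# Alice's share y and Bob's share -b add up to the product a*x:
shares_sum = (y - b) % q
```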
Variants: random-OLE, vector-OLE
• OLE: Alice inputs x, Bob inputs (a, b); Alice gets y = ax + b.
• Random OLE ($-OLE): the functionality samples the parties' inputs at random.
• Vector OLE (VOLE): Alice inputs x, Bob inputs vectors (a⃗, b⃗); Alice gets y⃗ = a⃗x + b⃗.
A few basic observations
• n × OLE ⇒ 1 × VOLE (unconditional, passive security): VOLE is easier to build than n × OLE.
• $-OLE ⇒ OLE (unconditional, sending 3 field elements): random (V)OLE is enough.
• OLE ⇒ Oblivious Transfer (unconditional): public-key crypto is necessary [IR 89].
Motivation: secure computation with preprocessing [Beaver 91]
A preprocessing phase produces correlated randomness, independent of the inputs x and y.
The online phase computing f(x, y) is then information-theoretic and computationally cheap.
Example: multiplication triples from OLE
Alice holds (x, x′), Bob holds (a, a′). Two random OLEs give y - b = ax′ and y′ - b′ = a′x,
i.e., additive shares of the cross terms. Since
    (x + a)(x′ + a′) = xx′ + aa′ + ax′ + a′x,
adding the local products xx′ and aa′ yields additive shares of w = uv for the shared values
u = x + a and v = x′ + a′.
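This can be simulated end to end. A toy sketch calling an ideal OLE twice; the share labels u_A, v_A, etc. are my own names for the two parties' shares of u and v:

```python
import random

q = 2**61 - 1    # prime field modulus (toy choice)

def ole(x, a, b):
    """Ideal OLE: evaluator inputs x, other party inputs (a, b); returns a*x + b."""
    return (a * x + b) % q

# Beaver multiplication triple (u, v, w = u*v), additively shared, from 2 OLEs.
uA, vA = random.randrange(q), random.randrange(q)   # Alice's shares of u, v
uB, vB = random.randrange(q), random.randrange(q)   # Bob's shares of u, v
b1, b2 = random.randrange(q), random.randrange(q)   # Bob's OLE masks

y1 = ole(vA, uB, b1)     # shares of uB*vA:  Alice holds y1, Bob holds -b1
y2 = ole(uA, vB, b2)     # shares of uA*vB:  Alice holds y2, Bob holds -b2

wA = (uA * vA + y1 + y2) % q       # Alice's share of w
wB = (uB * vB - b1 - b2) % q       # Bob's share of w

u, v = (uA + uB) % q, (vA + vB) % q
```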
(V)OLE for correlated randomness
• Scalar/vector triples, matrix triples: build from VOLE.
• Multi-party correlations: from pairwise instances of (V)OLE; other approaches use depth-1
homomorphic encryption [DPSZ 12].
• Authenticated secret shares: use VOLE to generate information-theoretic MACs, a key part
of the SPDZ protocols [DPSZ 12, KOS 16, KPR 18, ...].
Application: oblivious pseudorandom functions
A PRF F with key K ← {0,1}^λ is indistinguishable from a random function. In an oblivious
PRF, the server holds K and the client holds x; the client learns F(K, x), the server learns
nothing, and F(K, y) remains pseudorandom for any y ≠ x.
Vector-OLE ⇒ batch OPRF evaluation [BCGIKS 19]
The server inputs a scalar s ∈ F_p; the client inputs a_i ∈ F_p with random masks b_i ← F_p;
the VOLE gives the server t_i = a_i·s + b_i. The server's keys are K_i := (s, t_i) with PRF
F(K_i, x) := H(t_i - x·s); the client outputs H(b_i), which equals F(K_i, a_i).
• This is a relaxed OPRF: related keys, some leakage.
• Secure if H is a random oracle, or under a variant of correlation-robustness.
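A sketch of this construction, simulating the VOLE as an ideal functionality; the concrete hash H (truncated SHA-256) and field size are illustrative assumptions, not part of the protocol:

```python
import hashlib
import random

q = 2**61 - 1    # toy prime field

def H(val):
    """Toy random-oracle stand-in: truncated SHA-256 of the field element."""
    return hashlib.sha256(str(val % q).encode()).hexdigest()[:16]

# Ideal VOLE: server inputs scalar s; client inputs a_i and picks masks b_i.
s = random.randrange(q)                        # server's OPRF key scalar
a = [random.randrange(q) for _ in range(4)]    # client's OPRF inputs
b = [random.randrange(q) for _ in range(4)]    # client's VOLE masks
t = [(ai * s + bi) % q for ai, bi in zip(a, b)]  # server learns t_i = a_i*s + b_i

# Server's key for slot i is K_i = (s, t_i); PRF value F(K_i, x) = H(t_i - x*s).
# The client outputs H(b_i), which matches the PRF at its own input a_i,
# since t_i - a_i*s = b_i.  For x != a_i, t_i - x*s = b_i + (a_i - x)*s
# stays pseudorandom from the client's view.
client_out = [H(bi) for bi in b]
server_eval = [H((ti - ai * s) % q) for ti, ai in zip(t, a)]
```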
Random vector-OLE ⇒ batch OPRF evaluation
Run a random VOLE: the server gets s and t′_i = r_i·s + b_i, for random r_i, b_i held by the
client. The client then sends d_i = a_i - r_i, and the server updates t_i = t′_i + d_i·s. Keys are
K_i := (s, t_i) and the client outputs H(b_i) as before.
• Optimal communication: one F_p element per evaluation (given random VOLE).
Applications of OPRF
• Random 1-out-of-q OT: correlated randomness, e.g. masked truth tables [DKSSZZ 17].
• Password-authenticated key exchange, e.g. OPAQUE [JKX 18]; batch OPRF seems less useful here.
• Private set intersection: reducing the use of public-key crypto [KKRT 16, KMPRT 17, ...];
with polynomial-based encoding [GPRTY 21, Sec 7.1], a simple protocol with communication
proportional to |input|.
Constructing VOLE, "non-silently"
Taxonomy of VOLE protocols:
• "Non-silent": from oblivious transfer (a receiver with bit b obtains s_b from sender messages
s_0, s_1) or from homomorphic encryption (Enc → Eval → Dec).
• "Silent": mostly based on LPN; requires "seed" VOLEs to bootstrap.
(V)OLE from oblivious transfer [Gilboa 99]
Alice bit-decomposes x = Σ_i 2^i x_i. Bob samples b_i such that b = Σ_i 2^i b_i mod q. For each
bit i, they run an OT with sender messages (b_i, b_i + a) and Alice's choice bit x_i, so Alice
learns y_i = b_i + a·x_i. She outputs
    y = Σ_i 2^i y_i = b + ax.
Repeat for VOLE [KOS 16].
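Gilboa's protocol can be checked against an ideal OT. A toy sketch with a power-of-two modulus q = 2^16, so that m = 16 bit-OTs suffice:

```python
import random

q = 2**16            # modulus; m = log2(q) = 16 bit-OTs per OLE

def ot(choice, msg0, msg1):
    """Ideal 1-out-of-2 OT: receiver with bit `choice` learns one message."""
    return msg1 if choice else msg0

# Alice (receiver) holds x; Bob (sender) holds a and picks the mask b.
x = random.randrange(q)
a = random.randrange(q)

b_shares = [random.randrange(q) for _ in range(16)]
b = sum(bi << i for i, bi in enumerate(b_shares)) % q    # b = sum_i 2^i b_i

# One OT per bit of x, with messages (b_i, b_i + a) and choice bit x_i.
y_shares = [ot((x >> i) & 1, b_shares[i], (b_shares[i] + a) % q)
            for i in range(16)]
y = sum(yi << i for i, yi in enumerate(y_shares)) % q    # y = sum_i 2^i y_i
```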
(V)OLE from oblivious transfer [Gilboa 99], cost summary
• Perfectly secure.
• Each output costs m = log q calls to OT on m-bit strings.
  ○ Computational cost: cheap via OT extension [IKNP 03].
  ○ Communication: ≥ m² bits.
• Active security?
(V)OLE from oblivious transfer: active security?
A corrupt Bob can use a′ ≠ a in the OT for some bit j, sending messages (b_j, b_j + a′).
Alice's output then becomes y + (a′ - a)·x_j, i.e., it depends on her secret bit x_j.
VOLE: lightweight correctness check
Alice holds x and y_i; Bob holds a_i, b_i. Goal: check that y_i = a_i·x + b_i for all i.
Alice sends random challenges χ_1, ..., χ_n ∈ F_p. Bob replies with
    a* = Σ_i χ_i a_i + a_{n+1},   b* = Σ_i χ_i b_i + b_{n+1},
where instance n+1 serves as a mask. Alice computes y* = Σ_i χ_i y_i + y_{n+1} and checks
y* = a*·x + b*.
Intuition: to pass the check when some y_i is incorrect, Bob must guess χ_i; he succeeds with
probability 1/p.
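The check is easy to simulate, including the cheating case. A toy sketch (the extra instance n+1 masks the aggregated values a*, b*):

```python
import random

p = 2**61 - 1    # prime field: a cheater passes with probability only 1/p
n = 8

x = random.randrange(p)
a = [random.randrange(p) for _ in range(n + 1)]   # index n is the mask instance
b = [random.randrange(p) for _ in range(n + 1)]
y = [(a[i] * x + b[i]) % p for i in range(n + 1)]  # honest VOLE outputs

def check(y, a, b):
    """Alice's check: random challenges, then one linear relation."""
    chi = [random.randrange(p) for _ in range(n)]
    a_star = (sum(c * ai for c, ai in zip(chi, a)) + a[n]) % p
    b_star = (sum(c * bi for c, bi in zip(chi, b)) + b[n]) % p
    y_star = (sum(c * yi for c, yi in zip(chi, y)) + y[n]) % p
    return y_star == (a_star * x + b_star) % p

honest_passes = check(y, a, b)

# A corrupt output y_0: the check fails unless chi_0 is guessed (prob. 1/p).
y_bad = list(y)
y_bad[0] = (y[0] + 1) % p
cheater_passes = check(y_bad, a, b)
```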
Problems with selective failure
• Recall: a corrupt Bob can induce the error y′ = y + (a′ - a)·x_0.
  ○ The error depends on the secret bit x_0, so even if the VOLE output is correct, passing
    the check leaks that x_0 = 0.
• Solutions:
  ○ 1) Relaxed VOLE: allow small leakage on x [KOS 16], [WYKW 21].
  ○ 2) Privacy amplification via the leftover hash lemma [KOS 16].
(V)OLE from OT: summary
• Simple protocol with lightweight computation, leveraging fast OT-extension techniques.
• Expensive communication: at least m² bits, where m = log q.
• Active security almost for free, if leakage on x is acceptable.
VOLE from homomorphic encryption
Linearly homomorphic encryption: a PKE scheme (KeyGen, Enc, Dec) that encrypts vectors
over F_p. For a⃗ ∈ F_p^n, write [a⃗] := Enc_pk(a⃗). Linear homomorphism: one can compute
[a⃗] + [b⃗] and c⃗ · [a⃗] for c⃗ ∈ F_p^n, such that
    Dec([a⃗] + [b⃗]) = a⃗ + b⃗
    Dec(c⃗ · [a⃗]) = c⃗ ⊙ a⃗   (component-wise product)
Examples of linearly homomorphic encryption (more on Wednesday!)
• Paillier encryption: each ciphertext encrypts a Z_N element (N = pq).
• DDH-based: ElGamal in the exponent, with poly-size plaintexts; class groups give Z_p for a
large prime p [CL 15].
• Ring Learning With Errors (RLWE) [LPR 10]: natively encrypts a vector in Z_p^n.
Naïve VOLE from linearly homomorphic encryption
Alice generates (pk, sk) ← Gen(1^λ) and sends pk along with [x]. Bob, holding a⃗ and b⃗,
homomorphically computes [y⃗] = a⃗ · [x] + [b⃗] and returns it; Alice decrypts y⃗ = Dec_sk([y⃗]).
Security: CPA security protects Alice; circuit privacy protects Bob.
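The naïve protocol can be sketched with toy Paillier as the linearly homomorphic scheme. The primes below are tiny and insecure, chosen only so the demo runs instantly; real Paillier uses primes of 1536+ bits.

```python
import math
import random

# Toy Paillier keypair (insecure demo primes).
p_, q_ = 999983, 1000003
N = p_ * q_
N2 = N * N
lam = math.lcm(p_ - 1, q_ - 1)
# With g = 1 + N, (1+N)^lam = 1 + lam*N (mod N^2), so L(g^lam) = lam mod N.
mu = pow((pow(1 + N, lam, N2) - 1) // N, -1, N)

def enc(m):
    """Paillier encryption: c = g^m * r^N mod N^2, with g = 1 + N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(1 + N, m, N2) * pow(r, N, N2) % N2

def dec(c):
    """Paillier decryption: L(c^lam mod N^2) * mu mod N."""
    return (pow(c, lam, N2) - 1) // N * mu % N

# Naive VOLE: Alice sends Enc(x); Bob returns [y_i] = a_i*[x] + [b_i].
x = random.randrange(N)
cx = enc(x)
a = [random.randrange(N) for _ in range(3)]
b = [random.randrange(N) for _ in range(3)]
cy = [pow(cx, ai, N2) * enc(bi) % N2 for ai, bi in zip(a, b)]

y = [dec(c) for c in cy]          # Alice decrypts y_i = a_i*x + b_i mod N
```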
Circuit privacy in homomorphic encryption
In RLWE, the message is hidden by noise. After computing a⃗ · [x] + [b⃗], the ciphertext noise
depends on a⃗ and b⃗ (it is removed during decryption), so it can leak Bob's inputs.
• Classic solution: "noise flooding", adding extra noise much larger than the computation
noise; this requires much larger ciphertexts.
• Optimization: "gentle noise flooding" [dCHIV 21]: encrypt a t-out-of-n sharing of the
message, so a few leaked coordinates don't matter.
What about active security?
• What can go wrong? Alice or Bob could send garbage ciphertexts.
• What about a correctness check as in the OT-based protocol? Selective failure is more
subtle here: the error may depend on the ciphertext noise or the secret key.
• Solution: zero-knowledge proofs. Alice gives a proof of plaintext knowledge; Bob gives a
proof of correct multiplication.
ZK proofs for homomorphic encryption
• RLWE is more challenging than number-theoretic assumptions.
• Proof of plaintext knowledge: the naïve sigma protocol has soundness 1/2; various
optimizations [BCS 19] and amortization [BBG 19] exist, but it is still computationally
expensive and often needs larger parameters.
• Proof of correct multiplication: even worse, and tricky to amortize; it can be avoided
assuming linear-only encryption [BISW 18, KPR 18].
Conclusion: basic constructions and applications
• OLE and VOLE are core building blocks of secure computation: correlated randomness, and
special-purpose applications like OPRF and private set intersection. Next talk: zero knowledge.
• Non-silent protocols from OT and AHE remain important, even if silent protocols win.
• Open question: improving RLWE parameters and efficiency, especially for active security.
Thank you!

papers_txt/vole-ring-lwe.txt