Computer Standards & Interfaces 97 (2026) 104125
AdaTraj-DP: An adaptive privacy framework for context-aware trajectory data publishing
Yongxin Zhao a, Chundong Wang a,b,*, Hao Lin c,**, Xumeng Wang d, Yixuan Song a, Qiuyu Du c
a Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China
b TianJin Police Institute, Tianjin, China
c College of Intelligent Science and Technology (College of Cyberspace Security), Inner Mongolia University of Technology, Inner Mongolia, China
d College of Cryptology and Cyber Science, Nankai University, Tianjin, China
ARTICLE INFO

Keywords: Differential privacy; Trustworthy AI; Trajectory data publishing; Personalized perturbation

ABSTRACT

Trajectory data are widely used in AI-based spatiotemporal analysis but raise privacy concerns due to their fine-grained nature and the potential for individual re-identification. Existing differential privacy (DP) approaches often apply uniform perturbation, which compromises spatial continuity, or adopt personalized mechanisms that overlook structural utility. This study introduces AdaTraj-DP, an adaptive differential privacy framework designed to balance trajectory-level protection and analytical utility. The framework combines context-aware sensitivity detection with hierarchical aggregation. Specifically, a dynamic sensitivity model evaluates privacy risks according to spatial density and semantic context, enabling adaptive allocation of privacy budgets. An adaptive perturbation mechanism then injects noise proportionally to the estimated sensitivity and represents trajectories through Hilbert-based encoding for prefix-oriented hierarchical aggregation with layer-wise budget distribution. Experiments conducted on the T-Drive and GeoLife datasets indicate that AdaTraj-DP maintains stable query accuracy, spatial consistency, and downstream analytical utility across varying privacy budgets while satisfying formal differential privacy guarantees.
1. Introduction

The proliferation of mobile devices, GPS sensors, and intelligent transportation infrastructures has resulted in the large-scale collection of spatiotemporal data. Such data serve as the foundation for numerous Location-Based Services (LBS), including navigation, ride-hailing, and urban planning [1,2]. Trajectory datasets record detailed sequences of individual movements, enabling a wide range of AI applications such as traffic forecasting, mobility prediction, and behavioral modeling. These applications have become indispensable for smart city management and autonomous systems, where the integrity and granularity of trajectory data directly affect analytical and decision-making accuracy.

Despite their utility, trajectory datasets raise critical privacy concerns for trustworthy AI. A single trajectory may expose an individual's home, workplace, or health-related locations, revealing sensitive behavioral patterns and social relationships [3,4]. Even after removing explicit identifiers, re-identification attacks can reconstruct personal traces with minimal auxiliary information [5]. Consequently, ensuring differential privacy for trajectory data has become essential to support reliable and ethically compliant AI development.

Differential Privacy (DP) [6] provides a rigorous mathematical guarantee against information leakage. However, its application to trajectory publishing introduces a persistent trade-off between privacy strength, data utility, and personalization, which conventional mechanisms fail to reconcile. Two primary gaps remain unresolved: (1) the tension between point-level perturbation and structural integrity; (2) the difficulty of adapting privacy budgets to varying contextual sensitivity. Early studies injected uniform Laplace noise into each location point [7,8], which protected individual coordinates but severely distorted the spatiotemporal correlation essential for route-level analysis. Subsequent hierarchical schemes based on prefix trees or space-filling curves [9,10] preserved aggregate statistics but relied on global, fixed privacy parameters, ignoring heterogeneous sensitivity across trajectories. Recent progress in Personalized Differential Privacy (PDP) [11–13] introduced adaptive noise based on semantic or frequency-based sensitivity, yet these methods typically lack integration with hierarchical
This article is part of a Special issue entitled: Secure AI published in Computer Standards & Interfaces.
* Corresponding author at: Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China.
** Corresponding author.
E-mail addresses: zyx4237@163.com (Y. Zhao), michael3769@163.com (C. Wang), suzukaze_aoba@126.com (H. Lin), wangxumeng@nankai.edu.cn (X. Wang), fykatb0824@163.com (Q. Du).
https://doi.org/10.1016/j.csi.2025.104125
Received 29 October 2025; Received in revised form 25 December 2025; Accepted 29 December 2025
Available online 30 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
aggregation, resulting in limited query accuracy and poor scalability for AI model training.

To bridge this gap, we propose AdaTraj-DP, an adaptive differentially private trajectory publishing framework that unifies context-aware sensitivity modeling and hierarchical aggregation. AdaTraj-DP introduces a two-stage protection mechanism. The first stage detects and quantifies sensitivity using contextual and statistical cues, allowing adaptive privacy budget assignment at the point level. The second stage encodes perturbed trajectories into a hierarchical prefix tree, applying layer-wise budget allocation to preserve structural consistency for downstream analysis. This design ensures both localized protection and global analytical utility, addressing the core limitations of prior DP-based trajectory mechanisms.

The main contributions of this work are summarized as follows:

(1) We propose AdaTraj-DP, an adaptive framework that unifies personalized perturbation and hierarchical aggregation. By establishing a mathematical link between local coordinate noise and global prefix-tree structures, the framework ensures that fine-grained point-level protection remains structurally consistent with trajectory-level differential privacy guarantees, enabling high-fidelity reconstruction for downstream tasks.
(2) We design a context-aware sensitivity model that combines spatial density with semantic context to guide adaptive budget allocation. This mechanism quantifies privacy risks at a granular level, enabling the dynamic adjustment of perturbation intensity to balance privacy protection and data fidelity.
(3) We implement a hierarchical aggregation scheme utilizing Hilbert spatial mapping and logarithmic layer-wise budget distribution. Experiments on the T-Drive and GeoLife datasets validate the framework's effectiveness in preserving query accuracy, spatial consistency, and AI model performance under varying privacy budgets.

2. Related work

Existing privacy-preserving trajectory publishing approaches can be broadly categorized into three classes: (1) foundational differential privacy models that ensure privacy but compromise trajectory continuity; (2) structural aggregation mechanisms that enhance data utility via hierarchical organization; and (3) personalized and adaptive privacy protection strategies that tailor noise to sensitivity but often lack integration with structural models. This section reviews these three directions and discusses recent advances that motivate AdaTraj-DP.

2.1. Foundational models for differentially private trajectory publishing

Differential Privacy (DP) [6] is the standard formalism for privacy-preserving data publication. Early approaches discretize continuous spatio-temporal domains and inject Laplace noise into cell counts or simple aggregates [14,15], but such methods often disrupt trajectory continuity and reduce utility for route-level analysis [7]. To address this, research has explored trajectory generalization and synthetic data generation under DP, including clustering-based generalization [16] and GAN-based synthetic trajectory models [17–19]. Work on DP-aware data exploration and visualization (e.g., DPKnob and Defogger) highlights the challenge of configuring DP mechanisms to balance utility and risk in interactive settings and motivates user- or task-guided privacy configuration [20,21].

2.2. Structural aggregation for utility enhancement

Hierarchical structures, such as prefix trees, Hilbert-encoded sequences, and spatial index trees, have been widely adopted to preserve aggregate query utility under DP. Early prefix-tree methods aggregate shared prefixes to reduce noise impact [22,23], while R-tree and quadtree variants support spatial indexing under privacy constraints [7,10]. Recent work improves spatial locality and query accuracy using Hilbert/Geohash encodings and adaptive tree strategies [9]. Zhao et al.'s PerTrajTree-DP further integrates point-level sensitivity with prefix-tree publishing to better support trustworthy AI analytics [24]. Complementary systems research on private data access and explanation (e.g., DPXPlain, Saibot) demonstrates practical techniques for supporting DP-protected analytics and helping users interpret noisy aggregates [25,26].

2.3. Personalized and adaptive privacy protection

Personalized Differential Privacy (PDP) methods adapt protection to varying point- or user-level sensitivity. Semantics-driven approaches use POI categories or external labels to identify sensitive locations [27,28], and movement-model-based frameworks like OPTDP estimate privacy risk from mobility patterns [11]. Statistical personalization methods infer sensitivity from dataset properties; for example, TF-IDF-based approaches quantify local importance and global rarity to guide budget allocation [12,13]. Interactive tools and visual analytics (DPKnob, Defogger) provide practical support for configuring heterogeneous DP strategies according to utility goals [20,21].

In parallel, recent advances in differentially private deep learning and private model training yield methods for improved utility in noisy training regimes (e.g., optimized DP-SGD variants, selective-update training, and heterogeneous-noise schemes) that can inform budget allocation and model-aware privacy strategies in trajectory publishing [25,26,29–31]. These works highlight opportunities to close the gap between personalized point-level protection and structural aggregation, motivating AdaTraj-DP's integration of context-aware sensitivity detection, adaptive perturbation, and hierarchical encoding to support AI-oriented downstream tasks.

3. Preliminaries

Trajectory Representation. A trajectory $T_i$ of user $u_i$ is a temporally ordered sequence of geo-referenced points [32]:

$$T_i = \{(p_{i,1}, t_{i,1}), (p_{i,2}, t_{i,2}), \dots, (p_{i,L_i}, t_{i,L_i})\}, \tag{1}$$

where $p_{i,j} = (\mathrm{lat}_{i,j}, \mathrm{lon}_{i,j})$ denotes the spatial coordinate and $t_{i,j}$ is the timestamp. The trajectory dataset is denoted as $\mathcal{D} = \{T_1, T_2, \dots, T_N\}$. Each point can be projected into a discrete grid cell $c_{i,j}$ for statistical analysis or further spatial encoding. The dimensionality and sampling irregularity of $\mathcal{D}$ result in high sparsity and heterogeneous sensitivity among locations, which requires adaptive privacy mechanisms.

Differential Privacy. Let $\mathcal{D}_1$ and $\mathcal{D}_2$ be two neighboring datasets differing in at most one trajectory. A randomized mechanism $\mathcal{M}$ satisfies $\varepsilon$-differential privacy if for any measurable subset $O$ in the output space:

$$\Pr[\mathcal{M}(\mathcal{D}_1) \in O] \le e^{\varepsilon} \Pr[\mathcal{M}(\mathcal{D}_2) \in O]. \tag{2}$$

The privacy budget $\varepsilon > 0$ controls the trade-off between privacy protection and data utility. Smaller $\varepsilon$ implies stronger privacy guarantees but larger perturbation noise.

For a numerical query $f: \mathcal{D} \to \mathbb{R}^k$ with $\ell_1$ sensitivity $\Delta f = \max_{\mathcal{D}_1, \mathcal{D}_2} \| f(\mathcal{D}_1) - f(\mathcal{D}_2) \|_1$, the Laplace mechanism adds independent noise drawn from the Laplace distribution:

$$\mathcal{M}(\mathcal{D}) = f(\mathcal{D}) + \mathrm{Lap}(\Delta f / \varepsilon). \tag{3}$$

This mechanism provides $\varepsilon$-differential privacy and is used in subsequent trajectory perturbation and aggregation processes.

Geographic Indistinguishability. For any two spatial points $x, x' \in \mathbb{R}^2$ and any reported location $z$, a mechanism $\mathcal{M}$ achieves $\varepsilon$-geographic indistinguishability if

$$\Pr[\mathcal{M}(x) = z] \le e^{\varepsilon \cdot d(x, x')} \Pr[\mathcal{M}(x') = z], \tag{4}$$
where $d(x, x')$ is the Euclidean distance between $x$ and $x'$ [33]. This formulation extends differential privacy to continuous spatial domains and provides distance-dependent protection.

Hierarchical Aggregation Structure. Trajectory data exhibit hierarchical correlations that can be represented through prefix-based aggregation. Let each discretized or encoded trajectory be expressed as a sequence of spatial identifiers $S_i = [s_{i,1}, s_{i,2}, \dots, s_{i,L_i}]$. A prefix tree $\mathcal{T}$ organizes all trajectories in $\mathcal{D}$ by shared prefixes, where each node $v$ corresponds to a spatial prefix and maintains a count $c(v)$ of trajectories passing through it. The hierarchical form allows noise to be injected at multiple granularities while preserving global spatial consistency.

The total privacy budget $\varepsilon_{\text{tree}}$ is distributed across tree layers to balance upper-level accuracy and lower-level detail preservation.

Problem Definition. Given a trajectory dataset $\mathcal{D}$ consisting of $N$ users and a total privacy budget $\varepsilon_{\text{total}}$, the objective is to design a mechanism $\mathcal{M}_{\text{traj}}$ that releases a trajectory dataset $\tilde{\mathcal{D}} = \mathcal{M}_{\text{traj}}(\mathcal{D})$ satisfying:

(1) $\mathcal{M}_{\text{traj}}$ ensures $\varepsilon_{\text{total}}$-differential privacy at the trajectory level;
(2) The released dataset $\tilde{\mathcal{D}}$ preserves statistical and structural properties essential for AI-based spatiotemporal analysis;
(3) The expected analytical error between results obtained from $\tilde{\mathcal{D}}$ and $\mathcal{D}$ remains bounded.

Let $f_{\text{AI}}(\cdot)$ denote an AI model trained or evaluated on trajectory data. The utility preservation objective is formulated as

$$L_{\text{utility}} = \mathbb{E}\left[ \| f_{\text{AI}}(\tilde{\mathcal{D}}) - f_{\text{AI}}(\mathcal{D}) \|_2^2 \right], \tag{5}$$

subject to $\tilde{\mathcal{D}}$ satisfying $\varepsilon_{\text{total}}$-differential privacy. The goal is to minimize $L_{\text{utility}}$ while maintaining formal privacy guarantees.

4. Proposed framework

Rapid development of AI-driven spatiotemporal analysis has increased the demand for high-quality trajectory data with strong privacy protection. Traditional differential privacy mechanisms often adopt fixed noise scales or uniform budget allocation, which can cause excessive utility degradation in dense areas or insufficient protection in sensitive regions. To address these limitations, this study proposes AdaTraj-DP, a framework that integrates adaptive personalized perturbation with hierarchical aggregation to achieve trajectory-level differential privacy while maintaining analytical utility for AI-based modeling.

As illustrated in Fig. 1, AdaTraj-DP operates in three main phases: (1) trajectory preprocessing and context-aware sensitivity detection; (2) adaptive personalized perturbation guided by local sensitivity and spatial density; (3) hierarchical aggregation using Hilbert encoding and dynamic layer-wise budget allocation.

Fig. 1. Framework of the proposed AdaTraj-DP scheme.

4.1. Context-aware sensitivity detection

Let $\mathcal{D} = \{T_1, \dots, T_N\}$ denote the trajectory dataset after basic preprocessing. Each trajectory $T_i = \{(p_{i,1}, t_{i,1}), \dots, (p_{i,L_i}, t_{i,L_i})\}$ consists of temporally ordered spatial points $p_{i,j} = (\mathrm{lat}_{i,j}, \mathrm{lon}_{i,j})$. The objective of this phase is to quantify the privacy sensitivity of each spatial point by combining statistical frequency and contextual semantics to guide subsequent adaptive perturbation.

Spatial Discretization. The continuous geographical domain is partitioned into a uniform grid of $G \times G$ cells. Each point $p_{i,j}$ is mapped to a corresponding grid cell $c_{i,j}$. This transformation converts raw coordinates into discrete spatial tokens, enabling frequency-based statistical analysis.

Context-aware Sensitivity Measure. For each cell $c_{i,j}$, a sensitivity score $S(c_{i,j})$ is defined as

$$S(c_{i,j}) = \mathrm{TF}(c_{i,j}, T_i) \cdot \mathrm{IDF}(c_{i,j}) \cdot \omega_c, \tag{6}$$

where $\mathrm{TF}(c_{i,j}, T_i) = \mathrm{count}(c_{i,j} \in T_i) / L_i$ represents the normalized local frequency of visits within trajectory $T_i$, and $\mathrm{IDF}(c_{i,j}) = \log \frac{|\mathcal{D}|}{|\{T_k \in \mathcal{D} : c_{i,j} \in T_k\}|}$ denotes the global rarity of the location across the dataset. The term $\omega_c$ is a contextual weighting coefficient that quantifies the semantic sensitivity of a location category. Following the semantic sensitivity hierarchy established in [34], we assign higher weights to privacy-critical categories (e.g., $\omega_{\text{healthcare}} = 1.5$, $\omega_{\text{residential}} = 1.2$) to enforce stricter protection, while assigning lower base weights to public infrastructure (e.g., $\omega_{\text{road}} = 1.0$). These semantic categories are mapped from public map services (e.g., OpenStreetMap), ensuring that the sensitivity configuration relies solely on public knowledge and does not consume the private budget.

Normalization and Classification. To unify the sensitivity scale, all scores are normalized into $[0, 1]$:

$$\hat{S}(c_{i,j}) = \frac{S(c_{i,j}) - \min(S)}{\max(S) - \min(S)}. \tag{7}$$

Each point $p_{i,j}$ is then labeled as sensitive or non-sensitive according to a predefined threshold $\theta_S$:

$$\mathrm{label}(p_{i,j}) = \begin{cases} 1, & \text{if } \hat{S}(c_{i,j}) \ge \theta_S, \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$

The resulting annotated dataset is represented as $\mathcal{D}' = \{T'_1, T'_2, \dots, T'_N\}$, where each $T'_i$ contains the points and corresponding sensitivity labels. The normalized score $\hat{S}(c_{i,j})$ serves as a continuous privacy indicator in the subsequent adaptive perturbation phase.

4.2. Adaptive personalized perturbation

This phase injects controlled noise into all trajectory points in $\mathcal{D}'$ to ensure trajectory-level differential privacy. All locations are perturbed to avoid inference risks arising from selective protection. The perturbation strength is adaptively adjusted based on the normalized sensitivity $\hat{S}(c_{i,j})$ and local spatial density, allowing the mechanism to preserve analytical fidelity while maintaining formal privacy guarantees.

Adaptive Privacy Budget Allocation. Each trajectory point $p_{i,j}$ is assigned an individual privacy budget $\varepsilon_{p_{i,j}}$ determined by both its sensitivity level and spatial context.

Let $\rho(p_{i,j})$ denote the local point density around $p_{i,j}$ within a neighborhood radius $r$. The adaptive budget is defined as

$$\varepsilon_{p_{i,j}} = \varepsilon_{\max} - (\varepsilon_{\max} - \varepsilon_{\min}) \times \left( \alpha \hat{S}(c_{i,j}) + (1 - \alpha)(1 - \rho(p_{i,j})) \right), \tag{9}$$

where $\alpha \in [0, 1]$ controls the balance between sensitivity-based and density-based adaptation.

A higher $\hat{S}(c_{i,j})$ or lower $\rho(p_{i,j})$ leads to a smaller $\varepsilon_{p_{i,j}}$, introducing stronger noise for privacy-critical or sparsely visited regions. The range $[\varepsilon_{\min}, \varepsilon_{\max}]$ defines the permissible privacy strength, ensuring stability across heterogeneous data distributions.

Two-Dimensional Laplace Perturbation. For each point $p_{i,j} = (\mathrm{lat}_{i,j}, \mathrm{lon}_{i,j})$, independent Laplace noise is applied to both coordinates according to the assigned privacy budget:

$$\tilde{p}_{i,j} = \left( \mathrm{lat}_{i,j} + \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}}),\; \mathrm{lon}_{i,j} + \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}}) \right). \tag{10}$$
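As an illustration only, the scoring and perturbation steps of Eqs. (6), (7), (9), and (10) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the contextual weight $\omega_c$ is supplied as a per-cell dictionary, the density $\rho$ is assumed to be already normalized into $[0, 1]$, and Laplace noise is sampled as the difference of two exponential draws.

```python
import math
import random
from collections import Counter

def sensitivity_scores(cells_by_traj, weights, default_weight=1.0):
    """Eq. (6): S = TF * IDF * omega_c for each cell of each trajectory."""
    n_traj = len(cells_by_traj)
    # Number of trajectories that visit each cell at least once.
    doc_freq = Counter(c for cells in cells_by_traj for c in set(cells))
    scores = []
    for cells in cells_by_traj:
        tf = Counter(cells)
        scores.append({
            c: (tf[c] / len(cells))
               * math.log(n_traj / doc_freq[c])
               * weights.get(c, default_weight)  # hypothetical per-cell omega_c
            for c in tf
        })
    return scores

def normalize(scores):
    """Eq. (7): min-max normalization of all scores into [0, 1]."""
    flat = [s for per_traj in scores for s in per_traj.values()]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # assumption: all-equal scores map to 0
    return [{c: (s - lo) / span for c, s in per_traj.items()} for per_traj in scores]

def adaptive_budget(s_hat, rho, eps_min, eps_max, alpha):
    """Eq. (9); rho is assumed to be pre-normalized into [0, 1]."""
    return eps_max - (eps_max - eps_min) * (alpha * s_hat + (1 - alpha) * (1 - rho))

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def perturb_point(lat, lon, eps, rng):
    """Eq. (10): independent Laplace noise of scale 1/eps on each coordinate."""
    return lat + laplace(1 / eps, rng), lon + laplace(1 / eps, rng)

# Toy usage: three trajectories over grid cells "a", "b", "c".
rng = random.Random(0)
cells_by_traj = [["a", "a", "b"], ["b", "c"], ["b"]]
s_hat = normalize(sensitivity_scores(cells_by_traj, weights={"a": 1.5}))
eps = adaptive_budget(s_hat[0]["a"], rho=0.2, eps_min=0.1, eps_max=1.0, alpha=0.5)
noisy_lat, noisy_lon = perturb_point(0.40, 0.60, eps, rng)
```

In this toy input, cell "b" appears in every trajectory, so its IDF term and hence its score are zero, whereas the rare, heavily weighted cell "a" receives the highest normalized score and therefore a budget close to $\varepsilon_{\min}$.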
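Algorithm 1 Adaptive Personalized Perturbation under AdaTraj-DP
Input: Annotated dataset $\mathcal{D}'$, privacy range $[\varepsilon_{\min}, \varepsilon_{\max}]$, sensitivity scores $\hat{S}$, balance coefficient $\alpha$
Output: Perturbed dataset $\mathcal{D}''$
    $\mathcal{D}'' \leftarrow \emptyset$
    for each trajectory $T'_i \in \mathcal{D}'$ do
        $\tilde{T}_i \leftarrow \emptyset$
        for each point $p_{i,j}$ in $T'_i$ do
            Compute local density $\rho(p_{i,j})$
            $\varepsilon_{p_{i,j}} \leftarrow \varepsilon_{\max} - (\varepsilon_{\max} - \varepsilon_{\min}) \times (\alpha \hat{S}(c_{i,j}) + (1 - \alpha)(1 - \rho(p_{i,j})))$
            $n_{\text{lat}} \sim \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}})$
            $n_{\text{lon}} \sim \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}})$
            $\tilde{p}_{i,j} \leftarrow (\mathrm{lat}_{i,j} + n_{\text{lat}}, \mathrm{lon}_{i,j} + n_{\text{lon}})$
            Append $\tilde{p}_{i,j}$ to $\tilde{T}_i$
        end for
        Add $\tilde{T}_i$ to $\mathcal{D}''$
    end for
    return $\mathcal{D}''$

Algorithm 2 Dynamic Hierarchical Aggregation under AdaTraj-DP
Input: Perturbed dataset $\mathcal{D}''$, total tree budget $\varepsilon_{\text{tree}}$, height $h$, parameters $a$, $\gamma$, encoding length $L_{\text{enc}}$
Output: Privacy-aware prefix tree $\mathcal{T}$
    Initialize empty tree $\mathcal{T}$
    for each trajectory $\tilde{T}_i = \{\tilde{p}_{i,1}, \dots, \tilde{p}_{i,L_i}\}$ in $\mathcal{D}''$ do
        Encode trajectory: $S_i \leftarrow [\mathrm{Encode1D}(H(\tilde{p}_{i,1})), \dots, \mathrm{Encode1D}(H(\tilde{p}_{i,L_i}))]$
        Insert $S_i$ into $\mathcal{T}$ and increment node counts along each path
    end for
    for layer $i = 1$ to $h$ do
        Compute node count variance $\sigma_i^2$
        $\varepsilon_{\text{level},i} \leftarrow \frac{\log(i + a)\,(1 + \gamma \sigma_i^2)}{\sum_{j=1}^{h} \log(j + a)\,(1 + \gamma \sigma_j^2)} \cdot \varepsilon_{\text{tree}}$
        for each node $v$ at layer $i$ do
            $\tilde{c}(v) \leftarrow c(v) + \mathrm{Laplace}(0, 1/\varepsilon_{\text{level},i})$
            Update $c(v) \leftarrow \tilde{c}(v)$
        end for
    end for
    return $\mathcal{T}$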
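For illustration, the steps of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, points are assumed to lie in the normalized $[0, 1]^2$ domain (out-of-range perturbed points are clamped), the Hilbert index uses the standard iterative coordinate-to-index conversion, each tree layer is assumed to correspond to one encoded point of the sequence, and the tree is stored as one dictionary of prefix counts per layer.

```python
import math
import random
from collections import defaultdict

def hilbert_index(order, x, y):
    """Map integer cell (x, y) on a 2^order x 2^order grid to its Hilbert index."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def encode(traj, order, l_enc):
    """Encode each point as a zero-padded binary string of its Hilbert index."""
    n = 1 << order

    def cell(u):
        return min(n - 1, max(0, int(u * n)))  # assumption: clamp to the grid

    return [format(hilbert_index(order, cell(x), cell(y)), f"0{l_enc}b") for x, y in traj]

def build_counts(encoded, height):
    """Per layer, count how many trajectories share each prefix of that depth."""
    layers = [defaultdict(int) for _ in range(height)]
    for seq in encoded:
        for depth in range(1, min(height, len(seq)) + 1):
            layers[depth - 1][tuple(seq[:depth])] += 1
    return layers

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def layer_budgets(layers, eps_tree, a=1.0, gamma=0.5):
    """Eq. (12): weights log(i + a) * (1 + gamma * sigma_i^2), scaled to sum to eps_tree."""
    weights = [
        math.log(i + a) * (1 + gamma * variance(list(layer.values()) or [0]))
        for i, layer in enumerate(layers, start=1)
    ]
    total = sum(weights)
    return [eps_tree * w / total for w in weights]

def noisy_tree(layers, budgets, rng):
    """Eq. (13): add Laplace(0, 1/eps_level_i) noise to every count in layer i."""
    def lap(eps):
        return rng.expovariate(eps) - rng.expovariate(eps)

    return [
        {prefix: count + lap(eps) for prefix, count in layer.items()}
        for layer, eps in zip(layers, budgets)
    ]

# Toy usage: three short trajectories, grid order 7 (128 x 128), 16-bit codes.
rng = random.Random(0)
trajs = [[(0.1, 0.1), (0.2, 0.1)], [(0.1, 0.1), (0.9, 0.9)], [(0.1, 0.1), (0.2, 0.1)]]
encoded = [encode(t, order=7, l_enc=16) for t in trajs]
layers = build_counts(encoded, height=2)
budgets = layer_budgets(layers, eps_tree=1.0)
tree = noisy_tree(layers, budgets, rng)
```

Because the per-layer weights are normalized by their sum, the budgets always add up to $\varepsilon_{\text{tree}}$ regardless of the values chosen for $a$ and $\gamma$.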
The perturbed trajectory $\tilde{T}_i = \{\tilde{p}_{i,1}, \tilde{p}_{i,2}, \dots, \tilde{p}_{i,L_i}\}$ is constructed by replacing each original point with its perturbed counterpart. The complete differentially private dataset is denoted as $\mathcal{D}'' = \{\tilde{T}_1, \tilde{T}_2, \dots, \tilde{T}_N\}$.

Algorithm 1 outlines the adaptive personalized perturbation procedure.

4.3. Hierarchical aggregation with dynamic budget allocation

This phase organizes the perturbed trajectories into a structured form for privacy-preserving analytical querying and AI model training. A hierarchical prefix tree is constructed from the encoded trajectories, where node counts are perturbed under a dynamically adjusted budget to preserve global consistency while mitigating noise propagation.

Spatial Encoding via Hilbert Curve. Each perturbed point $\tilde{p}_{i,j} \in \mathcal{D}''$ is mapped into a one-dimensional integer value $v_{i,j}$ using a Hilbert space-filling curve $H(\cdot)$, ensuring spatial locality preservation:

$$v_{i,j} = H(\tilde{p}_{i,j}). \tag{11}$$

Each integer value $v_{i,j}$ is then converted into a fixed-length binary string $s_{i,j}$ of length $L_{\text{enc}}$, forming a discretized trajectory representation $S_i = [s_{i,1}, s_{i,2}, \dots, s_{i,L_i}]$. The set of all encoded trajectories $\{S_i\}$ constitutes the input to hierarchical aggregation. The technical details of this Hilbert-to-binary-string encoding, including the relationship between the curve's order and the string length, are elaborated in the Appendix.

Prefix Tree Construction. A prefix tree $\mathcal{T}$ is built from $\{S_i\}$, where each path from the root to a node $v$ represents a spatial prefix, and the node count $c(v)$ indicates the number of trajectories sharing that prefix. The maximum tree depth $h$ corresponds to the maximum trajectory length or encoding depth.

Dynamic Layer-wise Budget Allocation. The total privacy budget $\varepsilon_{\text{tree}}$ is distributed across tree layers according to both layer depth and statistical variance. Let $\sigma_i^2$ denote the empirical variance of node counts at layer $i$. The adaptive allocation for layer $i$ is defined as

$$\varepsilon_{\text{level},i} = \frac{\log(i + a) \cdot (1 + \gamma \sigma_i^2)}{\sum_{j=1}^{h} \log(j + a)\,(1 + \gamma \sigma_j^2)} \cdot \varepsilon_{\text{tree}}, \tag{12}$$

where $a > 0$ is a smoothing parameter and $\gamma \ge 0$ controls the weight of variance-based adjustment. Adopting the logarithmic strategy from [9], the function $\log(i + a)$ is selected to smooth the budget decay across layers. Unlike linear or exponential allocation schemes, which might excessively penalize deeper layers and lead to significant information loss in fine-grained trajectories, the logarithmic term ensures that leaf nodes retain sufficient privacy budget to preserve local spatial details.

Differentially Private Node Perturbation. For each node $v$ at layer $i$, the sensitivity of its count query is $\Delta f = 1$. Laplace noise is applied according to its layer-wise budget:

$$\tilde{c}(v) = c(v) + \mathrm{Laplace}\left(0, \frac{1}{\varepsilon_{\text{level},i}}\right). \tag{13}$$

The resulting prefix tree $\mathcal{T}$ with perturbed counts serves as a privacy-preserving hierarchical representation supporting aggregate analytics and AI-based trajectory modeling.

Algorithm 2 summarizes the hierarchical aggregation process with dynamic budget adjustment.

4.4. Privacy analysis

The proposed AdaTraj-DP framework comprises two sequential privacy-preserving mechanisms: adaptive personalized perturbation (with budget $\varepsilon_{\text{point}}$) and hierarchical aggregation (with budget $\varepsilon_{\text{tree}}$). By the sequential composition theorem of differential privacy, the total privacy guarantee satisfies

$$\varepsilon_{\text{total}} = \varepsilon_{\text{point}} + \varepsilon_{\text{tree}}. \tag{14}$$

Privacy of Adaptive Personalized Perturbation ($\varepsilon_{\text{point}}$). The adaptive perturbation mechanism assigns an individual privacy budget $\varepsilon_{p_{i,j}}$ to each trajectory point $p_{i,j}$ derived from its normalized sensitivity $\hat{S}(c_{i,j})$ and local density $\rho(p_{i,j})$. To ensure rigorous privacy guarantees, it is assumed that the global weighting parameters (e.g., contextual weights $\omega_c$ and density thresholds) are computed from public sources, such as map topologies or non-sensitive historical statistics. This reliance on public metadata is a standard practice in privacy-preserving spatial publishing [14,33], ensuring that the sensitivity calibration process itself does not leak private information. Consequently, the allocated budget $\varepsilon_{p_{i,j}}$ depends solely on the characteristics of its corresponding trajectory $T_i$. Under this assumption:

(1) The assignment of $\varepsilon_{p_{i,j}}$ relies solely on local statistics within $T_i$ and public constants, which ensures independence among users.
(2) Each trajectory is processed through an independent Laplace mechanism. For any point $p_{i,j}$, the Laplace mechanism with scale $1/\varepsilon_{p_{i,j}}$ satisfies $\varepsilon_{p_{i,j}}$-differential privacy.
(3) Because the budgets are bounded within $[\varepsilon_{\min}, \varepsilon_{\max}]$, the overall privacy cost of this phase is dominated by the smallest allocated budget, and the worst-case (strongest) guarantee corresponds to $\varepsilon_{\min}$-DP for each point.
(4) By parallel composition across trajectories, the global privacy consumption of this phase is $\varepsilon_{\text{point}} = \varepsilon_{\max}$, representing the maximum privacy loss incurred when the weakest noise is added.

Hence, the adaptive perturbation phase satisfies $\varepsilon_{\max}$-differential privacy.

Privacy of Hierarchical Aggregation ($\varepsilon_{\text{tree}}$). The hierarchical aggregation mechanism constructs a prefix tree and perturbs its node counts with layer-specific noise calibrated by $\varepsilon_{\text{level},i}$. Each trajectory affects exactly one node per layer, implying that the sensitivity of the count query at any layer is $\Delta f = 1$. Adding Laplace noise with scale $1/\varepsilon_{\text{level},i}$ guarantees $\varepsilon_{\text{level},i}$-DP for that layer.

Because the per-layer budgets $\varepsilon_{\text{level},i}$ are partitioned from $\varepsilon_{\text{tree}}$ according to

$$\sum_{i=1}^{h} \varepsilon_{\text{level},i} = \varepsilon_{\text{tree}}, \tag{15}$$

and the layers are sequentially composed along each trajectory path, the entire prefix tree synthesis mechanism satisfies $\varepsilon_{\text{tree}}$-differential privacy. The dynamic allocation factor $(1 + \gamma \sigma_i^2)$ modifies the budget distribution without altering the total privacy bound, ensuring that the overall guarantee remains unchanged.

Overall Privacy Guarantee. Applying the sequential composition theorem to the two phases yields the total privacy protection level:

$$\varepsilon_{\text{total}} = \varepsilon_{\max} + \varepsilon_{\text{tree}}. \tag{16}$$

This ensures that AdaTraj-DP provides formal, trajectory-level differential privacy. The adaptive and hierarchical mechanisms jointly maintain consistent privacy guarantees while supporting utility-preserving analysis for AI-based spatiotemporal modeling.

5. Experimental evaluation

This section presents an extensive empirical evaluation of the proposed AdaTraj-DP framework. The experiments aim to validate both privacy preservation and analytical utility in AI-oriented trajectory publishing. Specifically, we address the following research questions:

• RQ1: How does the total privacy budget $\varepsilon_{\text{total}}$ affect the analytical utility of the released trajectories?
• RQ2: How does AdaTraj-DP perform compared to state-of-the-art differential privacy mechanisms in terms of accuracy and computational efficiency?
• RQ3: What are the impacts of the adaptive parameters, including allocation ratio $\alpha$ and variance factor $\gamma$, on privacy-utility trade-offs?

5.1. Experimental setup

This subsection introduces the datasets, baseline methods, evaluation metrics, and parameter configurations used in the experiments.

5.1.1. Datasets

Experiments are primarily conducted on the widely used T-Drive dataset, which records GPS trajectories of 10,357 taxis in Beijing over seven days (February 2–8, 2008) [35]. It contains approximately 15 million spatial points after preprocessing. To further verify cross-domain robustness, we additionally include the GeoLife dataset [36], which comprises 17,621 trajectories from 182 users, covering both dense urban and sparse suburban mobility patterns.

Both datasets are preprocessed by: (1) removing sampling intervals exceeding 300 s; (2) filtering out trajectories shorter than 20 points; (3) normalizing all coordinates into a $[0, 1] \times [0, 1]$ grid to ensure scale comparability.

These datasets collectively provide both high-density and low-density spatial distributions, enabling a fair evaluation of the proposed context-aware sensitivity modeling.

5.1.2. Baseline methods

To demonstrate the advantages of AdaTraj-DP, we compare four methods: three representative baselines, each reflecting a distinct privacy design paradigm, and the proposed framework:

• HA-Tree [9]: A hierarchical aggregation method based on Hilbert mapping and fixed logarithmic budget allocation, representing state-of-the-art static DP trees.
• TFIDF-DP [13]: A personalized perturbation method using TF-IDF-based sensitivity scoring without hierarchical structure, corresponding to point-level DP only.
• QJLP (LDP) [7]: A local differential privacy baseline where each trajectory is perturbed independently on the client side.
• AdaTraj-DP (Ours): The proposed adaptive framework that combines context-aware sensitivity detection, adaptive perturbation, and dynamic hierarchical aggregation.

5.1.3. Evaluation metrics

Performance is evaluated from three complementary perspectives:

Data Utility. We adopt three quantitative metrics: Mean Absolute Error (MAE), Mean Relative Error (MRE), and Hausdorff Distance (HD). MAE and MRE evaluate accuracy for range-count queries on perturbed trajectories, while HD measures spatial fidelity between original and released datasets.

Model Utility. To align with AI-oriented evaluation, we train a downstream trajectory classification model based on a lightweight Mamba encoder [37]. The model predicts driver ID from trajectory segments, and classification accuracy on the perturbed data reflects end-task utility ($U_{\text{cls}}$).

Computational Efficiency. We report total runtime ($T_{\text{total}}$) from preprocessing to privacy-protected publication, including all three phases of AdaTraj-DP.

5.1.4. Parameter configuration

Unless otherwise stated, experiments use the following default configuration: the total privacy budget $\varepsilon_{\text{total}}$ is split according to an allocation ratio $\alpha$, where $\alpha \in [0.3, 0.7]$ controls the portion used for adaptive perturbation ($\varepsilon_{\text{point}}$), and $(1 - \alpha)$ the portion used for hierarchical aggregation ($\varepsilon_{\text{tree}}$):

$$\varepsilon_{\text{point}} = \alpha\, \varepsilon_{\text{total}}, \qquad \varepsilon_{\text{tree}} = (1 - \alpha)\, \varepsilon_{\text{total}}. \tag{17}$$

We vary $\varepsilon_{\text{total}}$ from 0.5 to 3.0 to investigate the privacy-utility trade-off.

The variance factor $\gamma$ controlling dynamic budget adaptation is selected from $\{0, 0.2, 0.5, 1.0\}$, and the hierarchical smoothing parameter is set to $a = 1.0$. The sensitivity threshold $\theta_S$ for classifying sensitive points is chosen from $\{0.6, 0.7, 0.8, 0.9\}$. The personalized budget range is fixed at $[\varepsilon_{\min}, \varepsilon_{\max}] = [0.1, 1.0]$.

To ensure comparability, all methods share identical grid resolution ($G = 128$) and Hilbert encoding length ($L_{\text{enc}} = 16$). All experiments are implemented in Python 3.8 with PyTorch 2.4 on an NVIDIA RTX 4090 GPU.

5.2. RQ1: Data utility evaluation

This experiment evaluates how AdaTraj-DP preserves the analytical utility of published trajectories under different privacy budgets. All
(a) MAE of Count Queries (b) MRE of Count Queries
Fig. 2. Trajectory count query accuracy under varying 𝜀total on both datasets.
evaluations are conducted on both the T-Drive and GeoLife datasets, covering dense and sparse mobility scenarios to ensure cross-domain consistency.

5.2.1. Accuracy of trajectory count queries
We evaluate the ability of each method to answer prefix-based count queries accurately. For each dataset, a query set $\mathcal{Q}$ consisting of 1000 random trajectory prefixes with lengths between 4 and 8 is selected. Let $c(q)$ denote the true count of trajectories matching prefix $q \in \mathcal{Q}$, and $\hat{c}(q)$ be the noisy count returned by the mechanism. The data utility is quantified using Mean Absolute Error (MAE) and Mean Relative Error (MRE), defined as:

\[
\mathrm{MAE} = \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} |c(q) - \hat{c}(q)|, \qquad
\mathrm{MRE} = \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \frac{|c(q) - \hat{c}(q)|}{\max(c(q), \delta)} \tag{18}
\]

where $\delta$ is a smoothing parameter (set to 1% of the total dataset size) to prevent division by zero for small counts. The results are averaged over ten repetitions with independent noise realizations.
Effect of Privacy Budget $\varepsilon_{\text{total}}$. Figs. 2(a) and 2(b) illustrate the quantitative relationship between privacy strength and data utility. All methods exhibit a convex error decay curve as $\varepsilon_{\text{total}}$ increases from 0.5 to 3.0, reflecting the fundamental differential privacy trade-off.
In the strict privacy regime ($\varepsilon_{\text{total}} \in [0.5, 1.5]$), our method achieves the steepest marginal reduction in MAE, indicating a high return on privacy budget investment. Specifically, when $\varepsilon_{\text{total}}$ increases from 0.5 to 1.0, AdaTraj-DP reduces the MAE by approximately 45.3% (from 18.1 to 9.9), whereas the second-best baseline, HA-Tree, only achieves a 31.4% reduction. This quantitative gap demonstrates that AdaTraj-DP yields a significantly higher marginal utility gain for every unit of privacy budget expended compared to static hierarchical structures.

5.2.2. Preservation of spatial distribution
Spatial fidelity evaluates the geometric similarity between the original and perturbed trajectories. We use two complementary metrics: the Hausdorff Distance (HD) for worst-case deviation and the Mean Displacement (MD) for average positional distortion.
Effect of Privacy Budget $\varepsilon_{\text{total}}$. Fig. 3 and Table 1 summarize the spatial accuracy across privacy levels. For both T-Drive and GeoLife datasets, AdaTraj-DP consistently achieves smaller deviations, demonstrating its robustness across data densities and spatial patterns. The sensitivity-guided perturbation preserves local consistency, while adaptive budget redistribution reduces distortion in dense urban regions.
Overall, AdaTraj-DP demonstrates consistent spatial and statistical accuracy across both datasets, validating its generalizability to heterogeneous mobility distributions.

Table 1
Spatial fidelity comparison (average over T-Drive and GeoLife datasets). Lower values indicate higher spatial accuracy.

                              Hausdorff Distance (HD)           Mean Displacement (MD)
$\varepsilon_{\text{total}}$  AdaTraj-DP   Best Baseline        AdaTraj-DP   Best Baseline
0.5                           0.152        0.171 (HA-Tree)      0.098        0.113 (HA-Tree)
1.0                           0.096        0.127 (HA-Tree)      0.069        0.087 (HA-Tree)
1.5                           0.089        0.125 (TFIDF-DP)     0.063        0.088 (TFIDF-DP)
2.0                           0.083        0.118 (TFIDF-DP)     0.059        0.083 (TFIDF-DP)
3.0                           0.079        0.130 (QJLP)         0.056        0.094 (QJLP)

5.3. RQ2: Model utility evaluation

This experiment evaluates how the differentially private trajectories generated by AdaTraj-DP retain their utility for AI-based downstream tasks. Two representative learning tasks are considered: (1) trajectory classification, which predicts the semantic category of a movement sequence; (2) destination prediction, which estimates the likely endpoint of an ongoing trajectory. These tasks are evaluated on the T-Drive and GeoLife datasets to reflect both dense and sparse urban mobility environments.

5.3.1. Trajectory classification
A hierarchical Transformer-based model with positional encoding is trained on the published trajectories to perform multi-class trajectory classification. The model architecture follows a standard encoder setup with three attention layers and a hidden size of 256. Each experiment is repeated five times under independent noise realizations, and the average classification accuracy and macro F1-score are reported. The total privacy budget $\varepsilon_{\text{total}}$ is varied from 0.5 to 3.0.
Effect of Privacy Budget $\varepsilon_{\text{total}}$. Figs. 4(a) and 4(b) illustrate the influence of $\varepsilon_{\text{total}}$ on model performance. As the privacy budget increases, both accuracy and F1-score improve across all methods. AdaTraj-DP consistently maintains the highest model utility on both datasets, demonstrating that adaptive sensitivity control effectively preserves discriminative features. The hierarchical tree representation mitigates local noise accumulation, supporting stable model convergence.

5.3.2. Destination prediction
To evaluate predictive consistency, a sequence-to-sequence neural decoder is trained to predict the destination region of each trajectory prefix. Prediction accuracy is measured by the top-1 hit rate, while spatial accuracy is quantified by the mean geodesic distance between predicted and true destinations.
Effect of Privacy Budget $\varepsilon_{\text{total}}$. Figs. 5(a) and 5(b) illustrate the results of destination prediction across both datasets. AdaTraj-DP maintains stable predictive performance even under strict privacy constraints ($\varepsilon_{\text{total}} < 1.0$), consistently outperforming fixed-budget baselines that cannot adapt to local sensitivity variations. As the privacy budget increases, the prediction accuracy steadily improves, while the mean spatial deviation between predicted and true destinations decreases. This demonstrates that adaptive perturbation and hierarchical encoding together preserve mobility semantics and ensure downstream models can effectively capture trajectory intent despite injected noise.
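
The count-query error metrics of Eq. (18) in Section 5.2.1 can be illustrated with a minimal Python sketch. The function name `count_query_errors` and the sample counts are hypothetical and chosen only for illustration; the paper does not publish its evaluation code, so this is one plausible reading of the formula rather than the authors' implementation.

```python
from typing import Sequence, Tuple


def count_query_errors(
    true_counts: Sequence[float],
    noisy_counts: Sequence[float],
    delta: float,
) -> Tuple[float, float]:
    """Return (MAE, MRE) over a query set, following the form of Eq. (18).

    `delta` is the smoothing parameter that bounds the MRE denominator from
    below; how it is chosen is left to the caller (an assumption of this sketch).
    """
    if len(true_counts) != len(noisy_counts) or len(true_counts) == 0:
        raise ValueError("count lists must be non-empty and of equal length")
    if delta <= 0:
        raise ValueError("delta must be positive")

    # Per-query absolute error |c(q) - c_hat(q)|.
    abs_errors = [abs(c - c_hat) for c, c_hat in zip(true_counts, noisy_counts)]
    n = len(abs_errors)

    mae = sum(abs_errors) / n
    # Relative error, with max(c(q), delta) guarding against tiny true counts.
    mre = sum(e / max(c, delta) for e, c in zip(abs_errors, true_counts)) / n
    return mae, mre


# Hypothetical toy query set: four prefixes with true and noisy counts.
mae, mre = count_query_errors([120, 40, 0, 15], [112.0, 47.0, 3.0, 15.0], delta=10.0)
print(f"MAE = {mae:.3f}, MRE = {mre:.4f}")
```

In this toy case the third query has a true count of zero, so its relative error is computed against `delta` instead of dividing by zero, which is the role the smoothing parameter plays in the text.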
(a) Hausdorff Distance vs. Privacy Budget (b) Mean Displacement vs. Privacy Budget
Fig. 3. Spatial fidelity comparison on T-Drive and GeoLife datasets.
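
The two spatial fidelity metrics named in Section 5.2.2 can be sketched as follows. The paper does not spell out their formulas, so this sketch assumes the standard symmetric Hausdorff distance and a point-aligned mean Euclidean displacement over planar coordinates; the function names and sample points are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _dist(p: Point, q: Point) -> float:
    """Euclidean distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def hausdorff_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance: the worst-case nearest-neighbour gap."""
    if not a or not b:
        raise ValueError("trajectories must be non-empty")

    def directed(src: List[Point], dst: List[Point]) -> float:
        # For each source point take its nearest destination point,
        # then keep the largest of those nearest distances.
        return max(min(_dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


def mean_displacement(original: List[Point], perturbed: List[Point]) -> float:
    """Average distance between aligned original and perturbed points."""
    if len(original) != len(perturbed) or not original:
        raise ValueError("trajectories must be non-empty and aligned")
    return sum(_dist(p, q) for p, q in zip(original, perturbed)) / len(original)


# Hypothetical trajectory and a perturbed copy of it.
original = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
perturbed = [(0.0, 0.1), (1.0, -0.2), (2.0, 0.4)]
print(hausdorff_distance(original, perturbed), mean_displacement(original, perturbed))
```

The nested loops make this Hausdorff computation quadratic in trajectory length, which is acceptable for a sketch but would likely need a spatial index for city-scale evaluation.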
(a) Classification Accuracy (b) F1-score
Fig. 4. Trajectory classification performance under varying 𝜀total on T-Drive and GeoLife datasets.
(a) Destination Prediction Accuracy (Top-1 Hit Rate) (b) Destination Prediction Mean Distance Error (km)
Fig. 5. Destination prediction accuracy and spatial deviation under varying $\varepsilon_{\text{total}}$ on T-Drive and GeoLife datasets.
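
The two destination-prediction metrics of Section 5.3.2 (top-1 hit rate and mean geodesic distance) can be sketched as below. The paper does not state which geodesic formula it uses; this sketch assumes the haversine great-circle distance on a spherical Earth, and all function names, the radius constant, and the sample values are illustrative assumptions.

```python
import math
from typing import Hashable, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to stay inside asin's domain under rounding error.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def top1_hit_rate(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Fraction of prefixes whose top-ranked region equals the true region."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, actual)) / len(actual)


def mean_distance_error_km(predicted: Sequence[LatLon], actual: Sequence[LatLon]) -> float:
    """Mean haversine distance between predicted and true destinations."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(predicted, actual))
    return total / len(actual)


# Hypothetical region ids and destination coordinates near Beijing.
print(top1_hit_rate([1, 2, 3, 4], [1, 2, 0, 4]))
print(mean_distance_error_km([(39.90, 116.40)], [(39.91, 116.41)]))
```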
5.4. RQ3: Parameter sensitivity analysis

This experiment investigates the effect of key parameters in AdaTraj-DP on the privacy–utility balance, focusing on two critical hyperparameters: the budget allocation ratio $\alpha$ and the sensitivity threshold $\theta_{\text{TFIDF}}$. All experiments are conducted with the total privacy budget $\varepsilon_{\text{total}} = 1.5$ on both the T-Drive and GeoLife datasets.

5.4.1. Effect of budget allocation ratio $\alpha$
The parameter $\alpha$ controls the distribution of the total privacy budget between the point-level perturbation and the hierarchical tree aggregation phases, where $\varepsilon_{\text{point}} = \alpha\,\varepsilon_{\text{total}}$ and $\varepsilon_{\text{tree}} = (1 - \alpha)\,\varepsilon_{\text{total}}$. A small $\alpha$ assigns more budget to aggregation, reducing hierarchical noise, whereas a large $\alpha$ increases point-level fidelity at the expense of tree consistency. We vary $\alpha$ from 0.1 to 0.9 and evaluate both data utility and model accuracy.
Fig. 6 presents the effect of $\alpha$ on count query error (MAE) and trajectory classification accuracy. An optimal trade-off is observed near $\alpha = 0.6$, where both the query error and model accuracy achieve near-balanced performance. When $\alpha < 0.4$, excessive noise in point perturbation causes degraded spatial precision, while $\alpha > 0.8$ reduces the reliability of aggregated counts in the prefix tree, highlighting the necessity of coordinated budget allocation.
In practice, the optimal $\alpha$ depends on the specific utility requirements. For applications prioritizing fine-grained point precision (e.g., destination prediction), a larger $\alpha$ (e.g., 0.6–0.7) is recommended to allocate more budget to the perturbation phase. Conversely, for range query tasks relying on aggregate statistics, a smaller $\alpha$ favors the hierarchical tree structure. An empirical strategy for parameter selection involves using a small, non-sensitive validation set to estimate the inflection point of the loss function. A balanced initialization of $\alpha = 0.6$ is recommended as a default setting, which prioritizes neither point-level perturbation nor structural aggregation excessively. To ensure privacy integrity, this validation set is constructed from public historical trajectory data (e.g., open-source T-Drive samples) or a disjoint subset of historical records that does not overlap with the private
dataset. This separation guarantees that the hyperparameter tuning process relies solely on public knowledge and does not consume the privacy budget allocated for the sensitive data.

Fig. 6. Impact of budget allocation ratio $\alpha$ on query utility and model performance at $\varepsilon_{\text{total}} = 1.5$.

5.4.2. Effect of sensitivity threshold $\theta_{\text{TFIDF}}$
The threshold $\theta_{\text{TFIDF}}$ determines how many trajectory points are classified as sensitive during the TF-IDF-based detection process. A smaller threshold labels more points as sensitive, resulting in stronger protection but higher noise magnitude. We vary $\theta_{\text{TFIDF}}$ from 0.6 to 1.2 and evaluate the mean displacement (MD) and destination prediction accuracy.
Fig. 7 depicts the variation of spatial fidelity and predictive utility under different $\theta_{\text{TFIDF}}$ values. As $\theta_{\text{TFIDF}}$ increases, the number of sensitive points decreases, leading to reduced perturbation intensity and smaller average displacement. However, excessively large $\theta_{\text{TFIDF}}$ weakens privacy coverage and slightly degrades downstream prediction accuracy. The optimal setting is observed around $\theta_{\text{TFIDF}} = 0.9$, balancing spatial accuracy with model generalization.

Fig. 7. Effect of the sensitivity threshold $\theta_{\text{TFIDF}}$ on spatial fidelity and predictive performance at $\varepsilon_{\text{total}} = 1.5$.

5.4.3. Generalization and parameter stability
In the ablation studies presented above, we observed that the framework's utility is responsive to variations in the budget allocation ratio $\alpha$ and sensitivity threshold $\theta_{\text{TFIDF}}$, particularly when these parameters approach the boundaries of their respective ranges. This sensitivity necessitates a discussion on the model's generalization capabilities across different data distributions.
While the framework exhibits sensitivity to extreme parameter variations, it is worth noting that the optimal operating points ($\alpha \approx 0.6$, $\theta_{\text{TFIDF}} \approx 0.9$) remain consistent across both the high-density T-Drive dataset and the sparse, diverse GeoLife dataset. This cross-dataset stability suggests that AdaTraj-DP is robust to heterogeneous spatial distributions, indicating that a standard parameter configuration can yield reliable performance without the need for exhaustive hyperparameter retuning for every new application scenario.

5.5. Scalability analysis

To address practical deployment concerns, particularly for city-wide scenarios, we analyze the scalability of AdaTraj-DP regarding both dataset volume (number of users $N$) and temporal duration (trajectory length $L$).
Scalability to Large-scale User Datasets. The computational complexity of AdaTraj-DP is dominated by the linear scanning of trajectory points. Specifically, the sensitivity detection and adaptive perturbation phases operate on each trajectory independently, with a time complexity of $O(N \cdot L)$. This independence allows for trivial parallelization across multiple processors, significantly reducing runtime on large-scale datasets. Furthermore, the hierarchical aggregation phase inserts encoded sequences into the prefix tree with a complexity of $O(N \cdot L)$, avoiding the quadratic $O(N^2)$ pairwise comparisons often required by clustering-based or $K$-anonymity approaches. Consequently, the runtime of AdaTraj-DP grows linearly with the number of users, indicating that the framework is scalable to large-scale spatiotemporal datasets typical of modern urban computing.
Robustness for Long Historical Trajectories. For long historical trajectories, the challenge lies in maintaining structural efficiency and data utility as the sequence length increases. AdaTraj-DP addresses this through two mechanisms:

(1) Efficient Encoding: The Hilbert space-filling curve maps high-dimensional spatial points into 1D integers via efficient bit-wise operations. Since the encoding complexity is constant per point, the computational cost scales linearly with the trajectory length, avoiding the performance bottlenecks often associated with complex sequence alignment methods.
(2) Depth-Robust Aggregation: Long trajectories naturally necessitate deeper prefix trees, which typically suffer from severe budget dilution at lower levels. AdaTraj-DP addresses this through its logarithmic layer-wise allocation (Eq. (12)), which dampens the noise increase rate relative to tree depth. This mechanism ensures that the tail ends of extended mobility sequences retain analytical utility, preventing the rapid signal degradation commonly observed in uniform allocation schemes.

Fig. 8. Computational cost decomposition of AdaTraj-DP across three key stages.

Empirical Efficiency Evaluation. To complement the theoretical complexity analysis, Fig. 8 presents the empirical runtime decomposition
of AdaTraj-DP on the T-Drive dataset. The total processing time is approximately 250 s. As observed, the TF-IDF Analysis phase constitutes the majority of the computational overhead (approx. 60%) due to the necessity of global statistical aggregation across the spatial grid. However, the core privacy mechanisms (Prefix Tree Construction and Perturbation) demonstrate high efficiency. Notably, the adaptive perturbation phase accounts for less than 10% of the total time, confirming that the granular noise injection introduces negligible latency.
This performance profile validates that AdaTraj-DP is well-suited for periodic batch publishing scenarios (e.g., releasing trajectory updates every 5–10 min for traffic monitoring). While the current execution time is sufficient for such batch-based near-real-time analytics, we acknowledge that strictly latency-critical streaming applications may require further optimization of the tree construction process. Nevertheless, for the targeted high-utility analysis tasks, this computational cost is a justifiable trade-off for the structural consistency provided by the framework.

6. Conclusion

This study presented AdaTraj-DP, an adaptive privacy-preserving framework for publishing trajectory data with differential privacy guarantees. The framework introduces context-aware sensitivity modeling and adaptive budget allocation to balance privacy protection and analytical utility in AI-based mobility analysis. By integrating personalized perturbation with hierarchical prefix-tree aggregation, AdaTraj-DP enables trajectory-level differential privacy while maintaining spatial fidelity and downstream model performance.
Future work will focus on extending AdaTraj-DP to support multi-modal trajectory data, integrating semantic and temporal context under unified privacy constraints. Additionally, to address the efficiency concerns in high-frequency streaming environments, we plan to investigate incremental tree update algorithms. This would allow the framework to handle real-time data streams with significantly lower latency while maintaining the established privacy guarantees.

CRediT authorship contribution statement

Yongxin Zhao: Writing – review & editing, Writing – original draft, Visualization, Validation, Methodology, Investigation, Data curation, Conceptualization. Chundong Wang: Writing – review & editing, Project administration, Methodology. Hao Lin: Visualization, Validation, Methodology. Xumeng Wang: Writing – review & editing, Methodology, Conceptualization. Yixuan Song: Methodology, Investigation, Conceptualization. Qiuyu Du: Investigation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Thanks to the National Key R&D Program of China (2023YFB2703900).

Appendix. Conversion from integer values to binary sequences

Our prefix tree construction necessitates the representation of each geographic coordinate as a character sequence. Although the Hilbert space-filling curve successfully transforms a two-dimensional coordinate $p_{i,j}$ into a one-dimensional integer $v_{i,j}$, this numerical value cannot be directly incorporated into a conventional prefix tree structure. Consequently, we implement an additional transformation phase that converts this integer into a binary sequence $s_{i,j}$ with fixed length.
This transformation is controlled by the Hilbert curve's order parameter, designated as $k$. When applying a Hilbert curve with order $k$, the two-dimensional space becomes divided into a $2^k \times 2^k$ cellular grid. To guarantee that every coordinate within dataset $D$ receives a distinct Hilbert index assignment, the order parameter must fulfill the condition $k \geq \lceil \log \sqrt{|D|} \rceil$. This configuration assigns each cell, including any coordinate it contains, to a unique integer within the interval $[0, (2^k)^2 - 1]$.
The binary sequence length, denoted $L_{\text{enc}}$, depends on the total count of representable integer values. Representing all $(2^k)^2 = 2^{2k}$ distinct values necessitates a binary sequence of length $L_{\text{enc}} = 2k$. The transformation consists of a direct conversion from integer $v_{i,j}$ to its $L_{\text{enc}}$-bit binary form, applying leading zero-padding when needed to maintain uniform length.
Consider the following illustration: assume a Hilbert curve with order $k = 8$. Under these conditions, the cellular count equals $(2^8)^2 = 65{,}536$, the integer value $v_{i,j}$ resides within the interval $[0, 65535]$, and the necessary binary sequence length becomes $L_{\text{enc}} = 2 \times 8 = 16$.
When coordinate $p_{i,j}$ maps to integer $v_{i,j} = 47593$, its 16-bit binary sequence representation becomes:

\[
s_{i,j} = \mathrm{Encode}(47593, 16) = \text{"1011100111101001"}. \tag{A.1}
\]

This sequence $s_{i,j}$ serves as the actual element for navigating and constructing the prefix tree. Individual bits within the sequence determine decisions at corresponding tree levels, establishing a multi-level spatial indexing structure. The selection of parameter $k$ (and consequently $L_{\text{enc}}$) represents a crucial design choice that mediates between spatial granularity and the prefix tree's dimensions and computational overhead.

Data availability

Data will be made available on request.

References

[1] W. Zhang, M. Li, R. Tandon, H. Li, Online location trace privacy: An information theoretic approach, IEEE Trans. Inf. Forensics Secur. 14 (1) (2018) 235–250.
[2] F. Jin, W. Hua, M. Francia, P. Chao, M.E. Orlowska, X. Zhou, A survey and experimental study on privacy-preserving trajectory data publishing, IEEE Trans. Knowl. Data Eng. 35 (6) (2022) 5577–5596.
[3] J. Liu, J. Chen, R. Law, S. Wang, L. Yang, Travel patterns and spatial structure: understanding winter tourism by trajectory data mining, Asia Pac. J. Tour. Res. 29 (11) (2024) 1351–1368.
[4] Z. Wu, X. Wang, Z. Huang, T. Zhang, M. Zhu, X. Huang, M. Xu, W. Chen, A utility-aware privacy-preserving method for trajectory publication, IEEE Trans. Vis. Comput. Graphics.
[5] S. Schestakov, S. Gottschalk, T. Funke, E. Demidova, RE-Trace: Re-identification of modified GPS trajectories, ACM Trans. Spat. Algorithms Syst. 10 (4) (2024) 1–28.
[6] C. Dwork, Differential privacy, in: International Colloquium on Automata, Languages, and Programming, Springer, 2006, pp. 1–12.
[7] Z. Yang, R. Wang, D. Wu, H. Wang, H. Song, X. Ma, Local trajectory privacy protection in 5G enabled industrial intelligent logistics, IEEE Trans. Ind. Inform. 18 (4) (2021) 2868–2876.
[8] Z. Shen, Y. Zhang, H. Wang, P. Liu, K. Liu, Y. Shen, BiGRU-DP: Improved differential privacy protection method for trajectory data publishing, Expert Syst. Appl. 252 (2024) 124264.
[9] Y. Zhao, C. Wang, Protecting privacy and enhancing utility: A novel approach for personalized trajectory data publishing using noisy prefix tree, Comput. Secur. 144 (2024) 103922.
[10] S. Yuan, D. Pi, X. Zhao, M. Xu, Differential privacy trajectory data protection scheme based on R-tree, Expert Syst. Appl. 182 (2021) 115215.
[11] W. Cheng, R. Wen, H. Huang, W. Miao, C. Wang, OPTDP: Towards optimal personalized trajectory differential privacy for trajectory data publishing, Neurocomputing 472 (2022) 201–211.
[12] N. Niknami, M. Abadi, F. Deldar, A fully spatial personalized differentially private mechanism to provide non-uniform privacy guarantees for spatial databases, Inf. Syst. 92 (2020) 101526.
[13] P. Liu, D. Wu, Z. Shen, H. Wang, K. Liu, Personalized trajectory privacy data publishing scheme based on differential privacy, Internet Things 25 (2024) 101074.
[14] W. Qardaji, W. Yang, N. Li, Differentially private grids for geospatial data, in: 2013 IEEE 29th International Conference on Data Engineering, ICDE, IEEE, 2013, pp. 757–768.
[15] G. Cormode, C. Procopiuc, D. Srivastava, E. Shen, T. Yu, Differentially private spatial decompositions, in: 2012 IEEE 28th International Conference on Data Engineering, IEEE, 2012, pp. 20–31.
[16] J. Hua, Y. Gao, S. Zhong, Differentially private publication of general time-serial trajectory data, in: 2015 IEEE Conference on Computer Communications, INFOCOM, IEEE, 2015, pp. 549–557.
[17] Z. Zhang, X. Xu, F. Xiao, LGAN-DP: A novel differential private publication mechanism of trajectory data, Future Gener. Comput. Syst. 141 (2023) 692–703.
[18] Y. Hu, Y. Du, Z. Zhang, Z. Fang, L. Chen, K. Zheng, Y. Gao, Real-time trajectory synthesis with local differential privacy, in: 2024 IEEE 40th International Conference on Data Engineering, ICDE, IEEE, 2024, pp. 1685–1698.
[19] R. Zhang, W. Ni, N. Fu, L. Hou, D. Zhang, Y. Zhang, DP-LTGAN: Differentially private trajectory publishing via Locally-aware Transformer-based GAN, Future Gener. Comput. Syst. 166 (2025) 107686.
[20] S. Jiao, J. Cheng, Z. Huang, T. Li, T. Xie, W. Chen, Y. Ma, X. Wang, DPKnob: A visual analysis approach to risk-aware formulation of differential privacy schemes for data query scenarios, Vis. Inform. 8 (3) (2024) 42–52.
[21] X. Wang, S. Jiao, C. Bryan, Defogger: A visual analysis approach for data exploration of sensitive data protected by differential privacy, IEEE Trans. Vis. Comput. Graphics 31 (1) (2025) 448–458, http://dx.doi.org/10.1109/TVCG.2024.3456304.
[22] R. Chen, B.C.M. Fung, B.C. Desai, Differentially private trajectory data publication, 2011, arXiv:1112.2020, URL https://arxiv.org/abs/1112.2020.
[23] C. Yin, J. Xi, R. Sun, J. Wang, Location privacy protection based on differential privacy strategy for big data in industrial internet of things, IEEE Trans. Ind. Inform. 14 (8) (2017) 3628–3636.
[24] Y. Zhao, C. Wang, E. Zhao, X. Zheng, H. Lin, PerTrajTree-DP: A personalized privacy-preserving trajectory publishing framework for trustworthy AI systems, in: Data Security and Privacy Protection, Springer Nature Singapore, Singapore, ISBN: 978-981-95-3182-0, 2026, pp. 57–75.
[25] T. Wang, Y. Tao, A. Gilad, A. Machanavajjhala, S. Roy, Explaining differentially private query results with dpxplain, Proc. VLDB Endow. 16 (12) (2023) 3962–3965.
[26] Z. Huang, J. Liu, D.G. Alabi, R.C. Fernandez, E. Wu, Saibot: A differentially private data search platform, Proc. VLDB Endow. (PVLDB) 16 (11) (2023) PVLDB 2023 demo / system paper.
[27] Y. Dai, J. Shao, C. Wei, D. Zhang, H.T. Shen, Personalized semantic trajectory privacy preservation through trajectory reconstruction, World Wide Web 21 (2018) 875–914.
[28] K. Zuo, R. Liu, J. Zhao, Z. Shen, F. Chen, Method for the protection of spatiotemporal correlation location privacy with semantic information, J. Xidian Univ. 49 (1) (2022) 67–77.
[29] S. Denisov, H.B. McMahan, J. Rush, A. Smith, A. Guha Thakurta, Improved differential privacy for sgd via optimal private linear operators on adaptive streams, Adv. Neural Inf. Process. Syst. 35 (2022) 5910–5924.
[30] H. Fang, X. Li, C. Fan, P. Li, Improved convergence of differential private sgd with gradient clipping, in: The Eleventh International Conference on Learning Representations, 2023.
[31] J. Fu, coauthors, DPSUR: Accelerating differentially private training via selective updates and release, Proc. VLDB Endow. (PVLDB) 17 (2024) PVLDB paper; PDF available from VLDB site.
[32] Y. Zheng, Trajectory data mining: an overview, ACM Trans. Intell. Syst. Technol. (TIST) 6 (3) (2015) 1–41.
[33] M.E. Andrés, N.E. Bordenabe, K. Chatzikokolakis, C. Palamidessi, Geo-indistinguishability: Differential privacy for location-based systems, in: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, 2013, pp. 901–914.
[34] W. Zhang, M. Li, R. Tandon, H. Li, Semantic-aware privacy-preserving online location trajectory data sharing, IEEE Trans. Inf. Forensics Secur. 17 (2022) 2292–2306.
[35] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, Y. Huang, T-drive: driving directions based on taxi trajectories, in: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2010, pp. 99–108.
[36] Y. Zheng, X. Xie, W.-Y. Ma, et al., GeoLife: A collaborative social networking service among user, location and trajectory, IEEE Data Eng. Bull. 33 (2) (2010) 32–39.
[37] Y. Zhao, C. Wang, L. Li, X. Wang, H. Lin, Z. Liu, TrajMamba: A multi-scale mamba-based framework for joint trajectory and road network representation learning, 2025, https://ssrn.com/abstract=5624451.
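
As a supplementary illustration of the Appendix, the integer-to-binary conversion of Eq. (A.1) can be sketched in a few lines of Python. The function names `encoding_length` and `encode` are hypothetical stand-ins for the paper's Encode operation, and the sketch assumes the Hilbert index has already been computed; it covers only the fixed-length, zero-padded binary formatting step.

```python
def encoding_length(order_k: int) -> int:
    """Bits needed for a Hilbert curve of order k: L_enc = 2k."""
    if order_k < 1:
        raise ValueError("Hilbert order must be a positive integer")
    return 2 * order_k


def encode(value: int, length: int) -> str:
    """Convert a Hilbert index to a zero-padded binary string of `length` bits."""
    if length < 1:
        raise ValueError("length must be positive")
    if not 0 <= value < (1 << length):
        raise ValueError("value does not fit in the requested number of bits")
    # The 'b' format code renders binary digits; '0{length}' pads with zeros.
    return format(value, f"0{length}b")


# Worked example from the Appendix: order k = 8, index 47593.
l_enc = encoding_length(8)
print(l_enc, encode(47593, l_enc))  # 16 1011100111101001
print(encode(5, l_enc))             # small indices are left-padded with zeros
```

Each character of the returned string would then drive one branching decision when the sequence is inserted into a binary prefix tree, which is why a fixed length matters: every coordinate reaches the same tree depth.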