Computer Standards & Interfaces 97 (2026) 104125
journal homepage: www.elsevier.com/locate/csi
AdaTraj-DP: An adaptive privacy framework for context-aware trajectory data publishing

Yongxin Zhao a, Chundong Wang a,b,∗, Hao Lin c,∗∗, Xumeng Wang d, Yixuan Song a, Qiuyu Du c

a Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China
b TianJin Police Institute, Tianjin, China
c College of Intelligent Science and Technology (College of Cyberspace Security), Inner Mongolia University of Technology, Inner Mongolia, China
d College of Cryptology and Cyber Science, Nankai University, Tianjin, China

ARTICLE INFO

Keywords: Differential privacy; Trustworthy AI; Trajectory data publishing; Personalized perturbation

ABSTRACT

Trajectory data are widely used in AI-based spatiotemporal analysis but raise privacy concerns due to their fine-grained nature and the potential for individual re-identification. Existing differential privacy (DP) approaches often apply uniform perturbation, which compromises spatial continuity, or adopt personalized mechanisms that overlook structural utility. This study introduces AdaTraj-DP, an adaptive differential privacy framework designed to balance trajectory-level protection and analytical utility. The framework combines context-aware sensitivity detection with hierarchical aggregation. Specifically, a dynamic sensitivity model evaluates privacy risks according to spatial density and semantic context, enabling adaptive allocation of privacy budgets. An adaptive perturbation mechanism then injects noise proportionally to the estimated sensitivity and represents trajectories through Hilbert-based encoding for prefix-oriented hierarchical aggregation with layer-wise budget distribution. Experiments conducted on the T-Drive and GeoLife datasets indicate that AdaTraj-DP maintains stable query accuracy, spatial consistency, and downstream analytical utility across varying privacy budgets while satisfying formal differential privacy guarantees.

1. Introduction

The proliferation of mobile devices, GPS sensors, and intelligent transportation infrastructures has resulted in the large-scale collection of spatiotemporal data. Such data serve as the foundation for numerous Location-Based Services (LBS), including navigation, ride-hailing, and urban planning [1,2]. Trajectory datasets record detailed sequences of individual movements, enabling a wide range of AI applications such as traffic forecasting, mobility prediction, and behavioral modeling. These applications have become indispensable for smart city management and autonomous systems, where the integrity and granularity of trajectory data directly affect analytical and decision-making accuracy.

Despite their utility, trajectory datasets raise critical privacy concerns for trustworthy AI. A single trajectory may expose an individual's home, workplace, or health-related locations, revealing sensitive behavioral patterns and social relationships [3,4]. Even after removing explicit identifiers, re-identification attacks can reconstruct personal traces with minimal auxiliary information [5]. Consequently, ensuring differential privacy for trajectory data has become essential to support reliable and ethically compliant AI development.

Differential Privacy (DP) [6] provides a rigorous mathematical guarantee against information leakage. However, its application to trajectory publishing introduces a persistent trade-off between privacy strength, data utility, and personalization, which conventional mechanisms fail to reconcile. Two primary gaps remain unresolved: (1) the tension between point-level perturbation and structural integrity; (2) the difficulty of adapting privacy budgets to varying contextual sensitivity. Early studies injected uniform Laplace noise into each location point [7,8], which protected individual coordinates but severely distorted the spatiotemporal correlation essential for route-level analysis. Subsequent hierarchical schemes based on prefix trees or space-filling curves [9,10] preserved aggregate statistics but relied on global, fixed privacy parameters, ignoring heterogeneous sensitivity across trajectories. Recent progress in Personalized Differential Privacy (PDP) [11–13] introduced adaptive noise based on semantic or frequency-based sensitivity, yet these methods typically lack integration with hierarchical

This article is part of a Special issue entitled: 'Secure AI' published in Computer Standards & Interfaces.
∗ Corresponding author at: Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, China.
∗∗ Corresponding author.
E-mail addresses: zyx4237@163.com (Y. Zhao), michael3769@163.com (C. Wang), suzukaze_aoba@126.com (H. Lin), wangxumeng@nankai.edu.cn (X. Wang), fykatb0824@163.com (Q. Du).
https://doi.org/10.1016/j.csi.2025.104125
Received 29 October 2025; Received in revised form 25 December 2025; Accepted 29 December 2025
Available online 30 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Y. Zhao et al. Computer Standards & Interfaces 97 (2026) 104125

aggregation, resulting in limited query accuracy and poor scalability for AI model training.

To bridge this gap, we propose AdaTraj-DP, an adaptive differentially private trajectory publishing framework that unifies context-aware sensitivity modeling and hierarchical aggregation. AdaTraj-DP introduces a two-stage protection mechanism. The first stage detects and quantifies sensitivity using contextual and statistical cues, allowing adaptive privacy budget assignment at the point level. The second stage encodes perturbed trajectories into a hierarchical prefix tree, applying layer-wise budget allocation to preserve structural consistency for downstream analysis. This design ensures both localized protection and global analytical utility, addressing the core limitations of prior DP-based trajectory mechanisms.

The main contributions of this work are summarized as follows:

(1) We propose AdaTraj-DP, an adaptive framework that unifies personalized perturbation and hierarchical aggregation. By establishing a mathematical link between local coordinate noise and global prefix-tree structures, the framework ensures that fine-grained point-level protection remains structurally consistent with trajectory-level differential privacy guarantees, enabling high-fidelity reconstruction for downstream tasks.
(2) We design a context-aware sensitivity model that combines spatial density with semantic context to guide adaptive budget allocation. This mechanism quantifies privacy risks at a granular level, enabling the dynamic adjustment of perturbation intensity to balance privacy protection and data fidelity.
(3) We implement a hierarchical aggregation scheme utilizing Hilbert spatial mapping and logarithmic layer-wise budget distribution. Experiments on the T-Drive and GeoLife datasets validate the framework's effectiveness in preserving query accuracy, spatial consistency, and AI model performance under varying privacy budgets.

2. Related work

Existing privacy-preserving trajectory publishing approaches can be broadly categorized into three classes: (1) foundational differential privacy models that ensure privacy but compromise trajectory continuity; (2) structural aggregation mechanisms that enhance data utility via hierarchical organization; and (3) personalized and adaptive privacy protection strategies that tailor noise to sensitivity but often lack integration with structural models. This section reviews these three directions and discusses recent advances that motivate AdaTraj-DP.

2.1. Foundational models for differentially private trajectory publishing

Differential Privacy (DP) [6] is the standard formalism for privacy-preserving data publication. Early approaches discretize continuous spatio-temporal domains and inject Laplace noise into cell counts or simple aggregates [14,15], but such methods often disrupt trajectory continuity and reduce utility for route-level analysis [7]. To address this, research has explored trajectory generalization and synthetic data generation under DP, including clustering-based generalization [16] and GAN-based synthetic trajectory models [17–19]. Work on DP-aware data exploration and visualization (e.g., DPKnob and Defogger) highlights the challenge of configuring DP mechanisms to balance utility and risk in interactive settings and motivates user- or task-guided privacy configuration [20,21].

2.2. Structural aggregation for utility enhancement

Hierarchical structures, such as prefix trees, Hilbert-encoded sequences, and spatial index trees, have been widely adopted to preserve aggregate query utility under DP. Early prefix-tree methods aggregate shared prefixes to reduce noise impact [22,23], while R-tree and quadtree variants support spatial indexing under privacy constraints [7,10]. Recent work improves spatial locality and query accuracy using Hilbert/Geohash encodings and adaptive tree strategies [9]. Zhao et al.'s PerTrajTree-DP further integrates point-level sensitivity with prefix-tree publishing to better support trustworthy AI analytics [24]. Complementary systems research on private data access and explanation (e.g., DPXPlain, Saibot) demonstrates practical techniques for supporting DP-protected analytics and helping users interpret noisy aggregates [25,26].

2.3. Personalized and adaptive privacy protection

Personalized Differential Privacy (PDP) methods adapt protection to varying point- or user-level sensitivity. Semantics-driven approaches use POI categories or external labels to identify sensitive locations [27,28], and movement-model-based frameworks like OPTDP estimate privacy risk from mobility patterns [11]. Statistical personalization methods infer sensitivity from dataset properties; for example, TF–IDF-based approaches quantify local importance and global rarity to guide budget allocation [12,13]. Interactive tools and visual analytics (DPKnob, Defogger) provide practical support for configuring heterogeneous DP strategies according to utility goals [20,21].

In parallel, recent advances in differentially private deep learning and private model training yield methods for improved utility in noisy training regimes (e.g., optimized DP-SGD variants, selective-update training, and heterogeneous-noise schemes) that can inform budget allocation and model-aware privacy strategies in trajectory publishing [25,26,29–31]. These works highlight opportunities to close the gap between personalized point-level protection and structural aggregation, motivating AdaTraj-DP's integration of context-aware sensitivity detection, adaptive perturbation, and hierarchical encoding to support AI-oriented downstream tasks.

3. Preliminaries

Trajectory Representation. A trajectory $T_i$ of user $u_i$ is a temporally ordered sequence of geo-referenced points [32]:

$$T_i = \{(p_{i,1}, t_{i,1}), (p_{i,2}, t_{i,2}), \ldots, (p_{i,L_i}, t_{i,L_i})\}, \tag{1}$$

where $p_{i,j} = (\mathrm{lat}_{i,j}, \mathrm{lon}_{i,j})$ denotes the spatial coordinate and $t_{i,j}$ is the timestamp. The trajectory dataset is denoted as $\mathcal{D} = \{T_1, T_2, \ldots, T_N\}$. Each point can be projected into a discrete grid cell $c_{i,j}$ for statistical analysis or further spatial encoding. The dimensionality and sampling irregularity of $\mathcal{D}$ result in high sparsity and heterogeneous sensitivity among locations, which requires adaptive privacy mechanisms.

Differential Privacy. Let $\mathcal{D}_1$ and $\mathcal{D}_2$ be two neighboring datasets differing in at most one trajectory. A randomized mechanism $\mathcal{M}$ satisfies $\varepsilon$-differential privacy if for any measurable subset $O$ in the output space:

$$\Pr[\mathcal{M}(\mathcal{D}_1) \in O] \le e^{\varepsilon} \Pr[\mathcal{M}(\mathcal{D}_2) \in O]. \tag{2}$$

The privacy budget $\varepsilon > 0$ controls the trade-off between privacy protection and data utility. Smaller $\varepsilon$ implies stronger privacy guarantees but larger perturbation noise.

For a numerical query $f : \mathcal{D} \to \mathbb{R}^k$ with $\ell_1$ sensitivity $\Delta f = \max_{\mathcal{D}_1, \mathcal{D}_2} \| f(\mathcal{D}_1) - f(\mathcal{D}_2) \|_1$, the Laplace mechanism adds independent noise drawn from the Laplace distribution:

$$\mathcal{M}(\mathcal{D}) = f(\mathcal{D}) + \mathrm{Lap}(\Delta f / \varepsilon). \tag{3}$$

This mechanism provides $\varepsilon$-differential privacy and is used in subsequent trajectory perturbation and aggregation processes.

Geographic Indistinguishability. For any two spatial points $x, x' \in \mathbb{R}^2$ and any reported location $z$, a mechanism $\mathcal{M}$ achieves $\varepsilon$-geographic indistinguishability if

$$\Pr[\mathcal{M}(x) = z] \le e^{\varepsilon \cdot d(x, x')} \Pr[\mathcal{M}(x') = z]. \tag{4}$$
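
As a minimal illustration of the Laplace mechanism in Eq. (3), the following Python sketch adds independent Laplace noise with scale $\Delta f / \varepsilon$ to a vector-valued query answer. The function name `laplace_mechanism`, the example cell counts, the assumed sensitivity of 1.0, and the use of NumPy's random generator are assumptions made for illustration; they are not taken from the AdaTraj-DP implementation.

```python
import numpy as np


def laplace_mechanism(true_answer, l1_sensitivity, epsilon, rng=None):
    """Return f(D) + Lap(sensitivity / epsilon), drawn independently per coordinate."""
    if epsilon <= 0 or l1_sensitivity <= 0:
        raise ValueError("epsilon and l1_sensitivity must be positive")
    rng = np.random.default_rng() if rng is None else rng
    answer = np.asarray(true_answer, dtype=float)
    scale = l1_sensitivity / epsilon  # smaller epsilon -> larger noise
    return answer + rng.laplace(loc=0.0, scale=scale, size=answer.shape)


# Hypothetical example: three grid-cell counts.
# The sensitivity of 1.0 is an assumption chosen for illustration only.
rng = np.random.default_rng(seed=0)
cell_counts = [120, 45, 7]
noisy_counts = laplace_mechanism(cell_counts, l1_sensitivity=1.0, epsilon=0.5, rng=rng)
```

With $\varepsilon = 0.5$ and $\Delta f = 1$, the noise scale is 2, so each released count deviates from the true count by 2 on average in absolute value.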
In Eq. (4), $d(x, x')$ is the Euclidean distance between $x$ and $x'$ [33]. This formulation extends differential privacy to continuous spatial domains and provides distance-dependent protection.

Hierarchical Aggregation Structure. Trajectory data exhibit hierarchical correlations that can be represented through prefix-based aggregation. Let each discretized or encoded trajectory be expressed as a sequence of spatial identifiers $S_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,L_i}]$. A prefix tree $\mathcal{T}$ organizes all trajectories in $\mathcal{D}$ by shared prefixes, where each node $v$ corresponds to a spatial prefix and maintains a count $c(v)$ of trajectories passing through it. The hierarchical form allows noise to be injected at multiple granularities while preserving global spatial consistency. The total privacy budget $\varepsilon_{\mathrm{tree}}$ is distributed across tree layers to balance upper-level accuracy and lower-level detail preservation.

Problem Definition. Given a trajectory dataset $\mathcal{D}$ consisting of $N$ users and a total privacy budget $\varepsilon_{\mathrm{total}}$, the objective is to design a mechanism $\mathcal{M}_{\mathrm{traj}}$ that releases a trajectory dataset $\tilde{\mathcal{D}} = \mathcal{M}_{\mathrm{traj}}(\mathcal{D})$ satisfying:

(1) $\mathcal{M}_{\mathrm{traj}}$ ensures $\varepsilon_{\mathrm{total}}$-differential privacy at the trajectory level;
(2) The released dataset $\tilde{\mathcal{D}}$ preserves statistical and structural properties essential for AI-based spatiotemporal analysis;
(3) The expected analytical error between results obtained from $\tilde{\mathcal{D}}$ and $\mathcal{D}$ remains bounded.

Let $f_{\mathrm{AI}}(\cdot)$ denote an AI model trained or evaluated on trajectory data. The utility preservation objective is formulated as

$$L_{\mathrm{utility}} = \mathbb{E}\left[ \| f_{\mathrm{AI}}(\tilde{\mathcal{D}}) - f_{\mathrm{AI}}(\mathcal{D}) \|_2^2 \right], \tag{5}$$

subject to $\tilde{\mathcal{D}}$ satisfying $\varepsilon_{\mathrm{total}}$-differential privacy. The goal is to minimize $L_{\mathrm{utility}}$ while maintaining formal privacy guarantees.

4. Proposed framework

Rapid development of AI-driven spatiotemporal analysis has increased the demand for high-quality trajectory data with strong privacy protection. Traditional differential privacy mechanisms often adopt fixed noise scales or uniform budget allocation, which can cause excessive utility degradation in dense areas or insufficient protection in sensitive regions. To address these limitations, this study proposes AdaTraj-DP, a framework that integrates adaptive personalized perturbation with hierarchical aggregation to achieve trajectory-level differential privacy while maintaining analytical utility for AI-based modeling.

As illustrated in Fig. 1, AdaTraj-DP operates in three main phases: (1) trajectory preprocessing and context-aware sensitivity detection; (2) adaptive personalized perturbation guided by local sensitivity and spatial density; (3) hierarchical aggregation using Hilbert encoding and dynamic layer-wise budget allocation.

Fig. 1. Framework of the proposed AdaTraj-DP scheme.

4.1. Context-aware sensitivity detection

Let $\mathcal{D} = \{T_1, \ldots, T_N\}$ denote the trajectory dataset after basic preprocessing. Each trajectory $T_i = \{(p_{i,1}, t_{i,1}), \ldots, (p_{i,L_i}, t_{i,L_i})\}$ consists of temporally ordered spatial points $p_{i,j} = (\mathrm{lat}_{i,j}, \mathrm{lon}_{i,j})$. The objective of this phase is to quantify the privacy sensitivity of each spatial point by combining statistical frequency and contextual semantics to guide subsequent adaptive perturbation.

Spatial Discretization. The continuous geographical domain is partitioned into a uniform grid of $G \times G$ cells. Each point $p_{i,j}$ is mapped to a corresponding grid cell $c_{i,j}$. This transformation converts raw coordinates into discrete spatial tokens, enabling frequency-based statistical analysis.

Context-aware Sensitivity Measure. For each cell $c_{i,j}$, a sensitivity score $S(c_{i,j})$ is defined as

$$S(c_{i,j}) = \mathrm{TF}(c_{i,j}, T_i) \cdot \mathrm{IDF}(c_{i,j}) \cdot \omega_c, \tag{6}$$

where $\mathrm{TF}(c_{i,j}, T_i) = \frac{\mathrm{count}(c_{i,j} \in T_i)}{L_i}$ represents the normalized local frequency of visits within trajectory $T_i$, and $\mathrm{IDF}(c_{i,j}) = \log \frac{|\mathcal{D}|}{|\{T_k \in \mathcal{D} : c_{i,j} \in T_k\}|}$ denotes the global rarity of the location across the dataset. The term $\omega_c$ is a contextual weighting coefficient that quantifies the semantic sensitivity of a location category. Following the semantic sensitivity hierarchy established in [34], we assign higher weights to privacy-critical categories (e.g., $\omega_{\mathrm{healthcare}} = 1.5$, $\omega_{\mathrm{residential}} = 1.2$) to enforce stricter protection, while assigning lower base weights to public infrastructure (e.g., $\omega_{\mathrm{road}} = 1.0$). These semantic categories are mapped from public map services (e.g., OpenStreetMap), ensuring that the sensitivity configuration relies solely on public knowledge and does not consume the private budget.

Normalization and Classification. To unify the sensitivity scale, all scores are normalized into $[0, 1]$:

$$\hat{S}(c_{i,j}) = \frac{S(c_{i,j}) - \min(S)}{\max(S) - \min(S)}. \tag{7}$$

Each point $p_{i,j}$ is then labeled as sensitive or non-sensitive according to a predefined threshold $\theta_S$:

$$\mathrm{label}(p_{i,j}) = \begin{cases} 1, & \text{if } \hat{S}(c_{i,j}) \ge \theta_S, \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$

The resulting annotated dataset is represented as $\mathcal{D}' = \{T_1', T_2', \ldots, T_N'\}$, where each $T_i'$ contains the points and corresponding sensitivity labels. The normalized score $\hat{S}(c_{i,j})$ serves as a continuous privacy indicator in the subsequent adaptive perturbation phase.

4.2. Adaptive personalized perturbation

This phase injects controlled noise into all trajectory points in $\mathcal{D}'$ to ensure trajectory-level differential privacy. All locations are perturbed to avoid inference risks arising from selective protection. The perturbation strength is adaptively adjusted based on the normalized sensitivity $\hat{S}(c_{i,j})$ and local spatial density, allowing the mechanism to preserve analytical fidelity while maintaining formal privacy guarantees.

Adaptive Privacy Budget Allocation. Each trajectory point $p_{i,j}$ is assigned an individual privacy budget $\varepsilon_{p_{i,j}}$ determined by both its sensitivity level and spatial context. Let $\rho(p_{i,j})$ denote the local point density around $p_{i,j}$ within a neighborhood radius $r$. The adaptive budget is defined as

$$\varepsilon_{p_{i,j}} = \varepsilon_{\max} - (\varepsilon_{\max} - \varepsilon_{\min}) \times \left( \alpha \hat{S}(c_{i,j}) + (1 - \alpha)(1 - \rho(p_{i,j})) \right), \tag{9}$$

where $\alpha \in [0, 1]$ controls the balance between sensitivity-based and density-based adaptation. A higher $\hat{S}(c_{i,j})$ or lower $\rho(p_{i,j})$ leads to a smaller $\varepsilon_{p_{i,j}}$, introducing stronger noise for privacy-critical or sparsely visited regions. The range $[\varepsilon_{\min}, \varepsilon_{\max}]$ defines the permissible privacy strength, ensuring stability across heterogeneous data distributions.

Two-Dimensional Laplace Perturbation. For each point $p_{i,j} = (\mathrm{lat}_{i,j}, \mathrm{lon}_{i,j})$, independent Laplace noise is applied to both coordinates according to the assigned privacy budget:

$$p'_{i,j} = \begin{cases} \mathrm{lat}_{i,j} + \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}}) \\ \mathrm{lon}_{i,j} + \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}}) \end{cases} \tag{10}$$
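
The sensitivity scoring of Eqs. (6)–(8), the budget rule of Eq. (9), and the coordinate perturbation of Eq. (10) can be sketched together in Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names, the toy cell identifiers, the dictionary `omega` of semantic weights, and the assumption that the density $\rho$ has already been rescaled to $[0, 1]$ are all hypothetical.

```python
import math
from collections import Counter

import numpy as np


def sensitivity_scores(cell_sequences, omega, default_omega=1.0):
    """Min-max normalized TF-IDF sensitivity (Eqs. 6-7) for every visited cell."""
    n_traj = len(cell_sequences)
    # Number of trajectories that visit each cell at least once.
    doc_freq = Counter(c for seq in cell_sequences for c in set(seq))
    raw = []
    for seq in cell_sequences:
        visits = Counter(seq)
        raw.append([
            (visits[c] / len(seq))              # TF: local visit frequency
            * math.log(n_traj / doc_freq[c])    # IDF: global rarity
            * omega.get(c, default_omega)       # semantic weight (hypothetical lookup)
            for c in seq
        ])
    flat = [s for row in raw for s in row]
    lo, span = min(flat), max(flat) - min(flat)
    return [[(s - lo) / span if span > 0 else 0.0 for s in row] for row in raw]


def adaptive_budget(s_hat, rho, eps_min, eps_max, alpha):
    """Per-point budget of Eq. (9); rho is assumed to lie in [0, 1]."""
    return eps_max - (eps_max - eps_min) * (alpha * s_hat + (1 - alpha) * (1 - rho))


def perturb_point(lat, lon, eps, rng):
    """Independent Laplace(0, 1/eps) noise on each coordinate (Eq. 10)."""
    return lat + rng.laplace(0.0, 1.0 / eps), lon + rng.laplace(0.0, 1.0 / eps)


# Toy data: two trajectories over hypothetical cell ids "a", "b", "c".
cells = [["a", "a", "b"], ["b", "c"]]
scores = sensitivity_scores(cells, omega={})
labels = [[int(s >= 0.8) for s in row] for row in scores]  # Eq. (8) with theta_S = 0.8
eps = adaptive_budget(scores[0][0], rho=0.0, eps_min=0.1, eps_max=1.0, alpha=0.5)
noisy = perturb_point(0.42, 0.17, eps, np.random.default_rng(0))
```

In this toy example, cell "a" is visited often within one trajectory and appears in no other, so it receives the highest normalized score; combined with zero local density, Eq. (9) assigns it the smallest budget $\varepsilon_{\min}$ and hence the strongest noise.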
Algorithm 1 Adaptive Personalized Perturbation under AdaTraj-DP
Input: Annotated dataset $\mathcal{D}'$, privacy range $[\varepsilon_{\min}, \varepsilon_{\max}]$, sensitivity scores $\hat{S}$, balance coefficient $\alpha$
Output: Perturbed dataset $\mathcal{D}''$
    $\mathcal{D}'' \leftarrow \emptyset$
    for each trajectory $T_i \in \mathcal{D}'$ do
        $T_i'' \leftarrow \emptyset$
        for each point $p_{i,j}$ in $T_i$ do
            Compute local density $\rho(p_{i,j})$
            $\varepsilon_{p_{i,j}} \leftarrow \varepsilon_{\max} - (\varepsilon_{\max} - \varepsilon_{\min}) \times (\alpha \hat{S}(c_{i,j}) + (1 - \alpha)(1 - \rho(p_{i,j})))$
            $n_{\mathrm{lat}} \sim \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}})$
            $n_{\mathrm{lon}} \sim \mathrm{Laplace}(0, 1/\varepsilon_{p_{i,j}})$
            $p'_{i,j} \leftarrow (\mathrm{lat}_{i,j} + n_{\mathrm{lat}}, \mathrm{lon}_{i,j} + n_{\mathrm{lon}})$
            Append $p'_{i,j}$ to $T_i''$
        end for
        Add $T_i''$ to $\mathcal{D}''$
    end for
    return $\mathcal{D}''$

Algorithm 2 Dynamic Hierarchical Aggregation under AdaTraj-DP
Input: Perturbed dataset $\mathcal{D}''$, total tree budget $\varepsilon_{\mathrm{tree}}$, height $h$, parameters $a$, $\gamma$, encoding length $L_{\mathrm{enc}}$
Output: Privacy-aware prefix tree $\mathcal{T}'$
    Initialize empty tree $\mathcal{T}$
    for each trajectory $T_i'' = \{p'_{i,1}, \ldots, p'_{i,L_i}\}$ in $\mathcal{D}''$ do
        Encode trajectory: $S_i \leftarrow [\mathrm{Encode1D}(H(p'_{i,1})), \ldots, \mathrm{Encode1D}(H(p'_{i,L_i}))]$
        Insert $S_i$ into $\mathcal{T}$ and increment node counts along each path
    end for
    for layer $i = 1$ to $h$ do
        Compute node count variance $\sigma_i^2$
        $\varepsilon_{\mathrm{level},i} \leftarrow \frac{\log(i + a)\,(1 + \gamma \sigma_i^2)}{\sum_{j=1}^{h} \log(j + a)\,(1 + \gamma \sigma_j^2)} \cdot \varepsilon_{\mathrm{tree}}$
        for each node $v$ at layer $i$ do
            $c'(v) \leftarrow c(v) + \mathrm{Laplace}(0, 1/\varepsilon_{\mathrm{level},i})$
            Update $c(v) \leftarrow c'(v)$
        end for
    end for
    return $\mathcal{T}'$
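
The steps of Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the Hilbert index uses the standard iterative coordinate-to-index conversion on a $2^{\mathrm{order}} \times 2^{\mathrm{order}}$ grid, each point is encoded with $2 \cdot \mathrm{order}$ bits, coordinates are assumed to be normalized to $[0, 1]$ (perturbed values are clipped to the grid), each tree layer is assumed to correspond to one position in the encoded sequence, and all function names are hypothetical.

```python
import math
from collections import defaultdict

import numpy as np


def hilbert_index(order, x, y):
    """Map grid cell (x, y) on a 2^order x 2^order grid to its Hilbert curve index."""
    n = 1 << order
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def encode(points, order):
    """Encode normalized (lat, lon) points as fixed-length binary strings."""
    n = 1 << order
    out = []
    for lat, lon in points:
        x = min(n - 1, max(0, int(lat * n)))  # clip noisy points to the grid
        y = min(n - 1, max(0, int(lon * n)))
        out.append(format(hilbert_index(order, x, y), f"0{2 * order}b"))
    return out


def build_prefix_counts(sequences, h):
    """counts[i][prefix] = number of trajectories sharing that prefix of length i + 1."""
    counts = [defaultdict(int) for _ in range(h)]
    for seq in sequences:
        for depth in range(min(h, len(seq))):
            counts[depth][tuple(seq[:depth + 1])] += 1
    return counts


def layer_budgets(counts, eps_tree, a=1.0, gamma=0.5):
    """Split eps_tree across layers following Eq. (12)."""
    weights = []
    for i, layer in enumerate(counts, start=1):
        var = float(np.var(list(layer.values()))) if layer else 0.0
        weights.append(math.log(i + a) * (1.0 + gamma * var))
    total = sum(weights)
    return [eps_tree * w / total for w in weights]


def perturb_counts(counts, budgets, rng):
    """Add Laplace(0, 1/eps_level) noise to every node count (Eq. 13)."""
    return [
        {prefix: c + rng.laplace(0.0, 1.0 / eps) for prefix, c in layer.items()}
        for layer, eps in zip(counts, budgets)
    ]


# Toy usage with two short, already-perturbed trajectories (hypothetical values).
trajs = [[(0.10, 0.10), (0.12, 0.11)], [(0.10, 0.10), (0.80, 0.75)]]
seqs = [encode(t, order=8) for t in trajs]
counts = build_prefix_counts(seqs, h=2)
budgets = layer_budgets(counts, eps_tree=1.0, a=1.0, gamma=0.5)
noisy_tree = perturb_counts(counts, budgets, np.random.default_rng(0))
```

Because the layer weights are normalized by their sum, the per-layer budgets always add up to $\varepsilon_{\mathrm{tree}}$ regardless of the values of $a$ and $\gamma$.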

The perturbed trajectory $T_i'' = \{p'_{i,1}, p'_{i,2}, \ldots, p'_{i,L_i}\}$ is constructed by replacing each original point with its perturbed counterpart. The complete differentially private dataset is denoted as $\mathcal{D}'' = \{T_1'', T_2'', \ldots, T_N''\}$. Algorithm 1 outlines the adaptive personalized perturbation procedure.

4.3. Hierarchical aggregation with dynamic budget allocation

This phase organizes the perturbed trajectories into a structured form for privacy-preserving analytical querying and AI model training. A hierarchical prefix tree is constructed from the encoded trajectories, where node counts are perturbed under a dynamically adjusted budget to preserve global consistency while mitigating noise propagation.

Spatial Encoding via Hilbert Curve. Each perturbed point $p'_{i,j} \in \mathcal{D}''$ is mapped into a one-dimensional integer value $v_{i,j}$ using a Hilbert space-filling curve $H(\cdot)$, ensuring spatial locality preservation:

$$v_{i,j} = H(p'_{i,j}). \tag{11}$$

Each integer value $v_{i,j}$ is then converted into a fixed-length binary string $s_{i,j}$ of length $L_{\mathrm{enc}}$, forming a discretized trajectory representation $S_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,L_i}]$. The set of all encoded trajectories $\{S_i\}$ constitutes the input to hierarchical aggregation. The technical details of this Hilbert-to-binary-string encoding, including the relationship between the curve's order and the string length, are elaborated in the Appendix.

Prefix Tree Construction. A prefix tree $\mathcal{T}$ is built from $\{S_i\}$, where each path from the root to a node $v$ represents a spatial prefix, and the node count $c(v)$ indicates the number of trajectories sharing that prefix. The maximum tree depth $h$ corresponds to the maximum trajectory length or encoding depth.

Dynamic Layer-wise Budget Allocation. The total privacy budget $\varepsilon_{\mathrm{tree}}$ is distributed across tree layers according to both layer depth and statistical variance. Let $\sigma_i^2$ denote the empirical variance of node counts at layer $i$. The adaptive allocation for layer $i$ is defined as

$$\varepsilon_{\mathrm{level},i} = \frac{\log(i + a) \cdot (1 + \gamma \sigma_i^2)}{\sum_{j=1}^{h} \log(j + a)\,(1 + \gamma \sigma_j^2)} \cdot \varepsilon_{\mathrm{tree}}, \tag{12}$$

where $a > 0$ is a smoothing parameter and $\gamma \ge 0$ controls the weight of variance-based adjustment. Adopting the logarithmic strategy from [9], the function $\log(i + a)$ is selected to smooth the budget decay across layers. Unlike linear or exponential allocation schemes, which might excessively penalize deeper layers and lead to significant information loss in fine-grained trajectories, the logarithmic term ensures that leaf nodes retain sufficient privacy budget to preserve local spatial details.

Differentially Private Node Perturbation. For each node $v$ at layer $i$, the sensitivity of its count query is $\Delta f = 1$. Laplace noise is applied according to its layer-wise budget:

$$c'(v) = c(v) + \mathrm{Laplace}\left(0, \frac{1}{\varepsilon_{\mathrm{level},i}}\right). \tag{13}$$

The resulting prefix tree $\mathcal{T}'$ with perturbed counts serves as a privacy-preserving hierarchical representation supporting aggregate analytics and AI-based trajectory modeling. Algorithm 2 summarizes the hierarchical aggregation process with dynamic budget adjustment.

4.4. Privacy analysis

The proposed AdaTraj-DP framework comprises two sequential privacy-preserving mechanisms: adaptive personalized perturbation (with budget $\varepsilon_{\mathrm{point}}$) and hierarchical aggregation (with budget $\varepsilon_{\mathrm{tree}}$). By the sequential composition theorem of differential privacy, the total privacy guarantee satisfies

$$\varepsilon_{\mathrm{total}} = \varepsilon_{\mathrm{point}} + \varepsilon_{\mathrm{tree}}. \tag{14}$$

Privacy of Adaptive Personalized Perturbation ($\varepsilon_{\mathrm{point}}$). The adaptive perturbation mechanism assigns an individual privacy budget $\varepsilon_{p_{i,j}}$ to each trajectory point $p_{i,j}$ derived from its normalized sensitivity $\hat{S}(c_{i,j})$ and local density $\rho(p_{i,j})$. To ensure rigorous privacy guarantees, it is assumed that the global weighting parameters (e.g., contextual weights $\omega_c$ and density thresholds) are computed from public sources, such as map topologies or non-sensitive historical statistics. This reliance on public metadata is a standard practice in privacy-preserving spatial publishing [14,33], ensuring that the sensitivity calibration process itself does not leak private information. Consequently, the allocated budget $\varepsilon_{p_{i,j}}$ depends solely on the characteristics of its corresponding trajectory $T_i$. Under this assumption:

(1) The assignment of $\varepsilon_{p_{i,j}}$ relies solely on local statistics within $T_i$ and public constants, which ensures independence among users.
(2) Each trajectory is processed through an independent Laplace mechanism. For any point $p_{i,j}$, the Laplace mechanism with scale $1/\varepsilon_{p_{i,j}}$ satisfies $\varepsilon_{p_{i,j}}$-differential privacy.
(3) Because the budgets are bounded within $[\varepsilon_{\min}, \varepsilon_{\max}]$, the overall privacy cost of this phase is dominated by the smallest allocated budget, and the worst-case (strongest) guarantee corresponds to $\varepsilon_{\min}$-DP for each point.
(4) By parallel composition across trajectories, the global privacy consumption of this phase is $\varepsilon_{\mathrm{point}} = \varepsilon_{\max}$, representing the maximum privacy loss incurred when the weakest noise is added.

Hence, the adaptive perturbation phase satisfies $\varepsilon_{\max}$-differential privacy.

Privacy of Hierarchical Aggregation ($\varepsilon_{\mathrm{tree}}$). The hierarchical aggregation mechanism constructs a prefix tree and perturbs its node counts with layer-specific noise calibrated by $\varepsilon_{\mathrm{level},i}$. Each trajectory affects exactly one node per layer, implying that the sensitivity of the count query at any layer is $\Delta f = 1$. Adding Laplace noise with scale $1/\varepsilon_{\mathrm{level},i}$ guarantees $\varepsilon_{\mathrm{level},i}$-DP for that layer.

Because the per-layer budgets $\varepsilon_{\mathrm{level},i}$ are partitioned from $\varepsilon_{\mathrm{tree}}$ according to

$$\sum_{i=1}^{h} \varepsilon_{\mathrm{level},i} = \varepsilon_{\mathrm{tree}}, \tag{15}$$

and the layers are sequentially composed along each trajectory path, the entire prefix tree synthesis mechanism satisfies $\varepsilon_{\mathrm{tree}}$-differential privacy. The dynamic allocation factor $(1 + \gamma \sigma_i^2)$ modifies the budget distribution without altering the total privacy bound, ensuring that the overall guarantee remains unchanged.

Overall Privacy Guarantee. Applying the sequential composition theorem to the two phases yields the total privacy protection level:

$$\varepsilon_{\mathrm{total}} = \varepsilon_{\max} + \varepsilon_{\mathrm{tree}}. \tag{16}$$

This ensures that AdaTraj-DP provides formal, trajectory-level differential privacy. The adaptive and hierarchical mechanisms jointly maintain consistent privacy guarantees while supporting utility-preserving analysis for AI-based spatiotemporal modeling.

5. Experimental evaluation

This section presents an extensive empirical evaluation of the proposed AdaTraj-DP framework. The experiments aim to validate both privacy preservation and analytical utility in AI-oriented trajectory publishing. Specifically, we address the following research questions:

• RQ1: How does the total privacy budget $\varepsilon_{\mathrm{total}}$ affect the analytical utility of the released trajectories?
• RQ2: How does AdaTraj-DP perform compared to state-of-the-art differential privacy mechanisms in terms of accuracy and computational efficiency?
• RQ3: What are the impacts of the adaptive parameters, including allocation ratio $\alpha$ and variance factor $\gamma$, on privacy–utility trade-offs?

5.1. Experimental setup

This subsection introduces the datasets, baseline methods, evaluation metrics, and parameter configurations used in the experiments.

5.1.1. Datasets

Experiments are primarily conducted on the widely used T-Drive dataset, which records GPS trajectories of 10,357 taxis in Beijing over seven days (February 2–8, 2008) [35]. It contains approximately 15 million spatial points after preprocessing. To further verify cross-domain robustness, we additionally include the GeoLife dataset [36], which comprises 17,621 trajectories from 182 users, covering both dense urban and sparse suburban mobility patterns.

Both datasets are preprocessed by: (1) removing sampling intervals exceeding 300 s; (2) filtering out trajectories shorter than 20 points; (3) normalizing all coordinates into a $[0, 1] \times [0, 1]$ grid to ensure scale comparability.

These datasets collectively provide both high-density and low-density spatial distributions, enabling a fair evaluation of the proposed context-aware sensitivity modeling.

5.1.2. Baseline methods

To demonstrate the advantages of AdaTraj-DP, we compare it with three representative baselines, each reflecting a distinct privacy design paradigm:

• HA-Tree [9]: A hierarchical aggregation method based on Hilbert mapping and fixed logarithmic budget allocation, representing state-of-the-art static DP trees.
• TFIDF-DP [13]: A personalized perturbation method using TF–IDF-based sensitivity scoring without hierarchical structure, corresponding to point-level DP only.
• QJLP (LDP) [7]: A local differential privacy baseline where each trajectory is perturbed independently on the client side.
• AdaTraj-DP (Ours): The proposed adaptive framework that combines context-aware sensitivity detection, adaptive perturbation, and dynamic hierarchical aggregation.

5.1.3. Evaluation metrics

Performance is evaluated from three complementary perspectives:

Data Utility. We adopt three quantitative metrics: Mean Absolute Error (MAE), Mean Relative Error (MRE), and Hausdorff Distance (HD). MAE and MRE evaluate accuracy for range-count queries on perturbed trajectories, while HD measures spatial fidelity between original and released datasets.

Model Utility. To align with AI-oriented evaluation, we train a downstream trajectory classification model based on a lightweight Mamba encoder [37]. The model predicts driver ID from trajectory segments, and classification accuracy on the perturbed data reflects end-task utility ($U_{\mathrm{cls}}$).

Computational Efficiency. We report total runtime ($T_{\mathrm{total}}$) from preprocessing to privacy-protected publication, including all three phases of AdaTraj-DP.

5.1.4. Parameter configuration

Unless otherwise stated, experiments use the following default configuration: the total privacy budget $\varepsilon_{\mathrm{total}}$ is split according to an allocation ratio $\alpha$, where $\alpha \in [0.3, 0.7]$ controls the portion used for adaptive perturbation ($\varepsilon_{\mathrm{point}}$), and $(1 - \alpha)$ the portion used for hierarchical aggregation ($\varepsilon_{\mathrm{tree}}$):

$$\varepsilon_{\mathrm{point}} = \alpha \varepsilon_{\mathrm{total}}, \quad \varepsilon_{\mathrm{tree}} = (1 - \alpha) \varepsilon_{\mathrm{total}}. \tag{17}$$

We vary $\varepsilon_{\mathrm{total}}$ from 0.5 to 3.0 to investigate the privacy–utility trade-off.

The variance factor $\gamma$ controlling dynamic budget adaptation is selected from $\{0, 0.2, 0.5, 1.0\}$, and the hierarchical smoothing parameter is set to $a = 1.0$. The sensitivity threshold $\theta_S$ for classifying sensitive points is chosen from $\{0.6, 0.7, 0.8, 0.9\}$. The personalized budget range is fixed at $[\varepsilon_{\min}, \varepsilon_{\max}] = [0.1, 1.0]$.

To ensure comparability, all methods share identical grid resolution ($G = 128$) and Hilbert encoding length ($L_{\mathrm{enc}} = 16$). All experiments are implemented in Python 3.8 with PyTorch 2.4 on an NVIDIA RTX 4090 GPU.

5.2. RQ1: Data utility evaluation

This experiment evaluates how AdaTraj-DP preserves the analytical utility of published trajectories under different privacy budgets. All
|
||
|
||
evaluations are conducted on both the T-Drive and GeoLife datasets, covering dense and sparse mobility scenarios to ensure cross-domain consistency.

5.2.1. Accuracy of trajectory count queries

We evaluate the ability of each method to answer prefix-based count queries accurately. For each dataset, a query set consisting of 1000 random trajectory prefixes with lengths between 4 and 8 is selected. Let 𝑐(𝑞) denote the true count of trajectories matching prefix 𝑞 ∈ 𝒬, and 𝑐̂(𝑞) be the noisy count returned by the mechanism. The data utility is quantified using Mean Absolute Error (MAE) and Mean Relative Error (MRE), defined as:

$$
\mathrm{MAE} = \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \left| c(q) - \hat{c}(q) \right|, \qquad
\mathrm{MRE} = \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \frac{\left| c(q) - \hat{c}(q) \right|}{\max(c(q), \delta)} \tag{18}
$$

where 𝛿 is a smoothing parameter (set to 1% of the total dataset size) to prevent division by zero for small counts. The results are averaged over ten repetitions with independent noise realizations.

Effect of Privacy Budget 𝜀total. Figs. 2(a) and 2(b) illustrate the quantitative relationship between privacy strength and data utility. All methods exhibit a convex error decay curve as 𝜀total increases from 0.5 to 3.0, reflecting the fundamental differential privacy trade-off.

In the strict privacy regime (𝜀total ∈ [0.5, 1.5]), our method achieves the steepest marginal reduction in MAE, indicating a high return on privacy budget investment. Specifically, when 𝜀total increases from 0.5 to 1.0, AdaTraj-DP reduces the MAE by approximately 45.3% (from 18.1 to 9.9), whereas the second-best baseline, HA-Tree, only achieves a 31.4% reduction. This quantitative gap demonstrates that AdaTraj-DP yields a significantly higher marginal utility gain for every unit of privacy budget expended compared to static hierarchical structures.

Fig. 2. Trajectory count query accuracy under varying 𝜀total on both datasets. Panels: (a) MAE of Count Queries; (b) MRE of Count Queries.

5.2.2. Preservation of spatial distribution

Spatial fidelity evaluates the geometric similarity between the original and perturbed trajectories. We use two complementary metrics: the Hausdorff Distance (HD) for worst-case deviation and the Mean Displacement (MD) for average positional distortion.

Effect of Privacy Budget 𝜀total. Fig. 3 and Table 1 summarize the spatial accuracy across privacy levels. For both T-Drive and GeoLife datasets, AdaTraj-DP consistently achieves smaller deviations, demonstrating its robustness across data densities and spatial patterns. The sensitivity-guided perturbation preserves local consistency, while adaptive budget redistribution reduces distortion in dense urban regions.

Table 1
Spatial fidelity comparison (average over T-Drive and GeoLife datasets). Lower values indicate higher spatial accuracy.

| 𝜀total | HD: AdaTraj-DP | HD: Best Baseline | MD: AdaTraj-DP | MD: Best Baseline |
|---|---|---|---|---|
| 0.5 | 0.152 | 0.171 (HA-Tree) | 0.098 | 0.113 (HA-Tree) |
| 1.0 | 0.096 | 0.127 (HA-Tree) | 0.069 | 0.087 (HA-Tree) |
| 1.5 | 0.089 | 0.125 (TFIDF-DP) | 0.063 | 0.088 (TFIDF-DP) |
| 2.0 | 0.083 | 0.118 (TFIDF-DP) | 0.059 | 0.083 (TFIDF-DP) |
| 3.0 | 0.079 | 0.130 (QJLP) | 0.056 | 0.094 (QJLP) |

HD = Hausdorff Distance; MD = Mean Displacement.

Overall, AdaTraj-DP demonstrates consistent spatial and statistical accuracy across both datasets, validating its generalizability to heterogeneous mobility distributions.

5.3. RQ2: Model utility evaluation

This experiment evaluates how the differentially private trajectories generated by AdaTraj-DP retain their utility for AI-based downstream tasks. Two representative learning tasks are considered: (1) trajectory classification, which predicts the semantic category of a movement sequence; (2) destination prediction, which estimates the likely endpoint of an ongoing trajectory. These tasks are evaluated on the T-Drive and GeoLife datasets to reflect both dense and sparse urban mobility environments.

5.3.1. Trajectory classification

A hierarchical Transformer-based model with positional encoding is trained on the published trajectories to perform multi-class trajectory classification. The model architecture follows a standard encoder setup with three attention layers and a hidden size of 256. Each experiment is repeated five times under independent noise realizations, and the average classification accuracy and macro F1-score are reported. The total privacy budget 𝜀total is varied from 0.5 to 3.0.

Effect of Privacy Budget 𝜀total. Figs. 4(a) and 4(b) illustrate the influence of 𝜀total on model performance. As the privacy budget increases, both accuracy and F1-score improve across all methods. AdaTraj-DP consistently maintains the highest model utility on both datasets, demonstrating that adaptive sensitivity control effectively preserves discriminative features. The hierarchical tree representation mitigates local noise accumulation, supporting stable model convergence.

5.3.2. Destination prediction

To evaluate predictive consistency, a sequence-to-sequence neural decoder is trained to predict the destination region of each trajectory prefix. Prediction accuracy is measured by the top-1 hit rate, while spatial accuracy is quantified by the mean geodesic distance between predicted and true destinations.

Effect of Privacy Budget 𝜀total. Figs. 5(a) and 5(b) illustrate the results of destination prediction across both datasets. AdaTraj-DP maintains stable predictive performance even under strict privacy constraints (𝜀total < 1.0), consistently outperforming fixed-budget baselines that cannot adapt to local sensitivity variations. As the privacy budget increases, the prediction accuracy steadily improves, while the mean spatial deviation between predicted and true destinations decreases. This demonstrates that adaptive perturbation and hierarchical encoding together preserve mobility semantics and ensure downstream models can effectively capture trajectory intent despite injected noise.
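The two destination-prediction metrics named in Section 5.3.2 (top-1 hit rate and mean geodesic distance) can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function names are hypothetical, region labels are assumed to be hashable identifiers such as grid-cell indices, and the geodesic distance is approximated with the haversine formula on a spherical Earth, which is an assumption since the paper does not state how the distance is computed.

```python
import math

# Mean Earth radius in km; a spherical-Earth approximation (assumption).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def top1_hit_rate(pred_regions, true_regions):
    """Fraction of trajectories whose top-ranked predicted region is the true one."""
    if len(pred_regions) != len(true_regions) or not true_regions:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == t for p, t in zip(pred_regions, true_regions))
    return hits / len(true_regions)


def mean_geodesic_error_km(pred_points, true_points):
    """Mean haversine distance between predicted and true (lat, lon) destinations."""
    if len(pred_points) != len(true_points) or not true_points:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        haversine_km(p[0], p[1], t[0], t[1])
        for p, t in zip(pred_points, true_points)
    )
    return total / len(true_points)
```

In use, `top1_hit_rate` would take the predicted and true region identifiers per trajectory prefix, while `mean_geodesic_error_km` would take representative coordinates (for example, region centroids, again an assumption) for the same predictions.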
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Fig. 3. Spatial fidelity comparison on T-Drive and GeoLife datasets. Panels: (a) Hausdorff Distance vs. Privacy Budget; (b) Mean Displacement vs. Privacy Budget.
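The two spatial fidelity metrics reported in Fig. 3 and Table 1 can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: points are treated as planar (x, y) pairs with Euclidean distance, the Hausdorff Distance is the standard symmetric form, and the Mean Displacement is assumed to pair the i-th original point with the i-th perturbed point, a pairing the paper does not spell out. All function names are hypothetical.

```python
import math


def _dist(p, q):
    """Euclidean distance between two planar points (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point in b."""
    return max(min(_dist(p, q) for q in b) for p in a)


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: worst-case deviation between two point sets."""
    if not a or not b:
        raise ValueError("trajectories must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def mean_displacement(original, perturbed):
    """Average distance between index-aligned original and perturbed points."""
    if len(original) != len(perturbed) or not original:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(_dist(p, q) for p, q in zip(original, perturbed))
    return total / len(original)
```

The brute-force directed distance is quadratic in trajectory length, which is adequate for a per-trajectory illustration; a spatial index would be the usual choice for large point sets.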
|
||
|
||
|
||
|
||
|
||
Fig. 4. Trajectory classification performance under varying 𝜀total on T-Drive and GeoLife datasets. Panels: (a) Classification Accuracy; (b) F1-score.
|
||
|
||
|
||
|
||
|
||
Fig. 5. Destination prediction accuracy and spatial deviation under varying 𝜀total on T-Drive and GeoLife datasets. Panels: (a) Destination Prediction Accuracy (Top-1 Hit Rate); (b) Destination Prediction Mean Distance Error (km).
|
||
|
||
|
||
5.4. RQ3: Parameter sensitivity analysis

This experiment investigates the effect of key parameters in AdaTraj-DP on privacy–utility balance, focusing on two critical hyperparameters: the budget allocation ratio 𝛼 and the sensitivity threshold 𝜃TFIDF. All experiments are conducted with the total privacy budget 𝜀total = 1.5 on both the T-Drive and GeoLife datasets.

5.4.1. Effect of budget allocation ratio 𝛼

The parameter 𝛼 controls the distribution of the total privacy budget between the point-level perturbation and the hierarchical tree aggregation phases, where 𝜀point = 𝛼𝜀total and 𝜀tree = (1 − 𝛼)𝜀total. A small 𝛼 assigns more budget to aggregation, reducing hierarchical noise, whereas a large 𝛼 increases point-level fidelity at the expense of tree consistency. We vary 𝛼 from 0.1 to 0.9 and evaluate both data utility and model accuracy.

Fig. 6 presents the effect of 𝛼 on count query error (MAE) and trajectory classification accuracy. An optimal trade-off is observed near 𝛼 = 0.6, where both the query error and model accuracy achieve near-balanced performance. When 𝛼 < 0.4, excessive noise in point perturbation causes degraded spatial precision, while 𝛼 > 0.8 reduces the reliability of aggregated counts in the prefix tree, highlighting the necessity of coordinated budget allocation.

In practice, the optimal 𝛼 depends on the specific utility requirements. For applications prioritizing fine-grained point precision (e.g., destination prediction), a larger 𝛼 (e.g., 0.6–0.7) is recommended to allocate more budget to the perturbation phase. Conversely, for range query tasks relying on aggregate statistics, a smaller 𝛼 favors the hierarchical tree structure. An empirical strategy for parameter selection involves using a small, non-sensitive validation set to estimate the inflection point of the loss function. A balanced initialization of 𝛼 = 0.6 is recommended as a default setting, which prioritizes neither point-level perturbation nor structural aggregation excessively. To ensure privacy integrity, this validation set is constructed from public historical trajectory data (e.g., open-source T-Drive samples) or a disjoint subset of historical records that does not overlap with the private
dataset. This separation guarantees that the hyperparameter tuning process relies solely on public knowledge and does not consume the privacy budget allocated for the sensitive data.

Fig. 6. Impact of budget allocation ratio 𝛼 on query utility and model performance at 𝜀total = 1.5.

5.4.2. Effect of sensitivity threshold 𝜃TFIDF

The threshold 𝜃TFIDF determines how many trajectory points are classified as sensitive during the TF–IDF-based detection process. A smaller threshold labels more points as sensitive, resulting in stronger protection but higher noise magnitude. We vary 𝜃TFIDF from 0.6 to 1.2 and evaluate the mean displacement (MD) and destination prediction accuracy.

Fig. 7 depicts the variation of spatial fidelity and predictive utility under different 𝜃TFIDF values. As 𝜃TFIDF increases, the number of sensitive points decreases, leading to reduced perturbation intensity and smaller average displacement. However, excessively large 𝜃TFIDF weakens privacy coverage and slightly degrades downstream prediction accuracy. The optimal setting is observed around 𝜃TFIDF = 0.9, balancing spatial accuracy with model generalization.

Fig. 7. Effect of the sensitivity threshold 𝜃TFIDF on spatial fidelity and predictive performance at 𝜀total = 1.5.

5.4.3. Generalization and parameter stability

In the ablation studies presented above, we observed that the framework’s utility is responsive to variations in the budget allocation ratio 𝛼 and sensitivity threshold 𝜃TFIDF, particularly when these parameters approach the boundaries of their respective ranges. This sensitivity necessitates a discussion on the model’s generalization capabilities across different data distributions.

While the framework exhibits sensitivity to extreme parameter variations, it is worth noting that the optimal operating points (𝛼 ≈ 0.6, 𝜃TFIDF ≈ 0.9) remain consistent across both the high-density T-Drive dataset and the sparse, diverse GeoLife dataset. This cross-dataset stability suggests that AdaTraj-DP is robust to heterogeneous spatial distributions, indicating that a standard parameter configuration can yield reliable performance without the need for exhaustive hyperparameter retuning for every new application scenario.

5.5. Scalability analysis

To address practical deployment concerns, particularly for city-wide scenarios, we analyze the scalability of AdaTraj-DP regarding both dataset volume (number of users 𝑁) and temporal duration (trajectory length 𝐿).

Scalability to Large-scale User Datasets. The computational complexity of AdaTraj-DP is dominated by the linear scanning of trajectory points. Specifically, the sensitivity detection and adaptive perturbation phases operate on each trajectory independently, with a time complexity of 𝑂(𝑁 ⋅ 𝐿). This independence allows for trivial parallelization across multiple processors, significantly reducing runtime on large-scale datasets. Furthermore, the hierarchical aggregation phase inserts encoded sequences into the prefix tree with a complexity of 𝑂(𝑁 ⋅ 𝐿), avoiding the quadratic 𝑂(𝑁²) pairwise comparisons often required by clustering-based or 𝐾-anonymity approaches. Consequently, the runtime of AdaTraj-DP grows linearly with the number of users, indicating that the framework is scalable to large-scale spatiotemporal datasets typical of modern urban computing.

Robustness for Long Historical Trajectories. For long historical trajectories, the challenge lies in maintaining structural efficiency and data utility as the sequence length increases. AdaTraj-DP addresses this through two mechanisms:

(1) Efficient Encoding: The Hilbert space-filling curve maps high-dimensional spatial points into 1D integers via efficient bitwise operations. Since the encoding complexity is constant per point, the computational cost scales linearly with the trajectory length, avoiding the performance bottlenecks often associated with complex sequence alignment methods.
(2) Depth-Robust Aggregation: Long trajectories naturally necessitate deeper prefix trees, which typically suffer from severe budget dilution at lower levels. AdaTraj-DP addresses this through its logarithmic layer-wise allocation (Eq. (12)), which dampens the noise increase rate relative to tree depth. This mechanism ensures that the tail ends of extended mobility sequences retain analytical utility, preventing the rapid signal degradation commonly observed in uniform allocation schemes.

Fig. 8. Computational cost decomposition of AdaTraj-DP across three key stages.

Empirical Efficiency Evaluation. To complement the theoretical complexity analysis, Fig. 8 presents the empirical runtime decomposition
of AdaTraj-DP on the T-Drive dataset. The total processing time is approximately 250 s. As observed, the TF–IDF Analysis phase constitutes the majority of the computational overhead (approx. 60%) due to the necessity of global statistical aggregation across the spatial grid. However, the core privacy mechanisms (Prefix Tree Construction and Perturbation) demonstrate high efficiency. Notably, the adaptive perturbation phase accounts for less than 10% of the total time, confirming that the granular noise injection introduces negligible latency. This performance profile validates that AdaTraj-DP is well-suited for periodic batch publishing scenarios (e.g., releasing trajectory updates every 5–10 min for traffic monitoring). While the current execution time is sufficient for such batch-based near-real-time analytics, we acknowledge that strictly latency-critical streaming applications may require further optimization of the tree construction process. Nevertheless, for the targeted high-utility analysis tasks, this computational cost is a justifiable trade-off for the structural consistency provided by the framework.

6. Conclusion

This study presented AdaTraj-DP, an adaptive privacy-preserving framework for publishing trajectory data with differential privacy guarantees. The framework introduces context-aware sensitivity modeling and adaptive budget allocation to balance privacy protection and analytical utility in AI-based mobility analysis. By integrating personalized perturbation with hierarchical prefix-tree aggregation, AdaTraj-DP enables trajectory-level differential privacy while maintaining spatial fidelity and downstream model performance.

Future work will focus on extending AdaTraj-DP to support multimodal trajectory data, integrating semantic and temporal context under unified privacy constraints. Additionally, to address the efficiency concerns in high-frequency streaming environments, we plan to investigate incremental tree update algorithms. This would allow the framework to handle real-time data streams with significantly lower latency while maintaining the established privacy guarantees.

CRediT authorship contribution statement

Yongxin Zhao: Writing – review & editing, Writing – original draft, Visualization, Validation, Methodology, Investigation, Data curation, Conceptualization. Chundong Wang: Writing – review & editing, Project administration, Methodology. Hao Lin: Visualization, Validation, Methodology. Xumeng Wang: Writing – review & editing, Methodology, Conceptualization. Yixuan Song: Methodology, Investigation, Conceptualization. Qiuyu Du: Investigation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Thanks to the National Key R&D Program of China (2023YFB2703900).

Appendix. Conversion from integer values to binary sequences

Our prefix tree construction necessitates the representation of each geographic coordinate as a character sequence. Although the Hilbert space-filling curve successfully transforms a two-dimensional coordinate 𝑝′𝑖,𝑗 into a one-dimensional integer 𝑣𝑖,𝑗, this numerical value cannot be directly incorporated into a conventional prefix tree structure. Consequently, we implement an additional transformation phase that converts this integer into a binary sequence 𝑠𝑖,𝑗 with fixed length.

This transformation is controlled by the Hilbert curve’s order parameter, designated as 𝑘. When applying a Hilbert curve with order 𝑘, the two-dimensional space becomes divided into a (2^𝑘) × (2^𝑘) cellular grid. To guarantee that every coordinate within dataset 𝐷 receives a distinct Hilbert index assignment, the order parameter must fulfill the condition 𝑘 ≥ ⌈log₂ √|𝐷|⌉. This configuration assigns each cell, including any coordinate it contains, to a unique integer within the interval [0, (2^𝑘)² − 1].

The binary sequence length, denoted 𝐿enc, depends on the total count of representable integer values. Representing all (2^𝑘)² = 2^(2𝑘) distinct values necessitates a binary sequence of length 𝐿enc = 2𝑘. The transformation consists of a direct conversion from integer 𝑣𝑖,𝑗 to its 𝐿enc-bit binary form, applying leading zero-padding when needed to maintain uniform length.

Consider the following illustration: assume a Hilbert curve with order 𝑘 = 8. Under these conditions: The cellular count equals (2^8)² = 65,536. The integer value 𝑣𝑖,𝑗 resides within the interval [0, 65535]. The necessary binary sequence length becomes 𝐿enc = 2 × 8 = 16.

When coordinate 𝑝′𝑖,𝑗 maps to integer 𝑣𝑖,𝑗 = 47593, its 16-bit binary sequence representation becomes:

𝑠𝑖,𝑗 = Encode(47593, 16) = "1011100111101001". (A.1)

This sequence 𝑠𝑖,𝑗 serves as the actual element for navigating and constructing the prefix tree. Individual bits within the sequence determine decisions at corresponding tree levels, establishing a multi-level spatial indexing structure. The selection of parameter 𝑘 (and consequently 𝐿enc) represents a crucial design choice that mediates between spatial granularity and the prefix tree’s dimensions and computational overhead.

Data availability

Data will be made available on request.

References

[1] W. Zhang, M. Li, R. Tandon, H. Li, Online location trace privacy: An information theoretic approach, IEEE Trans. Inf. Forensics Secur. 14 (1) (2018) 235–250.
[2] F. Jin, W. Hua, M. Francia, P. Chao, M.E. Orlowska, X. Zhou, A survey and experimental study on privacy-preserving trajectory data publishing, IEEE Trans. Knowl. Data Eng. 35 (6) (2022) 5577–5596.
[3] J. Liu, J. Chen, R. Law, S. Wang, L. Yang, Travel patterns and spatial structure: understanding winter tourism by trajectory data mining, Asia Pac. J. Tour. Res. 29 (11) (2024) 1351–1368.
[4] Z. Wu, X. Wang, Z. Huang, T. Zhang, M. Zhu, X. Huang, M. Xu, W. Chen, A utility-aware privacy-preserving method for trajectory publication, IEEE Trans. Vis. Comput. Graphics.
[5] S. Schestakov, S. Gottschalk, T. Funke, E. Demidova, RE-Trace: Re-identification of modified GPS trajectories, ACM Trans. Spat. Algorithms Syst. 10 (4) (2024) 1–28.
[6] C. Dwork, Differential privacy, in: International Colloquium on Automata, Languages, and Programming, Springer, 2006, pp. 1–12.
[7] Z. Yang, R. Wang, D. Wu, H. Wang, H. Song, X. Ma, Local trajectory privacy protection in 5G enabled industrial intelligent logistics, IEEE Trans. Ind. Inform. 18 (4) (2021) 2868–2876.
[8] Z. Shen, Y. Zhang, H. Wang, P. Liu, K. Liu, Y. Shen, BiGRU-DP: Improved differential privacy protection method for trajectory data publishing, Expert Syst. Appl. 252 (2024) 124264.
[9] Y. Zhao, C. Wang, Protecting privacy and enhancing utility: A novel approach for personalized trajectory data publishing using noisy prefix tree, Comput. Secur. 144 (2024) 103922.
[10] S. Yuan, D. Pi, X. Zhao, M. Xu, Differential privacy trajectory data protection scheme based on R-tree, Expert Syst. Appl. 182 (2021) 115215.
[11] W. Cheng, R. Wen, H. Huang, W. Miao, C. Wang, OPTDP: Towards optimal personalized trajectory differential privacy for trajectory data publishing, Neurocomputing 472 (2022) 201–211.
[12] N. Niknami, M. Abadi, F. Deldar, A fully spatial personalized differentially private mechanism to provide non-uniform privacy guarantees for spatial databases, Inf. Syst. 92 (2020) 101526.
[13] P. Liu, D. Wu, Z. Shen, H. Wang, K. Liu, Personalized trajectory privacy data publishing scheme based on differential privacy, Internet Things 25 (2024) 101074.
[14] W. Qardaji, W. Yang, N. Li, Differentially private grids for geospatial data, in: 2013 IEEE 29th International Conference on Data Engineering, ICDE, IEEE, 2013, pp. 757–768.
[15] G. Cormode, C. Procopiuc, D. Srivastava, E. Shen, T. Yu, Differentially private spatial decompositions, in: 2012 IEEE 28th International Conference on Data Engineering, IEEE, 2012, pp. 20–31.
[16] J. Hua, Y. Gao, S. Zhong, Differentially private publication of general time-serial trajectory data, in: 2015 IEEE Conference on Computer Communications, INFOCOM, IEEE, 2015, pp. 549–557.
[17] Z. Zhang, X. Xu, F. Xiao, LGAN-DP: A novel differential private publication mechanism of trajectory data, Future Gener. Comput. Syst. 141 (2023) 692–703.
[18] Y. Hu, Y. Du, Z. Zhang, Z. Fang, L. Chen, K. Zheng, Y. Gao, Real-time trajectory synthesis with local differential privacy, in: 2024 IEEE 40th International Conference on Data Engineering, ICDE, IEEE, 2024, pp. 1685–1698.
[19] R. Zhang, W. Ni, N. Fu, L. Hou, D. Zhang, Y. Zhang, DP-LTGAN: Differentially private trajectory publishing via Locally-aware Transformer-based GAN, Future Gener. Comput. Syst. 166 (2025) 107686.
[20] S. Jiao, J. Cheng, Z. Huang, T. Li, T. Xie, W. Chen, Y. Ma, X. Wang, DPKnob: A visual analysis approach to risk-aware formulation of differential privacy schemes for data query scenarios, Vis. Inform. 8 (3) (2024) 42–52.
[21] X. Wang, S. Jiao, C. Bryan, Defogger: A visual analysis approach for data exploration of sensitive data protected by differential privacy, IEEE Trans. Vis. Comput. Graphics 31 (1) (2025) 448–458, http://dx.doi.org/10.1109/TVCG.2024.3456304.
[22] R. Chen, B.C.M. Fung, B.C. Desai, Differentially private trajectory data publication, 2011, arXiv:1112.2020, URL https://arxiv.org/abs/1112.2020.
[23] C. Yin, J. Xi, R. Sun, J. Wang, Location privacy protection based on differential privacy strategy for big data in industrial internet of things, IEEE Trans. Ind. Inform. 14 (8) (2017) 3628–3636.
[24] Y. Zhao, C. Wang, E. Zhao, X. Zheng, H. Lin, PerTrajTree-DP: A personalized privacy-preserving trajectory publishing framework for trustworthy AI systems, in: Data Security and Privacy Protection, Springer Nature Singapore, Singapore, ISBN: 978-981-95-3182-0, 2026, pp. 57–75.
[25] T. Wang, Y. Tao, A. Gilad, A. Machanavajjhala, S. Roy, Explaining differentially private query results with dpxplain, Proc. VLDB Endow. 16 (12) (2023) 3962–3965.
[26] Z. Huang, J. Liu, D.G. Alabi, R.C. Fernandez, E. Wu, Saibot: A differentially private data search platform, Proc. VLDB Endow. (PVLDB) 16 (11) (2023) PVLDB 2023 demo / system paper.
[27] Y. Dai, J. Shao, C. Wei, D. Zhang, H.T. Shen, Personalized semantic trajectory privacy preservation through trajectory reconstruction, World Wide Web 21 (2018) 875–914.
[28] K. Zuo, R. Liu, J. Zhao, Z. Shen, F. Chen, Method for the protection of spatiotemporal correlation location privacy with semantic information, J. Xidian Univ. 49 (1) (2022) 67–77.
[29] S. Denisov, H.B. McMahan, J. Rush, A. Smith, A. Guha Thakurta, Improved differential privacy for sgd via optimal private linear operators on adaptive streams, Adv. Neural Inf. Process. Syst. 35 (2022) 5910–5924.
[30] H. Fang, X. Li, C. Fan, P. Li, Improved convergence of differential private sgd with gradient clipping, in: The Eleventh International Conference on Learning Representations, 2023.
[31] J. Fu, coauthors, DPSUR: Accelerating differentially private training via selective updates and release, Proc. VLDB Endow. (PVLDB) 17 (2024) PVLDB paper; PDF available from VLDB site.
[32] Y. Zheng, Trajectory data mining: an overview, ACM Trans. Intell. Syst. Technol. (TIST) 6 (3) (2015) 1–41.
[33] M.E. Andrés, N.E. Bordenabe, K. Chatzikokolakis, C. Palamidessi, Geo-indistinguishability: Differential privacy for location-based systems, in: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, 2013, pp. 901–914.
[34] W. Zhang, M. Li, R. Tandon, H. Li, Semantic-aware privacy-preserving online location trajectory data sharing, IEEE Trans. Inf. Forensics Secur. 17 (2022) 2292–2306.
[35] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, Y. Huang, T-drive: driving directions based on taxi trajectories, in: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2010, pp. 99–108.
[36] Y. Zheng, X. Xie, W.-Y. Ma, et al., GeoLife: A collaborative social networking service among user, location and trajectory, IEEE Data Eng. Bull. 33 (2) (2010) 32–39.
[37] Y. Zhao, C. Wang, L. Li, X. Wang, H. Lin, Z. Liu, TrajMamba: A multi-scale mamba-based framework for joint trajectory and road network representation learning, 2025, https://ssrn.com/abstract=5624451.