Computer Standards & Interfaces 97 (2026) 104113
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
Co-distillation-based defense framework for federated knowledge graph
embedding against poisoning attacks
Yiqin Lu, Jiarui Chen, Jiancheng Qin
School of Electronic and Information Engineering, South China University of Technology, 510641, China
ARTICLE INFO

Keywords:
Federated learning
Knowledge graph
Poisoning attack
Knowledge distillation

ABSTRACT

Federated knowledge graph embedding (FKGE) enables collaborative knowledge sharing without data exchange, but it also introduces risks of poisoning attacks that degrade model accuracy or force incorrect outputs. Protecting FKGE from poisoning attacks has therefore become a critical research problem. This paper reveals the malicious strategy of untargeted FKGE poisoning attacks and proposes CoDFKGE, a co-distillation-based FKGE framework for defending against poisoning attacks. CoDFKGE deploys two collaborative knowledge graph embedding models on each client, decoupling prediction parameters from shared parameters as a model-agnostic solution. By designing distinct distillation loss functions, CoDFKGE transfers clean knowledge from potentially poisoned shared parameters while compressing dimensions to reduce communication overhead. Experiments show that CoDFKGE preserves link prediction performance with lower communication costs, eliminates malicious manipulations under targeted poisoning attacks, and significantly mitigates accuracy degradation under untargeted poisoning attacks.
1. Introduction

Knowledge graphs (KGs) are structured representations of real-world entities and their relationships, supporting applications in search engines [1,2], recommendation systems [3,4], and security analysis [5,6]. Knowledge graph embedding (KGE) techniques project entities and relations into low-dimensional vector spaces, enabling efficient knowledge reasoning and completion [7]. Due to privacy regulations and data sensitivity requirements, KGs across organizations within the same domain remain fragmented despite growing data volumes. In this context, federated knowledge graph embedding (FKGE) emerges as a collaborative learning technique for sharing KG embeddings without data exchange. However, the introduction of federation mechanisms brings new security risks: malicious participants can inject poisoned parameters during training or aggregation to launch a poisoning attack, degrading model accuracy or forcing incorrect outputs. Consequently, protecting FKGE systems against poisoning attacks has emerged as a critical research challenge.

Unlike graph neural network (GNN)-based models, KGE models usually rely on the translation-based model [8-11]. The embedding vectors of entities and relations in the KG are directly used as learnable parameters. KGE models utilize different score functions to measure the plausibility of triples (h, r, t). By contrasting the outputs of existing triples and negatively sampled triples, KGE models derive appropriate embeddings for entities and relations. However, real-world KGs of different organizations are often incomplete, making it difficult to train high-quality knowledge graph reasoning models. Moreover, KG data often contains a large amount of private data, and direct data sharing will inevitably lead to privacy leakage. For this reason, federated learning [12] is introduced into knowledge graph reasoning.

FKGE assumes that there are multiple participants with complementary but incomplete KGs, aiming to derive optimal knowledge embeddings for each participant without data exchange. Most existing studies [13-15] model FKGE as multiple clients that maintain local KGE models and a central server. Clients train models locally and upload the model parameters to the central server, which aggregates the parameters and then returns them to the clients.

However, since the embedding vectors are directly the model parameters, FKGE is highly vulnerable to poisoning attacks. With the intent to reduce model performance, steal sensitive information, or disrupt system stability, poisoning attacks refer to malicious modifications of parameters during local training or parameter aggregation on the server. To protect the participants of FKGE, it is necessary to propose a protection mechanism against FKGE poisoning attacks.

Moreover, other related indicators in FKGE deserve attention. For example, the federated learning of KGE requires frequent parameter
Corresponding author.
E-mail addresses: eeyqlu@scut.edu.cn (Y. Lu), ee_jrchen@mail.scut.edu.cn (J. Chen), jcqin@scut.edu.cn (J. Qin).
https://doi.org/10.1016/j.csi.2025.104113
Received 3 June 2025; Received in revised form 8 November 2025; Accepted 8 December 2025
Available online 9 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
exchange, and the use of a translation-based model requires submitting the entity or relation embeddings, which makes the communication overhead greater than that of traditional federated learning.

Knowledge distillation [16] is a model compression technique that improves the performance of a simple (student) model by transferring knowledge from a complex (teacher) model. Distillation-based methods are considered a feasible solution to combat poisoning attacks [17-19]: a teacher model can extract clean knowledge from poisoned parameters and transfer it to a student model, thereby improving robustness without changing the model structure. Co-distillation [20] is a variant of knowledge distillation that trains two or more models simultaneously, allowing mutual learning and information sharing. This paper aims to design a federated knowledge graph defense framework based on co-distillation, which can enhance the model's resistance to poisoning attacks through collaborative learning without changing the original FKGE architecture.

The rest of this paper is organized as follows. Section 2 reviews the related work on FKGE and knowledge distillation. Section 3 introduces the preliminary concepts and methodologies essential for addressing FKGE poisoning attacks, with the main contributions of this paper summarized at the end of this section. In Section 4, we detail the threat model and malicious strategies for targeted and untargeted poisoning attacks in FKGE. Section 5 presents the CoDFKGE framework for defending against FKGE poisoning attacks, followed by experimental validation in Section 6. Finally, concluding remarks and future research directions are outlined in Section 7.

2. Related work

2.1. Basic FKGE framework

Early research on FKGE mainly focused on how to achieve cross-client knowledge sharing and model aggregation while protecting data privacy. FedE [13] is the first work to introduce federated learning into KGE. FedE facilitates cross-client knowledge sharing by maintaining an entity table. Nevertheless, the mechanism of sharing entity embeddings in FedE has been proven to contain privacy vulnerabilities [21]: attackers can leverage the embedding information to infer the existence of private triples within client datasets. Based on FedE, FedEC [14] applies embedding contrastive learning to tackle data heterogeneity and utilizes a global update procedure for sharing entity embeddings. In response to the privacy vulnerability of FedE, FedR [15] proposed a privacy-preserving relation embedding aggregation method. By sharing relation embeddings instead of entity embeddings, FedR can significantly reduce the communication overhead and privacy leakage risks while retaining the semantic information of the KG.

2.2. Knowledge distillation in FKGE

Knowledge distillation techniques are widely applied in the FKGE field due to their advantages in model compression and knowledge transfer. To cope with the drift between local optimization and global convergence caused by data heterogeneity, FedLU [22] proposes mutual knowledge distillation. Moreover, it contains an unlearning method to erase specific knowledge from local clients. FedKD [23] uses knowledge distillation to reduce communication costs, and proposes to adaptively learn the temperature that scales the scores of triples to mitigate teacher over-confidence. Beyond FKGE, the KGE model CoLE [24] proposes co-distillation learning to exploit the complementarity of graph structure and text information. It employs a Transformer for graphs and BERT for text, then distills selective knowledge from each other's prediction logits. Overall, existing research on knowledge distillation in FKGE primarily focuses on handling data heterogeneity, with insufficient exploration of its potential value in model security. This paper explores the application of knowledge distillation in FKGE security to defend against poisoning attacks.

2.3. Poisoning attack in federated learning

Federated learning (FL), due to its distributed training nature, creates favorable conditions for poisoning attacks even while protecting data privacy. Poisoning attacks in federated learning have attracted significant attention from researchers [25]. In federated learning scenarios, poisoning attacks pose serious threats to model security by manipulating part of the training data or local models to embed malicious behaviors [26]. The literature [27] generates stealthy backdoor triggers by extracting high-frequency features from images using the discrete wavelet transform and introduces an asymmetric frequency confusion mechanism, achieving efficient backdoor attacks on multiple datasets. Meanwhile, many studies have proposed defense methods against poisoning attacks. The literature [28] proposes the Krum method, which selects the most reliable gradient update by evaluating the consistency of gradients, thereby effectively defending against poisoning attacks. The literature [29] proposes FL-Defender, which improves robustness by introducing cosine similarity to adjust the weights of parameter aggregation. The literature [30] proposes a two-stage backdoor defense method called MCLDef based on Model Contrastive Learning (MCL), which can significantly reduce the success rate of backdoor attacks with only a small amount of clean data. In summary, existing research on poisoning attacks in federated learning mainly focuses on traditional deep learning domains. The design ideas of these defense frameworks have laid the foundation for subsequent poisoning attack defense methods for FKGE.

2.4. Security issues in FKGE

With the development of FKGE, its security and privacy issues have attracted increasing attention, with existing research mainly focusing on defending against privacy leakage. The literature [31] proposed a decentralized scalable learning framework where embeddings from different KGs can be learned in an asynchronous and peer-to-peer manner while being privacy-preserving. The literature [21] conducts the first holistic study of the privacy threat to FKGE from both attack and defense perspectives. It introduced three new inference attacks and proposed a differentially private FKGE model, DP-Flames, with private selection and an adaptive privacy budget allocation policy. Based on [21], the literature [32] introduces five new inference attacks and proposes PDP-Flames, which leverages the sparse gradient nature of FKGE for a better privacy-utility trade-off.

Compared with privacy leakage issues, research on defending against poisoning attacks in FKGE is still in its early stages. Traditional federated learning typically does not directly transmit original embeddings. However, entity and relation embeddings are core components of translation-based KGE, so direct transmission of embeddings is required during FKGE aggregation. Direct malicious modifications to embeddings are difficult to defend against effectively using traditional federated learning defense methods.

The recent literature [33] is the first work to systematize the risks of FKGE poisoning attacks. However, it primarily focuses on several forms of targeted poisoning attacks in FKGE, without mentioning untargeted poisoning attacks. Although this research provides some defense suggestions, such as zero-knowledge proofs and private set intersection, it does not propose specific defense methods. In summary, the existing research lacks a systematic introduction to untargeted poisoning attacks on FKGE, and there is no complete defense method against FKGE poisoning attacks.

To address the above issues, this paper reveals the malicious strategy of FKGE untargeted poisoning attacks and proposes CoDFKGE, a co-distillation-based federated knowledge graph embedding framework for defending against poisoning attacks. The main contributions of this paper are summarized as follows.
1. We systematically define untargeted poisoning attacks in FKGE and reveal the poisoning attack's malicious strategy, thereby enhancing threat identification in FKGE and providing a foundation for subsequent defense research.
2. We propose CoDFKGE, the first co-distillation defense framework against poisoning attacks in FKGE. By deploying bidirectional distillation models with distinct distillation losses on the client side, CoDFKGE, as a model-agnostic solution, decouples prediction parameters from shared parameters, thereby enhancing the model's resistance to poisoning attacks and improving robustness. We design distinct distillation loss functions for the two models in CoDFKGE, enabling CoDFKGE to transfer clean knowledge from potentially poisoned shared parameters and to compress shared parameter dimensions, which reduces communication overhead.
3. We validate the performance of CoDFKGE against poisoning attacks through experiments. The results show that, without compromising link prediction performance, CoDFKGE can completely eliminate targeted poisoning attacks and significantly mitigate the performance degradation caused by untargeted poisoning attacks, while simultaneously reducing communication overhead. Ablation experiments further confirm the effectiveness of the two distillation loss functions in CoDFKGE.

3. Preliminaries

3.1. Knowledge graph embedding

A KG can be represented as (E, R, T), where E and R are the entity set and the relation set. T is a set of triples, where a triple (h, r, t) ∈ T indicates that a relation r ∈ R connects the entities h, t ∈ E.

Translation-based KGE models project the entities and relations in KGs into a continuous vector space. Models employ the scoring function g(h, r, t; θ) to evaluate the plausibility of triples, where θ represents the embedding parameters. During model training, negative samples (h, r, t′) are constructed by randomly replacing the tail entities of positive triples. The training process aims to maximize the score discrepancy between positive and negative samples. Currently, most KGE models [9,11] employ the binary cross-entropy loss to measure the difference between positive and negative samples. Its mathematical expression is given in Eq. (1).

L = −Σ_{(h,r,t)∈T} ( log σ(g(h, r, t; θ) − γ) + Σ_i p(h, r, t′_i; θ) log σ(γ − g(h, r, t′_i; θ)) )    (1)

Among them, γ represents the margin, and (h, r, t′_i) is the i-th negative triple. p(h, r, t′_i; θ) stands for the occurrence probability of this negative sample given the embedding parameters θ.

3.2. Federated knowledge graph embedding

FKGE is an application of federated learning that aims to fuse and share knowledge vectors from different KGs to enhance the effectiveness of KGE. Currently, most related studies are based on the framework proposed in FedE [13].

The basic framework of FKGE consists of a client set C and a central server S. Each client c ∈ C holds a local KG G_c = (E_c, R_c, T_c). The entity sets of different KGs partially overlap, so the understanding of entities in a certain client can be supplemented by information from other clients. The server holds the one-hot existence matrix M ∈ R^{C×N} of all entities across the clients, where N is the number of entities.

In each client, the KGE model parameters consist of local parameters θ_L and shared parameters θ_S. During FKGE training, each epoch progresses through two sequential phases: client update and server aggregation. In the k-th client update stage, client c first trains its local KGE model to update its local embedding θ^k_{L_c} and server-shared embedding θ^k_{S_c}. Then, client c uploads its shared embedding θ^k_{S_c} to the server. In the server aggregation stage, the central server S aggregates the shared embeddings from all clients to obtain the shared parameters θ^{k+1}_S. Finally, the server broadcasts the shared parameters θ^{k+1}_S to all clients. Entity embeddings in KGE are usually shared parameters, while relation embeddings are local parameters. Only rare literature [15] uses relation embeddings as shared parameters.

In FKGE, how the server effectively aggregates shared embeddings from different clients is a common problem. The most common FKGE server aggregation method is that of FedE [13], which is an improvement on FedAvg [12]. To handle the imbalance in the number of entities across different clients, FedE aggregates the shared entities using the number of occurrences in the local data as the weight w_c. This weight value can be obtained from the existence matrix M mentioned above. The mathematical expression for FedE's server aggregation method is shown in (2).

θ^{k+1}_S = Σ_c w_c θ^k_{S_c}    (2)

The final target of FKGE is to minimize the loss functions over all clients' local triples simultaneously through federated learning. Its optimization objective can be expressed as Eq. (3).

arg min_{(θ_{L_c}, θ_{S_c})} Σ_{c=1}^{C} L_c(θ_{L_c}, θ_{S_c})    (3)

3.3. Knowledge distillation

Knowledge distillation is a model compression technique that transfers the knowledge contained in a complex model (teacher) to a simple model (student) to improve the performance of the simple model. In the classic knowledge distillation framework, the student model's training loss comprises two components: the cross-entropy loss L_CE, computed between its output and the true label, and the distillation loss L_KD, computed between its output and the teacher model's output (soft label). In practical applications, the distillation loss is usually quantified using the Kullback-Leibler divergence D_KL between the student model output and the soft label, and its mathematical expression is shown in Eq. (4).

D_KL(p_tea ∥ p_stu) = Σ_i p_tea(i) log( p_tea(i) / p_stu(i) )
L_KD = τ² D_KL( σ(z^(n)_tea) ∥ σ(z^(n)_stu) ),  where σ(x) = softmax(x / τ)    (4)

Among them, z_tea and z_stu are the logits of the teacher model and the student model, respectively. τ is the temperature coefficient, which is used to control the smoothness of the output.

To allow the student model to effectively absorb the knowledge contained in the teacher model while fitting the real data distribution, the final loss function is usually the weighted sum of L_CE and L_KD.

4. Threat model

Poisoning attacks in federated learning can be categorized into targeted poisoning attacks, semi-targeted poisoning attacks, and untargeted poisoning attacks according to the intention of the attacker [34]. In FKGE, a semi-targeted poisoning attack can be regarded as a special case of a targeted poisoning attack. Therefore, this paper focuses on the targeted and untargeted poisoning attack types.

4.1. Targeted poisoning attack

A targeted poisoning attack is an attack strategy in which the attacker crafts specific malicious triples that do not exist in the target system, and manipulates the target model into accepting these fake triples by injecting poisoned parameters into the shared parameters. This type of attack poses a serious threat to the application of FKGE, as the false relationships it introduces can lead to reasoning errors and decision-making
Fig. 1. Process of targeted poisoning attack.
Fig. 2. Framework of CoDFKGE model.

biases in downstream tasks. For example, in financial transaction networks, a knowledge graph is constructed with transaction entities as nodes and transaction relationships as edges. Link prediction can then be applied to detect potential transaction relationships (such as money laundering or fraud). If an attacker compromises one of the participants, they can introduce false transaction relationships through targeted poisoning attacks, leading to unreasonable inferences about the victim entity.

To execute such an attack successfully, the attacker typically follows a multi-stage process that begins with gathering the victim's local information. Fig. 1 shows the process of a targeted poisoning attack. In FKGE systems, while the server can observe the entities and relations each client possesses, it lacks visibility into how these elements are structured into specific triples. However, for frameworks that share entity embeddings (such as FedE [13]), recent research [21] has shown that a malicious server can use the KGE scoring function to infer the victim's local relationship patterns and reconstruct the victim's triples T_v. Armed with this inferred knowledge, the attacker strategically constructs malicious triples T_m that align with the victim's existing KG schema but represent false information.

The next critical attack phase involves training a shadow model, a surrogate KGE model designed to mimic the victim's learning process. The shadow model is trained on a poisoned dataset T_p, which combines the inferred victim triples T_v and the malicious triples T_m. This training strategy ensures the shadow model learns to generate embeddings that are consistent with both the victim's genuine knowledge and the attacker's deceptive information. The shadow model's parameters include θ_{S_p}, which can be initialized with the victim's shared parameters θ_{S_c}, and θ_{L_p}, which approximates the victim's local model parameters θ_{L_c} starting from random initial values. To ensure the shadow model effectively bridges both the victim's genuine knowledge and the attacker's malicious objectives, its parameters are optimized to minimize the loss function across all triples in the poisoned dataset, as formalized in Eq. (5).

arg min_{(θ_{S_p}, θ_{L_p})} Σ_{(h,r,t)∈T_p} L(h, r, t; θ_{S_p}, θ_{L_p})    (5)

where L is the loss function of the baseline model.

After training the shadow model, the attacker extracts the poisoned shared parameters θ_{S_p} using the same procedure that legitimate clients employ to prepare parameters for server aggregation. The attacker can then aggregate the poisoned parameters θ_{S_p} with the normal clients' shared parameters. The attacker usually operates as a compromised server and assigns a disproportionately high weight to the poisoned parameters during the aggregation process, ensuring that they dominate the aggregated shared parameters.

The final stage of the attack exploits the implicit trust in federated systems. The victim client, unaware of the poisoning, directly incorporates the compromised aggregated parameters into its local training process without validation. As a result, the victim's model gradually learns to accept the malicious triples as valid, ultimately producing incorrect predictions on these non-existent relationships while maintaining seemingly normal performance on other parts of the KG.
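The aggregation step that this attack abuses can be sketched numerically. The following is our own simplified illustration, not the paper's implementation: `aggregate` mirrors the weighted sum of Eq. (2), the toy shapes and the constant-valued "poisoned" update standing in for a shadow model's output are assumptions made for the example.

```python
import numpy as np

def aggregate(shared_params, weights):
    """FedE-style aggregation: theta_S^{k+1} = sum_c w_c * theta_{S_c}^k,
    with the weights normalized so they sum to one."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * p for w, p in zip(weights, shared_params))

rng = np.random.default_rng(0)
n_entities, dim = 4, 8  # toy sizes, for illustration only

# Shared entity embeddings uploaded by three honest clients.
honest = [rng.normal(size=(n_entities, dim)) for _ in range(3)]

# Stand-in for the shadow model's poisoned shared parameters.
poisoned = np.full((n_entities, dim), 10.0)

# Honest aggregation: entity occurrence counts as weights (Eq. (2)).
clean = aggregate(honest, [5, 3, 2])

# Compromised server: a disproportionately high weight on the poisoned
# update makes it dominate the broadcast parameters.
attacked = aggregate(honest + [poisoned], [5, 3, 2, 1000])
```

With the inflated weight, `attacked` sits almost exactly on the poisoned embeddings, while `clean` stays near the honest average, which is the dominance effect described above.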
4.2. Untargeted poisoning attack

The conditions for achieving a targeted poisoning attack are demanding. For example, FedR [15] shares only relation embeddings (not entity embeddings), preventing attackers from inferring victim relations via entity matrices and thus avoiding targeted poisoning attacks. Even when relational data leaks, targeted poisoning attacks remain difficult: compared with sharing entity embeddings, the sparsity of relation embeddings reduces the shadow model's ability to align its parameters with the victim's vector space. However, FedR has almost no defense effect against untargeted poisoning attacks.

In an untargeted poisoning attack, the attacker aims to disrupt the convergence of the victim model or to maximize mispredictions on test cases. By maximizing the victim's loss function during training, attackers can force non-convergent predictions. The attacker can generate the poisoned shared parameters θ*_{S_v} for the victim, which can be formalized as Eq. (6).

arg max_{θ*_{S_v}} Σ_{(h,r,t)∈T_v} L(h, r, t; θ*_{S_v}, θ_{L_v})    (6)

Among them, θ_{L_v} denotes the victim's local parameters, and T_v is the victim's triple set. Since it is difficult for the attacker to obtain these two quantities directly, they can use random values as guesses for θ_{L_v} and use triples formed from random combinations of E_v and R as guesses for T_v.

In particular, for the TransE model [7] with the scoring function g(h, r, t) = −|h + r − t|, the attacker can launch an untargeted poisoning attack by setting the shared parameters θ′_{S_v} sent to the victim to identical values or by using negated aggregation parameters. To avoid detection, noise is often added to the poisoned parameters. The prediction performance of the victim model may even fall below that of standalone training without federated aggregation.

In general, the success of FKGE poisoning attacks relies on victims using attacker-provided aggregated parameters directly for training without validation. To prevent poisoning attacks, it is critical to isolate the parameters of the prediction model from externally provided aggregated parameters. Specifically, potentially poisoned shared parameters must be filtered before training. Meanwhile, minimizing parameter exposure to the external environment is essential. Therefore, we propose CoDFKGE, a defense FKGE framework based on co-distillation.

5. Model design

CoDFKGE is a training framework on the client side. Its training process is shown in Fig. 2. CoDFKGE initializes two baseline models with the same structure and scoring function, but for different purposes. The communication model is mainly responsible for receiving and processing shared parameters, while the prediction model is used for the final embedding and prediction. To minimize potential parameter leakage and communication overhead, the feature dimension of the communication model is intentionally designed to be smaller than that of the prediction model.

During the training process, the two models learn collaboratively through knowledge distillation. Once the communication model receives the potentially poisoned shared parameters from the server, it acts as a teacher model to transfer clean knowledge to the prediction model. Following the training of the prediction model, the roles are reversed: the prediction model becomes the teacher, and the communication model serves as the student for distillation. This stage extracts knowledge from the prediction model and compresses it into the communication model, ensuring efficient knowledge sharing while minimizing parameter exposure and communication overhead. By deploying two distinct model instances, the framework physically isolates attacker-injected parameters from the prediction model's parameters, making poisoning attacks significantly more difficult to execute. To facilitate the reproducibility of our CoDFKGE model, we provide the complete training framework pseudocode in Algorithm 1.

Algorithm 1 CoDFKGE Training Framework
Require: Baseline KGE model g, training triples T, learning rate η, distillation weight β, distillation temperature τ, total iterations K
Initialization:
1: Initialize client-side prediction model with θ^P_0 = (θ^S_0, θ^L_0)  ⊳ Local parameters randomly initialized
2: Initialize client-side communication model with reduced feature dimensions
3: Initialize server-side aggregated parameters θ^S_1 = θ^S_0  ⊳ First-round initialization
Main Training Loop (iterations k = 1, 2, ..., K):
// Client Update Phase (for each client)
4: for each client c ∈ C do
5:   // Step 1: Communication-to-prediction model distillation
6:   Load server-shared parameters θ^S_k  ⊳ Latest global shared embeddings
7:   Initialize communication model with θ^C = (θ^S_k, θ^{C_L}_{k−1})
8:   Freeze communication model parameters  ⊳ Acts as teacher model
9:   Compute distillation loss L^{P_k}_{KD} using Eq. (7)  ⊳ Only positive samples
10:  Compute KGE loss L^{P_k}_{KGE} on training triples T
11:  Update prediction model parameters (θ^{P_S}_k, θ^{P_L}_k) with:
12:  ∇θ^P_k = ∇(β L^{P_k}_{KGE} + (1 − β) L^{P_k}_{KD})  ⊳ Gradient flows through prediction model only
13:  θ^P_k = θ^P_k − η ∇θ^P_k, where θ^P_k = {θ^{P_L}_k, θ^{P_S}_k}  ⊳ Update prediction model parameters
14:  Unfreeze communication model parameters
15:  // Step 2: Prediction-to-communication model distillation
16:  Freeze prediction model parameters θ^P_k  ⊳ Used as teacher model
17:  Compute distillation loss L^{C_k}_{KD} using Eq. (9)  ⊳ Both positive and negative samples
18:  Update communication model parameters (θ^{C_S}_k, θ^{C_L}_k) with:
19:  ∇θ^C_k = ∇L^{C_k}_{KD}  ⊳ Gradient flows through communication model only
20:  θ^C_k = θ^C_k − η ∇θ^C_k, where θ^C_k = {θ^{C_S}_k, θ^{C_L}_k}
21:  Upload updated shared parameters θ^{C_S}_k to the server
22:  Unfreeze prediction model parameters
23: end for
// Server Aggregation Phase
24: Server aggregates θ^S_{k+1} from all clients using the baseline federated aggregation method.
25: Set k = k + 1 and repeat the main loop until k > K  ⊳ Continue main training loop
return Final prediction model parameters of each client.

CoDFKGE is designed to be model-agnostic, enabling seamless integration with diverse FKGE models based on their shared parameter types. Both the communication and prediction models used by CoDFKGE clients utilize the same scoring function g as the original KGE model. Clients upload and utilize shared parameters identically to the baseline model, with these parameters maintaining the same form and dimensionality as the original implementation. This parameter compatibility enables the server to aggregate updates using existing federated learning aggregation methods without modification. This design ensures that CoDFKGE preserves the original knowledge representation capabilities while maintaining consistent operational semantics with the baseline model.
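Both distillation steps in Algorithm 1 reuse the temperature-scaled loss of Eq. (4). The following is a minimal numpy sketch of that loss under our own simplifications, not the authors' code: `softmax`, `kd_loss`, and the toy triple scores are illustrative assumptions.

```python
import numpy as np

def softmax(z, tau=1.0):
    """Temperature-scaled softmax sigma(x) = softmax(x / tau)."""
    z = np.asarray(z, dtype=float) / tau
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(z_tea, z_stu, tau=2.0):
    """Eq. (4): L_KD = tau^2 * D_KL(softmax(z_tea/tau) || softmax(z_stu/tau))."""
    p_tea = softmax(z_tea, tau)
    p_stu = softmax(z_stu, tau)
    return tau ** 2 * float(np.sum(p_tea * np.log(p_tea / p_stu)))

# Teacher scores (e.g., the frozen communication model) versus student
# scores (the prediction model) over a batch of candidate triples.
z_tea = np.array([4.0, 1.0, -2.0])
z_stu = np.array([3.5, 1.2, -1.8])

loss = kd_loss(z_tea, z_stu)    # small: the two rankings nearly agree
worse = kd_loss(z_tea, -z_stu)  # large: the student contradicts the teacher
```

Freezing the teacher then simply means that only the student's parameters receive the gradient of this loss, which is what lines 8-13 and 16-20 of Algorithm 1 alternate between.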
Y. Lu et al. Computer Standards & Interfaces 97 (2026) 104113
5.1. Communication to prediction model distillation

In the first iteration, the model trains the prediction component following the standard procedure. Starting from the second iteration of the training process, the communication model loads the server-shared parameters \theta_k^S and initializes itself jointly with the local embeddings \theta_{k-1}^{P_L} from the previous iteration's local prediction model.

After the communication model receives and applies the server-shared parameters, it filters out potentially poisoned model parameters through knowledge distillation. The communication model acts as a teacher model to transfer clean knowledge to the prediction model, which serves as the student model. During this process, gradients flow only through the prediction model parameters, while the communication model parameters remain frozen; this keeps the knowledge transfer direction strictly from the communication model to the prediction model and prevents gradient leakage back to potentially poisoned shared parameters.

If the communication model suffers from poisoning attacks and contains poisoned parameters, its outputs for negative samples are not reliable. Distilling such uncertain predictions would propagate noise rather than useful knowledge. To exclude the poisoned knowledge, the prediction model should focus on positive samples during distillation, ensuring that only trustworthy knowledge is transferred. The distillation loss of the prediction model in the k-th training epoch is given in Eq. (7):

L_{KD}^{P_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{D}} D_{KL}\big( \sigma(g(h,r,t;\theta_k^S,\theta_{k-1}^{P_L})) \,\|\, \sigma(g(h,r,t;\theta_k^{P_S},\theta_k^{P_L})) \big)    (7)

where \mathcal{D} denotes the client's set of positive training triples, \tau is the distillation temperature coefficient, and \sigma is the softmax function applied to the model output divided by \tau. g represents the scoring function of the prediction model, which is used to compute the KGE loss. g(h,r,t;\theta_k^S,\theta_{k-1}^{P_L}) is the communication model output under the server-shared parameters \theta_k^S and local parameters \theta_{k-1}^{P_L}, and g(h,r,t;\theta_k^{P_S},\theta_k^{P_L}) is the output of the prediction model being trained.

During distillation training, the model also needs to consider the KGE loss function. The overall loss of the prediction model is the weighted sum of the KGE loss and the distillation loss, as shown in Eq. (8):

L_k^P = \beta L_{KGE}^{P_k} + (1-\beta) L_{KD}^{P_k}    (8)

where L_{KGE}^{P_k} is the KGE loss of the k-th epoch of the prediction model defined by Eq. (1), and \beta is the weight.

5.2. Prediction to communication model distillation

After training the prediction model, we train the communication model through distillation, which extracts and propagates knowledge without directly sharing prediction parameters, thereby avoiding privacy leakage. During the communication model's distillation, the outputs of the prediction model on positive and negative samples serve as soft labels. As Eq. (1) illustrates, the loss function must account for the probability of negative samples when balancing the impact of positive and negative predictions. Therefore, the distillation loss of the communication model is formalized in Eq. (9):

L_{KD}^{C_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{D}} \Big( D_{KL}\big( \sigma(g(h,r,t;\theta_k^{P_S},\theta_k^{P_L})) \,\|\, \sigma(g(h,r,t;\theta_k^{C_S},\theta_k^{C_L})) \big) + \sum_i p(h,r,t_i) D_{KL}\big( \sigma(g(h,r,t_i;\theta_k^{P_S},\theta_k^{P_L})) \,\|\, \sigma(g(h,r,t_i;\theta_k^{C_S},\theta_k^{C_L})) \big) \Big)    (9)

where t_i denotes the i-th negative sample, g(h,r,t;\theta_k^{C_S},\theta_k^{C_L}) is the communication model output, and g(h,r,t;\theta_k^{P_S},\theta_k^{P_L}) is the prediction model output under the shared parameters \theta_k^{P_S} and local parameters \theta_k^{P_L}. The calculation of p follows the approach in [9], with its formulation given in Eq. (10):

p(h,r,t_i) = \frac{\exp(\tau_\alpha\, g(h,r,t_i))}{\sum_j \exp(\tau_\alpha\, g(h,r,t_j))}    (10)

where \tau_\alpha is the self-adversarial sampling temperature.

After the bidirectional distillation process of CoDFKGE, the communication model parameters are updated to \theta_k^{C_S} and \theta_k^{C_L}. The client then uploads \theta_k^{C_S} to the server, which aggregates these parameters from all clients using federated averaging to generate the next round's shared parameters \theta_{k+1}^S.

6. Experiments

Experiments are conducted on the openly available dataset FB15K-237 [35], a subset of Freebase containing 14,505 entities, 544,230 triples, and 474 relations. To perform federated learning, we adopt the relational partitioning method of [22]. This method first partitions the relations through clustering, ensuring that the triple relationships within each partition are as close as possible. These partitions are then divided into groups containing roughly equal numbers of triples and distributed to the clients. This yields tighter triple relationships within each client, better reflecting real-world scenarios.

The TransE model [7] is selected as the KGE model, serving as the foundation for all federated learning methods in the experiments, including the attacker's shadow model. To benchmark CoDFKGE, we select multiple baseline models. First, the locally trained model without federated learning is selected as the KGE baseline; it does not share parameters between clients, so it has no communication overhead and is not vulnerable to poisoning attacks. FedE [13] and FedR [15] are also chosen as baseline FKGE models, representing standard approaches in the field. Additionally, we implement a distillation model, which uses communication and prediction models similar to CoDFKGE but performs only unidirectional knowledge distillation. Specifically, it uses the communication model as the teacher and the prediction model as the student to filter out poisoning knowledge, with the distillation loss function following Eq. (4).

All experiments are performed on a 72-core Ubuntu 18.04.6 LTS machine with an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20 GHz and a V100S-PCIE-32GB GPU. We implemented the proposed FKGE framework and the baseline models based on PyTorch Geometric [36] and the distributed AI framework Ray [37]. We used KGE hyperparameter settings based on [9] and FKGE hyperparameter settings based on FedE [13]. Specifically, we used the Adam [38] optimizer with a learning rate of 1e-3. The margin \gamma is 10, and the self-adversarial negative sampling temperature \tau_\alpha in the KGE loss is 1. The distillation temperature \tau is 2, and the weight \beta balancing the distillation and KGE losses is 0.5. The maximum number of training epochs is 400. In each epoch, the client performs 3 local iterations before uploading the parameters to the server.

We use the link prediction task, a sub-task of KGE, to validate the models' accuracy. Following the common implementation of link prediction, we employ the Mean Reciprocal Rank (MRR) and Hits@N as accuracy metrics. The MRR is the average of the reciprocals of the ranks of the predicted triples among all possible triples: if rank_i is the rank of the correct triple for the i-th query and n is the total number of queries, then MRR = (1/n) \sum_{i=1}^{n} 1/rank_i. Hits@N is the proportion of query triples for which the correct triple appears among the top N candidates generated by the model. For both metrics, higher values indicate better link prediction performance.

Through the experiments, the following research questions will be verified.

RQ1 Does CoDFKGE maintain KGE prediction performance while reducing FKGE communication overhead?
RQ2 Can CoDFKGE effectively defend against targeted poisoning attacks?
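To make the two distillation stages concrete, the following is a minimal NumPy sketch of Eqs. (7), (9), and (10). It is an illustrative reading, not the authors' implementation: each model output is treated as a vector of candidate scores, and the scalar scores `neg_g` used for the Eq. (10) weights are an assumption about how p is computed.

```python
import numpy as np

def softened(scores, tau=1.0):
    """sigma(. / tau): softmax of model scores divided by the temperature."""
    z = np.asarray(scores, dtype=float) / tau
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """D_KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def prediction_kd_loss(comm_scores, pred_scores, tau=2.0):
    """Eq. (7): the frozen communication model (teacher) distills into the
    prediction model (student), summed over positive triples only."""
    return tau ** 2 * sum(kl_div(softened(t, tau), softened(s, tau))
                          for t, s in zip(comm_scores, pred_scores))

def self_adversarial_weights(neg_g, tau_alpha=1.0):
    """Eq. (10): p_i = exp(tau_alpha * g_i) / sum_j exp(tau_alpha * g_j)."""
    return softened(tau_alpha * np.asarray(neg_g))

def communication_kd_loss(pred_pos, comm_pos, pred_neg, comm_neg, neg_g,
                          tau=2.0, tau_alpha=1.0):
    """Eq. (9): the prediction model (teacher) distills into the
    communication model (student); each negative-sample term is
    reweighted by its self-adversarial probability from Eq. (10)."""
    pos = sum(kl_div(softened(t, tau), softened(s, tau))
              for t, s in zip(pred_pos, comm_pos))
    p = self_adversarial_weights(neg_g, tau_alpha)
    neg = sum(w * kl_div(softened(t, tau), softened(s, tau))
              for w, t, s in zip(p, pred_neg, comm_neg))
    return tau ** 2 * (pos + neg)

# Identical teacher and student outputs give zero distillation loss.
pos = [[1.0, 0.2, -0.5]]
print(prediction_kd_loss(pos, pos))  # -> 0.0
```

Because the communication model is the student in Eq. (9), only its parameters receive gradients in that stage, mirroring the frozen-teacher setup of Eq. (7).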
Table 1
Experiment result on normal link prediction.
Fed type Model Mem(MB) CC(MB) MRR Hits@1 Hits@5 Hits@10
Local Local(128) 57.05 0.4081 ± 0.0015 0.3066 ± 0.0014 0.5223 ± 0.0023 0.6077 ± 0.0015
Entity FedE(128) 185.58 42.60 0.4082 ± 0.0004 0.3068 ± 0.0012 0.5232 ± 0.0013 0.6080 ± 0.0018
Entity Distillation (128-128) 356.10 42.60 0.4129 ± 0.0008 0.3118 ± 0.0016 0.5279 ± 0.0008 0.6122 ± 0.0003
Entity CoDFKGE (128-128) 356.10 42.60 0.4109 ± 0.0043 0.3097 ± 0.0041 0.5246 ± 0.0044 0.6087 ± 0.0040
Entity Distillation (32-128) 217.39 10.65 0.3914 ± 0.0011 0.2935 ± 0.0008 0.5005 ± 0.0014 0.5838 ± 0.0032
Entity CoDFKGE (32-128) 217.40 10.65 0.4090 ± 0.0010 0.3079 ± 0.0007 0.5233 ± 0.0019 0.6068 ± 0.0019
Relation FedR(128) 75.49 0.69 0.4085 ± 0.0011 0.3079 ± 0.0021 0.5219 ± 0.0016 0.6066 ± 0.0017
Relation Distillation (128-128) 151.74 0.69 0.4106 ± 0.0013 0.3092 ± 0.0023 0.5242 ± 0.0008 0.6098 ± 0.0009
Relation CoDFKGE (128-128) 150.02 0.69 0.4065 ± 0.0007 0.3056 ± 0.0013 0.5190 ± 0.0023 0.6063 ± 0.0012
Relation Distillation (32-128) 94.53 0.17 0.3920 ± 0.0012 0.2960 ± 0.0007 0.4996 ± 0.0019 0.5807 ± 0.0013
Relation CoDFKGE (32-128) 93.69 0.17 0.4078 ± 0.0009 0.3060 ± 0.0007 0.5224 ± 0.0031 0.6074 ± 0.0015
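The MRR and Hits@N metrics reported in Table 1 follow their standard definitions, which can be sketched as below; this is a generic sketch with hypothetical ranks, not the paper's evaluation code.

```python
def mrr(ranks):
    """Mean Reciprocal Rank: the average of 1/rank_i over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    """Hits@N: the fraction of queries whose correct triple is ranked
    within the top n candidates."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 3, 12, 2, 50]          # hypothetical ranks of the correct triples
print(round(mrr(ranks), 4))        # -> 0.3873
print(hits_at_n(ranks, 10))        # -> 0.6
```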
RQ3 Can CoDFKGE effectively defend against untargeted poisoning attacks?
RQ4 Do the two proposed distillation loss functions individually contribute to poisoning defense?

6.1. Normal link prediction (RQ1)

To explore the performance of the proposed model in normal link prediction, we first tested the model on a conventional dataset. The performance of the model is measured using MRR, Hits@1, Hits@5, and Hits@10. The model is trained by federated learning and evaluated on the local test sets of the clients.

Table 1 lists the performance of the local KGE model, FedE, FedR, and CoDFKGE with different dimensions. The experimental results are grouped according to the type of shared embeddings and the dimension of the prediction model. The parameter dimensions are specified in parentheses within the Model column. For example, CoDFKGE(32-128) denotes the CoDFKGE model with a 32-dimensional communication model and a 128-dimensional prediction model. All link prediction experiments were repeated 5 times with different random seeds, and the accuracy results of all models are reported as (mean ± standard deviation). The best-performing model results in each group (excluding the local model) are bolded. The results of the CoDFKGE(32-128) model that are better than those of Distillation(32-128) are underlined.

The performance of locally trained models is lower than that of most federated learning models, highlighting the advantages of sharing model parameters. The high-dimensional Distillation(128-128) models achieve better link prediction performance. Compared to Distillation(128-128), CoDFKGE models show slightly inferior prediction performance; the co-distillation process in CoDFKGE may cost some generalization accuracy. However, comparing models of the same dimensions, CoDFKGE outperforms both the local baseline and the federated baselines (FedE, FedR). We believe that the main advantage of CoDFKGE is its ability to enhance the security of FKGE. In addition to the security performance demonstrated in Sections 6.2 and 6.3, it maintains link prediction performance comparable to its baseline FKGE models.

Beyond accuracy metrics, the CC (Communication Cost) column reports the communication overhead per training epoch, calculated from the byte size of the PyTorch Embedding used in the implementation. The Mem column shows the GPU memory usage of the federated models in MB. Distillation-based models require maintaining two KGE models, resulting in higher computational resource consumption and larger GPU memory to store the parameters of both models. Compared to using model parameters of the same size, distillation-based models allow the communication model's parameters to be compressed, achieving significantly lower communication overhead. At this smaller communication overhead, CoDFKGE(32-128) outperforms Distillation(32-128) in link prediction performance. Therefore, we conclude that the CoDFKGE model does not degrade the normal link prediction performance of baseline FKGE models and can effectively reduce the communication overhead of the model.

6.2. Targeted poisoning attack experiment (RQ2)

In the targeted poisoning attack, 32 pairs of non-existent triples are selected as attack targets from the victim's KG through negative sampling to construct a poisoned triple dataset. First, a predetermined number of normal triples are selected from the victim's training triples. Subsequently, the head or tail nodes of these triples are randomly replaced, and any triples already existing in the training set are iteratively removed until 32 pairs of non-existent triples are successfully constructed. In each epoch, the shadow model undergoes the same number of local training rounds as legitimate clients on the poisoned dataset to generate poisoned parameters. The malicious server aggregates these poisoned parameters with the parameters of the normal clients into shared parameters and distributes them to all clients. Attackers can assign high weights to poisoned model parameters during aggregation. Following the setup in Ref. [33], we set the weight of the attacker's aggregated poisoned triples to be 256 times that of normal triples. Experiments focus on models with shared entity parameters (required for targeted poisoning attacks) and the non-federated local baseline.

For space considerations, this section reports only the MRR and Hits@10 metrics. Attack effectiveness is measured by the MRR and Hits@10 of poisoned triples on the victim. Higher metrics on the poisoned triples indicate greater vulnerability to poisoning and weaker resistance of the model to targeted poisoning attacks.

Table 2 lists the performance of the baseline models and CoDFKGE under targeted poisoning attacks, grouped by the prediction model dimension. The parameter dimensions are specified in parentheses within the Model column. The All Clients column reports average performance across all clients' test sets during attacks, while Victim Poisoned measures the victim's performance on predicting poisoned triples. All experiments were repeated 5 times with different random seeds, and the results are reported as (mean ± standard deviation). The best-performing model results are bolded. Moreover, the Communication Poison column highlights the communication model's performance on poisoned triples for CoDFKGE and the distillation model, demonstrating that both communication models are impacted by targeted poisoning attacks. Through distillation, the prediction accuracy on poisoned triples by the prediction model decreases in both cases.

For targeted poisoning attacks, the primary evaluation metrics should be the MRR and Hits@10 indicators of the victim model when predicting poisoned triples. The Local training model, which does not employ federated learning, remains immune to poisoning attacks, resulting in low MRR for poisoned triples, with the Hits@10 value being exactly 0. This indicates that the unpoisoned Local model does not include non-existent poisoned triples among its top 10 candidate results when making predictions. If a model incorrectly marks non-existent poisoned test triples as one of the top 10 candidates, it demonstrates that the poisoning attack has successfully manipulated the model's predictions. Therefore, we use Hits@10 as the metric to measure the Attack Success Rate (ASR).
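The poisoned-dataset construction and weighted aggregation described above can be sketched as follows. This is a simplified, hypothetical illustration: it collects single corrupted triples rather than the paper's 32 pairs, and the flat-list parameter layout is an assumption; only the corrupt-and-filter loop and the 256x aggregation weight come from the text.

```python
import random

def build_poisoned_triples(train_triples, entities, num_targets=32, seed=0):
    """Corrupt the head or tail of sampled training triples, discarding any
    corruption that already exists in the training set, until the desired
    number of non-existent target triples is collected."""
    rng = random.Random(seed)
    existing = set(train_triples)
    poisoned = set()
    while len(poisoned) < num_targets:
        h, r, t = rng.choice(train_triples)
        if rng.random() < 0.5:
            cand = (rng.choice(entities), r, t)   # replace head
        else:
            cand = (h, r, rng.choice(entities))   # replace tail
        if cand not in existing:                  # iteratively filter existing triples
            poisoned.add(cand)
    return sorted(poisoned)

def malicious_aggregate(normal_updates, poisoned_update, attack_weight=256.0):
    """Weighted averaging on the malicious server: the shadow model's
    poisoned parameters receive attack_weight times a normal client's weight."""
    total = attack_weight + len(normal_updates)
    agg = [attack_weight * p for p in poisoned_update]
    for upd in normal_updates:
        agg = [a + u for a, u in zip(agg, upd)]
    return [a / total for a in agg]
```

With the 256x weight, a single poisoned update dominates the aggregate even when many honest clients participate, which is why the shared entity embeddings shift toward the attack targets.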
Table 2
Experiment result under targeted poisoning attack.
Model All clients Victim poison Communication poison
MRR Hits@10 MRR Hits@10(ASR) MRR Hits@10
Local(128, unpoisoned) 0.4081 ± 0.0015 0.6077 ± 0.0015 0.0003 ± 0.0001 0.0000 ± 0.0000
FedE(128) 0.4034 ± 0.0035 0.6004 ± 0.0029 0.4450 ± 0.0938 0.7857 ± 0.1248
Distillation(128-128) 0.4026 ± 0.0025 0.6006 ± 0.0039 0.0844 ± 0.0552 0.2000 ± 0.1311 0.4999 ± 0.1429 0.7714 ± 0.1046
CoDFKGE(128-128) 0.4086 ± 0.0007 0.6089 ± 0.0012 0.0010 ± 0.0003 0.0009 ± 0.0005 0.4694 ± 0.1511 0.6589 ± 0.1242
Distillation(32-128) 0.3821 ± 0.0022 0.5717 ± 0.0018 0.1511 ± 0.3356 0.1960 ± 0.4362 0.4919 ± 0.2364 0.6625 ± 0.1887
CoDFKGE(32-128) 0.3856 ± 0.0039 0.5740 ± 0.0054 0.0010 ± 0.0001 0.0010 ± 0.0003 0.3794 ± 0.0032 0.5702 ± 0.005
Fig. 3. Performance degradation comparison.
The FedE model maintains high prediction accuracy on normal test triples when under attack, but exhibits abnormally high MRR and Hits@10 metrics for targeted poisoned triples, even exceeding those for normal triples. This indicates that targeted poisoning attacks can effectively manipulate the FedE model into generating incorrect predictions. Similarly, in the distillation-based models, the communication models are severely affected by poisoning attacks, while the impact on the prediction models is relatively minor. Although the Distillation(128-128) model can partially eliminate poisoning knowledge, it remains vulnerable to targeted poisoning attacks. Moreover, as the dimension of the communication model parameters increases, the extent of the model's vulnerability to poisoning attacks also grows.

In contrast, CoDFKGE's prediction model performs distillation learning exclusively on verified positive samples, effectively eliminating potential poisoning knowledge that might exist in negative samples. Similar to the Local training model, CoDFKGE achieves extremely low MRR and Hits@10 metrics for poisoned triples, which fully demonstrates that the CoDFKGE model can effectively defend against targeted poisoning attacks in FKGE. Furthermore, because the communication model's dimension is compressed, the amount of information that attackers can transmit is correspondingly reduced, making the communication model in CoDFKGE(32-128) less susceptible to poisoning attacks.

6.3. Untargeted poisoning attack experiment (RQ3)

In the untargeted poisoning attack experiments, the attacker returns negative aggregate parameters to the victim client, preventing the victim model from converging and degrading its prediction performance. The results presented in this section reflect average prediction performance on the clients' local test triples.

Table 3 lists the performance of each model under untargeted poisoning attacks, grouped by the prediction model dimension and federated type. The parameter dimensions are specified in parentheses within the Model column. The All Clients column shows the average performance of all clients under untargeted poisoning attacks, and the Victim column shows the performance of the victim client. To measure the severity of the attack, the MRR of the local model in Table 1 is used as a benchmark: the Decay Ratio column shows the ratio of performance degradation on the victim client compared to the local model in Table 1. All experiments were repeated 5 times with different random seeds, and the results
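The untargeted attack, a malicious server returning negative aggregate parameters to the victim, is simple to state in code. The sketch below is one plausible reading of that description (element-wise sign flipping of the honest aggregate), not the authors' exact attack implementation.

```python
def untargeted_poison(aggregated_params):
    """Send the element-wise negation of the honest aggregate to the
    victim, pushing its shared embeddings away from the consensus so
    that local training fails to converge."""
    return [-p for p in aggregated_params]

honest_aggregate = [0.12, -0.40, 0.03]      # hypothetical shared parameters
print(untargeted_poison(honest_aggregate))  # -> [-0.12, 0.4, -0.03]
```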
Table 3
Experiment result under untargeted poisoning attack.
Fed Type Model All clients Victim Decay ratio (%)
MRR Hits@10 MRR Hits@10 MRR Hits@10
Entity FedE(128) 0.3896 ± 0.0010 0.5939 ± 0.0009 0.3625 ± 0.0102 0.5620 ± 0.0144 11.21 7.58
Entity Distillation(128-128) 0.3900 ± 0.0017 0.5921 ± 0.0007 0.3641 ± 0.0012 0.5664 ± 0.0018 11.82 7.54
Entity CoDFKGE(128-128) 0.4084 ± 0.0007 0.6068 ± 0.0003 0.4017 ± 0.0010 0.6009 ± 0.0005 2.25 1.28
Entity Distillation (32-128) 0.3024 ± 0.0208 0.5422 ± 0.0105 0.2739 ± 0.0264 0.5262 ± 0.0124 30.02 9.49
Entity CoDFKGE (32-128) 0.4093 ± 0.0018 0.6081 ± 0.0014 0.4022 ± 0.0022 0.6023 ± 0.0011 1.66 0.75
Relation FedR(128) 0.3915 ± 0.0010 0.5951 ± 0.0016 0.3637 ± 0.0093 0.5636 ± 0.0150 10.96 7.10
Relation Distillation(128-128) 0.3978 ± 0.0017 0.6022 ± 0.0019 0.3881 ± 0.0023 0.5942 ± 0.0028 5.51 2.56
Relation CoDFKGE(128-128) 0.4086 ± 0.0017 0.6075 ± 0.0029 0.4014 ± 0.0020 0.6018 ± 0.0037 1.24 0.75
Relation Distillation (32-128) 0.3058 ± 0.0079 0.5463 ± 0.0029 0.2787 ± 0.0101 0.5307 ± 0.0038 27.78 8.61
Relation CoDFKGE (32-128) 0.4090 ± 0.0008 0.6066 ± 0.0011 0.4026 ± 0.0008 0.6018 ± 0.0013 1.27 0.92
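The Decay Ratio column in Table 3 is the relative drop of the victim's metric against the unattacked local benchmark from Table 1. Below is a sketch of that formula using illustrative round numbers rather than the table's per-seed averages.

```python
def decay_ratio(benchmark, attacked):
    """Percentage performance degradation of the victim relative to the
    unattacked local-model benchmark."""
    return 100.0 * (benchmark - attacked) / benchmark

# Illustrative: a benchmark MRR of 0.40 falling to 0.36 under attack.
print(round(decay_ratio(0.40, 0.36), 2))  # -> 10.0
```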
Table 4
Ablation study in normal link prediction and under targeted attack.
Model Link prediction Targeted all clients Targeted victim poisoning
MRR Hits@10 MRR Hits@10 MRR Hits@10 (targeted poisoning ASR)
CoDFKGE 0.4112 ± 0.0039 0.6084 ± 0.0036 0.4086 ± 0.0007 0.6089 ± 0.0012 0.0010 ± 0.0003 0.0009 ± 0.0005
Ablation(Comm) 0.4095 ± 0.0016 0.6074 ± 0.0014 0.4086 ± 0.0022 0.6076 ± 0.0021 0.0017 ± 0.0008 0.0013 ± 0.0008
Ablation(Pred) 0.4132 ± 0.0006 0.6116 ± 0.0012 0.4098 ± 0.0011 0.6080 ± 0.0009 0.8086 ± 0.0064 0.9702 ± 0.0228
are reported as (mean ± standard deviation). The best and second-best results in each group are marked in bold and underlined, respectively.

From the experimental results, it can be observed that when subjected to untargeted poisoning attacks, the CoDFKGE series models achieve the best MRR and Hits@10 metrics compared to the other models. In this setting, all models exhibit varying degrees of decline in both their overall performance metrics and their performance on the victim. In Fig. 3, we present a comparison of the prediction performance of the various models under normal link prediction and untargeted poisoning attack scenarios. The Distillation(32-128) model experiences the most significant performance degradation; for the Distillation(128-128), FedE, and FedR models, the performance degradation is also substantial and cannot be ignored. These models directly incorporate poisoned global knowledge as an integral part of their own models, so their convergence is adversely affected. In contrast, the performance degradation of CoDFKGE models is fully within 3%. This is because, even in the absence of global knowledge, the prediction model of CoDFKGE still trains on local data knowledge, and its training effectiveness is comparable to that of local KGE models without knowledge sharing.

Baseline models may have their results manipulated or exhibit significant performance degradation when facing poisoning attacks. Although the distillation models showed performance advantages in the link prediction experiments, their defense effectiveness is extremely limited under poisoning attacks. In contrast, CoDFKGE remains unmanipulated when encountering targeted poisoning attacks and does not exhibit significant performance degradation when subjected to untargeted poisoning attacks, demonstrating its effective defense capability against poisoning attacks.

6.4. Ablation study (RQ4)

This section evaluates the defensive effects of applying different loss functions in CoDFKGE against poisoning attacks. Specifically, we compare the performance of models using 128-dimensional training parameters for both communication and prediction models across normal link prediction, targeted poisoning attack, and untargeted poisoning attack scenarios. Two ablation baselines were implemented: Ablation(Comm) applies the baseline loss function (Eq. (4)) solely during the communication module's distillation, while Ablation(Pred) uses it exclusively for the prediction module's distillation.

Tables 4 and 5 show the experiment results of the models with different distillation loss functions sharing entity embeddings. All experiments were repeated 5 times with different random seeds, and the results are reported as (mean ± standard deviation). The best results are bolded.

Experimental results demonstrate that while Ablation(Pred) performs well in conventional link prediction, its resistance to poisoning attacks lags behind the other two models because its loss function does not employ the negative-sample exclusion strategy. Of the remaining two models, both demonstrate robust resilience against poisoning attacks, but the CoDFKGE model achieves superior link prediction performance compared to Ablation(Comm). Ablation(Comm) employs the baseline loss function during the distillation training of the communication model, whereas the CoDFKGE model adopts the approach from [9] and uses the self-adversarial sampling temperature \tau_\alpha to reweight negative samples, thereby enhancing the model's ability to distinguish between negative samples. Overall, the ablation experiments demonstrate that applying both proposed distillation loss functions simultaneously enhances the model's poisoning defense capability and its link prediction performance.

7. Conclusion

This paper proposes CoDFKGE, a co-distillation-based defense framework against FKGE poisoning attacks. As the first co-distillation defense framework against poisoning attacks in FKGE, CoDFKGE does have some limitations. First, maintaining two separate models requires higher computational resource consumption on clients. Second, the bidirectional distillation process may lead to a loss of generalization accuracy. In contrast, CoDFKGE's advantages lie in its model-agnostic applicability to existing FKGE models without compromising performance. By decoupling clients' prediction models from the shared parameter models, CoDFKGE effectively filters out poisoned knowledge embedded in shared updates. CoDFKGE eliminates malicious manipulations under targeted poisoning attacks and significantly mitigates accuracy degradation under untargeted poisoning attacks. Leveraging distillation, the framework further reduces communication overhead. This work provides new ideas for enhancing the security of FKGE.

The limitations of FKGE poisoning defense research are partially rooted in the unique characteristics of KGE. When considering translation-based KGE models in FKGE, sharing entity or relation embeddings introduces risks related to both privacy preservation and poisoning attacks. Employing GNN-based KGE models in FKGE that transmit GNN parameters or gradients can alleviate these concerns. However, due to their superior robustness to sparse data and lower computational resource requirements, translation-based models still maintain unparalleled advantages in specific application scenarios.
Table 5
Ablation study under untargeted attack.
Model Untargeted all clients Untargeted victim Decay ratio (%)
MRR Hits@10 MRR Hits@10 MRR Hits@10
CoDFKGE 0.4084 ± 0.0007 0.6068 ± 0.0003 0.4017 ± 0.0010 0.6009 ± 0.0005 2.25 1.27
Ablation(Comm) 0.4056 ± 0.0017 0.6062 ± 0.0011 0.3996 ± 0.0018 0.6003 ± 0.0013 2.42 1.16
Ablation(Pred) 0.3951 ± 0.0011 0.6022 ± 0.0008 0.3852 ± 0.0009 0.5951 ± 0.0005 6.76 2.69
For future research, we recommend exploring the application of the [8] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating
CoDFKGE framework in more complex real-world scenarios, such as on hyperplanes, in: Proceedings of the AAAI Conference on Artificial Intelligence,
vol. 28, 2014.
personalized FKGE problems. Additionally, in large-scale dynamic KG
[9] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, Rotate: Knowledge graph embedding by
environments, the security landscape for FKGE may undergo signifi- relational rotation in complex space, 2019, arXiv preprint arXiv:1902.10197.
cant changes, necessitating further investigation into defense methods [10] Z. Zhang, J. Jia, Y. Wan, Y. Zhou, Y. Kong, Y. Qian, J. Long, Transr*: Repre-
tailored to these evolving scenarios. sentation learning model by flexible translation and relation matrix projection,
J. Intell. Fuzzy Systems 40 (5) (2021) 1025110259.
[11] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2d knowl-
CRediT authorship contribution statement edge graph embeddings, in: Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 32, (1) 2018.
Yiqin Lu: Supervision. Jiarui Chen: Writing original draft, Soft- [12] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-
efficient learning of deep networks from decentralized data, in: Artificial
ware, Methodology. Jiancheng Qin: Writing review & editing.
Intelligence and Statistics, PMLR, 2017, pp. 12731282.
[13] M. Chen, W. Zhang, Z. Yuan, Y. Jia, H. Chen, Fede: Embedding knowledge graphs
Declaration of Generative AI and AI-assisted technologies in the in federated setting, in: Proceedings of the 10th International Joint Conference
writing process on Knowledge Graphs, 2021, pp. 8088.
[14] M. Chen, W. Zhang, Z. Yuan, Y. Jia, H. Chen, Federated knowledge graph
During the preparation of this work the author(s) used deepseek in completion via embedding-contrastive learning, Knowl.-Based Syst. 252 (2022)
109459.
order to improve language and readability. After using this tool/service, [15] K. Zhang, Y. Wang, H. Wang, L. Huang, C. Yang, X. Chen, L. Sun, Efficient fed-
the author(s) reviewed and edited the content as needed and take(s) full erated learning on knowledge graphs via privacy-preserving relation embedding
responsibility for the content of the publication. aggregation, 2022, arXiv preprint arXiv:2203.09553.
[16] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network,
2015, arXiv preprint arXiv:1503.02531.
Declaration of competing interest [17] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a de-
fense to adversarial perturbations against deep neural networks, in: 2016 IEEE
The authors declare that they have no known competing finan- Symposium on Security and Privacy, SP, IEEE, 2016, pp. 582597.
cial interests or personal relationships that could have appeared to [18] K. Yoshida, T. Fujino, Countermeasure against backdoor attack on neural
networks utilizing knowledge distillation, J. Signal Process. 24 (4) (2020)
influence the work reported in this paper.
141144.
[19] K. Yoshida, T. Fujino, Disabling backdoor and identifying poison data by
Acknowledgment using knowledge distillation in backdoor attacks on deep neural networks, in:
Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security,
2020, pp. 117127.
This work is supported by the Special Project for Research and [20] R. Anil, G. Pereyra, A. Passos, R. Ormandi, G.E. Dahl, G.E. Hinton, Large
Development in Key Areas of Guangdong Province, under Grant scale distributed neural network training through online distillation, 2018, arXiv
2019B010137001. preprint arXiv:1804.03235.
[21] Y. Hu, W. Liang, R. Wu, K. Xiao, W. Wang, X. Li, J. Liu, Z. Qin, Quantifying and
defending against privacy threats on federated knowledge graph embedding, in:
Data availability
Proceedings of the ACM Web Conference 2023, 2023, pp. 23062317.
[22] X. Zhu, G. Li, W. Hu, Heterogeneous federated knowledge graph embedding
Data will be made available on request. learning and unlearning, in: Proceedings of the ACM Web Conference 2023,
2023, pp. 24442454.
[23] X. Zhang, Z. Zeng, X. Zhou, Z. Shen, Low-dimensional federated knowledge graph
embedding via knowledge distillation, 2024, arXiv preprint arXiv:2408.05748.
References
[24] Y. Liu, Z. Sun, G. Li, W. Hu, I know what you do not know: Knowledge
graph embedding via co-distillation learning, in: Proceedings of the 31st ACM
[1] X. Zhao, H. Chen, Z. Xing, C. Miao, Brain-inspired search engine assistant based International Conference on Information & Knowledge Management, 2022, pp.
on knowledge graph, IEEE Trans. Neural Netw. Learn. Syst. 34 (8) (2021) 13291338.
4386–4400.
[2] S. Sharma, Fact-finding knowledge-aware search engine, in: Data Management, Analytics and Innovation: Proceedings of ICDMAI 2021, vol. 2, Springer, 2021, pp. 225–235.
[3] Y. Jiang, Y. Yang, L. Xia, C. Huang, DiffKG: Knowledge graph diffusion model for recommendation, in: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, WSDM '24, Association for Computing Machinery, New York, NY, USA, ISBN: 9798400703713, 2024, pp. 313–321.
[4] W. Wang, X. Shen, B. Yi, H. Zhang, J. Liu, C. Dai, Knowledge-aware fine-grained attention networks with refined knowledge graph embedding for personalized recommendation, Expert Syst. Appl. 249 (2024) 123710.
[5] J. Chen, Y. Lu, Y. Zhang, F. Huang, J. Qin, A management knowledge graph approach for critical infrastructure protection: Ontology design, information extraction and relation prediction, Int. J. Crit. Infrastruct. Prot. (ISSN: 1874-5482) 43 (2023) 100634.
[6] Y. Zhang, J. Chen, Z. Cheng, X. Shen, J. Qin, Y. Han, Y. Lu, Edge propagation for link prediction in requirement-cyber threat intelligence knowledge graph, Inform. Sci. (ISSN: 0020-0255) 653 (2024) 119770.
[7] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, Adv. Neural Inf. Process. Syst. 26 (2013).
[25] F. Xia, W. Cheng, A survey on privacy-preserving federated learning against poisoning attacks, Clust. Comput. 27 (10) (2024) 13565–13582.
[26] J. Chen, H. Yan, Z. Liu, M. Zhang, H. Xiong, S. Yu, When federated learning meets privacy-preserving computation, ACM Comput. Surv. (ISSN: 0360-0300) 56 (12) (2024).
[27] J. Xia, Z. Yue, Y. Zhou, Z. Ling, Y. Shi, X. Wei, M. Chen, WaveAttack: Asymmetric frequency obfuscation-based backdoor attacks against deep neural networks, Adv. Neural Inf. Process. Syst. 37 (2024) 43549–43570.
[28] P. Blanchard, E.M. El Mhamdi, R. Guerraoui, J. Stainer, Machine learning with adversaries: Byzantine tolerant gradient descent, Adv. Neural Inf. Process. Syst. 30 (2017).
[29] N.M. Jebreel, J. Domingo-Ferrer, FL-Defender: Combating targeted attacks in federated learning, Knowl.-Based Syst. 260 (2023) 110178.
[30] Z. Yue, J. Xia, Z. Ling, M. Hu, T. Wang, X. Wei, M. Chen, Model-contrastive learning for backdoor elimination, in: Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 8869–8880.
[31] H. Peng, H. Li, Y. Song, V. Zheng, J. Li, Differentially private federated knowledge graphs embedding, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM '21, Association for Computing Machinery, New York, NY, USA, ISBN: 9781450384469, 2021, pp. 1416–1425.
[32] Y. Hu, Y. Wang, J. Lou, W. Liang, R. Wu, W. Wang, X. Li, J. Liu, Z. Qin, Privacy risks of federated knowledge graph embedding: New membership inference attacks and personalized differential privacy defense, IEEE Trans. Dependable Secur. Comput. (2024).
[33] E. Zhou, S. Guo, Z. Ma, Z. Hong, T. Guo, P. Dong, Poisoning attack on federated knowledge graph embedding, in: Proceedings of the ACM Web Conference 2024, 2024, pp. 1998–2008.
[34] G. Xia, J. Chen, C. Yu, J. Ma, Poisoning attacks in federated learning: A survey, IEEE Access 11 (2023) 10708–10722.
[35] K. Toutanova, D. Chen, P. Pantel, H. Poon, P. Choudhury, M. Gamon, Representing text for joint embedding of text and knowledge bases, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1499–1509.
[36] M. Fey, J.E. Lenssen, Fast graph representation learning with PyTorch Geometric, in: ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[37] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M.I. Jordan, I. Stoica, Ray: A distributed framework for emerging AI applications, in: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), USENIX Association, Carlsbad, CA, ISBN: 978-1-939133-08-3, 2018, pp. 561–577.
[38] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980.