Computer Standards & Interfaces 97 (2026) 104113

Co-distillation-based defense framework for federated knowledge graph embedding against poisoning attacks

Yiqin Lu, Jiarui Chen∗, Jiancheng Qin

School of Electronic and Information Engineering, South China University of Technology, 510641, China

∗ Corresponding author. E-mail addresses: eeyqlu@scut.edu.cn (Y. Lu), ee_jrchen@mail.scut.edu.cn (J. Chen), jcqin@scut.edu.cn (J. Qin).
https://doi.org/10.1016/j.csi.2025.104113
Received 3 June 2025; Received in revised form 8 November 2025; Accepted 8 December 2025; Available online 9 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Keywords: Federated learning; Knowledge graph; Poisoning attack; Knowledge distillation

ABSTRACT

Federated knowledge graph embedding (FKGE) enables collaborative knowledge sharing without data exchange, but it also introduces the risk of poisoning attacks that degrade model accuracy or force incorrect outputs. Protecting FKGE from poisoning attacks has therefore become a critical research problem. This paper reveals the malicious strategy of untargeted FKGE poisoning attacks and proposes CoDFKGE, a co-distillation-based FKGE framework for defending against poisoning attacks. CoDFKGE deploys two collaborative knowledge graph embedding models on each client, decoupling prediction parameters from shared parameters as a model-agnostic solution. By designing distinct distillation loss functions, CoDFKGE transfers clean knowledge from potentially poisoned shared parameters while compressing embedding dimensions to reduce communication overhead. Experiments show that CoDFKGE preserves link prediction performance with lower communication costs, eliminates malicious manipulations under targeted poisoning attacks, and significantly mitigates accuracy degradation under untargeted poisoning attacks.

1. Introduction

Knowledge graphs (KGs) are structured representations of real-world entities and their relationships, supporting applications in search engines [1,2], recommendation systems [3,4], and security analysis [5,6]. Knowledge graph embedding (KGE) techniques project entities and relations into low-dimensional vector spaces, enabling efficient knowledge reasoning and completion [7]. Due to privacy regulations and data sensitivity requirements, KGs across organizations within the same domain remain fragmented despite growing data volumes. In this context, federated knowledge graph embedding (FKGE) has emerged as a collaborative learning technique for sharing KG embeddings without data exchange. However, the introduction of federation mechanisms brings new security risks: malicious participants can inject poisoned parameters during training or aggregation to launch a poisoning attack, degrading model accuracy or forcing incorrect outputs. Consequently, protecting FKGE systems against poisoning attacks has emerged as a critical research challenge.

Unlike graph neural network (GNN)-based models, KGE models usually rely on translation-based models [8–11], in which the embedding vectors of the entities and relations in the KG are directly used as learnable parameters. KGE models utilize different score functions to measure the plausibility of triples (h, r, t). By contrasting the outputs of existing triples and negatively sampled triples, KGE models derive appropriate embeddings for entities and relations. However, the real-world KGs of different organizations are often incomplete, making it difficult to train high-quality knowledge graph reasoning models. Moreover, KG data often contains a large amount of private data, and direct data sharing would inevitably lead to privacy leakage. For this reason, federated learning [12] has been introduced into knowledge graph reasoning.

FKGE assumes that there are multiple participants with complementary but incomplete KGs, and aims to derive optimal knowledge embeddings for each participant without data exchange. Most existing studies [13–15] model FKGE as multiple clients that maintain local KGE models and a central server. Clients train models locally and upload the model parameters to the central server, which aggregates the parameters and then returns them to the clients.

However, since the embedding vectors are directly the model parameters, FKGE is highly vulnerable to poisoning attacks. With the intent to reduce model performance, steal sensitive information, or disrupt system stability, poisoning attacks maliciously modify parameters during local training or during parameter aggregation on the server. To protect the participants of FKGE, it is necessary to propose a protection mechanism against FKGE poisoning attacks.

Moreover, other aspects of FKGE also deserve attention. For example, the federated learning of KGE requires frequent parameter exchange, and a translation-based model submits the entity or relation embeddings themselves, which makes the communication overhead greater than that of traditional federated learning.
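The score-function idea discussed above can be made concrete with a small, self-contained sketch. This is an illustrative example only, with toy NumPy vectors and hypothetical values, using a TransE-style score g(h, r, t) = −‖h + r − t‖₁ so that higher scores mean more plausible triples:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility score: higher means more plausible.

    Uses g(h, r, t) = -||h + r - t||_1, so a triple whose head
    embedding, translated by the relation, lands near the tail
    embedding receives a score close to 0.
    """
    return -np.linalg.norm(h + r - t, ord=1)

# Toy 4-dimensional embeddings (illustrative values only).
h = np.array([0.1, 0.2, 0.3, 0.4])
r = np.array([0.5, 0.1, -0.1, 0.0])
t_pos = np.array([0.6, 0.3, 0.2, 0.4])    # matches h + r: a plausible tail
t_neg = np.array([-0.9, 0.8, 0.7, -0.5])  # a randomly replaced tail entity

# The positive triple scores strictly higher than the corrupted one.
assert transe_score(h, r, t_pos) > transe_score(h, r, t_neg)
```

Training then pushes the scores of observed triples above those of negatively sampled triples, which is exactly the contrast the loss in Eq. (1) formalizes.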
Knowledge distillation [16] is a model compression technique that improves the performance of a simple (student) model by transferring knowledge from a complex (teacher) model. Distillation-based methods are considered a feasible solution to combat poisoning attacks [17–19]: a teacher model can extract clean knowledge from the poisoned parameters and transfer it to a student model, thereby improving robustness without changing the model structure. Co-distillation [20] is a variant of knowledge distillation that trains two or more models simultaneously, allowing mutual learning and information sharing. This paper aims to design a federated knowledge graph defense framework based on co-distillation, which can enhance the model's resistance to poisoning attacks through collaborative learning without changing the original FKGE architecture.

The rest of this paper is organized as follows. Section 2 reviews related work on FKGE and knowledge distillation. Section 3 introduces the preliminary concepts and methodologies essential for addressing FKGE poisoning attacks, with the main contributions of this paper summarized at the end of this section. In Section 4, we detail the threat model and the malicious strategies of targeted and untargeted poisoning attacks in FKGE. Section 5 presents the CoDFKGE framework for defending against FKGE poisoning attacks, followed by experimental validation in Section 6. Finally, concluding remarks and future research directions are outlined in Section 7.

2. Related work

2.1. Basic FKGE framework

Early research on FKGE mainly focused on achieving cross-client knowledge sharing and model aggregation while protecting data privacy. FedE [13] is the first work to introduce federated learning into KGE; it facilitates cross-client knowledge sharing by maintaining an entity table. Nevertheless, FedE's mechanism of sharing entity embeddings has been proven to contain privacy vulnerabilities [21]: attackers can leverage the embedding information to infer the existence of private triples within client datasets. Based on FedE, FedEC [14] applies embedding contrastive learning to tackle data heterogeneity and utilizes a global update procedure for sharing entity embeddings. In response to the privacy vulnerability of FedE, FedR [15] proposed a privacy-preserving relation embedding aggregation method. By sharing relation embeddings instead of entity embeddings, FedR can significantly reduce both the communication overhead and the privacy leakage risk while retaining the semantic information of the KG.

2.2. Knowledge distillation in FKGE

Knowledge distillation techniques are widely applied in the FKGE field due to their advantages in model compression and knowledge transfer. To cope with the drift between local optimization and global convergence caused by data heterogeneity, FedLU [22] proposes mutual knowledge distillation; it also contains an unlearning method to erase specific knowledge from local clients. FedKD [23] uses knowledge distillation to reduce communication costs and proposes to adaptively learn a temperature to scale the scores of triples, mitigating teacher over-confidence. Beyond FKGE, the KGE model ColE [24] proposes co-distillation learning to exploit the complementarity of graph structure and text information: it employs a Transformer and BERT for graph and text respectively, then distills selective knowledge from each other's prediction logits. Overall, existing research on knowledge distillation in FKGE primarily focuses on handling data heterogeneity, with insufficient exploration of its potential value for model security. This paper explores the application of knowledge distillation in FKGE security to defend against poisoning attacks.

2.3. Poisoning attack in federated learning

Federated learning (FL), due to its distributed training nature, creates favorable conditions for poisoning attacks even as it protects data privacy. Poisoning attacks in federated learning have attracted significant attention from researchers [25]. In federated learning scenarios, poisoning attacks pose serious threats to model security by manipulating part of the training data or the local models to embed malicious behaviors [26]. The literature [27] generates stealthy backdoor triggers by extracting high-frequency features from images using the discrete wavelet transform and introduces an asymmetric frequency confusion mechanism, achieving efficient backdoor attacks on multiple datasets. Meanwhile, many studies have proposed defense methods against poisoning attacks. The literature [28] proposes the Krum method, which selects the most reliable gradient update by evaluating the consistency of gradients, thereby effectively defending against poisoning attacks. The literature [29] proposes FL-Defender, which improves robustness by introducing cosine similarity to adjust the weights of parameter aggregation. The literature [30] proposed MCLDef, a two-stage backdoor defense method based on model contrastive learning (MCL), which can significantly reduce the success rate of backdoor attacks with only a small amount of clean data. In summary, existing research on poisoning attacks in federated learning mainly focuses on traditional deep learning domains. The design ideas of these defense frameworks lay the foundation for subsequent poisoning attack defense methods for FKGE.

2.4. Security issues in FKGE

With the development of FKGE, its security and privacy issues have attracted increasing attention, with existing research mainly focusing on defending against privacy leakage. The literature [31] proposed a decentralized scalable learning framework where embeddings from different KGs can be learned in an asynchronous and peer-to-peer manner while being privacy-preserving. The literature [21] conducts the first holistic study of the privacy threats to FKGE from both attack and defense perspectives; it introduces three new inference attacks and proposes DP-Flames, a differentially private FKGE model with private selection and an adaptive privacy budget allocation policy. Building on [21], the literature [32] introduces five new inference attacks and proposes PDP-Flames, which leverages the sparse gradient nature of FKGE for a better privacy–utility trade-off.

Compared with privacy leakage, research on defending against poisoning attacks in FKGE is still in its early stages. Traditional federated learning typically does not transmit original embeddings directly. In translation-based KGE, however, entity and relation embeddings are core components, so direct transmission of embeddings is required during FKGE aggregation. Direct malicious modifications to embeddings are difficult to defend against effectively using traditional federated learning defense methods.

The recent literature [33] is the first work to systematize the risks of FKGE poisoning attacks. However, it primarily focuses on several forms of targeted poisoning attacks in FKGE, without addressing untargeted poisoning attacks. Although it provides some defense suggestions, such as zero-knowledge proofs and private set intersection, it does not propose concrete defense methods. In summary, the existing research lacks a systematic treatment of untargeted poisoning attacks on FKGE, and there is no complete defense method against FKGE poisoning attacks.

To address the above issues, this paper reveals the malicious strategy of untargeted FKGE poisoning attacks and proposes CoDFKGE, a co-distillation-based federated knowledge graph embedding framework for defending against poisoning attacks. The main contributions of this paper are summarized as follows.
1. We systematically define untargeted poisoning attacks in FKGE and reveal the attacks' malicious strategy, thereby enhancing threat identification in FKGE and providing a foundation for subsequent defense research.

2. We propose CoDFKGE, the first co-distillation defense framework against poisoning attacks in FKGE. By deploying bidirectional distillation models with distinct distillation losses on the client side, CoDFKGE decouples prediction parameters from shared parameters as a model-agnostic solution, thereby enhancing the model's resistance to poisoning attacks and improving robustness. We design distinct distillation loss functions for the two models in CoDFKGE, enabling it to transfer clean knowledge from potentially poisoned shared parameters and to compress the shared parameter dimensions, which reduces communication overhead.

3. We validate the performance of CoDFKGE against poisoning attacks through experiments. The results show that, without compromising link prediction performance, CoDFKGE can completely eliminate targeted poisoning attacks and significantly mitigate the performance degradation caused by untargeted poisoning attacks, while simultaneously reducing communication overhead. Ablation experiments further confirm the effectiveness of the two distillation loss functions in CoDFKGE.

3. Preliminaries

3.1. Knowledge graph embedding

A KG can be represented as (ℰ, ℛ, 𝒯), where ℰ and ℛ are the entity set and the relation set. 𝒯 is a set of triples, where a triple (h, r, t) ∈ 𝒯 indicates that a relation r ∈ ℛ connects the entities h, t ∈ ℰ.

Translation-based KGE models project the entities and relations of a KG into a continuous vector space. Models employ a scoring function g(h, r, t; θ) to evaluate the plausibility of triples, where θ represents the embedding parameters. During model training, negative samples (h, r, t′) are constructed by randomly replacing the tail entities of positive triples. The training process aims to maximize the score discrepancy between positive and negative samples. Currently, most KGE models [9,11] employ a binary cross-entropy loss to measure the difference between positive and negative samples. Its mathematical expression is given in Eq. (1).

L = − Σ_{(h,r,t)∈𝒯} [ log σ(g(h, r, t; θ) − γ) + Σ_i p(h, r, t′_i; θ) log σ(γ − g(h, r, t′_i; θ)) ]  (1)

Here, γ represents the margin, and (h, r, t′_i) is the i-th negative triple. p(h, r, t′_i; θ) stands for the occurrence probability of this negative sample given the embedding parameters θ.

3.2. Federated knowledge graph embedding

FKGE is an application of federated learning that aims to fuse and share knowledge vectors from different KGs to enhance the effectiveness of KGE. Currently, most related studies are based on the framework proposed in FedE [13].

The basic framework of FKGE consists of a client set C and a central server S. Each client c ∈ C holds a local KG 𝒢_c(ℰ_c, ℛ_c, 𝒯_c). The entity sets of different KGs partially overlap, so the understanding of entities in one client can be supplemented by information from other clients. The server holds the one-hot existence matrix M ∈ ℝ^{C×N} over all client entities, where N is the number of entities.

On each client, the KGE model parameters consist of local parameters θ_L and shared parameters θ_S. During FKGE training, each epoch progresses through two sequential phases: client update and server aggregation. In the k-th client update stage, client c first trains its local KGE model to update its local embedding θ^k_{L_c} and its server-shared embedding θ^k_{S_c}. Then, client c uploads its shared embedding θ^k_{S_c} to the server. In the server aggregation stage, the central server S aggregates the shared embeddings from all clients to obtain the shared parameters θ^{k+1}_S. Finally, the server broadcasts the shared parameters θ^{k+1}_S to all clients. Entity embeddings in KGE are usually the shared parameters, while relation embeddings are local parameters; only rare literature [15] uses relation embeddings as shared parameters.

How the server effectively aggregates the shared embeddings from different clients is a common problem in FKGE. The most common FKGE server aggregation method is that of FedE [13], an improvement on FedAvg [12]. To handle the imbalance in the number of entities across different clients, FedE aggregates the shared entities using the number of occurrences in the local data as the weight w_c; this weight can be obtained from the existence matrix M mentioned above. The mathematical expression for FedE's server aggregation method is shown in Eq. (2).

θ^{k+1}_S = Σ_c w_c θ^k_{S_c}  (2)

The final target of FKGE is to minimize the loss functions over all clients' local triples simultaneously through federated learning. Its optimization objective can be expressed as Eq. (3).

arg min_{(θ_{L_c}, θ_{S_c})} Σ_{c=1}^{C} ℒ_c(θ_{L_c}, θ_{S_c})  (3)

3.3. Knowledge distillation

Knowledge distillation is a model compression technique that transfers the knowledge contained in a complex (teacher) model to a simple (student) model to improve the performance of the simple model. In the classic knowledge distillation framework, the student model's training loss comprises two components: the cross-entropy loss L_CE, computed between its output and the true label, and the distillation loss L_KD, computed between its output and the teacher model's output (the soft label). In practical applications, the distillation loss is usually quantified using the Kullback–Leibler divergence D_KL between the student model's output and the soft label; its mathematical expression is shown in Eq. (4).

D_KL(p_tea ∥ p_stu) = Σ_i p_tea(i) log( p_tea(i) / p_stu(i) )
L_KD = τ² D_KL( σ(z^{(n)}_tea) ∥ σ(z^{(n)}_stu) ),  where σ(x) = softmax(x / τ)  (4)

Here, z_tea and z_stu are the logits of the teacher model and the student model, respectively, and τ is the temperature coefficient, which controls the smoothness of the output.

To allow the student model to effectively absorb the knowledge contained in the teacher model while fitting the real data distribution, the final loss function is usually a weighted sum of L_CE and L_KD.

4. Threat model

Poisoning attacks in federated learning can be categorized into targeted, semi-targeted, and untargeted poisoning attacks according to the intention of the attacker [34]. In FKGE, a semi-targeted poisoning attack can be regarded as a special case of a targeted poisoning attack; therefore, this paper focuses on the targeted and untargeted attack types.
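The occurrence-weighted server aggregation of Eq. (2) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation; in particular, normalizing the weights per entity over the clients that hold it (derived from the existence matrix M) is our assumption:

```python
import numpy as np

def fede_aggregate(client_embs, M):
    """Occurrence-weighted aggregation of shared entity embeddings, as in Eq. (2).

    client_embs: array of shape (C, N, d), each client's entity table.
    M: existence matrix of shape (C, N); M[c, e] = 1 if client c holds entity e.
    Entities a client does not hold contribute nothing; the weights are
    normalized per entity over the clients that actually hold it.
    """
    counts = M.sum(axis=0)             # (N,) number of clients holding each entity
    w = M / np.maximum(counts, 1)      # (C, N) per-client, per-entity weights
    # Weighted sum over clients: result has shape (N, d).
    return np.einsum('cn,cnd->nd', w, client_embs)

# Two clients, three entities, 2-dimensional embeddings (toy values).
M = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
embs = np.array([[[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]],
                 [[3.0, 3.0], [0.0, 0.0], [4.0, 4.0]]])
agg = fede_aggregate(embs, M)
# Entity 0 is held by both clients and is averaged; entities 1 and 2 by one each.
assert np.allclose(agg, [[2.0, 2.0], [2.0, 2.0], [4.0, 4.0]])
```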
4.1. Targeted poisoning attack

Fig. 1. Process of targeted poisoning attack.

Fig. 2. Framework of CoDFKGE model.

A targeted poisoning attack is an attack strategy in which the attacker crafts specific malicious triples that do not exist in the target system and manipulates the target model into accepting these fake triples by injecting poisoned parameters into the shared parameters. This type of attack poses a serious threat to the application of FKGE, as the false relationships it introduces can lead to reasoning errors and decision-making biases in downstream tasks. For example, in financial transaction networks, a knowledge graph may be constructed with transaction entities as nodes and transaction relationships as edges; link prediction can then be applied to detect potential transaction relationships (such as money laundering or fraud). If an attacker compromises one of the participants, they can introduce false transaction relationships through targeted poisoning attacks, leading to unreasonable inferences about the victim entity.

To execute such an attack successfully, the attacker typically follows a multi-stage process that begins with gathering the victim's local information. Fig. 1 shows the process of a targeted poisoning attack. In FKGE systems, while the server can observe which entities and relations each client possesses, it lacks visibility into how these elements are structured into specific triples. However, for frameworks that share entity embeddings (such as FedE [13]), recent research [21] has shown that a malicious server can use the KGE scoring function to infer the victim's local relationship patterns and reconstruct the victim's triples 𝒯_v. Armed with this inferred knowledge, the attacker strategically constructs malicious triples 𝒯_m that align with the victim's existing KG schema but represent false information.

The next critical attack phase involves training a shadow model, a surrogate KGE model designed to mimic the victim's learning process. The shadow model is trained on a poisoned dataset 𝒯_p, which combines the inferred victim triples 𝒯_v and the malicious triples 𝒯_m. This training strategy ensures the shadow model learns to generate embeddings that are consistent with both the victim's genuine knowledge and the attacker's deceptive information. The shadow model's parameters include θ_{S_p}, which can be initialized with the victim's shared parameters θ_{S_c}, and θ_{L_p}, which approximates the victim's local model parameters θ_{L_c} from random initial values. To ensure the shadow model effectively bridges the victim's genuine knowledge and the attacker's malicious objectives, its parameters are optimized to minimize the loss function across all triples in the poisoned dataset, as formalized in Eq. (5).

arg min_{(θ_{S_p}, θ_{L_p})} Σ_{(h,r,t)∈𝒯_p} L(h, r, t; θ_{S_p}, θ_{L_p})  (5)

where L is the loss function of the baseline model.

After training the shadow model, the attacker extracts the poisoned shared parameters θ_{S_p} using the same procedure that legitimate clients employ to prepare parameters for server aggregation. The attacker can then aggregate the poisoned parameters θ_{S_p} with the normal clients' shared parameters. The attacker usually operates as a compromised server and assigns a disproportionately high weight to the poisoned parameters during the aggregation process, ensuring that the poisoned parameters dominate the aggregated shared parameters.
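The weight-inflation step described above can be illustrated with a toy sketch of a compromised aggregator that mixes the poisoned parameter table into the honest average with a disproportionate weight α. All names and values here are hypothetical, for illustration only:

```python
import numpy as np

def poisoned_aggregate(honest_params, poisoned_params, alpha=0.9):
    """Toy model of a compromised server's aggregation step.

    honest_params: list of honest clients' shared-parameter arrays (same shape).
    poisoned_params: the attacker's shadow-model shared parameters.
    alpha: disproportionately high weight assigned to the poisoned table,
           so that it dominates the broadcast aggregate.
    """
    honest_avg = np.mean(honest_params, axis=0)
    return alpha * poisoned_params + (1.0 - alpha) * honest_avg

# Two honest clients whose shared tables sit at 0; poison sits at 1.
honest = [np.zeros((4, 2)), np.zeros((4, 2))]
poison = np.ones((4, 2))
agg = poisoned_aggregate(honest, poison, alpha=0.9)
# With alpha = 0.9, the aggregate lies close to the poisoned table.
assert np.allclose(agg, 0.9)
```

This is exactly the failure mode CoDFKGE targets: whatever the server broadcasts is never trusted directly by the prediction model.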
The final stage of the attack exploits the implicit trust in federated systems. The victim client, unaware of the poisoning, directly incorporates the compromised aggregated parameters into its local training process without validation. As a result, the victim's model gradually learns to accept the malicious triples as valid, ultimately producing incorrect predictions on these non-existent relationships while maintaining seemingly normal performance on other parts of the KG.

4.2. Untargeted poisoning attack

The conditions for achieving a targeted poisoning attack are demanding. For example, FedR [15] shares only relation embeddings (not entity embeddings), preventing attackers from inferring victim relations via entity matrices and thus avoiding targeted poisoning attacks. Even when relational data leaks, targeted poisoning attacks remain difficult: compared with sharing entity embeddings, the sparsity of relation embeddings reduces the shadow model's ability to align its parameters with the victim's vector space. However, FedR has almost no defense against untargeted poisoning attacks.

In an untargeted poisoning attack, the attacker aims to disrupt the victim model's convergence or to maximize the mispredictions among test cases. By maximizing the victim's loss function during training, attackers can force non-convergent predictions. The attacker generates the poisoned shared parameters θ*_{S_v} for the victim, which can be formalized as Eq. (6).

arg max_{θ*_{S_v}} Σ_{(h,r,t)∈𝒯_v} L(h, r, t; θ*_{S_v}, θ_{L_v})  (6)

Here, θ_{L_v} denotes the victim's local parameters and 𝒯_v is the victim's triple set. Since it is difficult for the attacker to obtain these two quantities directly, they can use random values as a guess for θ_{L_v} and triples of random combinations of the victim's entities and relations as a guess for 𝒯_v.

In particular, for the TransE model [7] with the scoring function g(h, r, t) = |h + r − t|, the attacker can launch an untargeted poisoning attack by setting the shared parameters θ′_{S_v} sent to the victim to identical values, or by using negated aggregation parameters. To avoid detection, noise is often added to the poisoned parameters. The prediction performance of the victim model may then fall even below that of standalone training without federated aggregation.

In general, the success of FKGE poisoning attacks relies on victims directly using attacker-provided aggregated parameters for training without validation. To prevent poisoning attacks, it is critical to isolate the parameters of the prediction model from externally provided aggregated parameters. Specifically, potentially poisoned shared parameters must be filtered before training, and parameter exposure to the external environment should be minimized. Therefore, we propose CoDFKGE, a defense FKGE framework based on co-distillation.

5. Model design

CoDFKGE is a training framework on the client side. Its training process is shown in Fig. 2. CoDFKGE initializes two baseline models with the same structure and scoring function but different purposes. The communication model is mainly responsible for receiving and processing shared parameters, while the prediction model is used for the final embedding and prediction. To minimize potential parameter leakage and communication overhead, the feature dimension of the communication model is intentionally designed to be smaller than that of the prediction model.

During the training process, the two models learn collaboratively through knowledge distillation. Once the communication model receives the potentially poisoned shared parameters from the server, it acts as a teacher model and transfers clean knowledge to the prediction model. Following the training of the prediction model, the roles are reversed: the prediction model becomes the teacher, and the communication model serves as the student for distillation. This stage extracts knowledge from the prediction model and compresses it into the communication model, ensuring efficient knowledge sharing while minimizing parameter exposure and communication overhead. By deploying two distinct model instances, the framework physically isolates attacker-injected parameters from the prediction model's parameters, making poisoning attacks significantly more difficult to execute. To facilitate the reproducibility of our CoDFKGE model, we provide the complete training framework pseudocode in Algorithm 1.

Algorithm 1 CoDFKGE Training Framework
Require: Baseline KGE model g, training triples 𝒯, learning rate η, distillation weight β, distillation temperature τ, total iterations K
Initialization:
1: Initialize the client-side prediction model with θ^P_0 = (θ^S_0, θ^L_0) ⊳ Local parameters randomly initialized
2: Initialize the client-side communication model with reduced feature dimensions
3: Initialize the server-side aggregated parameters θ^S_1 = θ^S_0 ⊳ First-round initialization
Main training loop (iterations k = 1, 2, ..., K):
// Client update phase (for each client)
4: for each client c ∈ C do
5:   // Step 1: Communication-to-prediction model distillation
6:   Load the server-shared parameters θ^S_k ⊳ Latest global shared embeddings
7:   Initialize the communication model with θ^C = (θ^S_k, θ^{C_L}_{k−1})
8:   Freeze the communication model parameters ⊳ Acts as the teacher model
9:   Compute the distillation loss L^{P_k}_{KD} using Eq. (7) ⊳ Positive samples only
10:  Compute the KGE loss L^{P_k}_{KGE} on the training triples 𝒯
11:  Update the prediction model parameters (θ^{P_S}_k, θ^{P_L}_k) with:
12:  ∇θ^P_k = ∇(β L^{P_k}_{KGE} + (1 − β) L^{P_k}_{KD}) ⊳ Gradient flows through the prediction model only
13:  θ^P_k = θ^P_k − η ∇θ^P_k, where θ^P_k = {θ^{P_L}_k, θ^{P_S}_k} ⊳ Update prediction model parameters
14:  Unfreeze the communication model parameters
15:  // Step 2: Prediction-to-communication model distillation
16:  Freeze the prediction model parameters θ^P_k ⊳ Used as the teacher model
17:  Compute the distillation loss L^{C_k}_{KD} using Eq. (9) ⊳ Both positive and negative samples
18:  Update the communication model parameters (θ^{C_S}_k, θ^{C_L}_k) with:
19:  ∇θ^C_k = ∇L^{C_k}_{KD} ⊳ Gradient flows through the communication model only
20:  θ^C_k = θ^C_k − η ∇θ^C_k, where θ^C_k = {θ^{C_S}_k, θ^{C_L}_k}
21:  Upload the updated shared parameters θ^{C_S}_k to the server
22:  Unfreeze the prediction model parameters
23: end for
// Server aggregation phase
24: The server aggregates θ^S_{k+1} from all clients using the baseline federated aggregation method
25: Set k = k + 1 and repeat the main loop until k > K ⊳ Continue the main training loop
return The final prediction model parameters of each client

CoDFKGE is designed to be model-agnostic, enabling seamless integration with diverse FKGE models according to their shared parameter types. Both the communication and prediction models used by CoDFKGE clients utilize the same scoring function g as the original KGE model. Clients upload and utilize shared parameters identically to the baseline model, with these parameters maintaining the same form and dimensionality as the original implementation. This parameter compatibility enables the server to aggregate updates using existing federated learning aggregation methods without modification. This design ensures that CoDFKGE preserves the original knowledge representation capabilities while maintaining operational semantics consistent with the baseline model.
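The temperature-scaled distillation loss that Algorithm 1 computes in Step 1 (Eq. (7)) can be sketched as follows. This is a minimal stand-in, not the authors' implementation: it assumes the softmax σ is taken over a batch of positive-triple scores, with the teacher scores coming from the communication model and the student scores from the prediction model:

```python
import numpy as np

def softmax(x, tau):
    """Temperature-scaled softmax over a 1-D array of scores."""
    z = np.asarray(x, dtype=float) / tau
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_scores, student_scores, tau=2.0):
    """Temperature-scaled KL distillation loss over triple scores (cf. Eq. (7)).

    teacher_scores: communication-model scores on a batch of positive triples.
    student_scores: prediction-model scores on the same batch.
    Returns tau^2 * KL(softmax(teacher/tau) || softmax(student/tau)).
    """
    p_tea = softmax(teacher_scores, tau)
    p_stu = softmax(student_scores, tau)
    return tau ** 2 * np.sum(p_tea * np.log(p_tea / p_stu))

# The loss vanishes when the student matches the teacher and grows otherwise.
same = distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distill_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
assert abs(same) < 1e-9 and diff > 0.0
```

Minimizing this quantity with the teacher frozen, as in lines 8–13 of Algorithm 1, moves only the student's parameters toward the teacher's score distribution.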
5.1. Communication-to-prediction model distillation

In the first iteration, the model trains the prediction component following the standard procedure. Starting from the second iteration, the communication model loads the server-shared parameters $\theta_k^S$ and initializes itself jointly with the local embeddings $\theta_{k-1}^{PL}$ from the previous iteration's prediction model.

After the communication model receives and applies the server-shared parameters, it filters out potentially poisoned model parameters through knowledge distillation: the communication model acts as a teacher that transfers clean knowledge to the prediction model, which serves as the student. During this process, the communication model's parameters are frozen so that knowledge is transferred strictly from the communication model to the prediction model; gradients flow only through the prediction model's parameters, preventing gradient leakage back into the potentially poisoned shared parameters.

If the communication model has been poisoned, its outputs for negative samples are unreliable, and distilling such uncertain predictions would propagate noise rather than useful knowledge. To exclude the poisoned knowledge, the prediction model therefore focuses on positive samples during distillation, ensuring that only trustworthy knowledge is transferred. The distillation loss of the prediction model in the $k$th training epoch is given in Eq. (7):

$$L_{KD}^{P_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{T}} D_{KL}\!\left(\sigma\big(g(h,r,t;\theta_k^{S},\theta_{k-1}^{PL})\big) \,\Big\|\, \sigma\big(g(h,r,t;\theta_k^{PS},\theta_k^{PL})\big)\right) \quad (7)$$

Here $\tau$ is the distillation temperature coefficient, and $\sigma$ is the softmax applied to the model output divided by $\tau$. $g$ is the scoring function of the prediction model, which is also used to compute the KGE loss. $g(h,r,t;\theta_k^{S},\theta_{k-1}^{PL})$ denotes the communication-model (teacher) output under the server-shared parameters $\theta_k^S$ and local parameters $\theta_{k-1}^{PL}$, and $g(h,r,t;\theta_k^{PS},\theta_k^{PL})$ denotes the output of the prediction model being trained.

During distillation, the prediction model must also account for the KGE loss. Its overall loss is the weighted sum of the KGE loss and the distillation loss, shown in Eq. (8):

$$L^{P_k} = \beta L_{KGE}^{P_k} + (1-\beta)\, L_{KD}^{P_k} \quad (8)$$

where $L_{KGE}^{P_k}$ is the KGE loss of the prediction model in the $k$th epoch, defined by Eq. (1), and $\beta$ is the weight.

5.2. Prediction-to-communication model distillation

After training the prediction model, we train the communication model through distillation, which extracts and propagates knowledge without directly sharing prediction parameters, thereby avoiding privacy leakage. During the communication model's distillation, the prediction model's outputs on positive and negative samples serve as soft labels. As Eq. (1) illustrates, the loss function must account for the probability of negative samples when balancing the impact of positive and negative predictions. The distillation loss of the communication model is therefore formalized in Eq. (9):

$$L_{KD}^{C_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{T}} \Big[ D_{KL}\!\left(\sigma\big(g(h,r,t;\theta_k^{PS},\theta_k^{PL})\big) \,\Big\|\, \sigma\big(g(h,r,t;\theta_k^{CS},\theta_k^{CL})\big)\right) + \sum_i p(h,r,t'_i)\, D_{KL}\!\left(\sigma\big(g(h,r,t'_i;\theta_k^{PS},\theta_k^{PL})\big) \,\Big\|\, \sigma\big(g(h,r,t'_i;\theta_k^{CS},\theta_k^{CL})\big)\right) \Big] \quad (9)$$

Here $g(h,r,t;\theta_k^{CS},\theta_k^{CL})$ denotes the communication-model output, and $g(h,r,t;\theta_k^{PS},\theta_k^{PL})$ denotes the prediction-model output under the shared parameters $\theta_k^{PS}$ and local parameters $\theta_k^{PL}$. The calculation of $p$ follows the self-adversarial sampling approach of [9], with its mathematical formulation given in Eq. (10):

$$p(h,r,t'_i) = \frac{\exp\big(\tau_\alpha\, g(h,r,t'_i)\big)}{\sum_j \exp\big(\tau_\alpha\, g(h,r,t'_j)\big)} \quad (10)$$

where $\tau_\alpha$ is the self-adversarial sampling temperature.

After the bidirectional distillation process of CoDFKGE, the communication-model parameters are updated to $\theta_k^{CS}$ and $\theta_k^{CL}$. Each client then uploads $\theta_k^{CS}$ to the server, which aggregates these parameters from all clients using federated averaging to generate the next round's shared parameters $\theta_{k+1}^S$.

6. Experiments

Experiments are conducted on the openly available dataset FB15K-237 [35], a subset of Freebase containing 14,505 entities, 474 relations, and 544,230 triples. To perform federated learning, we adopt the relational partitioning method of [22]. This method first partitions the relations through clustering, so that the relations within each partition are as similar as possible; the partitions are then divided into groups containing roughly equal numbers of triples and distributed to the clients. This yields tighter triple relationships within each client, better reflecting real-world scenarios.

The TransE model [7] is selected as the KGE model, serving as the foundation for all federated learning methods in the experiments, including the attacker's shadow model. To benchmark CoDFKGE, we select multiple baselines. First, the local training model without federated learning serves as the KGE baseline; it shares no parameters between clients, so it incurs no communication overhead and is not vulnerable to poisoning attacks. FedE [13] and FedR [15] are chosen as baseline FKGE models, representing standard approaches in the field. Additionally, we implement a distillation model that uses communication and prediction models similar to CoDFKGE but performs only unidirectional knowledge distillation: it uses the communication model as the teacher and the prediction model as the student to filter out poisoning knowledge, with the distillation loss following Eq. (4).

All experiments are performed on a 72-core Ubuntu 18.04.6 LTS machine with an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20 GHz and a V100S-PCIE-32GB GPU. We implemented the proposed FKGE framework and the baseline models on top of PyTorch Geometric [36] and the distributed AI framework Ray [37]. KGE hyperparameter settings follow [9], and FKGE hyperparameter settings follow FedE [13]. Specifically, we use the Adam [38] optimizer with a learning rate of 1e-3; $\gamma$ is 10, and the self-adversarial negative sampling temperature $\tau_\alpha$ in KGE is 1. The distillation temperature $\tau$ is 2, and the weight $\beta$ balancing the distillation and KGE losses is 0.5. The maximum number of training epochs is 400. In each epoch, a client performs 3 local iterations before uploading its parameters to the server.

We use the link prediction task, a sub-task of KGE, to validate model accuracy. Following the common implementation of link prediction, we employ the Mean Reciprocal Rank (MRR) and Hits@N as accuracy metrics. The MRR is the average of the reciprocals of the ranks of the predicted triples among all candidate triples: if $rank_i$ is the rank of the correct triple for the $i$th query and $n$ is the total number of queries, then $MRR = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{rank_i}$. Hits@N is the proportion of queries for which the correct triple appears among the top $N$ candidates generated by the model. For both metrics, higher values indicate better link prediction performance.

Through the experiments, the following research questions will be verified.

RQ1 Does CoDFKGE maintain KGE prediction performance while reducing FKGE communication overhead?
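For concreteness, the distillation losses of Eqs. (7), (9), and (10) can be sketched in NumPy. This is an illustrative reconstruction, not the paper's PyTorch implementation; the array shapes and helper names are assumptions. Each triple's score vector is treated as logits over candidate entities, temperature-softmaxed and compared with KL divergence, and the communication-model loss reweights negative triples by the self-adversarial distribution $p$.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, axis=-1):
    """KL(p || q) between distributions along `axis`."""
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=axis)

def prediction_kd_loss(teacher_logits, student_logits, tau=2.0):
    """Eq. (7): distill on positive triples only.
    logits: (batch, n_candidates) scores over candidate entities."""
    return tau ** 2 * kl_div(softmax(teacher_logits / tau),
                             softmax(student_logits / tau)).sum()

def self_adversarial_weights(teacher_neg_scores, tau_alpha=1.0):
    """Eq. (10): weight p(h, r, t'_i) for each negative triple.
    teacher_neg_scores: (batch, n_neg) scalar scores g(h, r, t'_i)."""
    return softmax(tau_alpha * teacher_neg_scores, axis=-1)

def communication_kd_loss(t_pos, s_pos, t_neg, s_neg, neg_scores,
                          tau=2.0, tau_alpha=1.0):
    """Eq. (9): positive-sample term plus p-weighted negative-sample term.
    t_neg, s_neg: (batch, n_neg, n_candidates) logits per negative triple."""
    pos = kl_div(softmax(t_pos / tau), softmax(s_pos / tau)).sum()
    p = self_adversarial_weights(neg_scores, tau_alpha)
    neg = (p * kl_div(softmax(t_neg / tau), softmax(s_neg / tau))).sum()
    return tau ** 2 * (pos + neg)
```

Both losses vanish when teacher and student agree and grow as their temperature-softened distributions diverge, which is the behavior the freezing of the teacher's parameters relies on.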
RQ2 Can CoDFKGE effectively defend against targeted poisoning attacks?

RQ3 Can CoDFKGE effectively defend against untargeted poisoning attacks?

RQ4 Do the two proposed distillation loss functions individually contribute to poisoning defense?

6.1. Normal link prediction (RQ1)

To explore the performance of the proposed model in normal link prediction, we first test the models on the conventional dataset, measuring MRR, Hits@1, Hits@5, and Hits@10. The models are trained by federated learning and evaluated on the clients' local test sets.

Table 1 lists the performance of the local KGE model, FedE, FedR, and CoDFKGE at different dimensions. Results are grouped by the type of shared embeddings and the dimension of the prediction model; parameter dimensions are given in parentheses in the "Model" column. For example, CoDFKGE(32-128) denotes the CoDFKGE model with a 32-dimensional communication model and a 128-dimensional prediction model. All link prediction experiments were repeated 5 times with different random seeds, and accuracy is reported as mean ± standard deviation. The best result in each group (excluding the local model) is bolded, and results of CoDFKGE(32-128) that exceed Distillation(32-128) are underlined.

Table 1
Experiment results on normal link prediction.

Fed type | Model | Mem (MB) | CC (MB) | MRR | Hits@1 | Hits@5 | Hits@10
Local | Local(128) | 57.05 | – | 0.4081 ± 0.0015 | 0.3066 ± 0.0014 | 0.5223 ± 0.0023 | 0.6077 ± 0.0015
Entity | FedE(128) | 185.58 | 42.60 | 0.4082 ± 0.0004 | 0.3068 ± 0.0012 | 0.5232 ± 0.0013 | 0.6080 ± 0.0018
Entity | Distillation(128-128) | 356.10 | 42.60 | 0.4129 ± 0.0008 | 0.3118 ± 0.0016 | 0.5279 ± 0.0008 | 0.6122 ± 0.0003
Entity | CoDFKGE(128-128) | 356.10 | 42.60 | 0.4109 ± 0.0043 | 0.3097 ± 0.0041 | 0.5246 ± 0.0044 | 0.6087 ± 0.0040
Entity | Distillation(32-128) | 217.39 | 10.65 | 0.3914 ± 0.0011 | 0.2935 ± 0.0008 | 0.5005 ± 0.0014 | 0.5838 ± 0.0032
Entity | CoDFKGE(32-128) | 217.40 | 10.65 | 0.4090 ± 0.0010 | 0.3079 ± 0.0007 | 0.5233 ± 0.0019 | 0.6068 ± 0.0019
Relation | FedR(128) | 75.49 | 0.69 | 0.4085 ± 0.0011 | 0.3079 ± 0.0021 | 0.5219 ± 0.0016 | 0.6066 ± 0.0017
Relation | Distillation(128-128) | 151.74 | 0.69 | 0.4106 ± 0.0013 | 0.3092 ± 0.0023 | 0.5242 ± 0.0008 | 0.6098 ± 0.0009
Relation | CoDFKGE(128-128) | 150.02 | 0.69 | 0.4065 ± 0.0007 | 0.3056 ± 0.0013 | 0.5190 ± 0.0023 | 0.6063 ± 0.0012
Relation | Distillation(32-128) | 94.53 | 0.17 | 0.3920 ± 0.0012 | 0.2960 ± 0.0007 | 0.4996 ± 0.0019 | 0.5807 ± 0.0013
Relation | CoDFKGE(32-128) | 93.69 | 0.17 | 0.4078 ± 0.0009 | 0.3060 ± 0.0007 | 0.5224 ± 0.0031 | 0.6074 ± 0.0015

The performance of locally trained models is lower than that of most federated models, highlighting the benefit of sharing model parameters. The high-dimensional Distillation(128-128) models achieve the best link prediction performance; compared with them, CoDFKGE is slightly inferior, as the co-distillation process may cost some generalization accuracy. Comparing models of the same dimensions, however, CoDFKGE outperforms both the local baseline and the federated baselines (FedE, FedR). We believe the main advantage of CoDFKGE is the improved security of FKGE: in addition to the defensive performance demonstrated in Sections 6.2 and 6.3, it maintains link prediction performance comparable to its baseline FKGE models.

Beyond accuracy, the "CC" (communication cost) column reports the communication overhead per training epoch, calculated from the byte size of the PyTorch Embedding used in the implementation, and the "Mem" column reports the GPU memory usage of the federated models in MB. Distillation-based models must maintain two KGE models and therefore consume more computational resources, requiring larger GPU memory to store both parameter sets. In return, they can compress the parameters of the communication model, achieving significantly lower communication overhead than sharing full-size parameters. At this lower communication overhead, CoDFKGE(32-128) outperforms Distillation(32-128) in link prediction. We therefore conclude that CoDFKGE does not degrade the normal link prediction performance of baseline FKGE models and can effectively reduce their communication overhead.

6.2. Targeted poisoning attack experiment (RQ2)

In the targeted poisoning attack, 32 pairs of non-existent triples are selected as attack targets from the victim's KG through negative sampling to construct a poisoned triple dataset. First, a predetermined number of normal triples are selected from the victim's training triples. The head or tail entities of these triples are then randomly replaced, and any triples already present in the training set are iteratively removed until 32 pairs of non-existent triples have been constructed. In each epoch, the shadow model undergoes the same number of local training rounds as legitimate clients on the poisoned dataset to generate poisoned parameters. The malicious server aggregates these poisoned parameters with the parameters of the normal clients into the shared parameters and distributes them to all clients. Attackers can assign high weights to poisoned model parameters during aggregation; following the setup in Ref. [33], we set the aggregation weight of the attacker's poisoned triples to 256 times that of normal triples. Experiments focus on models that share entity parameters (required for targeted poisoning attacks) and on the non-federated local baseline. For space considerations, this section reports only the MRR and Hits@10 metrics.

Attack effectiveness is measured by the victim's MRR and Hits@10 on the poisoned triples: the higher these metrics, the greater the vulnerability to poisoning and the weaker the model's resistance to targeted attacks. The primary evaluation metrics are therefore the victim model's MRR and Hits@10 when predicting poisoned triples. If a model ranks a non-existent poisoned test triple among its top 10 candidates, the poisoning attack has successfully manipulated the model's predictions; we therefore use Hits@10 on poisoned triples as the Attack Success Rate (ASR).

Table 2 lists the performance of the baseline models and CoDFKGE under targeted poisoning attacks, grouped by the prediction-model dimension; parameter dimensions are given in parentheses in the "Model" column. The "All clients" column reports average performance across all clients' test sets during the attack, while "Victim poison" measures the victim's performance on predicting poisoned triples. The "Communication poison" column reports the communication model's performance on poisoned triples for CoDFKGE and the distillation model, showing that both communication models are impacted by targeted poisoning; through distillation, the prediction model's accuracy on poisoned triples decreases in both cases. All experiments were repeated 5 times with different random seeds, results are reported as mean ± standard deviation, and the best results are bolded.

Table 2
Experiment results under targeted poisoning attack.

Model | All clients MRR | All clients Hits@10 | Victim poison MRR | Victim poison Hits@10 (ASR) | Communication poison MRR | Communication poison Hits@10
Local(128, unpoisoned) | 0.4081 ± 0.0015 | 0.6077 ± 0.0015 | 0.0003 ± 0.0001 | 0.0000 ± 0.0000 | – | –
FedE(128) | 0.4034 ± 0.0035 | 0.6004 ± 0.0029 | 0.4450 ± 0.0938 | 0.7857 ± 0.1248 | – | –
Distillation(128-128) | 0.4026 ± 0.0025 | 0.6006 ± 0.0039 | 0.0844 ± 0.0552 | 0.2000 ± 0.1311 | 0.4999 ± 0.1429 | 0.7714 ± 0.1046
CoDFKGE(128-128) | 0.4086 ± 0.0007 | 0.6089 ± 0.0012 | 0.0010 ± 0.0003 | 0.0009 ± 0.0005 | 0.4694 ± 0.1511 | 0.6589 ± 0.1242
Distillation(32-128) | 0.3821 ± 0.0022 | 0.5717 ± 0.0018 | 0.1511 ± 0.3356 | 0.1960 ± 0.4362 | 0.4919 ± 0.2364 | 0.6625 ± 0.1887
CoDFKGE(32-128) | 0.3856 ± 0.0039 | 0.5740 ± 0.0054 | 0.0010 ± 0.0001 | 0.0010 ± 0.0003 | 0.3794 ± 0.0032 | 0.5702 ± 0.0050

The Local training model, which does not participate in federated learning, is immune to poisoning: its MRR on poisoned triples is near zero and its Hits@10 is exactly 0, i.e., the unpoisoned Local model never ranks a non-existent poisoned triple among its top 10 candidates. The FedE model maintains high prediction accuracy on normal test triples under attack, but exhibits abnormally high MRR and Hits@10 on the targeted poisoned triples, even exceeding those on normal triples; targeted poisoning can thus effectively manipulate FedE's predictions. In the distillation-based models, the communication models are severely affected by the attack, while the impact on the prediction models is comparatively minor. Although Distillation(128-128) can partially eliminate poisoning knowledge, it remains vulnerable to targeted poisoning, and as the dimension of the communication-model parameters increases, so does the model's vulnerability.

In contrast, CoDFKGE's prediction model distills exclusively from verified positive samples, effectively excluding the poisoning knowledge that may reside in negative samples. Like the Local model, CoDFKGE achieves extremely low MRR and Hits@10 on poisoned triples, fully demonstrating that it can defend against targeted poisoning attacks in FKGE. Furthermore, because the communication model's dimension is compressed, the amount of information an attacker can transmit is correspondingly reduced, making the communication model in CoDFKGE(32-128) less susceptible to poisoning.

Fig. 3. Performance degradation comparison.

6.3. Untargeted poisoning attack experiment (RQ3)

In the untargeted poisoning attack experiments, the attacker returns negated aggregate parameters to the victim client, preventing the victim model from converging and degrading its prediction performance.
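The attack mechanics can be sketched as follows. This is an illustrative NumPy reconstruction based only on the description above (the server hands the victim a negated aggregate); the function names are assumptions, not the attack code of [33].

```python
import numpy as np

def fedavg(client_embeddings):
    """Benign aggregation: element-wise mean of the clients' shared embeddings."""
    return np.mean(client_embeddings, axis=0)

def untargeted_poisoned_broadcast(client_embeddings, victim):
    """Untargeted attack sketch: every client receives the FedAvg aggregate,
    except the victim, who receives its negation. Repeated over rounds, this
    keeps pulling the victim's shared embeddings away from convergence."""
    aggregate = fedavg(client_embeddings)
    return [(-aggregate if cid == victim else aggregate)
            for cid in range(len(client_embeddings))]
```

In CoDFKGE, such a parameter vector only ever enters the frozen teacher of the first distillation stage, which is why the victim's prediction model can still fall back on its local KGE training signal.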
The results presented in this section reflect average prediction performance on the clients' local test triples.

Table 3 lists the performance of each model under untargeted poisoning attacks, grouped by the prediction-model dimension and the federated type; parameter dimensions are given in parentheses in the "Model" column. The "All clients" column shows the average performance of all clients under attack, and the "Victim" column shows the performance of the victim client. To measure the severity of the attack, the MRR of the local model in Table 1 is used as a benchmark: the "Decay ratio" column gives the victim client's performance degradation relative to that local model. All experiments were repeated 5 times with different random seeds, and results are reported as mean ± standard deviation; the best and second-best results in each group are marked in bold and underline.

Table 3
Experiment results under untargeted poisoning attack.

Fed type | Model | All clients MRR | All clients Hits@10 | Victim MRR | Victim Hits@10 | Decay ratio MRR (%) | Decay ratio Hits@10 (%)
Entity | FedE(128) | 0.3896 ± 0.0010 | 0.5939 ± 0.0009 | 0.3625 ± 0.0102 | 0.5620 ± 0.0144 | 11.21 | 7.58
Entity | Distillation(128-128) | 0.3900 ± 0.0017 | 0.5921 ± 0.0007 | 0.3641 ± 0.0012 | 0.5664 ± 0.0018 | 11.82 | 7.54
Entity | CoDFKGE(128-128) | 0.4084 ± 0.0007 | 0.6068 ± 0.0003 | 0.4017 ± 0.0010 | 0.6009 ± 0.0005 | 2.25 | 1.28
Entity | Distillation(32-128) | 0.3024 ± 0.0208 | 0.5422 ± 0.0105 | 0.2739 ± 0.0264 | 0.5262 ± 0.0124 | 30.02 | 9.49
Entity | CoDFKGE(32-128) | 0.4093 ± 0.0018 | 0.6081 ± 0.0014 | 0.4022 ± 0.0022 | 0.6023 ± 0.0011 | 1.66 | 0.75
Relation | FedR(128) | 0.3915 ± 0.0010 | 0.5951 ± 0.0016 | 0.3637 ± 0.0093 | 0.5636 ± 0.0150 | 10.96 | 7.10
Relation | Distillation(128-128) | 0.3978 ± 0.0017 | 0.6022 ± 0.0019 | 0.3881 ± 0.0023 | 0.5942 ± 0.0028 | 5.51 | 2.56
Relation | CoDFKGE(128-128) | 0.4086 ± 0.0017 | 0.6075 ± 0.0029 | 0.4014 ± 0.0020 | 0.6018 ± 0.0037 | 1.24 | 0.75
Relation | Distillation(32-128) | 0.3058 ± 0.0079 | 0.5463 ± 0.0029 | 0.2787 ± 0.0101 | 0.5307 ± 0.0038 | 27.78 | 8.61
Relation | CoDFKGE(32-128) | 0.4090 ± 0.0008 | 0.6066 ± 0.0011 | 0.4026 ± 0.0008 | 0.6018 ± 0.0013 | 1.27 | 0.92

The results show that, under untargeted poisoning attacks, the CoDFKGE models achieve the best MRR and Hits@10 among all models. All models exhibit some decline both in their overall performance and in the victim's performance. Fig. 3 compares the prediction performance of the models under normal link prediction and under untargeted poisoning. Distillation(32-128) suffers the largest degradation, and the degradation of Distillation(128-128), FedE, and FedR is also substantial: these models directly incorporate the poisoned global knowledge into their own parameters, which disrupts their convergence. In contrast, the performance degradation of the CoDFKGE models stays within 3%. This is because, even in the absence of usable global knowledge, the prediction model of CoDFKGE still trains on local data, so its training effectiveness remains comparable to that of a local KGE model without knowledge sharing.

Baseline models may thus have their results manipulated or suffer significant performance degradation under poisoning attacks; although the distillation models showed advantages in the link prediction experiments, their defensive effectiveness is very limited. CoDFKGE, by contrast, is not manipulated by targeted poisoning attacks and shows no significant performance degradation under untargeted ones, demonstrating effective defense against poisoning attacks.

6.4. Ablation study (RQ4)

This section evaluates the defensive effect of the different distillation loss functions used in CoDFKGE. Specifically, we compare models that use 128-dimensional parameters for both the communication and prediction models under normal link prediction, targeted poisoning, and untargeted poisoning. Two ablation baselines were implemented: Ablation(Comm) applies the baseline loss function (Eq. (4)) only during the communication model's distillation, while Ablation(Pred) applies it only during the prediction model's distillation.

Tables 4 and 5 show the results for models with different distillation losses sharing entity embeddings. All experiments were repeated 5 times with different random seeds, results are reported as mean ± standard deviation, and the best results are bolded.

Table 4
Ablation study in normal link prediction and under targeted attack.

Model | Link prediction MRR | Link prediction Hits@10 | Targeted all clients MRR | Targeted all clients Hits@10 | Targeted victim poisoning MRR | Targeted victim poisoning Hits@10 (ASR)
CoDFKGE | 0.4112 ± 0.0039 | 0.6084 ± 0.0036 | 0.4086 ± 0.0007 | 0.6089 ± 0.0012 | 0.0010 ± 0.0003 | 0.0009 ± 0.0005
Ablation(Comm) | 0.4095 ± 0.0016 | 0.6074 ± 0.0014 | 0.4086 ± 0.0022 | 0.6076 ± 0.0021 | 0.0017 ± 0.0008 | 0.0013 ± 0.0008
Ablation(Pred) | 0.4132 ± 0.0006 | 0.6116 ± 0.0012 | 0.4098 ± 0.0011 | 0.6080 ± 0.0009 | 0.8086 ± 0.0064 | 0.9702 ± 0.0228

Table 5
Ablation study under untargeted attack.

Model | Untargeted all clients MRR | Untargeted all clients Hits@10 | Untargeted victim MRR | Untargeted victim Hits@10 | Decay ratio MRR (%) | Decay ratio Hits@10 (%)
CoDFKGE | 0.4084 ± 0.0007 | 0.6068 ± 0.0003 | 0.4017 ± 0.0010 | 0.6009 ± 0.0005 | 2.25 | 1.27
Ablation(Comm) | 0.4056 ± 0.0017 | 0.6062 ± 0.0011 | 0.3996 ± 0.0018 | 0.6003 ± 0.0013 | 2.42 | 1.16
Ablation(Pred) | 0.3951 ± 0.0011 | 0.6022 ± 0.0008 | 0.3852 ± 0.0009 | 0.5951 ± 0.0005 | 6.76 | 2.69

The experimental results show that while Ablation(Pred) performs well in conventional link prediction, its resistance to poisoning attacks lags behind the other two models because its loss function does not employ the negative-sample exclusion strategy. Of the remaining two models, both are robust against poisoning, but CoDFKGE achieves better link prediction than Ablation(Comm): Ablation(Comm) uses the baseline loss during the communication model's distillation, whereas CoDFKGE adopts the approach of [9] and reweights negative samples with the self-adversarial sampling temperature $\tau_\alpha$, improving the model's ability to distinguish among negative samples. Overall, the ablation experiments demonstrate that applying the two proposed distillation losses together improves both poisoning defense and link prediction.

7. Conclusion

This paper proposes CoDFKGE, a co-distillation-based defense framework against FKGE poisoning attacks. As the first co-distillation defense framework against poisoning attacks in FKGE, CoDFKGE has some limitations: maintaining two separate models increases the computational resource consumption on clients, and the bidirectional distillation process may cost some generalization accuracy. In return, CoDFKGE is model-agnostic and applicable to existing FKGE models without compromising performance. By decoupling clients' prediction models from the shared-parameter models, CoDFKGE effectively filters out poisoned knowledge embedded in shared updates: it eliminates malicious manipulation under targeted poisoning attacks and significantly mitigates accuracy degradation under untargeted ones. Leveraging distillation, the framework further reduces communication overhead. This work provides new ideas for enhancing the security of FKGE.

The limitations of FKGE poisoning-defense research are partly rooted in the characteristics of KGE itself. With translation-based KGE models in FKGE, sharing entity or relation embeddings introduces risks for both privacy preservation and poisoning attacks. GNN-based KGE models that transmit GNN parameters or gradients can alleviate these concerns; however, owing to their superior robustness to sparse data and lower computational resource requirements, translation-based models still retain clear advantages in specific application scenarios.
For future research, we recommend exploring the application of the CoDFKGE framework in more complex real-world scenarios, such as personalized FKGE problems. Additionally, in large-scale dynamic KG environments, the security landscape for FKGE may undergo significant changes, necessitating further investigation into defense methods tailored to these evolving scenarios.

CRediT authorship contribution statement

Yiqin Lu: Supervision. Jiarui Chen: Writing – original draft, Software, Methodology. Jiancheng Qin: Writing – review & editing.

Declaration of Generative AI and AI-assisted technologies in the writing process

During the preparation of this work the author(s) used DeepSeek in order to improve language and readability. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is supported by the Special Project for Research and Development in Key Areas of Guangdong Province, under Grant 2019B010137001.

Data availability

Data will be made available on request.

References

[1] X. Zhao, H. Chen, Z. Xing, C. Miao, Brain-inspired search engine assistant based on knowledge graph, IEEE Trans. Neural Netw. Learn. Syst. 34 (8) (2021) 4386–4400.
[2] S. Sharma, Fact-finding knowledge-aware search engine, in: Data Management, Analytics and Innovation: Proceedings of ICDMAI 2021, vol. 2, Springer, 2021, pp. 225–235.
[3] Y. Jiang, Y. Yang, L. Xia, C. Huang, DiffKG: Knowledge graph diffusion model for recommendation, in: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, WSDM '24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 313–321.
[4] W. Wang, X. Shen, B. Yi, H. Zhang, J. Liu, C. Dai, Knowledge-aware fine-grained attention networks with refined knowledge graph embedding for personalized recommendation, Expert Syst. Appl. 249 (2024) 123710.
[5] J. Chen, Y. Lu, Y. Zhang, F. Huang, J. Qin, A management knowledge graph approach for critical infrastructure protection: Ontology design, information extraction and relation prediction, Int. J. Crit. Infrastruct. Prot. 43 (2023) 100634.
[6] Y. Zhang, J. Chen, Z. Cheng, X. Shen, J. Qin, Y. Han, Y. Lu, Edge propagation for link prediction in requirement-cyber threat intelligence knowledge graph, Inform. Sci. 653 (2024) 119770.
[7] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, Adv. Neural Inf. Process. Syst. 26 (2013).
[8] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, 2014.
[9] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, RotatE: Knowledge graph embedding by relational rotation in complex space, 2019, arXiv preprint arXiv:1902.10197.
[10] Z. Zhang, J. Jia, Y. Wan, Y. Zhou, Y. Kong, Y. Qian, J. Long, TransR*: Representation learning model by flexible translation and relation matrix projection, J. Intell. Fuzzy Systems 40 (5) (2021) 10251–10259.
[11] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, (1), 2018.
[12] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
[13] M. Chen, W. Zhang, Z. Yuan, Y. Jia, H. Chen, FedE: Embedding knowledge graphs in federated setting, in: Proceedings of the 10th International Joint Conference on Knowledge Graphs, 2021, pp. 80–88.
[14] M. Chen, W. Zhang, Z. Yuan, Y. Jia, H. Chen, Federated knowledge graph completion via embedding-contrastive learning, Knowl.-Based Syst. 252 (2022) 109459.
[15] K. Zhang, Y. Wang, H. Wang, L. Huang, C. Yang, X. Chen, L. Sun, Efficient federated learning on knowledge graphs via privacy-preserving relation embedding aggregation, 2022, arXiv preprint arXiv:2203.09553.
[16] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, 2015, arXiv preprint arXiv:1503.02531.
[17] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a defense to adversarial perturbations against deep neural networks, in: 2016 IEEE Symposium on Security and Privacy, SP, IEEE, 2016, pp. 582–597.
[18] K. Yoshida, T. Fujino, Countermeasure against backdoor attack on neural networks utilizing knowledge distillation, J. Signal Process. 24 (4) (2020) 141–144.
[19] K. Yoshida, T. Fujino, Disabling backdoor and identifying poison data by using knowledge distillation in backdoor attacks on deep neural networks, in: Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security, 2020, pp. 117–127.
[20] R. Anil, G. Pereyra, A. Passos, R. Ormandi, G.E. Dahl, G.E. Hinton, Large scale distributed neural network training through online distillation, 2018, arXiv preprint arXiv:1804.03235.
[21] Y. Hu, W. Liang, R. Wu, K. Xiao, W. Wang, X. Li, J. Liu, Z. Qin, Quantifying and defending against privacy threats on federated knowledge graph embedding, in: Proceedings of the ACM Web Conference 2023, 2023, pp. 2306–2317.
[22] X. Zhu, G. Li, W. Hu, Heterogeneous federated knowledge graph embedding learning and unlearning, in: Proceedings of the ACM Web Conference 2023, 2023, pp. 2444–2454.
[23] X. Zhang, Z. Zeng, X. Zhou, Z. Shen, Low-dimensional federated knowledge graph embedding via knowledge distillation, 2024, arXiv preprint arXiv:2408.05748.
[24] Y. Liu, Z. Sun, G. Li, W. Hu, I know what you do not know: Knowledge graph embedding via co-distillation learning, in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022, pp. 1329–1338.
[25] F. Xia, W. Cheng, A survey on privacy-preserving federated learning against poisoning attacks, Clust. Comput. 27 (10) (2024) 13565–13582.
[26] J. Chen, H. Yan, Z. Liu, M. Zhang, H. Xiong, S. Yu, When federated learning meets privacy-preserving computation, ACM Comput. Surv. 56 (12) (2024).
[27] J. Xia, Z. Yue, Y. Zhou, Z. Ling, Y. Shi, X. Wei, M. Chen, WaveAttack: Asymmetric frequency obfuscation-based backdoor attacks against deep neural networks, Adv. Neural Inf. Process. Syst. 37 (2024) 43549–43570.
[28] P. Blanchard, E.M. El Mhamdi, R. Guerraoui, J. Stainer, Machine learning with adversaries: Byzantine tolerant gradient descent, Adv. Neural Inf. Process. Syst. 30 (2017).
[29] N.M. Jebreel, J. Domingo-Ferrer, FL-Defender: Combating targeted attacks in federated learning, Knowl.-Based Syst. 260 (2023) 110178.
[30] Z. Yue, J. Xia, Z. Ling, M. Hu, T. Wang, X. Wei, M. Chen, Model-contrastive learning for backdoor elimination, in: Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 8869–8880.
[31] H. Peng, H. Li, Y. Song, V. Zheng, J. Li, Differentially private federated knowledge graphs embedding, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 1416–1425.
[32] Y. Hu, Y. Wang, J. Lou, W. Liang, R. Wu, W. Wang, X. Li, J. Liu, Z. Qin, Privacy risks of federated knowledge graph embedding: New membership inference attacks and personalized differential privacy defense, IEEE Trans. Dependable Secur. Comput. (2024).
[33] E. Zhou, S. Guo, Z. Ma, Z. Hong, T. Guo, P. Dong, Poisoning attack on federated knowledge graph embedding, in: Proceedings of the ACM Web Conference 2024, 2024, pp. 1998–2008.
[34] G. Xia, J. Chen, C. Yu, J. Ma, Poisoning attacks in federated learning: A survey, IEEE Access 11 (2023) 10708–10722.
[35] K. Toutanova, D. Chen, P. Pantel, H. Poon, P. Choudhury, M. Gamon, Representing text for joint embedding of text and knowledge bases, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1499–1509.
[36] M. Fey, J.E. Lenssen, Fast graph representation learning with PyTorch Geometric, in: ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[37] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M.I. Jordan, I. Stoica, Ray: A distributed framework for emerging AI applications, in: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), USENIX Association, Carlsbad, CA, 2018, pp. 561–577.
[38] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980.