Computer Standards & Interfaces 97 (2026) 104113

Co-distillation-based defense framework for federated knowledge graph embedding against poisoning attacks

Yiqin Lu, Jiarui Chen∗, Jiancheng Qin

School of Electronic and Information Engineering, South China University of Technology, 510641, China

∗ Corresponding author. E-mail addresses: eeyqlu@scut.edu.cn (Y. Lu), ee_jrchen@mail.scut.edu.cn (J. Chen), jcqin@scut.edu.cn (J. Qin).
https://doi.org/10.1016/j.csi.2025.104113
Received 3 June 2025; Received in revised form 8 November 2025; Accepted 8 December 2025; Available online 9 December 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Keywords: Federated learning; Knowledge graph; Poisoning attack; Knowledge distillation

ABSTRACT

Federated knowledge graph embedding (FKGE) enables collaborative knowledge sharing without data exchange, but it also introduces the risk of poisoning attacks that degrade model accuracy or force incorrect outputs. Protecting FKGE from poisoning attacks has therefore become a critical research problem. This paper reveals the malicious strategy of untargeted FKGE poisoning attacks and proposes CoDFKGE, a co-distillation-based FKGE framework for defending against poisoning attacks. CoDFKGE deploys two collaborative knowledge graph embedding models on each client, decoupling prediction parameters from shared parameters as a model-agnostic solution. By designing distinct distillation loss functions, CoDFKGE transfers clean knowledge from potentially poisoned shared parameters while compressing embedding dimensions to reduce communication overhead. Experiments show that CoDFKGE preserves link prediction performance with lower communication costs, eliminates malicious manipulations under targeted poisoning attacks, and significantly mitigates accuracy degradation under untargeted poisoning attacks.

1. Introduction

Knowledge graphs (KGs) are structured representations of real-world entities and their relationships, supporting applications in search engines [1,2], recommendation systems [3,4], and security analysis [5,6]. Knowledge graph embedding (KGE) techniques project entities and relations into low-dimensional vector spaces, enabling efficient knowledge reasoning and completion [7]. Due to privacy regulations and data sensitivity requirements, KGs across organizations within the same domain remain fragmented despite growing data volumes. In this context, federated knowledge graph embedding (FKGE) has emerged as a collaborative learning technique for sharing KG embeddings without data exchange. However, the introduction of federation mechanisms brings new security risks: malicious participants can inject poisoned parameters during training or aggregation to launch a poisoning attack, degrading model accuracy or forcing incorrect outputs. Consequently, protecting FKGE systems against poisoning attacks has emerged as a critical research challenge.

Unlike graph neural network (GNN)-based models, KGE models usually rely on translation-based models [8–11], in which the embedding vectors of the entities and relations in the KG are directly used as learnable parameters. KGE models utilize different score functions to measure the plausibility of triples (h, r, t). By contrasting the outputs of existing triples and negatively sampled triples, KGE models derive appropriate embeddings for entities and relations. However, the real-world KGs of different organizations are often incomplete, making it difficult to train high-quality knowledge graph reasoning models. Moreover, KG data often contains a large amount of private data, and direct data sharing would inevitably lead to privacy leakage. For this reason, federated learning [12] has been introduced into knowledge graph reasoning.

FKGE assumes that there are multiple participants with complementary but incomplete KGs, and aims to derive optimal knowledge embeddings for each participant without data exchange. Most existing studies [13–15] model FKGE as multiple clients that maintain local KGE models and a central server. Clients train models locally and upload the model parameters to the central server, which aggregates the parameters and then returns them to the clients.

However, since the embedding vectors are directly the model parameters, FKGE is highly vulnerable to poisoning attacks. With the intent to reduce model performance, steal sensitive information, or disrupt system stability, poisoning attacks maliciously modify parameters during local training or during parameter aggregation on the server. To protect the participants of FKGE, it is necessary to propose a protection mechanism against FKGE poisoning attacks.

Moreover, other aspects of FKGE also deserve attention. For example, the federated learning of KGE requires frequent parameter exchange, and a translation-based model submits the entity or relation embeddings themselves, which makes the communication overhead greater than that of traditional federated learning.
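The score-function idea discussed above can be made concrete with a small, self-contained sketch. This is an illustrative example only, with toy NumPy vectors and hypothetical values, using a TransE-style score g(h, r, t) = −‖h + r − t‖₁ so that higher scores mean more plausible triples:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility score: higher means more plausible.

    Uses g(h, r, t) = -||h + r - t||_1, so a triple whose head
    embedding, translated by the relation, lands near the tail
    embedding receives a score close to 0.
    """
    return -np.linalg.norm(h + r - t, ord=1)

# Toy 4-dimensional embeddings (illustrative values only).
h = np.array([0.1, 0.2, 0.3, 0.4])
r = np.array([0.5, 0.1, -0.1, 0.0])
t_pos = np.array([0.6, 0.3, 0.2, 0.4])    # matches h + r: a plausible tail
t_neg = np.array([-0.9, 0.8, 0.7, -0.5])  # a randomly replaced tail entity

# The positive triple scores strictly higher than the corrupted one.
assert transe_score(h, r, t_pos) > transe_score(h, r, t_neg)
```

Training then pushes the scores of observed triples above those of negatively sampled triples, which is exactly the contrast the loss in Eq. (1) formalizes.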
Knowledge distillation [16] is a model compression technique that improves the performance of a simple (student) model by transferring knowledge from a complex (teacher) model. Distillation-based methods are considered a feasible solution to combat poisoning attacks [17–19]: a teacher model can extract clean knowledge from the poisoned parameters and transfer it to a student model, thereby improving robustness without changing the model structure. Co-distillation [20] is a variant of knowledge distillation that trains two or more models simultaneously, allowing mutual learning and information sharing. This paper aims to design a federated knowledge graph defense framework based on co-distillation, which can enhance the model's resistance to poisoning attacks through collaborative learning without changing the original FKGE architecture.

The rest of this paper is organized as follows. Section 2 reviews related work on FKGE and knowledge distillation. Section 3 introduces the preliminary concepts and methodologies essential for addressing FKGE poisoning attacks, with the main contributions of this paper summarized at the end of this section. In Section 4, we detail the threat model and the malicious strategies of targeted and untargeted poisoning attacks in FKGE. Section 5 presents the CoDFKGE framework for defending against FKGE poisoning attacks, followed by experimental validation in Section 6. Finally, concluding remarks and future research directions are outlined in Section 7.

2. Related work

2.1. Basic FKGE framework

Early research on FKGE mainly focused on achieving cross-client knowledge sharing and model aggregation while protecting data privacy. FedE [13] is the first work to introduce federated learning into KGE; it facilitates cross-client knowledge sharing by maintaining an entity table. Nevertheless, FedE's mechanism of sharing entity embeddings has been proven to contain privacy vulnerabilities [21]: attackers can leverage the embedding information to infer the existence of private triples within client datasets. Based on FedE, FedEC [14] applies embedding contrastive learning to tackle data heterogeneity and utilizes a global update procedure for sharing entity embeddings. In response to the privacy vulnerability of FedE, FedR [15] proposed a privacy-preserving relation embedding aggregation method. By sharing relation embeddings instead of entity embeddings, FedR can significantly reduce both the communication overhead and the privacy leakage risk while retaining the semantic information of the KG.

2.2. Knowledge distillation in FKGE

Knowledge distillation techniques are widely applied in the FKGE field due to their advantages in model compression and knowledge transfer. To cope with the drift between local optimization and global convergence caused by data heterogeneity, FedLU [22] proposes mutual knowledge distillation; it also contains an unlearning method to erase specific knowledge from local clients. FedKD [23] uses knowledge distillation to reduce communication costs and proposes to adaptively learn a temperature to scale the scores of triples, mitigating teacher over-confidence. Beyond FKGE, the KGE model ColE [24] proposes co-distillation learning to exploit the complementarity of graph structure and text information: it employs a Transformer and BERT for graph and text respectively, then distills selective knowledge from each other's prediction logits. Overall, existing research on knowledge distillation in FKGE primarily focuses on handling data heterogeneity, with insufficient exploration of its potential value for model security. This paper explores the application of knowledge distillation in FKGE security to defend against poisoning attacks.

2.3. Poisoning attack in federated learning

Federated learning (FL), due to its distributed training nature, creates favorable conditions for poisoning attacks even as it protects data privacy. Poisoning attacks in federated learning have attracted significant attention from researchers [25]. In federated learning scenarios, poisoning attacks pose serious threats to model security by manipulating part of the training data or the local models to embed malicious behaviors [26]. The literature [27] generates stealthy backdoor triggers by extracting high-frequency features from images using the discrete wavelet transform and introduces an asymmetric frequency confusion mechanism, achieving efficient backdoor attacks on multiple datasets. Meanwhile, many studies have proposed defense methods against poisoning attacks. The literature [28] proposes the Krum method, which selects the most reliable gradient update by evaluating the consistency of gradients, thereby effectively defending against poisoning attacks. The literature [29] proposes FL-Defender, which improves robustness by introducing cosine similarity to adjust the weights of parameter aggregation. The literature [30] proposed MCLDef, a two-stage backdoor defense method based on model contrastive learning (MCL), which can significantly reduce the success rate of backdoor attacks with only a small amount of clean data. In summary, existing research on poisoning attacks in federated learning mainly focuses on traditional deep learning domains. The design ideas of these defense frameworks lay the foundation for subsequent poisoning attack defense methods for FKGE.

2.4. Security issues in FKGE

With the development of FKGE, its security and privacy issues have attracted increasing attention, with existing research mainly focusing on defending against privacy leakage. The literature [31] proposed a decentralized scalable learning framework where embeddings from different KGs can be learned in an asynchronous and peer-to-peer manner while being privacy-preserving. The literature [21] conducts the first holistic study of the privacy threats to FKGE from both attack and defense perspectives; it introduces three new inference attacks and proposes DP-Flames, a differentially private FKGE model with private selection and an adaptive privacy budget allocation policy. Building on [21], the literature [32] introduces five new inference attacks and proposes PDP-Flames, which leverages the sparse gradient nature of FKGE for a better privacy–utility trade-off.

Compared with privacy leakage, research on defending against poisoning attacks in FKGE is still in its early stages. Traditional federated learning typically does not transmit original embeddings directly. In translation-based KGE, however, entity and relation embeddings are core components, so direct transmission of embeddings is required during FKGE aggregation. Direct malicious modifications to embeddings are difficult to defend against effectively using traditional federated learning defense methods.

The recent literature [33] is the first work to systematize the risks of FKGE poisoning attacks. However, it primarily focuses on several forms of targeted poisoning attacks in FKGE, without addressing untargeted poisoning attacks. Although it provides some defense suggestions, such as zero-knowledge proofs and private set intersection, it does not propose concrete defense methods. In summary, the existing research lacks a systematic treatment of untargeted poisoning attacks on FKGE, and there is no complete defense method against FKGE poisoning attacks.

To address the above issues, this paper reveals the malicious strategy of untargeted FKGE poisoning attacks and proposes CoDFKGE, a co-distillation-based federated knowledge graph embedding framework for defending against poisoning attacks. The main contributions of this paper are summarized as follows.
1. We systematically define untargeted poisoning attacks in FKGE and reveal the attacks' malicious strategy, thereby enhancing threat identification in FKGE and providing a foundation for subsequent defense research.

2. We propose CoDFKGE, the first co-distillation defense framework against poisoning attacks in FKGE. By deploying bidirectional distillation models with distinct distillation losses on the client side, CoDFKGE decouples prediction parameters from shared parameters as a model-agnostic solution, thereby enhancing the model's resistance to poisoning attacks and improving robustness. We design distinct distillation loss functions for the two models in CoDFKGE, enabling it to transfer clean knowledge from potentially poisoned shared parameters and to compress the shared parameter dimensions, which reduces communication overhead.

3. We validate the performance of CoDFKGE against poisoning attacks through experiments. The results show that, without compromising link prediction performance, CoDFKGE can completely eliminate targeted poisoning attacks and significantly mitigate the performance degradation caused by untargeted poisoning attacks, while simultaneously reducing communication overhead. Ablation experiments further confirm the effectiveness of the two distillation loss functions in CoDFKGE.

3. Preliminaries

3.1. Knowledge graph embedding

A KG can be represented as (ℰ, ℛ, 𝒯), where ℰ and ℛ are the entity set and the relation set. 𝒯 is a set of triples, where a triple (h, r, t) ∈ 𝒯 indicates that a relation r ∈ ℛ connects the entities h, t ∈ ℰ.

Translation-based KGE models project the entities and relations of a KG into a continuous vector space. Models employ a scoring function g(h, r, t; θ) to evaluate the plausibility of triples, where θ represents the embedding parameters. During model training, negative samples (h, r, t′) are constructed by randomly replacing the tail entities of positive triples. The training process aims to maximize the score discrepancy between positive and negative samples. Currently, most KGE models [9,11] employ a binary cross-entropy loss to measure the difference between positive and negative samples. Its mathematical expression is given in Eq. (1).

L = − Σ_{(h,r,t)∈𝒯} [ log σ(g(h, r, t; θ) − γ) + Σ_i p(h, r, t′_i; θ) log σ(γ − g(h, r, t′_i; θ)) ]  (1)

Here, γ represents the margin, and (h, r, t′_i) is the i-th negative triple. p(h, r, t′_i; θ) stands for the occurrence probability of this negative sample given the embedding parameters θ.

3.2. Federated knowledge graph embedding

FKGE is an application of federated learning that aims to fuse and share knowledge vectors from different KGs to enhance the effectiveness of KGE. Currently, most related studies are based on the framework proposed in FedE [13].

The basic framework of FKGE consists of a client set C and a central server S. Each client c ∈ C holds a local KG 𝒢_c(ℰ_c, ℛ_c, 𝒯_c). The entity sets of different KGs partially overlap, so the understanding of entities in one client can be supplemented by information from other clients. The server holds the one-hot existence matrix M ∈ ℝ^{C×N} over all client entities, where N is the number of entities.

On each client, the KGE model parameters consist of local parameters θ_L and shared parameters θ_S. During FKGE training, each epoch progresses through two sequential phases: client update and server aggregation. In the k-th client update stage, client c first trains its local KGE model to update its local embedding θ^k_{L_c} and its server-shared embedding θ^k_{S_c}. Then, client c uploads its shared embedding θ^k_{S_c} to the server. In the server aggregation stage, the central server S aggregates the shared embeddings from all clients to obtain the shared parameters θ^{k+1}_S. Finally, the server broadcasts the shared parameters θ^{k+1}_S to all clients. Entity embeddings in KGE are usually the shared parameters, while relation embeddings are local parameters; only rare literature [15] uses relation embeddings as shared parameters.

How the server effectively aggregates the shared embeddings from different clients is a common problem in FKGE. The most common FKGE server aggregation method is that of FedE [13], an improvement on FedAvg [12]. To handle the imbalance in the number of entities across different clients, FedE aggregates the shared entities using the number of occurrences in the local data as the weight w_c; this weight can be obtained from the existence matrix M mentioned above. The mathematical expression for FedE's server aggregation method is shown in Eq. (2).

θ^{k+1}_S = Σ_c w_c θ^k_{S_c}  (2)

The final target of FKGE is to minimize the loss functions over all clients' local triples simultaneously through federated learning. Its optimization objective can be expressed as Eq. (3).

arg min_{(θ_{L_c}, θ_{S_c})} Σ_{c=1}^{C} ℒ_c(θ_{L_c}, θ_{S_c})  (3)

3.3. Knowledge distillation

Knowledge distillation is a model compression technique that transfers the knowledge contained in a complex (teacher) model to a simple (student) model to improve the performance of the simple model. In the classic knowledge distillation framework, the student model's training loss comprises two components: the cross-entropy loss L_CE, computed between its output and the true label, and the distillation loss L_KD, computed between its output and the teacher model's output (the soft label). In practical applications, the distillation loss is usually quantified using the Kullback–Leibler divergence D_KL between the student model's output and the soft label; its mathematical expression is shown in Eq. (4).

D_KL(p_tea ∥ p_stu) = Σ_i p_tea(i) log( p_tea(i) / p_stu(i) )
L_KD = τ² D_KL( σ(z^{(n)}_tea) ∥ σ(z^{(n)}_stu) ),  where σ(x) = softmax(x / τ)  (4)

Here, z_tea and z_stu are the logits of the teacher model and the student model, respectively, and τ is the temperature coefficient, which controls the smoothness of the output.

To allow the student model to effectively absorb the knowledge contained in the teacher model while fitting the real data distribution, the final loss function is usually a weighted sum of L_CE and L_KD.

4. Threat model

Poisoning attacks in federated learning can be categorized into targeted, semi-targeted, and untargeted poisoning attacks according to the intention of the attacker [34]. In FKGE, a semi-targeted poisoning attack can be regarded as a special case of a targeted poisoning attack; therefore, this paper focuses on the targeted and untargeted attack types.
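The occurrence-weighted server aggregation of Eq. (2) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation; in particular, normalizing the weights per entity over the clients that hold it (derived from the existence matrix M) is our assumption:

```python
import numpy as np

def fede_aggregate(client_embs, M):
    """Occurrence-weighted aggregation of shared entity embeddings, as in Eq. (2).

    client_embs: array of shape (C, N, d), each client's entity table.
    M: existence matrix of shape (C, N); M[c, e] = 1 if client c holds entity e.
    Entities a client does not hold contribute nothing; the weights are
    normalized per entity over the clients that actually hold it.
    """
    counts = M.sum(axis=0)             # (N,) number of clients holding each entity
    w = M / np.maximum(counts, 1)      # (C, N) per-client, per-entity weights
    # Weighted sum over clients: result has shape (N, d).
    return np.einsum('cn,cnd->nd', w, client_embs)

# Two clients, three entities, 2-dimensional embeddings (toy values).
M = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
embs = np.array([[[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]],
                 [[3.0, 3.0], [0.0, 0.0], [4.0, 4.0]]])
agg = fede_aggregate(embs, M)
# Entity 0 is held by both clients and is averaged; entities 1 and 2 by one each.
assert np.allclose(agg, [[2.0, 2.0], [2.0, 2.0], [4.0, 4.0]])
```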
4.1. Targeted poisoning attack

Fig. 1. Process of targeted poisoning attack.

Fig. 2. Framework of CoDFKGE model.

A targeted poisoning attack is an attack strategy in which the attacker crafts specific malicious triples that do not exist in the target system and manipulates the target model into accepting these fake triples by injecting poisoned parameters into the shared parameters. This type of attack poses a serious threat to the application of FKGE, as the false relationships it introduces can lead to reasoning errors and decision-making biases in downstream tasks. For example, in financial transaction networks, a knowledge graph may be constructed with transaction entities as nodes and transaction relationships as edges; link prediction can then be applied to detect potential transaction relationships (such as money laundering or fraud). If an attacker compromises one of the participants, they can introduce false transaction relationships through targeted poisoning attacks, leading to unreasonable inferences about the victim entity.

To execute such an attack successfully, the attacker typically follows a multi-stage process that begins with gathering the victim's local information. Fig. 1 shows the process of a targeted poisoning attack. In FKGE systems, while the server can observe which entities and relations each client possesses, it lacks visibility into how these elements are structured into specific triples. However, for frameworks that share entity embeddings (such as FedE [13]), recent research [21] has shown that a malicious server can use the KGE scoring function to infer the victim's local relationship patterns and reconstruct the victim's triples 𝒯_v. Armed with this inferred knowledge, the attacker strategically constructs malicious triples 𝒯_m that align with the victim's existing KG schema but represent false information.

The next critical attack phase involves training a shadow model, a surrogate KGE model designed to mimic the victim's learning process. The shadow model is trained on a poisoned dataset 𝒯_p, which combines the inferred victim triples 𝒯_v and the malicious triples 𝒯_m. This training strategy ensures the shadow model learns to generate embeddings that are consistent with both the victim's genuine knowledge and the attacker's deceptive information. The shadow model's parameters include θ_{S_p}, which can be initialized with the victim's shared parameters θ_{S_c}, and θ_{L_p}, which approximates the victim's local model parameters θ_{L_c} from random initial values. To ensure the shadow model effectively bridges the victim's genuine knowledge and the attacker's malicious objectives, its parameters are optimized to minimize the loss function across all triples in the poisoned dataset, as formalized in Eq. (5).

arg min_{(θ_{S_p}, θ_{L_p})} Σ_{(h,r,t)∈𝒯_p} L(h, r, t; θ_{S_p}, θ_{L_p})  (5)

where L is the loss function of the baseline model.

After training the shadow model, the attacker extracts the poisoned shared parameters θ_{S_p} using the same procedure that legitimate clients employ to prepare parameters for server aggregation. The attacker can then aggregate the poisoned parameters θ_{S_p} with the normal clients' shared parameters. The attacker usually operates as a compromised server and assigns a disproportionately high weight to the poisoned parameters during the aggregation process, ensuring that the poisoned parameters dominate the aggregated shared parameters.
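The weight-inflation step described above can be illustrated with a toy sketch of a compromised aggregator that mixes the poisoned parameter table into the honest average with a disproportionate weight α. All names and values here are hypothetical, for illustration only:

```python
import numpy as np

def poisoned_aggregate(honest_params, poisoned_params, alpha=0.9):
    """Toy model of a compromised server's aggregation step.

    honest_params: list of honest clients' shared-parameter arrays (same shape).
    poisoned_params: the attacker's shadow-model shared parameters.
    alpha: disproportionately high weight assigned to the poisoned table,
           so that it dominates the broadcast aggregate.
    """
    honest_avg = np.mean(honest_params, axis=0)
    return alpha * poisoned_params + (1.0 - alpha) * honest_avg

# Two honest clients whose shared tables sit at 0; poison sits at 1.
honest = [np.zeros((4, 2)), np.zeros((4, 2))]
poison = np.ones((4, 2))
agg = poisoned_aggregate(honest, poison, alpha=0.9)
# With alpha = 0.9, the aggregate lies close to the poisoned table.
assert np.allclose(agg, 0.9)
```

This is exactly the failure mode CoDFKGE targets: whatever the server broadcasts is never trusted directly by the prediction model.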
The final stage of the attack exploits the implicit trust in federated systems. The victim client, unaware of the poisoning, directly incorporates the compromised aggregated parameters into its local training process without validation. As a result, the victim's model gradually learns to accept the malicious triples as valid, ultimately producing incorrect predictions on these non-existent relationships while maintaining seemingly normal performance on other parts of the KG.

4.2. Untargeted poisoning attack

The conditions for achieving a targeted poisoning attack are demanding. For example, FedR [15] shares only relation embeddings (not entity embeddings), preventing attackers from inferring victim relations via entity matrices and thus avoiding targeted poisoning attacks. Even when relational data leaks, targeted poisoning attacks remain difficult: compared with sharing entity embeddings, the sparsity of relation embeddings reduces the shadow model's ability to align its parameters with the victim's vector space. However, FedR has almost no defense against untargeted poisoning attacks.

In an untargeted poisoning attack, the attacker aims to disrupt the victim model's convergence or to maximize the mispredictions among test cases. By maximizing the victim's loss function during training, attackers can force non-convergent predictions. The attacker generates the poisoned shared parameters θ*_{S_v} for the victim, which can be formalized as Eq. (6).

arg max_{θ*_{S_v}} Σ_{(h,r,t)∈𝒯_v} L(h, r, t; θ*_{S_v}, θ_{L_v})  (6)

Here, θ_{L_v} denotes the victim's local parameters and 𝒯_v is the victim's triple set. Since it is difficult for the attacker to obtain these two quantities directly, they can use random values as a guess for θ_{L_v} and triples of random combinations of the victim's entities and relations as a guess for 𝒯_v.

In particular, for the TransE model [7] with the scoring function g(h, r, t) = |h + r − t|, the attacker can launch an untargeted poisoning attack by setting the shared parameters θ′_{S_v} sent to the victim to identical values, or by using negated aggregation parameters. To avoid detection, noise is often added to the poisoned parameters. The prediction performance of the victim model may then fall even below that of standalone training without federated aggregation.

In general, the success of FKGE poisoning attacks relies on victims directly using attacker-provided aggregated parameters for training without validation. To prevent poisoning attacks, it is critical to isolate the parameters of the prediction model from externally provided aggregated parameters. Specifically, potentially poisoned shared parameters must be filtered before training, and parameter exposure to the external environment should be minimized. Therefore, we propose CoDFKGE, a defense FKGE framework based on co-distillation.

5. Model design

CoDFKGE is a training framework on the client side. Its training process is shown in Fig. 2. CoDFKGE initializes two baseline models with the same structure and scoring function but different purposes. The communication model is mainly responsible for receiving and processing shared parameters, while the prediction model is used for the final embedding and prediction. To minimize potential parameter leakage and communication overhead, the feature dimension of the communication model is intentionally designed to be smaller than that of the prediction model.

During the training process, the two models learn collaboratively through knowledge distillation. Once the communication model receives the potentially poisoned shared parameters from the server, it acts as a teacher model and transfers clean knowledge to the prediction model. Following the training of the prediction model, the roles are reversed: the prediction model becomes the teacher, and the communication model serves as the student for distillation. This stage extracts knowledge from the prediction model and compresses it into the communication model, ensuring efficient knowledge sharing while minimizing parameter exposure and communication overhead. By deploying two distinct model instances, the framework physically isolates attacker-injected parameters from the prediction model's parameters, making poisoning attacks significantly more difficult to execute. To facilitate the reproducibility of our CoDFKGE model, we provide the complete training framework pseudocode in Algorithm 1.

Algorithm 1 CoDFKGE Training Framework
Require: Baseline KGE model g, training triples 𝒯, learning rate η, distillation weight β, distillation temperature τ, total iterations K
Initialization:
1: Initialize the client-side prediction model with θ^P_0 = (θ^S_0, θ^L_0) ⊳ Local parameters randomly initialized
2: Initialize the client-side communication model with reduced feature dimensions
3: Initialize the server-side aggregated parameters θ^S_1 = θ^S_0 ⊳ First-round initialization
Main training loop (iterations k = 1, 2, ..., K):
// Client update phase (for each client)
4: for each client c ∈ C do
5:   // Step 1: Communication-to-prediction model distillation
6:   Load the server-shared parameters θ^S_k ⊳ Latest global shared embeddings
7:   Initialize the communication model with θ^C = (θ^S_k, θ^{C_L}_{k−1})
8:   Freeze the communication model parameters ⊳ Acts as the teacher model
9:   Compute the distillation loss L^{P_k}_{KD} using Eq. (7) ⊳ Positive samples only
10:  Compute the KGE loss L^{P_k}_{KGE} on the training triples 𝒯
11:  Update the prediction model parameters (θ^{P_S}_k, θ^{P_L}_k) with:
12:  ∇θ^P_k = ∇(β L^{P_k}_{KGE} + (1 − β) L^{P_k}_{KD}) ⊳ Gradient flows through the prediction model only
13:  θ^P_k = θ^P_k − η ∇θ^P_k, where θ^P_k = {θ^{P_L}_k, θ^{P_S}_k} ⊳ Update prediction model parameters
14:  Unfreeze the communication model parameters
15:  // Step 2: Prediction-to-communication model distillation
16:  Freeze the prediction model parameters θ^P_k ⊳ Used as the teacher model
17:  Compute the distillation loss L^{C_k}_{KD} using Eq. (9) ⊳ Both positive and negative samples
18:  Update the communication model parameters (θ^{C_S}_k, θ^{C_L}_k) with:
19:  ∇θ^C_k = ∇L^{C_k}_{KD} ⊳ Gradient flows through the communication model only
20:  θ^C_k = θ^C_k − η ∇θ^C_k, where θ^C_k = {θ^{C_S}_k, θ^{C_L}_k}
21:  Upload the updated shared parameters θ^{C_S}_k to the server
22:  Unfreeze the prediction model parameters
23: end for
// Server aggregation phase
24: The server aggregates θ^S_{k+1} from all clients using the baseline federated aggregation method
25: Set k = k + 1 and repeat the main loop until k > K ⊳ Continue the main training loop
return The final prediction model parameters of each client

CoDFKGE is designed to be model-agnostic, enabling seamless integration with diverse FKGE models according to their shared parameter types. Both the communication and prediction models used by CoDFKGE clients utilize the same scoring function g as the original KGE model. Clients upload and utilize shared parameters identically to the baseline model, with these parameters maintaining the same form and dimensionality as the original implementation. This parameter compatibility enables the server to aggregate updates using existing federated learning aggregation methods without modification. This design ensures that CoDFKGE preserves the original knowledge representation capabilities while maintaining operational semantics consistent with the baseline model.
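The temperature-scaled distillation loss that Algorithm 1 computes in Step 1 (Eq. (7)) can be sketched as follows. This is a minimal stand-in, not the authors' implementation: it assumes the softmax σ is taken over a batch of positive-triple scores, with the teacher scores coming from the communication model and the student scores from the prediction model:

```python
import numpy as np

def softmax(x, tau):
    """Temperature-scaled softmax over a 1-D array of scores."""
    z = np.asarray(x, dtype=float) / tau
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_scores, student_scores, tau=2.0):
    """Temperature-scaled KL distillation loss over triple scores (cf. Eq. (7)).

    teacher_scores: communication-model scores on a batch of positive triples.
    student_scores: prediction-model scores on the same batch.
    Returns tau^2 * KL(softmax(teacher/tau) || softmax(student/tau)).
    """
    p_tea = softmax(teacher_scores, tau)
    p_stu = softmax(student_scores, tau)
    return tau ** 2 * np.sum(p_tea * np.log(p_tea / p_stu))

# The loss vanishes when the student matches the teacher and grows otherwise.
same = distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distill_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
assert abs(same) < 1e-9 and diff > 0.0
```

Minimizing this quantity with the teacher frozen, as in lines 8–13 of Algorithm 1, moves only the student's parameters toward the teacher's score distribution.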
5.1. Communication-to-prediction model distillation

In the first iteration, the model trains the prediction component following the standard procedure. Starting from the second iteration, the communication model loads the server-shared parameters $\theta_k^S$ and initializes itself jointly with the local embeddings $\theta_{k-1}^{PL}$ from the previous iteration's prediction model.

After the communication model receives and applies the server-shared parameters, it filters out potentially poisoned model parameters through knowledge distillation: the communication model acts as a teacher that transfers clean knowledge to the prediction model, which serves as the student. During this process, the communication model's parameters are frozen so that knowledge is transferred strictly from the communication model to the prediction model; gradients flow only through the prediction model's parameters, preventing gradient leakage back into the potentially poisoned shared parameters.

If the communication model has been poisoned, its outputs for negative samples are unreliable, and distilling such uncertain predictions would propagate noise rather than useful knowledge. To exclude the poisoned knowledge, the prediction model therefore focuses on positive samples during distillation, ensuring that only trustworthy knowledge is transferred. The distillation loss of the prediction model in the $k$th training epoch is given in Eq. (7):

$$L_{KD}^{P_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{T}} D_{KL}\!\left(\sigma\big(g(h,r,t;\theta_k^{S},\theta_{k-1}^{PL})\big) \,\Big\|\, \sigma\big(g(h,r,t;\theta_k^{PS},\theta_k^{PL})\big)\right) \quad (7)$$

Here $\tau$ is the distillation temperature coefficient, and $\sigma$ is the softmax applied to the model output divided by $\tau$. $g$ is the scoring function of the prediction model, which is also used to compute the KGE loss. $g(h,r,t;\theta_k^{S},\theta_{k-1}^{PL})$ denotes the communication-model (teacher) output under the server-shared parameters $\theta_k^S$ and local parameters $\theta_{k-1}^{PL}$, and $g(h,r,t;\theta_k^{PS},\theta_k^{PL})$ denotes the output of the prediction model being trained.

During distillation, the prediction model must also account for the KGE loss. Its overall loss is the weighted sum of the KGE loss and the distillation loss, shown in Eq. (8):

$$L^{P_k} = \beta L_{KGE}^{P_k} + (1-\beta)\, L_{KD}^{P_k} \quad (8)$$

where $L_{KGE}^{P_k}$ is the KGE loss of the prediction model in the $k$th epoch, defined by Eq. (1), and $\beta$ is the weight.

5.2. Prediction-to-communication model distillation

After training the prediction model, we train the communication model through distillation, which extracts and propagates knowledge without directly sharing prediction parameters, thereby avoiding privacy leakage. During the communication model's distillation, the prediction model's outputs on positive and negative samples serve as soft labels. As Eq. (1) illustrates, the loss function must account for the probability of negative samples when balancing the impact of positive and negative predictions. The distillation loss of the communication model is therefore formalized in Eq. (9):

$$L_{KD}^{C_k} = \tau^2 \sum_{(h,r,t)\in\mathcal{T}} \Big[ D_{KL}\!\left(\sigma\big(g(h,r,t;\theta_k^{PS},\theta_k^{PL})\big) \,\Big\|\, \sigma\big(g(h,r,t;\theta_k^{CS},\theta_k^{CL})\big)\right) + \sum_i p(h,r,t'_i)\, D_{KL}\!\left(\sigma\big(g(h,r,t'_i;\theta_k^{PS},\theta_k^{PL})\big) \,\Big\|\, \sigma\big(g(h,r,t'_i;\theta_k^{CS},\theta_k^{CL})\big)\right) \Big] \quad (9)$$

Here $g(h,r,t;\theta_k^{CS},\theta_k^{CL})$ denotes the communication-model output, and $g(h,r,t;\theta_k^{PS},\theta_k^{PL})$ denotes the prediction-model output under the shared parameters $\theta_k^{PS}$ and local parameters $\theta_k^{PL}$. The calculation of $p$ follows the self-adversarial sampling approach of [9], with its mathematical formulation given in Eq. (10):

$$p(h,r,t'_i) = \frac{\exp\big(\tau_\alpha\, g(h,r,t'_i)\big)}{\sum_j \exp\big(\tau_\alpha\, g(h,r,t'_j)\big)} \quad (10)$$

where $\tau_\alpha$ is the self-adversarial sampling temperature.

After the bidirectional distillation process of CoDFKGE, the communication-model parameters are updated to $\theta_k^{CS}$ and $\theta_k^{CL}$. Each client then uploads $\theta_k^{CS}$ to the server, which aggregates these parameters from all clients using federated averaging to generate the next round's shared parameters $\theta_{k+1}^S$.

6. Experiments

Experiments are conducted on the openly available dataset FB15K-237 [35], a subset of Freebase containing 14,505 entities, 474 relations, and 544,230 triples. To perform federated learning, we adopt the relational partitioning method of [22]. This method first partitions the relations through clustering, so that the relations within each partition are as similar as possible; the partitions are then divided into groups containing roughly equal numbers of triples and distributed to the clients. This yields tighter triple relationships within each client, better reflecting real-world scenarios.

The TransE model [7] is selected as the KGE model, serving as the foundation for all federated learning methods in the experiments, including the attacker's shadow model. To benchmark CoDFKGE, we select multiple baselines. First, the local training model without federated learning serves as the KGE baseline; it shares no parameters between clients, so it incurs no communication overhead and is not vulnerable to poisoning attacks. FedE [13] and FedR [15] are chosen as baseline FKGE models, representing standard approaches in the field. Additionally, we implement a distillation model that uses communication and prediction models similar to CoDFKGE but performs only unidirectional knowledge distillation: it uses the communication model as the teacher and the prediction model as the student to filter out poisoning knowledge, with the distillation loss following Eq. (4).

All experiments are performed on a 72-core Ubuntu 18.04.6 LTS machine with an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20 GHz and a V100S-PCIE-32GB GPU. We implemented the proposed FKGE framework and the baseline models on top of PyTorch Geometric [36] and the distributed AI framework Ray [37]. KGE hyperparameter settings follow [9], and FKGE hyperparameter settings follow FedE [13]. Specifically, we use the Adam [38] optimizer with a learning rate of 1e-3; $\gamma$ is 10, and the self-adversarial negative sampling temperature $\tau_\alpha$ in KGE is 1. The distillation temperature $\tau$ is 2, and the weight $\beta$ balancing the distillation and KGE losses is 0.5. The maximum number of training epochs is 400. In each epoch, a client performs 3 local iterations before uploading its parameters to the server.

We use the link prediction task, a sub-task of KGE, to validate model accuracy. Following the common implementation of link prediction, we employ the Mean Reciprocal Rank (MRR) and Hits@N as accuracy metrics. The MRR is the average of the reciprocals of the ranks of the predicted triples among all candidate triples: if $rank_i$ is the rank of the correct triple for the $i$th query and $n$ is the total number of queries, then $MRR = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{rank_i}$. Hits@N is the proportion of queries for which the correct triple appears among the top $N$ candidates generated by the model. For both metrics, higher values indicate better link prediction performance.

Through the experiments, the following research questions will be verified.

RQ1 Does CoDFKGE maintain KGE prediction performance while reducing FKGE communication overhead?
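For concreteness, the distillation losses of Eqs. (7), (9), and (10) can be sketched in NumPy. This is an illustrative reconstruction, not the paper's PyTorch implementation; the array shapes and helper names are assumptions. Each triple's score vector is treated as logits over candidate entities, temperature-softmaxed and compared with KL divergence, and the communication-model loss reweights negative triples by the self-adversarial distribution $p$.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, axis=-1):
    """KL(p || q) between distributions along `axis`."""
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=axis)

def prediction_kd_loss(teacher_logits, student_logits, tau=2.0):
    """Eq. (7): distill on positive triples only.
    logits: (batch, n_candidates) scores over candidate entities."""
    return tau ** 2 * kl_div(softmax(teacher_logits / tau),
                             softmax(student_logits / tau)).sum()

def self_adversarial_weights(teacher_neg_scores, tau_alpha=1.0):
    """Eq. (10): weight p(h, r, t'_i) for each negative triple.
    teacher_neg_scores: (batch, n_neg) scalar scores g(h, r, t'_i)."""
    return softmax(tau_alpha * teacher_neg_scores, axis=-1)

def communication_kd_loss(t_pos, s_pos, t_neg, s_neg, neg_scores,
                          tau=2.0, tau_alpha=1.0):
    """Eq. (9): positive-sample term plus p-weighted negative-sample term.
    t_neg, s_neg: (batch, n_neg, n_candidates) logits per negative triple."""
    pos = kl_div(softmax(t_pos / tau), softmax(s_pos / tau)).sum()
    p = self_adversarial_weights(neg_scores, tau_alpha)
    neg = (p * kl_div(softmax(t_neg / tau), softmax(s_neg / tau))).sum()
    return tau ** 2 * (pos + neg)
```

Both losses vanish when teacher and student agree and grow as their temperature-softened distributions diverge, which is the behavior the freezing of the teacher's parameters relies on.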
RQ2 Can CoDFKGE effectively defend against targeted poisoning attacks?

RQ3 Can CoDFKGE effectively defend against untargeted poisoning attacks?

RQ4 Do the two proposed distillation loss functions individually contribute to poisoning defense?

6.1. Normal link prediction (RQ1)

To explore the performance of the proposed model in normal link prediction, we first test the models on the conventional dataset, measuring MRR, Hits@1, Hits@5, and Hits@10. The models are trained by federated learning and evaluated on the clients' local test sets.

Table 1 lists the performance of the local KGE model, FedE, FedR, and CoDFKGE at different dimensions. Results are grouped by the type of shared embeddings and the dimension of the prediction model; parameter dimensions are given in parentheses in the "Model" column. For example, CoDFKGE(32-128) denotes the CoDFKGE model with a 32-dimensional communication model and a 128-dimensional prediction model. All link prediction experiments were repeated 5 times with different random seeds, and accuracy is reported as mean ± standard deviation. The best result in each group (excluding the local model) is bolded, and results of CoDFKGE(32-128) that exceed Distillation(32-128) are underlined.

Table 1
Experiment results on normal link prediction.

Fed type | Model | Mem (MB) | CC (MB) | MRR | Hits@1 | Hits@5 | Hits@10
Local | Local(128) | 57.05 | – | 0.4081 ± 0.0015 | 0.3066 ± 0.0014 | 0.5223 ± 0.0023 | 0.6077 ± 0.0015
Entity | FedE(128) | 185.58 | 42.60 | 0.4082 ± 0.0004 | 0.3068 ± 0.0012 | 0.5232 ± 0.0013 | 0.6080 ± 0.0018
Entity | Distillation(128-128) | 356.10 | 42.60 | 0.4129 ± 0.0008 | 0.3118 ± 0.0016 | 0.5279 ± 0.0008 | 0.6122 ± 0.0003
Entity | CoDFKGE(128-128) | 356.10 | 42.60 | 0.4109 ± 0.0043 | 0.3097 ± 0.0041 | 0.5246 ± 0.0044 | 0.6087 ± 0.0040
Entity | Distillation(32-128) | 217.39 | 10.65 | 0.3914 ± 0.0011 | 0.2935 ± 0.0008 | 0.5005 ± 0.0014 | 0.5838 ± 0.0032
Entity | CoDFKGE(32-128) | 217.40 | 10.65 | 0.4090 ± 0.0010 | 0.3079 ± 0.0007 | 0.5233 ± 0.0019 | 0.6068 ± 0.0019
Relation | FedR(128) | 75.49 | 0.69 | 0.4085 ± 0.0011 | 0.3079 ± 0.0021 | 0.5219 ± 0.0016 | 0.6066 ± 0.0017
Relation | Distillation(128-128) | 151.74 | 0.69 | 0.4106 ± 0.0013 | 0.3092 ± 0.0023 | 0.5242 ± 0.0008 | 0.6098 ± 0.0009
Relation | CoDFKGE(128-128) | 150.02 | 0.69 | 0.4065 ± 0.0007 | 0.3056 ± 0.0013 | 0.5190 ± 0.0023 | 0.6063 ± 0.0012
Relation | Distillation(32-128) | 94.53 | 0.17 | 0.3920 ± 0.0012 | 0.2960 ± 0.0007 | 0.4996 ± 0.0019 | 0.5807 ± 0.0013
Relation | CoDFKGE(32-128) | 93.69 | 0.17 | 0.4078 ± 0.0009 | 0.3060 ± 0.0007 | 0.5224 ± 0.0031 | 0.6074 ± 0.0015

The performance of locally trained models is lower than that of most federated models, highlighting the benefit of sharing model parameters. The high-dimensional Distillation(128-128) models achieve the best link prediction performance; compared with them, CoDFKGE is slightly inferior, as the co-distillation process may cost some generalization accuracy. Comparing models of the same dimensions, however, CoDFKGE outperforms both the local baseline and the federated baselines (FedE, FedR). We believe the main advantage of CoDFKGE is the improved security of FKGE: in addition to the defensive performance demonstrated in Sections 6.2 and 6.3, it maintains link prediction performance comparable to its baseline FKGE models.

Beyond accuracy, the "CC" (communication cost) column reports the communication overhead per training epoch, calculated from the byte size of the PyTorch Embedding used in the implementation, and the "Mem" column reports the GPU memory usage of the federated models in MB. Distillation-based models must maintain two KGE models and therefore consume more computational resources, requiring larger GPU memory to store both parameter sets. In return, they can compress the parameters of the communication model, achieving significantly lower communication overhead than sharing full-size parameters. At this lower communication overhead, CoDFKGE(32-128) outperforms Distillation(32-128) in link prediction. We therefore conclude that CoDFKGE does not degrade the normal link prediction performance of baseline FKGE models and can effectively reduce their communication overhead.

6.2. Targeted poisoning attack experiment (RQ2)

In the targeted poisoning attack, 32 pairs of non-existent triples are selected as attack targets from the victim's KG through negative sampling to construct a poisoned triple dataset. First, a predetermined number of normal triples are selected from the victim's training triples. The head or tail entities of these triples are then randomly replaced, and any triples already present in the training set are iteratively removed until 32 pairs of non-existent triples have been constructed. In each epoch, the shadow model undergoes the same number of local training rounds as legitimate clients on the poisoned dataset to generate poisoned parameters. The malicious server aggregates these poisoned parameters with the parameters of the normal clients into the shared parameters and distributes them to all clients. Attackers can assign high weights to poisoned model parameters during aggregation; following the setup in Ref. [33], we set the aggregation weight of the attacker's poisoned triples to 256 times that of normal triples. Experiments focus on models that share entity parameters (required for targeted poisoning attacks) and on the non-federated local baseline. For space considerations, this section reports only the MRR and Hits@10 metrics.

Attack effectiveness is measured by the victim's MRR and Hits@10 on the poisoned triples: the higher these metrics, the greater the vulnerability to poisoning and the weaker the model's resistance to targeted attacks. The primary evaluation metrics are therefore the victim model's MRR and Hits@10 when predicting poisoned triples. If a model ranks a non-existent poisoned test triple among its top 10 candidates, the poisoning attack has successfully manipulated the model's predictions; we therefore use Hits@10 on poisoned triples as the Attack Success Rate (ASR).

Table 2 lists the performance of the baseline models and CoDFKGE under targeted poisoning attacks, grouped by the prediction-model dimension; parameter dimensions are given in parentheses in the "Model" column. The "All clients" column reports average performance across all clients' test sets during the attack, while "Victim poison" measures the victim's performance on predicting poisoned triples. The "Communication poison" column reports the communication model's performance on poisoned triples for CoDFKGE and the distillation model, showing that both communication models are impacted by targeted poisoning; through distillation, the prediction model's accuracy on poisoned triples decreases in both cases. All experiments were repeated 5 times with different random seeds, results are reported as mean ± standard deviation, and the best results are bolded.

Table 2
Experiment results under targeted poisoning attack.

Model | All clients MRR | All clients Hits@10 | Victim poison MRR | Victim poison Hits@10 (ASR) | Communication poison MRR | Communication poison Hits@10
Local(128, unpoisoned) | 0.4081 ± 0.0015 | 0.6077 ± 0.0015 | 0.0003 ± 0.0001 | 0.0000 ± 0.0000 | – | –
FedE(128) | 0.4034 ± 0.0035 | 0.6004 ± 0.0029 | 0.4450 ± 0.0938 | 0.7857 ± 0.1248 | – | –
Distillation(128-128) | 0.4026 ± 0.0025 | 0.6006 ± 0.0039 | 0.0844 ± 0.0552 | 0.2000 ± 0.1311 | 0.4999 ± 0.1429 | 0.7714 ± 0.1046
CoDFKGE(128-128) | 0.4086 ± 0.0007 | 0.6089 ± 0.0012 | 0.0010 ± 0.0003 | 0.0009 ± 0.0005 | 0.4694 ± 0.1511 | 0.6589 ± 0.1242
Distillation(32-128) | 0.3821 ± 0.0022 | 0.5717 ± 0.0018 | 0.1511 ± 0.3356 | 0.1960 ± 0.4362 | 0.4919 ± 0.2364 | 0.6625 ± 0.1887
CoDFKGE(32-128) | 0.3856 ± 0.0039 | 0.5740 ± 0.0054 | 0.0010 ± 0.0001 | 0.0010 ± 0.0003 | 0.3794 ± 0.0032 | 0.5702 ± 0.0050

The Local training model, which does not participate in federated learning, is immune to poisoning: its MRR on poisoned triples is near zero and its Hits@10 is exactly 0, i.e., the unpoisoned Local model never ranks a non-existent poisoned triple among its top 10 candidates. The FedE model maintains high prediction accuracy on normal test triples under attack, but exhibits abnormally high MRR and Hits@10 on the targeted poisoned triples, even exceeding those on normal triples; targeted poisoning can thus effectively manipulate FedE's predictions. In the distillation-based models, the communication models are severely affected by the attack, while the impact on the prediction models is comparatively minor. Although Distillation(128-128) can partially eliminate poisoning knowledge, it remains vulnerable to targeted poisoning, and as the dimension of the communication-model parameters increases, so does the model's vulnerability.

In contrast, CoDFKGE's prediction model distills exclusively from verified positive samples, effectively excluding the poisoning knowledge that may reside in negative samples. Like the Local model, CoDFKGE achieves extremely low MRR and Hits@10 on poisoned triples, fully demonstrating that it can defend against targeted poisoning attacks in FKGE. Furthermore, because the communication model's dimension is compressed, the amount of information an attacker can transmit is correspondingly reduced, making the communication model in CoDFKGE(32-128) less susceptible to poisoning.

Fig. 3. Performance degradation comparison.

6.3. Untargeted poisoning attack experiment (RQ3)

In the untargeted poisoning attack experiments, the attacker returns negated aggregate parameters to the victim client, preventing the victim model from converging and degrading its prediction performance.
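The attack mechanics can be sketched as follows. This is an illustrative NumPy reconstruction based only on the description above (the server hands the victim a negated aggregate); the function names are assumptions, not the attack code of [33].

```python
import numpy as np

def fedavg(client_embeddings):
    """Benign aggregation: element-wise mean of the clients' shared embeddings."""
    return np.mean(client_embeddings, axis=0)

def untargeted_poisoned_broadcast(client_embeddings, victim):
    """Untargeted attack sketch: every client receives the FedAvg aggregate,
    except the victim, who receives its negation. Repeated over rounds, this
    keeps pulling the victim's shared embeddings away from convergence."""
    aggregate = fedavg(client_embeddings)
    return [(-aggregate if cid == victim else aggregate)
            for cid in range(len(client_embeddings))]
```

In CoDFKGE, such a parameter vector only ever enters the frozen teacher of the first distillation stage, which is why the victim's prediction model can still fall back on its local KGE training signal.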
The results presented in this section reflect average prediction performance on the clients' local test triples.

Table 3 lists the performance of each model under untargeted poisoning attacks, grouped by the prediction-model dimension and the federated type; parameter dimensions are given in parentheses in the "Model" column. The "All clients" column shows the average performance of all clients under attack, and the "Victim" column shows the performance of the victim client. To measure the severity of the attack, the MRR of the local model in Table 1 is used as a benchmark: the "Decay ratio" column gives the victim client's performance degradation relative to that local model. All experiments were repeated 5 times with different random seeds, and results are reported as mean ± standard deviation; the best and second-best results in each group are marked in bold and underline.

Table 3
Experiment results under untargeted poisoning attack.

Fed type | Model | All clients MRR | All clients Hits@10 | Victim MRR | Victim Hits@10 | Decay ratio MRR (%) | Decay ratio Hits@10 (%)
Entity | FedE(128) | 0.3896 ± 0.0010 | 0.5939 ± 0.0009 | 0.3625 ± 0.0102 | 0.5620 ± 0.0144 | 11.21 | 7.58
Entity | Distillation(128-128) | 0.3900 ± 0.0017 | 0.5921 ± 0.0007 | 0.3641 ± 0.0012 | 0.5664 ± 0.0018 | 11.82 | 7.54
Entity | CoDFKGE(128-128) | 0.4084 ± 0.0007 | 0.6068 ± 0.0003 | 0.4017 ± 0.0010 | 0.6009 ± 0.0005 | 2.25 | 1.28
Entity | Distillation(32-128) | 0.3024 ± 0.0208 | 0.5422 ± 0.0105 | 0.2739 ± 0.0264 | 0.5262 ± 0.0124 | 30.02 | 9.49
Entity | CoDFKGE(32-128) | 0.4093 ± 0.0018 | 0.6081 ± 0.0014 | 0.4022 ± 0.0022 | 0.6023 ± 0.0011 | 1.66 | 0.75
Relation | FedR(128) | 0.3915 ± 0.0010 | 0.5951 ± 0.0016 | 0.3637 ± 0.0093 | 0.5636 ± 0.0150 | 10.96 | 7.10
Relation | Distillation(128-128) | 0.3978 ± 0.0017 | 0.6022 ± 0.0019 | 0.3881 ± 0.0023 | 0.5942 ± 0.0028 | 5.51 | 2.56
Relation | CoDFKGE(128-128) | 0.4086 ± 0.0017 | 0.6075 ± 0.0029 | 0.4014 ± 0.0020 | 0.6018 ± 0.0037 | 1.24 | 0.75
Relation | Distillation(32-128) | 0.3058 ± 0.0079 | 0.5463 ± 0.0029 | 0.2787 ± 0.0101 | 0.5307 ± 0.0038 | 27.78 | 8.61
Relation | CoDFKGE(32-128) | 0.4090 ± 0.0008 | 0.6066 ± 0.0011 | 0.4026 ± 0.0008 | 0.6018 ± 0.0013 | 1.27 | 0.92

The results show that, under untargeted poisoning attacks, the CoDFKGE models achieve the best MRR and Hits@10 among all models. All models exhibit some decline both in their overall performance and in the victim's performance. Fig. 3 compares the prediction performance of the models under normal link prediction and under untargeted poisoning. Distillation(32-128) suffers the largest degradation, and the degradation of Distillation(128-128), FedE, and FedR is also substantial: these models directly incorporate the poisoned global knowledge into their own parameters, which disrupts their convergence. In contrast, the performance degradation of the CoDFKGE models stays within 3%. This is because, even in the absence of usable global knowledge, the prediction model of CoDFKGE still trains on local data, so its training effectiveness remains comparable to that of a local KGE model without knowledge sharing.

Baseline models may thus have their results manipulated or suffer significant performance degradation under poisoning attacks; although the distillation models showed advantages in the link prediction experiments, their defensive effectiveness is very limited. CoDFKGE, by contrast, is not manipulated by targeted poisoning attacks and shows no significant performance degradation under untargeted ones, demonstrating effective defense against poisoning attacks.

6.4. Ablation study (RQ4)

This section evaluates the defensive effect of the different distillation loss functions used in CoDFKGE. Specifically, we compare models that use 128-dimensional parameters for both the communication and prediction models under normal link prediction, targeted poisoning, and untargeted poisoning. Two ablation baselines were implemented: Ablation(Comm) applies the baseline loss function (Eq. (4)) only during the communication model's distillation, while Ablation(Pred) applies it only during the prediction model's distillation.

Tables 4 and 5 show the results for models with different distillation losses sharing entity embeddings. All experiments were repeated 5 times with different random seeds, results are reported as mean ± standard deviation, and the best results are bolded.

Table 4
Ablation study in normal link prediction and under targeted attack.

Model | Link prediction MRR | Link prediction Hits@10 | Targeted all clients MRR | Targeted all clients Hits@10 | Targeted victim poisoning MRR | Targeted victim poisoning Hits@10 (ASR)
CoDFKGE | 0.4112 ± 0.0039 | 0.6084 ± 0.0036 | 0.4086 ± 0.0007 | 0.6089 ± 0.0012 | 0.0010 ± 0.0003 | 0.0009 ± 0.0005
Ablation(Comm) | 0.4095 ± 0.0016 | 0.6074 ± 0.0014 | 0.4086 ± 0.0022 | 0.6076 ± 0.0021 | 0.0017 ± 0.0008 | 0.0013 ± 0.0008
Ablation(Pred) | 0.4132 ± 0.0006 | 0.6116 ± 0.0012 | 0.4098 ± 0.0011 | 0.6080 ± 0.0009 | 0.8086 ± 0.0064 | 0.9702 ± 0.0228

Table 5
Ablation study under untargeted attack.

Model | Untargeted all clients MRR | Untargeted all clients Hits@10 | Untargeted victim MRR | Untargeted victim Hits@10 | Decay ratio MRR (%) | Decay ratio Hits@10 (%)
CoDFKGE | 0.4084 ± 0.0007 | 0.6068 ± 0.0003 | 0.4017 ± 0.0010 | 0.6009 ± 0.0005 | 2.25 | 1.27
Ablation(Comm) | 0.4056 ± 0.0017 | 0.6062 ± 0.0011 | 0.3996 ± 0.0018 | 0.6003 ± 0.0013 | 2.42 | 1.16
Ablation(Pred) | 0.3951 ± 0.0011 | 0.6022 ± 0.0008 | 0.3852 ± 0.0009 | 0.5951 ± 0.0005 | 6.76 | 2.69

The experimental results show that while Ablation(Pred) performs well in conventional link prediction, its resistance to poisoning attacks lags behind the other two models because its loss function does not employ the negative-sample exclusion strategy. Of the remaining two models, both are robust against poisoning, but CoDFKGE achieves better link prediction than Ablation(Comm): Ablation(Comm) uses the baseline loss during the communication model's distillation, whereas CoDFKGE adopts the approach of [9] and reweights negative samples with the self-adversarial sampling temperature $\tau_\alpha$, improving the model's ability to distinguish among negative samples. Overall, the ablation experiments demonstrate that applying the two proposed distillation losses together improves both poisoning defense and link prediction.

7. Conclusion

This paper proposes CoDFKGE, a co-distillation-based defense framework against FKGE poisoning attacks. As the first co-distillation defense framework against poisoning attacks in FKGE, CoDFKGE has some limitations: maintaining two separate models increases the computational resource consumption on clients, and the bidirectional distillation process may cost some generalization accuracy. In return, CoDFKGE is model-agnostic and applicable to existing FKGE models without compromising performance. By decoupling clients' prediction models from the shared-parameter models, CoDFKGE effectively filters out poisoned knowledge embedded in shared updates: it eliminates malicious manipulation under targeted poisoning attacks and significantly mitigates accuracy degradation under untargeted ones. Leveraging distillation, the framework further reduces communication overhead. This work provides new ideas for enhancing the security of FKGE.

The limitations of FKGE poisoning-defense research are partly rooted in the characteristics of KGE itself. With translation-based KGE models in FKGE, sharing entity or relation embeddings introduces risks for both privacy preservation and poisoning attacks. GNN-based KGE models that transmit GNN parameters or gradients can alleviate these concerns; however, owing to their superior robustness to sparse data and lower computational resource requirements, translation-based models still retain clear advantages in specific application scenarios.
For future research, we recommend exploring the application of the CoDFKGE framework in more complex real-world scenarios, such as personalized FKGE problems. Additionally, in large-scale dynamic KG environments, the security landscape for FKGE may undergo significant changes, necessitating further investigation into defense methods tailored to these evolving scenarios.

CRediT authorship contribution statement

Yiqin Lu: Supervision. Jiarui Chen: Writing – original draft, Software, Methodology. Jiancheng Qin: Writing – review & editing.

Declaration of Generative AI and AI-assisted technologies in the writing process

During the preparation of this work the author(s) used DeepSeek in order to improve language and readability. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is supported by the Special Project for Research and Development in Key Areas of Guangdong Province, under Grant 2019B010137001.

Data availability

Data will be made available on request.

References

[1] X. Zhao, H. Chen, Z. Xing, C. Miao, Brain-inspired search engine assistant based on knowledge graph, IEEE Trans. Neural Netw. Learn. Syst. 34 (8) (2021) 4386–4400.
[2] S. Sharma, Fact-finding knowledge-aware search engine, in: Data Management, Analytics and Innovation: Proceedings of ICDMAI 2021, vol. 2, Springer, 2021, pp. 225–235.
[3] Y. Jiang, Y. Yang, L. Xia, C. Huang, DiffKG: Knowledge graph diffusion model for recommendation, in: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, WSDM '24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 313–321.
[4] W. Wang, X. Shen, B. Yi, H. Zhang, J. Liu, C. Dai, Knowledge-aware fine-grained attention networks with refined knowledge graph embedding for personalized recommendation, Expert Syst. Appl. 249 (2024) 123710.
[5] J. Chen, Y. Lu, Y. Zhang, F. Huang, J. Qin, A management knowledge graph approach for critical infrastructure protection: Ontology design, information extraction and relation prediction, Int. J. Crit. Infrastruct. Prot. 43 (2023) 100634.
[6] Y. Zhang, J. Chen, Z. Cheng, X. Shen, J. Qin, Y. Han, Y. Lu, Edge propagation for link prediction in requirement-cyber threat intelligence knowledge graph, Inform. Sci. 653 (2024) 119770.
[7] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, Adv. Neural Inf. Process. Syst. 26 (2013).
[8] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, 2014.
[9] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, RotatE: Knowledge graph embedding by relational rotation in complex space, 2019, arXiv preprint arXiv:1902.10197.
[10] Z. Zhang, J. Jia, Y. Wan, Y. Zhou, Y. Kong, Y. Qian, J. Long, TransR*: Representation learning model by flexible translation and relation matrix projection, J. Intell. Fuzzy Systems 40 (5) (2021) 10251–10259.
[11] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, (1), 2018.
[12] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
[13] M. Chen, W. Zhang, Z. Yuan, Y. Jia, H. Chen, FedE: Embedding knowledge graphs in federated setting, in: Proceedings of the 10th International Joint Conference on Knowledge Graphs, 2021, pp. 80–88.
[14] M. Chen, W. Zhang, Z. Yuan, Y. Jia, H. Chen, Federated knowledge graph completion via embedding-contrastive learning, Knowl.-Based Syst. 252 (2022) 109459.
[15] K. Zhang, Y. Wang, H. Wang, L. Huang, C. Yang, X. Chen, L. Sun, Efficient federated learning on knowledge graphs via privacy-preserving relation embedding aggregation, 2022, arXiv preprint arXiv:2203.09553.
[16] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, 2015, arXiv preprint arXiv:1503.02531.
[17] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a defense to adversarial perturbations against deep neural networks, in: 2016 IEEE Symposium on Security and Privacy, SP, IEEE, 2016, pp. 582–597.
[18] K. Yoshida, T. Fujino, Countermeasure against backdoor attack on neural networks utilizing knowledge distillation, J. Signal Process. 24 (4) (2020) 141–144.
[19] K. Yoshida, T. Fujino, Disabling backdoor and identifying poison data by using knowledge distillation in backdoor attacks on deep neural networks, in: Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security, 2020, pp. 117–127.
[20] R. Anil, G. Pereyra, A. Passos, R. Ormandi, G.E. Dahl, G.E. Hinton, Large scale distributed neural network training through online distillation, 2018, arXiv preprint arXiv:1804.03235.
[21] Y. Hu, W. Liang, R. Wu, K. Xiao, W. Wang, X. Li, J. Liu, Z. Qin, Quantifying and defending against privacy threats on federated knowledge graph embedding, in: Proceedings of the ACM Web Conference 2023, 2023, pp. 2306–2317.
[22] X. Zhu, G. Li, W. Hu, Heterogeneous federated knowledge graph embedding learning and unlearning, in: Proceedings of the ACM Web Conference 2023, 2023, pp. 2444–2454.
[23] X. Zhang, Z. Zeng, X. Zhou, Z. Shen, Low-dimensional federated knowledge graph embedding via knowledge distillation, 2024, arXiv preprint arXiv:2408.05748.
[24] Y. Liu, Z. Sun, G. Li, W. Hu, I know what you do not know: Knowledge graph embedding via co-distillation learning, in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022, pp. 1329–1338.
[25] F. Xia, W. Cheng, A survey on privacy-preserving federated learning against poisoning attacks, Clust. Comput. 27 (10) (2024) 13565–13582.
[26] J. Chen, H. Yan, Z. Liu, M. Zhang, H. Xiong, S. Yu, When federated learning meets privacy-preserving computation, ACM Comput. Surv. 56 (12) (2024).
[27] J. Xia, Z. Yue, Y. Zhou, Z. Ling, Y. Shi, X. Wei, M. Chen, WaveAttack: Asymmetric frequency obfuscation-based backdoor attacks against deep neural networks, Adv. Neural Inf. Process. Syst. 37 (2024) 43549–43570.
[28] P. Blanchard, E.M. El Mhamdi, R. Guerraoui, J. Stainer, Machine learning with adversaries: Byzantine tolerant gradient descent, Adv. Neural Inf. Process. Syst. 30 (2017).
[29] N.M. Jebreel, J. Domingo-Ferrer, FL-Defender: Combating targeted attacks in federated learning, Knowl.-Based Syst. 260 (2023) 110178.
[30] Z. Yue, J. Xia, Z. Ling, M. Hu, T. Wang, X. Wei, M. Chen, Model-contrastive learning for backdoor elimination, in: Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 8869–8880.
[31] H. Peng, H. Li, Y. Song, V. Zheng, J. Li, Differentially private federated knowledge graphs embedding, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 1416–1425.
[32] Y. Hu, Y. Wang, J. Lou, W. Liang, R. Wu, W. Wang, X. Li, J. Liu, Z. Qin, Privacy risks of federated knowledge graph embedding: New membership inference attacks and personalized differential privacy defense, IEEE Trans. Dependable Secur. Comput. (2024).
[33] E. Zhou, S. Guo, Z. Ma, Z. Hong, T. Guo, P. Dong, Poisoning attack on federated knowledge graph embedding, in: Proceedings of the ACM Web Conference 2024, 2024, pp. 1998–2008.
[34] G. Xia, J. Chen, C. Yu, J. Ma, Poisoning attacks in federated learning: A survey, IEEE Access 11 (2023) 10708–10722.
[35] K. Toutanova, D. Chen, P. Pantel, H. Poon, P. Choudhury, M. Gamon, Representing text for joint embedding of text and knowledge bases, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1499–1509.
[36] M. Fey, J.E. Lenssen, Fast graph representation learning with PyTorch Geometric, in: ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[37] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M.I. Jordan, I. Stoica, Ray: A distributed framework for emerging AI applications, in: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), USENIX Association, Carlsbad, CA, 2018, pp. 561–577.
[38] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980.