
Efficient secure aggregation for privacy-preserving federated learning based on secret sharing


Xuan Jin, Yuanzhi Yao, and Nenghai Yu

1 School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, China;

2 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China

Abstract: Federated learning allows multiple mobile participants to jointly train a global model without revealing their local private data. Communication-computation cost and privacy preservation are key fundamental issues in federated learning. Existing secret sharing-based secure aggregation mechanisms for federated learning still suffer from significant additional costs, insufficient privacy preservation, and vulnerability to participant dropouts. In this paper, we aim to solve these issues by introducing flexible and effective secret sharing mechanisms into federated learning. We propose two novel privacy-preserving federated learning schemes: federated learning based on one-way secret sharing (FLOSS) and federated learning based on multi-shot secret sharing (FLMSS). Compared with the state-of-the-art works, FLOSS enables high privacy preservation while significantly reducing the communication cost by dynamically designing secretly shared content and objects. Meanwhile, FLMSS further reduces the additional cost and has the ability to efficiently enhance the robustness against participant dropouts in federated learning. Foremost, FLMSS achieves a satisfactory tradeoff between privacy preservation and communication-computation cost. Security analysis and performance evaluations on real datasets demonstrate the superiority of our proposed schemes in terms of model accuracy, privacy preservation, and cost reduction.

Keywords: federated learning; privacy preservation; secret sharing; secure aggregation

1 Introduction

Computational models generated by deep learning are able to learn valuable representations of data [1]. Deep learning has dramatically improved artificial intelligence in many domains, such as object recognition [2], object classification [3], and brain science [4]. A relative consensus is that a stronger hardware foundation, better algorithm design, and larger available data are the main driving forces for the further development of deep learning. Emerging deep physical neural networks [5] provide new ways of learning at the hardware level. Ingenious algorithms and model designs, such as GoogLeNet [6] and ResNet [7], have also been successful in advancing deep learning. In the meantime, how to obtain sufficient usable data for deep learning remains a key issue.

With the proliferation of mobile communication terminals and the rapid spread of sensory engineering, such as the Internet of Things, mankind has stepped into the era of big data. However, due to a series of barriers, such as personal privacy and industry secrecy, it is difficult to share data securely, and it is even harder for deep learning to gain access to the data held on a mass of terminals. To this end, Google has proposed a federated learning [8] framework for mobile communication terminals, which adopts a distributed deep learning approach of local training and parameter aggregation. In federated learning, local training is performed at the participant side to update local model parameters, and parameter aggregation is conducted for the global model update by transmitting model parameters (e.g., gradients). However, many studies [9-12] in recent years have shown that federated learning also faces serious privacy threats, such as membership inference attacks, GAN-based attacks, and reconstruction attacks. Training data can be recovered with high quality from the model parameters shared in federated learning [11]. Protecting the local model parameters transmitted in federated learning is therefore a considerable problem [13].

Privacy-preserving strategies for federated learning are usually divided into two categories [14]: perturbation strategies based on differential privacy (DP), and encryption strategies based on homomorphic encryption (HE) and secure multiparty computation (SMPC). DP-based preservation [15] inevitably reduces model utility, although privacy guarantees can be obtained. In addition, the applications [16, 17] of HE to high-dimensional gradients impose an unbearable computational load on the mobile terminal. It is worth noting that the general SMPC model using secret sharing fits well with the scenario of distributed learning and edge computing. The number of participants in secret sharing directly affects the privacy-preserving intensity and system overhead, but it is fixed to two-party or all-party in previously proposed deep learning frameworks. The two-party computation (2-PC) setting is often used to implement a lightweight privacy-preserving deep learning framework [18, 19]. However, the security of the 2-PC setting relies on two noncolluding servers, and this setting is usually achieved by using servers from different service providers [20], which is fragile and has no theoretical guarantee. All-party computation (All-PC) secret sharing-based secure aggregation is widely utilized in privacy-preserving federated learning [21-23]. However, when a large number of federated learning participants all join fully connected secret sharing computing, huge computing and communication costs are inevitably incurred in the processes of encryption and dropout handling. Neither the lightweight 2-PC setting nor the fully connected All-PC setting can properly balance the strength and cost of privacy protection. Therefore, secret sharing-based privacy-preserving federated learning schemes with low communication-computation cost, sufficient privacy preservation, and robustness against participant dropouts should be sought.

In this work, we propose two efficient secure aggregation schemes for privacy-preserving federated learning: federated learning based on one-way secret sharing (FLOSS) and federated learning based on multi-shot secret sharing (FLMSS). The problem of constructing a global model where no participant reveals its updated gradients is referred to as secure aggregation for federated learning. We consider the general cross-device federated learning framework where mobile terminals have high communication constraints, weak computing power, and the possibility of dropouts at any time. FLOSS allows participants to join secret sharing of gradients within their own sharing group and reduces the communication cost by replacing entire high-dimensional random vectors with pseudorandom seeds. FLMSS takes advantage of the large number of cross-device federated learning participants in the real world to design secure aggregation protocols with tunable performance and the ability to solve participant dropout with minimal cost. It is worth mentioning that we abandon All-PC for the first time in FLMSS. The theoretical analysis proves that the local dataset of any participant can be well protected under the severe threat of the honest-but-curious server and participant collusion. Extensive experiments demonstrate that our proposed schemes have distinct advantages over the state-of-the-art works.

The remainder of the paper is organized as follows. Section 2 discusses the related work. In Section 3, we review some preliminaries used in our proposed schemes. In Section 4, problem statements are given. In Section 5, we elaborate on the design details of FLOSS and FLMSS. In Section 6, we present the security analysis, followed by the experimental results in Section 7. Section 8 concludes this paper.

2 Related work

Our research is related to the line of work on federated learning with secure aggregation to protect the updated model parameters. To the best of our knowledge, none of the existing works achieves a comparable balance between privacy-preserving intensity and additional overhead in a federated learning setting.

Some works [16, 17] have tried to prevent honest-but-curious servers from gaining user privacy by encrypting the uploaded model parameters with original homomorphic encryption. For the first time, Phong et al. [16] applied homomorphic encryption to distributed learning completely and provided systematic practice based on two different encryption methods: Paillier encryption and LWE (learning with errors)-based encryption. However, the only key pair is available to all participants, which puts the security of the homomorphic encryption system at huge risk. DeepPAR [17] is a privacy-preserving distributed learning scheme based on reencryption that attempts to address the aforesaid security vulnerability of the shared key. However, due to the unreasonable design of the system mechanism, the trusted third party (TTP) can easily be used to decrypt any user's encrypted data. Most importantly, the computational overhead of the above schemes for the original homomorphic encryption of high-dimensional parameters is unacceptable for ordinary devices.

Due to the low computational power of federated learning participants, many efforts have attempted to use more efficient encryption methods [24, 25] and hybrid methods [24, 26] with DP. The works of Xu et al. [24] and Wu et al. [25] both adopted multi-input functional encryption (MIFE) to complete the encrypted aggregation of local model parameters, which allows parties to encrypt their data and the server to compute a specific function on the ciphertext. HybridAlpha [24] improves conventional MIFE schemes to reduce the number of communication rounds and the running time without intensive pairing operations in federated learning and allows participants to dynamically join or drop. Wu et al. [25] employed the all-or-nothing transform (AONT) to encode parameter matrices, thereby reducing the number of parameters required to be encrypted with MIFE. Even if efficient homomorphic encryption methods are adopted, performance overheads are not reduced to a desired level. DP-based protection schemes can achieve a mathematically guaranteed protection strength with a small additional computation overhead. Truex et al. [26] skillfully found that leveraging the secure multiparty computation (SMC) framework can reduce the noise required by DP while maintaining the original protection strength for local model parameters. Therefore, their hybrid approach combines the Threshold-Paillier (TP) system and DP to achieve privacy protection of federated learning and reduce the noise added by DP. HybridAlpha [24] follows the above findings, but the difference is that it combines DP with MIFE. However, we must realize that once DP is introduced, it inevitably reduces the accuracy of the global model. Obviously, lightweight privacy protection based on reducing the availability of the global model is not the optimal solution.

Based on the above considerations, additive secret sharing, which is a lightweight and efficient privacy protection method, is becoming popular. Bonawitz et al. [21] used a double-masking structure to protect the shared model parameters and secret sharing to enable the masks to be securely removed. However, the proposed double-masking mechanism for solving the participant dropout problem reveals dropout participants' secret keys and introduces a significant amount of additional computational and communication costs. SSDDL [22] uses a simple secret sharing algorithm to achieve high-strength protection of uploaded gradients, but the unreasonable design of its secret content induces huge communication overhead, and this scheme cannot cope with participant dropouts.

There are some follow-up works [23, 27, 28] attempting to improve Ref. [21] in many aspects, such as system overhead, robustness, and security. Turbo-Aggregate [28] also employs the double-masking structure to protect model parameters based on additive secret sharing, but it sets up a multi-group circular topology to aggregate the results, which reduces the aggregation overhead from O(N^2) to O(N log N). Redundancy based on Lagrange coding is introduced into Turbo-Aggregate [28], which enables the federated learning system to tolerate 50% dropouts. However, the structure of multi-group circles depends on additional topology information and increases system complexity, and there is still redundancy in computing and communication when aggregation is performed in groups. In fact, the aggregation overhead of participants can be reduced to O(N) [23]. We particularly note that Kadhe et al. [27] used the term multi-secret sharing, which sounds similar to FLMSS. However, there are essential differences between the two schemes. FastSecAgg in Ref. [27] is based on the fast Fourier transform (FFT) to improve secret sharing, and its tolerance for client collusion and dropout is restricted to a preset constant fraction of clients. Zheng et al. [23] simplified the dropout handling mechanism of Ref. [21], introduced quantization-based model compression and trusted hardware into the system realization, and obtained bandwidth efficiency optimization and security against an actively adversarial server. However, Zheng et al. [23] did not make innovative changes to the most basic additive secret sharing scheme originating from Ref. [21], and their tenuous dropout-tolerance algorithm cannot converge once any new dropout occurs in the process of dealing with client dropouts.

Compared with Refs. [21, 23], we have made significant changes in the mechanisms of secret-sharing objects and dropout processing, which further reduce the computation-communication overhead. Table 1 summarizes the comparison with most of the related works discussed above. The specific advantages of our work in computation-communication overhead will be compared in the following sections.

Table 1. Comparison of secure aggregation schemes for federated learning.

3 Preliminaries

In this section, we introduce some preliminaries, including federated learning, secret sharing, key agreement, and pseudorandom generators. In Table 2, we provide a summary of the main notations used in this paper.

Table 2. Key notations used in this paper.

3.1 Federated learning

Federated learning [8] is a new distributed deep learning paradigm with the goals of improving global model utility and protecting participant data privacy, constrained by the computational performance and communication bandwidth of edge devices. The basic framework of federated learning, which is composed of one server and N participants, is shown in Fig. 1. The local training dataset D_i is stored locally by each participant P_i, and a central server S collects and aggregates each local gradient G_{t,i} to generate an updated global model W_t at the t-th communication round.

Fig.1. The basic framework of federated learning.

At the t-th communication round, the central server selects a subset of N participants P_N and sends them the current global model W_{t-1}. Each selected participant uses the current global model W_{t-1} and its local dataset D_i to train and obtain the local gradient G_{t,i}, which is then uploaded back to the central server. The central server averages the received gradients to compute the new global model:

$$W_t = W_{t-1} - \sum_{P_i \in \mathbb{P}_N} \alpha_{t,i} G_{t,i}, \qquad (1)$$

where \alpha_{t,i} = |D_i| / \sum_{P_j \in \mathbb{P}_N} |D_j| is the weight corresponding to participant P_i at the t-th communication round. A number of studies [29, 30] have improved the above benchmark federated learning framework and aggregation algorithms in terms of global model convergence speed and accuracy.
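
To make the weighted aggregation above concrete, the following Python sketch aggregates gradients in proportion to local dataset sizes; it assumes the data-size weights and gradient-descent-style update described above, and the function and variable names are illustrative rather than taken from the paper.

```python
# A minimal sketch of server-side weighted gradient aggregation, assuming
# alpha_{t,i} = |D_i| / sum_j |D_j|; names and shapes are illustrative.
import numpy as np

def aggregate(global_model: np.ndarray, grads, data_sizes, lr: float = 1.0):
    total = sum(data_sizes)
    weights = [s / total for s in data_sizes]               # alpha_{t,i}
    avg_grad = sum(w * g for w, g in zip(weights, grads))   # weighted average of gradients
    return global_model - lr * avg_grad                     # descent-style global update

W = np.zeros(10)                                            # toy global model
grads = [np.random.randn(10) for _ in range(5)]             # one gradient per participant
sizes = [600, 600, 1200, 300, 300]                          # |D_i| for each participant
W_new = aggregate(W, grads, sizes)
```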

3.2 Secret sharing

This paper adopts Shamir's t-out-of-n secret sharing [31], which allows a user to split a secret s into n shares such that any t shares can reconstruct s, but any t-1 or fewer shares cannot reveal anything about s. The scheme consists of the sharing algorithm and the reconstruction algorithm. The sharing algorithm SS.share(s, t, n) → {s_1, s_2, ···, s_n} takes as input a secret s, the threshold t, and the number of shares n ≥ t, and then generates a set of shares S = {s_1, s_2, ···, s_n}. Given a subset R, where R ⊆ S and |R| ≥ t, the reconstruction algorithm SS.recon(R, t) → s can reconstruct the secret s.
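
For concreteness, the sketch below implements a textbook t-out-of-n Shamir scheme over a prime field in Python; the prime modulus and the helper names (ss_share, ss_recon) are illustrative choices, not taken from the paper or from Ref. [31].

```python
# Shamir t-out-of-n secret sharing over GF(p): a random polynomial of degree t-1
# hides the secret in its constant term; Lagrange interpolation at x=0 recovers it.
import secrets

PRIME = 2**127 - 1   # a large prime field for toy-sized secrets

def ss_share(s, t, n, p=PRIME):
    coeffs = [s] + [secrets.randbelow(p) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for a in reversed(coeffs):      # evaluate the polynomial with Horner's rule
            y = (y * x + a) % p
        shares.append((x, y))
    return shares

def ss_recon(subset, p=PRIME):
    s = 0
    for i, (xi, yi) in enumerate(subset):
        num, den = 1, 1
        for j, (xj, _) in enumerate(subset):
            if i != j:
                num = (num * (-xj)) % p
                den = (den * (xi - xj)) % p
        s = (s + yi * num * pow(den, -1, p)) % p   # Lagrange basis evaluated at x = 0
    return s

shares = ss_share(123456789, t=3, n=5)
assert ss_recon(shares[:3]) == 123456789           # any 3 shares reconstruct the secret
```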

3.3 Key agreement

Key agreement allows the communicating parties to exchange keys securely by negotiating a consistent key without revealing any information about this key. This paper adopts the Diffie-Hellman key agreement [32] to produce secret shares securely. Diffie-Hellman key agreement consists of a parameter generation algorithm, a key generation algorithm, and a key agreement algorithm. The algorithm KA.param(k) → pp = (G′, g, q) produces the needed public parameters, including the prime order q and a generator g of the cyclic group G′. Each party u can use the algorithm KA.gen(pp) → (SK_u, PK_u) = (x, g^x) to generate its private-public key pair, where x is sampled randomly from the group Z_q (a cyclic group of integers of prime order). Another party v can use the algorithm KA.agree(SK_v, PK_u) → AK_{v,u} = g^{x_u x_v} to obtain a securely shared key between u and v. In the same way, u can also obtain AK_{u,v}, where AK_{u,v} = AK_{v,u}. A hash step can be added to the algorithm KA.agree to change the field of the shared key [21].
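
The toy Python sketch below illustrates the KA.gen/KA.agree flow and the optional hashing of the agreed value into a fixed-length seed; the group parameters are deliberately small illustrative values, whereas a real deployment would use a standardized group (e.g., an RFC 3526 MODP group) or an elliptic-curve variant.

```python
# Textbook Diffie-Hellman key agreement: both parties derive the same value
# g^{x_u x_v} mod q and hash it into a seed that can later feed a PRG.
import secrets, hashlib

q = 2**64 - 59   # illustrative prime modulus (far too small for real security)
g = 5            # illustrative generator

def ka_gen():
    sk = secrets.randbelow(q - 2) + 1      # private key x
    pk = pow(g, sk, q)                     # public key g^x mod q
    return sk, pk

def ka_agree(sk_u, pk_v):
    shared = pow(pk_v, sk_u, q)            # g^{x_u x_v} mod q
    return hashlib.sha256(shared.to_bytes(32, "big")).digest()

sk_u, pk_u = ka_gen()
sk_v, pk_v = ka_gen()
assert ka_agree(sk_u, pk_v) == ka_agree(sk_v, pk_u)   # AK_{u,v} = AK_{v,u}
```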

3.4 Pseudorandom generator

A secure pseudorandom generator [33, 34] PRG is a deterministic but unpredictable function that expands a binary bit string into a longer bit string. We can simplify the PRG function as PRG({0,1}^λ) → {0,1}^{p(λ)}, where p(λ) > λ. The input of PRG is a seed, i.e., a bit string whose length is the security parameter λ. Security for a pseudorandom generator guarantees that the output of PRG can be computationally distinguished from a truly random string with only negligible probability, as long as the seed is confidential. It is worth emphasizing that feeding the same seed to the same pseudorandom generator yields the same output, which is one of the important methods used to save communication overhead.
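
The sketch below shows how an agreed key can seed a pseudorandom generator so that two parties independently expand it into the same mask vector of gradient dimension; np.random.default_rng stands in for a cryptographic PRG (e.g., AES in counter mode) purely for illustration.

```python
# Expanding a shared seed into a reproducible mask vector: same seed, same mask.
import numpy as np

def prg_mask(seed_bytes: bytes, dim: int) -> np.ndarray:
    seed = int.from_bytes(seed_bytes[:8], "big")   # illustrative seed derivation
    rng = np.random.default_rng(seed)              # NOT a cryptographic PRG
    return rng.integers(-2**15, 2**15, size=dim)

mask_u = prg_mask(b"agreed-key-bytes", 136886)     # computed by party u
mask_v = prg_mask(b"agreed-key-bytes", 136886)     # computed independently by party v
assert np.array_equal(mask_u, mask_v)
```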

4 Problem statements

In this section, we introduce the threat model, system models, and requirements of FLOSS and FLMSS.

4.1 Threat model

Our system is designed to withstand two potential adversaries: the central server and the participants.

(Ⅰ) Honest-but-curious central server. The honest-but-curious or semi-honest adversarial model is commonly considered in privacy-preserving data mining and cloud computing [35, 36]. The honest-but-curious central server follows the defined protocol correctly and will not intentionally modify, add, or delete the data required to be processed. However, the server will try to learn additional information or understand the content of the data. Therefore, participants' private information may be learned and leaked from the model updates by the server.

(Ⅱ) Curious and colluding participants. We assume that some participants may collude with the central server to acquire the privacy of other participants by inspecting the information that they have access to during the correct execution of the protocol. The unrealistic extreme case in which there is only one honest participant and all other participants collude with the server is not considered. This is a common assumption in secret sharing-based privacy-preserving schemes [22, 23].

We assume that dependable and secure channels are guaranteed in all communications. Thus, messages are protected from attacks such as snooping, loss, and tampering during transmission.

4.2 System models of FLOSS and FLMSS

First, we define FLOSS and FLMSS in a functional way with a comprehensive textual explanation.

Definition 1: We formalize the definition of FLOSS = (Init, GroupSet, LocalTrain, GradShare, ModelUpdate), which consists of five algorithms.

(ⅰ) Init(P_N, m) → (W, α, M, K): The central server selects P_N to participate in this round of training, the participation weights form a vector α, the latest global model is used as the initial model W for this round, and the N participants are grouped according to the sharing-group size m to obtain the sharing-group set M. K is the set of all participants' key pairs.

(ⅱ) GroupSet(M) → (μ, idx_μ): On input the sharing-group set M, the server selects a group μ and generates the participants' IDs idx_μ in this group.

(ⅲ) LocalTrain(W, μ, η) → G_μ: Participant P_i in group μ uses its local dataset D_i and the global model W to run the deep learning algorithm with parameter selection rate η and obtains the local gradient G_i, where G_μ = {G_i}_{i∈μ}.

(ⅳ) GradShare(G_μ, K) → Y_μ: On input the set of local gradients G_μ and the set of key pairs K, the participants obtain the set of secret gradients Y_μ by secret sharing in group μ and then upload these secret gradients.

(ⅴ) ModelUpdate(Y_μ1, Y_μ2, ···) → W: After receiving the secret gradients uploaded by all sharing groups, the server performs a global model update to obtain a new global model W.

Definition 2: We formalize the definition of FLMSS = (Init, LocalTrain, GradShare, DropHandle, ModelUpdate), which consists of five algorithms.

(ⅰ) Init(P_N) → (W, α, K): FLMSS no longer needs to group participants in the initialization phase; the rest is the same as in Definition 1.

(ⅱ) LocalTrain(W, P_N, η, α) → (G_1, G_2, ···, G_N): Participant P_i obtains the local gradient G_i in the same way as in Definition 1.

(ⅲ) GradShare(G_1, ···, G_N, K, d) → (Y_1, Y_2, ···, Y_N): On input the shot count d of secret sharing and the set of key pairs K, each participant splits its plaintext gradient into d+1 parts by generating d mask vectors computed from agreement keys and secretly sharing them with d participants. The plaintext gradient G_i of participant P_i becomes the secret gradient Y_i after secret sharing is completed.

(ⅳ) DropHandle(P_drop, {Y_i}_{P_i∈P_help}, K) → ({Y_i′}_{P_i∈P_help}): On input the set of dropped participants P_drop and the set of key pairs K, the participants in P_help, which share secrets with the dropped participants, cooperate to resolve the dropout and upload new secret gradients.

Figs. 2 and 3 show the system models of FLOSS and FLMSS. The yellow dashed boxes mark the sequence of the main steps of secret sharing. The key notations in both figures are explained in Table 2. In Fig. 2, STEP 1, STEP 2, and STEP 3 correspond to LocalTrain, GradShare, and ModelUpdate in the definition of FLOSS, respectively. In Fig. 3, STEP 1 and STEP 2 correspond to LocalTrain and GradShare in the definition of FLMSS. It is worth noting that STEP 3 and STEP 4 need to be understood together because new participants may drop out while dropouts are being handled. This means that STEP 3 and STEP 4 may alternate and be nested. The combination of STEP 3 and STEP 4 in Fig. 3 corresponds to the process of DropHandle together with ModelUpdate.

Fig.2. System model of FLOSS where the formulas are simplified versions.

Fig. 3. System model of FLMSS with the case of d=1. The omitted secret share values can be obtained from Fig. 2. The aggregation process after handling the first round of dropped participants is not shown in this figure. The red dashed arrow represents the dropout state of the participant.

4.3 Requirements

To guarantee the performance of the proposed methods, we first define the following four requirements.

Security. The security guaranteed by the proposed schemes is based on the threat model described in Section 4.1. The local dataset and the plaintext gradients computed from it should be kept secret from both the central server and the other participants. No adversary can derive, from any messages communicated in the federated learning system, private information that would reveal the participants' local data and models.

Efficiency. The extra computation and communication cost due to privacy-preserving schemes should be as small as possible, especially the cost borne by the participants.

Accuracy. Privacy-preserving schemes should minimize negative effects on global model accuracy.

Robustness. The system should have an efficient solution to the problem of participant dropout at any point in the federated learning process.

5 Proposed schemes

In this section, we give a detailed construction of FLOSS and FLMSS.

5.1 Federated learning based on one-way secret sharing

Compared with redundant two-way secret sharing [22], the FLOSS scheme mainly performs one-way secret sharing of each participant's plaintext gradients after local training is completed. According to Definition 1, the five constituent algorithms of FLOSS are elaborated next.

5.1.1 Init(P_N, m) → (W, α, M, K)

At the beginning of each round of training, the central server first randomly selects N participants from all participants to participate in this round of training, i.e., P_N = {P_1, P_2, ···, P_N}. To flexibly control the scope of secret sharing and the tradeoff between privacy protection strength and communication cost, we divide the N participants into ⌈N/m⌉ sharing groups of size m. μ_i represents the sharing group with index i. Therefore, the group set is denoted M = {μ_1, ···, μ_⌈N/m⌉}. The server calculates the participation weights α, which are explained in Section 3.1. Given the security parameter k, each participant P_u needs to generate its own key pair (SK_u, PK_u) through KA.param(k) and KA.gen(pp). Then, the participants send their public keys to the central server. Finally, the central server broadcasts the global model W and the corresponding public key set {PK_i}_{P_i∈μ} to the participants of each group μ.

5.1.2 GroupSet(M) → (μ, idx_μ)

After dividing the N participants into groups, the central server distinguishes these ⌈N/m⌉ groups by numbering them from 1 to ⌈N/m⌉ and then randomly numbers each member of every sharing group from 1 to m, i.e., idx_μ = {idx_1, ···, idx_m} = {1, ···, m}. Each participant can be uniquely identified by (μ, idx). The output of GroupSet(M) is the result of setting the group information for one sharing group. The above elaboration ignores the one group that may have fewer than m participants, which is nonessential.
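
A minimal illustration of this grouping and numbering is sketched below; the helper name and the shuffling strategy are assumptions for illustration, not the paper's exact procedure.

```python
# Split N selected participants into ceil(N/m) sharing groups of size m and
# assign each member a random index 1..m within its group.
import random

def group_set(participants, m):
    random.shuffle(participants)                        # randomize group membership
    groups = [participants[i:i + m] for i in range(0, len(participants), m)]
    return {g_id + 1: {idx + 1: p for idx, p in enumerate(group)}
            for g_id, group in enumerate(groups)}

groups = group_set([f"P{i}" for i in range(1, 11)], m=4)
print(len(groups))   # ceil(10/4) = 3; the last group has fewer than m members
```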

5.1.3 LocalTrain(W, μ, η, α) → G_μ

Without loss of generality, consider a participant P_i in a sharing group μ as an example. P_i receives the global model W sent by the central server, replaces its local model with W, starts training on the local dataset D_i, and obtains the local gradient G_i.

To further reduce the communication overhead, many schemes [37-39] sample gradients or perform higher-level compression, thus trading some model utility for a significant reduction in the communication data size of gradient exchange. Taking distributed selective stochastic gradient descent (DSSGD) [38] as an example, participant P_i chooses a fraction η of G_i to be shared rather than the entire G_i. The contribution of each parameter of the ultrahigh-dimensional gradient to SGD varies greatly, and gradient descent is a dynamic and continuous process. Therefore, the selective abandonment of some parameters will not have a great impact on the usability of the final model. The two common parameter selection methods are random sampling and TOPK selection, where TOPK takes the first K values in descending order. In recent years, more advanced gradient compression methods, such as Refs. [37, 39], can compress gradients more than 300 times with only negligible loss of accuracy by using error compensation techniques. The focus of this paper is not on the gradient compression method but on making privacy-preserving communication more effective by improving the secret sharing protocol. Finally, participant P_i multiplies the compressed gradient by the participation weight α_i in advance.
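
The two selection methods mentioned above can be sketched as follows; interpreting TOPK as keeping the K largest-magnitude entries is an assumption for illustration, and the flattening of the gradient is likewise illustrative.

```python
# Gradient parameter selection: random sampling vs. top-K by magnitude.
import numpy as np

def select_gradient(grad: np.ndarray, eta: float, method: str = "topk"):
    flat = grad.ravel()
    k = max(1, int(eta * flat.size))
    if method == "topk":
        idx = np.argsort(np.abs(flat))[-k:]                  # K largest-magnitude entries
    else:
        idx = np.random.choice(flat.size, k, replace=False)  # random sampling
    return idx, flat[idx]                                    # positions and selected values
```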

5.1.4 GradShare(G_μ, K) → Y_μ

Without loss of generality, take a sharing group μ and one of its members P_(μ,k), whose number is idx = k, as an example. Next, m-out-of-m additive secret sharing of gradients needs to be performed within the sharing group μ, i.e., each group member splits its local gradient into several shares as required and keeps one of them locally. The other shares are sent to designated group members for preservation. It is worth noting that the secret sharing method used in this paper only needs the sharing algorithm, not the reconstruction algorithm.

Specifically, the group member P_(μ,k) divides G_(μ,k) into k parts, one of which is saved by itself, and the other k-1 parts are sent to the group members {P_(μ,1), ···, P_(μ,k-1)} with a smaller idx than its own to be saved. Therefore, group member P_(μ,m) needs to send shares to all other group members without receiving any shares. Meanwhile, the group member P_(μ,1) only needs to receive shares.

How P_(μ,k) can safely split its local gradient and send shares to the designated members is described as follows. The main methods are key agreement and pseudorandom number generation. First, in the initialization phase, P_(μ,k) receives the set of required public keys {PK_i}_{P_i∈μ} from the server. Without considering the group number μ, P_k needs to split the gradient G_k into shares {s_{k,k}, ···, s_{k,1}}, where k ≥ 2. The secret shares are calculated as:

$$s_{k,j} = \mathrm{PRG}(AK_{k,j}), \quad j = 1, \cdots, k-1; \qquad s_{k,k} = G_k - \sum_{j=1}^{k-1} s_{k,j}. \qquad (2)$$

P_k uses the key agreement algorithm to generate a private shared key for each sharing object and uses these keys as seeds for the pseudorandom generator to obtain secret vectors with the same dimension as the gradient. All participants use the same pseudorandom generator. The specific construction of the PRG function is not the focus of this paper and can be found in existing work [21, 22]. After splitting the gradient, P_k needs to send the secret shares one by one to the corresponding participants. It is worth noting that P_k does not need to actually perform the sending operation; the other participants can obtain the secret shares from P_k by themselves according to the key agreement algorithm. For example, P_j can obtain s_{k,j}, where s_{k,j} = s_{j,k}, by

$$s_{k,j} = s_{j,k} = \mathrm{PRG}(\mathrm{KA.agree}(SK_j, PK_k)). \qquad (3)$$

After all members of the group complete the abovementioned secret sharing, each participant computes its aggregation result as:

$$Y_k = s_{k,k} + \sum_{i=k+1}^{m} s_{i,k}. \qquad (4)$$

Finally, P_k sends the result Y_k to the central server for model updating.
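
The following condensed sketch walks through one sharing group of FLOSS to show that the server recovers the plain gradient sum; integer toy gradients and a shared random generator stand in for the PRG(AK) mask vectors, so it is an illustration of the mechanism rather than a faithful implementation.

```python
# One-way secret sharing in a group of size m: member k subtracts the masks it
# "sends" to members with smaller idx and adds the masks it "receives" from
# members with larger idx; all masks cancel in the aggregate.
import numpy as np

m, dim = 4, 6
rng = np.random.default_rng(0)
grads = rng.integers(-10, 10, size=(m, dim))           # alpha-weighted local gradients G_k

# Pairwise masks s_{k,j} = s_{j,k} (k > j), normally derived as PRG(AK_{k,j}).
mask = {(k, j): rng.integers(-100, 100, size=dim) for k in range(m) for j in range(k)}

Y = []
for k in range(m):
    y = grads[k].copy()
    for j in range(k):
        y -= mask[(k, j)]          # shares handed to smaller-idx members
    for i in range(k + 1, m):
        y += mask[(i, k)]          # shares received from larger-idx members
    Y.append(y)

assert np.array_equal(sum(Y), grads.sum(axis=0))       # server recovers the plain sum
```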

5.1.5 ModelUpdate(Y_μ1, Y_μ2, ···) → W

Without loss of generality, the case of group μ is discussed. Upon receiving all the results {Y_k}_{P_k∈μ}, the central server updates the global parameters by

$$W \leftarrow W - \sum_{k=1}^{m} Y_k. \qquad (5)$$

Combining Eqs. (3) and (5), we can obtain

$$W \leftarrow W - \sum_{k=1}^{m} Y_k = W - \sum_{k=1}^{m} G_k, \qquad (6)$$

since every pairwise share s_{k,j} = s_{j,k} appears once with a positive sign and once with a negative sign in the sum.

Therefore, the result obtained by central server aggregation is the same as the result obtained by direct gradient aggregation without secret sharing. In parallel, the server is able to obtain the new global model after performing the same secure aggregation process with the other sharing groups.

5.2 Federated learning based on multi-shot secret sharing

Many processes and implementation methods of FLMSS are common to FLOSS, so this section mainly describes the differences. To avoid confusion between FLMSS and FLOSS, we first point out the essential differences between them. FLOSS performs m-out-of-m fully connected secret sharing in each sharing group of size m, i.e., the group members share secrets with each other. In contrast, FLMSS does not need to group participants, since grouping can easily be manipulated intentionally by colluding parties, and secret sharing is carried out randomly among all participants. In essence, d-shot FLMSS means that each participant performs d′-out-of-d′ secret sharing, where d′ ≥ d+1, among all participants. According to Definition 2, the five constituent algorithms of FLMSS are elaborated next. The overall FLMSS scheme is presented in Algorithm 1. Meanwhile, the protocol for dealing with dropout is shown separately in Algorithm 2.

5.2.1 Init(P_N) → (W, α, K)

During the initialization phase, the central server no longer needs to group participants. After generating their own key pairs, the participants send their public keys to the central server. The central server needs to broadcast the identity information of all participants, which occupies only a very small communication cost, to all participants.

5.2.2 LocalTrain(W, P_N, η, α) → (G_1, G_2, ···, G_N)

Participant P_i obtains the local gradient G_i through training on the model W and dataset D_i with compression rate η; the specific method is the same as that of FLOSS.

5.2.3 GradShare(G_1, ···, G_N, K, d) → (Y_1, Y_2, ···, Y_N)

Without loss of generality, consider a participant P_i as an example. According to the participant information received from the server in this round, P_i randomly selects d participants as its secret sharing destinations. Specifically, P_i first sends the identities of the selected destinations as a message to the central server. Assuming P_j is one of these destinations, the central server verifies that P_j is online and then sends the public key PK_j to P_i and the public key PK_i to P_j. It should be emphasized that the central server is required to record each pair of participants that confirms the intent for secret sharing. When the pair of participants completes the public key exchange, P_i executes:

$$Y_i \leftarrow Y_i - \mathrm{PRG}(AK_{i,j}), \qquad (7)$$

where Y_i is initialized as G_i.

Correspondingly, P_j executes:

$$Y_j \leftarrow Y_j + \mathrm{PRG}(AK_{i,j}). \qquad (8)$$

When P_i has found d destinations and completed secret sharing d times, its active sharing ends. Under this mechanism, each participant needs to actively share secrets with d participants and accept the sharing from random participants. After completing the above sharing, P_i updates the local aggregation result Y_i:

$$Y_i = G_i - \sum_{P_j \text{ chosen by } P_i} \mathrm{PRG}(AK_{i,j}) + \sum_{P_u \text{ choosing } P_i} \mathrm{PRG}(AK_{u,i}). \qquad (9)$$

Finally, the central server has recorded the collaborative sharing participant set of each participant P_i, i.e., the set of participants that share secrets with P_i in either direction.
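
A toy end-to-end sketch of the d-shot masking is given below; the mask values come from one shared random generator instead of PRG(AK_{i,j}), and the destination sampling is an illustrative stand-in for the server-mediated pairing, so the sketch only demonstrates that the masks cancel in the server-side sum.

```python
# d-shot FLMSS masking: each participant picks d destinations; the pair applies
# the same mask with opposite signs, so the aggregate equals the plain sum.
import numpy as np

N, d, dim = 20, 3, 5
rng = np.random.default_rng(1)
grads = rng.integers(-10, 10, size=(N, dim)).astype(np.int64)
Y = grads.copy()

for i in range(N):
    dests = rng.choice([j for j in range(N) if j != i], size=d, replace=False)
    for j in dests:
        s = rng.integers(-100, 100, size=dim)   # stands in for PRG(AK_{i,j})
        Y[i] -= s                               # P_i: Y_i <- Y_i - s
        Y[j] += s                               # P_j: Y_j <- Y_j + s

assert np.array_equal(Y.sum(axis=0), grads.sum(axis=0))
```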

Participant dropout before the completion of secret sharing has a negligible impact on the global model. However, if a participant is disconnected after completing the secret sharing, the server aggregation result will lose some random shares, which greatly reduces the availability of the aggregated gradient.

The set of newly dropped participants in the i-th processing round, which can be obtained by the central server in the process of collecting local aggregation results, is denoted P_drop^(i). Correspondingly, the central server can obtain from its records the set P_help^(i) of participants that share secrets with the dropped participants and are still alive.

Then, in the first dropout-handling round, each helper P_a in P_help^(1) strips from Y_a the shares exchanged with the dropped participants and performs another shot of secret sharing for the updated Y_a among the live participants. The same procedure applies to P_a and its collaborative sharing set in every subsequent dropout-handling round i.

Consider one extreme situation below that can cause privacy leakage when dealing with dropped participants. When P_a has only shared secrets with the current round of dropped users, i.e., its collaborative sharing set is contained in P_drop^(i), P_a will be left with only the plaintext Y_a, where Y_a = G_a, after the stripping operation is completed. When this extremely unlikely situation occurs, P_a takes the initiative to announce its own dropout to the server to protect its privacy. FLMSS then proceeds to the next round of dropout processing until this small-probability event no longer occurs.

When no more participants drop out, the server uses the latest local results uploaded by the online participants for secure aggregation in the same way as FLOSS.
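
The stripping step can be illustrated with the rough sketch below, which extends the previous toy masking example; it omits the extra round of re-sharing that FLMSS requires of the helpers before uploading (which protects their stripped results), so it only checks that the surviving participants' aggregate stays correct.

```python
# Dropout handling sketch: live helpers cancel the masks they exchanged with
# dropped participants so that the sum over the remaining participants is exact.
import numpy as np

N, d, dim = 12, 2, 4
rng = np.random.default_rng(2)
grads = rng.integers(-10, 10, size=(N, dim)).astype(np.int64)
Y = grads.copy()
pair_masks = {}                                       # (sender i, receiver j) -> mask

for i in range(N):
    for j in rng.choice([j for j in range(N) if j != i], size=d, replace=False):
        s = rng.integers(-100, 100, size=dim)
        pair_masks[(i, j)] = s
        Y[i] -= s
        Y[j] += s

dropped = {3, 7}                                      # participants lost after sharing
for (i, j), s in pair_masks.items():
    if i in dropped and j not in dropped:
        Y[j] -= s                                     # undo mask received from a dropped sender
    elif j in dropped and i not in dropped:
        Y[i] += s                                     # undo mask sent to a dropped receiver

live = [k for k in range(N) if k not in dropped]
assert np.array_equal(Y[live].sum(axis=0), grads[live].sum(axis=0))
```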

6 Security analysis

In this section, we present the security analysis of FLOSS and FLMSS. The following proofs are based on an existing lemma [21]: if participants' values are masked by uniformly random values added to them, then the resulting values look uniformly random. Theorem 3 analyzes the security from a probabilistic point of view. Since the probability and circumstances of participant dropouts are difficult to estimate and FLMSS provides targeted protection against possible privacy leakage, the impact of the dropout handling process on this probability is ignored.

Theorem 1. In FLOSS, the local dataset privacy of P_k cannot be guaranteed if and only if the honest-but-curious central server and all (m-1) other participants in the sharing group collude. Otherwise, the privacy of the local dataset D_k can be guaranteed.

Proof: The original local gradient G_k of P_k is split into {s_{k,1}, s_{k,2}, ···, s_{k,k}}. To recover G_k, the adversary must obtain these k shares or the sum of these k shares. Since there is no separate addition among these k shares in the whole communication process of FLOSS, each of the k shares must be obtained to restore G_k. Except for P_k itself, only {P_1, P_2, ···, P_{k-1}} know {s_{k,1}, s_{k,2}, ···, s_{k,k-1}}. Therefore, the adversary needs the collusion of these (k-1) participants. Furthermore, the only way for an adversary to obtain the local share value s_{k,k} is s_{k,k} = Y_k - s_{k+1,k} - s_{k+2,k} - ··· - s_{m,k}. Except for P_k itself, only {P_{k+1}, P_{k+2}, ···, P_m} know {s_{k+1,k}, s_{k+2,k}, ···, s_{m,k}}, and only the central server knows Y_k. Therefore, the adversary must also require the server to collude with these (m-k) group members. In summary, to obtain G_k, which reveals the privacy of P_k, the central server must collude with all other (m-1) members {P_1, ···, P_{k-1}, P_{k+1}, ···, P_m}.

Theorem 2. In FLMSS, in the face of an adversary that is the honest-but-curious server or a group composed of any number of other participants in collusion, P_k can guarantee the privacy of its local dataset D_k.

Proof. First, we consider the secret sharing process. Without the participation of the central server, the adversary cannot obtain Y_k and s_{k,k}. Without the help of other participants, the adversary cannot obtain s_{k,j} and s_{i,k}, where i ≠ k and j ≠ k. Next, the security of the process of handling dropped users is discussed. The process of dealing with dropped participants involves P_drop and P_help, so their security needs to be demonstrated. If P_k is any dropped user in P_drop, neither the central server nor the other participants can obtain Y_k and s_{k,k}. If P_k is any helper in P_help, P_k is required by the FLMSS protocol to participate in another round of secret sharing while handling the dropped participants before uploading its new local aggregation result, so the situation is consistent with the case proved above. In summary, with the proof of Theorem 1, the adversary cannot recover G_k in the above two cases, i.e., the privacy of the local dataset D_k is guaranteed.

Theorem 3. In d-shot FLMSS, suppose that there are a total of N participants, that x participants are curious and collude with the central server, and that P_k is any normal participant. Then, the probability that an adversary composed of the curious participants and the honest-but-curious server can obtain the training data privacy G_k of P_k is P = f(N, x), which is analyzed below.

To better analyze the security of FLMSS, we plot surface graphs of the probability of a normal participant being successfully attacked as a function of the environmental parameters. When 5000 ≤ N ≤ 10000 and 0 ≤ x ≤ 5000, the graphs of P = f(N, x), where d = 5, 10, are shown in Fig. 4. According to Theorem 3, we also plot the probability of a normal participant being successfully attacked as a function of the number of curious participants when the total number of participants N is fixed, to better demonstrate the impact of d on the security of FLMSS (see Fig. 5). As shown in Fig. 4, in the face of drastic changes in the federated learning environment, a small value of d enables FLMSS to guarantee almost 100% security in the majority of cases. Fig. 5 shows that a very small increase in d can greatly increase the privacy-preserving ability of FLMSS. An appropriate increase in the value of d is thus an extremely cost-effective option in terms of communication.

Fig. 4. Surface graphs of the security of FLMSS as a function of environmental parameters when the number of shots d is fixed.

Fig.5. Curves of the security of FLMSS as a function of the number of curious participants x when the total number of participants N is fixed.

To better demonstrate the superiority of FLOSS and FLMSS in terms of security, we compare the security of our proposed schemes with the current state-of-the-art secret sharing-based secure aggregation schemes, as shown in Table 3. t in Table 3 is the secret sharing threshold in Ref. [21]. In SSDDL, since the grouping is controlled by the central server, as long as the number of curious participants colluding with the server is greater than m-2, the adversary can attack any participant. Increasing m will exponentially increase the communication cost, which will be proven later. However, in our FLOSS, m can increase drastically without worrying about a surge in communication costs. The work of Zheng et al. [23] has similar principles and security. In Bonawitz's protocol [21], even if t takes the suggested value and the number of curious participants colluding with the server only slightly exceeds the corresponding bound, the privacy of all normal participants will be exposed. At the same time, the capacity to handle dropped participants is also limited. In contrast, both the security and the robustness of our FLMSS show great advantages. When d = 10 and x ≤ 3N/5, the probability of privacy leakage of one participant is less than 0.0001104 and can be neglected, where N = 10000. Changes in the large value of N have little effect on this probability. For practical scenarios in federated learning where curious participants account for a small proportion, the protection provided by our FLMSS is sufficient. Moreover, FLMSS can handle any number of participant dropouts.

Table 3. Security comparison among secure aggregation schemes based on secret sharing.

7 Experimental results

In this section, we conduct experiments to evaluate the performance of our proposed privacy-preserving FLOSS and FLMSS schemes. The operation environment consists of an Intel(R) Xeon(R) Gold 5117 CPU, 128 GB RAM, an NVIDIA RTX 3090 GPU, and a 64-bit operating system with the PyTorch platform. We evaluate our proposed schemes on a well-known standard dataset, MNIST [42]. The MNIST dataset consists of 60000 training samples and 10000 test samples. The sample images are single-channel 28×28 handwritten digit images. The models we use for accuracy evaluation are a multilayer perceptron (MLP) and LeNet-5, which have 104938 and 136886 parameters, respectively. We also use ResNet-18 [7], ResNet-34, and ResNet-50 for performance evaluation, which have 11689512, 21797672, and 25557032 parameters, respectively. We use C++ to realize encryption and secret sharing based on CryptoPP.

7.1 Accuracy evaluation

According to the principle of secret sharing and the implementation mechanism of our proposed schemes, FLOSS and FLMSS do not cause noteworthy damage to the transmitted model parameters, i.e., they do not decrease the model accuracy beyond the effect of the underlying gradient compression algorithm. Therefore, this section focuses on validating this property.

First, we conduct a comparative experiment on model utility using the LeNet model on the MNIST dataset. We use centralized learning in this scenario as a traditional benchmark. We use randomly selective SGD as a simple gradient compression method and add DSSGD [38], which uses only this method as a privacy-preserving measure, for comparison. Moreover, we also reimplement the SSDDL scheme [22], which is likewise based on secret sharing. The number of participants N is set to 100, where each participant has 1% of the entire dataset. The sampling rate η of gradient selection is uniformly set to 0.1. Since the comparison schemes have no mechanism to handle user dropouts, we do not set up user dropout cases. We can use the performance of FLMSS to represent that of FLOSS without repeating the experiment. The comparison of the accuracy performance is illustrated in Fig. 6. Since DSSGD [38], SSDDL [22], and our proposed FLMSS use the same distributed SGD method and the secret sharing mechanism does not affect the aggregated model parameter values, their accuracy performances are almost the same. Compared with the traditional method without gradient selection and with centralized training, our proposed scheme converges more slowly, but the final model accuracy is almost the same.

Fig.6. Comparison of model accuracy for different learning methods.

Subsequently, we briefly verify the impact of key parameter settings in federated learning on model accuracy. According to the principle of controlled variables, we set the benchmark values of the gradient selection rate η, the number of participants N, and the number of local training epochs L_e to 0.1, 100, and 5, respectively. We vary η, N, and L_e in the ranges {0.01, 0.1, 0.2, 0.3, 1}, {30, 60, 90, 120, 150}, and {1, 2, 5, 10}, respectively. To better simulate the federated learning scenario, in the experiment for N, each participant has a smaller proportion of the data, which is set to 0.2% of the entire dataset. As shown in Fig. 7, when the gradient uploading rate η increases, the model accuracy increases. However, when we adopt more advanced gradient compression methods, the model usability is affected by the new compression method itself. A larger number of participants N means that more data are involved in training, which improves the model accuracy. When the local epoch L_e increases, the global model converges faster and has higher accuracy, which is consistent with the relevant conclusions of FedAvg [8]. We can conclude from the model performance experiments that our proposed privacy-preserving schemes do not affect the model performance of the original federated learning algorithm.

Fig.7. Model performance under the influence of different parameter settings.

7.2 Performance evaluation

7.2.1 Communication performance

In the federated learning scenario, participants usually do not have large communication bandwidth and computing power. The communication cost of the participants has become an important factor restricting the development of federated learning. For fairness, we only count the participants' communication overhead incurred during one round of global training. We use N to denote the number of participants in the round, n to denote the number of model parameters, b to denote the bit precision, and η to denote the gradient compression rate. In FLOSS and SSDDL, we use m to denote the group size, where m ≤ N. In FLOSS, FLMSS, and Bonawitz's protocol, we use b_k to denote the number of bits in a key exchange public key. We use b_s to denote the number of bits in a secret share of Bonawitz's protocol.

In FLOSS, N participants first download a model with a size of nb from the central server, and the communication cost is Nnb. Each participant then uploads its own public key and downloads the public keys of all other participants in its sharing group, with a communication cost of Nmb_k. After the participants complete the secret sharing through key agreement, they upload the compressed local aggregation results to the central server, and the communication cost is Nηnb. Therefore, the total cost of our proposed FLOSS in one round of training is N(mb_k+(η+1)nb). The work of Zheng et al. [23] considers group management as an optional practical solution, and the cost of their scheme without group management is N(Nb_k+(η+1)nb), computed in the same way as for FLOSS. Of course, they are different in the method of implementing compression of model parameters. In FLMSS, each participant that has uploaded its own public key to the server shares a secret with one other participant by key agreement, with the total communication cost of both parties being 2b_k. The above communication process is performed d times by each participant, so the communication cost of the system for secret sharing is approximately 2Ndb_k. Hence, the total communication cost of FLMSS in one round of training is N(2db_k+(η+1)nb). We summarize the one-round participant communication costs of several other secret sharing-based privacy-preserving distributed learning schemes in Table 4. The privacy-preserving capability of DSSGD is poor, and its communication cost is a starting point for reference. Compared to SSDDL, our FLOSS has the same level of strict protection and lower communication costs. Compared to Bonawitz's protocol and other schemes, our FLMSS has strong privacy protection strength and the ability to handle dropped participants, but the proportion of communication cost used for privacy protection is small to negligible compared to the original cost.
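
As a quick numerical illustration of these formulas, the short computation below plugs in assumed parameter values (N, n, b, b_k, η, m, and d are placeholders chosen for illustration, not measurements from the paper) and reports only the extra cost added on top of plain federated learning.

```python
# Worked example of the one-round system-wide communication-cost formulas.
N, n, b = 1000, 136886, 32       # participants, model parameters, bit precision
bk, eta = 256, 0.1               # public-key size in bits, gradient selection rate
m, d = 50, 10                    # FLOSS group size, FLMSS shot count

baseline = N * (eta + 1) * n * b              # plain FL: download model + upload gradients
floss = N * (m * bk + (eta + 1) * n * b)      # FLOSS total cost
flmss = N * (2 * d * bk + (eta + 1) * n * b)  # FLMSS total cost

print(f"FLOSS extra cost: {(floss - baseline) / 8 / 2**20:.2f} MiB")   # ~1.53 MiB
print(f"FLMSS extra cost: {(flmss - baseline) / 8 / 2**20:.2f} MiB")   # ~0.61 MiB
```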

Table 4. Comparison of the participants’ communication costs.

We measure the impact of the number of participants N and the number of model parameters n on the communication cost. First, we train a LeNet network with 136886 parameters on the MNIST dataset. We set some moderate parameter values, where the group size m = 50, the number of shots d = 10, and the rate η = 0.1. In fact, the protection strength of FLOSS with m = 50 is far below that of FLMSS with d = 10, and η can be improved by using more advanced gradient compression methods. To make a more intuitive comparison of the communication costs used for privacy protection, we omit the basic communication overhead common to all these schemes for transferring model parameters. As shown in Fig. 8a, while FLMSS guarantees a very high protection effect, it hardly increases the extra communication cost. Moreover, even if we increase the value of m to increase the protection strength, the communication cost increase of our FLOSS is negligibly small compared to SSDDL, which can be inferred from Table 4. Furthermore, due to the logarithmic scale of the vertical axis, the superiority of our proposed schemes in communication cost is in fact much greater than the intuitive perception of Fig. 8a. Second, when we set N = 1000 and change the number of model parameters n in the simulation, we obtain Fig. 8b. The same conclusion can be drawn: our methods achieve a huge reduction in communication cost while achieving better protection compared to state-of-the-art works. Apart from SSDDL, the communication overhead used for privacy protection in the other schemes is independent of the size of the model parameters.

Fig.8. (a) The communication cost used for privacy protection under different numbers of participants.(b) The communication cost used for privacy protection under different numbers of model parameters.

7.2.2 Computation performance

To intuitively reflect the computational cost of the system for secure aggregation, the computation performance experiments no longer count model training, communication, and other time consumption but focus on the processes of encryption, masking, and aggregation. We simulate 100 clients and one server and dynamically adjust the structure of the LeNet model to obtain varying numbers of model parameters. We make a comparison with the most related state-of-the-art works [21, 23] that do not rely on heavy cryptographic operations and are able to deal with dropout.

First, we examine the participant-side and server-side computation performance without considering dropout. We set up the LeNet model with 136886 parameters and change the fraction of clients selected to participate in federated learning, i.e., the actual number of participants, to examine the average computation cost for each participant to encrypt the model and for the server to aggregate the model securely in the different works. Considering the total number of participants, we set the size of the group in FLOSS to 10 and d = 3 in FLMSS. To control variables, we omit operations that may affect the runtime in the work of Zheng et al. [23], such as quantization techniques and trusted hardware. We show the participant's running time for encrypting model updates under varying fractions of participants selected in one round using different schemes in Fig. 9a. Since FLOSS uses a grouping strategy and FLMSS uses a fixed number of secret-sharing shots, the participant-side computation cost of our proposed schemes does not increase with the number of participants. The participant-side computation costs of the works of Bonawitz et al. [21] and Zheng et al. [23] increase linearly with the number of participants. In Fig. 9b, we show the computation costs on the server side. In Ref. [21], the server is required to reconstruct secret keys and recompute the masks to be subtracted from the uploaded models whether dropouts occur or not, which brings a huge computation overhead. In our schemes, the central server only needs to perform simple additive aggregation. Then, we fix the selection fraction to 0.3 and examine the computation performance of the clients and the server when the size of the model varies. As shown in Fig. 10, the computation costs of the participants and the server for encryption and secure aggregation increase linearly with increasing model size.

Fig.9. Computation-cost comparison with varying fractions of selected participants per round.

Fig.10. Computation-cost comparison with varying numbers of model parameters per round.

On the server side, where computation power is usually sufficient, the grouping management of FLOSS and the sparse connection method of FLMSS do not provide computing performance advantages when no dropout occurs. However, and most importantly, on the client side, where the performance is most constrained, the performance overhead of our proposed schemes is significantly lower than that of the other schemes.

Although the work of Zheng et al. [23] cannot handle the nesting of participant dropouts, we are still interested in comparing the computation performance when participant dropout occurs. For the above reason, we only consider one round of dropout. The selection fraction and the number of model parameters are fixed to 0.4 and 136886, respectively. When we change the dropout rate of participants, we obtain Fig. 11. It is worth noting that the running time in Fig. 11 is dedicated to the time spent dealing with dropout. Since there is no separate process to address participants' dropout in the work of Bonawitz et al. [21], we regard the difference between the running time when participants drop out and the benchmark running time as the time spent addressing dropout. Participants in the work of Bonawitz et al. do not need to make additional calculations for the dropout of other participants. Although this sounds like a good advantage, Fig. 9 and Fig. 10 show that participants in their scheme are at a disadvantage in computation overhead in the regular process. The computational cost of FLMSS for dealing with dropout is low on both the participant side and the server side. Another important advantage that the average running time in Fig. 11a cannot reflect is that when the dropout rate is low, FLMSS only requires a few participants to offer assistance, while Zheng et al.'s scheme [23] requires all online participants to start the assistance procedure, which easily causes a new large-scale dropout. This advantage can be verified from the difference in the front part of the curves in Fig. 11b.

Fig.11. Computation cost for handling dropout with varying dropout rates.

Finally, we compare the theoretical computational complexity. The number of dropped participants is denoted as N_d. Other notations follow the previous descriptions. Overall, FLOSS leads to O(mn) computations on each client and O(Nn) computations on the server. FLMSS leads to O(n(d+N_d)) computations on each client and O(n(N-N_d)) computations on the server. Following related works [21, 23], we summarize the comparison of the asymptotic computational complexity in Table 5.

Table 5. Performance complexity comparison.

8 Conclusions

We propose two computation-communication efficient secure aggregation schemes for privacy-preserving federated learning based on secret sharing: FLOSS and FLMSS. FLOSS can achieve strict privacy preservation with a small communication cost. FLMSS has the ability to provide high-intensity privacy preservation to each participant in federated learning and to efficiently handle participant dropout. Theoretical proof of security under the threat model of curious participants and an honest-but-curious central server is given. Finally, we demonstrate the superiority of our proposed FLOSS and FLMSS through extensive experiments.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2018YFB0804102), the National Natural Science Foundation of China (61802357), the Fundamental Research Funds for the Central Universities (WK3480000009), and the Scientific Research Startup Funds of the Hefei University of Technology (13020-03712022064).

Conflict of interest

The authors declare that they have no conflict of interest.

Biographies

Xuan Jin received his B.E. degree from Hohai University in 2022. He is currently a master's student at the University of Science and Technology of China. His research interests include deep learning and applied cryptography.

Yuanzhi Yao is currently an Associate Professor at the Hefei University of Technology. He received his Ph.D. degree from the University of Science and Technology of China in 2017. His research interests include deep learning and multimedia security.
