MIT Open Access Articles Fast Vector Oblivious Linear Evaluation from Ring Learning with Errors The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation: de Castro, Leo, Juvekar, Chiraag and Vaikuntanathan, Vinod. 2021. "Fast Vector Oblivious Linear Evaluation from Ring Learning with Errors." Proceedings of the 9th on Workshop on Encrypted Computing & Applied Homomorphic Cryptography. As Published: 10.1145/3474366.3486928 Publisher: ACM Persistent URL: https://hdl.handle.net/1721.1/143921 Version: Final published version: final published article, as it appeared in a journal, conference proceedings, or other formally published context Terms of use: Creative Commons Attribution 4.0 International license Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea Fast Vector Oblivious Linear Evaluation from Ring Learning with Errors Leo de Castro Chiraag Juvekar Vinod Vaikuntanathan Massachusetts Institute of Technology Analog Devices Massachusetts Institute of Technology ldec@mit.edu chiraag.juvekar@analog.com vinodv@mit.edu ABSTRACT secure computation protocols [13, 21]. In an OLE protocol over a Oblivious linear evaluation (OLE) is a fundamental building block ring Z𝑝 , there are two parties, a sender 𝑆 with values 𝛼, 𝛽 ∈ Z𝑝 and in multi-party computation protocols. In OLE, a sender holds a a receiver 𝑅 with a value π‘₯ ∈ Z𝑝 . At the end of the protocol, 𝑅 will description of an affine function 𝑓𝛼,𝛽 (𝑧) = 𝛼𝑧 + 𝛽, the receiver learn the value 𝛾 = 𝛼 Β·π‘₯ + 𝛽 while 𝑆 will learn nothing. In Vector OLE holds an input π‘₯, and gets 𝛼π‘₯ + 𝛽 (where all computations are done (VOLE), the sender has 𝜢 , 𝜷 ∈ Zπ‘š 𝑝 , the receiver has π‘₯ ∈ Z𝑝 , and over some field, or more generally, a ring). Vector OLE (VOLE) is a obtains 𝜸 := 𝜢 π‘₯ + 𝜷 ∈ Zπ‘š 𝑝 . In other words, vector OLE is running generalization where the sender has many affine functions and the many OLE protocols in parallel where the receiver has the same π‘₯. receiver learns the evaluation of all of these functions on a single Batch OLE (BOLE) can be viewed as many OLE protocols running point π‘₯. in parallel where the receiver has a vector x of values over Zπ‘š 𝑝 . In The state-of-the-art semi-honest VOLE protocols generally fall this work, we will primarily focus on VOLE with a simple extension into two groups. The first group relies on standard assumptions to BOLE. to achieve security but lacks in concrete efficiency. These con- Our approach to implementing a VOLE protocol is to use the structions are mostly based on additively homomorphic encryption packed additively homomorphic encryption (PAHE) of Braker- (AHE) and are classified as β€œfolklore". The second group relies on ski [11] and Fan and Vercuteran [17], henceforth referred to as the less standard assumptions, usually properties of sparse, random lin- BFV scheme. PAHE is a natural choice of primitive to implement ear codes, but they manage to achieve concrete practical efficiency. such a protocol, especially in the honest-but-curious model. At a In this work, we present a conceptually simple VOLE protocol that high level, our protocol has the receiver 𝑅 encrypt the value π‘₯ and derives its security from a standard assumption, namely Ring Learn- send the encryption [[π‘₯]] to the sender 𝑆. Using the PAHE opera- ing with Errors (RLWE), while still achieving concrete efficiency tions, the sender can then compute the ciphertext [[𝜸 ]] = [[𝜢 Β·π‘₯ +𝜷]] comparable to the fastest VOLE protocols from non-standard coding and return it to the receiver, who can decrypt and learn the VOLE assumptions [3, 8, 31]. Furthermore, our protocol admits a natural output 𝜸 . As long as the PAHE scheme achieves circuit privacy extension to batch OLE (BOLE), which is yet another variant of which, in this case, says that the homomorphically evaluated ci- OLE that computes many OLEs in parallel. phertext [[𝜸 ]] does not leak information about 𝛼 and 𝛽 even to the owner of the secret key, then security is achieved against honest- CCS CONCEPTS but-curious adversaries. In the case of VOLE, the circuit is specified β€’ Security and privacy β†’ Cryptography; by the sender’s values of 𝜢 and 𝜷. We note that achieving security in the honest-but-curious setting KEYWORDS is sufficient following the work of Hazay, Ishai, Marcedone, and Venkitasubramaniam [21] which instantiates the semi-honest-to- homomorphic encryption, applied cryptography, oblivious linear active compiler of Ishai, Prabhakaran, and Sahai [23] using black- evaluation box access to an honest-but-curious OLE protocol. In particular, this ACM Reference Format: instantiation enables a semi-honest OLE protocol to be upgraded Leo de Castro, Chiraag Juvekar, and Vinod Vaikuntanathan. 2021. Fast Vector to an actively secure OLE protocol with roughly a factor of two Oblivious Linear Evaluation from Ring Learning with Errors. In Proceedings overhead ([21], section 5.3). of the 9th Workshop on Encrypted Computing Applied Homomorphic Cryptog- While the high-level story is simple, instantiating a OLE scheme raphy (WAHC ’21), November 15, 2021, Virtual Event, Republic of Korea. ACM, from a PAHE scheme in a concretely efficient way is less so. In this New York, NY, USA, 13 pages. https://doi.org/10.1145/3474366.3486928 work, we present a series of optimizations of the textbook BFV PAHE scheme that results in a vector OLE scheme that outper- 1 INTRODUCTION forms all prior works in parameter regimes of practical interest. At Oblivious linear evaluation (OLE), a special case of oblivious poly- the heart of our optimizations is a new analysis of the modulus nomial evaluation [28], is a fundamental building block in many reduction operation similar to the BV homomorphic encryption scheme [12], which argues that circuit privacy is achieved when modulus reduction is performed under carefully selected parame- ters. In addition, we extend the analysis of Halevi, Polyakov, and Shoup [20] to efficiently implement this operation in the double This work is licensed under a Creative Commons Attribution International 4.0 License. Chinese remainder theorem (DCRT) representation. WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea Β© 2021 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-8656-2/21/11. https://doi.org/10.1145/3474366.3486928 29 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea 1.1 Our Contributions 1.2 Related Work First, we present a fast, light-weight vector OLE (VOLE) protocol. Recent related work improving VOLE and BOLE efficiency have In fact, our protocol most naturally achieves the more expressive mainly focused on obtaining optimizations from the Learning Parity batch OLE (BOLE) functionality at little additional cost. This is in with Noise (LPN) assumption [6] and other related coding assump- contrast to the LPN-based VOLE protocols of [3] and [31] which, tions. We separate prior work into two main categories based on to the best of our knowledge, do not have such a straightforward their communication complexity. The first category consists of pro- extension to BOLE. tocols that have communication complexity linear or super-linear The main technical idea in our result is a way to achieve circuit in the length of the VOLE correlations they generate. More heuris- privacy in a PAHE scheme which corresponds to sender security tically, these protocols are optimized for smaller VOLE correlations in a (V/B)OLE protocol. In slightly more detail, we’d like to ensure and tend to be very fast for these shorter VOLE lengths. The sec- that the receiver who has the private key of the PAHE scheme and ond category consists of protocols with sublinear communication π‘₯, and obtains an evaluated ciphertext [[𝜸 ]] = [[𝜢 Β· π‘₯ + 𝜷]] learns complexity in the length of the VOLE correlations they generate. 𝜸 , but no other information about 𝜢 and 𝜷. The crucial difficulty Heuristically, these protocols are optimized for larger VOLE lengths is that the noise contained in the PAHE evaluated ciphertext [[𝜸 ]] and tend to be slower than protocols in the first category for gener- could reveal information about 𝜢 and 𝜷. ating shorter VOLE correlations. The folklore idea to achieve circuit privacy is a simple technique Our protocol falls in the first category, so we compare directly called noise-flooding [18]. That is, sample from a sufficiently wide with other protocols in this category. While our protocol can be distribution, either Gaussian or uniform, so as to flood the noise extended to the more general batch-OLE functionality at minimal contained in [[𝜸 ]]. In turn, to achieve sufficient statistical security, cost, we focus our comparisons on vector-OLE. Our main point this forces us to work with inefficient multi-precision arithmetic. of comparison is the work of Applebaum et al. [3], although we For example, the procedure to sample uniformly random numbers also compare our protocol against the subsequent works of Baum from a sufficiently wide interval, using the NTL library [33], will et al. [5], Boyle et al. [10], and Weng et al. [34]. We note that [34] take more than 100% of our sender run-time. Additionally, despite implements the vector-OLE functionality whereas our work as well concrete (in)security concerns in practical scenarios [29], homomor- as [10? ] implement the more general batch-OLE functionality. phic encryption libraries either do not seem to implement circuit The protocol of Applebaum et al. [3] implements a fast vector- privacy (such as in SEAL [25, 32]) or implement a single-precision OLE protocol using assumptions over sparse linear codes that are version of noise flooding which provides very limited statistical inspired by the LPN assumption but notably are not known to have security (such as in PALISADE [30]). a reduction from the LPN assumption. Consequently, while this Other approaches to circuit-private homomorphic encryption work achieves impressive performance, the security of their pro- exist, but are much less efficient than what we can afford for such tocol is harder to quantify. In contrast, the security of our VOLE a lightweight computation as in (V/B)OLE: garbled circuit-based protocol is based on the security of the Ring-LWE assumption, techniques [19] requires computing and communicating a garbled which has recently been extensively benchmarked in the context circuit for the PAHE decryption algorithm; FHE bootstrapping- of the NIST post-quantum cryptography standardization process based techniques [15] are even less efficient; and techniques such and the homomorphic encryption standardization process [2]. In as [7] only work for the GSW homomorphic encryption scheme particular, we base our parameter choices off of the homomorphic which is also too inefficient for our purposes. encryption standard [2]. This widely endorsed standard signifi- We present a simple way to achieve circuit privacy of PAHE, cantly strengthens the security claims about the parameter choices and therefore construct a (V/B)OLE protocol, using a rounding for this protocol. In section 5.4, we show that even with these strong operation that allows greater efficiency than the folklore noise- security claims, our protocol achieves efficiency that meets or ex- flooding approach. The rounding operation and its rather simple ceeds the efficiency of [3] in many settings. In addition, our VOLE analysis is inspired by similar techniques in the context of learning protocol gives a natural extension to BOLE. with rounding (LWR) [4]. To compare our protocol against works in the second category, In addition, we give an implementation of this protocol along we must first recognize that all protocols in the second category will with performance benchmarks. This implementation aims to be asymptotically outperform ours, since eventually the linear com- usable as a black-box in larger applications; an earlier version of munication complexity of our protocol will surpass the sublinear this implementation was already used in the work of Hazay, Ishai, communication complexity of protocols in this category. The natu- Marcedone, and Venkitasubramaniam [21] in CCS 2019. ral question to ask, then, is at what VOLE length does the sublinear Furthermore, our protocol outperforms recent implementations protocol become faster than the linear protocol, and where do the ([31], CCS 2019) based on the learning parity with noise assump- practical applications of VOLE live vis-Γ -vis the crossover point? tion, as described in [8], for VOLE dimension up to π‘š = 235 . In When comparing against protocols in this category, we estimate other words, while [8, 31] will outperform our implementation this cross-over point and then show applications where the VOLE asymptotically for huge π‘š, the crossover point is quite far out. We lengths are less than this point. Our main point of comparison describe the relation between the two approaches in more detail for protocols in this category is the work of Schoppmann, GascΓ³n, in Section 1.2, and provide a detailed performance comparison in Reichert, and Raykova [31], which is a two-party implementation Section 5. of the function secret sharing (FSS) based work of Boyle, Couteau, Gilboa, and Ishai [8] that constructs a VOLE protocol with sublinear 30 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea complexity. In section 5.4, we show that our protocol outperforms 2 BACKGROUND [31] for smaller VOLE lengths, which is to be expected. We also In this section, we will briefly review the definitions of the homo- show that the expected cross-over point of the performances of morphic encryption scheme used below as well as the mathematics this protocol is large enough to allow potential applications that behind residue number system optimizations. do not require VOLE correlations of greater length. In section 5.6, we discuss one such application: securely evaluating convolutional 2.1 Notation neural network layers for medical images. The homomorphic encryption scheme used in our work is based off of the Ring Learning with Errors problem [27], which is defined Subsequent Work. Subsequent to the online publication of this over polynomial rings. In our instantiation, we use the ring work [14], several additional works have been published presenting various VOLE and BOLE protocols. We briefly compare against R = Z[π‘₯]/(π‘₯ 𝑛 + 1) (1) these works to show that our protocol is still competitive with the For the remainder of this work, 𝑛 will always be a power of two. state-of-the-art. For a modulus π‘ž, let Rπ‘ž be R with all coefficients mod π‘ž. The first work we compare against is the BOLE protocol of Baum et al. [5]. This two-party protocol between Alice with a Rπ‘ž = Zπ‘ž [π‘₯]/(π‘₯ 𝑛 + 1) vector input 𝜢 and Bob with a vector input 𝜷 results in each party holding an additive share of the component-wise product 𝜢 Β· 𝜷. For an integer 𝑛, we will denote the set {0, 1, 2, . . . , 𝑛 βˆ’ 1} as [𝑛]. The main feature of this protocol is that it requires only one round For integers 𝑖 ≀ 𝑗, we denote the range {𝑖, 𝑖 + 1, . . . , 𝑗 βˆ’ 1} as [𝑖 : 𝑗]. of simultaneous communication; Alice sends a message to Bob and We denote the rounding function βŒˆΒ·βŒ‹ : R β†’ Z that maps π‘₯π‘Ÿ ∈ Bob simultaneously sends a message to Alice. Once these messages R to the closest integer π‘₯ ∈ Z. We denote the flooring function have been received, the parties locally compute additive shares of βŒŠΒ·βŒ‹ : R β†’ Z that maps π‘₯π‘Ÿ ∈ R to the closest integer π‘₯ ∈ Z such that the component-wise product. While our protocol is two rounds, π‘₯ ≀ π‘₯π‘Ÿ . we show in section 5.4 that our protocol outperforms this one- For a positive integer 𝑏, we write the modular reduction opera- round protocol in both total communication and computation time. tion 𝑐 ≑ π‘Ž mod 𝑏 as 𝑐 = [π‘Ž]𝑏 . The intuition for why this is the case is because the one-round The norm notation || Β· || refers to the β„“βˆž norm, unless otherwise BOLE protocol requires two iterations of the LPR-style rounding specified. For a polynomial π‘Ž with 𝑛 coefficients each mod π‘ž, we technique, while our approach only performs this operation once. have the bound ||π‘Ž|| ≀ π‘ž. Next, we compare against a series of works that fall into the We specify the base-2 logarithm by log. category of asymptotically sublinear implementations. These im- We call a distribution whose samples have magnitude bounded plementations are based on some flavor of the LPN assumption by 𝐡 a 𝐡-bounded distribution. [6]. The majority of these works focus on VOLE, specifically a vari- We say that a function negl is negligible if for every constant ant known as subfield VOLE, which we denote sVOLE. The sVOLE 𝑐 > 1 we have negl(𝑛) < 1/𝑛𝑐 for all sufficiently large 𝑛. protocol is defined by a length 𝑛, a prime 𝑝, and an integer π‘Ÿ . The For two vectors 𝑣 and 𝑀 of length β„“, denote the component-wise receiver’s input is a value π‘₯ ∈ F𝑝 π‘Ÿ , and the sender’s input is the product of 𝑣 and 𝑀 as 𝑣 βŠ™ 𝑀. tuple (𝜢 , 𝜷) ∈ F𝑛𝑝 Γ— Fπ‘›π‘π‘Ÿ . Note that the elements of the vector 𝜢 We will denote sampling from a distribution as ← $ βˆ’. We will denote are drawn from the subfield F𝑝 rather than the full domain F𝑝 π‘Ÿ . 𝑛𝑒𝑑 sending or receiving a message from a public channel as β†βˆ’βˆ’βˆ’. The output of sVOLE takes the same format as VOLE, namely the vector 𝛾 = 𝜢 Β· π‘₯ + 𝜷 ∈ F𝑝 π‘Ÿ . We note that sVOLE can be the same as VOLE when π‘Ÿ = 1. Building off of the works of Boyle et al. [8, 9], 2.2 Circuit Privacy the fastest work (to our knowledge) implementing sVOLE is the For brevity, we move the defintion of a leveled homomorphic en- work of Weng et al. and we briefly compare against this work in cryption scheme to appendix A.3 section 5.5. Definition 2.1. Circuit Privacy ([22] definition 7, [7] definition The subsequent work of Boyle et al. [10] also uses LPN-style 5.1) assumptions to construct a sublinear protocol for BOLE. While this Let E be a leveled homomorphic encryption scheme as defined in protocol achieves very low communication, the estimated compu- definition A.4. Define the following: tational performance is not competitive with out protocol, as we discuss in section 5.5. $ βˆ’ E.KeyGen(1πœ† , 1𝐿 ) (sk, pk, evk) ← $ ct𝑖 ← βˆ’ E.Encrypt(pk, π‘šπ‘– ) 𝑖 ∈ [1 . . . π‘˜] 1.3 Road Map $ In section 2, we give some brief preliminaries and definitions neces- ct𝑖 ← βˆ’ E.EncryptSK(sk, π‘šπ‘– ) 𝑖 ∈ [π‘˜ + 1 . . . 𝑛] sary in the explanations below. In section 3, we describe our circuit ⟨ct𝑖 ⟩ ≔ {ct𝑖 }𝑛𝑖=1 privacy transformation method and prove its security. In section π‘šπ‘œπ‘’π‘‘ = 𝑓 (π‘š 1, . . . , π‘šπ‘› ) 4, we give the full VOLE protocol, including a proof of security, and in section 5 we present an implementation of this protocol and We say that E is πœ–-circuit private for functions 𝑓 of depth β„“ ≀ 𝐿 analyze the performance. if there exists a PPT simulator algorithm Sim such that for all PPT 31 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea distinguishing algorithms D the following holds: in our implementation. Finally, we describe the full circuit privacy h   i transformation, followed by a security proof.  Pr D E.Eval pk, evk, 𝑓 , ⟨ct𝑖 ⟩ , ⟨ct𝑖 ⟩, sk, pk, evk = 1 3.1 The Divide-and-Round Step h    i We begin by presenting our procedure for performing an efficient βˆ’ Pr D Sim 1β„“ , pk, sk, evk, π‘šπ‘œπ‘’π‘‘ , ⟨ct𝑖 ⟩, sk, pk, evk = 1 divide-and-round operation over ciphertext elements. The benefits of this operation include a decreased overall ciphertext size as well β‰€πœ– as destroying the information in the ciphertext error term that In words, definition 2.1 says that the output of the evaluation could leak information about the computation. algorithm of a circuit private leveled homomorphic encryption Let 𝑄 be the original ciphertext modulus of our scheme, and let Î scheme should be indistinguishable from a simulated output, where π‘žπ‘– be the primes such that 𝑖 π‘žπ‘– = 𝑄. Given a ciphertext ct𝑄 , where the simulator is given no information about the function 𝑓 used to the elements are in R𝑄 and the decryption operations are performed compute the result ciphertext other than the function output. We over R𝑄 , the goal of this operation is to obtain a ciphertext ctπ‘ž0 can view the simulator in definition 2.1 as an alternate encryption that encrypts the same message. This new ciphertext ctπ‘ž0 will procedure that produces a fresh ciphertext that is indistinguishable have elements in Rπ‘ž0 and the decryption of this ciphertext will be from the output of the real E.Eval algorithm. We only consider performed over Rπ‘ž0 as well. Recall that we choose primes π‘žπ‘– to fit in functions 𝑓 that will result in E.Eval outputting a correct ciphertext. a standard machine word, so decryption of ctπ‘ž0 is far more efficient This can be implemented by having both E.Eval and Sim output βŠ₯ than the decryption of ct𝑄 . In addition, the communication cost for functions that are exceed the number of levels supported by E. of sending ctπ‘ž0 over a network is significantly less than sending We will use the homomorphic encryption scheme of Brakerski, ct𝑄 . Note that this operation can only be correct if the plaintext Fan, and Vercuteran [11, 17], denoted as the BFV scheme. modulus 𝑝 is sufficiently less than π‘ž 0 . Let π‘ž 0βˆ— = 𝑄/π‘ž 0 . Below, we write the ciphertext ct𝑄 and expand 2.3 Ring Expansion Factor out the terms to anticipate the division by π‘ž 0βˆ— . In order to effectively choose parameters for our leveled homomor-   phic encryption scheme, we must accurately upper bound the noise ct𝑄 = π‘Ž, π‘Ž Β· 𝑠 + Ξ”π‘š + 𝑒   = π‘Ž β€² Β· π‘ž 0βˆ— + [π‘Ž]π‘ž0βˆ— , (π‘Ž β€² Β· π‘ž 0βˆ— + [π‘Ž]π‘ž0βˆ— ) Β· 𝑠 + Ξ”π‘š + 𝑒 growth due to homomorphic operations. When multiplying two el- ements of Rπ‘ž , we need an upper bound on the norm of the product ! by a function of the norms of the operands as well as properties of Rπ‘ž itself. = π‘Ž β€² Β· π‘ž 0βˆ— + [π‘Ž]π‘ž0βˆ— , (π‘Ž β€² Β· π‘ž 0βˆ— + [π‘Ž]π‘ž0βˆ— ) Β· 𝑠 + Ξ”π‘š + 𝑒 Definition 2.2 (Ring Expansion Factor [18, 26]). The expan- The ciphertext ctπ‘ž0 is obtained by dividing the two components of sion factor 𝛿 R for a ring R is defined as follows: ct𝑄 by π‘ž 0βˆ— . ||π‘Ž Β· 𝑏 || 𝛿 R = max $ % $ % $ % π‘Ž,𝑏 ∈R ||π‘Ž|| Β· ||𝑏 || ct𝑄  π‘Ž π‘Ž Β· 𝑠 + Ξ”π‘š + 𝑒  ctπ‘ž0 = = , Lemma 2.1 (Ring Expansion Upper Bound). For a ring R = π‘ž 0βˆ— π‘ž 0βˆ— π‘ž 0βˆ—  π‘Ž β€² Β· π‘ž βˆ— + [π‘Ž]π‘ž βˆ— (π‘Ž Β· π‘ž 0 + [π‘Ž]π‘ž0βˆ— ) Β· 𝑠 + Ξ”π‘š + 𝑒  $ % $ β€² βˆ— % Z[π‘₯]/(π‘₯ 𝑛 + 1), we can upper bound the ring expansion factor 𝛿 R for 0 0 the norm || Β· || by = , π‘ž 0βˆ— π‘ž 0βˆ— (2) 𝛿R ≀ 𝑛 [π‘Ž]π‘ž0βˆ— Β· 𝑠 + Ξ”π‘š + 𝑒  $ %  Proof. In the worst case for the expansion of the norm || Β· || is = π‘Ž β€², π‘Ž β€² Β· 𝑠 + π‘ž 0βˆ— when both polynomials π‘Ž, 𝑏 ∈ R have coefficients of all the same   magnitude. In this case, the maximum coefficient of the product will = π‘Ž β€², π‘Ž β€² Β· 𝑠 + 𝑣 be 𝑛 times the product of the maximum coefficient of the operands. Therefore, we have where [π‘Ž]π‘ž0βˆ— Β· 𝑠 + Ξ”π‘š + 𝑒 $ % ||π‘Ž|| Β· ||𝑏 || = 𝑛 Β· ||π‘Ž Β· 𝑏 || = 𝛿 R Β· ||π‘Ž Β· 𝑏 || 𝑣= (3) π‘ž 0βˆ— β–‘ is the new term in ctπ‘ž0 that equals Ξ” β€²π‘š + 𝑣 β€² for a scaling factor Ξ” β€² Remark 2.1. The upper bound in lemma 2.1 extends to Rπ‘ž for any and error term 𝑣 β€² . Note that we will analyze the whole term 𝑣 for modulus π‘ž. the remainder of this section to show the security of the circuit privacy procedure, and then we will split 𝑣 into Ξ” β€²π‘š + 𝑣 β€² to show 3 OUR CIRCUIT PRIVACY APPROACH that this procedure produces a correct, decryptable ciphertext. In this section, we describe our approach to obtaining circuit privacy. We will now begin to analyze this term 𝑣 in terms of the original We will begin by describing the main operation of the circuit privacy error term 𝑒. Our goal will be to show that the distribution of transformation, which is fundamentally a divide-then-round step. 𝑣 is statistically close to a distribution that is independent of 𝑒. Then, we will describe how this operation is performed efficiently Once we have shown this, we will be able to construct a simulator 32 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea that samples an error term that is statistically close to 𝑣 but still Lemma 3.3. Let π‘Ž be an element of the polynomial ring Rπ‘ž = independent of the original error term 𝑒. Zπ‘ž [π‘₯]/(π‘₯ 𝑛 + 1), let 𝑏 be a scalar in Zπ‘ž , and let 𝐡 be a positive integer. We begin by defining the event that must occur in order for We can bound the ring boundary event from definition 3.2 by the ⌊(π‘Ž + 𝑐)/π‘βŒ‹ β‰  ⌊(π‘Ž + 𝑐 + 𝑒)/π‘βŒ‹ for a small error term 𝑒 in terms of π‘Ž following: and fixed scalar 𝑐. This event is intuitively very similar to the BAD Γ• Pr[BAD(π‘Ž, 𝑏; 𝐡)] ≀ Pr[BAD(π‘Žπ‘– , 𝑏; 𝐡)] (10) event in the analysis of learning with rounding (LWR) in [4]. 𝑖 ∈ [𝑛] Definition 3.1 (Boundary Event). Let π‘Ž, 𝑏, 𝐡 be positive inte- Proof. This follows directly from taking the union bound over gers. Define the event BAD(π‘Ž, 𝑏; 𝐡) to be the following: the events corresponding to the coefficients of π‘Ž. β–‘ BAD(π‘Ž, 𝑏; 𝐡) ≔ {[π‘Ž]𝑏 ∈ [0, 𝐡) βˆͺ [𝑏 βˆ’ 𝐡, 𝑏)} (4) Corollary 3.3.1. For a polynomial π‘Ž that is sampled over Rπ‘ž In words, equation 4 is the event that π‘Ž is within distance 𝐡 of a such that each coefficient π‘Žπ‘– is uniform modulo 𝑏, we can bound the multiple of 𝑏. probability of the BAD event occurring by the following: Lemma 3.1. Let π‘Ž, 𝑏 be positive integers, and let 𝑒 be an integer Γ• 𝐡 such that |𝑒 | ≀ 𝐡 where 𝐡 < 𝑏/2. Then the following relation holds: Pr[BAD(π‘Ž, 𝑏; 𝐡)] ≀ Pr[BAD(π‘Žπ‘– , 𝑏; 𝐡)] = 2𝑛 (11) 𝑏 jπ‘Ž k jπ‘Ž + 𝑒 k 𝑖 ∈ [𝑛] β‰  =β‡’ BAD(π‘Ž, 𝑏; 𝐡) (5) 𝑏 𝑏 Corollary 3.3.2. For a polynomial π‘Ž that is sampled uniformly Proof. This proof is quite straightforward. If we write π‘Ž = π‘˜π‘ +π‘Ÿ over R𝑄 for 𝑄 = π‘ž 0βˆ— Β·π‘ž 0 and a small polynomial error term 𝑒 such that where π‘˜ is an integer and π‘Ÿ ∈ [𝑏], we have that ||𝑒 || ≀ 𝐡 for a positive integer 𝐡 < π‘ž 0βˆ— /2, we can bound the following j π‘Ž k j π‘˜π‘ + π‘Ÿ k probability: = =π‘˜ (6) hjπ‘Ž + 𝑒 k j π‘Ž ki 𝐡 𝑏 𝑏 Pr β‰  βˆ— ≀ Pr[BAD(π‘Ž, π‘ž 0βˆ— ; 𝐡)] ≀ 2𝑛 βˆ— Since ⌊(π‘Ž+𝑒)/π‘βŒ‹ β‰  π‘˜, there are two possibilities. The first possibility π‘Žβ†R𝑄 π‘ž 0βˆ— π‘ž0 π‘ž0 is that π‘Ž + 𝑒 = (π‘˜ + 1)𝑏 + π‘Ÿ β€² where π‘Ÿ β€² ∈ [𝑏]. This occurs when We now return to equation 3 for the compressed term 𝑣 and give π‘Ž ∈ [(π‘˜ + 1) Β· 𝑏 βˆ’ 𝑒, (π‘˜ + 1) Β· 𝑏) βŠ† [(π‘˜ + 1) Β· 𝑏 βˆ’ 𝐡, (π‘˜ + 1) Β· 𝑏). a lower bound the probability that this term hides the original error The second case is when π‘Ž + 𝑒 = (π‘˜ βˆ’ 1)𝑏 + π‘Ÿ β€²β€² . This occurs when term 𝑒. π‘Ž ∈ ((π‘˜ βˆ’ 1) Β· 𝑏, (π‘˜ βˆ’ 1) Β· 𝑏 + 𝑒] βŠ† ((π‘˜ βˆ’ 1) Β· 𝑏, (π‘˜ βˆ’ 1) Β· 𝑏 + 𝐡]. Both We begin by examining the numerator of this function and ob- of these cases satisfy the condition of BAD(π‘Ž, 𝑏; 𝐡) occurring. β–‘ serve that if the secret 𝑠 is invertible modulo π‘ž 0βˆ— , then the term Corollary 3.1.1. Let π‘Ž, 𝑏 be positive integers, and let 𝑒 be an [π‘Ž]π‘ž0βˆ— Β· 𝑠 would define a bijection with domain and range [π‘ž 0βˆ— ]. From integer such that |𝑒 | ≀ 𝐡 and 𝐡 < 𝑏/2. Then the following relation this, we have the simple observation that a uniformly random input holds: jπ‘Ž k jπ‘Ž + 𝑒 k to this bijection gives a uniformly random output over the same Β¬BAD(π‘Ž, 𝑏; 𝐡) =β‡’ = (7) range. 𝑏 𝑏 However, we cannot guarantee that 𝑠 is invertible, and we cannot We will now bound the probability that the BAD event occurs restrict to the case where 𝑠 is invertible while still appealing to for values of π‘Ž that are uniformly random. the security of RLWE. We handle this by observing that while Lemma 3.2 (Boundary Event for Random Values). Let 𝑏 and the polynomial element π‘Ž Β· 𝑠 may not be random, the marginal 𝐡 be positive integers. Let 𝐴 be a distribution over the integers such distribution of each of the individual coefficients of π‘Ž Β· 𝑠 are still that the distribution of [π‘Ž]𝑏 is uniformly random over [𝑏], where random. This is formalized in the following lemma. π‘Ž ← 𝐴. We can bound the probability of the BAD event occurring for Lemma 3.4. Fix an integer 𝑄 = π‘ž 0βˆ— Β· π‘ž 0 . Fix a non-zero 𝑠 ∈ R𝑄 = an output of 𝐴 as follows: Z𝑄 [π‘₯]/(π‘₯ 𝑛 + 1) and an index 𝑖 ∈ [𝑛]. Define the distribution 𝐴𝑖 to 2𝐡 produce samples by first sampling a truly random π‘Ž ← R𝑄 , then Pr [BAD(π‘Ž, 𝑏; 𝐡)] ≀ (8) π‘Žβ†π΄ 𝑏 computing the product π‘Ž Β· 𝑠 modulo π‘ž 0βˆ— , then outputing the 𝑖 π‘‘β„Ž coef- Proof. If we write the value π‘Ž ← 𝐴 as π‘Ž = π‘˜ Β· 𝑏 + π‘Ÿ , we are ficient of this product. The output of this distribution is statistically given that the distribution of π‘Ÿ = [π‘Ž]𝑏 is uniformly random over close to uniform over [π‘ž 0βˆ— ]. [𝑏]. From definition 3.1, the BAD event occurs when π‘Ÿ falls into Proof. We can write the 𝑖 π‘‘β„Ž coefficient of π‘Ž Β· 𝑠 as the inner the range [0, 𝐡) βˆͺ [𝑏 βˆ’ 𝐡, 𝑏). This range is of size 2𝐡, so this occurs product the coefficients of 𝑠 with a rotation of the coefficients of π‘Ž. with probability 2𝐡/𝑏. β–‘ Since the coefficients of π‘Ž are uniformly random modulo π‘ž 0βˆ— , and 𝑠 We will now extend the definition of the boundary event to is non-zero, this inner product is statistically close to uniform over elements over Rπ‘ž . [π‘ž 0βˆ— ]. β–‘ Definition 3.2 (Ring Boundary Event). Let π‘Ž be an element Corollary 3.4.1. Fix an integer 𝑄 = π‘ž 0βˆ— Β· π‘ž 0 . Fix a non-zero of the polynomial ring Rπ‘ž = Zπ‘ž [π‘₯]/(π‘₯ 𝑛 + 1), let 𝑏 be a scalar in Zπ‘ž , 𝑠 ∈ R𝑄 = Z𝑄 [π‘₯]/(π‘₯ 𝑛 + 1) and a positive integer B. For a distribution and let 𝐡 be a positive integer. We will define the boundary case for an of polynomials sampled uniformly over R𝑄 , we can upper bound the element of Rπ‘ž to occur if the boundary case for any of its coefficients probability of the BAD event for the product π‘Ž Β· 𝑠 by the following: occurs. Denote the 𝑖 π‘‘β„Ž coefficient of π‘Ž as π‘Žπ‘– . h i 𝐡 Pr BAD(π‘Ž Β· 𝑠, π‘ž 0βˆ— ; 𝐡) ≀ 2𝑛 βˆ— (12) BAD(π‘Ž, 𝑏; 𝐡) = {βˆƒπ‘– ∈ [𝑛] such that BAD(π‘Žπ‘– , 𝑏; 𝐡)} (9) π‘Žβ†R𝑄 π‘ž 0 33 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea We now complete this analysis by applying corollary 3.4.1 to us to perform ciphertext compression with π‘˜ integer multiplications, equation 3 for the final term 𝑣 to show that 𝑣 hides 𝑒. We will define π‘˜ floating point multiplications, π‘˜ floating point additions, and one distribution that outputs 𝑣 as described in equation 3 and one π‘˜ + 1 modular reductions. Comparing with the naΓ―ve approach, distribution that leaves out the error term 𝑒. The next lemma will this technique replaces the entire multi-precision divide-and-floor give an upper bound that these two distributions output different operation with only the marginal cost increase of π‘˜ floating point values. addition and multiplication operations versus π‘˜ integer addition and multiplication operations. We refer the reader to [20] for further Lemma 3.5. If ||𝑒 || ≀ 𝐡, the final term 𝑣 from equation 3 hides the analysis of the benefits of the DCRT representations. original error term 𝑒 with probability 2𝑛 π‘žπ΅βˆ— , where 𝑛 is the degree of 0 the polynomial modulus π‘₯ 𝑛 + 1. 3.3 Circuit Privacy Operation Proof. Define two distributions in terms of 𝑠, Ξ”π‘š, and 𝑒. Distri- We now give the full circuit privacy procedure and prove that it’s bution 𝐷 1 produces samples by first sampling a uniformly random secure given a sufficiently small error term relative to the divisor π‘ž 0βˆ— . $ The procedure, given in algorithm 1, is very simple: we add an en- βˆ’ R𝑄 for 𝑄 = π‘ž 0βˆ— Β· π‘ž, then producing a sample of the form: π‘Žβ† cryption of zero and then perform the divide-and-round operation [π‘Ž]π‘ž0βˆ— Β· 𝑠 + Ξ”π‘š + 𝑒 $ % described in section 3.1. 𝑑1 = π‘ž 0βˆ— Algorithm 1 Circuit Privacy Procedure, denoted CP This is identical to the real distribution of 𝑣. The second distribution Input: Ciphertext ct, Public Key pk $ 𝐷 2 produces samples by first sampling a uniformly random π‘Ž ← βˆ’ R𝑄 ctπ‘§π‘’π‘Ÿπ‘œ ← Encrypt(pk, 0) and produces a sample of the following form: ct+ = EvalAdd(ct j k π‘§π‘’π‘Ÿπ‘œ , ct) [π‘Ž]π‘ž0βˆ— Β· 𝑠 + Ξ”π‘š ct𝑐𝑝 = ct $ % + π‘ž 0βˆ— 𝑑2 = π‘ž 0βˆ— Output: ct𝑐𝑝 We can bound the probability that any distinguisher can distin- guish between the outputs of 𝐷 1 and 𝐷 2 by upper bounding the Algorithm 2 defines the full circuit-private evaluation algorithm probability that these outputs are different. By corollary 3.4.1, we in terms of algorithm 1 and the original BFV Eval procedure. can say that the marginal distribution of each coefficient of the polynomial [π‘Ž]π‘ž0βˆ— Β· 𝑠 + Ξ”π‘š modulo π‘ž 0βˆ— is uniformly random, so this Algorithm 2 Circuit Private Eval Algorithm, denoted EvalCP is upper bounded by 2𝑛 π‘žπ΅βˆ— . β–‘ Input: Ciphertexts ⟨ct𝑖 ⟩ = {ct𝑖 }𝑛𝑖=0 , Public Key pk, Function 𝑓 0 ct𝑖𝑛 ← BFV.Eval(evk, 𝑓 , ⟨ct𝑖 ⟩) 3.2 Avoiding Multi-Precision Arithmetic ct𝑐𝑝 ← $ βˆ’ CP(ct𝑖𝑛 , pk) The naΓ―ve approach to the above operation is to take an integer Output: ct𝑐𝑝 π‘₯ in DCRT representation {π‘₯ 0, . . . , π‘₯π‘˜βˆ’1 }, recombine into a multi- precision integer, then perform a multi-precision divide and floor by π‘ž 0βˆ— , then take the result mod π‘ž 0 . In total, this requires 2π‘˜ integer Finally we define BFV-CP to be a leveled homomorphic encryp- multiplications, π‘˜ integer additions, 1 multi-precision divide-and- tion scheme specified by the tuple (BFV.KeyGen, BFV.Encrypt, floor, and π‘˜ + 1 modular reductions for each coefficient. BFV.EncryptSK, BFV.Decrypt, EvalCP ). To avoid multi-precision arithmetic required when operating We will now argue that if the noise level of the ciphertext gen- over elements modulo the full ciphertext modulus, we make use erated by BFV.Eval subroutine within the EvalCP does not exceed Î a pre-specified bound, BFV-CP is circuit private (Definition 2.1). of the linearity of the DCRT recombination. Let 𝑄 = 𝑖 π‘žπ‘– , let π‘žπ‘–βˆ— = 𝑄/π‘žπ‘– , and let π‘žΛœπ‘– = π‘žπ‘–βˆ— (mod π‘žπ‘– ). From this equation, we have The high-level proof strategy is to define three hybrids (HybridReal , the following expression for division by π‘ž 0βˆ— : HybridNoiseFree , HybridIdeal ) and show that they are indistinguish- " # able from the point of view of a computationally bounded distin- hπ‘₯ i π‘˜βˆ’1 1  Γ• βˆ—   guisher. = βˆ— [π‘₯𝑖 Β· π‘žΛœπ‘– ]π‘žπ‘– Β· π‘žπ‘– βˆ’ 𝑣 Β· π‘ž (13) π‘ž 0βˆ— π‘ž0 π‘ž 0 𝑖=0 Define D (ct β€², ⟨ct𝑖 ⟩, BFV.pk, BFV.sk, BFV.evk) β†’ {0, 1} to be a π‘ž0 PPT distinguisher, where ct𝑖 are valid BFV encryptions of messages π‘šπ‘– respectively. The domain of the first argument (ct β€² ) is the set of " π‘˜βˆ’1 # 1 Γ• βˆ—  = βˆ— [π‘₯𝑖 Β· π‘žΛœπ‘– ]π‘žπ‘– Β· π‘žπ‘– (14) BFV ciphertexts. π‘ž 0 𝑖=0 π‘ž0 " π‘˜βˆ’1 # β€’ HybridReal Γ• π‘ž0 This hybrid corresponds to the real-world where the adver- = [π‘₯𝑖 Β· π‘žΛœπ‘– ]π‘žπ‘– Β· (15) sary tries to learn the function being computed from the 𝑖=0 π‘žπ‘– π‘ž0 ciphertext that is generated by the evaluator. Concretely, we  π‘ž 0/π‘žπ‘– From equation 15 above, we have that only the terms π‘žΛœπ‘– and define ct β€² ← EvalCP (⟨ct𝑖 ⟩, BFV.pk, 𝑓 ). for all 𝑖 ∈ [π‘˜] are needed to compute the desired term π‘₯/π‘ž 0βˆ— π‘ž . By the correctness of BFV, the intermediate ciphertext (ct𝑖𝑛 ) 0 Since these values only depend on the choice of factors of the computed in EvalCP is a valid encryption of 𝑓 (π‘š 1, . . . π‘šπ‘› ). ciphertext modulus, we can easily precompute them, which allows Represent this as ct𝑖𝑛 = (π‘Žπ‘–π‘› , π‘Žπ‘–π‘› Β· 𝑠 + Ξ”π‘š + 𝑒𝑖𝑛 ). Additionally, 34 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea represent pk = (π‘Ž β€², π‘Ž ′𝑠 + 𝑒 β€² ).1 The encryption of zero has the Given a Hybrid, let 𝑃 [Hybrid] denote a probability defined as form, follows, ctπ‘§π‘’π‘Ÿπ‘œ = (π‘Ž ′𝑒 + 𝑒 β€²β€², π‘Ž ′𝑠𝑒 + 𝑒 ′𝑒 + 𝑒 β€²β€²β€² ) Pr[Hybrid] = Pr [D (Β·, ⟨ct𝑖 ⟩, sk, pk, evk) = 1] where 𝑒, 𝑒 β€²β€², 𝑒 β€²β€²β€² ← πœ’. . Hence, the ciphertext ct+ has the form, | |𝑒 | | Lemma 3.6. If 𝑛 π‘ž+βˆ— ≀ 2βˆ’πœ† for security parameter πœ†, then no PPT β€² β€²β€² 0 ct+ [0] = π‘Žπ‘–π‘› + π‘Ž 𝑒 + 𝑒 = π‘Ž + distinguisher can distinguish Hybrid0 and Hybrid1 with probability ct+ [1] = π‘Žπ‘–π‘› Β· 𝑠 + Ξ”π‘š + 𝑒𝑖𝑛 + π‘Ž ′𝑠𝑒 + 𝑒 ′𝑒 + 𝑒 β€²β€²β€² better than 2βˆ’πœ† . = π‘Ž + Β· 𝑠 + Ξ”π‘š + 𝑒 + | Pr[HybridReal ] βˆ’ Pr[HybridNoiseFree ]| ≀ 2βˆ’πœ† where 𝑒 + = 𝑒𝑖𝑛 + 𝑒 ′𝑒 + 𝑒 β€²β€²β€² βˆ’ 𝑠 Β· 𝑒 β€²β€² . Proof. We can upper bound the success of any PPT distin- Performing the divide-and-floor operation during the CP pro- guisher 𝐷 from distinguishing the outputs of HybridReal and cedure on ct+ results in a new ciphertext ctπ‘žπ‘’π‘œπ‘‘ = ⌊ct+ /π‘ž 0βˆ— βŒ‹, HybridNoiseFree by the probability that the outputs of these hybrids which simplifies to, are different. The only place where these hybrids differ is in the error term that is added to the numerator. In equation 18 in HybridReal , jπ‘Ž k + ctπ‘žπ‘’π‘œπ‘‘ [0] = βˆ— = π‘Ž +β€² (16) π‘ž0 there is an 𝑒 + term that is added in the flooring numerator, while j π‘Ž Β· 𝑠 + Ξ”π‘š + 𝑒 k in equation 20 in hybrid HybridNoiseFree , this term is omitted. + + ctπ‘žπ‘’π‘œπ‘‘ [1] = (17) π‘ž 0βˆ— To invoke lemma 3.5, we must show that each coefficient of j [π‘Ž + ]π‘ž βˆ— Β· 𝑠 + Ξ”π‘š + 𝑒 + k π‘Ž + is uniformly random modulo π‘ž 0βˆ— . This can be seen through a = π‘Ž +β€² Β· 𝑠 + 0 (18) straightforward application of the same argument used in the proof π‘ž 0βˆ— of lemma 3.5 on the coefficients of the product π‘Ž β€² Β· 𝑒. Once we where π‘Ž + = π‘Ž +β€² Β· π‘ž 0βˆ— + [π‘Ž + ]π‘ž0βˆ— . show that these coefficients are random, the fixed offsets defined Note that the ciphertext ct β€² := ctπ‘žπ‘’π‘œπ‘‘ . by the coefficients of the rest of the terms in π‘Ž + do not change this β€’ HybridNoiseFree fact. Therefore, we can invoke lemma 3.5 on the 𝑣 term in hybrid HybridReal to show that this term hides 𝑣 and is statistically close In this hybrid, we define a new simulator that outputs a ci- to the distribution of the 𝑣 terms in HybridNoiseFree . phertext for the distinguisher D, that is very similar to the β–‘ HybridReal . The main difference is that this simulator tweaks Lemma 3.7. Assuming the hardness of RLWE𝑛,π‘ž,πœ’ , no PPT dis- the computation of ctπ‘žπ‘’π‘œπ‘‘ [1] to set the 𝑒 + term to zero. Con- tinguisher D can distinguish HybridNoiseFree from HybridIdeal with cretely, we define a simulator Sim(1𝑙 , pk, evk, sk, 𝑓 , βŸ¨π‘π‘‘π‘– ⟩) β†’ non-negligible advantage. ct β€² , which recomputes all the quantities computed in the | Pr[HybridNoiseFree ] βˆ’ Pr[HybridIdeal ]| ≀ negl(πœ†) HybridReal . It then uses the secret key sk to compute 𝑒𝑖𝑛 . Subsequently it computes 𝑒 + and which can be subtracted Proof. Assume a distinguisher D exists that can distinguish from the ct+ [1]. HybridNoiseFree from HybridIdeal with probability at least 𝛿. We can Thus, the output of this simulator ct β€² simplifies to, use D to construct an adversary A that breaks the RLWE problem jπ‘Ž k + with advantage 𝛿. Given an RLWE sample (π‘Ž, 𝑏), we can construct ct β€² [0] = βˆ— = π‘Ž +β€² (19) a key-pair and ciphertext such that the ciphertext will have the π‘ž0 j [π‘Ž + ]π‘ž βˆ— Β· 𝑠 + Ξ”π‘š k form of a HybridNoiseFree sample if 𝑏 = π‘Ž Β· 𝑒 + 𝑒 and the form of a ct β€² [1] = π‘Ž +β€² Β· 𝑠 + 0 (20) HybridIdeal sample if 𝑏 is uniform over Rπ‘ž . π‘ž 0βˆ— $ Given an RLWE sample (π‘Ž, 𝑏), begin by sampling 𝑠 β€², 𝑒 β€² ← βˆ’ πœ’ and β€’ HybridIdeal set sk = 𝑠 β€² and pk = (π‘Ž, π‘Žπ‘  β€² + 𝑒 β€² ) = (pk0, pk1 ). Next, set π‘Ÿ β€² = 𝑏 and This hybrid corresponds to the ideal world where the adver- compute ctπ‘žπ‘’π‘œπ‘‘ as in equation 22. If 𝑏 = π‘Žπ‘’ + 𝑒 then π‘Ÿ β€² is identically sary only has access to the function output π‘š = 𝑓 (π‘š 1 . . . π‘šπ‘› ) distributed to π‘Ž + in HybridNoiseFree and if 𝑏 is uniform then π‘Ÿ β€² is but has no access to the function 𝑓 . We define a new simula- identically distributed to HybridIdeal . Therefore, any PPT distin- tor Sim(1𝑙 , pk, evk, sk, π‘š) β†’ ct β€² whose output will be sent guisher D that can distinguish HybridNoiseFree from HybridIdeal to the distinguisher D. Note that this is the same simulator can solve decisional RLWE with the same probability. β–‘ required by definition 2.1. The simulator samples a uniformly polynomial π‘Ÿ from Rπ‘ž . It Theorem 3.8 (Circuit Privacy). Given that the input ciphertext then outputs, and public key are well-formed, the input ciphertext error has magni- jπ‘Ÿ k tude bounded by 𝐡 β€² and the error distribution πœ’ is 𝐡-bounded, and ct β€² [0] = βˆ— = π‘Ÿ β€² (21) for a security parameter πœ† we have π‘ž0 j [π‘Ÿ ]π‘ž βˆ— Β· 𝑠 + Ξ”π‘š k 2𝑛𝐡 2 + 𝐡 + 𝐡 β€² 2𝑛 ≀ 2βˆ’πœ† (23) ct β€² [1] = π‘Ÿ β€² Β· 𝑠 + 0 (22) π‘ž 0βˆ— π‘ž 0βˆ— 1 Since we are performing a simple homomorphic operation without homomorphic then algorithm 2 achieves 2βˆ’πœ† -circuit privacy as defined in definition multiplications or rotations, it is sufficient for evk to simply be the public key pk. 2.1. 35 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea Proof. This follows by combining lemmas 3.6 and 3.7, since Algorithm 3 Receiver VOLE Protocol HybridReal is equivalent to the distribution of outputs of algorithm Input: π‘₯ ∈ Z𝑝 , Secret Key sk ∈ Rπ‘ž 2 and HybridIdeal is a distribution that is independent of the circuit $ used to compute the final message. The magnitude of 𝑒 + in lemma ct𝑖𝑛 ← βˆ’ Encrypt(sk, π‘₯) 𝑛𝑒𝑑 3.6 is 𝐡 β€² plus the magnitude of the error of a public key encryption Sender β†βˆ’βˆ’βˆ’ ct𝑖𝑛 (𝑖) 𝑛𝑒𝑑 scheme using error distribution πœ’. The bound in equation 23 follows. {ctπ‘Ÿπ‘’π‘  }πœπ‘–=1 β†βˆ’βˆ’βˆ’ Sender Therefore, algorithm 2 achieves 2βˆ’πœ† -circuit privacy as defined in (𝑖) {𝜸 (𝑖) }πœπ‘–=1 ← Decrypt(sk, ctπ‘Ÿπ‘’π‘  ) definition 2.1. β–‘ Output: 𝜸 = (𝜸 (1) , 𝜸 (2) , . . . , 𝜸 (𝜏) ) 3.4 Correctness In this subsection, we give the correctness of our circuit privacy Algorithm 4 Sender VOLE Protocol procedure. Note that since we’ve already shown the security of this Input: VOLE inputs 𝜢 , 𝜷 ∈ Zπ‘š 𝑝 , Public Key pk operation, we don’t have to worry about equivalent expressions 𝑛𝑒𝑑 leaking information about the original error term. ct𝑖𝑛 β†βˆ’βˆ’βˆ’ Receiver We begin with the expression for the ciphertext quotient. for 𝑖 from 1 to 𝜏 do (𝑖) ctπ‘œπ‘™π‘’ ← EvalAddPlain(EvalMultPlain(ct𝑖𝑛 , 𝜢 (𝑖) ), 𝜷 (𝑖) ) [π‘Ž]π‘ž0βˆ— Β· 𝑠 + Ξ”π‘š + 𝑒  $ %    (𝑖) (𝑖) β€² β€² β€² β€² ct = π‘Ž , π‘Ž Β· 𝑠 + 𝑣 = π‘Ž , π‘Ž Β· 𝑠 + ct𝑐𝑝 ← CP(ctπ‘œπ‘™π‘’ , pk) π‘ž 0βˆ— end for 𝑛𝑒𝑑 (𝑖) Receiver β†βˆ’βˆ’βˆ’ ct𝑐𝑝 = {ct𝑐𝑝 }πœπ‘–=1 $ %  [π‘Ž]π‘ž0βˆ— Β· 𝑠 + 𝑒 1 𝑄 [𝑄] 𝑝  β€² β€² = π‘Ž , π‘Ž ·𝑠 + βˆ— + βˆ—( βˆ’ )π‘š Output: βŠ₯ π‘ž0 π‘ž0 𝑝 𝑝 $ %  [π‘Ž]π‘ž0βˆ— Β· 𝑠 + 𝑒 π‘ž0 [𝑄] 𝑝  β€² β€² (24) = π‘Ž , π‘Ž ·𝑠 + + ( βˆ’ )π‘š π‘ž 0βˆ— 𝑝 π‘ž 0βˆ— Β· 𝑝 In sections 4.1 and 4.2, we show the security and correctness of the VOLE protocol defined by algorithms 3 and 4. $ [𝑄 ] %  jπ‘ž k [π‘Ž]π‘ž0βˆ— Β· 𝑠 βˆ’ 𝑝 𝑝 π‘š + 𝑒  β€² β€² 0 = π‘Ž , π‘Ž ·𝑠 + 𝑝 π‘š + 𝑒𝑓 + π‘ž 0βˆ— 4.1 Security   We will now argue the security of the VOLE protocol. = π‘Ž β€², π‘Ž β€² Β· 𝑠 + Ξ” β€²π‘š + 𝑒 𝑓 + 𝑣 β€² Lemma 4.1 (Security Against Sender). If the parameters of where Ξ” β€² = βŒŠπ‘ž 0 /π‘βŒ‹ and 𝑒 𝑓 is the small error term (||𝑒 𝑓 || ≀ 1) the homomorphic encryption scheme are chose to satisfy semantic introduced by removing βŒŠπ‘ž 0 /π‘βŒ‹π‘š from the flooring term. In section security, algorithm 3 achieves security against an honest-but-curious 4, we will give concrete parameters that result in sufficiently small sender. errors to allow for decryption correctness, but at a high level this Proof. This follows directly from the semantic security of the ciphertext is decryptable if encryption scheme. β–‘ Ξ” β€² /2 > ||𝑣 β€² + 𝑒 𝑓 || The security against a semi-honest receiver is dependent on the choosing parameters that satisfy the constraints given in theorem 4 OUR VOLE PROTOCOL 3.8. In this section, we give our VOLE protocol. Let π‘š be the length of the VOLE vectors and let 𝑛 be the dimension of the ring R = Lemma 4.2 (VOLE Circuit Privacy Parameter). Let πœ’ be a Z[π‘₯]/(π‘₯ 𝑛 + 1). Our VOLE protocol splits the vectors in Zπ‘š 𝐡-bounded error distribution. For error distribution πœ’, polynomial 𝑝 over 𝜏 = βŒˆπ‘š/π‘›βŒ‰ elements of R𝑝 , padding with zeros when necessary. The degree 𝑛, plaintext modulus 𝑝, ciphertext modulus π‘ž = π‘ž 0 Β· π‘ž 0βˆ— , and sender then uses homomorphic encryption operations to compute security parameter πœ†, the following parameter bound will result in the VOLE result, which is decrypted by the receiver. the homomorphic evaluation in algorithm 4 to maintain 2βˆ’πœ† -circuit In more detail, the receiver begins the protocol by encoding the privacy. input π‘₯ ∈ Z𝑝 in a single BFV ciphertext ctπ‘₯ , which it sends to the 2𝑛𝐡 2 + 𝐡 + 𝑛𝑝𝐡 2𝑛 ≀ 2βˆ’πœ† (25) sender. The sender’s inputs 𝜢 , 𝜷 ∈ Zπ‘š 𝑝 are split into 𝜏 elements π‘ž 0βˆ— (𝜢 , 𝜢 , . . . , 𝜢 ) and (𝜷 , 𝜷 , . . . , 𝜷 (𝜏) ). For each of the 𝜏 (1) (2) (𝜏) (1) (2) Proof. This follows from theorem 3.8 if we show that the input blocks, the sender computes a ciphertext to the circuit privacy procedure has an error term with magnitude ct (𝑖) = EvalAddPlain(EvalMultPlain(ctπ‘₯ , 𝜢 (𝑖) ), 𝜷 (𝑖) ) bounded by 𝑛𝑝𝐡. This follows directly from the analysis of the error growth of the EvalMultPlain operation. The encrypted input to the The sender then performs the circuit privacy procedure from section EvalMultPlain function in algorithm 4 is a fresh encryption of the 3 and returns each of the resulting ciphertexts to the receiver. The receiver’s input which has an error term with magnitude upper receiver then decrypts the 𝜏 blocks to obtain the VOLE result. bounded by 𝐡, so the magnitude of the resulting error term is no The pseudo-code for the sender and receiver roles are given in more than 𝑛𝑝𝐡. The bound in equation 25 follows from theorem algorithms 3 and 4. 3.8 where 𝐡 β€² = 𝑛𝑝𝐡. β–‘ 36 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea Lemma 4.3 (Security Against Receiver). Given that the param- Algorithm 5 Receiver BOLE Protocol eters of the homomorphic encryption scheme are chosen to satisfy the Input: x ∈ Zπ‘š 𝑝 , Secret Key sk ∈ Rπ‘ž circuit privacy constraint in lemma 4.2, algorithm 4 achieves security Let (x , x (2) , . . . , x (𝜏) ) ← x, where 𝜏 = βŒˆπ‘š/π‘›βŒ‰ (1) against an honest-but-curious receiver. for 𝑖 from 1 to 𝜏 do (𝑖) $ Proof. The proof of this lemma relies on the circuit privacy ct𝑖𝑛 ← βˆ’ Encrypt(sk, x (𝑖) ) result of theorem 3.8. Note that we crucially rely on the receiver’s end for 𝑛𝑒𝑑 (𝑖) initial ciphertext being well-formed. If this is the case, then a simula- Sender β†βˆ’βˆ’βˆ’ {ct𝑖𝑛 }πœπ‘–=1 tor S can use the simulator from theorem 3.8 to return a ciphertext (𝑖) 𝑛𝑒𝑑 {ctπ‘Ÿπ‘’π‘  }πœπ‘–=1 β†βˆ’βˆ’βˆ’ Sender to the receiver that is independent of the circuit used to compute (𝑖) the final message 𝜸 by sampling a ciphertext as in Hybrid2 from {𝜸 (𝑖) }πœπ‘–=1 ← Decrypt(sk, ctπ‘Ÿπ‘’π‘  ) section 3. Since S knows the correct output message 𝜸 , this ci- Output: 𝜸 = (𝜸 (1) , 𝜸 (2) , . . . , 𝜸 (𝜏) ) phertext is indistinguishable from the output of the circuit privacy procedure in the real algorithm 4. Therefore, by theorem 3.8, the message produced by algorithm 4 is indistinguishable from the dimension 𝑛. These parameter sets must satisfy both the computa- output produced by the simulator that samples a ciphertext from tional security requirements of the RLWE problem as well as the Hybrid2 . β–‘ security requirements of the circuit-privacy procedure. In the VOLE protocol, there are no restrictions on the plaintext modulus 𝑝 other Theorem 4.4 (Secure VOLE Protocol). Algorithms 3 and 4 than its size, so we opt for powers of two for ease of comparison. In define of a secure VOLE protocol. the BOLE protocol, the plaintext modulus must support the NTT operation to encode the vectors as polynomials for component-wise Proof. This follows from combining lemmas 4.1 and 4.3. β–‘ operations. This requires that Z𝑝 contain a 2𝑛 th root of unity. From the previous section, we have several constraints on the 4.2 Correctness parameters for the protocol to satisfy both security and correctness. We will now give parameters for which the VOLE protocol is correct. (1) For circuit privacy parameter πœ†π‘π‘ , lemma 4.2 gives the fol- lowing lower bound on π‘ž 0βˆ— . Lemma 4.5 (Correctness). Let πœ’ be a 𝐡-bounded error distribu- tion. For error distribution πœ’, polynomial degree 𝑛, plaintext modulus 2𝑛𝐡 2 + 𝐡 + 𝑛𝑝𝐡 𝑝, and ciphertext modulus π‘ž = π‘ž 0 Β· π‘ž 0βˆ— . Algorithms 3 and 4 achieve 1βˆ’π‘› β‰₯ 1 βˆ’ 2βˆ’πœ†π‘π‘ π‘ž 0βˆ— the correct VOLE functionality as defined in definition A.1 if π‘ž 0βˆ— is chosen to satisfy the circuit privacy constraint from lemma 4.2 and π‘ž 0βˆ— β‰₯ 2πœ†π‘π‘ Β· 𝑛(2𝑛𝐡 2 + 𝐡 + 𝑛𝑝𝐡) π‘ž 0 > 2𝑝𝑛𝐡 + 2𝑝 + [π‘ž 0 ] 𝑝 , where 𝑣 is the compressed error term from (2) For correctness, lemma 4.5 gives the following lower bound equation 3. on π‘ž 0 . π‘ž 0 > 2𝑝𝑛𝐡 Proof. This follows directly from the BFV scheme. β–‘ (3) Finally, for semantic security, we must choose 𝑛 and π‘ž to satisfy the desired security level. These parameters are given 4.3 Simple Extension to Batched OLE in table 1 in the Homomorphic Encryption Standard [2]. In The extension of this VOLE protocol to a BOLE protocol is straight- particular, this table gives a maximum size of π‘ž for a given forward and preserves the security proofs of section 4.1. In addition, value of 𝑛. We must select an 𝑛 that is large enough for our this BOLE extension maintains the exact communication complex- choice of π‘ž. ity of the VOLE protocol. Only the receiver’s role is modified in this Since we have competing constraints on π‘ž, it is not immediately extension. We show the receiver’s BOLE procedure in algorithm 5. clear that we can pick parameters that will be able to satisfy all The receiver begins with a vector x of values, which she encodes these constraints. Luckily, there are many parameter sets that satisfy as a polynomial as described in the full version of this paper. The these conditions. operations performed by the sender are identical. To obtain the Our implementation of the discrete Gaussian sampling algorithm BOLE output, the receiver must decode the decryption result by follows [16], which takes advantage of the small standard deviation evaluating the resulting polynomial at the roots of unity determined to simply compute the probability of sampling all of the most com- by the encoding NTT parameters. mon integers. This truncated distribution is statistically close to the real discrete Gaussian distribution while still allowing us to bound 5 IMPLEMENTATION & PERFORMANCE the maximum size of a sampled term. Following the homomorphic In this section, we give an implementation of the VOLE and BOLE encryption standard, we use a standard deviation of 𝜎 = 3.2. Includ- protocols presented in section 4. ing the sign bit, no term sampled from πœ’ is greater than five bits, so 𝐡 = 32 for all our parameter sets. 5.1 Parameter Selection In most of the parameter sets in this section, each limb of the In both of these protocols, there are four parameters that must se- ciphertext modulus will be 55 bits. For a circuit privacy parameter of lected: the magnitude bound 𝐡 of the 𝐡-bounded error distribution πœ†π‘π‘ = 80, we will use a ciphertext modulus consisting of four limbs πœ’, the plaintext modulus 𝑝, the ciphertext modulus π‘ž, and the ring for a total of 220 bits. This is the maximum size of the modulus for 37 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea 𝑛 = 213 in table 1 of the HE standard [2]. For a plaintext modulus 𝑝 = 232 , these parameters satisfy the three constraints given above. 5.2 Extending to Larger Moduli There are some applications which require moduli that are larger than can be supported by a final π‘ž 0 that fits in a standard machine word. We handle these moduli by using the same DCRT represen- tation as the ciphertext modulus. In words, for a plaintext modulus 𝑝 that is too large for a standard machine word, we represent 𝑝 as ΓŽβ„“ a product of primes 𝑝 = 𝑖=0 𝑝𝑖 where each 𝑝𝑖 is small enough Figure 1: Time per OLE multiplication vs. VOLE length compared to be supported by a π‘ž 0 that fits in a standard machine word for to [3] (denoted ADI+17). Times are measured in microseconds, and the given circuit privacy parameter. This allows us to extend the the VOLE protocol is over a 32-bit field. The runtime for the noise- support of the protocol in section 4 to larger moduli with a factor flooding variant of our protocol was estimated by implementing the of β„“ overhead. noise sampling function using NTL [33] and adding this sampling time to our protocol time. 5.3 Optimizing for Low-Bandwidth Networks In section 5.4, we will benchmark our implementation in an envi- ronment with low network bandwidth. We define this as a setting where the network bandwidth is the bottleneck of the protocol. To reduce the overall communication of our protocol, we include in our implementation a common optimization to reduce the size of a fresh Ring LWE encryption by a factor of two. The idea is simple: instead of sampling a random π‘Ž polynomial for a ciphertext (π‘Ž, 𝑏), sample a random PRG seed 𝜎 and generate π‘Ž ← 𝑃𝑅𝐺 (𝜎) to be the output of the PRG. The remainder of the ciphertext is generated as if π‘Ž were sampled normally, but when it comes time to send the ciphertext we only need to send the pair (𝜎, 𝑏), replacing the Figure 2: Time for a single run of the VOLE protocol vs. VOLE element of Rπ‘ž with a 16 byte PRG seed. length compared to [3] (denoted ADI+17). Times are mea- sured in seconds, and the VOLE protocol is over a 32-bit field. 5.4 Experimental Setup & Results We implement2 the VOLE and BOLE protocols from section 4 on slower than the machine used for the benchmarks in [3], but this top of an implementation of the BFV [11, 17] homomorphic en- is not considered in our comparisons (all numbers are reported cryption scheme based on the BFV-RNS implementation of Halevi, without scaling from [3]). Polyakov, and Shoup [20] in addition to code from the work of Using this setup, we took two types of benchmarks. The first Juvekar, Vaikuntanathan, and Chandrakasan [24]. We use an NTT benchmark is of the protocol running between the two machines implementation based off of the NFLlib library of Aguilar-Melchor, using all of the threads. Based on this runtime, we divided by the Barrier, Guelton, Guinet, Killijian, and Lepoint [1]. length of the VOLE protocol to get the time per OLE multiplication. Asymptotically Linear Comparisons. As mentioned in section We then ran the protocol once using a single thread per machine 1.2, we separate prior work into two categories and compare to to measure the latency, the time that a higher level application each category separately. The first category consists of protocols must wait for the protocol to complete. We compare these results with (asymptotically) linear communication complexity, and, more against the numbers given in [3] (see tables 2 and 3 in [3]) for their heuristically, protocols that are optimized for small VOLE lengths. optimized 32 bit protocol. These comparisons are given in figures The fastest protocol in this category that we are aware of is the 1 and 2. We note that the trend lines for the [3] protocol do not work of Applebaum, DΓ₯mgard, Ishai, Nielsen, and Zichron [3]. To consider the increase in communication of the protocol, which compare against [3], we ran our protocol on two AWS m5.2xlarge could result in further slowdown in settings where the bandwidth instances, each having 8 vCPUs at 3.1 GHz and 32 GB of RAM. is constrained. We also note that this graph includes an estimated These instances were in the same geographic region (US-East) and benchmark for the time per OLE for our protocol if the circuit were connected by a network with a bandwidth of 500 MB/sec. This privacy analysis from section 3 is not considered and noise flooding setup was intended to replicate the conditions of the experiments is used to achieve circuit privacy. This runtime is estimated by of [3] (see section 6.5). Our network bandwidth is much lower than implementing the sampling function for the noise flooding term in in their setup which favors [3] as their communication complexity NTL [33], then adding the time for this sampling algorithm to our does not exceed the bandwidth of the network we used. We employ runtime. the communication optimization described in section 5.3. We also Finally, we measured the communication complexity of a sin- note that the clock speed of the CPU of our machine is slightly gle run of our VOLE protocol and compared it to estimates of the communication complexity of [3] based on the reported consumed 2 https://github.com/leodec/ole_wahc bandwidth for their protocol. This comparison is displayed in figure 38 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea Figure 4: Time for a single run of the VOLE protocol vs VOLE Figure 3: Total communication in kilobytes for a single run length compared to [31] (denoted SGRR19). The runtime is of the VOLE protocol vs. VOLE length compared to [3] (de- measured in seconds, and the VOLE is over a 32-bit field. noted ADI+17). The VOLE is over a 32-bit field. Note that the Both axes are shown in log scale to better display the trends. slope of the ADI+17 trend line is greater than the slope of our Based on extrapolation, we estimate the intersection point protocol’s trend line. We estimate the cross-over point in the in the runtimes to be near a VOLE length of 235 . communication complexity to be a VOLE length of around 216 . 3. For small VOLE, the protocol of [3] achieves better communica- tion complexity than our protocol for VOLE lengths less than 216 , although beyond this length our protocol achieves better commu- nication complexity. Asymptotically Sublinear Comparisons. The second category of prior art consists of protocols with sub-linear communication com- plexity. These protocols are designed to be optimized for large VOLE lengths, and due the sub-linear communication they will Figure 5: Total communication for a single run of the VOLE always be asymptotically faster than our protocol. Despite this, our protocol vs VOLE length compared to [31] (denoted SGRR19). protocol is faster for smaller VOLE lengths, and our comparison The communication is measured in megabytes, and the aims to determine VOLE length at which one protocol becomes VOLE is over a 32-bit field. Both axes are shown in log scale faster than the other. By determining this point, we are able to to better display the trends. Based on extrapolation, we esti- discuss applications for which our protocol is more efficient and mate the intersection point in the communication to be near when it makes sense to switch to a protocol optimized for larger a VOLE length of 232 . VOLE sizes. To our knowledge, the fastest protocol in this category is the work of Schoppmann, GascΓ³n, Reichert, and Raykova [31], which is an optimized two-party implementation of the FSS-based is able to generate arbitrary VOLE correlations directly. While there sub-linear vector VOLE [8]. is a simple and secure reduction from random VOLE to arbitrary Since the authors of [31] included their code, we were able to VOLE, it is an interesting future direction to try to further optimize run side-by-side comparisons between our protocol and theirs. our protocol for generating random VOLE correlations. Following the setup from their work, we ran each protocol on a single thread on two machines connected by a network with 500 5.5 Comparison Against Subsequent Works MB/sec bandwidth. The results of these comparisons are given in In this section, we briefly compare against works subsequent to the figures 4 and 5. By extrapolating the data in figure 4, we estimate the online publication of this work. the cross-over point is near a VOLE length of 235 . Extrapolating the communication complexity trends in figure 5 gives an approximate Comparisons to Asymptotically Linear Protocols. We compare cross-over point near a VOLE length of 232 . In section 5.6, we give against the one-round BOLE protocol of Baum et al. [5]. This work some applications that require VOLE protocols of width less than presents several different BOLE protocols, and we compare against this cross-over point. the version the authors chose to benchmark. This protocol assumes We note briefly that the protocol in [31] requires running a that the two parties each begin with the public key of the other smaller VOLE protocol that it then expands using an LPN code. party, and we do not count this key exchange as part of the com- When generating an exceptionally long VOLE correlation, one could munication of this protocol. Aside from the round complexity, our run a protocol that recursively calls [31] until it reaches a point BOLE protocol outperforms the one-round protocol in both com- where our protocol is faster, then runs our protocol as the base munication and computation. To compare this one-round protocol case. It is an interesting future direction to combine our protocol to our C++ implementation, we implemented the one-round BOLE with [31] to improve the performance of VOLE with sub-linear protocol using the same underlying RLWE library as our BOLE communication. protocol. Based on these comparisons, our protocol is a 20-30Γ— We also note that both protocols in this section focus on optimiz- faster computationally in experiments where each party is running ing the generation of random VOLE correlations while our protocol in a single thread connecting over localhost. Our benchmarks 39 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea for the one-round protocol for a 60-bit plaintext was less than one PALISADE library. We gratefully acknowledge funding from the second, faster than Golang benchmark given in Baum et al. [5] with Jameel-Clinic and from the National Science Foundation. the same parameters, which was roughly 1.5 seconds. Our total communication is strictly less than the one-round protocol, since one of their ciphertexts requires essentially the same LPR overhead REFERENCES as ours, while the other ciphertext that they send requires double [1] Carlos Aguilar-Melchor, Joris Barrier, Serge Guelton, Adrien Guinet, Marc-Olivier the LPR overhead. Killijian, and TancrΓ¨de Lepoint. 2016. NFLlib: NTT-Based Fast Lattice Library. In Topics in Cryptology - CT-RSA 2016, Kazue Sako (Ed.). Springer International Comparisons to Asymptotically Sublinear Protocols. To compare Publishing, Cham, 341–356. to the subsequent VOLE protocol of Weng et al. [34], we note that [2] Martin Albrecht, Melissa Chase, Hao Chen, Jintai Ding, Shafi Goldwasser, Sergey Gorbunov, Shai Halevi, Jeffrey Hoffstein, Kim Laine, Kristin Lauter, Satya Lokam, this work claims a roughly 20Γ— computational speedup over Schopp- Daniele Micciancio, Dustin Moody, Travis Morrison, Amit Sahai, and Vinod mann et al. [31]. We can extrapolate this comparison to estimate Vaikuntanathan. 2018. Homomorphic Encryption Security Standard. Technical the crossover point for the Weng et al. protocol, which is roughly a Report. HomomorphicEncryption.org, Toronto, Canada. [3] Benny Applebaum, Ivan DamgΓ₯rd, Yuval Ishai, Michael Nielsen, and Lior Zichron. VOLE length of 228 . For communication, the Weng et al. protocol a 2017. Secure Arithmetic Computation with Constant Computational Overhead. setup cost of 1.1 MB and subsequent communication of 0.42 bits In Advances in Cryptology – CRYPTO 2017, Jonathan Katz and Hovav Shacham per VOLE correlation puts the crossover point with our protocol to (Eds.). Springer International Publishing, Cham, 223–254. [4] Abhishek Banerjee, Chris Peikert, and Alon Rosen. 2012. Pseudorandom be near a VOLE length of 215 . Functions and Lattices. In Advances in Cryptology – EUROCRYPT 2012, David To compare against the BOLE protocol of Boyle et al. [10], we Pointcheval and Thomas Johansson (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 719–737. first note that the work does not give the benchmarks of an actual [5] Carsten Baum, Daniel Escudero, Alberto Pedrouzo-Ulloa, Peter Scholl, and implementation, only estimates of the computation time. They give Juan RamΓ³n Troncoso-Pastoriza. 2020. Efficient Protocols for Oblivious Lin- various parameter sets to allow for a communication-computation ear Function Evaluation from Ring-LWE. In Security and Cryptography for Net- works, Clemente Galdi and Vladimir Kolesnikov (Eds.). Springer International trade-off. For 128 bits of security, The estimates they provide for Publishing, Cham, 130–149. generating 220 batch OLE correlations give performance that ranges [6] Avrim Blum, Adam Tauman Kalai, and Hal Wasserman. 2003. Noise-Tolerant from roughly 13 seconds with nearly 18 MB of communication to Learning, the Parity Problem, and the Statistical Query Model. Journal of the ACM (JACM) 50, 4 (July 2003), 506–519. https://www.microsoft.com/en-us/research/ nearly 20 seconds with communication roughly 1.5 MB. The plain- publication/noise-tolerant-learning-parity-problem-statistical-query-model/ text modulus of these BOLE correlations was roughly 2124 , and [7] Florian Bourse, RafaΓ«l Del Pino, Michele Minelli, and Hoeteck Wee. 2016. FHE Circuit Privacy Almost for Free. In Advances in Cryptology – CRYPTO 2016, while our BOLE protocol is not optimized for these large moduli, we Matthew Robshaw and Jonathan Katz (Eds.). Springer Berlin Heidelberg, Berlin, can estimate the performance for a plaintext modulus of this size by Heidelberg, 62–89. generating four 32-bit BOLE correlations for each 124-correlation [8] Elette Boyle, Geoffroy Couteau, Niv Gilboa, and Yuval Ishai. 2018. Compressing Vector OLE. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and in [10] (this simulates a CRT splitting of a 128-bit plaintext mod- Communications Security (Toronto, Canada) (CCS ’18). Association for Comput- ulus). Our benchmarks for 222 BOLE correlations with a 32-bit ing Machinery, New York, NY, USA, 896–912. https://doi.org/10.1145/3243734. plaintext modulus give a computation time of 3 seconds and a total 3243868 [9] Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, Peter Rindal, communication of about 200 MB. and Peter Scholl. 2019. Efficient Two-Round OT Extension and Silent Non- Interactive Secure Computation. In Proceedings of the 2019 ACM SIGSAC Con- 5.6 Application: Secure Image Convolution ference on Computer and Communications Security (London, United Kingdom) (CCS ’19). Association for Computing Machinery, New York, NY, USA, 291–308. There are many applications for a VOLE protocol, including many https://doi.org/10.1145/3319535.3354255 that do not require VOLE correlations of length greater than the [10] Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, and Peter cross-over points discussed in section 5.4. One concrete example Scholl. 2020. Efficient Pseudorandom Correlation Generators from Ring-LPN. In Advances in Cryptology – CRYPTO 2020, Daniele Micciancio and Thomas is image convolution for secure neural network evaluation. The Ristenpart (Eds.). Springer International Publishing, Cham, 387–416. canonical image convolution operations takes in a two-dimensional [11] Zvika Brakerski. 2012. Fully Homomorphic Encryption without Modulus Switch- image and a small two-dimensional kernel and produces an output ing from Classical GapSVP. Proceedings of Advances in Cryptology-Crypto 7417 (08 2012). https://doi.org/10.1007/978-3-642-32009-5_50 image where each output pixel is the sum of the component-wise [12] Zvika Brakerski and Vinod Vaikuntanathan. 2011. Efficient Fully Homomorphic product of the kernel and a region of input image. While one can Encryption from (Standard) LWE. In IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, Rafail describe this operation using two-dimensional Fourier transforms, Ostrovsky (Ed.). IEEE Computer Society, 97–106. https://doi.org/10.1109/FOCS. one of the most efficient implementations of secure convolution 2011.12 is to use VOLE. In particular, one can define a VOLE correlation [13] Melissa Chase, Yevgeniy Dodis, Yuval Ishai, Daniel Kraschewski, Tianren Liu, Rafail Ostrovsky, and Vinod Vaikuntanathan. 2019. Reusable Non-Interactive for each element in the kernel and then perform a scalar-vector Secure Computation. In Advances in Cryptology – CRYPTO 2019, Alexandra product with the elements of the image multiplied by the specific Boldyreva and Daniele Micciancio (Eds.). Springer International Publishing, kernel element. Even for large neural networks such as networks Cham, 462–488. [14] Leo de Castro, Chiraag Juvekar, and Vinod Vaikuntanathan. 2020. Fast Vector that process mammograms for cancer diagnosis [35], the size of a Oblivious Linear Evaluation from Ring Learning with Errors. Cryptology ePrint convolution does not exceed the size of the input image, which is Archive, Report 2020/685. https://ia.cr/2020/685. [15] LΓ©o Ducas and Damien StehlΓ©. 2016. Sanitization of FHE Ciphertexts. In Advances 2028 Γ— 2028 = 222 pixels. Based on the benchmarks in section 5.4, in Cryptology - EUROCRYPT 2016 - 35th Annual International Conference on the we would outperform all prior works for the generation of these Theory and Applications of Cryptographic Techniques, Vienna, Austria, May 8- VOLE correlations. 12, 2016, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 9665), Marc Fischlin and Jean-SΓ©bastien Coron (Eds.). Springer, 294–310. https://doi.org/10. 1007/978-3-662-49890-3_12 ACKNOWLEDGMENTS [16] Nagarjun Dwarakanath and Steven Galbraith. 2014. Sampling from discrete Gaussians for lattice-based cryptography on a constrained device. Applicable We would like to thank Michael Nielsen for providing the code Algebra in Engineering, Communication and Computing 25 (06 2014). https: for [3] and Yuriy Polyakov for answering our questions about the //doi.org/10.1007/s00200-014-0218-3 40 Session 2 WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea [17] Junfeng Fan and Frederik Vercauteren. 2012. Somewhat practical fully homomor- let Ω𝑅 = Z𝑛𝑝 . The function 𝑓𝑆 (πœ‰π‘† , πœ‰π‘… ) = βŠ₯ for all (πœ‰π‘† , πœ‰π‘… ) ∈ Ξžπ‘† Γ— Ξžπ‘… . phic encryption. https://eprint.iacr.org/2012/144.pdf For πœ‰π‘† = (𝜢 , 𝜷) and πœ‰π‘… = π‘₯, define 𝑓𝑅 (πœ‰π‘† , πœ‰π‘… ) = 𝜢 Β· π‘₯ + 𝜷 mod 𝑝. [18] Craig Gentry. 2009. Fully homomorphic encryption using ideal lattices. STOC. [19] Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. 2010. i-Hop Homomorphic Encryption and Rerandomizable Yao Circuits. In Advances in Cryptology - CRYPTO Definition A.2 (Batch Oblivious Linear Evaluation). For an 2010, 30th Annual Cryptology Conference, Santa Barbara, CA, USA, August 15-19, integer 𝑝 and 𝑛, define the protocol BOLE (𝑝,𝑛) to have the following 2010. Proceedings (Lecture Notes in Computer Science, Vol. 6223), Tal Rabin (Ed.). ideal functionality. Let Ξžπ‘† = Z𝑛𝑝 Γ— Z𝑛𝑝 , let Ω𝑆 = βˆ…, let Ξžπ‘… = Z𝑛𝑝 , and Springer, 155–172. https://doi.org/10.1007/978-3-642-14623-7_9 [20] Shai Halevi, Yuriy Polyakov, and Victor Shoup. 2019. An Improved RNS Variant of let Ω𝑅 = Z𝑛𝑝 . The function 𝑓𝑆 (πœ‰π‘† , πœ‰π‘… ) = βŠ₯ for all (πœ‰π‘† , πœ‰π‘… ) ∈ Ξžπ‘† Γ— Ξžπ‘… . the BFV Homomorphic Encryption Scheme. In Topics in Cryptology – CT-RSA 2019 For πœ‰π‘† = (𝜢 , 𝜷) and πœ‰π‘… = x, define 𝑓𝑅 (πœ‰π‘† , πœ‰π‘… ) = 𝜢 βŠ™ x + 𝜷 mod 𝑝. - The Cryptographers’ Track at the RSA Conference 2019, Proceedings (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)), Mitsuru Matsui (Ed.). Springer-Verlag, 83–105. A.2 Ring Learning with Errors https://doi.org/10.1007/978-3-030-12612-4_5 [21] Carmit Hazay, Yuval Ishai, Antonio Marcedone, and Muthuramakrishnan Venki- In this section, we define the decisional Ring Learning with Errors tasubramaniam. 2019. LevioSA: Lightweight Secure Arithmetic Computation. In (RLWE) [27] problem. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (London, United Kingdom) (CCS ’19). Association for Computing Ma- Definition A.3 (Decisional Ring Learning with Errors [27]). chinery, New York, NY, USA, 327–344. https://doi.org/10.1145/3319535.3354258 For integers 𝑛 and π‘ž, a ring Rπ‘ž = Zπ‘ž [π‘₯]/(π‘₯ 𝑛 + 1), and an error [22] Yuval Ishai and Anat Paskin. 2007. Evaluating Branching Programs on Encrypted Data. In Theory of Cryptography, Salil P. Vadhan (Ed.). Springer Berlin Heidelberg, distribution πœ’ over Rπ‘ž , the decisional Ring Learning with Errors Berlin, Heidelberg, 575–594. problem RLWE𝑛,π‘ž,πœ’ is to distinguish between samples of the form [23] Yuval Ishai, Manoj Prabhakaran, and Amit Sahai. 2008. Founding Cryptography $ $ on Oblivious Transfer – Efficiently. 572–591. https://doi.org/10.1007/978-3-540- (π‘Ž, π‘Žπ‘  + 𝑒) and (π‘Ž, 𝑒) where π‘Ž, 𝑒 ← βˆ’ Rπ‘ž and 𝑠, 𝑒 ← βˆ’ πœ’. 85174-5_32 [24] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. 2018. In this work, we consider πœ’ to be the discrete, zero-centered GAZELLE: A Low Latency Framework for Secure Neural Network Inference. In 27th USENIX Security Symposium (USENIX Security 18). USENIX Association, Bal- Gaussian distribution over Rπ‘ž . This follows the homomorphic en- timore, MD, 1651–1669. https://www.usenix.org/conference/usenixsecurity18/ cryption security standard [2] from which we get the concrete presentation/juvekar parameters used in the implementation in section 5. [25] Kim Laine. 2019. https://github.com/microsoft/SEAL/issues/89. [26] Vadim Lyubashevsky and Daniele Micciancio. 2006. Generalized Compact Knap- sacks Are Collision Resistant. In Automata, Languages and Programming, Michele A.3 Homomorphic Encryption Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 144–155. In this section, we give high level definitions for the algorithms [27] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. 2010. On Ideal Lattices and comprising a leveled homomorphic encryption scheme as well as Learning with Errors Over Rings. EUROCRYPT (2010). necessary security definitions. [28] Moni Naor and Benny Pinkas. 2006. Oblivious Polynomial Evaluation. SIAM J. Comput. 35 (01 2006), 1254–1281. https://doi.org/10.1137/S0097539704383633 Definition A.4 (Leveled Homomorphic Encryption). A lev- [29] Zhiniang Peng. 2019. Danger of using fully homomorphic encryption: A look at Microsoft SEAL. CoRR abs/1906.07127 (2019). arXiv:1906.07127 http://arxiv.org/ eled homomorphic encryption scheme abs/1906.07127 [30] Y. Polyakov, K. Rohloff, and G.W Ryan. [n.d.]. PALISADE lattice cryptography E = (KeyGen, Encrypt, Eval, Decrypt) library. https://git.njit.edu/palisade/PALISADE. [31] Phillipp Schoppmann, AdriΓ  GascΓ³n, Leonie Reichert, and Mariana Raykova. is a set of PPT algorithms defined as follows: 2019. Distributed Vector-OLE: Improved Constructions and Implementation. β€’ KeyGen(1πœ† , 1𝐿 ) β†’ (sk, pk, evk) https://doi.org/10.1145/3319535.3363228 [32] SEAL 2020. Microsoft SEAL (release 3.5). https://github.com/Microsoft/SEAL. Given the security parameter πœ† and a maximum circuit depth Microsoft Research, Redmond, WA. 𝐿, outputs a key pair consisting of a public encryption key pk, [33] Victor Shoup. [n.d.]. NTL: A Library for doing Number Theory. https://shoup. a secret decryption key sk, and an evaluation key evk. net/ntl/. [34] C. Weng, K. Yang, J. Katz, and X. Wang. 2021. Wolverine: Fast, Scalable, and β€’ Encrypt(pk, π‘š) β†’ ct Communication-Efficient Zero-Knowledge Proofs for Boolean and Arithmetic Given a message π‘š ∈ M and an encryption key pk, outputs a Circuits. In 2021 2021 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA, 1074–1091. https://doi.org/10.1109/SP40001. ciphertext ct. 2021.00056 β€’ Eval(evk, 𝑓 , ct1, ct2, . . . , ct𝑛 ) β†’ ct β€² [35] Adam Yala, Constance Lehman, Tal Schuster, Tally Portnoi, and Regina Barzilay. Given the evaluation key, a description of a function 𝑓 : M𝑛 β†’ 2019. A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction. Radiology 292 (05 2019), 182716. https://doi.org/10.1148/radiol. M with multiplicative depth at most 𝐿, and 𝑛 ciphertexts 2019182716 encrypting messages π‘š 1, . . . , π‘šπ‘› , outputs the result ciphertext ct β€² encrypting π‘š β€² = 𝑓 (π‘š 1, . . . , π‘šπ‘› ). β€’ Decrypt(sk, ct) = π‘š Given the secret decryption key and a A FURTHER BACKGROUND ciphertext ct encrypting π‘š, outputs π‘š. In this appendix, we give further background omitted from section 2 due to space constraints. Optionally the scheme E may be extended with a PPT algorithm EncryptSK(sk, π‘š) β†’ ct which uses the secret key sk rather than the public key pk, to compute the ciphertext ct from the message A.1 Oblivious Linear Evaluation π‘š. We now give formal definitions of vector OLE and batch OLE, along When returning a ciphertext output by the Eval function, it is with security definitions. often desirable for this ciphertext to hide the function 𝑓 that was used to produce it. This property of the scheme is called circuit Definition A.1 (Vector Oblivious Linear Evaluation). For privacy, which we formally define below. an integer 𝑝 and 𝑛, define the protocol VOLE (𝑝,𝑛) to have the following ideal functionality. Let Ξžπ‘† = Z𝑛𝑝 Γ— Z𝑛𝑝 , let Ω𝑆 = βˆ…, let Ξžπ‘… = Z𝑝 , and 41