Files
opaque-lattice/papers_txt/vole-ring-lwe.txt
2026-01-06 12:49:26 -07:00

1064 lines
119 KiB
Plaintext
Raw Permalink Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
MIT Open Access Articles
Fast Vector Oblivious Linear Evaluation
from Ring Learning with Errors
The MIT Faculty has made this article openly available. Please share
how this access benefits you. Your story matters.
Citation: de Castro, Leo, Juvekar, Chiraag and Vaikuntanathan, Vinod. 2021. "Fast Vector
Oblivious Linear Evaluation from Ring Learning with Errors." Proceedings of the 9th on
Workshop on Encrypted Computing & Applied Homomorphic Cryptography.
As Published: 10.1145/3474366.3486928
Publisher: ACM
Persistent URL: https://hdl.handle.net/1721.1/143921
Version: Final published version: final published article, as it appeared in a journal, conference
proceedings, or other formally published context
Terms of use: Creative Commons Attribution 4.0 International license
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
Fast Vector Oblivious Linear Evaluation
from Ring Learning with Errors
Leo de Castro Chiraag Juvekar Vinod Vaikuntanathan
Massachusetts Institute of Technology Analog Devices Massachusetts Institute of Technology
ldec@mit.edu chiraag.juvekar@analog.com vinodv@mit.edu
ABSTRACT secure computation protocols [13, 21]. In an OLE protocol over a
Oblivious linear evaluation (OLE) is a fundamental building block ring Z𝑝 , there are two parties, a sender 𝑆 with values 𝛼, 𝛽 ∈ Z𝑝 and
in multi-party computation protocols. In OLE, a sender holds a a receiver 𝑅 with a value 𝑥 ∈ Z𝑝 . At the end of the protocol, 𝑅 will
description of an affine function 𝑓𝛼,𝛽 (𝑧) = 𝛼𝑧 + 𝛽, the receiver learn the value 𝛾 = 𝛼 ·𝑥 + 𝛽 while 𝑆 will learn nothing. In Vector OLE
holds an input 𝑥, and gets 𝛼𝑥 + 𝛽 (where all computations are done (VOLE), the sender has 𝜶 , 𝜷 ∈ Z𝑚 𝑝 , the receiver has 𝑥 ∈ Z𝑝 , and
over some field, or more generally, a ring). Vector OLE (VOLE) is a obtains 𝜸 := 𝜶 𝑥 + 𝜷 ∈ Z𝑚 𝑝 . In other words, vector OLE is running
generalization where the sender has many affine functions and the many OLE protocols in parallel where the receiver has the same 𝑥.
receiver learns the evaluation of all of these functions on a single Batch OLE (BOLE) can be viewed as many OLE protocols running
point 𝑥. in parallel where the receiver has a vector x of values over Z𝑚 𝑝 . In
The state-of-the-art semi-honest VOLE protocols generally fall this work, we will primarily focus on VOLE with a simple extension
into two groups. The first group relies on standard assumptions to BOLE.
to achieve security but lacks in concrete efficiency. These con- Our approach to implementing a VOLE protocol is to use the
structions are mostly based on additively homomorphic encryption packed additively homomorphic encryption (PAHE) of Braker-
(AHE) and are classified as “folklore". The second group relies on ski [11] and Fan and Vercuteran [17], henceforth referred to as the
less standard assumptions, usually properties of sparse, random lin- BFV scheme. PAHE is a natural choice of primitive to implement
ear codes, but they manage to achieve concrete practical efficiency. such a protocol, especially in the honest-but-curious model. At a
In this work, we present a conceptually simple VOLE protocol that high level, our protocol has the receiver 𝑅 encrypt the value 𝑥 and
derives its security from a standard assumption, namely Ring Learn- send the encryption [[𝑥]] to the sender 𝑆. Using the PAHE opera-
ing with Errors (RLWE), while still achieving concrete efficiency tions, the sender can then compute the ciphertext [[𝜸 ]] = [[𝜶 ·𝑥 +𝜷]]
comparable to the fastest VOLE protocols from non-standard coding and return it to the receiver, who can decrypt and learn the VOLE
assumptions [3, 8, 31]. Furthermore, our protocol admits a natural output 𝜸 . As long as the PAHE scheme achieves circuit privacy
extension to batch OLE (BOLE), which is yet another variant of which, in this case, says that the homomorphically evaluated ci-
OLE that computes many OLEs in parallel. phertext [[𝜸 ]] does not leak information about 𝛼 and 𝛽 even to the
owner of the secret key, then security is achieved against honest-
CCS CONCEPTS but-curious adversaries. In the case of VOLE, the circuit is specified
• Security and privacy → Cryptography; by the senders values of 𝜶 and 𝜷.
We note that achieving security in the honest-but-curious setting
KEYWORDS is sufficient following the work of Hazay, Ishai, Marcedone, and
Venkitasubramaniam [21] which instantiates the semi-honest-to-
homomorphic encryption, applied cryptography, oblivious linear active compiler of Ishai, Prabhakaran, and Sahai [23] using black-
evaluation box access to an honest-but-curious OLE protocol. In particular, this
ACM Reference Format: instantiation enables a semi-honest OLE protocol to be upgraded
Leo de Castro, Chiraag Juvekar, and Vinod Vaikuntanathan. 2021. Fast Vector to an actively secure OLE protocol with roughly a factor of two
Oblivious Linear Evaluation from Ring Learning with Errors. In Proceedings overhead ([21], section 5.3).
of the 9th Workshop on Encrypted Computing Applied Homomorphic Cryptog- While the high-level story is simple, instantiating a OLE scheme
raphy (WAHC 21), November 15, 2021, Virtual Event, Republic of Korea. ACM, from a PAHE scheme in a concretely efficient way is less so. In this
New York, NY, USA, 13 pages. https://doi.org/10.1145/3474366.3486928
work, we present a series of optimizations of the textbook BFV
PAHE scheme that results in a vector OLE scheme that outper-
1 INTRODUCTION forms all prior works in parameter regimes of practical interest. At
Oblivious linear evaluation (OLE), a special case of oblivious poly- the heart of our optimizations is a new analysis of the modulus
nomial evaluation [28], is a fundamental building block in many reduction operation similar to the BV homomorphic encryption
scheme [12], which argues that circuit privacy is achieved when
modulus reduction is performed under carefully selected parame-
ters. In addition, we extend the analysis of Halevi, Polyakov, and
Shoup [20] to efficiently implement this operation in the double
This work is licensed under a Creative Commons Attribution International 4.0
License. Chinese remainder theorem (DCRT) representation.
WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
© 2021 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-8656-2/21/11.
https://doi.org/10.1145/3474366.3486928
29
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
1.1 Our Contributions 1.2 Related Work
First, we present a fast, light-weight vector OLE (VOLE) protocol. Recent related work improving VOLE and BOLE efficiency have
In fact, our protocol most naturally achieves the more expressive mainly focused on obtaining optimizations from the Learning Parity
batch OLE (BOLE) functionality at little additional cost. This is in with Noise (LPN) assumption [6] and other related coding assump-
contrast to the LPN-based VOLE protocols of [3] and [31] which, tions. We separate prior work into two main categories based on
to the best of our knowledge, do not have such a straightforward their communication complexity. The first category consists of pro-
extension to BOLE. tocols that have communication complexity linear or super-linear
The main technical idea in our result is a way to achieve circuit in the length of the VOLE correlations they generate. More heuris-
privacy in a PAHE scheme which corresponds to sender security tically, these protocols are optimized for smaller VOLE correlations
in a (V/B)OLE protocol. In slightly more detail, wed like to ensure and tend to be very fast for these shorter VOLE lengths. The sec-
that the receiver who has the private key of the PAHE scheme and ond category consists of protocols with sublinear communication
𝑥, and obtains an evaluated ciphertext [[𝜸 ]] = [[𝜶 · 𝑥 + 𝜷]] learns complexity in the length of the VOLE correlations they generate.
𝜸 , but no other information about 𝜶 and 𝜷. The crucial difficulty Heuristically, these protocols are optimized for larger VOLE lengths
is that the noise contained in the PAHE evaluated ciphertext [[𝜸 ]] and tend to be slower than protocols in the first category for gener-
could reveal information about 𝜶 and 𝜷. ating shorter VOLE correlations.
The folklore idea to achieve circuit privacy is a simple technique Our protocol falls in the first category, so we compare directly
called noise-flooding [18]. That is, sample from a sufficiently wide with other protocols in this category. While our protocol can be
distribution, either Gaussian or uniform, so as to flood the noise extended to the more general batch-OLE functionality at minimal
contained in [[𝜸 ]]. In turn, to achieve sufficient statistical security, cost, we focus our comparisons on vector-OLE. Our main point
this forces us to work with inefficient multi-precision arithmetic. of comparison is the work of Applebaum et al. [3], although we
For example, the procedure to sample uniformly random numbers also compare our protocol against the subsequent works of Baum
from a sufficiently wide interval, using the NTL library [33], will et al. [5], Boyle et al. [10], and Weng et al. [34]. We note that [34]
take more than 100% of our sender run-time. Additionally, despite implements the vector-OLE functionality whereas our work as well
concrete (in)security concerns in practical scenarios [29], homomor- as [10? ] implement the more general batch-OLE functionality.
phic encryption libraries either do not seem to implement circuit The protocol of Applebaum et al. [3] implements a fast vector-
privacy (such as in SEAL [25, 32]) or implement a single-precision OLE protocol using assumptions over sparse linear codes that are
version of noise flooding which provides very limited statistical inspired by the LPN assumption but notably are not known to have
security (such as in PALISADE [30]). a reduction from the LPN assumption. Consequently, while this
Other approaches to circuit-private homomorphic encryption work achieves impressive performance, the security of their pro-
exist, but are much less efficient than what we can afford for such tocol is harder to quantify. In contrast, the security of our VOLE
a lightweight computation as in (V/B)OLE: garbled circuit-based protocol is based on the security of the Ring-LWE assumption,
techniques [19] requires computing and communicating a garbled which has recently been extensively benchmarked in the context
circuit for the PAHE decryption algorithm; FHE bootstrapping- of the NIST post-quantum cryptography standardization process
based techniques [15] are even less efficient; and techniques such and the homomorphic encryption standardization process [2]. In
as [7] only work for the GSW homomorphic encryption scheme particular, we base our parameter choices off of the homomorphic
which is also too inefficient for our purposes. encryption standard [2]. This widely endorsed standard signifi-
We present a simple way to achieve circuit privacy of PAHE, cantly strengthens the security claims about the parameter choices
and therefore construct a (V/B)OLE protocol, using a rounding for this protocol. In section 5.4, we show that even with these strong
operation that allows greater efficiency than the folklore noise- security claims, our protocol achieves efficiency that meets or ex-
flooding approach. The rounding operation and its rather simple ceeds the efficiency of [3] in many settings. In addition, our VOLE
analysis is inspired by similar techniques in the context of learning protocol gives a natural extension to BOLE.
with rounding (LWR) [4]. To compare our protocol against works in the second category,
In addition, we give an implementation of this protocol along we must first recognize that all protocols in the second category will
with performance benchmarks. This implementation aims to be asymptotically outperform ours, since eventually the linear com-
usable as a black-box in larger applications; an earlier version of munication complexity of our protocol will surpass the sublinear
this implementation was already used in the work of Hazay, Ishai, communication complexity of protocols in this category. The natu-
Marcedone, and Venkitasubramaniam [21] in CCS 2019. ral question to ask, then, is at what VOLE length does the sublinear
Furthermore, our protocol outperforms recent implementations protocol become faster than the linear protocol, and where do the
([31], CCS 2019) based on the learning parity with noise assump- practical applications of VOLE live vis-à-vis the crossover point?
tion, as described in [8], for VOLE dimension up to 𝑚 = 235 . In When comparing against protocols in this category, we estimate
other words, while [8, 31] will outperform our implementation this cross-over point and then show applications where the VOLE
asymptotically for huge 𝑚, the crossover point is quite far out. We lengths are less than this point. Our main point of comparison
describe the relation between the two approaches in more detail for protocols in this category is the work of Schoppmann, Gascón,
in Section 1.2, and provide a detailed performance comparison in Reichert, and Raykova [31], which is a two-party implementation
Section 5. of the function secret sharing (FSS) based work of Boyle, Couteau,
Gilboa, and Ishai [8] that constructs a VOLE protocol with sublinear
30
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
complexity. In section 5.4, we show that our protocol outperforms 2 BACKGROUND
[31] for smaller VOLE lengths, which is to be expected. We also In this section, we will briefly review the definitions of the homo-
show that the expected cross-over point of the performances of morphic encryption scheme used below as well as the mathematics
this protocol is large enough to allow potential applications that behind residue number system optimizations.
do not require VOLE correlations of greater length. In section 5.6,
we discuss one such application: securely evaluating convolutional 2.1 Notation
neural network layers for medical images.
The homomorphic encryption scheme used in our work is based
off of the Ring Learning with Errors problem [27], which is defined
Subsequent Work. Subsequent to the online publication of this over polynomial rings. In our instantiation, we use the ring
work [14], several additional works have been published presenting
various VOLE and BOLE protocols. We briefly compare against R = Z[𝑥]/(𝑥 𝑛 + 1) (1)
these works to show that our protocol is still competitive with the For the remainder of this work, 𝑛 will always be a power of two.
state-of-the-art. For a modulus 𝑞, let R𝑞 be R with all coefficients mod 𝑞.
The first work we compare against is the BOLE protocol of
Baum et al. [5]. This two-party protocol between Alice with a
R𝑞 = Z𝑞 [𝑥]/(𝑥 𝑛 + 1)
vector input 𝜶 and Bob with a vector input 𝜷 results in each party
holding an additive share of the component-wise product 𝜶 · 𝜷. For an integer 𝑛, we will denote the set {0, 1, 2, . . . , 𝑛 1} as [𝑛].
The main feature of this protocol is that it requires only one round For integers 𝑖𝑗, we denote the range {𝑖, 𝑖 + 1, . . . , 𝑗 1} as [𝑖 : 𝑗].
of simultaneous communication; Alice sends a message to Bob and We denote the rounding function ⌈·⌋ : R → Z that maps 𝑥𝑟
Bob simultaneously sends a message to Alice. Once these messages R to the closest integer 𝑥 ∈ Z. We denote the flooring function
have been received, the parties locally compute additive shares of ⌊·⌋ : R → Z that maps 𝑥𝑟 ∈ R to the closest integer 𝑥 ∈ Z such that
the component-wise product. While our protocol is two rounds, 𝑥𝑥𝑟 .
we show in section 5.4 that our protocol outperforms this one- For a positive integer 𝑏, we write the modular reduction opera-
round protocol in both total communication and computation time. tion 𝑐𝑎 mod 𝑏 as 𝑐 = [𝑎]𝑏 .
The intuition for why this is the case is because the one-round The norm notation || · || refers to the ℓ∞ norm, unless otherwise
BOLE protocol requires two iterations of the LPR-style rounding specified. For a polynomial 𝑎 with 𝑛 coefficients each mod 𝑞, we
technique, while our approach only performs this operation once. have the bound ||𝑎|| ≤ 𝑞.
Next, we compare against a series of works that fall into the We specify the base-2 logarithm by log.
category of asymptotically sublinear implementations. These im- We call a distribution whose samples have magnitude bounded
plementations are based on some flavor of the LPN assumption by 𝐵 a 𝐵-bounded distribution.
[6]. The majority of these works focus on VOLE, specifically a vari- We say that a function negl is negligible if for every constant
ant known as subfield VOLE, which we denote sVOLE. The sVOLE 𝑐 > 1 we have negl(𝑛) < 1/𝑛𝑐 for all sufficiently large 𝑛.
protocol is defined by a length 𝑛, a prime 𝑝, and an integer 𝑟 . The For two vectors 𝑣 and 𝑤 of length , denote the component-wise
receivers input is a value 𝑥 ∈ F𝑝 𝑟 , and the senders input is the product of 𝑣 and 𝑤 as 𝑣𝑤.
tuple (𝜶 , 𝜷) ∈ F𝑛𝑝 × F𝑛𝑝𝑟 . Note that the elements of the vector 𝜶 We will denote sampling from a distribution as ←
$
. We will denote
are drawn from the subfield F𝑝 rather than the full domain F𝑝 𝑟 . 𝑛𝑒𝑡
sending or receiving a message from a public channel as ←−−−.
The output of sVOLE takes the same format as VOLE, namely the
vector 𝛾 = 𝜶 · 𝑥 + 𝜷 ∈ F𝑝 𝑟 . We note that sVOLE can be the same as
VOLE when 𝑟 = 1. Building off of the works of Boyle et al. [8, 9], 2.2 Circuit Privacy
the fastest work (to our knowledge) implementing sVOLE is the For brevity, we move the defintion of a leveled homomorphic en-
work of Weng et al. and we briefly compare against this work in cryption scheme to appendix A.3
section 5.5.
Definition 2.1. Circuit Privacy ([22] definition 7, [7] definition
The subsequent work of Boyle et al. [10] also uses LPN-style
5.1)
assumptions to construct a sublinear protocol for BOLE. While this
Let E be a leveled homomorphic encryption scheme as defined in
protocol achieves very low communication, the estimated compu-
definition A.4. Define the following:
tational performance is not competitive with out protocol, as we
discuss in section 5.5. $
E.KeyGen(1𝜆 , 1𝐿 )
(sk, pk, evk) ←
$
ct𝑖
E.Encrypt(pk, 𝑚𝑖 ) 𝑖 ∈ [1 . . . 𝑘]
1.3 Road Map $
In section 2, we give some brief preliminaries and definitions neces- ct𝑖
E.EncryptSK(sk, 𝑚𝑖 ) 𝑖 ∈ [𝑘 + 1 . . . 𝑛]
sary in the explanations below. In section 3, we describe our circuit ⟨ct𝑖 ⟩ ≔ {ct𝑖 }𝑛𝑖=1
privacy transformation method and prove its security. In section 𝑚𝑜𝑢𝑡 = 𝑓 (𝑚 1, . . . , 𝑚𝑛 )
4, we give the full VOLE protocol, including a proof of security,
and in section 5 we present an implementation of this protocol and We say that E is 𝜖-circuit private for functions 𝑓 of depth 𝐿
analyze the performance. if there exists a PPT simulator algorithm Sim such that for all PPT
31
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
distinguishing algorithms D the following holds: in our implementation. Finally, we describe the full circuit privacy
h   i transformation, followed by a security proof.

Pr D E.Eval pk, evk, 𝑓 , ⟨ct𝑖 ⟩ , ⟨ct𝑖 ⟩, sk, pk, evk = 1
3.1 The Divide-and-Round Step
h    i We begin by presenting our procedure for performing an efficient
Pr D Sim 1 , pk, sk, evk, 𝑚𝑜𝑢𝑡 , ⟨ct𝑖 ⟩, sk, pk, evk = 1 divide-and-round operation over ciphertext elements. The benefits
of this operation include a decreased overall ciphertext size as well
≤𝜖
as destroying the information in the ciphertext error term that
In words, definition 2.1 says that the output of the evaluation could leak information about the computation.
algorithm of a circuit private leveled homomorphic encryption Let 𝑄 be the original ciphertext modulus of our scheme, and let
Î
scheme should be indistinguishable from a simulated output, where 𝑞𝑖 be the primes such that 𝑖 𝑞𝑖 = 𝑄. Given a ciphertext ct𝑄 , where
the simulator is given no information about the function 𝑓 used to the elements are in R𝑄 and the decryption operations are performed
compute the result ciphertext other than the function output. We over R𝑄 , the goal of this operation is to obtain a ciphertext ct𝑞0
can view the simulator in definition 2.1 as an alternate encryption that encrypts the same message. This new ciphertext ct𝑞0 will
procedure that produces a fresh ciphertext that is indistinguishable have elements in R𝑞0 and the decryption of this ciphertext will be
from the output of the real E.Eval algorithm. We only consider performed over R𝑞0 as well. Recall that we choose primes 𝑞𝑖 to fit in
functions 𝑓 that will result in E.Eval outputting a correct ciphertext. a standard machine word, so decryption of ct𝑞0 is far more efficient
This can be implemented by having both E.Eval and Sim output ⊥ than the decryption of ct𝑄 . In addition, the communication cost
for functions that are exceed the number of levels supported by E. of sending ct𝑞0 over a network is significantly less than sending
We will use the homomorphic encryption scheme of Brakerski, ct𝑄 . Note that this operation can only be correct if the plaintext
Fan, and Vercuteran [11, 17], denoted as the BFV scheme. modulus 𝑝 is sufficiently less than 𝑞 0 .
Let 𝑞 0 = 𝑄/𝑞 0 . Below, we write the ciphertext ct𝑄 and expand
2.3 Ring Expansion Factor out the terms to anticipate the division by 𝑞 0 .
In order to effectively choose parameters for our leveled homomor-  
phic encryption scheme, we must accurately upper bound the noise ct𝑄 = 𝑎, 𝑎 · 𝑠 + Δ𝑚 + 𝑒
 
= 𝑎 · 𝑞 0 + [𝑎]𝑞0 , (𝑎 · 𝑞 0 + [𝑎]𝑞0 ) · 𝑠 + Δ𝑚 + 𝑒
growth due to homomorphic operations. When multiplying two el-
ements of R𝑞 , we need an upper bound on the norm of the product !
by a function of the norms of the operands as well as properties of
R𝑞 itself. = 𝑎 · 𝑞 0 + [𝑎]𝑞0 , (𝑎 · 𝑞 0 + [𝑎]𝑞0 ) · 𝑠 + Δ𝑚 + 𝑒
Definition 2.2 (Ring Expansion Factor [18, 26]). The expan-
The ciphertext ct𝑞0 is obtained by dividing the two components of
sion factor 𝛿 R for a ring R is defined as follows:
ct𝑄 by 𝑞 0 .
||𝑎 · 𝑏 ||
𝛿 R = max $ % $ % $ %
𝑎,𝑏 ∈R ||𝑎|| · ||𝑏 || ct𝑄  𝑎 𝑎 · 𝑠 + Δ𝑚 + 𝑒 
ct𝑞0 = = ,
Lemma 2.1 (Ring Expansion Upper Bound). For a ring R = 𝑞 0 𝑞 0 𝑞 0
 𝑎 · 𝑞 + [𝑎]𝑞 (𝑎 · 𝑞 0 + [𝑎]𝑞0 ) · 𝑠 + Δ𝑚 + 𝑒 
$ % $ %
Z[𝑥]/(𝑥 𝑛 + 1), we can upper bound the ring expansion factor 𝛿 R for 0 0
the norm || · || by = ,
𝑞 0 𝑞 0 (2)
𝛿R ≤ 𝑛
[𝑎]𝑞0 · 𝑠 + Δ𝑚 + 𝑒 
$ %

Proof. In the worst case for the expansion of the norm || · || is = 𝑎 , 𝑎 · 𝑠 +
𝑞 0
when both polynomials 𝑎, 𝑏 ∈ R have coefficients of all the same  
magnitude. In this case, the maximum coefficient of the product will = 𝑎 , 𝑎 · 𝑠 + 𝑣
be 𝑛 times the product of the maximum coefficient of the operands.
Therefore, we have where
[𝑎]𝑞0 · 𝑠 + Δ𝑚 + 𝑒
$ %
||𝑎|| · ||𝑏 || = 𝑛 · ||𝑎 · 𝑏 || = 𝛿 R · ||𝑎 · 𝑏 || 𝑣= (3)
𝑞 0
is the new term in ct𝑞0 that equals Δ ′𝑚 + 𝑣 for a scaling factor Δ
Remark 2.1. The upper bound in lemma 2.1 extends to R𝑞 for any and error term 𝑣 . Note that we will analyze the whole term 𝑣 for
modulus 𝑞. the remainder of this section to show the security of the circuit
privacy procedure, and then we will split 𝑣 into Δ ′𝑚 + 𝑣 to show
3 OUR CIRCUIT PRIVACY APPROACH that this procedure produces a correct, decryptable ciphertext.
In this section, we describe our approach to obtaining circuit privacy. We will now begin to analyze this term 𝑣 in terms of the original
We will begin by describing the main operation of the circuit privacy error term 𝑒. Our goal will be to show that the distribution of
transformation, which is fundamentally a divide-then-round step. 𝑣 is statistically close to a distribution that is independent of 𝑒.
Then, we will describe how this operation is performed efficiently Once we have shown this, we will be able to construct a simulator
32
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
that samples an error term that is statistically close to 𝑣 but still Lemma 3.3. Let 𝑎 be an element of the polynomial ring R𝑞 =
independent of the original error term 𝑒. Z𝑞 [𝑥]/(𝑥 𝑛 + 1), let 𝑏 be a scalar in Z𝑞 , and let 𝐵 be a positive integer.
We begin by defining the event that must occur in order for We can bound the ring boundary event from definition 3.2 by the
⌊(𝑎 + 𝑐)/𝑏⌋ ≠ ⌊(𝑎 + 𝑐 + 𝑒)/𝑏⌋ for a small error term 𝑒 in terms of 𝑎 following:
and fixed scalar 𝑐. This event is intuitively very similar to the BAD Õ
Pr[BAD(𝑎, 𝑏; 𝐵)] ≤ Pr[BAD(𝑎𝑖 , 𝑏; 𝐵)] (10)
event in the analysis of learning with rounding (LWR) in [4].
𝑖 ∈ [𝑛]
Definition 3.1 (Boundary Event). Let 𝑎, 𝑏, 𝐵 be positive inte-
Proof. This follows directly from taking the union bound over
gers. Define the event BAD(𝑎, 𝑏; 𝐵) to be the following:
the events corresponding to the coefficients of 𝑎. □
BAD(𝑎, 𝑏; 𝐵) ≔ {[𝑎]𝑏 ∈ [0, 𝐵) [𝑏 𝐵, 𝑏)} (4)
Corollary 3.3.1. For a polynomial 𝑎 that is sampled over R𝑞
In words, equation 4 is the event that 𝑎 is within distance 𝐵 of a
such that each coefficient 𝑎𝑖 is uniform modulo 𝑏, we can bound the
multiple of 𝑏.
probability of the BAD event occurring by the following:
Lemma 3.1. Let 𝑎, 𝑏 be positive integers, and let 𝑒 be an integer Õ 𝐵
such that |𝑒 | ≤ 𝐵 where 𝐵 < 𝑏/2. Then the following relation holds: Pr[BAD(𝑎, 𝑏; 𝐵)] ≤ Pr[BAD(𝑎𝑖 , 𝑏; 𝐵)] = 2𝑛 (11)
𝑏
j𝑎 k j𝑎 + 𝑒 k 𝑖 ∈ [𝑛]
≠ =⇒ BAD(𝑎, 𝑏; 𝐵) (5)
𝑏 𝑏 Corollary 3.3.2. For a polynomial 𝑎 that is sampled uniformly
Proof. This proof is quite straightforward. If we write 𝑎 = 𝑘𝑏 +𝑟 over R𝑄 for 𝑄 = 𝑞 0 ·𝑞 0 and a small polynomial error term 𝑒 such that
where 𝑘 is an integer and 𝑟 ∈ [𝑏], we have that ||𝑒 || ≤ 𝐵 for a positive integer 𝐵 < 𝑞 0 /2, we can bound the following
j 𝑎 k j 𝑘𝑏 + 𝑟 k probability:
= =𝑘 (6) hj𝑎 + 𝑒 k j 𝑎 ki 𝐵
𝑏 𝑏 Pr ≠ ≤ Pr[BAD(𝑎, 𝑞 0 ; 𝐵)] ≤ 2𝑛
Since ⌊(𝑎+𝑒)/𝑏⌋ ≠ 𝑘, there are two possibilities. The first possibility 𝑎←R𝑄 𝑞 0 𝑞0 𝑞0
is that 𝑎 + 𝑒 = (𝑘 + 1)𝑏 + 𝑟 where 𝑟 ∈ [𝑏]. This occurs when We now return to equation 3 for the compressed term 𝑣 and give
𝑎 ∈ [(𝑘 + 1) · 𝑏 𝑒, (𝑘 + 1) · 𝑏) ⊆ [(𝑘 + 1) · 𝑏 𝐵, (𝑘 + 1) · 𝑏). a lower bound the probability that this term hides the original error
The second case is when 𝑎 + 𝑒 = (𝑘 1)𝑏 + 𝑟 . This occurs when term 𝑒.
𝑎 ∈ ((𝑘 1) · 𝑏, (𝑘 1) · 𝑏 + 𝑒] ⊆ ((𝑘 1) · 𝑏, (𝑘 1) · 𝑏 + 𝐵]. Both We begin by examining the numerator of this function and ob-
of these cases satisfy the condition of BAD(𝑎, 𝑏; 𝐵) occurring. □ serve that if the secret 𝑠 is invertible modulo 𝑞 0 , then the term
Corollary 3.1.1. Let 𝑎, 𝑏 be positive integers, and let 𝑒 be an [𝑎]𝑞0 · 𝑠 would define a bijection with domain and range [𝑞 0 ]. From
integer such that |𝑒 | ≤ 𝐵 and 𝐵 < 𝑏/2. Then the following relation this, we have the simple observation that a uniformly random input
holds: j𝑎 k j𝑎 + 𝑒 k to this bijection gives a uniformly random output over the same
¬BAD(𝑎, 𝑏; 𝐵) =⇒ = (7) range.
𝑏 𝑏 However, we cannot guarantee that 𝑠 is invertible, and we cannot
We will now bound the probability that the BAD event occurs restrict to the case where 𝑠 is invertible while still appealing to
for values of 𝑎 that are uniformly random. the security of RLWE. We handle this by observing that while
Lemma 3.2 (Boundary Event for Random Values). Let 𝑏 and the polynomial element 𝑎 · 𝑠 may not be random, the marginal
𝐵 be positive integers. Let 𝐴 be a distribution over the integers such distribution of each of the individual coefficients of 𝑎 · 𝑠 are still
that the distribution of [𝑎]𝑏 is uniformly random over [𝑏], where random. This is formalized in the following lemma.
𝑎𝐴. We can bound the probability of the BAD event occurring for Lemma 3.4. Fix an integer 𝑄 = 𝑞 0 · 𝑞 0 . Fix a non-zero 𝑠 ∈ R𝑄 =
an output of 𝐴 as follows: Z𝑄 [𝑥]/(𝑥 𝑛 + 1) and an index 𝑖 ∈ [𝑛]. Define the distribution 𝐴𝑖 to
2𝐵 produce samples by first sampling a truly random 𝑎 ← R𝑄 , then
Pr [BAD(𝑎, 𝑏; 𝐵)] ≤ (8)
𝑎←𝐴 𝑏 computing the product 𝑎 · 𝑠 modulo 𝑞 0 , then outputing the 𝑖 𝑡 coef-
Proof. If we write the value 𝑎𝐴 as 𝑎 = 𝑘 · 𝑏 + 𝑟 , we are ficient of this product. The output of this distribution is statistically
given that the distribution of 𝑟 = [𝑎]𝑏 is uniformly random over close to uniform over [𝑞 0 ].
[𝑏]. From definition 3.1, the BAD event occurs when 𝑟 falls into
Proof. We can write the 𝑖 𝑡 coefficient of 𝑎 · 𝑠 as the inner
the range [0, 𝐵) [𝑏 𝐵, 𝑏). This range is of size 2𝐵, so this occurs
product the coefficients of 𝑠 with a rotation of the coefficients of 𝑎.
with probability 2𝐵/𝑏. □
Since the coefficients of 𝑎 are uniformly random modulo 𝑞 0 , and 𝑠
We will now extend the definition of the boundary event to is non-zero, this inner product is statistically close to uniform over
elements over R𝑞 . [𝑞 0 ]. □
Definition 3.2 (Ring Boundary Event). Let 𝑎 be an element Corollary 3.4.1. Fix an integer 𝑄 = 𝑞 0 · 𝑞 0 . Fix a non-zero
of the polynomial ring R𝑞 = Z𝑞 [𝑥]/(𝑥 𝑛 + 1), let 𝑏 be a scalar in Z𝑞 , 𝑠 ∈ R𝑄 = Z𝑄 [𝑥]/(𝑥 𝑛 + 1) and a positive integer B. For a distribution
and let 𝐵 be a positive integer. We will define the boundary case for an of polynomials sampled uniformly over R𝑄 , we can upper bound the
element of R𝑞 to occur if the boundary case for any of its coefficients probability of the BAD event for the product 𝑎 · 𝑠 by the following:
occurs. Denote the 𝑖 𝑡 coefficient of 𝑎 as 𝑎𝑖 . h i 𝐵
Pr BAD(𝑎 · 𝑠, 𝑞 0 ; 𝐵) ≤ 2𝑛 (12)
BAD(𝑎, 𝑏; 𝐵) = {∃𝑖 ∈ [𝑛] such that BAD(𝑎𝑖 , 𝑏; 𝐵)} (9) 𝑎←R𝑄 𝑞 0
33
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
We now complete this analysis by applying corollary 3.4.1 to us to perform ciphertext compression with 𝑘 integer multiplications,
equation 3 for the final term 𝑣 to show that 𝑣 hides 𝑒. We will define 𝑘 floating point multiplications, 𝑘 floating point additions, and
one distribution that outputs 𝑣 as described in equation 3 and one 𝑘 + 1 modular reductions. Comparing with the naïve approach,
distribution that leaves out the error term 𝑒. The next lemma will this technique replaces the entire multi-precision divide-and-floor
give an upper bound that these two distributions output different operation with only the marginal cost increase of 𝑘 floating point
values. addition and multiplication operations versus 𝑘 integer addition
and multiplication operations. We refer the reader to [20] for further
Lemma 3.5. If ||𝑒 || ≤ 𝐵, the final term 𝑣 from equation 3 hides the
analysis of the benefits of the DCRT representations.
original error term 𝑒 with probability 2𝑛 𝑞𝐵 , where 𝑛 is the degree of
0
the polynomial modulus 𝑥 𝑛 + 1. 3.3 Circuit Privacy Operation
Proof. Define two distributions in terms of 𝑠, Δ𝑚, and 𝑒. Distri- We now give the full circuit privacy procedure and prove that its
bution 𝐷 1 produces samples by first sampling a uniformly random secure given a sufficiently small error term relative to the divisor 𝑞 0 .
$ The procedure, given in algorithm 1, is very simple: we add an en-
R𝑄 for 𝑄 = 𝑞 0 · 𝑞, then producing a sample of the form:
𝑎← cryption of zero and then perform the divide-and-round operation
[𝑎]𝑞0 · 𝑠 + Δ𝑚 + 𝑒
$ %
described in section 3.1.
𝑑1 =
𝑞 0
Algorithm 1 Circuit Privacy Procedure, denoted CP
This is identical to the real distribution of 𝑣. The second distribution Input: Ciphertext ct, Public Key pk
$
𝐷 2 produces samples by first sampling a uniformly random 𝑎 R𝑄 ct𝑧𝑒𝑟𝑜 ← Encrypt(pk, 0)
and produces a sample of the following form: ct+ = EvalAdd(ct
j k 𝑧𝑒𝑟𝑜 , ct)
[𝑎]𝑞0 · 𝑠 + Δ𝑚 ct𝑐𝑝 = ct
$ % +
𝑞 0
𝑑2 =
𝑞 0 Output: ct𝑐𝑝
We can bound the probability that any distinguisher can distin-
guish between the outputs of 𝐷 1 and 𝐷 2 by upper bounding the Algorithm 2 defines the full circuit-private evaluation algorithm
probability that these outputs are different. By corollary 3.4.1, we in terms of algorithm 1 and the original BFV Eval procedure.
can say that the marginal distribution of each coefficient of the
polynomial [𝑎]𝑞0 · 𝑠 + Δ𝑚 modulo 𝑞 0 is uniformly random, so this Algorithm 2 Circuit Private Eval Algorithm, denoted EvalCP
is upper bounded by 2𝑛 𝑞𝐵 . □ Input: Ciphertexts ⟨ct𝑖 ⟩ = {ct𝑖 }𝑛𝑖=0 , Public Key pk, Function 𝑓
0
ct𝑖𝑛 ← BFV.Eval(evk, 𝑓 , ⟨ct𝑖 ⟩)
3.2 Avoiding Multi-Precision Arithmetic ct𝑐𝑝
$
CP(ct𝑖𝑛 , pk)
The naïve approach to the above operation is to take an integer Output: ct𝑐𝑝
𝑥 in DCRT representation {𝑥 0, . . . , 𝑥𝑘1 }, recombine into a multi-
precision integer, then perform a multi-precision divide and floor
by 𝑞 0 , then take the result mod 𝑞 0 . In total, this requires 2𝑘 integer Finally we define BFV-CP to be a leveled homomorphic encryp-
multiplications, 𝑘 integer additions, 1 multi-precision divide-and- tion scheme specified by the tuple (BFV.KeyGen, BFV.Encrypt,
floor, and 𝑘 + 1 modular reductions for each coefficient. BFV.EncryptSK, BFV.Decrypt, EvalCP ).
To avoid multi-precision arithmetic required when operating We will now argue that if the noise level of the ciphertext gen-
over elements modulo the full ciphertext modulus, we make use erated by BFV.Eval subroutine within the EvalCP does not exceed
Î a pre-specified bound, BFV-CP is circuit private (Definition 2.1).
of the linearity of the DCRT recombination. Let 𝑄 = 𝑖 𝑞𝑖 , let
𝑞𝑖 = 𝑄/𝑞𝑖 , and let 𝑞˜𝑖 = 𝑞𝑖 (mod 𝑞𝑖 ). From this equation, we have The high-level proof strategy is to define three hybrids (HybridReal ,
the following expression for division by 𝑞 0 : HybridNoiseFree , HybridIdeal ) and show that they are indistinguish-
" # able from the point of view of a computationally bounded distin-
h𝑥 i 𝑘1
1  Õ
  guisher.
= [𝑥𝑖 · 𝑞˜𝑖 ]𝑞𝑖 · 𝑞𝑖 𝑣 · 𝑞 (13)
𝑞 0 𝑞0 𝑞 0 𝑖=0 Define D (ct , ⟨ct𝑖 ⟩, BFV.pk, BFV.sk, BFV.evk) → {0, 1} to be a
𝑞0
PPT distinguisher, where ct𝑖 are valid BFV encryptions of messages
𝑚𝑖 respectively. The domain of the first argument (ct ) is the set of
" 𝑘1
#
1 Õ

= [𝑥𝑖 · 𝑞˜𝑖 ]𝑞𝑖 · 𝑞𝑖 (14) BFV ciphertexts.
𝑞 0 𝑖=0
𝑞0
" 𝑘1 # • HybridReal
Õ 𝑞0 This hybrid corresponds to the real-world where the adver-
= [𝑥𝑖 · 𝑞˜𝑖 ]𝑞𝑖 · (15) sary tries to learn the function being computed from the
𝑖=0
𝑞𝑖
𝑞0
ciphertext that is generated by the evaluator. Concretely, we
 𝑞 0/𝑞𝑖
From equation 15 above, we have that only the terms 𝑞˜𝑖 and define ct ← EvalCP (⟨ct𝑖 ⟩, BFV.pk, 𝑓 ).
for all 𝑖 ∈ [𝑘] are needed to compute the desired term 𝑥/𝑞 0 𝑞 . By the correctness of BFV, the intermediate ciphertext (ct𝑖𝑛 )
0
Since these values only depend on the choice of factors of the computed in EvalCP is a valid encryption of 𝑓 (𝑚 1, . . . 𝑚𝑛 ).
ciphertext modulus, we can easily precompute them, which allows Represent this as ct𝑖𝑛 = (𝑎𝑖𝑛 , 𝑎𝑖𝑛 · 𝑠 + Δ𝑚 + 𝑒𝑖𝑛 ). Additionally,
34
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
represent pk = (𝑎 , 𝑎 𝑠 + 𝑒 ).1 The encryption of zero has the Given a Hybrid, let 𝑃 [Hybrid] denote a probability defined as
form, follows,
ct𝑧𝑒𝑟𝑜 = (𝑎 𝑢 + 𝑒 , 𝑎 𝑠𝑢 + 𝑒 𝑢 + 𝑒 ) Pr[Hybrid] = Pr [D (·, ⟨ct𝑖 ⟩, sk, pk, evk) = 1]
where 𝑢, 𝑒 , 𝑒 ← 𝜒. .
Hence, the ciphertext ct+ has the form, | |𝑒 | |
Lemma 3.6. If 𝑛 𝑞+ ≤ 2𝜆 for security parameter 𝜆, then no PPT
0
ct+ [0] = 𝑎𝑖𝑛 + 𝑎 𝑢 + 𝑒 = 𝑎 + distinguisher can distinguish Hybrid0 and Hybrid1 with probability
ct+ [1] = 𝑎𝑖𝑛 · 𝑠 + Δ𝑚 + 𝑒𝑖𝑛 + 𝑎 𝑠𝑢 + 𝑒 𝑢 + 𝑒 better than 2𝜆 .
= 𝑎 + · 𝑠 + Δ𝑚 + 𝑒 + | Pr[HybridReal ] Pr[HybridNoiseFree ]| ≤ 2𝜆
where 𝑒 + = 𝑒𝑖𝑛 + 𝑒 𝑢 + 𝑒 𝑠 · 𝑒 . Proof. We can upper bound the success of any PPT distin-
Performing the divide-and-floor operation during the CP pro- guisher 𝐷 from distinguishing the outputs of HybridReal and
cedure on ct+ results in a new ciphertext ct𝑞𝑢𝑜𝑡 = ⌊ct+ /𝑞 0 ⌋, HybridNoiseFree by the probability that the outputs of these hybrids
which simplifies to, are different. The only place where these hybrids differ is in the error
term that is added to the numerator. In equation 18 in HybridReal ,
j𝑎 k
+
ct𝑞𝑢𝑜𝑡 [0] = = 𝑎 + (16)
𝑞0 there is an 𝑒 + term that is added in the flooring numerator, while
j 𝑎 · 𝑠 + Δ𝑚 + 𝑒 k in equation 20 in hybrid HybridNoiseFree , this term is omitted.
+ +
ct𝑞𝑢𝑜𝑡 [1] = (17)
𝑞 0 To invoke lemma 3.5, we must show that each coefficient of
j [𝑎 + ]𝑞 · 𝑠 + Δ𝑚 + 𝑒 + k 𝑎 + is uniformly random modulo 𝑞 0 . This can be seen through a
= 𝑎 + · 𝑠 + 0
(18) straightforward application of the same argument used in the proof
𝑞 0 of lemma 3.5 on the coefficients of the product 𝑎 · 𝑢. Once we
where 𝑎 + = 𝑎 + · 𝑞 0 + [𝑎 + ]𝑞0 . show that these coefficients are random, the fixed offsets defined
Note that the ciphertext ct := ct𝑞𝑢𝑜𝑡 . by the coefficients of the rest of the terms in 𝑎 + do not change this
• HybridNoiseFree fact. Therefore, we can invoke lemma 3.5 on the 𝑣 term in hybrid
HybridReal to show that this term hides 𝑣 and is statistically close
In this hybrid, we define a new simulator that outputs a ci- to the distribution of the 𝑣 terms in HybridNoiseFree .
phertext for the distinguisher D, that is very similar to the □
HybridReal . The main difference is that this simulator tweaks Lemma 3.7. Assuming the hardness of RLWE𝑛,𝑞,𝜒 , no PPT dis-
the computation of ct𝑞𝑢𝑜𝑡 [1] to set the 𝑒 + term to zero. Con- tinguisher D can distinguish HybridNoiseFree from HybridIdeal with
cretely, we define a simulator Sim(1𝑙 , pk, evk, sk, 𝑓 , ⟨𝑐𝑡𝑖 ⟩) → non-negligible advantage.
ct , which recomputes all the quantities computed in the
| Pr[HybridNoiseFree ] Pr[HybridIdeal ]| ≤ negl(𝜆)
HybridReal . It then uses the secret key sk to compute 𝑒𝑖𝑛 .
Subsequently it computes 𝑒 + and which can be subtracted Proof. Assume a distinguisher D exists that can distinguish
from the ct+ [1]. HybridNoiseFree from HybridIdeal with probability at least 𝛿. We can
Thus, the output of this simulator ct simplifies to, use D to construct an adversary A that breaks the RLWE problem
j𝑎 k
+ with advantage 𝛿. Given an RLWE sample (𝑎, 𝑏), we can construct
ct [0] = = 𝑎 + (19) a key-pair and ciphertext such that the ciphertext will have the
𝑞0
j [𝑎 + ]𝑞 · 𝑠 + Δ𝑚 k form of a HybridNoiseFree sample if 𝑏 = 𝑎 · 𝑢 + 𝑒 and the form of a
ct [1] = 𝑎 + · 𝑠 + 0
(20) HybridIdeal sample if 𝑏 is uniform over R𝑞 .
𝑞 0 $
Given an RLWE sample (𝑎, 𝑏), begin by sampling 𝑠 , 𝑒 𝜒 and
• HybridIdeal set sk = 𝑠 and pk = (𝑎, 𝑎𝑠 + 𝑒 ) = (pk0, pk1 ). Next, set 𝑟 = 𝑏 and
This hybrid corresponds to the ideal world where the adver- compute ct𝑞𝑢𝑜𝑡 as in equation 22. If 𝑏 = 𝑎𝑢 + 𝑒 then 𝑟 is identically
sary only has access to the function output 𝑚 = 𝑓 (𝑚 1 . . . 𝑚𝑛 ) distributed to 𝑎 + in HybridNoiseFree and if 𝑏 is uniform then 𝑟 is
but has no access to the function 𝑓 . We define a new simula- identically distributed to HybridIdeal . Therefore, any PPT distin-
tor Sim(1𝑙 , pk, evk, sk, 𝑚) → ct whose output will be sent guisher D that can distinguish HybridNoiseFree from HybridIdeal
to the distinguisher D. Note that this is the same simulator can solve decisional RLWE with the same probability. □
required by definition 2.1.
The simulator samples a uniformly polynomial 𝑟 from R𝑞 . It Theorem 3.8 (Circuit Privacy). Given that the input ciphertext
then outputs, and public key are well-formed, the input ciphertext error has magni-
j𝑟 k tude bounded by 𝐵 and the error distribution 𝜒 is 𝐵-bounded, and
ct [0] = = 𝑟 (21) for a security parameter 𝜆 we have
𝑞0
j [𝑟 ]𝑞 · 𝑠 + Δ𝑚 k 2𝑛𝐵 2 + 𝐵 + 𝐵
2𝑛 ≤ 2𝜆 (23)
ct [1] = 𝑟 · 𝑠 + 0
(22) 𝑞 0
𝑞 0
1 Since we are performing a simple homomorphic operation without homomorphic then algorithm 2 achieves 2𝜆 -circuit privacy as defined in definition
multiplications or rotations, it is sufficient for evk to simply be the public key pk. 2.1.
35
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
Proof. This follows by combining lemmas 3.6 and 3.7, since Algorithm 3 Receiver VOLE Protocol
HybridReal is equivalent to the distribution of outputs of algorithm Input: 𝑥 ∈ Z𝑝 , Secret Key sk ∈ R𝑞
2 and HybridIdeal is a distribution that is independent of the circuit $
used to compute the final message. The magnitude of 𝑒 + in lemma ct𝑖𝑛
Encrypt(sk, 𝑥)
𝑛𝑒𝑡
3.6 is 𝐵 plus the magnitude of the error of a public key encryption Sender ←−−− ct𝑖𝑛
(𝑖) 𝑛𝑒𝑡
scheme using error distribution 𝜒. The bound in equation 23 follows. {ct𝑟𝑒𝑠 }𝜏𝑖=1 ←−−− Sender
Therefore, algorithm 2 achieves 2𝜆 -circuit privacy as defined in (𝑖)
{𝜸 (𝑖) }𝜏𝑖=1 ← Decrypt(sk, ct𝑟𝑒𝑠 )
definition 2.1. □
Output: 𝜸 = (𝜸 (1) , 𝜸 (2) , . . . , 𝜸 (𝜏) )
3.4 Correctness
In this subsection, we give the correctness of our circuit privacy Algorithm 4 Sender VOLE Protocol
procedure. Note that since weve already shown the security of this Input: VOLE inputs 𝜶 , 𝜷 ∈ Z𝑚
𝑝 , Public Key pk
operation, we dont have to worry about equivalent expressions 𝑛𝑒𝑡
leaking information about the original error term. ct𝑖𝑛 ←−−− Receiver
We begin with the expression for the ciphertext quotient. for 𝑖 from 1 to 𝜏 do
(𝑖)
ct𝑜𝑙𝑒 ← EvalAddPlain(EvalMultPlain(ct𝑖𝑛 , 𝜶 (𝑖) ), 𝜷 (𝑖) )
[𝑎]𝑞0 · 𝑠 + Δ𝑚 + 𝑒 
$ %
   (𝑖) (𝑖)
ct = 𝑎 , 𝑎 · 𝑠 + 𝑣 = 𝑎 , 𝑎 · 𝑠 + ct𝑐𝑝 ← CP(ct𝑜𝑙𝑒 , pk)
𝑞 0 end for
𝑛𝑒𝑡 (𝑖)
Receiver ←−−− ct𝑐𝑝 = {ct𝑐𝑝 }𝜏𝑖=1
$ %
 [𝑎]𝑞0 · 𝑠 + 𝑒 1 𝑄 [𝑄] 𝑝 
= 𝑎 , 𝑎 ·𝑠 + + ( )𝑚 Output: ⊥
𝑞0 𝑞0 𝑝 𝑝
$ %
 [𝑎]𝑞0 · 𝑠 + 𝑒 𝑞0 [𝑄] 𝑝 
(24)
= 𝑎 , 𝑎 ·𝑠 + + ( )𝑚
𝑞 0 𝑝 𝑞 0 · 𝑝 In sections 4.1 and 4.2, we show the security and correctness of
the VOLE protocol defined by algorithms 3 and 4.
$ [𝑄 ] %
 j𝑞 k [𝑎]𝑞0 · 𝑠 𝑝 𝑝 𝑚 + 𝑒 
0
= 𝑎 , 𝑎 ·𝑠 +
𝑝
𝑚 + 𝑒𝑓 +
𝑞 0
4.1 Security
  We will now argue the security of the VOLE protocol.
= 𝑎 , 𝑎 · 𝑠 + Δ ′𝑚 + 𝑒 𝑓 + 𝑣
Lemma 4.1 (Security Against Sender). If the parameters of
where Δ = ⌊𝑞 0 /𝑝⌋ and 𝑒 𝑓 is the small error term (||𝑒 𝑓 || ≤ 1) the homomorphic encryption scheme are chose to satisfy semantic
introduced by removing ⌊𝑞 0 /𝑝⌋𝑚 from the flooring term. In section security, algorithm 3 achieves security against an honest-but-curious
4, we will give concrete parameters that result in sufficiently small sender.
errors to allow for decryption correctness, but at a high level this Proof. This follows directly from the semantic security of the
ciphertext is decryptable if encryption scheme. □
Δ /2 > ||𝑣 + 𝑒 𝑓 ||
The security against a semi-honest receiver is dependent on the
choosing parameters that satisfy the constraints given in theorem
4 OUR VOLE PROTOCOL 3.8.
In this section, we give our VOLE protocol. Let 𝑚 be the length
of the VOLE vectors and let 𝑛 be the dimension of the ring R = Lemma 4.2 (VOLE Circuit Privacy Parameter). Let 𝜒 be a
Z[𝑥]/(𝑥 𝑛 + 1). Our VOLE protocol splits the vectors in Z𝑚 𝐵-bounded error distribution. For error distribution 𝜒, polynomial
𝑝 over
𝜏 = ⌈𝑚/𝑛⌉ elements of R𝑝 , padding with zeros when necessary. The degree 𝑛, plaintext modulus 𝑝, ciphertext modulus 𝑞 = 𝑞 0 · 𝑞 0 , and
sender then uses homomorphic encryption operations to compute security parameter 𝜆, the following parameter bound will result in
the VOLE result, which is decrypted by the receiver. the homomorphic evaluation in algorithm 4 to maintain 2𝜆 -circuit
In more detail, the receiver begins the protocol by encoding the privacy.
input 𝑥 ∈ Z𝑝 in a single BFV ciphertext ct𝑥 , which it sends to the 2𝑛𝐵 2 + 𝐵 + 𝑛𝑝𝐵
2𝑛 ≤ 2𝜆 (25)
sender. The senders inputs 𝜶 , 𝜷 ∈ Z𝑚 𝑝 are split into 𝜏 elements 𝑞 0
(𝜶 , 𝜶 , . . . , 𝜶 ) and (𝜷 , 𝜷 , . . . , 𝜷 (𝜏) ). For each of the 𝜏
(1) (2) (𝜏) (1) (2)
Proof. This follows from theorem 3.8 if we show that the input
blocks, the sender computes a ciphertext to the circuit privacy procedure has an error term with magnitude
ct (𝑖) = EvalAddPlain(EvalMultPlain(ct𝑥 , 𝜶 (𝑖) ), 𝜷 (𝑖) ) bounded by 𝑛𝑝𝐵. This follows directly from the analysis of the error
growth of the EvalMultPlain operation. The encrypted input to the
The sender then performs the circuit privacy procedure from section EvalMultPlain function in algorithm 4 is a fresh encryption of the
3 and returns each of the resulting ciphertexts to the receiver. The receivers input which has an error term with magnitude upper
receiver then decrypts the 𝜏 blocks to obtain the VOLE result. bounded by 𝐵, so the magnitude of the resulting error term is no
The pseudo-code for the sender and receiver roles are given in more than 𝑛𝑝𝐵. The bound in equation 25 follows from theorem
algorithms 3 and 4. 3.8 where 𝐵 = 𝑛𝑝𝐵. □
36
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
Lemma 4.3 (Security Against Receiver). Given that the param- Algorithm 5 Receiver BOLE Protocol
eters of the homomorphic encryption scheme are chosen to satisfy the Input: x ∈ Z𝑚 𝑝 , Secret Key sk ∈ R𝑞
circuit privacy constraint in lemma 4.2, algorithm 4 achieves security
Let (x , x (2) , . . . , x (𝜏) ) ← x, where 𝜏 = ⌈𝑚/𝑛⌉
(1)
against an honest-but-curious receiver.
for 𝑖 from 1 to 𝜏 do
(𝑖) $
Proof. The proof of this lemma relies on the circuit privacy ct𝑖𝑛 Encrypt(sk, x (𝑖) )
result of theorem 3.8. Note that we crucially rely on the receivers end for
𝑛𝑒𝑡 (𝑖)
initial ciphertext being well-formed. If this is the case, then a simula- Sender ←−−− {ct𝑖𝑛 }𝜏𝑖=1
tor S can use the simulator from theorem 3.8 to return a ciphertext (𝑖) 𝑛𝑒𝑡
{ct𝑟𝑒𝑠 }𝜏𝑖=1 ←−−− Sender
to the receiver that is independent of the circuit used to compute (𝑖)
the final message 𝜸 by sampling a ciphertext as in Hybrid2 from {𝜸 (𝑖) }𝜏𝑖=1 ← Decrypt(sk, ct𝑟𝑒𝑠 )
section 3. Since S knows the correct output message 𝜸 , this ci- Output: 𝜸 = (𝜸 (1) , 𝜸 (2) , . . . , 𝜸 (𝜏) )
phertext is indistinguishable from the output of the circuit privacy
procedure in the real algorithm 4. Therefore, by theorem 3.8, the
message produced by algorithm 4 is indistinguishable from the dimension 𝑛. These parameter sets must satisfy both the computa-
output produced by the simulator that samples a ciphertext from tional security requirements of the RLWE problem as well as the
Hybrid2 . □ security requirements of the circuit-privacy procedure. In the VOLE
protocol, there are no restrictions on the plaintext modulus 𝑝 other
Theorem 4.4 (Secure VOLE Protocol). Algorithms 3 and 4 than its size, so we opt for powers of two for ease of comparison. In
define of a secure VOLE protocol. the BOLE protocol, the plaintext modulus must support the NTT
operation to encode the vectors as polynomials for component-wise
Proof. This follows from combining lemmas 4.1 and 4.3. □ operations. This requires that Z𝑝 contain a 2𝑛 th root of unity.
From the previous section, we have several constraints on the
4.2 Correctness parameters for the protocol to satisfy both security and correctness.
We will now give parameters for which the VOLE protocol is correct. (1) For circuit privacy parameter 𝜆𝑐𝑝 , lemma 4.2 gives the fol-
lowing lower bound on 𝑞 0 .
Lemma 4.5 (Correctness). Let 𝜒 be a 𝐵-bounded error distribu-
tion. For error distribution 𝜒, polynomial degree 𝑛, plaintext modulus 2𝑛𝐵 2 + 𝐵 + 𝑛𝑝𝐵
𝑝, and ciphertext modulus 𝑞 = 𝑞 0 · 𝑞 0 . Algorithms 3 and 4 achieve 1𝑛 ≥ 1 2𝜆𝑐𝑝
𝑞 0
the correct VOLE functionality as defined in definition A.1 if 𝑞 0 is
chosen to satisfy the circuit privacy constraint from lemma 4.2 and 𝑞 0 ≥ 2𝜆𝑐𝑝 · 𝑛(2𝑛𝐵 2 + 𝐵 + 𝑛𝑝𝐵)
𝑞 0 > 2𝑝𝑛𝐵 + 2𝑝 + [𝑞 0 ] 𝑝 , where 𝑣 is the compressed error term from (2) For correctness, lemma 4.5 gives the following lower bound
equation 3. on 𝑞 0 .
𝑞 0 > 2𝑝𝑛𝐵
Proof. This follows directly from the BFV scheme. □
(3) Finally, for semantic security, we must choose 𝑛 and 𝑞 to
satisfy the desired security level. These parameters are given
4.3 Simple Extension to Batched OLE
in table 1 in the Homomorphic Encryption Standard [2]. In
The extension of this VOLE protocol to a BOLE protocol is straight- particular, this table gives a maximum size of 𝑞 for a given
forward and preserves the security proofs of section 4.1. In addition, value of 𝑛. We must select an 𝑛 that is large enough for our
this BOLE extension maintains the exact communication complex- choice of 𝑞.
ity of the VOLE protocol. Only the receivers role is modified in this
Since we have competing constraints on 𝑞, it is not immediately
extension. We show the receivers BOLE procedure in algorithm 5.
clear that we can pick parameters that will be able to satisfy all
The receiver begins with a vector x of values, which she encodes
these constraints. Luckily, there are many parameter sets that satisfy
as a polynomial as described in the full version of this paper. The
these conditions.
operations performed by the sender are identical. To obtain the
Our implementation of the discrete Gaussian sampling algorithm
BOLE output, the receiver must decode the decryption result by
follows [16], which takes advantage of the small standard deviation
evaluating the resulting polynomial at the roots of unity determined
to simply compute the probability of sampling all of the most com-
by the encoding NTT parameters.
mon integers. This truncated distribution is statistically close to the
real discrete Gaussian distribution while still allowing us to bound
5 IMPLEMENTATION & PERFORMANCE
the maximum size of a sampled term. Following the homomorphic
In this section, we give an implementation of the VOLE and BOLE encryption standard, we use a standard deviation of 𝜎 = 3.2. Includ-
protocols presented in section 4. ing the sign bit, no term sampled from 𝜒 is greater than five bits,
so 𝐵 = 32 for all our parameter sets.
5.1 Parameter Selection In most of the parameter sets in this section, each limb of the
In both of these protocols, there are four parameters that must se- ciphertext modulus will be 55 bits. For a circuit privacy parameter of
lected: the magnitude bound 𝐵 of the 𝐵-bounded error distribution 𝜆𝑐𝑝 = 80, we will use a ciphertext modulus consisting of four limbs
𝜒, the plaintext modulus 𝑝, the ciphertext modulus 𝑞, and the ring for a total of 220 bits. This is the maximum size of the modulus for
37
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
𝑛 = 213 in table 1 of the HE standard [2]. For a plaintext modulus
𝑝 = 232 , these parameters satisfy the three constraints given above.
5.2 Extending to Larger Moduli
There are some applications which require moduli that are larger
than can be supported by a final 𝑞 0 that fits in a standard machine
word. We handle these moduli by using the same DCRT represen-
tation as the ciphertext modulus. In words, for a plaintext modulus
𝑝 that is too large for a standard machine word, we represent 𝑝 as
Îℓ
a product of primes 𝑝 = 𝑖=0 𝑝𝑖 where each 𝑝𝑖 is small enough Figure 1: Time per OLE multiplication vs. VOLE length compared
to be supported by a 𝑞 0 that fits in a standard machine word for to [3] (denoted ADI+17). Times are measured in microseconds, and
the given circuit privacy parameter. This allows us to extend the the VOLE protocol is over a 32-bit field. The runtime for the noise-
support of the protocol in section 4 to larger moduli with a factor flooding variant of our protocol was estimated by implementing the
of overhead. noise sampling function using NTL [33] and adding this sampling
time to our protocol time.
5.3 Optimizing for Low-Bandwidth Networks
In section 5.4, we will benchmark our implementation in an envi-
ronment with low network bandwidth. We define this as a setting
where the network bandwidth is the bottleneck of the protocol. To
reduce the overall communication of our protocol, we include in
our implementation a common optimization to reduce the size of a
fresh Ring LWE encryption by a factor of two. The idea is simple:
instead of sampling a random 𝑎 polynomial for a ciphertext (𝑎, 𝑏),
sample a random PRG seed 𝜎 and generate 𝑎𝑃𝑅𝐺 (𝜎) to be the
output of the PRG. The remainder of the ciphertext is generated
as if 𝑎 were sampled normally, but when it comes time to send
the ciphertext we only need to send the pair (𝜎, 𝑏), replacing the Figure 2: Time for a single run of the VOLE protocol vs. VOLE
element of R𝑞 with a 16 byte PRG seed. length compared to [3] (denoted ADI+17). Times are mea-
sured in seconds, and the VOLE protocol is over a 32-bit field.
5.4 Experimental Setup & Results
We implement2 the VOLE and BOLE protocols from section 4 on slower than the machine used for the benchmarks in [3], but this
top of an implementation of the BFV [11, 17] homomorphic en- is not considered in our comparisons (all numbers are reported
cryption scheme based on the BFV-RNS implementation of Halevi, without scaling from [3]).
Polyakov, and Shoup [20] in addition to code from the work of Using this setup, we took two types of benchmarks. The first
Juvekar, Vaikuntanathan, and Chandrakasan [24]. We use an NTT benchmark is of the protocol running between the two machines
implementation based off of the NFLlib library of Aguilar-Melchor, using all of the threads. Based on this runtime, we divided by the
Barrier, Guelton, Guinet, Killijian, and Lepoint [1]. length of the VOLE protocol to get the time per OLE multiplication.
Asymptotically Linear Comparisons. As mentioned in section We then ran the protocol once using a single thread per machine
1.2, we separate prior work into two categories and compare to to measure the latency, the time that a higher level application
each category separately. The first category consists of protocols must wait for the protocol to complete. We compare these results
with (asymptotically) linear communication complexity, and, more against the numbers given in [3] (see tables 2 and 3 in [3]) for their
heuristically, protocols that are optimized for small VOLE lengths. optimized 32 bit protocol. These comparisons are given in figures
The fastest protocol in this category that we are aware of is the 1 and 2. We note that the trend lines for the [3] protocol do not
work of Applebaum, Dåmgard, Ishai, Nielsen, and Zichron [3]. To consider the increase in communication of the protocol, which
compare against [3], we ran our protocol on two AWS m5.2xlarge could result in further slowdown in settings where the bandwidth
instances, each having 8 vCPUs at 3.1 GHz and 32 GB of RAM. is constrained. We also note that this graph includes an estimated
These instances were in the same geographic region (US-East) and benchmark for the time per OLE for our protocol if the circuit
were connected by a network with a bandwidth of 500 MB/sec. This privacy analysis from section 3 is not considered and noise flooding
setup was intended to replicate the conditions of the experiments is used to achieve circuit privacy. This runtime is estimated by
of [3] (see section 6.5). Our network bandwidth is much lower than implementing the sampling function for the noise flooding term in
in their setup which favors [3] as their communication complexity NTL [33], then adding the time for this sampling algorithm to our
does not exceed the bandwidth of the network we used. We employ runtime.
the communication optimization described in section 5.3. We also Finally, we measured the communication complexity of a sin-
note that the clock speed of the CPU of our machine is slightly gle run of our VOLE protocol and compared it to estimates of the
communication complexity of [3] based on the reported consumed
2 https://github.com/leodec/ole_wahc bandwidth for their protocol. This comparison is displayed in figure
38
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
Figure 4: Time for a single run of the VOLE protocol vs VOLE
Figure 3: Total communication in kilobytes for a single run length compared to [31] (denoted SGRR19). The runtime is
of the VOLE protocol vs. VOLE length compared to [3] (de- measured in seconds, and the VOLE is over a 32-bit field.
noted ADI+17). The VOLE is over a 32-bit field. Note that the Both axes are shown in log scale to better display the trends.
slope of the ADI+17 trend line is greater than the slope of our Based on extrapolation, we estimate the intersection point
protocols trend line. We estimate the cross-over point in the in the runtimes to be near a VOLE length of 235 .
communication complexity to be a VOLE length of around
216 .
3. For small VOLE, the protocol of [3] achieves better communica-
tion complexity than our protocol for VOLE lengths less than 216 ,
although beyond this length our protocol achieves better commu-
nication complexity.
Asymptotically Sublinear Comparisons. The second category of
prior art consists of protocols with sub-linear communication com-
plexity. These protocols are designed to be optimized for large
VOLE lengths, and due the sub-linear communication they will
Figure 5: Total communication for a single run of the VOLE
always be asymptotically faster than our protocol. Despite this, our
protocol vs VOLE length compared to [31] (denoted SGRR19).
protocol is faster for smaller VOLE lengths, and our comparison
The communication is measured in megabytes, and the
aims to determine VOLE length at which one protocol becomes
VOLE is over a 32-bit field. Both axes are shown in log scale
faster than the other. By determining this point, we are able to
to better display the trends. Based on extrapolation, we esti-
discuss applications for which our protocol is more efficient and
mate the intersection point in the communication to be near
when it makes sense to switch to a protocol optimized for larger
a VOLE length of 232 .
VOLE sizes. To our knowledge, the fastest protocol in this category
is the work of Schoppmann, Gascón, Reichert, and Raykova [31],
which is an optimized two-party implementation of the FSS-based is able to generate arbitrary VOLE correlations directly. While there
sub-linear vector VOLE [8]. is a simple and secure reduction from random VOLE to arbitrary
Since the authors of [31] included their code, we were able to VOLE, it is an interesting future direction to try to further optimize
run side-by-side comparisons between our protocol and theirs. our protocol for generating random VOLE correlations.
Following the setup from their work, we ran each protocol on a
single thread on two machines connected by a network with 500 5.5 Comparison Against Subsequent Works
MB/sec bandwidth. The results of these comparisons are given in
In this section, we briefly compare against works subsequent to the
figures 4 and 5. By extrapolating the data in figure 4, we estimate the
online publication of this work.
the cross-over point is near a VOLE length of 235 . Extrapolating the
communication complexity trends in figure 5 gives an approximate Comparisons to Asymptotically Linear Protocols. We compare
cross-over point near a VOLE length of 232 . In section 5.6, we give against the one-round BOLE protocol of Baum et al. [5]. This work
some applications that require VOLE protocols of width less than presents several different BOLE protocols, and we compare against
this cross-over point. the version the authors chose to benchmark. This protocol assumes
We note briefly that the protocol in [31] requires running a that the two parties each begin with the public key of the other
smaller VOLE protocol that it then expands using an LPN code. party, and we do not count this key exchange as part of the com-
When generating an exceptionally long VOLE correlation, one could munication of this protocol. Aside from the round complexity, our
run a protocol that recursively calls [31] until it reaches a point BOLE protocol outperforms the one-round protocol in both com-
where our protocol is faster, then runs our protocol as the base munication and computation. To compare this one-round protocol
case. It is an interesting future direction to combine our protocol to our C++ implementation, we implemented the one-round BOLE
with [31] to improve the performance of VOLE with sub-linear protocol using the same underlying RLWE library as our BOLE
communication. protocol. Based on these comparisons, our protocol is a 20-30×
We also note that both protocols in this section focus on optimiz- faster computationally in experiments where each party is running
ing the generation of random VOLE correlations while our protocol in a single thread connecting over localhost. Our benchmarks
39
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
for the one-round protocol for a 60-bit plaintext was less than one PALISADE library. We gratefully acknowledge funding from the
second, faster than Golang benchmark given in Baum et al. [5] with Jameel-Clinic and from the National Science Foundation.
the same parameters, which was roughly 1.5 seconds. Our total
communication is strictly less than the one-round protocol, since
one of their ciphertexts requires essentially the same LPR overhead REFERENCES
as ours, while the other ciphertext that they send requires double [1] Carlos Aguilar-Melchor, Joris Barrier, Serge Guelton, Adrien Guinet, Marc-Olivier
the LPR overhead. Killijian, and Tancrède Lepoint. 2016. NFLlib: NTT-Based Fast Lattice Library.
In Topics in Cryptology - CT-RSA 2016, Kazue Sako (Ed.). Springer International
Comparisons to Asymptotically Sublinear Protocols. To compare Publishing, Cham, 341356.
to the subsequent VOLE protocol of Weng et al. [34], we note that [2] Martin Albrecht, Melissa Chase, Hao Chen, Jintai Ding, Shafi Goldwasser, Sergey
Gorbunov, Shai Halevi, Jeffrey Hoffstein, Kim Laine, Kristin Lauter, Satya Lokam,
this work claims a roughly 20× computational speedup over Schopp- Daniele Micciancio, Dustin Moody, Travis Morrison, Amit Sahai, and Vinod
mann et al. [31]. We can extrapolate this comparison to estimate Vaikuntanathan. 2018. Homomorphic Encryption Security Standard. Technical
the crossover point for the Weng et al. protocol, which is roughly a Report. HomomorphicEncryption.org, Toronto, Canada.
[3] Benny Applebaum, Ivan Damgård, Yuval Ishai, Michael Nielsen, and Lior Zichron.
VOLE length of 228 . For communication, the Weng et al. protocol a 2017. Secure Arithmetic Computation with Constant Computational Overhead.
setup cost of 1.1 MB and subsequent communication of 0.42 bits In Advances in Cryptology CRYPTO 2017, Jonathan Katz and Hovav Shacham
per VOLE correlation puts the crossover point with our protocol to (Eds.). Springer International Publishing, Cham, 223254.
[4] Abhishek Banerjee, Chris Peikert, and Alon Rosen. 2012. Pseudorandom
be near a VOLE length of 215 . Functions and Lattices. In Advances in Cryptology EUROCRYPT 2012, David
To compare against the BOLE protocol of Boyle et al. [10], we Pointcheval and Thomas Johansson (Eds.). Springer Berlin Heidelberg, Berlin,
Heidelberg, 719737.
first note that the work does not give the benchmarks of an actual [5] Carsten Baum, Daniel Escudero, Alberto Pedrouzo-Ulloa, Peter Scholl, and
implementation, only estimates of the computation time. They give Juan Ramón Troncoso-Pastoriza. 2020. Efficient Protocols for Oblivious Lin-
various parameter sets to allow for a communication-computation ear Function Evaluation from Ring-LWE. In Security and Cryptography for Net-
works, Clemente Galdi and Vladimir Kolesnikov (Eds.). Springer International
trade-off. For 128 bits of security, The estimates they provide for Publishing, Cham, 130149.
generating 220 batch OLE correlations give performance that ranges [6] Avrim Blum, Adam Tauman Kalai, and Hal Wasserman. 2003. Noise-Tolerant
from roughly 13 seconds with nearly 18 MB of communication to Learning, the Parity Problem, and the Statistical Query Model. Journal of the ACM
(JACM) 50, 4 (July 2003), 506519. https://www.microsoft.com/en-us/research/
nearly 20 seconds with communication roughly 1.5 MB. The plain- publication/noise-tolerant-learning-parity-problem-statistical-query-model/
text modulus of these BOLE correlations was roughly 2124 , and [7] Florian Bourse, Rafaël Del Pino, Michele Minelli, and Hoeteck Wee. 2016. FHE
Circuit Privacy Almost for Free. In Advances in Cryptology CRYPTO 2016,
while our BOLE protocol is not optimized for these large moduli, we Matthew Robshaw and Jonathan Katz (Eds.). Springer Berlin Heidelberg, Berlin,
can estimate the performance for a plaintext modulus of this size by Heidelberg, 6289.
generating four 32-bit BOLE correlations for each 124-correlation [8] Elette Boyle, Geoffroy Couteau, Niv Gilboa, and Yuval Ishai. 2018. Compressing
Vector OLE. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and
in [10] (this simulates a CRT splitting of a 128-bit plaintext mod- Communications Security (Toronto, Canada) (CCS 18). Association for Comput-
ulus). Our benchmarks for 222 BOLE correlations with a 32-bit ing Machinery, New York, NY, USA, 896912. https://doi.org/10.1145/3243734.
plaintext modulus give a computation time of 3 seconds and a total 3243868
[9] Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, Peter Rindal,
communication of about 200 MB. and Peter Scholl. 2019. Efficient Two-Round OT Extension and Silent Non-
Interactive Secure Computation. In Proceedings of the 2019 ACM SIGSAC Con-
5.6 Application: Secure Image Convolution ference on Computer and Communications Security (London, United Kingdom)
(CCS 19). Association for Computing Machinery, New York, NY, USA, 291308.
There are many applications for a VOLE protocol, including many https://doi.org/10.1145/3319535.3354255
that do not require VOLE correlations of length greater than the [10] Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, and Peter
cross-over points discussed in section 5.4. One concrete example Scholl. 2020. Efficient Pseudorandom Correlation Generators from Ring-LPN.
In Advances in Cryptology CRYPTO 2020, Daniele Micciancio and Thomas
is image convolution for secure neural network evaluation. The Ristenpart (Eds.). Springer International Publishing, Cham, 387416.
canonical image convolution operations takes in a two-dimensional [11] Zvika Brakerski. 2012. Fully Homomorphic Encryption without Modulus Switch-
image and a small two-dimensional kernel and produces an output ing from Classical GapSVP. Proceedings of Advances in Cryptology-Crypto 7417
(08 2012). https://doi.org/10.1007/978-3-642-32009-5_50
image where each output pixel is the sum of the component-wise [12] Zvika Brakerski and Vinod Vaikuntanathan. 2011. Efficient Fully Homomorphic
product of the kernel and a region of input image. While one can Encryption from (Standard) LWE. In IEEE 52nd Annual Symposium on Foundations
of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, Rafail
describe this operation using two-dimensional Fourier transforms, Ostrovsky (Ed.). IEEE Computer Society, 97106. https://doi.org/10.1109/FOCS.
one of the most efficient implementations of secure convolution 2011.12
is to use VOLE. In particular, one can define a VOLE correlation [13] Melissa Chase, Yevgeniy Dodis, Yuval Ishai, Daniel Kraschewski, Tianren Liu,
Rafail Ostrovsky, and Vinod Vaikuntanathan. 2019. Reusable Non-Interactive
for each element in the kernel and then perform a scalar-vector Secure Computation. In Advances in Cryptology CRYPTO 2019, Alexandra
product with the elements of the image multiplied by the specific Boldyreva and Daniele Micciancio (Eds.). Springer International Publishing,
kernel element. Even for large neural networks such as networks Cham, 462488.
[14] Leo de Castro, Chiraag Juvekar, and Vinod Vaikuntanathan. 2020. Fast Vector
that process mammograms for cancer diagnosis [35], the size of a Oblivious Linear Evaluation from Ring Learning with Errors. Cryptology ePrint
convolution does not exceed the size of the input image, which is Archive, Report 2020/685. https://ia.cr/2020/685.
[15] Léo Ducas and Damien Stehlé. 2016. Sanitization of FHE Ciphertexts. In Advances
2028 × 2028 = 222 pixels. Based on the benchmarks in section 5.4, in Cryptology - EUROCRYPT 2016 - 35th Annual International Conference on the
we would outperform all prior works for the generation of these Theory and Applications of Cryptographic Techniques, Vienna, Austria, May 8-
VOLE correlations. 12, 2016, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 9665), Marc
Fischlin and Jean-Sébastien Coron (Eds.). Springer, 294310. https://doi.org/10.
1007/978-3-662-49890-3_12
ACKNOWLEDGMENTS [16] Nagarjun Dwarakanath and Steven Galbraith. 2014. Sampling from discrete
Gaussians for lattice-based cryptography on a constrained device. Applicable
We would like to thank Michael Nielsen for providing the code Algebra in Engineering, Communication and Computing 25 (06 2014). https:
for [3] and Yuriy Polyakov for answering our questions about the //doi.org/10.1007/s00200-014-0218-3
40
Session 2 WAHC 21, November 15, 2021, Virtual Event, Republic of Korea
[17] Junfeng Fan and Frederik Vercauteren. 2012. Somewhat practical fully homomor- let Ω𝑅 = Z𝑛𝑝 . The function 𝑓𝑆 (𝜉𝑆 , 𝜉𝑅 ) = ⊥ for all (𝜉𝑆 , 𝜉𝑅 ) ∈ Ξ𝑆 × Ξ𝑅 .
phic encryption. https://eprint.iacr.org/2012/144.pdf For 𝜉𝑆 = (𝜶 , 𝜷) and 𝜉𝑅 = 𝑥, define 𝑓𝑅 (𝜉𝑆 , 𝜉𝑅 ) = 𝜶 · 𝑥 + 𝜷 mod 𝑝.
[18] Craig Gentry. 2009. Fully homomorphic encryption using ideal lattices. STOC.
[19] Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. 2010. i-Hop Homomorphic
Encryption and Rerandomizable Yao Circuits. In Advances in Cryptology - CRYPTO
Definition A.2 (Batch Oblivious Linear Evaluation). For an
2010, 30th Annual Cryptology Conference, Santa Barbara, CA, USA, August 15-19, integer 𝑝 and 𝑛, define the protocol BOLE (𝑝,𝑛) to have the following
2010. Proceedings (Lecture Notes in Computer Science, Vol. 6223), Tal Rabin (Ed.). ideal functionality. Let Ξ𝑆 = Z𝑛𝑝 × Z𝑛𝑝 , let Ω𝑆 = ∅, let Ξ𝑅 = Z𝑛𝑝 , and
Springer, 155172. https://doi.org/10.1007/978-3-642-14623-7_9
[20] Shai Halevi, Yuriy Polyakov, and Victor Shoup. 2019. An Improved RNS Variant of let Ω𝑅 = Z𝑛𝑝 . The function 𝑓𝑆 (𝜉𝑆 , 𝜉𝑅 ) = ⊥ for all (𝜉𝑆 , 𝜉𝑅 ) ∈ Ξ𝑆 × Ξ𝑅 .
the BFV Homomorphic Encryption Scheme. In Topics in Cryptology CT-RSA 2019 For 𝜉𝑆 = (𝜶 , 𝜷) and 𝜉𝑅 = x, define 𝑓𝑅 (𝜉𝑆 , 𝜉𝑅 ) = 𝜶 ⊙ x + 𝜷 mod 𝑝.
- The Cryptographers Track at the RSA Conference 2019, Proceedings (Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics)), Mitsuru Matsui (Ed.). Springer-Verlag, 83105. A.2 Ring Learning with Errors
https://doi.org/10.1007/978-3-030-12612-4_5
[21] Carmit Hazay, Yuval Ishai, Antonio Marcedone, and Muthuramakrishnan Venki-
In this section, we define the decisional Ring Learning with Errors
tasubramaniam. 2019. LevioSA: Lightweight Secure Arithmetic Computation. In (RLWE) [27] problem.
Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications
Security (London, United Kingdom) (CCS 19). Association for Computing Ma- Definition A.3 (Decisional Ring Learning with Errors [27]).
chinery, New York, NY, USA, 327344. https://doi.org/10.1145/3319535.3354258 For integers 𝑛 and 𝑞, a ring R𝑞 = Z𝑞 [𝑥]/(𝑥 𝑛 + 1), and an error
[22] Yuval Ishai and Anat Paskin. 2007. Evaluating Branching Programs on Encrypted
Data. In Theory of Cryptography, Salil P. Vadhan (Ed.). Springer Berlin Heidelberg, distribution 𝜒 over R𝑞 , the decisional Ring Learning with Errors
Berlin, Heidelberg, 575594. problem RLWE𝑛,𝑞,𝜒 is to distinguish between samples of the form
[23] Yuval Ishai, Manoj Prabhakaran, and Amit Sahai. 2008. Founding Cryptography $ $
on Oblivious Transfer Efficiently. 572591. https://doi.org/10.1007/978-3-540- (𝑎, 𝑎𝑠 + 𝑒) and (𝑎, 𝑢) where 𝑎, 𝑢
R𝑞 and 𝑠, 𝑒
𝜒.
85174-5_32
[24] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. 2018. In this work, we consider 𝜒 to be the discrete, zero-centered
GAZELLE: A Low Latency Framework for Secure Neural Network Inference. In
27th USENIX Security Symposium (USENIX Security 18). USENIX Association, Bal-
Gaussian distribution over R𝑞 . This follows the homomorphic en-
timore, MD, 16511669. https://www.usenix.org/conference/usenixsecurity18/ cryption security standard [2] from which we get the concrete
presentation/juvekar parameters used in the implementation in section 5.
[25] Kim Laine. 2019. https://github.com/microsoft/SEAL/issues/89.
[26] Vadim Lyubashevsky and Daniele Micciancio. 2006. Generalized Compact Knap-
sacks Are Collision Resistant. In Automata, Languages and Programming, Michele A.3 Homomorphic Encryption
Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener (Eds.). Springer
Berlin Heidelberg, Berlin, Heidelberg, 144155. In this section, we give high level definitions for the algorithms
[27] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. 2010. On Ideal Lattices and comprising a leveled homomorphic encryption scheme as well as
Learning with Errors Over Rings. EUROCRYPT (2010). necessary security definitions.
[28] Moni Naor and Benny Pinkas. 2006. Oblivious Polynomial Evaluation. SIAM J.
Comput. 35 (01 2006), 12541281. https://doi.org/10.1137/S0097539704383633 Definition A.4 (Leveled Homomorphic Encryption). A lev-
[29] Zhiniang Peng. 2019. Danger of using fully homomorphic encryption: A look at
Microsoft SEAL. CoRR abs/1906.07127 (2019). arXiv:1906.07127 http://arxiv.org/ eled homomorphic encryption scheme
abs/1906.07127
[30] Y. Polyakov, K. Rohloff, and G.W Ryan. [n.d.]. PALISADE lattice cryptography E = (KeyGen, Encrypt, Eval, Decrypt)
library. https://git.njit.edu/palisade/PALISADE.
[31] Phillipp Schoppmann, Adrià Gascón, Leonie Reichert, and Mariana Raykova.
is a set of PPT algorithms defined as follows:
2019. Distributed Vector-OLE: Improved Constructions and Implementation. • KeyGen(1𝜆 , 1𝐿 ) → (sk, pk, evk)
https://doi.org/10.1145/3319535.3363228
[32] SEAL 2020. Microsoft SEAL (release 3.5). https://github.com/Microsoft/SEAL. Given the security parameter 𝜆 and a maximum circuit depth
Microsoft Research, Redmond, WA. 𝐿, outputs a key pair consisting of a public encryption key pk,
[33] Victor Shoup. [n.d.]. NTL: A Library for doing Number Theory. https://shoup. a secret decryption key sk, and an evaluation key evk.
net/ntl/.
[34] C. Weng, K. Yang, J. Katz, and X. Wang. 2021. Wolverine: Fast, Scalable, and • Encrypt(pk, 𝑚) → ct
Communication-Efficient Zero-Knowledge Proofs for Boolean and Arithmetic Given a message 𝑚 ∈ M and an encryption key pk, outputs a
Circuits. In 2021 2021 IEEE Symposium on Security and Privacy (SP). IEEE Computer
Society, Los Alamitos, CA, USA, 10741091. https://doi.org/10.1109/SP40001.
ciphertext ct.
2021.00056 • Eval(evk, 𝑓 , ct1, ct2, . . . , ct𝑛 ) → ct
[35] Adam Yala, Constance Lehman, Tal Schuster, Tally Portnoi, and Regina Barzilay. Given the evaluation key, a description of a function 𝑓 : M𝑛
2019. A Deep Learning Mammography-based Model for Improved Breast Cancer
Risk Prediction. Radiology 292 (05 2019), 182716. https://doi.org/10.1148/radiol. M with multiplicative depth at most 𝐿, and 𝑛 ciphertexts
2019182716 encrypting messages 𝑚 1, . . . , 𝑚𝑛 , outputs the result ciphertext
ct encrypting 𝑚 = 𝑓 (𝑚 1, . . . , 𝑚𝑛 ).
• Decrypt(sk, ct) = 𝑚 Given the secret decryption key and a
A FURTHER BACKGROUND ciphertext ct encrypting 𝑚, outputs 𝑚.
In this appendix, we give further background omitted from section
2 due to space constraints. Optionally the scheme E may be extended with a PPT algorithm
EncryptSK(sk, 𝑚) → ct which uses the secret key sk rather than
the public key pk, to compute the ciphertext ct from the message
A.1 Oblivious Linear Evaluation 𝑚.
We now give formal definitions of vector OLE and batch OLE, along When returning a ciphertext output by the Eval function, it is
with security definitions. often desirable for this ciphertext to hide the function 𝑓 that was
used to produce it. This property of the scheme is called circuit
Definition A.1 (Vector Oblivious Linear Evaluation). For privacy, which we formally define below.
an integer 𝑝 and 𝑛, define the protocol VOLE (𝑝,𝑛) to have the following
ideal functionality. Let Ξ𝑆 = Z𝑛𝑝 × Z𝑛𝑝 , let Ω𝑆 = ∅, let Ξ𝑅 = Z𝑝 , and
41