opaque-lattice/papers_txt/vole-ring-lwe.txt

                              MIT Open Access Articles

                     Fast Vector Oblivious Linear Evaluation
                        from Ring Learning with Errors


             The MIT Faculty has made this article openly available. Please share
             how this access benefits you. Your story matters.


Citation: de Castro, Leo, Juvekar, Chiraag and Vaikuntanathan, Vinod. 2021. "Fast Vector
Oblivious Linear Evaluation from Ring Learning with Errors." Proceedings of the 9th on
Workshop on Encrypted Computing &amp; Applied Homomorphic Cryptography.

As Published: 10.1145/3474366.3486928

Publisher: ACM

Persistent URL: https://hdl.handle.net/1721.1/143921

Version: Final published version: final published article, as it appeared in a journal, conference
proceedings, or other formally published context

Terms of use: Creative Commons Attribution 4.0 International license
Session 2                                                                            WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


                                 Fast Vector Oblivious Linear Evaluation
                                     from Ring Learning with Errors
                   Leo de Castro                                         Chiraag Juvekar                        Vinod Vaikuntanathan
    Massachusetts Institute of Technology                               Analog Devices                    Massachusetts Institute of Technology
              ldec@mit.edu                                        chiraag.juvekar@analog.com                       vinodv@mit.edu

ABSTRACT                                                                            secure computation protocols [13, 21]. In an OLE protocol over a
Oblivious linear evaluation (OLE) is a fundamental building block                   ring Z𝑝 , there are two parties, a sender 𝑆 with values 𝛼, 𝛽 ∈ Z𝑝 and
in multi-party computation protocols. In OLE, a sender holds a                      a receiver 𝑅 with a value 𝑥 ∈ Z𝑝 . At the end of the protocol, 𝑅 will
description of an affine function 𝑓𝛼,𝛽 (𝑧) = 𝛼𝑧 + 𝛽, the receiver                   learn the value 𝛾 = 𝛼 ·𝑥 + 𝛽 while 𝑆 will learn nothing. In Vector OLE
holds an input 𝑥, and gets 𝛼𝑥 + 𝛽 (where all computations are done                  (VOLE), the sender has 𝜶 , 𝜷 ∈ Z𝑚    𝑝 , the receiver has 𝑥 ∈ Z𝑝 , and
over some field, or more generally, a ring). Vector OLE (VOLE) is a                 obtains 𝜸 := 𝜶 𝑥 + 𝜷 ∈ Z𝑚 𝑝 . In other words, vector OLE is running
generalization where the sender has many affine functions and the                   many OLE protocols in parallel where the receiver has the same 𝑥.
receiver learns the evaluation of all of these functions on a single                Batch OLE (BOLE) can be viewed as many OLE protocols running
point 𝑥.                                                                            in parallel where the receiver has a vector x of values over Z𝑚  𝑝 . In
The state-of-the-art semi-honest VOLE protocols generally fall                      this work, we will primarily focus on VOLE with a simple extension
into two groups. The first group relies on standard assumptions                     to BOLE.
to achieve security but lacks in concrete efficiency. These con-                       Our approach to implementing a VOLE protocol is to use the
structions are mostly based on additively homomorphic encryption                    packed additively homomorphic encryption (PAHE) of Braker-
(AHE) and are classified as “folklore". The second group relies on                  ski [11] and Fan and Vercuteran [17], henceforth referred to as the
less standard assumptions, usually properties of sparse, random lin-                BFV scheme. PAHE is a natural choice of primitive to implement
ear codes, but they manage to achieve concrete practical efficiency.                such a protocol, especially in the honest-but-curious model. At a
In this work, we present a conceptually simple VOLE protocol that                   high level, our protocol has the receiver 𝑅 encrypt the value 𝑥 and
derives its security from a standard assumption, namely Ring Learn-                 send the encryption [[𝑥]] to the sender 𝑆. Using the PAHE opera-
ing with Errors (RLWE), while still achieving concrete efficiency                   tions, the sender can then compute the ciphertext [[𝜸 ]] = [[𝜶 ·𝑥 +𝜷]]
comparable to the fastest VOLE protocols from non-standard coding                   and return it to the receiver, who can decrypt and learn the VOLE
assumptions [3, 8, 31]. Furthermore, our protocol admits a natural                  output 𝜸 . As long as the PAHE scheme achieves circuit privacy
extension to batch OLE (BOLE), which is yet another variant of                      which, in this case, says that the homomorphically evaluated ci-
OLE that computes many OLEs in parallel.                                            phertext [[𝜸 ]] does not leak information about 𝛼 and 𝛽 even to the
                                                                                    owner of the secret key, then security is achieved against honest-
CCS CONCEPTS                                                                        but-curious adversaries. In the case of VOLE, the circuit is specified
• Security and privacy → Cryptography;                                              by the sender’s values of 𝜶 and 𝜷.
                                                                                       We note that achieving security in the honest-but-curious setting
KEYWORDS                                                                            is sufficient following the work of Hazay, Ishai, Marcedone, and
                                                                                    Venkitasubramaniam [21] which instantiates the semi-honest-to-
homomorphic encryption, applied cryptography, oblivious linear                      active compiler of Ishai, Prabhakaran, and Sahai [23] using black-
evaluation                                                                          box access to an honest-but-curious OLE protocol. In particular, this
ACM Reference Format:                                                               instantiation enables a semi-honest OLE protocol to be upgraded
Leo de Castro, Chiraag Juvekar, and Vinod Vaikuntanathan. 2021. Fast Vector         to an actively secure OLE protocol with roughly a factor of two
Oblivious Linear Evaluation from Ring Learning with Errors. In Proceedings          overhead ([21], section 5.3).
of the 9th Workshop on Encrypted Computing Applied Homomorphic Cryptog-                While the high-level story is simple, instantiating a OLE scheme
raphy (WAHC ’21), November 15, 2021, Virtual Event, Republic of Korea. ACM,         from a PAHE scheme in a concretely efficient way is less so. In this
New York, NY, USA, 13 pages. https://doi.org/10.1145/3474366.3486928
                                                                                    work, we present a series of optimizations of the textbook BFV
                                                                                    PAHE scheme that results in a vector OLE scheme that outper-
1 INTRODUCTION                                                                      forms all prior works in parameter regimes of practical interest. At
Oblivious linear evaluation (OLE), a special case of oblivious poly-                the heart of our optimizations is a new analysis of the modulus
nomial evaluation [28], is a fundamental building block in many                     reduction operation similar to the BV homomorphic encryption
                                                                                    scheme [12], which argues that circuit privacy is achieved when
                                                                                    modulus reduction is performed under carefully selected parame-
                                                                                    ters. In addition, we extend the analysis of Halevi, Polyakov, and
                                                                                    Shoup [20] to efficiently implement this operation in the double
This work is licensed under a Creative Commons Attribution International 4.0
License.                                                                            Chinese remainder theorem (DCRT) representation.

WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea
© 2021 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-8656-2/21/11.
https://doi.org/10.1145/3474366.3486928


                                                                               29
Session 2                                                                         WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


1.1    Our Contributions                                                         1.2    Related Work
First, we present a fast, light-weight vector OLE (VOLE) protocol.               Recent related work improving VOLE and BOLE efficiency have
In fact, our protocol most naturally achieves the more expressive                mainly focused on obtaining optimizations from the Learning Parity
batch OLE (BOLE) functionality at little additional cost. This is in             with Noise (LPN) assumption [6] and other related coding assump-
contrast to the LPN-based VOLE protocols of [3] and [31] which,                  tions. We separate prior work into two main categories based on
to the best of our knowledge, do not have such a straightforward                 their communication complexity. The first category consists of pro-
extension to BOLE.                                                               tocols that have communication complexity linear or super-linear
    The main technical idea in our result is a way to achieve circuit            in the length of the VOLE correlations they generate. More heuris-
privacy in a PAHE scheme which corresponds to sender security                    tically, these protocols are optimized for smaller VOLE correlations
in a (V/B)OLE protocol. In slightly more detail, we’d like to ensure             and tend to be very fast for these shorter VOLE lengths. The sec-
that the receiver who has the private key of the PAHE scheme and                 ond category consists of protocols with sublinear communication
𝑥, and obtains an evaluated ciphertext [[𝜸 ]] = [[𝜶 · 𝑥 + 𝜷]] learns             complexity in the length of the VOLE correlations they generate.
𝜸 , but no other information about 𝜶 and 𝜷. The crucial difficulty               Heuristically, these protocols are optimized for larger VOLE lengths
is that the noise contained in the PAHE evaluated ciphertext [[𝜸 ]]              and tend to be slower than protocols in the first category for gener-
could reveal information about 𝜶 and 𝜷.                                          ating shorter VOLE correlations.
    The folklore idea to achieve circuit privacy is a simple technique              Our protocol falls in the first category, so we compare directly
called noise-flooding [18]. That is, sample from a sufficiently wide             with other protocols in this category. While our protocol can be
distribution, either Gaussian or uniform, so as to flood the noise               extended to the more general batch-OLE functionality at minimal
contained in [[𝜸 ]]. In turn, to achieve sufficient statistical security,        cost, we focus our comparisons on vector-OLE. Our main point
this forces us to work with inefficient multi-precision arithmetic.              of comparison is the work of Applebaum et al. [3], although we
For example, the procedure to sample uniformly random numbers                    also compare our protocol against the subsequent works of Baum
from a sufficiently wide interval, using the NTL library [33], will              et al. [5], Boyle et al. [10], and Weng et al. [34]. We note that [34]
take more than 100% of our sender run-time. Additionally, despite                implements the vector-OLE functionality whereas our work as well
concrete (in)security concerns in practical scenarios [29], homomor-             as [10? ] implement the more general batch-OLE functionality.
phic encryption libraries either do not seem to implement circuit                   The protocol of Applebaum et al. [3] implements a fast vector-
privacy (such as in SEAL [25, 32]) or implement a single-precision               OLE protocol using assumptions over sparse linear codes that are
version of noise flooding which provides very limited statistical                inspired by the LPN assumption but notably are not known to have
security (such as in PALISADE [30]).                                             a reduction from the LPN assumption. Consequently, while this
    Other approaches to circuit-private homomorphic encryption                   work achieves impressive performance, the security of their pro-
exist, but are much less efficient than what we can afford for such              tocol is harder to quantify. In contrast, the security of our VOLE
a lightweight computation as in (V/B)OLE: garbled circuit-based                  protocol is based on the security of the Ring-LWE assumption,
techniques [19] requires computing and communicating a garbled                   which has recently been extensively benchmarked in the context
circuit for the PAHE decryption algorithm; FHE bootstrapping-                    of the NIST post-quantum cryptography standardization process
based techniques [15] are even less efficient; and techniques such               and the homomorphic encryption standardization process [2]. In
as [7] only work for the GSW homomorphic encryption scheme                       particular, we base our parameter choices off of the homomorphic
which is also too inefficient for our purposes.                                  encryption standard [2]. This widely endorsed standard signifi-
    We present a simple way to achieve circuit privacy of PAHE,                  cantly strengthens the security claims about the parameter choices
and therefore construct a (V/B)OLE protocol, using a rounding                    for this protocol. In section 5.4, we show that even with these strong
operation that allows greater efficiency than the folklore noise-                security claims, our protocol achieves efficiency that meets or ex-
flooding approach. The rounding operation and its rather simple                  ceeds the efficiency of [3] in many settings. In addition, our VOLE
analysis is inspired by similar techniques in the context of learning            protocol gives a natural extension to BOLE.
with rounding (LWR) [4].                                                            To compare our protocol against works in the second category,
    In addition, we give an implementation of this protocol along                we must first recognize that all protocols in the second category will
with performance benchmarks. This implementation aims to be                      asymptotically outperform ours, since eventually the linear com-
usable as a black-box in larger applications; an earlier version of              munication complexity of our protocol will surpass the sublinear
this implementation was already used in the work of Hazay, Ishai,                communication complexity of protocols in this category. The natu-
Marcedone, and Venkitasubramaniam [21] in CCS 2019.                              ral question to ask, then, is at what VOLE length does the sublinear
    Furthermore, our protocol outperforms recent implementations                 protocol become faster than the linear protocol, and where do the
([31], CCS 2019) based on the learning parity with noise assump-                 practical applications of VOLE live vis-à-vis the crossover point?
tion, as described in [8], for VOLE dimension up to 𝑚 = 235 . In                 When comparing against protocols in this category, we estimate
other words, while [8, 31] will outperform our implementation                    this cross-over point and then show applications where the VOLE
asymptotically for huge 𝑚, the crossover point is quite far out. We              lengths are less than this point. Our main point of comparison
describe the relation between the two approaches in more detail                  for protocols in this category is the work of Schoppmann, Gascón,
in Section 1.2, and provide a detailed performance comparison in                 Reichert, and Raykova [31], which is a two-party implementation
Section 5.                                                                       of the function secret sharing (FSS) based work of Boyle, Couteau,
                                                                                 Gilboa, and Ishai [8] that constructs a VOLE protocol with sublinear


                                                                            30
Session 2                                                                     WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


complexity. In section 5.4, we show that our protocol outperforms            2     BACKGROUND
[31] for smaller VOLE lengths, which is to be expected. We also              In this section, we will briefly review the definitions of the homo-
show that the expected cross-over point of the performances of               morphic encryption scheme used below as well as the mathematics
this protocol is large enough to allow potential applications that           behind residue number system optimizations.
do not require VOLE correlations of greater length. In section 5.6,
we discuss one such application: securely evaluating convolutional           2.1     Notation
neural network layers for medical images.
                                                                             The homomorphic encryption scheme used in our work is based
                                                                             off of the Ring Learning with Errors problem [27], which is defined
   Subsequent Work. Subsequent to the online publication of this             over polynomial rings. In our instantiation, we use the ring
work [14], several additional works have been published presenting
various VOLE and BOLE protocols. We briefly compare against                                              R = Z[𝑥]/(𝑥 𝑛 + 1)                           (1)
these works to show that our protocol is still competitive with the          For the remainder of this work, 𝑛 will always be a power of two.
state-of-the-art.                                                            For a modulus 𝑞, let R𝑞 be R with all coefficients mod 𝑞.
   The first work we compare against is the BOLE protocol of
Baum et al. [5]. This two-party protocol between Alice with a
                                                                                                        R𝑞 = Z𝑞 [𝑥]/(𝑥 𝑛 + 1)
vector input 𝜶 and Bob with a vector input 𝜷 results in each party
holding an additive share of the component-wise product 𝜶 · 𝜷.                  For an integer 𝑛, we will denote the set {0, 1, 2, . . . , 𝑛 − 1} as [𝑛].
The main feature of this protocol is that it requires only one round         For integers 𝑖 ≤ 𝑗, we denote the range {𝑖, 𝑖 + 1, . . . , 𝑗 − 1} as [𝑖 : 𝑗].
of simultaneous communication; Alice sends a message to Bob and                 We denote the rounding function ⌈·⌋ : R → Z that maps 𝑥𝑟 ∈
Bob simultaneously sends a message to Alice. Once these messages             R to the closest integer 𝑥 ∈ Z. We denote the flooring function
have been received, the parties locally compute additive shares of           ⌊·⌋ : R → Z that maps 𝑥𝑟 ∈ R to the closest integer 𝑥 ∈ Z such that
the component-wise product. While our protocol is two rounds,                𝑥 ≤ 𝑥𝑟 .
we show in section 5.4 that our protocol outperforms this one-                  For a positive integer 𝑏, we write the modular reduction opera-
round protocol in both total communication and computation time.             tion 𝑐 ≡ 𝑎 mod 𝑏 as 𝑐 = [𝑎]𝑏 .
The intuition for why this is the case is because the one-round                 The norm notation || · || refers to the ℓ∞ norm, unless otherwise
BOLE protocol requires two iterations of the LPR-style rounding              specified. For a polynomial 𝑎 with 𝑛 coefficients each mod 𝑞, we
technique, while our approach only performs this operation once.             have the bound ||𝑎|| ≤ 𝑞.
   Next, we compare against a series of works that fall into the                We specify the base-2 logarithm by log.
category of asymptotically sublinear implementations. These im-                 We call a distribution whose samples have magnitude bounded
plementations are based on some flavor of the LPN assumption                 by 𝐵 a 𝐵-bounded distribution.
[6]. The majority of these works focus on VOLE, specifically a vari-            We say that a function negl is negligible if for every constant
ant known as subfield VOLE, which we denote sVOLE. The sVOLE                 𝑐 > 1 we have negl(𝑛) < 1/𝑛𝑐 for all sufficiently large 𝑛.
protocol is defined by a length 𝑛, a prime 𝑝, and an integer 𝑟 . The            For two vectors 𝑣 and 𝑤 of length ℓ, denote the component-wise
receiver’s input is a value 𝑥 ∈ F𝑝 𝑟 , and the sender’s input is the         product of 𝑣 and 𝑤 as 𝑣 ⊙ 𝑤.
tuple (𝜶 , 𝜷) ∈ F𝑛𝑝 × F𝑛𝑝𝑟 . Note that the elements of the vector 𝜶              We will denote sampling from a distribution as ←
                                                                                                                                     $
                                                                                                                                −. We will denote
are drawn from the subfield F𝑝 rather than the full domain F𝑝 𝑟 .                                                                               𝑛𝑒𝑡
                                                                             sending or receiving a message from a public channel as ←−−−.
The output of sVOLE takes the same format as VOLE, namely the
vector 𝛾 = 𝜶 · 𝑥 + 𝜷 ∈ F𝑝 𝑟 . We note that sVOLE can be the same as
VOLE when 𝑟 = 1. Building off of the works of Boyle et al. [8, 9],           2.2     Circuit Privacy
the fastest work (to our knowledge) implementing sVOLE is the                For brevity, we move the defintion of a leveled homomorphic en-
work of Weng et al. and we briefly compare against this work in              cryption scheme to appendix A.3
section 5.5.
                                                                                Definition 2.1. Circuit Privacy ([22] definition 7, [7] definition
   The subsequent work of Boyle et al. [10] also uses LPN-style
                                                                             5.1)
assumptions to construct a sublinear protocol for BOLE. While this
                                                                                Let E be a leveled homomorphic encryption scheme as defined in
protocol achieves very low communication, the estimated compu-
                                                                             definition A.4. Define the following:
tational performance is not competitive with out protocol, as we
discuss in section 5.5.                                                                             $
                                                                                                  − E.KeyGen(1𝜆 , 1𝐿 )
                                                                                    (sk, pk, evk) ←
                                                                                                    $
                                                                                               ct𝑖 ←
                                                                                                   − E.Encrypt(pk, 𝑚𝑖 ) 𝑖 ∈ [1 . . . 𝑘]
1.3    Road Map                                                                                     $
In section 2, we give some brief preliminaries and definitions neces-                          ct𝑖 ←
                                                                                                   − E.EncryptSK(sk, 𝑚𝑖 ) 𝑖 ∈ [𝑘 + 1 . . . 𝑛]
sary in the explanations below. In section 3, we describe our circuit                       ⟨ct𝑖 ⟩ ≔ {ct𝑖 }𝑛𝑖=1
privacy transformation method and prove its security. In section                            𝑚𝑜𝑢𝑡 = 𝑓 (𝑚 1, . . . , 𝑚𝑛 )
4, we give the full VOLE protocol, including a proof of security,
and in section 5 we present an implementation of this protocol and           We say that E is 𝜖-circuit private for functions 𝑓 of depth ℓ ≤ 𝐿
analyze the performance.                                                     if there exists a PPT simulator algorithm Sim such that for all PPT


                                                                        31
Session 2                                                                                               WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


distinguishing algorithms D the following holds:                                                       in our implementation. Finally, we describe the full circuit privacy
         h                                                                          i                transformation, followed by a security proof.

      Pr D E.Eval pk, evk, 𝑓 , ⟨ct𝑖 ⟩ , ⟨ct𝑖 ⟩, sk, pk, evk = 1
                                                                                                       3.1     The Divide-and-Round Step
             h                                                                         i            We begin by presenting our procedure for performing an efficient
      − Pr D Sim 1ℓ , pk, sk, evk, 𝑚𝑜𝑢𝑡 , ⟨ct𝑖 ⟩, sk, pk, evk = 1                                      divide-and-round operation over ciphertext elements. The benefits
                                                                                                       of this operation include a decreased overall ciphertext size as well
      ≤𝜖
                                                                                                       as destroying the information in the ciphertext error term that
   In words, definition 2.1 says that the output of the evaluation                                     could leak information about the computation.
algorithm of a circuit private leveled homomorphic encryption                                             Let 𝑄 be the original ciphertext modulus of our scheme, and let
                                                                                                                                     Î
scheme should be indistinguishable from a simulated output, where                                      𝑞𝑖 be the primes such that 𝑖 𝑞𝑖 = 𝑄. Given a ciphertext ct𝑄 , where
the simulator is given no information about the function 𝑓 used to                                     the elements are in R𝑄 and the decryption operations are performed
compute the result ciphertext other than the function output. We                                       over R𝑄 , the goal of this operation is to obtain a ciphertext ct𝑞0
can view the simulator in definition 2.1 as an alternate encryption                                    that encrypts the same message. This new ciphertext ct𝑞0 will
procedure that produces a fresh ciphertext that is indistinguishable                                   have elements in R𝑞0 and the decryption of this ciphertext will be
from the output of the real E.Eval algorithm. We only consider                                         performed over R𝑞0 as well. Recall that we choose primes 𝑞𝑖 to fit in
functions 𝑓 that will result in E.Eval outputting a correct ciphertext.                                a standard machine word, so decryption of ct𝑞0 is far more efficient
This can be implemented by having both E.Eval and Sim output ⊥                                         than the decryption of ct𝑄 . In addition, the communication cost
for functions that are exceed the number of levels supported by E.                                     of sending ct𝑞0 over a network is significantly less than sending
   We will use the homomorphic encryption scheme of Brakerski,                                         ct𝑄 . Note that this operation can only be correct if the plaintext
Fan, and Vercuteran [11, 17], denoted as the BFV scheme.                                               modulus 𝑝 is sufficiently less than 𝑞 0 .
                                                                                                          Let 𝑞 0∗ = 𝑄/𝑞 0 . Below, we write the ciphertext ct𝑄 and expand
2.3     Ring Expansion Factor                                                                          out the terms to anticipate the division by 𝑞 0∗ .
In order to effectively choose parameters for our leveled homomor-
phic encryption scheme, we must accurately upper bound the noise                                               ct𝑄 = 𝑎, 𝑎 · 𝑠 + Δ𝑚 + 𝑒

                                                                                                                    = 𝑎 ′ · 𝑞 0∗ + [𝑎]𝑞0∗ , (𝑎 ′ · 𝑞 0∗ + [𝑎]𝑞0∗ ) · 𝑠 + Δ𝑚 + 𝑒
growth due to homomorphic operations. When multiplying two el-
ements of R𝑞 , we need an upper bound on the norm of the product                                                                                                                !
by a function of the norms of the operands as well as properties of
R𝑞 itself.                                                                                                          = 𝑎 ′ · 𝑞 0∗ + [𝑎]𝑞0∗ , (𝑎 ′ · 𝑞 0∗ + [𝑎]𝑞0∗ ) · 𝑠 + Δ𝑚 + 𝑒

   Definition 2.2 (Ring Expansion Factor [18, 26]). The expan-
                                                                                                       The ciphertext ct𝑞0 is obtained by dividing the two components of
sion factor 𝛿 R for a ring R is defined as follows:
                                                                                                       ct𝑄 by 𝑞 0∗ .
                                                   ||𝑎 · 𝑏 ||
                                  𝛿 R = max                                                                        $       %       $ % $                      %
                                          𝑎,𝑏 ∈R ||𝑎|| · ||𝑏 ||                                                      ct𝑄          𝑎          𝑎 · 𝑠 + Δ𝑚 + 𝑒
                                                                                                          ct𝑞0 =             =           ,
   Lemma 2.1 (Ring Expansion Upper Bound). For a ring R =                                                             𝑞 0∗          𝑞 0∗              𝑞 0∗
                                                                                                                    𝑎 ′ · 𝑞 ∗ + [𝑎]𝑞 ∗         (𝑎 · 𝑞 0 + [𝑎]𝑞0∗ ) · 𝑠 + Δ𝑚 + 𝑒
                                                                                                                     $                     % $ ′ ∗                              %
Z[𝑥]/(𝑥 𝑛 + 1), we can upper bound the ring expansion factor 𝛿 R for                                                         0           0
the norm || · || by                                                                                            =                            ,
                                                                                                                              𝑞 0∗                             𝑞 0∗                (2)
                              𝛿R ≤ 𝑛
                                                                                                                                       [𝑎]𝑞0∗ · 𝑠 + Δ𝑚 + 𝑒
                                                                                                                                    $                       %

   Proof. In the worst case for the expansion of the norm || · || is                                           = 𝑎 ′, 𝑎 ′ · 𝑠 +
                                                                                                                                                 𝑞 0∗
when both polynomials 𝑎, 𝑏 ∈ R have coefficients of all the same
magnitude. In this case, the maximum coefficient of the product will                                           = 𝑎 ′, 𝑎 ′ · 𝑠 + 𝑣
be 𝑛 times the product of the maximum coefficient of the operands.
Therefore, we have                                                                                     where
                                                                                                                                         [𝑎]𝑞0∗ · 𝑠 + Δ𝑚 + 𝑒
                                                                                                                                     $                         %
                         ||𝑎|| · ||𝑏 || = 𝑛 · ||𝑎 · 𝑏 || = 𝛿 R · ||𝑎 · 𝑏 ||                                                     𝑣=                                                (3)
                                                                                                                                                 𝑞 0∗
                                                                                              □
                                                                                                       is the new term in ct𝑞0 that equals Δ ′𝑚 + 𝑣 ′ for a scaling factor Δ ′
  Remark 2.1. The upper bound in lemma 2.1 extends to R𝑞 for any                                       and error term 𝑣 ′ . Note that we will analyze the whole term 𝑣 for
modulus 𝑞.                                                                                             the remainder of this section to show the security of the circuit
                                                                                                       privacy procedure, and then we will split 𝑣 into Δ ′𝑚 + 𝑣 ′ to show
3     OUR CIRCUIT PRIVACY APPROACH                                                                     that this procedure produces a correct, decryptable ciphertext.
In this section, we describe our approach to obtaining circuit privacy.                                    We will now begin to analyze this term 𝑣 in terms of the original
We will begin by describing the main operation of the circuit privacy                                  error term 𝑒. Our goal will be to show that the distribution of
transformation, which is fundamentally a divide-then-round step.                                       𝑣 is statistically close to a distribution that is independent of 𝑒.
Then, we will describe how this operation is performed efficiently                                     Once we have shown this, we will be able to construct a simulator


                                                                                                  32
Session 2                                                                          WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


that samples an error term that is statistically close to 𝑣 but still                Lemma 3.3. Let 𝑎 be an element of the polynomial ring R𝑞 =
independent of the original error term 𝑒.                                         Z𝑞 [𝑥]/(𝑥 𝑛 + 1), let 𝑏 be a scalar in Z𝑞 , and let 𝐵 be a positive integer.
   We begin by defining the event that must occur in order for                    We can bound the ring boundary event from definition 3.2 by the
⌊(𝑎 + 𝑐)/𝑏⌋ ≠ ⌊(𝑎 + 𝑐 + 𝑒)/𝑏⌋ for a small error term 𝑒 in terms of 𝑎              following:
and fixed scalar 𝑐. This event is intuitively very similar to the BAD                                                  Õ
                                                                                              Pr[BAD(𝑎, 𝑏; 𝐵)] ≤            Pr[BAD(𝑎𝑖 , 𝑏; 𝐵)]           (10)
event in the analysis of learning with rounding (LWR) in [4].
                                                                                                                       𝑖 ∈ [𝑛]
   Definition 3.1 (Boundary Event). Let 𝑎, 𝑏, 𝐵 be positive inte-
                                                                                     Proof. This follows directly from taking the union bound over
gers. Define the event BAD(𝑎, 𝑏; 𝐵) to be the following:
                                                                                  the events corresponding to the coefficients of 𝑎.            □
            BAD(𝑎, 𝑏; 𝐵) ≔ {[𝑎]𝑏 ∈ [0, 𝐵) ∪ [𝑏 − 𝐵, 𝑏)}               (4)
                                                                                     Corollary 3.3.1. For a polynomial 𝑎 that is sampled over R𝑞
In words, equation 4 is the event that 𝑎 is within distance 𝐵 of a
                                                                                  such that each coefficient 𝑎𝑖 is uniform modulo 𝑏, we can bound the
multiple of 𝑏.
                                                                                  probability of the BAD event occurring by the following:
   Lemma 3.1. Let 𝑎, 𝑏 be positive integers, and let 𝑒 be an integer                                             Õ                            𝐵
such that |𝑒 | ≤ 𝐵 where 𝐵 < 𝑏/2. Then the following relation holds:                      Pr[BAD(𝑎, 𝑏; 𝐵)] ≤          Pr[BAD(𝑎𝑖 , 𝑏; 𝐵)] = 2𝑛    (11)
                                                                                                                                              𝑏
                  j𝑎 k j𝑎 + 𝑒 k                                                                                   𝑖 ∈ [𝑛]
                       ≠            =⇒ BAD(𝑎, 𝑏; 𝐵)                    (5)
                    𝑏         𝑏                                                       Corollary 3.3.2. For a polynomial 𝑎 that is sampled uniformly
   Proof. This proof is quite straightforward. If we write 𝑎 = 𝑘𝑏 +𝑟              over R𝑄 for 𝑄 = 𝑞 0∗ ·𝑞 0 and a small polynomial error term 𝑒 such that
where 𝑘 is an integer and 𝑟 ∈ [𝑏], we have that                                   ||𝑒 || ≤ 𝐵 for a positive integer 𝐵 < 𝑞 0∗ /2, we can bound the following
                         j 𝑎 k j 𝑘𝑏 + 𝑟 k                                         probability:
                                =          =𝑘                          (6)                       hj𝑎 + 𝑒 k j 𝑎 ki                                  𝐵
                           𝑏         𝑏                                                      Pr              ≠ ∗ ≤ Pr[BAD(𝑎, 𝑞 0∗ ; 𝐵)] ≤ 2𝑛 ∗
Since ⌊(𝑎+𝑒)/𝑏⌋ ≠ 𝑘, there are two possibilities. The first possibility                   𝑎←R𝑄       𝑞 0∗       𝑞0                                 𝑞0
is that 𝑎 + 𝑒 = (𝑘 + 1)𝑏 + 𝑟 ′ where 𝑟 ′ ∈ [𝑏]. This occurs when                     We now return to equation 3 for the compressed term 𝑣 and give
𝑎 ∈ [(𝑘 + 1) · 𝑏 − 𝑒, (𝑘 + 1) · 𝑏) ⊆ [(𝑘 + 1) · 𝑏 − 𝐵, (𝑘 + 1) · 𝑏).              a lower bound the probability that this term hides the original error
The second case is when 𝑎 + 𝑒 = (𝑘 − 1)𝑏 + 𝑟 ′′ . This occurs when                term 𝑒.
𝑎 ∈ ((𝑘 − 1) · 𝑏, (𝑘 − 1) · 𝑏 + 𝑒] ⊆ ((𝑘 − 1) · 𝑏, (𝑘 − 1) · 𝑏 + 𝐵]. Both            We begin by examining the numerator of this function and ob-
of these cases satisfy the condition of BAD(𝑎, 𝑏; 𝐵) occurring. □                 serve that if the secret 𝑠 is invertible modulo 𝑞 0∗ , then the term
   Corollary 3.1.1. Let 𝑎, 𝑏 be positive integers, and let 𝑒 be an                [𝑎]𝑞0∗ · 𝑠 would define a bijection with domain and range [𝑞 0∗ ]. From
integer such that |𝑒 | ≤ 𝐵 and 𝐵 < 𝑏/2. Then the following relation               this, we have the simple observation that a uniformly random input
holds:                               j𝑎 k j𝑎 + 𝑒 k                                to this bijection gives a uniformly random output over the same
                ¬BAD(𝑎, 𝑏; 𝐵) =⇒         =                       (7)              range.
                                      𝑏       𝑏                                      However, we cannot guarantee that 𝑠 is invertible, and we cannot
   We will now bound the probability that the BAD event occurs                    restrict to the case where 𝑠 is invertible while still appealing to
for values of 𝑎 that are uniformly random.                                        the security of RLWE. We handle this by observing that while
   Lemma 3.2 (Boundary Event for Random Values). Let 𝑏 and                        the polynomial element 𝑎 · 𝑠 may not be random, the marginal
𝐵 be positive integers. Let 𝐴 be a distribution over the integers such            distribution of each of the individual coefficients of 𝑎 · 𝑠 are still
that the distribution of [𝑎]𝑏 is uniformly random over [𝑏], where                 random. This is formalized in the following lemma.
𝑎 ← 𝐴. We can bound the probability of the BAD event occurring for                    Lemma 3.4. Fix an integer 𝑄 = 𝑞 0∗ · 𝑞 0 . Fix a non-zero 𝑠 ∈ R𝑄 =
an output of 𝐴 as follows:                                                        Z𝑄 [𝑥]/(𝑥 𝑛 + 1) and an index 𝑖 ∈ [𝑛]. Define the distribution 𝐴𝑖 to
                                              2𝐵                                  produce samples by first sampling a truly random 𝑎 ← R𝑄 , then
                       Pr [BAD(𝑎, 𝑏; 𝐵)] ≤                          (8)
                      𝑎←𝐴                      𝑏                                  computing the product 𝑎 · 𝑠 modulo 𝑞 0∗ , then outputing the 𝑖 𝑡ℎ coef-
   Proof. If we write the value 𝑎 ← 𝐴 as 𝑎 = 𝑘 · 𝑏 + 𝑟 , we are                   ficient of this product. The output of this distribution is statistically
given that the distribution of 𝑟 = [𝑎]𝑏 is uniformly random over                  close to uniform over [𝑞 0∗ ].
[𝑏]. From definition 3.1, the BAD event occurs when 𝑟 falls into
                                                                                       Proof. We can write the 𝑖 𝑡ℎ coefficient of 𝑎 · 𝑠 as the inner
the range [0, 𝐵) ∪ [𝑏 − 𝐵, 𝑏). This range is of size 2𝐵, so this occurs
                                                                                  product the coefficients of 𝑠 with a rotation of the coefficients of 𝑎.
with probability 2𝐵/𝑏.                                               □
                                                                                  Since the coefficients of 𝑎 are uniformly random modulo 𝑞 0∗ , and 𝑠
   We will now extend the definition of the boundary event to                     is non-zero, this inner product is statistically close to uniform over
elements over R𝑞 .                                                                [𝑞 0∗ ].                                                             □
   Definition 3.2 (Ring Boundary Event). Let 𝑎 be an element                         Corollary 3.4.1. Fix an integer 𝑄 = 𝑞 0∗ · 𝑞 0 . Fix a non-zero
of the polynomial ring R𝑞 = Z𝑞 [𝑥]/(𝑥 𝑛 + 1), let 𝑏 be a scalar in Z𝑞 ,           𝑠 ∈ R𝑄 = Z𝑄 [𝑥]/(𝑥 𝑛 + 1) and a positive integer B. For a distribution
and let 𝐵 be a positive integer. We will define the boundary case for an          of polynomials sampled uniformly over R𝑄 , we can upper bound the
element of R𝑞 to occur if the boundary case for any of its coefficients           probability of the BAD event for the product 𝑎 · 𝑠 by the following:
occurs. Denote the 𝑖 𝑡ℎ coefficient of 𝑎 as 𝑎𝑖 .                                                         h                  i       𝐵
                                                                                                     Pr BAD(𝑎 · 𝑠, 𝑞 0∗ ; 𝐵) ≤ 2𝑛 ∗                 (12)
         BAD(𝑎, 𝑏; 𝐵) = {∃𝑖 ∈ [𝑛] such that BAD(𝑎𝑖 , 𝑏; 𝐵)}           (9)                          𝑎←R𝑄                             𝑞 0


                                                                             33
Session 2                                                                              WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


   We now complete this analysis by applying corollary 3.4.1 to                       us to perform ciphertext compression with 𝑘 integer multiplications,
equation 3 for the final term 𝑣 to show that 𝑣 hides 𝑒. We will define                𝑘 floating point multiplications, 𝑘 floating point additions, and
one distribution that outputs 𝑣 as described in equation 3 and one                    𝑘 + 1 modular reductions. Comparing with the naïve approach,
distribution that leaves out the error term 𝑒. The next lemma will                    this technique replaces the entire multi-precision divide-and-floor
give an upper bound that these two distributions output different                     operation with only the marginal cost increase of 𝑘 floating point
values.                                                                               addition and multiplication operations versus 𝑘 integer addition
                                                                                      and multiplication operations. We refer the reader to [20] for further
   Lemma 3.5. If ||𝑒 || ≤ 𝐵, the final term 𝑣 from equation 3 hides the
                                                                                      analysis of the benefits of the DCRT representations.
original error term 𝑒 with probability 2𝑛 𝑞𝐵∗ , where 𝑛 is the degree of
                                             0
the polynomial modulus 𝑥 𝑛 + 1.                                                       3.3    Circuit Privacy Operation
  Proof. Define two distributions in terms of 𝑠, Δ𝑚, and 𝑒. Distri-                   We now give the full circuit privacy procedure and prove that it’s
bution 𝐷 1 produces samples by first sampling a uniformly random                      secure given a sufficiently small error term relative to the divisor 𝑞 0∗ .
   $                                                                                  The procedure, given in algorithm 1, is very simple: we add an en-
 − R𝑄 for 𝑄 = 𝑞 0∗ · 𝑞, then producing a sample of the form:
𝑎←                                                                                    cryption of zero and then perform the divide-and-round operation
                            [𝑎]𝑞0∗ · 𝑠 + Δ𝑚 + 𝑒
                          $                     %
                                                                                      described in section 3.1.
                   𝑑1 =
                                      𝑞 0∗
                                                                                      Algorithm 1 Circuit Privacy Procedure, denoted CP
This is identical to the real distribution of 𝑣. The second distribution              Input: Ciphertext ct, Public Key pk
                                                                   $
𝐷 2 produces samples by first sampling a uniformly random 𝑎 ←      − R𝑄                 ct𝑧𝑒𝑟𝑜 ← Encrypt(pk, 0)
and produces a sample of the following form:                                            ct+ = EvalAdd(ct
                                                                                               j k      𝑧𝑒𝑟𝑜 , ct)
                                 [𝑎]𝑞0∗ · 𝑠 + Δ𝑚                                        ct𝑐𝑝 = ct
                               $                 %                                                 +
                                                                                                𝑞 0∗
                         𝑑2 =
                                        𝑞 0∗                                          Output: ct𝑐𝑝

  We can bound the probability that any distinguisher can distin-
guish between the outputs of 𝐷 1 and 𝐷 2 by upper bounding the                           Algorithm 2 defines the full circuit-private evaluation algorithm
probability that these outputs are different. By corollary 3.4.1, we                  in terms of algorithm 1 and the original BFV Eval procedure.
can say that the marginal distribution of each coefficient of the
polynomial [𝑎]𝑞0∗ · 𝑠 + Δ𝑚 modulo 𝑞 0∗ is uniformly random, so this                   Algorithm 2 Circuit Private Eval Algorithm, denoted EvalCP
is upper bounded by 2𝑛 𝑞𝐵∗ .                                               □          Input: Ciphertexts ⟨ct𝑖 ⟩ = {ct𝑖 }𝑛𝑖=0 , Public Key pk, Function 𝑓
                             0
                                                                                        ct𝑖𝑛 ← BFV.Eval(evk, 𝑓 , ⟨ct𝑖 ⟩)
3.2     Avoiding Multi-Precision Arithmetic                                            ct𝑐𝑝 ←
                                                                                               $
                                                                                            − CP(ct𝑖𝑛 , pk)
The naïve approach to the above operation is to take an integer                       Output: ct𝑐𝑝
𝑥 in DCRT representation {𝑥 0, . . . , 𝑥𝑘−1 }, recombine into a multi-
precision integer, then perform a multi-precision divide and floor
by 𝑞 0∗ , then take the result mod 𝑞 0 . In total, this requires 2𝑘 integer              Finally we define BFV-CP to be a leveled homomorphic encryp-
multiplications, 𝑘 integer additions, 1 multi-precision divide-and-                   tion scheme specified by the tuple (BFV.KeyGen, BFV.Encrypt,
floor, and 𝑘 + 1 modular reductions for each coefficient.                             BFV.EncryptSK, BFV.Decrypt, EvalCP ).
    To avoid multi-precision arithmetic required when operating                          We will now argue that if the noise level of the ciphertext gen-
over elements modulo the full ciphertext modulus, we make use                         erated by BFV.Eval subroutine within the EvalCP does not exceed
                                                                        Î             a pre-specified bound, BFV-CP is circuit private (Definition 2.1).
of the linearity of the DCRT recombination. Let 𝑄 = 𝑖 𝑞𝑖 , let
𝑞𝑖∗ = 𝑄/𝑞𝑖 , and let 𝑞˜𝑖 = 𝑞𝑖∗ (mod 𝑞𝑖 ). From this equation, we have                 The high-level proof strategy is to define three hybrids (HybridReal ,
the following expression for division by 𝑞 0∗ :                                       HybridNoiseFree , HybridIdeal ) and show that they are indistinguish-
                         "                                         #                  able from the point of view of a computationally bounded distin-
             h𝑥 i                𝑘−1
                            1  Õ                      ∗
                                                                                    guisher.
                       = ∗            [𝑥𝑖 · 𝑞˜𝑖 ]𝑞𝑖 · 𝑞𝑖 − 𝑣 · 𝑞          (13)
               𝑞 0∗ 𝑞0     𝑞 0 𝑖=0                                                       Define D (ct ′, ⟨ct𝑖 ⟩, BFV.pk, BFV.sk, BFV.evk) → {0, 1} to be a
                                                                     𝑞0
                                                                                      PPT distinguisher, where ct𝑖 are valid BFV encryptions of messages
                                                                                      𝑚𝑖 respectively. The domain of the first argument (ct ′ ) is the set of
                         "      𝑘−1
                                                          #
                            1 Õ                      ∗

                       = ∗          [𝑥𝑖 · 𝑞˜𝑖 ]𝑞𝑖 · 𝑞𝑖                    (14)        BFV ciphertexts.
                           𝑞 0 𝑖=0
                                                            𝑞0
                         " 𝑘−1                     #                                       • HybridReal
                           Õ                   𝑞0                                            This hybrid corresponds to the real-world where the adver-
                       =       [𝑥𝑖 · 𝑞˜𝑖 ]𝑞𝑖 ·                            (15)               sary tries to learn the function being computed from the
                           𝑖=0
                                               𝑞𝑖
                                               𝑞0
                                                                                             ciphertext that is generated by the evaluator. Concretely, we
                                                          𝑞 0/𝑞𝑖
From equation 15 above, we have that only the terms 𝑞˜𝑖 and                                  define ct ′ ← EvalCP (⟨ct𝑖 ⟩, BFV.pk, 𝑓 ).
for all 𝑖 ∈ [𝑘] are needed to compute the desired term 𝑥/𝑞 0∗ 𝑞 .                            By the correctness of BFV, the intermediate ciphertext (ct𝑖𝑛 )
                                                                 0
Since these values only depend on the choice of factors of the                               computed in EvalCP is a valid encryption of 𝑓 (𝑚 1, . . . 𝑚𝑛 ).
ciphertext modulus, we can easily precompute them, which allows                              Represent this as ct𝑖𝑛 = (𝑎𝑖𝑛 , 𝑎𝑖𝑛 · 𝑠 + Δ𝑚 + 𝑒𝑖𝑛 ). Additionally,


                                                                                 34
Session 2                                                                                         WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


         represent pk = (𝑎 ′, 𝑎 ′𝑠 + 𝑒 ′ ).1 The encryption of zero has the                      Given a Hybrid, let 𝑃 [Hybrid] denote a probability defined as
         form,                                                                                follows,
                    ct𝑧𝑒𝑟𝑜 = (𝑎 ′𝑢 + 𝑒 ′′, 𝑎 ′𝑠𝑢 + 𝑒 ′𝑢 + 𝑒 ′′′ )                                          Pr[Hybrid] = Pr [D (·, ⟨ct𝑖 ⟩, sk, pk, evk) = 1]
        where 𝑢, 𝑒 ′′, 𝑒 ′′′ ← 𝜒.                                                             .
        Hence, the ciphertext ct+ has the form,                                                                    | |𝑒 | |
                                                                                                 Lemma 3.6. If 𝑛 𝑞+∗ ≤ 2−𝜆 for security parameter 𝜆, then no PPT
                                    ′      ′′                                                                       0
              ct+ [0] = 𝑎𝑖𝑛 + 𝑎 𝑢 + 𝑒 = 𝑎 +                                                   distinguisher can distinguish Hybrid0 and Hybrid1 with probability
              ct+ [1] = 𝑎𝑖𝑛 · 𝑠 + Δ𝑚 + 𝑒𝑖𝑛 + 𝑎 ′𝑠𝑢 + 𝑒 ′𝑢 + 𝑒 ′′′                             better than 2−𝜆 .
                       = 𝑎 + · 𝑠 + Δ𝑚 + 𝑒 +                                                               | Pr[HybridReal ] − Pr[HybridNoiseFree ]| ≤ 2−𝜆
        where 𝑒 + = 𝑒𝑖𝑛 + 𝑒 ′𝑢 + 𝑒 ′′′ − 𝑠 · 𝑒 ′′ .                                              Proof. We can upper bound the success of any PPT distin-
        Performing the divide-and-floor operation during the CP pro-                          guisher 𝐷 from distinguishing the outputs of HybridReal and
        cedure on ct+ results in a new ciphertext ct𝑞𝑢𝑜𝑡 = ⌊ct+ /𝑞 0∗ ⌋,                      HybridNoiseFree by the probability that the outputs of these hybrids
        which simplifies to,                                                                  are different. The only place where these hybrids differ is in the error
                                                                                              term that is added to the numerator. In equation 18 in HybridReal ,
                          j𝑎 k
                               +
             ct𝑞𝑢𝑜𝑡 [0] = ∗ = 𝑎 +′                                (16)
                             𝑞0                                                               there is an 𝑒 + term that is added in the flooring numerator, while
                          j 𝑎 · 𝑠 + Δ𝑚 + 𝑒 k                                                  in equation 20 in hybrid HybridNoiseFree , this term is omitted.
                               +                 +
             ct𝑞𝑢𝑜𝑡 [1] =                                         (17)
                                     𝑞 0∗                                                        To invoke lemma 3.5, we must show that each coefficient of
                                     j [𝑎 + ]𝑞 ∗ · 𝑠 + Δ𝑚 + 𝑒 + k                             𝑎 + is uniformly random modulo 𝑞 0∗ . This can be seen through a
                        = 𝑎 +′ · 𝑠 +           0
                                                                  (18)                        straightforward application of the same argument used in the proof
                                                    𝑞 0∗                                      of lemma 3.5 on the coefficients of the product 𝑎 ′ · 𝑢. Once we
       where 𝑎 + = 𝑎 +′ · 𝑞 0∗ + [𝑎 + ]𝑞0∗ .                                                  show that these coefficients are random, the fixed offsets defined
       Note that the ciphertext ct ′ := ct𝑞𝑢𝑜𝑡 .                                              by the coefficients of the rest of the terms in 𝑎 + do not change this
     • HybridNoiseFree                                                                        fact. Therefore, we can invoke lemma 3.5 on the 𝑣 term in hybrid
                                                                                              HybridReal to show that this term hides 𝑣 and is statistically close
        In this hybrid, we define a new simulator that outputs a ci-                          to the distribution of the 𝑣 terms in HybridNoiseFree .
        phertext for the distinguisher D, that is very similar to the                                                                                               □
        HybridReal . The main difference is that this simulator tweaks                           Lemma 3.7. Assuming the hardness of RLWE𝑛,𝑞,𝜒 , no PPT dis-
        the computation of ct𝑞𝑢𝑜𝑡 [1] to set the 𝑒 + term to zero. Con-                       tinguisher D can distinguish HybridNoiseFree from HybridIdeal with
        cretely, we define a simulator Sim(1𝑙 , pk, evk, sk, 𝑓 , ⟨𝑐𝑡𝑖 ⟩) →                    non-negligible advantage.
        ct ′ , which recomputes all the quantities computed in the
                                                                                                        | Pr[HybridNoiseFree ] − Pr[HybridIdeal ]| ≤ negl(𝜆)
        HybridReal . It then uses the secret key sk to compute 𝑒𝑖𝑛 .
        Subsequently it computes 𝑒 + and which can be subtracted                                 Proof. Assume a distinguisher D exists that can distinguish
        from the ct+ [1].                                                                     HybridNoiseFree from HybridIdeal with probability at least 𝛿. We can
        Thus, the output of this simulator ct ′ simplifies to,                                use D to construct an adversary A that breaks the RLWE problem
                              j𝑎 k
                                   +                                                          with advantage 𝛿. Given an RLWE sample (𝑎, 𝑏), we can construct
                   ct ′ [0] = ∗ = 𝑎 +′                                 (19)                   a key-pair and ciphertext such that the ciphertext will have the
                                 𝑞0
                                         j [𝑎 + ]𝑞 ∗ · 𝑠 + Δ𝑚 k                               form of a HybridNoiseFree sample if 𝑏 = 𝑎 · 𝑢 + 𝑒 and the form of a
                   ct ′ [1] = 𝑎 +′ · 𝑠 +           0
                                                                       (20)                   HybridIdeal sample if 𝑏 is uniform over R𝑞 .
                                                     𝑞 0∗                                                                                                      $
                                                                                                 Given an RLWE sample (𝑎, 𝑏), begin by sampling 𝑠 ′, 𝑒 ′ ←        − 𝜒 and
     • HybridIdeal                                                                            set sk = 𝑠 ′ and pk = (𝑎, 𝑎𝑠 ′ + 𝑒 ′ ) = (pk0, pk1 ). Next, set 𝑟 ′ = 𝑏 and
       This hybrid corresponds to the ideal world where the adver-                            compute ct𝑞𝑢𝑜𝑡 as in equation 22. If 𝑏 = 𝑎𝑢 + 𝑒 then 𝑟 ′ is identically
       sary only has access to the function output 𝑚 = 𝑓 (𝑚 1 . . . 𝑚𝑛 )                      distributed to 𝑎 + in HybridNoiseFree and if 𝑏 is uniform then 𝑟 ′ is
       but has no access to the function 𝑓 . We define a new simula-                          identically distributed to HybridIdeal . Therefore, any PPT distin-
       tor Sim(1𝑙 , pk, evk, sk, 𝑚) → ct ′ whose output will be sent                          guisher D that can distinguish HybridNoiseFree from HybridIdeal
       to the distinguisher D. Note that this is the same simulator                           can solve decisional RLWE with the same probability.                      □
       required by definition 2.1.
       The simulator samples a uniformly polynomial 𝑟 from R𝑞 . It                               Theorem 3.8 (Circuit Privacy). Given that the input ciphertext
       then outputs,                                                                          and public key are well-formed, the input ciphertext error has magni-
                             j𝑟 k                                                             tude bounded by 𝐵 ′ and the error distribution 𝜒 is 𝐵-bounded, and
                  ct ′ [0] = ∗ = 𝑟 ′                                (21)                      for a security parameter 𝜆 we have
                               𝑞0
                                       j [𝑟 ]𝑞 ∗ · 𝑠 + Δ𝑚 k                                                                   2𝑛𝐵 2 + 𝐵 + 𝐵 ′
                                                                                                                       2𝑛                     ≤ 2−𝜆                 (23)
                  ct ′ [1] = 𝑟 ′ · 𝑠 +         0
                                                                    (22)                                                            𝑞 0∗
                                                 𝑞 0∗
1 Since we are performing a simple homomorphic operation without homomorphic                  then algorithm 2 achieves 2−𝜆 -circuit privacy as defined in definition
multiplications or rotations, it is sufficient for evk to simply be the public key pk.        2.1.


                                                                                         35
Session 2                                                                                     WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


   Proof. This follows by combining lemmas 3.6 and 3.7, since                                Algorithm 3 Receiver VOLE Protocol
HybridReal is equivalent to the distribution of outputs of algorithm                         Input: 𝑥 ∈ Z𝑝 , Secret Key sk ∈ R𝑞
2 and HybridIdeal is a distribution that is independent of the circuit                                 $
used to compute the final message. The magnitude of 𝑒 + in lemma                                ct𝑖𝑛 ←
                                                                                                     − Encrypt(sk, 𝑥)
                                                                                                            𝑛𝑒𝑡
3.6 is 𝐵 ′ plus the magnitude of the error of a public key encryption                           Sender ←−−− ct𝑖𝑛
                                                                                                   (𝑖)        𝑛𝑒𝑡
scheme using error distribution 𝜒. The bound in equation 23 follows.                            {ct𝑟𝑒𝑠 }𝜏𝑖=1 ←−−− Sender
Therefore, algorithm 2 achieves 2−𝜆 -circuit privacy as defined in                                                                 (𝑖)
                                                                                              {𝜸 (𝑖) }𝜏𝑖=1 ← Decrypt(sk, ct𝑟𝑒𝑠 )
definition 2.1.                                                     □
                                                                                             Output: 𝜸 = (𝜸 (1) , 𝜸 (2) , . . . , 𝜸 (𝜏) )
3.4     Correctness
In this subsection, we give the correctness of our circuit privacy                           Algorithm 4 Sender VOLE Protocol
procedure. Note that since we’ve already shown the security of this                          Input: VOLE inputs 𝜶 , 𝜷 ∈ Z𝑚
                                                                                                                         𝑝 , Public Key pk
operation, we don’t have to worry about equivalent expressions                                        𝑛𝑒𝑡
leaking information about the original error term.                                              ct𝑖𝑛 ←−−− Receiver
   We begin with the expression for the ciphertext quotient.                                    for 𝑖 from 1 to 𝜏 do
                                                                                                     (𝑖)
                                                                                                   ct𝑜𝑙𝑒 ← EvalAddPlain(EvalMultPlain(ct𝑖𝑛 , 𝜶 (𝑖) ), 𝜷 (𝑖) )
                                                 [𝑎]𝑞0∗ · 𝑠 + Δ𝑚 + 𝑒
                                               $                             %
                                                                                                  (𝑖)             (𝑖)
           ′ ′                    ′ ′
   ct = 𝑎 , 𝑎 · 𝑠 + 𝑣 = 𝑎 , 𝑎 · 𝑠 +                                                              ct𝑐𝑝 ← CP(ct𝑜𝑙𝑒 , pk)
                                                            𝑞 0∗                              end for
                                                                                                        𝑛𝑒𝑡            (𝑖)
                                                                                              Receiver ←−−− ct𝑐𝑝 = {ct𝑐𝑝 }𝜏𝑖=1
                       $                                                 %
                         [𝑎]𝑞0∗ · 𝑠 + 𝑒      1 𝑄 [𝑄] 𝑝
           ′ ′
      = 𝑎 , 𝑎 ·𝑠 +                ∗      + ∗( −                     )𝑚                       Output: ⊥
                               𝑞0            𝑞0 𝑝             𝑝
                       $                                             %
                         [𝑎]𝑞0∗ · 𝑠 + 𝑒       𝑞0     [𝑄] 𝑝
           ′ ′                                                                   (24)
      = 𝑎 , 𝑎 ·𝑠 +                       +   (     −             )𝑚
                               𝑞 0∗            𝑝      𝑞 0∗ · 𝑝                                  In sections 4.1 and 4.2, we show the security and correctness of
                                                                                             the VOLE protocol defined by algorithms 3 and 4.
                                           $                   [𝑄 ]            %
                      j𝑞 k                  [𝑎]𝑞0∗ · 𝑠 − 𝑝 𝑝 𝑚 + 𝑒
           ′ ′             0
      = 𝑎 , 𝑎 ·𝑠 +
                          𝑝
                             𝑚 + 𝑒𝑓 +
                                                           𝑞 0∗
                                                                                             4.1     Security
                                                                                           We will now argue the security of the VOLE protocol.
      = 𝑎 ′, 𝑎 ′ · 𝑠 + Δ ′𝑚 + 𝑒 𝑓 + 𝑣 ′
                                                                                                Lemma 4.1 (Security Against Sender). If the parameters of
    where Δ ′ = ⌊𝑞 0 /𝑝⌋ and 𝑒 𝑓 is the small error term (||𝑒 𝑓 || ≤ 1)                      the homomorphic encryption scheme are chose to satisfy semantic
introduced by removing ⌊𝑞 0 /𝑝⌋𝑚 from the flooring term. In section                          security, algorithm 3 achieves security against an honest-but-curious
4, we will give concrete parameters that result in sufficiently small                        sender.
errors to allow for decryption correctness, but at a high level this                           Proof. This follows directly from the semantic security of the
ciphertext is decryptable if                                                                 encryption scheme.                                             □
                               Δ ′ /2 > ||𝑣 ′ + 𝑒 𝑓 ||
                                                                                                The security against a semi-honest receiver is dependent on the
                                                                                             choosing parameters that satisfy the constraints given in theorem
4     OUR VOLE PROTOCOL                                                                      3.8.
In this section, we give our VOLE protocol. Let 𝑚 be the length
of the VOLE vectors and let 𝑛 be the dimension of the ring R =                                  Lemma 4.2 (VOLE Circuit Privacy Parameter). Let 𝜒 be a
Z[𝑥]/(𝑥 𝑛 + 1). Our VOLE protocol splits the vectors in Z𝑚                                   𝐵-bounded error distribution. For error distribution 𝜒, polynomial
                                                              𝑝 over
𝜏 = ⌈𝑚/𝑛⌉ elements of R𝑝 , padding with zeros when necessary. The                            degree 𝑛, plaintext modulus 𝑝, ciphertext modulus 𝑞 = 𝑞 0 · 𝑞 0∗ , and
sender then uses homomorphic encryption operations to compute                                security parameter 𝜆, the following parameter bound will result in
the VOLE result, which is decrypted by the receiver.                                         the homomorphic evaluation in algorithm 4 to maintain 2−𝜆 -circuit
   In more detail, the receiver begins the protocol by encoding the                          privacy.
input 𝑥 ∈ Z𝑝 in a single BFV ciphertext ct𝑥 , which it sends to the                                                  2𝑛𝐵 2 + 𝐵 + 𝑛𝑝𝐵
                                                                                                                  2𝑛                 ≤ 2−𝜆                      (25)
sender. The sender’s inputs 𝜶 , 𝜷 ∈ Z𝑚   𝑝 are split into 𝜏 elements                                                        𝑞 0∗
(𝜶 , 𝜶 , . . . , 𝜶 ) and (𝜷 , 𝜷 , . . . , 𝜷 (𝜏) ). For each of the 𝜏
    (1)  (2)        (𝜏)        (1)   (2)
                                                                                                Proof. This follows from theorem 3.8 if we show that the input
blocks, the sender computes a ciphertext                                                     to the circuit privacy procedure has an error term with magnitude
        ct (𝑖) = EvalAddPlain(EvalMultPlain(ct𝑥 , 𝜶 (𝑖) ), 𝜷 (𝑖) )                           bounded by 𝑛𝑝𝐵. This follows directly from the analysis of the error
                                                                                             growth of the EvalMultPlain operation. The encrypted input to the
The sender then performs the circuit privacy procedure from section                          EvalMultPlain function in algorithm 4 is a fresh encryption of the
3 and returns each of the resulting ciphertexts to the receiver. The                         receiver’s input which has an error term with magnitude upper
receiver then decrypts the 𝜏 blocks to obtain the VOLE result.                               bounded by 𝐵, so the magnitude of the resulting error term is no
   The pseudo-code for the sender and receiver roles are given in                            more than 𝑛𝑝𝐵. The bound in equation 25 follows from theorem
algorithms 3 and 4.                                                                          3.8 where 𝐵 ′ = 𝑛𝑝𝐵.                                              □


                                                                                        36
Session 2                                                                         WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


   Lemma 4.3 (Security Against Receiver). Given that the param-                  Algorithm 5 Receiver BOLE Protocol
eters of the homomorphic encryption scheme are chosen to satisfy the             Input: x ∈ Z𝑚  𝑝 , Secret Key sk ∈ R𝑞
circuit privacy constraint in lemma 4.2, algorithm 4 achieves security
                                                                                   Let (x , x (2) , . . . , x (𝜏) ) ← x, where 𝜏 = ⌈𝑚/𝑛⌉
                                                                                          (1)
against an honest-but-curious receiver.
                                                                                   for 𝑖 from 1 to 𝜏 do
                                                                                        (𝑖) $
   Proof. The proof of this lemma relies on the circuit privacy                      ct𝑖𝑛 ←  − Encrypt(sk, x (𝑖) )
result of theorem 3.8. Note that we crucially rely on the receiver’s               end for
                                                                                              𝑛𝑒𝑡      (𝑖)
initial ciphertext being well-formed. If this is the case, then a simula-          Sender ←−−− {ct𝑖𝑛 }𝜏𝑖=1
tor S can use the simulator from theorem 3.8 to return a ciphertext                     (𝑖)       𝑛𝑒𝑡
                                                                                    {ct𝑟𝑒𝑠 }𝜏𝑖=1 ←−−− Sender
to the receiver that is independent of the circuit used to compute                                                     (𝑖)
the final message 𝜸 by sampling a ciphertext as in Hybrid2 from                   {𝜸 (𝑖) }𝜏𝑖=1 ← Decrypt(sk, ct𝑟𝑒𝑠 )
section 3. Since S knows the correct output message 𝜸 , this ci-                 Output: 𝜸 = (𝜸 (1) , 𝜸 (2) , . . . , 𝜸 (𝜏) )
phertext is indistinguishable from the output of the circuit privacy
procedure in the real algorithm 4. Therefore, by theorem 3.8, the
message produced by algorithm 4 is indistinguishable from the                    dimension 𝑛. These parameter sets must satisfy both the computa-
output produced by the simulator that samples a ciphertext from                  tional security requirements of the RLWE problem as well as the
Hybrid2 .                                                              □         security requirements of the circuit-privacy procedure. In the VOLE
                                                                                 protocol, there are no restrictions on the plaintext modulus 𝑝 other
   Theorem 4.4 (Secure VOLE Protocol). Algorithms 3 and 4                        than its size, so we opt for powers of two for ease of comparison. In
define of a secure VOLE protocol.                                                the BOLE protocol, the plaintext modulus must support the NTT
                                                                                 operation to encode the vectors as polynomials for component-wise
    Proof. This follows from combining lemmas 4.1 and 4.3.            □          operations. This requires that Z𝑝 contain a 2𝑛 th root of unity.
                                                                                    From the previous section, we have several constraints on the
4.2    Correctness                                                               parameters for the protocol to satisfy both security and correctness.
We will now give parameters for which the VOLE protocol is correct.                 (1) For circuit privacy parameter 𝜆𝑐𝑝 , lemma 4.2 gives the fol-
                                                                                        lowing lower bound on 𝑞 0∗ .
    Lemma 4.5 (Correctness). Let 𝜒 be a 𝐵-bounded error distribu-
tion. For error distribution 𝜒, polynomial degree 𝑛, plaintext modulus                                      2𝑛𝐵 2 + 𝐵 + 𝑛𝑝𝐵
𝑝, and ciphertext modulus 𝑞 = 𝑞 0 · 𝑞 0∗ . Algorithms 3 and 4 achieve                                1−𝑛                    ≥ 1 − 2−𝜆𝑐𝑝
                                                                                                                   𝑞 0∗
the correct VOLE functionality as defined in definition A.1 if 𝑞 0∗ is
chosen to satisfy the circuit privacy constraint from lemma 4.2 and                                      𝑞 0∗ ≥ 2𝜆𝑐𝑝 · 𝑛(2𝑛𝐵 2 + 𝐵 + 𝑛𝑝𝐵)
𝑞 0 > 2𝑝𝑛𝐵 + 2𝑝 + [𝑞 0 ] 𝑝 , where 𝑣 is the compressed error term from              (2) For correctness, lemma 4.5 gives the following lower bound
equation 3.                                                                             on 𝑞 0 .
                                                                                                            𝑞 0 > 2𝑝𝑛𝐵
    Proof. This follows directly from the BFV scheme.                 □
                                                                                     (3) Finally, for semantic security, we must choose 𝑛 and 𝑞 to
                                                                                         satisfy the desired security level. These parameters are given
4.3    Simple Extension to Batched OLE
                                                                                         in table 1 in the Homomorphic Encryption Standard [2]. In
The extension of this VOLE protocol to a BOLE protocol is straight-                      particular, this table gives a maximum size of 𝑞 for a given
forward and preserves the security proofs of section 4.1. In addition,                   value of 𝑛. We must select an 𝑛 that is large enough for our
this BOLE extension maintains the exact communication complex-                           choice of 𝑞.
ity of the VOLE protocol. Only the receiver’s role is modified in this
                                                                                    Since we have competing constraints on 𝑞, it is not immediately
extension. We show the receiver’s BOLE procedure in algorithm 5.
                                                                                 clear that we can pick parameters that will be able to satisfy all
The receiver begins with a vector x of values, which she encodes
                                                                                 these constraints. Luckily, there are many parameter sets that satisfy
as a polynomial as described in the full version of this paper. The
                                                                                 these conditions.
operations performed by the sender are identical. To obtain the
                                                                                    Our implementation of the discrete Gaussian sampling algorithm
BOLE output, the receiver must decode the decryption result by
                                                                                 follows [16], which takes advantage of the small standard deviation
evaluating the resulting polynomial at the roots of unity determined
                                                                                 to simply compute the probability of sampling all of the most com-
by the encoding NTT parameters.
                                                                                 mon integers. This truncated distribution is statistically close to the
                                                                                 real discrete Gaussian distribution while still allowing us to bound
5     IMPLEMENTATION & PERFORMANCE
                                                                                 the maximum size of a sampled term. Following the homomorphic
In this section, we give an implementation of the VOLE and BOLE                  encryption standard, we use a standard deviation of 𝜎 = 3.2. Includ-
protocols presented in section 4.                                                ing the sign bit, no term sampled from 𝜒 is greater than five bits,
                                                                                 so 𝐵 = 32 for all our parameter sets.
5.1    Parameter Selection                                                          In most of the parameter sets in this section, each limb of the
In both of these protocols, there are four parameters that must se-              ciphertext modulus will be 55 bits. For a circuit privacy parameter of
lected: the magnitude bound 𝐵 of the 𝐵-bounded error distribution                𝜆𝑐𝑝 = 80, we will use a ciphertext modulus consisting of four limbs
𝜒, the plaintext modulus 𝑝, the ciphertext modulus 𝑞, and the ring               for a total of 220 bits. This is the maximum size of the modulus for


                                                                            37
Session 2                                                                     WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


𝑛 = 213 in table 1 of the HE standard [2]. For a plaintext modulus
𝑝 = 232 , these parameters satisfy the three constraints given above.

5.2     Extending to Larger Moduli
There are some applications which require moduli that are larger
than can be supported by a final 𝑞 0 that fits in a standard machine
word. We handle these moduli by using the same DCRT represen-
tation as the ciphertext modulus. In words, for a plaintext modulus
𝑝 that is too large for a standard machine word, we represent 𝑝 as
                            Îℓ
a product of primes 𝑝 = 𝑖=0      𝑝𝑖 where each 𝑝𝑖 is small enough            Figure 1: Time per OLE multiplication vs. VOLE length compared
to be supported by a 𝑞 0 that fits in a standard machine word for            to [3] (denoted ADI+17). Times are measured in microseconds, and
the given circuit privacy parameter. This allows us to extend the            the VOLE protocol is over a 32-bit field. The runtime for the noise-
support of the protocol in section 4 to larger moduli with a factor          flooding variant of our protocol was estimated by implementing the
of ℓ overhead.                                                               noise sampling function using NTL [33] and adding this sampling
                                                                             time to our protocol time.
5.3     Optimizing for Low-Bandwidth Networks
In section 5.4, we will benchmark our implementation in an envi-
ronment with low network bandwidth. We define this as a setting
where the network bandwidth is the bottleneck of the protocol. To
reduce the overall communication of our protocol, we include in
our implementation a common optimization to reduce the size of a
fresh Ring LWE encryption by a factor of two. The idea is simple:
instead of sampling a random 𝑎 polynomial for a ciphertext (𝑎, 𝑏),
sample a random PRG seed 𝜎 and generate 𝑎 ← 𝑃𝑅𝐺 (𝜎) to be the
output of the PRG. The remainder of the ciphertext is generated
as if 𝑎 were sampled normally, but when it comes time to send
the ciphertext we only need to send the pair (𝜎, 𝑏), replacing the           Figure 2: Time for a single run of the VOLE protocol vs. VOLE
element of R𝑞 with a 16 byte PRG seed.                                       length compared to [3] (denoted ADI+17). Times are mea-
                                                                             sured in seconds, and the VOLE protocol is over a 32-bit field.
5.4     Experimental Setup & Results
We implement2 the VOLE and BOLE protocols from section 4 on                  slower than the machine used for the benchmarks in [3], but this
top of an implementation of the BFV [11, 17] homomorphic en-                 is not considered in our comparisons (all numbers are reported
cryption scheme based on the BFV-RNS implementation of Halevi,               without scaling from [3]).
Polyakov, and Shoup [20] in addition to code from the work of                   Using this setup, we took two types of benchmarks. The first
Juvekar, Vaikuntanathan, and Chandrakasan [24]. We use an NTT                benchmark is of the protocol running between the two machines
implementation based off of the NFLlib library of Aguilar-Melchor,           using all of the threads. Based on this runtime, we divided by the
Barrier, Guelton, Guinet, Killijian, and Lepoint [1].                        length of the VOLE protocol to get the time per OLE multiplication.
   Asymptotically Linear Comparisons. As mentioned in section                We then ran the protocol once using a single thread per machine
1.2, we separate prior work into two categories and compare to               to measure the latency, the time that a higher level application
each category separately. The first category consists of protocols           must wait for the protocol to complete. We compare these results
with (asymptotically) linear communication complexity, and, more             against the numbers given in [3] (see tables 2 and 3 in [3]) for their
heuristically, protocols that are optimized for small VOLE lengths.          optimized 32 bit protocol. These comparisons are given in figures
The fastest protocol in this category that we are aware of is the            1 and 2. We note that the trend lines for the [3] protocol do not
work of Applebaum, Dåmgard, Ishai, Nielsen, and Zichron [3]. To              consider the increase in communication of the protocol, which
compare against [3], we ran our protocol on two AWS m5.2xlarge               could result in further slowdown in settings where the bandwidth
instances, each having 8 vCPUs at 3.1 GHz and 32 GB of RAM.                  is constrained. We also note that this graph includes an estimated
These instances were in the same geographic region (US-East) and             benchmark for the time per OLE for our protocol if the circuit
were connected by a network with a bandwidth of 500 MB/sec. This             privacy analysis from section 3 is not considered and noise flooding
setup was intended to replicate the conditions of the experiments            is used to achieve circuit privacy. This runtime is estimated by
of [3] (see section 6.5). Our network bandwidth is much lower than           implementing the sampling function for the noise flooding term in
in their setup which favors [3] as their communication complexity            NTL [33], then adding the time for this sampling algorithm to our
does not exceed the bandwidth of the network we used. We employ              runtime.
the communication optimization described in section 5.3. We also                Finally, we measured the communication complexity of a sin-
note that the clock speed of the CPU of our machine is slightly              gle run of our VOLE protocol and compared it to estimates of the
                                                                             communication complexity of [3] based on the reported consumed
2 https://github.com/leodec/ole_wahc                                         bandwidth for their protocol. This comparison is displayed in figure


                                                                        38
Session 2                                                                       WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


                                                                               Figure 4: Time for a single run of the VOLE protocol vs VOLE
Figure 3: Total communication in kilobytes for a single run                    length compared to [31] (denoted SGRR19). The runtime is
of the VOLE protocol vs. VOLE length compared to [3] (de-                      measured in seconds, and the VOLE is over a 32-bit field.
noted ADI+17). The VOLE is over a 32-bit field. Note that the                  Both axes are shown in log scale to better display the trends.
slope of the ADI+17 trend line is greater than the slope of our                Based on extrapolation, we estimate the intersection point
protocol’s trend line. We estimate the cross-over point in the                 in the runtimes to be near a VOLE length of 235 .
communication complexity to be a VOLE length of around
216 .


3. For small VOLE, the protocol of [3] achieves better communica-
tion complexity than our protocol for VOLE lengths less than 216 ,
although beyond this length our protocol achieves better commu-
nication complexity.
    Asymptotically Sublinear Comparisons. The second category of
prior art consists of protocols with sub-linear communication com-
plexity. These protocols are designed to be optimized for large
VOLE lengths, and due the sub-linear communication they will
                                                                               Figure 5: Total communication for a single run of the VOLE
always be asymptotically faster than our protocol. Despite this, our
                                                                               protocol vs VOLE length compared to [31] (denoted SGRR19).
protocol is faster for smaller VOLE lengths, and our comparison
                                                                               The communication is measured in megabytes, and the
aims to determine VOLE length at which one protocol becomes
                                                                               VOLE is over a 32-bit field. Both axes are shown in log scale
faster than the other. By determining this point, we are able to
                                                                               to better display the trends. Based on extrapolation, we esti-
discuss applications for which our protocol is more efficient and
                                                                               mate the intersection point in the communication to be near
when it makes sense to switch to a protocol optimized for larger
                                                                               a VOLE length of 232 .
VOLE sizes. To our knowledge, the fastest protocol in this category
is the work of Schoppmann, Gascón, Reichert, and Raykova [31],
which is an optimized two-party implementation of the FSS-based                is able to generate arbitrary VOLE correlations directly. While there
sub-linear vector VOLE [8].                                                    is a simple and secure reduction from random VOLE to arbitrary
    Since the authors of [31] included their code, we were able to             VOLE, it is an interesting future direction to try to further optimize
run side-by-side comparisons between our protocol and theirs.                  our protocol for generating random VOLE correlations.
Following the setup from their work, we ran each protocol on a
single thread on two machines connected by a network with 500                  5.5    Comparison Against Subsequent Works
MB/sec bandwidth. The results of these comparisons are given in
                                                                               In this section, we briefly compare against works subsequent to the
figures 4 and 5. By extrapolating the data in figure 4, we estimate the
                                                                               online publication of this work.
the cross-over point is near a VOLE length of 235 . Extrapolating the
communication complexity trends in figure 5 gives an approximate                  Comparisons to Asymptotically Linear Protocols. We compare
cross-over point near a VOLE length of 232 . In section 5.6, we give           against the one-round BOLE protocol of Baum et al. [5]. This work
some applications that require VOLE protocols of width less than               presents several different BOLE protocols, and we compare against
this cross-over point.                                                         the version the authors chose to benchmark. This protocol assumes
    We note briefly that the protocol in [31] requires running a               that the two parties each begin with the public key of the other
smaller VOLE protocol that it then expands using an LPN code.                  party, and we do not count this key exchange as part of the com-
When generating an exceptionally long VOLE correlation, one could              munication of this protocol. Aside from the round complexity, our
run a protocol that recursively calls [31] until it reaches a point            BOLE protocol outperforms the one-round protocol in both com-
where our protocol is faster, then runs our protocol as the base               munication and computation. To compare this one-round protocol
case. It is an interesting future direction to combine our protocol            to our C++ implementation, we implemented the one-round BOLE
with [31] to improve the performance of VOLE with sub-linear                   protocol using the same underlying RLWE library as our BOLE
communication.                                                                 protocol. Based on these comparisons, our protocol is a 20-30×
    We also note that both protocols in this section focus on optimiz-         faster computationally in experiments where each party is running
ing the generation of random VOLE correlations while our protocol              in a single thread connecting over localhost. Our benchmarks


                                                                          39
Session 2                                                                    WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


for the one-round protocol for a 60-bit plaintext was less than one         PALISADE library. We gratefully acknowledge funding from the
second, faster than Golang benchmark given in Baum et al. [5] with          Jameel-Clinic and from the National Science Foundation.
the same parameters, which was roughly 1.5 seconds. Our total
communication is strictly less than the one-round protocol, since
one of their ciphertexts requires essentially the same LPR overhead         REFERENCES
as ours, while the other ciphertext that they send requires double           [1] Carlos Aguilar-Melchor, Joris Barrier, Serge Guelton, Adrien Guinet, Marc-Olivier
the LPR overhead.                                                                Killijian, and Tancrède Lepoint. 2016. NFLlib: NTT-Based Fast Lattice Library.
                                                                                 In Topics in Cryptology - CT-RSA 2016, Kazue Sako (Ed.). Springer International
   Comparisons to Asymptotically Sublinear Protocols. To compare                 Publishing, Cham, 341–356.
to the subsequent VOLE protocol of Weng et al. [34], we note that            [2] Martin Albrecht, Melissa Chase, Hao Chen, Jintai Ding, Shafi Goldwasser, Sergey
                                                                                 Gorbunov, Shai Halevi, Jeffrey Hoffstein, Kim Laine, Kristin Lauter, Satya Lokam,
this work claims a roughly 20× computational speedup over Schopp-                Daniele Micciancio, Dustin Moody, Travis Morrison, Amit Sahai, and Vinod
mann et al. [31]. We can extrapolate this comparison to estimate                 Vaikuntanathan. 2018. Homomorphic Encryption Security Standard. Technical
the crossover point for the Weng et al. protocol, which is roughly a             Report. HomomorphicEncryption.org, Toronto, Canada.
                                                                             [3] Benny Applebaum, Ivan Damgård, Yuval Ishai, Michael Nielsen, and Lior Zichron.
VOLE length of 228 . For communication, the Weng et al. protocol a               2017. Secure Arithmetic Computation with Constant Computational Overhead.
setup cost of 1.1 MB and subsequent communication of 0.42 bits                   In Advances in Cryptology – CRYPTO 2017, Jonathan Katz and Hovav Shacham
per VOLE correlation puts the crossover point with our protocol to               (Eds.). Springer International Publishing, Cham, 223–254.
                                                                             [4] Abhishek Banerjee, Chris Peikert, and Alon Rosen. 2012. Pseudorandom
be near a VOLE length of 215 .                                                   Functions and Lattices. In Advances in Cryptology – EUROCRYPT 2012, David
   To compare against the BOLE protocol of Boyle et al. [10], we                 Pointcheval and Thomas Johansson (Eds.). Springer Berlin Heidelberg, Berlin,
                                                                                 Heidelberg, 719–737.
first note that the work does not give the benchmarks of an actual           [5] Carsten Baum, Daniel Escudero, Alberto Pedrouzo-Ulloa, Peter Scholl, and
implementation, only estimates of the computation time. They give                Juan Ramón Troncoso-Pastoriza. 2020. Efficient Protocols for Oblivious Lin-
various parameter sets to allow for a communication-computation                  ear Function Evaluation from Ring-LWE. In Security and Cryptography for Net-
                                                                                 works, Clemente Galdi and Vladimir Kolesnikov (Eds.). Springer International
trade-off. For 128 bits of security, The estimates they provide for              Publishing, Cham, 130–149.
generating 220 batch OLE correlations give performance that ranges           [6] Avrim Blum, Adam Tauman Kalai, and Hal Wasserman. 2003. Noise-Tolerant
from roughly 13 seconds with nearly 18 MB of communication to                    Learning, the Parity Problem, and the Statistical Query Model. Journal of the ACM
                                                                                 (JACM) 50, 4 (July 2003), 506–519. https://www.microsoft.com/en-us/research/
nearly 20 seconds with communication roughly 1.5 MB. The plain-                  publication/noise-tolerant-learning-parity-problem-statistical-query-model/
text modulus of these BOLE correlations was roughly 2124 , and               [7] Florian Bourse, Rafaël Del Pino, Michele Minelli, and Hoeteck Wee. 2016. FHE
                                                                                 Circuit Privacy Almost for Free. In Advances in Cryptology – CRYPTO 2016,
while our BOLE protocol is not optimized for these large moduli, we              Matthew Robshaw and Jonathan Katz (Eds.). Springer Berlin Heidelberg, Berlin,
can estimate the performance for a plaintext modulus of this size by             Heidelberg, 62–89.
generating four 32-bit BOLE correlations for each 124-correlation            [8] Elette Boyle, Geoffroy Couteau, Niv Gilboa, and Yuval Ishai. 2018. Compressing
                                                                                 Vector OLE. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and
in [10] (this simulates a CRT splitting of a 128-bit plaintext mod-              Communications Security (Toronto, Canada) (CCS ’18). Association for Comput-
ulus). Our benchmarks for 222 BOLE correlations with a 32-bit                    ing Machinery, New York, NY, USA, 896–912. https://doi.org/10.1145/3243734.
plaintext modulus give a computation time of 3 seconds and a total               3243868
                                                                             [9] Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, Peter Rindal,
communication of about 200 MB.                                                   and Peter Scholl. 2019. Efficient Two-Round OT Extension and Silent Non-
                                                                                 Interactive Secure Computation. In Proceedings of the 2019 ACM SIGSAC Con-
5.6    Application: Secure Image Convolution                                     ference on Computer and Communications Security (London, United Kingdom)
                                                                                 (CCS ’19). Association for Computing Machinery, New York, NY, USA, 291–308.
There are many applications for a VOLE protocol, including many                  https://doi.org/10.1145/3319535.3354255
that do not require VOLE correlations of length greater than the            [10] Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, and Peter
cross-over points discussed in section 5.4. One concrete example                 Scholl. 2020. Efficient Pseudorandom Correlation Generators from Ring-LPN.
                                                                                 In Advances in Cryptology – CRYPTO 2020, Daniele Micciancio and Thomas
is image convolution for secure neural network evaluation. The                   Ristenpart (Eds.). Springer International Publishing, Cham, 387–416.
canonical image convolution operations takes in a two-dimensional           [11] Zvika Brakerski. 2012. Fully Homomorphic Encryption without Modulus Switch-
image and a small two-dimensional kernel and produces an output                  ing from Classical GapSVP. Proceedings of Advances in Cryptology-Crypto 7417
                                                                                 (08 2012). https://doi.org/10.1007/978-3-642-32009-5_50
image where each output pixel is the sum of the component-wise              [12] Zvika Brakerski and Vinod Vaikuntanathan. 2011. Efficient Fully Homomorphic
product of the kernel and a region of input image. While one can                 Encryption from (Standard) LWE. In IEEE 52nd Annual Symposium on Foundations
                                                                                 of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, Rafail
describe this operation using two-dimensional Fourier transforms,                Ostrovsky (Ed.). IEEE Computer Society, 97–106. https://doi.org/10.1109/FOCS.
one of the most efficient implementations of secure convolution                  2011.12
is to use VOLE. In particular, one can define a VOLE correlation            [13] Melissa Chase, Yevgeniy Dodis, Yuval Ishai, Daniel Kraschewski, Tianren Liu,
                                                                                 Rafail Ostrovsky, and Vinod Vaikuntanathan. 2019. Reusable Non-Interactive
for each element in the kernel and then perform a scalar-vector                  Secure Computation. In Advances in Cryptology – CRYPTO 2019, Alexandra
product with the elements of the image multiplied by the specific                Boldyreva and Daniele Micciancio (Eds.). Springer International Publishing,
kernel element. Even for large neural networks such as networks                  Cham, 462–488.
                                                                            [14] Leo de Castro, Chiraag Juvekar, and Vinod Vaikuntanathan. 2020. Fast Vector
that process mammograms for cancer diagnosis [35], the size of a                 Oblivious Linear Evaluation from Ring Learning with Errors. Cryptology ePrint
convolution does not exceed the size of the input image, which is                Archive, Report 2020/685. https://ia.cr/2020/685.
                                                                            [15] Léo Ducas and Damien Stehlé. 2016. Sanitization of FHE Ciphertexts. In Advances
2028 × 2028 = 222 pixels. Based on the benchmarks in section 5.4,                in Cryptology - EUROCRYPT 2016 - 35th Annual International Conference on the
we would outperform all prior works for the generation of these                  Theory and Applications of Cryptographic Techniques, Vienna, Austria, May 8-
VOLE correlations.                                                               12, 2016, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 9665), Marc
                                                                                 Fischlin and Jean-Sébastien Coron (Eds.). Springer, 294–310. https://doi.org/10.
                                                                                 1007/978-3-662-49890-3_12
ACKNOWLEDGMENTS                                                             [16] Nagarjun Dwarakanath and Steven Galbraith. 2014. Sampling from discrete
                                                                                 Gaussians for lattice-based cryptography on a constrained device. Applicable
We would like to thank Michael Nielsen for providing the code                    Algebra in Engineering, Communication and Computing 25 (06 2014). https:
for [3] and Yuriy Polyakov for answering our questions about the                 //doi.org/10.1007/s00200-014-0218-3


                                                                       40
Session 2                                                                                          WAHC ’21, November 15, 2021, Virtual Event, Republic of Korea


[17] Junfeng Fan and Frederik Vercauteren. 2012. Somewhat practical fully homomor-                let Ω𝑅 = Z𝑛𝑝 . The function 𝑓𝑆 (𝜉𝑆 , 𝜉𝑅 ) = ⊥ for all (𝜉𝑆 , 𝜉𝑅 ) ∈ Ξ𝑆 × Ξ𝑅 .
     phic encryption. https://eprint.iacr.org/2012/144.pdf                                        For 𝜉𝑆 = (𝜶 , 𝜷) and 𝜉𝑅 = 𝑥, define 𝑓𝑅 (𝜉𝑆 , 𝜉𝑅 ) = 𝜶 · 𝑥 + 𝜷 mod 𝑝.
[18] Craig Gentry. 2009. Fully homomorphic encryption using ideal lattices. STOC.
[19] Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. 2010. i-Hop Homomorphic
     Encryption and Rerandomizable Yao Circuits. In Advances in Cryptology - CRYPTO
                                                                                                      Definition A.2 (Batch Oblivious Linear Evaluation). For an
     2010, 30th Annual Cryptology Conference, Santa Barbara, CA, USA, August 15-19,               integer 𝑝 and 𝑛, define the protocol BOLE (𝑝,𝑛) to have the following
     2010. Proceedings (Lecture Notes in Computer Science, Vol. 6223), Tal Rabin (Ed.).           ideal functionality. Let Ξ𝑆 = Z𝑛𝑝 × Z𝑛𝑝 , let Ω𝑆 = ∅, let Ξ𝑅 = Z𝑛𝑝 , and
     Springer, 155–172. https://doi.org/10.1007/978-3-642-14623-7_9
[20] Shai Halevi, Yuriy Polyakov, and Victor Shoup. 2019. An Improved RNS Variant of              let Ω𝑅 = Z𝑛𝑝 . The function 𝑓𝑆 (𝜉𝑆 , 𝜉𝑅 ) = ⊥ for all (𝜉𝑆 , 𝜉𝑅 ) ∈ Ξ𝑆 × Ξ𝑅 .
     the BFV Homomorphic Encryption Scheme. In Topics in Cryptology – CT-RSA 2019                 For 𝜉𝑆 = (𝜶 , 𝜷) and 𝜉𝑅 = x, define 𝑓𝑅 (𝜉𝑆 , 𝜉𝑅 ) = 𝜶 ⊙ x + 𝜷 mod 𝑝.
     - The Cryptographers’ Track at the RSA Conference 2019, Proceedings (Lecture Notes
     in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
     Lecture Notes in Bioinformatics)), Mitsuru Matsui (Ed.). Springer-Verlag, 83–105.            A.2     Ring Learning with Errors
     https://doi.org/10.1007/978-3-030-12612-4_5
[21] Carmit Hazay, Yuval Ishai, Antonio Marcedone, and Muthuramakrishnan Venki-
                                                                                                  In this section, we define the decisional Ring Learning with Errors
     tasubramaniam. 2019. LevioSA: Lightweight Secure Arithmetic Computation. In                  (RLWE) [27] problem.
     Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications
     Security (London, United Kingdom) (CCS ’19). Association for Computing Ma-                      Definition A.3 (Decisional Ring Learning with Errors [27]).
     chinery, New York, NY, USA, 327–344. https://doi.org/10.1145/3319535.3354258                 For integers 𝑛 and 𝑞, a ring R𝑞 = Z𝑞 [𝑥]/(𝑥 𝑛 + 1), and an error
[22] Yuval Ishai and Anat Paskin. 2007. Evaluating Branching Programs on Encrypted
     Data. In Theory of Cryptography, Salil P. Vadhan (Ed.). Springer Berlin Heidelberg,          distribution 𝜒 over R𝑞 , the decisional Ring Learning with Errors
     Berlin, Heidelberg, 575–594.                                                                 problem RLWE𝑛,𝑞,𝜒 is to distinguish between samples of the form
[23] Yuval Ishai, Manoj Prabhakaran, and Amit Sahai. 2008. Founding Cryptography                                                      $                $
     on Oblivious Transfer – Efficiently. 572–591. https://doi.org/10.1007/978-3-540-             (𝑎, 𝑎𝑠 + 𝑒) and (𝑎, 𝑢) where 𝑎, 𝑢 ←
                                                                                                                                    − R𝑞 and 𝑠, 𝑒 ←
                                                                                                                                                  − 𝜒.
     85174-5_32
[24] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. 2018.                          In this work, we consider 𝜒 to be the discrete, zero-centered
     GAZELLE: A Low Latency Framework for Secure Neural Network Inference. In
     27th USENIX Security Symposium (USENIX Security 18). USENIX Association, Bal-
                                                                                                  Gaussian distribution over R𝑞 . This follows the homomorphic en-
     timore, MD, 1651–1669. https://www.usenix.org/conference/usenixsecurity18/                   cryption security standard [2] from which we get the concrete
     presentation/juvekar                                                                         parameters used in the implementation in section 5.
[25] Kim Laine. 2019. https://github.com/microsoft/SEAL/issues/89.
[26] Vadim Lyubashevsky and Daniele Micciancio. 2006. Generalized Compact Knap-
     sacks Are Collision Resistant. In Automata, Languages and Programming, Michele               A.3     Homomorphic Encryption
     Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener (Eds.). Springer
     Berlin Heidelberg, Berlin, Heidelberg, 144–155.                                              In this section, we give high level definitions for the algorithms
[27] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. 2010. On Ideal Lattices and               comprising a leveled homomorphic encryption scheme as well as
     Learning with Errors Over Rings. EUROCRYPT (2010).                                           necessary security definitions.
[28] Moni Naor and Benny Pinkas. 2006. Oblivious Polynomial Evaluation. SIAM J.
     Comput. 35 (01 2006), 1254–1281. https://doi.org/10.1137/S0097539704383633                      Definition A.4 (Leveled Homomorphic Encryption). A lev-
[29] Zhiniang Peng. 2019. Danger of using fully homomorphic encryption: A look at
     Microsoft SEAL. CoRR abs/1906.07127 (2019). arXiv:1906.07127 http://arxiv.org/               eled homomorphic encryption scheme
     abs/1906.07127
[30] Y. Polyakov, K. Rohloff, and G.W Ryan. [n.d.]. PALISADE lattice cryptography                                 E = (KeyGen, Encrypt, Eval, Decrypt)
     library. https://git.njit.edu/palisade/PALISADE.
[31] Phillipp Schoppmann, Adrià Gascón, Leonie Reichert, and Mariana Raykova.
                                                                                                  is a set of PPT algorithms defined as follows:
     2019. Distributed Vector-OLE: Improved Constructions and Implementation.                          • KeyGen(1𝜆 , 1𝐿 ) → (sk, pk, evk)
     https://doi.org/10.1145/3319535.3363228
[32] SEAL 2020. Microsoft SEAL (release 3.5). https://github.com/Microsoft/SEAL.                          Given the security parameter 𝜆 and a maximum circuit depth
     Microsoft Research, Redmond, WA.                                                                     𝐿, outputs a key pair consisting of a public encryption key pk,
[33] Victor Shoup. [n.d.]. NTL: A Library for doing Number Theory. https://shoup.                         a secret decryption key sk, and an evaluation key evk.
     net/ntl/.
[34] C. Weng, K. Yang, J. Katz, and X. Wang. 2021. Wolverine: Fast, Scalable, and                      • Encrypt(pk, 𝑚) → ct
     Communication-Efficient Zero-Knowledge Proofs for Boolean and Arithmetic                             Given a message 𝑚 ∈ M and an encryption key pk, outputs a
     Circuits. In 2021 2021 IEEE Symposium on Security and Privacy (SP). IEEE Computer
     Society, Los Alamitos, CA, USA, 1074–1091. https://doi.org/10.1109/SP40001.
                                                                                                          ciphertext ct.
     2021.00056                                                                                        • Eval(evk, 𝑓 , ct1, ct2, . . . , ct𝑛 ) → ct ′
[35] Adam Yala, Constance Lehman, Tal Schuster, Tally Portnoi, and Regina Barzilay.                       Given the evaluation key, a description of a function 𝑓 : M𝑛 →
     2019. A Deep Learning Mammography-based Model for Improved Breast Cancer
     Risk Prediction. Radiology 292 (05 2019), 182716. https://doi.org/10.1148/radiol.                    M with multiplicative depth at most 𝐿, and 𝑛 ciphertexts
     2019182716                                                                                           encrypting messages 𝑚 1, . . . , 𝑚𝑛 , outputs the result ciphertext
                                                                                                          ct ′ encrypting 𝑚 ′ = 𝑓 (𝑚 1, . . . , 𝑚𝑛 ).
                                                                                                       • Decrypt(sk, ct) = 𝑚 Given the secret decryption key and a
A      FURTHER BACKGROUND                                                                                 ciphertext ct encrypting 𝑚, outputs 𝑚.
In this appendix, we give further background omitted from section
2 due to space constraints.                                                                          Optionally the scheme E may be extended with a PPT algorithm
                                                                                                  EncryptSK(sk, 𝑚) → ct which uses the secret key sk rather than
                                                                                                  the public key pk, to compute the ciphertext ct from the message
A.1       Oblivious Linear Evaluation                                                             𝑚.
We now give formal definitions of vector OLE and batch OLE, along                                    When returning a ciphertext output by the Eval function, it is
with security definitions.                                                                        often desirable for this ciphertext to hide the function 𝑓 that was
                                                                                                  used to produce it. This property of the scheme is called circuit
   Definition A.1 (Vector Oblivious Linear Evaluation). For                                       privacy, which we formally define below.
an integer 𝑝 and 𝑛, define the protocol VOLE (𝑝,𝑛) to have the following
ideal functionality. Let Ξ𝑆 = Z𝑛𝑝 × Z𝑛𝑝 , let Ω𝑆 = ∅, let Ξ𝑅 = Z𝑝 , and


                                                                                             41