Title: Paper notes: CryptOpt
Date: 2023-12-01 12:30

- Full title: CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives
- PDF: [arXiv](https://arxiv.org/abs/2211.10665) ([local mirror]({static}/files/papers/cryptopt.pdf))
- Authors: Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian, David Wu, Adam Chlipala, Chitchanok Chuengsatiansup, Daniel Genkin, Markus Wagner, Yuval Yarom

Cryptography is hard, high-performance one even more so: formal proof of
assembly implementations is horrible to model, and code generation from 
formal proofs are hard to lower to high-performance assembly. The core idea of
CryptOpt is to treat this as a black box combinatorial optimization problem,
and bruteforce possible solutions in a smart way against an oracle.

More precisely:

1. start from a known-correct implementation in
  [fiat-crypto](https://github.com/mit-plv/fiat-crypto) (a
  coq-powered high-level to low-level IR proven translator) low-level IR;
2. lower it via a fuzzer-like machinery replacing/reordering operands
   applying semantics-and-data-constrains-preserving transformations, which has an acceptable
   search space because:
    - it's straight-line no-aliasing constant-offset-pointers assembly;
    - transformations can be templatised, eg. `add ≍ clc; adcx`;
3. lift the resulting x64 assembly to fiat-crypto low-level IR;
4. use a custom [e-graph](https://en.wikipedia.org/wiki/E-graph) based 
  *equivalence checker* implemented as a mix between an SMT solver and a symbolic-execution engine;
5. if the new implementation is correct, benchmark it against the current;
   fastest one, and keep it if it's outperforming it.
6. `goto 2`.

This approach has a couple of advantages:

- fuzzers are cheaper than highly specialised engineering time
- porting implementations to new hardware is simply a matter of
  running CryptOpt on it.
- by lifting the assembly to fiat-crypto low-level IR,
  there is no need to write complex formal proofs,
  since fiat-crypto is already taking care of those.
- controlling the mutations allows to ensure that
  the implementation stays side-channel free.

The main issue though, is that one needs to formally implement 
whatever algorithm to optimize in fiat-crypto, which is not that easy (and
which the authors of the paper didn't do for libsecp256k1).

Implementation-wise, the author ran 200k mutations, with 20 initial candidates,
over 18 Fiat IR primitives, taking between 20 and 40 CPU hours. Interestingly,
since the equivalence-based verification is *slow* (between 0.1s and ~300s),
it's only done once at the end. They found out that "optimization progress is roughly logarithmic
in the number of mutations." CryptOpt generates code around 1.20 to 2.50 times
faster than gcc/clang for the same fiat-crypto generated C code. It's not
faster then OpenSSL (but offers formally verified correctness), but is
faster than libsecp256k1.

The paper was [presented]( https://iacr.org/submit/files/slides/2023/rwc/rwc2023/85/slides.pdf ) at [Real World Crypto 2023](https://rwc.iacr.org/2023/program.php),
and like all good one, it came with an [implementation]( https://github.com/0xADE1A1DE/CryptOpt )