Merge pull request #5 from PyryL/week5
Week5
PyryL authored Dec 2, 2023
2 parents 0c5b3b0 + fbb6f79 commit ef2d933
Showing 13 changed files with 331 additions and 41 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -9,7 +9,7 @@ Implementation of [CRYSTALS-Kyber](https://pq-crystals.org/kyber/index.shtml) en
## Documentation

* [Requirements specification](docs/requirements.md)
* [Implementation]()
* [Implementation](docs/implementation.md)
* [Testing](docs/tests.md)
* [Usage guide](docs/usage.md)

79 changes: 79 additions & 0 deletions docs/implementation.md
@@ -0,0 +1,79 @@
# Implementation

## Project structure

This project is divided into packages and modules, each of which handles a small portion of the functionality. Packages follow a strict hierarchy: a module may only import modules from the same package or from a lower-ranked one. Below is a diagram showing the import structure between packages.

```mermaid
classDiagram
encryption <|-- utilities
encryption <|-- constants
encryption <|-- entities
entities <|-- constants
utilities <|-- entities
utilities <|-- constants
ccakem <|-- utilities
ccakem <|-- entities
ccakem <|-- constants
ccakem <|-- encryption
class encryption {
encrypt
decrypt
keygen
}
class utilities {
byte_conversion
compression
encoding
cbd
parse
pseudo_random
round
}
class entities {
polring
}
class constants
class ccakem
```

Here are short descriptions of the purpose of each package:

* **utilities** package provides some basic functionalities, such as conversions, rounding and encoding, for higher-ranked modules to use.
* **entities** contains data structures.
* **constants** module has some fixed numerical values defined in the Kyber specification.
* **encryption** has capabilities for Kyber asymmetric encryption.
* **ccakem** has functions that utilize encryption and make Kyber a key-encapsulation mechanism.

## Utilities

Here are more in-depth descriptions of the modules in the utilities package.

### Byte conversion

The byte conversion module has some basic functions that convert integers and bit arrays to bytes, and vice versa.
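
As a rough illustration of what such conversions look like, here is a minimal sketch. The function names are hypothetical, and the bit order (most significant bit first) is an assumption chosen to agree with the expected vector in `tests/test_cbd.py`; the module's actual interface may differ.

```python
def bytes_to_bits(data: bytes) -> list[int]:
    """Expand each byte into 8 bits, most significant bit first (assumed order)."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack a bit array back into bytes, 8 bits per byte."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    return bytes(
        int("".join(map(str, bits[start:start + 8])), 2)
        for start in range(0, len(bits), 8)
    )


def int_to_bytes(value: int) -> bytes:
    """Encode a non-negative integer as big-endian bytes of minimal length."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")
```

The two bit functions are inverses of each other, so `bits_to_bytes(bytes_to_bits(data))` returns `data` unchanged.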

### Compression

During encryption and decryption, the coefficients of a polynomial ring are reduced modulo `q`, that is, they lie between 0 and `q-1` (inclusive). When we transfer these polynomial rings, however, we can reduce their size by scaling the coefficients down to fewer bits. This is done with the compress and decompress functions.
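
The scaling can be sketched for a single coefficient as follows. This follows the formulas in the Kyber specification, `round(2^d / q * x) mod 2^d` and `round(q / 2^d * y)`, using integer arithmetic; the per-coefficient signatures are an assumption made for illustration, since the project's functions operate on `PolynomialRing` instances.

```python
Q = 3329  # Kyber modulus


def compress(x: int, d: int) -> int:
    """Scale a coefficient in [0, Q-1] down to d bits: round(2^d / Q * x) mod 2^d."""
    return ((2 * (x << d) + Q) // (2 * Q)) % (1 << d)


def decompress(y: int, d: int) -> int:
    """Scale a d-bit value back up to [0, Q-1]: round(Q / 2^d * y)."""
    return (2 * Q * y + (1 << d)) >> (d + 1)
```

Compression is lossy: `decompress(compress(1000, 4), 4)` gives 1040, not 1000. In the other direction nothing is lost, so decompressing a `d`-bit value and compressing it again returns the original value, which is the property checked in `tests/test_compression.py`.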

### Encoding

Usually the polynomial rings are handled as `PolynomialRing` instances. We cannot, however, send these instances over the Internet, so we have to encode them into byte arrays. At the other end, we need to recover the polynomial ring from the byte array. This is done with the encode and decode functions.
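
A minimal sketch of the idea, working on a plain list of coefficients: every coefficient is written with a fixed number of bits and the bits are packed tightly into bytes. The signatures and the big-endian packing order are assumptions for illustration and may not match the project's byte layout.

```python
def encode(coefs: list[int], bit_len: int) -> bytes:
    """Pack coefficients into bytes using bit_len bits per coefficient."""
    if (len(coefs) * bit_len) % 8 != 0:
        raise ValueError("total number of bits must be a multiple of 8")
    value = 0
    for coef in coefs:
        if not 0 <= coef < (1 << bit_len):
            raise ValueError("coefficient does not fit in bit_len bits")
        value = (value << bit_len) | coef
    return value.to_bytes(len(coefs) * bit_len // 8, "big")


def decode(data: bytes, bit_len: int) -> list[int]:
    """Recover the coefficients from bytes produced by encode."""
    count = len(data) * 8 // bit_len
    value = int.from_bytes(data, "big")
    mask = (1 << bit_len) - 1
    return [(value >> (bit_len * (count - 1 - i))) & mask for i in range(count)]
```

With 256 coefficients of 12 bits each, the encoded form is 384 bytes long, and `decode(encode(coefs, 12), 12)` returns `coefs`.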

### Parse

Parse is a function that generates a specific type of polynomial ring instance from a pseudo-random byte stream. This differs from decoding in that the input is a byte stream instead of a byte array, i.e., the number of bytes required to form the result is not known beforehand.
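
The reason the input length is not known in advance is rejection sampling: candidates that are not smaller than `q` are discarded. Below is a sketch following the Parse algorithm of the Kyber specification. It returns a plain list of coefficients, and `xof_stream` is a hypothetical helper that turns SHAKE-128 into an endless byte stream; the project's own implementation may differ in such details.

```python
import hashlib
from typing import Iterator

Q = 3329  # Kyber modulus
N = 256   # number of coefficients


def xof_stream(seed: bytes) -> Iterator[int]:
    """Hypothetical helper: yield SHAKE-128 output bytes for as long as needed."""
    offset, length = 0, 512
    while True:
        block = hashlib.shake_128(seed).digest(length)
        yield from block[offset:]
        offset, length = length, 2 * length


def parse(stream: Iterator[int]) -> list[int]:
    """Read three bytes at a time, form two 12-bit candidates, keep those below Q."""
    coefs: list[int] = []
    while len(coefs) < N:
        b0, b1, b2 = next(stream), next(stream), next(stream)
        d1 = b0 + 256 * (b1 % 16)
        d2 = b1 // 16 + 16 * b2
        if d1 < Q:
            coefs.append(d1)
        if d2 < Q and len(coefs) < N:
            coefs.append(d2)
    return coefs
```

Calling `parse(xof_stream(seed))` twice with the same seed gives the same 256 coefficients.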

### CBD

This module provides a single function that deterministically produces a polynomial ring from a byte array. The behavior of this function is quite similar to `parse`, but in this case the length of the input byte array is fixed.
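
CBD stands for centered binomial distribution: each coefficient is the difference of two sums of `eta` input bits, so the input must be exactly `64*eta` bytes long. The sketch below returns a plain list of coefficients reduced modulo `q`. Reading the bits of each byte most significant first is an assumption, made so that the first coefficients agree with the expected vector in `tests/test_cbd.py`.

```python
Q = 3329  # Kyber modulus


def cbd(data: bytes, eta: int) -> list[int]:
    """Sample 256 small coefficients from 64*eta bytes."""
    if len(data) != 64 * eta:
        raise ValueError("input must be exactly 64*eta bytes long")
    # Assumed bit order: most significant bit of each byte first.
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    coefs = []
    for i in range(256):
        start = 2 * i * eta
        a = sum(bits[start:start + eta])
        b = sum(bits[start + eta:start + 2 * eta])
        coefs.append((a - b) % Q)  # values lie in -eta..eta before reduction
    return coefs
```

A negative difference such as -1 shows up as `3328` after reduction, which is why the expected vector in the unit test contains so many values close to `q`.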

### Pseudo-random

This module includes multiple functions that use the SHA-3 family of hash algorithms to deterministically produce pseudo-random byte arrays from given seeds.
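
For orientation, the Kyber specification builds its helpers on SHA3-256, SHA3-512, SHAKE-128 and SHAKE-256, all of which are available in Python's `hashlib`. The sketch below shows a few of them; the function names and signatures are hypothetical and not necessarily those used in this module.

```python
import hashlib


def h(data: bytes) -> bytes:
    """32-byte hash (SHA3-256)."""
    return hashlib.sha3_256(data).digest()


def g(data: bytes) -> tuple[bytes, bytes]:
    """64-byte hash (SHA3-512), returned as two 32-byte halves."""
    digest = hashlib.sha3_512(data).digest()
    return digest[:32], digest[32:]


def prf(seed: bytes, nonce: int, length: int) -> bytes:
    """Pseudo-random bytes from a seed and a one-byte nonce (SHAKE-256)."""
    return hashlib.shake_256(seed + bytes([nonce])).digest(length)


def kdf(data: bytes, length: int) -> bytes:
    """Derive a key of the requested length (SHAKE-256)."""
    return hashlib.shake_256(data).digest(length)
```

All of these are deterministic: the same seed always produces the same output, while changing the nonce of `prf` produces an unrelated byte array.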

### Round

This small module provides a function that rounds floats in the "traditional" way, that is, ties are always rounded up (toward positive infinity). Python's built-in `round` function instead rounds ties to the nearest even integer. For example, the built-in outputs `round(-3.5)=-4` whereas `normal_round(-3.5)=-3`.
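
One possible way to write such a function is sketched below. It is an illustration of the rounding rule, not necessarily the module's actual code.

```python
import math


def normal_round(x: float) -> int:
    """Round to the nearest integer, with ties going up (toward positive infinity)."""
    lower = math.floor(x)
    return lower + 1 if x - lower >= 0.5 else lower
```

For comparison, the built-in gives `round(2.5) == 2` and `round(-3.5) == -4`, whereas this function gives `3` and `-3`.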
2 changes: 1 addition & 1 deletion docs/requirements.md
@@ -2,7 +2,7 @@

### Problem to be solved

When two people want to communicate securely with each other using insecure network, they need to use encryption. One way of doing this would be to use asymmetric encryption, in which both would encrypt the payload with recipient's public key. However, asymmetric encryption is relatively slow when the payload gets longer. Therefore it is common to use asymmetric encryption to securely share a key that will then be used in faster asymmetric encryption. This is called key encapsulation mechanism, KEM [6].
When two people want to communicate securely with each other using an insecure network, they need to use encryption. One way of doing this would be to use asymmetric encryption, in which both would encrypt the payload with the recipient's public key. However, asymmetric encryption is relatively slow when the payload gets longer. Therefore it is common to use asymmetric encryption to securely share a key that will then be used in faster symmetric encryption. This is called a key encapsulation mechanism, KEM [6].

Traditionally, the [Diffie–Hellman](https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange) method has been used for this [6], but because of Shor's algorithm it is thought not to be safe against powerful quantum computers. To meet this demand for quantum-resistant asymmetric encryption suitable for key sharing, a new algorithm called CRYSTALS-Kyber was developed. In 2022 the National Institute of Standards and Technology (NIST) selected Kyber, along with three other algorithms, to be the first post-quantum standards [4]. In August 2023 NIST released a candidate for the final standard [5], and this project is based on that.

50 changes: 27 additions & 23 deletions docs/tests.md
@@ -1,27 +1,27 @@
# Tests

All functions and methods are tested individually with sample inputs. In addition to testing with correct inputs, code is tested to fail with invalid inputs. Outputs are tested to be in correct type and to match all criteria.
All functions and methods are tested individually with both sample and random inputs. In addition to testing with correct inputs, the code is tested to fail with invalid inputs. Outputs are tested to be of the correct type and to match all criteria.

In addition to unit tests, Kyber is also tested with some integration tests. That is, a function and its inverse function are called consecutively and the output is checked to equal the original input.

Current test report:

```
tests/test_byte_conversion.py ........ [ 15%]
tests/test_cbd.py .... [ 22%]
tests/test_ccakem.py .... [ 30%]
tests/test_compression.py ...... [ 41%]
tests/test_decrypt.py ... [ 47%]
tests/test_encoding.py ........ [ 62%]
tests/test_encrypt.py ... [ 67%]
tests/test_encryption.py . [ 69%]
tests/test_key_generation.py .. [ 73%]
tests/test_modulo.py .. [ 77%]
tests/test_parse.py ... [ 83%]
tests/test_pseudo_random.py ........ [ 98%]
tests/test_round.py . [100%]
=============== 53 passed in 0.33s ================
tests/test_byte_conversion.py ........ [ 11%]
tests/test_cbd.py ..... [ 18%]
tests/test_ccakem.py ..... [ 26%]
tests/test_compression.py ...... [ 34%]
tests/test_decrypt.py ... [ 39%]
tests/test_encoding.py ....... [ 49%]
tests/test_encrypt.py ... [ 53%]
tests/test_encryption.py . [ 55%]
tests/test_key_generation.py .. [ 57%]
tests/test_parse.py ... [ 62%]
tests/test_polring.py ........ [ 73%]
tests/test_pseudo_random.py ................. [ 98%]
tests/test_round.py . [100%]
=================== 69 passed in 5.08s ===================
```

Tests can be run with `poetry run invoke test`.
@@ -37,25 +37,29 @@ Name Stmts Miss Branch BrPart Cover
------------------------------------------------------------------
kyber/ccakem.py 34 0 2 0 100%
kyber/constants.py 14 0 0 0 100%
kyber/encryption/decrypt.py 26 0 6 0 100%
kyber/encryption/encrypt.py 57 0 12 0 100%
kyber/encryption/keygen.py 35 0 8 0 100%
kyber/encryption/decrypt.py 24 0 6 0 100%
kyber/encryption/encrypt.py 51 0 12 0 100%
kyber/encryption/keygen.py 31 0 8 0 100%
kyber/entities/polring.py 50 0 26 0 100%
kyber/utils/byte_conversion.py 23 0 12 0 100%
kyber/utils/cbd.py 16 0 6 0 100%
kyber/utils/compression.py 23 0 8 0 100%
kyber/utils/encoding.py 36 0 22 0 100%
kyber/utils/modulo.py 17 0 8 0 100%
kyber/utils/encoding.py 34 0 20 0 100%
kyber/utils/parse.py 22 0 6 0 100%
kyber/utils/pseudo_random.py 23 0 0 0 100%
kyber/utils/round.py 5 0 2 0 100%
------------------------------------------------------------------
TOTAL 331 0 92 0 100%
TOTAL 350 0 108 0 100%
```

For a more detailed report, run `poetry run invoke coverage-report` and then open `htmlcov/index.html`.

## Performance tests

The asymmetric encryption part of Kyber (CPAPKE in the specification document) only works with fixed-lengthed input, but we can split larger payload into 32-byte chunks and encrypt them separately. Ciphertexts can be concatenated and the whole process can be reversed during decryption. Using this method, encryption is tested with about 10 kibibytes of random payload.
The asymmetric encryption part of Kyber (called CPAPKE in the specification document) only works with fixed-length input, but we can split a larger payload into 32-byte chunks and encrypt them separately. The ciphertexts can be concatenated, and the whole process can be reversed during decryption. Using this method, encryption is tested in `perf_tests/test_encryption.py` with about 10 kibibytes of random payload.
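
The chunking itself can be sketched as follows. The block cipher is passed in as a callable, and `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins (not real encryption) that only make the example self-contained. The sketch also assumes that the payload length is a multiple of 32 bytes and that every ciphertext block has the same length.

```python
from typing import Callable

BLOCK = 32  # plaintext bytes handled by one encryption call


def encrypt_chunked(payload: bytes, encrypt_block: Callable[[bytes], bytes]) -> bytes:
    """Encrypt the payload 32 bytes at a time and concatenate the ciphertexts."""
    if len(payload) % BLOCK != 0:
        raise ValueError("payload length must be a multiple of 32 bytes")
    return b"".join(
        encrypt_block(payload[i:i + BLOCK]) for i in range(0, len(payload), BLOCK)
    )


def decrypt_chunked(
    ciphertext: bytes, decrypt_block: Callable[[bytes], bytes], cipher_len: int
) -> bytes:
    """Split the ciphertext into fixed-size blocks and decrypt each one."""
    return b"".join(
        decrypt_block(ciphertext[i:i + cipher_len])
        for i in range(0, len(ciphertext), cipher_len)
    )


def toy_encrypt(block: bytes) -> bytes:
    """Stand-in for a real block encryption: reverse the block and append a marker."""
    return block[::-1] + b"\x00"


def toy_decrypt(block: bytes) -> bytes:
    """Inverse of toy_encrypt."""
    return block[:-1][::-1]
```

A payload of 10 kibibytes is split into 320 chunks, and `decrypt_chunked(encrypt_chunked(payload, toy_encrypt), toy_decrypt, 33)` returns the original payload.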

The end-to-end Kyber handshake is iterated 250 times in `perf_tests/test_ccakem.py`.

In addition, there is an illustrative, comparative test in `perf_tests/test_aes_integration.py` that integrates Kyber with AES encryption to point out how much faster it is to use a key encapsulation mechanism instead of asymmetric encryption.

Performance tests can be run with `poetry run invoke performance`.
42 changes: 34 additions & 8 deletions docs/usage.md
@@ -10,28 +10,54 @@ poetry install

## Usage

Currently `kyber` provides three main functions that can be used directly from Python code. A sample usage is included in `main.py`.
Kyber has three main functions: key generation, encryption and decryption. Below is a diagram showing the workflow of a key exchange.

```mermaid
sequenceDiagram
Note over Alice: private_key, public_key = generate_keys()
Alice->>Bob: public_key
Note over Bob: ciphertext, shared_secret = encrypt(public_key)
Bob->>Alice: ciphertext
Note over Alice: shared_secret = decrypt(private_key, ciphertext)
rect rgb(252, 132, 113)
loop Transfer the actual payload*
Note over Alice: payload_cipher = AESEncrypt(payload, key=shared_secret)
Alice->>Bob: payload_cipher
Note over Bob: payload = AESDecrypt(payload_cipher, key=shared_secret)
end
end
```

*) The section called **transfer the actual payload** is out of Kyber's scope. It is here just to illustrate how the shared secret generated by Kyber can be used with a symmetric encryption algorithm (such as AES) to securely transfer the payload. To see a working example of how `kyber` is used with AES, take a look at `perf_tests/test_aes_integration.py`.

#### In Python

The `kyber` package can be used directly from Python code. A sample usage is included in `main.py`.

#### CLI

Kyber can also be used via a command-line interface that can be accessed with `poetry run python cli.py`. It has four subcommands: `keygen`, `pubkey`, `encrypt` and `decrypt`. Run any subcommand with the `-h` flag to get help. Below is a usage example.

First, Alice generates a private key and extracts its public key to a separate file.

```
# Alice
poetry run python cli.py keygen private.txt
poetry run python cli.py pubkey --output public.txt private.txt
```

# Alice sends her public.txt file to Bob
After Bob has received Alice's public key, Bob can generate a random shared secret and encrypt it to ciphertext.

# Bob
```
poetry run python cli.py encrypt --key alice_public.txt --secret secret.txt --cipher cipher.txt
```

# Bob sends his cipher.txt file to Alice
When Alice receives Bob's ciphertext, Alice can decrypt it to obtain the same shared secret as Bob has.

# Alice
```
poetry run python cli.py decrypt --key private.txt --output secret.txt bob_cipher.txt
```

In the first line Alice generates herself a private key. On the second line she generates a public key matching the freshly-generated private key, after which she sends this public key to Bob. On the third line Bob encrypts a random shared secret with Alice's public key, after which he sends the ciphertext to Alice. On the last line Alice decrypts the ciphertext with her private key. At the end, both Alice and Bob have a file called `secret.txt` that contain the same shared secret.

### Tests

Unit tests can be run with
7 changes: 7 additions & 0 deletions docs/week-5.md
@@ -0,0 +1,7 @@
# Week 5

_27.11. – 3.12.2023_

This week I improved and added unit tests to lift branch coverage back to 100% (from 99% last week). I also added new performance tests. The documentation went through a process where I improved the explanations and added the last missing document, the implementation document. Finally, I had some time to respond to some of the issues that the first peer review pointed out.

Total working time: 5 hours
4 changes: 4 additions & 0 deletions perf_tests/__main__.py
@@ -1,3 +1,7 @@
from perf_tests.test_encryption import runner as encryption_test_runner
from perf_tests.test_ccakem import runner as ccakem_test_runner
from perf_tests.test_aes_integration import runner as aes_integration_test_runner

encryption_test_runner()
ccakem_test_runner()
aes_integration_test_runner()
52 changes: 52 additions & 0 deletions perf_tests/test_aes_integration.py
@@ -0,0 +1,52 @@
from random import seed, randbytes
from time import time
from Crypto.Cipher import AES
from kyber.ccakem import ccakem_generate_keys, ccakem_encrypt, ccakem_decrypt

def run(payload: bytes) -> tuple[float, float]:
    """:returns: Durations of handshake and actual payload transfer in seconds as a tuple."""

    t0 = time()

    # Alice
    private_key, public_key = ccakem_generate_keys()

    # send public_key Alice->Bob

    # Bob
    ss_ciphertext, shared_secret1 = ccakem_encrypt(public_key)

    # send ss_ciphertext Bob->Alice

    # Alice
    shared_secret2 = ccakem_decrypt(ss_ciphertext, private_key)

    t1 = time()

    # Alice
    aes_cipher = AES.new(shared_secret2, AES.MODE_GCM)
    payload_nonce = aes_cipher.nonce
    payload_ciphertext, payload_tag = aes_cipher.encrypt_and_digest(payload)

    # send payload_ciphertext, payload_tag and payload_nonce Alice->Bob

    # Bob
    aes_cipher = AES.new(shared_secret1, AES.MODE_GCM, nonce=payload_nonce)
    decrypted_payload = aes_cipher.decrypt_and_verify(payload_ciphertext, payload_tag)

    assert payload == decrypted_payload

    return (t1-t0, time()-t1)

def runner():
    seed(42)
    payload = randbytes(100_000_000) # 100 megabytes
    print("Starting AES integration performance test (about 3 seconds)")
    durations = run(payload)
    print("Results:")
    print(f"Handshake: {durations[0]:.2f} sec")
    print(f"Payload transfer: {durations[1]:.2f} sec")
    print(f"Total: {sum(durations):.2f} sec")

if __name__ == "__main__":
    runner()
39 changes: 39 additions & 0 deletions perf_tests/test_ccakem.py
@@ -0,0 +1,39 @@
from time import time
from kyber.ccakem import ccakem_generate_keys, ccakem_encrypt, ccakem_decrypt

def run_test() -> tuple[float, float, float]:
    t0 = time()

    private_key, public_key = ccakem_generate_keys()

    t1 = time()

    ciphertext, shared_secret1 = ccakem_encrypt(public_key)

    t2 = time()

    shared_secret2 = ccakem_decrypt(ciphertext, private_key)

    t3 = time()

    assert shared_secret1 == shared_secret2

    return (t1-t0, t2-t1, t3-t2)

def runner():
    print("Starting ccakem performance test (about 2 mins)")

    test_iters = 250
    averages = [0, 0, 0]

    for _ in range(test_iters):
        durations = run_test()
        averages = [averages[i]+durations[i] for i in range(3)]

    print("Results (averages):")
    print(f"Keypair generation: {averages[0]/test_iters:.5f} sec")
    print(f"Encryption: {averages[1]/test_iters:.5f} sec")
    print(f"Decryption: {averages[2]/test_iters:.5f} sec")

if __name__ == "__main__":
    runner()
9 changes: 8 additions & 1 deletion tests/test_cbd.py
@@ -1,11 +1,18 @@
import unittest
from random import seed, randbytes
from random import seed,randbytes
from base64 import b64decode
from kyber.utils.cbd import cbd

class TestCBD(unittest.TestCase):
    def setUp(self):
        seed(42)

    def test_cbd_returns_expected_polynomial_ring(self):
        argument = b64decode("nXmxo38xgBzRGmcG+0DWvVdSaEaQO7E+3lYkOenBuCOpYIm8px89Gm0tPK2zZpy9UOFl5DQknYuCn0EWaYQql5kRA2zz6CIIbsqgB1pp/BeLqPg3GKqPO9H2XoFE5h2asw/LBqbBrY8pBucysQ9Nt4nTXqaMCIqz9kiBi6SmZWs=")
        pol = cbd(argument, 2)
        expected_result = [0,1,3328,0,3328,3328,0,3327,3328,0,3327,3328,1,0,3328,2,1,3328,3328,0,0,3328,0,0,0,3328,1,0,1,0,3328,1,0,3328,0,3328,0,1,1,0,0,0,3327,3328,3328,3328,3327,1,1,1,0,0,3328,1,3327,0,1,0,2,3328,3328,1,3328,3327,0,0,0,0,1,0,3328,2,0,3328,3328,0,3327,1,3328,0,0,1,3328,1,3327,2,0,1,3328,3327,0,0,0,2,3328,1,0,0,1,3328,0,0,1,1,3327,1,3328,1,0,1,1,3328,1,3328,0,0,1,3328,3328,0,0,0,1,1,3328,0,0,3328,0,0,3328,3328,0,3327,0,2,0,3327,1,1,3328,3328,0,1,0,1,2,0,0,0,0,3328,0,0,0,0,0,2,3328,3328,1,3328,0,1,0,1,3327,3328,3328,1,0,0,1,0,3327,3328,1,3328,0,0,0,1,1,3328,1,1,1,0,3328,1,0,0,3328,3327,0,0,2,3328,0,0,0,0,2,3328,0,1,1,0,3328,0,0,0,1,3328,3327,3328,3328,3328,0,0,1,1,3328,3328,1,0,1,3327,0,1,0,0,1,2,0,1,1,0,3328,3327,0,0,1,1,1,3328,1,3328,0,1,0,0,0,0,0,3328]
        self.assertEqual(pol.coefs, expected_result)

    def test_cbd_returns_same_result_with_same_arguments(self):
        eta = 5
        argument = randbytes(320) # 64*eta
12 changes: 7 additions & 5 deletions tests/test_compression.py
@@ -5,11 +5,13 @@

class TestCompression(unittest.TestCase):
    def test_compression_symmetry(self):
        # test that polynomial ring does not change when it is decompressed and then compressed
        seed(42)
        polynomial = PolynomialRing([randint(0, 2047) for _ in range(256)])
        decompressed = decompress(polynomial, 11)
        compressed = compress([decompressed], 11)[0]
        self.assertListEqual(list(polynomial.coefs), list(compressed.coefs))
        for _ in range(100):
            polynomial = PolynomialRing([randint(0, 2047) for _ in range(256)])
            decompressed = decompress(polynomial, 11)
            compressed = compress([decompressed], 11)[0]
            self.assertListEqual(list(polynomial.coefs), list(compressed.coefs))

    def test_compression(self):
        polynomial = PolynomialRing([416, 2913, 0, 1248])
@@ -40,4 +42,4 @@ def test_decompression_raises_with_too_large_coefficient(self):
        # coefficient should not be greater than 2**d-1 = 7
        polynomial = PolynomialRing([2, 8, 3])
        with self.assertRaises(ValueError):
            decompress(polynomial, 3)
            decompress(polynomial, 3)#