Recursive: cryptography (2) - RSA

2021-05-26

0. 前言

Reference: 9.5

数学学完了，现在来看 RSA 算法吧~

1. Public-key Encryption Algorithm

RSA 是一种 public-key encryption algorithm，也是比较容易理解的一种。

Public-key cryptography was invented by Whitfield Diffie and Martin Hellman and independently by Ralph Merkle.

公钥加密的关键在于每次生成 key 都是成对的，一个是 encryption key（加密钥匙 - public key），用来明文加密；还有一个是 decryption key（解密钥匙 - private key），用来把加密的文本解密还原成明文。

被 public key 加密之后的信息只能由 private key 解密，被 private key 加密的信息只能由 public key 解密。

The keys only work one way so that a message encrypted with the private key can only be decrypted with the public key, and vice versa.

2. RSA 为什么安全？

RSA 的 key pair 是由一个很大的质数（prime number）生成的（100-200 digits）。

我们可以用 python 实现这个加密算法。

步骤

选择两个大的质数，p 和 q，然后两数相乘得到结果 n, n = p x q
随便选一个 encrytion key e，这个 e 必须和 (p-1)x(q-1) 互为质数（最大公约数 = 1）。 𝑔𝑐𝑑(𝑒,(𝑝−1)×(𝑞−1))=1
最后，解密 key d 就是 e 对 (p-1)x(q-1) 的模倒数

e 和 n 一起组成 public key，d 是 private key。

当我们得到 n 和 e 之后，p 和 q 就不需要用到了（但我们不能告诉别人 p 和 q 是什么）。

如何加密/解密信息

用这个等式来加密：𝑐=$𝑚^𝑒$(mod𝑛)

解密信息用这个等式：𝑚=$𝑐^𝑑$(mod𝑛)

d 是 𝑒(mod𝑛) 的模倒数，所以下面的等式成立。

$𝑐^𝑑$ =($𝑚^𝑒$)𝑑(mod𝑛) =$𝑚^{𝑒𝑑}$(mod𝑛) =$𝑚^1$(mod𝑛) =𝑚(mod𝑛)

怎么把文字转换成数字？

我们可以用 ASCII 编码来进行转换，不过因为 ASCII 编码的 decimal version（十进制版本）的每个字母长度不同，所以我们用十六进制版本来进行转换，因为每两个十六进制的数字代表 a single byte or character。

However, since the decimal versions of the numbers of the ASCII values vary in the number of digits needed to represent them we will use the hexadecimal numbers where we know very reliably that two hexadecimal digits represent a single byte or character.

比如 ‘hello world’ 可以这样 map（上面是10进制，下面是16进制）：

然后我们把16进制连在一起，再转成10进制。

hex = 68656c6c6f20776f726c64

decimal =126207244316550804821666916

信息的分解（break to chunks）

大家可以看到，我们在进行ASCII转换之后得到了一个很大的数字。

在实际的 RSA 加密中，人们往往会把信息分解成 smaller chunks，然后再对每个 chunk 进行加密，这样的操作有两个原因：

performance：即使我们只有 1k 的文本，转换成数字也有 2000-3000 个 digits，这是一个很大的数字。
满足 restriction 𝑚≤𝑛，我们必须保证这个信息是 unique representation of modulo n。With binary data, choose the largest power of two less than 𝑛.

3. RSA 加密举例

比如，p = 5563，q = 8191，n = 5563 x 8191 = 45,566,533，message = “hello world”。

To keep the integer value of our chunks less than 𝑚, we will divide up our word into chunks that use less than the number of bytes needed to represent 𝑛.

我们可以用 bit_length method 来得到这个值，然后除以 8 得到 number of bytes，因为每个 character 都能用一个 byte 来表示，所以 number of bytes = number of characters。

Conveniently, this lets us simply break the message up into chunks of n characters and convert the hexadecimal representation of each chunk into an integer.

在这个例子里我们可以用 26 bits 来代表 45,566,533。

分块

26 除以 8 得到 3 余 2，我们应该把信息分成 chunks of 3 characters。

分好之后 h e l 这三个字符成为第一组，他们的 16 进制分别是 68，65，6c，我们得到一个 hex number: 68656c，然后转换成 10 进制 6841708。

m_1 = 6841708

m_2 = 7106336

m_3 = 7827314

m_4 = 27748

Note, breaking the message into chunks can be very tricky, in particular when the result of applying the RSA transformation to a chunk produces a number that is less than seven digits long. In this case we need to be careful to add a leading zero to the result when we go to glue our chunks back together again.

假如我们在分块的时候遇到了一个长度小于 7 digits 的数字，那么我们需要在结果上增加 leading zero，然后再把这些 chunk 合并在一起（比如 m4）。

选择 encrytion key `e`

e 需要满足以下两个条件：

We can select values randomly and use the GCD algorithm to test them against (𝑝−1)×(𝑞−1)=45552780.
Remember that we are looking for an 𝑒 that is relatively prime to 45,552,780. The number 1,471 will work nicely for this example.

代入算式：

𝑑 = 𝑒𝑥𝑡_𝑔𝑐𝑑(45552780,1471)

= −11705609

= 45552780−11705609

= 33847171

下面就加密第一个 chunk 的信息吧！（用 e）

𝑐=$6841708^{1471}$(mod45566533)=16310024

解密信息（用 d）

𝑚=$16310024^{33847171}$(mod45566533)=6841708

4. Python Code of RSA

gen_keys takes in p and q, then creates a public and private key pair
encrypt takes a message, public key and n, returns an encryted version of the message
decrypt takes the encrypted message, the private key, and 𝑛 and returns the original message.

def gen_keys(p, q):
    n = p * q
    m = (p - 1) * (q - 1)
    e = int(random.random() * n)
    while gcd(m, e) != 1:
        e = int(random.random() * n)
    d, a, b = ext_gcd(m, e)
    if b < 0:
        d = m + b
    else:
        d = b
    return (e, d, n)


def encrypt(msg, e, n):
    chunk_size = n.bit_length() // 8
    all_chunks = str_to_chunks(msg, chunk_size)
    return [
        modexp(msg_chunk, e, n)
        for msg_chunk in all_chunks
    ]


def decrypt(cipher_chunks, d, n):
    chunk_size = n.bit_length() // 8
    plain_chunks = [
        modexp(cipher_chunk, d, n)
        for cipher_chunk in cipher_chunks
    ]
    return chunks_to_str(plain_chunks, chunk_size)

sample usage

>>> msg = "Python"
>>> e, d, n = gen_keys(5563, 8191)
>>> print(e, d, n)
2646697 33043453 45566533
>>> c = encrypt(msg, e, n)
>>> print(c)
[22810070, 18852325, 34390906, 22805081]
>>> m = decrypt(c, d, n)
>>> print(m)
Python
>>>

Helper function (break into chunks)

用到了 python 里面一个叫做 bytearray 的 feature，来存储 string。

A bytearray allows us to store any string as a sequence of bytes.

我们可以方便地把 string 转换成 hex digits，然后把 hex digits 再转换回原来的 string。

def str_to_chunks(msg, chunk_size):
    msg_bytes = bytes(msg, "utf-8")
    # 填补 0
    hex_str = "".join([f"{b:02x}" for b in msg_bytes])
    num_chunks = len(hex_str) // chunk_size
    chunk_list = []
    # 分块
    for i in range(
        0, num_chunks * chunk_size + 1, chunk_size
    ):
        chunk_list.append(hex_str[i : i + chunk_size])
    chunk_list = [
        eval("0x" + x) for x in chunk_list if x
    ]
    return chunk_list


def chunks_to_str(chunk_list, chunk_size):
    hex_list = []
    for chunk in chunk_list:
        hex_str = hex(chunk)[2:]
        clen = len(hex_str)
        hex_list.append(
            "0" * ((chunk_size - clen) % 2) + hex_str
        )

    hstring = "".join(hex_list)
    msg_array = bytearray.fromhex(hstring)
    return msg_array.decode("utf-8")

str_to_chunks 解析

我们必须保证 hex number 对应一个 2 digits long 的 character，有时候我们可能需要加一个 leading zero；

We can do this easily by using the string formatting expression f"{b:02x}" ，它会创建一个正好满足 2 character long 的 string，并且在必要的情况下填补 leading 0；
当我们有了一个完整的 hex string 之后就开始分块（for loop）；
最后，我们用 eval function 和 list comprehension 把 hex 数字转换成 integer eval("0x" + x) for x in chunk_list if x

chunks_to_str 解析

把每个 chunk 都转换成 hex，然后连起来成为一个长的 hex；
把长的 hex 转换成一个 bytearray；
把 bytearray 转成 string；

The bytearray has a built-in decode function to turn the bytearray into a string.
唯一一个难点就是在 transformation 的过程中，chunk 的数字可能会比 original number 要小很多；在这个情况下我们要加 leading 0，保证每个 chunk 的长度一致，然后再把它们合并起来

"0" * ((chunk_size) - clen) % 2)

chunk_size represents the number of digits that should be present in the string and clen represents the actual number.

5. 结语

这个 RSA 算法跟之前学的最大公约数，模倒数都关系很密切。

我也是大概只看懂了 70%，数学的部分还要回顾一下。

这本算法书就剩下最后一个新的章节啦！9.6. Graphs Revisited: Pattern Matching

胜利在望，明天继续！