Salt#

ToDo:

  • Add illustration for the salting process - Similar to this and this.

  • Explain the concept of Key Derivation Function - Similar to this and this.

  • Add resources to other salt compatible algorithms - Argon2 for instance.

  • Add appendix regarding read/write hashes to database.

  • Add relevant resources at the end


When the data tends to be the same across different users (e.g. commonly used passwords), a compromised database of hashes lets an attacker check whether any stored hash matches an entry in a pre-computed table. Rainbow table attacks and dictionary attacks are examples of this.

If a random array of bytes is added to the data before hashing, the result will not match any pre-computed table an attacker might have. The same random bytes must then be stored alongside the hash so that it can be recomputed and validated later. These bytes are called a salt and they are not secret (i.e. they can be stored in plain text). Even though one might think that salts should be kept secret, schemes that rely on a secret value mixed into the hash input have vulnerabilities of their own, such as Length Extension Attacks.
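To make the effect concrete, the following sketch hashes the same password for two hypothetical users, first without a salt and then with a random per-user salt. Plain SHA-256 is used only to keep the illustration short; it is not a suitable password hash on its own, and all variable names are illustrative assumptions.

```python
import hashlib
import secrets

password = b"password123"  # the same common password chosen by two users

# Without a salt, identical passwords produce identical hashes,
# so a single pre-computed table entry matches both users.
unsalted_a = hashlib.sha256(password).hexdigest()
unsalted_b = hashlib.sha256(password).hexdigest()

# With a random per-user salt, the hashes differ even though the
# password is the same, so a pre-computed table no longer applies.
salt_a = secrets.token_bytes(16)
salt_b = secrets.token_bytes(16)
salted_a = hashlib.sha256(salt_a + password).hexdigest()
salted_b = hashlib.sha256(salt_b + password).hexdigest()

print(f"Unsalted hashes equal: {unsalted_a == unsalted_b}")
print(f"  Salted hashes equal: {salted_a == salted_b}")
```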

Scrypt#

The hashlib module provides the function scrypt, a convenient interface for adding salts to hashes. It implements the algorithm specified in RFC 7914, which was designed to be memory-intensive in order to make GPU, ASIC and FPGA attacks expensive.

Note: The salt can be of any length; however, a minimum of 128 bits (16 bytes) is recommended. Most examples on this page use 10-byte salts, which falls short of that recommendation.

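import secrets
import hashlib

# Generate a random salt (10 bytes here; 16 or more is recommended)
random_bytes = secrets.token_bytes(10)
data = b"Hello World!"
# Derive a 64-byte hash; n, r and p control the memory and CPU cost
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1).hex()
salt_string = random_bytes.hex()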

print(f"   Original: {data}")
print(f"     Hashed: {data_hashed}")
print(f"Salt+Hashed: {salt_string}:{data_hashed}")
   Original: b'Hello World!'
     Hashed: 83e550df8eea0b3e73bcafcff2d8bddded69a3470a334ab8aff36896c0956dfa9c397ba5e38d895de7ed8877d03cac56fc0d3efae79de96d0c493ac18561ec0e
Salt+Hashed: 0c879eb784b548b3d795:83e550df8eea0b3e73bcafcff2d8bddded69a3470a334ab8aff36896c0956dfa9c397ba5e38d895de7ed8877d03cac56fc0d3efae79de96d0c493ac18561ec0e

Scrypt Disadvantages#

The hashed string depends on several parameters: n, r, p and the salt. Unless they are hardcoded in the source code, it is good practice to store them along with the hash.

The delimiter character is usually either “$” or “:”. Another option is to use separate columns in the database. Note that neither of the two characters is URL-safe; this is acceptable because hashes should never be part of a URL.

random_bytes = secrets.token_bytes(10)
data = b"Hello World!"

n = 2 ** 6
r = 8
p = 1
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=r, p=p).hex()
salt_string = random_bytes.hex()

print(f"n+r+pSalt+Hashed: {n}${r}${p}${salt_string}${data_hashed}")
n+r+pSalt+Hashed: 64$8$1$919ab62c1751e5175200$e1f80b783dcb4187331d04e66019fbb872a174e5e0e1ebb723aaccc2e92f36dd4eaf89fedfcd2a269fab4db03dcf894a4f9c56723db62ec8ace84792cc52e8d9
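Storing the parameters only pays off if they are read back during verification. The sketch below shows one possible round trip for the "$"-delimited format printed above; the helper names encode_record and verify_record are hypothetical and not part of hashlib.

```python
import hashlib
import secrets


def encode_record(data: bytes, n: int = 2**6, r: int = 8, p: int = 1) -> str:
    """Hash data with a fresh salt and pack parameters, salt and hash."""
    salt = secrets.token_bytes(16)
    hashed = hashlib.scrypt(data, salt=salt, n=n, r=r, p=p)
    return f"{n}${r}${p}${salt.hex()}${hashed.hex()}"


def verify_record(data: bytes, record: str) -> bool:
    """Recompute the hash using the parameters stored in the record."""
    n, r, p, salt_hex, hash_hex = record.split("$")
    expected = bytes.fromhex(hash_hex)
    candidate = hashlib.scrypt(
        data,
        salt=bytes.fromhex(salt_hex),
        n=int(n),
        r=int(r),
        p=int(p),
        dklen=len(expected),
    )
    # Constant-time comparison avoids leaking where the bytes differ.
    return secrets.compare_digest(candidate, expected)


record = encode_record(b"Hello World!")
print(verify_record(b"Hello World!", record))
print(verify_record(b"Hello World?", record))
```

Because the cost parameters travel with each record, they can be raised for new hashes later without invalidating the ones already stored.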

Scrypt Bonuses#

Using the scrypt function also provides some additional advantages.

Customizing Length#

There is an additional parameter, dklen, which sets the length in bytes of the derived hash, so outputs of arbitrary length can be generated. The default value is 64.

Important Note: Using short lengths will increase collision risks.

data = b"Hello World!"
length = 2**2
random_bytes = secrets.token_bytes(10)

data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1, dklen=length).hex()
salt_string = random_bytes.hex()

print(f"Salt+Hashed: {salt_string}:{data_hashed}")
Salt+Hashed: 5c1945730bde7053e3de:b8e77137
length = 2**4
random_bytes = secrets.token_bytes(10)

data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1, dklen=length).hex()
salt_string = random_bytes.hex()

print(f"Salt+Hashed: {salt_string}:{data_hashed}")
Salt+Hashed: 921db1fd105ca87427cd:e72e8866ab09b19465ba2ea319c38eec
length = 2**6
random_bytes = secrets.token_bytes(10)

data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1, dklen=length).hex()
salt_string = random_bytes.hex()

print(f"Salt+Hashed: {salt_string}:{data_hashed}")
Salt+Hashed: 338355fa04e4e7aaa54a:cf219b6b2bd6168d10c70ad2eeaeeddedd1574fe299e959673dfc387a0771adff9a39f8c99d2a2156e431478100e6382d173337d0c41cde0a84ca0bf9bf75c56

Customizing Execution Time#

In some applications, making the hash slower can increase security. The scrypt algorithm can be tuned to use different amounts of memory and processing time.

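The memory used can be estimated as: 128 * n * r * p bytes. For p > 1 this is an upper estimate, since the p lanes can be processed one after another, each needing about 128 * n * r bytes.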

Examples:

  • Low Memory Footprint: 128 * 64 * 8 * 1 = 64 KB

  • Large Memory Footprint: 128 * 2^17 * 8 * 1 = 128 MB

n = 2 ** 20
r = 8
p = 1
memory_bytes = 128 * n * r * p
memory_kilo_bytes = memory_bytes / 1024
memory_mega_bytes = memory_kilo_bytes / 1024
print(f"Memory Consumed: {memory_bytes} bytes = {memory_kilo_bytes:.2f} KB = {memory_mega_bytes:.2f} MB")
Memory Consumed: 1073741824 bytes = 1048576.00 KB = 1024.00 MB
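Large values of n may not run with the default settings: hashlib.scrypt accepts a maxmem argument, and when it is left at its default the OpenSSL limit applies (the Python documentation mentions 32 MiB), in which case the call can fail with a ValueError. The sketch below sizes maxmem from the estimate above; the helper name scrypt_with_headroom and the factor of two are assumptions made for illustration.

```python
import hashlib
import secrets


def scrypt_with_headroom(data: bytes, salt: bytes, n: int, r: int = 8, p: int = 1) -> bytes:
    """Call hashlib.scrypt with maxmem sized from the memory estimate."""
    estimated_bytes = 128 * n * r * p
    # Assumption: doubling the estimate leaves enough room for the
    # library's internal bookkeeping on top of the main buffer.
    return hashlib.scrypt(data, salt=salt, n=n, r=r, p=p, maxmem=2 * estimated_bytes)


salt = secrets.token_bytes(16)
derived = scrypt_with_headroom(b"Hello World!", salt, n=2**15)  # about 32 MB
print(len(derived))
```

Note that maxmem itself has an upper bound in hashlib, so very large parameter choices may still be rejected.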

Code Examples#

data = b"Hello World!"
%%timeit
n = 2**6  # 64
random_bytes = secrets.token_bytes(16)
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=8, p=1).hex()
252 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit
n = 2**12
random_bytes = secrets.token_bytes(16)
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=8, p=1).hex()
13.7 ms ± 42.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
n = 2**14
random_bytes = secrets.token_bytes(16)
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=8, p=1).hex()
55.4 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
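The %%timeit cells above only work inside IPython or Jupyter. As a rough equivalent for a plain script, the following sketch measures a few values of n with time.perf_counter; the helper name time_scrypt and the number of repeats are illustrative assumptions, and the timings will vary between machines.

```python
import hashlib
import secrets
import time


def time_scrypt(n: int, r: int = 8, p: int = 1, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds for one scrypt call."""
    data = b"Hello World!"
    salt = secrets.token_bytes(16)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.scrypt(data, salt=salt, n=n, r=r, p=p)
        best = min(best, time.perf_counter() - start)
    return best


for exponent in (6, 10, 14):
    seconds = time_scrypt(2**exponent)
    print(f"n = 2**{exponent}: {seconds * 1000:.2f} ms")
```

A measurement like this can help pick the largest n that keeps a login within an acceptable delay on the target hardware.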

Example: User Authentication#

The following example showcases how scrypt can be used to securely store passwords.

Note: Remember that this code is not suitable for any production use-case.

Auxiliary Functions#

def generate_hash(data: str, salt: bytes) -> str:
    # Encode the text and derive the hash; the salt is stored as a prefix
    data_bytes = data.encode("utf-8")
    data_hashed = hashlib.scrypt(data_bytes, salt=salt, n=64, r=8, p=1)
    return f"{salt.hex()}:{data_hashed.hex()}"


def sign_up(email, password, database_):
    # Work on a copy so the caller's database is not mutated
    database = database_.copy()
    random_bytes = secrets.token_bytes(10)
    database[email] = generate_hash(password, random_bytes)
    print("Successfully Signed Up")
    return database


def login(email, password, database):
    if email not in database:
        print(f"ERROR: User {email} not in Database")
        return

    # Recover the stored salt and recompute the hash for the given password
    expected_password = database[email]
    salt, _ = expected_password.split(":")
    salt_bytes = bytes.fromhex(salt)
    calculated_hash = generate_hash(password, salt_bytes)
    # Constant-time comparison of the stored and recomputed strings
    passwords_matched = secrets.compare_digest(expected_password, calculated_hash)
    if passwords_matched:
        print(f"Successfully Signed in: {email}")
        return

    print(f"ERROR: Incorrect Password for: {email}")

Sign Up#

email = "johndoe@example.com"
password = "password123"
user_database = {}

user_database = sign_up(email, password, user_database)
Successfully Signed Up

Wrong Email#

email = "janedoe@example.com"
password = "password123"

login(email, password, user_database)
ERROR: User janedoe@example.com not in Database

Wrong Password#

email = "johndoe@example.com"
password = "password"

login(email, password, user_database)
ERROR: Incorrect Password for: johndoe@example.com

Successful Login#

email = "johndoe@example.com"
password = "password123"

login(email, password, user_database)
Successfully Signed in: johndoe@example.com

Conclusion#

Hash algorithms on their own have a drawback: anyone can pre-compute the hashes of the most commonly used values and then search for matches one by one. However, because a small change in the input of a cryptographic hash produces a large change in its output, simply adding some bytes to the data greatly mitigates this risk. These bytes are called a salt; they are public and stored along with the hash. One of the algorithms that uses a salt is scrypt, which is available in the Python standard library. This particular algorithm also allows customization of memory consumption and processing time.