# Pepper

**ToDo**:
- Add illustration for the salting process - Similar to [this](https://thehavok.com/wp-content/uploads/2020/11/for_engineers.064-1-1024x576.jpeg).
- Add reference to libraries compatible with peppers.
- Add appendix regarding how to store environment variables in `.env` files.
- Add appendix regarding read/write hashes to database.
- Add relevant resources at the end.

---


Pepper is the name given to an array of random bytes added to the data before hashing it. It is analogous to a salt, with some key differences:

- Peppers are secret.
- Peppers should be stored in a different medium than salts, or not stored at all.

The remaining vulnerability with salts is that they are public: the salt is stored alongside the hash, so an attacker who obtains the database can still run an offline dictionary attack, using a powerful GPU or specialized hardware to test commonly used passwords in parallel until the weak ones are found.

With a pepper, even if the database is compromised, i.e. all hashes and salts are leaked, the attacker cannot run this offline, parallelized attack, because computing a candidate hash requires the salt (leaked), the hashed password (also leaked) and the pepper (which they do not have). As long as the pepper stays secret, they cannot test even for weak passwords.

So even when users choose weak passwords, the pepper protects the hashes with another layer of security.

There are two approaches to peppers:

- **Secret Pepper**: the pepper is stored, but physically separated from the salts and hashes, for example in a TPM chip, an environment variable, or a managed secrets service.
- **Re-discovered Pepper**: the pepper is never stored; the system re-discovers it by brute force on every verification.
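For the secret approach, the application has to load the pepper from wherever it lives and refuse to run without it. The sketch below assumes the pepper is a hex string in an environment variable; the helper `load_pepper`, the variable name `PEPPER` and the 32-byte minimum are hypothetical choices for illustration, not a fixed convention.

```python
import os
import secrets

def load_pepper(var_name="PEPPER", min_bytes=32):
    """Read a hex-encoded pepper from an environment variable (hypothetical helper).

    Assumption: the pepper is provisioned outside the database as a hex string
    and must decode to at least `min_bytes` bytes.
    """
    value = os.environ.get(var_name)
    if value is None:
        # Fail loudly: silently hashing without a pepper would weaken every new hash.
        raise RuntimeError(f"environment variable {var_name!r} is not set")
    pepper = bytes.fromhex(value)  # raises ValueError on malformed hex
    if len(pepper) < min_bytes:
        raise ValueError(f"pepper is {len(pepper)} bytes, expected at least {min_bytes}")
    return pepper

# Demo only: in a real deployment the variable is set by the environment, not by the app.
os.environ["PEPPER"] = secrets.token_hex(32)
pepper = load_pepper()
print(len(pepper))  # 32
```

Failing at startup when the variable is missing is a deliberate design choice: it is safer than falling back to an empty pepper, which would produce hashes that look valid but lack the extra protection.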

Secret peppers are fast and have no major impact on performance, but they require an external medium to store them; moreover, if they are leaked, security falls back to what salts alone provide. Re-discovered peppers, on the other hand, need no storage and cannot be leaked, because the system never keeps their values and a different one is drawn for each password. Their price is performance, since every verification has to search for the pepper. They also slow an attacker down rather than stop them: someone who knows the scheme can enumerate every candidate pepper, so the cracking cost grows by a factor equal to the number of possible pepper values.

When using the secret approach, the pepper is usually quite long, whereas when using the re-discovered approach the pepper is usually quite short to minimize the performance penalty.
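The length trade-off can be made concrete. Assuming a $b$-bit pepper drawn uniformly at random and a verifier that tries candidates in order and stops at the first match (as the re-discovered example in this chapter does), the number of scrypt evaluations for a correct password, for a wrong password, and for an attacker testing a dictionary of $D$ passwords against one leaked hash is:

$$
E_{\text{correct}} = \frac{1}{2^b}\sum_{k=1}^{2^b} k = \frac{2^b + 1}{2},
\qquad
E_{\text{wrong}} = 2^b,
\qquad
E_{\text{attacker}} \le D \cdot 2^b
$$

For $b = 8$ this gives $128.5$ evaluations on average for a successful login, $256$ for every rejected one, and at most $20 \cdot 256 = 5120$ evaluations to run this chapter's 20-entry dictionary against a single hash, instead of $20$ without a pepper. A secret pepper, by contrast, costs exactly one evaluation per verification.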

An example of each approach is provided below.

## Caveats

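Even though the algorithm used in this chapter is `scrypt`, it was not designed to handle peppers natively: it has no dedicated secret-key input, so the examples simply append the pepper to the salt. Python's standard library offers no password-hashing function with such an input, so be mindful and only consider this code for educational purposes. Likewise, the scrypt cost parameter `n=64` used throughout is very small, which keeps the demos fast; production settings use much larger values (on the order of `2**14` or more).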
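One commonly described alternative that avoids mixing the pepper into the salt is to hash the password normally and then compute an HMAC of the result, keyed with the pepper. The sketch below is that swapped-in technique, not the method used in the rest of this chapter; the function names are hypothetical and `n=64` keeps the chapter's toy cost factor.

```python
import hashlib
import hmac
import secrets

# Toy scrypt cost, as in this chapter; far too low for production.
SCRYPT_PARAMS = dict(n=64, r=8, p=1)

def hash_password_hmac_pepper(password, pepper):
    """scrypt with a random salt, then HMAC-SHA256 of the digest keyed by the pepper."""
    salt = secrets.token_bytes(32)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    # The pepper is only ever used as the HMAC key; it is never written to storage.
    tag = hmac.new(pepper, digest, hashlib.sha256).hexdigest()
    return f"{salt.hex()}:{tag}"

def verify_password_hmac_pepper(stored, password, pepper):
    """Recompute scrypt + HMAC and compare in constant time."""
    salt_hex, tag = stored.split(":")
    digest = hashlib.scrypt(password.encode("utf-8"), salt=bytes.fromhex(salt_hex), **SCRYPT_PARAMS)
    expected = hmac.new(pepper, digest, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)

pepper = secrets.token_bytes(32)
stored = hash_password_hmac_pepper("abc123", pepper)
print(verify_password_hmac_pepper(stored, "abc123", pepper))    # True
print(verify_password_hmac_pepper(stored, "password", pepper))  # False
```

Keeping the pepper out of the password-hashing input means the salt and the slow hash stay exactly as the algorithm expects, while an attacker holding only the database still cannot check any guess without the HMAC key.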

## Examples

### Common code

```python
import secrets
import hashlib
```

```python
# Sample taken from https://en.wikipedia.org/wiki/List_of_the_most_common_passwords
most_common_passwords = {
    "123456", "123456789", "picture1", "password",
    "12345678", "111111", "123123", "12345", "1234567890",
    "senha", "1234567", "qwerty", "abc123", "Million2",
    "000000", "1234", "iloveyou", "aaron431",
    "password1", "qqww1122"
}

def cracking_password(database_hashed):
    """Dictionary attack against a leaked 'salt:hash' entry."""
    salt, hashed_password = database_hashed.split(":")
    salt_bytes = bytes.fromhex(salt)

    # Re-hash every common password with the leaked salt and compare.
    for guess in most_common_passwords:
        guess_bytes = guess.encode("utf-8")
        hashed_guess = hashlib.scrypt(guess_bytes, salt=salt_bytes, n=64, r=8, p=1).hex()
        if hashed_password == hashed_guess:
            return f"Password Cracked: '{guess}'"

    return "Password not found in database"


def password_generator_salt(password):
    """Hash a password with a fresh 32-byte random salt; store it as 'salt:hash' in hex."""
    password_bytes = password.encode("utf-8")
    salt = secrets.token_bytes(32)
    hashed_password = hashlib.scrypt(password_bytes, salt=salt, n=64, r=8, p=1).hex()
    return f"{salt.hex()}:{hashed_password}"
```

```python
user_database = {
    "John": password_generator_salt("abc123")
}
user_database
```

```
{'John': 'f5992b02c5daad756954608f02a1e45cae511a8ce39fef02c08dbbdfaa82dcda:3ae30cb6d8b7f22b5e88a24a7fe900c523272d7036a93e8e768c4a6ee4d69fe4b15b0c7b792ee9fb578a9eb283a714d1258cadfca13fc4e146d87d23008361a5'}
```

#### Without using Pepper

```python
leaked_password = user_database["John"]

cracking_password(leaked_password)
```

```
"Password Cracked: 'abc123'"
```

### Secret Pepper Example 

```python
# At an earlier moment in time (e.g. at deployment)
import os

# Environment variables are not part of the database.
# Setting the pepper from inside the program is for demonstration only.
os.environ["PEPPER"] = secrets.token_hex(32)

def password_generator_salt_pepper_secret(password):
    """Hash with salt + secret pepper; only the salt is stored next to the hash."""
    password_bytes = password.encode("utf-8")
    salt = secrets.token_bytes(32)

    pepper_bytes = bytes.fromhex(os.environ["PEPPER"])

    # The pepper is appended to the salt for hashing but never written to storage.
    new_salt = salt + pepper_bytes

    hashed_password = hashlib.scrypt(password_bytes, salt=new_salt, n=64, r=8, p=1).hex()
    return f"{salt.hex()}:{hashed_password}"

def check_password_secret(user, password):
    """Recompute the hash from the stored salt and the secret pepper, then compare."""
    password_bytes = password.encode("utf-8")

    salt, hashed_password = user_database[user].split(":")
    salt_bytes = bytes.fromhex(salt)

    pepper_bytes = bytes.fromhex(os.environ["PEPPER"])
    new_salt = salt_bytes + pepper_bytes

    hashed_trial = hashlib.scrypt(password_bytes, salt=new_salt, n=64, r=8, p=1).hex()
    # Constant-time comparison, so response timing does not leak how many characters matched.
    if secrets.compare_digest(hashed_password, hashed_trial):
        return "Access Granted"

    return "Access Denied"

user_database = {
    "John": password_generator_salt_pepper_secret("abc123")
}

user_database
```

```
{'John': '88bedc9d09a7315c073e888412fef53a5b97cdf2e96ff10d0c521dafd4a1be96:cc0473857d846174965650badcbcfdeabaafc765d2364ad5af09aeedb67fb779817069551b6a99521bdb6b77475c20189544ddd04908d7b8d16acdd0c4b27508'}
```

```python
leaked_password = user_database["John"]

print(cracking_password(leaked_password))
```

```
Password not found in database
```
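The dictionary attack above fails only because the pepper is missing. To illustrate the earlier point that a leaked secret pepper brings security back to the salt-only case, the self-contained sketch below plays an attacker who holds both the database entry and the pepper. The helper names and the five-word dictionary are hypothetical stand-ins for the chapter's `cracking_password` and `most_common_passwords`.

```python
import hashlib
import secrets

# Hypothetical small dictionary (a subset of the chapter's common passwords).
COMMON = ["123456", "password", "qwerty", "abc123", "iloveyou"]

def hash_with_pepper(password, pepper):
    """Same scheme as the secret-pepper example: scrypt over salt + pepper, store 'salt:hash'."""
    salt = secrets.token_bytes(32)
    hashed = hashlib.scrypt(password.encode("utf-8"), salt=salt + pepper, n=64, r=8, p=1).hex()
    return f"{salt.hex()}:{hashed}"

def crack_with_known_pepper(stored, pepper):
    """Dictionary attack by someone who holds the database entry AND the pepper."""
    salt_hex, hashed = stored.split(":")
    salt_and_pepper = bytes.fromhex(salt_hex) + pepper
    for guess in COMMON:
        trial = hashlib.scrypt(guess.encode("utf-8"), salt=salt_and_pepper, n=64, r=8, p=1).hex()
        if trial == hashed:
            return guess
    return None

pepper = secrets.token_bytes(32)
stored = hash_with_pepper("abc123", pepper)
print(crack_with_known_pepper(stored, pepper))                   # abc123
print(crack_with_known_pepper(stored, secrets.token_bytes(32)))  # None (wrong pepper)
```

With the correct pepper the attack is exactly as cheap as against salts alone, which is why the secret pepper must live somewhere other than the database.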


```python
print(check_password_secret("John", "abc123"))
```

```
Access Granted
```


```python
%%timeit
check_password_secret("John", "abc123")
```

```
433 µs ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```


### Re-discovered Pepper Example

```python
pepper_bitslength = 8
# Smallest number of bytes that can hold `pepper_bitslength` bits (ceiling division).
pepper_bytelength = (pepper_bitslength + 7) // 8

def password_generator_salt_pepper_discovered(password):
    """Hash with salt + a short random pepper that is discarded right away."""
    password_bytes = password.encode("utf-8")
    salt = secrets.token_bytes(32)

    pepper = secrets.randbits(pepper_bitslength)
    pepper_bytes = pepper.to_bytes(pepper_bytelength, "big")

    new_salt = salt + pepper_bytes

    hashed_password = hashlib.scrypt(password_bytes, salt=new_salt, n=64, r=8, p=1).hex()
    # Only the salt and the hash are stored; the pepper is never saved anywhere.
    return f"{salt.hex()}:{hashed_password}"

def check_password_rediscovered(user, password):
    """Try every possible pepper value until one reproduces the stored hash."""
    password_bytes = password.encode("utf-8")

    salt, hashed_password = user_database[user].split(":")
    salt_bytes = bytes.fromhex(salt)

    for guess_pepper in range(2**pepper_bitslength):
        guess_pepper_bytes = guess_pepper.to_bytes(pepper_bytelength, "big")
        new_salt = salt_bytes + guess_pepper_bytes

        hashed_trial = hashlib.scrypt(password_bytes, salt=new_salt, n=64, r=8, p=1).hex()
        if secrets.compare_digest(hashed_password, hashed_trial):
            return "Access Granted"

    # A wrong password always costs the full 2**pepper_bitslength hash evaluations.
    return "Access Denied"

user_database = {
    "John": password_generator_salt_pepper_discovered("abc123")
}

user_database
```

```
{'John': '7fbe809bc8f1c530bb9db03a4acd2a3d5f039ccfec4d54adbb55f58b36deaa34:0a302dcf67b5911a01a16d94be792c6241c0fc7291d3689f01b1a37005b148aa0dda98f7ba4cfcc711367e108ef7eb06e5f19d163cee80465371a4a618f28f16'}
```

```python
leaked_password = user_database["John"]

print(cracking_password(leaked_password))
```

```
Password not found in database
```
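`cracking_password` fails here because it does not try any pepper at all. An attacker who knows the scheme can simply enumerate the pepper as well, which is why a re-discovered pepper raises the attacker's cost rather than blocking the attack. The self-contained sketch below assumes the same 8-bit pepper and uses hypothetical helper names and a hypothetical five-word dictionary.

```python
import hashlib
import secrets

PEPPER_BITS = 8                        # same toy size as the example above
PEPPER_BYTES = (PEPPER_BITS + 7) // 8  # 1 byte for an 8-bit pepper
COMMON = ["123456", "password", "qwerty", "abc123", "iloveyou"]  # hypothetical dictionary

def hash_with_random_pepper(password):
    """Re-discovered scheme: random salt + random short pepper; the pepper is discarded."""
    salt = secrets.token_bytes(32)
    pepper = secrets.randbits(PEPPER_BITS).to_bytes(PEPPER_BYTES, "big")
    hashed = hashlib.scrypt(password.encode("utf-8"), salt=salt + pepper, n=64, r=8, p=1).hex()
    return f"{salt.hex()}:{hashed}"

def crack_enumerating_pepper(stored):
    """Try every (guess, pepper) pair: at most len(COMMON) * 2**PEPPER_BITS scrypt calls."""
    salt_hex, hashed = stored.split(":")
    salt = bytes.fromhex(salt_hex)
    for guess in COMMON:
        for candidate in range(2 ** PEPPER_BITS):
            pepper = candidate.to_bytes(PEPPER_BYTES, "big")
            trial = hashlib.scrypt(guess.encode("utf-8"), salt=salt + pepper, n=64, r=8, p=1).hex()
            if trial == hashed:
                return guess, candidate
    return None

stored = hash_with_random_pepper("abc123")
# Prints ('abc123', p), where p is whichever pepper value was drawn at hashing time.
print(crack_enumerating_pepper(stored))
```

Each dictionary word now costs up to 256 hashes instead of one, the same factor the legitimate server pays on a rejected login, so the pepper length directly trades verification time against attack cost.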


```python
print(check_password_rediscovered("John", "abc123"))
```

```
Access Granted
```


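```python
%%timeit
check_password_rediscovered("John", "abc123")
```

```
10.8 ms ± 373 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

This timing depends on the pepper value that happened to be drawn, because the search stops at the first matching candidate; a wrong password always pays for all 256 candidates.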


## Conclusion

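Salts add a first layer of security on top of hashes, but the hash + salt strategy is still vulnerable to offline, parallelized brute-force attacks once the database leaks. Introducing a new secret sequence of bytes, called a **pepper**, addresses this issue: a secret pepper blocks these attacks as long as it does not leak, while a re-discovered pepper multiplies the cost of every guess by the number of possible pepper values. Peppers should be kept in physically separate storage or not stored at all.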