Hash Functions#

ToDo:

  • Add illustration for Hash Functions (HashTables) - Similar to this.

  • Add illustration for Cryptographic Hash Functions - Similar to this.

  • Add a comparison table for the different Hash Functions.

  • Show a benchmark comparing execution times.

  • Add appendix regarding read/write hashes to database.

  • Add relevant resources at the end


Hashes are mathematical functions with certain properties:

  • They map a variable-length input to a fixed-length output.

  • Given a certain input, they always produce the same output, i.e. they do not have randomness of any sort.

  • Their output is uniformly distributed across all possible values.

  • The likelihood of collisions (two inputs having the same output) is minimized.
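
As a rough illustration of these properties, the sketch below implements 32-bit FNV-1a, a simple non-cryptographic hash. FNV-1a is chosen here only as an example and is not the algorithm Python uses internally; the helper bucket_for and its bucket count are hypothetical names added to show how a hash table might pick a slot.

```python
# 32-bit FNV-1a constants (illustrative choice, not Python's own algorithm).
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data."""
    value = FNV_OFFSET_BASIS_32
    for byte in data:
        value ^= byte
        value = (value * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result in 32 bits
    return value


def bucket_for(key: bytes, bucket_count: int = 8) -> int:
    """Hypothetical helper: map a key to one of bucket_count hash table slots."""
    return fnv1a_32(key) % bucket_count


# Inputs of very different lengths all map to a fixed-size 32-bit value.
for key in (b"a", b"Hello World!", b"x" * 1000):
    print(f"{fnv1a_32(key):08x} -> bucket {bucket_for(key)}")
```

Calling fnv1a_32 twice with the same bytes always returns the same integer, and the result always fits in 32 bits regardless of the input length.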

A special case of hashes are cryptographic hashes, which, in addition to the previous properties, have the following:

  • It is fast to compute (otherwise impacting performance).

  • Given a certain output it is unfeasible to reverse it and obtain the input (otherwise susceptible to pre-image attacks).

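  • Given a certain input, it is unfeasible to find a different input with the same output (otherwise susceptible to second pre-image attacks).

  • It is unfeasible to find any two distinct inputs with the same output (otherwise susceptible to collision attacks, such as birthday attacks).

  • Small changes in the input change the output drastically, known as the avalanche effect (otherwise similar outputs would leak information about the inputs).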

When any of these properties is not met, there is an attack that could be performed to crack the hash. Note that the word unfeasible is used instead of impossible: breaking the property is theoretically possible, but not in any practical way. This video illustrates this notion of infeasibility.
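
The property that small input changes alter the output drastically can be observed directly. The following is a minimal sketch using SHA-256 from the hashlib module (covered later in this section); the helper differing_bits is a hypothetical name introduced only for this example.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# The two inputs differ only in their last character.
first = hashlib.sha256(b"Hello World!").digest()
second = hashlib.sha256(b"Hello World?").digest()

changed = differing_bits(first, second)
print(f"{changed} of {len(first) * 8} bits differ")
```

For a well-behaved cryptographic hash, roughly half of the output bits are expected to flip, so the count should land somewhere near 128 of 256.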

In Python there are both functions for hashes and for cryptographic hashes.

Built-In Hash Function#

Python has a built-in hash function, which is used internally by sets and dictionaries. However, this is not a secure or cryptographic hash but rather a convenient function that enables the fast lookups of hash tables, the underlying data structure of sets and dictionaries.

hash("Hello World!")
-1218337880191406375

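The result is an integer. For strings and bytes, the exact value changes between interpreter sessions unless the PYTHONHASHSEED environment variable is fixed, so the output above will differ from run to run. The function is designed in such a way that numeric objects which are equal when compared have the same hash, even though they are actually different objects. Two objects are the same when they have the same identity, which in CPython is their memory position; this can be obtained through the built-in id function.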

This is intentional: objects that compare equal must share a hash so that they behave as the same dictionary key or set member. The design favors speed and general-purpose use, not security.

print(f"Are 1 and 1.0 the same object? {id(1) == id(1.0)}")
print(f"Does 1 have the same hash as 1.0? {hash(1) == hash(1.0)}")
Are 1 and 1.0 the same object? False
Does 1 have the same hash as 1.0? True
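
A practical consequence of equal numbers sharing a hash is that they act as a single dictionary key. The short sketch below illustrates this; the dictionary name and its values are made up for the example.

```python
# 1, 1.0 and True compare equal and share a hash, so they address the same entry.
lookup = {1: "stored under the int 1"}
lookup[1.0] = "overwritten through the float 1.0"

print(len(lookup))             # the dictionary still holds a single entry
print(lookup[1])               # the value written through 1.0
print(lookup[True])            # True == 1, so it reaches the same entry
print(type(next(iter(lookup))))  # the original int key is kept
```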

This means this function SHOULD NOT be used for any cryptographic work. Instead, the hashlib module should be used.

Hashlib Module#

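The hashlib module exposes several cryptographic hash functions. A core set (listed in hashlib.algorithms_guaranteed) is always available; when Python is built against the OpenSSL library, hashlib uses it under the hood and exposes the additional algorithms that OpenSSL provides.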

One of the particularities of most cryptography-focused modules and libraries is that they work with low-level objects, mainly bytes objects, instead of the high-level built-in types, such as lists, strings, custom objects, etc. Since working with bytes is unfamiliar to some Python developers, an appendix is provided as a quick introduction.
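
In practice this means that a string has to be encoded into bytes before it can be hashed. The sketch below shows the idea with SHA-256; the UTF-8 encoding is an assumption made for the example, and any encoding works as long as it is used consistently.

```python
import hashlib

text = "Hello World!"

# Passing a str directly is rejected by hashlib.
try:
    hashlib.sha256(text)
except TypeError as error:
    print(f"Rejected: {error}")

# Encode the string into bytes first (UTF-8 is assumed here).
data = text.encode("utf-8")
hex_digest = hashlib.sha256(data).hexdigest()
print(hex_digest)
```

Note that a different encoding of the same text generally produces different bytes and therefore a different digest.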

The hashlib module exposes two ways to construct hashes. One is a simple call to a named function; the other is the generic new constructor combined with update calls, which resembles the Builder Pattern and is more object oriented.

import hashlib

Methods available#

It is possible to list all available algorithms (the result is a set, so the order may vary between runs):

print(", ".join(hashlib.algorithms_available))
sha3_224, whirlpool, sha512, sha384, shake_256, sha3_256, sha3_512, md5-sha1, sha512_256, md4, md5, sha3_384, blake2s, sha512_224, sm3, sha256, sha224, shake_128, blake2b, ripemd160, sha1
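
The list above depends on the platform. As a small sketch, the always-present algorithms can be separated from the platform-dependent ones by comparing hashlib.algorithms_available with hashlib.algorithms_guaranteed; the variable names are chosen only for this example.

```python
import hashlib

# Algorithms that every Python installation is documented to support.
guaranteed = hashlib.algorithms_guaranteed

# Algorithms that only exist because of the local build (typically via OpenSSL).
extras = hashlib.algorithms_available - guaranteed

print("Guaranteed:", ", ".join(sorted(guaranteed)))
print("Platform-dependent:", ", ".join(sorted(extras)))
```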

Function Call#

The way to use function calls directly is to call one of the already exposed functions (e.g. sha256, blake2s, etc.). This approach does not allow parametrization in a simple way and is more adequate when the hash algorithm is not likely to change.

hash_object = hashlib.sha256(b"Hello World!")
print(f"Bytes Digest: {hash_object.digest()}")
print(f"Hex Digest: {hash_object.hexdigest()}")
Bytes Digest: b'\x7f\x83\xb1e\x7f\xf1\xfcS\xb9-\xc1\x81H\xa1\xd6]\xfc-K\x1f\xa3\xd6w(J\xdd\xd2\x00\x12m\x90i'
Hex Digest: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069

Object Instantiation#

If the builder pattern is used instead, the hash algorithm is passed as a string, allowing for easy parametrization. This also means that the particular hash algorithm used should be saved somewhere, because digests are not compatible across hash algorithms.

hasher = hashlib.new('sha256')
hasher.update(b"Hello World!")
print(f"Bytes Digest: {hasher.digest()}")
print(f"Hex Digest: {hasher.hexdigest()}")
Bytes Digest: b'\x7f\x83\xb1e\x7f\xf1\xfcS\xb9-\xc1\x81H\xa1\xd6]\xfc-K\x1f\xa3\xd6w(J\xdd\xd2\x00\x12m\x90i'
Hex Digest: 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
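
Because update can be called repeatedly, data can be fed in pieces, and the algorithm name can be stored next to the digest for later verification. The sketch below is one possible approach; the helpers digest_chunks and verify, and the "algorithm$hexdigest" storage format, are hypothetical and not part of hashlib.

```python
import hashlib


def digest_chunks(algorithm: str, chunks) -> str:
    """Hash an iterable of bytes chunks and return 'algorithm$hexdigest'."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)  # equivalent to hashing the concatenated chunks
    return f"{algorithm}${hasher.hexdigest()}"


def verify(stored: str, data: bytes) -> bool:
    """Recompute the digest with the stored algorithm name and compare."""
    algorithm, _, expected = stored.partition("$")
    return hashlib.new(algorithm, data).hexdigest() == expected


stored = digest_chunks("sha256", [b"Hello ", b"World!"])
print(stored)
print(verify(stored, b"Hello World!"))
```

Feeding b"Hello " and b"World!" separately yields the same digest as hashing b"Hello World!" in one call, which is what makes this approach convenient for large files or streams.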

Important Considerations for OpenSSL#

When working in lightweight environments (e.g. Alpine Linux in a Docker container), OpenSSL or some of its algorithms might not be available. OpenSSL is not part of Python itself: the interpreter relies on the library provided by the operating system or bundled with the Python distribution, so algorithms outside hashlib.algorithms_guaranteed can be missing. In such environments, one should make sure the proper dependencies are installed at the operating system level (apt-get, Homebrew, Chocolatey, etc.).
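
One defensive option is to handle a missing algorithm explicitly instead of letting the program fail. The sketch below relies on hashlib.new raising ValueError for unsupported names; the helper new_hasher and the choice of sha256 as fallback are assumptions made for this example.

```python
import hashlib


def new_hasher(preferred: str, fallback: str = "sha256"):
    """Return a hasher for preferred, or for fallback if it is unavailable."""
    try:
        return hashlib.new(preferred)
    except ValueError:
        # Raised when the local build does not support the requested algorithm.
        return hashlib.new(fallback)


hasher = new_hasher("whirlpool")  # may or may not exist on this platform
hasher.update(b"Hello World!")

# Record the algorithm actually used, since digests differ across algorithms.
print(hasher.name, hasher.hexdigest())
```

Silently falling back changes the resulting digest, so the algorithm name reported by hasher.name should be stored together with it.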

Conclusion#

The built-in hash function is useful for non-secure operations, such as fast lookups in dictionaries. The hashlib module, on the other hand, provides cryptographic hash functions ready to use. It should be considered, though, that some hashlib algorithms are only available when OpenSSL provides them, and that not all the exposed functions are actually secure (e.g. MD5 is considered insecure).