Python Hash Algorithm Security Practice Guide
1. What is a hash algorithm?
A hash algorithm (also called a digest algorithm or hash function) is a basic tool for computer security. It converts input data of arbitrary length (such as a sentence, a picture, or a software installation package) into a fixed-length "digital fingerprint". This fingerprint is called a hash value (or digest) and is usually expressed in hexadecimal.
Don’t get hung up on the mathematical details behind it, just remember its 5 core security features:
- ✅ Determinism: The same input always produces the same result, no matter when or on which machine it is calculated.
- ⚡ High-speed calculation (normal scenario): Fingerprints can be calculated in milliseconds to seconds for files ranging from several KB to several GB.
- 🔒 One-wayness (for secure algorithms): Even with the fingerprint in hand, recovering the original data is computationally infeasible.
- 🦋 Butterfly Effect (Avalanche Effect): If you change even one character in the input, the output fingerprint will be completely different.
- 🎯 Extremely low collision probability (for secure algorithms): In practice, it is almost impossible to find two different inputs that produce the same fingerprint.
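Several of these properties are easy to observe directly. The following is a minimal sketch using SHA-256 from Python's hashlib (introduced in the next section); the helper names `sha256_hex` and `differing_bits` are illustrative and not part of any library.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = sha256_hex("hello world")
b = sha256_hex("hello world")   # same input again
c = sha256_hex("hello worle")   # only the last character changed

print(a == b)                   # True: deterministic
print(len(a), len(c))           # 64 64: fixed length regardless of input
print(a == c)                   # False: a tiny change gives a different digest
print(differing_bits(a, c))     # for a good hash, typically near half of the 256 bits
```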
2. hashlib quick start
The Python standard library module hashlib provides the mainstream hashing algorithms. There is no need to install third-party libraries; it works out of the box.
2.1 Basic usage: single piece of text hashing
All algorithms follow three steps: create the object -> feed it data -> read the fingerprint. For short text, you can chain the calls directly:
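A minimal sketch of both forms, assuming SHA-256 as the algorithm and the string "hello" as sample input:

```python
import hashlib

# Step by step: create the object, feed it bytes, read the fingerprint.
h = hashlib.sha256()
h.update("hello".encode("utf-8"))   # hashlib works on bytes, not str
step_by_step = h.hexdigest()

# The same thing as one chained call, convenient for short text.
chained = hashlib.sha256("hello".encode("utf-8")).hexdigest()

print(chained)                      # 64 hexadecimal characters
print(step_by_step == chained)      # True
```

Text must be encoded to bytes first; passing a str to hashlib raises a TypeError.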
:::tip Tips
If you want to see which algorithms the system supports, you can use:
- hashlib.algorithms_available: all algorithms available in the current environment (including those supplied by the local system)
- hashlib.algorithms_guaranteed: algorithms guaranteed to be available on all platforms
:::
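As a quick illustration, both attributes are plain sets of algorithm names, so they can be printed and compared directly. The output of the first and last lines depends on your Python build, so none is shown here.

```python
import hashlib

# Algorithms that every Python build on every platform provides.
print(sorted(hashlib.algorithms_guaranteed))

# The guaranteed set is always a subset of what this environment offers.
print(hashlib.algorithms_guaranteed <= hashlib.algorithms_available)  # True

# Anything beyond the guaranteed set depends on the local crypto library.
extras = sorted(hashlib.algorithms_available - hashlib.algorithms_guaranteed)
print(extras)
```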
3. Hash processing of large files/streaming data
If you load a video file of several GB into memory with a single read() and then hash it, the program may exhaust memory and the computer may freeze. This is where hashlib's incremental update() feature comes in.
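One way to do this is to read the file in fixed-size chunks and feed each chunk to update(). The sketch below assumes a 1 MiB chunk size and a hypothetical helper name `hash_file`; both are illustrative choices.

```python
import hashlib

def hash_file(path: str, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling update() repeatedly gives the same result as hashing all the bytes at once, so the chunk size affects only memory use and speed. On Python 3.11 and later, hashlib.file_digest() offers a built-in alternative for file objects.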
4. Password storage: from “falling into pitfalls” to “avoiding pitfalls”
Password storage is the most common security scenario for hashing algorithms, but you must never use basic hashing (even SHA‑256) directly! The following is a step-by-step upgrade from the worst implementation to the industry recommended solution.
❌ Pitfall 1: Storing passwords in clear text
Once the database is leaked, every user's password is exposed at a glance. This mistake is rare now, but it is still worth guarding against.
❌ Pitfall 2: Applying a basic hash only once
An attacker can use a rainbow table (a large precomputed mapping of "common passwords → basic hashes") to look up the plaintext almost instantly.
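To see why this fails, consider the sketch below. It uses a plain dictionary as a simplified stand-in for a rainbow table (real rainbow tables use a more compact structure, but the effect on the victim is the same). The password list and user names are made up for illustration.

```python
import hashlib

def naive_hash(password: str) -> str:
    """Anti-pattern: one unsalted SHA-256 pass. Shown only to illustrate the weakness."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# An attacker's precomputed table: hash -> plaintext for common passwords.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup_table = {naive_hash(p): p for p in common_passwords}

# Two users who chose the same password have identical rows in a leaked database.
leaked = {"alice": naive_hash("qwerty"), "bob": naive_hash("qwerty")}

for user, digest in leaked.items():
    print(user, "->", lookup_table.get(digest, "<not in table>"))
# alice -> qwerty
# bob -> qwerty
```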
⚠️ Transitional version: salted hashing
A salt is a randomly generated byte string that is different for each user. When saving a password, combine the salt and the password and hash them together; do the same when verifying. This way, even if an attacker obtains the database, they have to build a separate rainbow table for each user, which greatly increases the cost.
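The scheme can be sketched as follows, assuming a 16-byte salt from os.urandom and SHA-256 over salt + password. The function names are hypothetical, and this transitional version is shown for understanding only, not as a recommendation.

```python
import hashlib
import hmac
import os

def hash_password_salted(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh 16-byte random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def verify_password_salted(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, expected)

salt_a, digest_a = hash_password_salted("qwerty")
salt_b, digest_b = hash_password_salted("qwerty")
print(digest_a == digest_b)                                 # same password, different rows
print(verify_password_salted("qwerty", salt_a, digest_a))   # True
```

The salt is not secret; it is stored next to the digest so that verification can repeat the same computation.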
Although this is safer than the previous two approaches, a basic hash operation is too fast: modern GPUs can calculate SHA‑256 billions of times per second, so brute-force cracking is still only a matter of time. What we need instead is a "slow hash".
✅ Advanced version: Use built-in PBKDF2
PBKDF2 (Password-Based Key Derivation Function 2) is a deliberately slow function designed for passwords. By applying a large number of iterations (such as 100,000), it stretches the time needed for brute-force cracking from seconds to days or even years.
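A minimal sketch with hashlib.pbkdf2_hmac, assuming HMAC-SHA256, a 16-byte salt, and the 100,000 iterations mentioned above. The function names and the returned tuple layout are illustrative choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # figure from the text; treat it as a floor, not a target

def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key) for storage."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Re-derive the key with the stored parameters and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)

salt, iterations, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, iterations, key))  # True
print(verify_password("wrong guess", salt, iterations, key))                   # False
```

Storing the iteration count alongside the salt makes it possible to raise the count later without invalidating existing records. Current security guidance tends to recommend counts well above 100,000 for PBKDF2 with SHA-256, so check an up-to-date source before settling on a value.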
✅✅ Most recommended version: Use a third-party modern slow hashing library
PBKDF2 is already good, but Argon2, the winner of the Password Hashing Competition (PHC), and the long-proven bcrypt are even better. Their libraries come with built-in safeguards such as salt generation, tunable cost parameters, and constant-time comparison, and they are easier to use.
Here we take the widely used bcrypt as an example (Argon2 is available by installing the argon2-cffi library):
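A minimal sketch using the third-party bcrypt package (installed with pip install bcrypt). The helper names and the cost factor of 12 are assumptions for illustration; the import guard is there only so the sketch still loads on machines without the package.

```python
# bcrypt is a third-party package (pip install bcrypt); the guard below only
# keeps this sketch importable on machines where it is not installed.
try:
    import bcrypt
except ImportError:
    bcrypt = None

def hash_password(password: str, rounds: int = 12) -> bytes:
    """Return a bcrypt hash; the salt and cost factor are embedded in the result."""
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt(rounds=rounds))

def check_password(password: str, hashed: bytes) -> bool:
    """Verify a password against a stored bcrypt hash."""
    return bcrypt.checkpw(password.encode("utf-8"), hashed)

if bcrypt is not None:
    stored = hash_password("correct horse battery staple")
    print(check_password("correct horse battery staple", stored))  # True
    print(check_password("wrong guess", stored))                   # False
```

Because the salt and cost factor are part of the returned hash, only that single value needs to be stored. Note that bcrypt only considers the first 72 bytes of a password, so very long passphrases need extra care.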
:::danger The iron law of password storage
- Password storage **may only use** dedicated slow hash functions (PBKDF2, bcrypt, Argon2)
- Each password must have a unique, cryptographically secure random salt
- The number of iterations / work factor must be adjusted high enough to ensure that brute force cracking is extremely slow
- Never use MD5 / SHA‑1 / raw SHA‑256 for password storage
:::
5. Other security/non-security uses of hashing algorithms
In addition to password storage, hashing algorithms have a variety of legitimate applications:
- Data Integrity Verification: When downloading a file, compare it with the officially provided SHA-256 fingerprint to ensure that the file has not been tampered with or downloaded incorrectly.
- Data deduplication: The "instant upload" feature of cloud drives first calculates the file's fingerprint. If the server already holds a file with the same fingerprint, it simply links that file to your account without uploading it again.
- Digital signature pre-step: Digital signatures usually do not sign the original file directly. They sign the file's hash value instead, because the original file is too large and signature algorithms are slow.
- Blockchain: Each block contains the hash value of the previous block, forming a chain in which tampering with any block invalidates every block after it.
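Two of these uses are small enough to sketch: checking a download against a published digest, and linking records by the hash of their predecessor. The code below is a simplified illustration. The function names, the dictionary layout of a block, and the all-zero placeholder for the first block are assumptions, and a real blockchain involves far more than this linking step.

```python
import hashlib
import hmac

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """Compare the SHA-256 of downloaded bytes with a published hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, published_hex.strip().lower())

def build_chain(payloads: list[str]) -> list[dict]:
    """Link records so each one stores the hash of the record before it."""
    chain, prev_hash = [], GENESIS
    for payload in payloads:
        block_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        chain.append({"prev_hash": prev_hash, "payload": payload, "hash": block_hash})
        prev_hash = block_hash
    return chain

def chain_is_valid(chain: list[dict]) -> bool:
    """Recompute every hash; an edited payload no longer matches its stored hash."""
    prev_hash = GENESIS
    for block in chain:
        expected = hashlib.sha256((prev_hash + block["payload"]).encode("utf-8")).hexdigest()
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

chain = build_chain(["alice pays bob 5", "bob pays carol 2"])
print(chain_is_valid(chain))                  # True
chain[0]["payload"] = "alice pays bob 500"    # tamper with history
print(chain_is_valid(chain))                  # False
```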
6. Summary
hashlib is a very useful part of Python's standard library, but it only delivers its value when used in the right place.
Remember these red lines and you can avoid most pitfalls:
- 🚫 Don’t use MD5 / SHA‑1 for security related work
- 🧂 Each password must be equipped with a unique random salt
- 🐢 The password must use slow hashing, and the number of iterations must be high enough
- 🔑 Force users to set strong passwords and use two-step authentication (2FA) if possible
I hope this guide can help you use hashing algorithms safely and efficiently.

