Modern Hash Algorithms and HMAC Security Practice Guide
Hash algorithms are among the most commonly used basic tools in security development. A hash function compresses data of any length into a fixed-length digest, much like taking a "fingerprint" of the data. Whether the input is a letter or a movie, the output length is always the same, and the output is completely determined by the input. Changing even a single bit of the input produces a completely different result.
In daily development, we use hashes for four kinds of tasks:
- Integrity Check: Verify that downloaded files and data returned by an API have not been damaged or tampered with in transit.
- Password Storage: Store a hash of the password instead of the plain text, so that even if the database leaks, it is difficult for an attacker to recover the original password.
- Digital Signature Assist: Compute the digest of a long message first and then sign the digest with the private key, which is far more efficient than signing the complete message directly.
- Unique Identification: Derive name-based UUIDs, generate IDs for Git commits, or "address" data by its content so that the same content always maps to the same identifier.
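As a minimal sketch of the integrity-check use case, the snippet below hashes a file in chunks with the standard `hashlib` module and compares the result with a published checksum. The function names `sha256_file` and `verify_download` and the 64 KiB chunk size are illustrative assumptions, not part of any library.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that memory use stays bounded even for very large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published checksum (hypothetical helper)."""
    actual = sha256_file(path).encode()
    expected = expected_hex.strip().lower().encode()
    return hmac.compare_digest(actual, expected)
```

The same digest can serve as a content address: identical bytes always produce the same identifier.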
A secure hash algorithm must have the following five properties:
- Deterministic: The same input always produces the same output, regardless of machine or time.
- Efficiency: Even gigabyte-sized files can be processed within an acceptable time.
- One-way: Given a hash value, it is computationally infeasible to recover the original data.
- Collision Resistance: It is computationally infeasible, at least with existing computing power, to find two different messages that produce the same hash value.
- Avalanche Effect: Changing even a single bit of the input changes the output drastically, flipping about half of the output bits.
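Several of these properties can be observed directly with `hashlib`. The sketch below checks the fixed output length, determinism, and the avalanche effect for SHA-256; the helper `differing_bits` and the sample message are illustrative assumptions.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Fixed length: one byte and one million bytes both give a 32-byte digest.
short_digest = hashlib.sha256(b"a").digest()
long_digest = hashlib.sha256(b"a" * 1_000_000).digest()
print(len(short_digest), len(long_digest))  # 32 32

# Deterministic: hashing the same input twice gives the same output.
message = b"transfer 100 to alice"
assert hashlib.sha256(message).digest() == hashlib.sha256(message).digest()

# Avalanche effect: flip one input bit and count how many output bits change.
flipped = bytes([message[0] ^ 0x01]) + message[1:]
flipped_bits = differing_bits(
    hashlib.sha256(message).digest(),
    hashlib.sha256(flipped).digest(),
)
print(f"{flipped_bits} of 256 output bits changed")
```

The printed count should land near 128, that is, about half of the 256 output bits.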
Which hash algorithms should be used today?
MD5 and SHA-1 have known practical collision attacks and should no longer be used for any security purpose. Choose according to the scenario:
- SHA-2 family (SHA-256 / SHA-384 / SHA-512): Has the best compatibility, is supported by most systems, and is the first choice for general scenarios. The higher the number, the longer the output and the larger the security margin.
- SHA-3 (Keccak): The winner of the NIST hash competition, selected in 2012. Its sponge construction is completely different from the structure of SHA-2, so a weakness discovered in one family is unlikely to carry over to the other. It is suitable for scenarios with higher security requirements.
- BLAKE2 / BLAKE3: Faster in software than SHA-2 and SHA-3, with comparable security strength. Especially suitable for high-throughput services and for resource-constrained edge devices.
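The SHA-2, SHA-3 and BLAKE2 families are all exposed by Python's standard `hashlib` module, so switching between them is a change of name. The sketch below is illustrative; the helper `digest_hex` and the `ALGORITHMS` tuple are assumptions. BLAKE3 is not part of `hashlib` and would need a third-party package, so it is left out here.

```python
import hashlib

# Algorithm names as understood by hashlib.new().
ALGORITHMS = ("sha256", "sha384", "sha512", "sha3_256", "blake2b")


def digest_hex(algorithm: str, data: bytes) -> str:
    """Hash data with the named algorithm and return the hex digest."""
    return hashlib.new(algorithm, data).hexdigest()


message = b"the same input for every algorithm"
for name in ALGORITHMS:
    value = digest_hex(name, message)
    # Each hex character encodes 4 bits of output.
    print(f"{name:9s} {len(value) * 4:4d} bits  {value[:16]}...")

# BLAKE2b output length is configurable; 32 bytes matches SHA-256.
short_tag = hashlib.blake2b(message, digest_size=32).hexdigest()
```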
Message authentication code HMAC
Many people confuse "salting a hash" with "HMAC". HMAC is not simply hashing the key and the message together; it is a standardized message authentication construction (RFC 2104, NIST FIPS 198-1) that guarantees two things at the same time:
- Integrity: The message has not been tampered with.
- Authenticity: The message really comes from a party holding the correct key.
How it works, in plain terms: HMAC first normalizes the key to the block size of the hash (a key that is too long is hashed, a key that is too short is zero-padded), XORs it with two different fixed padding constants, and then hashes in two layers: an inner hash over the first padded key plus the message, and an outer hash over the second padded key plus the inner result. The advantage of this design is that even though the construction is fully public, an attacker without the key cannot forge a valid HMAC, and the nested structure also prevents the length-extension attacks that affect a naive hash(key + message). The key normalization step avoids the security risks of keys that are too long or too short.
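The construction described above can be written out in a few lines. This is a teaching sketch of HMAC-SHA256 following RFC 2104, checked against the standard `hmac` module; the function name `hmac_sha256_manual` and the sample key are illustrative, and production code should call `hmac` directly.

```python
import hashlib
import hmac


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA256 step by step (for illustration only)."""
    block_size = 64  # SHA-256 processes input in 64-byte blocks

    # Step 1: normalize the key to exactly one block.
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # too long: hash it first
    key = key.ljust(block_size, b"\x00")    # too short: zero-pad

    # Step 2: XOR the key with the two fixed padding constants.
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)

    # Step 3: two-layer hash.
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


key = b"demo-key-not-for-production"
message = b"amount=100&to=alice"
assert hmac_sha256_manual(key, message) == hmac.new(key, message, hashlib.sha256).digest()
```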
How to use HMAC in Python?
Python ships with the `hmac` module in its standard library, so no third-party dependency is needed, and using it is the recommended approach.
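A minimal sketch of signing and verifying a message with the `hmac` module follows. The helper names `sign` and `verify`, the JSON payload, and the in-memory key are illustrative assumptions; in a real service the key would come from a secret store.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag_hex: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign(key, message)
    # compare_digest accepts str only when it is pure ASCII, so compare bytes.
    return hmac.compare_digest(expected.encode(), tag_hex.encode())


key = secrets.token_bytes(32)  # 256-bit random key, generated here for the demo
message = b'{"order_id": 42, "amount": "19.99"}'
tag = sign(key, message)

print(verify(key, message, tag))         # True
print(verify(key, message + b" ", tag))  # False: the message was altered
```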
Best practices for password storage
**Please note: never use an ordinary hash (even salted SHA-256) to store passwords directly.** Ordinary hashes are designed to be fast, and attackers can use GPUs and specialized chips to try billions of candidate passwords per second. You must use a dedicated password hashing function, which has three key advantages:
- Built-in random salt: A different salt is generated automatically on every call, so developers do not have to handle it manually, and rainbow table attacks are defeated at the root.
- Adjustable work factor: Parameters let you slow the calculation down (for example, so that each hash takes 0.2 to 0.5 seconds). Normal user logins are not affected, but an attacker's cracking speed drops to an impractical level.
- Standardized output format: The generated hash string contains the algorithm identifier, the work parameters, the salt and the final hash value. It is parsed automatically during verification, so no additional configuration has to be stored.
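To make these three ideas concrete, the sketch below uses the standard library's `hashlib.pbkdf2_hmac`. It is an illustration only: the function names, the `$`-separated string layout, and the default of 200,000 iterations are assumptions made for the example, and a vetted password library should be used in real code.

```python
import hashlib
import hmac
import os


def demo_hash(password: str, iterations: int = 200_000) -> str:
    """Hash a password with a fresh random salt and a tunable work factor."""
    salt = os.urandom(16)  # new random salt on every call
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Self-describing string (hypothetical layout): algorithm$work$salt$hash
    return f"pbkdf2_sha256${iterations}${salt.hex()}${derived.hex()}"


def demo_verify(password: str, encoded: str) -> bool:
    """Parse the stored string and recompute the hash with the same parameters."""
    algorithm, iterations, salt_hex, hash_hex = encoded.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))


first = demo_hash("correct horse battery staple")
second = demo_hash("correct horse battery staple")
print(first != second)  # True: same password, different salts, different strings
```

Because the parameters travel inside the string, raising the work factor later does not break verification of older hashes.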
Recommended password hashing algorithms (priority from highest to lowest)
- Argon2id: Argon2 won the Password Hashing Competition (PHC) in 2015. The Argon2id variant combines resistance to side-channel attacks with resistance to GPU cracking and is currently the most recommended choice.
- bcrypt: Widely compatible and already used by many legacy systems, so the migration cost is low.
- scrypt: Deliberately consumes a lot of memory, which greatly raises the cost of GPU/ASIC attacks.
- PBKDF2: The most basic dedicated option. It is available on practically every platform and in every language, which makes it suitable for constrained environments.
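Python example: using the passlib library
`passlib` is a widely used password hashing library in the Python ecosystem, and switching algorithms takes one line of code. It can be installed with `pip install passlib`; Argon2 and bcrypt need an extra backend, available through the `passlib[argon2]` and `passlib[bcrypt]` extras.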
Production security checklist
Key management
- Length: HMAC keys should be at least 16 bytes (128 bits), and 32 bytes (256 bits) for high-security scenarios. For password hashing, the library generates the salt automatically and ships a default work factor, so neither has to be set by hand to get started.
- Storage: Never hardcode keys in source code or in configuration files. Production environments should use a key management service (AWS KMS, HashiCorp Vault, Azure Key Vault); development environments should at least use environment variables.
- Rotation: Replace HMAC keys every 3 to 6 months. During a rotation, accept both the new and the old key for a transition period to avoid service interruption.
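The checklist above could be sketched as follows: keys are loaded from the environment, checked for minimum length, and verification accepts both the current and the previous key during a rotation window. The environment variable names `HMAC_KEY_CURRENT` and `HMAC_KEY_PREVIOUS`, the hex encoding, and the function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import secrets

MIN_KEY_BYTES = 16  # 128-bit minimum from the checklist


def load_keys(env=os.environ) -> list[bytes]:
    """Read hex-encoded keys from (hypothetical) variables, current key first."""
    names = ("HMAC_KEY_CURRENT", "HMAC_KEY_PREVIOUS")
    keys = [bytes.fromhex(env[name]) for name in names if env.get(name)]
    if not keys:
        raise RuntimeError("no HMAC key configured")
    if any(len(key) < MIN_KEY_BYTES for key in keys):
        raise ValueError("HMAC keys must be at least 16 bytes")
    return keys


def sign(message: bytes, keys: list[bytes]) -> bytes:
    """Always sign with the current (first) key."""
    return hmac.digest(keys[0], message, "sha256")


def verify(message: bytes, tag: bytes, keys: list[bytes]) -> bool:
    """Accept a tag made with any active key, checking every key each time."""
    ok = False
    for key in keys:
        ok |= hmac.compare_digest(hmac.digest(key, message, "sha256"), tag)
    return ok


# Demo with generated keys standing in for real environment variables.
demo_env = {
    "HMAC_KEY_CURRENT": secrets.token_hex(32),
    "HMAC_KEY_PREVIOUS": secrets.token_hex(32),
}
keys = load_keys(demo_env)
tag = sign(b"payload", keys)
print(verify(b"payload", tag, keys))  # True
```

Once the transition period ends, removing the previous key from the configuration invalidates every tag that was created with it.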
Quick check on algorithm selection
Other important details
- Comparing hash values: Always use `hmac.compare_digest` or the `verify` method that comes with the password library; never compare with `==` directly. An ordinary comparison returns as soon as it hits the first mismatching byte, so an attacker can guess the correct value step by step by measuring response times.
- Work factor tuning: Do not blindly chase high iteration counts. Measure on production hardware and tune the parameters so that a single password hash takes about 0.2 to 0.5 seconds, which is both safe and unnoticeable to users.
- Don't reinvent the wheel: Rolling your own cryptography is the most dangerous habit in the security field. Always choose algorithms standardized by NIST or selected by the PHC, use libraries that the community has vetted extensively, and do not modify their underlying logic.
Summary
"Hashing" in modern applications actually calls for different tools depending on the scenario. In short:
- Only care about whether the data has been modified → SHA-256 / BLAKE2
- Also confirm whether the data comes from a trusted party → HMAC-SHA256
- Need to save user password → Argon2id/bcrypt
The `hashlib` and `hmac` modules in the Python standard library already wrap vetted standard implementations, and `passlib` adds a "Swiss Army knife" for password hashing. Follow the best practices above and do not try to modify their internal logic, and you will avoid most common security pitfalls.

