string_hashing
shortfx.fxString.string_hashing
String Hashing and Fingerprinting Functions.
This module provides functions for generating cryptographic hashes and normalized fingerprints from strings.
Key Features:
- MD5, SHA-256, and SHA-512 hashing
- Normalized fingerprints for deduplication
Functions
fingerprint(text: str) -> str
Generates a normalized fingerprint for deduplication.
The fingerprint is created by lowercasing the text, removing accents, stripping non-alphanumeric characters, deduplicating and sorting the remaining tokens, and joining them with a single space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | The input string. | required |
Returns:
| Type | Description |
|---|---|
| `str` | A normalized fingerprint string suitable for comparing duplicates. |
Example
    >>> fingerprint(" Café Résumé ")
    'cafe resume'
    >>> fingerprint("The LORD of The RINGS")
    'lord of rings the'
Complexity: O(n log n) due to word sorting.
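The normalization steps described above can be sketched with the standard library alone. This is a minimal illustration, not the source of `fingerprint` itself: the name `fingerprint_sketch` is hypothetical, replacing non-alphanumeric runs with a space (rather than deleting them) is an assumption, and token deduplication is inferred from the second example, where "the" appears only once in the output.

```python
import re
import unicodedata


def fingerprint_sketch(text: str) -> str:
    """Hypothetical re-creation of the documented fingerprint steps."""
    # Lowercase, then decompose accented characters (NFKD) so that
    # the accents become separate combining marks that can be dropped.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Assumption: each run of non-alphanumeric characters becomes a space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", unaccented)
    # Split into tokens, drop duplicates, sort, and join with single spaces.
    return " ".join(sorted(set(cleaned.split())))


print(fingerprint_sketch(" Café Résumé "))          # cafe resume
print(fingerprint_sketch("The LORD of The RINGS"))  # lord of rings the
```

Because word order, case, accents, and punctuation are all normalized away, two strings that differ only in those respects map to the same fingerprint, which is what makes the result usable as a deduplication key.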
Source code in shortfx/fxString/string_hashing.py
hash_string(text: str, algorithm: str = 'sha256') -> str
Generates a hexadecimal hash digest of a string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | The input string to hash. | required |
| `algorithm` | `str` | Hash algorithm name: `"md5"`, `"sha256"`, or `"sha512"`. | `'sha256'` |
Returns:
| Type | Description |
|---|---|
| `str` | The hexadecimal hash digest. |
Raises:
| Type | Description |
|---|---|
| `TypeError` | If `text` is not a string. |
| `ValueError` | If `algorithm` is not supported. |
Example
    >>> hash_string("hello")
    '2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824'
    >>> hash_string("hello", "md5")
    '5d41402abc4b2a76b9719d911017c592'
Complexity: O(n)
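The documented contract (three supported algorithms, `TypeError` for non-string input, `ValueError` for an unknown algorithm) can be approximated with Python's `hashlib`. The sketch below is illustrative only and is not the module's actual source: the name `hash_string_sketch`, the `_SUPPORTED` tuple, and the choice of UTF-8 encoding are assumptions, although UTF-8 does reproduce the digests shown in the example.

```python
import hashlib

# Assumed whitelist, taken from the parameter description above.
_SUPPORTED = ("md5", "sha256", "sha512")


def hash_string_sketch(text: str, algorithm: str = "sha256") -> str:
    """Hypothetical stand-in that mirrors the documented behavior."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if algorithm not in _SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    # Assumption: the text is encoded as UTF-8 before hashing.
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


print(hash_string_sketch("hello", "md5"))  # 5d41402abc4b2a76b9719d911017c592
```

Validating the algorithm name against an explicit whitelist, rather than passing it straight to `hashlib.new`, keeps the set of accepted values identical to the documented one regardless of which extra algorithms the local OpenSSL build exposes.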