A function is considered to be a hash function if it can be used to translate data of arbitrary size to values of a defined size. The values that are generated by a hash function are referred to as hash values, hash codes, digests, or just hashes for short. In most cases, the values are used to index a table with a predetermined size that is known as a hash table. The process of using a hash function to index a hash table is known as hashing, also known as scatter storage addressing.
Applications for data storage and retrieval employ hash functions and the accompanying hash tables in order to access data in a short amount of time that is almost constant from one retrieval to the next. They need a storage space that is just slightly more than the entire space necessary for the data or records themselves. The non-constant access time of ordered and unordered lists and structured trees, as well as the frequently exponential storage requirements of direct access to state spaces of large or variable-length keys, can be avoided with hashing, which is a form of data access that is both computationally and storage space-efficient.
The application of hash functions is dependent on the statistical features of the interaction between the key and the function. The worst-case behaviour is intolerably awful with a vanishingly tiny probability, while the average-case behaviour may be virtually ideal (minimal collision).
Checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and cyphers are all linked to hash functions, and people often get these concepts mixed up with one another. Even if there is some overlap between the ideas, each one has its own applications and needs, and they are built and optimised in a separate way. The hash function is distinct from these other ideas primarily with regard to the data integrity it provides.