In a Hurry?
  Go straight to the Quick Selection Guide.
Introduction

Hashing is the process of computing a fixed-length message digest from a data stream usually for the purpose of validating, authenticating, or digitally signing that stream.  The stream could be a disk file, an email message, or packets of data in network transport.  Hashing is not encryption because the message digest cannot readily be transformed back into the original data from which it was computed.  Instead, hashing is a mechanism for representing a block of data in a predictable way by the use of a standard, public algorithm.
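
As a minimal sketch of the fixed-length property, the following Python snippet uses the standard hashlib module.  SHA-256 is chosen here only as an example algorithm, and the helper name digest_hex is illustrative.

```python
import hashlib

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 message digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short = digest_hex(b"abc")
long_ = digest_hex(b"abc" * 100_000)

# Both digests are 64 hex characters (256 bits) regardless of input size.
print(len(short), short)
print(len(long_), long_)
```

The two inputs differ enormously in size, yet both digests have the same length, and neither digest can readily be turned back into its input.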

The usefulness of hashing arises partly from the ease with which message digests can be computed and partly from the fact that it should be practically impossible to find two data streams that produce the same message digest.  These characteristics suggest some important uses to which hashing algorithms can be put.  Digitally signing an email message, for example, involves computing a cryptographic hash of the body of the message and all attachments and then encrypting that hash using the sender's private key.  (Note:  When a message is only digitally signed and not encrypted, it is the hash that is encrypted and not the message itself.  When the entire message is encrypted, the recipient's public key is used to do so.)  The encrypted hash is then attached to the message for transport.  On the receiving end, the digital signature accompanying the message is decrypted using the sender's public key, which could have accompanied the message or could have been obtained from a public key directory.  The resulting data, which is the original hash of the message body with attachments, is compared against a newly computed hash of the received message data.  If the original hash and the newly computed one are identical, then there is a high probability that the message was not altered in transit and that it did come from the person whose digital signature accompanied the message.
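
The sign-and-verify flow described above can be sketched in Python.  This is a toy illustration only: it uses unpadded "textbook" RSA with a deliberately tiny key built from two well-known Mersenne primes, which is an assumption made for brevity and is not how real email signing is implemented.  The function names are hypothetical.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes. This is far too
# small and too simplistic (no padding) for real use; it only shows the flow.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)

def message_hash(message: bytes) -> int:
    """Hash the message and reduce it so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: transform the hash with the private key."""
    return pow(message_hash(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the hash with the public key and compare it
    against a freshly computed hash of the received message."""
    return pow(signature, E, N) == message_hash(message)

signature = sign(b"Meet at noon.")
print(verify(b"Meet at noon.", signature))   # unaltered message
print(verify(b"Meet at one.", signature))    # message altered in transit
```

Only the hash is transformed with the private key; the message itself travels in the clear, which matches the signed-but-not-encrypted case in the note above.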

Another important use for cryptographic hashes is in the verification of an acquired data stream against a published hash for that stream.  For example, individuals, companies, and organizations often provide file download services on web sites and in online databases.  In addition to offering content in the form of downloadable files, these providers often publish hashes of the files so that the consumer can validate the downloaded content against them.  If the hash that the consumer computes from the content is identical to the published hash, then there is a high probability that the downloaded content is identical to the published data.  (The consumer's tool can be different from the one used by the content provider.)  This is useful for ensuring that the received file is what was published, both from the standpoint of malicious alteration and from the standpoint of accidental alteration or truncation in transit, which is much more likely.
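
A minimal sketch of this kind of download validation, using Python's standard hashlib module, might look like the following.  The function names, the choice of SHA-256, and the throwaway file standing in for a download are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile

def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the computed digest with a published one, ignoring case and
    surrounding whitespace in the published value."""
    return file_digest(path, algorithm) == published_hex.strip().lower()

# Demonstration with a throwaway file standing in for a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
download_path = tmp.name
published = "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD"
ok = matches_published(download_path, published)
bad = matches_published(download_path, "0" * 64)
os.remove(download_path)
print(ok, bad)
```

Because the comparison is done by the program, there is no need to compare long hexadecimal strings by eye.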

What the hash verification does not do is validate that the acquired data stream is harmless.  Because of the ease with which hashes can be computed, malicious web site owners can publish hashes for infected content.  Even "dear john" letters, which might be far from harmless to the recipient, can be digitally signed for email transport.  The hashing involved in either case says nothing whatsoever about the nature of the content provided.  Unsuspecting consumers might infer trust in the content from the existence of the published hash or digital signature when in fact all the hash can do is facilitate validation that the content received matches the content published.  Trust in the content itself must be derived from other knowledge that the consumer/recipient possesses about the publisher/sender.

Technical Discussion

For those who are interested in knowing more about the various hashing algorithms in use, a technical discussion of these algorithms and their possible uses follows.  Please feel free to skip this section.

There are several hashing algorithms in common use having different purposes and varying degrees of reliability for error detection, data validation, and cryptographic security.  Common algorithms include Cyclic Redundancy Check (CRC), Message Digest (MD), Secure Hash Algorithm (SHA), RACE Integrity Primitives Evaluation Message Digest (RIPEMD), and Whirlpool.  Virtually all algorithms have gone through revisions or replacements to improve their inherent security (SHA-2, for example, being more cryptographically secure than SHA-1, and the final version of Whirlpool more than earlier versions).  In addition, some algorithms, such as SHA and RIPEMD, offer variations that reduce the likelihood of accidental collisions (two messages having the same hash).  SHA-512, for example, is SHA-2 with a 512 bit (64 byte) message digest size that reduces the likelihood of accidental collisions versus SHA-256, but a larger digest size does not make an otherwise identical hash algorithm more secure.  The larger digest sizes satisfy the needs of encryption algorithms that require them.  The security of a hashing algorithm, however, is defined by its resistance to certain kinds of attacks such as pre-image attacks and deliberate collision attacks irrespective of its digest size.  (Digest size merely refers to the number of bits in the hash produced by the algorithm.)
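
To make the notion of digest size concrete, this short Python sketch asks the standard hashlib module for the digest size of a few of the algorithms mentioned above.  The selection of algorithms is only an example.

```python
import hashlib

# digest_size is reported in bytes; multiply by 8 to get bits.
sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha512")
}

for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```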

CRC is a high-performance algorithm that can be implemented in hardware for the validation of data moving through the electronics of a computer or network device at high speed.  Its purpose is to provide maximum performance in the detection of errors in the data stream.  CRC is not suitable for cryptographic use because of its low collision resistance, but it does provide basic error checking when performance is paramount.
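
The following Python sketch uses zlib.crc32 from the standard library, which is one common 32-bit CRC variant, to illustrate both points.  It first shows that a single flipped bit changes the checksum, and then shows one reason why CRC is unsuitable for cryptographic use: the checksums of equal-length messages are related by simple XOR arithmetic, so they are predictable.  The sample messages are arbitrary.

```python
import zlib

data = b"123456789"
checksum = zlib.crc32(data)
print(f"{checksum:08x}")

# Flip one bit to simulate a transmission error; the checksum changes.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
detected = zlib.crc32(corrupted) != checksum
print(detected)

# For equal-length messages, the CRC of the XOR of three messages equals
# the XOR of their CRCs. A cryptographic hash has no such structure.
a, b, c = b"spam", b"eggs", b"hams"
mixed = bytes(x ^ y ^ z for x, y, z in zip(a, b, c))
predictable = zlib.crc32(mixed) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(predictable)
```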

The widely-used MD5 algorithm is the latest of a series of algorithms in the same family.  It produces a 128 bit message digest.  It has been shown, however, that MD5 is not collision resistant.  In 2007, two Danish researchers demonstrated that it is possible for two executable programs, one benign and the other not, to share the same MD5 message digest.  It would be difficult for malicious coders to exploit the researchers' methodology because it would require the coders to insinuate themselves into the publication of the original program, but it is difficult to be sure that this could not lead to a practical attack vector.  The researchers' conclusion was that MD5 ought not be used for code signing and cryptographic purposes.

SHA was created by the National Security Agency (NSA) of the United States Government, and it has been used as a general purpose algorithm for cryptographic applications since the mid-1990s.  (Although hashing algorithms are not reversible encryption systems, they are used by such systems for various purposes.)  Weaknesses in early versions, SHA-0 and SHA-1, led to the creation of SHA-2.  (SHA-256 and SHA-512 are both SHA-2 algorithms with differing message digest sizes.)  A public competition for a successor to SHA-2, which will become SHA-3, is currently being conducted by the National Institute of Standards and Technology (NIST).  Among the current SHA algorithms, SHA-256 provides a good compromise between performance and security, having no known collision vectors.  One of the chief criticisms of SHA-1 and SHA-2, however, has been that their development was conducted by a secret governmental agency.

Unlike SHA, RIPEMD was created by an open academic community--the COSIC group of Belgium's Katholieke Universiteit Leuven, which is the same group whose Rijndael encryption algorithm won the competition for the U.S. Government's Advanced Encryption Standard in 2001.  RIPEMD comes in two versions, RIPEMD-128 (the faster) and RIPEMD-160 (the more secure) each of which has an extension for a larger hash result size (256 and 320 bits respectively).  RIPEMD creators caution that the larger hash result sizes of the extensions should not be regarded as more secure than the base algorithms and are merely provided for applications that require larger message digests.

The Whirlpool hash algorithm was created by one of the co-creators, Vincent Rijmen, of the Rijndael encryption system that became the Advanced Encryption Standard.  Whirlpool is actually based on Rijndael with certain key differences that make it a one-way hashing algorithm instead of a reversible encryption system.  Whirlpool, which has a fixed message digest size of 512 bits, has been revised twice to deal with weaknesses found in early versions.  These versions are referred to as Whirlpool-0, Whirlpool-T, and then just Whirlpool for the final published version.  All implementations are expected to use the final version.

Programs that use any of these algorithms for file validation purposes ought to at least compute MD5 and SHA-1 hashes as these are the most widely used by software publishers.  If both are used, the effects of their respective weaknesses can be canceled because it is extremely unlikely that a given malicious file could simultaneously exploit the weaknesses of both.  For non-cryptographic purposes, this would be sufficient.  Good supplemental algorithms to these would be SHA-256 and Whirlpool as these currently have no known weaknesses.  The inclusion of other algorithms does not necessarily make a given program better.  There are some differences in the algorithms used by the various programs reviewed here.
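
Computing several hashes of the same file does not require reading the file more than once.  The following Python sketch, which assumes the standard hashlib module and uses a hypothetical helper named multi_digest, feeds each chunk of the file to an MD5 and a SHA-1 hasher in a single pass.

```python
import hashlib
import os
import tempfile

def multi_digest(path, algorithms=("md5", "sha1"), chunk_size=65536):
    """Compute several digests of one file while reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

# Demonstration with a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
digests = multi_digest(tmp.name)
os.remove(tmp.name)

for name, value in digests.items():
    print(name, value)
```

Adding "sha256" to the algorithms tuple would include one of the supplemental algorithms suggested above at little extra cost.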

The various versions of SHA and RIPEMD and the latest version of Whirlpool are included in the International Organization for Standardization (ISO) standard ISO/IEC 10118-3:2004 for dedicated hash functions.

Product Reviews

Focusing on the use of hashing for the validation of a data stream against published hashes, there are a number of useful programs that provide this functionality.  Essentially, these programs 1) must be easy to use, 2) must accurately compute hashes according to published algorithms, and 3) must present the information in a usable form.  It is not important whether hashing is the primary purpose of the software or just an incidental feature of a broader application.  What is important is that a useful capability is provided with as little "noise" (bugs and fluff) as possible.

The programs reviewed here provide three levels of functionality:

  • Programs that compute hashes.
  • Programs that also provide hash validation.
  • Programs that also include a database of hashes for revalidation.

The reviewed applications implement their user interfaces in one of three ways:

  • Windows console application (DOS command line).
  • Windows Explorer context menu entry.
  • Windows Explorer property page tab.

It cannot really be said that any one of these approaches is better than the others because each provides its own capabilities.  A console application, for example, allows for scripting and ad-hoc programming that is not possible with graphical applications, but its user interface is somewhat limited.  A Windows Explorer context menu entry provides quick access to a full-scale application, but this also switches the user to a new application context.  An Explorer property page tab offers handy and familiar access to program controls without context switching, but the small physical window size places severe constraints on application features.

HashTab

HashTab implements its user interface as a Windows Explorer file property page.  To compute the hash of a file, you right-click on the file, select Properties, and then click the tab labeled "File Hashes".  There are two zones in the tab panel.  The top zone shows the hash values of the selected hashes.  An Options link is provided to allow the user to change selected hashes, and the program remembers the selections in future sessions.  Message digests of the selected file are automatically computed and displayed.

The bottom zone of the tab panel provides the hash comparison feature.  A hash can be pasted into the Hash Comparison field, and it will be automatically compared against the selected hashes.  If a match is found, a green checkmark is displayed below the field along with the name of the hash that was matched.  If no match is found, a red "x" is shown.  You should be sure that the desired hash is selected before concluding that there is a mismatch because the program does not report whether it matches an unselected algorithm.

The comparison zone also provides a button that can be used to select a file to compare the current file against.  On clicking the button, a dialog is presented that permits the user to browse to the desired file.  The program remembers the last location the Open dialog was used to access, and subsequent dialog sessions return to that location, which may have been from a previous program session.

When comparing a hash that is pasted in, the one that matches is the one used, but when comparing another file, the first algorithm that matches in the alphabetically-sorted list is used.  If you want to use a specific hash, you have to change the selected hashes in Options by removing all hashes from the list that alphabetically precede the desired algorithm.  After so doing, you will have to reselect the file to compare because the Hash Comparison field will be blanked out on returning to the tab panel.
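
The pasted-hash comparison can be pictured with a short Python sketch.  This is not HashTab's actual implementation; it is only an assumed model of the behavior described above, with a hypothetical function name, in which the pasted value is tested against each selected algorithm in turn.

```python
import hashlib

def find_matching_algorithm(data: bytes, pasted_hex: str,
                            selected=("md5", "sha1", "sha256")):
    """Return the name of the first selected algorithm whose digest of data
    equals the pasted hash, or None if no selected algorithm matches."""
    wanted = pasted_hex.strip().lower()
    for name in selected:
        if hashlib.new(name, data).hexdigest() == wanted:
            return name
    return None

pasted = "A9993E364706816ABA3E25717850C26C9CD0D89D"   # SHA-1 of b"abc"
match = find_matching_algorithm(b"abc", pasted)
no_match = find_matching_algorithm(b"abc", pasted, selected=("md5",))
print(match, no_match)
```

The second call shows why the desired algorithm must be among those selected: the same correct hash is reported as a mismatch when only MD5 is selected.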

Nirsoft HashMyFiles

HashMyFiles is a full-scale Windows application that can be launched directly, or from the Explorer context menu when that feature is enabled.  HashMyFiles not only computes hashes of files and compares them against each other or against any MD5 or SHA-1 hash that is in the Windows clipboard, but it can also hash all files in a file system identifying hash duplicates in the process.  The program can compute hashes for a single file, a group of files, or an entire file system, but it only does so using CRC, MD5, and SHA-1.

The main program window provides a list of the files selected for hashing, and the hashes are computed automatically.  If a file in the list matches an MD5 or SHA-1 hash that has been copied to the clipboard, that file is highlighted.  If there are multiple hashes for multiple files in the clipboard, all matches are highlighted.  In addition, files in the list that are duplicates of each other are similarly labeled and highlighted.  The program can hook into the Windows Explorer context menu by enabling an option to do so.  (It is disabled by default.)  When enabled, right-clicking the selected files or folders and selecting HashMyFiles in the context menu will bring up the program with the file hashes computed and matches highlighted.  Selecting a large number of files, or the base folder of a large tree, can result in a lengthy delay while the hashes are calculated.
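
Duplicate detection by hash can be sketched in a few lines of Python.  The following is not how HashMyFiles is implemented; it is a minimal model, with hypothetical function names, that groups the files in a folder tree by their SHA-1 hash and reports every group containing more than one file.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Group files under root by hash; return the groups with duplicates."""
    by_hash = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_hash[sha1_of_file(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]

# Demonstration in a throwaway folder: two identical files and one different.
with tempfile.TemporaryDirectory() as root:
    for name, content in (("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")):
        with open(os.path.join(root, name), "wb") as f:
            f.write(content)
    groups = find_duplicates(root)
    duplicate_names = [[os.path.basename(p) for p in g] for g in groups]
print(duplicate_names)
```

Every file must be read in full, which is why hashing a large tree can take a long time.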

This program can be configured to operate from the Windows system tray.  Closing the program with this feature enabled--it is disabled by default--will allow quick access to the program window for further use.  Selecting another file to hash will restore the program window with the newly selected file and computed hashes added to the bottom of the list.

The NirSoft site for HashMyFiles reports support for all Windows versions since and including 2000.  The program has a faulty interaction, however, with a security feature of Windows 7 and, as the feature also exists in Windows Vista, presumably with Vista as well.  One of the more recent capabilities of Windows is to keep track of the origin of individual files and request approval to open files that came from an untrusted source (e.g., the internet).  Sometimes, this causes HashMyFiles to launch multiple windows with the various selected files distributed among them or to open one window with multiple entries of the selected files present and marked as duplicates.  Another problem with the program is the low-contrast highlighting used to identify matched entries.  On some monitors, the highlighting is difficult to distinguish at some visual angles and virtually disappears at others.

Febooti Hash & CRC

Like HashTab, Febooti Hash & CRC works as a tab on the file property page.  To compute the hash of a file, you right-click the file, select Properties, and then click the tab labeled "Hash / CRC".  The upper portion of the tab panel shows the name of the file being hashed, if just one had been selected, or a count of the total number selected.  It also shows the file system location of the selected file(s), although a deep location will be truncated.

The middle portion of the panel lists the available hashes which can be easily selected or excluded using a checkbox next to each one.  Two of the algorithms, MD and RIPEMD, have drop-downs to the left that allow you to select the version of the algorithm to use in the main list.  The algorithms that are selected when the program starts, which are remembered from the last session, are automatically computed for the selected file.  If an algorithm is added, you must click the Compute button in the lower section to recompute the hashes to include the newly selected one.

The lower portion of the tab panel provides a mechanism for switching the file to use when multiple files were selected in Windows Explorer.  To compute hashes for a different file from a group, just click the View file drop-down, select the desired file from the list, and click Compute.

Unlike the other programs reviewed here, Febooti Hash & CRC does not provide a hash comparison feature.  There is no way to directly compare the hash of a file against a published hash or against the hash of another file.  To compare the computed hash against a published hash, you must click the Copy button, select which hash (or all) to place on the clipboard, paste the hash(es) into another window such as an empty Notepad document, paste the published hash into the same window, and then visually compare them.  If you need to compare two files, you have to go through this exercise for each file before comparing them.  The program does, at least, make it easy to get the computed hashes onto the clipboard.

Microsoft File Checksum Integrity Verifier

The Microsoft File Checksum Integrity Verifier (FCIV) is a console application, which means that it only runs inside a command window.  You might wonder why such a program would be considered here, but there is a unique capability provided by this program that is worth a look.  For computing the hash of a single file, this would not be the tool to use, but it can be used to create a database of hashes for many files, including recursively through an entire file system, and then later use that database to revalidate those same files.

FCIV only computes hashes using the MD5 and SHA-1 algorithms.  The purpose of this program is to provide a method by which large numbers of files can be validated very quickly thus exposing unauthorized modifications.  The following command can be used to generate a database of hashes for all ".exe" and ".dll" files below C:\Program Files using both MD5 and SHA-1:

fciv "c:\program files" -xml c:\temp\pf.xml -r -both -type .exe -type .dll


If you leave out the type argument(s), it will compute hashes for all files that it finds.  The following command can be used to validate the database against the same files at a later time:

fciv -v -both -xml c:\temp\pf.xml


The program will report any differences that it finds.  It does not report the presence of new files, but it does report any files of the original set that are missing.  The setup of the program is entirely manual.  After extracting it from the download, the program must be copied to a location that is in the command path or its extracted folder must be added to the path.  Once this is done, it can be executed from any command window.  To get help information about the program, type "fciv -h".  The help information includes examples, but there are some differences between the information provided and the way the program actually behaves.
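
The database-and-revalidate idea can be modeled with a short Python sketch.  This is not FCIV and does not read or write FCIV's XML format; the JSON file, the function names, and the use of SHA-1 alone are assumptions made for illustration.  As described above, revalidation reports modified and missing files but not new ones.

```python
import hashlib
import json
import os
import tempfile

def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_database(root, db_path, extensions=None):
    """Record a hash for every file under root, optionally only for files
    whose names end with one of the given extensions."""
    db = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if extensions and not name.lower().endswith(tuple(extensions)):
                continue
            path = os.path.join(folder, name)
            db[path] = sha1_of_file(path)
    with open(db_path, "w") as f:
        json.dump(db, f, indent=2)

def verify_database(db_path):
    """Re-hash every recorded file; return (modified, missing) path lists."""
    with open(db_path) as f:
        db = json.load(f)
    modified, missing = [], []
    for path, recorded in db.items():
        if not os.path.exists(path):
            missing.append(path)
        elif sha1_of_file(path) != recorded:
            modified.append(path)
    return modified, missing

# Demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.exe", "b.dll", "c.txt"):
        with open(os.path.join(root, name), "wb") as f:
            f.write(name.encode())
    db_file = os.path.join(root, "hashes.json")
    build_database(root, db_file, extensions=(".exe", ".dll"))

    with open(os.path.join(root, "a.exe"), "ab") as f:    # tamper with one file
        f.write(b"!")
    os.remove(os.path.join(root, "b.dll"))                 # delete another
    with open(os.path.join(root, "new.exe"), "wb") as f:   # new file, not reported
        f.write(b"new")

    modified, missing = verify_database(db_file)
    modified = [os.path.basename(p) for p in modified]
    missing = [os.path.basename(p) for p in missing]
print(modified, missing)
```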

Quick Selection Guide

HashTab    Rating 7 of 10

Pros   Works in a tab of the Windows Explorer file property page.  Computes hashes for fifteen algorithms including all of those described above.  Allows direct comparison of any hash that can be pasted in thus obviating the need for error prone visual comparison.  Provides a file comparison feature that permits direct hash comparisons with another file.  This is useful except as noted below.  HashTab is available for both Windows (except 9x, NT, and 2k) and Mac OSX.
Cons   The file comparison feature could have been done better.  It only compares the file against the first hash in the list, and there is no easy way to get it to use another. You have to remove all hashes that come before the one you want to use and then find the file to compare.  If you change the selected algorithms after a file has been selected for comparison, the file name field is blanked out so that you have to get the file again.  Does not work (tab is missing) when multiple files are selected.
Developer Home Page   http://beeblebrox.org
Download link   http://beeblebrox.org
File Size   780KB   Version 3.0   License Type Unrestricted Freeware   Installation Requirements WinXP or greater; Mac OSX

Nirsoft HashMyFiles    Rating 9 of 10

Pros   Full-scale Windows application. Computes hashes for individual files, multiple files, or entire file systems. Can compare files to hashes in the clipboard as well as to other selected files. Highlights duplicate files when an entire file system is loaded. Hooks into Explorer context menu for quick access to the program window. Can minimize to the system tray. Can create an HTML report of results as well as result files in various formats. Column list is customizable.
Cons   Only CRC, MD5, and SHA-1 hashes are computed. Behavior problems with later versions of Windows. Match highlights are in very pale colors that may be difficult to see on some monitors.
Developer Home Page   http://www.nirsoft.net
Download link   http://www.nirsoft.net/utils/hash_my_files.html
File Size   49.9KB   Version 1.67   License Type Unrestricted Freeware   Installation Requirements No install program to run.
Info   No software installation program. Just download and run.

Febooti Freeware Hash & CRC    Rating 5 of 10

Pros   Works in a tab of the Windows Explorer file property page. Computes hashes with fifteen different algorithms including those described above. It is easy to change hash selections and recompute. If multiple files are selected, it is possible to switch between the selected files to compute hashes for each one.  There is a simple mechanism for copying computed hashes to the clipboard.
Cons   No comparison feature. To check computed hashes against another file, you must compute the hashes for each file separately, paste the results into a text document, and then visually compare them.
Developer Home Page   http://www.febooti.com
Download link   http://www.febooti.com/products/filetweak/members/hash-and-crc
File Size   782KB   Version 3.0   License Type Unrestricted Freeware   Installation Requirements Windows only.
64 Bit version available

Microsoft File Checksum Integrity Verifier    Rating 7 of 10

Pros   Can be used to create a database of computed hashes and revalidate against it.
Cons   Only MD5 and SHA1 algorithms are supported. Only works in a command window.  The program's options are difficult to understand and use effectively, and the help provided is of limited usefulness as it has some inaccuracies.
Developer Home Page   http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=b3c93558-31b7-47e2-a663-7365c1686c08
Download link   http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=b3c93558-31b7-47e2-a663-7365c1686c08
File Size   116KB   Version 2.05   License Type Unrestricted Freeware   Installation Requirements No install program to run.
