More information about md5deep can be found at http://md5deep.sourceforge.net

What are checksums? 

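A checksum, or “hash”, is a short string of characters computed from a file's contents. The hash stays the same as long as the contents stay the same, and any change to the file, however small, produces a different hash. (Two different files sharing an MD5 hash is possible in principle but extremely unlikely to happen by accident.) This makes checksums useful for long-term preservation as a way to detect the degradation of files. This tutorial details how to generate and use checksums with md5deep.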
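
To make the idea concrete, here is a minimal Python sketch that uses the standard library's hashlib module rather than md5deep itself. The helper name md5_hex and the sample byte strings are made up for illustration. The point is only that identical content yields an identical hash, while a one-character change yields a different one.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of a byte string as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


original = b"hello"
unchanged = b"hello"
altered = b"hello!"  # a single character added

print(md5_hex(original))                        # 5d41402abc4b2a76b9719d911017c592
print(md5_hex(original) == md5_hex(unchanged))  # True: same content, same hash
print(md5_hex(original) == md5_hex(altered))    # False: the content changed
```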

How do I install md5deep?

The installation procedure varies by operating system. Consult the md5deep manual for your particular OS.

Which OS is best?

Many digital preservationists prefer to use open source Linux operating systems such as Mint or Ubuntu since they are cleaner environments for digital forensics.

Md5deep is installed. Now what?

The checksum process works through Unix commands in Terminal (a pre-installed program on the Mac OS X and Linux operating systems). To generate a checksum for a file, the simplest approach is to navigate to its directory first by typing  cd [directory name]

Once in the correct directory, typing “md5deep” followed by a filename will generate a hash. An asterisk in place of the filename designates all files in that folder. In this example, the hash is the string of characters starting with ff229...

dams@dams-iMac:~/Desktop$ md5deep test.txt
ff22941336956098ae9a564289d1bf1b  /home/dams/Desktop/test.txt
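
For readers curious about what happens behind that command, the following is a rough Python sketch of hashing a single file. It is not md5deep's actual implementation: the function names md5_of_file and hash_line and the chunk size are assumptions chosen for illustration. The file is read in pieces so that large files do not have to fit in memory.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file chunk by chunk and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Keep reading until read() returns an empty byte string (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_line(path: str) -> str:
    """Format a result like the listing above: hash, two spaces, absolute path."""
    return f"{md5_of_file(path)}  {os.path.abspath(path)}"
```

Calling print(hash_line("test.txt")) from the directory that holds test.txt would print one line in the same layout as the md5deep output.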

The real power of md5deep comes with adding letters, or “flags”, to the command line. These flags perform different operations such as matching, recursively generating hashes, and estimating the time needed to generate large sets of hashes. For the DAMS's purposes, only a few flags (-r, -x, and -m) are used frequently.

Writing checksums to a file

Before discussing the flags, it’s important to know how to write a command’s output to a file. Users can either designate an existing file or let Terminal create the file for them. 

There are three files on my Desktop I would like to generate hashes for: pogocat.gif, test.txt, and beyonce.jpeg.

By typing "md5deep *" I generated these three hashes.

dams@dams-iMac:~/Desktop$ md5deep *
7a803c643432ea1443e3dbd5ee14db26  /home/dams/Desktop/pogocat.gif
ff22941336956098ae9a564289d1bf1b  /home/dams/Desktop/test.txt
8741172ab4d318c620f21d4c17f213ac  /home/dams/Desktop/beyonce.jpeg

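To create a file for the hashes, type md5deep * >> [filename]
The >> operator creates the file if it does not exist and appends to the end of it if it does; use a single > instead to overwrite an existing file. This command does not display any output in Terminal unless there are errors. The new file should appear in your folder.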
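
The difference between appending and overwriting matters when a hash list is built up over several runs. The Python sketch below imitates both behaviours. The names write_hashes and md5_of_file are hypothetical, and the code only approximates what the shell redirection does with md5deep's output.

```python
import hashlib
import os


def md5_of_file(path):
    """Return the MD5 hex digest of a file (read whole; fine for small files)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def write_hashes(paths, out_path, append=True):
    """Write one 'hash  path' line per file to out_path.

    append=True imitates the shell's >> (add to the end of an existing file);
    append=False imitates > (replace whatever the file held before).
    """
    mode = "a" if append else "w"
    with open(out_path, mode, encoding="utf-8") as out:
        for path in paths:
            out.write(f"{md5_of_file(path)}  {os.path.abspath(path)}\n")
```

Running write_hashes twice with the default append=True leaves two copies of each line in the output file, which is also what happens when the >> command is repeated.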

  

Recursive Mode

The -r flag allows md5deep to run hashes on the contents of sub-directories, including any directories within that sub-directory. For example, on my desktop there's a directory called “checksum_test.” Within that directory there are four more sub-directories.

Simply typing md5deep checksum_test will not return any hashes. Instead, it will say:

dams@dams-iMac:~/Desktop$ md5deep checksum_test
/home/dams/Desktop/checksum_test: Is a directory

To make md5deep descend into a folder (and that folder’s folders), the recursive flag needs to be added:

dams@dams-iMac:~/Desktop$ md5deep -r checksum_test
e73f683a4d79ed3067b7dfb1dd65cea5  /home/dams/Desktop/checksum_test/folder_3/6.3a_Counter_Problems.png
bf6ed70e53234ee33be89487cf05e4a4  /home/dams/Desktop/checksum_test/folder_1/6.1_Includes.png
e020e204de0d6988cd5e352c41d589d9  /home/dams/Desktop/checksum_test/folder_2/6.3b_Counter_Fix.png
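
Conceptually, recursive mode walks the whole directory tree and hashes every file it meets. A minimal Python sketch of that idea follows. The names hash_tree and md5_of_file are assumptions for illustration, and the order in which md5deep itself visits files may differ.

```python
import hashlib
import os


def md5_of_file(path):
    """Return the MD5 hex digest of a file (read whole; fine for small files)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def hash_tree(root):
    """Yield (hash, absolute path) for every file under root, at any depth."""
    # os.walk visits root, then each sub-directory, then their sub-directories.
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            yield md5_of_file(path), os.path.abspath(path)
```

Looping over hash_tree("checksum_test") and printing each pair would give a listing similar in shape to the one above.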

Matching

Md5deep also allows users to generate hashes and compare them to a pre-established list of hashes. There are two types of matching: positive and negative. Positive matching shows filenames with hashes that DO match, and negative matching shows filenames with hashes that DO NOT match. This is where you can see if the hash (and, most importantly, the file) has changed.

The syntax for positive matching is: 

md5deep -m [known_hashes_file] [file_to_check]

The syntax for negative matching is:

md5deep -x [known_hashes_file] [file_to_check]

Remember that an asterisk signifies all files in that directory.
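
The logic behind the two matching modes can be sketched in a few lines of Python. This is an illustration only, not md5deep's own code: the function names are hypothetical, and the list file is assumed to hold plain "hash  path" lines like the ones md5deep prints. A file counts as a positive match when its current hash appears anywhere in the list, regardless of its name.

```python
import hashlib


def md5_of_file(path):
    """Return the MD5 hex digest of a file (read whole; fine for small files)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def load_known_hashes(list_path):
    """Collect the first whitespace-separated field of each line as a hash."""
    known = set()
    with open(list_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                known.add(fields[0].lower())
    return known


def split_matches(known, paths):
    """Return (positive, negative): files whose hash is / is not in known."""
    positive, negative = [], []
    for path in paths:
        if md5_of_file(path) in known:
            positive.append(path)
        else:
            negative.append(path)
    return positive, negative
```

In this sketch the positive list corresponds to what -m reports and the negative list to what -x reports.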

So let's say someone added something to the “test.txt” file after I wrote its hash to the “known_hashes.csv” file. When running a positive match, “test.txt” will not appear in the list of matches:

dams@dams-iMac:~/Desktop$ md5deep -m known_hashes.csv *
/home/dams/Desktop/checksum_test: Is a directory
/home/dams/Desktop/beyonce.jpeg
/home/dams/Desktop/pogocat.gif

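Conversely, if a negative match is run, “test.txt” appears because its current hash is no longer in the list. Notice the second entry ending in a tilde. Going by the naming convention, “test.txt~” is most likely a backup copy of an earlier version that the text editor saved when the file was edited. It is listed as well because its hash is not in “known_hashes.csv” either.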

dams@dams-iMac:~/Desktop$ md5deep -x known_hashes.csv *
/home/dams/Desktop/checksum_test: Is a directory
/home/dams/Desktop/test.txt
/home/dams/Desktop/test.txt~

If you wish to show the hashes next to the file names, simply capitalize the flag:

dams@dams-iMac:~/Desktop$ md5deep -X known_hashes.csv *
/home/dams/Desktop/checksum_test: Is a directory
c2dff9abb840fa6a95b115b9eb25dbf2  /home/dams/Desktop/test.txt
d1de632acc13af5231c10f022a3c27c9  /home/dams/Desktop/test.txt~

dams@dams-iMac:~/Desktop$ md5deep -M known_hashes.csv *
/home/dams/Desktop/checksum_test: Is a directory
8741172ab4d318c620f21d4c17f213ac  /home/dams/Desktop/beyonce.jpeg
7a803c643432ea1443e3dbd5ee14db26  /home/dams/Desktop/pogocat.gif

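Matching can also be done recursively. When combining the flags, be sure to put the recursive flag first: the -m flag takes the hash list as its argument, so it has to be the last letter in the group.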

dams@dams-iMac:~/Desktop$ md5deep -rm known_hashes.csv *