Scripting Tips #2: How to compare 2 sets of files to see if they are the same?

Posted on Feb 3, 2012

Excerpts from Simplify, Automate, Liberate! For Unix/Linux System Administrators

written by Gerald Yong


You have 2 different folders, and you want to compare the files in the folders to see if they are the same or different. The files have the same names in both folders.


Computing a fast hash or checksum for every file in each folder, and then comparing the two lists, is a quick way to do this.



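# DIR1, DIR2 and CKSUM_FILE are assumed to be set earlier in the script.
gen_checksum() {
    # Checksum every file in directory $1 and save the list inside that directory.
    ( cd "$1" && cksum * > "${CKSUM_FILE}" )
}

gen_checksum "${DIR1}"
gen_checksum "${DIR2}"

# Any line that differs points to a file whose checksum or size differs.
diff "${DIR1}/${CKSUM_FILE}" "${DIR2}/${CKSUM_FILE}"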


You may be tempted to use diff to compare each file with its twin in the other folder to see if there are differences. However, this will take much longer.
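For contrast, the file-by-file approach might look like the sketch below. It is only an illustration: the two directories are throwaway folders created with mktemp, the file names are made up, and cmp -s stands in for diff because only a same-or-different answer is needed.

```shell
#!/bin/sh
# Hypothetical demo directories; in practice point DIR1 and DIR2 at the real folders.
DIR1=$(mktemp -d)
DIR2=$(mktemp -d)
printf 'same\n' > "$DIR1/a.txt"; printf 'same\n' > "$DIR2/a.txt"
printf 'old\n'  > "$DIR1/b.txt"; printf 'new\n'  > "$DIR2/b.txt"

# Compare each file with its twin; cmp -s prints nothing and only sets the exit status.
changed=""
for f in "$DIR1"/*; do
    name=$(basename "$f")
    if ! cmp -s "$f" "$DIR2/$name"; then
        changed="$changed $name"
    fi
done
echo "Files that differ:$changed"
```

Every pair of files needs its own comparison here, which is the work the checksum lists avoid repeating once they have been generated.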

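cksum prints a CRC checksum and a byte count for each specified file. Because cksum can compute these quickly, it is a quick way to check whether 2 files match: different checksums mean the files are definitely different, and identical checksums and sizes mean they are almost certainly the same. The checksum is not guaranteed to be unique, so 2 different files can occasionally share one, although this is unlikely to happen by accident.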
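A small sketch of this behaviour, using made-up scratch files in a temporary directory. Reading each file on standard input makes cksum print only the checksum and byte count, so the results can be compared as plain strings.

```shell
#!/bin/sh
# Hypothetical scratch files for illustration only.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/one.txt"
printf 'hello\n' > "$tmp/two.txt"
printf 'hellp\n' > "$tmp/three.txt"

# With input on stdin, cksum prints "<checksum> <byte count>" and no file name.
sum_one=$(cksum < "$tmp/one.txt")
sum_two=$(cksum < "$tmp/two.txt")
sum_three=$(cksum < "$tmp/three.txt")

[ "$sum_one" = "$sum_two" ] && echo "one.txt and two.txt match"
[ "$sum_one" != "$sum_three" ] && echo "one.txt and three.txt differ"
```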

The cksum method is a fast way of comparing files, but do note that it looks only at file contents: 2 files with the same contents but different last modified dates will still have the same checksum.
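This can be checked with a quick experiment, sketched here with a made-up scratch file: changing the last modified date with touch leaves the checksum untouched.

```shell
#!/bin/sh
# Hypothetical scratch file for illustration only.
tmp=$(mktemp -d)
printf 'same contents\n' > "$tmp/report.txt"
before=$(cksum < "$tmp/report.txt")

# Set the last modified time to an arbitrary older date (1 Jan 2012, 00:00).
touch -t 201201010000 "$tmp/report.txt"
after=$(cksum < "$tmp/report.txt")

[ "$before" = "$after" ] && echo "checksum unchanged: $after"
```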

# cksum *
735629196 151 1.ksh
3476470671 145 IP.txt
515424607 201 a.out
3407276374 223 ansichart.ksh
2496520640 112 arithmetic.ksh
217785708 399 arrowkeys.ksh
3210263215 139 ask_userinfo.ksh
3852637783 160 awk1.awk
3245121712 201 check_digit.ksh
3171192017 389 child.ksh
1248279931 229 chk_file_exists.ksh
552807997 116 chk_ora_listener.ksh
3997812998 206 chk_root.ksh
4060654239 291 chk_user.ksh
1701067367 390 chk_user2.ksh

The script above uses cksum to generate checksums for all the files in each directory and stores them in a text file in each directory. It then uses diff to compare the text files, so that you can tell which files have different checksums.
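If only the names of the changed files are wanted, the diff output can be filtered further. The sketch below is one possible way, not part of the original script: it builds two throwaway directories, assumes the checksum list is called cksum.out, and assumes file names contain no spaces, since awk splits fields on whitespace.

```shell
#!/bin/sh
# Hypothetical demo data standing in for the two real directories.
DIR1=$(mktemp -d)
DIR2=$(mktemp -d)
CKSUM_FILE=cksum.out   # assumed name for the checksum list
printf 'same\n' > "$DIR1/a.txt"; printf 'same\n' > "$DIR2/a.txt"
printf 'old\n'  > "$DIR1/b.txt"; printf 'new\n'  > "$DIR2/b.txt"

# Generate the checksum list inside each directory.
( cd "$DIR1" && cksum * > "$CKSUM_FILE" )
( cd "$DIR2" && cksum * > "$CKSUM_FILE" )

# diff marks changed lines with "<" or ">", so the file name becomes field 4.
changed=$(diff "$DIR1/$CKSUM_FILE" "$DIR2/$CKSUM_FILE" |
          awk '/^[<>]/ { print $4 }' | sort -u)
echo "$changed"
```

A file that exists in only one of the folders is also listed, because its checksum line appears on one side of the diff only.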
