I have a pretty big folder (~10GB) that contains many duplicated files throughout its directory tree. Many of these files are duplicated up to 10 times. The duplicated files don't reside side by side, but within different sub-directories.
How can I compress the folder to make it small enough?
I tried to use WinRAR in "Best" mode, but it didn't compress it at all. (Pretty strange)
Will zip, tar, cab, 7z or any other compression tool do a better job?
I don't mind letting the tool work for a few hours - but not more.
I'd rather not do it programmatically myself.
Best How To :
By default, WinRAR compresses each file separately. So by default there is no real gain from compressing a folder structure with many similar or even identical files.
But there is also the option to create a solid archive. Open the WinRAR help, open the item Archive types and parameters on the Contents tab, and click on Solid archives. This help page explains what a solid archive is and which advantages and disadvantages this archive format has.
A solid archive with a larger dictionary size, in combination with best compression, can make an archive of many similar files very small. The dictionary size matters because the compressor can only reuse an earlier copy of some data if that copy still lies within the dictionary window. For example, I have 327 binary files with sizes from 22 KB to 453 KB, which total 47 MB (not counting the cluster-size overhead on the partition). I can compress those 327 similar, but not identical, files into a RAR archive of only 193 KB using a dictionary size of 4 MB. That is of course a dramatic reduction in size.
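To see why solid mode makes such a difference, here is a minimal Python sketch. It does not use WinRAR or the RAR format at all; it uses Python's standard lzma module as a stand-in, and the helper names (make_files, per_file_size, solid_size) are made up for this illustration. It compares compressing ten identical "files" one by one with compressing them as one continuous stream, which is roughly what a solid archive does.

```python
import lzma
import random


def make_files(copies=10, size=50_000, seed=0):
    """Return `copies` identical blobs of pseudo-random (incompressible) bytes."""
    rng = random.Random(seed)
    blob = rng.getrandbits(size * 8).to_bytes(size, "little")
    return [blob] * copies


def per_file_size(files):
    """Total size when every file is compressed on its own (non-solid)."""
    return sum(len(lzma.compress(f)) for f in files)


def solid_size(files):
    """Size when all files are compressed as one continuous stream (solid-like)."""
    return len(lzma.compress(b"".join(files)))


if __name__ == "__main__":
    files = make_files()
    print("per-file:", per_file_size(files), "bytes")
    print("solid:   ", solid_size(files), "bytes")
```

Compressed one by one, each copy costs about as much as the original, because the compressor never sees the other copies. As a single stream, the second to tenth copy shrink to almost nothing, provided the dictionary is large enough to still contain the first copy.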
After reading the help page about solid archives, follow the link to the help page about rarfiles.lst. It describes how you can control the order in which files are put into a solid archive. This file is located in the program files folder of WinRAR and can of course be customized to your needs.
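The effect of file order can be illustrated with a small sketch as well. Again this is not WinRAR: it uses Python's zlib, whose window is only 32 KB, purely as a stand-in for "a dictionary that is small compared to the data". The blob sizes and the names random_blob and stream_size are assumptions chosen for the demonstration.

```python
import random
import zlib


def random_blob(rng, size):
    """Pseudo-random bytes that do not compress on their own."""
    return rng.getrandbits(size * 8).to_bytes(size, "little")


def stream_size(blobs):
    """Compressed size of the blobs joined in the given order."""
    return len(zlib.compress(b"".join(blobs), 9))


rng = random.Random(0)
a, b, c = (random_blob(rng, 20_000) for _ in range(3))

# Each duplicate starts 60,000 bytes after its original: outside the 32 KB window.
interleaved = [a, b, c, a, b, c]
# Each duplicate starts 20,000 bytes after its original: inside the window.
grouped = [a, a, b, b, c, c]

print("interleaved:", stream_size(interleaved), "bytes")
print("grouped:    ", stream_size(grouped), "bytes")
```

With the duplicates placed next to each other, the stream compresses to roughly half the size of the interleaved order, although the content is the same. With a 10 GB folder and duplicates scattered across sub-directories, the same reasoning applies on a larger scale: either the dictionary has to be big enough, or the file order has to bring the duplicates close together.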
If you use the GUI version of WinRAR, you also have to pay attention to the option Files to store without compression. This option can be found on the Files tab after clicking on the symbol/command Add. It lists file types which are just stored in the archive without any compression, like *.png, *.jpg, *.zip, *.rar, ... Those files usually contain data that is already compressed, so it does not make much sense to compress them once again. But if duplicate *.jpg files exist in a folder structure and a solid archive is created, it makes sense to remove all file extensions from this option.
By the way: there are applications like Total Commander, UltraFinder, UltraCompare and others which support searching for duplicate files by various user-selectable criteria, such as files with the same name and same size or, most reliably, files with the same size and same content. They also provide functions to delete the duplicates.
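For anyone who does not mind a few lines of script after all, the "same size and same content" criterion can be sketched in Python roughly as follows. This is only an illustration of the idea, not how any of the tools named above work internally; the function names file_digest and find_duplicates are made up here, and SHA-256 is an assumed choice of hash. The sketch only reports duplicates and deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under `root` that have the same size and content hash."""
    # Step 1: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share their size with another file.
    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_content[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_content.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates("."):
        print(group)
```

Run from the top of the folder, it prints one list per set of identical files. Comparing sizes first keeps the expensive hashing limited to files that can actually be duplicates.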