Finding file duplicates > Very large data set >
by Noob-Tech-Ninja from LinuxQuestions.org on (#6QJQ4)
Hi there guys.
I was hoping that you could help me with an issue that I've been experiencing.
Background:
I have a huuuuuge media collection.
I have literally hundreds of thousands of photos / images.
And tens of thousands of video files.
I have a LOT of duplicates within all of these.
These are spread out over quite a few HDDs (both internal and external).
Issue:
What I'd like to do is:
1. Be able to scan these files and determine which ones are duplicates.
So I can then decide what to do with them.
2. I'd also like the scan to be very thorough (e.g. not just a filename scan).
I'd like multiple types of scans / verification to determine whether these are the same file or not,
e.g. file name, file size, dimensions of the image or video file,
and a "digital fingerprint" scan of the files.
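To make the kind of scan I mean concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a finished tool: the names find_duplicates and file_digest are made up for this post, SHA-256 stands in for the "digital fingerprint", and files are grouped by size first so that only same-sized candidates are ever hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return lists of paths whose contents are byte-for-byte identical.

    Hypothetical helper: 'roots' is an iterable of directories to scan,
    e.g. the mount points of the internal and external drives.
    """
    # Pass 1 (cheap): group files by size. A file with a unique size
    # cannot have an exact duplicate, so it is never read.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.islink(path) or not os.path.isfile(path):
                        continue
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # unreadable file, or the drive dropped out

    # Pass 2 (expensive): hash only the files that share a size.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[(size, file_digest(path))].append(path)
            except OSError:
                continue

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Note that this only catches exact copies. A photo that was resized or re-encoded has different bytes and would not be matched, which is presumably where the dimensions check or some image-aware comparison would have to come in.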
Questions:
1. How can I go about achieving this?
2. What app(s) can I use to obtain the information that I want?
3. Are there any gotchas or things to be aware of when trying to achieve this?
Troubleshooting:
1. I have used Meld in the past, and although it is powerful, I find that it
crashes (especially when using external HDDs).
2. I've done quite a bit of Googling, but there is a very large amount of
conflicting advice among the various solutions.
TIA for any help or advice.
Useful information:
PC OS: Ubuntu 22.04 LTS
Kernel: 6.8.0-40 generic (64 bit)
KDE Plasma: 5.24.7