Finding file duplicates > Very large data set >

by Noob-Tech-Ninja from LinuxQuestions.org (#6QJQ4)
Hi there guys.

I was hoping that you could help me with an issue that I've been experiencing.

Background:
I have a huuuuuge media collection.
I have literally hundreds of thousands of photos / images.
And tens of thousands of video files.

I have a LOT of duplicates within all of these.
They are spread out over quite a few HDDs (both internal and external).

Issue:
What I'd like to do is -

1. Be able to scan these files and determine which ones are duplicates,
so I can then decide what to do with them.

2. I'd also like the scan to be very thorough (e.g. not just a filename scan).

I'd like multiple types of scans / verification to determine whether files are the same or not,
e.g. file name, file size, dimensions of the image or video file,
and a "digital fingerprint" scan of the file contents.
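
As a rough illustration of the layered check described above, a minimal Python sketch could group files by size first and only compute a content hash for files whose sizes collide. The function names (`file_digest`, `find_duplicates`) and the choice of SHA-256 as the "digital fingerprint" are assumptions made for this example, not a recommendation of a specific tool:

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return lists of paths with identical size and identical SHA-256 digest."""
    # Stage 1: cheap check, group every regular file by its size in bytes.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.islink(path) or not os.path.isfile(path):
                        continue
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # unreadable file or disconnected drive: skip it

    # Stage 2: expensive check, hash only files that share a size.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(same) for same in by_hash.values() if len(same) > 1)
    return groups


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 finddupes.py /mnt/disk1 /mnt/disk2
    for group in find_duplicates(sys.argv[1:]):
        print("\n".join(group), end="\n\n")
```

Note that a sketch like this only catches byte-identical copies. Photos or videos that were resized, re-encoded, or had their metadata edited would hash differently and would need a separate comparison (for example by dimensions or a perceptual hash).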

Questions:
1. How can I go about achieving this ?

2. What app(s) can I use to obtain the information that I want ?

3. Are there any gotchas or things to be aware of when trying to achieve this ?

Troubleshooting:
1. I have used Meld in the past, and although it is powerful, I find that it
crashes (especially when using external HDDs).

2. I've done quite a bit of Googling, but the various solutions I found
give a lot of conflicting advice.

TIA for any help or advice.

Useful information:
PC OS: Ubuntu 22.04 LTS
Kernel: 6.8.0-40-generic (64-bit)
KDE Plasma: 5.24.7