How to sync and omit duplicates between directories
by leoio2 from LinuxQuestions.org on (#51D73)
Context:
I have a URL that contains 8 directories. Directory #1 has 80 files and Directory #2 has 90 files; 80 of the files in Directory #2 are exactly the same as the 80 files in Directory #1, and the same files also appear in Directories #4, #6, and #8. There are further duplicates scattered across the other directories under the URL. I only want one copy of each uniquely named file: once a file has been downloaded for the first time from any directory, no file with the same name should be downloaded again.
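For concreteness, the goal can be stated as: list every file under the URL, keep only the first path seen for each file name, and copy just those. A minimal, untested sketch, assuming URL is something rclone can list (for example an HTTP remote) and dest: is the target; both names are the placeholders used in this post, and the result is flattened into one destination directory since only the file name matters:
Code:
#!/usr/bin/env bash
# Sketch only: keep the first occurrence of each file name anywhere under URL.
# "URL" and "dest:" are placeholders, not real remote names.
rclone lsf URL --recursive --files-only | sort |
  awk -F/ '!seen[$NF]++' |
  while IFS= read -r path; do
    # Copy each unique file name once, flattened into the destination root.
    rclone copyto "URL/$path" "dest:${path##*/}"
  done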
This will not work:
Code:
rclone sync URL dest:
Another user mentioned this:
Quote:
This script serves a different purpose, but you can use some of the same logic and commands to achieve what you want to do. Also, jump up one level and look at the original difflist, with the rclone check command: https://github.com/88lex/diffmove/blob/master/difflist2
Code:
#!/usr/bin/env bash
# Requires installation of moreutils to run combine.
# sudo apt install moreutils
# The default commands below compare directories, not files.
# Adjust --level to control how deeply you recurse into the tree.
rclone tree -di --level 2 $1 | sort >tmp1
rclone tree -di --level 2 $2 | sort >tmp2
# Use these commands below if you want to compare files, not dirs:
#rclone tree -i --full-path $1 | sort >tmp1
#rclone tree -i --full-path $2 | sort >tmp2
combine tmp1 not tmp2>not_in_2
combine tmp2 not tmp1>not_in_1
rm tmp1
rm tmp2
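For reference, that script takes the two locations as positional arguments, so a run might look like the following (the paths are placeholders); not_in_2 then lists entries present in the first argument but missing from the second, and not_in_1 the reverse:
Code:
# Hypothetical example; substitute real rclone remotes or local directories.
bash difflist2 URL HDD_destination
cat not_in_2   # entries under URL that HDD_destination lacks
cat not_in_1   # entries under HDD_destination that URL lacks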
I have no idea how to use a script. But I wanted to at least make an attempt, so I came up with this:
Code:
#!/usr/bin/env bash
# Requires moreutils for the combine command: sudo apt install moreutils
# List both trees, sorted, then keep the lines unique to each side.
rclone tree -i URL --level 2 | sort >tmp1
rclone tree -i HDD_destination --level 2 | sort >tmp2
combine tmp1 not tmp2>not_in_2
combine tmp2 not tmp1>not_in_1
rm tmp1
rm tmp2
Is that correct or is it, at least, on the right track to fix this issue?
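As a reference point, the same comparison can be sketched with plain path listings instead of tree output: rclone lsf prints paths relative to each root, which sorted text tools (comm here, so moreutils is not required) can diff directly, and which rclone's --files-from filter can consume. URL and HDD_destination are the same placeholders as above, and this only syncs what is missing by path; deduplicating by file name across directories would still need the name-based filtering idea from earlier:
Code:
#!/usr/bin/env bash
# Rough, untested sketch: list both sides, diff the listings, copy the gap.
rclone lsf URL --recursive --files-only | sort >tmp1
rclone lsf HDD_destination --recursive --files-only | sort >tmp2
comm -23 tmp1 tmp2 >not_in_2   # paths on the source that the destination lacks
comm -13 tmp1 tmp2 >not_in_1   # paths on the destination that the source lacks
rclone copy URL HDD_destination --files-from not_in_2
rm tmp1 tmp2 not_in_1 not_in_2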
Alternative:
Someone also mentioned using symlinks with rclone, or using 'wget -nc', but I'm not sure how either of those would prevent duplicates between directories.
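On the wget idea: '-nc' (no-clobber) only skips a download when the same local path already exists, and a plain recursive fetch recreates each remote directory locally, so same-named files in different directories would still all be downloaded. Flattening with '-nd' is what would let '-nc' skip repeated names. A hedged sketch that would need testing against the actual server (the accept pattern is a made-up example, and this assumes the URL serves listings wget can crawl):
Code:
# Plain recursive mirror: keeps the remote directory layout, so -nc does NOT
# stop duplicates that live in different directories.
wget -r -np -nc URL

# Flattened variant: -nd saves everything into the current directory, so a
# file name that was already downloaded is skipped by -nc. The -A pattern
# (hypothetical '*.iso', adjust to the real file types) keeps the crawled
# index pages from being retained.
wget -r -np -nd -nc -A '*.iso' URL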
Thank you.