Seafile Client on Linux re-uploads unchanged files when syncing a library with an existing directory


The scenario is as follows:

  • I had the library synced on CompWin (and on other computers) running Windows 10.
  • I got a new machine running Ubuntu 17.10 (CompLin) and wanted the same library there, but instead of downloading the whole library (which is huge), I copied it from CompWin to CompLin using rsync over a Samba share.
  • I then synced the library with the existing folder (mind you, at this point the copies on CompWin, CompLin, and the server all have the same contents).
  • During synchronization, Seafile Client started re-uploading what was probably all of the files (I interrupted it after 190 files) and also showed those files as modified on the server.

It behaved as if the files on CompLin were different from the ones on the other machines and on the Seafile Server. I checked one of the files (downloaded from the server, from CompWin, and from CompLin) using SHA-1, and the hashes are the same, so the contents are the same. The file sizes are also identical, as are the file times (at least down to the minute).
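For anyone who wants to repeat the hash comparison, here is a minimal Python sketch. The function name `sha1_of_file` and the block size are only illustrative choices; any SHA-1 tool should give the same digest.

```python
import hashlib

def sha1_of_file(path, block_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, reading it in blocks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Running it on the three copies of one file (server download, CompWin, CompLin) and comparing the returned strings is enough to confirm the contents match.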

There is no reason for Seafile Client to re-upload all those identical files; there is clearly some kind of problem.

Versions used:
seafile-gui (6.1.2-1338~ubuntu17.10.1)
seafile-daemon (6.1.2-1225~ubuntu17.10.1)

This may be due to the fact that we changed the file chunk size in a recent client version. The chunks produced by the new client are different from the ones on the cloud. To avoid a false re-upload, you have to make sure the files' last-modification timestamps are preserved when you rsync them. Or you can download the files to the new computer.
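To illustrate why a chunk size change matters even when the file itself is unchanged, here is a rough Python sketch. It is not the Seafile client's actual chunking code: the fixed-size splitting, the helper `chunk_ids`, and the two sizes (1024 and 4096 bytes) are assumptions chosen only for the demonstration.

```python
import hashlib

def chunk_ids(data, chunk_size):
    """Split `data` into fixed-size chunks and return each chunk's SHA-1 hex digest."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

# 16 KiB of deterministic, non-repeating sample content.
data = b"".join(i.to_bytes(4, "big") for i in range(4096))

old_ids = chunk_ids(data, 1024)  # hypothetical "old" chunk size
new_ids = chunk_ids(data, 4096)  # hypothetical "new" chunk size

# The whole-file hash does not depend on how the file is chunked,
# but the two sets of chunk IDs have nothing in common.
shared = len(set(old_ids) & set(new_ids))
print(hashlib.sha1(data).hexdigest())
print(len(old_ids), len(new_ids), shared)
```

In this sketch the same bytes produce two disjoint sets of chunk IDs, so a store that is keyed by chunk ID would treat the second set as new data.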

OK, so I checked the modification dates by running a Python script that walks the whole directory and records file sizes and modification dates (with one-second precision). Both copies yield identical output (byte for byte), meaning that ALL files have the same dates and the same sizes. I trust rsync's checksumming, so all files have the same contents (I also checked a few of the re-uploaded files, and they were indeed the same, with the same SHA-1 hashes).
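The script itself was not posted. A check of this kind could look roughly like the sketch below; the names `build_manifest` and `diff_manifests` are made up for illustration, and modification times are truncated to whole seconds as described above.

```python
import os

def build_manifest(root):
    """Return sorted (relative_path, size, mtime_seconds) tuples for all files under `root`."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            # Use forward slashes so manifests from Windows and Linux compare equal.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, st.st_size, int(st.st_mtime)))
    return sorted(entries)

def diff_manifests(a, b):
    """Return the entries that appear in exactly one of the two manifests."""
    return sorted(set(a) ^ set(b))
```

Building a manifest on each machine and passing both to `diff_manifests` returns an empty list when every path, size, and timestamp matches.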

If the client's chunking algorithm has changed (while the server still holds data chunked by the older algorithm), then over time libraries will grow to 2x their size, because everything will be re-uploaded even though it is already on the server.