Thanks @feiniks for unmasking the bug!
I respect the dev team’s decision not to support this case. Still, please allow me to offer a solution. I have a fix for seafile-daemon here on GitHub: https://github.com/vacaboja/seafile. I will also submit a pull request shortly.
Plea
Before I go into the details of the fix, let me bother the dev team with a little plea. Since I want compatibility with git, I can of course use my own fork of seafile-daemon, and I am content with that. Nevertheless, I wish to ask the dev team to reconsider their position on this issue. Messing up users’ git repositories really isn’t a good look, especially considering that many people willing to self-host their own cloud storage are likely to be nerds like myself, and thus likely to use git. Besides, anyone who knows that seafile does not sync git repositories reliably will naturally be wary of the reliability of the system in general. I tried to make fixing the bug as easy as possible: in fact, I believe you only have to accept the pull request. Of course, I am willing to revise the code to address any concerns.
Details of the bug
Given the explanation offered by @feiniks, one may wonder why, in the tests performed by @tomservo, only the occasional file went missing. If things were exactly as @feiniks describes, seafile-daemon should miss every single object file saved by git, not just a few.
Well, @feiniks is of course right, but there is more to the story. Seafile indexes files on IN_CLOSE_WRITE and not on IN_CREATE; however, other inotify events may also trigger indexing. One is IN_MODIFY, which does not concern us here. Another, however, is IN_CREATE on the containing directory: in wt-monitor-linux.c, the function add_watch_recursive() causes any file already present in a newly created directory to be indexed. This is precisely how most of the files in .git/objects get indexed in @tomservo’s tests. In those tests, git creates the subdirectories inside .git/objects and the files therein in rapid succession, and by the time seafile-daemon gets to process the events, git has already completed most of its work. With this understood, it becomes clear that a more reliable way to trigger the bug is to create the repository, commit some files, wait for seafile to sync, and only then perform a second commit. The following script reliably triggers the bug for me, causing seafile to miss about 50 of git’s internal files, even though it merely creates 200 1 kB files.
```shell
mkdir test
cd test
git init
mkdir files1
for A in `seq 100`
do
    dd if=/dev/urandom of=files1/file$A bs=1024 count=1
done
git add files1
git commit -m foo
sleep 60
mkdir files2
for A in `seq 100`
do
    dd if=/dev/urandom of=files2/file$A bs=1024 count=1
done
git add files2
git commit -m bar
```
The sleep 60 gives seafile some time to finish indexing the first commit.
The fix
As correctly pointed out by @feiniks, there is no way to tell whether an IN_CREATE event comes from an open() syscall (the file is being written and will eventually get its own IN_CLOSE_WRITE) or from a link() syscall (the file is already complete and no IN_CLOSE_WRITE will ever follow). So we need to guess, and index the file whenever we guess link(). What happens if we guess wrong?
- We guess open() and it was a link(). Then we miss a file. This is the bug, and it is very bad.
- We guess link() and it was an open(). Then we index a possibly incomplete file. This is tolerable as long as it doesn’t happen too often. Indeed, seafile-daemon already indexes incomplete files on several occasions: on IN_MODIFY, and when IN_CREATE fires on the containing directory, for instance. This is not a big deal, because any file being written to will eventually get its own IN_CLOSE_WRITE event, and then it will be indexed with its final content.
So we need a reasonably reliable guess that never mistakes a link() for an open().
The guess strategy in my code is as follows. When an IN_CREATE event fires on a regular file, we don’t index the file, but we remember the event for a while. If, after about one second, the same file has received any event causing it to be either indexed or deleted, or if we detect any change in the file through its last modification time, then we forget about the whole thing. If not, then either the file was created by link() or, at least, it is not being actively written to; in that case, the safest bet is to index it.
The implementation is straightforward. Looking at wt-monitor-linux.c, you can see that the one-second timer is obtained by passing a timeout to the select() syscall in wt_monitor_job_linux(). The exact timing is not critical: it does not matter if, under load, the timer fires less frequently. Say it fires at seconds n-1, n, n+1, n+2, and so on. When the tick at second n+2 fires, we produce internal WT_EVENT_CREATE_OR_UPDATE events for all unchanged files created between seconds n and n+1 (so these files are 1 to 2 seconds old). This is implemented with the two hash tables recheck_next and recheck_accu in the RepoWatchInfo structure. At any time, files receiving IN_CREATE events are accumulated in recheck_accu, while recheck_next holds the files accumulated during the previous second. When the one-second timer fires, we deal with all the files in recheck_next, then swap the now-empty recheck_next with recheck_accu. Every time we fire a WT_EVENT_CREATE_OR_UPDATE on a file that is in either recheck_accu or recheck_next, we expunge that file from the table.
Thanks for reading to the end of this.
And thanks to the dev team for their work maintaining seafile.