Has anyone else noticed this behavior? Is this a known issue and do we have a way to fix it?
I have recently set up a small-ish installation of Seafile (Pro version) for a company with ~70 roaming clients syncing files over (costly and unreliable) metered LTE/4G/satellite connections.
The client connection is unreliable, may fail over to a secondary connection (when available), and frequently goes offline for hours.
Under the impression that the Seafile client supports resuming downloads (i.e., without restarting the transfer from byte 0) when the connection is interrupted, I did not expect to see one of the clients use more than 2 GB of data during the transfer of a single 1 GB file.
At that point, the user decided to cancel the transfer and deleted the file from the Seafile library to avoid excessive bandwidth charges, but they are willing to test again if we have a fix for this.
I have checked the seafile.log file from the client installation (C:\users\username\ccnet\logs), which shows the transfer timed out multiple times, e.g.:
[11/27/19 08:53:11] http-tx-mgr.c(783): libcurl failed to GET https://SEAFILE-FQDN/seafhttp/repo/d50edf89-6b59-46ca-9e84-7b0c07396034/block/9af37b7269faa17d1017a82bfdee5f3f5ff567dc: Timeout was reached.
[11/27/19 08:53:11] repo-mgr.c(4338): Transfer failed.
[11/27/19 08:53:11] http-tx-mgr.c(1157): Transfer repo 'd50edf89': ('normal', 'data') --> ('error', 'finished')
[11/27/19 08:53:11] sync-mgr.c(621): Repo 'REPO-0001' sync state transition from downloading to 'error': 'Data transfer timed out. Please check network or firewall'.
Looking further into this: during the period in which the client app generated the excessive download traffic, its log file contains ~100 entries for interrupted or timed-out data transfers.
The Seafile data model documentation states that Seafile uses the “Content Defined Chunking algorithm to divide file into blocks […] a block’s size is around 1MB”.
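For context, content-defined chunking in general works roughly like the sketch below (illustrative only; I don’t know Seafile’s actual rolling hash or parameters, and the hash here is a toy):

```python
# Minimal content-defined chunking sketch (illustrative, NOT Seafile's
# actual implementation). A rolling hash decides chunk boundaries, so
# boundaries depend on content itself, letting unchanged regions
# re-align after insertions elsewhere in the file.

def cdc_chunks(data: bytes, avg_bits: int = 20, min_size: int = 64 * 1024,
               max_size: int = 4 * 1024 * 1024):
    """Split data into chunks. A boundary is declared when the low
    `avg_bits` bits of the rolling hash are zero, giving an average
    chunk size of roughly 2**avg_bits bytes; min_size/max_size bound
    the extremes."""
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # toy rolling hash, resets per chunk
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

With the defaults above you would get ~1 MB average chunks (2**20 bytes), matching the “around 1MB” figure from the documentation.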
Now I wonder whether this algorithm also applies to transfers (or whether “Range” HTTP requests are used instead). In my case, I would expect the ~100 disruptions logged by the client app to amount to no more than ~100 MB of retransmitted blocks, unless I am missing something here.
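To clarify what I mean by Range-based resumption: in the generic HTTP sense, a client that kept partial data could continue a block from the first missing byte. A rough sketch of that idea (hypothetical here — whether Seafile’s block endpoint supports it is exactly my question; `fetch_range` is a stand-in for a GET with a `Range: bytes=<offset>-` header):

```python
# Sketch of byte-range resumption for a single block download.
# fetch_range(start) stands in for an HTTP GET with a Range header on a
# flaky connection: it returns however many bytes arrived before a drop
# (possibly none). Note this sketch retries forever; a real client
# would cap retries and back off.

def resume_download(fetch_range, total_size: int, received: bytes = b"") -> bytes:
    """Keep requesting from the first missing byte until `total_size`
    bytes have been accumulated, preserving everything already received."""
    buf = bytearray(received)
    while len(buf) < total_size:
        chunk = fetch_range(len(buf))  # "Range: bytes=<len(buf)>-"
        if not chunk:
            continue  # dropped before any byte arrived; just retry
        buf.extend(chunk)
    return bytes(buf)
```

The point is that each retry only re-requests the missing tail, so an interruption costs nothing beyond the bytes lost in flight.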
The client downloads the file block by block. There is no resumable download for an individual block; once a block download is interrupted, it has to be downloaded from the beginning on retry. Also, the block size is now 8MB on average instead of 1MB, so the retries may consume more traffic than you expect.
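A back-of-envelope calculation shows how this adds up. Assuming the ~100 interruptions from the log land, on average, halfway through an 8 MB block (the halfway point is my assumption, not a measured figure):

```python
# Rough cost of restart-from-zero block downloads under the numbers
# from this thread: 1 GB file, 8 MB average blocks, ~100 interruptions.
# Assumption: an interruption wastes half a block on average.

BLOCK = 8 * 2**20            # 8 MiB average block size
FILE = 1 * 2**30             # 1 GiB file
INTERRUPTIONS = 100          # interrupted/timed-out transfers in the log

blocks = FILE // BLOCK                  # number of blocks in the file
wasted = INTERRUPTIONS * BLOCK // 2     # discarded partial-block bytes
total = FILE + wasted                   # total traffic for one download

print(blocks)                # 128 blocks
print(wasted // 2**20)       # 400 MiB thrown away
print(round(total / 2**30, 2))
```

Interruptions that land near the end of a block waste nearly the full 8 MB each, so traffic approaching the reported 2 GB for a 1 GB file is plausible.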
Examining the requests issued by the client to the web server, I noticed the web server’s log reporting that the block was transferred in its entirety (8388608 bytes) 31 times. I have no idea why this happened, but if anyone can explain it I would be very happy to know.
Most importantly, is there a way to force a lower block size, and can I go below 1MB? For such slow connections, 128KB blocks sound more realistic.
I’ve found that the option fixed_block_size may help, but the documentation states it only affects files uploaded via the web console. Is this something different?
The nginx log always records 200 because the response code is sent before the content, so whether or not the client receives the complete content, the log shows 200. As for the transferred size, nginx may have written the content to the socket buffer without it actually being sent over the network.
Unfortunately there is no option to control the size of blocks produced by the sync client.
Do the users really need to sync while they’re on a low-bandwidth network?
Thank you for answering my question and shedding more light on this.
Reading the Seafile client source for uploads, my impression is that a file must exceed 100MB for the client to carry out its upload in a resumable manner, and once this happens, the upload proceeds in 1MB chunks. Also, I think this results in 100MB blocks (I vaguely remember observing this too). And, if I understand correctly, uploading a file under 100MB via the client is carried out in a non-resumable manner. Is that correct?
As I said in an earlier post, my specific case involves users who are on the move most of the year, on low-bandwidth, unreliable connections. On top of that, these are metered connections, where losing a block results in extra charges from the provider. Therefore, for my use case:
- It would be beneficial to my users’ downloads if I could force the block size below 1MB, which is currently the minimum allowed value. I could probably adjust my workflow to use the web UI when uploading files to the user shares.
- It would be awesome if the size of blocks uploaded by the sync client were configurable at the client level, effectively making Seafile resilient and cost-effective on slow, metered connections.
Now, I have spotted the places in the source code where those choices are made. Naturally, I am unsure what the implications of changing these values/thresholds will be. But I am willing to test and see if these changes can bring peace of mind to my users.
I think I can make the client-side changes (reliable-upload.cpp) and compile it myself, but I am unsure how I would go about making changes to the Seafile Pro edition (e.g. http-server.c) and producing a Pro docker image with those changes. Do I follow the normal build process outlined in the developing section of the docs, or is that only for the community edition?
Unfortunately, that is not accurate. The average block size is 8MB, and 100MB is the threshold for starting a batch upload, not for dividing a block.
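Putting those corrected numbers together (8 MB average blocks, 100 MB batch threshold; the interpretation that the threshold only gates batching is taken from this answer):

```python
# Numbers implied by the answer above: block size and batch threshold
# are independent knobs. What an interruption costs is one (partial)
# block, regardless of the 100 MB batch threshold.

AVG_BLOCK = 8 * 2**20          # 8 MiB average block size
BATCH_THRESHOLD = 100 * 2**20  # 100 MiB: when batched upload kicks in
file_size = 1 * 2**30          # the 1 GiB file from this thread

print(file_size // AVG_BLOCK)          # ~128 blocks in the file
print(file_size > BATCH_THRESHOLD)     # large enough to trigger a batch upload
```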
I don’t think letting the user choose the block size is a good idea. Most users don’t understand what it is, and having every client use a different block size would cause unexpected conflict issues when syncing a library with existing folders.
There is no simple solution to this issue given how the sync client currently works, so my suggestion is to ask the users to pause automatic syncing while working on the move.
Another alternative is to use the SeaDrive client. It won’t automatically sync anything unless the user explicitly opens a file.