I manage a Seafile server currently deployed in Hong Kong (Aliyun ECS + NAS for storage). However, the bandwidth of the server is starting to be a bottleneck, and connectivity for our new colleagues in Europe is far from smooth.
Does anyone run Seafile behind a CDN? Which one? Is it possible to cache files as they’re being synced by the client, as well as browser downloads?
I am thinking of moving the hosting to Europe and using GeoDNS to serve only China-based users through a CDN.
However, after putting my own personal Seafile instance behind Cloudflare for testing, it seems that file downloads are always a MISS (judging by the cf-cache-status HTTP header). I suppose that’s because the UUID part of the URL (like /seafhttp/files/a903aea2-5ff6-42d0-b32b-212967f93426/image.jpg) always changes. Is there a way to make Seafile-served resources more cacheable?
That would only be possible with the Pro Edition (HA) and some distributed storage. I’m not sure whether high availability works well when distributed across continents, though.
Thanks for your reply, @shoeper!
For the sake of keeping the cost low (we are an environmental education non-profit) and the setup simple, I wanted to investigate the possibility of hosting in a single location and accelerating the downloads with a CDN.
If you have insight into the internals of the sync protocol: do you think it would be difficult to make identical blocks (i.e. a block of a newly stored file that many clients will start syncing) available at an identical endpoint, so that they would be CDN-cacheable?
From a quick test I just ran, the clients get the resources with requests like:
"GET /seafhttp/repo/e6ec5507-7938-4f26-855c-e695aeb58be1/block/1ecfa0d8f0b08c76570398cb007f82985f031c13 HTTP/2.0" 200 8388608 "-" "Seafile/7.0.5 (Linux)" "-"
Then the same file from another machine:
"GET /seafhttp/repo/3e8c375c-d8a2-49d5-b9a7-f180569d0187/block/1ecfa0d8f0b08c76570398cb007f82985f031c13 HTTP/1.1" 200 8388608 "-" "Seafile/7.0.4 (Windows NT)" "-"
So the ID (hash) of the block stays the same, but the ID of the repository is different. Is there a strong reason why the block could not be served from an identical endpoint (i.e. with the same repo ID for everyone), so that this response would be cacheable?
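To make the point concrete, here is a minimal Python sketch of a cache key that drops the repo ID and keeps only the block ID from paths shaped like the two log lines above. The helper name `block_cache_key` and the regex are illustrative assumptions, not something Seafile or any CDN provides. Keying this way skips any authorization check, so on its own it would let anyone who knows a block ID fetch that block.

```python
import re
from typing import Optional

# Matches block download paths as they appear in the access logs:
# /seafhttp/repo/<36-char repo UUID>/block/<40-hex-char block ID>
BLOCK_PATH = re.compile(
    r"^/seafhttp/repo/(?P<repo_id>[0-9a-f-]{36})/block/(?P<block_id>[0-9a-f]{40})$"
)


def block_cache_key(path: str) -> Optional[str]:
    """Hypothetical helper: return a repo-independent cache key, or None.

    Only the block ID goes into the key, so the same block requested
    through different repos maps to one cache entry.
    """
    m = BLOCK_PATH.match(path)
    if m is None:
        return None  # not a block URL (e.g. /seafhttp/files/...)
    return "block:" + m.group("block_id")


a = block_cache_key(
    "/seafhttp/repo/e6ec5507-7938-4f26-855c-e695aeb58be1/block/1ecfa0d8f0b08c76570398cb007f82985f031c13"
)
b = block_cache_key(
    "/seafhttp/repo/3e8c375c-d8a2-49d5-b9a7-f180569d0187/block/1ecfa0d8f0b08c76570398cb007f82985f031c13"
)
print(a == b)  # True
```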
Thanks for your time! Curious to see how far we can get with this!
I will test the usability of hosting in Europe and serving China through a CDN anyway. It might at least help to let the CDN, whose servers sit on networks with better peering than most home users have, do the proxying, even if the responses can’t be cached.
In case the core team thinks that making sync and browser downloads more caching-friendly is a good idea, we would definitely try to implement it and work on contributing it back upstream.
The issue is that the authorization token is only valid once. The block data itself is immutable. Maybe you could let your cache send a HEAD request to seafhttp when the item is already cached, and return the cached block if the authorization is valid. I’m not sure whether that works; if not, maybe support for HEAD requests could be implemented. In such a scenario, a HEAD request should invalidate the auth token imho.
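A minimal Python sketch of that idea, assuming seafhttp would answer a HEAD request on a block URL with 200 when the client’s token is valid (which, as said, may first need to be implemented). `AuthCheckedBlockCache` and the injected `head`/`get` callables are hypothetical stand-ins for the proxy’s upstream calls; invalidating the token on HEAD would happen on the server side and is not modelled here.

```python
from typing import Callable, Dict, Tuple

# Hypothetical upstream calls made by the proxy on behalf of the client,
# forwarding the client's auth headers to the Seafile server.
HeadFn = Callable[[str, Dict[str, str]], int]              # returns HTTP status
GetFn = Callable[[str, Dict[str, str]], Tuple[int, bytes]]  # returns (status, body)


class AuthCheckedBlockCache:
    """Serve immutable blocks from a local cache, but only after the
    origin has confirmed (via HEAD) that the client is authorized."""

    def __init__(self, head: HeadFn, get: GetFn) -> None:
        self._blocks: Dict[str, bytes] = {}
        self._head = head
        self._get = get

    def fetch(self, key: str, url: str, headers: Dict[str, str]) -> Tuple[int, bytes]:
        cached = self._blocks.get(key)
        if cached is not None:
            # Block content never changes, so only authorization is checked.
            status = self._head(url, headers)
            if status == 200:
                return 200, cached  # small HEAD round trip instead of a full transfer
            return status, b""      # unauthorized: never leak cached data
        # Miss: fetch the full block from the origin and keep it if successful.
        status, body = self._get(url, headers)
        if status == 200:
            self._blocks[key] = body
        return status, body
```

The first request for a block costs a full GET; every later request for the same key costs only a HEAD round trip, and a rejected HEAD returns the upstream status with an empty body.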
It is also possible to receive whole files instead of blocks through seafhttp. In that case I’m currently not sure whether it is possible to tell if the cached version is up to date or outdated. I think seafhttp returns the Content-Length but no checksum, so one could only check whether the length changed, which might not be sufficient. The apps and Seahub receive files instead of blocks.
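A tiny Python illustration of why a length-only check is weak for files, while blocks carry their own validator. It assumes, as the 40-hex-digit IDs in the logs suggest, that a block ID is the SHA-1 digest of the block content; the function names are made up for the example.

```python
import hashlib


def length_check_says_fresh(cached: bytes, upstream_length: int) -> bool:
    # All a cache can compare for /seafhttp/files/... if only Content-Length is known.
    return len(cached) == upstream_length


def block_matches_id(data: bytes, block_id: str) -> bool:
    # Assumption: block ID == SHA-1 hex digest of the block content.
    return hashlib.sha1(data).hexdigest() == block_id


old = b"draft v1"  # stale cached copy
new = b"draft v2"  # current content, same length
print(length_check_says_fresh(old, len(new)))                # True: staleness not detected
print(block_matches_id(old, hashlib.sha1(new).hexdigest()))  # False: mismatch detected
```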
If HEAD requests work, the data could be served from the cache and only small requests and replies would have to be transferred to the remote server. It would still add latency, but it would most likely reduce the traffic significantly.
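A back-of-the-envelope Python calculation of origin traffic for one block downloaded by several clients. The 8,388,608-byte block size is taken from the logs above; the ~1 KiB cost of a HEAD request plus reply is an assumption, and latency is not modelled.

```python
BLOCK_BYTES = 8_388_608      # block size seen in the access logs (8 MiB)
HEAD_EXCHANGE_BYTES = 1_024  # assumption: rough size of one HEAD request + reply


def origin_bytes(downloads: int, cached: bool) -> int:
    """Bytes crossing the link to the origin for `downloads` fetches of one block."""
    if downloads <= 0:
        return 0
    if not cached:
        return downloads * BLOCK_BYTES
    # The first download misses and pulls the block; each later one is a HEAD check.
    return BLOCK_BYTES + (downloads - 1) * HEAD_EXCHANGE_BYTES
```

For 10 downloads of one block this gives 83,886,080 bytes from the origin without a cache versus 8,397,824 bytes with HEAD revalidation, roughly a tenfold reduction.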
Looking at the “upload” requests, it should also be possible to cache data on upload. That way, data uploaded from Europe, which (probably?) will be downloaded by other clients shortly after the upload, could be cached “locally” right away.
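A minimal write-through sketch in Python: after the proxy has forwarded an upload and seen the server accept it, it keeps a copy keyed by block ID. The class name, the assumption that a successful upload returns 200, and the SHA-1 check (assuming block IDs are SHA-1 digests of the content) are all illustrative. Serving these blocks later would still need the authorization check discussed above.

```python
import hashlib
from typing import Dict


class WriteThroughBlockCache:
    """Hypothetical proxy-side cache that keeps a copy of blocks on upload."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def on_upload_forwarded(self, block_id: str, data: bytes, upstream_status: int) -> bool:
        """Call after forwarding an uploaded block; returns True if it was cached."""
        # Only keep blocks the server actually accepted (assumed: status 200).
        if upstream_status != 200:
            return False
        # Assumption: block ID is the SHA-1 of the content; skip anything that doesn't match.
        if hashlib.sha1(data).hexdigest() != block_id:
            return False
        self.blocks[block_id] = data
        return True
```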
If the cache served data immediately without asking the server, anyone could download it, so the check is necessary imho.
Another option could be to run only seafhttp redundantly; I’m not sure what requirements would have to be fulfilled atm, though. In such a scenario, one would have to make the storage available to seafhttp in Europe somehow and add a cache at the file-system layer. It would have to be checked which latencies seafhttp tolerates, or whether it would fail too many requests.
Actually, I would find it interesting as well (my deployment has a storage server at home which doesn’t have much bandwidth, and I could add a cache on a reverse proxy). My cache server wouldn’t have much storage, so I would need a smart cache eviction strategy. I’m not sure which one would work best (maybe some combination, like half the cache used for recent uploads and the other half for the most frequently accessed blocks in, e.g., the last 7 days).
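One possible reading of the split mentioned above, sketched in Python: half the byte budget holds recent uploads and is evicted oldest-first, the other half holds blocks admitted after a miss and keeps whichever have the most hits within a sliding 7-day window. `TwoSegmentCache` and its methods are hypothetical, and the hit log kept for keys that are not cached is never cleaned up, which a real implementation would need to bound.

```python
import time
from collections import OrderedDict, deque
from typing import Callable, Deque, Dict, List, Optional

WEEK_S = 7 * 24 * 3600


class TwoSegmentCache:
    """Hypothetical block cache: half for recent uploads (FIFO),
    half for the blocks with the most hits in a sliding window."""

    def __init__(self, capacity_bytes: int, window_s: float = WEEK_S,
                 clock: Callable[[], float] = time.time) -> None:
        self.recent_cap = capacity_bytes // 2
        self.frequent_cap = capacity_bytes - self.recent_cap
        self.window_s = window_s
        self.clock = clock
        self.recent: "OrderedDict[str, bytes]" = OrderedDict()
        self.recent_size = 0
        self.frequent: Dict[str, bytes] = {}
        self.frequent_size = 0
        self.hits: Dict[str, Deque[float]] = {}

    def _hits_in_window(self, key: str) -> int:
        q = self.hits.get(key)
        if not q:
            return 0
        cutoff = self.clock() - self.window_s
        while q and q[0] < cutoff:  # forget accesses older than the window
            q.popleft()
        return len(q)

    def put_upload(self, key: str, data: bytes) -> None:
        """Store a freshly uploaded block in the 'recent uploads' segment."""
        if len(data) > self.recent_cap or key in self.recent:
            return
        self.recent[key] = data
        self.recent_size += len(data)
        while self.recent_size > self.recent_cap:  # evict the oldest upload first
            _, old = self.recent.popitem(last=False)
            self.recent_size -= len(old)

    def get(self, key: str) -> Optional[bytes]:
        """Record an access and return the cached block, if any."""
        self.hits.setdefault(key, deque()).append(self.clock())
        if key in self.recent:
            return self.recent[key]
        return self.frequent.get(key)

    def admit(self, key: str, data: bytes) -> bool:
        """After a miss, try to keep the block in the 'frequent' segment."""
        if key in self.frequent or key in self.recent or len(data) > self.frequent_cap:
            return False
        score = self._hits_in_window(key)
        freed, chosen = 0, []  # type: int, List[str]
        for v in sorted(self.frequent, key=self._hits_in_window):  # least popular first
            if self.frequent_size - freed + len(data) <= self.frequent_cap:
                break
            if self._hits_in_window(v) >= score:
                return False  # the rest are at least as popular: keep them
            chosen.append(v)
            freed += len(self.frequent[v])
        if self.frequent_size - freed + len(data) > self.frequent_cap:
            return False
        for v in chosen:
            self.frequent_size -= len(self.frequent.pop(v))
        self.frequent[key] = data
        self.frequent_size += len(data)
        return True
```

The injectable `clock` makes the window easy to test; admission refuses a block when every entry it would displace has at least as many recent hits.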
EDIT: Please note that I have heavily changed my post since first submitting it.
Have you considered a cluster setup, with one endpoint in HK and the other in Europe? Downloads would then be much closer to your users’ locations.