Upload and download speed

Okay, so recently I have been trying out different self-hostable cloud storage providers, including Nextcloud, FileBrowser and Pydio Cells. All of them have worked, but not quite in the way I want. Speed is a big deal for me, and the problem is that nearly all of the cloud storages I have tested have been not exactly slow, but not utilizing my gigabit internet either. My server is hosted on a gigabit connection, though I'm not testing from there; I'm in another place on another computer with 500 Mbps, so I'm hoping to get at least 50 MB/s. Nextcloud gave me 18-20 MB/s upload, though download was good, and Pydio gave me the same. Then I found FileBrowser: 50-60 MB/s up and down, wonderful. But I think I want to migrate to Seafile because of the looks and the bigger community with more support. On Seafile I'm getting 20-30 MB/s up, and download is weird because it varies a lot: most of the time it matches the upload speed, but near the end it climbs, too late to see the full speed.
Here is the docker compose file:

services:
  db:
    image: mariadb:10.11
    container_name: seafile-mysql
    environment:
      - MYSQL_ROOT_PASSWORD=password  
      - MYSQL_LOG_CONSOLE=true
      - MARIADB_AUTO_UPGRADE=1
    volumes:
      - /opt/seafile-mysql/db:/var/lib/mysql  
    networks:
      - seafile-net

  memcached:
    image: memcached:1.6.18
    container_name: seafile-memcached
    entrypoint: memcached -m 256
    networks:
      - seafile-net

  seafile:
    image: seafileltd/seafile-mc:11.0-latest
    container_name: seafile
    ports:
      - "80:80"
    volumes:
      - /mnt/SSD/General:/shared   
    environment:
      - DB_HOST=db
      - DB_ROOT_PASSWD=password
      - TIME_ZONE=timezone
      - SEAFILE_ADMIN_EMAIL=mail
      - SEAFILE_ADMIN_PASSWORD=password   
      - SEAFILE_SERVER_LETSENCRYPT=false  
      - SEAFILE_SERVER_HOSTNAME=127.0.0.1
    depends_on:
      - db
      - memcached
    networks:
      - seafile-net

networks:
  seafile-net:

And here are my Nginx Proxy Manager settings:

client_body_buffer_size 512k;
client_max_body_size 100G;
proxy_buffer_size 512k;
proxy_buffers 16 512k;
proxy_busy_buffers_size 512k;

client_body_timeout 120s;
client_header_timeout 120s;
keepalive_timeout 120s;
send_timeout 120s;

proxy_connect_timeout 120s;
proxy_send_timeout 120s;
proxy_read_timeout 120s;

proxy_request_buffering off;

gzip on; 
gzip_comp_level 4;
gzip_min_length 256;
gzip_proxied expired no-cache no-store private no_last_modified no_etag auth;
gzip_types application/atom+xml text/javascript application/javascript application/json application/ld+json application/manifest+json application/rss+xml application/vnd.geo+json application/vnd.ms-fontobject application/wasm application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/bmp image/svg+xml image/x-icon text/cache-manifest text/css text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/x-cross-domain-policy;

There are several things with Seafile that can affect the upload and download speed and make it inconsistent. Mostly this comes down to the data being de-duplicated. The tl;dr:

  • New data is slower (chunks have to be added to the storage); already-seen data is faster (the chunks are already in storage, so only new references to that existing data are needed)
  • The sync clients tend to be faster at upload than the web interface, because the dividing up into chunks happens in the client. They are especially faster with duplicate data since only new blocks have to be transferred.
  • The sync clients can be faster to download too. They keep a cache of chunks so downloading can sometimes reuse chunks the client already has instead of sending them over the network again.
  • The server can need more CPU and RAM than systems that don’t do de-duplication because there’s extra processing needed (I found 2 CPUs and 2GB of RAM was enough most of the time, but stepped up to 4 of each for those times where it helps).
  • Using HDDs is slower than SSDs. You normally would expect that, but it’s even worse with seafile because when reading or writing a file the drive needs to jump around more. For example to download a 1GB file, the seafile server will actually need to read hundreds of files (I think the chunks tend to average about 1MB each). And when you consider some chunks might be older than others, it’s easy to see why they wouldn’t always be next to each other on the disk.

All that means that if you upload the same file 3 times to test speeds, it will be slower the first time than for the next two. That is especially true through the sync client, since the 2nd and 3rd time could even seem faster than your network connection would allow, because the client can just refer to blocks the server already has without actually transferring them again. It also means a synthetic test won’t represent real-world speeds, because real usage is a hard-to-predict mix of new and already-seen data.
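If a toy sketch helps make that concrete, here is the rough idea in Python. This is not Seafile’s actual code, block size or splitting scheme, just an illustration of content-addressed block storage: the first upload of new data has to store every block, an identical upload afterwards stores nothing new, and a download has to be reassembled from many separate blocks (which is the part that hurts on an HDD).

import hashlib
import os

BLOCK_SIZE = 1 * 1024 * 1024   # arbitrary size for the sketch, not Seafile's real block size

store = {}                     # block_id -> block data; each unique block is stored once

def upload(data):
    """Split into blocks, store only blocks we haven't seen, return the file's block list."""
    block_list, new_bytes = [], 0
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        block_id = hashlib.sha1(block).hexdigest()
        block_list.append(block_id)
        if block_id not in store:          # already-seen blocks cost nothing extra
            store[block_id] = block
            new_bytes += len(block)
    print(f"stored {new_bytes // (1024 * 1024)}MB of new blocks")
    return block_list

def download(block_list):
    """Reassemble the file by reading every block back -- many small reads, not one big one."""
    return b"".join(store[block_id] for block_id in block_list)

data = os.urandom(50 * 1024 * 1024)   # 50MB of brand-new data
blocks = upload(data)                 # first upload: "stored 50MB of new blocks"
upload(data)                          # same data again: "stored 0MB of new blocks"
assert download(blocks) == data       # rebuilt from 50 separate blocks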

And here’s some info on my setup for comparison. I don’t use docker; my Seafile is running in a VM that shares SSD storage with several other VMs. Uploading 1.5GB through the web interface got me about 44MB/s just now. This included the time the server spent dividing the temp file up into chunks and storing them after the upload was done. Watching performance monitors, it is clear the bottleneck was in the storage: that was as fast as it could write the temp file and then the chunks. Downloading that file was about 90MB/s. Uploading that file again got about 70MB/s, I believe because when dividing it up into chunks, no new blocks had to be written.

Thank you for the answer. My setup is currently an external Seagate USB storage drive (USB 3.1, connected over USB 3, about a terabyte), 2 GB of RAM and 2 cores of an Intel 8250U. From htop there doesn’t seem to be much of a spike in either RAM or CPU. My internet speed at the server’s location is a gigabit, and where I’m uploading and downloading from it’s half of that. Could you explain a little more about re-uploading chunks and how uploading the same file multiple times could be faster? Why would a person do that? So if you could please explain point 1, thank you. Also, is there a way to increase the chunk size? With FileBrowser (another self-hosted cloud) I found that setting the chunk size to about 900 MB gave me the best speeds: on every single upload and download I get about 60 MB/s both ways.
And how is it that, for example, Mega manages to nearly double my internet download speed, and how might I replicate that?

I just realized I wrote a lot of text here. The short version is: I don’t know what you can do to make your setup faster. Maybe switch to the go file server if you aren’t already using it (see seafile.conf - Seafile Admin Manual), or switch your proxy to HTTP/2 if you aren’t already using that. But in real-world file syncing I think you will find it is already fast enough to work just fine if you just don’t worry about it.
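For reference, and going from memory of the manual (so double-check the exact option name there), the go file server switch is a setting in seafile.conf along these lines:

[fileserver]
# use the go file server instead of the default file server
use_go_fileserver = true

You would need to restart Seafile after changing it.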

I’ll take these questions a little out of order, but first I should note that I was wrong about the block size: it’s about 8MB, not 1MB like I said, and I don’t know of a way to change it. While looking up that number, I also found that I remembered the name wrong: Seafile calls them blocks, not chunks.

I don’t really know anything about mega manager, so I can’t answer that one.

I don’t think Seafile’s storage blocks are comparable to the chunking in FileBrowser. I think FileBrowser’s chunks just break the transfer into smaller transfers (chunks) so an interrupted transfer can be resumed without starting over from the beginning. But these chunks get merged on the other end and stored as one complete file, like the source file. So maybe the larger chunks are faster because they avoid needing to “warm up” the TCP congestion control as often. I don’t really know.

And to clear up point 1 about new data vs existing data: new data is slower because it has to be written to disk, while duplicate data is already there and doesn’t need to be written again. The speed-testing example I gave before isn’t something you are likely to do a lot, so let’s consider a more real-world usage example. Suppose you have a project with a 100MB file in the directory you are syncing with Seafile through the agent. It could be anything: a program you are writing, a video you are editing, or a giant powerpoint to convince your boss you are so good at powerpoint that you should get a raise. To make the example easy, it’s just one single 100MB file. You want to make some changes, but you aren’t sure if the change will work out or just make a mess of the project, so you first make a copy of the project into the file “project_snapshot”.

The agent sees 100MB of new files and begins dividing them out into blocks to stick in the block cache. When done, it has some small data that gives the filename of the new file and the list of blocks that make up the file’s contents, but there aren’t any new blocks. The agent sends that file listing the new filenames to the server. In this case the agent only sent a few KB over the network, but the result is a new 100MB file on the Seafile server (though not on the server disk, which only stores the few KB).

Now you make your edit, adding 10MB more to the project, and save. The agent again goes through the file, breaking it out into the block cache and making the file that lists the file’s new list of blocks. This time the agent sends that to the server, and the server says it doesn’t already have these 10MB of new blocks, so the agent sends them. A transfer of 10MB, plus a few KB for the file describing the block list, gets all 110MB of the file up to the server.

You make one more change, and decide you don’t like it. But you forgot to copy the file to another “project_snapshot” file first. No problem: you go to the Seafile web interface, find the file, look at the history, and restore the previous version. The server sends the file with the block list for that version to the agent, and the agent finds it still has all the needed blocks in its block cache, so it reassembles the older version of the file from those blocks without needing to download a new copy.

In this scenario, other programs I’ve used for syncing files, like Nextcloud, would have uploaded another 100MB for the “project_snapshot” copy and then another 110MB for the edited file. In my case, Nextcloud can saturate my 1Gb/s network connection with its uploads and downloads, apparently because it isn’t doing as much processing on the files, but by avoiding unneeded transfers the Seafile agent can sync these edits much faster.

In both cases, a little CPU time has been exchanged for not needing to transfer and store as much data. This storage model makes the trade-off of having uploads and downloads through the web interface be slower, and of taking extra CPU time (and some disk space for the block cache) through the agent, with the result that our example 210MB of files takes only 110MB of space on the server and less time to transfer (especially over slower network connections like coffee shop wifi). It also enables quick access to older versions of files without saving a second copy: just by keeping the older block list, the file can quickly be reverted to that older state.
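If it helps to put rough numbers on that, here is the same scenario as a toy Python sketch. Again, this is not Seafile’s real protocol or block format (real Seafile is smarter about where it splits blocks, so I’ve cheated by only appending data): the snapshot copy costs almost nothing to sync, the edit costs roughly the 10MB of new data, and the server ends up storing about 110MB for 210MB worth of files.

import hashlib
import os

BLOCK_SIZE = 1 * 1024 * 1024          # arbitrary size for the sketch
MB = 1024 * 1024
server_blocks = {}                    # block_id -> block data, each unique block stored once

def block_list(data):
    """The small 'file list' the agent sends: just the ordered block ids."""
    return [hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def sync(data):
    """Upload only the blocks the server doesn't already have; return bytes transferred."""
    sent = 0
    for i, block_id in zip(range(0, len(data), BLOCK_SIZE), block_list(data)):
        if block_id not in server_blocks:
            block = data[i:i + BLOCK_SIZE]
            server_blocks[block_id] = block
            sent += len(block)
    return sent

project = os.urandom(100 * MB)                 # the original 100MB project file
print(sync(project) // MB)                     # first sync of the project: ~100 (MB sent)
print(sync(project) // MB)                     # the "project_snapshot" copy: 0, only a new block list
edited = project + os.urandom(10 * MB)         # the edit adds 10MB of new content at the end
print(sync(edited) // MB)                      # syncing the edit: ~10, only the new blocks
print(sum(len(b) for b in server_blocks.values()) // MB)   # the server stores ~110, not 210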

To be honest, that was a really good explanation. I will test the go server. I also think Seafile will be good for projects like your example, but not for uploading big unique files. You said that Nextcloud managed to saturate your internet, so I will continue trying Nextcloud. Nextcloud and FileBrowser are better with unique files, but Seafile is better for syncing and editing, not only for speed but for minimizing storage use. I know I have been asking a lot of questions, but I have a few more quickly: how did you manage those speeds in Nextcloud, did you use AIO (All-in-One), and did you use the community version? Thank you so much. And just to clarify, “mega manager” was a spelling mistake, it’s just Mega; mega.io is a cloud service like Google Drive, but for some reason I manage to get like 130 MB/s download speeds on unique downloads even though my internet is only about 60.

Thank you for that. I do sometimes worry with my longer posts that it’s just an unwelcome level of detail. 🙂

I think you are right about the comparative strengths and weaknesses between seafile and nextcloud.

For my Nextcloud install, I tried a few times to set it up. It was actually while troubleshooting some problems with the AIO version that I decided I don’t like docker. I know I’m in the minority there, but docker made troubleshooting the problem significantly harder. Yeah, I know, “old man yells at cloud” vibes. Anyway, I already had the environment for running VMs, so I settled on just making a new Ubuntu server VM and installing using the official manual install steps in
https://docs.nextcloud.com/server/stable/admin_manual/installation/source_installation.html
It took a while to go through all the steps, but has worked pretty well since then.

Thank you for the chat and answers!