Upload and download speed

Ok, so recently I have been trying out different self-hostable cloud storage providers, including Nextcloud, Filebrowser and Pydio Cells. All of them have worked, but not quite in the way I want. Speed is a big thing for me, and nearly all of the cloud storages I have tested have not been slow exactly, but they don't utilize my gigabit internet. Where my server is hosted I have gigabit internet, though I'm not testing from there; I'm in another place with another computer where I have 500 Mbps, so I'm hoping to get at least 50 MB/s. Nextcloud gave me 18-20 MB/s upload, but download was good. Pydio gave me the same. Then I found Filebrowser: 50-60 MB/s up and down, wonderful. But I think I want to migrate to Seafile because of the looks and the bigger community with more support. On Seafile I'm getting 20-30 MB/s up, and download is weird because it varies a lot: most of the time it is the same as upload, but toward the end it speeds up, though by then it's too late to see the full speed.
Here is the docker compose file:

services:
  db:
    image: mariadb:10.11
    container_name: seafile-mysql
    environment:
      - MYSQL_ROOT_PASSWORD=password  
      - MYSQL_LOG_CONSOLE=true
      - MARIADB_AUTO_UPGRADE=1
    volumes:
      - /opt/seafile-mysql/db:/var/lib/mysql  
    networks:
      - seafile-net

  memcached:
    image: memcached:1.6.18
    container_name: seafile-memcached
    entrypoint: memcached -m 256
    networks:
      - seafile-net

  seafile:
    image: seafileltd/seafile-mc:11.0-latest
    container_name: seafile
    ports:
      - "80:80"
    volumes:
      - /mnt/SSD/General:/shared   
    environment:
      - DB_HOST=db
      - DB_ROOT_PASSWD=password
      - TIME_ZONE=timezone
      - SEAFILE_ADMIN_EMAIL=mail
      - SEAFILE_ADMIN_PASSWORD=password   
      - SEAFILE_SERVER_LETSENCRYPT=false  
      - SEAFILE_SERVER_HOSTNAME=127.0.0.1
    depends_on:
      - db
      - memcached
    networks:
      - seafile-net

networks:
  seafile-net:

And here are my Nginx Proxy Manager settings:

client_body_buffer_size 512k;
client_max_body_size 100G;
proxy_buffer_size 512k;
proxy_buffers 16 512k;
proxy_busy_buffers_size 512k;

client_body_timeout 120s;
client_header_timeout 120s;
keepalive_timeout 120s;
send_timeout 120s;

proxy_connect_timeout 120s;
proxy_send_timeout 120s;
proxy_read_timeout 120s;

proxy_request_buffering off;

gzip on; 
gzip_comp_level 4;
gzip_min_length 256;
gzip_proxied expired no-cache no-store private no_last_modified no_etag auth;
gzip_types application/atom+xml text/javascript application/javascript application/json application/ld+json application/manifest+json application/rss+xml application/vnd.geo+json application/vnd.ms-fontobject application/wasm application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/bmp image/svg+xml image/x-icon text/cache-manifest text/css text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/x-cross-domain-policy;

There are several things with seafile that can affect the upload and download speed and make it inconsistent. Mostly this comes down to the data being de-duplicated. The tl;dr:

  • New data is slower (chunks have to be added to the storage), already seen data is faster (chunks are already in the storage, so only new references to that existing data are needed)
  • The sync clients tend to be faster at upload than the web interface, because the dividing up into chunks happens in the client. They are especially faster with duplicate data since only new blocks have to be transferred.
  • The sync clients can be faster to download too. They keep a cache of chunks so downloading can sometimes reuse chunks the client already has instead of sending them over the network again.
  • The server can need more CPU and RAM than systems that don’t do de-duplication because there’s extra processing needed (I found 2 CPUs and 2GB of RAM was enough most of the time, but stepped up to 4 of each for those times where it helps).
  • Using HDDs is slower than SSDs. You normally would expect that, but it’s even worse with seafile because when reading or writing a file the drive needs to jump around more. For example to download a 1GB file, the seafile server will actually need to read hundreds of files (I think the chunks tend to average about 1MB each). And when you consider some chunks might be older than others, it’s easy to see why they wouldn’t always be next to each other on the disk.

All that means that if you upload the same file to test speeds 3 times, it will be slower the first time than it will be for the next two times. Especially when doing this through the sync client since the 2nd and 3rd time could even seem to be faster than your network connection would allow, because it can just refer to blocks that the server already has without needing to actually transfer them again. This also means that any test won’t represent the real-world speeds because it’s hard to predict how much new vs existing data to use.
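To make that concrete, here is a tiny sketch of the general idea behind a de-duplicating block store. The block size, hashing, and names are just my own illustration, not seafile's actual code or on-disk format:

import hashlib
import os

BLOCK_SIZE = 8 * 1024 * 1024   # illustrative only, not necessarily seafile's real block size

class BlockStore:
    """Each unique block is stored once, keyed by a hash of its contents."""
    def __init__(self):
        self.blocks = {}                      # hash -> block bytes

    def put_file(self, data: bytes) -> list[str]:
        block_list, new_bytes = [], 0
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha1(block).hexdigest()
            if h not in self.blocks:          # only blocks we haven't seen cost a write
                self.blocks[h] = block
                new_bytes += len(block)
            block_list.append(h)
        print(f"wrote {new_bytes} new bytes for a {len(data)} byte file")
        return block_list                     # a repeat upload only adds this small list

store = BlockStore()
data = os.urandom(32 * 1024 * 1024)
store.put_file(data)   # first upload: every block is new, all 32MB gets written
store.put_file(data)   # second and third uploads of the same file: nothing new to write
store.put_file(data)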

And here’s some info on my setup for comparison. I don’t use docker; my seafile is running in a VM that shares SSD storage with several other VMs. Uploading 1.5GB through the web interface got me about 44MB/s just now. This included the time the server spent dividing the temp file up into chunks and storing them after the upload was done. Watching performance monitors, it is clear that the bottleneck was the storage: it was writing the temp file, and then the chunks, as fast as it could. Downloading that file was about 90MB/s. Uploading that file again got about 70MB/s, I believe because when dividing it up into chunks, no new blocks had to be written.


Thank you for the answer. My setup is currently an external USB storage drive from Seagate (USB 3.1, connected over USB 3), about a terabyte, with 2 GB of RAM and 2 cores of an Intel 8250U. From htop there doesn’t seem to be much of a spike in either RAM or CPU. My internet speed at the server’s location is a gigabit, and where I’m uploading and downloading from it is half of that. Could you explain a little more about re-uploading chunks and how it could be faster to upload the same file multiple times? Why would a person do that? So if you could please explain point 1, thank you. And is there a way to increase the chunk size? In Filebrowser (another self-hosted cloud) I found that setting the chunk size to about 900 MB has given me the best speeds: on every single upload and download I get about 60 MB/s both ways.
And how is it that, for example, mega manager manages to nearly double my internet download speed, and how may I replicate this?

I just realized I wrote a lot of text here. The short version is: I don’t know what you can do to make your setup faster. Maybe switch to the go file server if you aren’t already using it (see seafile.conf - Seafile Admin Manual), or switch your proxy to using HTTP/2 if you aren’t already using it. But in real-world file syncing operation I think you will find it is already fast enough to work just fine if you just don’t worry about it.
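For reference, as far as I remember enabling the go file server is just a couple of lines in seafile.conf, roughly like below, followed by a restart. Double-check the admin manual for your version, since I'm going from memory:

[fileserver]
use_go_fileserver = true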

I’ll take these questions a little out of order, but first I should note that I was wrong about the block size: it’s about 8MB, not 1MB like I said, and I don’t know of a way to change it. When looking up that number, I also found that I remembered the name wrong; Seafile calls them blocks, not chunks.

I don’t really know anything about mega manager, so I can’t answer that one.

I don’t think seafile’s storage blocks are comparable to the chunking in filebrowser. I think filebrowser’s chunks are just breaking the transfer into smaller transfers (chunks) to make it possible to resume an interrupted transfer without needing to start over from the beginning. But these chunks get merged on the other end and stored as one complete file, like the source file. So maybe the larger chunks are faster because they avoid needing to “warm up” the TCP congestion control as often. I don’t really know.
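Just to illustrate what I mean by the chunks getting merged back into one plain file, here is a generic sketch. This is not filebrowser's actual code, and the chunk naming scheme is made up:

import os

def merge_chunks(chunk_dir: str, out_path: str) -> None:
    """Reassemble numbered chunk files (0.part, 1.part, ...) into one complete file."""
    chunks = sorted(os.listdir(chunk_dir), key=lambda name: int(name.split(".")[0]))
    with open(out_path, "wb") as out:
        for name in chunks:
            with open(os.path.join(chunk_dir, name), "rb") as part:
                out.write(part.read())
    # an interrupted upload only has to resend whichever numbered chunks are still missing;
    # once they are all present they get merged and stored as one normal file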

And to clear up point 1 about new data vs existing data: new data is slower because every block has to be written to disk, while duplicate data is already on the disk and doesn’t need to be written again. The speed testing example I gave before isn’t something you are likely to do a lot, so let’s consider a more real-world usage example. Suppose you have a project with a 100MB file in the directory you are syncing with seafile through the agent. It could be anything: a program you are writing, a video you are editing, or a giant powerpoint to convince your boss you are so good at powerpoint that you should get a raise. To make the example easy, it’s just one single 100MB file. You want to make some changes, but you aren’t sure if the change will work out or just make a mess of the project, so you first make a copy of the project into the file “project_snapshot”.

The agent sees 100MB of new files and begins dividing them into blocks to stick in the block cache. When done it has some small data that gives the filename of the new file and the list of blocks that make up the file’s contents, but there aren’t any new blocks. The agent sends that small file listing the new filename and its blocks to the server. In this case the agent only sent a few KB over the network, but the result is a new 100MB file on the seafile server (but not on the server disk, which only stores the few KB).

Now you make your edit, adding 10MB more to the project, and save. The agent again goes through the file, breaking it out into the block cache and making the file that lists the file’s new list of blocks. This time the agent sends that to the server, and the server says it doesn’t already have these 10MB of new blocks, so the agent sends them. This time a transfer of 10MB, plus a few KB for the file that describes the list of blocks, gets all 110MB of the file up to the server.

You make one more change, and decide you don’t like that change. But you forgot to copy the file to another “project_snapshot” file. No problem: you go to the seafile web interface, find the file, look at the history, and restore the previous version. The server sends the block list for that version to the agent, and the agent finds it still has all the needed blocks in its block cache, so it reassembles the older version of the file from those blocks without needing to download a new copy of the file.
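If it helps, here is a rough sketch of the shape of that exchange between the agent and the server. The names and the “protocol” are made up for illustration; this is not seafile's real sync protocol:

import hashlib
import os

BLOCK_SIZE = 8 * 1024 * 1024                  # illustrative only

def split(data: bytes) -> dict[str, bytes]:
    """Map block hash -> block bytes for one file."""
    return {hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest(): data[i:i + BLOCK_SIZE]
            for i in range(0, len(data), BLOCK_SIZE)}

server_blocks: set[str] = set()               # block hashes the server already has

def sync(name: str, data: bytes) -> None:
    blocks = split(data)
    missing = [h for h in blocks if h not in server_blocks]   # the server reports what it lacks
    sent = sum(len(blocks[h]) for h in missing)               # only those blocks cross the network
    server_blocks.update(missing)
    print(f"{name}: sent {sent // (1024 * 1024)} MB of blocks plus a small block list "
          f"for a {len(data) // (1024 * 1024)} MB file")

project = os.urandom(100 * 1024 * 1024)
sync("project", project)                                  # every block is new: ~100MB transferred
sync("project_snapshot", project)                         # nothing missing: only the block list
sync("project", project + os.urandom(10 * 1024 * 1024))   # only the changed tail blocks are sent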

In this scenario other programs I’ve used for syncing files, like nextcloud, would have uploaded another 100MB for the “project_snapshot” file, and then another 110MB for the edited file. In my case nextcloud can saturate my 1Gb/s network connection with its uploads and downloads, apparently because it isn’t doing as much processing on the files, but by avoiding unneeded transfers the seafile agent can sync these edits much faster.

In both cases, a little CPU time has been exchanged for not needing to transfer and store as much data. This storage model makes the trade-off of having uploads and downloads through the web interface be slower, and of taking extra CPU time (and some disk space for the block cache) through the agent, with the result that our example 210MB of files takes only 110MB of space on the server and less time to transfer (especially over slower network connections like coffee shop wifi). It also enables quick access to older versions of files without saving a second copy, by just keeping the older block list so the file can quickly be reverted to that older state.


To be honest that was a really good explanation, I will test the go server. I also think Seafile will be good for projects like your example, but not for stuff like uploading big unique files. You said that Nextcloud managed to saturate your internet, so I will continue trying Nextcloud. Nextcloud and Filebrowser are better with unique files, but Seafile is better for syncing edits, not only for speed but for minimizing storage use. I know I have been asking a lot of questions, but I have a few more quick ones: how did you manage those speeds in Nextcloud, did you use AIO (All-in-One), and did you use the community version? Thank you so much. And just to clarify, mega manager was a spelling mistake, it’s just Mega; mega.io is a cloud service like Google Drive, but for some reason I manage to get like 130 MB/s download speeds on unique downloads even though my internet is only about 60.

I also want to quickly note I’m still using docker compose; a normal install has the same speed, as I have tested. And I’m on the same network as the server, just not on cable like the server is, hence the speed difference.

Thank you for that. I do sometimes worry with my longer posts that it’s just an unwelcome level of detail. :slight_smile:

I think you are right about the comparative strengths and weaknesses between seafile and nextcloud.

For my nextcloud install I tried a few times to set it up. It was actually while troubleshooting some problems with the AIO version that I decided that I don’t like docker. I know I’m in the minority there, but docker made troubleshooting the problem significantly harder. Yeah, I know, “old man yells at cloud” vibes. Anyway, I already had the environment for running VMs, so I settled on just making a new ubuntu server VM and installing using the official manual install steps in
https://docs.nextcloud.com/server/stable/admin_manual/installation/source_installation.html
It took a while to go through all the steps, but has worked pretty well since then.

Thank you for the chat and answers!

Tomservo, I know it’s been some time, but I kind of need to know how you managed to make Nextcloud saturate gigabit internet. I simply can’t get it to work. Filebrowser works and is fast, but for some reason Nextcloud won’t. I’m using Nextcloud with MariaDB and Redis; the memory limit is 2 GB and the max file size is plenty. I have dabbled with chunk sizes in Nextcloud and that does not seem to speed things up much, although it could if it were faster. When the chunk size is 100 MB and I’m uploading, in bpytop it isn’t downloading anything; then after some time it suddenly spikes to nearly over 100 MiB and that chunk gets loaded on the progress bar, then it stops again and continues until the file is uploaded. I have gathered more experience but still can’t get Owncloud, Seafile or Pydio to upload fast enough. I know this probably isn’t the right page to ask for this help, but since we already talked I figured I’d just ask. Thank you. Also note that using a download manager is super fast, nearly 70 MB/s, and it would be faster if the PC I’m downloading from had better internet. Also, hardware does not seem to be a bottleneck; download seems stable even if I change settings. Thank you.

Sorry for the delay in answering. I don’t think I did anything special to make nextcloud go faster, I just installed it. I haven’t actually messed with nextcloud much, I used it briefly for a couple of the add-on packages, but we settled on other solutions for those functions. I mostly keep it around now as a backup in case I have problems with these other programs.

My nextcloud server is a VM on my server. The VM has 3 virtual CPUs and 4GB of RAM. Since it is virtual, I tried uploading a file from another VM, which should be faster because there’s no hardware network between them, but it wasn’t much faster. It seems that the network is close to the top speed of the VM anyway. I am surprised by how busy the CPU is on the nextcloud server when uploading a file to it. I don’t know enough of the insides of nextcloud to guess why that would be the case.

Thank you for answering, but I have found the culprit. It was something I had wondered about for a while but finally got to test: NAT loopback, or NAT hairpinning. When you are on the same network as the server, going through the domain slows things down significantly. When you use a domain while inside the network where the service is hosted, the traffic has to go out to the router and back in, which for some reason SIGNIFICANTLY slows things down, I mean by more than half. On the local network the traffic just goes through the router, not out onto the internet, so it’s much faster. Some routers have hairpinning as an option, but mine didn’t. I fixed it by using my Pi-hole (local DNS) and setting up a record that resolves my domain to the local IP when using that DNS, so all requests for domain.com automatically get forwarded to the IP of the Nextcloud instance. This worked for me.

Why a download manager is so much faster I still don’t know, but on every single cloud storage site it is; maybe it has something to do with how chunking is done. Nextcloud, from what I think I know, writes every chunk while it’s being uploaded: Nextcloud chunks uploads, so as each chunk gets uploaded it gets written to a temp folder, and can then be moved (if you have one) onto an external storage or another drive. That makes the CPU work all the time. When the upload is finished and it says it’s finishing the upload, it is writing from that temp folder to the place where the files will go. For that reason, in bpytop you can see not much writing to the external/other drive while uploading, but when the upload finishes the external/other drive gets maxed out, depending of course on the drive speed. If the drive is NVMe it can be a very quick finishing cycle, but the slower the drive, the slower the finishing will be. In my case I have an HDD with 180 MB/s read and write, so the finishing cycle takes a bit longer on bigger uploads. Sorry for just blabbing, but thank you for answering and have a pleasant day.
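In case someone else wants to copy this: I added the record through Pi-hole’s Local DNS page, which as far as I understand comes down to a dnsmasq entry like the one below (the IP is a placeholder, use your own server’s local address):

# answer queries for the public hostname with the server's LAN address
address=/domain.com/192.168.1.10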


Hey, that’s great that you found it, and especially great that you wrote it up here. On behalf of everyone who might find this thread in a search and learn from your experience, thank you for taking the time to write it up!

You are right, I do have my internal DNS configured to give the local IP for the server, and it really does help. I did that so long ago I basically forgot about it.

I suspect your download manager is doing some other tricks to download faster, like downloading parts in parallel. Like maybe it downloads the first and second half of the file simultaneously and then reassembles them into a single file once it has both parts. This is more work for both ends, but can be faster since one download can take up the extra bandwidth available if something causes the other to momentarily slow down.
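As a rough illustration of the idea, a segmented download looks something like this. It is just a sketch using Python’s standard library with a placeholder URL, not what any particular download manager actually does, and it assumes the server reports Content-Length and supports Range requests:

import concurrent.futures
import urllib.request

URL = "https://example.com/big-file.bin"   # placeholder URL
PARTS = 4

def content_length(url: str) -> int:
    # HEAD request to learn the file size
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

def fetch_range(url: str, start: int, end: int) -> bytes:
    # ask for just one slice of the file
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def segmented_download(url: str) -> bytes:
    size = content_length(url)
    step = size // PARTS
    ranges = [(i * step, size - 1 if i == PARTS - 1 else (i + 1) * step - 1)
              for i in range(PARTS)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=PARTS) as pool:
        parts = list(pool.map(lambda r: fetch_range(url, *r), ranges))
    return b"".join(parts)   # reassemble the slices in order

# data = segmented_download(URL)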