Hi, I’m new to Seafile and to file servers in general.
After experimenting with Seafile, I found it suits our office’s needs. We’re a small office with 3 branches and around 15 employees in total, roughly 4-5 per branch.
Every employee will get their own Seafile account, and the admin will share files with them according to their needs.
The file server hosting Seafile will be located in our main office. So my main question is how to build it and which best practices to follow when maintaining it.
For the hardware, I will use an 11th-gen i5, 4-8 GB of RAM, 3 x 1 TB HDDs (2 in RAID, so the usable capacity will be only 1 TB) and 1 SSD for the OS (Ubuntu 20.04). Seafile itself will be deployed via Docker.
My questions are:
- What are the best practices for the RAID and for deploying Seafile?
- Should I install it on Ubuntu 20.04, or is it better to use something like TrueNAS?
- How should I configure the RAID to suit Seafile?
- What backup approach do you suggest?
In my opinion:
- RAID is up to you. Given the small amount of data (1 TB), I would suggest either RAID 1 (mirror) or getting another HDD and going for RAID 10/1+0 (a stripe of mirrors). With modern interfaces and drives, and given the small data-set, either option should give you satisfactory performance. In this type of setup, network speed will limit you more than drive speed.
- I always prefer to use a simple minimal Linux installation for this type of thing unless you have other interoperability considerations such as needing to provide separate shares, SMB/Samba, etc. I like and use Debian personally, but Ubuntu is a fine choice too. Since you are running Seafile via Docker, you can just pick the base OS that you are most comfortable working on. Depending on your Linux experience level, I’d stick to the more popular distros if you think you may need help. The Debian/Ubuntu family lineage is probably the best choice for that reason.
- I would recommend either a hardware RAID (expensive but super-awesome) or a pseudo-hardware (aka still actually software) RAID via your motherboard/BIOS or a PCIe expansion card. You can certainly use software RAID in Linux, but I find it easier if the OS sees your RAID array as a single drive because the actual RAID is handled upstream. I know many people will point out that keeping software RAID at the OS level, with drivers etc., allows for more portability, but the hassle for such a small data-set doesn’t seem worth it IMHO. The major caveat is if you plan on upgrading your motherboard before your physical drives: in that case, you should opt for a dedicated PCIe card, since you can move it into the new system along with your existing drives and everything is good to go. If you are using a motherboard solution, there is no guarantee the array would be compatible with the new board, even from the same manufacturer.
- I wrote a script that handles my backups using borgbackup to an offsite SSH/rsync-compatible service (rsync.net). Borg in general is fantastic: secure, robust and fast, with excellent versioning. I strongly recommend offsite backups to something like Backblaze B2, Amazon S3 or rsync.net. Coupled with something like Borg, your data is easily stored off-site and encrypted for security. If you choose to go the local backup route, have at least 2 drives to rotate your external backups and make sure one is always kept off-site. Honestly, cloud backups are much easier. With Backblaze, your monthly cost would be about $5 for 1 TB, and only slightly more with something like rsync.net. Either way, it’s not a big cost for data security. Also, you only pay for what you use, so if in the beginning you’re only using, say, 100 GB, your costs would be a tenth of that.
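The pay-as-you-go arithmetic above, as a quick Python sketch. It assumes the roughly $5 per TB per month figure quoted for B2 and simple linear per-GB billing (1 TB = 1000 GB); real prices change and download/transaction fees are ignored, so treat the constant as a placeholder.

```python
# Rough monthly cost of pay-as-you-go cloud backup storage.
# Assumption: the ~$5 per TB per month figure quoted above, billed linearly
# per GB with 1 TB = 1000 GB. Check the provider's current price list.

PRICE_PER_TB_MONTH = 5.00  # USD; placeholder taken from the post, not live pricing


def monthly_storage_cost(stored_gb: float, price_per_tb: float = PRICE_PER_TB_MONTH) -> float:
    """Return the monthly storage cost in USD for `stored_gb` gigabytes stored."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb / 1000 * price_per_tb


if __name__ == "__main__":
    for gb in (100, 500, 1000):
        print(f"{gb:>5} GB -> ${monthly_storage_cost(gb):.2f}/month")
```

Run as-is, it prints $0.50, $2.50 and $5.00 per month for 100 GB, 500 GB and 1 TB, which is the "a tenth of that" point in numbers.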
Hope that helps? I’m sure others will disagree but that’s the point – hopefully there is discussion and you get some ideas that will help you decide on the best options. Feel free to reply if you have any questions about what I suggested or want more information.
As a side note: you may want to look into storing your data directly in the cloud using Backblaze B2 or S3. You get redundancy via their network, automatic backups (though you should still do your own backups to a second service) and unlimited storage growth for a pretty low fee. Even with heavy downloading, a data-set of that size will not incur much in fees, so it could end up being cheaper than buying the physical drives for the first 2 years. Plus, it’s not hard to set up an NGINX cache so that your most commonly used files are available locally and you side-step most of the download fees, reducing costs even further. Just a thought…
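To sanity-check the "cheaper than buying drives for the first couple of years" idea with your own numbers, here is a tiny break-even calculation. The drive price and per-TB rate in the example are hypothetical placeholders; it deliberately ignores download fees, power and data growth, so it is only a first approximation.

```python
import math


def break_even_months(drive_cost: float, stored_tb: float, price_per_tb_month: float) -> int:
    """Return the first whole month in which cumulative cloud storage cost
    reaches the up-front cost of buying drives.

    Simplified on purpose: constant data size, storage fees only (no download
    or transaction fees), and no power or replacement costs for the drives.
    """
    monthly = stored_tb * price_per_tb_month
    if monthly <= 0:
        raise ValueError("monthly cloud cost must be positive")
    return math.ceil(drive_cost / monthly)


if __name__ == "__main__":
    # Hypothetical inputs: $150 of drives vs 1 TB stored at $5/TB/month.
    print(break_even_months(drive_cost=150, stored_tb=1, price_per_tb_month=5))
```

With those made-up inputs it prints 30 (months). If your data grows over time, the real break-even comes sooner than this constant-size model suggests.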
Best of luck! I’d love to hear what you end up deciding
Thanks for taking the time to reply to my question, I really appreciate it.
About Backblaze: it’s only used for storing data, right?
Can I install Seafile on Backblaze, or is it purely for data storage?
Hi, thanks for taking the effort to reply to my questions.
- Yes, it will have a public IP, and Let’s Encrypt is already set up
- I think I will use software-based RAID, since RAID cards are quite hard to obtain in my region
- This is the first time I’ve heard of JBOD; I’ll research it vs RAID further later, thanks
Well then, you already have JBOD. It just means “Just a Bunch Of Disks”: it’s the no-RAID mode of a disk controller card. RAID is then set up in software, as you plan, or the disks are used for ZFS.
To answer a question above: Backblaze is only for storage, and you can’t install Seafile on it.
When you get started, post here if you have more questions.
If you want to practice installing, you can use cloud.oracle.com.
You have to sign up, but they have “Always Free” resources, which include two VMs and 200 GB of storage at no charge. You could actually use this for backups if it’s enough space.
As already stated, yes Backblaze B2 is only for data. You cannot ‘install’ anything to it. B2 is Backblaze’s competing product for Amazon’s S3. Note, however, that you will have to use Minio as a bridge between Seafile and B2 and there are caveats with that approach. While B2 is much cheaper, it may be better (given your small data-set size) to just bite the bullet and use S3 since it is directly supported by Seafile without needing any intermediaries. If you go the B2 route, feel free to message here or contact me directly and I can help you with setting up the correct Minio container for B2. I made a few posts about this in the forum last year so you can search for those too.
Regarding the whole JBOD vs RAID thing: JBOD means you are just using a bunch of disks with no redundancy. That is fine if you have solid backups, but any disk failure means you have to replace that disk and restore from backups. You do not have the option to keep running ‘degraded’ like with RAID. If you use something like S3 or B2, this is entirely irrelevant, since you are not running any ‘disks’ but are using the cloud instead. Does that make sense?
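To put numbers on the JBOD vs RAID trade-off with 1 TB drives, here is a small model of usable space and how many disk failures each layout is guaranteed to survive. It assumes identical disks and ignores filesystem overhead and TB-vs-TiB differences. RAID 10 can sometimes survive a second failure if it hits the other mirror pair, but that is luck rather than a guarantee, so the model counts it as 1.

```python
from typing import Tuple


def layout(kind: str, disks: int, disk_tb: float) -> Tuple[float, int]:
    """Return (usable_tb, guaranteed_disk_failures_survived) for a layout."""
    if kind == "jbod":
        # Disks used on their own (or concatenated): full space, no redundancy.
        return disks * disk_tb, 0
    if kind == "raid1":
        # Every disk holds a full copy, so all but one can fail.
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb, disks - 1
    if kind == "raid10":
        # Stripe over mirrored pairs: half the raw space, one failure always survivable.
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 here needs an even number of disks, at least 4")
        return disks / 2 * disk_tb, 1
    raise ValueError(f"unknown layout: {kind}")


if __name__ == "__main__":
    # 1 TB drives: 3 as plain disks, 2 mirrored, or 4 in RAID 10.
    for kind, n in (("jbod", 3), ("raid1", 2), ("raid10", 4)):
        usable, survives = layout(kind, n, 1.0)
        print(f"{kind:>6} x{n}: {usable:.1f} TB usable, survives {survives} disk failure(s)")
```

That works out to 3.0 TB with no redundancy for JBOD, 1.0 TB surviving one failure for a 2-disk mirror, and 2.0 TB surviving one guaranteed failure for 4 disks in RAID 10.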
Meaning no offence, I would advise against JBOD as an operating mode. I’m not necessarily advocating RAID; I’m just saying that if you are going the non-RAID route, simply connect your disks normally and use regular SATA AHCI mode. Switching the controller to JBOD means you are using its software RAID but asking it not to do RAID. In that case, why use the software RAID at all? Just let the native SATA controller do its job directly. Faster, easier and no vendor lock-in.
Regarding ZFS: this is like the ‘gold standard’ of data storage. It’s awesome. It’s also expensive and hard to set up unless you know what you are doing. You really want ECC RAM and a server-class motherboard; don’t try to do it on a ‘consumer’ or ‘gaming’ motherboard. That greatly increases your expense for no real benefit in your situation, IMHO. If you have the money and experience, then totally do ZFS! It’s amazing and I will never knock it. If you don’t have the money and experience, you will likely screw it up and regret this life choice. I’m not knocking the suggestion, just saying that for a first deployment you may want to err on the side of simplicity. Down the road, as your needs expand and your experience grows, you can consider ZFS implementations. That’s just my opinion, however.
As @gogofc mentioned, use a regular Linux distro vs TrueNAS for exactly the points he brought up, he’s spot-on.
Keep posting here and asking questions. Taking the time to plan a little before deployment will save you hours of headache and heartache.
How’s MinIO? I’ve always wanted to try it but haven’t yet.
I understand it’s bucket storage, but how do you use it to back up Seafile data? Do you export libraries to the filesystem, or something else? I might try to find the post you mentioned later. I don’t need it now; it’s just interesting for when I do.
@gogofc Minio is only required if you’re using something like B2 since Seafile does not natively support it (the S3 backend will not work properly with B2 for several reasons that are Backblaze’s fault). Minio is used in gateway mode for this use-case so it works a little differently than the standard deployment where it actually provides the object storage backend.
Basically, Seafile is set up to use S3 but pointed at Minio. Minio is the gateway to B2 and translates everything to make it all work; think of it as a kind of router in your network. Minio also provides a nice optional web interface if you want to see your blocks on the remote storage (helpful for troubleshooting).
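Roughly what "Seafile set up to use S3 but pointed at Minio" looks like in seafile.conf, generated with Python’s configparser so the three storage sections stay consistent. The section and option names follow Seafile’s documentation for S3-compatible storage as I remember it, so verify them against the docs for your Seafile version and edition. The MinIO host, bucket names and keys are hypothetical placeholders; use whatever access/secret keys your MinIO gateway actually accepts.

```python
import configparser
import sys

MINIO_HOST = "minio:9000"  # hypothetical: MinIO gateway container name and port
BUCKETS = {  # hypothetical bucket names, one per Seafile object type
    "commit_object_backend": "seafile-commits",
    "fs_object_backend": "seafile-fs",
    "block_backend": "seafile-blocks",
}


def s3_sections(key_id: str, key: str) -> configparser.ConfigParser:
    """Build the S3 storage sections of seafile.conf, all pointing at MinIO."""
    # interpolation=None so secrets containing '%' are stored verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    for section, bucket in BUCKETS.items():
        cfg[section] = {
            "name": "s3",
            "bucket": bucket,
            "key_id": key_id,
            "key": key,
            "host": MINIO_HOST,            # talk to MinIO, not to B2 directly
            "path_style_request": "true",  # bucket name in the URL path, not the hostname
            "use_v4_signature": "true",
            "use_https": "false",          # assumption: plain HTTP on an internal Docker network
        }
    return cfg


if __name__ == "__main__":
    # Placeholder credentials; never commit real keys.
    s3_sections("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY").write(sys.stdout)
```

The printed sections would then be merged into your existing seafile.conf. Keeping MinIO on the internal Docker network is why HTTPS is switched off in this sketch; if Seafile reaches MinIO over an untrusted network, turn it on.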
If using Minio with B2, you have to use an old version (RELEASE.2020-12-16T05-05-17Z), since that is the latest one compatible with Backblaze’s strange S3 implementation. If you’re using memcached with Seafile (highly recommended), Minio’s gateway cache also causes problems. So I turn it off and use NGINX’s proxy cache functionality instead. That way I can cache, say, 50 GB of the most recently accessed data and drastically reduce Backblaze’s already low fees.
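Why a local cache in front of B2 cuts the bill, as a toy model: a least-recently-used cache with a byte budget (think of the 50 GB above, scaled down), where only cache misses are fetched from, and billed by, remote storage. This is not how NGINX’s proxy_cache works internally (its cache manager evicts based on the `max_size` and `inactive` settings of `proxy_cache_path`); it is only meant to show the idea.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


class LruByteCache:
    """Simplified LRU cache keyed by block ID, limited by total bytes."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # block id -> size in bytes, oldest first

    def access(self, block_id: str, size: int) -> bool:
        """Record a read; return True on a cache hit, False if fetched remotely."""
        if block_id in self.items:
            self.items.move_to_end(block_id)  # mark as most recently used
            return True
        if size > self.capacity:
            return False  # too big to cache; always fetched remotely
        # Evict least recently used blocks until the new one fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)
            self.used -= evicted_size
        self.items[block_id] = size
        self.used += size
        return False


def remote_bytes(cache: LruByteCache, reads: Iterable[Tuple[str, int]]) -> int:
    """Total bytes that had to come from remote storage (the billable part)."""
    return sum(size for block_id, size in reads if not cache.access(block_id, size))


if __name__ == "__main__":
    reads = [("a", 40), ("b", 40), ("a", 40), ("a", 40), ("b", 40)]
    print(remote_bytes(LruByteCache(capacity_bytes=100), reads))
```

Here 5 reads totalling 200 bytes only cost 80 bytes of remote downloads, because the repeated reads of "a" and "b" are served locally. Real savings depend entirely on how often your users reopen the same files.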
Hopefully that helped?