Storage Design and ZFS vs. BTRFS

Hey Folks,

I have been testing Seafile on a local VM for a while now, and I finally decided to buy some good hardware for my home cloud. While I wait for it to arrive, I wanted to ask a few questions.
I went for a 24/7 system with ECC RAM, an Intel M.2 NVMe drive for the OS and two 2 TB HDDs for the Seafile data. I have been reading a lot about ZFS and BTRFS over the last couple of days, and I’m still not sure which path to take.

Here are some facts:

  • OS: Ubuntu (it’s the OS I have got used to over the last few months, and that is not going to change)
  • ECC RAM might be overkill, but I decided to play it safe

After I install the OS on the M.2 drive, I would install ZFS or BTRFS, then plug in the two HDDs and mirror them.
Currently I’m leaning towards BTRFS, but there are so many opinions on both. Some say BTRFS is more native on Ubuntu. Others say ZFS is not stable yet. Others say neither is stable yet…

All I want is to mirror those two drives, mount them as one mirrored partition and put the Seafile data on it.
For BTRFS I read that you can mirror two drives but can only use one of them for access (e.g. for the Seafile data), so in case of a failure I would have to remount everything onto the good drive… that’s not what I’m looking for. I want the system to mount the mirrored partition and tell me when it’s degraded so that I can swap the faulty HDD.
Furthermore, I don’t want deduplication, snapshots or encryption. The only reason I think I need these fancy filesystems is their ability to detect faulty bits. Otherwise a normal RAID would be fine for me.

Since my MySQL database will live on the M.2 drive, which will not be part of the mirror, I cannot take snapshots that are consistent across both Seafile and MySQL. So I will keep doing my backup (as before) on my workstation and back up the data in plain text, which I prefer anyway. For me, Seafile is just a central point for keeping all my clients up to date. It’s not the only place I keep my data.
Please don’t tell me that Raid is not Backup. I think I have read this sentence 100 times over the last 10 Years…

I uploaded a picture of my situation:

Please let me know which you would use, BTRFS or ZFS, and what other ideas you have.

As I said, I currently have Seafile running in a VM on my workstation. That’s my baseline. (It’s not as clean as I would like it to be, but it gives me all the configs for the new system.) I’m planning to set up another VM on my workstation as a clean test Seafile server, and from that I will build the “production” Seafile server.
So when I want to try something I can use the old “dirty” play server; when regular updates come, I test them on the test server beforehand and then apply them to the real server. It’s a bit of work, but that’s how I would do things at work, so I do the same at home. In the end I might delete the play server, but I will keep the test server in any case, because it will also be set up with ZFS or BTRFS.

There are guides online for installing the entire OS on a ZFS or BTRFS mirror, but they are so f***** complicated and bring too much complexity. I think I will be fine with a good unmirrored Intel drive for the OS.
I will make sure to back up the essential config files from the OS anyway, and maybe change my backup solution later (e.g. connect the USB drive to Seafile), but that’s how I’m gonna roll for the time being.

Any questions, opinions or improvements?
Let me know.
And many thanks for the time you have spent helping so far :slight_smile:

Cheers
Michael

Btrfs and ZFS are pretty easy to set up for non-root disks. Btrfs seems more suitable for what you want to achieve and is stable for mirrored disks. Parity RAID (RAID 5/6), however, is currently not stable with Btrfs.
Where did you read about the disk mounting if one disk fails?
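For reference, with Btrfs RAID1 the two drives form a single filesystem that is mounted once and used normally, not drive by drive. Below is a minimal sketch of that setup. The device names and mount point are placeholders, and the `run` helper and `DRY_RUN` switch are only there so the commands can be previewed safely. If one device is missing at mount time, the filesystem has to be mounted with `-o degraded`.

```shell
#!/bin/sh
# Sketch only: device names and mount point are placeholders.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Create a two-disk Btrfs RAID1 and mount it as a single filesystem.
btrfs_mirror() {
    run mkfs.btrfs -d raid1 -m raid1 "$1" "$2"   # mirror data (-d) and metadata (-m)
    run mount "$1" "$3"                          # either member device mounts the whole filesystem
    run btrfs device stats "$3"                  # per-device error counters
}

# Mount the surviving disk when the other one is missing.
btrfs_mount_degraded() {
    run mount -o degraded "$1" "$2"
}
```

Calling `btrfs_mirror /dev/sdX /dev/sdY /mnt/seafile-data` prints the three commands; setting `DRY_RUN=0` (as root) would actually run them and wipe both disks, so check the device names first.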

I’ve had very good experience with ZFS. I started with a mirror and later switched to a RAID-Z2 with 4 disks (that’s basically a RAID 6; the switch involved moving the data away and back). My setup has been running stably for almost 4 years now. One time a disk failed: I ordered a new one, inserted it, told ZFS to use it, and ZFS did the rest. A pool can still be used while it is degraded. What I like most is the ease of use (I think Btrfs is not that different here). To check data consistency, one runs a scrub (zpool scrub <pool>). I do that once a month.
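The disk swap and the regular scrub mentioned above come down to a handful of commands. Here is a rough sketch, assuming placeholder pool and disk names. The function names, the `run` helper and the `DRY_RUN` switch are made up for illustration, so the commands are only printed by default.

```shell
#!/bin/sh
# Sketch only: pool and disk names are placeholders.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Swap a failed disk for a new one; ZFS then resilvers on its own.
replace_disk() {
    run zpool replace "$1" "$2" "$3"   # pool, old disk, new disk
    run zpool status "$1"              # shows resilver progress
}

# Read all data in the pool and verify it against its checksums.
scrub_pool() {
    run zpool scrub "$1"
    run zpool status "$1"              # shows scrub progress and any errors found
}
```

For a monthly check, `scrub_pool` could be called from a cron job with `DRY_RUN=0`; the pool stays usable while the scrub runs.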

Although this is zfsonlinux, so far I haven’t had any issues using the Oracle documentation directly (e.g. https://docs.oracle.com/cd/E53394_01/html/E54801/gaypw.html#scrolltoc).

I found this guide for installing ZFS on root:

Even though I only have one OS drive, it might be interesting, because I should then be able to snapshot the OS (incl. database) and the other ZFS pool with the two HDDs (incl. Seafile data). If my theory is correct, I would be able to take a consistent snapshot of the whole system.
I’m not sure whether a root pool can consist of just one drive, but the guide doesn’t seem to say that more are mandatory.

I will check it out when I have the hardware. I don’t think it’s possible to evaluate everything beforehand…

Still open for ideas and comments.

Kind regards,
Michael

Personally, I use ext4 on my SSD. To guard against data loss, I sync the SSD to my ZFS pool every night with rsync. Additionally, I have database backups.

@shoeper: so you have the OS installed on the SSD and the Seafile data on a separate ZFS RAID pool?
I assume you have to shut down the Seafile services when you back up, and then you can back up the OS (incl. database) and the RAID pool (incl. Seafile data). From my understanding it’s best when no changes happen while both the DB and the Seafile data are backed up. But I assume this is clear to all of you :slight_smile: Thanks for the input.
Shoeper, just for clarification, you have:

  • 1 x SSD with the OS and database on a normal filesystem (ext4)
  • 2 or more HDDs in a ZFS RAID pool with the Seafile data folder

Is that right?

yes

This is not needed as long as a database dump is done before backing up the data.
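The order matters here: dump the databases first, then copy the data directory. A minimal sketch of that sequence follows, assuming the default database names created by the Seafile setup script (`ccnet-db`, `seafile-db`, `seahub-db`), MySQL credentials in `~/.my.cnf`, and placeholder paths. The `seafile_backup` function and the `DRY_RUN` switch are hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Sketch only: database names, credentials source and paths are assumptions.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = Seafile data directory, $2 = backup target directory
seafile_backup() {
    # Step 1: dump the databases (--single-transaction gives a consistent
    # dump of InnoDB tables without stopping the server).
    for db in ccnet-db seafile-db seahub-db; do
        run mysqldump --single-transaction --result-file="$2/$db.sql" "$db"
    done
    # Step 2: copy the data directory after the dumps are done.
    run rsync -a "$1/" "$2/seafile-data/"
}
```

Run without arguments changed, `seafile_backup /srv/seafile-data /backup` prints three `mysqldump` commands followed by one `rsync` command.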

Okay, but do you also stop both Seafile services so that there are no changes, or do you do the backup while Seafile is running?

I only stop Seafile for upgrades. Just make sure the garbage collector is not running while doing the backup.

Stopping Seafile for the DB dump (it takes a few seconds) makes sense, though.
The backup script I have been using for a while is available on my GitHub profile.

Danilo, I will check out your script tomorrow. Today I got the hardware and I’m still assembling it. But our backup discussion got me thinking about a few things I want to run by you:

  1. Seafile server migration / backup:
    From my understanding so far, the brain of Seafile is the database plus the Seafile data folder. So if I set up a fresh server with the same configs, would moving the database and the Seafile folder over make the new server work like the old one? This would be good to understand for backup and recovery purposes.

  2. Similar situation, but a fresh server:
    Suppose I set up a new Seafile server with the same configs but do not transfer the database and Seafile folder, and just create the same users and passwords. If I then connect as a user, would the Seafile server detect that the client library has all the files and start fetching them from the client?
    2b. What if I restore the server from last night’s backup and then connect a client? From my understanding, the server should notice that the client has newer files that it is missing and pull them back in.

These are my thoughts, understandings and assumptions from my general experience with client/server applications. Maybe you think this stuff is standard, and so do I; I just want to make sure that Seafile follows these conventions.

Thanks again for supporting me.

yes

All users would be deleted in this case (they are stored in the database). The clients would need to be reconnected, but the data would still remain on the clients.

In theory, yes. I haven’t tested it, though, and currently don’t have the time to do so. The best way to test would be to create a new library, add some files and look into the database: there is a table pointing to the current state (head); save / copy that value. Then add some more files, stop Seafile, set the head back to the old value, start Seafile and see what the client does.

Btw. I recommend running MySQL/MariaDB and enabling memcache.

I set up both SATA drives in a ZFS pool like this:

sudo zpool create -m /mountname poolname mirror disk-by-id1 disk-by-id2

It was quite easy after reading through all the other material about ZFS.
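To get told when the mirror is degraded, one option is to read the pool's health column and turn it into a message. The sketch below assumes the `poolname` placeholder from the create command above; `report_health` is a made-up helper, and the health string would come from `zpool list -H -o health poolname`.

```shell
#!/bin/sh
# Sketch only: maps a pool health string to a message and an exit status.

report_health() {
    case "$1" in
        ONLINE)   echo "ok: pool is healthy"; return 0 ;;
        DEGRADED) echo "warning: pool is degraded, replace the faulty disk"; return 1 ;;
        *)        echo "critical: pool state is '$1'"; return 2 ;;
    esac
}

# Real use (needs ZFS installed):
#   report_health "$(zpool list -H -o health poolname)"
```

The non-zero exit status makes it easy to hook this into a cron job or a monitoring script that sends a mail when something is wrong.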

There is one thing I’m a bit afraid of. At this link: https://github.com/zfsonlinux/zfs/wiki/FAQ#the-etczfszpoolcache-file
the author says:
“Again - CEPH + ZFS will KILL a consumer based SSD VERY quickly” and
“even NVMe WILL NOT DO (Samsung 830, 840, 850, etc) for a variety of reasons. CEPH will kill them quickly”

I don’t want my NVMe drive killed. Sure, I don’t have any special setup, as you can see from my zpool create command, but I’m not sure what the defaults are for these “Killer-Meta-Data-Zila-Ceph-Stuff-Things”.