I’ve been testing Seafile for a couple of weeks now and I feel confident that I will go with it.
Since I already feel quite comfortable with the software side of things (Ubuntu, Nginx, Seafile), I’m currently planning the hardware design.
I currently have these ideas:
2 x SATA SSD (64 GB) in RAID 1
2 x SATA HDD (2 TB) in RAID 1
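If it ends up being Linux software RAID, each mirror pair could be created with mdadm roughly like this. The device names are placeholders for your actual disks, and the real `mdadm --create` destroys existing data on the members, so this sketch wraps it in a small helper:

```shell
# Hypothetical helper around mdadm for a two-disk RAID 1 mirror.
# WARNING: the real command wipes existing data on both disks.
make_raid1() {
    if [ "$#" -ne 3 ]; then
        echo "usage: make_raid1 /dev/mdX /dev/diskA /dev/diskB" >&2
        return 1
    fi
    mdadm --create "$1" --level=1 --raid-devices=2 "$2" "$3"
}

# e.g. make_raid1 /dev/md0 /dev/sda /dev/sdb   # SSD pair
#      make_raid1 /dev/md1 /dev/sdc /dev/sdd   # HDD pair
```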
I’m not sure yet whether I will use software RAID, hardware RAID or Intel fake RAID, but I’m asking myself the following questions:
I need to know where the major reads/writes happen, and under what circumstances which piece is written to or read from.
I understand that all the metadata lives in the MySQL database, so I would definitely put the database on the SSDs. Then there is the Seafile folder (currently in /home), and I was wondering how much access there is to those files and folders. I assume it makes sense to keep them on the SSD as well. The only difference to my current test setup would then be to have the 2 TB RAID mounted and keep the data folder on those HDDs.
Am I correct in saying that the files and folders in the Seafile data folder are not visible in clear text? At least that is what I can tell so far.
A last question is which services run in memory. Wouldn’t it be great to have the MySQL database run in memory to reduce SSD access? Is this the default, like on Microsoft SQL Server?
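For what it’s worth, MySQL doesn’t need to be "moved" into RAM: InnoDB already caches hot pages in an in-memory buffer pool, and you can size that in the config. A minimal fragment (the value is just an example, tune it to your RAM):

```ini
# /etc/mysql/my.cnf (fragment) -- example value, not a recommendation
[mysqld]
innodb_buffer_pool_size = 512M
```

With a database this small, effectively all reads would be served from memory; only writes still have to hit the disk.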
What do you guys recommend for a storage design?
Actually I was also thinking about installing the OS on a USB thumb drive, but only if the important stuff can be offloaded to memory, similar to a firewall box where everything is loaded from the SD card into memory at startup. Yet I assume the Seafile server is a bit more complex, and I will probably end up with:
I can’t think of a use for a dedicated 64 GB pool of fast storage for a home server running Seafile. The databases (when exported to SQL) don’t exceed 700 kB on my home server (1.7 TB used, 6 active users and about 10 libraries per user), so those are probably in RAM all the time anyway.
Unless you want to use the SSDs as some sort of cache, I don’t recommend investing in that; maybe get another drive for the RAID and do RAID 5/6 instead.
Thanks, but what are the reasons to use ZFS? IMHO it’s been hyped in the private sector for the last few years but hasn’t become an industry standard.
I don’t need bleeding edge stuff with crazy features. I mainly need it to be easy to understand and to maintain/repair on a weekly basis. I believe Linux RAID does RAID well enough and LVM handles expansion if needed, so why ZFS?
Thanks. So if your databases are no more than 1 MB, where does all the action happen? There must be some sort of indexing, either in a file or a database… As far as I understand, that is the only way to manage the files and folders (in your case 1.7 TB) that Seafile apparently splits apart and stores in some special way in the seafile-data folder.
I’m not claiming that I’m 100% sure where the database lives throughout its lifetime; it obviously needs to sync with the copy on non-volatile storage when writes occur. I’m just saying that the databases are small because they mostly index fairly static data like users, libraries and sub-libraries (not the files inside the libraries, but the libraries themselves, their latest commits for example; you can peek into the databases and find out). That means few write operations are performed, allowing the database software to avoid touching the disk.
With that pseudoscientific analysis I’d say that moving the databases to an SSD doesn’t make much sense economically. I’d rather spend the money on more storage.
Or wouldn’t you use a RAID for the SSD and the OS?
If you export your snapshots, it’s not necessary. I would just use RAID for the data storage. It’s very unlikely that only one SSD breaks down; if they break, both will. And be careful: RAID isn’t a backup!
RAID 1 is simply called a mirror in ZFS.
RAIDZ1 has one parity disk, so any one disk can fail (only makes sense with 3 or more disks).
RAIDZ2 has two parity disks, so any two disks can fail at the same time.
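In zpool terms, the layouts above map to commands roughly like the ones this little helper prints. It only prints them rather than running them; the pool name "tank" and the device names are placeholders:

```shell
# Print (not run) illustrative zpool commands for each layout above.
# "tank" and the device names are placeholders for your own setup.
zpool_layout() {
    case "$1" in
        mirror) echo "zpool create tank mirror /dev/sda /dev/sdb" ;;
        raidz1) echo "zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc" ;;
        raidz2) echo "zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd" ;;
        *) echo "usage: zpool_layout mirror|raidz1|raidz2" >&2; return 1 ;;
    esac
}

zpool_layout mirror   # -> zpool create tank mirror /dev/sda /dev/sdb
```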
Okay, so let’s talk about backup. From what I read in the manual, backing up the MySQL database and the file system has to happen in the same time slot, and the Seafile services should be stopped to get a consistent backup state. I’m not sure how to back up yet. I read some articles on general Linux backup, and there are backup solutions out there. I also read the Seafile cluster documentation, which tells a lot about its backup mechanism. But I’m not sure yet how often to back up, and where to. Actually I thought that if I back up my stuff on my PC it’s okay, but it would be bad to have Seafile down for a couple of days in order to rebuild…
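The stop/dump/copy/start procedure from the manual could be scripted along these lines. The install paths, service scripts, database names and the `$MYSQL_ROOT_PW` variable are all assumptions here, not something from the manual verbatim:

```shell
# Sketch of a consistent Seafile backup. Paths, service scripts and
# database names are assumptions -- adjust to your installation.
backup_seafile() {
    if [ "$#" -ne 1 ]; then
        echo "usage: backup_seafile /path/to/backup/dir" >&2
        return 1
    fi
    dest="$1/$(date +%Y-%m-%d)"
    mkdir -p "$dest" || return 1

    # Stop the services so database and seafile-data stay in sync.
    /opt/seafile/seafile-server-latest/seahub.sh stop
    /opt/seafile/seafile-server-latest/seafile.sh stop

    # Dump the databases (names assumed; check your setup).
    for db in ccnet_db seafile_db seahub_db; do
        mysqldump -u root -p"$MYSQL_ROOT_PW" "$db" > "$dest/$db.sql"
    done

    # Copy the data folder; rsync keeps repeat runs fast.
    rsync -a /opt/seafile/seafile-data/ "$dest/seafile-data/"

    /opt/seafile/seafile-server-latest/seafile.sh start
    /opt/seafile/seafile-server-latest/seahub.sh start
}
```

Run nightly from cron, the downtime is just the dump plus the incremental rsync.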
Some questions I was thinking about:
Am I making it too complicated? I don’t think most people put this much effort into planning a home server. Maybe KISS would serve me better (keep it simple, stupid).
I mean, I really won’t use this server as a test machine and try all sorts of software. I’m going to keep up with updates, document everything and make sure I do things the appropriate way (e.g. per manual or vendor recommendation).
What if the system idles and somehow a power loss occurs? On my daily PC systems I just boot up and voilà, I’m up and running again. Is this a big deal with a Seafile server? I’m a bit afraid of recovering, because the files are not lying around in plain text, and from what I read, the smallest mismatch between the Seafile data folder and the MySQL database can make everything inconsistent… this makes it seem very fragile to me.
The only idea I have is to back up all configs and every folder, and dump the MySQL databases every evening before the server shuts down at 02:00 at night… But that would mean I need a USB drive connected (more power consumption).
Damn this project is getting very complicated and is ******* me up.
Do you guys go this far with your home Seafile?
Come on guys, I need some recommendations, hints and solutions. Here is my planned hardware buy:
|Chassis: not sure yet|
|PSU: 60 W external, picoPSU-90|
|MOBO: Fujitsu D3417-B2|
|RAM: some random 4 GB stick|
|HDD: 2 x 2000 GB Seagate BarraCuda ST2000LM015|
|M.2: Intel 760p 128 GB SSD|
I don’t use ZFS, but if it allows you to make snapshots, then I guess you could do this:
1. Stop the Seafile server
2. Make a ZFS snapshot
3. Start the Seafile server
4. Finish making the backup by copying seafile-data from the snapshot created in step 2
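The steps above could be sketched like this. The dataset name "tank/seafile", the service script path and the use of the `.zfs/snapshot` directory are assumptions about your layout; I haven’t run this:

```shell
# Sketch of the snapshot-based backup above. Dataset name and
# service paths are assumptions -- adjust to your layout.
snapshot_backup() {
    if [ "$#" -ne 1 ]; then
        echo "usage: snapshot_backup /path/to/backup/dir" >&2
        return 1
    fi
    snap="tank/seafile@backup-$(date +%Y-%m-%d)"

    /opt/seafile/seafile-server-latest/seafile.sh stop   # step 1
    zfs snapshot "$snap"                                 # step 2
    /opt/seafile/seafile-server-latest/seafile.sh start  # step 3

    # step 4: copy seafile-data out of the snapshot at leisure,
    # long after the server is serving clients again
    rsync -a "/tank/seafile/.zfs/snapshot/${snap#*@}/seafile-data/" \
        "$1/seafile-data/"
}
```

The server is only down for steps 1 to 3, which should be seconds.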
Given that you only need to stop the server to make the snapshot (I assume this doesn’t take much time) and to back up the databases (which takes a little less than a minute), the downtime should be pretty insignificant.
After doing the SQL backup, which is a matter of milliseconds, you can start the server again. The only thing that doesn’t work while the backup is running is the garbage collector, or fixing some library with seaf-fsck (with that hardware and ZFS it is very unlikely that you’ll ever need that).
The database only holds references to the most recent state. In the worst case you’d have some hard-to-remove garbage after a restore (data that had been uploaded after the SQL backup was taken), but that wouldn’t have any other negative consequences. The only border case that just came to my mind is the file move feature in Seahub; I’m not 100% sure how that works, but I think it should not have consequences for the backup, and it most likely uses hard links with the fs backend.
I copy the contents of my SSD to the RAID every night using rsync, and I do a full backup on the first Sunday of each month. I keep the backup drives at someone else’s place, so they are only connected while the backup is running. I also back up via rsync to a disk I’ve placed at my brother’s (the disk hangs off some cheap developer board, and the data is pushed through SSH using a non-privileged user; to prevent deletion of backups from my server, the root user creates snapshots on a regular basis; the file system for this backup is btrfs, because AFAIK ZFS doesn’t work on ARM).
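That nightly push could look something like this; the host name, user, key path and directories are placeholders for my actual setup:

```shell
# Sketch of the nightly off-site push: rsync over SSH as an
# unprivileged user. Host, user, key and paths are placeholders.
push_backup() {
    if [ "$#" -ne 2 ]; then
        echo "usage: push_backup <local-dir> <user@host:dir>" >&2
        return 1
    fi
    rsync -a --delete -e "ssh -i /root/.ssh/backup_key" "$1" "$2"
}

# e.g. from cron at 01:30:
# 30 1 * * * push_backup /tank/seafile/ backup@remote-box:/mnt/backup/seafile/
```

Since the remote user is unprivileged and root on the remote box snapshots the backup file system, a compromised server can’t destroy the history.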
To make sure the data is intact and no disk has failed, I run 2 scrubs a month (all data is read during a scrub and checked against its checksum; on failure, missing data is repaired if possible and I would replace the affected disk).
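Two scrubs a month are easy to schedule with cron; the pool name "tank" is a placeholder:

```
# /etc/cron.d/zfs-scrub -- scrub on the 1st and 15th at 03:00
0 3 1,15 * * root /sbin/zpool scrub tank
```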
I’ve played with online backups but wasn’t satisfied (slow, expensive, time-intensive, error-prone, …).