Is it possible to migrate out of S3?

First of all, thank you so much for developing Seafile. I’ve been using it as a personal cloud drive with an S3-like backend for nearly three months now (after being utterly disappointed with Nextcloud) and I couldn’t be any happier – wrote an article about my experience (specifically, deploying Seafile to Dokku), as well.

I was recently thinking of moving my server away from DigitalOcean (and their DigitalOcean Spaces storage solution) to Hetzner Cloud. Hetzner doesn’t offer an S3-like storage solution, but they have Storage Boxes, which can be mounted as a network drive anywhere in the system, so I think one could serve as Seafile’s storage.

My question is: can I migrate the existing data (about 150 GB, so I’d prefer not to do it through my computer) from one Seafile instance to another, from an S3-like back-end to a normal filesystem? The official documentation seems to imply that much (“Seafile supports data migration between filesystem, s3,…”), but how would I go about doing it? Would something like this work?

  1. Set up a new server by using a snapshot of database & configuration from the old server (thus connecting it to the existing S3 storage)
  2. Create a temporary seafile.conf file for migration – but what would this file look like when specifying filesystem storage? I think this is my main point of confusion for now.
  3. Run the migration script at least twice, and shut down the old, S3-connected instance for good once everything is migrated.

Would this work? Is migration from S3 back to filesystem really supported?

Thank you for your answers!

Seafile won’t move the data for you.

For local data, files are usually kept in a folder called seafile-data/storage, which contains three folders: fs, commits and blocks. Each of them contains one folder per library, named after the library ID, and inside those each object is stored in a folder named after the first 2 characters of its ID, as a file named after the remaining 38 characters.
If I recall correctly, the objects are stored a little differently on S3: the full 40-character ID (a SHA-1) is used as a flat name, so you have to split each object ID into a 2-character folder and a file named after the other 38 characters.

So unless someone already has one, you would have to write a script that moves the data over. As there are likely many objects, make sure to use a large number of threads.
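A rough, untested sketch of such a script (assuming boto3, one bucket per object type, and S3 keys of the form <library-id>/<40-character object ID> as described above; bucket names and destination path are placeholders):

#!/usr/bin/env python3
# Sketch only: copy Seafile objects from S3 buckets into the local
# seafile-data/storage layout. Adjust bucket names, destination path and
# credentials (picked up via the usual boto3 configuration) for your setup.
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

BUCKETS = {                      # hypothetical bucket names, one per object type
    "commits": "seafile-commit-objects",
    "fs": "seafile-fs-objects",
    "blocks": "seafile-block-objects",
}
DEST_ROOT = "/opt/seafile/seafile-data/storage"   # hypothetical destination

s3 = boto3.client("s3")

def copy_object(bucket, key, kind):
    # S3 key: "<library-id>/<40-character object id>"
    library_id, object_id = key.split("/", 1)
    # local path: <kind>/<library-id>/<first 2 chars>/<remaining 38 chars>
    dest_dir = os.path.join(DEST_ROOT, kind, library_id, object_id[:2])
    os.makedirs(dest_dir, exist_ok=True)
    s3.download_file(bucket, key, os.path.join(dest_dir, object_id[2:]))

paginator = s3.get_paginator("list_objects_v2")
with ThreadPoolExecutor(max_workers=32) as pool:
    for kind, bucket in BUCKETS.items():
        for page in paginator.paginate(Bucket=bucket):
            for obj in page.get("Contents", []):
                pool.submit(copy_object, bucket, obj["Key"], kind)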

Do not use the Storage Boxes as the data storage. You are going to have a bad time. Did that, bad idea!

Rather, mount the volume storage that is offered within the Hetzner cloud, which is Ceph-based and has much better availability. Otherwise, go for a small hardware node and do it yourself. A Storage Box can be a backup target at best, but don’t expect good performance from it.

I see! Thank you. I thought because the migrate.sh script that ships with Seafile Pro was able to migrate my files from the local filesystem to S3, it could perhaps do the reverse if properly configured.

Oh, if that’s the case, then I might abandon the migration completely. The only motivation was the pricing of storage boxes, as the pricing of volume storage doesn’t match my needs unfortunately.

Technically it works, but it is slow and unpredictable. As backup storage it’s fine, I guess, but not for production. The provider needs to cut corners somewhere, otherwise these prices wouldn’t be possible.

You may check out the SB (the Hetzner server auction): dedicated hardware for little money. As long as you know what you are doing, this is a good solution.

https://www.hetzner.de/sb

OK, I didn’t know about that one. Maybe that would also work.

@DerDanilo Thank you for sharing your experience! I’m seriously considering renting the dedicated server now, already started doing some research in this direction (how to implement whole-disk encryption with remote unlock, etc.).

However, I’d really like some (ideally official) confirmation about the migration script. I think that if I spin up another instance of Seafile with seafile.conf pointing to the existing S3-like storage, that should work without any issues.

The big question is: if I then create a temporary seafile.conf (like the migration script documentation mentions, but due to forum rules, I can’t link it unfortunately) that omits the three sections (commit_object_backend, fs_object_backend & block_backend) that instruct Seafile to use S3 as data storage, would that trigger the data migration from S3 to the local filesystem?

(:arrow_up: Because this is how the migration in the other direction, filesystem to S3, works: you create a temporary seafile.conf that points to the remote storage, and run the migration script, which reads this temporary config file. Once the migration is finished, you append the extra sections to the main seafile.conf and restart Seafile.)
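For reference, the three S3 sections currently look roughly like this (a sketch only: bucket names, endpoint and keys below are placeholders, and option names may vary between Seafile Pro versions), and my hope is that a temporary seafile.conf without them would describe plain local storage:

[commit_object_backend]
name = s3
bucket = seafile-commit-objects
key_id = <access key id>
key = <secret access key>
host = ams3.digitaloceanspaces.com
use_https = true

# [fs_object_backend] and [block_backend] follow the same pattern,
# each pointing at its own bucket.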

I’m sorry about the bump, but I’d really prefer not to do something this… experimental without confirmation that it might work. I really wouldn’t want to lose my existing data.

I am not sure if this will work; I have only worked with migrations the other way around.
If the script does not support opting out of S3, you could have a look at MinIO. With MinIO you could run your own S3 server locally and then sync/mirror the data to it from the currently existing S3. There are tools available that can sync S3 buckets.
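For example, with the MinIO client (mc) the mirroring could look roughly like this (aliases, endpoints, bucket names and keys are placeholders):

# register both ends under an alias
mc alias set spaces https://ams3.digitaloceanspaces.com ACCESS_KEY SECRET_KEY
mc alias set local http://127.0.0.1:9000 minioadmin minioadmin
# mirror one bucket from the existing S3 into the local MinIO;
# repeat for the commit, fs and block buckets
mc mirror spaces/seafile-blocks local/seafile-blocks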


That is super smart, I did not think of that! So now I have three possible strategies: try the migration script; if it fails, either copy things over slowly by hand (through SeaDrive connected to both accounts, probably), or deploy MinIO into Dokku and migrate that way. Thank you!

I think it would be easier to write a migration script than to run MinIO without really needing it (which just adds overhead, latency and complexity).

As far as I know, the bash script calls a Python script. The Python script doesn’t work very efficiently when it comes to a lot of data; it needs adjustment just for writing the index. With S3 there is latency, of course, though if run locally that latency will be very small. On the other hand, MinIO also makes it easy to set up an HA storage system with 2 replicas. Not something I would necessarily recommend for large enterprise clusters, but it should work fine.

@Cellane Please let us know how it works out with the migration.

@daniel.pan Is there a migration script or function that can be used to migrate data from S3/ceph to local storage?

I… think the migration script might actually be working :flushed:

On the new server, I restored MariaDB from a back-up created on the old server and copied over all the Docker persistent storage data (incl. conf/seafile.conf pointing at S3). Then I entered the container (in my case that’s as simple as dokku enter files, since I run most of my apps via Dokku), and after executing these commands…

mkdir migration                                           # will hold the object-list file referenced below
export OBJECT_LIST_FILE_PATH=/opt/seafile/migration/file
touch /opt/seafile.conf                                   # That’s right, empty config file here!
cd seafile-server-latest
./migrate.sh /opt                                         # /opt contains the (empty) temporary seafile.conf describing the destination

… this is the result I’m seeing: it’s definitely downloading (filenames and sizes match what’s in my S3 storage), and it’s going crazy fast (100+ MB/s). I’ll report back once the script has finished running!

Edit: it is also renaming the files/folders to match the local filesystem structure that Seafile uses. It’s exactly as @shoeper mentioned in the beginning: the first two characters are extracted from the file names and turned into folder names.


Apologies for the delayed reply, but here’s my final report on this matter:

  • The migration script definitely works, and if I remember correctly it took about 4 hours to copy everything over from the S3 storage. Most of the time was spent downloading a lot of small objects; the actual size of the repository didn’t seem to matter all that much.
  • After the migration, the server also works fine, so it’s not a case of “data migrated, but everything else broke”; it all actually works just fine. The only issue I’m having is that both the Seafile and SeaDrive clients have trouble syncing one library, probably because some changes were made to it after the migration script finished but before the DNS entries propagated to the new server, so those changes exist on the old server and are not yet reflected on the new one. For the Seafile client this is easy to solve with the “Resync this library” option. For SeaDrive I don’t have a solution yet, but I think I’ll just have to log out of my account, clear the cache and log back in. Oh well, I should have been more careful, I suppose.
  • Oh my word, the speed. For regular usage I’m not noticing any significant improvement, but I run Seafile’s garbage collector every Sunday. On the old VPS with S3, the collector took anywhere between 6 and 12 hours to finish. On the new server with local storage, it finishes in 30 minutes. Whoa :exploding_head:

Thanks for all the consultation, and thanks to the Seafile team for their hard work! (But maybe you could improve the documentation by explicitly stating that the migration script works both ways for some back-ends :sweat_smile:)


Thanks for the update. This could also be a possible solution for offsite backups when using S3 as a backend. Good to know that this works.

Speed can be improved by writing the index file to a local file and reading from that. But this only makes sense if one wants to migrate huge amounts of data, with process restarts or parallel transfer of multiple streams.

Interesting information on the garbage collector. I didn’t know the difference was that huge.

Speed can be improved by writing the index file to a local file and reading from that.

Is this related to the OBJECT_LIST_FILE_PATH environment variable mentioned in the migration guide? If so, I used it and it seemed to work: after running the migration script once (~4 hours), I ran it again and it finished in mere seconds. So the times mentioned in my previous post are with that variable set.
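Concretely, that second pass was just a re-run of the same commands from my earlier post; the object list written during the first run apparently lets the script skip everything that has already been copied:

export OBJECT_LIST_FILE_PATH=/opt/seafile/migration/file   # same list file as the first run
cd seafile-server-latest
./migrate.sh /opt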

Somehow, the migration script didn’t work for me; it only copied over part of the data (100 GB out of 1 TB).

So I used rclone to copy the data out of the S3 (Aliyun OSS) buckets. Originally I was using s3fs, which was terribly slow. This worked really well:

rclone copy gssffsobject:/gssffsobject /opt/seafile/storage/gssffsobject/gssffsobject --checkers 256 --fast-list --size-only --progress --multi-thread-streams 256 --transfers 256

But now I can’t enter any libraries (empty page), and in the log there are messages like:

repo-mgr.c(905): Commit f93888d1-bf0c-40a8-b56d-ce1fefd96c6b:b5497d018d08cd01eabf44913c3bd91e657b79fc is missing

Is there a way I can now convert the folder structure into the expected format?

After some searching, I was not able to find documentation about the folder structure used for local storage. @DerDanilo @shoeper, would you maybe be able to share some resources that would enable me to write a script to convert the S3 folder structure into the local storage folder structure?

I see two ways I could take to finish the migration:

  1. Start MinIO on the destination host (where I now have the contents of the three S3 buckets on the local filesystem), start Seafile Pro, and use the migrate.py script to migrate from local S3 storage (MinIO) to local HDD storage, creating the necessary folder structure. The uncertainty I have here is whether I am still able to start Seafile Pro, as our license has expired (the non-profit ran out of money → downscaling).

  2. Read the code of migrate.py and figure out how it derives the folder structure when going from S3 to local storage. I’m not very proficient in Python, but it sounds like a challenge :slight_smile:

Any help is welcome. I’ve been holding everyone’s files hostage for five days already (oops).

OK, I found the answer in an old GitHub issue reply from @shoeper: Issues migrating to s3 · Issue #1564 · haiwen/seafile · GitHub

You need to combine the folder and filename to form the new object name. On the file system, the first two characters of the file name are used as the folder name, so as not to have too many files in the same folder.

So e.g. c3685afe-ba8d-45a4-a2df-051f1da91ae7/cc/57c9304acfc72ec8bb5b39ea983834f1da8712 on the file system needs to be c3685afe-ba8d-45a4-a2df-051f1da91ae7/cc57c9304acfc72ec8bb5b39ea983834f1da8712 on S3.

Tested on one small library by moving the files by hand; it works.

Update: I reached a happy ending yesterday :tada:

I wrote a Python script that, for every file in a folder, creates a subfolder named after the file’s first two characters and moves the file into it, cutting those two characters off the file name:

#!/usr/bin/env python3
# Move every file in the given directory into a subdirectory named after the
# file's first two characters, stripping those characters from the file name.
import os
import shutil
import sys

sourceDirectory = sys.argv[1]

for originalFileName in os.listdir(sourceDirectory):
    sourceFilePath = os.path.join(sourceDirectory, originalFileName)
    if os.path.isfile(sourceFilePath):
        targetDirectory = os.path.join(sourceDirectory, originalFileName[:2])
        os.makedirs(targetDirectory, exist_ok=True)
        shutil.move(sourceFilePath, os.path.join(targetDirectory, originalFileName[2:]))

And ran it once per library folder in each of the storage folders (shown here for commits):

seafile-data/storage# for dir in commits/*; do ./moveRename.py $dir; done

Thanks for listening to my monologue, and I hope someone else finds this useful one day :wink:
