Hello,
I need guidance about the process to use for a storage migration.
I run a Seafile instance in my company. I run Seafile 10.0.18 and use Cloudian S3 storage. I want to upgrade to Seafile 11, then 12, then 13. But first, I have to migrate the storage from a 1-bucket configuration to a 3-bucket configuration. I'll use the same S3 cluster.
Note: the instance has a lot of history, dating back 8 years, with a lot of dead objects in storage. I have 11 TB in S3 and about 5 TB of files when I check the admin UI.
Option 1: use migrate.sh
This will copy my initial bucket 3 times.
1. configure seafile-temp.conf
2. run migrate.sh online a first time (this takes time)
3. cut user access
4. run the script a second time
5. change the S3 backend config to the new buckets, restart Seafile
6. re-enable user access
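For reference, my understanding of the 3-bucket target config in seafile-temp.conf (section and key names follow the Seafile S3 backend docs; bucket names, endpoint, and credentials below are placeholders):

```ini
# Target 3-bucket layout — placeholders, adapt to your Cloudian endpoint.
[commit_object_backend]
name = s3
bucket = seafile-commit-objects
key_id = <access-key-id>
key = <secret-key>
host = s3.cloudian.example.com
use_https = true

[fs_object_backend]
name = s3
bucket = seafile-fs-objects
# same key_id / key / host / use_https as above

[block_backend]
name = s3
bucket = seafile-block-objects
# same key_id / key / host / use_https as above
```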
Option 2: use storage classes
1. define a new storage class and set it as default
2. migrate the repos one by one using a script that loops over migrate-repo.sh
3. remove my old single-bucket storage class
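The loop for step 2 could be sketched like this (assuming migrate-repo.sh takes `<repo_id> <origin_storage_id> <destination_storage_id>` and that I export the library IDs to a file first — both assumptions to be confirmed against the docs):

```shell
#!/bin/sh
# Sketch: migrate every library listed in repo_ids.txt (one ID per line)
# from storage class "s3-old" to "s3-new". The argument order of
# migrate-repo.sh is an assumption; MIGRATE_CMD can override the script path.
migrate_all() {
    ids_file=$1
    while read -r repo_id; do
        ${MIGRATE_CMD:-./migrate-repo.sh} "$repo_id" s3-old s3-new \
            || echo "$repo_id" >>failed_repos.txt
    done <"$ids_file"
}
```

Failed libraries would end up in failed_repos.txt, so the loop can be re-run on just those.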
Is there a preferred method?
Is migrate-repo.sh smart enough to put only fs objects in the fs bucket, commit objects in the commit bucket, and block objects in the block bucket? Or will the original data be replicated 3 times? (I know migrate.sh copies the data 3 times.)
Both options will copy all objects to the new buckets without distinguishing between blocks, commits, and fs objects. In version 14.0 we will provide an option in the migration script to separate these three types of objects when migrating to the new storage backend, but this option is not available in the current version.
Hello,
Thanks for your answer. I have two follow-up questions.
Question 1
Which method do you recommend? My guess is that method 2 will be slower but will copy only active libraries. Here's the breakdown I came up with:
| | migrate.sh | migrate-repo.sh |
| --- | --- | --- |
| speed | faster | slower |
| storage size | bigger | smaller |
| data copied | everything is copied, including dead objects | only active libraries are migrated (incl. dead objects in active libraries) |
| downtime | downtime window mandatory | fully online |
In Seafile 10 Pro, is there any way to migrate my config to 3 buckets without ending up with 3 full copies of my 11 TB initial bucket? I have "only" 4.5 TB of actual data in the Seafile UI, and I will have 33 TB in S3. This is getting expensive…
Question 2
In Seafile 14, will I be able to define a new storage class, loop through all libraries, and run migrate-repo.sh --copy-only-relevant-object-type (or similar)?
Will the new storage then be 3x smaller than the old one?
I recommend using method 2 for the migration, as it allows you to control which libraries are migrated. Before version 14.0, these three types of objects cannot be distinguished, so after migration, each bucket will contain a full copy of the original storage. In version 14.0, we will provide an option to separate these three types of objects, so that each bucket will contain only the corresponding type of objects.
In addition, please note that when all objects are stored in a single bucket, seaf-gc.sh cannot be run. Running seaf-gc.sh in this case will lead to catastrophic consequences.
Thanks for the answer. This is really useful information!
When migrating from seafile.conf to storage_classes.json, how do I set up the storage_classes.json configuration so that the storage_id corresponds to the storage that was defined in seafile.conf?
Must the storage_id have the value default?
Or does it use the is_default parameter?
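For context, my understanding from the docs is that storage classes are switched on in seafile.conf like this (the path is a placeholder):

```ini
[storage]
enable_storage_classes = true
storage_classes_file = /opt/seafile/conf/storage_classes.json
```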
When will this be available?
Will it be possible to use this Seafile 14 script on Seafile 10? Would you recommend it?
Is this documented anywhere? I learned it the hard way.
Hi,
I did read the documentation, but it does not fully explain my migration use case.
What I would like is to restart Seafile with no change, except that I now use storage_classes.json with:
- storage class s3-old (single bucket), equivalent to the one I currently have in seafile.conf.
- storage class s3-new (3 buckets) as the new storage class, with is_default = true, because I want new libraries to be created there automatically.
After all libraries are successfully migrated, I could then delete the data in s3-old.
Must s3-old's storage_id be default? Or its name, s3-old? Or something else?
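Concretely, the storage_classes.json I have in mind would look something like this (field names follow the Seafile multiple-backend docs; buckets, endpoint, and credentials are placeholders, and whether s3-old's storage_id must be "default" is exactly what I'm unsure about):

```json
[
    {
        "storage_id": "s3-old",
        "name": "Old single-bucket storage",
        "is_default": false,
        "commits": { "backend": "s3", "bucket": "seafile-objects", "key_id": "<key-id>", "key": "<secret>", "host": "s3.cloudian.example.com" },
        "fs": { "backend": "s3", "bucket": "seafile-objects", "key_id": "<key-id>", "key": "<secret>", "host": "s3.cloudian.example.com" },
        "blocks": { "backend": "s3", "bucket": "seafile-objects", "key_id": "<key-id>", "key": "<secret>", "host": "s3.cloudian.example.com" }
    },
    {
        "storage_id": "s3-new",
        "name": "New 3-bucket storage",
        "is_default": true,
        "commits": { "backend": "s3", "bucket": "seafile-commit-objects", "key_id": "<key-id>", "key": "<secret>", "host": "s3.cloudian.example.com" },
        "fs": { "backend": "s3", "bucket": "seafile-fs-objects", "key_id": "<key-id>", "key": "<secret>", "host": "s3.cloudian.example.com" },
        "blocks": { "backend": "s3", "bucket": "seafile-block-objects", "key_id": "<key-id>", "key": "<secret>", "host": "s3.cloudian.example.com" }
    }
]
```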