Seafile 10 Pro: S3 storage migration from 1 to 3 buckets

Hello,
I need guidance about the process to use for a storage migration.
I run a Seafile instance in my company. It is Seafile 10.0.18 with Cloudian S3 storage. I want to upgrade to Seafile 11, 12, 13. But first, I have to modify the storage from a 1-bucket configuration to a 3-bucket configuration. I’ll use the same S3 cluster.

note: the instance has a lot of history, dating back 8 years, with a lot of dead objects in the storage. I have 11 TB in S3 and about 5 TB in files when I check the admin UI.

option 1 : use migrate.sh

This will copy my initial bucket 3 times.

  1. configure seafile-temp.conf
  2. run migrate.sh online a first time (this takes time)
  3. cut user access
  4. run the script a 2nd time
  5. change the s3 backend config to the new buckets, restart seafile
  6. re-enable user access
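For reference, here is a sketch of what the destination config for migrate.sh (the seafile-temp.conf in step 1) could look like with three buckets. The section and key names follow Seafile’s S3 backend config format; the bucket names, host, and credentials are placeholders:

```ini
; Hypothetical destination seafile.conf for migrate.sh.
; Bucket names, host and credentials are placeholders.
[commit_object_backend]
name = s3
bucket = seafile-commits
key_id = <access-key-id>
key = <secret-access-key>
host = s3.example.com
path_style_request = true
use_https = true

[fs_object_backend]
name = s3
bucket = seafile-fs
key_id = <access-key-id>
key = <secret-access-key>
host = s3.example.com
path_style_request = true
use_https = true

[block_backend]
name = s3
bucket = seafile-blocks
key_id = <access-key-id>
key = <secret-access-key>
host = s3.example.com
path_style_request = true
use_https = true
```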

option 2 : use storage classes

  1. define a new storage class, set it as default
  2. migrate the repos one by one using a script that loops over migrate-repo.sh
  3. remove my old single bucket storage class
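The loop in step 2 could be sketched like this; migrate-repo.sh takes a library ID and a destination storage class ID. The repo IDs and the storage class name "s3-new" are placeholders, and the echo makes it a dry run:

```shell
#!/bin/sh
# Sketch of option 2: loop migrate-repo.sh over library IDs.
# The storage class ID "s3-new" and the repo IDs are placeholders.

# Print one migrate-repo.sh invocation per repo ID (dry run);
# drop the echo to actually run each migration.
build_migrate_cmds() {
    for repo in "$@"; do
        echo "./migrate-repo.sh $repo s3-new"
    done
}

# In practice the IDs would come from the database or the admin API.
build_migrate_cmds \
    "11111111-2222-3333-4444-555555555555" \
    "66666666-7777-8888-9999-000000000000"
```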

Is there a preferred method?
Is migrate-repo.sh smart enough to put only fs objects in the fs bucket, commit objects in the commit bucket, and block objects in the block bucket? Or will the original data be replicated 3 times? (I know migrate.sh copies the data 3 times.)

Hello @Damien ,

Both options will copy all objects to the new bucket without distinguishing between blocks, commits, and fs. In version 14.0, we will provide an option in the migration script to support separating these three types of objects to the new storage backend, but this option is not available in the current version.


Hello,
Thanks for your answer. I have 2 follow-up questions.

question 1
Which method do you recommend? My guess is that method 2 will be slower but will copy only active libraries; here’s the breakdown I came up with.

|              | migrate.sh | migrate-repo.sh |
| ------------ | ---------- | --------------- |
| speed        | faster     | slower          |
| storage size | bigger     | smaller         |
| data copied  | everything is copied, including dead objects | only active libraries are migrated (incl. dead objects in active libraries) |
| downtime     | downtime window mandatory | fully online |

In Seafile 10 Pro, is there any way to migrate my config to 3 buckets without having 3 full copies of my 11 TB initial bucket? I have “only” 4.5 TB of actual data in the Seafile UI, and I would end up with 33 TB in S3. This is getting expensive…

question 2
In Seafile 14, will I be able to define a new storage class, loop through all libraries, and run migrate-repo.sh --copy-only-relevant-object-type (or similar)?
Will the new storage then be 3x smaller than the old one?

Thanks

Hello @Damien ,

I recommend using method 2 for the migration, as it allows you to control which libraries are migrated. Before version 14.0, these three types of objects cannot be distinguished, so after migration, each bucket will contain a full copy of the original storage. In version 14.0, we will provide an option to separate these three types of objects, so that each bucket will contain only the corresponding type of objects.

In addition, please note that when all objects are stored in a single bucket, seaf-gc.sh cannot be run. Running seaf-gc.sh in this case will lead to catastrophic consequences.


Thanks for the answer. This is really useful information !

When migrating from seafile.conf to storage_classes.json, how do I set up the storage_classes.json configuration so that the storage_id corresponds to the storage that was defined in seafile.conf?

  • Must the storage_id have the value default?
  • Or does it use the is_default parameter?

When will this be available?
Will it be possible to use this Seafile 14 script in Seafile 10? Would you recommend it?

Is this documented anywhere? I learned it the hard way.

Hello @Damien

1. You can refer to this document https://manual.seafile.com/latest/setup/setup_with_multiple_storage_backends/ to configure multiple storage backends. The is_default parameter is required.

2. We typically release a new version every year. For detailed release notes, please refer to the following link: Seafile Release Table

3. In theory, this script should be able to run on version 10.0-pro, but this can only be confirmed after version 14.0-pro is released.

4. There is currently no documentation on this. However, running GC on version 11.0-pro or later will result in an error and terminate the process.

Hi,
I did read the document, but it does not fully cover my migration use case.

What I would like is to restart Seafile with no change, except that I now use storage_classes.json with:

  • storage class s3-old (single bucket), equivalent to the one I currently have in seafile.conf.
  • storage class s3-new (3 buckets) as the new storage class, with is_default = true for this one, because I want new libraries to be created there automatically.

After all libraries are successfully migrated, I could then delete the data in s3-old.
Must s3-old’s storage_id be default? Or s3-old’s name? Or something else?

Hello @Damien ,

s3-old should set is_default = true.

You can add the for_new_library option to the backends that are expected to store new libraries in the JSON file.
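Put together, a minimal storage_classes.json for this setup might look like the sketch below. The key names follow the multiple-storage-backend manual; bucket names, host, and credentials are placeholders:

```json
[
  {
    "storage_id": "s3-old",
    "name": "Old single-bucket S3",
    "is_default": true,
    "commits": { "backend": "s3", "bucket": "seafile-old", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true },
    "fs": { "backend": "s3", "bucket": "seafile-old", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true },
    "blocks": { "backend": "s3", "bucket": "seafile-old", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true }
  },
  {
    "storage_id": "s3-new",
    "name": "New 3-bucket S3",
    "for_new_library": true,
    "commits": { "backend": "s3", "bucket": "seafile-commits", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true },
    "fs": { "backend": "s3", "bucket": "seafile-fs", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true },
    "blocks": { "backend": "s3", "bucket": "seafile-blocks", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true }
  }
]
```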


Thank you very much !

So here’s my procedure.

  1. Change the config files to use storage classes:
     • s3-old set as is_default
     • s3-new set with for_new_library
  2. Run the migrate-repo.sh script to move all libraries to the new storage
  3. Change the config files again:
     • s3-new now set as is_default
     • s3-old section removed
  4. Remove the old single-bucket storage
  5. Upgrade to Seafile 11
  6. Upgrade to Seafile 12
  7. Upgrade to Seafile 13
  8. Upgrade to Seafile 14
  9. Run the GC script (procedure to be defined)
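Once s3-old is removed from the config, the storage_classes.json would contain only the new class; a sketch with placeholder bucket names, host, and credentials:

```json
[
  {
    "storage_id": "s3-new",
    "name": "New 3-bucket S3",
    "is_default": true,
    "for_new_library": true,
    "commits": { "backend": "s3", "bucket": "seafile-commits", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true },
    "fs": { "backend": "s3", "bucket": "seafile-fs", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true },
    "blocks": { "backend": "s3", "bucket": "seafile-blocks", "key_id": "<key-id>", "key": "<secret>", "host": "s3.example.com", "path_style_request": true }
  }
]
```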