Seaf-gc: too slow

I would like to know how long seaf-gc typically takes to run, based on your experience.
I have a 12 GB library (blocks stored on an XFS filesystem), and seaf-gc takes about 20 minutes to run.

   [04/13/18 12:49:46] gc-core.c(440): GC version 1 repo
   [04/13/18 12:49:46] gc-core.c(313): GC started. Total block number is 80189.
   [04/13/18 12:49:46] gc-core.c(46): GC index size is 40094 Byte.
   [04/13/18 12:49:46] gc-core.c(327): Populating index.
   [04/13/18 12:49:46] gc-core.c(181): Populating index for repo c81c66cf.
   [04/13/18 13:00:17] gc-core.c(234): Traversed 27354 commits, 80738 blocks.
   [04/13/18 13:00:17] gc-core.c(183): Populating index for sub-repo 50cfb566.
   [04/13/18 13:00:26] gc-core.c(234): Traversed 2930 commits, 2852 blocks.
   [04/13/18 13:00:26] gc-core.c(183): Populating index for sub-repo c19713ee.
   [04/13/18 13:08:35] gc-core.c(234): Traversed 31512 commits, 68040 blocks.
   [04/13/18 13:08:36] gc-core.c(183): Populating index for sub-repo edf4a201.
   [04/13/18 13:08:47] gc-core.c(234): Traversed 845 commits, 14920 blocks.
   [04/13/18 13:08:47] gc-core.c(183): Populating index for sub-repo f6fb3a9a.
   [04/13/18 13:08:47] gc-core.c(234): Traversed 76 commits, 32 blocks.
   [04/13/18 13:08:47] gc-core.c(341): Scanning and deleting unused blocks.
   [04/13/18 13:08:51] gc-core.c(364): GC finished. 80189 blocks total, about 166582 reachable blocks, 0 blocks are
   [04/13/18 13:08:51] gc-core.c(456): === GC is finished ===
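
To see where the time goes, the timestamps in a log like the one above can be turned into per-library durations. The following is a minimal Python sketch, assuming the log keeps exactly the format shown in this excerpt. The function name phase_durations and the regular expressions are illustrative and not part of Seafile.

```python
import re
from datetime import datetime

# Index-population lines copied from the log excerpt in this thread.
LOG = """\
[04/13/18 12:49:46] gc-core.c(181): Populating index for repo c81c66cf.
[04/13/18 13:00:17] gc-core.c(234): Traversed 27354 commits, 80738 blocks.
[04/13/18 13:00:17] gc-core.c(183): Populating index for sub-repo 50cfb566.
[04/13/18 13:00:26] gc-core.c(234): Traversed 2930 commits, 2852 blocks.
[04/13/18 13:00:26] gc-core.c(183): Populating index for sub-repo c19713ee.
[04/13/18 13:08:35] gc-core.c(234): Traversed 31512 commits, 68040 blocks.
[04/13/18 13:08:36] gc-core.c(183): Populating index for sub-repo edf4a201.
[04/13/18 13:08:47] gc-core.c(234): Traversed 845 commits, 14920 blocks.
[04/13/18 13:08:47] gc-core.c(183): Populating index for sub-repo f6fb3a9a.
[04/13/18 13:08:47] gc-core.c(234): Traversed 76 commits, 32 blocks.
"""

# Assumed line layout: "[MM/DD/YY HH:MM:SS] source(line): message"
LINE = re.compile(r"\[(?P<ts>[^\]]+)\]\s+\S+:\s+(?P<msg>.*)")
START = re.compile(r"Populating index for (?:sub-)?repo (\w+)")
DONE = re.compile(r"Traversed (\d+) commits, (\d+) blocks")


def phase_durations(log_text):
    """Return (repo_id, commits, blocks, seconds) for each indexed library."""
    results = []
    current = None  # (repo_id, start_time) of the library being indexed
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%m/%d/%y %H:%M:%S")
        started = START.search(m["msg"])
        finished = DONE.search(m["msg"])
        if started:
            current = (started.group(1), ts)
        elif finished and current:
            repo, t0 = current
            seconds = (ts - t0).total_seconds()
            results.append(
                (repo, int(finished.group(1)), int(finished.group(2)), seconds)
            )
            current = None
    return results


for repo, commits, blocks, seconds in phase_durations(LOG):
    print(f"{repo}: {commits} commits, {blocks} blocks, {seconds:.0f} s")
```

On this excerpt, the two libraries with roughly 27,000 and 31,000 commits account for 631 s and 489 s, which is almost the entire run. That suggests the index phase scales with the number of commits and objects traversed rather than with the library size in GB.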


That looks normal to me. It mostly depends on the disk speed and likely involves random I/O (so the number of files/blocks matters more than the total size).

Thanks for the reply.
I realized that the number of blocks matters more than the overall size, because another 40 GB library is processed in a few seconds.

But since seaf-gc has to run while Seafile is stopped, that is a long downtime. Even if it is run every night, the duration does not change.

How could I improve performance (without changing the hardware, obviously)?

Thanks again,

I run it once per month and at least for me that is more than sufficient. Also note that the professional edition can run the garbage collector while the server is online.

It depends on the hard drive speed and the processor speed. Upgrading both of those increased my GC and fsck speeds.

And it depends on which filesystem you use.

Using Multiple Threads in GC
Since Pro server 5.1.0, you can specify the number of threads used for GC. By default:

If the storage backend is S3/Swift/Ceph, 10 threads are started to do the GC work.
If the storage backend is a file system, only 1 thread is started.

You can specify the number of threads with the “-t” option. The “-t” option can be used together with all other options. Each thread will do GC on one library. For example, the following command will use 20 threads to GC all libraries: -t 20

Do you see a difference between filesystems? Should XFS be considered unsuitable?

I do not think this option is available in the Community edition.

I see high CPU I/O wait.
The disk is attached to a KVM virtual machine, so the performance cannot be optimal with the hardware we have.

Could another filesystem improve the performance of this operation?

You can try, but I don’t think it’ll significantly change things. The GC most likely accesses your disk randomly (which is what hard drives are bad at).
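
A back-of-envelope calculation shows why random access would dominate regardless of filesystem. This is only a sketch: the object count and the per-read latencies below are hypothetical round numbers, not measurements from this server, and the helper estimated_seconds is made up for illustration.

```python
def estimated_seconds(object_count, seconds_per_random_read):
    """Rough lower bound: one random read per small object, no caching."""
    return object_count * seconds_per_random_read


# Hypothetical figures, chosen only to show the order of magnitude.
OBJECTS = 100_000           # small commit/fs objects to read (assumed)
HDD_LATENCY = 0.010         # ~10 ms per random read on a spinning disk (assumed)
SSD_LATENCY = 0.0001        # ~0.1 ms per random read on an SSD (assumed)

hdd = estimated_seconds(OBJECTS, HDD_LATENCY)
ssd = estimated_seconds(OBJECTS, SSD_LATENCY)

print(f"spinning disk: about {hdd / 60:.0f} minutes")
print(f"SSD:           about {ssd:.0f} seconds")
```

Under these assumptions the per-read latency of the device sets the total time, and the filesystem on top of it changes comparatively little.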


I am reviving this thread to understand how GC works.
Is it incremental, or does it always take the same time?
If I run it once and it takes 10 minutes, will the second run be faster?
Or, if I run it every 24 hours rather than every month, does the performance change?

I think it always takes about the same time. How long it takes mostly depends on how many files there are on disk. As long as you aren’t short of space, I’d recommend running it no more than once a month.