Seafile Server Pro library size vs size on disk

I’m running Seafile Server Pro 11.0.16 in Docker on a Debian server, and running GC does not seem to free up the space I expect it to.

As an example, see this library, which should be a bit below 50 GB in size.

The history setting for this library is “no history” (unfortunately I can only post one screenshot).

I’ve run gc for this library:

root@345fc92d90d4:/opt/seafile/seafile-pro-server-11.0.16# ./seaf-gc.sh 09b3c116-f155-4a9c-bb65-ded4906375df

Starting seafserv-gc, please wait …
2025-01-07 13:22:18 gc-core.c(1135): Database is MySQL/Postgre/Oracle, use online GC.
2025-01-07 13:22:18 gc-core.c(1160): Using up to 1 threads to run GC.
2025-01-07 13:22:18 gc-core.c(1104): GC version 1 repo flashback database archive(09b3c116-f155-4a9c-bb65-ded4906375df)
2025-01-07 13:22:18 gc-core.c(776): GC started for repo 09b3c116. Total block number is 43509.
2025-01-07 13:22:18 gc-core.c(78): GC index size is 21754 Byte for repo 09b3c116.
2025-01-07 13:22:18 gc-core.c(403): Populating index for repo 09b3c116.
2025-01-07 13:22:19 gc-core.c(405): Populating index for sub-repo 10c3f3a0.
2025-01-07 13:22:19 gc-core.c(405): Populating index for sub-repo 960e272a.
2025-01-07 13:22:19 gc-core.c(405): Populating index for sub-repo 2c758285.
2025-01-07 13:22:19 gc-core.c(405): Populating index for sub-repo fcb0393b.
2025-01-07 13:22:19 gc-core.c(382): Traversed 5 commits, 77584 blocks for repo 09b3c116.
2025-01-07 13:22:19 gc-core.c(382): Traversed 1 commits, 613 blocks for repo fcb0393b.
2025-01-07 13:22:19 gc-core.c(382): Traversed 1 commits, 61 blocks for repo 2c758285.
2025-01-07 13:22:19 gc-core.c(382): Traversed 1 commits, 365 blocks for repo 960e272a.
2025-01-07 13:22:19 gc-core.c(382): Traversed 1 commits, 367 blocks for repo 10c3f3a0.
2025-01-07 13:22:19 gc-core.c(853): Scanning and deleting unused blocks for repo 09b3c116.
2025-01-07 13:22:19 gc-core.c(890): GC finished for repo 09b3c116. 43509 blocks total, about 78990 reachable blocks, 0 blocks are removed.

2025-01-07 13:22:19 gc-core.c(1212): === GC is finished ===

seafserv-gc run done

Done.

However, the actual bytes used on disk do not match the library size:

root@vps657:/opt/seafile-data/seafile/seafile-data/storage/blocks/09b3c116-f155-4a9c-bb65-ded4906375df# du -h -d0
188G .
root@vps657:/opt/seafile-data/seafile/seafile-data/storage/blocks/09b3c116-f155-4a9c-bb65-ded4906375df#

How can I get this disk space back?

Also, my installation contains both a /scripts/gc.sh script and a /opt/seafile/seafile-pro-server-11.0.16/seaf-gc.sh script.

What is the difference between these two?

Thanks in advance!

Hi,

Can you provide a screenshot of your library history? In the GC output I can see that 5 commits are traversed. You could temporarily set the history to a longer period to show more history commits.

Hi Jonathan, do you mean this?

Yes. Please change the history setting to keep 30 days of history, then provide a screenshot of the history again.

Hi Jonathan, I just did that, and it shows the same empty trash folder.

We have found the cause: the blocks are still referenced by the sub-repos (virtual repos). Sub-repos are created when you share or sync a sub-folder. Check the shared and synced sub-folders of this library and make some changes to the files in them; that should resolve the issue.

We’ll also improve the code to prevent it in the future.
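The explanation above can be sketched as a reachability sweep: a block is garbage only if no head commit, of the library itself or of any of its virtual sub-repos, still references it. A minimal Python model (illustrative only, not Seafile’s actual code or data structures):

```python
# Minimal sketch of reachability-based GC, illustrative only (not
# Seafile's actual code): a block survives as long as ANY head commit,
# of the library or of one of its virtual sub-repos, still references it.

def collect_garbage(blocks, head_commit_refs):
    """blocks: dict block_id -> data; head_commit_refs: one set of
    referenced block ids per head commit (library + sub-repos)."""
    reachable = set()
    for refs in head_commit_refs:
        reachable |= refs
    removed = [block_id for block_id in blocks if block_id not in reachable]
    for block_id in removed:
        del blocks[block_id]
    return removed

# The library itself no longer references "b2"...
library_refs = {"b1", "b3"}
# ...but a virtual sub-repo (a shared or synced sub-folder) still does,
# so GC must keep it even though the library's own history dropped it.
subrepo_refs = {"b2"}

store = {"b1": b"x", "b2": b"x", "b3": b"x", "b4": b"x"}
removed = collect_garbage(store, [library_refs, subrepo_refs])
# Only "b4" is unreferenced by every head commit.
```

This is why the first GC run above reported 0 blocks removed even though the library’s own history was set to “no history”: the sub-repos still counted as live roots.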

Hi Jonathan,

Thanks for the in depth search and information.

How can we see which sub-folders are shared or synced from the back-end?

Hi Jonathan, I’d like to add some additional information which might help you.

I wasn’t able to find an overview of all synced and shared sub-folders, but after a lot of digging in the system admin → logs view I was able to find a user who was syncing a sub-folder.

Upon investigation of that user’s Seafile client, that sub-folder was not visible in the client (under synced libraries). However, he did have files stored locally matching what we saw in the back-end logs.

I then added a file to that sub-folder of the library, and the file was synced to that user’s local drive, proving that the client was indeed syncing that sub-folder. From that moment on, the synced sub-folder was visible in that user’s client under synced libraries.

Now that these synced sub-folders were shown in the client, we unsynced all the sub-folders this client was syncing (all in this same library), and then I ran another GC for this library; see the log below:

Seafile Pro: Perform online garbage collection.

Starting seafserv-gc, please wait …

2025-01-10 09:55:02 gc-core.c(1135): Database is MySQL/Postgre/Oracle, use online GC.
2025-01-10 09:55:02 gc-core.c(1160): Using up to 1 threads to run GC.
2025-01-10 09:55:02 gc-core.c(1104): GC version 1 repo flashback database archive(09b3c116-f155-4a9c-bb65-ded4906375df)
2025-01-10 09:55:02 gc-core.c(776): GC started for repo 09b3c116. Total block number is 37363.
2025-01-10 09:55:02 gc-core.c(78): GC index size is 18681 Byte for repo 09b3c116.
2025-01-10 09:55:02 gc-core.c(403): Populating index for repo 09b3c116.
2025-01-10 09:55:15 gc-core.c(405): Populating index for sub-repo 10c3f3a0.
2025-01-10 09:55:15 gc-core.c(405): Populating index for sub-repo 960e272a.
2025-01-10 09:55:15 gc-core.c(405): Populating index for sub-repo 2c758285.
2025-01-10 09:55:15 gc-core.c(405): Populating index for sub-repo fcb0393b.
2025-01-10 09:55:15 gc-core.c(382): Traversed 5 commits, 68169 blocks for repo 09b3c116.
2025-01-10 09:55:15 gc-core.c(382): Traversed 1 commits, 613 blocks for repo fcb0393b.
2025-01-10 09:55:15 gc-core.c(382): Traversed 1 commits, 61 blocks for repo 2c758285.
2025-01-10 09:55:15 gc-core.c(382): Traversed 1 commits, 365 blocks for repo 960e272a.
2025-01-10 09:55:15 gc-core.c(382): Traversed 1 commits, 367 blocks for repo 10c3f3a0.
2025-01-10 09:55:15 gc-core.c(853): Scanning and deleting unused blocks for repo 09b3c116.
2025-01-10 09:55:15 gc-core.c(890): GC finished for repo 09b3c116. 37363 blocks total, about 69575 reachable blocks, 1954 blocks are removed.
2025-01-10 09:55:15 gc-core.c(1212): === GC is finished ===
seafserv-gc run done

However, running exactly the same GC again immediately afterwards gave this log:

Seafile Pro: Perform online garbage collection.

Starting seafserv-gc, please wait …

2025-01-10 09:55:21 gc-core.c(1135): Database is MySQL/Postgre/Oracle, use online GC.
2025-01-10 09:55:21 gc-core.c(1160): Using up to 1 threads to run GC.
2025-01-10 09:55:21 gc-core.c(1104): GC version 1 repo flashback database archive(09b3c116-f155-4a9c-bb65-ded4906375df)
2025-01-10 09:55:21 gc-core.c(776): GC started for repo 09b3c116. Total block number is 35409.
2025-01-10 09:55:21 gc-core.c(78): GC index size is 17704 Byte for repo 09b3c116.
2025-01-10 09:55:21 gc-core.c(403): Populating index for repo 09b3c116.
2025-01-10 09:55:22 gc-core.c(405): Populating index for sub-repo 10c3f3a0.
2025-01-10 09:55:22 gc-core.c(405): Populating index for sub-repo 960e272a.
2025-01-10 09:55:22 gc-core.c(405): Populating index for sub-repo 2c758285.
2025-01-10 09:55:22 gc-core.c(405): Populating index for sub-repo fcb0393b.
2025-01-10 09:55:22 gc-core.c(382): Traversed 5 commits, 68169 blocks for repo 09b3c116.
2025-01-10 09:55:22 gc-core.c(382): Traversed 1 commits, 613 blocks for repo fcb0393b.
2025-01-10 09:55:22 gc-core.c(382): Traversed 1 commits, 61 blocks for repo 2c758285.
2025-01-10 09:55:22 gc-core.c(382): Traversed 1 commits, 365 blocks for repo 960e272a.
2025-01-10 09:55:22 gc-core.c(382): Traversed 1 commits, 367 blocks for repo 10c3f3a0.
2025-01-10 09:55:22 gc-core.c(853): Scanning and deleting unused blocks for repo 09b3c116.
2025-01-10 09:55:23 gc-core.c(890): GC finished for repo 09b3c116. 35409 blocks total, about 69575 reachable blocks, 714 blocks are removed.
2025-01-10 09:55:23 gc-core.c(1212): === GC is finished ===
seafserv-gc run done

I kept repeating the same GC run, which removed 328 blocks, then 154, 66, 32, 13, 10, 3, and finally 0.

It stayed at 0 after that.

Currently du still reports 128683600 (1K blocks, roughly 123 GB) for this library while it contains 41.9 GB of data, so it’s a lot better, but the issue isn’t fully fixed yet.

So, a few questions arise:

  1. How can we see which users have synced or shared sub-folders, across all our libraries?
  2. Why did the number of sub-repos shown in the GC log not change after we unsynced 3 sub-folders in this library?
  3. Why are these synced sub-folders not always visible in the client?
  4. Why does a single GC run not remove all unused blocks?
  5. How many GC runs are needed to be sure all unused blocks are removed?

Thanks so far!

A quick way to find all shared or synced sub-folders of a library is to run this SQL in seafile_db:

select repo_id, path from VirtualRepo where origin_repo='09b3c116-f155-4a9c-bb65-ded4906375df'

Unsyncing the sub-folder doesn’t help with GC by itself; you have to make some changes to the files in that sub-folder. I think GC made some progress (releasing a few thousand blocks) because you made some changes in that sub-folder.

Maybe the sub-folders were synced with a very old version of the Seafile client.

GC usually can’t remove all unused blocks in a single run. The algorithm is probabilistic; it just removes most of the unused blocks in each run.

It’s not about how many GC runs you have to do. You just need to find all 5 shared or synced sub-folders and make some changes in them.
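The “probabilistic” behaviour is consistent with the GC log above, where the index is only a few bits per block (21754 bytes for 43509 blocks). A toy Bloom filter, assumed here purely for illustration, shows why such a compact index removes most but not all garbage in one pass: false positives make some unused blocks look reachable.

```python
# Toy Bloom filter illustrating why a probabilistic GC index removes
# most, but not necessarily all, unused blocks in one run: a false
# positive makes a garbage block look "reachable". Illustration only;
# this is not Seafile's actual index implementation.
import hashlib

class Bloom:
    def __init__(self, bits, hashes):
        self.bits = [0] * bits
        self.hashes = hashes

    def _positions(self, item):
        # Derive `hashes` deterministic bit positions per item.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % len(self.bits)

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def maybe_contains(self, item):
        # No false negatives, but false positives are possible.
        return all(self.bits[pos] for pos in self._positions(item))

live = [f"live-{i}" for i in range(500)]
garbage = [f"garbage-{i}" for i in range(500)]

bf = Bloom(bits=1024, hashes=2)  # deliberately small -> false positives
for block_id in live:
    bf.add(block_id)

# Garbage blocks that the index wrongly reports as reachable survive
# this GC run; the rest are removed.
survivors = [b for b in garbage if bf.maybe_contains(b)]
removed = [b for b in garbage if not bf.maybe_contains(b)]
```

With a fresh index built on each run (and the block population shrinking every time), repeated runs converge toward zero leftover garbage, matching the 328, 154, 66, … sequence observed above.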

Thanks again for the detailed information.

A quick way to find out all shared or synced sub-folders is to run this SQL in seafile_db:

So once I know the IDs of those repositories, how can I find which users have shared and/or synced something?
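One possible way to map those sub-repo IDs to users is to join VirtualRepo with the RepoOwner table in seafile_db. The table and column names here are assumed from a typical Seafile schema and should be verified against your installation first; the join is demonstrated below against an in-memory SQLite mock rather than the real MySQL database.

```python
# Sketch: join VirtualRepo with RepoOwner to see which user owns each
# shared/synced sub-folder. Demonstrated against an in-memory SQLite
# mock of the two tables; table and column names are assumptions based
# on a typical Seafile schema and must be checked against the real
# seafile_db before use.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VirtualRepo (repo_id TEXT, origin_repo TEXT, path TEXT);
CREATE TABLE RepoOwner (repo_id TEXT, owner_id TEXT);
INSERT INTO VirtualRepo VALUES ('10c3f3a0', '09b3c116', '/projects');
INSERT INTO RepoOwner VALUES ('10c3f3a0', 'alice@example.com');
""")

rows = conn.execute("""
    SELECT v.repo_id, v.path, o.owner_id
    FROM VirtualRepo v
    JOIN RepoOwner o ON o.repo_id = v.repo_id
    WHERE v.origin_repo = '09b3c116'
""").fetchall()
# rows -> [('10c3f3a0', '/projects', 'alice@example.com')]
```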

Unsyncing the sub-folder doesn’t help with GC. You have to make some changes to the files in that sub-folder. I think because you made some changes in that sub-folder, GC made some progress (releasing a few thousands blocks).

Can this be done without using that user’s client?
In this case we’ve unsynced these 3 folders, so we can no longer make changes to those sub-folders from the client. Does this mean we cannot get rid of these sub-repos anymore?

You just need to find out the owner of the library and ask him/her to make some changes in these sub-folders.

Making changes from the web interface also works.

I faced the same issue.

Unfortunately, looking in the database raised more questions than answers: the VirtualRepo table contains quite a few entries that appear to be orphaned.

We tried to find out which user created each VirtualRepo entry, but we weren’t sure where to locate that information.

As a further test, I synced a sub-folder of one of the libraries, which created a new record in VirtualRepo. However, this synced sub-folder is not visible in the web interface, so I can’t make changes to it from there.

After unsyncing this temporary sub-folder, the record that had been created in VirtualRepo was not removed.

In the end we completely removed this library (09b3c116-f155-4a9c-bb65-ded4906375df), re-created it, and re-uploaded the data. That was possible in this case because file history isn’t important for this library. We did this because I had already burned too much time on it without getting any closer to an actual solution.

However, as stated above, I think there are still quite a few entries in our database that appear to be orphaned.

Earlier you mentioned you are making a change to prevent this in the future; will that change also clean up the referenced blocks that are already there?

Is there a way to get rid of these orphaned records in the database without having to figure out which user created each virtual repo, logging in as that user, and so on?
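The thread leaves this question open. One cautious, read-only starting point is to list VirtualRepo entries whose repo_id no longer exists as a live repo; this is a sketch under the assumption that live repos, including virtual ones, have a row in seafile_db’s Repo table, which should be verified against your installation before acting on any results (and do not delete rows without a backup). Demonstrated against an in-memory SQLite mock:

```python
# Read-only sketch: list VirtualRepo rows whose repo_id no longer
# appears in the Repo table, as CANDIDATES for orphaned entries.
# Table/column names are assumed from a typical Seafile schema and
# must be verified before running anything against the real database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Repo (repo_id TEXT);
CREATE TABLE VirtualRepo (repo_id TEXT, origin_repo TEXT, path TEXT);
INSERT INTO Repo VALUES ('10c3f3a0');
INSERT INTO VirtualRepo VALUES ('10c3f3a0', '09b3c116', '/active');
INSERT INTO VirtualRepo VALUES ('deadbeef', '09b3c116', '/stale');
""")

orphans = conn.execute("""
    SELECT v.repo_id, v.path FROM VirtualRepo v
    LEFT JOIN Repo r ON r.repo_id = v.repo_id
    WHERE r.repo_id IS NULL
""").fetchall()
# orphans -> [('deadbeef', '/stale')]
```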