Garbage Collector can't release space if the deleted file is large

Hey there!
I have a strange problem on one of my Seafile server instances (ver 6.0.5) running Ubuntu Server 16.04 LTS.
I have a library “Test” with history set to 1 day. I added a big (~500 MB) file to this library, deleted it, and emptied the library's trash bin. After running the Garbage Collector (GC), the unused blocks were not removed. If I disable history on this library and run GC, the unused blocks are removed. I only see this problem with big files (~500 MB); if I delete files of 3-10 MB and empty the trash bin, everything works normally and GC removes the unused blocks.

I run the previous version of Seafile server (ver 5.1.4) on another instance under Ubuntu Server 16.04 LTS, and I see the same problem there.

Can Seafile really not work normally with big files? Or can this be fixed through the Seafile server configuration?

Thanks!

That’s interesting to hear. Can anyone else confirm this? @daniel.pan
@AlterMax Can you test what happens after you delete this library? Will the space get freed then or not?

[quote=“marcusm, post:2, topic:904”]
Will the space get free then or not?
[/quote]

After deleting the library, emptying the trash bin in the admin panel, and running GC with the -r option, all unused blocks are removed normally.

This is because Seafile always keeps at least the last deleted file (or the last commit, if you know the Seafile data storage model).

The reason for this behaviour: suppose you set a library to keep history for 7 days, then delete a file today after not touching the library for 8 days. Keeping the last commit lets the user still recover the most recently deleted file.

I don’t think so… And my tests show that this is not the case.
I ran a few more tests and got the following results:

Test 1

  • created a new repo and set history = 1 day
  • added 20 files of 5 MB each to this repo
  • deleted all files from the repo and emptied the trash bin
  • measured repo folder size before GC: ac2bd385-05a1-4e49-bff8-58f1b9deb293 = 100 MB
  • launched GC:
    gc-core.c(440): GC version 1 repo GC_Test(ac2bd385-05a1-4e49-bff8-58f1b9deb293)
    gc-core.c(313): GC started. Total block number is 100.
    gc-core.c(46): GC index size is 1024 Byte.
    gc-core.c(327): Populating index.
    gc-core.c(181): Populating index for repo ac2bd385.
    gc-core.c(234): Traversed 3 commits, 10 blocks.
    gc-core.c(341): Scanning and deleting unused blocks.
    gc-core.c(364): GC finished. 100 blocks total, about 10 reachable blocks, 90 blocks are removed.
  • measured repo folder size after GC: ac2bd385-05a1-4e49-bff8-58f1b9deb293 = 10 MB
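For anyone who wants to repeat the "measured repo folder size" step without `du`, here is a minimal Python sketch. The function name `dir_size_bytes` is made up for illustration, and the example path in the comment is only an assumption about where a default installation keeps a library's blocks; adjust it to your own setup.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Hypothetical usage; the directory layout below is an assumption, not taken
# from this thread:
# size = dir_size_bytes("/opt/seafile/seafile-data/storage/blocks/<repo-id>")
# print(f"{size / 1024 / 1024:.0f} MB")
```

Running it once before and once after GC gives the same kind of before/after comparison as in the tests here.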

Test 2

  • created a new repo and set history = 1 day

  • added 1 file of 569 MB to this repo

  • deleted all files from the repo and emptied the trash bin

  • measured repo folder size before GC: ba20c238-821b-4f57-87f1-02460398b4f9 = 569 MB

  • launched GC:
    gc-core.c(440): GC version 1 repo GC_Test_500(ba20c238-821b-4f57-87f1-02460398b4f9)
    gc-core.c(313): GC started. Total block number is 569.
    gc-core.c(46): GC index size is 1024 Byte.
    gc-core.c(327): Populating index.
    gc-core.c(181): Populating index for repo ba20c238.
    gc-core.c(234): Traversed 2 commits, 569 blocks.
    gc-core.c(341): Scanning and deleting unused blocks.
    gc-core.c(364): GC finished. 569 blocks total, about 569 reachable blocks, 0 blocks are removed.

  • measured repo folder size after GC: ba20c238-821b-4f57-87f1-02460398b4f9 = 569 MB

  • changed the repo's history setting to not keep history

  • launched GC again:
    gc-core.c(440): GC version 1 repo GC_Test_500(ba20c238-821b-4f57-87f1-02460398b4f9)
    gc-core.c(313): GC started. Total block number is 569.
    gc-core.c(46): GC index size is 1024 Byte.
    gc-core.c(327): Populating index.
    gc-core.c(181): Populating index for repo ba20c238.
    gc-core.c(234): Traversed 1 commits, 0 blocks.
    gc-core.c(341): Scanning and deleting unused blocks.
    gc-core.c(364): GC finished. 569 blocks total, about 0 reachable blocks, 569 blocks are removed.

  • measured repo folder size after GC: ba20c238-821b-4f57-87f1-02460398b4f9 = 0 MB
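To compare GC runs like the ones above without reading the log by eye, the summary message can be pulled out with a small script. This is only a sketch: the regular expression is written against the "GC finished." message exactly as it appears in the logs quoted in this thread, so other Seafile versions may word it differently, and the name `parse_gc_summary` is made up for illustration.

```python
import re

# Pattern for the summary message as quoted in this thread's GC logs.
GC_SUMMARY = re.compile(
    r"GC finished\. (\d+) blocks total, about (\d+) reachable blocks, "
    r"(\d+) blocks are removed\."
)


def parse_gc_summary(log_text):
    """Return the block counts from the first GC summary found, or None."""
    match = GC_SUMMARY.search(log_text)
    if match is None:
        return None
    total, reachable, removed = map(int, match.groups())
    return {"total": total, "reachable": reachable, "removed": removed}


line = (
    "gc-core.c(364): GC finished. 569 blocks total, "
    "about 569 reachable blocks, 0 blocks are removed."
)
print(parse_gc_summary(line))
# {'total': 569, 'reachable': 569, 'removed': 0}
```

A result with `removed` equal to 0 right after emptying the trash is the symptom discussed in this thread.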


IMHO this behavior is very wasteful when storing large files with versioning in a repo.
I expected that emptying the trash bin would unconditionally remove the files from disk and free up space…


Hi all! I agree - did the same thing and had exactly the same bug.

Seafile server (ver 6.0.7) under Debian 8.7 Linux 3.16.0-4-amd64. Everything was deployed with an installation script https://github.com/haiwen/seafile-server-installer (MariaDB, Memcached and NGINX).

After removing the big file, you can try making some update to any file in that library. Then clean the trash and run GC. The file should be cleaned up.

I confirm the bug, and I confirm that creating a file, emptying the bin, and running the GC again fixes it.

Still not fixed after such a long time?

Seems to be the intended behavior and not a bug:


Still happens, and the file is not recoverable after cleaning the bin. This doesn’t make sense.

@daniel.pan / @Jonathan

As mentioned above: “After removing the big file, you can try making some update to any file in that library. Then clean the trash and run GC. The file should be cleaned up.”

The “previous commit” before “the last commit” is not cleaned.

I think the behaviour is acceptable.
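For readers who want to see why the workaround helps, here is a toy Python model of the behaviour described in this thread. It is a sketch under stated assumptions, not Seafile's actual GC code: it assumes that after the trash is emptied, GC keeps the blocks of the head commit plus the commit right before it (or only the head commit when history is disabled). The names `Commit` and `gc` are made up for illustration, and the model only reproduces the numbers from Test 2.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """One snapshot of a library: the block IDs its files reference."""
    blocks: frozenset = frozenset()


def gc(commits, stored_blocks, keep_history=True):
    """Toy mark-and-sweep returning (reachable, removed) block sets.

    Assumption taken from this thread, not from Seafile's source: with
    history enabled, the head commit and the commit before it are kept;
    with history disabled, only the head commit is kept.
    """
    kept = commits[-2:] if keep_history else commits[-1:]
    reachable = set().union(*(c.blocks for c in kept))
    removed = set(stored_blocks) - reachable
    return reachable, removed


big_file = frozenset(f"blk{i}" for i in range(569))
history = [Commit(big_file), Commit()]  # add the big file, then delete it

reachable, removed = gc(history, big_file)
print(len(reachable), len(removed))  # 569 0

# Workaround from this thread: make any further change, then run GC again.
history.append(Commit(frozenset({"small"})))
reachable, removed = gc(history, big_file | {"small"})
print(len(reachable), len(removed))  # 1 569
```

In this model the commit that added the big file stops being "the commit before the head" as soon as one more commit exists, which is why updating any file and running GC again frees the space.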
