Garbage Collector questions

Hey Folks,

I just started to set up the GC and have some questions:

The official Manual states this first:

The GC program cleans up two types of unused blocks:

Blocks that no library references to, that is, the blocks belong to deleted libraries;
If you set history length limit on some libraries, the out-dated blocks in those libraries will also be removed.

To me, the first statement sounds like only deleted libraries are collected, but not deleted files. This might just be slightly wrong wording, as I would expect deleted files to be removed as well.

The second statement I don’t understand at all. It sounds like file versions older than 30 days would be deleted in libraries that have a history limit of 30 days?

Then a couple of lines later the manual states this:

As described before, there are two types of garbage blocks to be removed. Sometimes just removing the first type of blocks (those that belong to deleted libraries) is good enough. In this case, the GC program won’t bother to check the libraries for outdated historic blocks. The “-r” option implements this feature:

seaf-gc.sh -r

Again, this sounds like only deleted libraries are removed. But what about files I delete?

And then there is the -r parameter: does it add to the default behaviour, or do I have to run GC both with and without -r?

Can someone explain this to me more clearly?

Thanks a lot
Michael

seaf-gc.sh will remove deleted libraries and files.

seaf-gc.sh -r will remove deleted libraries, deleted files and file versions older than your history settings. So if you set your history to 30 days it will remove all file versions older than 30 days.

I always run seaf-gc.sh -r since it removes all unnecessary junk.

BR
Aco

Many thanks, mate!

I’m actually not sure the answer is correct. The manual says that -r does not clean up old versions; it only removes deleted libraries. Running it on my server confirms this: seaf-gc.sh -r only removes deleted libraries.

Plain seaf-gc.sh, on the other hand, cleans up everything.

Maybe it helps to look at seaf-gc.sh in some detail.

Most likely it’s only a wrapper for some C/C++ program.

The manual is quite clear: use -r if you want the garbage collector to not bother checking time constraints for library blocks.

I think it says the complete opposite:

“Sometimes just removing the first type of blocks (those that belong to deleted libraries) is good enough. In this case, the GC program won’t bother to check the libraries for outdated historic blocks. The “-r” option implements this feature”

Yes, it says that with the -r option it will only remove deleted libraries, and that is what it does.

Okay guys, I understand that this is not very interesting. Yet I want to get to an answer while this topic is still “warm”.

So after 3 months of using Seafile, my storage filled up to 1.8 TB (while the web interface says ~770 GB used).

  1. After running seaf-gc.sh, storage utilization went down to 940 GB.
  2. After running seaf-gc.sh -r, storage utilization went down to 773 GB.
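
To put numbers on the two runs, here is a small shell sketch of the arithmetic. It assumes that 1.8 TB is roughly 1800 GB; the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures from the post, in GB. Treating 1.8 TB as 1800 GB is an assumption.
before=1800       # storage used before any GC run
after_full=940    # after running seaf-gc.sh
after_r=773       # after additionally running seaf-gc.sh -r

# Space reclaimed by each run.
freed_full=$((before - after_full))
freed_r=$((after_full - after_r))
freed_total=$((before - after_r))

echo "freed by seaf-gc.sh:    ${freed_full} GB"
echo "freed by seaf-gc.sh -r: ${freed_r} GB"
echo "total freed:            ${freed_total} GB"
```

With these figures the first run freed about 860 GB and the -r run a further 167 GB, which brings the disk usage close to the ~770 GB shown in the web interface.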

This is quite nice, yet it raises more questions:

a) What happens to deleted files/folders in libraries without version history? Are they not really deleted but just moved into the user’s trash? Do users have to go to the web interface and finally delete them there?

b) What happens to file versions that have fallen out of the library’s version history? Are they just marked so that GC deletes them on its next run?

At this point I notice that, if the admin does not run GC, Seafile would grow and grow without ever really removing from storage anything that a) users delete or b) runs out of history. This is very important to understand, so why isn’t GC part of the main installation manual, or at least linked from it with a big red exclamation mark and an IMPORTANT notice?

As I would like to run one command via cron every Saturday or Sunday, it still remains uncertain whether seaf-gc.sh -r also covers what seaf-gc.sh does, so that I can just use seaf-gc.sh -r as an all-in-one solution.

One of the first comments here said exactly that seaf-gc.sh -r does it all; maybe that guy was correct.
Still, I don’t really feel comfortable not knowing exactly what to use. And I cannot roll back my server, start with seaf-gc.sh -r, and see if usage goes down to 773 GB right away. That test would have been nice, but it’s not possible for me anymore.

Hope you guys have the real answer to this.
kind regards,
Michael

With Seafile, nothing is actually deleted from storage without the GC.

GC figures out by itself what can be removed.

I don’t understand what you mean by this.

seaf-gc.sh -r only removes deleted libraries. seaf-gc.sh without -r cleans up everything that can be cleaned up.

That is wrong. It only removes deleted libraries.

Yep, I was wrong.

seaf-gc.sh -r removes only deleted libraries, and seaf-gc.sh without the -r switch deletes everything that can be deleted.

Sorry for the inconvenience.

BR
Babiloni

So to conclude: in a proper environment, as an admin you want to run two cron jobs, one without -r (maybe more often) and one with -r (maybe less often).

I will run both, one after the other.

Thanks to all for their contribution

Running with -r should never be required, and running without -r takes much more time, so I wouldn’t run it too often. How often it is necessary depends on how much data is being changed, added and deleted, and on how much space you have.

On my server it runs once a month, and that interval works quite well.
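
A monthly run like this could be scheduled with a small wrapper script, roughly as sketched below. The install path, the log path and the script name are assumptions, not something taken from the manual, so adjust them to your setup. Depending on edition and version, the server may also need to be stopped before GC; that step is deliberately left out here, so check the manual for your installation.

```shell
#!/bin/sh
# Sketch of a GC wrapper for cron. Both paths below are assumptions.
SEAFILE_DIR="${SEAFILE_DIR:-/opt/seafile/seafile-server-latest}"
GC_LOG="${GC_LOG:-/var/log/seafile-gc.log}"

# Run a full GC pass (no -r) and append its output to the log.
run_gc() {
    echo "=== GC started: $(date) ===" >> "$GC_LOG"
    "$SEAFILE_DIR/seaf-gc.sh" >> "$GC_LOG" 2>&1
    rc=$?
    echo "=== GC finished with exit status $rc ===" >> "$GC_LOG"
    return "$rc"
}

# Only start GC when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    run_gc
fi

# Example crontab entry (03:00 on the 1st of each month); the script path
# is hypothetical:
# 0 3 1 * * /usr/local/bin/seafile-gc-monthly.sh run
```

Running the full pass without -r matches the conclusion of this thread; the log lines make it easy to check afterwards whether the job actually ran and how it exited.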