Get the garbage collector to delete history files

I changed the history setting from 90 days to 10 days, expecting that this would free up about 1 TB of this user's data (the user saves about 36 GB of changed data every day and deletes the old data).

So I figured I'd run the cleanup script from the manual (https://manual.seafile.com/maintain/seafile_gc.html) and get some space back, but it doesn't free anything.
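
For reference, this is roughly what I ran (a sketch; paths assume the usual layout where the scripts live under seafile-server-latest, and on the community edition the server has to be stopped while GC runs):

```
cd /opt/seafile/seafile-server-latest   # adjust to your installation path
./seahub.sh stop                        # stop the web front end
./seafile.sh stop                       # GC must not run while the server is up (CE)
./seaf-gc.sh                            # scan all libraries and remove garbage blocks
./seafile.sh start
./seahub.sh start
```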

Does anybody have an idea how to get the old history data removed?

Kind Regards
Gerard

You need to run it twice to remove those blocks, if I remember correctly.
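
Something like this, if that is indeed the behaviour (a sketch, same standard script location assumed):

```
./seaf-gc.sh   # first pass
./seaf-gc.sh   # second pass, per the suggestion above
```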

I think this only applies to new files. The only option is to back up, delete, and re-upload the files, then run the GC.

Are you running it with the “-r” flag as required?

“As described before, there are two types of garbage blocks to be removed. Sometimes just removing the first type (those belong to deleted libraries) of unused blocks is good enough. In this case, the GC program won’t bother to check the libraries for outdated historic blocks. The “-r” option implements this feature:”
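
A sketch of the two invocations that excerpt distinguishes (run from the directory containing the scripts, as usual):

```
./seaf-gc.sh      # full GC: deleted libraries plus outdated historic blocks
./seaf-gc.sh -r   # only blocks of deleted libraries; the history check is skipped
```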

Hi, thanks.
Very well pointed out; this could easily have been the problem. But it isn't… :frowning_man:
I am running the script as user root WITH the -r option at the end of the line, and I have run it several times, actually.

I've set the history to 10 days, and every day about 36 GB is written by this user, so the 1.5 TB should shrink to at most roughly 400 GB (10 days × 36 GB ≈ 360 GB of history, plus the live data), but that doesn't happen.
When I log on as this user, I DO see only 10 days of history, while earlier today, when the history setting was still 90 days, I saw MUCH MORE history.
So the history setting seems to have taken effect…

Kind Regards
Gerard

OK, if this is the case, I won't be able to clean up data without deleting this user's whole repository?
That seems to me like something that should be fixed by the developers?

Kind Regards
Gerard

This is not correct. If you believe it is, please provide your source.

I just checked on my end: I set a library that I have always had at “Keep full history” to “No history”, ran garbage collection, and it trimmed the blocks folder for that library down to match exactly what I have in there right now (from about 20 GB to 16 GB).
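
If you want to reproduce this on a single library rather than all of them, GC also accepts library IDs as arguments; the ID below is a placeholder (it's the UUID you see in the library's URL):

```
./seaf-gc.sh <repo-id>   # run GC on this library only; <repo-id> is a placeholder
```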

If you can wait ~45 days, the files will become eligible for GC.

Where are you getting 45 days from? He said he has his history set to 10 days.

If you use MariaDB, all the old files are marked for GC, but you can't manipulate the time before they can be completely deleted. If you recreate this scenario, you will see that the expiration dates don't change.

Because I thought that half of the time could already be over.

Of course, one option is to disable the file history, run the GC, and then set it back to a ten-day expiration.
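
Roughly this sequence (the history setting itself is changed per library in the web UI; only the GC step is a command):

```
# 1. Web UI: set the library's history to "No history"
./seaf-gc.sh   # 2. remove the now-unreferenced historic blocks
# 3. Web UI: set the history back to 10 days
```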

I’m not following.

Can you clarify what you mean by “dates for expiration”?

I meant the expiry date. Is that the wrong spelling? Google knows it.

No, I was confused, but I think I understand what you are saying: my test worked because I disabled history altogether, but if my history expiration had been 90 days instead of “Full History” and I had then changed it to 1 day, it wouldn't have worked until all the files uploaded while the library was set to 90 days had their historical copies pruned after the original 90 days.

I wonder if this is the same for files uploaded to a library that has Full History set: can their historical versions never be pruned if you later change it to something other than Full History?

I don't really know because I have never used this, but I wouldn't code this into the tables; I would use a whitelist for better DB performance on heavily used systems. But maybe all wrong expiration dates will be corrected after a restart.

Edit: A whitelist is even better for different dates in different libraries.

Alright, I just tested this scenario as well with another library I had, and it worked as expected: garbage collection cleared out all blocks that held data for historical copies of files older than 1 day.

With that in mind, it doesn't make sense for GC to have issues with a library whose history setting was changed from one number of days to another.

I have no libraries I can test that scenario with though.

Can you run Seafile fsck on the large library, just to make sure there is no corruption? (Corruption can lead to unreachable/invisible history.)
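
Something like this, with the library ID as a placeholder (do the read-only check first; only add --repair if problems are actually reported):

```
./seaf-fsck.sh <repo-id>            # read-only integrity check of one library
./seaf-fsck.sh --repair <repo-id>   # only if corruption was reported
```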

And what did seaf-gc report? It should give some output for each library (among other things, how many blocks there are and how many were removed).
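
A dry run prints those numbers without removing anything, which makes a before/after comparison easy (assuming the same script location as above):

```
./seaf-gc.sh --dry-run   # report garbage block counts per library, delete nothing
```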

I got lost in your discussion, but I needed the space very much,
so I followed the advice (not to wait for xx days) and deleted the complete repository.

Thanks for your advice.

Kind Regards
Gerard