Recovering 100,000+ deleted files from trash

I have a library in Seafile (Community) with full history turned on that stores over 100,000 files. I want to recover the deleted files, but both the Seahub UI and the API time out after spending several minutes trying to populate the trash. I did increase the timeout on the Nginx proxy server in front of Seafile and that resolved the (1 minute?) timeout to the API, but I stopped the program after waiting 30+ minutes for a response for listing the trash contents. Seafile is just too slow at this scale.

The server is capable… Ryzen 7 5700X with 64GB of RAM + a PCIe 4.0 NVMe SSD.

I don’t actually have to restore the trash/deleted files, but that would be ideal. At a minimum, I need a list of all the file names that were deleted in the last 30 days or so.

I have full access to the server, so I have GUI and API access to Seafile, as well as full admin access to the underlying MariaDB server. I tried looking in MariaDB to see if I could get a list of the deleted files, but none of the tables appear to contain this data.

The core issue that I need assistance with is “how to use Seafile when the number of files is really, really, really large.”

Do you use version 12.0?

You can also use the library history feature to see the snapshots of the library and restore deleted files there. The library history page does not depend on the number of files.

It’s on 11.0.13, 12.0 wasn’t quite out yet when I set this up. I do see that 12 keeps a record of deleted files in the DB, but I’m not sure that I would retroactively get this benefit by upgrading to 12.

Library history does load quickly in the UI, but there are so many files that are added and deleted even on a weekly basis that getting back to the data I need is cumbersome.

The web API is version 2.1. api/v2.1/repos/repo_id/trash/ just hangs for 30+ minutes, even if show_days is “1”.

Since api/v2.1/repos/repo_id/history/ supports pagination, I have a Python script paginating through to the beginning and I will write another program to hopefully populate a list of files that were deleted over a given period of time. This would at least give me a manifest of the deleted files’ names (which, again, will suffice). I may be able to write another program to pull file history of each using some ID value for each file? I’m not sure.
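A minimal sketch of that pagination loop, for illustration only. It assumes the history endpoint accepts `page` and `per_page` query parameters, returns JSON with a `data` list and a `more` flag, and that each commit carries a `description` string mentioning deletions; none of that is confirmed in this thread, so check it against a real response first. `fetch_page`, `iter_commits` and `deletion_commits` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_page(base_url, repo_id, token, page, per_page=100):
    """Fetch one page of library history (URL layout and auth header are assumptions)."""
    url = f"{base_url}/api/v2.1/repos/{repo_id}/history/?page={page}&per_page={per_page}"
    req = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def iter_commits(get_page, max_pages=100_000):
    """Yield commits page by page until the server reports no more pages.

    get_page is any callable that takes a page number and returns the decoded
    JSON body, assumed here to look like {"data": [...], "more": bool}.
    """
    for page in range(1, max_pages + 1):
        body = get_page(page)
        commits = body.get("data", [])
        yield from commits
        if not commits or not body.get("more"):
            break


def deletion_commits(commits):
    """Keep commits whose description mentions a deletion (assumed wording)."""
    keywords = ("deleted", "removed")
    return [
        c for c in commits
        if any(k in (c.get("description") or "").lower() for k in keywords)
    ]
```

Usage would look something like `deletion_commits(iter_commits(lambda p: fetch_page(base_url, repo_id, token, p)))`, with the three values supplied by you. Passing the page fetcher in as a callable keeps the loop testable without a live server.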

I noticed that after paginating the /trash endpoint a few times it just times out. htop reports 100% cpu usage for the thread (a new one is created each time I restart the app to try to fetch these downloaded documents). The commit ID it hangs on has “only” 4000 file changes. Eventually the request returns 502.

I’m guessing there’s nothing I can do to recover these files.

@daniel.pan does fsck support exporting deleted files or will it skip those?

@jspinella if these files are important to you, do not run the garbage collector until you have sorted this out. Even if fsck export does not support this, you could follow the procedure below if the files are critical to you (it takes some effort).

Provided you have a copy/backup of the current version of that library, you could search the library's commits folder on the server's disk for commits from the timeframe in which the files were deleted (by creation date). Then go to the database and change the current commit for the library to a commit from slightly before the files were deleted (the ID is a combination of the folder name and the commit filename). The library will then show its state from before the files were deleted. Download them, then revert the current commit to the value it had before you changed it. You can now copy the files into the synced folder on the client side, and they should reappear in the Seafile UI as well.
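The first step, finding candidate commits by date, can be sketched roughly as follows. This assumes the library's commits folder contains one level of subfolders with the commit files inside them, and it uses file modification time as a stand-in for creation date, since creation time is not reliably available on every filesystem. The folder path is whatever your installation uses; `commits_in_window` is a hypothetical helper name, and the script only reads the directory, it changes nothing.

```python
from datetime import datetime, timezone
from pathlib import Path


def commits_in_window(commits_dir, start, end):
    """Return (mtime, commit_id) pairs for commit files modified within [start, end].

    The commit ID is rebuilt as subfolder name + file name, as described above.
    start and end must be timezone-aware datetimes.
    """
    found = []
    for sub in Path(commits_dir).iterdir():
        if not sub.is_dir():
            continue
        for f in sub.iterdir():
            if not f.is_file():
                continue
            mtime = datetime.fromtimestamp(f.stat().st_mtime, tz=timezone.utc)
            if start <= mtime <= end:
                found.append((mtime, sub.name + f.name))
    return sorted(found)  # oldest first
```

The oldest entries in the result are the ones to look at when picking a commit from just before the deletions.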

FSCK does not support exporting deleted files.

That sounds like a very clever but complicated solution, and I’m bound to screw it up without more detailed instructions, but thank you for taking the time to reply.

I was running low on disk space, so I had to run garbage collection, and I disabled file versioning, so the trash/file revision history is gone. That’s ok. Better to find out Seafile’s limitations now than in a bona fide emergency.