We currently have a real-time backup setup in which a Seafile Server pod on Kubernetes is backed up to another pod in a different cluster. The backup has had no issues with most of the libraries, but kept failing with errors on one of the bigger ones (2.7 TB). The errors were the following:
[03/12/2021 08:03:02 AM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Sync init] success, transition to [Get head commit id].
[03/12/2021 08:03:02 AM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Get head commit id] success, transition to [Get diff commit ids].
[03/12/2021 08:03:04 AM] http-tx-mgr.c(1971): Failed to sync repo 58946ebf, error in [Sync error].
[03/12/2021 08:03:32 AM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Sync init] success, transition to [Get head commit id].
[03/12/2021 08:03:32 AM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Get head commit id] success, transition to [Get diff commit ids].
[03/12/2021 08:03:34 AM] http-tx-mgr.c(1971): Failed to sync repo 58946ebf, error in [Sync error].
For other libraries, we had fixed this by running seaf-fsck.sh on the affected library on the primary server and, if it reported no errors, forcing a sync (a rough sketch of the commands is included after the log excerpts below). So after running seaf-fsck.sh on this library, I forced a sync and got the following in the logs:
cat logs/seafile.log | grep 58946ebf:
[03/12/2021 09:46:54 AM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Sync init] success, transition to [Get head commit id].
[03/12/2021 09:46:54 AM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Get head commit id] success, transition to [Get diff commit ids].
[03/12/2021 09:47:16 AM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Get diff commit ids] success, transition to [Get diff commits].
[03/12/2021 01:59:10 PM] http-tx-mgr.c(1984): Repo 58946ebf sync status: [Get diff commits] success, transition to [Get fs].
cat logs/slow_logs/seafile_slow_storage.log | grep 58946ebf:
2021/03/12:09:47:36 - commits - write - 58946ebf-d080-46de-971f-52e9401ba79c - e9cf614225a9187b38bd143a4c61651b8f2ce0a6 - 3.167
2021/03/12:09:47:40 - commits - write - 58946ebf-d080-46de-971f-52e9401ba79c - 896bdf05b837c5462bdf386a0e36aa5c9bcc1c98 - 5.000
2021/03/12:09:47:45 - commits - write - 58946ebf-d080-46de-971f-52e9401ba79c - 9e29fe2a04f8b6ed67d7696a655c4920bbe68549 - 5.000
[…]
2021/03/12:13:58:55 - commits - write - 58946ebf-d080-46de-971f-52e9401ba79c - 3db652c7cf9f9c5222584d129236b1b9645bf0a9 - 5.000
2021/03/12:13:59:00 - commits - write - 58946ebf-d080-46de-971f-52e9401ba79c - 69f652d1301a198bb03fe3e8340c08403956e903 - 4.927
2021/03/12:13:59:05 - commits - write - 58946ebf-d080-46de-971f-52e9401ba79c - 23952b36ce9d12a17e4cdf28cdd96179cbfeb1a5 - 5.000
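For reference, this is roughly the procedure I used (a minimal sketch; the paths assume a standard install under /opt/seafile, and I am not 100% sure the sync subcommand of seaf-backup-cmd.sh is spelled the same way in every version):

# On the primary server: check the library for corruption.
# Without --repair, seaf-fsck.sh only reports problems and changes nothing.
cd /opt/seafile/seafile-server-latest
./seaf-fsck.sh 58946ebf-d080-46de-971f-52e9401ba79c

# On the backup server: force a sync of that library, then check its status.
cd /opt/seafile/seafile-server-latest
./seaf-backup-cmd.sh sync 58946ebf-d080-46de-971f-52e9401ba79c
./seaf-backup-cmd.sh status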
Neither log shows any errors, but there has been no mention of that library in the logs since that date, yet the backup is still reported as ongoing (it has now been running for 10 days):
./seafile-server-latest/seaf-backup-cmd.sh status
Total number of libraries: 175
Number of synchronized libraries: 173
Number of libraries waiting for sync: 0
Number of libraries syncing: 1
Number of libraries failed to sync: 1
List of syncing libraries:
58946ebf-d080-46de-971f-52e9401ba79c
List of libraries failed to sync:
b531cce8-5171-4499-a360-b3ebea49e10c
Also, df -h on the backup server shows no change in used disk space: it is still at 1.2 TB out of 4.9 TB, which accounts for all the other libraries but not this 2.7 TB one.
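One more check I can think of (a rough sketch, assuming the default file-system object storage under seafile-data/storage; the layout will differ if the backup server uses S3 or another backend) is whether this repo's commit/fs/block directories on the backup side are actually growing:

# On the backup server: size of this repo's object directories.
# Re-running this periodically should show whether anything is still being written.
du -sh /opt/seafile/seafile-data/storage/commits/58946ebf-d080-46de-971f-52e9401ba79c \
       /opt/seafile/seafile-data/storage/fs/58946ebf-d080-46de-971f-52e9401ba79c \
       /opt/seafile/seafile-data/storage/blocks/58946ebf-d080-46de-971f-52e9401ba79c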
Has anyone seen this kind of strange behavior? Does anyone know what could be blocking this process, or where I could find other logs that would help identify the root of this issue?
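For completeness, a quick way to keep following those same logs from outside the pod (the namespace and pod names below are placeholders for our actual deployment, and the relative paths assume the container's working directory is the Seafile install dir, as in the cat commands above):

# Follow the sync log and the slow-storage log for this repo inside the pod.
kubectl -n seafile exec seafile-backup-0 -- \
    tail -f logs/seafile.log logs/slow_logs/seafile_slow_storage.log \
    | grep --line-buffered 58946ebf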