Real-time backup unable to fetch repo-list

Hi all,

I reported this already in the topic “Seafile Pro 6.2 is ready”, but the post got little attention.

The problem is the following:
Since I upgraded from pro 6.1.8 to pro 6.2.4 my real-time backup cannot fetch the repo-list any more, and consequently the backup does not work.

The error (seafile.log) I get is:

[01/08/2018 02:09:49 PM] http-tx-mgr.c(2034): Sync polling timer triggered, start to fetch repo list from primary.
[01/08/2018 02:09:49 PM] http-tx-mgr.c(514): libcurl failed to GET https://linda.ifi.uzh.ch/seafhttp/server-sync/repo-list: Problem with the SSL CA cert (path? access rights?).

I am using Debian (jessie and stretch show the same error), and I have also created the CA bundle at CentOS’s CA bundle path (as described in the manual for setting up the real-time backup).
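
Since the libcurl message points at the CA bundle path or its access rights, a quick check of the usual bundle locations can narrow things down. The following is a minimal sketch, not part of Seafile: the two paths are the common distribution defaults and are an assumption here, and `check_bundles` is a hypothetical helper name.

```python
import os

# Assumed default CA bundle locations; verify them against your own systems.
CANDIDATE_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",  # usual Debian/Ubuntu location
    "/etc/pki/tls/certs/ca-bundle.crt",    # usual CentOS/RHEL location
]


def check_bundles(paths):
    """Return {path: status}; status is 'ok', 'missing', 'unreadable' or 'empty'."""
    result = {}
    for path in paths:
        if not os.path.isfile(path):
            # Also covers dangling symlinks, since isfile() follows links.
            result[path] = "missing"
        elif not os.access(path, os.R_OK):
            result[path] = "unreadable"
        elif os.path.getsize(path) == 0:
            result[path] = "empty"
        else:
            result[path] = "ok"
    return result


if __name__ == "__main__":
    for path, status in check_bundles(CANDIDATE_BUNDLES).items():
        print(f"{status:10} {path}")
```

The check is only meaningful when run as the same user that runs the Seafile server, because the access rights are evaluated for the calling user.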

Daniel also replied to my initial post:

This could well be. However, I am not familiar with CentOS. Therefore I have no clue as to what system libraries might be incompatible nor do I have a clue how I could work around this. Any hints appreciated.

best,
hp

Can you check the pro version build on Ubuntu? Does it have the same problem?

ok, I will try. I can switch between the general and the ubuntu version without running any update script, right?
(besides updating the symlink of course)

Is there a way to manually get the repo-list from the primary server?

if I do

wget -v https://[primary-server]/seafhttp/server-sync/repo-lists

I only get

ERROR 400: Bad Request

thanks,
Hp

I installed the Ubuntu variant and getting the repo-list works.

However, seaf-backup-cmd.sh shows me a wrong total number of libraries (15 instead of 249). Furthermore, if I restart Seafile (on the real-time backup), the total number of libraries goes down further.

I guess this has nothing to do with my original problem.

But to further diagnose this, it would be helpful if I could get the repo-list manually, to check if the primary server provides the correct list.

best,
Hp

Hi Daniel,

Do you have any idea why the real-time backup does not synchronize all libraries (see above)?
Could this be caused by an inconsistency in the MySQL database? (I had some replication errors in the past.)

Best,
Hp

Hi all,

we observed the same behaviour.

If you leave the Seafile backup service running for some time, does the number approach the real value?

Regards,
Moritz

I will try. How long was “for some time” in your case?

A few hours, I think. I always left it unattended overnight and checked the next morning, and by then the number seemed OK.

How were you able to solve your Database Replication errors?
I keep having those…

First, you have to take care that no (unintended) changes are made to the database on the real-time backup. I “solved” this by allowing only admins to log in to the server. But I guess you know this already.

Since I did that, I only get replication problems sporadically. I “solved” those by skipping the problematic operations using the MySQL command

SET GLOBAL sql_slave_skip_counter = 1;

(or some higher number if necessary).
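
Before skipping an event it can help to see which error actually stopped replication. The sketch below parses the vertical output of `SHOW SLAVE STATUS\G` (standard MySQL field names) that has been saved to a string; the helper names `parse_slave_status` and `replication_problem` are made up for this illustration.

```python
def parse_slave_status(text):
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        # Field lines look like "  Slave_SQL_Running: No"; row separators start with '*'.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_problem(status):
    """Return a short error description, or None if both replication threads run."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if io_ok and sql_ok:
        return None
    # Newer servers report Last_SQL_*; fall back to the older Last_* fields.
    errno = status.get("Last_SQL_Errno") or status.get("Last_Errno", "?")
    error = status.get("Last_SQL_Error") or status.get("Last_Error", "")
    return f"errno {errno}: {error}"
```

One way to use it is to save the output of `mysql -e 'SHOW SLAVE STATUS\G'` to a file, read it, and pass the text to `parse_slave_status`. Skipping events hides the symptom only, so it is worth noting the reported statement before running the `SET GLOBAL` command.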

Furthermore, when upgrading Seafile, you have to remember NOT to run the migration script on the real-time backup. The migration is already taken care of when the primary server is upgraded.

These are my two cents concerning replication errors.

Hi Moritz,

It seems waiting did not help; I still do not see all repos on the real-time backup. I wonder if I should wipe it clean and start the synchronization from scratch.

Another question: I use a setup with two Seafile application (frontend) servers and one background server. So far, I use the background server as the source (primary) for the backup. I wonder if this is a good idea (sometimes the load on the background server is very high because of indexing and virus scanning). Would you recommend that I use one of the frontend servers as the source for the backup?

Best,
Hp

Can you restart the backup service and check the messages in seafile.log on the backup server?

I rebooted the backup server; this is what I see:

[01/12/18 09:28:26] http-server.c(195): fileserver: worker_threads = 10
[01/12/18 09:28:26] http-server.c(208): fileserver: backlog = 32
[01/12/18 09:28:26] http-server.c(223): fileserver: fixed_block_size = 8388608
[01/12/18 09:28:26] http-server.c(238): fileserver: web_token_expire_time = 3600
[01/12/18 09:28:26] http-server.c(253): fileserver: max_indexing_threads = 1
[01/12/2018 09:28:26 AM] ../common/mq-mgr.c(61): [mq client] mq cilent is started
[01/12/2018 09:28:26 AM] http-tx-mgr.c(2034): Sync polling timer triggered, start to fetch repo list from primary.

And then a lot of entries like:

[01/12/2018 09:28:27 AM] http-tx-mgr.c(872): Repo 01c98426 doesn't change, skip.

(for different repos; 734 such entries in total, although I have only 379 libraries on the backup server)
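
To see whether the number of skip entries reflects distinct libraries or repeated entries for the same ones, the log can be tallied per repo. This is a rough sketch that assumes the message format shown above; `count_skipped` is a hypothetical helper, and the 8-character IDs are only prefixes of the full library IDs.

```python
import re
from collections import Counter

# Matches lines such as: "... Repo 01c98426 doesn't change, skip."
SKIP_RE = re.compile(r"Repo ([0-9a-f]{8}) doesn't change, skip\.")


def count_skipped(log_lines):
    """Return a Counter mapping short repo ID -> number of skip entries."""
    counts = Counter()
    for line in log_lines:
        match = SKIP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Called with the lines of seafile.log, `sum(counts.values())` gives the total number of skip entries and `len(counts)` the number of distinct short IDs, which can then be compared with the library count on the backup server.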

Then:

[01/12/2018 09:28:27 AM] http-tx-mgr.c(2041): Fetch repo list from primary successfully.

And then quite a few entries like (also for various repos, reported here only for one repo):

[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Sync init] success, transition to [Get head commit id].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get head commit id] success, transition to [Get diff commit ids].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commit ids] success, transition to [Get diff commits].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commits] success, transition to [Get fs].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get fs] success, transition to [Get blocks].
[01/12/2018 09:28:32 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get blocks] success, transition to [Sync db].
[01/12/2018 09:28:32 AM] http-tx-mgr.c(1844): Sync repo b1d97f75-eadb-449c-b9ff-b0e5b45eef66 successfully.
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Sync init] success, transition to [Get head commit id].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get head commit id] success, transition to [Get diff commit ids].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commit ids] success, transition to [Get diff commits].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commits] success, transition to [Get fs].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get fs] success, transition to [Get blocks].
[01/12/2018 12:28:29 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get blocks] success, transition to [Sync db].
[01/12/2018 12:28:29 PM] http-tx-mgr.c(1844): Sync repo b1d97f75-eadb-449c-b9ff-b0e5b45eef66 successfully. 
[01/12/2018 02:45:05 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Sync init] success, transition to [Get head commit id].
[01/12/2018 02:45:06 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get head commit id] success, transition to [Get diff commit ids].
[01/12/2018 02:45:06 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commit ids] success, transition to [Get diff commits].
[01/12/2018 02:45:10 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commits] success, transition to [Get fs].
[01/12/2018 02:45:15 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get fs] success, transition to [Get blocks].
[01/12/2018 02:46:02 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get blocks] success, transition to [Sync db].
[01/12/2018 02:46:02 PM] http-tx-mgr.c(1844): Sync repo b1d97f75-eadb-449c-b9ff-b0e5b45eef66 successfully.

Everything seems to be fine, so far. Nevertheless, “seaf-backup-cmd.sh status” tells me

Total number of libraries: 13
Number of synchronized libraries: 10
Number of libraries waiting for sync: 0
Number of libraries syncing: 0
Number of libraries failed to sync: 3

List of syncing libraries:

List of libraries failed to sync:
e612a24e-02f8-44fa-8700-ea2b4e05e7bf
0e4d6d48-2e32-4036-851d-db6c5e5bc3e9
f494c75d-5571-4ee4-a1d3-69493e1ccb5b
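
For tracking these numbers over time, the status output can be turned into structured data. The sketch below assumes the output layout shown above ("label: number" lines, followed by list headings with one library ID per line); `parse_backup_status` is a hypothetical helper, not part of Seafile.

```python
import re

# Full library IDs as printed in the status lists.
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_backup_status(text):
    """Split status output into numeric counters and per-heading library lists."""
    counters, lists, current = {}, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if UUID_RE.match(line):
            # A library ID belongs to the most recent list heading.
            if current is not None:
                lists[current].append(line)
            continue
        label, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.isdigit():
            counters[label] = int(value)
            current = None
        elif value == "":
            current = label
            lists[current] = []
    return counters, lists
```

With the output above, the four per-state counters (10, 0, 0 and 3) add up to the reported total of 13, so the status is at least internally consistent; what is off is the total itself.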

So, there seem to be only 13 libraries visible, although in the logs I see that many more libraries have been checked. This is one problem.

Then I see the following entries in the log containing the keyword ‘fail’:

[01/12/2018 09:28:36 AM] http-tx-mgr.c(1752): Failed to get commit dc0c3e4e323b692593900bdcc6e0892846093f3c.
[01/12/2018 09:28:36 AM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].
[01/12/2018 09:33:31 AM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].
[01/12/2018 12:28:55 PM] http-tx-mgr.c(1752): Failed to get commit ae162867d7695462694088ee5fc1cc4e0905c68d.
[01/12/2018 12:28:55 PM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].
[01/12/2018 12:29:39 PM] http-tx-mgr.c(514): libcurl failed to GET https://linda.ifi.uzh.ch/seafhttp/repo/9fdb2cd7-a434-44ee-8667-5b8e2895e62c/block/bd6919ef725eeb10949c25757e9141f5b5f6ccda: Timeout was reached.
[01/12/2018 12:29:39 PM] http-tx-mgr.c(1841): Failed to sync repo 9fdb2cd7, error in [Get blocks].
[01/12/2018 12:33:54 PM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].
[01/12/2018 02:45:32 PM] http-tx-mgr.c(1752): Failed to get commit 2c418b01edd2e0ed43e3ecd9ee6503942c365475.
[01/12/2018 02:45:32 PM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].
[01/12/2018 02:46:38 PM] http-tx-mgr.c(514): libcurl failed to GET https://linda.ifi.uzh.ch/seafhttp/repo/f494c75d-5571-4ee4-a1d3-69493e1ccb5b/block/ae8d87145eb74f7d2a3aa7be0a5a35fea82e516f: Timeout was reached.
[01/12/2018 02:46:38 PM] http-tx-mgr.c(1841): Failed to sync repo f494c75d, error in [Get blocks].
[01/12/2018 02:49:56 PM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].
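
Grouping these failures by library and sync stage makes the pattern easier to read than grepping for ‘fail’. This is a small sketch that assumes the "Failed to sync repo" message format shown above; `failures_by_repo` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches lines such as: "... Failed to sync repo 0e4d6d48, error in [Get blocks]."
FAIL_RE = re.compile(r"Failed to sync repo ([0-9a-f]{8}), error in \[([^\]]+)\]\.")


def failures_by_repo(log_lines):
    """Map short repo ID -> {stage: count} for every 'Failed to sync repo' line."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in log_lines:
        match = FAIL_RE.search(line)
        if match:
            repo, stage = match.groups()
            summary[repo][stage] += 1
    return {repo: dict(stages) for repo, stages in summary.items()}
```

Applied to the excerpt above, this reports 0e4d6d48 failing three times in [Get blocks], e612a24e three times in [Get fs], and 9fdb2cd7 and f494c75d once each in [Get blocks].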

I didn’t look into those errors so far, because the fact that the backup status sees only 13 libraries (instead of 256) seemed to be more relevant to me.

Jonathan,

I hope you can give me some feedback on my log/error messages.

Best,
Hp

Hi @hkunz

Sorry for the late reply. This is actually a display bug: the status does not show all synced libraries on the backup server. The “missing” libraries simply have not changed, so they are not being synced at the moment. This behaviour came with a change we added in a recent version. We’ll fix the display.

For the failed libraries, you can check seafile.log on the primary for error messages.

OK, thanks. That clears it up a little :)

When syncing a few libraries, I get this error (seafile.log on the real-time backup):

[01/17/2018 10:35:06 AM] http-tx-mgr.c(1752): Failed to get commit 363231ba658169f3c79a543099774e8f701531a5.
[01/17/2018 10:35:06 AM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].

I see no corresponding error on the primary server. Could you explain to me what this error means?

Another error I get (also on the real-time server):

[01/17/2018 10:44:32 AM] http-tx-mgr.c(1318): Bad response code for POST https://linda.ifi.uzh.ch/seafhttp/server-sync/repo/e612a24e-02f8-44fa-8700-ea2b4e05e7bf/multi-fs-id-list/?client-head=c3bc7013a1b00be049842a1b6d7711e50a1837c7&force=0: 502.
[01/17/2018 10:44:32 AM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].

Here, too, I see no errors on the primary (in seafile.log). What I see in this case is a bunch of

Failed to set e612a24e-02f8-44fa-8700-ea2b4e05e7bf-5ffe0cbefb8feeee4ad4471ff527d1a2e9eab7fd to memcached: ITEM TOO BIG.

which you told me to ignore.

Could you shed some light on these two errors, so that I get an idea of how to proceed?

Best,
Hp


Hi @hkunz
Can these 3 libraries be accessed on the primary server? You can run fsck on the libraries in case they are corrupted.

Yes, the libraries can be accessed and are syncing without errors. I also checked one of them with seaf-fsck; no issues were reported. I had this in the past in one instance and solved it with seaf-backup.sh -f [repoid]. I guess this will help with these libraries too.

I have two libraries which I cannot sync to the real-time backup, for two different reasons. As those have nothing to do with the original topic, I will open new reports.

Thanks everybody for their help!