Real-time backup unable to fetch repo-list


#1

Hi all,

I reported this already in the topic “Seafile Pro 6.2 is ready”, but the post got little attention.

The problem is the following:
Since I upgraded from Pro 6.1.8 to Pro 6.2.4, my real-time backup cannot fetch the repo list any more, and consequently the backup does not work.

The error (seafile.log) I get is:

[01/08/2018 02:09:49 PM] http-tx-mgr.c(2034): Sync polling timer triggered, start to fetch repo list from primary.
[01/08/2018 02:09:49 PM] http-tx-mgr.c(514): libcurl failed to GET https://linda.ifi.uzh.ch/seafhttp/server-sync/repo-list: Problem with the SSL CA cert (path? access rights?)

I am using Debian (jessie and stretch show the same error), and I have also created the CA bundle in CentOS’s CA bundle path (as described in the manual for setting up the real-time backup).
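The libcurl message means the client could not read the CA certificate bundle it was told to use for verifying the primary's HTTPS certificate. As a quick first check, here is a minimal sketch that reports which of the usual bundle files exist and are readable on the backup host. The two paths are assumptions (the common Debian/Ubuntu and CentOS/RHEL locations), and the libcurl inside the Seafile build may use a compiled-in path that differs from what Python's OpenSSL reports.

```python
import os
import ssl

# Assumed default CA bundle locations; adjust for your system.
CANDIDATE_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",  # Debian / Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",    # CentOS / RHEL
]


def readable_bundles(candidates=CANDIDATE_BUNDLES):
    """Return the candidate paths that are regular files readable by this user."""
    return [p for p in candidates if os.path.isfile(p) and os.access(p, os.R_OK)]


if __name__ == "__main__":
    # Python's OpenSSL default, for comparison only (libcurl may differ).
    print("OpenSSL default cafile:", ssl.get_default_verify_paths().openssl_cafile)
    for path in CANDIDATE_BUNDLES:
        status = "ok" if readable_bundles([path]) else "missing or unreadable"
        print(f"{path}: {status}")
```

Since the error mentions access rights, it makes sense to run this as the same user the Seafile service runs under.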

Daniel also replied to my initial post:

This could well be. However, I am not familiar with CentOS. Therefore I have no clue as to what system libraries might be incompatible nor do I have a clue how I could work around this. Any hints appreciated.

best,
hp


#2

Can you check the pro version build on Ubuntu? Does it have the same problem?


#3

OK, I will try. I can switch between the general and the Ubuntu version without running any update script, right?
(Besides updating the symlink, of course.)

Is there a way to manually get the repo-list from the primary server?

if I do

wget -v https://[primary-server]/seafhttp/server-sync/repo-lists

I only get

ERROR 400: Bad Request

thanks,
Hp


#4

I installed the Ubuntu variant, and getting the repo list works.

However, seaf-backup-cmd.sh shows me the wrong total number of libraries (15 instead of 249). Furthermore, if I restart Seafile (on the real-time backup), the total number of libraries goes down even further.

I guess this has nothing to do with my original problem.

But to further diagnose this, it would be helpful if I could get the repo-list manually, to check if the primary server provides the correct list.

best,
Hp


#5

Hi Daniel,

do you have any idea why the real-time backup does not synchronize all libraries (see above)?
Could this be caused by an inconsistency in the MySQL database? (I had some replication errors in the past.)

Best,
Hp


#6

Hi all,

we observed the same behaviour.

If you leave the Seafile backup service running for some time, does the number approach the real value?

Regards,
Moritz


#7

I will try. How long was “for some time” in your case?


#8

Some hours. I think I always left it unattended overnight, checked the next morning, and then the number seemed OK.

How were you able to solve your database replication errors?
I keep having those…


#9

First, you have to take care that no (unintended) changes are made to the database on the real-time backup. I “solved” this by allowing only admins to log in to the server. But I guess you know this already.

Since I did that, I only get replication problems sporadically. I “solved” those by skipping the problematic operations using the MySQL command

SET GLOBAL sql_slave_skip_counter = 1;

(or some higher number if necessary).
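For completeness: MySQL documents that `sql_slave_skip_counter` can only be set while the replica's replication threads are stopped, so the skip is usually wrapped in `STOP SLAVE` / `START SLAVE`. A minimal sketch that prints that sequence for N events; the helper name is hypothetical, and since skipped events are never applied on the replica, it is worth checking which statement failed (via `SHOW SLAVE STATUS`) before skipping anything.

```python
def skip_statements(n: int = 1) -> list[str]:
    """Hypothetical helper: SQL to skip the next n replication events on a replica."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [
        "STOP SLAVE;",                                # stop replication threads first
        f"SET GLOBAL sql_slave_skip_counter = {n};",  # skip the next n events
        "START SLAVE;",                               # resume replication
        "SHOW SLAVE STATUS;",                         # check Last_SQL_Error afterwards
    ]


if __name__ == "__main__":
    print("\n".join(skip_statements(1)))
```

The statements are meant to be pasted into a mysql client session on the replica, not on the primary.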

Furthermore, when upgrading Seafile, you have to remember NOT to run the migration script on the real-time backup. The migration has already been taken care of when upgrading the primary server.

These are my two cents concerning replication errors.


#10

Hi Moritz,

it seems waiting did not help; I still do not see all repos on the real-time backup. I wonder if I should wipe it clean and start the synchronization from scratch.

Another question: I use a setup with two Seafile application (frontend) servers and one background server. So far, I use the background server as the source (primary) for the backup. I wonder if this is a good idea (sometimes the load on the background server is very high due to indexing and virus scanning). Would you recommend that I use one of the frontend servers as the source for the backup?

Best,
Hp


#11

Can you restart the backup service and check the messages in seafile.log on the backup server?


#12

I rebooted the backup server; this is what I see:

[01/12/18 09:28:26] http-server.c(195): fileserver: worker_threads = 10
[01/12/18 09:28:26] http-server.c(208): fileserver: backlog = 32
[01/12/18 09:28:26] http-server.c(223): fileserver: fixed_block_size = 8388608
[01/12/18 09:28:26] http-server.c(238): fileserver: web_token_expire_time = 3600
[01/12/18 09:28:26] http-server.c(253): fileserver: max_indexing_threads = 1
[01/12/2018 09:28:26 AM] ../common/mq-mgr.c(61): [mq client] mq cilent is started
[01/12/2018 09:28:26 AM] http-tx-mgr.c(2034): Sync polling timer triggered, start to fetch repo list from primary.

And then a lot of entries like:

[01/12/2018 09:28:27 AM] http-tx-mgr.c(872): Repo 01c98426 doesn't change, skip.

(for different repos; 734 such entries in total, although I have only 379 libraries on the backup server)
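Whether 734 entries and 379 libraries actually contradict each other depends on how many distinct repos the entries cover: if the log spans more than one polling round, the same repo appears several times. A minimal sketch for counting them, assuming the line format shown above. The log only prints an 8-character repo prefix, so two repos sharing a prefix would be merged.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# [01/12/2018 09:28:27 AM] http-tx-mgr.c(872): Repo 01c98426 doesn't change, skip.
SKIP_RE = re.compile(r"Repo ([0-9a-f]{8}) doesn't change, skip\.")


def count_skips(lines):
    """Return (total skip entries, Counter of entries per 8-char repo prefix)."""
    per_repo = Counter()
    for line in lines:
        m = SKIP_RE.search(line)
        if m:
            per_repo[m.group(1)] += 1
    return sum(per_repo.values()), per_repo


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_skips.py /path/to/seafile.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        total, per_repo = count_skips(log)
    print(f"{total} skip entries covering {len(per_repo)} distinct repos")
```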

Then:

[01/12/2018 09:28:27 AM] http-tx-mgr.c(2041): Fetch repo list from primary successfully.

And then quite a few entries like (also for various repos, reported here only for one repo):

[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Sync init] success, transition to [Get head commit id].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get head commit id] success, transition to [Get diff commit ids].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commit ids] success, transition to [Get diff commits].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commits] success, transition to [Get fs].
[01/12/2018 09:28:27 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get fs] success, transition to [Get blocks].
[01/12/2018 09:28:32 AM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get blocks] success, transition to [Sync db].
[01/12/2018 09:28:32 AM] http-tx-mgr.c(1844): Sync repo b1d97f75-eadb-449c-b9ff-b0e5b45eef66 successfully.
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Sync init] success, transition to [Get head commit id].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get head commit id] success, transition to [Get diff commit ids].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commit ids] success, transition to [Get diff commits].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commits] success, transition to [Get fs].
[01/12/2018 12:28:28 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get fs] success, transition to [Get blocks].
[01/12/2018 12:28:29 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get blocks] success, transition to [Sync db].
[01/12/2018 12:28:29 PM] http-tx-mgr.c(1844): Sync repo b1d97f75-eadb-449c-b9ff-b0e5b45eef66 successfully. 
[01/12/2018 02:45:05 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Sync init] success, transition to [Get head commit id].
[01/12/2018 02:45:06 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get head commit id] success, transition to [Get diff commit ids].
[01/12/2018 02:45:06 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commit ids] success, transition to [Get diff commits].
[01/12/2018 02:45:10 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get diff commits] success, transition to [Get fs].
[01/12/2018 02:45:15 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get fs] success, transition to [Get blocks].
[01/12/2018 02:46:02 PM] http-tx-mgr.c(1854): Repo b1d97f75 sync status: [Get blocks] success, transition to [Sync db].
[01/12/2018 02:46:02 PM] http-tx-mgr.c(1844): Sync repo b1d97f75-eadb-449c-b9ff-b0e5b45eef66 successfully.
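Each sync run walks the same sequence of stages, and the timestamps show where the time goes: in the 02:45 PM run, [Get blocks] accounts for 47 of the 57 seconds. A minimal sketch that turns these lines into per-stage durations for one repo, assuming the log format above. The timestamps have one-second resolution, so short stages show up as 0 s, and [Sync db] is not covered because it logs "Sync repo ... successfully" instead of a status transition.

```python
import re
from datetime import datetime

# Matches e.g. "[01/12/2018 02:45:15 PM] http-tx-mgr.c(1854): Repo b1d97f75
# sync status: [Get fs] success, transition to [Get blocks]."
STATUS_RE = re.compile(
    r"^\[(\d\d/\d\d/\d{4} \d\d:\d\d:\d\d [AP]M)\].*?"
    r"Repo (\w{8}) sync status: \[([^\]]+)\] success"
)
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def stage_durations(lines, repo):
    """Return [(stage, seconds)] for one repo, per stage that logs a 'success' line.

    A stage's duration is the time between the previous 'success' line and its
    own; '[Sync init]' starts a new run and has no predecessor.
    """
    durations, prev = [], None
    for line in lines:
        m = STATUS_RE.search(line)
        if not m or m.group(2) != repo:
            continue
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        stage = m.group(3)
        if stage != "Sync init" and prev is not None:
            durations.append((stage, (ts - prev).total_seconds()))
        prev = ts
    return durations
```

Applied to the six status lines of the 02:45 PM run, this yields 1, 0, 4, 5 and 47 seconds for [Get head commit id] through [Get blocks].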

Everything seems to be fine, so far. Nevertheless, “seaf-backup-cmd.sh status” tells me

Total number of libraries: 13
Number of synchronized libraries: 10
Number of libraries waiting for sync: 0
Number of libraries syncing: 0
Number of libraries failed to sync: 3

List of syncing libraries:

List of libraries failed to sync:
e612a24e-02f8-44fa-8700-ea2b4e05e7bf
0e4d6d48-2e32-4036-851d-db6c5e5bc3e9
f494c75d-5571-4ee4-a1d3-69493e1ccb5b
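The per-state counts in this output add up to the total (10 + 0 + 0 + 3 = 13), so the status is internally consistent; the open question is only why the total is 13. If you want to watch these numbers over time, here is a minimal sketch for parsing the output, assuming the layout shown above stays the same between versions.

```python
import re

# Matches the count lines of "seaf-backup-cmd.sh status", e.g.
# "Number of libraries failed to sync: 3".
COUNT_RE = re.compile(r"^(Total number of libraries|Number of [\w ]+?): (\d+)$")


def parse_status(text):
    """Return (counts, failed_repo_ids) from the status output shown above."""
    counts, failed, in_failed = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        m = COUNT_RE.match(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
        elif line.startswith("List of"):
            # Only collect IDs listed under the "failed to sync" heading.
            in_failed = line.startswith("List of libraries failed to sync")
        elif in_failed and line:
            failed.append(line)
    return counts, failed
```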

So, there seem to be only 13 libraries visible, although in the logs I see that many more libraries have been checked. This is one problem.

Then I see the following entries in the log containing the keyword ‘fail’:

[01/12/2018 09:28:36 AM] http-tx-mgr.c(1752): Failed to get commit dc0c3e4e323b692593900bdcc6e0892846093f3c.
[01/12/2018 09:28:36 AM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].
[01/12/2018 09:33:31 AM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].
[01/12/2018 12:28:55 PM] http-tx-mgr.c(1752): Failed to get commit ae162867d7695462694088ee5fc1cc4e0905c68d.
[01/12/2018 12:28:55 PM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].
[01/12/2018 12:29:39 PM] http-tx-mgr.c(514): libcurl failed to GET https://linda.ifi.uzh.ch/seafhttp/repo/9fdb2cd7-a434-44ee-8667-5b8e2895e62c/block/bd6919ef725eeb10949c25757e9141f5b5f6ccda: Timeout was reached.
[01/12/2018 12:29:39 PM] http-tx-mgr.c(1841): Failed to sync repo 9fdb2cd7, error in [Get blocks].
[01/12/2018 12:33:54 PM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].
[01/12/2018 02:45:32 PM] http-tx-mgr.c(1752): Failed to get commit 2c418b01edd2e0ed43e3ecd9ee6503942c365475.
[01/12/2018 02:45:32 PM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].
[01/12/2018 02:46:38 PM] http-tx-mgr.c(514): libcurl failed to GET https://linda.ifi.uzh.ch/seafhttp/repo/f494c75d-5571-4ee4-a1d3-69493e1ccb5b/block/ae8d87145eb74f7d2a3aa7be0a5a35fea82e516f: Timeout was reached.
[01/12/2018 02:46:38 PM] http-tx-mgr.c(1841): Failed to sync repo f494c75d, error in [Get blocks].
[01/12/2018 02:49:56 PM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].

I haven’t looked into those errors yet, because the fact that the backup status sees only 13 libraries (instead of 256) seemed more relevant to me.
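Grouping these lines by repo and stage separates repeated failures from one-off ones: in the excerpt above, 0e4d6d48 fails three times in [Get blocks], each time right after a "Failed to get commit" message, and e612a24e fails three times in [Get fs], while 9fdb2cd7 and f494c75d each fail once after a libcurl timeout. A minimal sketch, assuming the log format shown above:

```python
import re
from collections import Counter

# Matches e.g. "Failed to sync repo 0e4d6d48, error in [Get blocks]."
FAIL_RE = re.compile(r"Failed to sync repo (\w{8}), error in \[([^\]]+)\]")


def failures_by_repo(lines):
    """Count failures per (8-char repo prefix, stage) pair."""
    return Counter(m.groups() for m in map(FAIL_RE.search, lines) if m)
```

`failures_by_repo(open("seafile.log")).most_common()` then lists the most frequent repo/stage pairs first.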


#13

Jonathan,

I hope you can give me some feedback on my log/error messages.

Best,
Hp


#14

Hi @hkunz

Sorry for the late reply. Not displaying all synced libraries on the backup server is actually a display bug. The “missing” libraries just haven't changed, so they are not being synced at the moment. This was a change we added in a recent version. We’ll fix the display.

For the failed libraries, you can check seafile.log on the primary for error messages.


#15

OK, thanks. That clears it up a little :slight_smile:

When syncing a few libraries, I get this error (seafile.log on the real-time backup):

[01/17/2018 10:35:06 AM] http-tx-mgr.c(1752): Failed to get commit 363231ba658169f3c79a543099774e8f701531a5.
[01/17/2018 10:35:06 AM] http-tx-mgr.c(1841): Failed to sync repo 0e4d6d48, error in [Get blocks].

I see no corresponding error on the primary server. Could you explain to me what this error means?

Another error I get (also on the real-time server):

[01/17/2018 10:44:32 AM] http-tx-mgr.c(1318): Bad response code for POST https://linda.ifi.uzh.ch/seafhttp/server-sync/repo/e612a24e-02f8-44fa-8700-ea2b4e05e7bf/multi-fs-id-list/?client-head=c3bc7013a1b00be049842a1b6d7711e50a1837c7&force=0: 502.
[01/17/2018 10:44:32 AM] http-tx-mgr.c(1841): Failed to sync repo e612a24e, error in [Get fs].

Here, too, I see no errors on the primary (in seafile.log). What I do see in this case is a bunch of

Failed to set e612a24e-02f8-44fa-8700-ea2b4e05e7bf-5ffe0cbefb8feeee4ad4471ff527d1a2e9eab7fd to memcached: ITEM TOO BIG.

which you told me to ignore.

Could you shed some light on these two errors, so that I get an idea of how to proceed?

Best,
Hp


#16

Hi @hkunz
Can these three libraries be accessed on the primary server? If they are corrupted, you can run fsck on them.


#17

Yes, the libraries can be accessed and are syncing without errors. I also checked one of them with seaf-fsck; no issues were reported. I had this once before, and I solved that one with seaf-backup.sh -f [repoid]. I guess this will help with these libraries too.


#18

I have two libraries that I cannot sync to the real-time backup, for two different reasons. As those have nothing to do with the original topic, I will open new reports.

Thanks, everybody, for your help!