Memcached problems(?)


#21

yes I know, you said that, but this does not help me at all in understanding what might go wrong.

I have to disagree here, for two reasons:

  1. my memcached servers are working. When I take a tcpdump (see the example command after this list) I see that seafile stores and retrieves entries from all three memcached servers. actually, in the tcpdump I see no connection errors that would correspond to the errors (SERVER MARKED DEAD) I see in seafile.log.
  2. it is not true that using a hot standby (as opposed to an active-active setup using e.g. haproxy) implies manual work to do the failover. using a floating IP (and heartbeat, for example) is enough to switch from one seafile app server to the other. so, by shutting down 2 of my 3 memcached servers I would actually lose HA, and I am not prepared to do that without a justified reason.
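
for reference, the tcpdump I mean is simply a capture on the default memcached port (11211); adjust the interface to your setup:

tcpdump -nn -i any 'tcp port 11211'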

anyhow, I just updated my seafile servers to pro-6.2.4 and the errors seem to be gone.

thanks everybody for their help and suggestions.


#22

Please see [quote="daniel.pan, post:4, topic:4892"]
The memcache library sometimes has inconsistence state about the memcache servers.
[/quote]

See text above.

You’re right, it depends on the system and architecture of course. But for a short test I think it wouldn’t be a big problem to use just one.

But glad to hear that it’s working now.


#23

unfortunately, the error came back.

but seahub.log indicates that the error messages always concern the same memcached server. it is actually the memcached on my background server (which can be under very high load because of clamscan). I will disable memcached on this server to see if this helps.

thanks again,
hp


#24

I configured my seafile cluster to only use the memcached instances of the application servers, and not the one on the background node.
this went very well for 1:20h, then suddenly I got a lot of the SERVER IS MARKED DEAD errors again, unfortunately without any indication of which server was marked dead…

after restarting seafile, the errors were gone. amazingly, they have not reappeared since then. so, fingers crossed, it seems this solved the problem.

I have one other issue concerning memcached:

[01/09/2018 10:21:15 AM] ../common/obj-cache.c(106): Failed to set e612a24e-02f8-44fa-8700-ea2b4e05e7bf-a39558348330e29a2c77dfe75c86d9573b908364 to memcached: ITEM TOO BIG.

I see this error every now and then (if it occurs, it occurs many times within a few seconds). all those errors correspond to one library which is very large (4TB). is this something I have to worry about? would it make sense to increase the max item size memcached can handle?
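
in case it is relevant: as far as I understand, the item size limit can be raised with memcached's -I option (the default is 1 MB). on Debian/Ubuntu that would be an extra line in /etc/memcached.conf; the 8m value below is just an arbitrary example:

# raise the maximum item size from the default 1 MB to 8 MB
-I 8m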

many thanks,
hp


#25

You don’t need to worry about this.

For the SERVER IS MARKED DEAD problem, after investigating it, we think it is not recommended to use multiple memcached servers, because memcached does not provide a robust server-side cluster mechanism.

As you set the retry timeout to 3600 seconds, when there is a connection error to a memcached server, the server will be marked as dead for 3600 seconds. If all the servers are marked as dead during these 3600 seconds, you will see the SERVER IS MARKED DEAD errors in the logs.
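
For illustration only (not a recommendation), a shorter retry timeout would make a server that had a transient connection error come back much sooner, e.g.:

memcached_options = --SERVER=192.168.1.134 --SERVER=192.168.1.135 --SERVER=192.168.1.136 --POOL-MIN=10 --POOL-MAX=100 --RETRY-TIMEOUT=30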


#26

thank you very much for these clarifications.

what is the recommended setup for seafile clusters then? having only a single memcached server renders the concept of a cluster useless :slight_smile:

would the following scenario make sense:

  1. run the memcached service on the two (or more) front end application servers
  2. use a floating IP for memcached, which is owned by one of the application servers
  3. if this server goes down, the floating IP migrates to the other application server (using heartbeat or a similar solution)

of course, when step 3 occurs, the memcached on the new node will be more or less empty. would this be a problem, or would memcached just be repopulated with the cached data?

best,
hp


#27

Yes this would avoid the memcached error issue. The empty cache shouldn’t affect Seafile functionality.


#28

ok, right now I use clustered memcached on my two seafile application servers (but no memcached on the background node), so far without problems. if the problems reappear, I will try the solution outlined above and report back.


#29

I just did some digging around the Memcached stuff and I thought of the following:
For seahub (using pylibmc) we explicitly enabled the Ketama distribution algorithm as documented in the manual (https://manual.seafile.com/deploy/add_memcached.html).
But for seafile, we do not.
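
For reference, "enabling Ketama" on the seahub side is roughly this OPTIONS entry in seahub_settings.py (the server list uses the same example IPs as the seafile.conf line below):

CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': ['192.168.1.134:11211', '192.168.1.135:11211', '192.168.1.136:11211'],
        'OPTIONS': {'ketama': True},
    }
}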

So based on http://sendapatch.se/projects/pylibmc/behaviors.html#ketama and http://docs.libmemcached.org/memcached_behavior.html#MEMCACHED_BEHAVIOR_DISTRIBUTION, I changed the line in seafile.conf (https://manual.seafile.com/deploy_pro/deploy_in_a_cluster.html#seafileconf) to:

memcached_options = --SERVER=192.168.1.134 --SERVER=192.168.1.135 --SERVER=192.168.1.136 \
    --POOL-MIN=10 --POOL-MAX=100 --RETRY-TIMEOUT=3600 \
    --HASH=md5 --DISTRIBUTION=consistent

So far, I do not see any problem caused by this; I will evaluate whether it helped with the customer's original problem.
But maybe one of the devs has an additional opinion on this.

Regards,
Moritz


#30

oh, wow, thanks. I will try this and report back


#31

The current recommended configuration in the manual is:

[cluster]
enabled = true
memcached_options = --SERVER=192.168.1.134 --SERVER=192.168.1.135 --SERVER=192.168.1.136 --POOL-MIN=10 --POOL-MAX=100 --RETRY-TIMEOUT=3600

The other options have already been set inside the code.

But we plan to change the recommendation to use an active-passive two-node memcached cluster. That would avoid the inherent clustering issue with memcached.


#32

Jonathan, thanks, these were exactly the settings I used, and with those I got a lot of SERVER MARKED DEAD messages. So do you mean the settings suggested above by Moritz will not help me?


#33

With a multi-server memcached architecture, I think there will always be cases that cause various issues, so we suggest using an active-passive two-node memcached architecture. The problem is that the current memcached cluster architecture leaves the clustering logic to the clients. Without built-in failover on the server side, I think it's not possible to build a reliable cluster.


#34

ok, so how do I set up such a two-node active-passive cluster? would this be (as I suggested above) to use a two-node memcached cluster combined with a floating IP?

best,
hp


#35

Yes, that’s what you mentioned. Use keepalived to migrate the virtual IP to the standby server.
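
A minimal keepalived sketch for the active node could look like the following (interface, virtual_router_id and the virtual IP are placeholders to adapt; the standby node uses state BACKUP and a lower priority; memcached runs on both nodes while clients only talk to the virtual IP):

vrrp_instance VI_MEMCACHED {
    state MASTER
    interface eth0          # replace with your actual interface
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.200/24    # the floating IP your memcached clients point to
    }
}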


#36

I’ve now switched our cluster to using just a single memcached server with a floating IP.
Let’s see how that goes! :wink:


#37

@schlarbm: I’m just curious: is everything going OK with your memcached setup?


#38

I think we have only had one case since the change where the primary memcached server went offline. The failover via keepalived worked fine, but the Seafile instances still spat out “SERVER MARKED DEAD” for some time after that.
Maybe the keepalived and memcached retry parameters can still be tuned further.


#39

@schlarbm What’s your memcache configuration in seafile.conf?


#40

hi guys,

just for reference, I will add how we solved the memcached problems:

  1. we have an active-standby setup for our seafile application servers, using a floating IP and heartbeat.
  2. following jonathan's suggestion to use only one memcached server (in an active-standby setup), it was straightforward in our case to also let the memcached daemon be managed by heartbeat, so that it always runs on the node that is currently the active seafile application server (see the haresources sketch after this list).
  3. therefore, in the configs (seafile, seahub) of all our seafile servers (application servers, background node) we can directly use the floating IP as the IP for the memcached service.
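
with the classic heartbeat v1 haresources model, points 1 and 2 boil down to a single resource line roughly like this (the node name is a placeholder for our primary server; 10.65.16.118 is our floating IP; memcached refers to the init script):

# /etc/ha.d/haresources: <preferred node> <floating IP> <managed service>
app-server-1 10.65.16.118 memcached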

this solution is very simple and elegant, as long as you do not want to use load-balancing. and it seems to work so far.

to avoid memcached servers being marked dead (when the primary server is rebooted, for example), we use the following options in seahub_settings.py:

CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': ['10.65.16.118:11211'],
        'OPTIONS': {
            'ketama': True,
            'remove_failed': 0,
            'retry_timeout': 1,
            'dead_timeout': 60
        }
    }
}

best,
Hp