Yes, that’s what you mentioned. Use keepalived to migrate the virtual IP to the standby server.
I’ve now switched our cluster to using just a single memcached server with a floating IP.
Let’s see how that goes!
I think we’ve only had one case since the change where the primary memcached server went offline. The failover via keepalived worked fine, but the other Seafile instances still spat out “SERVER MARKED DEAD” for some time afterwards.
Maybe the keepalive and retry tuning parameters for memcached can still be optimized.
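One knob for this is libmemcached’s retry timeout, which can be passed through memcached_options in seafile.conf. A hedged sketch (the IP is a placeholder for the floating IP):

```ini
[cluster]
enabled = true
# --RETRY-TIMEOUT=1 makes libmemcached retry a server marked dead after 1 second,
# shortening the window in which "SERVER MARKED DEAD" errors appear
memcached_options = --SERVER=10.65.16.118 --POOL-MIN=10 --POOL-MAX=100 --RETRY-TIMEOUT=1
```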
hi guys,
just for reference, I will add how we solved the memcached problems:
- we have an active-standby setup for our Seafile application servers, using a floating IP and heartbeat.
- following Jonathan’s suggestion to use only one memcached server (in an active-standby setup), it was straightforward in our case to also let the memcached daemon be managed by heartbeat, so that it always runs on the same node as the active Seafile application server.
- therefore, in the configs (seafile, seahub) of all our Seafile servers (application servers, background node) we can use the floating IP directly as the IP for the memcached service.
this solution is very simple and elegant, as long as you do not need load balancing. and it seems to work so far.
to avoid memcached servers being marked dead (when the primary server is rebooted, for example), we use the following options in seahub_settings.py:
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': ['10.65.16.118:11211'],
        'OPTIONS': {
            'ketama': True,
            'remove_failed': 0,
            'retry_timeout': 1,
            'dead_timeout': 60
        }
    }
}
best,
Hp
Hi Jonathan,
it’s just:
[cluster]
enabled = true
memcached_options = --SERVER=10.94.7.156 --POOL-MIN=10 --POOL-MAX=100
Regards, Moritz
We tried a similar setup (CACHES = {…} in seahub_settings.py), but with haproxy instead of heartbeat, so the Seafile nodes always have a single IP to access memcached. The configuration seems to be stable, but the seahub.log of the other nodes is full of the following exceptions:
2018-02-14 16:38:20,674 [ERROR] django.pylibmc:125 add MemcachedError: memcached_behavior_set returned 38
for behavior 'remove_failed' = 0
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/django_pylibmc/memcached.py", line 117, in add
return self._cache.add(key, value,
File "/usr/local/lib/python2.7/dist-packages/django_pylibmc/memcached.py", line 93, in _cache
client.behaviors = self._options
File "/usr/local/lib/python2.7/dist-packages/pylibmc/client.py", line 197, in set_behaviors
return super(Client, self).set_behaviors(_behaviors_numeric(behaviors))
Error: memcached_behavior_set returned 38 for behavior 'remove_failed' = 0
Do you have same errors? Can we just ignore them?
I suggest that for a single memcached server, all the options be removed.
I suspect that ketama hashing should be used together with remove_failed=1.
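For reference, if one did want to keep consistent hashing rather than dropping the options, the pairing suspected above would look like this in seahub_settings.py (an untested sketch; the IP is a placeholder):

```python
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': ['10.65.16.118:11211'],
        'OPTIONS': {
            'ketama': True,      # consistent hashing across servers
            'remove_failed': 1,  # eject failed servers, as ketama expects
            'retry_timeout': 1,
            'dead_timeout': 60,
        },
    },
}
```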
I agree with Daniel that the above configuration should be removed from seahub_settings.py.
@schlarbm you observed “marked dead” because the “remove_failed” and “ketama” options are hard-coded in the C code right now. We will change this and update the documentation too. (BTW, after some testing, I found that adding --RETRY-TIMEOUT=1 to seafile.conf would also avoid the “marked dead” situation. But I’d prefer not to recommend adding one more option to the config file.)
To sum up, the memcached config in seafile.conf should be:
[cluster]
enabled = true
memcached_options = --SERVER=ip --POOL-MIN=10 --POOL-MAX=100
and the config in seahub_settings.py should be:
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': ['ip:11211'],
    }
}
Hi Jonathan!
Ah, okay!
The two snippets you posted are exactly what we currently have - and apart from that one instance there are no problems at the moment.
We wish you a happy new year holiday!
Moritz
We changed to the memcached/keepalived solution. The setup works stably as long as keepalived points to one memcached instance, but we have similar issues when keepalived switches to the backup memcached instance or goes back to the master. Seafile cannot recover from this state: the Seafile server complains about a Bad Access Token on the background task server, or nodes send empty URLs to it. Only a complete restart of the whole infrastructure (app nodes + background task server) fixes the problem. Shall I send our configs?
Hi @vmakarenko
Please provide the following config options:
- [cluster] section in seafile.conf
- CACHE option in seahub_settings.py
- keepalived configuration for migrating memcached VIP
Please also provide the error messages in seafile.log when you switch over the memcached server.
By the way, have you confirmed that the VIP has been migrated to the stand-by memcached server when the main server is shut down?
seafile.conf:
[cluster]
enabled = true
memcached_options = --SERVER=V_IP:11211 --POOL-MIN=10 --POOL-MAX=100
health_check_port = 11001
seahub_settings.py:
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': 'V_IP:11211',
    }
}
keepalived.conf on the memcached instances (the only difference between them is the priority: 200, 101, 100):
global_defs {
    notification_email {
        SYSADMIN_EMAIL
    }
    notification_email_from SYSADMIN_EMAIL
    smtp_server localhost
    smtp_connect_timeout 30
    vrrp_garp_interval 5
    vrrp_garp_master_delay 5
    vrrp_garp_master_repeat 5
}
vrrp_script check_memcached {
    script "/usr/lib/nagios/plugins/check_memcached.pl -H localhost"
    interval 2   # check every 2 seconds
    fall 2       # require 2 failures for KO
    rise 2       # require 2 successes for OK
}
vrrp_instance memcached {
    state MASTER
    interface eth0
    virtual_router_id 56
    priority 200
    advert_int 1
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    unicast_peer {
        MEMCACHED_INSTANCE_1_IP
        MEMCACHED_INSTANCE_2_IP
        MEMCACHED_INSTANCE_3_IP
    }
    virtual_ipaddress {
        V_IP
    }
    track_script {
        check_memcached
    }
    notify /usr/local/bin/keepalived_status.sh
}
We checked VIP switching: completely fine with the keepalived settings.
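As a quick client-side cross-check from a Seafile node that the VIP is actually answering after a failover, a small probe like the following can be used (a minimal sketch; host and port are placeholders, and memcached’s plain-text `version` command serves as the liveness check):

```python
import socket

def memcached_alive(host, port=11211, timeout=2.0):
    """Return True if a memcached server answers the 'version' command."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"version\r\n")
            reply = s.recv(64)
            return reply.startswith(b"VERSION")
    except OSError:
        return False

# Example (hypothetical VIP): memcached_alive("10.65.16.118")
```

keepalived’s vrrp_script does something equivalent on the server side via check_memcached.pl; this just verifies the same thing through the VIP from a client’s point of view.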
seafile-background-tasks.log on the background tasks server after switching the memcached instance with keepalived:
[2018-03-05 16:48:39,688] [WARNING] failed to fetch document of task (http://127.0.0.1:8082/files/38487e0f-b91d-4762-a921-266cbbea6ff7/A%20Flexible%20Graph-Based%20Data%20Model%20Supporting%20incremental%20Schema%20Design%20and%20evolution.pdf): HTTP Error 400: Bad Request
w3m http://127.0.0.1:8082/files/38487e0f-b91d-4762-a921-266cbbea6ff7/A%20Flexible%20Graph-Based%20Data%20Model%20Supporting%20incremental%20Schema%20Design%20and%20evolution.pdf "Bad access token"
Hi @vmakarenko
Please try this test package: https://download.seafile.com/f/94eaf35cae4842219cc9/?dl=1
The config in seahub_settings.py and seafile.conf need not be changed.
Thank you Jonathan, we will test the package. BTW, I am just curious: what has been fixed/changed?
We have removed the hard-coded ‘ketama’ and ‘remove_failed’ options in seaf-server. I think your errors when switching memcached instances were caused by these options. That doesn’t mean there won’t be any errors at all, but the servers should recover by themselves after the switch is done.
Hi all,
The fix has been included in 6.2.11. New configuration for memcached and Seafile can be found in https://manual.seafile.com/deploy_pro/deploy_in_a_cluster.html. Just search for “memcached configuration”.
Will Redis be supported soon?