Memcached problems(?)

Yes, that’s what you mentioned. Use keepalived to migrate the virtual IP to the standby server.

I’ve now switched our cluster to using just a single memcached server with a floating IP.
Let’s see how that goes! :wink:

@schlarbm: I’m just curious: is everything going OK with your memcached setup?

Since the change, I think we only had one case where the primary memcached server went offline. The failover via keepalived worked fine, but the other Seafile instances still spat out “SERVER MARKED DEAD” for some time afterwards.
Maybe the keepalive and retry parameters for memcached can still be tuned.
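
For the next failover test, a small probe like the sketch below could make the length of that “SERVER MARKED DEAD” window visible (10.0.0.10 is a placeholder for the floating IP, and the behaviors shown are only an example of the retry/dead-timeout knobs pylibmc exposes):

# Hammer the floating IP with a set/get once per second and log failures,
# so the length of the "SERVER MARKED DEAD" window during a failover
# becomes visible. Placeholder address and example behaviors only.
import time

import pylibmc

client = pylibmc.Client(
    ["10.0.0.10:11211"],  # placeholder for the memcached floating IP
    behaviors={"retry_timeout": 1, "dead_timeout": 60},
)

while True:
    try:
        client.set("failover-probe", time.time(), time=30)
        client.get("failover-probe")
        status = "ok"
    except pylibmc.Error as exc:
        status = "FAILED: %s" % exc
    print("%s %s" % (time.strftime("%H:%M:%S"), status))
    time.sleep(1)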

@schlarbm What’s your memcache configuration in seafile.conf?

hi guys,

just for reference, I will add how we solved the memcached problems:

  1. we have an active-standby setup for our seafile application servers, using a floating IP and heartbeat.
  2. following Jonathan’s suggestion to use only one memcached server (in an active-standby setup), it was straightforward in our case to also let the memcached daemon be managed by heartbeat, so that it always runs on whichever node is the active seafile application server.
  3. therefore, in the configs (seafile, seahub) of all our seafile servers (application servers, background node) we can directly use the floating IP as the address of the memcached service.

this solution is very simple and elegant, as long as you do not need load balancing, and it seems to work so far.
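
as a quick sanity check that heartbeat really moved both the floating IP and the memcached daemon together, each node can simply ask memcached for its version through the floating IP. a minimal sketch (10.65.16.118 is the floating IP from the snippet below):

# Ask memcached on the floating IP for its version over a raw socket.
# If heartbeat moved the IP but not the daemon, this times out or is
# refused instead of returning a "VERSION x.y.z" line.
import socket

def memcached_version(host, port=11211, timeout=2):
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(b"version\r\n")
        return sock.recv(128).decode().strip()
    finally:
        sock.close()

print(memcached_version("10.65.16.118"))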

to avoid memcached servers being marked dead (when the primary server is rebooted, for example), we use the following options in seahub_settings.py:

CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': ['10.65.16.118:11211'],
        'OPTIONS': {
            'ketama': True,
            'remove_failed': 0,
            'retry_timeout': 1,
            'dead_timeout': 60
        }
    }
}
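
the OPTIONS dict is handed through django_pylibmc to pylibmc as client behaviors; a quick way to check that the local libmemcached build actually accepts these values is a short probe like this (a sketch, run from any application node):

# Build a pylibmc client with the same behaviors as in the OPTIONS dict
# above and print the effective values reported back by libmemcached.
import pylibmc

client = pylibmc.Client(
    ["10.65.16.118:11211"],
    behaviors={"ketama": True, "remove_failed": 0,
               "retry_timeout": 1, "dead_timeout": 60},
)
effective = client.behaviors
print({key: effective.get(key)
       for key in ("ketama", "remove_failed", "retry_timeout", "dead_timeout")})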

best,
Hp


Hi Jonathan,
it’s just:

[cluster]
enabled = true
memcached_options = --SERVER=10.94.7.156 --POOL-MIN=10 --POOL-MAX=100

Regards, Moritz

We tried a similar setup (CACHES = {…} in seahub_settings.py), but with haproxy instead of heartbeat, so the Seafile nodes always have one IP to access memcached. The configuration seems to be stable, but the seahub.log of the other nodes is full of the following exceptions:

2018-02-14 16:38:20,674 [ERROR] django.pylibmc:125 add MemcachedError: memcached_behavior_set returned 38 
for behavior 'remove_failed' = 0
Traceback (most recent call last):
 File "/usr/local/lib/python2.7/dist-packages/django_pylibmc/memcached.py", line 117, in add
   return self._cache.add(key, value,
 File "/usr/local/lib/python2.7/dist-packages/django_pylibmc/memcached.py", line 93, in _cache
   client.behaviors = self._options
 File "/usr/local/lib/python2.7/dist-packages/pylibmc/client.py", line 197, in set_behaviors
   return super(Client, self).set_behaviors(_behaviors_numeric(behaviors))
Error: memcached_behavior_set returned 38 for behavior 'remove_failed' = 0

Do you have the same errors? Can we just ignore them?

I suggest that for a single memcached server, all the options be removed.

I suspect that ketama hashing should be used together with remove_failed=1.
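
A small probe like this sketch (placeholder server address) could show which behavior libmemcached actually rejects with return code 38, and whether ketama together with remove_failed=1 is accepted:

# Try each behavior from the CACHES OPTIONS individually and report which
# one libmemcached rejects ("returned 38" in the traceback above comes from
# memcached_behavior_set). The server address is a placeholder.
import pylibmc

client = pylibmc.Client(["127.0.0.1:11211"])
for name, value in [("ketama", True), ("remove_failed", 0),
                    ("remove_failed", 1), ("retry_timeout", 1),
                    ("dead_timeout", 60)]:
    try:
        client.behaviors = {name: value}
        print("%s = %r accepted" % (name, value))
    except pylibmc.Error as exc:
        print("%s = %r rejected: %s" % (name, value, exc))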


I agree with Daniel that the above configuration should be removed from seahub_settings.py.

@schlarbm you observed “marked dead” because the “remove_failed” and “ketama” options are hard-coded in the C code right now. We will change this and update the documentation too. (BTW, after some testing, I found that adding --RETRY-TIMEOUT=1 to seafile.conf would also avoid the “marked dead” situation, but I prefer not to recommend adding one more option to the config file.)

To sum up, the memcached config in seafile.conf should be:

[cluster]
enabled = true
memcached_options = --SERVER=ip --POOL-MIN=10 --POOL-MAX=100

and in seahub_settings.py it should be:

CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': ['ip:11211'],
    }
}

Hi Jonathan!

Ah, okay!

The two snippets you posted are exactly what we currently have - and apart from that one instance there are no problems at the moment.

We wish you a happy new year holiday!
Moritz

@vmakarenko Have you solved your memcached issue with the new configuration?

We changed to the memcached/keepalived solution. The setup works stably as long as keepalived points to one memcached instance, but we see similar issues when keepalived switches to the backup memcached instance or back to the master. Seafile cannot recover from this state: the seafile server complains about Bad Access Token on the background task server, or the nodes send empty URLs to it. Only a complete restart of the whole infrastructure (app nodes + background task server) fixes the problem. Shall I send our configs?

Hi @vmakarenko

Please provide the following config options:

  • [cluster] section in seafile.conf
  • CACHES setting in seahub_settings.py
  • keepalived configuration for migrating memcached VIP

Please also provide the error messages in seafile.log when you switch over the memcached server.

By the way, have you confirmed that the VIP is migrated to the standby memcached server when the main server is shut down?

seafile.conf:
[cluster]
enabled = true
memcached_options = --SERVER=V_IP:11211 --POOL-MIN=10 --POOL-MAX=100
health_check_port = 11001

seahub_settings.py

CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        'LOCATION': 'V_IP:11211',
    }
}

keepalived.conf on any of the memcached instances; the only difference is the priority (200, 101, 100):

global_defs {
   notification_email {
     SYSADMIN_EMAIL
   }
   notification_email_from SYSADMIN_EMAIL
   smtp_server localhost
   smtp_connect_timeout 30
   vrrp_garp_interval 5
   vrrp_garp_master_delay 5
   vrrp_garp_master_repeat 5
}

vrrp_script check_memcached {
   script  "/usr/lib/nagios/plugins/check_memcached.pl -H localhost"
   interval 2  # check every 2 seconds
   fall 2  # require 2 failures for KO
   rise 2  # require 2 successes for OK
}

vrrp_instance memcached {
    state MASTER
    interface eth0
    virtual_router_id 56
    priority 200
    advert_int 1
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    unicast_peer {
        MEMCACHED_INSTANCE_1_IP
        MEMCACHED_INSTANCE_2_IP
        MEMCACHED_INSTANCE_3_IP
    }
    virtual_ipaddress {
        V_IP
    }
    track_script {
            check_memcached
    }
    notify /usr/local/bin/keepalived_status.sh
}
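
The notify script itself isn’t shown here; a minimal hypothetical stand-in (written in Python just for illustration; keepalived passes the type, the instance name and the new state as arguments to notify scripts) that only logs the transition could look like:

#!/usr/bin/env python
# Hypothetical stand-in for the notify script referenced above: keepalived
# calls it with the type ("INSTANCE"/"GROUP"), the instance name and the new
# state ("MASTER"/"BACKUP"/"FAULT"); this version just writes to syslog.
import sys
import syslog
import time

vrrp_type, name, state = sys.argv[1:4]
syslog.openlog("keepalived-notify")
syslog.syslog("%s %s entered state %s at %s"
              % (vrrp_type, name, state, time.strftime("%Y-%m-%d %H:%M:%S")))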

We checked the VIP switching: it works completely fine with these keepalived settings.

seafile-background-tasks.log on the background tasks server after switching the memcached instance with keepalived:

[2018-03-05 16:48:39,688] [WARNING] failed to fetch document of task  (http://127.0.0.1:8082/files/38487e0f-b91d-4762-a921-266cbbea6ff7/A%20Flexible%20Graph-Based%20Data%20Model%20Supporting%20incremental%20Schema%20Design%20and%20evolution.pdf): HTTP Error 400: Bad Request
w3m http://127.0.0.1:8082/files/38487e0f-b91d-4762-a921-266cbbea6ff7/A%20Flexible%20Graph-Based%20Data%20Model%20Supporting%20incremental%20Schema%20Design%20and%20evolution.pdf
"Bad access token"

Hi @vmakarenko

Please try this test package: https://download.seafile.com/f/94eaf35cae4842219cc9/?dl=1

The config in seahub_settings.py and seafile.conf need not be changed.

Thank you Jonathan, we will test the package. BTW, I am just curious: what has been fixed/changed?

We have removed the hard-coded ‘ketama’ and ‘remove_failed’ options in seaf-server. I think your errors when switching memcached instances were caused by these options. That doesn’t mean there won’t be any errors, but the cluster should recover by itself after the switch is done.

Hi all,

The fix has been included in 6.2.11. The new configuration for memcached and Seafile can be found at https://manual.seafile.com/deploy_pro/deploy_in_a_cluster.html; just search for “memcached configuration”.

Will Redis be supported soon?
