How well does Seafile cope with a missing memcached node?

Hi,
upon upgrading and rebooting nodes of our memcached cluster, I noticed that Seahub apparently stops working whenever one of the memcached nodes is unavailable. This leaves me puzzled, as I would expect a memcached-cluster setup to prevent exactly this. Is anyone experiencing the same thing? Or is this maybe a misconfiguration on our side?
Thanks in advance!


Memcached cluster support was dropped some time ago because they couldn’t fix certain problems and simply declared that it is not needed. … Which is not true: they simply did not seem to have tested this properly, otherwise they would know that a memcached node failover (since clusters are not supported) causes serious issues in a Seafile cluster, especially with many connections/users.

A suitable solution would be to support Redis, which is easy to cluster, a more modern approach, and widely used for exactly this purpose in modern applications. It also scales a lot better than memcached.

@daniel.pan
Can you please elaborate on the future plans for this? A proper caching solution is required.

The same goes for CE: either replace memcached entirely or at least give the option to use Redis.

Not providing a cluster solution for the caching backend means the whole Seafile cluster is not a real HA cluster.
When the memcached node breaks away, the cluster crashes and users lose their connections; Seahub becomes unresponsive and requires a service restart to work properly again. This causes service interruptions and user support requests, because users see error messages.

Thanks in advance for providing a proper caching cluster backend integration!


Redis support will be added in version 9.0.


This is really good news. Enterprise customers especially will be very happy about it.

We use yrmcds for this. It has worked without problems for years and manages the failover automatically.
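
For anyone wondering how that slots in: since yrmcds speaks the memcached protocol, Seahub can talk to it through the same cache backend used for plain memcached. A minimal sketch, assuming a django_pylibmc-based setup; the virtual IP 192.168.0.50 and port are placeholders that the yrmcds/keepalived failover keeps alive:

```python
# seahub_settings.py -- hedged sketch, not an official configuration.
# The virtual IP 192.168.0.50 and port 11211 are placeholders.
CACHES = {
    'default': {
        # yrmcds is memcached-protocol compatible, so the usual pylibmc-based
        # Django cache backend works unchanged.
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache',
        # Point at the floating/virtual IP rather than an individual node, so a
        # node failure stays transparent to Seahub.
        'LOCATION': '192.168.0.50:11211',
    },
}
```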


Was Redis removed from the roadmap for 9 on purpose?
I thought I saw it listed there some time ago, but now it’s gone.

memcache(d) is not sufficient, and yrmcds isn’t sufficient for large setups either.

Please also support separate read and write Redis backends for different system calls to allow better performance with larger setups.

A Redis write backend is mandatory, of course; a read-only backend can be optional and allows better performance tuning.

After more research we found that supporting Redis has little benefit. The reasons are:

  1. The Redis cluster referred to by @DerDanilo (Redis cluster tutorial – Redis) requires a different client library than the ordinary library for single-node Redis. This makes the C code complex. What’s worse, Django doesn’t support Redis Cluster. This is a show stopper. (Redis Cluster support. · Issue #208 · jazzband/django-redis · GitHub)
  2. The other Redis cluster solutions, like Sentinel (Redis Sentinel Documentation – Redis) or HAProxy, are no better than the current memcached cluster solution with Keepalived.

The current memcached solution with Keepalived can actually survive a memcached node fail-over. After the fail-over, Seafile should be able to use memcached again.

Thanks for taking the time to explain with references.
But please explain this before removing a feature that you have put on the roadmap. Silently removing announced features is a show stopper for planning and could also be seen as fraud, because a business might have selected Seafile for that very reason.

The Redis cluster referred to by @DerDanilo (Redis cluster tutorial – Redis) requires a different client library than the ordinary library for single-node Redis. This makes the C code complex. What’s worse, Django doesn’t support Redis Cluster. This is a show stopper. (Redis Cluster support. · Issue #208 · jazzband/django-redis · GitHub)

I partly understand your points, but I don’t understand why supporting the single-node library is a problem.
AFAIK the single-node library works fine with Redis Sentinel, and that is also what is recommended because of its simplicity.
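
To illustrate the point: with redis-py, the ordinary single-node client, Sentinel is only used for master discovery and no cluster-aware library is needed. A minimal sketch, with placeholder Sentinel addresses and the hypothetical service name "mymaster":

```python
# Minimal sketch using the standard redis-py client -- the addresses and the
# service name "mymaster" are placeholders, not an actual Seafile setup.
from redis.sentinel import Sentinel

sentinel = Sentinel([("10.0.0.1", 26379), ("10.0.0.2", 26379)], socket_timeout=0.5)

# Sentinel tells the client where the current master is; after a fail-over the
# same call simply returns a connection to the newly promoted master.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
master.set("seahub:session:abc", "some value")
print(master.get("seahub:session:abc"))
```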

The other Redis cluster solutions, like Sentinel (Redis Sentinel Documentation – Redis) or HAProxy, are no better than the current memcached cluster solution with Keepalived.

The current memcached solution with Keepalived can actually survive a memcached node fail-over. After the fail-over, Seafile should be able to use memcached again.

The “memcached cluster” solution with keepalived does not work properly and also fails under heavy load. Additionally, keepalived itself fails sometimes, which causes downtime too.
This is not a real cluster solution. It’s basically a single memcached node with some sort of sync.
Please don’t get me wrong, but memcached caused a lot of problems in the past and still does. It was one of the few reasons multiple Seafile clusters had downtime.

Sentinel, by contrast, is a real cluster in the background, keeping everything in sync, and it even offers to differentiate read and write requests for better optimization.

I don’t know why you say that Sentinel is no different from yrmcds (memcached). Redis Sentinel is better if only for the fact that it provides a real clustering solution.

Memcached causes a lot of headaches for clusters, and a “fail-over” (active/backup) is not a properly working cluster backend, since there will always be an interruption.
I suggested Redis/Sentinel because of these interruptions and because Seafile tends to crash completely if memcached can’t keep up or when a memcached node fails over.

Please reconsider your decision, or support another caching backend that allows real cluster solutions.
Thanks!


Hi @DerDanilo

Thank you for your input. We tested fail-over scenarios with memcached. Seafile doesn’t crash or misbehave across such fail-overs. After the fail-over, requests to memcached fail once, then work again after a retry. I believe there is some misunderstanding around this, or perhaps your experience is based on older versions or configurations.
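
To make the “fail once, then retry” behaviour concrete, a rough illustration only; this helper is hypothetical and not Seafile’s actual code, and the VIP is a placeholder:

```python
# Hypothetical illustration of the behaviour described above -- not Seafile code.
# Right after a memcached fail-over, the first request may still hit the dead
# node and raise an error; a single retry then reaches the surviving node.
import pylibmc

client = pylibmc.Client(["192.168.0.50:11211"])  # placeholder keepalived VIP


def get_with_retry(key, retries=1):
    for attempt in range(retries + 1):
        try:
            return client.get(key)
        except pylibmc.Error:
            if attempt == retries:
                raise  # still failing after the retry -> surface the error
```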

As for Redis support, I understand that the Sentinel solution has some advantages compared to memcached. For example, it can keep the nodes in sync. However, it’s not essential for Seafile’s use case to keep the existing cache entries after a fail-over. So we’ll still consider supporting Redis in later versions (maybe version 10), but for now it’s a lower priority.


Hi @Jonathan
Thanks for your reply.
As you say yourself, “it fails once” and then works again. But this one failing connection/request is enough to kill client connections. This is exactly what should definitely not happen with a cluster that is supposed to be HA.
In addition to killing client connections, the failover also puts an extremely high load onto the whole cluster until the new memcached server can serve most requests again and the cache has been rebuilt. With our biggest cluster (at peak times), it takes several minutes to recover from a memcached failover.
Having an HA session-caching solution is a must-have for Seafile clusters, and the proposed memcached solution is not HA; it’s active/backup.

I understand that adding Redis/Sentinel support is low priority from your standpoint, as maybe not enough customers have huge clusters with such a high load that they show any of the symptoms we experience. But please consider keeping Redis/Sentinel support on the 9.x roadmap, since waiting for 10.x means years without a proper solution for this issue. It doesn’t have to be released with 9.0, but maybe in one of the later 9.x releases.

And we’d rather wait for properly implemented Redis/Sentinel support with separate read and write backends to further boost performance.
This will also help e.g. S3 backends serve requests faster, since many Redis backend servers can reply to a read request but only one can accept write requests at a time (see the sketch below).
Please don’t hesitate to ask for more input on this if required.
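
To sketch what such a read/write split could look like at the client level; the addresses, the service name "mymaster", and the cache keys are all placeholders, not an actual Seafile configuration:

```python
# Hedged sketch of a Sentinel-based read/write split -- not an actual Seafile
# configuration; addresses, service name, and keys are placeholders.
from redis.sentinel import Sentinel

sentinel = Sentinel([("10.0.0.1", 26379), ("10.0.0.2", 26379)], socket_timeout=0.5)

write_backend = sentinel.master_for("mymaster", socket_timeout=0.5)  # single writer
read_backend = sentinel.slave_for("mymaster", socket_timeout=0.5)    # any replica

# Writes always go to the master; read load is spread across the replicas,
# which is where e.g. an S3 backend's metadata lookups would benefit.
write_backend.set("fs:obj:deadbeef", b"cached object metadata")
print(read_backend.get("fs:obj:deadbeef"))
```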

Thanks!
