Sync Groups via OAuth2 / OpenID Connect

Hello,

I am currently experimenting with Seafile and OpenID Connect authentication (which builds on OAuth2), using the popular Keycloak server as my identity provider and Seafile CE as the client.
Setup was relatively straightforward and the login process works flawlessly, but as far as I can see there is currently no way to sync groups from Keycloak to Seafile, right?

A possible workaround might be to enable user federation from Keycloak to an LDAP server and then sync the groups from there, but this would be far from elegant (and LDAP group sync is only available in Seafile's Pro edition).

Do you see any other possible ways to enable group syncing? Is it planned to implement this in a future release of Seafile?

Cheers!

Well, I had a look at the seahub source code myself and the relevant code seems to be in /seahub/oauth/views.py (line 184 and onwards).
While the basic user information is being updated with that obtained from the OpenID Connect UserInfo endpoint, group information is completely ignored.
In Keycloak, a protocol mapper can be used to easily add the user’s groups as a claim to the UserInfo endpoint, so I guess it wouldn’t be very hard at all to implement the synchronization of those groups to Seafile groups via Seafile API calls from the Seahub code. Still, you probably wouldn’t get real “replication” since group membership info would only be updated when a user logs on, which might not be sufficient. Using the workaround mentioned above, all group membership updates would be pushed to the LDAP server and could be periodically updated from there.
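To make the per-login idea concrete, here is a minimal sketch of the comparison step. It assumes the protocol mapper publishes a claim named `groups` (the claim name is configurable in Keycloak, so this is an assumption) and that the caller already knows the user's current Seafile groups. The function name `diff_groups_at_login` is hypothetical and this is not code from Seahub; applying the result through Seafile's API is left out.

```python
def diff_groups_at_login(userinfo, current_seafile_groups, claim="groups"):
    """Compare the groups claim from UserInfo with the user's Seafile groups."""
    # Keycloak's group mapper can emit full paths such as "/team/dev";
    # keep only the last path segment so it can match a flat Seafile group name.
    claimed = {
        path.rstrip("/").rsplit("/", 1)[-1]
        for path in userinfo.get(claim, [])
        if path.strip("/")
    }
    current = set(current_seafile_groups)
    return {
        "join": sorted(claimed - current),    # in Keycloak, missing in Seafile
        "leave": sorted(current - claimed),   # in Seafile, no longer in Keycloak
    }


userinfo = {"email": "alice@example.com", "groups": ["/group1", "/team/group2"]}
print(diff_groups_at_login(userinfo, ["group2", "group3"]))
# {'join': ['group1'], 'leave': ['group3']}
```

Flattening nested paths to their last segment is a simplification; two Keycloak subgroups with the same name would collide, so a real implementation would need its own naming rule.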

However, any ideas for a more convenient workaround at this point?

I am using Seafile CE and Keycloak plus LDAP. Currently I do not have group sync working but I have written a Python script that reads group membership from Seafile (via SQL) and from Keycloak (via API) and compares them.

I am currently considering two options for syncing. One option is to write an LDAP overlay (basically a way to hook into LDAP operations). This way, when Keycloak modifies the LDAP directory, I could use Seafile’s API or database to update group membership accordingly. It probably doesn’t matter if you configure Seafile to use LDAP in addition to OpenID Connect (I use LDAP as Keycloak’s backend anyway).

The other option would be to extend the Python script I mentioned earlier to not only display differences between the systems but also correct them - and then run that script as a cronjob.
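A rough sketch of that second option, assuming the comparison step yields, for each group, the users missing from Seafile (`less_than_expected`) and the users that should not be there (`more_than_expected`). The names `plan_corrections` and `apply_corrections` are hypothetical, and the actual write to Seafile is passed in as a callable because whether to use the API or the database is still open. The default is a dry run that only prints.

```python
def plan_corrections(results):
    """Turn per-group differences into (action, group, email) tuples."""
    actions = []
    for group, diff in sorted(results.items()):
        for email in sorted(diff["less_than_expected"]):
            actions.append(("add", group, email))
        for email in sorted(diff["more_than_expected"]):
            actions.append(("remove", group, email))
    return actions


def apply_corrections(actions, apply_one, dry_run=True):
    """Apply each action through apply_one(action, group, email)."""
    applied = []
    for action, group, email in actions:
        if dry_run:
            # Only report what would happen; nothing is changed.
            print("would", action, email, "in", group)
        else:
            apply_one(action, group, email)
            applied.append((action, group, email))
    return applied
```

Running it with `dry_run=True` from cron for a while before switching to real writes would make it easier to catch mistakes in the comparison logic.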

However, at my scale I might also stick with “syncing” group membership manually and just add my Python script to monitoring, so that it sends me an email when someone lets things get out of sync.
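For the monitoring variant, a minimal sketch of building the notification with Python's standard library could look like this. The function name `build_alert` and the addresses are placeholders, and the input is assumed to have the same per-group `less_than_expected` / `more_than_expected` lists as the comparison result.

```python
from email.message import EmailMessage


def build_alert(results, sender, recipient):
    """Return an EmailMessage describing the drift, or None if all is in sync."""
    lines = []
    for group, diff in sorted(results.items()):
        for email in sorted(diff["less_than_expected"]):
            lines.append(f"{group}: {email} is missing in Seafile")
        for email in sorted(diff["more_than_expected"]):
            lines.append(f"{group}: {email} should not be in Seafile")
    if not lines:
        return None

    message = EmailMessage()
    message["Subject"] = f"Seafile group drift: {len(lines)} difference(s)"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("\n".join(lines))
    return message
```

The returned message could then be handed to `smtplib.SMTP("localhost").send_message(message)`, assuming a local mail relay is available.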

Hey, excellent, didn’t expect any answers any more :smiley:

Your approach of using your own Python script seems promising. Any specific reason why you query the Seafile database directly instead of just using the API?
Regarding LDAP overlays, I guess that’s not for the faint-hearted… At least for me, working with LDAP has pretty much always been a PITA and whenever I have the choice, I avoid it.

Would you mind sharing said Python script?

As to why I don’t use Seafile’s API: In the beginning I had in mind to implement this as a web-accessible CGI script. I wanted to pass an OpenID access token from the client on to both Keycloak and Seafile. But Seafile’s API doesn’t accept those tokens. That’s why I explored reading directly from the database. However, I ended up using hard-coded credentials for both Keycloak and the database anyway, so I might just as well use hard-coded credentials for Seafile’s API as well. I would definitely do that if I wanted to update Seafile’s group memberships.
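For reference, a sketch of what the hard-coded-credentials route could look like with only the standard library. It assumes Seafile's token endpoint at `/api2/auth-token/` and the `Authorization: Token …` header, which is how I understand the Web API to work; check both against the documentation for your Seafile version. The server URL and the helper names are placeholders, and the account is assumed to have a local password.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(server, username, password):
    """Build the POST request that asks Seafile for an API token."""
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(
        server.rstrip("/") + "/api2/auth-token/",  # assumed endpoint path
        data=body.encode("utf-8"),
        method="POST",
    )


def fetch_token(server, username, password):
    """Send the request and return the token string from the JSON reply."""
    request = build_token_request(server, username, password)
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)["token"]


def auth_headers(token):
    """Header to attach to subsequent Seafile API calls."""
    return {"Authorization": "Token " + token}
```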

What you’re saying about LDAP is probably true. I am thinking of overlays as the cleanest solution, but also likely hard to implement. I haven’t yet looked into coding one.

Share the Python script? Sure, here goes:

#!/usr/bin/env python3

###################
# Consistency Layer
# install dependencies: python3-pymysql, python3-pandas, python3-requests
#
###################


import pandas
import pymysql
import requests


class terminal_colors:
    header = "\033[95m"
    fail = "\033[91m"
    end = "\033[0m"

keycloak_config = {
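    # Base URL without a trailing slash; the request paths below add their own.
    # On Keycloak 17+ (Quarkus), drop the "/auth" prefix from those paths unless you configured it.
    "url": "https://keycloak.example.com",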
    "realm": "master",
    "username": "keycloak-user",
    "password": "keycloak-password"
}

# PyMySQL 1.0+ accepts keyword arguments only.
database_connection = pymysql.connect(host = "localhost", user = "db-user", password = "db-password")


def get_keycloak_access_token(keycloak_config):
    response = requests.post(
        keycloak_config["url"] + "/auth/realms/" + keycloak_config["realm"] + "/protocol/openid-connect/token",
        data = {
            "username": keycloak_config["username"],
            "password":  keycloak_config["password"],
            "grant_type": "password",
            "client_id": "admin-cli"
        },
        headers = {
            "content-type": "application/x-www-form-urlencoded"
        }
    )

    access_token = response.json()["access_token"]
    return access_token


def get_keycloak_users(keycloak_config):
    token = get_keycloak_access_token(keycloak_config)
    response = requests.get(
        keycloak_config["url"] + "/auth/admin/realms/" + keycloak_config["realm"] + "/users",
        headers = {
            "accept": "application/json",
            "authorization": "Bearer " + token
        }
    )
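    # Build the frame from the parsed JSON; newer pandas versions deprecate
    # passing a literal JSON string to read_json.
    return pandas.DataFrame(response.json())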


def get_keycloak_user_groups(keycloak_config, user):
    token = get_keycloak_access_token(keycloak_config)
    response = requests.get(
        keycloak_config["url"] + "/auth/admin/realms/" + keycloak_config["realm"] + "/users/" + user.id + "/groups",
        headers = {
            "accept": "application/json",
            "authorization": "Bearer " + token
        }
    )
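    # Build the frame from the parsed JSON; newer pandas versions deprecate
    # passing a literal JSON string to read_json.
    user_groups = pandas.DataFrame(response.json())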
    user_groups["user_id"] = user.id
    user_groups["username"] = user.username
    user_groups["user_email"] = user.email
    return user_groups


def get_keycloak_all_user_groups(keycloak_config, keycloak_users):
    # Collect one DataFrame per user, then concatenate once at the end
    # (DataFrame.append was removed in pandas 2.0).
    frames = [
        get_keycloak_user_groups(keycloak_config, user)
        for index, user in keycloak_users.iterrows()
    ]

    if not frames:
        return None

    return pandas.concat(frames, sort = False)



def get_seafile_user_groups(database_connection):
    cursor = database_connection.cursor()
    cursor.execute("USE `ccnet-db`")
    cursor.execute("""
        SELECT user_name, group_name, is_staff
        FROM GroupUser
        JOIN `Group` ON GroupUser.group_id = `Group`.group_id
    """)

    database_records = cursor.fetchall()

    seafile_user_groups = pandas.DataFrame(data = list(database_records),
                                           columns = ["user_name", "group_name", "is_staff"])

    return seafile_user_groups


def check_seafile_group(keycloak_user_groups, seafile_user_groups, group):
    seafile_user_groups = seafile_user_groups.pivot(
        index = "user_name",
        columns = "group_name",
        values = "is_staff"
    ).rename(columns = {group: "in_group"})

    keycloak_user_groups = keycloak_user_groups.assign(in_group = 1).pivot(
        index = "user_email",
        columns = "name",
        values = "in_group"
    ).filter(["user_email", group]).rename(columns = {group: "in_group"})


    users_in_keycloak_group = set(keycloak_user_groups
        .query("in_group >= 0")
        .index)

    users_in_seafile_group = set(seafile_user_groups
        .query("in_group >= 0")
        .index)

    return {
        "less_than_expected": list(users_in_keycloak_group.difference(users_in_seafile_group)),
        "more_than_expected": list(users_in_seafile_group.difference(users_in_keycloak_group))
    }


keycloak_users = get_keycloak_users(keycloak_config)
keycloak_user_groups = get_keycloak_all_user_groups(keycloak_config, keycloak_users)
seafile_user_groups = get_seafile_user_groups(database_connection)

results = {
    "seafile": {
        "group1": check_seafile_group(keycloak_user_groups, seafile_user_groups, "group1"),
        "group2": check_seafile_group(keycloak_user_groups, seafile_user_groups, "group2"),
    }
}


print(terminal_colors.header, "Seafile", terminal_colors.end, sep = "")

for group in results["seafile"]:
    print("    ", group, " – More than expected", sep = "")
    for email in results["seafile"][group]["more_than_expected"]:
        print("        ", terminal_colors.fail, email, terminal_colors.end, sep = "")
    if len(results["seafile"][group]["more_than_expected"]) == 0:
        print("        ", "None", sep = "")

    print("    ", group, " – Less than expected", sep = "")
    for email in results["seafile"][group]["less_than_expected"]:
        print("        ", terminal_colors.fail, email, terminal_colors.end, sep = "")
    if len(results["seafile"][group]["less_than_expected"]) == 0:
        print("        ", "None", sep = "")

print()

print("End of report")

database_connection.close()

Currently its output is meant for a color-capable terminal. But it can easily be changed to output JSON (for CGI) or drop the colors. My script also checks a WordPress installation. I removed that code, let me know if you are interested in it.

Note that the script makes one Keycloak API request for every user, plus a fresh token request each time as it is written. This may take quite a while if you have a lot of them.
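One cheap way to cut the number of requests roughly in half would be to reuse the access token instead of fetching a new one per call. Below is a minimal sketch, assuming a hypothetical callable that returns the full JSON token response (that is, `response.json()` rather than only the `access_token` string my function returns) and relying on the standard OAuth2 `expires_in` field. The class name `TokenCache` is made up for this example.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token_response, margin=10, clock=time.monotonic):
        self._fetch = fetch_token_response   # returns the parsed token response
        self._margin = margin                # seconds of safety before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            reply = self._fetch()
            self._token = reply["access_token"]
            # Fall back to a short lifetime if the reply lacks expires_in.
            self._expires_at = now + reply.get("expires_in", 60) - self._margin
        return self._token
```

The per-user group lookups would then call `cache.get()` instead of requesting a token each time. The group lookups themselves still cost one request per user.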