September 14, 2021
July 29, 2021

How-to: Set Up OIDC Authentication in Amundsen

by
Verdan Mahmood
Founding Engineer

Okta, Google, and others provide user authentication and have native support for OpenID Connect (OIDC) for all of their customers. This step-by-step guide explains how to enable OIDC for your Amundsen installation.

You can find existing Amundsen docs at:

https://www.amundsen.io/amundsen/

We rely on verdan/flaskoidc for the OIDC setup in Amundsen, so make sure to install flaskoidc for each Amundsen service, i.e., frontend, metadata, and search. You can find the repo at:

GitHub - verdan/flaskoidc: A wrapper of Flask with pre-configured OIDC support

Step 1 — Create an OIDC Application

I will assume in the rest of this guide that you have created a client/application in your OIDC provider and have the Client ID and Client Secret handy.

Important: While creating a client in your OIDC provider, you will be asked to provide a Redirect URI. For that, make sure to use https://YOUR_AMUNDSEN_DOMAIN/auth

Step 2 — Use OIDC Config for your amundsen/frontend service

Nothing complicated here. Make sure you are using the correct OidcConfig for your frontend service.

You can do this by setting the following environment variable:

FRONTEND_SVC_CONFIG_MODULE_CLASS: amundsen_application.oidc_config.OidcConfig

Or, if you are using a custom config class for your frontend service, make sure your custom config class inherits from OidcConfig.
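As a minimal sketch of the custom-config route, assuming a hypothetical module my_company/frontend_config.py and a hypothetical setting named MY_CUSTOM_SETTING (neither is part of Amundsen), the subclass could look like this. The ImportError fallback is there only so the snippet also runs outside an Amundsen environment.

```python
# Hypothetical file: my_company/frontend_config.py
try:
    from amundsen_application.oidc_config import OidcConfig
except ImportError:
    # Stand-in so the sketch runs without Amundsen installed; not the real class.
    class OidcConfig:
        pass


class CustomOidcConfig(OidcConfig):
    """Custom frontend config that keeps everything OidcConfig provides."""

    # Hypothetical setting, shown only to illustrate where overrides go.
    MY_CUSTOM_SETTING = "example-value"
```

With a layout like this, FRONTEND_SVC_CONFIG_MODULE_CLASS would point at my_company.frontend_config.CustomOidcConfig instead of the stock class.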

The reason to use OidcConfig is that it sets the following two methods, which are needed for end-to-end authentication within Amundsen:

REQUEST_HEADERS_METHOD

This method ensures your frontend service passes the correct token in the request to other services when calling their APIs.

AUTH_USER_METHOD

This configuration method gets the user information from the session and serializes that information into a User object.

Step 3 — Get a Discovery Document endpoint

To simplify OIDC implementations and increase flexibility, OpenID Connect allows the use of a “Discovery document,” a JSON document found at a well-known location containing key-value pairs which provide details about the OpenID Connect provider’s configuration, including the URIs of the authorization, token, revocation, userinfo, and public-keys endpoints.
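To make the shape of such a document concrete, here is a small sketch that parses a discovery document and reports which of the commonly used endpoint keys are absent. The sample document is made up for a hypothetical provider at idp.example.com, and the helper name missing_discovery_keys is an assumption for illustration, not part of flaskoidc or Amundsen.

```python
import json

# Standard metadata names for the endpoints mentioned above.
EXPECTED_KEYS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "userinfo_endpoint",
    "jwks_uri",
)

# Made-up sample document for a hypothetical provider.
SAMPLE_DOCUMENT = """
{
  "issuer": "https://idp.example.com",
  "authorization_endpoint": "https://idp.example.com/authorize",
  "token_endpoint": "https://idp.example.com/token",
  "userinfo_endpoint": "https://idp.example.com/userinfo",
  "jwks_uri": "https://idp.example.com/keys",
  "revocation_endpoint": "https://idp.example.com/revoke"
}
"""


def missing_discovery_keys(document: dict) -> list:
    """Return the expected keys that are absent from a discovery document."""
    return [key for key in EXPECTED_KEYS if key not in document]


document = json.loads(SAMPLE_DOCUMENT)
print(missing_discovery_keys(document))  # an empty list means nothing is missing
```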

The following are the discovery document endpoints of the commonly used OIDC providers:

Google: https://accounts.google.com/.well-known/openid-configuration

Okta: https://[YOUR_OKTA_DOMAIN]/.well-known/openid-configuration

Okta (Auth Server ID): https://[YOUR_OKTA_DOMAIN]/oauth2/[AUTH_SERVER_ID]/.well-known/openid-configuration

Auth0: https://[YOUR_DOMAIN]/.well-known/openid-configuration

Keycloak: http://[KEYCLOAK_HOST]:[KEYCLOAK_PORT]/auth/realms/[REALM]/.well-known/openid-configuration

If you are using a custom OIDC provider, it should expose a discovery endpoint that follows the specification defined at: https://openid.net/specs/openid-connect-discovery-1_0.html#ProviderMetadata
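If you template these URLs in deployment scripts, the Okta and Keycloak patterns listed above can be assembled with a couple of small helpers. This is only a sketch: the function names are hypothetical, and the domain, auth server ID, and realm values in the usage lines are placeholders.

```python
WELL_KNOWN_PATH = "/.well-known/openid-configuration"


def okta_discovery_url(okta_domain: str, auth_server_id: str = "") -> str:
    """Build the Okta discovery URL, optionally for a specific auth server."""
    base = f"https://{okta_domain}"
    if auth_server_id:
        base = f"{base}/oauth2/{auth_server_id}"
    return base + WELL_KNOWN_PATH


def keycloak_discovery_url(host: str, port: int, realm: str) -> str:
    """Build the Keycloak discovery URL using the http scheme from the list above."""
    return f"http://{host}:{port}/auth/realms/{realm}" + WELL_KNOWN_PATH


print(okta_discovery_url("example.okta.com"))
print(okta_discovery_url("example.okta.com", "default"))
print(keycloak_discovery_url("localhost", 8080, "amundsen"))
```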

Step 4 — Configure Amundsen to use OIDC Auth

The following environment variables MUST be set for OIDC to work.

Reference: https://github.com/verdan/flaskoidc#configurations

Note: You will need to set the following environment variables for each service, i.e., frontend, metadata, and search. (In our community Helm chart, the common settings only need to be set once. I explain how to set these for Helm in a later section of this guide.)

FLASK_APP_MODULE_NAME: flaskoidc
FLASK_APP_CLASS_NAME: FlaskOIDC
FLASK_OIDC_PROVIDER_NAME: okta | google | ...

The name of the OIDC provider, like google, okta, keycloak, etc. I have verified this package only for google, okta, and keycloak. Please open a new issue if your OIDC provider is not working. (Even better: contribute a fix upstream to the open-source project. I am very proactive in reviewing and merging patches and new features 😉)

FLASK_OIDC_SCOPES: openid email profile

The following scopes, separated by spaces, are required for your client to work with the OIDC provider:

  • openid
  • email
  • profile

FLASK_OIDC_CLIENT_ID
FLASK_OIDC_CLIENT_SECRET

You will get the Client ID and Client Secret values after creating a new application in your OIDC provider in Step 1.

FLASK_OIDC_REDIRECT_URI: /auth

This is the endpoint that your OIDC provider calls back to complete the authentication of a request. It must match one of the redirect URIs you set in the OIDC provider client’s settings.

FLASK_OIDC_CONFIG_URL

This is the discovery endpoint that you obtained in Step 3 of this guide.
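Because every service needs the same set of variables, a small pre-flight check can catch a missing one before startup. The sketch below is a hypothetical helper (find_oidc_problems is not part of flaskoidc or Amundsen); it only checks the variable names and scopes listed in this step.

```python
import os

REQUIRED_VARIABLES = (
    "FLASK_APP_MODULE_NAME",
    "FLASK_APP_CLASS_NAME",
    "FLASK_OIDC_PROVIDER_NAME",
    "FLASK_OIDC_SCOPES",
    "FLASK_OIDC_CLIENT_ID",
    "FLASK_OIDC_CLIENT_SECRET",
    "FLASK_OIDC_REDIRECT_URI",
    "FLASK_OIDC_CONFIG_URL",
)
REQUIRED_SCOPES = {"openid", "email", "profile"}


def find_oidc_problems(env) -> list:
    """Return human-readable problems found in a mapping of environment variables."""
    problems = [f"{name} is not set" for name in REQUIRED_VARIABLES if not env.get(name)]
    # Scopes are space-separated, so a plain split() is enough to list them.
    granted = set(env.get("FLASK_OIDC_SCOPES", "").split())
    for scope in sorted(REQUIRED_SCOPES - granted):
        problems.append(f"FLASK_OIDC_SCOPES is missing '{scope}'")
    return problems


if __name__ == "__main__":
    for problem in find_oidc_problems(os.environ):
        print(problem)
```

Running it inside each service's environment prints one line per problem and nothing when all settings are present.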

Step 5 — Verify your setup

This is it! Now is the time to start your Amundsen services and verify if things are working for you. By this time, you should be able to see your OIDC provider’s login screen when you try to access your Amundsen instance. ☀

Please note that setting up OIDC alone:

  • does not enable User Profiles in the Amundsen UI.
  • does not ingest users into your database. If you don’t have the users in your metadata proxy database, user-related information such as owners, frequent users, user profiles, and bookmarks will fail to load.

Please refer to the optional sections below to enable either of these two items.

Optional: Enable User Profiles in Amundsen

User Profile is disabled by default in Amundsen UI. This can be enabled via a frontend configuration.

/frontend/amundsen_application/static/js/config/config-custom.ts

Make sure you enable the “indexUsers” setting; it will look like this afterward:

indexUsers: {
  enabled: true,
},

Please note that you need to build the frontend manually after making this change. This also means that you will need to push your custom Docker image to Docker Hub or your own registry if you are using Docker images for your deployment.

Optional: Ingest Users in metadata proxy

Preferably, users should be ingested to your metadata database via a databuilder job or a cronjob.

In cases where a databuilder job is not an option for ingesting users into the database, you can leverage USER_DETAIL_METHOD of the metadata service configuration class to get (or even create) users at runtime from any third-party or custom system.

This custom function takes user_id as a parameter and returns a dictionary consisting of user details’ fields defined in UserSchema.

Example basic usage:

def get_user_details(user_id):
    # Return only the minimum fields, derived from the user ID itself
    user_info = {
        'email': user_id,
        'user_id': user_id,
    }
    return user_info

USER_DETAIL_METHOD = get_user_details

The above basic code will ensure that your Amundsen UI will not break, even if you do not have Users ingested in your database.

Important Notes:

  • It is highly recommended to cache this function’s response to avoid any performance issues.
  • Once set, this function will be called to get the user details every time a user object is requested.
  • At the time of writing, this function is implemented only for the Neo4j proxy and the Apache Atlas proxy.
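One way to follow the caching recommendation is a small in-process cache with a time-to-live, sketched below. The cache_with_ttl wrapper and its 300-second default are assumptions for illustration, not something Amundsen ships; the cache is per process and unbounded, so a shared or size-limited cache may suit a multi-worker deployment better.

```python
import time
from typing import Callable, Dict, Tuple


def cache_with_ttl(func: Callable[[str], Dict], ttl_seconds: float = 300.0) -> Callable[[str], Dict]:
    """Wrap a user-lookup function so results are reused for ttl_seconds."""
    cache: Dict[str, Tuple[float, Dict]] = {}

    def wrapper(user_id: str) -> Dict:
        now = time.monotonic()
        hit = cache.get(user_id)
        if hit is not None and now - hit[0] < ttl_seconds:
            return dict(hit[1])  # copy so callers cannot mutate the cached entry
        result = func(user_id)
        cache[user_id] = (now, dict(result))
        return result

    return wrapper


def get_user_details(user_id: str) -> Dict:
    # Same basic lookup as the example above; a real one may call a remote system.
    return {"email": user_id, "user_id": user_id}


USER_DETAIL_METHOD = cache_with_ttl(get_user_details)
```

The time-to-live bounds how stale a cached user can be, which matters once the lookup starts creating or updating users.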

If you want to CREATE a user in your database during this process, you can also do that using the USER_DETAIL_METHOD.

The logic should look like this:

  • Check if the user exists in your database → if yes, return the user details
  • If the user is not already in the database → get user information from OIDC provider, or any other third-party system → Store in your database → Return the newly created user’s details

1: If you are using Google as the OIDC Provider

  • You MUST enable the People API for your Google Application.
  • Update the scopes defined in Step 4, and add https://www.googleapis.com/auth/directory.readonly.
  • The scopes should now look like this:

FLASK_OIDC_SCOPES: openid email profile https://www.googleapis.com/auth/directory.readonly

import logging
from typing import Dict
from urllib.parse import quote

from flask import current_app as app
from amundsen_common.models.user import UserSchema

from metadata_service.exception import NotFoundException
from metadata_service.proxy import get_proxy_client

LOGGER = logging.getLogger(__name__)


class OpenIDConnectNotConfigured(Exception):
    """Raised when the Flask app has no OIDC client attached (defined locally here)."""


def get_user_from_oidc(user_id: str) -> Dict:
    """Look the user up in the OIDC provider and map the result to UserSchema fields."""
    metadata = app.auth_client.load_server_metadata()
    # As written, this queries an Okta-style Users API under the issuer URL
    # (/api/v1/users) and reads Okta-style fields (profile.firstName, _links.self).
    # With Google, point this at a People API directory search and adjust the
    # field mapping below to match its response.
    # The user ID is URL-encoded so characters such as "+" survive the query string.
    search_endpoint = f'{metadata["issuer"]}/api/v1/users?q={quote(user_id, safe="")}&limit=1'

    _not_found_error = f"User Not Found in the OIDC Provider. User ID: {user_id}"

    response = app.auth_client.get(search_endpoint)
    response.raise_for_status()
    user_info = response.json()
    if not user_info:
        raise NotFoundException(_not_found_error)

    # The search is limited to one result, so take the first entry
    _user = user_info[0]
    profile = _user["profile"]
    full_name = f'{profile["firstName"]} {profile["lastName"]}'
    profile_url = _user.get("_links", {}).get("self", {}).get("href")

    return {
        "user_id": user_id,
        "email": profile["email"],
        "first_name": profile["firstName"],
        "last_name": profile["lastName"],
        "full_name": full_name,
        "display_name": full_name,
        "profile_url": profile_url,
    }


def get_user_details(user_id: str) -> Dict:
    """Return the user from the database, creating it from the OIDC provider if missing."""
    client = get_proxy_client()
    schema = UserSchema()
    try:
        return schema.dump(client.get_user(id=user_id))
    except NotFoundException:
        LOGGER.info("User not found in the database. Trying to create one using oidc.get_user_detail")

    if not hasattr(app, 'auth_client'):
        raise OpenIDConnectNotConfigured

    try:
        user_info = get_user_from_oidc(user_id=user_id)

        user = schema.load(user_info)
        new_user, _is_created = client.create_update_user(user=user)
        return schema.dump(new_user)

    except Exception as ex:
        LOGGER.exception(str(ex))
        # Fall back to the required information only, so the UI keeps working
        return {
            "email": user_id,
            "user_id": user_id,
        }


USER_DETAIL_METHOD = get_user_details

2: If you are using Okta as the OIDC Provider

  • Under your Okta application, you MUST grant permission for the following scopes
  • okta.users.read
  • okta.users.read.self
  • Update the scopes environment variable defined in Step 4, and add the ones you newly assigned to your application in Okta. The scopes should now look like this:

FLASK_OIDC_SCOPES: openid email profile okta.users.read okta.users.read.self

import logging
from typing import Dict
from urllib.parse import quote

from flask import current_app as app
from amundsen_common.models.user import UserSchema

from metadata_service.exception import NotFoundException
from metadata_service.proxy import get_proxy_client

LOGGER = logging.getLogger(__name__)


class OpenIDConnectNotConfigured(Exception):
    """Raised when the Flask app has no OIDC client attached (defined locally here)."""


def get_user_from_oidc(user_id: str) -> Dict:
    """Look the user up in the OIDC provider and map the result to UserSchema fields."""
    metadata = app.auth_client.load_server_metadata()
    # Search the provider's Users API under the issuer URL for this user ID.
    # The user ID is URL-encoded so characters such as "+" survive the query string.
    search_endpoint = f'{metadata["issuer"]}/api/v1/users?q={quote(user_id, safe="")}&limit=1'

    _not_found_error = f"User Not Found in the OIDC Provider. User ID: {user_id}"

    response = app.auth_client.get(search_endpoint)
    response.raise_for_status()
    user_info = response.json()
    if not user_info:
        raise NotFoundException(_not_found_error)

    # The search is limited to one result, so take the first entry
    _user = user_info[0]
    profile = _user["profile"]
    full_name = f'{profile["firstName"]} {profile["lastName"]}'
    profile_url = _user.get("_links", {}).get("self", {}).get("href")

    return {
        "user_id": user_id,
        "email": profile["email"],
        "first_name": profile["firstName"],
        "last_name": profile["lastName"],
        "full_name": full_name,
        "display_name": full_name,
        "profile_url": profile_url,
    }


def get_user_details(user_id: str) -> Dict:
    """Return the user from the database, creating it from the OIDC provider if missing."""
    client = get_proxy_client()
    schema = UserSchema()
    try:
        return schema.dump(client.get_user(id=user_id))
    except NotFoundException:
        LOGGER.info("User not found in the database. Trying to create one using oidc.get_user_detail")

    if not hasattr(app, 'auth_client'):
        raise OpenIDConnectNotConfigured

    try:
        user_info = get_user_from_oidc(user_id=user_id)

        user = schema.load(user_info)
        new_user, _is_created = client.create_update_user(user=user)
        return schema.dump(new_user)

    except Exception as ex:
        LOGGER.exception(str(ex))
        # Fall back to the required information only, so the UI keeps working
        return {
            "email": user_id,
            "user_id": user_id,
        }


USER_DETAIL_METHOD = get_user_details

Optional: Configure OIDC using Helm

Mount the OIDC secrets in the Kubernetes cluster by passing the required values through Helm's --set ... flag (use a comma-separated list to set multiple values).

Make sure to enable the OIDC support:

--set oidc.enabled=true

Once you enable OIDC using the flag above, the Helm chart automatically sets Flask’s wrapper module and class for each Amundsen service. That means you do not need to set the following explicitly:

FLASK_APP_MODULE_NAME=flaskoidc
FLASK_APP_CLASS_NAME=FlaskOIDC

The following common values for OIDC need to be set only once, not for each service separately. The default values set in our Helm chart are:

FLASK_OIDC_PROVIDER_NAME: google
FLASK_OIDC_SCOPES: "openid email profile"
FLASK_OIDC_USER_ID_FIELD: email
FLASK_OIDC_REDIRECT_URI: "/auth"
FLASK_OIDC_CONFIG_URL: "https://accounts.google.com/.well-known/openid-configuration"

You can change the values accordingly, for example:

--set oidc.configs.FLASK_OIDC_PROVIDER_NAME=google,oidc.configs.FLASK_OIDC_SCOPES="openid email profile https://www.googleapis.com/auth/directory.readonly"

Finally, set the client ID and secret for each service. You received these when you created the OIDC application in Step 1 of this guide.

--set oidc.frontend.client_id=... \
--set oidc.frontend.client_secret=... \
--set oidc.metadata.client_id=... \
--set oidc.metadata.client_secret=... \
--set oidc.search.client_id=... \
--set oidc.search.client_secret=...
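If you drive Helm from a script, the repeated flags can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of the chart: it emits one --set pair per value, uses placeholder credentials, and does no escaping, so values that contain commas would need extra handling.

```python
from typing import Dict, List


def helm_set_args(values: Dict[str, str]) -> List[str]:
    """Turn a mapping of chart values into repeated `--set key=value` arguments."""
    args: List[str] = []
    for key, value in values.items():
        args.extend(["--set", f"{key}={value}"])
    return args


# Placeholder credentials; the real ones come from your OIDC provider (Step 1).
values = {"oidc.enabled": "true"}
for service in ("frontend", "metadata", "search"):
    values[f"oidc.{service}.client_id"] = f"<{service}-client-id>"
    values[f"oidc.{service}.client_secret"] = f"<{service}-client-secret>"

print(" ".join(helm_set_args(values)))
```

Keeping the arguments as a list makes it easy to append them to your own helm command when calling it through subprocess, without relying on shell quoting.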

Stemma ❤ Amundsen

Hopefully, this guide helps you in setting up OIDC with Amundsen. I am a founding engineer at Stemma, which provides a managed version of Amundsen with certain additions:

  • Managed Amundsen with encryption, backup/restore, OIDC integration, and much more straight out of the box.
  • Intelligence: common join and filter conditions, lineage, Slack integration, and much more.
  • Deep experience in ensuring adoption within your organization.

Get started with us at Stemma or reach out to me on the Amundsen Slack if you have any further questions.