
Technical Practice Sharing | Pitfalls of Using Keycloak with Spring Boot

Author: 閃念基因

1

Problem Description

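Some of our company's projects use Keycloak for unified identity authentication and access control. The back end is Spring Boot, so the usual unified-login setup is Spring Boot integrated with Keycloak; the official documentation covers the setup steps. This had worked without problems until one day a customer reported that pages suddenly would not open and then recovered on their own after a while. Sometimes the service recovered before we had a chance to locate the problem.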


Figure: how the problem appeared on the front end

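One occurrence lasted several minutes, and other projects (also Spring Boot integrated with Keycloak) hit the same problem from time to time. The usual cause of this kind of symptom is a slow interface, so we entered the container and ran jstack to print the current thread stacks. The output is long, so only the problematic part is pasted here: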

"http-nio-8081-exec-9" #61 daemon prio=5 os_prio=0 tid=0x00007efc702c1000 nid=0x4c waiting for monitor entry [0x00007efc778f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.keycloak.adapters.rotation.JWKPublicKeyLocator.getPublicKey(JWKPublicKeyLocator.java:60)
        - waiting to lock <0x00000003f1888968> (a org.keycloak.adapters.rotation.JWKPublicKeyLocator)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.getPublicKey(AdapterTokenVerifier.java:121)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.createVerifier(AdapterTokenVerifier.java:111)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.verifyToken(AdapterTokenVerifier.java:47)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticateToken(BearerTokenRequestAuthenticator.java:103)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticate(BearerTokenRequestAuthenticator.java:88)
        at org.keycloak.adapters.RequestAuthenticator.authenticate(RequestAuthenticator.java:67)
        at org.keycloak.adapters.springsecurity.filter.KeycloakAuthenticationProcessingFilter.attemptAuthentication(KeycloakAuthenticationProcessingFilter.java:154)
        at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:212)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
        at org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter.doFilter(KeycloakPreAuthActionsFilter.java:96)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
        at org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:92)
        at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:77)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
....           

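The thread is BLOCKED waiting for the JWKPublicKeyLocator monitor, which suggests that the back-end request to Keycloak for the publicKey was stuck.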

2

Root Cause Analysis

The initial analysis pointed to the call to Keycloak. Line 60 of JWKPublicKeyLocator sits in the following code:

// Check if we are allowed to send request
synchronized (this) {
    currentTime = Time.currentTime();
    if (currentTime > lastRequestTime + minTimeBetweenRequests) {
        sendRequest(deployment);
        lastRequestTime = currentTime;
    } else {
        log.debug("Won't send request to realm jwks url. Last request time was " + lastRequestTime);
    }

    return lookupCachedKey(publicKeyCacheTtl, currentTime, kid);
}           

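JWKPublicKeyLocator is a singleton here, and every thread in the process shares that instance. So when one thread cannot leave the synchronized block, every other thread that reaches it has to block. Looking further into sendRequest, that function calls a Keycloak endpoint over HTTP to fetch the publicKey.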
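The blocking pattern described above can be reproduced with the JDK alone. The sketch below is only an illustration: SlowLocator is a hypothetical stand-in for the shared locator (it is not Keycloak's class), and a CountDownLatch plays the part of a remote call that does not return.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SharedLocatorDemo {

    /** Hypothetical stand-in for a singleton key locator; not Keycloak's class. */
    static class SlowLocator {
        private final CountDownLatch remoteReply;
        private final CountDownLatch insideLock = new CountDownLatch(1);

        SlowLocator(CountDownLatch remoteReply) {
            this.remoteReply = remoteReply;
        }

        String getPublicKey() throws InterruptedException {
            synchronized (this) {
                insideLock.countDown(); // signal that the monitor is now held
                remoteReply.await();    // simulates an HTTP call without a timeout
                return "key";
            }
        }
    }

    /** Returns the state of a second caller while the first is stuck inside the lock. */
    public static Thread.State stateOfSecondCaller() {
        CountDownLatch remoteReply = new CountDownLatch(1);
        SlowLocator locator = new SlowLocator(remoteReply);
        Runnable call = () -> {
            try {
                locator.getPublicKey();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread first = new Thread(call, "exec-1");
        Thread second = new Thread(call, "exec-2");
        try {
            first.start();
            locator.insideLock.await(); // the first caller now owns the monitor
            second.start();
            // Poll until the second caller is parked on the monitor (or give up after 5 s).
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (second.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = second.getState();
            remoteReply.countDown();    // the "remote server" finally answers
            first.join();
            second.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second caller state: " + stateOfSecondCaller());
    }
}
```

The second thread reports the same BLOCKED (on object monitor) state that jstack showed for http-nio-8081-exec-9, and it stays that way until the first caller leaves the synchronized block.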

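This raises two questions:

  1. Why would the call that fetches the publicKey go so long without returning?
  2. Is a timeout configured, so that the call fails fast when it does not return in time?

Our back end uses Spring Boot auto-configuration for Keycloak. The configuration file has three main parameters: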

keycloak.realm=realmId
keycloak.resource=clientId
keycloak.auth-server-url=http://127.0.0.1:8180/auth           

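These parameters are injected automatically by the KeycloakSpringBootProperties configuration class. keycloak.auth-server-url is the base URL used for calls to Keycloak. In the customer's environment it is a domain name rather than ip:port, so each call goes through DNS resolution, load balancing, and so on.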

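If the customer's external network is poor and suffers from jitter or similar problems, calls made this way can go a long time without returning. Code analysis shows that the timeout parameters for this call use the default values. See the org.apache.http.client.config.RequestConfig class: the default is built with public static final RequestConfig DEFAULT = (new RequestConfig.Builder()).build();, and the Builder's defaults for the three timeout-related parameters are: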

private int connectionRequestTimeout = -1;
private int connectTimeout = -1;
private int socketTimeout = -1;           

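-1 means no timeout is set (the value is left undefined), so by default our call never times out. When one request blocks and cannot release the lock, no other request can be served; they all wait for the lock to be released.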

3

Solution

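To summarize, there are two ways to avoid this kind of problem:

  1. Always set a timeout when calling an external interface, so that one slow call cannot make the whole service unavailable;
  2. If Keycloak is deployed in the same LAN, configure the Keycloak address as an internal IP rather than a domain name or external address, so that calls no longer hang because of external network problems.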
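To make the first point concrete, here is a minimal sketch of failing fast on a read timeout. It uses the JDK's HttpURLConnection as a stand-in (the Keycloak adapter itself uses Apache HttpClient), and the silent local server and the /certs path are made up for the demonstration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

public class TimeoutDemo {

    /**
     * Calls a local server that accepts connections but never answers.
     * Returns true if the call gave up with a timeout instead of hanging.
     */
    public static boolean failsFast(int timeoutMillis) {
        // The OS completes the TCP handshake from the listen backlog; nobody ever replies.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URL url = new URL("http://127.0.0.1:" + silentServer.getLocalPort() + "/certs");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
            conn.setReadTimeout(timeoutMillis);    // limit for waiting on response data
            try {
                conn.getResponseCode();            // sends the request and waits for a reply
                return false;
            } catch (SocketTimeoutException e) {
                return true;                       // gave up after roughly timeoutMillis
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("failed fast: " + failsFast(200));
    }
}
```

Without the two setters the same call would wait for as long as the server stays silent, which is the situation the blocked threads were in.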

3.1 Setting the timeout

First, let's look at the source code of the calling part of the Keycloak SDK.

org.keycloak.adapters.rotation.JWKPublicKeyLocator#sendRequest is as follows:

private void sendRequest(KeycloakDeployment deployment) {
      if (log.isTraceEnabled()) {
          log.trace("Going to send request to retrieve new set of realm public keys for client " + deployment.getResourceName());
      }

      HttpGet getMethod = new HttpGet(deployment.getJwksUrl());
      try {
          JSONWebKeySet jwks = HttpAdapterUtils.sendJsonHttpRequest(deployment, getMethod, JSONWebKeySet.class);

          Map<String, PublicKey> publicKeys = JWKSUtils.getKeysForUse(jwks, JWK.Use.SIG);

          if (log.isDebugEnabled()) {
              log.debug("Realm public keys successfully retrieved for client " +  deployment.getResourceName() + ". New kids: " + publicKeys.keySet().toString());
          }

          // Update current keys
          currentKeys.clear();
          currentKeys.putAll(publicKeys);

      } catch (HttpClientAdapterException e) {
          log.error("Error when sending request to retrieve realm keys", e);
      }
   }           

org.keycloak.adapters.HttpAdapterUtils#sendJsonHttpRequest is as follows:

public static <T> T sendJsonHttpRequest(KeycloakDeployment deployment, HttpRequestBase httpRequest, Class<T> clazz) throws HttpClientAdapterException {
      try {
          HttpResponse response = deployment.getClient().execute(httpRequest);
          int status = response.getStatusLine().getStatusCode();
          if (status != 200) {
              close(response);
              throw new HttpClientAdapterException("Unexpected status = " + status);
          }
          HttpEntity entity = response.getEntity();
          if (entity == null) {
              throw new HttpClientAdapterException("There was no entity.");
          }
          InputStream is = entity.getContent();
          try {
              return JsonSerialization.readValue(is, clazz);
          } finally {
              try {
                  is.close();
              } catch (IOException ignored) {

              }
          }
      } catch (IOException e) {
          throw new HttpClientAdapterException("IO error", e);
      }
  }           

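Neither HttpGet nor deployment.getClient() in this source sets a timeout, so requests to Keycloak use the default configuration, where -1 means no timeout. The deployment object is not managed by Spring, but it can be reached through other managed objects, and once created it is globally unique. So our approach is to obtain the global deployment object, get its client, and change the client's settings.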

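Code analysis shows that the AdapterDeploymentContext instance is managed by Spring and gives access to the deployment instance. The next step is deciding how to intercept the request. There are generally two options, a filter or an interceptor. A filter is more convenient here, because the Keycloak SDK itself defines many filters and supports custom ones, for example the bundled org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter. Using that filter as a reference, we wrote our own: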

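import java.io.IOException;

import javax.annotation.Resource;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.http.client.params.ClientPNames;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;
import org.keycloak.adapters.AdapterDeploymentContext;
import org.keycloak.adapters.KeycloakDeployment;
import org.keycloak.adapters.spi.HttpFacade;
import org.keycloak.adapters.springsecurity.facade.SimpleHttpFacade;
import org.springframework.stereotype.Component;

@Component
public class ChangeTimeOutFilter implements Filter {

    @Resource
    private AdapterDeploymentContext deploymentContext;

    // Set to true once the timeouts have been applied to the shared client
    private volatile boolean deploymentChanged = false;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {

        HttpFacade facade = new SimpleHttpFacade((HttpServletRequest)request, (HttpServletResponse)response);
        KeycloakDeployment deployment = deploymentContext.resolveDeployment(facade);

        if (deployment == null) {
            chain.doFilter(request, response);
            return;
        }

        // The deployment is globally unique, so it only needs to be modified once
        if (deploymentChanged) {
            chain.doFilter(request, response);
            return;
        }

        // Set the timeouts (in milliseconds): socket read, connect, and connection-pool lease
        HttpParams params = deployment.getClient().getParams();
        params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, 10000);
        params.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);
        params.setLongParameter(ClientPNames.CONN_MANAGER_TIMEOUT, 10000L);
        deploymentChanged = true;

        chain.doFilter(request, response);
    }
}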
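One detail worth noting about the "modify once" flag: with a plain volatile boolean, several requests arriving at the same moment can all see false and run the setup at once. That is harmless here because the setup is idempotent, but if exactly-once matters, an AtomicBoolean is one option. The sketch below is a hypothetical, JDK-only illustration (OneTimeSetup is not part of our filter), with a counter standing in for "apply the timeouts to the shared client".

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class OneTimeSetup {
    private final AtomicBoolean configured = new AtomicBoolean(false);
    private final AtomicInteger setupRuns = new AtomicInteger();

    /** Called on every request; the guarded body runs for exactly one caller. */
    public void onRequest() {
        if (configured.compareAndSet(false, true)) {
            setupRuns.incrementAndGet(); // placeholder for "apply timeouts to the shared client"
        }
    }

    public int setupRuns() {
        return setupRuns.get();
    }

    /** Fires many concurrent "requests" and reports how many times the setup ran. */
    public static int runConcurrently(int threads) {
        OneTimeSetup filter = new OneTimeSetup();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(filter::onRequest);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return filter.setupRuns();
    }

    public static void main(String[] args) {
        System.out.println("setup ran " + runConcurrently(16) + " time(s)");
    }
}
```

The trade-off is that the flag flips before the setup has finished, so a request racing with the very first one may pass through before the timeouts are in place. Which behavior is preferable depends on the setup being guarded.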

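To check that the timeout takes effect, we deliberately set it very short, for example a few milliseconds, so that the call is certain to time out. After deploying the code, the error log shows: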

Error when sending request to retrieve realm keys
org.keycloak.adapters.HttpClientAdapterException: IO error
        at org.keycloak.adapters.HttpAdapterUtils.sendJsonHttpRequest(HttpAdapterUtils.java:57)
        at org.keycloak.adapters.rotation.JWKPublicKeyLocator.sendRequest(JWKPublicKeyLocator.java:99)
        at org.keycloak.adapters.rotation.JWKPublicKeyLocator.getPublicKey(JWKPublicKeyLocator.java:63)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.getPublicKey(AdapterTokenVerifier.java:121)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.createVerifier(AdapterTokenVerifier.java:111)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.verifyToken(AdapterTokenVerifier.java:47)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticateToken(BearerTokenRequestAuthenticator.java:103)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticate(BearerTokenRequestAuthenticator.java:88)
        at org.keycloak.adapters.RequestAuthenticator.authenticate(RequestAuthenticator.java:67)
        ......
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)           

This error shows that the timeout configuration has taken effect.

3.2 Pointing the Keycloak address at the internal network

The original Keycloak configuration:

keycloak.realm=atlas
keycloak.resource=atlas-assistant
keycloak.auth-server-url=http://122.122.122.122:8180/auth
keycloak.ssl-required=none
keycloak.public-client=true
keycloak.use-resource-role-mappings=true           

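Change keycloak.auth-server-url to the internal address:

keycloak.auth-server-url=http://172.17.0.1:8180/auth

The front end uses the Keycloak address returned by the back end to redirect the browser to the login page, so that returned address must be the external one (the browser cannot reach the internal address). We therefore added a new configuration item that holds the external address and is used only for returning to the front end. Previously keycloak.auth-server-url served both purposes; now the two are split:

environment.keycloak.auth-server-url=http://122.122.122.122:8180/auth

After changing the code accordingly and deploying to the test environment, opening the page produced an error: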


Figure: login fails after switching to the internal address

First, the error log (it is logged at debug level rather than error, which makes me question the source code here):

2022-04-01 16:45:33.069 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Found [1] values in authorization header, selecting the first value for Bearer. 
2022-04-01 16:45:33.069 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Verifying access_token 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Failed to verify token 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG org.keycloak.adapters.RequestAuthenticator - Bearer FAILED 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.a.s.f.KeycloakAuthenticationProcessingFilter - Auth outcome: FAILED 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.a.s.f.KeycloakAuthenticationProcessingFilter - Authentication request failed: org.keycloak.adapters.springsecurity.KeycloakAuthenticationException: Invalid authorization header, see WWW-Authenticate header for details 
org.keycloak.adapters.springsecurity.KeycloakAuthenticationException: Invalid authorization header, see WWW-Authenticate header for details
	at org.keycloak.adapters.springsecurity.filter.KeycloakAuthenticationProcessingFilter.attemptAuthentication(KeycloakAuthenticationProcessingFilter.java:162)
	at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:212)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter.doFilter(KeycloakPreAuthActionsFilter.java:96)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:215)
	at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:178)
	at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:358)
	at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:271)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)           

Source analysis eventually showed that a RealmUrlCheck validation was failing.
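A check of this kind compares the issuer carried in the token with the realm URL the back end expects. The sketch below is a simplified, JDK-only illustration and not Keycloak's implementation: the token is unsigned and hand-built, the "iss" claim is pulled out with a regex rather than a JSON parser, and the /realms/atlas URL layout is an assumption based on the configuration shown earlier.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IssuerCheckDemo {
    private static final Pattern ISS = Pattern.compile("\"iss\"\\s*:\\s*\"([^\"]*)\"");

    /** Builds an unsigned, JWT-shaped string carrying only an "iss" claim (illustration only). */
    public static String fakeToken(String issuer) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"none\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(("{\"iss\":\"" + issuer + "\"}").getBytes(StandardCharsets.UTF_8));
        return header + "." + payload + ".";
    }

    /** Extracts the "iss" claim from the token payload, or null if it is absent. */
    public static String issuerOf(String token) {
        String[] parts = token.split("\\.");
        String json = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        Matcher m = ISS.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Plain string comparison: two URLs for the same server still count as different. */
    public static boolean issuerMatches(String token, String expectedRealmUrl) {
        return expectedRealmUrl.equals(issuerOf(token));
    }

    public static void main(String[] args) {
        String token = fakeToken("http://122.122.122.122:8180/auth/realms/atlas");
        System.out.println(issuerMatches(token, "http://122.122.122.122:8180/auth/realms/atlas"));
        System.out.println(issuerMatches(token, "http://172.17.0.1:8180/auth/realms/atlas"));
    }
}
```

Because the comparison is on the strings themselves, a token issued under the external address does not pass a check configured with the internal one, even though both addresses reach the same Keycloak.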

Figure: the two values compared by RealmUrlCheck

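Of the two values being compared, one is the internal address and the other is the external address. They point to the same place, but the strings differ, so the check fails. Why the two addresses differ is best explained from the front-end/back-end interaction: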

Figure: front-end/back-end interaction flow

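So in step 1.2 of the flow, the address the back end returns to the browser must be the external address, and the token generated after the front-end request carries that external address. The realmUrl in deployment is parsed from the configuration (keycloak.auth-server-url) and is the internal address, while the JsonWebToken is parsed from the token the front end sends to the back end and contains the external address. When the token is verified in step 4.2 the two addresses do not match, so the back end treats the token as possibly tampered with and throws an exception. We considered two solutions:

Option 1: extend the original SDK with a custom KeycloakConfigResolver, KeycloakDeployment, and so on, which is fairly complex;

Option 2: in the filter above, modify the deployment data and replace the realmUrl address (the realmInfoUrl field), changing it from the internal address to the domain name. The real requests do not use this parameter (they use authServerBaseUrl), so internal calls are unaffected.

To save time we went with option 2. The filter became: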

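import java.io.IOException;
import java.lang.reflect.Field;

import javax.annotation.Resource;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Assumption: StringUtils comes from commons-lang3; the original does not show its import
import org.apache.commons.lang3.StringUtils;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;
import org.keycloak.adapters.AdapterDeploymentContext;
import org.keycloak.adapters.KeycloakDeployment;
import org.keycloak.adapters.spi.HttpFacade;
import org.keycloak.adapters.springsecurity.facade.SimpleHttpFacade;
import org.springframework.stereotype.Component;

@Component
public class ChangeTimeOutFilter implements Filter {

    @Resource
    private AdapterDeploymentContext deploymentContext;

    // Our own configuration bean holding the internal and external Keycloak addresses
    @Resource
    private KeyCloakConfig keyCloakConfig;

    private static Field realmInfoUrlFd;

    private volatile boolean deploymentChanged = false;

    static {
        try {
            // realmInfoUrl is a private field of KeycloakDeployment, so reach it via reflection
            ChangeTimeOutFilter.realmInfoUrlFd = KeycloakDeployment.class.getDeclaredField("realmInfoUrl");
            realmInfoUrlFd.setAccessible(true);
        } catch (Exception ex) {
            // Ignored: if the field cannot be found, the URL override below has no effect
        }
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {

        HttpFacade facade = new SimpleHttpFacade((HttpServletRequest)request, (HttpServletResponse)response);
        KeycloakDeployment deployment = deploymentContext.resolveDeployment(facade);

        if (deployment == null) {
            chain.doFilter(request, response);
            return;
        }

        // The deployment is globally unique, so it only needs to be modified once
        if (deploymentChanged) {
            chain.doFilter(request, response);
            return;
        }

        // Change realmInfoUrl from the internal to the external address so the check passes
        String realmInfoUrl = deployment.getRealmInfoUrl();
        if (!StringUtils.isBlank(realmInfoUrl)) {
            // replace(), not replaceAll(): the address is a literal string, not a regex
            realmInfoUrl = realmInfoUrl.replace(keyCloakConfig.getInnerUrl(), keyCloakConfig.getAuthUrl());
            try {
                realmInfoUrlFd.set(deployment, realmInfoUrl);
            } catch (Exception ex) {
                // Ignored: on failure the deployment keeps the internal address
            }
        }

        // Set the timeouts (in milliseconds): socket read, connect, and connection-pool lease
        HttpParams params = deployment.getClient().getParams();
        params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, 10000);
        params.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);
        params.setLongParameter(ClientPNames.CONN_MANAGER_TIMEOUT, 10000L);
        deploymentChanged = true;

        chain.doFilter(request, response);
    }
}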

Changing the field's value through reflection makes the two values consistent, which gets the token past the check.
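For readers less familiar with the technique, here is a minimal JDK-only sketch of overwriting a private field that has no setter. The Deployment class is a hypothetical stand-in, not KeycloakDeployment, and unlike our filter this version fails loudly when the field cannot be written.

```java
import java.lang.reflect.Field;

public class ReflectionOverrideDemo {

    /** Hypothetical stand-in for a class with a private field and no setter. */
    static class Deployment {
        private String realmInfoUrl;

        Deployment(String realmInfoUrl) {
            this.realmInfoUrl = realmInfoUrl;
        }

        String getRealmInfoUrl() {
            return realmInfoUrl;
        }
    }

    /** Overwrites the private field, surfacing failures instead of swallowing them. */
    static void overrideRealmInfoUrl(Deployment deployment, String newUrl) {
        try {
            Field field = Deployment.class.getDeclaredField("realmInfoUrl");
            field.setAccessible(true);
            field.set(deployment, newUrl);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not override realmInfoUrl", e);
        }
    }

    /** Swaps the internal base address for the external one and returns the result. */
    public static String demo() {
        Deployment deployment = new Deployment("http://172.17.0.1:8180/auth/realms/atlas");
        String external = deployment.getRealmInfoUrl()
                .replace("http://172.17.0.1:8180/auth", "http://122.122.122.122:8180/auth");
        overrideRealmInfoUrl(deployment, external);
        return deployment.getRealmInfoUrl();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This approach depends on a private field name, so it can break silently when the library is upgraded. That is part of the price of choosing the quicker option.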

With the solutions above, we have resolved this sudden error. I will keep sharing pitfalls and lessons learned on "觀遠資料技術團隊"; follow along and join the discussion.

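About the author: 杭州小丁, back-end developer at 觀遠資料 and an internet industry veteran with long experience in the J2EE field. Enjoys studying open-source code to build personal skills; specializes in system architecture design and implementation, design patterns, microservices, concurrent programming, and domain modeling; committed to using technology to deliver highly stable, efficient, and scalable business systems.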

Source: WeChat official account 觀遠資料技術團隊

Original: https://mp.weixin.qq.com/s/GyTWRV19qrDiUKH9lKTa-Q