
Technical Practice Sharing | Pitfalls of Using Keycloak with Spring Boot

Author: 闪念基因

1

Problem description

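Some of our company's projects use Keycloak for unified identity authentication and access control. The back end is Spring Boot, so the usual single sign-on setup is Spring Boot with the Keycloak integration; the official documentation covers the setup steps. This had worked without trouble until one day a customer reported that pages would suddenly fail to open and then recover after a while, sometimes before we had a chance to locate the problem.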


Figure: what the front end showed when the problem occurred

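One occurrence lasted several minutes, and other projects that integrate Keycloak with Spring Boot hit the same problem intermittently. Symptoms like this usually point to a slow interface, so I entered the container and ran jstack to print the current thread stacks. The output is long, so only the relevant part is pasted here: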

"http-nio-8081-exec-9" #61 daemon prio=5 os_prio=0 tid=0x00007efc702c1000 nid=0x4c waiting for monitor entry [0x00007efc778f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.keycloak.adapters.rotation.JWKPublicKeyLocator.getPublicKey(JWKPublicKeyLocator.java:60)
        - waiting to lock <0x00000003f1888968> (a org.keycloak.adapters.rotation.JWKPublicKeyLocator)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.getPublicKey(AdapterTokenVerifier.java:121)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.createVerifier(AdapterTokenVerifier.java:111)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.verifyToken(AdapterTokenVerifier.java:47)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticateToken(BearerTokenRequestAuthenticator.java:103)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticate(BearerTokenRequestAuthenticator.java:88)
        at org.keycloak.adapters.RequestAuthenticator.authenticate(RequestAuthenticator.java:67)
        at org.keycloak.adapters.springsecurity.filter.KeycloakAuthenticationProcessingFilter.attemptAuthentication(KeycloakAuthenticationProcessingFilter.java:154)
        at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:212)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
        at org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter.doFilter(KeycloakPreAuthActionsFilter.java:96)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
        at org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:92)
        at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:77)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
....           

This suggests the back end was blocked while requesting the publicKey from Keycloak.

2

Root cause analysis

The initial analysis was that the call to Keycloak had gone wrong. Line 60 of JWKPublicKeyLocator sits in the following code:

// Check if we are allowed to send request
synchronized (this) {
    currentTime = Time.currentTime();
    if (currentTime > lastRequestTime + minTimeBetweenRequests) {
        sendRequest(deployment);
        lastRequestTime = currentTime;
    } else {
        log.debug("Won't send request to realm jwks url. Last request time was " + lastRequestTime);
    }

    return lookupCachedKey(publicKeyCacheTtl, currentTime, kid);
}           

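The JWKPublicKeyLocator is a singleton shared by every thread in the process, so when one thread cannot leave the synchronized block, every other thread that reaches it has to block. Looking further into sendRequest, this function calls a Keycloak endpoint over HTTP to fetch the publicKey.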
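The blocking pattern seen in the jstack output can be reproduced in isolation. The following is a minimal sketch, not Keycloak code: `SlowLocator` and `MonitorBlockingDemo` are hypothetical names, and a latch stands in for an HTTP call that never returns. One thread holds the monitor while every other caller ends up in the BLOCKED state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a singleton key locator; not Keycloak code.
class SlowLocator {
    private final CountDownLatch entered = new CountDownLatch(1);
    private final CountDownLatch release = new CountDownLatch(1);

    // The "remote call" happens while the monitor is held.
    synchronized String getPublicKey() {
        entered.countDown();
        awaitQuietly(release); // stands in for an HTTP call with no timeout
        return "key";
    }

    void awaitEntered() { awaitQuietly(entered); }

    void release() { release.countDown(); }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}

class MonitorBlockingDemo {
    // Starts one thread that holds the monitor plus `waiters` further callers,
    // and returns how many of those callers were observed in state BLOCKED.
    static int countBlockedThreads(int waiters) {
        SlowLocator locator = new SlowLocator();
        Thread holder = new Thread(locator::getPublicKey);
        holder.start();
        locator.awaitEntered(); // the holder now owns the monitor

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < waiters; i++) {
            Thread t = new Thread(locator::getPublicKey);
            t.start();
            threads.add(t);
        }

        int blocked = 0;
        try {
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (System.nanoTime() < deadline) {
                blocked = 0;
                for (Thread t : threads) {
                    if (t.getState() == Thread.State.BLOCKED) {
                        blocked++;
                    }
                }
                if (blocked == waiters) {
                    break;
                }
                Thread.sleep(10);
            }
            locator.release(); // let the "remote call" finish
            holder.join();
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked;
    }
}
```

Calling `MonitorBlockingDemo.countBlockedThreads(3)` shows all three extra callers waiting on the monitor until the first call is released, which matches the "waiting to lock" frames in the stack dump.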

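This raises two questions:

  1. Why would the call that fetches the publicKey go so long without returning?
  2. Is a timeout configured so that the call fails fast when the endpoint does not return in time?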

Our back end uses the Spring Boot auto-configuration for Keycloak. The configuration file has three main parameters:

keycloak.realm=realmId
keycloak.resource=clientId
keycloak.auth-server-url=http://127.0.0.1:8180/auth           

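These parameters are injected through the KeycloakSpringBootProperties configuration class. keycloak.auth-server-url is the base URL used for calls to Keycloak. In the customer environment it is a domain name rather than ip+port, so each call goes through DNS resolution, load balancing and so on.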

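If the customer's external network is poor and suffers jitter, calls made this way can still go a long time without returning. Code analysis shows that the timeout parameters for this call are left at their defaults. In org.apache.http.client.config.RequestConfig, the default is built with public static final RequestConfig DEFAULT = (new RequestConfig.Builder()).build();, and the Builder initializes the three timeout-related parameters as follows: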

private int connectionRequestTimeout = -1;
private int connectTimeout = -1;
private int socketTimeout = -1;           

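In RequestConfig a negative value means the timeout is undefined and the system default applies, which in practice means no timeout. So our call never times out by default: when one request blocks and cannot release the lock, no other request can be served, and they all wait for the lock.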

3

Solution

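To summarize, problems of this kind can be avoided as follows:

  1. Always set a timeout when calling an external interface, so that a single stalled call cannot make the whole service unavailable;
  2. If Keycloak is deployed on the same LAN, configure its address with the intranet IP rather than a domain name or external address, which avoids long waits caused by external network problems.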
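The effect of the first point can be sketched with the JDK alone. This example deliberately uses java.net.HttpURLConnection rather than the Apache HttpClient that the Keycloak adapter uses, and `TimeoutDemo` is a hypothetical name; it only illustrates the principle that a read timeout turns a silent server into a fast failure instead of an indefinite wait.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

class TimeoutDemo {
    // Returns true if the request fails fast with a SocketTimeoutException.
    static boolean failsFast(int readTimeoutMillis) {
        // The server socket accepts TCP connections through its backlog
        // but never reads the request or writes a response.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URL url = new URL("http://127.0.0.1:" + silentServer.getLocalPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(1000);            // limit for establishing the connection
            conn.setReadTimeout(readTimeoutMillis);  // limit for waiting on response data
            try (InputStream in = conn.getInputStream()) {
                in.read();
                return false; // a response arrived, which the silent server never sends
            } catch (SocketTimeoutException e) {
                return true;  // the call gave up instead of blocking indefinitely
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a read timeout of 200 ms, `TimeoutDemo.failsFast(200)` returns after roughly that delay. Without the `setReadTimeout` call the same request would wait for as long as the server stays silent.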

3.1 Setting the timeout

First, look at the source code of the calling part of the Keycloak SDK.

org.keycloak.adapters.rotation.JWKPublicKeyLocator#sendRequest is as follows:

private void sendRequest(KeycloakDeployment deployment) {
      if (log.isTraceEnabled()) {
          log.trace("Going to send request to retrieve new set of realm public keys for client " + deployment.getResourceName());
      }

      HttpGet getMethod = new HttpGet(deployment.getJwksUrl());
      try {
          JSONWebKeySet jwks = HttpAdapterUtils.sendJsonHttpRequest(deployment, getMethod, JSONWebKeySet.class);

          Map<String, PublicKey> publicKeys = JWKSUtils.getKeysForUse(jwks, JWK.Use.SIG);

          if (log.isDebugEnabled()) {
              log.debug("Realm public keys successfully retrieved for client " +  deployment.getResourceName() + ". New kids: " + publicKeys.keySet().toString());
          }

          // Update current keys
          currentKeys.clear();
          currentKeys.putAll(publicKeys);

      } catch (HttpClientAdapterException e) {
          log.error("Error when sending request to retrieve realm keys", e);
      }
   }           

org.keycloak.adapters.HttpAdapterUtils#sendJsonHttpRequest is as follows:

public static <T> T sendJsonHttpRequest(KeycloakDeployment deployment, HttpRequestBase httpRequest, Class<T> clazz) throws HttpClientAdapterException {
      try {
          HttpResponse response = deployment.getClient().execute(httpRequest);
          int status = response.getStatusLine().getStatusCode();
          if (status != 200) {
              close(response);
              throw new HttpClientAdapterException("Unexpected status = " + status);
          }
          HttpEntity entity = response.getEntity();
          if (entity == null) {
              throw new HttpClientAdapterException("There was no entity.");
          }
          InputStream is = entity.getContent();
          try {
              return JsonSerialization.readValue(is, clazz);
          } finally {
              try {
                  is.close();
              } catch (IOException ignored) {

              }
          }
      } catch (IOException e) {
          throw new HttpClientAdapterException("IO error", e);
      }
  }           

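Neither HttpGet nor deployment.getClient() sets a timeout in this source, so requests to Keycloak use the default configuration, whose -1 values mean no timeout is applied. The deployment object is not managed by Spring, but it can be reached through other managed objects, and once built it is globally unique. The approach is therefore to obtain the global deployment object, get its client, and change the client's settings.

Code analysis shows that the AdapterDeploymentContext instance is managed by Spring and leads to the deployment instance. The next step is to decide how to intercept the request. There are generally two ways, a filter or an interceptor, and a filter is more convenient here because the Keycloak SDK itself defines many filters and supports custom ones, for example the built-in org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter. Modeled on that filter, our custom filter looks like this: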

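import java.io.IOException;

import javax.annotation.Resource;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.http.client.params.ClientPNames;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;
import org.keycloak.adapters.AdapterDeploymentContext;
import org.keycloak.adapters.KeycloakDeployment;
import org.keycloak.adapters.spi.HttpFacade;
import org.keycloak.adapters.springsecurity.facade.SimpleHttpFacade;
import org.springframework.stereotype.Component;

@Component
public class ChangeTimeOutFilter implements Filter {

    @Resource
    private AdapterDeploymentContext deploymentContext;

    private volatile boolean deploymentChanged = false;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {

        HttpFacade facade = new SimpleHttpFacade((HttpServletRequest)request, (HttpServletResponse)response);
        KeycloakDeployment deployment = deploymentContext.resolveDeployment(facade);

        if (deployment == null) {
            chain.doFilter(request, response);
            return;
        }

        // The deployment is globally unique, so it only needs to be modified once
        if (deploymentChanged) {
            chain.doFilter(request, response);
            return;
        }

        // Set the timeouts (in milliseconds) on the shared HTTP client
        HttpParams params = deployment.getClient().getParams();
        params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, 10000);            // socket read timeout
        params.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);    // connect timeout
        params.setLongParameter(ClientPNames.CONN_MANAGER_TIMEOUT, 10000L);        // wait for a pooled connection
        deploymentChanged = true;

        chain.doFilter(request, response);
    }
}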
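A note on the "modify only once" guard: a plain volatile flag lets several requests that arrive at the same time each apply the settings before the flag flips, which is harmless here because the writes are idempotent. Where the one-time step must run exactly once, a compare-and-set guard is one option. The sketch below uses hypothetical names (`OneTimeConfigurer`, `OneTimeConfigurerDemo`) and is not part of the Keycloak adapter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the Keycloak adapter.
class OneTimeConfigurer {
    private final AtomicBoolean claimed = new AtomicBoolean(false);

    // Runs the action for exactly one caller: the one that wins the compare-and-set.
    boolean configureOnce(Runnable action) {
        if (claimed.compareAndSet(false, true)) {
            action.run();
            return true;
        }
        return false; // another caller has already claimed the work
    }
}

class OneTimeConfigurerDemo {
    // Fires `threads` concurrent callers and returns how often the action ran.
    static int run(int threads) {
        OneTimeConfigurer configurer = new OneTimeConfigurer();
        AtomicInteger applied = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                    configurer.configureOnce(applied::incrementAndGet);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return applied.get();
    }
}
```

One trade-off of this guard: a caller that loses the compare-and-set may continue before the winner has finished the action, so it suits cases where late callers can tolerate briefly seeing the old settings.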

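To test whether the timeout takes effect, we deliberately set it very short, for example a few milliseconds, so that the call is certain to time out. After deploying the code, the error log shows: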

Error when sending request to retrieve realm keys
org.keycloak.adapters.HttpClientAdapterException: IO error
        at org.keycloak.adapters.HttpAdapterUtils.sendJsonHttpRequest(HttpAdapterUtils.java:57)
        at org.keycloak.adapters.rotation.JWKPublicKeyLocator.sendRequest(JWKPublicKeyLocator.java:99)
        at org.keycloak.adapters.rotation.JWKPublicKeyLocator.getPublicKey(JWKPublicKeyLocator.java:63)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.getPublicKey(AdapterTokenVerifier.java:121)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.createVerifier(AdapterTokenVerifier.java:111)
        at org.keycloak.adapters.rotation.AdapterTokenVerifier.verifyToken(AdapterTokenVerifier.java:47)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticateToken(BearerTokenRequestAuthenticator.java:103)
        at org.keycloak.adapters.BearerTokenRequestAuthenticator.authenticate(BearerTokenRequestAuthenticator.java:88)
        at org.keycloak.adapters.RequestAuthenticator.authenticate(RequestAuthenticator.java:67)
        ......
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)           

This error shows that the timeout configuration has taken effect.

3.2 Pointing the Keycloak address at the intranet

The original Keycloak configuration:

keycloak.realm=atlas
keycloak.resource=atlas-assistant
keycloak.auth-server-url=http://122.122.122.122:8180/auth
keycloak.ssl-required=none
keycloak.public-client=true
keycloak.use-resource-role-mappings=true           

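Change keycloak.auth-server-url to the intranet address:

keycloak.auth-server-url=http://172.17.0.1:8180/auth

The front end needs the Keycloak address returned by the back end to redirect the browser to the login page, so that returned address must be the external one (the browser cannot reach the intranet address). We therefore added a new configuration item that holds the external address and is used only for returning to the front end. Previously keycloak.auth-server-url served both purposes; now the two are split:

environment.keycloak.auth-server-url=http://122.122.122.122:8180/auth

After changing the code accordingly and deploying to the test environment, opening the page produced an error: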


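Figure: login fails after switching to the intranet address

Start with the error log (it is logged at debug level rather than error, which makes me question this part of the source):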

2022-04-01 16:45:33.069 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Found [1] values in authorization header, selecting the first value for Bearer. 
2022-04-01 16:45:33.069 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Verifying access_token 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.adapters.BearerTokenRequestAuthenticator - Failed to verify token 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG org.keycloak.adapters.RequestAuthenticator - Bearer FAILED 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.a.s.f.KeycloakAuthenticationProcessingFilter - Auth outcome: FAILED 
2022-04-01 16:45:33.071 [http-nio-8081-exec-3] DEBUG o.k.a.s.f.KeycloakAuthenticationProcessingFilter - Authentication request failed: org.keycloak.adapters.springsecurity.KeycloakAuthenticationException: Invalid authorization header, see WWW-Authenticate header for details 
org.keycloak.adapters.springsecurity.KeycloakAuthenticationException: Invalid authorization header, see WWW-Authenticate header for details
	at org.keycloak.adapters.springsecurity.filter.KeycloakAuthenticationProcessingFilter.attemptAuthentication(KeycloakAuthenticationProcessingFilter.java:162)
	at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:212)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.keycloak.adapters.springsecurity.filter.KeycloakPreAuthActionsFilter.doFilter(KeycloakPreAuthActionsFilter.java:96)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
	at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:215)
	at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:178)
	at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:358)
	at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:271)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:103)           

Source analysis finally showed that a RealmUrlCheck validation was failing.


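The check compares two values: one is the intranet address and the other is the external address. They point to the same place, but the strings differ, so the check fails. Why the two addresses differ can be explained from the front-end/back-end interaction flow: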

(Figure: front-end/back-end interaction flow)

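In step 1.2 of the flow, the address the back end returns to the browser must be the external address, so the token generated after the front end's request carries the external address. The realmUrl in the deployment is derived from the configuration (keycloak.auth-server-url) and is the intranet address, while the JsonWebToken is parsed from the token the front end sends to the back end and contains the external address. When the token is verified in step 4.2 the two addresses do not match, so the back end treats the token as possibly tampered with and throws an exception. We considered two ways to solve this:

Option 1: extend the original SDK with a custom KeycloakConfigResolver, KeycloakDeployment and so on, which is fairly complex;

Option 2: in the filter above, modify the deployment data and replace the realmUrl address, changing it from the intranet address to the domain name. Actual requests do not use this parameter (they use authServerBaseUrl), so intranet calls are unaffected.

To save time we went with option 2, and the filter became: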
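The mismatch, and why rewriting the expected address resolves it, can be illustrated without Keycloak. The sketch below is a simplified stand-in for the realm URL check, not the adapter's implementation: `IssuerCheckSketch` and its methods are hypothetical, the demo token is unsigned, and the issuer is pulled out with a regular expression where real code would use a JSON parser and verify the signature. It assumes realm URLs of the form auth-server-url/realms/realm-name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration only; not the Keycloak adapter's code.
class IssuerCheckSketch {
    private static final Pattern ISS = Pattern.compile("\"iss\"\\s*:\\s*\"([^\"]+)\"");

    // Extracts the "iss" claim from the (unverified) payload of a compact JWT.
    static String issuerOf(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a compact JWT");
        }
        String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        Matcher m = ISS.matcher(payload);
        if (!m.find()) {
            throw new IllegalArgumentException("no iss claim");
        }
        return m.group(1);
    }

    // The check is a plain string comparison: same server, different URL, still a failure.
    static boolean realmUrlMatches(String expectedRealmUrl, String jwt) {
        return expectedRealmUrl.equals(issuerOf(jwt));
    }

    // Builds an unsigned token for demonstration purposes.
    static String demoToken(String issuer) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"none\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(("{\"iss\":\"" + issuer + "\"}").getBytes(StandardCharsets.UTF_8));
        return header + "." + payload + ".";
    }

    public static void main(String[] args) {
        String token = demoToken("http://122.122.122.122:8180/auth/realms/atlas");
        // Expected URL derived from the intranet address: mismatch
        System.out.println(realmUrlMatches("http://172.17.0.1:8180/auth/realms/atlas", token));
        // Expected URL rewritten to the external address: match
        System.out.println(realmUrlMatches("http://122.122.122.122:8180/auth/realms/atlas", token));
    }
}
```

Because the comparison is purely textual, changing only the expected value on the back end is enough for tokens issued through the external address to pass, while the address used for outgoing calls stays untouched.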

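import java.io.IOException;
import java.lang.reflect.Field;

import javax.annotation.Resource;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Assumed to be commons-lang3; the original post does not show the import
import org.apache.commons.lang3.StringUtils;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;
import org.keycloak.adapters.AdapterDeploymentContext;
import org.keycloak.adapters.KeycloakDeployment;
import org.keycloak.adapters.spi.HttpFacade;
import org.keycloak.adapters.springsecurity.facade.SimpleHttpFacade;
import org.springframework.stereotype.Component;

@Component
public class ChangeTimeOutFilter implements Filter {

    @Resource
    private AdapterDeploymentContext deploymentContext;

    // Project-specific configuration bean (not shown in the original post)
    @Resource
    private KeyCloakConfig keyCloakConfig;

    private static Field realmInfoUrlFd;

    private volatile boolean deploymentChanged = false;

    static {
        try {
            ChangeTimeOutFilter.realmInfoUrlFd = KeycloakDeployment.class.getDeclaredField("realmInfoUrl");
            realmInfoUrlFd.setAccessible(true);
        } catch (Exception ex){
            // Ignored: if the field cannot be resolved, realmInfoUrl is left unchanged
        }
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {

        HttpFacade facade = new SimpleHttpFacade((HttpServletRequest)request, (HttpServletResponse)response);
        KeycloakDeployment deployment = deploymentContext.resolveDeployment(facade);

        if (deployment == null) {
            chain.doFilter(request, response);
            return;
        }

        // The deployment is globally unique, so it only needs to be modified once
        if (deploymentChanged) {
            chain.doFilter(request, response);
            return;
        }

        // Change realmInfoUrl from the intranet address to the external one so the check passes
        String realmInfoUrl = deployment.getRealmInfoUrl();
        if (realmInfoUrlFd != null && !StringUtils.isBlank(realmInfoUrl)) {
            // Literal replace: replaceAll would interpret the URL as a regular expression
            realmInfoUrl = realmInfoUrl.replace(keyCloakConfig.getInnerUrl(), keyCloakConfig.getAuthUrl());
            try {
                realmInfoUrlFd.set(deployment, realmInfoUrl);
            } catch (Exception ex){
                // Ignored: on failure the original realmInfoUrl stays in place
            }
        }

        // Set the timeouts (in milliseconds) on the shared HTTP client
        HttpParams params = deployment.getClient().getParams();
        params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, 10000);
        params.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);
        params.setLongParameter(ClientPNames.CONN_MANAGER_TIMEOUT, 10000L);
        deploymentChanged = true;

        chain.doFilter(request, response);
    }
}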

Changing the field value through reflection makes the two addresses match, which gets the token past the check.
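The reflection step can be tried on its own with a small stand-in class. In this sketch `FakeDeployment` and `ReflectionRewrite` are hypothetical names, and the field name realmInfoUrl simply mirrors the one used in the filter. The rewrite uses String.replace, which matches literally; a regex-based replacement would treat the dots in an IP address as wildcards.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a class with a private field and no setter.
class FakeDeployment {
    private String realmInfoUrl;

    FakeDeployment(String realmInfoUrl) {
        this.realmInfoUrl = realmInfoUrl;
    }

    String getRealmInfoUrl() {
        return realmInfoUrl;
    }
}

class ReflectionRewrite {
    // Replaces the `from` prefix inside the private field with `to`.
    static void rewriteRealmInfoUrl(FakeDeployment deployment, String from, String to) {
        try {
            Field field = FakeDeployment.class.getDeclaredField("realmInfoUrl");
            field.setAccessible(true); // bypass the private modifier
            String current = (String) field.get(deployment);
            field.set(deployment, current.replace(from, to));
        } catch (ReflectiveOperationException e) {
            // Surface the failure instead of silently leaving the old value in place
            throw new IllegalStateException("realmInfoUrl could not be rewritten", e);
        }
    }

    public static void main(String[] args) {
        FakeDeployment deployment = new FakeDeployment("http://172.17.0.1:8180/auth/realms/atlas");
        rewriteRealmInfoUrl(deployment, "http://172.17.0.1:8180/auth", "http://122.122.122.122:8180/auth");
        System.out.println(deployment.getRealmInfoUrl());
    }
}
```

Writing to a private field of a third-party class ties the code to that library's internals, so a rename of the field in a later version would break it; failing loudly, as in this sketch, makes such a break visible early.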

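With the solutions above, we have resolved this sudden error. I will continue to share past pitfalls and the improvements they led to on “观远数据技术团队”; follow along and join the discussion.

Author: 杭州小丁, back-end developer at 观远数据 and an internet industry veteran with long experience in J2EE. He enjoys studying open-source code to sharpen his own skills, and is experienced in system architecture design and implementation, design patterns, microservices, concurrent programming and domain modeling. He aims to use technology to deliver business systems with high stability, efficiency and scalability.

Source: WeChat official account 观远数据技术团队

Original: https://mp.weixin.qq.com/s/GyTWRV19qrDiUKH9lKTa-Q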