
Throw away ELK! Try this kubernetes logging solution that we've been using for over 7 years!

This article introduces a lightweight Kubernetes log collection solution. I've used this solution myself in production, and to my surprise, the Kubernetes resources it consumes are really small compared to an ELK stack. Follow along with this article and start learning it......

Why you should use Loki

This article focuses on Loki, a log collection application developed by Grafana. Loki is a lightweight log collection and analysis system: an agent such as Promtail scrapes log content and sends it to Loki for storage, and Loki is then added to Grafana's data sources for log display and query.

Loki's persistent storage supports five types: Azure, GCS, S3, Swift, and Local, of which S3 and Local are the most commonly used. It also supports many log collection clients, such as the widely used Logstash and Fluent Bit.

So what are its advantages?

  • Supported clients, such as Promtail, Fluent Bit, Fluentd, Vector, Logstash, and the Grafana Agent
  • Preferred agent Promtail, which can extract logs from multiple sources, including local log files, systemd, Windows event logs, Docker logging drivers, and more
  • There are no log format requirements, including JSON, XML, CSV, logfmt, unstructured text
  • Query logs using the same syntax as you query metrics (LogQL; see the example after this list)
  • Log queries allow you to dynamically filter and convert log lines
  • You can easily calculate the required metrics in the logs
  • Minimal indexes at ingestion mean you can dynamically slice and dice logs at query time to answer new questions as they arise
  • Cloud-native support, using Prometheus to scrape data
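
As a quick illustration of that shared query syntax, here are two LogQL queries; the app and namespace label values are just assumptions for the example:

{app="nginx", namespace="kube-public"} |= "error"

rate({app="nginx", namespace="kube-public"} |= "error" [5m])

The first query filters the stream down to lines containing "error"; the second turns the same filtered stream into a per-second error-rate metric.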

A simple comparison of the log collection components

How Loki solves the log parsing format problem

[Figure: how Loki parses a log line into an index (timestamp plus labels) and log content]

As the figure above shows, when Loki parses a log it mainly builds an index, which consists of the timestamp and some of the pod's labels (other labels include filename, container, and so on); the rest is the log content. A concrete query looks like this:

[Figure: log query results in Grafana]

{app="loki",namespace="kube-public"}为索引

Log collection architecture pattern

[Figure: log collection architecture]

In practice, it is recommended to deploy Promtail as the agent on the Kubernetes worker nodes in DaemonSet mode to collect logs. You can also use the other log collection tools mentioned above; configuration for those tools is attached at the end of this article.

What are the Loki deployment modes?

Loki is built from a number of microservice components, five in all. A cache can be put in front of these five to hold data and speed up queries. Data lives in shared storage, and the memberlist_config section shares state between instances, so Loki can be scaled out horizontally.

Once the memberlist_config section is configured, instances find the data by polling. For ease of use, the official release compiles all the microservices into one binary, controlled by the command-line parameter -target, which supports all, read, and write; we can pick a mode at deployment time based on the log volume.
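
For example (a sketch; the config file path matches the one used later in this article):

# run every component in a single process
loki -config.file=/etc/loki/loki-all.yaml -target=all
# run only the read-path components
loki -config.file=/etc/loki/loki-all.yaml -target=read
# run only the write-path components
loki -config.file=/etc/loki/loki-all.yaml -target=write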

all (read/write mode)

After the service starts, all data queries and data writes go through this single node. Take a look at the diagram below:

[Figure: allInOne read/write mode]

read/write (read/write splitting mode)

When running in read/write splitting mode, the query-frontend forwards query traffic to the read nodes. The querier, ruler, and query-frontend run on the read nodes, while the distributor and ingester run on the write nodes.

Microservice mode

In microservice mode, different roles are started with different configuration parameters, and each process runs only its target role's service.

[Figure: microservice deployment mode]

Server-side deployment

We've said a lot about Loki and how it works; by now you must be wondering how to deploy it, where to deploy it, and how to use it once it is deployed.

You need a k8s cluster ready before deploying. Okay, read on patiently......


AllInOne deployment model

(1) K8S deployment

The binary downloaded from GitHub does not come with a configuration file, so we need to prepare one in advance. A complete allInOne configuration file, with some optimizations, is given below.

The configuration file content is as follows

auth_enabled: false
target: all
ballast_bytes: 20480
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  graceful_shutdown_timeout: 20s
  grpc_listen_address: "0.0.0.0"
  grpc_listen_network: "tcp"
  grpc_server_max_concurrent_streams: 100
  grpc_server_max_recv_msg_size: 4194304
  grpc_server_max_send_msg_size: 4194304
  http_server_idle_timeout: 2m
  http_listen_address: "0.0.0.0"
  http_listen_network: "tcp"
  http_server_read_timeout: 30s
  http_server_write_timeout: 20s
  log_source_ips_enabled: true
  # if http_path_prefix is changed, the same prefix must be prepended when pushing logs
  # http_path_prefix: "/"
  register_instrumentation: true
  log_format: json
  log_level: info
distributor:
  ring:
    heartbeat_timeout: 3s
    kvstore:
      prefix: collectors/
      store: memberlist
      # a consul cluster needs to be created in advance
    #   consul:
    #     http_client_timeout: 20s
    #     consistent_reads: true
    #     host: 127.0.0.1:8500
    #     watch_burst_size: 2
    #     watch_rate_limit: 2
querier:
  engine:
    max_look_back_period: 20s 
    timeout: 3m0s 
  extra_query_delay: 100ms 
  max_concurrent: 10 
  multi_tenant_queries_enabled: true
  query_ingester_only: false
  query_ingesters_within: 3h0m0s
  query_store_only: false
  query_timeout: 5m0s
  tail_max_duration: 1h0s
query_scheduler:
  max_outstanding_requests_per_tenant: 2048
  grpc_client_config:
    max_recv_msg_size: 104857600
    max_send_msg_size: 16777216
    grpc_compression: gzip
    rate_limit: 0
    rate_limit_burst: 0
    backoff_on_ratelimits: false
    backoff_config:
      min_period: 50ms
      max_period: 15s
      max_retries: 5 
  use_scheduler_ring: true
  scheduler_ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 30s
    heartbeat_timeout: 1m0s
    # defaults to the name of the first network interface
    # instance_interface_names
    # instance_addr: 127.0.0.1
    # defaults to server.grpc-listen-port
    instance_port: 9095
frontend:
  max_outstanding_per_tenant: 4096
  querier_forget_delay: 1h0s
  compress_responses: true
  log_queries_longer_than: 2m0s
  max_body_size: 104857600
  query_stats_enabled: true
  scheduler_dns_lookup_period: 10s 
  scheduler_worker_concurrency: 15
query_range:
  align_queries_with_step: true
  cache_results: true
  parallelise_shardable_queries: true
  max_retries: 3
  results_cache:
    cache:
      enable_fifocache: false
      default_validity: 30s 
      background:
        writeback_buffer: 10000
      redis:
        endpoint: 127.0.0.1:6379
        timeout: 1s
        expiration: 0s 
        db: 9
        pool_size: 128 
        password: 1521Qyx6^
        tls_enabled: false
        tls_insecure_skip_verify: true
        idle_timeout: 10s 
        max_connection_age: 8h
ruler:
  enable_api: true
  enable_sharding: true
  alertmanager_refresh_interval: 1m
  disable_rule_group_label: false
  evaluation_interval: 1m0s
  flush_period: 3m0s
  for_grace_period: 20m0s
  for_outage_tolerance: 1h0s
  notification_queue_capacity: 10000
  notification_timeout: 4s
  poll_interval: 10m0s
  query_stats_enabled: true
  remote_write:
    config_refresh_period: 10s
    enabled: false
  resend_delay: 2m0s
  rule_path: /rulers
  search_pending_for: 5m0s
  storage:
    local:
      directory: /data/loki/rulers
    type: local
  sharding_strategy: default
  wal_cleaner:
    period:  240h
    min_age: 12h0m0s
  wal:
    dir: /data/loki/ruler_wal
    max_age: 4h0m0s
    min_age: 5m0s
    truncate_frequency: 1h0m0s
  ring:
    kvstore:
      store: memberlist
      prefix: "collectors/"
    heartbeat_period: 5s
    heartbeat_timeout: 1m0s
    # instance_addr: "127.0.0.1"
    # instance_id: "miyamoto.en0"
    # instance_interface_names: ["en0","lo0"]
    instance_port: 9500
    num_tokens: 100
ingester_client:
  pool_config:
    health_check_ingesters: false
    client_cleanup_period: 10s 
    remote_timeout: 3s
  remote_timeout: 5s 
ingester:
  autoforget_unhealthy: true
  chunk_encoding: gzip
  chunk_target_size: 1572864
  max_transfer_retries: 0
  sync_min_utilization: 3.5
  sync_period: 20s
  flush_check_period: 30s 
  flush_op_timeout: 10m0s
  chunk_retain_period: 1m30s
  chunk_block_size: 262144
  chunk_idle_period: 1h0s
  max_returned_stream_errors: 20
  concurrent_flushes: 3
  index_shards: 32
  max_chunk_age: 2h0m0s
  query_store_max_look_back_period: 3h30m30s
  wal:
    enabled: true
    dir: /data/loki/wal 
    flush_on_shutdown: true
    checkpoint_duration: 15m
    replay_memory_ceiling: 2GB
  lifecycler:
    ring:
      kvstore:
        store: memberlist
        prefix: "collectors/"
      heartbeat_timeout: 30s 
      replication_factor: 1
    num_tokens: 128
    heartbeat_period: 5s 
    join_after: 5s 
    observe_period: 1m0s
    # interface_names: ["en0","lo0"]
    final_sleep: 10s 
    min_ready_duration: 15s
storage_config:
  boltdb:
    directory: /data/loki/boltdb 
  boltdb_shipper:
    active_index_directory: /data/loki/active_index
    build_per_tenant_index: true
    cache_location: /data/loki/cache 
    cache_ttl: 48h
    resync_interval: 5m
    query_ready_num_days: 5
    index_gateway_client:
      grpc_client_config:
  filesystem:
    directory: /data/loki/chunks
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    default_validity: 30s
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 192.168.3.56:6379
      timeout: 1s
      expiration: 0s 
      db: 8 
      pool_size: 128 
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s 
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  write_dedupe_cache_config:
    enable_fifocache: true
    default_validity: 30s 
    background:
      writeback_buffer: 10000
    redis:
      endpoint: 127.0.0.1:6379
      timeout: 1s
      expiration: 0s 
      db: 7
      pool_size: 128 
      password: 1521Qyx6^
      tls_enabled: false
      tls_insecure_skip_verify: true
      idle_timeout: 10s 
      max_connection_age: 8h
    fifocache:
      ttl: 1h
      validity: 30m0s
      max_size_items: 2000
      max_size_bytes: 500MB
  cache_lookups_older_than: 10s 
# compact fragmented indexes
compactor:
  shared_store: filesystem
  shared_store_key_prefix: index/
  working_directory: /data/loki/compactor
  compaction_interval: 10m0s
  retention_enabled: true
  retention_delete_delay: 2h0m0s
  retention_delete_worker_count: 150
  delete_request_cancel_period: 24h0m0s
  max_compaction_parallelism: 2
  # compactor_ring:
frontend_worker:
  match_max_concurrent: true
  parallelism: 10
  dns_lookup_duration: 5s 
# runtime_config is left empty here
# runtime_config:
common:
  storage:
    filesystem:
      chunks_directory: /data/loki/chunks
      rules_directory: /data/loki/rulers
  replication_factor: 3
  persist_tokens: false
  # instance_interface_names: ["en0","eth0","ens33"]
analytics:
  reporting_enabled: false
limits_config:
  ingestion_rate_strategy: global
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 18
  max_label_name_length: 2096
  max_label_value_length: 2048
  max_label_names_per_series: 60
  enforce_metric_name: true
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 20m0s
  max_global_streams_per_user: 5000
  unordered_writes: true
  max_chunks_per_query: 200000
  max_query_length: 721h
  max_query_parallelism: 64 
  max_query_series: 700
  cardinality_limit: 100000
  max_streams_matchers_per_query: 1000 
  max_concurrent_tail_requests: 10 
  ruler_evaluation_delay_duration: 3s 
  ruler_max_rules_per_rule_group: 0
  ruler_max_rule_groups_per_tenant: 0
  retention_period: 700h
  per_tenant_override_period: 20s 
  max_cache_freshness_per_query: 2m0s
  max_queriers_per_tenant: 0
  per_stream_rate_limit: 6MB
  per_stream_rate_limit_burst: 50MB
  max_query_lookback: 0
  ruler_remote_write_disabled: false
  min_sharding_lookback: 0s
  split_queries_by_interval: 10m0s
  max_line_size: 30mb
  max_line_size_truncate: false
  max_streams_per_user: 0

# The memberlist config block configures gossip, which is used for discovery and connections between the distributor, ingester, and querier.
# The configuration is identical for all of these components so that they form a single shared ring.
# Once at least one join_members entry is defined, a kvstore of type memberlist is automatically configured for the distributor, ingester, and ruler rings
memberlist:
  randomize_node_name: true
  stream_timeout: 5s 
  retransmit_factor: 4
  join_members:
  - 'loki-memberlist'
  abort_if_cluster_join_fails: true
  advertise_addr: 0.0.0.0
  advertise_port: 7946
  bind_addr: ["0.0.0.0"]
  bind_port: 7946
  compression_enabled: true
  dead_node_reclaim_time: 30s
  gossip_interval: 100ms
  gossip_nodes: 3
  gossip_to_dead_nodes_time: 3s
  # join:
  leave_timeout: 15s
  left_ingesters_timeout: 3m0s 
  max_join_backoff: 1m0s
  max_join_retries: 5
  message_history_buffer_bytes: 4096
  min_join_backoff: 2s
  # node_name: miyamoto
  packet_dial_timeout: 5s
  packet_write_timeout: 5s 
  pull_push_interval: 100ms
  rejoin_interval: 10s
  tls_enabled: false
  tls_insecure_skip_verify: true
schema_config:
  configs:
  - from: "2020-10-24"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
    chunks:
      period: 168h
    row_shards: 32
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
  throughput_updates_disabled: false
  poll_interval: 3m0s
  creation_grace_period: 20m
  index_tables_provisioning:
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 500
    inactive_write_throughput: 4
    inactive_read_throughput: 300
    inactive_write_scale_lastn: 50 
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    inactive_read_scale_lastn: 10 
    write_scale:
      enabled: true
      target: 80
      # role_arn:
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
  chunk_tables_provisioning:
    enable_inactive_throughput_on_demand_mode: true
    enable_ondemand_throughput_mode: true
    provisioned_write_throughput: 1000
    provisioned_read_throughput: 300
    inactive_write_throughput: 1
    inactive_write_scale_lastn: 50
    inactive_read_throughput: 300
    inactive_read_scale_lastn: 10
    write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_write_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
    inactive_read_scale:
      enabled: true
      target: 80
      out_cooldown: 1800
      min_capacity: 3000
      max_capacity: 6000
      in_cooldown: 1800
tracing:
  enabled: true
           

Caution:

  • ingester.lifecycler.ring.replication_factor is set to 1 for a single-instance deployment.
  • ingester.lifecycler.min_ready_duration is 15s: after startup, the instance waits 15 seconds before switching its state to ready.
  • memberlist.node_name can be left unset; it defaults to the current hostname.
  • memberlist.join_members is a list; with multiple instances, add the hostname/IP address of every node. In k8s, you can create a Service bound to the StatefulSet for this.
  • query_range.results_cache.cache.enable_fifocache is recommended to be false when an external cache such as Redis is used, as in the configuration above.
  • instance_interface_names is a list defaulting to ["en0","eth0"]; set the corresponding NIC names as needed, though usually no special setting is required.

Create a configmap

Note: Write the above content to a file named loki-all.yaml and load it into the k8s cluster as a ConfigMap, which you can create with the following command:

kubectl create configmap loki-all --from-file=./loki-all.yaml
           

You can check the created ConfigMap with the command below:
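
kubectl get configmap loki-all -o yaml
# or, for a summary view (assumes the default namespace)
kubectl describe configmap loki-all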

Create persistent storage

In k8s, data needs to be persisted. The log information Loki collects matters to the business, so logs have to survive container restarts.

That means we need a PV and a PVC. The backing storage can be nfs, glusterfs, hostPath, azureDisk, cephfs, or any of the other roughly 20 supported types; since no other environment is available here, hostPath mode is used.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: loki
  namespace: default
spec:
  hostPath:
    path: /glusterfs/loki
    type: DirectoryOrCreate
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeName: loki
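
Apply the manifest and check that the claim binds (the file name here is just an assumption):

kubectl apply -f loki-pv-pvc.yaml
kubectl get pv,pvc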
           

Create an app

After preparing the k8s StatefulSet deployment file, you can create the application directly in the cluster.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: loki
  name: loki
  namespace: default
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      annotations:
        prometheus.io/port: http-metrics
        prometheus.io/scrape: "true"
      labels:
        app: loki
    spec:
      containers:
      - args:
        - -config.file=/etc/loki/loki-all.yaml
        image: grafana/loki:2.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: loki
        ports:
        - containerPort: 3100
          name: http-metrics
          protocol: TCP
        - containerPort: 9095
          name: grpc
          protocol: TCP
        - containerPort: 7946
          name: memberlist-port
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http-metrics
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 500m
            memory: 500Mi
          limits:
            cpu: 500m
            memory: 500Mi
        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /etc/loki
          name: config
        - mountPath: /data
          name: storage
        # the tmp emptyDir is mounted because readOnlyRootFilesystem is enabled
        - mountPath: /tmp
          name: tmp
      restartPolicy: Always
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccount: loki
      serviceAccountName: loki
      volumes:
      - emptyDir: {}
        name: tmp
      - name: config
        configMap:
          # must match the ConfigMap created above
          name: loki-all
      - persistentVolumeClaim:
          claimName: loki
        name: storage
---
kind: Service
apiVersion: v1
metadata:
  name: loki-memberlist
  namespace: default
spec:
  ports:
    - name: loki-memberlist
      protocol: TCP
      port: 7946
      targetPort: 7946
  selector:
    app: loki
---
kind: Service
apiVersion: v1
metadata:
  name: loki
  namespace: default
spec:
  ports:
    - name: loki
      protocol: TCP
      port: 3100
      targetPort: 3100
  selector:
    app: loki
           

In the manifest above I added some pod-level security settings. There are also cluster-level security policies (PodSecurityPolicy) that help prevent the whole cluster from crashing because of a single vulnerability; for details on cluster-level PSP, see the official documentation.
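
Also note that the StatefulSet references a ServiceAccount named loki. If it does not exist yet, a minimal one (a sketch) can be created before applying the manifests:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: loki
  namespace: default

Then apply the StatefulSet and Service manifests (the file name here is an assumption):

kubectl apply -f loki-statefulset.yaml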

Verify the deployment results

[Figure: the loki pod in the Running state]

When you see the pod in the Running state shown above, you can use the API to check whether the distributor is working properly; once the ring members show Active, log streams are distributed normally to the collector (ingester).
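
A quick way to check this from inside the cluster (a sketch; it assumes the Service name loki and port 3100 from the manifests above):

# readiness of the whole process
curl http://loki:3100/ready
# hash-ring status pages for the ingester and distributor rings
curl http://loki:3100/ring
curl http://loki:3100/distributor/ring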

[Figure: distributor ring members in the Active state]

(2) Bare metal deployment

Place the loki binary in the system's /bin/ directory and prepare a grafana-loki.service unit file, so that the service can be managed through systemd:

[Unit]
Description=Grafana Loki Log Ingester
Documentation=https://grafana.com/logs/
After=network-online.target

[Service]
ExecStart=/bin/loki --config.file /etc/loki/loki-all.yaml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target
           

Reload the systemd unit list, after which the service can be managed directly with systemctl:

systemctl daemon-reload
# start the service
systemctl start grafana-loki
# stop the service
systemctl stop grafana-loki
# reload the application
systemctl reload grafana-loki
# start automatically at boot
systemctl enable grafana-loki
           

Promtail deployment in action

When deploying the client that collects the logs, you also need to create a configuration file, just as in the server steps above. The difference is that the client pushes the log content to the server.

(1) K8S deployment

Create a configuration file

server:
  log_level: info
  http_listen_port: 3101
clients:
  - url: http://loki:3100/loki/api/v1/push
positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
  - job_name: kubernetes-pods
    pipeline_stages:
      - cri: {}
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_controller_name
        regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
        action: replace
        target_label: __tmp_controller_name
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_name
          - __meta_kubernetes_pod_label_app
          - __tmp_controller_name
          - __meta_kubernetes_pod_name
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: app
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_instance
          - __meta_kubernetes_pod_label_release
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: instance
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_component
          - __meta_kubernetes_pod_label_component
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: component
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node_name
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        replacement: $1
        separator: /
        source_labels:
        - namespace
        - app
        target_label: job
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_uid
        - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        regex: true/(.*)
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
        - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
        - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
        - __meta_kubernetes_pod_container_name
        target_label: __path__
           

Create a ConfigMap from the content above; the method is the same as before.
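
For example (a sketch; the key name promtail.yaml must match the -config.file path used by the DaemonSet below):

kubectl create configmap promtail --from-file=promtail.yaml=./promtail.yaml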

Create a DaemonSet file

Promtail is a stateless application and needs no persistent storage, but it does need to run throughout the cluster, so the same approach applies: prepare a DaemonSet manifest.

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: promtail
  namespace: default
  labels:
    app.kubernetes.io/instance: promtail
    app.kubernetes.io/name: promtail
    app.kubernetes.io/version: 2.5.0
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: promtail
      app.kubernetes.io/name: promtail
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: promtail
        app.kubernetes.io/name: promtail
    spec:
      volumes:
        - name: config
          configMap:
            name: promtail
        - name: run
          hostPath:
            path: /run/promtail
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
        - name: pods
          hostPath:
            path: /var/log/pods
      containers:
        - name: promtail
          image: docker.io/grafana/promtail:2.5.0
          args:
            - '-config.file=/etc/promtail/promtail.yaml'
          ports:
            - name: http-metrics
              containerPort: 3101
              protocol: TCP
          env:
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config
              mountPath: /etc/promtail
            - name: run
              mountPath: /run/promtail
            - name: containers
              readOnly: true
              mountPath: /var/lib/docker/containers
            - name: pods
              readOnly: true
              mountPath: /var/log/pods
          readinessProbe:
            httpGet:
              path: /ready
              port: http-metrics
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 5
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: false
            allowPrivilegeEscalation: false
      restartPolicy: Always
      serviceAccountName: promtail
      serviceAccount: promtail
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
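
Note that the DaemonSet above references a ServiceAccount named promtail, and the kubernetes_sd_configs section of the Promtail config needs RBAC permissions to watch pods. A minimal sketch of the required objects:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: promtail
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promtail
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "pods"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promtail
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: promtail
subjects:
  - kind: ServiceAccount
    name: promtail
    namespace: default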
           

Create a promtail application

kubectl apply -f promtail.yaml
           

After running the command above, you can see the Promtail pods created on each node. The next step is to add a DataSource in Grafana to view the data.

(2) Bare metal deployment

For a bare-metal deployment, make a small change to the configuration file above: point the clients url at the server's address, and store the file under /etc/loki/, for example:

clients:
  - url: http://ipaddress:port/loki/api/v1/push
           

Add the system boot configuration, and store the service configuration file in /usr/lib/systemd/system/loki-promtail.service as follows:

[Unit]
Description=Grafana Loki Promtail
Documentation=https://grafana.com/logs/
After=network-online.target

[Service]
ExecStart=/bin/promtail --config.file /etc/loki/loki-promtail.yaml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target
           

The startup commands are the same as in the server-side deployment above.

Loki in DataSource

Add a data source

Steps: Grafana -> Setting -> DataSources -> AddDataSource -> Loki

Notes:

http URL: if the application or service is deployed in a different namespace, you need to specify its FQDN, in the format ServiceName.namespace. With the default port 3100, fill in http://loki:3100. Why write the name of the service rather than an IP address? Because the DNS server in the k8s cluster resolves this address automatically.
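
Before adding the data source, you can verify that Loki is reachable and already has labels (a sketch, run from a pod inside the cluster):

curl -s http://loki:3100/loki/api/v1/labels
# expected response shape: {"status":"success","data":["app","container","namespace", ...]}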

[Figure: adding the Loki data source in Grafana]

Find log information

[Figure: querying log information in Grafana Explore]

Other client configurations

Logstash acts as a log collection client

Install the plugin

After starting Logstash, we need to install a plugin. You can install the Loki output plugin with the command below; once installation completes, you can add the loki block to Logstash's output section.

bin/logstash-plugin install logstash-output-loki
           

Add configurations for testing

For complete logstash configuration information, please refer to the LogstashConfigFile on the official website

output {
  loki {
    [url => "" | default = none | required=true]
    [tenant_id => string | default = nil | required=false]
    [message_field => string | default = "message" | required=false]
    [include_fields => array | default = [] | required=false]
    [batch_wait => number | default = 1(s) | required=false]
    [batch_size => number | default = 102400(bytes) | required=false]
    [min_delay => number | default = 1(s) | required=false]
    [max_delay => number | default = 300(s) | required=false]
    [retries => number | default = 10 | required=false]
    [username => string | default = nil | required=false]
    [password => secret | default = nil | required=false]
    [cert => path | default = nil | required=false]
    [key => path | default = nil| required=false]
    [ca_cert => path | default = nil | required=false]
    [insecure_skip_verify => boolean | default = false | required=false]
  }
}
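
A minimal concrete example, assuming the in-cluster Service name loki from earlier:

output {
  loki {
    url => "http://loki:3100/loki/api/v1/push"
    message_field => "message"
    batch_wait => 1
    batch_size => 102400
    retries => 5
  }
}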
           

Or use the http output module of Logstash with the following configuration:

output {
    http {
        format => "json"
        http_method => "post"
        content_type => "application/json"
        connect_timeout => 10
        url => "http://loki:3100/loki/api/v1/push"
        message => '{"message":"%{message}"}'
    }
}
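
Note that Loki's push endpoint expects a specific JSON body; you can test the endpoint by hand with curl (a sketch):

curl -H "Content-Type: application/json" -X POST "http://loki:3100/loki/api/v1/push" \
  --data-raw "{\"streams\":[{\"stream\":{\"app\":\"test\"},\"values\":[[\"$(date +%s%N)\",\"hello from curl\"]]}]}"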
           

Helm installation

If you want a quick installation, use Helm. Helm wraps up all the installation steps and simplifies them.

For people who want to learn k8s in depth, however, Helm is less suitable: because everything is encapsulated and executed automatically, k8s administrators don't see how the components depend on each other, which can lead to misunderstandings.

Without further ado, let's start the helm installation

Add a repo source

helm repo add grafana https://grafana.github.io/helm-charts
           

Update the source

helm repo update
           

Deploy

Default configuration

helm upgrade --install loki grafana/loki-simple-scalable
           

Customize the namespace

helm upgrade --install loki --namespace=loki grafana/loki-simple-scalable
           

Customize the configuration information

helm upgrade --install loki grafana/loki-simple-scalable --set "key1=val1,key2=val2,..."
           

Troubleshooting

1. 502 Bad Gateway

The Loki address is incorrect.

In k8s, a wrongly filled-in address results in a 502. Check whether the Loki address takes one of the following forms:

http://LokiServiceName
http://LokiServiceName.namespace
http://LokiServiceName.namespace:ServicePort
           

If Grafana and Loki run on different nodes, check the network connectivity and firewall policy between the nodes.

2. Ingester not ready: instance xx:9095 in state JOINING

Wait patiently for a while, as it will take some time for the program to start in allInOne mode.

3. too many unhealthy instances in the ring

Change ingester.lifecycler.ring.replication_factor to 1; the error is caused by an incorrect setting. The replication factor was set to multiple replicas at startup, but only one instance is actually deployed, so this message appears when querying labels.

4. Data source connected, but no labels received. Verify that Loki and Promtail is configured properly

  • Promtail cannot send the collected logs to Loki; check whether Promtail's output is normal
  • Promtail sent logs before Loki was ready, so Loki never received them; to have the logs sent again, delete the positions.yaml file (you can locate it with find; see the commands after this list)
  • Promtail is ignoring the target log files, or a configuration-file error prevents it from starting properly
  • Promtail cannot discover log files at the specified location
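
A sketch for locating and clearing the positions file mentioned above (the path depends on your positions.filename setting; /run/promtail is what the config in this article uses):

find / -name positions.yaml 2>/dev/null
rm /run/promtail/positions.yaml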

Official Documentation:

  • https://kubernetes.io/docs/concepts/security/pod-security-policy/
Thanks for reading, I hope it helps you :) Source: juejin.cn/post/7150469420605767717
