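k8s GC analysis
Introduction to the garbage collector
The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides from an object's deletion policy whether to delete its associated objects.
On deleting associated objects: when an owner is deleted with a cascading deletion policy, the owner's dependent objects are deleted along with it.
On dependency relationships: the garbage collector watches resource object events and uses the ownerReferences value of each object to build the relationships between objects, that is, the relationship between owner and dependent.
About owner and dependent
Take creating a deployment object as an example.
After a deployment object is created, kube-controller-manager creates a replicaset object for it and automatically records the deployment's information in the replicaset's ownerReferences. In the example below, the owner of replicaset test-1-59d7f45ffb is deployment test-1, and the dependent of deployment test-1 is replicaset test-1-59d7f45ffb.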
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...
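Likewise, after a replicaset object is created, kube-controller-manager creates pod objects for it, and the replicaset's information is recorded in each pod's ownerReferences. The replicaset is the pods' owner, and the pods are the replicaset's dependents.
The ownerReferences value in an object specifies the relationship between owner and dependent.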
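garbage collector architecture diagram
The detailed architecture and core processing logic of the garbage collector are shown in the figure below.

The most critical code in the garbage collector lives in two files, garbagecollector.go and graph_builder.go.
The garbage collector is made up of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan):
1 graph
(1) uidToNode: the object dependency graph, maintained by GraphBuilder, which holds the dependency relationships between all objects. Every k8s object corresponds to one node in the graph, and every node keeps an owner list and a dependent list.
Example: given deployment A, replicaset B (owner is deployment A), and pod C (owner is replicaset B), the dependency graph looks like this:
There are 3 nodes: A, B, and C.
A has a node with no owner and B in its dependent list;
B has a node with A in its owner list and C in its dependent list;
C has a node with B in its owner list and no dependents.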
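The A/B/C example can be modelled in a few lines of Go. This is only a minimal sketch: the node and graph types below are simplified stand-ins invented for illustration (UIDs are plain strings, there is no locking), not the types used by the garbage collector itself.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for a graph node:
// it records the UIDs of its owners and pointers to its dependents.
type node struct {
	uid        string
	owners     []string
	dependents map[*node]struct{}
}

// graph maps an object UID to its node, in the spirit of uidToNode.
type graph map[string]*node

// insert adds a node and registers it in the dependents set of every
// owner that is already present in the graph.
func (g graph) insert(uid string, owners ...string) *node {
	n := &node{uid: uid, owners: owners, dependents: map[*node]struct{}{}}
	g[uid] = n
	for _, o := range owners {
		if ownerNode, ok := g[o]; ok {
			ownerNode.dependents[n] = struct{}{}
		}
	}
	return n
}

// dependentUIDs lists the UIDs of uid's dependents (map order is not guaranteed).
// It assumes uid is present in the graph.
func (g graph) dependentUIDs(uid string) []string {
	var out []string
	for d := range g[uid].dependents {
		out = append(out, d.uid)
	}
	return out
}

// buildExample builds deployment A -> replicaset B -> pod C.
func buildExample() graph {
	g := graph{}
	g.insert("A")
	g.insert("B", "A")
	g.insert("C", "B")
	return g
}

func main() {
	g := buildExample()
	for _, uid := range []string{"A", "B", "C"} {
		fmt.Printf("%s owners=%v dependents=%v\n", uid, g[uid].owners, g.dependentUIDs(uid))
	}
}
```

Keeping dependents as a set of node pointers and owners as a list of references mirrors the shape of the node struct shown later in this post.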
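2 processors
(1) GraphBuilder: maintains the dependency graph of all objects and produces events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and, based on the ownerReferences of each resource object, builds, updates, and deletes entries in the dependency graph, that is, the relationships between owner and dependent. It then acts as a producer and puts events into the attemptToDelete or attemptToOrphan queue, which triggers GarbageCollector to check whether associated objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it does so.
(2) GarbageCollector: reclaims and deletes objects. As a consumer, GarbageCollector takes events from the attemptToDelete and attemptToOrphan queues. If an object has been deleted with a cascading deletion policy, its associated objects are reclaimed: when an owner is deleted with a cascading policy, the owner's dependent objects are deleted along with it.
3 event queues
(1) graphChanges: events obtained by list/watch against the apiserver; produced by the informers and consumed by GraphBuilder;
(2) attemptToDelete: the cascading deletion queue; produced by GraphBuilder and consumed by GarbageCollector;
(3) attemptToOrphan: the orphan deletion queue; produced by GraphBuilder and consumed by GarbageCollector.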
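Object deletion policies
Kubernetes has three object deletion policies: Orphan, Foreground, and Background. A policy can be specified when an object is deleted. Each is described below.
Foreground deletion
Foreground is a cascading deletion policy: the garbage collector deletes all of the object's dependents.
When an object is deleted with the Foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all of the object's blocking dependents (dependents whose ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from metadata.finalizers, and the owner object is then deleted.
Taking a deployment as an example, Foreground deletion proceeds in the order Pod -> ReplicaSet -> Deployment.
Background deletion
Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.
When an object is deleted with the Background policy, no related finalizer is set on it (finalizers are set only for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim them.
Taking a deployment as an example, Background deletion proceeds in the order Deployment -> ReplicaSet -> Pod.
Orphan deletion
Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.
When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of all of its dependents. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only then can the owner be deleted.
Taking a deployment as an example, Orphan deletion removes only the Deployment; its ReplicaSet and Pods are left in place.
Specifying the deletion policy when deleting an object
If no deletion policy is specified when an object is deleted, the default policy is used: Background.
(1) Specify the Background policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' \
  -H "Content-Type: application/json"
(2) Specify the Foreground policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
  -H "Content-Type: application/json"
(3) Specify the Orphan policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
  -H "Content-Type: application/json"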
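The three request bodies differ only in propagationPolicy, so they can be generated from one helper. The sketch below uses only the Go standard library; deleteOptions and deleteBody are hypothetical names introduced for illustration, and the struct mirrors just the three fields that appear in the curl commands rather than the real API type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions is a hypothetical local struct holding only the fields
// used in the request bodies above; it is not the Kubernetes API type.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteBody returns the JSON body for one of the three policies and
// rejects anything else.
func deleteBody(policy string) (string, error) {
	switch policy {
	case "Background", "Foreground", "Orphan":
	default:
		return "", fmt.Errorf("unknown propagation policy %q", policy)
	}
	b, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, p := range []string{"Background", "Foreground", "Orphan"} {
		body, err := deleteBody(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(body)
	}
}
```

Each printed line can be passed as the -d argument of the DELETE request.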
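The source code analysis of the garbage collector is split into two parts:
(1) startup analysis;
(2) core processing logic analysis.
The previous post analyzed the startup of the garbage collector; this post analyzes its core processing logic.
garbage collector source code analysis: processing logic
Based on tag v1.17.4
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4
As mentioned earlier, the most critical code in the garbage collector is in garbagecollector.go and graph_builder.go, that is, GarbageCollector struct and GraphBuilder struct, so the analysis below is split into two parts accordingly.
1. GraphBuilder
First, GraphBuilder. It has 2 main functions:
(1) based on resource events from the informers, it maintains the dependency relationships of all objects in its uidToNode field;
(2) it processes the events in graphChanges and, as a producer, puts events into the attemptToDelete and attemptToOrphan queues, triggering the consumer GarbageCollector to reclaim objects.
1.1 GraphBuilder struct
A brief look at GraphBuilder struct. Its key fields are:
(1) graphChanges: events observed by the informers are put into graphChanges, and GraphBuilder consumes them;
(2) uidToNode (the object dependency graph): keyed by object uid, it holds the dependency relationships of all objects, that is, the owner and dependent relationships described earlier. In other words, GraphBuilder maintains a dependency graph of all objects, and GarbageCollector relies on that graph when reclaiming objects;
(3) attemptToDelete and attemptToOrphan: GraphBuilder is the producer for these two queues, and GarbageCollector is the consumer.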
// pkg/controller/garbagecollector/graph_builder.go
type GraphBuilder struct {
...
// monitors are the producer of the graphChanges queue, graphBuilder alters
// the in-memory graph according to the changes.
graphChanges workqueue.RateLimitingInterface
// uidToNode doesn't require a lock to protect, because only the
// single-threaded GraphBuilder.processGraphChanges() reads/writes it.
uidToNode *concurrentUIDToNode
// GraphBuilder is the producer of attemptToDelete and attemptToOrphan, GC is the consumer.
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
// pkg/controller/garbagecollector/graph.go
type concurrentUIDToNode struct {
uidToNodeLock sync.RWMutex
uidToNode map[types.UID]*node
}
// pkg/controller/garbagecollector/graph.go
type node struct {
...
dependents map[*node]struct{}
...
owners []metav1.OwnerReference
}
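From the struct definitions, each k8s object corresponds to one node in the dependency graph, and each node keeps an owner list and a dependent list.
1.2 GraphBuilder-gb.processGraphChanges
Next, the processing logic of GraphBuilder, using gb.processGraphChanges as the entry point.
As noted above, events observed by the informers are put into the graphChanges queue and GraphBuilder consumes them; processGraphChanges is where that consumption happens.
In this method GraphBuilder is both consumer and producer: it consumes and classifies every event in graphChanges, then produces events into the attemptToDelete and attemptToOrphan queues for GarbageCollector to consume.
Main logic:
(1) take an event from the graphChanges queue;
(2) read uidToNode to determine whether the object already exists in the dependency graph; what happens next depends on that and on the event type;
(3) if the node is not in uidToNode and the event is addEvent or updateEvent, create a node for the object and call gb.insertNode to add the node to uidToNode and to the dependents of each of its owners;
then call gb.processTransitions, which checks whether the object is being deleted and, if so, whether it is being deleted in orphan mode or in foreground mode. The mode is determined from the object's finalizers: a delete request carries a deletion policy, and kube-apiserver adds the matching finalizer to the object. In orphan mode the node is added to the attemptToOrphan queue; in foreground mode the object and all of its dependents are added to the attemptToDelete queue;
(4) if the node is already in uidToNode and the event is addEvent or updateEvent, call referencesDiffs to check whether the object's OwnerReferences have changed; if they have, update the dependency graph accordingly, and finally call gb.processTransitions;
(5) if the event is a delete event, call gb.removeNode to remove the object from uidToNode and from the dependents of all of its owners, then add the object's dependents to the attemptToDelete queue for GarbageCollector to process. Finally, check all of the node's owners: an owner that is deleting its dependents may be blocked waiting for this node to be deleted, so that owner is also added to the attemptToDelete queue for GarbageCollector to process.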
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) runProcessGraphChanges() {
for gb.processGraphChanges() {
}
}
// Dequeueing an event from graphChanges, updating graph, populating dirty_queue.
func (gb *GraphBuilder) processGraphChanges() bool {
item, quit := gb.graphChanges.Get()
if quit {
return false
}
defer gb.graphChanges.Done(item)
event, ok := item.(*event)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect a *event, got %v", item))
return true
}
obj := event.obj
accessor, err := meta.Accessor(obj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access obj: %v", err))
return true
}
klog.V(5).Infof("GraphBuilder process object: %s/%s, namespace %s, name %s, uid %s, event type %v", event.gvk.GroupVersion().String(), event.gvk.Kind, accessor.GetNamespace(), accessor.GetName(), string(accessor.GetUID()), event.eventType)
// Check if the node already exists
existingNode, found := gb.uidToNode.Read(accessor.GetUID())
if found {
// this marks the node as having been observed via an informer event
// 1. this depends on graphChanges only containing add/update events from the actual informer
// 2. this allows things tracking virtual nodes' existence to stop polling and rely on informer events
existingNode.markObserved()
}
switch {
case (event.eventType == addEvent || event.eventType == updateEvent) && !found:
newNode := &node{
identity: objectReference{
OwnerReference: metav1.OwnerReference{
APIVersion: event.gvk.GroupVersion().String(),
Kind: event.gvk.Kind,
UID: accessor.GetUID(),
Name: accessor.GetName(),
},
Namespace: accessor.GetNamespace(),
},
dependents: make(map[*node]struct{}),
owners: accessor.GetOwnerReferences(),
deletingDependents: beingDeleted(accessor) && hasDeleteDependentsFinalizer(accessor),
beingDeleted: beingDeleted(accessor),
}
gb.insertNode(newNode)
// the underlying delta_fifo may combine a creation and a deletion into
// one event, so we need to further process the event.
gb.processTransitions(event.oldObj, accessor, newNode)
case (event.eventType == addEvent || event.eventType == updateEvent) && found:
// handle changes in ownerReferences
added, removed, changed := referencesDiffs(existingNode.owners, accessor.GetOwnerReferences())
if len(added) != 0 || len(removed) != 0 || len(changed) != 0 {
// check if the changed dependency graph unblock owners that are
// waiting for the deletion of their dependents.
gb.addUnblockedOwnersToDeleteQueue(removed, changed)
// update the node itself
existingNode.owners = accessor.GetOwnerReferences()
// Add the node to its new owners' dependent lists.
gb.addDependentToOwners(existingNode, added)
// remove the node from the dependent list of node that are no longer in
// the node's owners list.
gb.removeDependentFromOwners(existingNode, removed)
}
if beingDeleted(accessor) {
existingNode.markBeingDeleted()
}
gb.processTransitions(event.oldObj, accessor, existingNode)
case event.eventType == deleteEvent:
if !found {
klog.V(5).Infof("%v doesn't exist in the graph, this shouldn't happen", accessor.GetUID())
return true
}
// removeNode updates the graph
gb.removeNode(existingNode)
existingNode.dependentsLock.RLock()
defer existingNode.dependentsLock.RUnlock()
if len(existingNode.dependents) > 0 {
gb.absentOwnerCache.Add(accessor.GetUID())
}
for dep := range existingNode.dependents {
gb.attemptToDelete.Add(dep)
}
for _, owner := range existingNode.owners {
ownerNode, found := gb.uidToNode.Read(owner.UID)
if !found || !ownerNode.isDeletingDependents() {
continue
}
// this is to let attempToDeleteItem check if all the owner's
// dependents are deleted, if so, the owner will be deleted.
gb.attemptToDelete.Add(ownerNode)
}
}
return true
}
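The code confirms the earlier description: when an object is deleted with the Background policy, no related finalizer is set on it (finalizers are set only for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the delete event and puts the object's dependents into the attemptToDelete queue, triggering GarbageCollector to reclaim them.
1.2.1 gb.insertNode
gb.insertNode adds the node to uidToNode and then adds it to the dependents of each of its owners. If an owner is not in the graph yet, addDependentToOwners creates a virtual node for it and puts that node into the attemptToDelete queue so that its existence can be checked against the API server.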
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) insertNode(n *node) {
gb.uidToNode.Write(n)
gb.addDependentToOwners(n, n.owners)
}
func (gb *GraphBuilder) addDependentToOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
// Create a "virtual" node in the graph for the owner if it doesn't
// exist in the graph yet.
ownerNode = &node{
identity: objectReference{
OwnerReference: owner,
Namespace: n.identity.Namespace,
},
dependents: make(map[*node]struct{}),
virtual: true,
}
klog.V(5).Infof("add virtual node.identity: %s\n\n", ownerNode.identity)
gb.uidToNode.Write(ownerNode)
}
ownerNode.addDependent(n)
if !ok {
// Enqueue the virtual node into attemptToDelete.
// The garbage processor will enqueue a virtual delete
// event to delete it from the graph if API server confirms this
// owner doesn't exist.
gb.attemptToDelete.Add(ownerNode)
}
}
}
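1.2.2 gb.processTransitions
gb.processTransitions checks whether the k8s object is being deleted (its deletionTimestamp is non-nil) and carries the finalizer that corresponds to a deletion policy, then acts accordingly.
Because finalizers are set only for the Foreground and Orphan policies, this method handles only objects deleted with those two policies; objects deleted with the Background policy are not handled here.
If deletionTimestamp is non-nil and the object has the finalizer for the Orphan policy, the node is added to the attemptToOrphan queue for GarbageCollector to consume;
If deletionTimestamp is non-nil and the object has the finalizer for the Foreground policy, n.markDeletingDependents is called to set the node's deletingDependents field to true, meaning that the node's dependents are being deleted, and the node and its dependents are added to the attemptToDelete queue for GarbageCollector to consume.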
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) processTransitions(oldObj interface{}, newAccessor metav1.Object, n *node) {
if startsWaitingForDependentsOrphaned(oldObj, newAccessor) {
klog.V(5).Infof("add %s to the attemptToOrphan", n.identity)
gb.attemptToOrphan.Add(n)
return
}
if startsWaitingForDependentsDeleted(oldObj, newAccessor) {
klog.V(2).Infof("add %s to the attemptToDelete, because it's waiting for its dependents to be deleted", n.identity)
// if the n is added as a "virtual" node, its deletingDependents field is not properly set, so always set it here.
n.markDeletingDependents()
for dep := range n.dependents {
gb.attemptToDelete.Add(dep)
}
gb.attemptToDelete.Add(n)
}
}
func startsWaitingForDependentsOrphaned(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerOrphanDependents)
}
func startsWaitingForDependentsDeleted(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerDeleteDependents)
}
func deletionStartsWithFinalizer(oldObj interface{}, newAccessor metav1.Object, matchingFinalizer string) bool {
// if the new object isn't being deleted, or doesn't have the finalizer we're interested in, return false
if !beingDeleted(newAccessor) || !hasFinalizer(newAccessor, matchingFinalizer) {
return false
}
// if the old object is nil, or wasn't being deleted, or didn't have the finalizer, return true
if oldObj == nil {
return true
}
oldAccessor, err := meta.Accessor(oldObj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access oldObj: %v", err))
return false
}
return !beingDeleted(oldAccessor) || !hasFinalizer(oldAccessor, matchingFinalizer)
}
func beingDeleted(accessor metav1.Object) bool {
return accessor.GetDeletionTimestamp() != nil
}
func hasFinalizer(accessor metav1.Object, matchingFinalizer string) bool {
finalizers := accessor.GetFinalizers()
for _, finalizer := range finalizers {
if finalizer == matchingFinalizer {
return true
}
}
return false
}
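One detail of deletionStartsWithFinalizer is easy to miss: it reports true only on the transition into the "being deleted with this finalizer" state, not on every later update of an object that is already in that state. The sketch below reproduces that check with a simplified, hypothetical obj type (a boolean instead of deletionTimestamp) so that the four cases can be run in isolation.

```go
package main

import "fmt"

// obj is a simplified, hypothetical stand-in for object metadata.
type obj struct {
	deleting   bool     // true when deletionTimestamp would be non-nil
	finalizers []string // values of metadata.finalizers
}

func hasFinalizer(o *obj, f string) bool {
	for _, x := range o.finalizers {
		if x == f {
			return true
		}
	}
	return false
}

// startsDeletionWith reports whether newObj has just entered the state
// "being deleted and carrying finalizer f" compared with oldObj.
func startsDeletionWith(oldObj, newObj *obj, f string) bool {
	if !newObj.deleting || !hasFinalizer(newObj, f) {
		return false
	}
	if oldObj == nil {
		return true // first time the object is seen at all
	}
	return !oldObj.deleting || !hasFinalizer(oldObj, f)
}

func main() {
	const fg = "foregroundDeletion"
	live := &obj{}
	deleting := &obj{deleting: true, finalizers: []string{fg}}

	fmt.Println(startsDeletionWith(live, deleting, fg))     // live -> deleting
	fmt.Println(startsDeletionWith(deleting, deleting, fg)) // already deleting
	fmt.Println(startsDeletionWith(nil, deleting, fg))      // no old object
	fmt.Println(startsDeletionWith(live, live, fg))         // not being deleted
}
```

Because of this, an owner is enqueued once when its deletion starts; later updates to the same object do not enqueue it again through this path.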
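1.2.3 gb.removeNode
gb.removeNode removes the object from uidToNode and from the dependents of all of the node's owners. The caller, processGraphChanges, then adds the object's dependents to the attemptToDelete queue for GarbageCollector to process, and checks the node's owners: an owner that is deleting its dependents may be blocked waiting for this node to be deleted, so that owner is added to the attemptToDelete queue as well.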
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) removeNode(n *node) {
gb.uidToNode.Delete(n.identity.UID)
gb.removeDependentFromOwners(n, n.owners)
}
func (gb *GraphBuilder) removeDependentFromOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
continue
}
ownerNode.deleteDependent(n)
}
}
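2. GarbageCollector
Next, GarbageCollector. It has 2 main functions:
(1) it processes events in the attemptToDelete queue and, for the foreground and background deletion policies, runs the corresponding reclaim logic and deletes associated objects;
(2) it processes events in the attemptToOrphan queue: for the Orphan policy, it updates the owner's dependents by removing the owner's entry from their OwnerReferences, then updates the owner object to remove the orphan value from its finalizers.
GarbageCollector has 2 key methods:
(1) gc.runAttemptToDeleteWorker: processes events in the attemptToDelete queue and handles reclaiming for the foreground and background policies;
(2) gc.runAttemptToOrphanWorker: processes events in the attemptToOrphan queue and handles reclaiming for the Orphan policy.
2.1 GarbageCollector struct
A brief look at GarbageCollector struct. Its key fields are:
(1) attemptToDelete and attemptToOrphan: GraphBuilder is the producer for these two queues, and GarbageCollector is the consumer.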
// pkg/controller/garbagecollector/garbagecollector.go
type GarbageCollector struct {
...
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
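2.2 GarbageCollector-gc.runAttemptToDeleteWorker
Next, the processing logic of GarbageCollector, using gc.runAttemptToDeleteWorker as the entry point.
runAttemptToDeleteWorker simply calls attemptToDeleteWorker in a loop.
Main logic of attemptToDeleteWorker:
(1) take an object from the attemptToDelete queue;
(2) call gc.attemptToDeleteItem to try to delete the node;
(3) if that fails, put the node back into the attemptToDelete queue to retry. A node that has not yet been observed through an informer event is requeued as well.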
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
for gc.attemptToDeleteWorker() {
}
}
func (gc *GarbageCollector) attemptToDeleteWorker() bool {
item, quit := gc.attemptToDelete.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToDelete.Done(item)
n, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
err := gc.attemptToDeleteItem(n)
if err != nil {
if _, ok := err.(*restMappingError); ok {
// There are at least two ways this can happen:
// 1. The reference is to an object of a custom type that has not yet been
// recognized by gc.restMapper (this is a transient error).
// 2. The reference is to an invalid group/version. We don't currently
// have a way to distinguish this from a valid type we will recognize
// after the next discovery sync.
// For now, record the error and retry.
klog.V(5).Infof("error syncing item %s: %v", n, err)
} else {
utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
}
// retry if garbage collection of an object failed.
gc.attemptToDelete.AddRateLimited(item)
} else if !n.isObserved() {
// requeue if item hasn't been observed via an informer event yet.
// otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
// see https://issue.k8s.io/56121
klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
gc.attemptToDelete.AddRateLimited(item)
}
return true
}
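2.2.1 gc.attemptToDeleteItem
(1) check whether the node is being deleted; if it is and it is not deleting its dependents, return and wait for the final deletion;
(2) fetch the object for the node from the apiserver; if it is not found, or its UID does not match, enqueue a virtual delete event so that GraphBuilder removes the node from the graph;
(3) call item.isDeletingDependents, which uses the node's deletingDependents field to tell whether the node is currently deleting its dependents. If so, call gc.processDeletingDependentsItem to handle the dependents: check whether the node's blockingDependents have all been deleted; if they have, remove the related finalizer from the node's object, otherwise add the remaining blockingDependents to the attemptToDelete queue;
As described in the GraphBuilder section, when GraphBuilder processes events in graphChanges, processTransitions calls n.markDeletingDependents to set the node's deletingDependents field to true;
(4) call gc.classifyReferences to sort the node's owners into 3 groups: solid (the owner exists and is not being deleted), dangling (the owner does not exist), and waitingForDependentsDeletion (the owner exists, is being deleted, and is waiting for its dependents to be deleted);
(5) what happens next depends on the number of solid, dangling, and waitingForDependentsDeletion owners;
(6) case 1: solid is non-empty, that is, the node has at least one owner that exists and is not being deleted. The object cannot be reclaimed yet; the owners in the dangling and waitingForDependentsDeletion lists are removed from the node's ownerReferences;
(7) case 2: solid is empty, the node has an owner in the waitingForDependentsDeletion state, and the node's own dependents have not all been deleted. The node is deleted with the foreground policy;
(8) otherwise, the default branch applies: the object is deleted through the apiserver with a policy chosen from the finalizers already on it (Orphan if it has the orphan finalizer, Foreground if it has the foreground finalizer, Background otherwise).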
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
klog.V(2).Infof("processing item %s", item.identity)
// "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
if item.isBeingDeleted() && !item.isDeletingDependents() {
klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
return nil
}
// TODO: It's only necessary to talk to the API server if this is a
// "virtual" node. The local graph could lag behind the real status, but in
// practice, the difference is small.
latest, err := gc.getObject(item.identity)
switch {
case errors.IsNotFound(err):
// the GraphBuilder can add "virtual" node for an owner that doesn't
// exist yet, so we need to enqueue a virtual Delete event to remove
// the virtual node from GraphBuilder.uidToNode.
klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
case err != nil:
return err
}
if latest.GetUID() != item.identity.UID {
klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
}
// TODO: attemptToOrphanWorker() routine is similar. Consider merging
// attemptToOrphanWorker() into attemptToDeleteItem() as well.
if item.isDeletingDependents() {
return gc.processDeletingDependentsItem(item)
}
// compute if we should delete the item
ownerReferences := latest.GetOwnerReferences()
if len(ownerReferences) == 0 {
klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
return nil
}
solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
if err != nil {
return err
}
klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)
switch {
case len(solid) != 0:
klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", item.identity, solid)
if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
return nil
}
klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
// waitingForDependentsDeletion needs to be deleted from the
// ownerReferences, otherwise the referenced objects will be stuck with
// the FinalizerDeletingDependents and never get deleted.
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
_, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
})
return err
case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
deps := item.getDependents()
for _, dep := range deps {
if dep.isDeletingDependents() {
// this circle detection has false positives, we need to
// apply a more rigorous detection if this turns out to be a
// problem.
// there are multiple workers run attemptToDeleteItem in
// parallel, the circle detection can fail in a race condition.
klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
patch, err := item.unblockOwnerReferencesStrategicMergePatch()
if err != nil {
return err
}
if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
return err
}
break
}
}
klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
// the deletion event will be observed by the graphBuilder, so the item
// will be processed again in processDeletingDependentsItem. If it
// doesn't have dependents, the function will remove the
// FinalizerDeletingDependents from the item, resulting in the final
// deletion of the item.
policy := metav1.DeletePropagationForeground
return gc.deleteObject(item.identity, &policy)
default:
// item doesn't have any solid owner, so it needs to be garbage
// collected. Also, none of item's owners is waiting for the deletion of
// the dependents, so set propagationPolicy based on existing finalizers.
var policy metav1.DeletionPropagation
switch {
case hasOrphanFinalizer(latest):
// if an existing orphan finalizer is already on the object, honor it.
policy = metav1.DeletePropagationOrphan
case hasDeleteDependentsFinalizer(latest):
// if an existing foreground finalizer is already on the object, honor it.
policy = metav1.DeletePropagationForeground
default:
// otherwise, default to background.
policy = metav1.DeletePropagationBackground
}
klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
return gc.deleteObject(item.identity, &policy)
}
}
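The three-way switch at the end of attemptToDeleteItem can be condensed into a small decision function. The following is a sketch only: it works on plain counts and finalizer strings, and the action labels, decide, and policyFromFinalizers are hypothetical names made up for this illustration.

```go
package main

import "fmt"

// action is a hypothetical label for the outcome of the switch.
type action string

const (
	keep               action = "keep object"
	keepAndPruneRefs   action = "keep object, remove dangling and waiting owner references"
	deleteInForeground action = "delete object with Foreground policy"
	deleteByFinalizer  action = "delete object, policy chosen from its finalizers"
)

// decide mirrors the order of the cases: solid owners first, then owners
// waiting for dependents while the item itself still has dependents,
// then the default branch.
func decide(solid, dangling, waiting, dependents int) action {
	switch {
	case solid != 0:
		if dangling == 0 && waiting == 0 {
			return keep
		}
		return keepAndPruneRefs
	case waiting != 0 && dependents != 0:
		return deleteInForeground
	default:
		return deleteByFinalizer
	}
}

// policyFromFinalizers mirrors the default branch: an orphan finalizer
// wins, then a foreground finalizer, otherwise Background.
func policyFromFinalizers(finalizers []string) string {
	has := func(f string) bool {
		for _, x := range finalizers {
			if x == f {
				return true
			}
		}
		return false
	}
	switch {
	case has("orphan"):
		return "Orphan"
	case has("foregroundDeletion"):
		return "Foreground"
	default:
		return "Background"
	}
}

func main() {
	fmt.Println(decide(1, 0, 0, 2)) // owner still alive
	fmt.Println(decide(1, 1, 0, 0)) // one live owner, one stale reference
	fmt.Println(decide(0, 0, 1, 3)) // owner waits, item has dependents
	fmt.Println(decide(0, 1, 0, 0)) // only dangling owners
	fmt.Println(policyFromFinalizers(nil))
}
```

Reading the cases in this order makes it clear that a single live owner is enough to keep an object, no matter how many of its other owners are gone.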
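gc.processDeletingDependentsItem
Main logic: check whether the node's blockingDependents (the dependents that block the owner's deletion) have all been deleted. If they have, remove the related finalizer from the node's object (once the finalizer is removed, kube-apiserver deletes the object); otherwise, add the remaining blockingDependents to the attemptToDelete queue.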
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
blockingDependents := item.blockingDependents()
if len(blockingDependents) == 0 {
klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
}
for _, dep := range blockingDependents {
if !dep.isDeletingDependents() {
klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
gc.attemptToDelete.Add(dep)
}
}
return nil
}
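item.blockingDependents
item.blockingDependents returns the dependents that block the deletion of the node. Whether a dependent blocks its owner's deletion depends on the blockOwnerDeletion field in the dependent's ownerReferences entry for that owner: if it is true, the dependent blocks the owner's deletion.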
// pkg/controller/garbagecollector/graph.go
func (n *node) blockingDependents() []*node {
dependents := n.getDependents()
var ret []*node
for _, dep := range dependents {
for _, owner := range dep.owners {
if owner.UID == n.identity.UID && owner.BlockOwnerDeletion != nil && *owner.BlockOwnerDeletion {
ret = append(ret, dep)
}
}
}
return ret
}
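Since blockOwnerDeletion is a pointer, a reference can be in one of three states: true, false, or unset. Only the first blocks the owner. A small sketch with hypothetical simplified types (ownerRef, dep, and blocking are illustrative names, and the object names are invented) shows the filter on its own:

```go
package main

import "fmt"

// ownerRef and dep are hypothetical, trimmed-down stand-ins for an
// owner reference and a dependent object.
type ownerRef struct {
	uid                string
	blockOwnerDeletion *bool
}

type dep struct {
	name   string
	owners []ownerRef
}

// blocking returns the names of the dependents whose reference to
// ownerUID has blockOwnerDeletion set and true.
func blocking(ownerUID string, deps []dep) []string {
	var out []string
	for _, d := range deps {
		for _, o := range d.owners {
			if o.uid == ownerUID && o.blockOwnerDeletion != nil && *o.blockOwnerDeletion {
				out = append(out, d.name)
			}
		}
	}
	return out
}

func main() {
	yes, no := true, false
	deps := []dep{
		{name: "rs-a", owners: []ownerRef{{uid: "deploy-1", blockOwnerDeletion: &yes}}},
		{name: "rs-b", owners: []ownerRef{{uid: "deploy-1", blockOwnerDeletion: &no}}},
		{name: "rs-c", owners: []ownerRef{{uid: "deploy-1"}}}, // field unset
	}
	fmt.Println(blocking("deploy-1", deps))
}
```

In this example only rs-a is returned; a false or missing blockOwnerDeletion lets the owner's foreground deletion finish without waiting for that dependent.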
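2.3 GarbageCollector-gc.runAttemptToOrphanWorker
gc.runAttemptToOrphanWorker handles nodes deleted with the orphan policy.
gc.runAttemptToOrphanWorker simply calls gc.attemptToOrphanWorker in a loop.
Main logic of gc.attemptToOrphanWorker:
(1) take an owner from the attemptToOrphan queue;
(2) call gc.orphanDependents to update the owner's dependents, removing the owner's entry from their OwnerReferences; on failure, put the owner back into the attemptToOrphan queue;
(3) call gc.removeFinalizer to update the owner object and remove the orphan value from its finalizers.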
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
for gc.attemptToOrphanWorker() {
}
}
func (gc *GarbageCollector) attemptToOrphanWorker() bool {
item, quit := gc.attemptToOrphan.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToOrphan.Done(item)
owner, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
// we don't need to lock each element, because they never get updated
owner.dependentsLock.RLock()
dependents := make([]*node, 0, len(owner.dependents))
for dependent := range owner.dependents {
dependents = append(dependents, dependent)
}
owner.dependentsLock.RUnlock()
err := gc.orphanDependents(owner.identity, dependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
return true
}
// update the owner, remove "orphaningFinalizer" from its finalizers list
err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
}
return true
}
2.3.1 gc.orphanDependents
Main logic: update the dependents of the given owner by removing the owner's entry from their OwnerReferences. Each dependent is handled in its own goroutine to speed things up.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
errCh := make(chan error, len(dependents))
wg := sync.WaitGroup{}
wg.Add(len(dependents))
for i := range dependents {
go func(dependent *node) {
defer wg.Done()
// the dependent.identity.UID is used as precondition
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
})
// note that if the target ownerReference doesn't exist in the
// dependent, strategic merge patch will NOT return an error.
if err != nil && !errors.IsNotFound(err) {
errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
}
}(dependents[i])
}
wg.Wait()
close(errCh)
var errorsSlice []error
for e := range errCh {
errorsSlice = append(errorsSlice, e)
}
if len(errorsSlice) != 0 {
return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
}
klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
return nil
}
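Summary
First, a recap of the architecture and core processing logic of the garbage collector.
The garbage collector is made up of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan).
Events from list/watch against the apiserver are put into the graphChanges queue. GraphBuilder takes events from graphChanges, builds the object dependency graph, and, according to the deletion policy, puts associated objects into the attemptToDelete or attemptToOrphan queue. GarbageCollector then takes events from those two queues, consults the dependency graph, and finally reclaims the objects.
Next, a summary of how a node and its object are deleted under each of the 3 deletion policies.
Foreground deletion
Foreground is a cascading deletion policy: the garbage collector deletes all of the object's dependents.
When an object is deleted with the Foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all of the object's blocking dependents (dependents whose ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from metadata.finalizers, and the owner object is then deleted.
Background deletion
Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.
When an object is deleted with the Background policy, no related finalizer is set on it (finalizers are set only for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim them.
Orphan deletion
Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.
When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of all of its dependents. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only then can the owner be deleted.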