Kubernetes garbage collector (GC) analysis
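Introduction to the garbage collector
The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides, based on an object's deletion policy, whether to delete the objects associated with it.
On deleting associated objects: when an owner is deleted with a cascading deletion policy, the dependent objects of that owner are deleted along with it.
On dependency relationships: the garbage collector watches resource object events and uses the ownerReferences value in each object to build the dependency relationships between objects, that is, the relationships between owner and dependent.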
About owner and dependent
Creating a Deployment serves as the example here.
After a Deployment is created, kube-controller-manager creates a ReplicaSet for it and automatically writes the Deployment's information into the ReplicaSet's ownerReferences. In the example below, the owner of ReplicaSet test-1-59d7f45ffb is Deployment test-1, and the dependent of Deployment test-1 is ReplicaSet test-1-59d7f45ffb.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...
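Likewise, after the ReplicaSet is created, kube-controller-manager creates Pods for it, and each Pod carries the ReplicaSet's information in its ownerReferences: the ReplicaSet is the owner of the Pods, and the Pods are dependents of the ReplicaSet.
The ownerReferences value in an object therefore specifies the relationship between owner and dependent.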
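garbage collector architecture
The detailed architecture and core processing logic of the garbage collector are shown in the figure below.
[Figure: garbage collector architecture and core processing logic]
The most important code in the garbage collector lives in two files: garbagecollector.go and graph_builder.go.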
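The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector) and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan):
1 graph
(1) uidToNode: the object dependency graph, maintained by GraphBuilder, which holds the dependency relationships between all objects. Every Kubernetes object corresponds to one node in the graph, and every node maintains a list of owners and a list of dependents.
Example: given Deployment A, ReplicaSet B (owner: Deployment A) and Pod C (owner: ReplicaSet B), the dependency graph contains 3 nodes, A, B and C:
A corresponds to a node with no owner, and its dependents list contains B;
B corresponds to a node whose owners list contains A and whose dependents list contains C;
C corresponds to a node whose owners list contains B and which has no dependents.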
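The A/B/C example can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the node and graph types and the insert and dependentUIDs helpers are hypothetical stand-ins, not the real types from graph.go, and UIDs are plain strings.

```go
package main

import "fmt"

// node is a simplified stand-in for the GC graph node: it records the UIDs
// of its owners and a set of pointers to its dependents.
type node struct {
	uid        string
	owners     []string
	dependents map[*node]struct{}
}

// graph maps an object UID to its node, in the spirit of uidToNode.
type graph map[string]*node

// insert adds a node and registers it in the dependents set of every owner
// that is already present in the graph.
func (g graph) insert(uid string, owners ...string) *node {
	n := &node{uid: uid, owners: owners, dependents: map[*node]struct{}{}}
	g[uid] = n
	for _, o := range owners {
		if owner, ok := g[o]; ok {
			owner.dependents[n] = struct{}{}
		}
	}
	return n
}

// dependentUIDs returns the UIDs of the direct dependents of a known node.
func (g graph) dependentUIDs(uid string) []string {
	var out []string
	for d := range g[uid].dependents {
		out = append(out, d.uid)
	}
	return out
}

func main() {
	g := graph{}
	g.insert("A")      // Deployment A, no owner
	g.insert("B", "A") // ReplicaSet B, owned by A
	g.insert("C", "B") // Pod C, owned by B
	for _, uid := range []string{"A", "B", "C"} {
		fmt.Println(uid, "owners:", g[uid].owners, "dependents:", g.dependentUIDs(uid))
	}
}
```

The sketch ignores owners that are not yet in the graph; the real GraphBuilder handles that case with virtual nodes, as shown later in gb.insertNode.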
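2 processors
(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim and delete objects. GraphBuilder consumes events from the graphChanges queue and, based on the ownerReferences value in each resource object, builds, updates and deletes entries in the dependency graph, that is, the graph of relationships between owner and dependent. It then acts as a producer and puts events into the attemptToDelete or attemptToOrphan queue, triggering GarbageCollector to check whether associated objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it reclaims objects.
(2) GarbageCollector: reclaims and deletes objects. As a consumer, GarbageCollector takes events from the attemptToDelete and attemptToOrphan queues and processes them. If an object has been deleted with a cascading deletion policy, its associated objects are reclaimed as well: when an owner is deleted with a cascading deletion policy, the dependent objects of that owner are deleted along with it.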
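3 event queues
(1) graphChanges: events obtained by list/watch against the apiserver; produced by the informers and consumed by GraphBuilder;
(2) attemptToDelete: the cascading-deletion event queue; produced by GraphBuilder and consumed by GarbageCollector;
(3) attemptToOrphan: the orphan-deletion event queue; produced by GraphBuilder and consumed by GarbageCollector.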
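The producer/consumer relationship between the three queues can be pictured with a small Go sketch. It is deliberately simplified and rests on assumptions: buffered channels stand in for the rate-limited workqueues used by the real code, and the event type and route function are hypothetical names. Routing here looks only at the finalizer, whereas the real GraphBuilder also reacts to delete events and ownerReferences changes.

```go
package main

import "fmt"

// event is a hypothetical, minimal view of what arrives on graphChanges.
type event struct {
	uid       string
	finalizer string // "", "orphan" or "foregroundDeletion"
}

// route plays the GraphBuilder role: it consumes graphChanges and produces
// into the two downstream queues, then closes them.
func route(graphChanges <-chan event, attemptToDelete, attemptToOrphan chan<- string) {
	for ev := range graphChanges {
		switch ev.finalizer {
		case "orphan":
			attemptToOrphan <- ev.uid
		case "foregroundDeletion":
			attemptToDelete <- ev.uid
		}
	}
	close(attemptToDelete)
	close(attemptToOrphan)
}

func main() {
	// Buffers are large enough that route never blocks in this example.
	graphChanges := make(chan event, 3)
	attemptToDelete := make(chan string, 3)
	attemptToOrphan := make(chan string, 3)

	graphChanges <- event{uid: "deploy-a", finalizer: "foregroundDeletion"}
	graphChanges <- event{uid: "deploy-b", finalizer: "orphan"}
	graphChanges <- event{uid: "deploy-c"} // not being deleted, nothing to enqueue
	close(graphChanges)

	route(graphChanges, attemptToDelete, attemptToOrphan)

	// The GarbageCollector role: consume both queues.
	for uid := range attemptToDelete {
		fmt.Println("attemptToDelete:", uid)
	}
	for uid := range attemptToOrphan {
		fmt.Println("attemptToOrphan:", uid)
	}
}
```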
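Object deletion policies
Kubernetes has three object deletion policies: Orphan, Foreground and Background. A policy can be specified when deleting an object. The three policies are described below.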
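Foreground deletion
Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.
When an object is deleted with the Foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all of the object's blocking dependents (those with ownerReference.blockOwnerDeletion=true), it removes the value foregroundDeletion from the object's metadata.finalizers, and the owner object is then deleted.
Taking a Deployment as an example, Foreground deletion removes objects in the order Pod->ReplicaSet->Deployment.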
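Background deletion
Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.
When an object is deleted with the Background policy, no related Finalizer is set on it (a Finalizer is set only for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim those dependents.
Taking a Deployment as an example, Background deletion removes objects in the order Deployment->ReplicaSet->Pod.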
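Orphan deletion
Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.
When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of all its dependents. GarbageCollector then removes the value orphan from the owner's metadata.finalizers, and only then can the owner be deleted.
Taking a Deployment as an example, Orphan deletion removes only the Deployment; the corresponding ReplicaSet and Pods are kept.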
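Specifying the deletion policy when deleting an object
If no deletion policy is specified when deleting an object, the default policy is used: Background.
(1) Specify the Background policy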
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' \
-H "Content-Type: application/json"
(2) Specify the Foreground policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
-H "Content-Type: application/json"
(3) Specify the Orphan policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
-H "Content-Type: application/json"
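The request bodies in the three curl commands differ only in propagationPolicy. As a small illustration, the body can be built programmatically with the Go standard library. The deleteOptions struct and the deleteBody helper below are hypothetical names for this sketch; they mirror only the three fields that appear in the curl bodies and are not the real metav1.DeleteOptions type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions mirrors only the three fields used in the curl bodies.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteBody returns the JSON body for a DELETE request with the given
// propagation policy, rejecting values other than the three known policies.
func deleteBody(policy string) (string, error) {
	switch policy {
	case "Background", "Foreground", "Orphan":
	default:
		return "", fmt.Errorf("unknown propagation policy %q", policy)
	}
	b, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, p := range []string{"Background", "Foreground", "Orphan"} {
		body, err := deleteBody(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(body)
	}
}
```

The resulting string can be passed as the -d argument of the curl commands, or as the body of an HTTP DELETE request.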
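The source code analysis of the garbage collector is split into two parts:
(1) startup analysis;
(2) core processing logic analysis.
The previous post analysed the startup of the garbage collector; this post analyses its core processing logic.
garbage collector source code analysis: processing logic
Based on tag v1.17.4
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4
As mentioned earlier, the most important code in the garbage collector lives in garbagecollector.go and graph_builder.go, that is, in the GarbageCollector struct and the GraphBuilder struct, so the analysis below is split into two parts accordingly.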
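1.GraphBuilder
First, GraphBuilder.
GraphBuilder has two main functions:
(1) based on resource events from the informers, it maintains the dependency relationships of all objects in its uidToNode field;
(2) it processes the events in graphChanges and, as a producer, puts events into the attemptToDelete and attemptToOrphan queues, triggering the consumer GarbageCollector to reclaim and delete objects.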
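1.1 GraphBuilder struct
A quick look at the GraphBuilder struct. Its key fields and their roles are:
(1) graphChanges: events observed by the informers are put into graphChanges, and GraphBuilder, as consumer, processes the events in this queue;
(2) uidToNode (the object dependency graph): keyed by object UID, it holds the dependency relationships of all objects, that is, the relationships between owner and dependent described earlier. In other words, GraphBuilder maintains a dependency graph of all objects, and GarbageCollector relies on this graph when it reclaims objects;
(3) attemptToDelete and attemptToOrphan: GraphBuilder, as producer, puts events into these two queues, and GarbageCollector, as consumer, processes them.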
// pkg/controller/garbagecollector/graph_builder.go
type GraphBuilder struct {
...
// monitors are the producer of the graphChanges queue, graphBuilder alters
// the in-memory graph according to the changes.
graphChanges workqueue.RateLimitingInterface
// uidToNode doesn't require a lock to protect, because only the
// single-threaded GraphBuilder.processGraphChanges() reads/writes it.
uidToNode *concurrentUIDToNode
// GraphBuilder is the producer of attemptToDelete and attemptToOrphan, GC is the consumer.
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
// pkg/controller/garbagecollector/graph.go
type concurrentUIDToNode struct {
uidToNodeLock sync.RWMutex
uidToNode map[types.UID]*node
}
// pkg/controller/garbagecollector/graph.go
type node struct {
...
dependents map[*node]struct{}
...
owners []metav1.OwnerReference
}
As the struct definitions show, each Kubernetes object corresponds to one node in the dependency graph, and each node maintains an owners list and a dependents set.
1.2 GraphBuilder-gb.processGraphChanges
Next, the processing logic of GraphBuilder, with gb.processGraphChanges as the entry point.
As described earlier, events observed by the informers are put into the graphChanges queue, and GraphBuilder consumes them; processGraphChanges is the method in which GraphBuilder does that.
In this method GraphBuilder is therefore both consumer and producer: it consumes and classifies all events in graphChanges, then produces events into the attemptToDelete and attemptToOrphan queues for GarbageCollector to consume.
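Main logic:
(1) take an event from the graphChanges queue;
(2) read uidToNode to determine whether the object already exists in the dependency graph; the handling below depends on that and on the event type;
(3) if uidToNode does not contain the node and the event is an addEvent or updateEvent, create a node for the object and call gb.insertNode to add the node to uidToNode and to the dependents of each of its owners. Then call gb.processTransitions, which checks whether the object is being deleted and, if so, whether it is being deleted in orphan mode or in foreground mode. The mode is determined from the object's finalizers: a delete request, for a Deployment for example, carries a deletion policy, and kube-apiserver sets the matching finalizer on the object. In orphan mode the node is added to the attemptToOrphan queue; in foreground mode the object and all of its dependents are added to the attemptToDelete queue;
(4) if uidToNode already contains the node and the event is an addEvent or updateEvent, call referencesDiffs to check whether the object's OwnerReferences field has changed; if it has, update the dependency graph accordingly; finally call gb.processTransitions;
(5) if the event is a delete event, call gb.removeNode to remove the object from uidToNode and from the dependents of all owners of the node, then put the object's dependents into the attemptToDelete queue for GarbageCollector to process. Finally check all owners of the node: an owner that is being deleted may be blocked waiting for this node to be deleted, so such an owner is put into the attemptToDelete queue for GarbageCollector to process as well.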
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) runProcessGraphChanges() {
for gb.processGraphChanges() {
}
}
// Dequeueing an event from graphChanges, updating graph, populating dirty_queue.
func (gb *GraphBuilder) processGraphChanges() bool {
item, quit := gb.graphChanges.Get()
if quit {
return false
}
defer gb.graphChanges.Done(item)
event, ok := item.(*event)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect a *event, got %v", item))
return true
}
obj := event.obj
accessor, err := meta.Accessor(obj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access obj: %v", err))
return true
}
klog.V(5).Infof("GraphBuilder process object: %s/%s, namespace %s, name %s, uid %s, event type %v", event.gvk.GroupVersion().String(), event.gvk.Kind, accessor.GetNamespace(), accessor.GetName(), string(accessor.GetUID()), event.eventType)
// Check if the node already exists
existingNode, found := gb.uidToNode.Read(accessor.GetUID())
if found {
// this marks the node as having been observed via an informer event
// 1. this depends on graphChanges only containing add/update events from the actual informer
// 2. this allows things tracking virtual nodes' existence to stop polling and rely on informer events
existingNode.markObserved()
}
switch {
case (event.eventType == addEvent || event.eventType == updateEvent) && !found:
newNode := &node{
identity: objectReference{
OwnerReference: metav1.OwnerReference{
APIVersion: event.gvk.GroupVersion().String(),
Kind: event.gvk.Kind,
UID: accessor.GetUID(),
Name: accessor.GetName(),
},
Namespace: accessor.GetNamespace(),
},
dependents: make(map[*node]struct{}),
owners: accessor.GetOwnerReferences(),
deletingDependents: beingDeleted(accessor) && hasDeleteDependentsFinalizer(accessor),
beingDeleted: beingDeleted(accessor),
}
gb.insertNode(newNode)
// the underlying delta_fifo may combine a creation and a deletion into
// one event, so we need to further process the event.
gb.processTransitions(event.oldObj, accessor, newNode)
case (event.eventType == addEvent || event.eventType == updateEvent) && found:
// handle changes in ownerReferences
added, removed, changed := referencesDiffs(existingNode.owners, accessor.GetOwnerReferences())
if len(added) != 0 || len(removed) != 0 || len(changed) != 0 {
// check if the changed dependency graph unblock owners that are
// waiting for the deletion of their dependents.
gb.addUnblockedOwnersToDeleteQueue(removed, changed)
// update the node itself
existingNode.owners = accessor.GetOwnerReferences()
// Add the node to its new owners' dependent lists.
gb.addDependentToOwners(existingNode, added)
// remove the node from the dependent list of node that are no longer in
// the node's owners list.
gb.removeDependentFromOwners(existingNode, removed)
}
if beingDeleted(accessor) {
existingNode.markBeingDeleted()
}
gb.processTransitions(event.oldObj, accessor, existingNode)
case event.eventType == deleteEvent:
if !found {
klog.V(5).Infof("%v doesn't exist in the graph, this shouldn't happen", accessor.GetUID())
return true
}
// removeNode updates the graph
gb.removeNode(existingNode)
existingNode.dependentsLock.RLock()
defer existingNode.dependentsLock.RUnlock()
if len(existingNode.dependents) > 0 {
gb.absentOwnerCache.Add(accessor.GetUID())
}
for dep := range existingNode.dependents {
gb.attemptToDelete.Add(dep)
}
for _, owner := range existingNode.owners {
ownerNode, found := gb.uidToNode.Read(owner.UID)
if !found || !ownerNode.isDeletingDependents() {
continue
}
// this is to let attempToDeleteItem check if all the owner's
// dependents are deleted, if so, the owner will be deleted.
gb.attemptToDelete.Add(ownerNode)
}
}
return true
}
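The code confirms the Background behaviour described earlier: when an object is deleted with the Background policy, no related Finalizer is set on it (a Finalizer is set only for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim those dependents.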
1.2.1 gb.insertNode
gb.insertNode adds the node to uidToNode and then adds the node to the dependents of each of its owners.
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) insertNode(n *node) {
gb.uidToNode.Write(n)
gb.addDependentToOwners(n, n.owners)
}
func (gb *GraphBuilder) addDependentToOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
// Create a "virtual" node in the graph for the owner if it doesn't
// exist in the graph yet.
ownerNode = &node{
identity: objectReference{
OwnerReference: owner,
Namespace: n.identity.Namespace,
},
dependents: make(map[*node]struct{}),
virtual: true,
}
klog.V(5).Infof("add virtual node.identity: %s\n\n", ownerNode.identity)
gb.uidToNode.Write(ownerNode)
}
ownerNode.addDependent(n)
if !ok {
// Enqueue the virtual node into attemptToDelete.
// The garbage processor will enqueue a virtual delete
// event to delete it from the graph if API server confirms this
// owner doesn't exist.
gb.attemptToDelete.Add(ownerNode)
}
}
}
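1.2.2 gb.processTransitions
gb.processTransitions checks whether the Kubernetes object is being deleted (its deletionTimestamp is non-nil) and carries the finalizer that corresponds to a deletion policy, and then acts accordingly.
Because a Finalizer is set only when the deletion policy is Foreground or Orphan, this method only handles objects deleted with those two policies; objects deleted with the Background policy are not handled here.
If the object's deletionTimestamp is non-nil and it carries the finalizer of the Orphan policy, the corresponding node is added to the attemptToOrphan queue for GarbageCollector to consume.
If the object's deletionTimestamp is non-nil and it carries the finalizer of the Foreground policy, n.markDeletingDependents is called to set the node's deletingDependents field to true, meaning that the node's dependents are being deleted, and the node and its dependents are added to the attemptToDelete queue for GarbageCollector to consume.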
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) processTransitions(oldObj interface{}, newAccessor metav1.Object, n *node) {
if startsWaitingForDependentsOrphaned(oldObj, newAccessor) {
klog.V(5).Infof("add %s to the attemptToOrphan", n.identity)
gb.attemptToOrphan.Add(n)
return
}
if startsWaitingForDependentsDeleted(oldObj, newAccessor) {
klog.V(2).Infof("add %s to the attemptToDelete, because it's waiting for its dependents to be deleted", n.identity)
// if the n is added as a "virtual" node, its deletingDependents field is not properly set, so always set it here.
n.markDeletingDependents()
for dep := range n.dependents {
gb.attemptToDelete.Add(dep)
}
gb.attemptToDelete.Add(n)
}
}
func startsWaitingForDependentsOrphaned(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerOrphanDependents)
}
func startsWaitingForDependentsDeleted(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerDeleteDependents)
}
func deletionStartsWithFinalizer(oldObj interface{}, newAccessor metav1.Object, matchingFinalizer string) bool {
// if the new object isn't being deleted, or doesn't have the finalizer we're interested in, return false
if !beingDeleted(newAccessor) || !hasFinalizer(newAccessor, matchingFinalizer) {
return false
}
// if the old object is nil, or wasn't being deleted, or didn't have the finalizer, return true
if oldObj == nil {
return true
}
oldAccessor, err := meta.Accessor(oldObj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access oldObj: %v", err))
return false
}
return !beingDeleted(oldAccessor) || !hasFinalizer(oldAccessor, matchingFinalizer)
}
func beingDeleted(accessor metav1.Object) bool {
return accessor.GetDeletionTimestamp() != nil
}
func hasFinalizer(accessor metav1.Object, matchingFinalizer string) bool {
finalizers := accessor.GetFinalizers()
for _, finalizer := range finalizers {
if finalizer == matchingFinalizer {
return true
}
}
return false
}
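One detail of deletionStartsWithFinalizer is easy to miss: it fires only on the transition into the "being deleted with this finalizer" state, not on every later update of the same object. The sketch below restates that rule with simplified types so it can be run in isolation. The object struct and the deletionStarts name are hypothetical; a bool replaces the deletionTimestamp check.

```go
package main

import "fmt"

// object is a hypothetical stand-in for the metadata the GC inspects.
type object struct {
	deleting   bool     // true when deletionTimestamp is set
	finalizers []string // e.g. "orphan" or "foregroundDeletion"
}

// hasFinalizer reports whether the object carries the named finalizer.
func hasFinalizer(o *object, name string) bool {
	for _, f := range o.finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// deletionStarts reports whether the update old -> cur is the first one in
// which cur is being deleted with the given finalizer. old may be nil,
// which is the case for add events.
func deletionStarts(old, cur *object, finalizer string) bool {
	if !cur.deleting || !hasFinalizer(cur, finalizer) {
		return false
	}
	if old == nil {
		return true
	}
	return !old.deleting || !hasFinalizer(old, finalizer)
}

func main() {
	live := &object{}
	dying := &object{deleting: true, finalizers: []string{"foregroundDeletion"}}

	fmt.Println(deletionStarts(live, dying, "foregroundDeletion"))  // transition into deletion
	fmt.Println(deletionStarts(dying, dying, "foregroundDeletion")) // already in that state
	fmt.Println(deletionStarts(nil, dying, "foregroundDeletion"))   // add event, no old object
	fmt.Println(deletionStarts(live, dying, "orphan"))              // different finalizer
}
```

Under these assumptions the four calls evaluate to true, false, true and false, which is why an object is enqueued once when foreground or orphan deletion begins rather than on every subsequent update.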
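1.2.3 gb.removeNode
gb.removeNode removes the object from uidToNode and then removes it from the dependents of all owners of the node. Back in processGraphChanges, the object's dependents are then put into the attemptToDelete queue for GarbageCollector to process. Finally all owners of the node are checked: an owner that is being deleted may be blocked waiting for this node to be deleted, so such an owner is put into the attemptToDelete queue for GarbageCollector to process as well.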
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) removeNode(n *node) {
gb.uidToNode.Delete(n.identity.UID)
gb.removeDependentFromOwners(n, n.owners)
}
func (gb *GraphBuilder) removeDependentFromOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
continue
}
ownerNode.deleteDependent(n)
}
}
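2.GarbageCollector
Now, GarbageCollector.
GarbageCollector has two main functions:
(1) it processes the events in the attemptToDelete queue and, according to the foreground or background deletion policy, performs the corresponding reclamation and deletes associated objects;
(2) it processes the events in the attemptToOrphan queue: for the Orphan policy it updates the dependents of the owner, removing the owner's entry from their OwnerReferences, and then updates the owner object itself, removing the Orphan-related finalizer from its finalizers.
The two key methods of GarbageCollector:
(1) gc.runAttemptToDeleteWorker: processes the events in the attemptToDelete queue and handles reclamation for objects deleted with the foreground or background policy;
(2) gc.runAttemptToOrphanWorker: processes the events in the attemptToOrphan queue and handles reclamation for objects deleted with the Orphan policy.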
2.1 GarbageCollector struct
A quick look at the GarbageCollector struct. Its key fields are attemptToDelete and attemptToOrphan: GraphBuilder, as producer, puts events into these two queues, and GarbageCollector, as consumer, processes the events in them.
// pkg/controller/garbagecollector/garbagecollector.go
type GarbageCollector struct {
...
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
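2.2 GarbageCollector-gc.runAttemptToDeleteWorker
Next, the processing logic of GarbageCollector, with gc.runAttemptToDeleteWorker as the entry point.
runAttemptToDeleteWorker simply calls attemptToDeleteWorker in a loop.
Main logic of attemptToDeleteWorker:
(1) take an object from the attemptToDelete queue;
(2) call gc.attemptToDeleteItem to try to delete the node;
(3) if that fails, add the item back to the attemptToDelete queue for a retry.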
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
for gc.attemptToDeleteWorker() {
}
}
func (gc *GarbageCollector) attemptToDeleteWorker() bool {
item, quit := gc.attemptToDelete.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToDelete.Done(item)
n, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
err := gc.attemptToDeleteItem(n)
if err != nil {
if _, ok := err.(*restMappingError); ok {
// There are at least two ways this can happen:
// 1. The reference is to an object of a custom type that has not yet been
// recognized by gc.restMapper (this is a transient error).
// 2. The reference is to an invalid group/version. We don't currently
// have a way to distinguish this from a valid type we will recognize
// after the next discovery sync.
// For now, record the error and retry.
klog.V(5).Infof("error syncing item %s: %v", n, err)
} else {
utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
}
// retry if garbage collection of an object failed.
gc.attemptToDelete.AddRateLimited(item)
} else if !n.isObserved() {
// requeue if item hasn't been observed via an informer event yet.
// otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
// see https://issue.k8s.io/56121
klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
gc.attemptToDelete.AddRateLimited(item)
}
return true
}
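2.2.1 gc.attemptToDeleteItem
Main logic:
(1) check whether the node is being deleted; if it is, and it is not deleting its dependents, return immediately;
(2) get the object that corresponds to the node from the apiserver;
(3) call item.isDeletingDependents, which uses the node's deletingDependents field to tell whether the node is currently deleting its dependents. If it is, call gc.processDeletingDependentsItem to process the dependents further: check whether the node's blockingDependents have all been deleted; if so, remove the related finalizer from the node's object; if not, add the remaining blockingDependents to the attemptToDelete queue;
As noted in the GraphBuilder analysis, when GraphBuilder processes events from graphChanges, processTransitions calls n.markDeletingDependents to set the node's deletingDependents field to true;
(4) call gc.classifyReferences to split the node's owners into 3 categories: solid (the owner exists and is not being deleted), dangling (the owner does not exist) and waitingForDependentsDeletion (the owner exists, is being deleted, and is waiting for its dependents to be deleted);
(5) the handling then depends on the number of solid, dangling and waitingForDependentsDeletion owners;
(6) first case: solid is non-empty, that is, the node has at least one owner that exists and is not being deleted. The object cannot be reclaimed yet, so the owners in the dangling and waitingForDependentsDeletion lists are removed from the node's ownerReferences;
(7) second case: solid is empty, some owner of the node is in the waitingForDependentsDeletion state, and the node's dependents have not all been deleted. The node is then deleted with the foreground deletion policy;
(8) when neither case applies (no solid owner, and either no owner is waiting for its dependents or the node has no dependents), the default handling is used: the object is deleted through the apiserver with the policy indicated by its existing finalizers (Orphan or Foreground), falling back to Background.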
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
klog.V(2).Infof("processing item %s", item.identity)
// "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
if item.isBeingDeleted() && !item.isDeletingDependents() {
klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
return nil
}
// TODO: It's only necessary to talk to the API server if this is a
// "virtual" node. The local graph could lag behind the real status, but in
// practice, the difference is small.
latest, err := gc.getObject(item.identity)
switch {
case errors.IsNotFound(err):
// the GraphBuilder can add "virtual" node for an owner that doesn't
// exist yet, so we need to enqueue a virtual Delete event to remove
// the virtual node from GraphBuilder.uidToNode.
klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
case err != nil:
return err
}
if latest.GetUID() != item.identity.UID {
klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
}
// TODO: attemptToOrphanWorker() routine is similar. Consider merging
// attemptToOrphanWorker() into attemptToDeleteItem() as well.
if item.isDeletingDependents() {
return gc.processDeletingDependentsItem(item)
}
// compute if we should delete the item
ownerReferences := latest.GetOwnerReferences()
if len(ownerReferences) == 0 {
klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
return nil
}
solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
if err != nil {
return err
}
klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)
switch {
case len(solid) != 0:
klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", item.identity, solid)
if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
return nil
}
klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
// waitingForDependentsDeletion needs to be deleted from the
// ownerReferences, otherwise the referenced objects will be stuck with
// the FinalizerDeletingDependents and never get deleted.
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
_, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
})
return err
case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
deps := item.getDependents()
for _, dep := range deps {
if dep.isDeletingDependents() {
// this circle detection has false positives, we need to
// apply a more rigorous detection if this turns out to be a
// problem.
// there are multiple workers run attemptToDeleteItem in
// parallel, the circle detection can fail in a race condition.
klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
patch, err := item.unblockOwnerReferencesStrategicMergePatch()
if err != nil {
return err
}
if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
return err
}
break
}
}
klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
// the deletion event will be observed by the graphBuilder, so the item
// will be processed again in processDeletingDependentsItem. If it
// doesn't have dependents, the function will remove the
// FinalizerDeletingDependents from the item, resulting in the final
// deletion of the item.
policy := metav1.DeletePropagationForeground
return gc.deleteObject(item.identity, &policy)
default:
// item doesn't have any solid owner, so it needs to be garbage
// collected. Also, none of item's owners is waiting for the deletion of
// the dependents, so set propagationPolicy based on existing finalizers.
var policy metav1.DeletionPropagation
switch {
case hasOrphanFinalizer(latest):
// if an existing orphan finalizer is already on the object, honor it.
policy = metav1.DeletePropagationOrphan
case hasDeleteDependentsFinalizer(latest):
// if an existing foreground finalizer is already on the object, honor it.
policy = metav1.DeletePropagationForeground
default:
// otherwise, default to background.
policy = metav1.DeletePropagationBackground
}
klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
return gc.deleteObject(item.identity, &policy)
}
}
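The three-way switch at the end of attemptToDeleteItem can be condensed into a small decision function. The sketch below is a simplified model, not the real implementation: ownerState and decide are hypothetical names, owner lookups against the apiserver are replaced by precomputed states, and the returned strings merely label the action the real code would take.

```go
package main

import "fmt"

// ownerState is a hypothetical label for what the GC learned about one owner.
type ownerState int

const (
	ownerDangling ownerState = iota // owner no longer exists
	ownerSolid                      // owner exists and is not being deleted
	ownerWaiting                    // owner is being deleted and waits for its dependents
)

// decide returns the action for an item given the states of its owners, the
// number of its own dependents and the finalizers currently on the object.
func decide(owners []ownerState, dependents int, finalizers []string) string {
	if len(owners) == 0 {
		return "skip" // no ownerReferences, nothing to collect
	}
	solid, waiting := 0, 0
	for _, o := range owners {
		switch o {
		case ownerSolid:
			solid++
		case ownerWaiting:
			waiting++
		}
	}
	switch {
	case solid != 0:
		return "keep" // real code also prunes dangling and waiting references
	case waiting != 0 && dependents != 0:
		return "delete:Foreground"
	}
	// Default case: honor an existing finalizer, orphan first.
	for _, f := range finalizers {
		if f == "orphan" {
			return "delete:Orphan"
		}
	}
	for _, f := range finalizers {
		if f == "foregroundDeletion" {
			return "delete:Foreground"
		}
	}
	return "delete:Background"
}

func main() {
	fmt.Println(decide([]ownerState{ownerSolid, ownerDangling}, 0, nil))
	fmt.Println(decide([]ownerState{ownerWaiting}, 2, nil))
	fmt.Println(decide([]ownerState{ownerDangling}, 0, []string{"orphan"}))
	fmt.Println(decide([]ownerState{ownerDangling}, 0, nil))
}
```

Taking a ReplicaSet whose Deployment was deleted in the background as an example, its only owner is dangling and it carries no finalizer, so the function falls through to the Background branch, which matches the Deployment->ReplicaSet->Pod order described earlier.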
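gc.processDeletingDependentsItem
Main logic: check whether the node's blockingDependents (the dependents that block the deletion of the owner) have all been deleted. If so, remove the related finalizer from the node's object (once the finalizer is removed, kube-apiserver deletes the object). If not, add the remaining blockingDependents that are not already deleting their own dependents to the attemptToDelete queue.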
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
blockingDependents := item.blockingDependents()
if len(blockingDependents) == 0 {
klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
}
for _, dep := range blockingDependents {
if !dep.isDeletingDependents() {
klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
gc.attemptToDelete.Add(dep)
}
}
return nil
}
item.blockingDependents
item.blockingDependents returns the dependents that block the deletion of the node. Whether a dependent blocks the deletion of its owner depends on the blockOwnerDeletion value in the dependent's ownerReferences: if it is true, the dependent blocks the deletion of the owner.
// pkg/controller/garbagecollector/graph.go
func (n *node) blockingDependents() []*node {
dependents := n.getDependents()
var ret []*node
for _, dep := range dependents {
for _, owner := range dep.owners {
if owner.UID == n.identity.UID && owner.BlockOwnerDeletion != nil && *owner.BlockOwnerDeletion {
ret = append(ret, dep)
}
}
}
return ret
}
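Because BlockOwnerDeletion is a pointer to bool, an ownerReference can be in one of three states: unset, false or true, and only true blocks the owner. The following sketch isolates that check. The ownerRef and dependent types and the blocking function are hypothetical simplifications; unlike the original loop, it stops at the first matching reference so that a dependent is listed once.

```go
package main

import "fmt"

// ownerRef is a simplified stand-in for metav1.OwnerReference.
type ownerRef struct {
	uid                string
	blockOwnerDeletion *bool // nil means the field was not set
}

// dependent is a hypothetical object with a name and its owner references.
type dependent struct {
	name   string
	owners []ownerRef
}

// blocking returns the names of the dependents that block deletion of the
// owner with the given UID.
func blocking(ownerUID string, deps []dependent) []string {
	var out []string
	for _, d := range deps {
		for _, o := range d.owners {
			if o.uid == ownerUID && o.blockOwnerDeletion != nil && *o.blockOwnerDeletion {
				out = append(out, d.name)
				break // one matching reference is enough
			}
		}
	}
	return out
}

func main() {
	yes, no := true, false
	deps := []dependent{
		{name: "pod-a", owners: []ownerRef{{uid: "rs-1", blockOwnerDeletion: &yes}}},
		{name: "pod-b", owners: []ownerRef{{uid: "rs-1", blockOwnerDeletion: &no}}},
		{name: "pod-c", owners: []ownerRef{{uid: "rs-1"}}},                           // unset
		{name: "pod-d", owners: []ownerRef{{uid: "rs-2", blockOwnerDeletion: &yes}}}, // other owner
	}
	fmt.Println(blocking("rs-1", deps))
}
```

With this input only pod-a blocks rs-1: pod-b sets the field to false, pod-c leaves it unset, and pod-d blocks a different owner.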
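2.3 GarbageCollector-gc.runAttemptToOrphanWorker
gc.runAttemptToOrphanWorker handles nodes that are deleted with the orphan deletion policy.
gc.runAttemptToOrphanWorker simply calls gc.attemptToOrphanWorker in a loop.
Main logic of gc.attemptToOrphanWorker:
(1) take an object from the attemptToOrphan queue;
(2) call gc.orphanDependents: update the dependents of the owner, removing the owner's entry from their OwnerReferences; on failure, add the owner back to the attemptToOrphan queue;
(3) call gc.removeFinalizer: update the owner object, removing the Orphan-related finalizer from its finalizers.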
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
for gc.attemptToOrphanWorker() {
}
}
func (gc *GarbageCollector) attemptToOrphanWorker() bool {
item, quit := gc.attemptToOrphan.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToOrphan.Done(item)
owner, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
// we don't need to lock each element, because they never get updated
owner.dependentsLock.RLock()
dependents := make([]*node, 0, len(owner.dependents))
for dependent := range owner.dependents {
dependents = append(dependents, dependent)
}
owner.dependentsLock.RUnlock()
err := gc.orphanDependents(owner.identity, dependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
return true
}
// update the owner, remove "orphaningFinalizer" from its finalizers list
err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
}
return true
}
2.3.1 gc.orphanDependents
Main logic: update the dependents of the given owner, removing the owner's entry from their OwnerReferences. Each dependent is handled in its own goroutine to speed up processing.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
errCh := make(chan error, len(dependents))
wg := sync.WaitGroup{}
wg.Add(len(dependents))
for i := range dependents {
go func(dependent *node) {
defer wg.Done()
// the dependent.identity.UID is used as precondition
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
})
// note that if the target ownerReference doesn't exist in the
// dependent, strategic merge patch will NOT return an error.
if err != nil && !errors.IsNotFound(err) {
errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
}
}(dependents[i])
}
wg.Wait()
close(errCh)
var errorsSlice []error
for e := range errCh {
errorsSlice = append(errorsSlice, e)
}
if len(errorsSlice) != 0 {
return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
}
klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
return nil
}
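The concurrency pattern in orphanDependents (one goroutine per dependent, a buffered error channel sized to the number of dependents, and a WaitGroup) can be tried out in isolation. The sketch below reproduces only that pattern under stated assumptions: forEach and errRejected are hypothetical names, and a plain function value stands in for the patch call against the apiserver.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errRejected is a hypothetical error used to simulate a failed patch.
var errRejected = errors.New("patch rejected")

// forEach runs fn for every item in its own goroutine and returns the
// errors that occurred, in no particular order.
func forEach(items []string, fn func(string) error) []error {
	// The buffer holds one error per item, so no goroutine blocks on send.
	errCh := make(chan error, len(items))
	var wg sync.WaitGroup
	wg.Add(len(items))
	for _, it := range items {
		go func(item string) {
			defer wg.Done()
			if err := fn(item); err != nil {
				errCh <- fmt.Errorf("orphaning %s failed: %w", item, err)
			}
		}(it)
	}
	wg.Wait()
	close(errCh) // safe: all senders have finished
	var errs []error
	for e := range errCh {
		errs = append(errs, e)
	}
	return errs
}

func main() {
	errs := forEach([]string{"pod-a", "pod-b", "pod-c"}, func(name string) error {
		if name == "pod-b" {
			return errRejected
		}
		return nil
	})
	fmt.Println(len(errs), "error(s):", errs)
}
```

Sizing the channel to len(items) is the design choice that matters: every goroutine can report at most one error, so sends never block, and the channel can be closed and drained once wg.Wait returns.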
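Summary
First, a recap of the architecture and core processing logic of the garbage collector.
The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector) and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan).
Events from list/watch against the apiserver are put into the graphChanges queue. GraphBuilder takes events from the graphChanges queue, builds the object dependency graph and, according to the deletion policy, puts associated objects into the attemptToDelete or attemptToOrphan queue. GarbageCollector then takes events from the attemptToDelete and attemptToOrphan queues, consults the dependency graph, and finally reclaims and deletes objects.
Next, a summary of how a node and its object are deleted under each of the 3 deletion policies.
Foreground deletion
Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.
When an object is deleted with the Foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all of the object's blocking dependents (those with ownerReference.blockOwnerDeletion=true), it removes the value foregroundDeletion from the object's metadata.finalizers, and the owner object is then deleted.
Background deletion
Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.
When an object is deleted with the Background policy, no related Finalizer is set on it (a Finalizer is set only for the Foreground and Orphan policies), so it is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim those dependents.
Orphan deletion
Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.
When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of all its dependents. GarbageCollector then removes the value orphan from the owner's metadata.finalizers, and only then can the owner be deleted.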