k8s garbage collector source code analysis (2): processing logic

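Introduction to the garbage collector

The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides from an object's deletion policy whether its associated objects should be deleted as well.

On deleting associated objects: when an owner is deleted with a cascading deletion policy, the dependent objects of that owner are deleted along with it.

On dependency relationships: the garbage collector watches resource object events and uses the ownerReference values in each object to build the relationships between objects, that is, the relationships between owners and dependents.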

About owner and dependent

Take the creation of a deployment object as an example.

After a deployment is created, kube-controller-manager creates a replicaset for it and automatically writes the deployment's information into the replicaset's ownerReference. In the example below, the owner of replicaset test-1-59d7f45ffb is deployment test-1, and the dependent of deployment test-1 is replicaset test-1-59d7f45ffb.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...
           
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...
           

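Likewise, after a replicaset is created, kube-controller-manager creates pods for it, and the replicaset's information is written into each pod's ownerReference: the replicaset is the owner of the pods, and the pods are the dependents of the replicaset.

The ownerReference value in an object therefore specifies the relationship between owner and dependent.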
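garbage collector architecture

The detailed architecture and core processing logic of the garbage collector are shown in the figure below.

[Figure: garbage collector architecture and core processing logic]

The most critical code in the garbage collector lives in two files: garbagecollector.go and graph_builder.go.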

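The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan):

1 graph

(1) uidToNode: the object dependency graph, maintained by GraphBuilder, which records the dependency relationships between all objects. Every k8s object corresponds to one node in the graph, and every node maintains an owner list and a dependent list.

Example: given deployment A, replicaset B (owner: deployment A) and pod C (owner: replicaset B), the dependency graph looks like this:

There are 3 nodes: A, B and C.

A is one node, with no owner and B in its dependent list;
B is one node, with A in its owner list and C in its dependent list;
C is one node, with B in its owner list and no dependent.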
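The A/B/C example can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the node and graph types here are hypothetical stand-ins that use plain strings for UIDs, not the real types from pkg/controller/garbagecollector.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a simplified, hypothetical stand-in for a graph node: it keeps the
// UIDs of its owners and a set of pointers to its dependents.
type node struct {
	uid        string
	owners     []string
	dependents map[*node]struct{}
}

// graph is a simplified stand-in for uidToNode, keyed by object UID.
type graph map[string]*node

// insert adds a node and registers it as a dependent of every owner that is
// already present in the graph.
func (g graph) insert(uid string, owners ...string) *node {
	n := &node{uid: uid, owners: owners, dependents: map[*node]struct{}{}}
	g[uid] = n
	for _, o := range owners {
		if ownerNode, ok := g[o]; ok {
			ownerNode.dependents[n] = struct{}{}
		}
	}
	return n
}

// dependentUIDs returns the sorted UIDs of a node's dependents, or nil if the
// node is not in the graph.
func (g graph) dependentUIDs(uid string) []string {
	n, ok := g[uid]
	if !ok {
		return nil
	}
	out := make([]string, 0, len(n.dependents))
	for dep := range n.dependents {
		out = append(out, dep.uid)
	}
	sort.Strings(out)
	return out
}

func main() {
	g := graph{}
	g.insert("A")      // deployment A, no owner
	g.insert("B", "A") // replicaset B, owned by A
	g.insert("C", "B") // pod C, owned by B
	for _, uid := range []string{"A", "B", "C"} {
		fmt.Println(uid, "owners:", g[uid].owners, "dependents:", g.dependentUIDs(uid))
	}
}
```

Unlike this sketch, the real GraphBuilder also handles owners that are not yet in the graph, which is covered in the gb.insertNode section.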

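2 processors

(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and, based on the ownerReference values in the resource objects, builds, updates and deletes entries in the dependency graph, that is, the graph of owner and dependent relationships. It then acts as a producer and puts events into the attemptToDelete and attemptToOrphan queues, triggering GarbageCollector to check whether associated objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it reclaims objects.

(2) GarbageCollector: reclaims and deletes objects. GarbageCollector acts as a consumer, taking events from the attemptToDelete and attemptToOrphan queues. If an object is deleted and its deletion policy is cascading, its associated objects are reclaimed as well. In more detail: when an owner is deleted with a cascading deletion policy, the dependent objects of that owner are deleted along with it.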

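3 event queues

(1) graphChanges: holds events obtained by list/watch against the apiserver; produced by the informers and consumed by GraphBuilder;

(2) attemptToDelete: the cascading-deletion event queue; produced by GraphBuilder and consumed by GarbageCollector;

(3) attemptToOrphan: the orphan-deletion event queue; produced by GraphBuilder and consumed by GarbageCollector.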

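Object deletion policies

Kubernetes has three object deletion policies: Orphan, Foreground and Background. A policy can be specified when deleting an object. The three policies are described below.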

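Foreground deletion

Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.

When an object is deleted with the Foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks deletion of the object. Once the garbage collector has deleted all of the object's blocking dependents (those with ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from the object's metadata.finalizers, and the owner object is then deleted.

Taking the deletion of a deployment as an example, with the Foreground policy the objects are deleted in the order Pod -> ReplicaSet -> Deployment.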

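Background deletion

Background is a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.

When an object is deleted with the Background policy, no related finalizer is set on it (a finalizer is set only when the policy is Foreground or Orphan), so the object is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim those dependents.

Taking the deletion of a deployment as an example, with the Background policy the objects are deleted in the order Deployment -> ReplicaSet -> Pod.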

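Orphan deletion

Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.

When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of all of its dependents. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only after that can the owner be deleted.

Taking the deletion of a deployment as an example, with the Orphan policy only the Deployment is deleted; its ReplicaSet and Pods are kept.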
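The relationship between the three policies and the finalizer that blocks the owner's deletion can be summarized in a small Go sketch. The function name finalizerFor is hypothetical and exists only for illustration; the finalizer values foregroundDeletion and orphan are the ones described above.

```go
package main

import "fmt"

// finalizerFor returns the finalizer that blocks deletion of the owner under
// the given propagation policy, or "" when the policy sets no finalizer.
// This is an illustrative helper, not part of any Kubernetes package.
func finalizerFor(policy string) string {
	switch policy {
	case "Foreground":
		return "foregroundDeletion"
	case "Orphan":
		return "orphan"
	default:
		// Background: no finalizer, the owner is deleted immediately.
		return ""
	}
}

func main() {
	for _, p := range []string{"Foreground", "Background", "Orphan"} {
		fmt.Printf("%-10s finalizer=%q\n", p, finalizerFor(p))
	}
}
```

An empty finalizer is what lets the Background case delete the owner first, while the other two policies keep the owner around until the garbage collector has finished its work.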

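Specifying the deletion policy when deleting an object

If no deletion policy is specified when an object is deleted, the default policy is used: Background.

(1) Specify the Background policy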

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' \
  -H "Content-Type: application/json"
           

(2) Specify the Foreground policy

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
  -H "Content-Type: application/json"
           

(3) Specify the Orphan policy

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
  -H "Content-Type: application/json"
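
The same request can be built from Go with only the standard library. This is a minimal sketch, not a client-go example: the deleteOptions struct and newDeleteRequest helper are hypothetical, the struct mirrors only the three fields used in the curl commands, and the http:// scheme in the URL is an assumption. The request is constructed but not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// deleteOptions mirrors only the fields used in the curl examples.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// newDeleteRequest builds (but does not send) a DELETE request whose body
// carries the given propagation policy.
func newDeleteRequest(url, policy string) (*http.Request, error) {
	body, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodDelete, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Assumed URL: the curl examples omit the scheme.
	url := "http://localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset"
	req, err := newDeleteRequest(url, "Foreground")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Content-Type"))
}
```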
           

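The source code analysis of the garbage collector is split into two parts:

(1) startup analysis;

(2) core processing logic analysis.

The previous post analyzed the startup of the garbage collector; this post analyzes its core processing logic.

garbage collector source code analysis: processing logic

Based on tag v1.17.4

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

As mentioned above, the most critical code in the garbage collector is in garbagecollector.go and graph_builder.go, that is, the GarbageCollector struct and the GraphBuilder struct, so the analysis below is split into two major parts.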

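1.GraphBuilder

First, GraphBuilder.

GraphBuilder has 2 main functions:

(1) based on resource events from the informers, it maintains the dependency relationships of all objects in its uidToNode attribute;

(2) it processes the events in graphChanges and, as a producer, puts events into the attemptToDelete and attemptToOrphan queues, triggering the consumer GarbageCollector to reclaim objects.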

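1.1 GraphBuilder struct

A quick look at the GraphBuilder struct. Its most important attributes and their roles are:

(1) graphChanges: events observed by the informers are put into graphChanges, and GraphBuilder, as consumer, processes the events in this queue;

(2) uidToNode (the object dependency graph): keyed by object uid, it records the dependency relationships of all objects, that is, the owner and dependent relationships described earlier. In other words, GraphBuilder maintains a dependency graph of all objects, and GarbageCollector relies on this graph when it reclaims objects;

(3) attemptToDelete and attemptToOrphan: GraphBuilder, as producer, puts events into these two queues, and GarbageCollector, as consumer, processes them.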

// pkg/controller/garbagecollector/graph_builder.go
type GraphBuilder struct {
	...
	
	// monitors are the producer of the graphChanges queue, graphBuilder alters
	// the in-memory graph according to the changes.
	graphChanges workqueue.RateLimitingInterface
	// uidToNode doesn't require a lock to protect, because only the
	// single-threaded GraphBuilder.processGraphChanges() reads/writes it.
	uidToNode *concurrentUIDToNode
	// GraphBuilder is the producer of attemptToDelete and attemptToOrphan, GC is the consumer.
	attemptToDelete workqueue.RateLimitingInterface
	attemptToOrphan workqueue.RateLimitingInterface
	
	...
}
           
// pkg/controller/garbagecollector/graph.go
type concurrentUIDToNode struct {
	uidToNodeLock sync.RWMutex
	uidToNode     map[types.UID]*node
}
           
// pkg/controller/garbagecollector/graph.go
type node struct {
	...
	dependents map[*node]struct{}
	...
	owners []metav1.OwnerReference
}
           

As the struct definitions show, each k8s object corresponds to one node in the dependency graph, and each node maintains an owner list and a dependent list.

1.2 GraphBuilder-gb.processGraphChanges

Next, the processing logic of GraphBuilder, using gb.processGraphChanges as the entry point.

As mentioned, events observed by the informers are put into the graphChanges queue and GraphBuilder processes them; processGraphChanges is where GraphBuilder, as consumer, handles those events.

In this method GraphBuilder is therefore both consumer and producer: it consumes and classifies every event in graphChanges, then produces events into the attemptToDelete and attemptToOrphan queues for GarbageCollector to consume.

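Main logic:

(1) take an event from the graphChanges queue;

(2) read uidToNode to determine whether the object already exists in the dependency graph; what happens next depends on that and on the event type;

(3) if the node does not exist in uidToNode and the event is an addEvent or updateEvent, create a node for the object and call gb.insertNode to add the node to uidToNode and to the dependents of its owners;

then call gb.processTransitions, which checks whether the object is being deleted and, if so, whether it is being deleted in orphan mode or in foreground mode. The mode is determined from the object's finalizers: for example, a delete request for a deployment carries a deletion policy, and kube-apiserver sets the corresponding finalizer on the deployment. In orphan mode the node is added to the attemptToOrphan queue; in foreground mode the object and all of its dependents are added to the attemptToDelete queue;

(4) if the node exists in uidToNode and the event is an addEvent or updateEvent, call referencesDiffs to check whether the object's OwnerReferences field has changed; if it has, update the dependency graph accordingly; finally call gb.processTransitions;

(5) if the event is a delete event, call gb.removeNode to remove the object from uidToNode and from the dependents of all of the node's owners, then put the object's dependents into the attemptToDelete queue for GarbageCollector to process; finally, check all of the node's owners: an owner that is deleting its dependents may be blocked waiting for this node to be deleted, so that owner is put into attemptToDelete for GarbageCollector to process as well.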

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) runProcessGraphChanges() {
	for gb.processGraphChanges() {
	}
}

// Dequeueing an event from graphChanges, updating graph, populating dirty_queue.
func (gb *GraphBuilder) processGraphChanges() bool {
	item, quit := gb.graphChanges.Get()
	if quit {
		return false
	}
	defer gb.graphChanges.Done(item)
	event, ok := item.(*event)
	if !ok {
		utilruntime.HandleError(fmt.Errorf("expect a *event, got %v", item))
		return true
	}
	obj := event.obj
	accessor, err := meta.Accessor(obj)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("cannot access obj: %v", err))
		return true
	}
	klog.V(5).Infof("GraphBuilder process object: %s/%s, namespace %s, name %s, uid %s, event type %v", event.gvk.GroupVersion().String(), event.gvk.Kind, accessor.GetNamespace(), accessor.GetName(), string(accessor.GetUID()), event.eventType)
	// Check if the node already exists
	existingNode, found := gb.uidToNode.Read(accessor.GetUID())
	if found {
		// this marks the node as having been observed via an informer event
		// 1. this depends on graphChanges only containing add/update events from the actual informer
		// 2. this allows things tracking virtual nodes' existence to stop polling and rely on informer events
		existingNode.markObserved()
	}
	switch {
	case (event.eventType == addEvent || event.eventType == updateEvent) && !found:
		newNode := &node{
			identity: objectReference{
				OwnerReference: metav1.OwnerReference{
					APIVersion: event.gvk.GroupVersion().String(),
					Kind:       event.gvk.Kind,
					UID:        accessor.GetUID(),
					Name:       accessor.GetName(),
				},
				Namespace: accessor.GetNamespace(),
			},
			dependents:         make(map[*node]struct{}),
			owners:             accessor.GetOwnerReferences(),
			deletingDependents: beingDeleted(accessor) && hasDeleteDependentsFinalizer(accessor),
			beingDeleted:       beingDeleted(accessor),
		}
		gb.insertNode(newNode)
		// the underlying delta_fifo may combine a creation and a deletion into
		// one event, so we need to further process the event.
		gb.processTransitions(event.oldObj, accessor, newNode)
	case (event.eventType == addEvent || event.eventType == updateEvent) && found:
		// handle changes in ownerReferences
		added, removed, changed := referencesDiffs(existingNode.owners, accessor.GetOwnerReferences())
		if len(added) != 0 || len(removed) != 0 || len(changed) != 0 {
			// check if the changed dependency graph unblock owners that are
			// waiting for the deletion of their dependents.
			gb.addUnblockedOwnersToDeleteQueue(removed, changed)
			// update the node itself
			existingNode.owners = accessor.GetOwnerReferences()
			// Add the node to its new owners' dependent lists.
			gb.addDependentToOwners(existingNode, added)
			// remove the node from the dependent list of node that are no longer in
			// the node's owners list.
			gb.removeDependentFromOwners(existingNode, removed)
		}

		if beingDeleted(accessor) {
			existingNode.markBeingDeleted()
		}
		gb.processTransitions(event.oldObj, accessor, existingNode)
	case event.eventType == deleteEvent:
		if !found {
			klog.V(5).Infof("%v doesn't exist in the graph, this shouldn't happen", accessor.GetUID())
			return true
		}
		// removeNode updates the graph
		gb.removeNode(existingNode)
		existingNode.dependentsLock.RLock()
		defer existingNode.dependentsLock.RUnlock()
		if len(existingNode.dependents) > 0 {
			gb.absentOwnerCache.Add(accessor.GetUID())
		}
		for dep := range existingNode.dependents {
			gb.attemptToDelete.Add(dep)
		}
		for _, owner := range existingNode.owners {
			ownerNode, found := gb.uidToNode.Read(owner.UID)
			if !found || !ownerNode.isDeletingDependents() {
				continue
			}
			// this is to let attempToDeleteItem check if all the owner's
			// dependents are deleted, if so, the owner will be deleted.
			gb.attemptToDelete.Add(ownerNode)
		}
	}
	return true
}
           

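From the code it follows that when an object is deleted with the Background policy, no related finalizer is set on it (a finalizer is set only when the policy is Foreground or Orphan), so the object is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, triggering GarbageCollector to reclaim those dependents.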

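1.2.1 gb.insertNode

gb.insertNode adds the node to uidToNode and then adds it to the dependents of each of its owners. If an owner is not yet in the graph, a "virtual" node is created for it and enqueued into attemptToDelete, so that GarbageCollector can confirm with the API server whether that owner actually exists.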

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) insertNode(n *node) {
	gb.uidToNode.Write(n)
	gb.addDependentToOwners(n, n.owners)
}

func (gb *GraphBuilder) addDependentToOwners(n *node, owners []metav1.OwnerReference) {
	for _, owner := range owners {
		ownerNode, ok := gb.uidToNode.Read(owner.UID)
		if !ok {
			// Create a "virtual" node in the graph for the owner if it doesn't
			// exist in the graph yet.
			ownerNode = &node{
				identity: objectReference{
					OwnerReference: owner,
					Namespace:      n.identity.Namespace,
				},
				dependents: make(map[*node]struct{}),
				virtual:    true,
			}
			klog.V(5).Infof("add virtual node.identity: %s\n\n", ownerNode.identity)
			gb.uidToNode.Write(ownerNode)
		}
		ownerNode.addDependent(n)
		if !ok {
			// Enqueue the virtual node into attemptToDelete.
			// The garbage processor will enqueue a virtual delete
			// event to delete it from the graph if API server confirms this
			// owner doesn't exist.
			gb.attemptToDelete.Add(ownerNode)
		}
	}
}

           

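1.2.2 gb.processTransitions

gb.processTransitions checks whether the k8s object is being deleted (its deletionTimestamp attribute is non-nil) and carries the finalizer that corresponds to a deletion policy, and then acts accordingly.

Because a finalizer is set on the object only when the deletion policy is Foreground or Orphan, this method handles only objects deleted with Foreground or Orphan; objects deleted with Background are not handled here.

If the object's deletionTimestamp is non-nil and it has the finalizer for the Orphan policy, the node is added to the attemptToOrphan queue for GarbageCollector to consume;

If the object's deletionTimestamp is non-nil and it has the finalizer for the Foreground policy, n.markDeletingDependents is called to set the node's deletingDependents attribute to true, meaning that the node's dependents are being deleted, and the node and its dependents are added to the attemptToDelete queue for GarbageCollector to consume.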

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) processTransitions(oldObj interface{}, newAccessor metav1.Object, n *node) {
	if startsWaitingForDependentsOrphaned(oldObj, newAccessor) {
		klog.V(5).Infof("add %s to the attemptToOrphan", n.identity)
		gb.attemptToOrphan.Add(n)
		return
	}
	if startsWaitingForDependentsDeleted(oldObj, newAccessor) {
		klog.V(2).Infof("add %s to the attemptToDelete, because it's waiting for its dependents to be deleted", n.identity)
		// if the n is added as a "virtual" node, its deletingDependents field is not properly set, so always set it here.
		n.markDeletingDependents()
		for dep := range n.dependents {
			gb.attemptToDelete.Add(dep)
		}
		gb.attemptToDelete.Add(n)
	}
}

func startsWaitingForDependentsOrphaned(oldObj interface{}, newAccessor metav1.Object) bool {
	return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerOrphanDependents)
}

func startsWaitingForDependentsDeleted(oldObj interface{}, newAccessor metav1.Object) bool {
	return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerDeleteDependents)
}

func deletionStartsWithFinalizer(oldObj interface{}, newAccessor metav1.Object, matchingFinalizer string) bool {
	// if the new object isn't being deleted, or doesn't have the finalizer we're interested in, return false
	if !beingDeleted(newAccessor) || !hasFinalizer(newAccessor, matchingFinalizer) {
		return false
	}

	// if the old object is nil, or wasn't being deleted, or didn't have the finalizer, return true
	if oldObj == nil {
		return true
	}
	oldAccessor, err := meta.Accessor(oldObj)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("cannot access oldObj: %v", err))
		return false
	}
	return !beingDeleted(oldAccessor) || !hasFinalizer(oldAccessor, matchingFinalizer)
}

func beingDeleted(accessor metav1.Object) bool {
	return accessor.GetDeletionTimestamp() != nil
}

func hasFinalizer(accessor metav1.Object, matchingFinalizer string) bool {
	finalizers := accessor.GetFinalizers()
	for _, finalizer := range finalizers {
		if finalizer == matchingFinalizer {
			return true
		}
	}
	return false
}
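
One detail worth calling out is that deletionStartsWithFinalizer detects a transition, not a state: it returns true only for the event on which the object first becomes "being deleted with this finalizer". The sketch below reproduces that check with a simplified, hypothetical obj type in place of metav1.Object, so it can be run without any Kubernetes dependencies.

```go
package main

import "fmt"

// obj is a simplified stand-in for the metadata read through metav1.Object:
// whether deletionTimestamp is set, and the list of finalizers.
type obj struct {
	deleting   bool
	finalizers []string
}

// hasFinalizer reports whether o carries finalizer f.
func hasFinalizer(o *obj, f string) bool {
	for _, x := range o.finalizers {
		if x == f {
			return true
		}
	}
	return false
}

// deletionStarts reports whether newObj has just entered the state
// "being deleted with finalizer f" relative to oldObj.
// A nil oldObj means there is no previous state to compare against.
func deletionStarts(oldObj, newObj *obj, f string) bool {
	if !newObj.deleting || !hasFinalizer(newObj, f) {
		return false
	}
	if oldObj == nil {
		return true
	}
	return !oldObj.deleting || !hasFinalizer(oldObj, f)
}

func main() {
	live := &obj{}
	deleting := &obj{deleting: true, finalizers: []string{"foregroundDeletion"}}

	// First update after the delete request: the transition is detected.
	fmt.Println(deletionStarts(live, deleting, "foregroundDeletion"))
	// A later update with the same state: no new transition.
	fmt.Println(deletionStarts(deleting, deleting, "foregroundDeletion"))
	// A different finalizer is not matched.
	fmt.Println(deletionStarts(live, deleting, "orphan"))
}
```

The three calls print true, false and false, which is why repeated update events for an object that is already waiting on its dependents do not enqueue it again through this path.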
           

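1.2.3 gb.removeNode

gb.removeNode removes the object from uidToNode and then removes it from the dependents of all of the node's owners. After that, processGraphChanges puts the object's dependents into the attemptToDelete queue for GarbageCollector to process. Finally it checks all of the node's owners: an owner that is deleting its dependents may be blocked waiting for this node to be deleted, so that owner is put into attemptToDelete for GarbageCollector to process.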

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) removeNode(n *node) {
	gb.uidToNode.Delete(n.identity.UID)
	gb.removeDependentFromOwners(n, n.owners)
}

func (gb *GraphBuilder) removeDependentFromOwners(n *node, owners []metav1.OwnerReference) {
	for _, owner := range owners {
		ownerNode, ok := gb.uidToNode.Read(owner.UID)
		if !ok {
			continue
		}
		ownerNode.deleteDependent(n)
	}
}
           

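2.GarbageCollector

Next, GarbageCollector.

GarbageCollector has 2 main functions:

(1) it processes the events in the attemptToDelete queue and, according to the deletion policy (foreground or background), runs the corresponding reclaim logic and deletes associated objects;

(2) it processes the events in the attemptToOrphan queue: for the Orphan policy, it updates the owner's dependents, removing the owner's entry from each dependent's OwnerReferences, and then updates the owner object, removing the Orphan finalizer from its finalizers.

GarbageCollector has 2 key methods:

(1) gc.runAttemptToDeleteWorker: processes the events in the attemptToDelete queue and handles reclaiming for objects deleted with the foreground or background policy;

(2) gc.runAttemptToOrphanWorker: processes the events in the attemptToOrphan queue and handles reclaiming for objects deleted with the Orphan policy.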

2.1 GarbageCollector struct

A quick look at the GarbageCollector struct. Its most important attributes are attemptToDelete and attemptToOrphan: GraphBuilder, as producer, puts events into these two queues, and GarbageCollector, as consumer, processes them.

// pkg/controller/garbagecollector/garbagecollector.go
type GarbageCollector struct {
	...
	attemptToDelete workqueue.RateLimitingInterface
	attemptToOrphan workqueue.RateLimitingInterface
	...
}
           

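2.2 GarbageCollector-gc.runAttemptToDeleteWorker

Next, the processing logic of GarbageCollector, using gc.runAttemptToDeleteWorker as the entry point.

runAttemptToDeleteWorker simply calls attemptToDeleteWorker in a loop.

Main logic of attemptToDeleteWorker:

(1) take an object from the attemptToDelete queue;

(2) call gc.attemptToDeleteItem to try to delete the node;

(3) if the deletion fails, put the node back into the attemptToDelete queue to retry.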

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
	for gc.attemptToDeleteWorker() {
	}
}

func (gc *GarbageCollector) attemptToDeleteWorker() bool {
	item, quit := gc.attemptToDelete.Get()
	gc.workerLock.RLock()
	defer gc.workerLock.RUnlock()
	if quit {
		return false
	}
	defer gc.attemptToDelete.Done(item)
	n, ok := item.(*node)
	if !ok {
		utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
		return true
	}
	err := gc.attemptToDeleteItem(n)
	if err != nil {
		if _, ok := err.(*restMappingError); ok {
			// There are at least two ways this can happen:
			// 1. The reference is to an object of a custom type that has not yet been
			//    recognized by gc.restMapper (this is a transient error).
			// 2. The reference is to an invalid group/version. We don't currently
			//    have a way to distinguish this from a valid type we will recognize
			//    after the next discovery sync.
			// For now, record the error and retry.
			klog.V(5).Infof("error syncing item %s: %v", n, err)
		} else {
			utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
		}
		// retry if garbage collection of an object failed.
		gc.attemptToDelete.AddRateLimited(item)
	} else if !n.isObserved() {
		// requeue if item hasn't been observed via an informer event yet.
		// otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
		// see https://issue.k8s.io/56121
		klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
		gc.attemptToDelete.AddRateLimited(item)
	}
	return true
}
           

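2.2.1 gc.attemptToDeleteItem

Main logic:

(1) check whether the node is being deleted; a node that is being deleted but is not deleting its dependents is skipped immediately;

(2) fetch the object that corresponds to the node from the apiserver;

(3) call item.isDeletingDependents, which uses the node's deletingDependents field to tell whether the node is currently deleting its dependents. If it is, call gc.processDeletingDependentsItem to process the dependents further: check whether the node's blockingDependents have all been deleted; if so, remove the related finalizer from the node's object; if not, add the blockingDependents that are not yet deleted to the attemptToDelete queue;

As noted in the GraphBuilder analysis, when GraphBuilder processes the events in graphChanges, the processTransitions method calls n.markDeletingDependents to set the node's deletingDependents attribute to true;

(4) call gc.classifyReferences to classify the node's owners into 3 groups: solid (owners that exist and are not being deleted), dangling (owners that do not exist), and waitingForDependentsDeletion (owners that exist, are being deleted, and are waiting for their dependents to be deleted);

(5) what happens next depends on the number of solid, dangling and waitingForDependentsDeletion owners;

(6) case 1: the number of solid owners is not 0, that is, the node has at least one owner that exists and is not being deleted. The object cannot be reclaimed yet, so the owners in the dangling and waitingForDependentsDeletion lists are removed from the node's ownerReferences;

(7) case 2: the number of solid owners is 0, at least one of the node's owners is in the waitingForDependentsDeletion state, and the node's own dependents have not all been deleted. The node is deleted with the foreground policy;

(8) if neither case applies, the default branch runs: the object is deleted through the apiserver using the policy indicated by the finalizers already on it (Orphan, Foreground, or otherwise Background).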
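
The three-way decision in steps (6) to (8) can be condensed into a small sketch. This is an illustration under simplifying assumptions: the decide function is hypothetical, it takes only the counts of each owner class and of the node's dependents, and it returns a description of the action instead of patching or deleting anything.

```go
package main

import "fmt"

// decide mirrors the order of the cases described above, using only counts.
// It returns a human-readable action; it performs no API calls.
func decide(solid, dangling, waiting, dependents int) string {
	switch {
	case solid != 0:
		// At least one live owner: the object must be kept.
		if dangling == 0 && waiting == 0 {
			return "keep"
		}
		return "keep, remove stale ownerReferences"
	case waiting != 0 && dependents != 0:
		// An owner is waiting on us and we still have dependents of our own.
		return "delete with Foreground"
	default:
		// No live owner and nothing forces foreground deletion.
		return "delete with policy from existing finalizers"
	}
}

func main() {
	fmt.Println(decide(1, 0, 0, 0))
	fmt.Println(decide(1, 1, 0, 0))
	fmt.Println(decide(0, 0, 1, 2))
	fmt.Println(decide(0, 1, 0, 0))
}
```

The order of the cases matters: a single solid owner is enough to keep the object, regardless of how many dangling or waiting owners it also has.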

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
	klog.V(2).Infof("processing item %s", item.identity)
	// "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
	if item.isBeingDeleted() && !item.isDeletingDependents() {
		klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
		return nil
	}
	// TODO: It's only necessary to talk to the API server if this is a
	// "virtual" node. The local graph could lag behind the real status, but in
	// practice, the difference is small.
	latest, err := gc.getObject(item.identity)
	switch {
	case errors.IsNotFound(err):
		// the GraphBuilder can add "virtual" node for an owner that doesn't
		// exist yet, so we need to enqueue a virtual Delete event to remove
		// the virtual node from GraphBuilder.uidToNode.
		klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
		gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
		// since we're manually inserting a delete event to remove this node,
		// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
		item.markObserved()
		return nil
	case err != nil:
		return err
	}

	if latest.GetUID() != item.identity.UID {
		klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
		gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
		// since we're manually inserting a delete event to remove this node,
		// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
		item.markObserved()
		return nil
	}

	// TODO: attemptToOrphanWorker() routine is similar. Consider merging
	// attemptToOrphanWorker() into attemptToDeleteItem() as well.
	if item.isDeletingDependents() {
		return gc.processDeletingDependentsItem(item)
	}

	// compute if we should delete the item
	ownerReferences := latest.GetOwnerReferences()
	if len(ownerReferences) == 0 {
		klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
		return nil
	}

	solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
	if err != nil {
		return err
	}
	klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)

	switch {
	case len(solid) != 0:
		klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", item.identity, solid)
		if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
			return nil
		}
		klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
		// waitingForDependentsDeletion needs to be deleted from the
		// ownerReferences, otherwise the referenced objects will be stuck with
		// the FinalizerDeletingDependents and never get deleted.
		ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
		patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
		_, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
			return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
		})
		return err
	case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
		deps := item.getDependents()
		for _, dep := range deps {
			if dep.isDeletingDependents() {
				// this circle detection has false positives, we need to
				// apply a more rigorous detection if this turns out to be a
				// problem.
				// there are multiple workers run attemptToDeleteItem in
				// parallel, the circle detection can fail in a race condition.
				klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
				patch, err := item.unblockOwnerReferencesStrategicMergePatch()
				if err != nil {
					return err
				}
				if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
					return err
				}
				break
			}
		}
		klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
		// the deletion event will be observed by the graphBuilder, so the item
		// will be processed again in processDeletingDependentsItem. If it
		// doesn't have dependents, the function will remove the
		// FinalizerDeletingDependents from the item, resulting in the final
		// deletion of the item.
		policy := metav1.DeletePropagationForeground
		return gc.deleteObject(item.identity, &policy)
	default:
		// item doesn't have any solid owner, so it needs to be garbage
		// collected. Also, none of item's owners is waiting for the deletion of
		// the dependents, so set propagationPolicy based on existing finalizers.
		var policy metav1.DeletionPropagation
		switch {
		case hasOrphanFinalizer(latest):
			// if an existing orphan finalizer is already on the object, honor it.
			policy = metav1.DeletePropagationOrphan
		case hasDeleteDependentsFinalizer(latest):
			// if an existing foreground finalizer is already on the object, honor it.
			policy = metav1.DeletePropagationForeground
		default:
			// otherwise, default to background.
			policy = metav1.DeletePropagationBackground
		}
		klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
		return gc.deleteObject(item.identity, &policy)
	}
}

           

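gc.processDeletingDependentsItem

Main logic: check whether the node's blockingDependents (the dependents that block deletion of the owner) have all been deleted. If so, remove the related finalizer from the node's object (once the finalizer is removed, kube-apiserver deletes the object); if not, add the blockingDependents that are not yet deleted to the attemptToDelete queue.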

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
	blockingDependents := item.blockingDependents()
	if len(blockingDependents) == 0 {
		klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
		return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
	}
	for _, dep := range blockingDependents {
		if !dep.isDeletingDependents() {
			klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
			gc.attemptToDelete.Add(dep)
		}
	}
	return nil
}
           

item.blockingDependents

item.blockingDependents returns the dependents that block deletion of the node. Whether a dependent blocks its owner's deletion depends on the blockOwnerDeletion attribute in the dependent's ownerReferences: if it is true, the dependent blocks the owner's deletion.

// pkg/controller/garbagecollector/graph.go
func (n *node) blockingDependents() []*node {
	dependents := n.getDependents()
	var ret []*node
	for _, dep := range dependents {
		for _, owner := range dep.owners {
			if owner.UID == n.identity.UID && owner.BlockOwnerDeletion != nil && *owner.BlockOwnerDeletion {
				ret = append(ret, dep)
			}
		}
	}
	return ret
}
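
Because BlockOwnerDeletion is a pointer to bool, an unset field (nil) and an explicit false are both treated as non-blocking. The sketch below shows the same filter on simplified, hypothetical ownerRef and dependent types so the three cases can be compared side by side.

```go
package main

import "fmt"

// ownerRef keeps only the two fields the filter looks at.
type ownerRef struct {
	UID                string
	BlockOwnerDeletion *bool // nil means "not set" and is treated as non-blocking
}

// dependent is a simplified stand-in for a graph node with its owners.
type dependent struct {
	name   string
	owners []ownerRef
}

// blocking returns the names of the dependents that block deletion of the
// owner identified by ownerUID.
func blocking(ownerUID string, deps []dependent) []string {
	var out []string
	for _, d := range deps {
		for _, o := range d.owners {
			if o.UID == ownerUID && o.BlockOwnerDeletion != nil && *o.BlockOwnerDeletion {
				out = append(out, d.name)
			}
		}
	}
	return out
}

func main() {
	yes, no := true, false
	deps := []dependent{
		{name: "pod-a", owners: []ownerRef{{UID: "rs-1", BlockOwnerDeletion: &yes}}},
		{name: "pod-b", owners: []ownerRef{{UID: "rs-1", BlockOwnerDeletion: &no}}},
		{name: "pod-c", owners: []ownerRef{{UID: "rs-1"}}}, // field not set
	}
	fmt.Println(blocking("rs-1", deps))
}
```

Only pod-a is reported as blocking here; pod-b and pod-c would not hold up a foreground deletion of rs-1.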
           

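2.3 GarbageCollector-gc.runAttemptToOrphanWorker

gc.runAttemptToOrphanWorker handles nodes that are deleted with the orphan policy.

gc.runAttemptToOrphanWorker simply calls gc.attemptToOrphanWorker in a loop.

Main logic of gc.attemptToOrphanWorker:

(1) take an object from the attemptToOrphan queue;

(2) call gc.orphanDependents to update the owner's dependents, removing the owner's entry from each dependent's OwnerReferences; on failure, put the owner back into the attemptToOrphan queue;

(3) call gc.removeFinalizer to remove the Orphan finalizer from the owner's finalizers.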

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
	for gc.attemptToOrphanWorker() {
	}
}

func (gc *GarbageCollector) attemptToOrphanWorker() bool {
	item, quit := gc.attemptToOrphan.Get()
	gc.workerLock.RLock()
	defer gc.workerLock.RUnlock()
	if quit {
		return false
	}
	defer gc.attemptToOrphan.Done(item)
	owner, ok := item.(*node)
	if !ok {
		utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
		return true
	}
	// we don't need to lock each element, because they never get updated
	owner.dependentsLock.RLock()
	dependents := make([]*node, 0, len(owner.dependents))
	for dependent := range owner.dependents {
		dependents = append(dependents, dependent)
	}
	owner.dependentsLock.RUnlock()

	err := gc.orphanDependents(owner.identity, dependents)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
		gc.attemptToOrphan.AddRateLimited(item)
		return true
	}
	// update the owner, remove "orphaningFinalizer" from its finalizers list
	err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
		gc.attemptToOrphan.AddRateLimited(item)
	}
	return true
}
           

2.3.1 gc.orphanDependents

Main logic: update the dependents of the given owner, removing the owner's entry from each dependent's OwnerReferences. One goroutine is started per dependent to speed up processing.

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
	errCh := make(chan error, len(dependents))
	wg := sync.WaitGroup{}
	wg.Add(len(dependents))
	for i := range dependents {
		go func(dependent *node) {
			defer wg.Done()
			// the dependent.identity.UID is used as precondition
			patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
			_, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
				return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
			})
			// note that if the target ownerReference doesn't exist in the
			// dependent, strategic merge patch will NOT return an error.
			if err != nil && !errors.IsNotFound(err) {
				errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
			}
		}(dependents[i])
	}
	wg.Wait()
	close(errCh)

	var errorsSlice []error
	for e := range errCh {
		errorsSlice = append(errorsSlice, e)
	}

	if len(errorsSlice) != 0 {
		return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
	}
	klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
	return nil
}
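
The concurrency pattern used by orphanDependents (one goroutine per item, a buffered error channel, and a WaitGroup) can be tried out in isolation. The sketch below is a generic version with hypothetical names (forEachParallel, errPatch); it uses the standard library's errors.Join (Go 1.20+) in place of utilerrors.NewAggregate, and a caller-supplied function in place of the real patch call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errPatch is a hypothetical error used to simulate a failing patch.
var errPatch = errors.New("patch failed")

// forEachParallel runs fn once per item, each in its own goroutine, and
// returns all errors joined together, or nil if every call succeeded.
func forEachParallel(items []string, fn func(string) error) error {
	// Buffered to len(items) so that no goroutine ever blocks on send.
	errCh := make(chan error, len(items))
	var wg sync.WaitGroup
	wg.Add(len(items))
	for _, it := range items {
		go func(it string) {
			defer wg.Done()
			if err := fn(it); err != nil {
				errCh <- fmt.Errorf("orphaning %s failed: %w", it, err)
			}
		}(it)
	}
	wg.Wait()
	close(errCh) // safe: all senders have finished

	var errs []error
	for e := range errCh {
		errs = append(errs, e)
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	deps := []string{"pod-a", "pod-b", "pod-c"}
	err := forEachParallel(deps, func(name string) error {
		if name == "pod-b" {
			return errPatch // simulate one failing patch
		}
		return nil
	})
	fmt.Println("result:", err)
}
```

Sizing the channel to the number of items is what allows the function to wait on the WaitGroup before reading any errors; with an unbuffered channel the goroutines would block on send and wg.Wait would never return.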
           

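Summary

First, a recap of the garbage collector's architecture and core processing logic.

[Figure: garbage collector architecture and core processing logic]

The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan).

Events from list/watch against the apiserver are put into the graphChanges queue. GraphBuilder takes events from graphChanges, builds the object dependency graph, and, according to the object's deletion policy, puts associated objects into the attemptToDelete or attemptToOrphan queue. GarbageCollector then takes events from attemptToDelete and attemptToOrphan, looks up information in the dependency graph, and finally reclaims and deletes objects.

To summarize how a node and its object are deleted under the 3 deletion policies:

Foreground deletion

Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.

When an object is deleted with the Foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks deletion of the object. Once the garbage collector has deleted all of the object's blocking dependents (those with ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from the object's metadata.finalizers, and the owner object is then deleted.

Background deletion

Background is a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.

When an object is deleted with the Background policy, no related finalizer is set on it (a finalizer is set only when the policy is Foreground or Orphan), so the object is deleted directly. GraphBuilder then observes the object's delete event and puts its dependents into the attemptToDelete queue, triggering GarbageCollector to reclaim those dependents.

Orphan deletion

Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.

When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of all of its dependents. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only after that can the owner be deleted.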
