Author | Ma Chao    Editor-in-charge | Zhang Hongyue
Produced by | CSDN Blog
The last time I wrote .NET code in C# was almost ten years ago. Java was dominant then and .NET was nearly drowned out, so my attitude toward that .NET project at the time was rather perfunctory, and I never dug into some of .NET's excellent mechanisms; when I wrote "C and Java Aren't So Hot Anymore: Who Can Be King in the Era of High Concurrency" last year, I didn't give .NET a place either. Recently, however, by coincidence I took over a Windows-side project, which gave me the opportunity to revisit my knowledge of the .NET Framework.
The functionality of the project prototype is not complicated: it mainly records file-copy activity on removable storage devices, and it needs to consume as few system resources as possible. During development I inadvertently added a line of code that seemed to have no effect, using the Invoke method to record file-copy events, and this operation made the program noticeably more efficient. The reasons behind this are well worth summarizing.

A line of "useless" code that improves efficiency?
Since the file-copy information I need to record does not feed back into the UI, I did not consider concurrency conflicts. In the initial version of the implementation, I handled the FileSystemWatcher callback events directly, as follows:
private void DeleteFileHandler(object sender, FileSystemEventArgs e)
{
    if (files.Contains(e.FullPath))
    {
        files.Remove(e.FullPath);
        // some other operations
    }
}
With this implementation, copying 20 files simultaneously on an ordinary office PC put the USB-drive monitoring program's CPU usage at roughly 0.7% during the copy.
But by chance I routed the handler through the event/delegate Invoke mechanism, and found that this seemingly wasteful operation reduced the program's CPU usage to about 0.2%.
private void UdiskWather_Deleted(object sender, FileSystemEventArgs e)
{
    if (this.InvokeRequired)
    {
        this.Invoke(new DeleteDelegate(DeleteFileHandler), new object[] { sender, e });
    }
    else
    {
        DeleteFileHandler(sender, e);
    }
}
My initial understanding was that the delegate mechanism in .NET involves boxing and unboxing during the call, so at best this detour would slow things down, but the actual measurement showed the opposite.
What is the use of a seemingly useless Invoke?
Let me give the conclusion first: the key reason Invoke can improve execution efficiency is that the cost of threads bouncing between cores is much higher than the cost of boxing and unboxing. The core of our program is operating on the shared files variable, and each time a file change is detected in the monitored USB-drive directory, the callback notification function may run on a different thread.
What the Invoke mechanism does behind the scenes is ensure that all operations on the shared files variable are performed by a single thread.
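A minimal sketch of this idea, written in Python rather than the article's C# (the names files, dispatcher, and watcher_callback are illustrative, not from the original project): worker threads never touch the shared collection directly; they only post events onto a queue that a single dispatcher thread drains, just as Invoke funnels all mutations onto one thread.

```python
import queue
import threading

# Shared state touched by exactly one thread (the dispatcher),
# mirroring how Invoke funnels all work onto the UI thread.
files = {"a.txt", "b.txt", "c.txt"}
events = queue.Queue()

def dispatcher():
    # Single consumer: the only thread allowed to mutate `files`.
    while True:
        path = events.get()
        if path is None:      # sentinel: shut down the dispatcher
            break
        files.discard(path)   # the "DeleteFileHandler" work

def watcher_callback(path):
    # Called from arbitrary worker threads; it only enqueues.
    events.put(path)

pump = threading.Thread(target=dispatcher)
pump.start()

workers = [threading.Thread(target=watcher_callback, args=(p,))
           for p in ("a.txt", "c.txt")]
for w in workers:
    w.start()
for w in workers:
    w.join()

events.put(None)   # all events posted; stop the dispatcher
pump.join()
# files is now {"b.txt"}
```

Because only the dispatcher thread ever mutates files, no lock around the set itself is needed, which is exactly the property the Invoke-based version of the handler relies on.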
Now that .NET's code is open source, let's roughly walk through the invocation process of Invoke. Whether you call BeginInvoke or Invoke, the work is actually done by the MarshaledInvoke method, as follows:
public IAsyncResult BeginInvoke(Delegate method, params Object[] args)
{
    using (new MultithreadSafeCallScope())
    {
        Control marshaler = FindMarshalingControl();
        return (IAsyncResult)marshaler.MarshaledInvoke(this, method, args, false);
    }
}
MarshaledInvoke's main job is to create a ThreadMethodEntry object, manage it in a queue, and then call PostMessage to deliver the relevant information to the target thread, as follows:
private Object MarshaledInvoke(Control caller, Delegate method, Object[] args, bool synchronous)
{
    if (!IsHandleCreated)
    {
        throw new InvalidOperationException(SR.GetString(SR.ErrorNoMarshalingThread));
    }

    ActiveXImpl activeXImpl = (ActiveXImpl)Properties.GetObject(PropActiveXImpl);
    if (activeXImpl != null)
    {
        IntSecurity.UnmanagedCode.Demand();
    }

    // We don't want to wait if we're on the same thread, or else we'll deadlock.
    // It is important that syncSameThread always be false for asynchronous calls.
    //
    bool syncSameThread = false;
    int pid; // ignored
    if (SafeNativeMethods.GetWindowThreadProcessId(new HandleRef(this, Handle), out pid) == SafeNativeMethods.GetCurrentThreadId())
    {
        if (synchronous)
            syncSameThread = true;
    }

    // Store the compressed stack information from the thread that is calling the Invoke
    // so we can assign the same security context to the thread that will actually execute
    // the delegate being passed.
    //
    ExecutionContext executionContext = null;
    if (!syncSameThread)
    {
        executionContext = ExecutionContext.Capture();
    }
    ThreadMethodEntry tme = new ThreadMethodEntry(caller, this, method, args, synchronous, executionContext);

    lock (this)
    {
        if (threadCallbackList == null)
        {
            threadCallbackList = new Queue();
        }
    }

    lock (threadCallbackList)
    {
        if (threadCallbackMessage == 0)
        {
            threadCallbackMessage = SafeNativeMethods.RegisterWindowMessage(Application.WindowMessagesVersion + "_ThreadCallbackMessage");
        }
        threadCallbackList.Enqueue(tme);
    }

    if (syncSameThread)
    {
        InvokeMarshaledCallbacks();
    }
    else
    {
        UnsafeNativeMethods.PostMessage(new HandleRef(this, Handle), threadCallbackMessage, IntPtr.Zero, IntPtr.Zero);
    }

    if (synchronous)
    {
        if (!tme.IsCompleted)
        {
            WaitForWaitHandle(tme.AsyncWaitHandle);
        }
        if (tme.exception != null)
        {
            throw tme.exception;
        }
        return tme.retVal;
    }
    else
    {
        return (IAsyncResult)tme;
    }
}
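The synchronous path of MarshaledInvoke can be mimicked in a few lines of Python (a sketch, not the real .NET code; the names ThreadMethodEntry, invoke, and message_loop are borrowed for illustration): the caller enqueues an entry, the "message pump" thread runs the delegate and signals a completion handle, and the caller then rethrows any captured exception or returns the result, just like tme.exception and tme.retVal above.

```python
import queue
import threading

class ThreadMethodEntry:
    """Roughly mirrors the bookkeeping MarshaledInvoke keeps per call."""
    def __init__(self, method, args):
        self.method = method
        self.args = args
        self.ret_val = None
        self.exception = None
        self.completed = threading.Event()   # stands in for AsyncWaitHandle

callback_list = queue.Queue()

def message_loop():
    # Plays the role of the UI thread's message pump.
    while True:
        tme = callback_list.get()
        if tme is None:                      # sentinel: quit the pump
            break
        try:
            tme.ret_val = tme.method(*tme.args)
        except Exception as exc:             # captured, rethrown on the caller's side
            tme.exception = exc
        tme.completed.set()

def invoke(method, *args):
    # Synchronous path: enqueue, then block until the pump has run the delegate.
    tme = ThreadMethodEntry(method, args)
    callback_list.put(tme)                   # PostMessage analogue: wake the pump
    tme.completed.wait()                     # WaitForWaitHandle analogue
    if tme.exception is not None:
        raise tme.exception
    return tme.ret_val

pump = threading.Thread(target=message_loop)
pump.start()
result = invoke(lambda a, b: a + b, 2, 3)    # runs on the pump thread; result == 5
callback_list.put(None)
pump.join()
```

Note how the delegate always executes on the pump thread, no matter which thread calls invoke; that is the property the rest of the article builds on.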
Invoke's mechanism ensures that a shared variable is only ever maintained by one thread. This is implicitly consistent with Go's design of using communication instead of shared memory: both philosophies amount to "let a given block of memory be operated on by only one thread at a time," and this is inextricably linked to the multi-core (SMP) architecture of modern CPUs.
Let's first go over the MESI protocol that governs communication between CPU caches. We know that modern CPUs are equipped with caches, and under the MESI convention for multi-core cache coherence, each cache line is in one of four states: E (exclusive), M (modified), S (shared), and I (invalid), where:
M: the contents of the cache line have been modified, and the line is cached only in this CPU. In this state the cached data differs from the data in memory.
E: the corresponding memory line is cached only by this CPU, and no other CPU caches it. In this state the cached data is consistent with memory.
I: the contents of the cache line are invalid.
S: the data exists not only in the local CPU's cache but also in the caches of other CPUs, and it is consistent with memory. However, whenever any CPU modifies the line, the copies in the other caches become I.
(Figure: state transition diagram of the four MESI states.)
As mentioned above, different threads run on different CPU cores. When different CPUs operate on the same piece of memory then, from CPU0's point of view, CPU1 keeps initiating remote writes, so the cache line's state keeps migrating between S and I, and once the state becomes I, re-synchronizing it takes considerably more time.
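The churn described above can be made concrete with a toy two-cache MESI simulation (a deliberately simplified sketch; real coherence protocols also involve writebacks, RFO messages, and more states in some variants). A read fills an invalid line as E or S; a remote write promotes the writer to M and invalidates every other copy:

```python
# Toy MESI simulation for two caches holding the same line.
# States: 'M' modified, 'E' exclusive, 'S' shared, 'I' invalid.

def read(caches, cpu):
    others_hold = any(c != cpu and caches[c] != 'I' for c in caches)
    if caches[cpu] == 'I':
        # Line is (re)filled from memory or from a peer cache.
        caches[cpu] = 'S' if others_hold else 'E'
        if others_hold:
            for c in caches:
                if caches[c] in ('E', 'M'):
                    caches[c] = 'S'   # peer downgrades when the line becomes shared
    return caches[cpu]

def write(caches, cpu):
    # Writer gains exclusive ownership; every other copy is invalidated.
    for c in caches:
        caches[c] = 'M' if c == cpu else 'I'

caches = {'cpu0': 'I', 'cpu1': 'I'}
read(caches, 'cpu0')    # cpu0: I -> E (sole owner)
read(caches, 'cpu1')    # both lines -> S (shared)
write(caches, 'cpu1')   # cpu1 -> M, cpu0 -> I: the costly S/I churn
```

Every remote write forces the other core back to I, and its next read pays the refill cost, which is exactly why ping-ponging a shared variable between cores is expensive.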
So we can now basically explain it: after the seemingly insignificant line this.Invoke(new DeleteDelegate(DeleteFileHandler), new object[] { sender, e });, maintenance of the shared files variable changes from being inadvertently manipulated by multiple threads on multiple cores into message passing from the worker threads to the main thread, with all maintenance operations carried out by the main thread, and that is what makes the final execution more efficient.
In-depth interpretation of why two locks are needed
Even in the current trend of replacing shared memory with communication, the lock is still the most important piece of the design.
We can see that .NET's Invoke implementation uses two locks: lock (this) and lock (threadCallbackList).
lock (this)
{
    if (threadCallbackList == null)
    {
        threadCallbackList = new Queue();
    }
}

lock (threadCallbackList)
{
    if (threadCallbackMessage == 0)
    {
        threadCallbackMessage = SafeNativeMethods.RegisterWindowMessage(Application.WindowMessagesVersion + "_ThreadCallbackMessage");
    }
    threadCallbackList.Enqueue(tme);
}
In .NET, the lock keyword can be understood as basically providing compare-and-swap (CAS) semantics. The CAS principle is to keep comparing an "expected value" with the "actual value"; when they are equal, it means the CPU holding the lock has released it, and a CPU trying to acquire the lock then attempts to write the new value into *p (the swap), making itself the spinlock's new owner. The pseudocode is shown below:
void CAS(int *p, int old, int new) {
    if (*p != old)
        do nothing
    else
        *p ← new
}
There is no problem with the efficiency of CAS-based locks; CAS performs especially well when there is no multi-core contention. Its biggest problem, however, is unfairness: if several CPUs apply for the lock at the same time, the CPU that has just released it is very likely to gain an advantage in the next round of competition and grab the lock again. The result is one CPU staying busy while the others sit idle. The "one core struggles while the other cores watch" phenomenon we often criticize in multi-core SoCs is, in many cases, caused by exactly this unfairness.
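A CAS spinlock can be sketched as follows (in Python for illustration; since Python has no raw atomic CAS on integers, the SimulatedCAS helper stands in for the hardware instruction, and all class and function names here are invented for the example):

```python
import threading

class SimulatedCAS:
    """Emulates one atomic compare-and-swap word; the inner lock only
    stands in for the atomicity the hardware instruction would provide."""
    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False          # *p != old: do nothing
            self._value = new         # *p <- new
            return True

class SpinLock:
    UNLOCKED, LOCKED = 0, 1

    def __init__(self):
        self._word = SimulatedCAS(self.UNLOCKED)

    def acquire(self):
        # Spin until we win the CAS race; whichever thread wins owns the lock.
        while not self._word.compare_and_swap(self.UNLOCKED, self.LOCKED):
            pass

    def release(self):
        self._word.compare_and_swap(self.LOCKED, self.UNLOCKED)

lock = SpinLock()
counter = 0

def bump(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1                  # protected critical section
        lock.release()

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter == 4000: the lock serialized all increments
```

Note that acquire gives no ordering guarantee among waiters: whichever thread's CAS happens to succeed first wins, which is precisely the unfairness described above.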
To solve CAS's fairness problem, the industry introduced what the original text calls the TAS mechanism, which is essentially a ticket lock; personally I find it easiest to understand the T in TAS as "ticket". The TAS scheme maintains the lock's request queue with two index values, "head" and "tail":
struct lockStruct {
    int32 head;
    int32 tail;
};
"head" represents the head of the request queue, and "tail" represents the tail of the request queue, both of which have an initial value of 0.
At the beginning, the first CPU to apply finds that the queue's tail value is 0, so it acquires the lock directly and updates tail to 1; when it releases the lock, it updates head to 1.
In general, when the holding CPU releases the lock, it adds 1 to the queue's head. When another CPU tries to acquire the lock, it reads the lock's tail value, adds 1 to it, stores that ticket in its own private register, and writes the updated value back to the queue's tail. It then loops, comparing the lock's current head value against the ticket stored in its register; when they are equal, it has successfully acquired the lock.
TAS works like doing business at a government service hall: first you take a number from the ticket machine, and when the number announced by the staff matches the number in your hand, the service counter is yours.
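The take-a-number scheme above can be sketched directly (again in Python for illustration; the guard lock merely simulates the atomic fetch-and-add that real implementations use on tail, and the TicketLock/customer names are invented for the example):

```python
import threading

class TicketLock:
    """Ticket lock sketch: 'tail' hands out tickets, 'head' is now serving."""
    def __init__(self):
        self.head = 0                    # number currently being served
        self.tail = 0                    # next number to hand out
        self._guard = threading.Lock()   # simulates atomic fetch-and-add only

    def acquire(self):
        with self._guard:                # atomic: my_ticket = tail++, i.e. take a number
            my_ticket = self.tail
            self.tail += 1
        while self.head != my_ticket:    # spin until our number is called
            pass
        return my_ticket

    def release(self):
        self.head += 1                   # call the next number, FIFO order

lock = TicketLock()
service_order = []

def customer():
    ticket = lock.acquire()
    service_order.append(ticket)         # inside the critical section
    lock.release()

threads = [threading.Thread(target=customer) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# service_order == [0, 1, 2, 3, 4]: strictly first-come, first-served
```

Unlike the plain CAS spinlock, waiters here are served strictly in ticket order, which is exactly the fairness property the scheme was introduced for.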
However, TAS has an efficiency problem of its own: according to the MESI protocol introduced above, the lock's head and tail indexes are shared among the CPUs, so frequent updates of tail and head still cause cache lines to be invalidated constantly, which hurts efficiency considerably.
So we see that .NET's implementation simply introduces the threadCallbackList queue directly: callers keep appending tme (ThreadMethodEntry) entries to the tail of the queue, while the thread receiving the messages keeps taking them from the head.
lock (threadCallbackList)
{
    if (threadCallbackMessage == 0)
    {
        threadCallbackMessage = SafeNativeMethods.RegisterWindowMessage(Application.WindowMessagesVersion + "_ThreadCallbackMessage");
    }
    threadCallbackList.Enqueue(tme);
}
When the head of the queue reaches a given tme, its message is processed. This is similar in spirit to the MCS lock, although MCS actually establishes a dedicated queue node for each CPU, which differs slightly from Invoke's design; the basic idea is the same.
When I was younger, I often couldn't taste the flavor behind many things, which made me miss many technical points well worth summarizing. So over the Spring Festival holiday I wrote up this recent experience with C# for my readers. Happy New Year to everyone!
About the author: Ma Chao, fintech expert, external co-supervisor at the Gaoli Finance Research Institute of Renmin University, Alibaba Cloud MVP, one of Huawei's Top Ten Developer Stars of 2020, CSDN invited columnist, and well-known FinTech evangelist; promoter of and contributor to many domestic open-source projects.