
Watched the World Cup and lost your villa? You were using the wrong algorithm

author:natural history

Every World Cup kicks off with a wave of predictions about who will finally lift the trophy, and scientists join in with their mathematical models. "A thousand spectators see a thousand Hamlets": ten different models may predict ten different champions, with Brazil, Argentina, and France all among the predicted favorites.

For the 2022 Qatar World Cup, Matthew Penn, a British researcher in epidemiological statistics, favors Belgium [1,2], even though Belgium has never reached a World Cup final. But he is by no means talking nonsense. First, he has his own data-driven probabilistic model. Second, that model shone at Euro 2020, where it correctly predicted that Italy and England would finish as champion and runner-up, and picked six of the eight quarter-finalists.


Figure 1: Belgium (red) beats Brazil in the quarterfinals of the 2018 World Cup

Source: Кирилл Венедиктов/Wikimedia Commons

On November 15, Nature interviewed this star forecaster and published his predictions for this World Cup, made with the same model. Each team's probability of winning the title is as follows:

Rank  Team         Probability of winning the title (%)
1     Belgium      13.88
2     Brazil       13.51
3     France       12.11
4     Argentina    11.52
5     Netherlands   9.65
6     Germany       7.24
7     Spain         6.37
8     Switzerland   5.29
9     Portugal      3.78
10    Uruguay       3.36
11    Denmark       3.17
12    England       2.56
13    Poland        2.33
14    Croatia       1.46
15    Mexico        0.67

So how are these European Championship and World Cup predictions produced? Simply put, each match is decided by a roll of the dice. These are not ordinary dice, though: they follow a Poisson probability distribution.

Throw an ordinary die and you get any number from 1 to 6, each with equal probability; this is called a uniform probability distribution. For the Poisson distribution, consider the following situation. A small, not very busy shop on the street is open 10 hours a day and gets 30 customers a day on average, so the average number of customers per hour is only 3. Assume customers arrive at random times, with no rush hours. If you pick one business hour at random, will exactly 3 customers come in? Obviously not necessarily: in one hour nobody may turn up, and in another a dozen people may arrive at once. The French mathematician Poisson gave the following formula:

$$P(k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

Here λ = 3 is the average, P is the probability that exactly k people come during the hour, and e is the base of the natural logarithm (about 2.718). By Poisson's reckoning, the probability that exactly 3 customers (the average) come to this shop in an hour is 22.4%, while the probability that nobody comes is 4.98%. A crowd is also possible but very unlikely: the probability of 10 customers is about 0.08%. The probabilities of the other counts can be calculated one by one in the same way, as shown in the figure below.


Figure 2: Poisson probability distribution with a mean of 3

Source: Drawn by the author of this article
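The percentages quoted for the shop example can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name `poisson_pmf` is chosen here for illustration and does not come from the original article.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # average customers per hour, as in the shop example

# Probability of 0, 1, ..., 10 customers in one hour.
probabilities = {k: poisson_pmf(k, lam) for k in range(11)}

for k, p in probabilities.items():
    print(f"{k:2d} customers: {100 * p:6.2f}%")
```

The lines for 0, 3, and 10 customers should reproduce the 4.98%, 22.4%, and 0.08% mentioned in the text.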

In reality the Poisson distribution turns up everywhere, and many real-world data sets match it strikingly well. Examples include the number of radioactive decays per second in a nuclear material, the number of natural disasters such as earthquakes, the number of people queuing in public places, the number of machine failures, the number of aircraft crashes per year, the number of people falling ill in a given area, the number of criminal cases in each district of a city, and even the number of Prussian army soldiers kicked to death by horses each year in the late 19th century.

In Matthew Penn's model, a Poisson distribution represents the number of goals a given side scores in each match. The result and score of a match naturally depend on both the strength and the luck of the two sides: uncertainty lurks within certainty.

To measure each team's strength, the model gives every team an "attack power" index and a "defensive vulnerability" index. The higher the former, the more easily the team scores; the higher the latter, the more easily it concedes, that is, the weaker its defense. Players of online games and board games will find this familiar, and in earlier days the "Water Margin" hero cards given away in instant-noodle packets were likewise marked with each hero's attack and defense values. Obviously, a first-class team has strong attack and low defensive vulnerability; a second-class team has weak attack and low vulnerability, or strong attack and high vulnerability; and the worst teams have weak attack and high vulnerability.


Figure 3: A Water Margin hero card given away in a packet of crispy noodles

Source: Zhao Yang (photo) / Light Science Popularization

Suppose team A plays team B. If the match goes the "most reasonable and deserved" way, team A's expected number of goals is A's attack power multiplied by B's defensive vulnerability, and team B's expected number of goals is B's attack power multiplied by A's defensive vulnerability. Say team A has an attack power of 12 and a defensive vulnerability of 0.1, while team B has an attack power of 6 and a defensive vulnerability of 0.2. The "normal" score is then 12 × 0.2 = 2.4 to 6 × 0.1 = 0.6, which rounds to about 2:1. But the ball is round: 2:1 is only roughly the expected score, and all sorts of other results are possible. So we treat team A's goal count as a Poisson distribution with a mean of 2.4 and team B's as a Poisson distribution with a mean of 0.6, and the probability of any particular score is the product of the two teams' probabilities of scoring those numbers of goals.
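The Team A versus Team B example can be worked through numerically. The sketch below uses the illustrative ratings from the text and multiplies two independent Poisson probabilities for each scoreline; the cut-off of 10 goals per side (`MAX_GOALS`) and all variable names are assumptions made here, not part of the published model.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Illustrative ratings from the text (not real team data).
attack_a, vulnerability_a = 12.0, 0.1
attack_b, vulnerability_b = 6.0, 0.2

expected_a = attack_a * vulnerability_b  # 12 * 0.2 = 2.4 goals for A
expected_b = attack_b * vulnerability_a  # 6 * 0.1 = 0.6 goals for B

MAX_GOALS = 10  # assumed cut-off; higher scores are vanishingly unlikely here

# Probability of every scoreline (goals by A, goals by B).
score_prob = {
    (i, j): poisson_pmf(i, expected_a) * poisson_pmf(j, expected_b)
    for i in range(MAX_GOALS + 1)
    for j in range(MAX_GOALS + 1)
}

p_a_wins = sum(p for (i, j), p in score_prob.items() if i > j)
p_draw = sum(p for (i, j), p in score_prob.items() if i == j)
p_b_wins = sum(p for (i, j), p in score_prob.items() if i < j)
most_likely = max(score_prob, key=score_prob.get)

print(f"A wins {p_a_wins:.3f}, draw {p_draw:.3f}, B wins {p_b_wins:.3f}")
print("Most probable single score:", most_likely)
```

One detail worth noticing: with these numbers the single most probable scoreline comes out as 2:0 rather than 2:1, because a mean of 0.6 makes zero goals the likeliest outcome for Team B. The rounded expected score and the most probable score are not the same thing.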

Of course, one crucial question remains: how are each team's attack power and defensive vulnerability determined? The answer is to keep adjusting the two values against the teams' match results from recent years until the predicted distribution of scores matches the actual record as closely as possible. Then, whenever two teams meet in the World Cup, the probabilities of the various scores can be roughly predicted in advance; by simulating the entire schedule, one finally obtains each team's probability of winning the World Cup.

The Poisson distribution is also a frequent visitor in optics, but there it is a thorn in the side that often causes trouble. The uncertainty of probability brings surprise, suspense, and excitement to football matches; to optical imaging it mostly brings annoying noise.

A beam of light can be seen as a stream of many tiny photons. When it uniformly illuminates a sheet of white paper, the intensity looks the same everywhere on the paper, but in fact the number of photons reflected from each spot differs, and a difference in photon number means a difference in brightness. Even at a single spot, the number of photons reflected fluctuates from moment to moment, always following the Poisson distribution.

For a camera, the number of photons landing on the sensor in each exposure has the same Poisson uncertainty, which inevitably introduces shot noise [3] (Figure 4, left); however well a camera is designed, it can hardly remove this noise directly. According to the Poisson formula, the size of the fluctuations in the photon number (its standard deviation) grows as the square root of the average photon number, while the average itself is proportional to the signal you want to receive. So as the light intensity rises (more photons), the shot noise grows, but the signal-to-noise ratio also grows, in proportion to the square root of the photon number, and the image looks clearer overall.
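One way to "keep adjusting the two values" is an alternating maximum-likelihood update, sketched below. This is an assumption about how such a fit could be done, not necessarily the procedure used in [1]; the match results are invented purely for illustration, and every name in the code is hypothetical.

```python
# Hypothetical results, invented for illustration: (team 1, team 2, goals 1, goals 2).
matches = [
    ("A", "B", 2, 1),
    ("B", "C", 1, 1),
    ("C", "A", 0, 3),
    ("A", "C", 2, 0),
    ("B", "A", 1, 2),
    ("C", "B", 2, 1),
]

# Each match yields two observations: (scoring team, conceding team, goals).
observations = []
for team1, team2, goals1, goals2 in matches:
    observations.append((team1, team2, goals1))
    observations.append((team2, team1, goals2))

teams = sorted({scorer for scorer, _, _ in observations})
attack = {t: 1.0 for t in teams}
vulnerability = {t: 1.0 for t in teams}

for _ in range(500):
    # Best attack value for each team, holding the vulnerabilities fixed.
    for t in teams:
        scored = sum(g for s, c, g in observations if s == t)
        exposure = sum(vulnerability[c] for s, c, g in observations if s == t)
        attack[t] = scored / exposure
    # Best vulnerability for each team, holding the attack values fixed.
    for t in teams:
        conceded = sum(g for s, c, g in observations if c == t)
        exposure = sum(attack[s] for s, c, g in observations if c == t)
        vulnerability[t] = conceded / exposure
    # Only the products attack * vulnerability matter, so pin the overall scale.
    scale = sum(vulnerability.values()) / len(teams)
    for t in teams:
        vulnerability[t] /= scale
        attack[t] *= scale

def expected_goals(scorer: str, conceder: str) -> float:
    """Mean of the Poisson goal count for `scorer` playing against `conceder`."""
    return attack[scorer] * vulnerability[conceder]

for t in teams:
    print(t, round(attack[t], 3), round(vulnerability[t], 3))
```

Once the iteration settles, each team's predicted goal total over the recorded matches equals its actual total. The fitted `expected_goals` values can then serve as Poisson means for each fixture when simulating a tournament.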
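The square-root relationship can be checked directly from the Poisson formula. The sketch below sums the formula term by term to obtain the mean and the standard deviation for a few average photon counts; the chosen counts and the function name are assumptions made for illustration.

```python
import math

def poisson_mean_and_std(lam: float) -> tuple:
    """Mean and standard deviation of a Poisson count, summed from the formula."""
    upper = int(lam + 12 * math.sqrt(lam) + 30)  # beyond this the tail is negligible
    # Evaluate each probability in log space so that large lam does not overflow.
    pmf = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
           for k in range(upper + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, math.sqrt(variance)

snr = {}
for photons in (1, 100, 10_000):
    mean, std = poisson_mean_and_std(photons)
    snr[photons] = mean / std  # signal divided by shot noise
    print(f"mean {mean:9.1f}  noise {std:7.2f}  signal-to-noise {snr[photons]:6.1f}")
```

Multiplying the photon count by 100 raises the noise by a factor of 10 but also raises the signal-to-noise ratio by a factor of 10, which is why brighter images look cleaner.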


Figure 4: Shot noise in images observed by fluorescence microscopy (left) and results processed by artificial intelligence algorithms (right) [4]

Source: Nature Biotechnology (2022): 1-11.

In many applications, however, raising the photon number or intensity of the signal light is futile. Take the lidar of a self-driving car working outdoors: however much the lidar's signal intensity is increased, it still pales beside sunlight. One way to remove noise in such a case is to record the signal from an empty scene with no target as background noise, and then subtract this static background from every later recording. But against sunlight, which is very intense and itself keeps fluctuating according to the Poisson distribution, this trick does not work well either.

In CT medical imaging with X-rays, raising the X-ray intensity to improve the signal-to-noise ratio is not an option, because an excessive X-ray dose harms the human body. Even with visible light, in some live-cell microscopy too much illumination is enough to kill the cells or impair their function [4].

The 2022 Nobel Prize in Physics once again drew global attention to quantum information. Quantum secure communication, one of the related technologies, can in theory provide very strong key security, but in practice it requires a light source that emits exactly one photon at a time [5], no more and no fewer. In reality, the number of photons in each pulse emitted by a commonly used laser itself follows a Poisson distribution. If the average is set to 0.1, most pulses contain no photon at all and a few contain two or more, which makes the quantum communication system less efficient and less secure than the ideal.

However, there are always more solutions than difficulties. Just as clues to the champion can be found in the chaotic probabilities of goal counts, researchers facing elusive Poisson noise can use deep-learning artificial intelligence algorithms to remove the noise from captured images and restore them remarkably well (Figure 4, right). Whether predicting match scores or denoising pictures, the work relies on large amounts of data as a reference, and the fog of randomness is lifted layer by layer.
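A rough way to see why subtracting a fixed background fails: if the signal and the sunlight are treated as independent Poisson counts, their sum is again Poisson, so the fluctuation has variance equal to signal plus background. Subtracting the average background removes the offset but not the fluctuation. The photon numbers below are invented for illustration, and treating both sources as ideal Poisson counts is an assumption.

```python
import math

def snr_after_background_subtraction(signal: float, background: float) -> float:
    """Signal-to-noise ratio once the mean background has been subtracted.

    Assumes signal and background are independent Poisson counts, so the
    remaining noise has standard deviation sqrt(signal + background).
    """
    return signal / math.sqrt(signal + background)

signal_photons = 100.0  # hypothetical lidar return
results = {}
for background_photons in (0.0, 100.0, 10_000.0):
    results[background_photons] = snr_after_background_subtraction(
        signal_photons, background_photons
    )
    print(f"background {background_photons:8.0f}: "
          f"signal-to-noise {results[background_photons]:5.2f}")
```

With no background the ratio is 10; with a background 100 times stronger than the signal it drops to about 1, even though the average background has been removed.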
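The effect of setting the average to 0.1 photons per pulse can be put in numbers with the same Poisson formula. This is a small sketch under the assumption that the pulses follow an ideal Poisson distribution; the variable names are chosen here for illustration.

```python
import math

mean_photons = 0.1  # average photons per laser pulse, as in the text

p_zero = math.exp(-mean_photons)                 # pulse contains no photon
p_one = mean_photons * math.exp(-mean_photons)   # exactly one photon
p_multi = 1.0 - p_zero - p_one                   # two or more photons

# Among the pulses that are not empty, the share carrying extra photons.
multi_share_of_nonempty = p_multi / (1.0 - p_zero)

print(f"empty pulses:        {100 * p_zero:.2f}%")
print(f"single-photon:       {100 * p_one:.2f}%")
print(f"two or more photons: {100 * p_multi:.2f}%")
print(f"multi-photon share of non-empty pulses: {100 * multi_share_of_nonempty:.1f}%")
```

Roughly nine pulses in ten are empty, which costs efficiency, and close to 5% of the non-empty pulses carry more than one photon, which is the part that weakens security.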

The cover image of this article is provided by Light Science Popularization

References:

[1] Penn, Matthew J., and Christl A. Donnelly. "Analysis of a double Poisson model for predicting football results in Euro 2020." PLoS ONE 17.5 (2022): e0268511.

[2] D. Adam, "Science and the World Cup: how big data is transforming football," Nature 611, 444-446 (2022).

[3] https://en.wikipedia.org/wiki/Shot_noise

[4] Li, Xinyang, et al. "Real-time denoising enables high-sensitivity fluorescence time-lapse imaging beyond the shot-noise limit." Nature Biotechnology (2022): 1-11.

[5] Y. Hu, X. Peng, T. Li and H. Guo, "On the Poisson approximation to photon distribution for faint lasers," Physics Letters A 367(3), 173-176 (2007).

This article is reprinted with permission and published by | Jiao Shuming (Assistant Researcher, Peng Cheng Laboratory / Ph.D. in Electronic Engineering, City University of Hong Kong)

Reviewer | Li Wei (Dean of Chinese Academy of Sciences, Chunguang Machine)

WeChat Editor | Ah what cool
