Soccer teams lost their home advantage

The corona crisis affects our daily lives in ways that we previously never thought possible. A couple of weeks ago, the Belgian first division soccer teams played their first games of the new season. After only four games, it is already painfully clear that the corona crisis also affects soccer competitions. What follows is a brief analysis of the impact of the corona crisis on soccer games.

As data scientists and statisticians, we are constantly performing experiments and testing hypotheses. Most of the time, we have to accept certain limitations to our experiments. It only happens occasionally that we are given the opportunity to work with a so-called natural experiment.

A natural experiment is an empirical study in which individuals (or clusters of individuals) are exposed to the experimental and control conditions that are determined by nature or by other factors outside the control of the investigators.

- Wikipedia

The corona crisis can be seen as a natural experiment to test the impact of soccer fans on the outcome of the games. The previous season (2019–2020) of the Belgian first division was stopped after the pandemic reached Belgium. Almost all games were completed with a stadium full of supporters. The first four games of the current season (2020–2021) were completed without supporters.

By using the corona crisis as a natural experiment, we can answer the following question:

Would removing all fans from the stadium impact the home advantage of soccer teams?

Let’s have a look at the data

For this project, I make use of two different datasets that were both downloaded from football-data-co.uk. One dataset contains all the game results for the Belgian first division season 2019–2020, the second dataset contains the results for the 2020–2021 season (so far). Both datasets are combined.

The data that was downloaded from the database (image by author)

Besides the information shown above, the dataset contains much more information about the games. For this project, we only need FTHG (the number of goals scored by the home team), FTAG (the number of goals scored by the away team), FTR (the result of the game), and the date of the game.

Two features are engineered:

  1. HomeTeamWon: indicator variable equal to one when the home team won the game;

  2. Year_2020: indicator variable equal to one when the game was played in the 2020–2021 season.
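
The two indicator variables above could be derived roughly as follows. This is only a sketch in plain Python, not the code used for the article: the sample rows are made up, the `add_features` helper and the `season` label are hypothetical names, and it assumes that FTR marks a home win with the value "H".

```python
import csv
import io

# Tiny made-up sample using only the columns named in the text (not real results).
SAMPLE_CSV = """Date,FTHG,FTAG,FTR
08/08/2020,2,1,H
09/08/2020,0,0,D
10/08/2020,1,3,A
"""


def add_features(rows, season):
    """Return copies of the rows with HomeTeamWon and Year_2020 added.

    Assumptions: FTR equals 'H' when the home team won, and `season` is a
    label such as '2019-2020' or '2020-2021' supplied once per source file.
    """
    enriched = []
    for row in rows:
        new_row = dict(row)
        new_row["HomeTeamWon"] = 1 if row["FTR"] == "H" else 0
        new_row["Year_2020"] = 1 if season == "2020-2021" else 0
        enriched.append(new_row)
    return enriched


games = add_features(csv.DictReader(io.StringIO(SAMPLE_CSV)), season="2020-2021")

# Share of games won by the home team in this (made-up) sample.
home_win_rate = sum(g["HomeTeamWon"] for g in games) / len(games)
```

Labelling each file with its season before combining the two datasets keeps Year_2020 independent of how the dates happen to be formatted.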

Testing the hypothesis

The main question asked above can be translated into a statistical hypothesis. We know that the games played in 2019–2020 had supporters in the stadium, and all games in 2020–2021 so far were played without supporters. This means that the null hypothesis can be expressed as follows:

H0: The proportion of home game wins in 2019–2020 is equal to the proportion of home game wins in 2020–2021

These proportions correspond to the probabilities that the home team wins the game. If both proportions are the same, the presence of fans inside the stadium does not have an impact on the result of soccer games. However, if we find a statistically significant difference, we can say that the fans impact the outcome of the game.

The proportion of games won by the home team in 2019–2020 is equal to 48%. This means that almost one in two games ends with a win for the home team. In 2020–2021, this proportion is only equal to 28%!

Whether this difference in proportions is statistically significant can be checked with a two-sample proportion test, which is implemented in R as the function prop.test. The difference identified above is significant at the 5% level, with a p-value of 0.0342. It can thus be concluded that removing the fans from the stadium does indeed affect the home advantage.
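
For readers who do not use R, the idea behind such a test can be sketched in plain Python. The function below is a hypothetical helper, not the author's code: it computes a chi-square statistic with Yates' continuity correction for two proportions, which is in the spirit of what prop.test reports by default for two groups. The counts in the example are invented for illustration, since the article does not list the underlying win counts, so the output is not expected to reproduce the p-value of 0.0342.

```python
import math


def two_proportion_test(wins_1, games_1, wins_2, games_2):
    """Chi-square test (1 degree of freedom) for equality of two proportions.

    Uses Yates' continuity correction on the 2x2 table of wins and non-wins.
    Returns the test statistic and the p-value.
    """
    a, b = wins_1, games_1 - wins_1  # group 1: wins, non-wins
    c, d = wins_2, games_2 - wins_2  # group 2: wins, non-wins
    n = games_1 + games_2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    # Continuity correction: shrink |ad - bc| by n / 2, but never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / denominator
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 50 home wins in 100 games versus 10 home wins in 40 games.
chi2, p_value = two_proportion_test(50, 100, 10, 40)
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```

With only a few dozen games in the second group, the chi-square approximation is rough, which is one reason the continuity correction is applied here.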

Let’s look at the number of goals scored by home and away teams

A team can only win a soccer game if it scores at least one more goal than the other team. We already know that fans impact the probability of the home team winning a game, but how exactly do they impact the game?

In the season 2019–2020, a home team scored on average 1.63 goals per game while the away team scored on average 1.19 goals per game. In 2020–2021, however, a home team only scored 1.08 goals per game, and an away team 1.33 goals per game.

The plot below shows the distributions of home and away goals scored in both seasons. The dark blue bars show the results for the 2020–2021 season and the light blue bars show the results for the 2019–2020 season.

Comparing the number of goals scored (image by author)

In the current season, the home teams tend to score fewer goals compared to the previous season. On the other hand, away teams seem to score slightly more goals. The difference between the number of home goals scored is statistically significant with a p-value of 0.0041, but the difference between the away goals is not statistically significant with a p-value of 0.4581!
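
The article does not say which test produced these p-values. As one possible way to compare goals per game between two seasons, the sketch below uses a permutation test on the difference in means. This is a swapped-in technique rather than the author's method, the `permutation_test` helper is hypothetical, and the goal counts are invented for illustration.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Repeatedly shuffles the pooled observations, splits them into two groups
    of the original sizes, and counts how often the absolute difference in
    means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    n_b = len(sample_b)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / n_b)
    pooled = list(sample_a) + list(sample_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Invented home-goal counts per game for two seasons (not the real data).
home_goals_with_fans = [2, 1, 3, 0, 2, 1, 4, 2, 1, 2, 3, 1]
home_goals_without_fans = [1, 0, 1, 2, 0, 1, 1, 0]
p_value = permutation_test(home_goals_with_fans, home_goals_without_fans)
print(f"p-value = {p_value:.4f}")
```

A permutation test makes no assumption about the shape of the goal distribution, which suits small samples of count data.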

The plots above combined with the statistical tests show that the lack of fans in the stadium results in fewer goals scored by the home team. However, the lack of fans does not impact the number of goals scored by the away team!

What do these results mean?

Photo by Michael Lee on Unsplash

The analysis above shows how important fan engagement is in soccer. The supporters of the home team have a statistically significant impact on the result of the game, more specifically on the number of goals their team scores. Increasing fan engagement can thus not only increase the direct revenue of soccer teams [1], but it can also increase revenues that are related to the performance of the club, e.g. increased prize money and merchandise sales.

The results also raise the question: ‘If, for example, the second half of the season is played with supporters, should we somehow correct the results of games that were played without supporters?’. A team that only faced easy away games might be disadvantaged compared to a team that only faced difficult away games if we do not correct the results. This can have an impact on the final standings of the competition.

Summary

In this blog post, we investigated whether or not the home advantage of soccer teams is impacted when the fans are removed from the stadium. The clear answer is yes. This has been proven by using the corona crisis as a natural experiment. We compared the proportion of games that were won by the home team, and the number of goals scored by home and away teams. It was shown that the lack of fans inside the stadium affected the number of goals scored by the home team, but it did not impact the number of goals scored by the away team.

Even though there have only been four games played in the new season, I am confident that the results are correct. Not only are the results statistically significant, but there is also a logical explanation for the observed phenomenon.

Source: https://towardsdatascience.com/soccer-teams-lost-their-home-advantage-8d6692c1fc47