数据清洗之数据分组方法

2022-07-07 10:11:21

数据分组方法

分组计算根据某个或某几个字段对数据集进行分组，然后运用特点的函数，得到结果
使用groupby方法进行分组计算，得到分组对象GroupBy
语法为df.groupby(by=)
分组对象GroupBy可以运用描述性统计方法，如count(计数)、mean(均值)、median(中位数)、max(最大值)和min(最小值)等

import pandas as pd
import numpy as np
import

os.getcwd()

'D:\\Jupyter\\notebook\\Python数据清洗实战\\数据清洗之数据统计'

os.chdir('D:\\Jupyter\\notebook\\Python数据清洗实战\\数据')

df = pd.read_csv('online_order.csv', encoding='gbk', dtype={'customer':str, 'order':str})

df.head(5)

customer	order	total_items	discount%	weekday	hour	Food%	Fresh%	Drinks%	Home%	Beauty%	Health%	Baby%	Pets%
45	23.03	4	13	9.46	87.06	3.48	0.00	0.00	0.00	0.0	0.0
1	1	38	1.22	5	13	15.87	75.80	6.22	2.12	0.00	0.00	0.0	0.0
2	2	51	18.08	4	13	16.88	56.75	3.37	16.48	6.53	0.00	0.0	0.0
3	1	3	57	16.51	1	12	28.81	35.99	11.78	4.62	2.87	15.92	0.0	0.0
4	1	4	53	18.31	2	11	24.13	60.38	7.78	7.72	0.00	0.00	0.0	0.0

grouped = df.groupby('weekday')

type(grouped)

pandas.core.groupby.generic.DataFrameGroupBy

grouped.mean()

total_items	discount%	hour	Food%	Fresh%	Drinks%	Home%	Beauty%	Health%	Baby%	Pets%
weekday
1	30.662177	8.580705	14.693122	22.690866	20.000904	22.522993	13.932553	6.972394	1.152285	11.592562	1.007306
2	31.868612	8.638014	14.966197	23.994915	19.407738	24.346459	13.559191	4.903366	1.079423	11.277284	1.272638
3	31.869796	7.794507	15.059898	24.309274	19.957653	23.822470	13.282088	6.702640	1.156829	9.591389	0.937205
4	32.251899	8.068155	14.324185	24.374364	21.538027	24.553266	13.391946	4.806528	1.031490	9.058201	1.080473
5	31.406619	9.159031	13.386919	24.602790	20.549153	24.976466	12.485788	5.431221	1.248605	9.655343	0.908227
6	32.154814	8.414258	14.751084	23.743196	18.707788	23.593699	14.173291	5.878647	1.170585	11.478343	1.150980
7	32.373837	8.710171	16.989535	22.271512	21.020359	21.093767	13.632481	5.895322	1.145938

grouped.mean()['Food%']

weekday
1    22.690866
2    23.994915
3    24.309274
4    24.374364
5    24.602790
6    23.743196
7    22.271512
Name: Food%, dtype: float64

# 多个字段分组
grouped = df.groupby(by=['customer', 'weekday'])

grouped.sum()['total_items']

customer  weekday
0         4           96
          5           38
1         1          423
          2          127
          4           37
          5           36
10        1           23
          3           26
100       1           38
          2           78
          3           78
          7          135
1000      2            6
10000     6           30
10001     6           15
10002     3           11
          6           42
          7           48
10003     2            4
10004     2           28
          3          131
          4           93
10005     7           29
10006     2           20
          5           27
          7           26
10007     2            6
          6           15
10008     7          123
10009     1            2
                    ... 
9984      6           40
          7           61
9985      6           11
9986      1           50
          6           49
          7           50
9987      1           23
9988      1           18
          4            1
9989      1           27
999       1          173
          2           45
          4           60
          5          137
          7          149
9990      7            8
9991      6           46
9992      1           13
          2           14
          5           25
          6           24
9993      6            8
9994      2           64
          3           57
9995      7           14
9996      7           14
9997      6            5
9998      1           28
          6           10
9999      6            4
Name: total_items, Length: 20777, dtype: int64

数据清洗之数据分组方法

数据分组方法

继续阅读

来自python的【条件控制/语句循环/break/continue/else/pass】一、条件控制二、语句循环

无法解析的外部符号 wmain，该符号在函数 "void cdecl mainCRTStartupHelper(struct HINSTANCE *,unsigned short con......

TestLink导出用例转换工具(XML2Excel)

YAML简介和PyYAML安全操作YAML支持的类型YAML的优点：yaml的基本语法python操作

Small tricks

libsvm for python 安装

学习软件测试基础测试第七天

Zeppelin 配置访问 REST APIApache Zeppelin Configuration REST API

【Torch】最简洁logging使用指南

27. Remove Element(列表)题目代码

Cloud Studio初体验

使用 ctypes 进行 Python 和 C 的混合编程

【python】【数据处理】画多维数据分布图

【python】netconf协议对接管理设备

「Python 网络自动化」NETCONF —— Python 使用 NETCONF 管理配置 H3C 网络设备

在python中创建excel并写入

数据清洗之 数据分组方法

数据分组方法

继续阅读

数据清洗之数据分组方法