pandas - python - How to sum or count across groups of multiple columns


I am trying to count or sum the values in a pandas DataFrame by groups of columns.

For example, I have the following DataFrame:


import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, size=(5, 7)), columns=["grey2", "red1", "blue1", "red2", "red3", "blue2", "grey1"])



   grey2  red1  blue1  red2  red3  blue2  grey1
0      4     3      0     2     4      0      2
1      4     2      0     4     0      3      1
2      1     1      3     1     1      3      1
3      4     4      1     4     1      1      1
4      3     4      1     0     3      3      1



I want to group the columns, for example by color. What I expect is:

If I sum the numbers:

blue    15
grey    22
red     34



If I count the values where x > 0, I get:

blue     7
grey    10
red     13




pd.pivot_table(data=df, index=df.index, values=["red1","red2","red3"], aggfunc='sum', margins=True)


     red1  red2  red3
0       3     2     4
1       2     4     0
2       1     1     1
3       4     4     1
4       4     0     3
All    14    11     9



pd.pivot_table(data=df, index=df.index, values=["red1","red2","red3"], aggfunc='count', margins=True)




     red1  red2  red3
0       1     1     1
1       1     1     1
2       1     1     1
3       1     1     1
4       1     1     1
All     5     5     5



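Note: in this example I only use colors to keep things simple, but I may have many columns named col001 through col300 and so on, so the groups might be:

blue = col131, col254, col005
red = col023, col190, col053

and so on.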


You can use pd.wide_to_long:


data = pd.wide_to_long(df.reset_index(), stubnames=['grey', 'red', 'blue'],
                       i='index',
                       j='group',
                       sep=''
                       )



Output:


# data
             grey  red  blue
index group
0     1       2.0    3   0.0
      2       4.0    2   0.0
      3       NaN    4   NaN
1     1       1.0    2   0.0
      2       4.0    4   3.0
      3       NaN    0   NaN
2     1       1.0    1   3.0
      2       1.0    1   3.0
      3       NaN    1   NaN
3     1       1.0    4   1.0
      2       4.0    4   1.0
      3       NaN    1   NaN
4     1       1.0    4   1.0
      2       3.0    0   3.0
      3       NaN    3   NaN



Then:


data.sum()
# grey    22.0
# red     34.0
# blue    15.0
# dtype: float64

data.gt(0).sum()
# grey    10
# red     13
# blue     7
# dtype: int64
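
To check this against the question's numbers, here is a minimal, self-contained sketch of the wide_to_long approach. The question's frame was random, so the values below are copied by hand from its printed table; the names `totals` and `positives` are mine and do not appear in the answer.

```python
import pandas as pd

# Sample table copied from the question's printed output (not regenerated randomly).
df = pd.DataFrame(
    [[4, 3, 0, 2, 4, 0, 2],
     [4, 2, 0, 4, 0, 3, 1],
     [1, 1, 3, 1, 1, 3, 1],
     [4, 4, 1, 4, 1, 1, 1],
     [3, 4, 1, 0, 3, 3, 1]],
    columns=["grey2", "red1", "blue1", "red2", "red3", "blue2", "grey1"],
)

# Reshape: one column per color stub, one row per (original row, numeric suffix).
data = pd.wide_to_long(
    df.reset_index(),
    stubnames=["grey", "red", "blue"],
    i="index",
    j="group",
    sep="",
)

totals = data.sum()           # NaN cells (e.g. the missing grey3) are skipped
positives = data.gt(0).sum()  # NaN > 0 is False, so missing cells are not counted

print(totals)
print(positives)
```

Because wide_to_long relies on a shared name stub plus a numeric suffix, this only fits column names of that shape; arbitrary names such as col131 need an explicit mapping instead.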



Update: wide_to_long is just a convenient shortcut for merge/rename. If you have a dictionary {cat: [col_list]}, you can fall back to this:


groups = {'blue': ['col131', 'col254', 'col005'],
          'red': ['col023', 'col190', 'col053']}

# create the inverse dictionary for mapping: column name -> group name
# (each value in `groups` is a list, so it has to be unpacked column by column)
inv_group = {col: k for k, cols in groups.items() for col in cols}

data = df.melt()

# map the original columns to group
data['group'] = data['variable'].map(inv_group)

# from now on, it's similar to other answers
# sum
data.groupby('group')['value'].sum()

# count
data['value'].gt(0).groupby(data['group']).sum()
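
The dictionary-based steps above can be tried end to end with the sketch below. The column names come from the question's note, but the values in `df` are invented purely for illustration, and `long_df`, `sums` and `counts` are hypothetical names.

```python
import pandas as pd

# Made-up values; only the column names are taken from the question.
df = pd.DataFrame({
    "col131": [1, 0, 2],
    "col254": [0, 0, 3],
    "col005": [4, 1, 0],
    "col023": [2, 2, 2],
    "col190": [0, 5, 0],
    "col053": [1, 0, 0],
})

groups = {"blue": ["col131", "col254", "col005"],
          "red": ["col023", "col190", "col053"]}

# Invert {group: [columns]} into {column: group}.
inv_group = {col: name for name, cols in groups.items() for col in cols}

# Long format: one row per (column, value) pair, then tag each row with its group.
long_df = df.melt()
long_df["group"] = long_df["variable"].map(inv_group)

sums = long_df.groupby("group")["value"].sum()
counts = long_df["value"].gt(0).groupby(long_df["group"]).sum()

print(sums)
print(counts)
```

Columns missing from `groups` map to NaN and are dropped by groupby by default, so it can be worth checking `long_df["group"].isna().any()` before trusting the totals.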





# Get rid of the numbers + reshape
df.columns = pd.Index(df.columns.str.rstrip('0123456789'), name='color')
df = df.melt()

df.groupby('color').sum()
#       value
#color
#blue      15
#grey      22
#red       34

df.value.gt(0).groupby(df.color).sum()
#color
#blue     7.0
#grey    10.0
#red     13.0
#Name: value, dtype: float64




# Unnecessary in this case, but more general
# (start again from the original wide df, not the melted one above)
d = {'grey1': 'color_1', 'grey2': 'color_1',
     'red1': 'color_2', 'red2': 'color_2', 'red3': 'color_2',
     'blue1': 'color_3', 'blue2': 'color_3'}

df.columns = pd.Index(df.columns.map(d), name='color')
df = df.melt()
df.groupby('color').sum()

#         value
#color
#color_1     22
#color_2     34
#color_3     15
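
The answer above overwrites `df` in place, which makes it awkward to re-run. A possible variant, sketched here under my own naming (`long_df`, `sums`, `counts` are not from the answer), melts first and derives the color from the melted column names, so the original frame stays untouched. The data is the question's sample table copied by hand.

```python
import pandas as pd

# Sample table copied from the question's printed output.
df = pd.DataFrame(
    [[4, 3, 0, 2, 4, 0, 2],
     [4, 2, 0, 4, 0, 3, 1],
     [1, 1, 3, 1, 1, 3, 1],
     [4, 4, 1, 4, 1, 1, 1],
     [3, 4, 1, 0, 3, 3, 1]],
    columns=["grey2", "red1", "blue1", "red2", "red3", "blue2", "grey1"],
)

# Melt into long format, keeping the original column name in "column".
long_df = df.melt(var_name="column")

# Strip trailing digits to get the color ("red3" -> "red").
long_df["color"] = long_df["column"].str.rstrip("0123456789")

sums = long_df.groupby("color")["value"].sum()
counts = long_df["value"].gt(0).groupby(long_df["color"]).sum()

print(sums)
print(counts)
```

Since `df` is never reassigned, the sum and the count can be recomputed from the same wide frame as often as needed.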




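Use:

# groupby(..., axis=1) is deprecated in recent pandas (2.1+), so group the transposed frame instead
df.T.groupby(df.columns.str[:-1]).sum().sum(axis=1)

Output: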


blue    15
grey    22
red     34
dtype: int64
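
This answer only shows the sum. The count of positive values can probably be had the same way by comparing with zero before grouping. The sketch below is an assumption on my part rather than part of the answer: it groups the transposed frame, which avoids the axis=1 argument that recent pandas versions deprecate, and the names `keys`, `sums` and `counts` are hypothetical.

```python
import pandas as pd

# Sample table copied from the question's printed output.
df = pd.DataFrame(
    [[4, 3, 0, 2, 4, 0, 2],
     [4, 2, 0, 4, 0, 3, 1],
     [1, 1, 3, 1, 1, 3, 1],
     [4, 4, 1, 4, 1, 1, 1],
     [3, 4, 1, 0, 3, 3, 1]],
    columns=["grey2", "red1", "blue1", "red2", "red3", "blue2", "grey1"],
)

# Group key per column: the name without its last character ("red3" -> "red").
keys = df.columns.str[:-1]

# After transposing, each original column is a row, so an ordinary row-wise
# groupby collapses the columns of one color; sum(axis=1) then adds up the rows.
sums = df.T.groupby(keys).sum().sum(axis=1)
counts = df.gt(0).T.groupby(keys).sum().sum(axis=1)

print(sums)
print(counts)
```

Note that `str[:-1]` removes exactly one trailing character, so it only works for single-digit suffixes; names like col131 need a mapping dictionary or a regex-based key instead.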



...