pandas - Resampling within a Pandas MultiIndex


I have some hierarchical data which bottoms out into time series data, and it looks something like this:

df = pandas.DataFrame(
    {'value_a': values_a, 'value_b': values_b},
    index=[states, cities, dates])
df.index.names = ['State', 'City', 'Date']
df
                               value_a  value_b
State   City       Date                        
Georgia Atlanta    2012-01-01        0       10
                   2012-01-02        1       11
                   2012-01-03        2       12
                   2012-01-04        3       13
        Savanna    2012-01-01        4       14
                   2012-01-02        5       15
                   2012-01-03        6       16
                   2012-01-04        7       17
Alabama Mobile     2012-01-01        8       18
                   2012-01-02        9       19
                   2012-01-03       10       20
                   2012-01-04       11       21
        Montgomery 2012-01-01       12       22
                   2012-01-02       13       23
                   2012-01-03       14       24
                   2012-01-04       15       25

I'd like to perform a time resampling per city, so something like

df.resample("2D").sum()

would output

                               value_a  value_b
State   City       Date                        
Georgia Atlanta    2012-01-01        1       21
                   2012-01-03        5       25
        Savanna    2012-01-01        9       29
                   2012-01-03       13       33
Alabama Mobile     2012-01-01       17       37
                   2012-01-03       21       41
        Montgomery 2012-01-01       25       45
                   2012-01-03       29       49

As it is, df.resample('2D').sum() gets me

TypeError: Only valid with DatetimeIndex or PeriodIndex

Fair enough, but I'd sort of expect this to work:

>>> df.swaplevel('Date', 'State').resample('2D').sum()
TypeError: Only valid with DatetimeIndex or PeriodIndex
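Both attempts fail for the same reason: without level= or on=, resample works on the frame's whole index, and a MultiIndex is not a DatetimeIndex even when one of its levels holds dates. Swapping levels only reorders them. The sketch below uses a hypothetical one-city miniature of the frame above; the exact error wording differs between pandas versions, so the messages are printed rather than relied on.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: one city, four days.
index = pd.MultiIndex.from_arrays(
    [['Georgia'] * 4, ['Atlanta'] * 4, pd.date_range('2012-01-01', periods=4)],
    names=['State', 'City', 'Date'],
)
df = pd.DataFrame({'value_a': range(4), 'value_b': range(10, 14)}, index=index)

# The index as a whole is a MultiIndex; only its Date level is datetime-typed.
print(type(df.index).__name__)
print(type(df.index.get_level_values('Date')).__name__)

# Swapping levels still leaves a MultiIndex, so resample rejects both frames.
errors = []
for frame in (df, df.swaplevel('Date', 'State')):
    try:
        frame.resample('2D').sum()
    except TypeError as exc:
        errors.append(str(exc))
print(errors)
```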

At this point I'm really running out of ideas. Is there some way stack and unstack might be able to help me?

3 answers

import pandas as pd
import datetime as DT
values_a = range(16)
values_b = range(10, 26)
states = ['Georgia']*8 + ['Alabama']*8
cities = ['Atlanta']*4 + ['Savanna']*4 + ['Mobile']*4 + ['Montgomery']*4
dates = pd.DatetimeIndex([DT.date(2012, 1, 1) + DT.timedelta(days=i) for i in range(4)]*4)
df = pd.DataFrame(
    {'value_a': values_a, 'value_b': values_b},
    index=[states, cities, dates])
df.index.names = ['State', 'City', 'Date']
# Move State and City into columns so that only the Date level (a DatetimeIndex) remains.
df.reset_index(level=[0, 1], inplace=True)
# Group by State/City, then resample each group's Date index into 2-day bins.
print(df.groupby(['State', 'City'])[['value_a', 'value_b']].resample('2D').sum())

yields

                               value_a  value_b
State   City       Date                        
Alabama Mobile     2012-01-01       17       37
                   2012-01-03       21       41
        Montgomery 2012-01-01       25       45
                   2012-01-03       29       49
Georgia Atlanta    2012-01-01        1       21
                   2012-01-03        5       25
        Savanna    2012-01-01        9       29
                   2012-01-03       13       33
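A related sketch, not the answer's method: pd.Grouper can target MultiIndex levels by name, and giving the Date level a freq bins it into 2-day buckets, so the reset_index step is not needed. It assumes a reasonably recent pandas in which Grouper accepts level together with freq. The bins are computed over the whole Date level rather than per city, which gives the same result here because every city covers the same dates.

```python
import pandas as pd

# Rebuild the sample frame from the question with an explicit MultiIndex.
states = ['Georgia'] * 8 + ['Alabama'] * 8
cities = ['Atlanta'] * 4 + ['Savanna'] * 4 + ['Mobile'] * 4 + ['Montgomery'] * 4
dates = pd.DatetimeIndex(pd.date_range('2012-01-01', periods=4).tolist() * 4)
index = pd.MultiIndex.from_arrays([states, cities, dates],
                                  names=['State', 'City', 'Date'])
df = pd.DataFrame({'value_a': range(16), 'value_b': range(10, 26)}, index=index)

# State and City act as plain group keys; the Date level is binned into 2-day buckets.
result = df.groupby([
    pd.Grouper(level='State'),
    pd.Grouper(level='City'),
    pd.Grouper(level='Date', freq='2D'),
]).sum()
print(result)
```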

Another way, using stack/unstack:

df.unstack(level=[0,1]).resample('2D').sum().stack(level=[2,1]).swaplevel(2,0)
                               value_a  value_b
State   City       Date
Georgia Atlanta    2012-01-01        1       21
Alabama Mobile     2012-01-01       17       37
        Montgomery 2012-01-01       25       45
Georgia Savanna    2012-01-01        9       29
        Atlanta    2012-01-03        5       25
Alabama Mobile     2012-01-03       21       41
        Montgomery 2012-01-03       29       49
Georgia Savanna    2012-01-03       13       33

Notes:

  1. I don't know how the performance compares.
  2. Possibly a pandas bug: stack(level=[2,1]) works, but stack(level=[1,2]) fails.
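Regarding note 2, a sketch that avoids depending on level positions: pass level names to unstack and stack, then restore the intended order with reorder_levels. The dropna(how='all') call is only a safeguard against placeholder rows for State/City pairs that never occur, and astype('int64') undoes the float upcast that some pandas versions apply while stacking. Some pandas 2.x releases may also emit a FutureWarning about the stack implementation here.

```python
import pandas as pd

# Rebuild the sample frame from the question with an explicit MultiIndex.
states = ['Georgia'] * 8 + ['Alabama'] * 8
cities = ['Atlanta'] * 4 + ['Savanna'] * 4 + ['Mobile'] * 4 + ['Montgomery'] * 4
dates = pd.DatetimeIndex(pd.date_range('2012-01-01', periods=4).tolist() * 4)
index = pd.MultiIndex.from_arrays([states, cities, dates],
                                  names=['State', 'City', 'Date'])
df = pd.DataFrame({'value_a': range(16), 'value_b': range(10, 26)}, index=index)

wide = df.unstack(level=['State', 'City'])   # index: Date; columns: (value, State, City)
binned = wide.resample('2D').sum()           # 2-day bins on the Date index
long = (
    binned.stack(level=['City', 'State'])       # move City and State back into the index
    .dropna(how='all')                          # safeguard: drop rows for pairs that never occur
    .reorder_levels(['State', 'City', 'Date'])  # restore the level order by name
    .sort_index()
    .astype('int64')                            # stacking may have upcast to float
)
print(long)
```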

This works:

>>> df.reset_index('Date').groupby(level=[0,1]).apply(lambda x: x.set_index('Date').resample('2D').sum())
                               value_a  value_b
State   City       Date
Alabama Mobile     2012-01-01       17       37
                   2012-01-03       21       41
        Montgomery 2012-01-01       25       45
                   2012-01-03       29       49
Georgia Atlanta    2012-01-01        1       21
                   2012-01-03        5       25
        Savanna    2012-01-01        9       29
                   2012-01-03       13       33

If the Date column holds strings, convert it to datetime first:

df['Date'] = pd.to_datetime(df['Date'])
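A minimal sketch of that step, assuming a hypothetical two-city frame whose Date column holds ISO date strings and whose State and City are index levels (the layout the apply-based answer expects). After pd.to_datetime, the per-group set_index('Date') produces a DatetimeIndex, so resample accepts it.

```python
import pandas as pd

# Hypothetical frame: Date is a plain string column; State and City form the index.
df = pd.DataFrame({
    'State': ['Georgia'] * 4 + ['Alabama'] * 4,
    'City': ['Atlanta'] * 4 + ['Mobile'] * 4,
    'Date': ['2012-01-01', '2012-01-02', '2012-01-03', '2012-01-04'] * 2,
    'value_a': [0, 1, 2, 3, 8, 9, 10, 11],
    'value_b': [10, 11, 12, 13, 18, 19, 20, 21],
}).set_index(['State', 'City'])

# resample() needs a DatetimeIndex, so parse the strings first.
df['Date'] = pd.to_datetime(df['Date'])

# Per (State, City) group: make Date the index, then sum into 2-day bins.
result = df.groupby(level=['State', 'City']).apply(
    lambda g: g.set_index('Date').resample('2D').sum())
print(result)
```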
...