python 2.7 - Grouping data by value ranges -
i have csv file shows parts on order. columns include days late, qty , commodity.
i need group data days late , commodity sum of qty. days late needs grouped ranges.
>56 >35 , <= 56 >14 , <= 35 >0 , <=14
i hoping use dict how.
{'red':'>56,'amber':'>35 , <= 56','yellow':'>14 , <= 35','white':'>0 , <=14'}
i looking result this
red amber yellow white strsub 56 60 74 40 botdwg 20 67 87 34
i new pandas don't know if possible @ all. provide advice.
thanks
suppose start data:
df = pd.dataframe({'id': ('strsub botdwg'.split())*4, 'days late': [60, 60, 50, 50, 20, 20, 10, 10], 'quantity': [56, 20, 60, 67, 74, 87, 40, 34]}) # days late id quantity # 0 60 strsub 56 # 1 60 botdwg 20 # 2 50 strsub 60 # 3 50 botdwg 67 # 4 20 strsub 74 # 5 20 botdwg 87 # 6 10 strsub 40 # 7 10 botdwg 34
then can find status category using pd.cut
. note default, pd.cut
splits series df['days late']
categories half-open intervals, (-1, 14], (14, 35], (35, 56], (56, 365]
:
df['status'] = pd.cut(df['days late'], bins=[-1, 14, 35, 56, 365], labels=false) labels = np.array('white yellow amber red'.split()) df['status'] = labels[df['status']] del df['days late'] print(df) # id quantity status # 0 strsub 56 red # 1 botdwg 20 red # 2 strsub 60 amber # 3 botdwg 67 amber # 4 strsub 74 yellow # 5 botdwg 87 yellow # 6 strsub 40 white # 7 botdwg 34 white
now use pivot
dataframe in desired form:
df = df.pivot(index='id', columns='status', values='quantity')
and use reindex
obtain desired order rows , columns:
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
thus,
import numpy np import pandas pd df = pd.dataframe({'id': ('strsub botdwg'.split())*4, 'days late': [60, 60, 50, 50, 20, 20, 10, 10], 'quantity': [56, 20, 60, 67, 74, 87, 40, 34]}) df['status'] = pd.cut(df['days late'], bins=[-1, 14, 35, 56, 365], labels=false) labels = np.array('white yellow amber red'.split()) df['status'] = labels[df['status']] del df['days late'] df = df.pivot(index='id', columns='status', values='quantity') df = df.reindex(columns=labels[::-1], index=df.index[::-1]) print(df)
yields
red amber yellow white id strsub 56 60 74 40 botdwg 20 67 87 34
Comments
Post a Comment