I want to group similar rows of my data together. Here is my data frame:
SKU
FATUT
GUYGE
FATUT-01
SUPAU
GUYPE
SUPAU-01
FATUT-02
GUYGE-01
My expected data frame looks like this:
SKU
FATUT
FATUT-01
FATUT-02
GUYGE
GUYGE-01
SUPAU
SUPAU-01
GUYPE
I want rows that belong to the same group to appear consecutively.
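To reproduce the example, the frame can be built as below. This is a minimal sketch that assumes the data is a single string column named SKU with a default integer index; the question does not show how the frame was actually created.

```python
import pandas as pd

# Rebuild the example frame: one string column named "SKU",
# rows in the same order as listed in the question.
df = pd.DataFrame({
    "SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU", "GUYPE",
            "SUPAU-01", "FATUT-02", "GUYGE-01"]
})
print(df)
```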
One option is to use groupby with the parameter sort=False, then concatenate the split DataFrames.
How it works:
Group df by the strings before the dash.
groupby sorts by the group keys by default; when we specify sort=False, the keys are kept in the order in which they first appear in df, i.e. "GUYPE" stays behind "SUPAU".
A groupby object holds information about the groups and can be iterated over like the items of a dictionary, yielding (key, DataFrame) pairs. Unpack those pairs in a generator expression that returns the grouped DataFrames.
Using concat, concatenate the split DataFrames into one; with ignore_index=True, the index coming from the split DataFrames is discarded and the result gets a fresh index.
# .str[0] takes the part before the dash as the group key
out = pd.concat((d for _, d in df.groupby(df['SKU'].str.split('-').str[0], sort=False)), ignore_index=True)
        SKU
0     FATUT
1  FATUT-01
2  FATUT-02
3     GUYGE
4  GUYGE-01
5     SUPAU
6  SUPAU-01
7     GUYPE
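The same pipeline can be unrolled step by step to see what each piece contributes. This is only an illustrative sketch: the intermediate names keys, grouped, group_order and parts are introduced here and are not part of the answer's one-liner, and the frame is rebuilt on the assumption that SKU is a plain string column.

```python
import pandas as pd

df = pd.DataFrame({
    "SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU", "GUYPE",
            "SUPAU-01", "FATUT-02", "GUYGE-01"]
})

# Step 1: the group key is the text before the dash
# (the whole string when there is no dash).
keys = df["SKU"].str.split("-").str[0]

# Step 2: group without sorting, so groups keep first-appearance order.
grouped = df.groupby(keys, sort=False)

# Step 3: iterating over the groupby object yields (key, sub-DataFrame) pairs.
group_order = [key for key, _ in grouped]
parts = [part for _, part in grouped]

# Step 4: stitch the pieces back together and renumber the rows.
out = pd.concat(parts, ignore_index=True)

print(group_order)
print(out)
```

Within each group the rows keep their original relative order, which is why FATUT-01 comes before FATUT-02 without any explicit sorting.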
For your task, though, sort_values might work as well, even if the order is not exactly the same as in the desired output:
df = df.sort_values(by='SKU', ignore_index=True)
        SKU
0     FATUT
1  FATUT-01
2  FATUT-02
3     GUYGE
4  GUYGE-01
5     GUYPE
6     SUPAU
7  SUPAU-01
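To see exactly where the two approaches differ, both results can be put side by side. This is a small sketch under the same assumption about the input frame; the names by_group, by_sort and differs are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "SKU": ["FATUT", "GUYGE", "FATUT-01", "SUPAU", "GUYPE",
            "SUPAU-01", "FATUT-02", "GUYGE-01"]
})

# Approach 1: groupby with sort=False keeps groups in first-appearance order.
by_group = pd.concat(
    (d for _, d in df.groupby(df["SKU"].str.split("-").str[0], sort=False)),
    ignore_index=True,
)

# Approach 2: plain lexicographic sort of the SKU strings.
by_sort = df.sort_values(by="SKU", ignore_index=True)

# Both results share the index 0..7, so they can be compared row by row.
differs = by_group["SKU"] != by_sort["SKU"]
print(pd.DataFrame({"groupby": by_group["SKU"],
                    "sort_values": by_sort["SKU"],
                    "differs": differs}))
```

Only the last three rows differ: sort_values places GUYPE before SUPAU alphabetically, while the groupby version keeps SUPAU first because it appears first in df.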