I have a txt file like below,
140037|1|TOP SOIL DARK BROWN CLAY RICH ORGANIC|0|0.8
140037|2|MATER SOFT||
140037|3|SANDY CLAY SOFT MOTTLED GREY/ORANGE|0.8|1
140037|4|BROWN <15% SAND GRAINS||
140037|5|CLAY MOTTLED DARK GREY/ORANGE BROWN|1|3
140037|6|SOFT BECOMING FIRM MINOR SILT AND||
140037|7|FINE SAND IN SOME LAYERS||
and want to make it like this.
140037|1|TOP SOIL DARK BROWN CLAY RICH ORGANIC MATER SOFT|0|0.8
140037|2|SANDY CLAY SOFT MOTTLED GREY/ORANGE BROWN <15% SAND GRAINS|0.8|1
140037|3|CLAY MOTTLED DARK GREY/ORANGE BROWN SOFT BECOMING FIRM MINOR SILT AND FINE SAND IN SOME LAYERS|1|3
I am using pandas to read file but not sure how to use merge. Any help will be appreciated.
>Solution :
Use:
#create DataFrame
df = pd.read_csv(file, sep="|", header=None)
#create groups by non missing values in 3,4 columns and aggregate by join and first
df1 = (df.assign(g = df[[3,4]].notna().any(axis=1).cumsum())
.groupby([0,'g'], as_index=False)
.agg({2:' '.join, 3:'first', 4:'first'}))
print (df1)
0 g 2 3 4
0 140037 1 TOP SOIL DARK BROWN CLAY RICH ORGANIC MATER SOFT 0.0 0.8
1 140037 2 SANDY CLAY SOFT MOTTLED GREY/ORANGE BROWN <15%... 0.8 1.0
2 140037 3 CLAY MOTTLED DARK GREY/ORANGE BROWN SOFT BECOM... 1.0 3.0
#write to new file
df1.to_csv(new_file, index=False, header=False, sep='|')