I have a tsv file with the following values :
Jan year Feb year March year
23 1992 34 1991 43 1993
25 1990 36 1993 45 1990
21 1993 38 1992 47 1991
27 1991 30 1990 49 1992
How can I merge all the year columns to a single column so that I can have something like this :
year Jan Feb March
1990 25 30 45
1991 27 34 47
1992 23 38 49
1993 21 36 43
I’m using pandas, and since the years are shuffled, hence I’m unable to consolidate them.
Please let me know on what needs to be done to achieve the same.
Here is the data source, and in a nutshell, I want a single year column instead of multiple year columns.
>Solution :
Group your columns by virtual groups:
out = df.groupby(np.repeat(np.arange(len(df.columns) // 2), 2), axis=1) \
.apply(lambda x: x.set_index('year')).droplevel(0, axis=1).reset_index()
print(out)
# Output
year Jan Feb March
0 1990 25 30 45
1 1991 27 34 47
2 1992 23 38 49
3 1993 21 36 43
With your data, use this code before:
df = pd.read_csv('England_SE_and_Central_S.txt', skiprows=5, sep='\s+')
df.columns = df.columns.str.split('.').str[0]
out = df.groupby(np.repeat(np.arange(len(df.columns) // 2), 2), axis=1) \
.apply(lambda x: x.set_index('year')).droplevel(0, axis=1).reset_index()
Output:
>>> out
year jan feb mar apr may jun jul aug sep oct nov dec win spr sum aut ann
0 1884 3.5 2.6 2.5 2.5 6.3 9.0 11.6 11.5 10.5 4.8 2.1 2.1 --- 3.77 10.75 5.79 5.76
1 1885 -0.0 3.2 0.4 3.2 4.8 9.2 10.8 9.4 8.3 3.9 3.2 0.4 1.69 2.78 9.82 5.11 4.72
2 1886 -1.1 -1.8 0.2 3.4 6.1 8.6 11.0 11.1 9.9 7.8 3.1 -1.0 -0.80 3.23 10.27 6.91 4.81
3 1887 -1.6 -0.1 -0.1 1.2 5.3 9.2 11.4 10.0 7.6 2.7 1.7 -0.1 -0.92 2.13 10.23 3.98 3.96
4 1888 -0.3 -1.6 -0.1 2.0 5.3 8.9 10.1 10.1 8.5 2.8 5.6 2.0 -0.66 2.39 9.73 5.58 4.45
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
133 2017 0.2 3.6 5.5 4.2 8.7 11.9 13.2 11.8 9.8 9.1 3.3 2.0 2.07 6.14 12.29 7.44 6.96
134 2018 2.9 -0.5 2.1 6.7 8.0 11.1 13.7 12.5 9.3 6.9 5.3 4.4 1.55 5.60 12.47 7.16 6.92
135 2019 0.7 2.2 4.8 4.4 6.4 10.5 12.8 12.4 10.2 7.5 3.6 3.1 2.43 5.23 11.95 7.12 6.59
136 2020 3.8 3.8 3.2 5.1 7.1 10.9 11.7 14.1 9.9 7.8 5.9 2.8 3.55 5.12 12.24 7.87 7.18
137 2021 0.7 2.3 3.1 1.2 6.0 11.5 13.3 12.1 11.6 8.8 3.9 4.6 1.92 3.44 12.28 8.10 6.61
[138 rows x 18 columns]