Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Grouping non-sorted year column values to a single column using pandas

I have a tsv file with the following values :

Jan year  Feb year  March year
23  1992  34  1991   43   1993
25  1990  36  1993   45   1990
21  1993  38  1992   47   1991
27  1991  30  1990   49   1992

How can I merge all the year columns to a single column so that I can have something like this :

year  Jan Feb March
1990  25   30  45
1991  27   34  47
1992  23   38  49
1993  21   36  43

I’m using pandas, and since the years are shuffled, hence I’m unable to consolidate them.
Please let me know on what needs to be done to achieve the same.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here is the data source, and in a nutshell, I want a single year column instead of multiple year columns.

>Solution :

Group your columns by virtual groups:

out = df.groupby(np.repeat(np.arange(len(df.columns) // 2), 2), axis=1) \
        .apply(lambda x: x.set_index('year')).droplevel(0, axis=1).reset_index()
print(out)

# Output
   year  Jan  Feb  March
0  1990   25   30     45
1  1991   27   34     47
2  1992   23   38     49
3  1993   21   36     43

With your data, use this code before:

df = pd.read_csv('England_SE_and_Central_S.txt', skiprows=5, sep='\s+')
df.columns = df.columns.str.split('.').str[0]

out = df.groupby(np.repeat(np.arange(len(df.columns) // 2), 2), axis=1) \
        .apply(lambda x: x.set_index('year')).droplevel(0, axis=1).reset_index()

Output:

>>> out
     year  jan  feb  mar  apr  may   jun   jul   aug   sep  oct  nov  dec    win   spr    sum   aut   ann
0    1884  3.5  2.6  2.5  2.5  6.3   9.0  11.6  11.5  10.5  4.8  2.1  2.1    ---  3.77  10.75  5.79  5.76
1    1885 -0.0  3.2  0.4  3.2  4.8   9.2  10.8   9.4   8.3  3.9  3.2  0.4   1.69  2.78   9.82  5.11  4.72
2    1886 -1.1 -1.8  0.2  3.4  6.1   8.6  11.0  11.1   9.9  7.8  3.1 -1.0  -0.80  3.23  10.27  6.91  4.81
3    1887 -1.6 -0.1 -0.1  1.2  5.3   9.2  11.4  10.0   7.6  2.7  1.7 -0.1  -0.92  2.13  10.23  3.98  3.96
4    1888 -0.3 -1.6 -0.1  2.0  5.3   8.9  10.1  10.1   8.5  2.8  5.6  2.0  -0.66  2.39   9.73  5.58  4.45
..    ...  ...  ...  ...  ...  ...   ...   ...   ...   ...  ...  ...  ...    ...   ...    ...   ...   ...
133  2017  0.2  3.6  5.5  4.2  8.7  11.9  13.2  11.8   9.8  9.1  3.3  2.0   2.07  6.14  12.29  7.44  6.96
134  2018  2.9 -0.5  2.1  6.7  8.0  11.1  13.7  12.5   9.3  6.9  5.3  4.4   1.55  5.60  12.47  7.16  6.92
135  2019  0.7  2.2  4.8  4.4  6.4  10.5  12.8  12.4  10.2  7.5  3.6  3.1   2.43  5.23  11.95  7.12  6.59
136  2020  3.8  3.8  3.2  5.1  7.1  10.9  11.7  14.1   9.9  7.8  5.9  2.8   3.55  5.12  12.24  7.87  7.18
137  2021  0.7  2.3  3.1  1.2  6.0  11.5  13.3  12.1  11.6  8.8  3.9  4.6   1.92  3.44  12.28  8.10  6.61

[138 rows x 18 columns]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading