Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Can I get a sub-DataFrame according to first letter in columns names?

I want to get only columns whose names start with 'Q1' and those starting with 'Q3', I know that this is possible by doing:

new_df=df[['Q1_1', 'Q1_2', 'Q1_3','Q3_1', 'Q3_2', 'Q3_3']]

But since my real df is too large (more than 70 variables) I search a way to get the new_df by using only desired first letters in the columns titles.

My example dataframe is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df=pd.DataFrame({
    'Q1_1': [np.random.randint(1,100) for i in range(10)],
    'Q1_2': np.random.random(10),
    'Q1_3': np.random.randint(2, size=10),
    'Q2_1': [np.random.randint(1,100) for i in range(10)],
    'Q2_2': np.random.random(10),
    'Q2_3': np.random.randint(2, size=10),
    'Q3_1': [np.random.randint(1,100) for i in range(10)],
    'Q3_2': np.random.random(10),
    'Q3_3': np.random.randint(2, size=10),
    'Q4_1': [np.random.randint(1,100) for i in range(10)],
    'Q4_2': np.random.random(10),
    'Q4_3': np.random.randint(2, size=10)
})

df has the following display:

    Q1_1    Q1_2    Q1_3    Q2_1    Q2_2    Q2_3    Q3_1    Q3_2    Q3_3    Q4_1    Q4_2    Q4_3
0   92  0.551722    1   36  0.063269    1   95  0.541573    1   91  0.521076    1
1   89  0.951076    1   82  0.853572    1   49  0.782290    1   98  0.232572    0
2   88  0.909953    1   19  0.544450    1   66  0.021061    1   51  0.951225    0
3   66  0.904642    1   17  0.727190    1   85  0.697792    0   35  0.412844    1
4   78  0.802783    1   23  0.634575    1   77  0.759861    0   55  0.460012    0
5   41  0.943271    1   63  0.460578    1   95  0.004986    1   89  0.970059    0
6   54  0.600558    0   18  0.031487    0   84  0.716314    0   84  0.636364    1
7   2   0.458006    0   95  0.029421    0   10  0.927356    1   27  0.031572    1
8   38  0.029658    1   30  0.125706    1   94  0.096702    1   32  0.241613    1
9   52  0.584300    1   85  0.026642    0   78  0.358952    0   70  0.696008    0

I want a simpler way to get the following sub-df:

    Q1_1    Q1_2    Q1_3    Q3_1    Q3_2    Q3_3
0   92  0.551722    1   95  0.541573    1
1   89  0.951076    1   49  0.782290    1
2   88  0.909953    1   66  0.021061    1
3   66  0.904642    1   85  0.697792    0
4   78  0.802783    1   77  0.759861    0
5   41  0.943271    1   95  0.004986    1
6   54  0.600558    0   84  0.716314    0
7   2   0.458006    0   10  0.927356    1
8   38  0.029658    1   94  0.096702    1
9   52  0.584300    1   78  0.358952    0

Please if you need more detail let me know in comments,

Any help from your side will be highly appreciated.

>Solution :

You can use pd.DataFrame.filter for this:

df.filter(regex = r'Q1_\d|Q3_\d')

   Q1_1      Q1_2  Q1_3  Q3_1      Q3_2  Q3_3
0     5  0.631041     0    46  0.768563     0
1    32  0.594106     1    46  0.982396     1
2    78  0.703139     1    38  0.252107     0
3    98  0.353230     0    35  0.324079     0
4    77  0.913203     1    11  0.456287     0
5    62  0.565350     1    77  0.387365     0
6    38  0.975652     1    59  0.276421     1
7    97  0.505808     1    84  0.035756     0
8    15  0.525452     0    57  0.675310     1
9    94  0.545259     0    25  0.628030     0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading