Concatenating a dataframe with another dataframe isn't a problem, but selecting a column with the [..] operator yields different results

I have an original dataframe og_df and a smaller dataframe sub_df whose values are a subset of og_df. I want to create a new dataframe new_df which contains each element of sub_df together with the elements that follow it in og_df, n rows per element in total.
Example:

import pandas as pd

og_df = pd.DataFrame({'column': range(20)})
sub_df = pd.DataFrame({'column': [1, 2, 10]})
n = 3
new_df = pd.DataFrame({'column': []})

for index in sub_df.index:
    # append n consecutive rows of og_df, starting at position index
    new_df = pd.concat([new_df, og_df.iloc[index:index + n]])
print(new_df)

>>>    column
    0    1
    1    2
    2    3
    1    2
    2    3
    3    4
    11   10
    12   11
    13   12

This works the way I want and gives the desired result. However, when I select a column with the [..] operator, whether og_df has several columns or just one, it behaves like this:

for index in sub_df.index:
    new_df = pd.concat([new_df, og_df['column'].iloc[index:index + n]])
print(new_df)

>>>    column     0
1      NaN   1.0
2      NaN   2.0
3      NaN   3.0
2      NaN   2.0
3      NaN   3.0
4      NaN   4.0
10     NaN  10.0
11     NaN  11.0
12     NaN  12.0

How can I make it behave as desired? I want to select just one column from a multi-column og_df, which is why I'm using the [..] operator.

Solution:

The issue is that you select the column with single brackets [] instead of double brackets [[]].

If you select the column with double brackets in your loop, the code works:

new_df = pd.DataFrame({'column':[]})
for index in sub_df.index:
    new_df = pd.concat([new_df, og_df[['column']].iloc[index:index + n]])
print(new_df)

Output:

   column
0     0.0
1     1.0
2     2.0
1     1.0
2     2.0
3     3.0
2     2.0
3     3.0
4     4.0
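
The output above starts at rows 0, 1 and 2 because the loop walks over sub_df.index, that is, the positions 0, 1 and 2 of sub_df. If the intent is to start at the values stored in sub_df (1, 2 and 10), as the outputs in the question suggest, a minimal sketch could iterate over sub_df['column'] instead. It also collects the slices in a list and concatenates once, so it does not need to start from an empty dataframe. The helper name take_following and the extra column other are hypothetical and only serve the illustration:

```python
import pandas as pd

# 'other' is an assumed second column, added to mimic a multi-column og_df
og_df = pd.DataFrame({'column': range(20), 'other': range(100, 120)})
sub_df = pd.DataFrame({'column': [1, 2, 10]})
n = 3


def take_following(frame, starts, n, column):
    """Hypothetical helper: for each start position, take n consecutive rows
    of one column. Double brackets keep every piece a DataFrame."""
    pieces = [frame[[column]].iloc[start:start + n] for start in starts]
    return pd.concat(pieces)


# iterate over the values of sub_df, not over its index
new_df = take_following(og_df, sub_df['column'], n, 'column')
print(new_df)
```

Because every piece is a one-column DataFrame with the same label, the result should keep a single column named column.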

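This happens because single brackets return a Series, which pd.concat placed in a separate column labeled 0 in the question's output, while double brackets return a one-column DataFrame that lines up with column: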

type(og_df['column']), type(og_df[['column']])
(pandas.core.series.Series, pandas.core.frame.DataFrame)
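
As an alternative to double brackets, a Series can be turned back into a one-column DataFrame with Series.to_frame(), which uses the Series name as the column label. The following is only a small sketch that reuses og_df from the question; the variable names as_series, as_frame, restored and combined are made up for the example:

```python
import pandas as pd

og_df = pd.DataFrame({'column': range(20)})

as_series = og_df['column']     # single brackets: 1-D Series
as_frame = og_df[['column']]    # double brackets: one-column DataFrame
print(type(as_series).__name__, type(as_frame).__name__)

# to_frame() converts the Series slice back into a DataFrame,
# using the Series name ('column') as the column label
restored = as_series.iloc[1:4].to_frame()

# both inputs are DataFrames with the same column, so they stack row-wise
combined = pd.concat([as_frame.iloc[0:1], restored])
print(combined)
```

Either form works inside the loop; double brackets are shorter, while to_frame() can be handy when the Series already exists.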