I have an original dataframe og_df and a smaller dataframe sub_df, which is a part of og_df. I want to create a new dataframe new_df that contains, for each element of sub_df, a run of n consecutive rows of og_df starting at that element.
Example:
import pandas as pd

og_df = pd.DataFrame({'column': range(20)})
sub_df = pd.DataFrame({'column': [ 1, 2, 10 ]})
n = 3
new_df = pd.DataFrame({'column':[]})
for index in sub_df.index:
    # append the n rows of og_df starting at position index
    new_df = pd.concat([new_df, og_df.iloc[index:index + n]])
print(new_df)
>>> column
0 1
1 2
2 3
1 2
2 3
3 4
11 10
12 11
13 12
This works the way I want and gives the desired result. However, when I select a column with the [..] operator (whether og_df has one column or several), it behaves like this:
for index in sub_df.index:
    new_df = pd.concat([new_df, og_df['column'].iloc[index:index + n]])
print(new_df)
>>> column 0
1 NaN 1.0
2 NaN 2.0
3 NaN 3.0
2 NaN 2.0
3 NaN 3.0
4 NaN 4.0
10 NaN 10.0
11 NaN 11.0
12 NaN 12.0
How can I make it behave as desired? I want to select just one column from a multi-column og_df, which is why I'm using the [..] operator.
>Solution :
The issue you're facing is that you're indexing with single brackets [] instead of double brackets [[]].
If you update your loop the code works:
new_df = pd.DataFrame({'column':[]})
for index in sub_df.index:
    new_df = pd.concat([new_df, og_df[['column']].iloc[index:index + n]])
print(new_df)
Output:
column
0 0.0
1 1.0
2 2.0
1 1.0
2 2.0
3 3.0
2 2.0
3 3.0
4 4.0
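This happens because single brackets return a Series, whereas double brackets return a DataFrame. As the output in the question shows, pd.concat places the Series values in a separate column labeled 0 instead of 'column', which leaves NaN in 'column'. You can check the types: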
type(og_df['column']), type(og_df[['column']])
(pandas.core.series.Series, pandas.core.frame.DataFrame)
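As a variation on the loop above, here is a minimal sketch that collects the slices in a list and calls pd.concat once. The extra column 'other' and the names pieces and same_df are made up for illustration; they are not part of the question. It also shows to_frame() as an alternative to double brackets when you already have a Series.

```python
import pandas as pd

# Assumption: a second column 'other' is added only to mimic a multi-column og_df.
og_df = pd.DataFrame({'column': range(20), 'other': range(20, 40)})
sub_df = pd.DataFrame({'column': [1, 2, 10]})
n = 3

# Double brackets keep a one-column DataFrame, so every piece shares the label 'column'.
pieces = [og_df[['column']].iloc[i:i + n] for i in sub_df.index]
new_df = pd.concat(pieces)

# Single brackets return a Series; to_frame() converts it back to a
# one-column DataFrame named after the Series before concatenating.
same_df = pd.concat(
    [og_df['column'].iloc[i:i + n].to_frame() for i in sub_df.index]
)

print(new_df)
```

Because this version does not start from an empty dataframe, the values keep their integer dtype; the empty starting frame is probably why the output above shows floats. Note that, like the loop in the answer, it slices at the index labels of sub_df (0, 1, 2). If the starting positions should be the values stored in sub_df, iterating over sub_df['column'] instead would be the change to try.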