Let’s say I have the following sample input dataframe below. Please note my real table contains hundreds of dates.
Date |
---|
2023-04-02 |
2023-07-14 |
The code to create above table is:
import pandas as pd
df = pd.DataFrame({'date':["2023-04-02","2023-07-14"]})
And I have a list of names:
names_list = ['Matthew Perry', "Amy Winehouse", "Ted Nugent"]
I want to expand the input table by adding a second column that contains names from the names_list. The dates in "Date" column would get repeated until a new date is hit.
My desired output table looks like this:
Date | Names |
---|---|
2023-04-02 | Matthew Perry |
2023-04-02 | Amy Winehouse |
2023-04-02 | Ted Nugent |
2023-07-14 | Matthew Perry |
2023-07-14 | Amy Winehouse |
2023-07-14 | Ted Nugent |
I searched around Stack Overflow but couldn’t find any code to do this. I’m familiar with left joins and concats but I don’t think either apply in this situation.
>Solution :
One option would be to build a Series
of the names and cross-merge
it with your df
:
out = df.merge(pd.Series(names_list).rename("Names"), how="cross")
Alternatively, make a product
and pass it to the DataFrame
constructor :
from itertools import product
out = pd.DataFrame(
product(df["date"], names_list), columns=["date", "Names"]
)
Output :
print(out)
date Names
0 2023-04-02 Matthew Perry
1 2023-04-02 Amy Winehouse
2 2023-04-02 Ted Nugent
3 2023-07-14 Matthew Perry
4 2023-07-14 Amy Winehouse
5 2023-07-14 Ted Nugent