I currently have a column in a pandas DataFrame, df, that looks like this:
| read_name |
|---|
| NB511043:297:HJJMHBGXJ:1:22110:22730:3876 |
| NB511043:297:HJJMHBGXJ:4:22609:8139:4265 |
| NB511043:298:HT6KCBGXJ:1:13311:16766:2025 |
What I'm hoping to do is extract the 5th and 7th colon-separated fields of each string in this column and append them to the same dataframe as new columns, like so:
| read_name | 5th element | 7th element |
|---|---|---|
| NB511043:297:HJJMHBGXJ:1:22110:22730:3876 | 22110 | 3876 |
| NB511043:297:HJJMHBGXJ:4:22609:8139:4265 | 22609 | 4265 |
| NB511043:298:HT6KCBGXJ:1:13311:16766:2025 | 13311 | 2025 |
My current method is to create a whole new dataframe by using str.split on everything in read_name, and then assign the relevant columns of that new dataframe back to df, like so:
df_read_name = df['read_name'].str.split(":", n=6, expand=True)
df['5th element'] = pd.to_numeric(df_read_name[4])
df['7th element'] = pd.to_numeric(df_read_name[6])
However, I think this is a bit cumbersome and was hoping there might be a faster approach.
As always, any help is appreciated!
>Solution :
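You can do it in a single statement: use .str.split with expand=True, select columns 4 and 6 of the result (the 5th and 7th fields, counting from 0), and assign them to both new columns at once: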
df[["5th element", "7th element"]] = df["read_name"].str.split(":", expand=True)[[4, 6]].astype(int)
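For anyone who wants to try this out, here is a minimal, self-contained sketch that rebuilds the sample data from the question and applies the same idea. The intermediate name `parts` and the `fifth` variable are only illustrative and are not part of the original answer. It assumes every read_name has at least seven colon-separated fields and that the 5th and 7th fields are always plain integers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "read_name": [
            "NB511043:297:HJJMHBGXJ:1:22110:22730:3876",
            "NB511043:297:HJJMHBGXJ:4:22609:8139:4265",
            "NB511043:298:HT6KCBGXJ:1:13311:16766:2025",
        ]
    }
)

# Split each string on ":" into separate columns (labelled 0, 1, 2, ...).
parts = df["read_name"].str.split(":", expand=True)

# Keep only positions 4 and 6 (the 5th and 7th fields), convert them to
# integers, and assign them to the two new columns in order.
df[["5th element", "7th element"]] = parts[[4, 6]].astype(int)

# Possible alternative for a single field: without expand=True each row
# holds a list, and .str[i] picks the i-th item from that list.
fifth = df["read_name"].str.split(":").str[4].astype(int)

print(df)
```

If some rows might contain non-numeric fields, `pd.to_numeric(..., errors="coerce")` from the question's version is probably the safer choice, since `astype(int)` raises an error on values it cannot convert.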