I have a nested list called element_text, for example:
[[1, 'the'], [1, 'quick brown'], [2, 'fox jumped'], [2, 'over'], [2, 'the'], [3, 'lazy goat']]
I would like to merge its elements and return a new list called page_text, like so:
[[1, 'the quick brown'], [2, 'fox jumped over the'], [3, 'lazy goat']]
So, when entries have the same first number (the page number), their text strings should be joined together with a space in between.
I’ve tried:
page_text = []
for i in element_text:
    # join the list of strings together if the page number is the same
    if i[0] == i[0]:
        text = " ".join(i[1])
        page_text.append([i[0], text])
But this doesn't merge anything: it still returns one entry per element, and each string comes out with its characters separated by spaces (e.g. 't h e').
Any help appreciated!
Thanks,
Carolina
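For reference, the loop above never merges entries because it compares each entry with itself (i[0] == i[0] is always true), and " ".join(i[1]) joins the characters of one string instead of joining several strings. A minimal corrected sketch of the same loop, assuming (as in the example) that all entries for a page are adjacent, compares each entry with the last one already added to page_text:

```python
element_text = [[1, 'the'], [1, 'quick brown'], [2, 'fox jumped'],
                [2, 'over'], [2, 'the'], [3, 'lazy goat']]

page_text = []
for page, text in element_text:
    if page_text and page_text[-1][0] == page:
        # Same page as the previous merged entry: append with a space
        page_text[-1][1] += ' ' + text
    else:
        # New page number: start a new [page, text] entry
        page_text.append([page, text])

print(page_text)
# [[1, 'the quick brown'], [2, 'fox jumped over the'], [3, 'lazy goat']]
```

If the same page number can reappear later in the list, this version starts a second entry for it; the grouping approach in the solution does not have that limitation.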
Solution:
You can use pandas: group the records by their number, join each group's strings with a space into a new column, and then keep one row per number.
import pandas as pd

data = [[1, 'the'], [1, 'quick brown'], [2, 'fox jumped'], [2, 'over'], [2, 'the'], [3, 'lazy goat']]
df = pd.DataFrame(data, columns=['num', 'text'])
# For every row, store the space-joined text of all rows with the same num
df['full_text'] = df.groupby('num')['text'].transform(lambda x: ' '.join(x))
# Keep only the first row for each num, which now carries the merged text
df = df[['num', 'full_text']].drop_duplicates(subset='num')
df.head()
#    num            full_text
# 0    1      the quick brown
# 2    2  fox jumped over the
# 5    3            lazy goat
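If you would rather not depend on pandas, the same grouping can be sketched with a plain dict. Like groupby, it collects every row with the same number even when those rows are not adjacent, and it produces the nested page_text list the question asks for. One difference: a dict keeps pages in order of first appearance, whereas pandas groupby sorts the keys by default; for the example data both give the same order.

```python
element_text = [[1, 'the'], [1, 'quick brown'], [2, 'fox jumped'],
                [2, 'over'], [2, 'the'], [3, 'lazy goat']]

# Collect the text fragments for each page number, preserving their order
texts_by_page = {}
for page, text in element_text:
    texts_by_page.setdefault(page, []).append(text)

# Join each page's fragments with a space and build the nested list
page_text = [[page, ' '.join(texts)] for page, texts in texts_by_page.items()]

print(page_text)
# [[1, 'the quick brown'], [2, 'fox jumped over the'], [3, 'lazy goat']]
```

With the pandas version, df.values.tolist() turns the de-duplicated DataFrame into the same kind of nested list.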