Two Letter Bigram in Pandas Dataframe

I'm having trouble finding a way to get every two-letter combination (character bigram) from each string in a DataFrame column. Everything I have found so far works on words rather than letters. Below is the expected output.

string  output
hello   he, el, ll, lo
world   wo, or, rl, ld

I have tried the three lines below:

df['bigram'] = list(zip(df['string'], df['string'][1:]))

Generated this error

ValueError: Length of values (15570) does not match length of index (15571)

df['bigram'] = list(ngrams(df['string'], n=2))

Generated this error

ValueError: Length of values (15570) does not match length of index (15571)

df['bigram']=re.findall(r'[a-zA-z]{2}', df['string'])

Generated this error

TypeError: expected string or bytes-like object

Example:

string output
hello he, el, ll, lo
world wo, or, rl, ld

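Solution:

Each attempt operates on the whole Series at once rather than on the individual strings in it. You need to loop over the strings and build the bigrams for each one: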

import pandas as pd
from nltk import ngrams

df = pd.DataFrame({'string': ['abc', 'abcdef']})

# ngrams() iterates over the characters of each string and yields 2-tuples
df['bigram'] = df['string'].apply(lambda x: list(ngrams(x, n=2)))

Output:

   string                                    bigram
0     abc                          [(a, b), (b, c)]
1  abcdef  [(a, b), (b, c), (c, d), (d, e), (e, f)]

If you want a string:

# a string of length n has n-1 bigrams, so the start index runs over range(len(x)-1)
df['bigram'] = [', '.join([x[i:i+2] for i in range(len(x)-1)])
                for x in df['string']]

Output:

   string              bigram
0     abc              ab, bc
1  abcdef  ab, bc, cd, de, ef
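
For reference, the same idea can be sketched without nltk and run against the strings from the question. This is a minimal sketch, not part of the original answer: the helper name `char_bigrams` is hypothetical, and it assumes every value in the column is a plain Python string (missing values would need to be handled separately).

```python
import pandas as pd


def char_bigrams(text: str) -> str:
    """Return the overlapping two-character substrings of text, comma-separated.

    Hypothetical helper for illustration; assumes text is a str.
    """
    # zip pairs each character with its successor:
    # 'hello' -> ('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')
    return ', '.join(a + b for a, b in zip(text, text[1:]))


df = pd.DataFrame({'string': ['hello', 'world']})

# Series.map calls the helper once per string, so each row gets its own bigrams
df['bigram'] = df['string'].map(char_bigrams)

print(df['bigram'].tolist())
# ['he, el, ll, lo', 'wo, or, rl, ld']
```

Strings shorter than two characters produce an empty string here, because `zip` stops as soon as the shorter input is exhausted.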