
How to find and calculate common letters between words in pandas

I have a dataset with some words in it, and I want to compare two columns and count the letters they have in common.

For example, I have:

import pandas as pd

data = {'Col_1' : ['Heaven', 'Jako', 'Sm', 'apizza'],
       'Col_2' : ['Heaven', 'Jakob', 'Smart', 'pizza']}
df = pd.DataFrame(data)

| Col_1  | Col_2  |
-------------------
| Heaven | Heaven |
| Jako   | Jakob  |
| Sm     | Smart  |
| apizza | pizza  |

And I want to get something like this:

| Col_1  | Col_2  | Match                          | Count |
------------------------------------------------------------
| Heaven | Heaven | ['H', 'e', 'a', 'v', 'e', 'n'] | 6     |
| Jako   | Jakob  | ['J', 'a', 'k', 'o']           | 4     |
| Sm     | Smart  | ['S', 'm']                     | 2     |
| apizza | pizza  | []                             | 0     |

>Solution :

You can use a list comprehension with the help of itertools.takewhile:

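from itertools import takewhile

# pair up the letters of both words and keep them until the first mismatch
df['Match'] = [[x for x,y in takewhile(lambda pair: pair[0]==pair[1], zip(a,b))]
               for a,b in zip(df['Col_1'], df['Col_2'])]
# each Match cell is a list, so .str.len() returns its length
df['Count'] = df['Match'].str.len()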

output:

    Col_1   Col_2               Match  Count
0  Heaven  Heaven  [H, e, a, v, e, n]      6
1    Jako   Jakob        [J, a, k, o]      4
2      Sm   Smart              [S, m]      2
3  apizza   pizza                  []      0

NB. the logic was not fully clear, so here the comparison stops as soon as there is a mismatch.
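To make the stop-at-first-mismatch behaviour easier to check outside the DataFrame, here is a minimal sketch that wraps the same takewhile logic in a plain function. The name common_prefix_letters is hypothetical (it is not part of pandas or itertools), and the word pairs are the ones from the question.

```python
from itertools import takewhile

def common_prefix_letters(a, b):
    """Return the letters a and b share position by position, up to the first mismatch.

    Hypothetical helper for illustration; same logic as the list comprehension above.
    """
    return [x for x, y in takewhile(lambda pair: pair[0] == pair[1], zip(a, b))]

# word pairs taken from the question's Col_1 / Col_2
pairs = [("Heaven", "Heaven"), ("Jako", "Jakob"), ("Sm", "Smart"), ("apizza", "pizza")]
prefix_results = [common_prefix_letters(a, b) for a, b in pairs]
prefix_counts = [len(match) for match in prefix_results]
print(prefix_counts)  # [6, 4, 2, 0]
```

Since zip stops at the shorter word, 'Jako' vs 'Jakob' yields four letters, and 'apizza' vs 'pizza' yields none because the very first pair ('a', 'p') already differs.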

If you want to continue after a mismatch (which doesn't seem to fit the "pizza" example):

df['Match'] = [[x for x,y in zip(a,b) if x==y]
               for a,b in zip(df['Col_1'], df['Col_2'])]
df['Count'] = df['Match'].str.len()

output:

    Col_1   Col_2               Match  Count
0  Heaven  Heaven  [H, e, a, v, e, n]      6
1    Jako   Jakob        [J, a, k, o]      4
2      Sm   Smart              [S, m]      2
3  apizza   pizza                 [z]      1
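
As a rough illustration of how this second variant differs from the first, the sketch below keeps every position where the two letters agree, even after a mismatch. The function name positional_matches is hypothetical, and the 'Heaven'/'Heavan' pair is an invented input that does not appear in the question.

```python
def positional_matches(a, b):
    """Return every letter of a that equals the letter of b at the same position.

    Hypothetical helper for illustration; unlike the takewhile version,
    it does not stop at the first mismatch.
    """
    return [x for x, y in zip(a, b) if x == y]

# 'apizza' / 'pizza' is from the question; 'Heaven' / 'Heavan' is an invented example
print(positional_matches("apizza", "pizza"))   # ['z']
print(positional_matches("Heaven", "Heavan"))  # ['H', 'e', 'a', 'v', 'n']
```

For 'Heaven' vs 'Heavan' the takewhile approach would stop after 'v' and count 4, while this variant also keeps the trailing 'n' and counts 5. Which of the two is right depends on what "common letters" is meant to cover.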