
Get the closest match for a column in a DataFrame

I have a DataFrame whose CallType column contains different call types, with the values below:

    CallType
0         IN
1        OUT
2       a_in
3       asms
4   INCOMING
5   OUTGOING
6  A2P_SMSIN
7        ain
8       aout

I want to map these values so that the output is:

    CallType
0       IN
1       OUT
2       IN
3       SMS
4       IN
5       OUT
6       SMS
7       IN
8       OUT

I am trying to use difflib.get_close_matches, but it returns no match for most values. Below is my code:


import difflib

import pandas as pd

CALL_TYPE = ['IN', 'OUT', 'SMS', 'VOICE', 'SMT']

def test1():
    final_file_data = pd.DataFrame({
        'CallType': ['IN', 'OUT', 'a_in',
                     'asms', 'INCOMING', 'OUTGOING', 'A2P_SMSIN',
                     'ain', 'aout']})

    print(final_file_data)
    # Keep only the single closest match from CALL_TYPE for each value
    final_file_data['CallType'] = final_file_data['CallType'].apply(
        lambda x: difflib.get_close_matches(x, CALL_TYPE, n=1))
    print(final_file_data)

test1()

The output I get is below; only IN and OUT get a match:

 CallType
0     [IN]
1    [OUT]
2       []
3       []
4       []
5       []
6       []
7       []
8       []

I am not sure where I am going wrong.

Solution:

The problem comes from two things: get_close_matches is case-sensitive, and its cutoff argument (default 0.6) discards any candidate whose similarity score falls below it. Converting x to upper case with upper() and lowering the cutoff fixes both. This is what I did:

final_file_data['CallType'] = final_file_data['CallType'].apply(lambda x: difflib.get_close_matches(x.upper(), CALL_TYPE, n=1, cutoff=0))

final_file_data is now:

  CallType
0     [IN]
1    [OUT]
2     [IN]
3    [SMS]
4     [IN]
5    [OUT]
6    [SMS]
7     [IN]
8    [OUT]
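To see why both changes are needed, the sketch below prints, for each sample value, the best candidate and its SequenceMatcher.ratio() score (the similarity measure get_close_matches compares against cutoff), with and without upper-casing. The helper name best_score is hypothetical. With the default cutoff of 0.6, lowercase values such as a_in score 0 against every candidate, and even after upper-casing, INCOMING (0.40 against IN), A2P_SMSIN (0.50 against SMS) and OUTGOING (about 0.55 against OUT) still fall below the cutoff.

```python
import difflib

CALL_TYPE = ['IN', 'OUT', 'SMS', 'VOICE', 'SMT']
VALUES = ['IN', 'OUT', 'a_in', 'asms', 'INCOMING',
          'OUTGOING', 'A2P_SMSIN', 'ain', 'aout']


def best_score(value, candidates):
    """Hypothetical helper: return (score, candidate) for the most similar candidate.

    Scores with SequenceMatcher.ratio(), the same measure get_close_matches uses.
    Ties are broken by candidate string, as with get_close_matches.
    """
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(value)  # get_close_matches also puts the query word in seq2
    scores = []
    for cand in candidates:
        matcher.set_seq1(cand)
        scores.append((matcher.ratio(), cand))
    return max(scores)


for v in VALUES:
    raw_score, raw_match = best_score(v, CALL_TYPE)
    up_score, up_match = best_score(v.upper(), CALL_TYPE)
    print(f"{v:>10}  as-is: {raw_match:<5} {raw_score:.2f}   "
          f"upper: {up_match:<5} {up_score:.2f}")
```

Any value whose best score is below the cutoff comes back as an empty list, which is exactly what the question's output shows.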

See the Python documentation for difflib.get_close_matches for more about the cutoff argument.
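The result above still holds one-element lists ([IN]) rather than the plain strings in the desired output, and cutoff=0 gives every input some label, even one that shares no characters with any call type (a stray value like xyz would come back as VOICE). The sketch below is one way to handle both, using a hypothetical helper closest_call_type: it upper-cases the value, unwraps the list, and returns None when nothing clears the cutoff. The cutoff of 0.3 is an assumption tuned to this sample only: the weakest best match here is INCOMING → IN at 0.40, so 0.3 keeps every sample value while rejecting strings with little or nothing in common.

```python
import difflib

import pandas as pd

CALL_TYPE = ['IN', 'OUT', 'SMS', 'VOICE', 'SMT']


def closest_call_type(value, choices=CALL_TYPE, cutoff=0.3):
    """Hypothetical helper: best-matching call type as a string, or None.

    Upper-cases the input (get_close_matches is case-sensitive) and
    unwraps the one-element list that get_close_matches returns.
    The 0.3 default is an assumption tuned to the sample values.
    """
    matches = difflib.get_close_matches(str(value).upper(), choices,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


df = pd.DataFrame({'CallType': ['IN', 'OUT', 'a_in', 'asms', 'INCOMING',
                                'OUTGOING', 'A2P_SMSIN', 'ain', 'aout']})
df['CallType'] = df['CallType'].map(closest_call_type)
print(df)
```

Values that match nothing end up as None, so they can be listed afterwards with df['CallType'].isna() instead of being silently mislabeled.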
