Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Replicate contains string on row and move with specific result to new column Pandas python

I need to move a specific selected value of row to a new column with selected only value, what method that I need to do this? For example, I need to get only the value chosen that contains 'KOTA|KAB' to a new column, from this case, what

I have some data:

+---------------------------------------------+
|                     A                       |
+---------------------------------------------+
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   |
+---------------------------------------------+

I need to get specific value only and move to a new column, what I expect is look like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

+---------------------------------------------+----------------------+
|                     A                       |           B          |
+---------------------------------------------+----------------------+
| JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135 | KOTA SURABAYA        |
| JL XXXX, KEC PORONG KAB SIDOARJO 61274      | KAB SIDOARJO         |
| DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471   | KAB BANYUWANGI       |
+---------------------------------------------+----------------------+

What I tried:

# initialize list of lists
testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'], 
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'], 
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]
  
# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])
  
for check in df_test['A']:
    test = re.sub(r'(\bKOTA\b)|(\bKAB\b)', '', check)
    print(test)

But the result above will remove the KOTA and KAB

>Solution :

Using re.sub removed the text that is matched with the pattern from the string.

You can use a single capture group with str.extract for example:

testing = [['JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135'],
           ['JL XXXX, KEC PORONG KAB SIDOARJO 61274'],
           ['DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471']]

# Create the pandas DataFrame
df_test = pd.DataFrame(testing, columns=['A'])

df_test['B'] = df_test["A"].str.extract(r'\b((?:KOTA|KAB) \w+)')
print (df_test)

Output

                                             A               B
0  JL XXXX, KEC TAMBAKSARI KOTA SURABAYA 60135   KOTA SURABAYA
1       JL XXXX, KEC PORONG KAB SIDOARJO 61274    KAB SIDOARJO
2    DUSUN XXX, KEC SRONO KAB BANYUWANGI 68471  KAB BANYUWANGI
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading