Remove Specific characters in dataframe

January 20, 2022

I have a dataframe as below

Example 1
Some data frame always has 9 characters in tagname column before Last full stop ‘.’

| id | tagname                         | datetime                |
|----|---------------------------------|-------------------------|
| 1  | A2.WAM_A.ACCESS_CLOSE_FAULT_LH  | 2022-01-20 15:07:36.310 |
| 2  | A2.WAM_A.ACCESS_SENSOR_FAULT_RH | 2022-01-20 15:07:36.310 |
| 3  | A2.WAM_A.OUTPUT_POWER_CP_FAULT  | 2022-01-20 15:07:36.310 |

If I use,

df['tagname'] = df['tagname'].str[9:]

Output: –

| id | tagname                | datetime                |
|----|------------------------|-------------------------|
| 1  | ACCESS_CLOSE_FAULT_LH  | 2022-01-20 15:07:36.310 |
| 2  | ACCESS_SENSOR_FAULT_RH | 2022-01-20 15:07:36.310 |
| 3  | OUTPUT_POWER_CP_FAULT  | 2022-01-20 15:07:36.310 |

Example 2
But in some tables I have diff length characters & multiple dot’s in tagname before Last full stop ‘.’, like below

| id | tagname                                      | datetime                |
|----|----------------------------------------------|-------------------------|
| 1  | A2.AC.CH.CONDITION.1ST_VACUUM_CH             | 2021-09-28 17:31:48.191 |
| 2  | A2.AC.CH.CONDITION.SMALL_LEAK_TEST_VACUUM_CH | 2021-09-28 17:31:48.193 |
| 3  | A2.AC.CH.CONDITION.VACC_VALUE_CH_FLOAT_R270  | 2021-09-28 17:31:48.196 |
| 4  | A2.CP01.PRL2_TRIM1.ONCHANGE.PRL2             | 2021-09-28 17:31:48.199 |
| 5  | AY2.CP01.DL5_TRIM1.ONCHANGE.DL5              | 2021-09-28 17:31:48.199 |

Requirement: Output
Remove all characters in column tagname before that last dot ‘.’ in tagname column

| id | tagname                   | datetime                |
|----|---------------------------|-------------------------|
| 1  | 1ST_VACUUM_CH             | 2021-09-28 17:31:48.191 |
| 2  | SMALL_LEAK_TEST_VACUUM_CH | 2021-09-28 17:31:48.193 |
| 3  | VACC_VALUE_CH_FLOAT_R270  | 2021-09-28 17:31:48.196 |
| 4  | PRL2                      | 2021-09-28 17:31:48.199 |
| 5  | DL5                       | 2021-09-28 17:31:48.199 |

>Solution :

Use str.rsplit:

Example 1:

df1['tagname'] = df1['tagname'].str.rsplit('.', 1).str[1]
print(df1)

# Output
   id                 tagname                 datetime
0   1   ACCESS_CLOSE_FAULT_LH  2022-01-20 15:07:36.310
1   2  ACCESS_SENSOR_FAULT_RH  2022-01-20 15:07:36.310
2   3   OUTPUT_POWER_CP_FAULT  2022-01-20 15:07:36.310

Example 2:

df2['tagname'] = df2['tagname'].str.rsplit('.', 1).str[1]
print(df2)

# Output
   id                    tagname                 datetime
0   1              1ST_VACUUM_CH  2021-09-28 17:31:48.191
1   2  SMALL_LEAK_TEST_VACUUM_CH  2021-09-28 17:31:48.193
2   3   VACC_VALUE_CH_FLOAT_R270  2021-09-28 17:31:48.196
3   4                       PRL2  2021-09-28 17:31:48.199
4   5                        DL5  2021-09-28 17:31:48.199