Convert a string to a dictionary of type <K: date, V: int>

I’m working on an ML model that requires RSSI values from a BLE device. For this, I’ve created a Mac application that stores a dictionary of type <K: Date, V: Int> in a text file. See the sample below.

string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'

Here, the negative values in the sample are the RSSI values. For example, the first two entries are:

Date                       RSSI
2021-10-17 06:52:00 +0000  -47
2021-10-17 06:52:04 +0000  -50

To perform any calculations, I need the data in the Python equivalent of <Date, Int>. How can I convert the above string into a pandas DataFrame? Hope this provides enough information. Thank you in advance.

>Solution :

You can use re.findall and a small regex:

as a DataFrame:
string = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:03 +0000: -50, 2021-10-17 06:52:02 +0000: -47, 2021-10-17 06:52:08 +0000: -46, 2021-10-17 06:51:57 +0000: -50, 2021-10-17 06:52:09 +0000: -48, 2021-10-17 06:52:05 +0000: -49, 2021-10-17 06:52:01 +0000: -48, 2021-10-17 06:51:58 +0000: -50, 2021-10-17 06:51:59 +0000: -50, 2021-10-17 06:52:06 +0000: -47, 2021-10-17 06:52:07 +0000: -48]'

import re
import pandas as pd

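df = pd.DataFrame.from_records(re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1]),
                               columns=['Date', 'RSSI'])
# convert explicitly: pandas 2.x rejects the unit-less 'datetime64' in astype()
df['Date'] = pd.to_datetime(df['Date'])
df['RSSI'] = df['RSSI'].astype(int)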

output (Date shown as the original strings, before conversion to datetime):

                         Date RSSI
0   2021-10-17 06:52:00 +0000  -47
1   2021-10-17 06:52:04 +0000  -50
2   2021-10-17 06:52:03 +0000  -50
3   2021-10-17 06:52:02 +0000  -47
4   2021-10-17 06:52:08 +0000  -46
5   2021-10-17 06:51:57 +0000  -50
6   2021-10-17 06:52:09 +0000  -48
7   2021-10-17 06:52:05 +0000  -49
8   2021-10-17 06:52:01 +0000  -48
9   2021-10-17 06:51:58 +0000  -50
10  2021-10-17 06:51:59 +0000  -50
11  2021-10-17 06:52:06 +0000  -47
12  2021-10-17 06:52:07 +0000  -48
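
Since the question asks about calculations, here is a minimal sketch of a next step. The entries arrive in no particular time order, so it sorts by date before computing anything. The three-entry `sample` and the names `mean_rssi` and `smoothed` are assumptions made for illustration, not part of the original data or answer.

```python
import re
import pandas as pd

# Shortened, hypothetical sample in the same format as the string above
sample = ('[2021-10-17 06:52:04 +0000: -50, '
          '2021-10-17 06:52:00 +0000: -47, '
          '2021-10-17 06:52:02 +0000: -47]')

pairs = re.findall(r'([^,]+): (-?\d+)(?:, )?', sample[1:-1])

df = pd.DataFrame(pairs, columns=['Date', 'RSSI'])
df['Date'] = pd.to_datetime(df['Date'])   # timezone-aware timestamps
df['RSSI'] = df['RSSI'].astype(int)

# Put the readings in chronological order and index them by time
df = df.sort_values('Date').set_index('Date')

mean_rssi = df['RSSI'].mean()              # average signal strength
smoothed = df['RSSI'].rolling(2).mean()    # moving average over 2 consecutive readings
```

With a time index in place, other pandas operations such as resampling should also be available, though whether they make sense depends on how regularly the device reports.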

as a dictionary:

import re
dict(re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1]))

output:

{'2021-10-17 06:52:00 +0000': '-47',
 '2021-10-17 06:52:04 +0000': '-50',
 '2021-10-17 06:52:03 +0000': '-50',
 '2021-10-17 06:52:02 +0000': '-47',
 '2021-10-17 06:52:08 +0000': '-46',
 '2021-10-17 06:51:57 +0000': '-50',
 '2021-10-17 06:52:09 +0000': '-48',
 '2021-10-17 06:52:05 +0000': '-49',
 '2021-10-17 06:52:01 +0000': '-48',
 '2021-10-17 06:51:58 +0000': '-50',
 '2021-10-17 06:51:59 +0000': '-50',
 '2021-10-17 06:52:06 +0000': '-47',
 '2021-10-17 06:52:07 +0000': '-48'}
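
The keys and values above are still strings because re.findall returns the captured groups as text. The following sketch breaks the pattern down on a two-entry sample to show what each part matches; the names `sample`, `pattern`, `pairs` and `as_strings` are only illustrative.

```python
import re

# Two-entry sample in the same format as the string above
sample = '[2021-10-17 06:52:00 +0000: -47, 2021-10-17 06:52:04 +0000: -50]'

# ([^,]+)  group 1: a run of non-comma characters (the date text)
# ': '     the separator between the date and the value
# (-?\d+)  group 2: an integer with an optional minus sign
# (?:, )?  optionally consume the ', ' that precedes the next entry
pattern = r'([^,]+): (-?\d+)(?:, )?'

# sample[1:-1] strips the surrounding '[' and ']'
pairs = re.findall(pattern, sample[1:-1])

# Each match is a (date, value) tuple of strings, so dict() keeps them as strings
as_strings = dict(pairs)
```

The colons inside the time part (06:52:00) do not confuse the match, since the pattern requires a colon followed by a space and then digits.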

as dictionary with correct types:

import re
import pandas as pd
{pd.to_datetime(k): int(v)
 for k, v in re.findall(r'([^,]+): (-?\d+)(?:, )?', string[1:-1])}

output:

{Timestamp('2021-10-17 06:52:00+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:04+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:03+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:02+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:08+0000', tz='UTC'): -46,
 Timestamp('2021-10-17 06:51:57+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:09+0000', tz='UTC'): -48,
 Timestamp('2021-10-17 06:52:05+0000', tz='UTC'): -49,
 Timestamp('2021-10-17 06:52:01+0000', tz='UTC'): -48,
 Timestamp('2021-10-17 06:51:58+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:51:59+0000', tz='UTC'): -50,
 Timestamp('2021-10-17 06:52:06+0000', tz='UTC'): -47,
 Timestamp('2021-10-17 06:52:07+0000', tz='UTC'): -48}
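
If pandas is not needed for the dictionary itself, a similar result can probably be had with the standard library alone. This is a sketch under the assumption that every key follows the `YYYY-MM-DD HH:MM:SS +ZZZZ` layout seen in the sample; `parse_rssi` is a hypothetical helper name, not something from the original answer.

```python
import re
from datetime import datetime, timezone

# Two-entry sample in the same format as the string above
sample = '[2021-10-17 06:52:04 +0000: -50, 2021-10-17 06:52:00 +0000: -47]'

def parse_rssi(text):
    """Hypothetical helper: return {timezone-aware datetime: int}, ordered by time."""
    pairs = re.findall(r'([^,]+): (-?\d+)(?:, )?', text[1:-1])
    parsed = {
        # %z parses the '+0000' UTC offset, giving a timezone-aware datetime
        datetime.strptime(k, '%Y-%m-%d %H:%M:%S %z'): int(v)
        for k, v in pairs
    }
    # Dicts keep insertion order, so inserting in sorted order yields time order
    return dict(sorted(parsed.items()))

readings = parse_rssi(sample)
```

The keys here are plain datetime objects rather than pandas Timestamps, which may be preferable when the dictionary is passed to code that does not depend on pandas.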
