Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Transforming automated text to a DataFrame

I have a set of computer-generated notices saved in a text file – it looks like this:

  • MY COMPANY LLC: Statement# 123456 for $10000.99 for the month of 2023 02 (FEB)
  • MY COMPANY LLC: Statement# 123457 for $100.01 for the month of 2022 09 (SEP)
  • MY COMPANY LLC: Statement# 123458 for -$51.00 for the month of 2022 10 (OCT)

Ideal Output – as DataFrame

Statement Amount Month
123456 10000.99 2023 02 (FEB)
123457 100.01 2022 09 (SEP)
123458 -51 2022 10 (OCT)

I know I can turn this into a CSV outside of Python, then import it with pandas.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

But can I load the text file into Python as-is, and transform convert it into a DataFrame?

One "hint" – the word for can work as my column separator in this case. It splits every row reliably into the fields I want. It’s a bit of luck.

>Solution :

The format of your expected dataframe is not clear but here is an option with extract :

df = (
        pd.read_csv("input2.txt", header=None, sep="|").squeeze()
            .str.extract("(.*): Statement# (.*) for (.*) for the month of (\d+) (\d+) \((\w+)\)")
            .set_axis(["company_name", "statement", "amount", "year", "day", "month"], axis=1)
      )

Output :

print(df)
​
     company_name statement     amount  year day month
0  MY COMPANY LLC    123456  $10000.99  2023  02   FEB
1  MY COMPANY LLC    123457    $100.01  2022  09   SEP
2  MY COMPANY LLC    123458    -$51.00  2022  10   OCT
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading