Read a file and convert certain lines into the correct format

I have a problem. I am reading in a LaTeX file that contains abbreviations, and I only want to extract those abbreviations. Reading them works, but the result is not in the format I expect: I get raw lines such as '\t\\acro{...., whereas I would like to store each abbreviation cleanly as its own entry (see the desired output below). How can I convert the lines into my desired output?

import sys

def getPrice(symbol,
             shortForm,
             longForm):
    # Note: the three parameters are not used yet.
    abbreviations = []
    with open("./file.tex", encoding="utf-8") as f:
        file = list(f)
    # Collect every line between \begin{acronym} and \end{acronym}.
    save = False
    for line in file:
        print("\n" + line)
        if line.startswith(r'\end{acronym}'):
            save = False
        if save:
            abbreviations.append(line)
        if line.startswith(r'\begin{acronym}'):
            save = True

    print(abbreviations)

if __name__ == "__main__":
    getPrice(str(sys.argv[1]),
             str(sys.argv[2]),
             str(sys.argv[3]))


[OUT]
['\t\\acro{knmi}[KNMI]{Koninklijk Nederlands Meteorologisch Instituut}\n', '\t\\acro{test}[TESTERER]{T E SDH SADHU AHENSAD }\n']

Contents of file.tex

\chapter*{Short}
\addcontentsline{toc}{chapter}{Short}
\markboth{Short}{Short}
\begin{acronym}[TESTERER]
    \acro{knmi}[KNMI]{Koninklijk Nederlands Meteorologisch Instituut}
    \acro{example}[e.g.]{For example}
\end{acronym}

Desired Output

{
  "abbreviation1": {
      "symbol": "knmi",
      "shortForm": "KNMI",
      "longForm": "Koninklijk Nederlands Meteorologisch Instituut"
   },
  "abbreviation2": {
      "symbol": "example",
      "shortForm": "e.g.",
      "longForm": "For example"
   }
}

Solution:

You can use re.findall() to capture all of the abbreviations, then use the json module to dump them to a file. Your approach could work, but it would require a lot of manual string parsing, which quickly becomes tedious and error-prone. (Note that a program that parses arbitrary LaTeX would need something more powerful than regular expressions; since we're only parsing a very small subset of LaTeX here, regular expressions are sufficient.)

import re
import json

data = r"""\chapter*{Short}
\addcontentsline{toc}{chapter}{Short}
\markboth{Short}{Short}
\begin{acronym}[TESTERER]
    \acro{knmi}[KNMI]{Koninklijk Nederlands Meteorologisch Instituut}
    \acro{example}[e.g.]{For example}
\end{acronym}"""

pattern = re.compile(r"\\acro\{(.+)\}\[(.+)\]\{(.+)\}")
regex_result = re.findall(pattern, data)
final_output = {}
for index, (symbol, short_form, long_form) in enumerate(regex_result, start=1):
    # Use the key names from the desired output (shortForm, longForm).
    final_output[f'abbreviation{index}'] = \
        dict(symbol=symbol, shortForm=short_form, longForm=long_form)

with open('output.json', 'w') as output_file:
    json.dump(final_output, output_file, indent=4)

output.json contains the following:

{
    "abbreviation1": {
        "symbol": "knmi",
        "shortForm": "KNMI",
        "longForm": "Koninklijk Nederlands Meteorologisch Instituut"
    },
    "abbreviation2": {
        "symbol": "example",
        "shortForm": "e.g.",
        "longForm": "For example"
    }
}
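
The snippet above parses a hardcoded string. As a rough sketch of how the same idea might be applied to the file from the question, the variant below reads ./file.tex, looks only inside the first acronym environment, and uses the camelCase keys from the desired output. The names parse_acronyms and convert are made up for this illustration, and the pattern assumes each \acro entry sits on one line with no nested braces or brackets.

```python
import json
import re
from pathlib import Path

# Matches \acro{symbol}[short]{long}. Each group stops at the first closing
# delimiter, so nested braces/brackets are NOT handled (assumption).
ACRO_PATTERN = re.compile(r"\\acro\{([^}]+)\}\[([^\]]+)\]\{([^}]+)\}")

# Captures everything between the first begin/end pair of the acronym environment.
ENV_PATTERN = re.compile(r"\\begin\{acronym\}(.*?)\\end\{acronym\}", re.DOTALL)


def parse_acronyms(tex_source):
    """Return a dict of abbreviation entries found in the acronym environment."""
    block = ENV_PATTERN.search(tex_source)
    body = block.group(1) if block else ""
    result = {}
    for index, (symbol, short_form, long_form) in enumerate(
        ACRO_PATTERN.findall(body), start=1
    ):
        result[f"abbreviation{index}"] = {
            "symbol": symbol,
            "shortForm": short_form,
            "longForm": long_form,
        }
    return result


def convert(tex_path="./file.tex", json_path="output.json"):
    """Read the .tex file, parse the acronyms, and write them out as JSON."""
    tex_source = Path(tex_path).read_text(encoding="utf-8")
    acronyms = parse_acronyms(tex_source)
    with open(json_path, "w", encoding="utf-8") as output_file:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(acronyms, output_file, indent=4, ensure_ascii=False)
    return acronyms
```

Calling convert() with no arguments would read ./file.tex and write output.json in the current directory. Keeping the parsing in parse_acronyms, separate from the file handling, makes it easy to try the pattern on a plain string first.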