Replace substrings of list elements while keeping the original elements

I have a list called names:

names = ['Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Dirk Adams, GRÜNE', 'Blechschmidt, DIE LINKE', 'Steffen Harzer, LINKE', 'Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'David-Christian Eckardt, SPD', 'Christine Ursula Klaus, SPD', 'Klaus von der Krone, CDU', 'Antje Ehrlich-Strathausen, SPD', 'Benno Lemke, PDS']

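import re

# Normalize the party labels (raw strings keep the regex escapes intact)
names = [re.sub(r'(?<!DIE)\sLINKE', ' DIE LINKE', line) for line in names]
names = [re.sub(r'(?<!DIE)\sGRÜNE', ' BÜNDNIS 90/DIE GRÜNEN', line) for line in names]
names = [re.sub(r'Die Linke', 'DIE LINKE', line) for line in names]
names = [re.sub(r'PDS', 'DIE LINKE', line) for line in names]
# Strip the title; the dot is escaped so it matches only a literal "."
names = [re.sub(r'Dr\.\s', '', line) for line in names]
# Drop the first word of each entry (this is the step that misses hyphenated names)
actual_names = [re.sub(r'((?:^|(?:[.!?]\s))(\w+)\s)', '', line) for line in names]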

print(actual_names)

This currently produces:

actual_names = ['Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Adams, BÜNDNIS 90/DIE GRÜNEN', 'Blechschmidt, DIE LINKE', 'Harzer, DIE LINKE', 'Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'David-Christian Eckardt, SPD', 'Ursula Klaus, SPD', 'von der Krone, CDU', 'Ehrlich-Strathausen, SPD', 'Lemke, DIE LINKE']

Questions:

  1. How do I need to change the regex to account for names that contain a hyphen (see 'David-Christian Eckardt, SPD')?
  2. How do I need to change the code to keep the original elements as well?

desired_names = ['Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Adams, BÜNDNIS 90/DIE GRÜNEN', 'Adams, GRÜNE', 'Blechschmidt, DIE LINKE', 'Harzer, DIE LINKE', 'Harzer, LINKE', 'Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'Eckardt, SPD', 'Klaus, SPD', 'von der Krone, CDU', 'Ehrlich-Strathausen, SPD', 'Lemke, PDS', 'Lemke, DIE LINKE']

The order within the list does not matter.

Solution:

Is a regex really necessary here? You can split each entry with str.split and the maxsplit=1 parameter, then keep only the last word of the name part:

names = [
    "Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "Dirk Adams, GRÜNE",
    "Blechschmidt, DIE LINKE",
    "Steffen Harzer, LINKE",
    "Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
    "David-Christian Eckardt, SPD",
    "Christine Ursula Klaus, SPD",
    "Klaus von der Krone, CDU",
    "Antje Ehrlich-Strathausen, SPD",
    "Benno Lemke, PDS",
]

m = {"LINKE": "DIE LINKE", "GRÜNE": "BÜNDNIS 90/DIE GRÜNEN", "PDS": "DIE LINKE"}

out = [n.split(", ", maxsplit=1) for n in names]
out = [", ".join([a.split()[-1], m.get(b, b)]) for a, b in out]

print(out)

Prints:

[
    "Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "Adams, BÜNDNIS 90/DIE GRÜNEN",
    "Blechschmidt, DIE LINKE",
    "Harzer, DIE LINKE",
    "Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
    "Eckardt, SPD",
    "Klaus, SPD",
    "Krone, CDU",
    "Ehrlich-Strathausen, SPD",
    "Lemke, DIE LINKE",
]
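
Note that the output above reduces "Klaus von der Krone" to "Krone" and replaces the original party labels instead of keeping them. One possible extension, offered as a sketch rather than a definitive answer, emits each surname with both the original and the normalized party and treats lowercase words directly before the last word as part of the surname. The names PARTY_ALIASES, surname and expand are hypothetical helpers, and the lowercase-particle rule is an assumption that happens to fit this data set:

```python
names = [
    "Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "Dirk Adams, GRÜNE",
    "Blechschmidt, DIE LINKE",
    "Steffen Harzer, LINKE",
    "Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
    "David-Christian Eckardt, SPD",
    "Christine Ursula Klaus, SPD",
    "Klaus von der Krone, CDU",
    "Antje Ehrlich-Strathausen, SPD",
    "Benno Lemke, PDS",
]

# Same mapping as in the answer above, under a hypothetical name
PARTY_ALIASES = {
    "LINKE": "DIE LINKE",
    "GRÜNE": "BÜNDNIS 90/DIE GRÜNEN",
    "PDS": "DIE LINKE",
}


def surname(person):
    """Return the last word plus any lowercase particles right before it.

    Assumption: particles such as "von" or "der" are written in lowercase,
    while titles and given names start with an uppercase letter.
    """
    tokens = person.split()
    start = len(tokens) - 1
    while start > 0 and tokens[start - 1].islower():
        start -= 1
    return " ".join(tokens[start:])


def expand(entries):
    """Build 'surname, party' strings for the original and the normalized party."""
    out = []
    for entry in entries:
        # Split only at the first ", " so commas inside the party text survive
        person, party = entry.split(", ", maxsplit=1)
        last = surname(person)
        for label in (party, PARTY_ALIASES.get(party, party)):
            candidate = f"{last}, {label}"
            if candidate not in out:  # skip duplicates when no alias applies
                out.append(candidate)
    return out


print(expand(names))
```

Because the name part is split on whitespace only, hyphenated names such as "David-Christian" or "Ehrlich-Strathausen" stay in one piece without any special handling. The result contains the same 13 entries as desired_names, in a different order.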