Turn Koeppen Climate Legend into meaningful csv with regex

I have this table:

    1:  Af   Tropical, rainforest                  [0 0 255]
    2:  Am   Tropical, monsoon                     [0 120 255]
    3:  Aw   Tropical, savannah                    [70 170 250]
    4:  BWh  Arid, desert, hot                     [255 0 0]
    5:  BWk  Arid, desert, cold                    [255 150 150]
    6:  BSh  Arid, steppe, hot                     [245 165 0]
    7:  BSk  Arid, steppe, cold                    [255 220 100]
    8:  Csa  Temperate, dry summer, hot summer     [255 255 0]
    9:  Csb  Temperate, dry summer, warm summer    [200 200 0]
    10: Csc  Temperate, dry summer, cold summer    [150 150 0]
    11: Cwa  Temperate, dry winter, hot summer     [150 255 150]
    12: Cwb  Temperate, dry winter, warm summer    [100 200 100]
    13: Cwc  Temperate, dry winter, cold summer    [50 150 50]
    14: Cfa  Temperate, no dry season, hot summer  [200 255 80]
    15: Cfb  Temperate, no dry season, warm summer [100 255 80]
    16: Cfc  Temperate, no dry season, cold summer [50 200 0]
    17: Dsa  Cold, dry summer, hot summer          [255 0 255]
    18: Dsb  Cold, dry summer, warm summer         [200 0 200]
    19: Dsc  Cold, dry summer, cold summer         [150 50 150]
    20: Dsd  Cold, dry summer, very cold winter    [150 100 150]
    21: Dwa  Cold, dry winter, hot summer          [170 175 255]
    22: Dwb  Cold, dry winter, warm summer         [90 120 220]
    23: Dwc  Cold, dry winter, cold summer         [75 80 180]
    24: Dwd  Cold, dry winter, very cold winter    [50 0 135]
    25: Dfa  Cold, no dry season, hot summer       [0 255 255]
    26: Dfb  Cold, no dry season, warm summer      [55 200 255]
    27: Dfc  Cold, no dry season, cold summer      [0 125 125]
    28: Dfd  Cold, no dry season, very cold winter [0 70 95]
    29: ET   Polar, tundra                         [178 178 178]
    30: EF   Polar, frost                          [102 102 102]

It is hard to get this into a CSV directly. I would like to keep the code (e.g. Af, the column after the row number) and the long description (e.g. Tropical, rainforest for the first row), so I thought I would handle this with a regex. Apparently, though, this is beyond my understanding of how regexes work. I tried doing it in R, and I'd be very grateful for any help.

I tried something like this:

    str_match(a, "\\d{1,2}:\\s[a-zA-Z]{2,3}.*([a-zA-Z,]).*\\[")

but it fails.
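To see why this pattern fails, it helps to run it on two rows copied from the table. This is a minimal sketch that assumes the stringr package is installed and inlines the rows instead of reading them from a file. `\\s` matches exactly one whitespace character, so the one-digit rows, which have two spaces after the colon, do not match at all and give `NA`. On the two-digit rows the pattern does match, but the greedy `.*` in front of `([a-zA-Z,])` takes as much as it can, so the group holds only the last letter of the description.

```r
library(stringr)

# Two rows copied from the table: a one-digit id (two spaces after the colon,
# because the columns are aligned) and a two-digit id (one space after it)
a <- c(
  "1:  Af   Tropical, rainforest                  [0 0 255]",
  "10: Csc  Temperate, dry summer, cold summer    [150 150 0]"
)

# The pattern from the question; column 2 of the result is its only group
m <- str_match(a, "\\d{1,2}:\\s[a-zA-Z]{2,3}.*([a-zA-Z,]).*\\[")

# First row: NA (no match at all); second row: only "r", the last letter of "summer"
print(m[, 2])
```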

Solution:

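You can use either of these patterns. The first captures the code and the description together in Group 2; the second captures them in separate groups: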

    str_match(a, "(\\d{1,2}):\\s*(.*?)\\s*\\[(.*)\\]")
    str_match(a, "(\\d{1,2}):\\s*(\\w+)\\s*(.*?)\\s*\\[(.*)\\]")

See the regex demo #1 and regex demo #2.

Details (for the second pattern):

  • (\d{1,2}) – Group 1: one or two digits
  • :\s* – a colon and zero or more whitespace characters
  • (\w+) – Group 2: one or more letters, digits or _
  • \s* – zero or more whitespace characters
  • (.*?) – Group 3: any zero or more chars other than line break chars, as few as possible
  • \s* – zero or more whitespace characters
  • \[ – a [ char
  • (.*) – Group 4: any zero or more chars other than line break chars, as many as possible
  • \] – a ] char.
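Putting the second pattern to work, the whole trip from legend rows to a CSV file might look like the sketch below. It assumes the stringr package is installed and that the legend rows are already in a character vector `a` (for example read with `readLines()`); only four rows are inlined here, and the column names and the output file name `koeppen_legend.csv` are hypothetical choices for illustration.

```r
library(stringr)

# Assumption: one legend row per element (only four rows inlined here)
a <- c(
  "1:  Af   Tropical, rainforest                  [0 0 255]",
  "2:  Am   Tropical, monsoon                     [0 120 255]",
  "14: Cfa  Temperate, no dry season, hot summer  [200 255 80]",
  "30: EF   Polar, frost                          [102 102 102]"
)

# Second pattern from the answer: column 1 is the full match,
# columns 2-5 are id, code, description and the RGB triple
m <- str_match(a, "(\\d{1,2}):\\s*(\\w+)\\s*(.*?)\\s*\\[(.*)\\]")

legend <- data.frame(
  id          = as.integer(m[, 2]),
  code        = m[, 3],
  description = m[, 4],
  rgb         = m[, 5],
  stringsAsFactors = FALSE
)

# Keep only the code and the long description, as asked in the question
write.csv(legend[, c("code", "description")], "koeppen_legend.csv", row.names = FALSE)
```

Because `write.csv()` quotes character columns by default, the commas inside descriptions such as "Tropical, rainforest" stay inside a single CSV field instead of splitting it.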