Delete numbers shorter than 3 digits in a list while the number of items stays the same

April 2, 2022

I want to normalize a list of years. The number of items in the list must stay the same, because I'm going to convert the list to a dataframe and the rows need to align with the other variables. This is the list I have. It contains many different notations for the year:

['1817 (1817p)', '1800-1824 (19.1q)', '1825-1849', 'ca. 1850', '1856–60', '1861-07-XX', 'copied between 1824 and 1845', 'copied d. 14tn Merz 1767', '1718']

I would like to end up with exactly one year per item in the list. For example:

['1817', '1800', '1825', '1850', '1856', '1861', '1824', '1767', '1718']

If an item contains two years, choose the first one. (Bonus points if you can return the mean when an item contains two years.)

To get the desired result, I first removed everything in parentheses and replaced hyphens and en dashes with spaces.

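import re

# data is the list of year strings shown above
data2 = []

for item in data:
    # drop parenthesised notes such as "(1817p)"
    no_parens = re.sub(r"\([^()]*\)", "", item)
    # replace en dashes and hyphens with spaces
    data2.append(re.sub(r"[–-]", " ", no_parens))
print(data2)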

Output 1:

['1817 ', '1800 1824 ', '1825 1849', 'ca. 1850', '1856 60', '1861 07 XX', 'copied between 1824 and 1845', 'copied d. 14tn Merz 1767', '1718']

Then I iterated through the items, but I ended up with more items in the list than I started with.

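ls = data2
ls2 = []

for i in ls:
    # findall returns every word in the item, not just one
    res = re.findall(r'\w+', i)
    for w in res:
        # keeps any word longer than 3 characters, numeric or not
        if len(w) > 3:
            ls2.append(w)
print(ls2)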

Output 2:

['1817', '1800', '1824', '1825', '1849', '1850', '1856', '1861', 'copied', 'between', '1824', '1845', 'copied', '14tn', 'Merz', '1767', '1718']

Solution:

One approach is to combine the re and numpy modules:

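import re
import numpy as np

myList = ['1817 (1817p)', '1800-1824 (19.1q)', '1825-1849', 'ca. 1850', '1856–60', '1861-07-XX', 'copied between 1824 and 1845', 'copied d. 14tn Merz 1767', '1718']
# find every 4-digit number in each item and average them (raw string avoids an invalid-escape warning)
means = [float(np.array(re.findall(r"\d{4}", x)).astype("int").mean()) for x in myList]
print(means)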

Output

[1817.0, 1812.0, 1837.0, 1850.0, 1856.0, 1861.0, 1834.5, 1767.0, 1718.0]

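This gives the mean of all four-digit numbers found in each element, so the result has exactly one value per item and the list keeps its original length. Note that an item with no four-digit number at all would produce nan.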
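If the first year of each item is wanted as a string, as in the example output in the question, a minimal sketch could use re.search instead of re.findall. The names years, first_year and default are illustrative and not from the original post. The pattern assumes a year is the first run of four digits, so a longer digit run would be truncated to its first four digits.

```python
import re

years = ['1817 (1817p)', '1800-1824 (19.1q)', '1825-1849', 'ca. 1850', '1856–60',
         '1861-07-XX', 'copied between 1824 and 1845', 'copied d. 14tn Merz 1767', '1718']


def first_year(text, default=None):
    """Return the first run of four digits in text, or default if there is none."""
    match = re.search(r"\d{4}", text)
    return match.group() if match else default


# One result per input item, so the list length never changes
first_years = [first_year(item) for item in years]
print(first_years)
# ['1817', '1800', '1825', '1850', '1856', '1861', '1824', '1767', '1718']
```

Because the function returns a placeholder (None by default) when nothing matches, every input item still maps to exactly one output item, which keeps the rows aligned when the list is turned into a dataframe column.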

database-normalization

by MR

Published April 02, 2022
