How to remove everything except numeric elements in R

Apologies, as there are certainly many similar questions and answers out there, but I’ve tried several of the suggested solutions and sadly had no luck.

I’ve got temperature data in three columns of a dataframe (tempdata). For simplicity I’m just trying to change one of these locations (wentworth.castle) at a time.

This is what my data looks like. All the columns with ".castle" in their names hold temperatures for that site. There are missing values, which is expected; I’m hoping to turn them into NA.
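A minimal sketch of the blank-to-NA step in base R, using a small made-up vector (`site` is a hypothetical stand-in for a column such as `site.castle`, not the real data): empty strings can be replaced with `NA` before any numeric conversion.

```r
# Hypothetical stand-in for one of the .castle columns
site <- c("", "", "9.484", "")

# Replace empty strings with NA (a plain equality test, no regex involved)
site[site == ""] <- NA

site              # NA NA "9.484" NA
as.numeric(site)  # NA NA 9.484 NA
```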


glimpse(tempdata)
Rows: 3,395
Columns: 5
$ Description      <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", …
$ date.time        <chr> "22/11/2023 09:48", "22/11/2023 10:18", "22/11/2023 10:48", "22/11/2023 11:…
$ site.castle      <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ dover.castle     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",…
$ wentworth.castle <chr> "9.484 \xb0C", "9.642 \xb0C", "9.768 \xb0C", "9.994 \xb0C", "10.066 \xb0C",…
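The `\xb0` in the glimpse() output is a single raw byte. In Latin-1 (ISO-8859-1) the byte 0xB0 is the degree sign, whereas UTF-8 encodes the degree sign as the two bytes C2 B0. A short diagnostic sketch, assuming the R session runs in a UTF-8 locale (which the "invalid multibyte string" errors suggest):

```r
# One value copied from the glimpse() output; "\xb0" inserts the raw byte 0xB0
x <- "9.484 \xb0C"

# A lone 0xB0 byte is not valid UTF-8, which is what the
# "invalid multibyte string at '<b0>C'" errors are complaining about
validUTF8(x)   # FALSE

# Inspect the underlying bytes: 0xb0 sits between the space (0x20) and "C" (0x43)
charToRaw(x)
```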

I’ve tried the few things below and got the following errors.

tempdata$wentworth.castle <- gsub(" °C", "", as.numeric(tempdata$wentworth.castle))
#Error in is.factor(x) : invalid multibyte string at '<b0>C'

tempdata$wentworth.castle <- gsub(" \xb0C", "", as.numeric(tempdata$wentworth.castle))
#Error in is.factor(x) : invalid multibyte string at '<b0>C'

tempdata$wentworth.castle = tempdata$wentworth.castle.replace('\u00b0','', regex=True)
#Error: attempt to apply non-function

tempdata$wentworth.castle <- as.numeric(tempdata$wentworth.castle)
#Error: invalid multibyte string at '<b0>C'

I also tried a less robust approach: a function that keeps only the first few characters. This is tricky because some values are 5 characters long and some are 6 (e.g. "9.484" vs "10.066"), so even if it had worked, some entries would have been left with a stray trailing space to remove.
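Instead of cutting at a fixed number of characters, a regular expression can drop the first space and everything after it, which handles both 5- and 6-character numbers. This sketch swaps in `sub()` for the `substr()` idea and first re-encodes the strings from Latin-1 so the regex never sees an invalid byte; the sample values are copied from the glimpse() output. (Separately, the `left()` function above names its argument `chat` but uses `char` in the body, which is why R picked up some other object called `char`.)

```r
# Sample values in the same form as the wentworth.castle column
raw_vals <- c("9.484 \xb0C", "10.066 \xb0C")

# Re-encode from Latin-1 to UTF-8 so regex functions don't hit invalid bytes
utf8_vals <- iconv(raw_vals, from = "latin1", to = "UTF-8")

# Keep only the leading number: remove the first space and everything after it,
# whether the number is 5 or 6 characters long
numbers <- sub(" .*$", "", utf8_vals)

numbers              # "9.484" "10.066"
as.numeric(numbers)  # 9.484 10.066
```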

left = function(string, chat){substr(string, 1, char)}
tempdata$wentworth.castle <- left(tempdata$wentworth.castle, 6)
#Error in as.integer(stop) : 
#  cannot coerce type 'closure' to vector of type 'integer'

Solution:

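This is an encoding issue: the degree symbol is stored as the single Latin-1 (ISO-8859-1) byte 0xB0 (shown as \xb0), which is not valid UTF-8, so R cannot interpret it. You can use iconv to convert the strings to UTF-8, then gsub to remove " °C":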

# data 
wentworth <- c("9.484 \xb0C", "9.642 \xb0C", "9.768 \xb0C", "9.994 \xb0C", "10.066 \xb0C")

gsub(" °C","", iconv(wentworth, from = "ISO-8859-1", to = "UTF-8"))

# [1] "9.484"  "9.642"  "9.768"  "9.994"  "10.066"

# or if you want it numeric, just wrap it
as.numeric(gsub(" °C","", iconv(wentworth, from = "ISO-8859-1", to = "UTF-8")))

# [1]  9.484  9.642  9.768  9.994 10.066
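Since the question has three ".castle" columns, the same iconv() + gsub() steps can be wrapped in a helper and applied to every matching column at once. A sketch on a hypothetical miniature `tempdata` (made-up rows, same column layout as the question); `clean_temp` is an illustrative name, not an existing function:

```r
# Hypothetical miniature version of tempdata with the same column layout
tempdata <- data.frame(
  Description      = c("1", "2", "3"),
  site.castle      = c("", "", ""),
  dover.castle     = c("", "", ""),
  wentworth.castle = c("9.484 \xb0C", "9.642 \xb0C", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: re-encode from Latin-1, strip " °C",
# turn blank strings into NA, then convert to numeric
clean_temp <- function(x) {
  x <- iconv(x, from = "ISO-8859-1", to = "UTF-8")
  x <- gsub(" \u00b0C", "", x)   # \u00b0 is the degree sign
  x[x == ""] <- NA
  as.numeric(x)
}

# Apply the helper to every column whose name contains ".castle"
castle_cols <- grep(".castle", names(tempdata), fixed = TRUE)
tempdata[castle_cols] <- lapply(tempdata[castle_cols], clean_temp)

str(tempdata)
```

If dplyr is loaded (the question uses glimpse()), the same helper could instead be applied with `mutate(across(contains("castle"), clean_temp))`.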

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading