Julia Dataframes – concisely create column with eltype Union{Missing, T}

I’m building a data frame where, for some of the columns, the obvious way to create them involves a multi-step process. I’d like to idiomatically and concisely create a column with eltype Union{Missing, T}. I can then fill the column using the multi-step process (and call disallowmissing once finished, as appropriate). What’s the cleanest way to do this?

I’d like to do something like df[!, :col] :: Vector{Union{Int64, Missing}} .= missing but this gives "ArgumentError: column name :col not found in the data frame; ..."

If I try to do df[!, :col] .= fill(missing, nrow(df)) :: Vector{Union{Int64, Missing}}, I get "TypeError: in typeassert, expected Vector{Union{Missing, Int64}}, got a value of type Vector{Missing}".

For the moment I’m doing something ugly and confusing, like

df[!, :col] .= 0
allowmissing!(df, :col)
df.col .= missing

Any suggestions? My sense is that if I have this question, I don’t really understand the nuances of how column typing in DataFrames.jl works, even though I use it all the time and generally don’t have problems. I’ve searched the documentation and don’t feel like I’ve seen anything that would help with this specific issue, but any recommended reading would be appreciated.

Thanks!

Solution:

Here is one way to do it. There are other ways to add a column to a data frame, but the key function to use is missings:

julia> using DataFrames

julia> df = DataFrame()
0×0 DataFrame

julia> df.col = missings(Int, 5)
5-element Vector{Union{Missing, Int64}}:
 missing
 missing
 missing
 missing
 missing

julia> df
5×1 DataFrame
 Row │ col
     │ Int64?
─────┼─────────
   1 │ missing
   2 │ missing
   3 │ missing
   4 │ missing
   5 │ missing

julia> df.other_col = missings(Float64, nrow(df))
5-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing
 missing
 missing

julia> df
5×2 DataFrame
 Row │ col      other_col
     │ Int64?   Float64?
─────┼────────────────────
   1 │ missing    missing
   2 │ missing    missing
   3 │ missing    missing
   4 │ missing    missing
   5 │ missing    missing
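
To connect this back to the multi-step workflow in the question, here is a minimal sketch of the full cycle: create the column with missings, fill it in stages, then narrow the type with disallowmissing!. The starting table, the id column, and the fill values are hypothetical and chosen only for illustration.

```julia
using DataFrames

# Hypothetical starting table; the column name and values are illustrative only.
df = DataFrame(id = 1:5)

# Step 1: add an all-missing column whose eltype already allows Int values.
# df[!, :col] = missings(Int, nrow(df)) is an equivalent spelling.
df.col = missings(Int, nrow(df))

# Step 2: fill the column in stages (stand-in for the real multi-step process).
df.col[1:3] .= [10, 20, 30]
df.col[4:5] .= 0

# Step 3: every entry is now filled, so narrow the eltype from Int? back to Int.
disallowmissing!(df, :col)
```

Note that disallowmissing! throws an error if any missing value is still present, so it also serves as a check that the fill steps covered every row.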

As a side note: this issue is not specific to DataFrames.jl; it comes from how vectors are created in Julia in general. The missings function is defined in the Missings.jl package, which DataFrames.jl re-exports. If you want to use only Julia Base functionality, the following gives the same result as missings:

julia> Vector{Union{Int, Missing}}(missing, 5)
5-element Vector{Union{Missing, Int64}}:
 missing
 missing
 missing
 missing
 missing

However, since this is more verbose, I typically use the missings function.
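
The TypeError from the question can be reproduced with Base alone, which may help show why the type assertion did not work. The sketch below assumes nothing beyond Base; the variable names are illustrative.

```julia
# fill infers the element type from the value, so this vector can only hold missing.
v = fill(missing, 3)

# A type assertion (::) only checks the type; it never converts,
# so asserting the wider vector type on v throws a TypeError.
assertion_failed = try
    v::Vector{Union{Missing, Int}}
    false
catch err
    err isa TypeError
end

# convert builds a new vector with the wider element type.
w = convert(Vector{Union{Missing, Int}}, v)
w[1] = 42  # storing an Int is now allowed

# A typed comprehension is another Base-only way to get such a vector.
u = Union{Missing, Int}[missing for _ in 1:3]
```

In short, the element type has to be chosen when the vector is constructed (or changed by an explicit conversion); annotating an existing Vector{Missing} does not widen it.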
