
"ERROR: ArgumentError: Table returned but a single output column was expected" in transform! dataframes

I want to perform a simple one-hot encoding using DataFrames.jl's transform!, but I haven't been able to make it work. I use the following DataFrame:

using DataFrames

df = DataFrame(
  color = ["red", "green", "blue"],
  x = [1, 2, 3]
)
# 3×2 DataFrame
#  Row │ color   x
#      │ String  Int64
# ─────┼───────────────
#    1 │ red         1
#    2 │ green       2
#    3 │ blue        3

And I defined a simple function to return the encoded matrix:

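function OneHotEncod(vec::Vector{String})
  # One Boolean column per distinct value; unique avoids duplicate columns when values repeat
  reduce(hcat, [vec .== i for i in unique(vec)])
end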

Then, when I run the following code, I get an error:

transform!(df, Cols(:color) => x -> OneHotEncod(x), renamecols=false)
ERROR: ArgumentError: Table returned, but a single output column was expected

The error is clear, but is there any way to use transform! when the specified function returns more than one column (for example, a Matrix)?


Appendix:

OneHotEncod(df.color)
# 3×3 BitMatrix:
#  1  0  0
#  0  1  0
#  0  0  1

Solution:

You just need to tell transform! that the function's output contains multiple columns, by using AsTable as the destination:

julia> transform!(df, :color => OneHotEncod => AsTable)
3×5 DataFrame
 Row │ color   x      x1     x2     x3
     │ String  Int64  Bool   Bool   Bool
─────┼────────────────────────────────────
   1 │ red         1   true  false  false
   2 │ green       2  false   true  false
   3 │ blue        3  false  false   true
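
If the default x1, x2, x3 names are not descriptive enough, one option is to pass a vector of column names as the destination instead of AsTable. The sketch below is only an illustration: it assumes a fresh copy of the example data, uses a hypothetical helper named onehot that takes the levels explicitly, and the "is_" prefix is an arbitrary naming choice, not something from the original answer.

```julia
using DataFrames

# Fresh copy of the example data (assumption: we start from the original 3×2 table)
df = DataFrame(color = ["red", "green", "blue"], x = [1, 2, 3])

# Hypothetical helper: one Boolean column per level, in the order given
onehot(vec, levels) = reduce(hcat, [vec .== l for l in levels])

levels = unique(df.color)
newnames = "is_" .* levels   # illustrative names: is_red, is_green, is_blue

# A vector of names as the destination tells transform to expect several columns;
# the non-mutating transform leaves df itself unchanged
encoded = transform(df, :color => (v -> onehot(v, levels)) => newnames)
```

The number of names must match the number of columns in the returned matrix, which is why the levels are computed once and shared by both the helper and the name vector.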

A natural alternative is:

julia> transform!(df, [:color => ByRow(==(c)) => c for c in unique(df.color)])
3×8 DataFrame
 Row │ color   x      x1     x2     x3     red    green  blue
     │ String  Int64  Bool   Bool   Bool   Bool   Bool   Bool
─────┼─────────────────────────────────────────────────────────
   1 │ red         1   true  false  false   true  false  false
   2 │ green       2  false   true  false  false   true  false
   3 │ blue        3  false  false   true  false  false   true

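This way the new columns are automatically named after the values they encode. Note that the x1, x2 and x3 columns in this output are left over from the earlier in-place transform! call; they are not created by this expression.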
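
To see the ByRow approach in isolation, here is a minimal sketch on a fresh DataFrame. It assumes slightly different data (a repeated "red" value, added here purely for illustration) and uses the non-mutating transform so that the source table is left untouched.

```julia
using DataFrames

# Illustrative data with a repeated value (not the exact table from the question)
df = DataFrame(color = ["red", "green", "blue", "red"], x = [1, 2, 3, 4])

# One pair per distinct color: compare every row against c and name the column c
encoded = transform(df, [:color => ByRow(==(c)) => c for c in unique(df.color)])
```

Because the comprehension iterates over unique(df.color), a repeated value still produces a single column, and encoded contains only color, x and one Bool column per distinct color.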
