
Best way to get a specific column as y in pandas DataFrame

I want to extract one specific column as the target y from a pandas DataFrame, with the remaining columns as X.
So far I have found two ways to do this:

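```python
import numpy as np

# df is my DataFrame; specific_column is the name of the target column
# and features is the list of the other column names.

# The first way
y_df = df[specific_column]
y_array = np.array(y_df)
X_df = df.drop(columns=[specific_column])
X_array = np.array(X_df)

# The second way
features = ['some columns in my dataset']
y = np.array(df.loc[:, [specific_column]].values)
X = df.loc[:, features].values
```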
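To make the two ways concrete, here is a minimal reproducible sketch on a small made-up DataFrame. The column names `a`, `b` and `target` and the data are hypothetical stand-ins for `features` and `specific_column`; the printed shapes are what differ between the two ways.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two feature columns and a binary target column
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "target": [0, 0, 1, 1]})
specific_column = "target"
features = ["a", "b"]

# First way: df[name] returns a Series
y_array = np.array(df[specific_column])
X_array = np.array(df.drop(columns=[specific_column]))

# Second way: a list of labels in .loc returns a DataFrame
y = np.array(df.loc[:, [specific_column]].values)
X = df.loc[:, features].values

print(y_array.shape)            # (4,)
print(y.shape)                  # (4, 1)
print(X_array.shape, X.shape)   # (4, 2) (4, 2)
```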

But when I compare the first four values of the two y arrays, they are not equal:

```python
>>> y[:4] == y_array[:4]
array([[ True,  True, False, False],
       [ True,  True, False, False],
       [False, False,  True,  True],
       [False, False,  True,  True]])
```

But I am sure that these two arrays contain the same elements:


```python
>>> y[:4], y_array[:4]
(array([[0],
        [0],
        [1],
        [1]], dtype=int64),
 array([0, 0, 1, 1], dtype=int64))
```

So why does the comparison contain False values?
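A quick way to check whether two arrays hold the same values but differ in shape is to print `.shape` and compare with `np.array_equal`, which returns False when the shapes differ instead of broadcasting. A minimal sketch using the values printed above:

```python
import numpy as np

# Same values as printed above: shapes (4, 1) and (4,)
y = np.array([[0], [0], [1], [1]])
y_array = np.array([0, 0, 1, 1])

print(y.shape, y_array.shape)            # (4, 1) (4,)
# array_equal checks shapes first, so this is False despite equal values
print(np.array_equal(y, y_array))        # False
# Taking the single column of y gives a 1-D array that matches y_array
print(np.array_equal(y[:, 0], y_array))  # True
```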

Solution:

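If you select with double brackets ([[...]]), you get a one-column DataFrame, and converting it to a NumPy array gives a 2-D array of shape (n, 1):

```python
y = np.array(df.loc[:, [specific_column]].values)
```

Comparing that (n, 1) array with the 1-D (n,) array from the first way makes NumPy broadcast both to (n, n), which is why `==` returned a 4 x 4 matrix instead of four values.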
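The boolean matrix from the question can be reproduced with plain NumPy arrays holding the same values. This minimal sketch shows that `==` on shapes (4, 1) and (4,) compares every element of one array with every element of the other:

```python
import numpy as np

col = np.array([[0], [0], [1], [1]])  # shape (4, 1), like the second way
flat = np.array([0, 0, 1, 1])         # shape (4,), like the first way

result = col == flat
print(result.shape)  # (4, 4)
# result[i, j] is col[i, 0] == flat[j]: each row compares one value
# of col against all four values of flat.
print(result)
```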

The fix is to drop the inner brackets so that you select a Series; converting a Series to an array gives a 1-D array of shape (n,):

```python
y = df[specific_column].to_numpy()
# or, keeping your original approach
y = np.array(df.loc[:, specific_column].values)
```
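Putting it together, a small helper that returns X as a 2-D array and y as a 1-D array might look like the sketch below. `split_xy` is a hypothetical name, not a pandas function; it only combines `DataFrame.drop` with single-label Series selection. The last lines show `ravel()` flattening a one-column 2-D array you may already have.

```python
import numpy as np
import pandas as pd


def split_xy(df: pd.DataFrame, target: str) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: return (X, y) with y as a 1-D array."""
    X = df.drop(columns=[target]).to_numpy()  # 2-D: (n_rows, n_features)
    y = df[target].to_numpy()                 # 1-D: (n_rows,)
    return X, y


# Hypothetical example data
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "target": [0, 0, 1, 1]})
X, y = split_xy(df, "target")
print(X.shape, y.shape)  # (4, 2) (4,)

# If you already have a one-column 2-D array, ravel() flattens it to 1-D
y_2d = df[["target"]].to_numpy()
print(y_2d.shape, y_2d.ravel().shape)  # (4, 1) (4,)
```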