pandas: converting dataframe column to int following dataframe manipulation

Running pandas 1.5.3. Also attempted on pandas 2.2.1.

I am loading data from a pipe-delimited CSV that looks like this:

888|0|TEST ACCOUNT
888|1|Sample Ship-to
802001|0|COMPANY 1
802001|1|COMPANY 1 INC
802001|2|COMPANY 1 BALL
K802001|3|COMPANY 1

With columns CUSNO, S2, and NAME, in that order.

I have a script that loads the data, then checks that the first column of the resulting DataFrame has dtype int64. If it does not, the script is supposed to convert the column to numeric and drop the rows that contain NaN.

So, before:

     CUSNO  S2            NAME
0      888   0    TEST ACCOUNT
1      888   1  Sample Ship-to
2   802001   0       COMPANY 1
3   802001   1   COMPANY 1 INC
4   802001   2  COMPANY 1 BALL
5  K802001   3       COMPANY 1

Then run:

cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce')
cl = cl.dropna(axis='index', how='any')

After:

      CUSNO  S2            NAME
0     888.0   0    TEST ACCOUNT
1     888.0   1  Sample Ship-to
2  802001.0   0       COMPANY 1
3  802001.0   1   COMPANY 1 INC
4  802001.0   2  COMPANY 1 BALL

I want CUSNO to be a column of int64 or a similar integer type, but cl['CUSNO'].dtype keeps returning float64. (Realistically, I want to get rid of the decimal point at the end of every entry in CUSNO, and I thought casting to int or similar would work best.)

I’ve tried a number of solutions, namely:

cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce').dropna().astype(int) # replacing the earlier line 1 of the script
cl['CUSNO'] = cl.astype({'CUSNO': 'int'})
cl['CUSNO'] = cl['CUSNO'].apply(pd.to_numeric, errors='coerce')

I’ve tried inplace=True for line 2 in the script above. I’ve also tried solutions from "pandas: to_numeric for multiple columns", "Change column type in pandas", and "Python – pandas column type casting with "astype" is not working".

Perhaps I’m missing something here? Do I have to copy the new DataFrame to a new variable or something?

Solution:

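I think a simple cast is enough once the rows with NaN have been dropped. NaN can only be stored in a float column, so CUSNO stays float64 until those rows are gone (df here is the cleaned DataFrame, cl in the question):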

df["CUSNO"] = df["CUSNO"].astype(int)
print(df)

Prints:

    CUSNO  S2            NAME
0     888   0    TEST ACCOUNT
1     888   1  Sample Ship-to
2  802001   0       COMPANY 1
3  802001   1   COMPANY 1 INC
4  802001   2  COMPANY 1 BALL
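
For completeness, here is a minimal end-to-end sketch of the steps above, using the sample rows from the question. It assumes the file is pipe-delimited with no header row; the in-memory string `raw` stands in for the real CSV and is only for illustration. The cast is chained directly onto `dropna` so the integer column is created on the already-filtered frame rather than assigned back into one that still has the NaN row.

```python
import io

import pandas as pd

# Sample data from the question: pipe-delimited, no header row.
# `raw` is a stand-in for the real CSV file (an assumption for this sketch).
raw = """888|0|TEST ACCOUNT
888|1|Sample Ship-to
802001|0|COMPANY 1
802001|1|COMPANY 1 INC
802001|2|COMPANY 1 BALL
K802001|3|COMPANY 1
"""

cl = pd.read_csv(io.StringIO(raw), sep="|", names=["CUSNO", "S2", "NAME"])

# Values that cannot be parsed (such as "K802001") become NaN,
# which forces the column to float64.
cl["CUSNO"] = pd.to_numeric(cl["CUSNO"], errors="coerce")

# Drop only the rows where CUSNO failed to parse, then cast that column.
cl = cl.dropna(subset=["CUSNO"]).astype({"CUSNO": "int64"})

print(cl.dtypes)
print(cl)
```

Using `subset=["CUSNO"]` is a design choice in this sketch: it keeps rows that happen to have NaN in other columns, whereas `how='any'` without a subset would drop those too.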