pandas & excel: keeping trailing zero in number stored as string

May 16, 2023

I have a small Excel spreadsheet I’m reading in a Python script using the pandas package. There is a column with what can look like a number (such as "1.10") that’s stored as string and when I access the cell in the Python script, it’s extracted as a float and the trailing zero is not present:

 0 Section    float  1.1
 0 Text       str    One
 1 Section    float  1.2
 1 Text       str    Two
 2 Section    float  1.3
 2 Text       str    Three
 3 Section    float  1.4
 3 Text       str    Four
 4 Section    float  1.5
 4 Text       str    Five
 5 Section    float  1.6
 5 Text       str    Six
 6 Section    float  1.7
 6 Text       str    Seven
 7 Section    float  1.8
 7 Text       str    Eight
 8 Section    float  1.9
 8 Text       str    Nine
 9 Section    float  1.1
 9 Text       str    Ten
10 Section    float  1.11
10 Text       str    Eleven

I created a gist with a small Python script and Excel spreadsheet to illustrate the problem.

Does anyone know how to extract this type of cell as a string and retain the trailing blanks?

>Solution :

you can modify your code by explicitly specifying the data types for the columns you want to read as strings. Here’s your code that i edited as the code that i wrote before when i had the same problem and hope it works :

import sys
import argparse
import pandas as pd

parser = argparse.ArgumentParser(description=sys.argv[0])
parser.add_argument('path', help='Path to excel file')
args = parser.parse_args()

spreadsheet = pd.read_excel(args.path, sheet_name=None, dtype={'Section': str})
sheet_names = list(spreadsheet.keys())
sheet_name = sheet_names[0]
sheet = spreadsheet[sheet_name]
column_names = list(sheet.columns)
columns = {column_name: list(sheet[column_name]) for column_name in column_names}

for row in range(len(columns[column_names[0]])):
    for (column_name, column) in columns.items():
        cell = str(column[row])  # Convert the cell to string
        print(f'{row:2} {column_name:10} {type(cell).__name__:6} {cell}')

In the above code, the dtype={'Section': str} parameter is passed to the read_excelfunction. This tells pandas to treat the "Section" column as a string when reading the Excel file. By explicitly converting the cell to a string using str(column[row]), the trailing zero will be retained when accessing the cell value.

Make sure to replace 'numbers-as-strings.xlsx' with the path to your actual Excel file.

Hope it leads u to the way .