I have a small Excel spreadsheet I’m reading in a Python script using the pandas package. There is a column with what can look like a number (such as "1.10") that’s stored as string and when I access the cell in the Python script, it’s extracted as a float and the trailing zero is not present:
0 Section float 1.1
0 Text str One
1 Section float 1.2
1 Text str Two
2 Section float 1.3
2 Text str Three
3 Section float 1.4
3 Text str Four
4 Section float 1.5
4 Text str Five
5 Section float 1.6
5 Text str Six
6 Section float 1.7
6 Text str Seven
7 Section float 1.8
7 Text str Eight
8 Section float 1.9
8 Text str Nine
9 Section float 1.1
9 Text str Ten
10 Section float 1.11
10 Text str Eleven
I created a gist with a small Python script and Excel spreadsheet to illustrate the problem.
Does anyone know how to extract this type of cell as a string and retain the trailing blanks?
>Solution :
you can modify your code by explicitly specifying the data types for the columns you want to read as strings. Here’s your code that i edited as the code that i wrote before when i had the same problem and hope it works :
import sys
import argparse
import pandas as pd
parser = argparse.ArgumentParser(description=sys.argv[0])
parser.add_argument('path', help='Path to excel file')
args = parser.parse_args()
spreadsheet = pd.read_excel(args.path, sheet_name=None, dtype={'Section': str})
sheet_names = list(spreadsheet.keys())
sheet_name = sheet_names[0]
sheet = spreadsheet[sheet_name]
column_names = list(sheet.columns)
columns = {column_name: list(sheet[column_name]) for column_name in column_names}
for row in range(len(columns[column_names[0]])):
for (column_name, column) in columns.items():
cell = str(column[row]) # Convert the cell to string
print(f'{row:2} {column_name:10} {type(cell).__name__:6} {cell}')
In the above code, the dtype={'Section': str} parameter is passed to the read_excelfunction. This tells pandas to treat the "Section" column as a string when reading the Excel file. By explicitly converting the cell to a string using str(column[row]), the trailing zero will be retained when accessing the cell value.
Make sure to replace 'numbers-as-strings.xlsx' with the path to your actual Excel file.
Hope it leads u to the way .