In the following scenario, I would like to:
- Replace the missing values with a custom character, for example
- - Apply custom formatting with respect to numbers
Problem
I seem to be able to achieve only one of those goals. If I use the code below, I can achieve the desired results with respect to formatting; however the missing values are printed as nan
"""Handling Missing Data in Pandas / Tabulate
"""
import pandas as pd
from tabulate import tabulate
import seaborn as sns
import numpy as np
# Create sample data
iris_data = sns.load_dataset('iris')
# Derive summary table
iris_summary = pd.DataFrame.describe(iris_data, percentiles=[]).transpose()
# Add missing values
iris_summary.iloc[0, 1:6] = None
# Show missing data
print(tabulate(iris_summary, missingval="-",
floatfmt=(".0f", ".0f", ".3f", ".1f", ".4f", ".1f", ".0f")))
Results
------------ --- ------- ----- -------- ----- ---
sepal_length 150 nan nan nan nan nan
sepal_width 150 3.057 0.4 2.0000 3.0 4
petal_length 150 3.758 1.8 1.0000 4.3 7
petal_width 150 1.199 0.8 0.1000 1.3 2
------------ --- ------- ----- -------- ----- ---
Attempt 1
I’ve tried replacing the missing values
iris_summary.replace(np.nan, "", inplace=True)
but the results are not satisfactory as the number formatting is lost:
------------ --- ------------------ ------------------ --- ---- ---
sepal_length 150
sepal_width 150 3.0573333333333337 0.4358662849366982 2.0 3.0 4.4
petal_length 150 3.7580000000000005 1.7652982332594662 1.0 4.35 6.9
petal_width 150 1.1993333333333336 0.7622376689603465 0.1 1.3 2.5
------------ --- ------------------ ------------------ --- ---- ---
Desired results
I would like to arrive at the table that looks as fallows:
------------ --- ------- ----- -------- ----- ---
sepal_length 150 - - - - -
sepal_width 150 3.057 0.4 2.0000 3.0 4
petal_length 150 3.758 1.8 1.0000 4.3 7
petal_width 150 1.199 0.8 0.1000 1.3 2
------------ --- ------- ----- -------- ----- ---
Notes
- The
tabulatecall is used within a function that outputs LaTeX tables. My intention is to see the solution flexible, sotabulatecan received pandas dataframe with missing values at different locations and correctly apply the formatting passed via thefloatfmtto the remaining values within the column.
>Solution :
Use replace:
print(tabulate(iris_summary.replace(np.nan, None), missingval='-',
floatfmt=(".0f", ".0f", ".3f", ".1f", ".4f", ".1f", ".0f")))
Output:
------------ --- ----- --- ------ --- -
sepal_length 150 - - - - -
sepal_width 150 3.057 0.4 2.0000 3.0 4
petal_length 150 3.758 1.8 1.0000 4.3 7
petal_width 150 1.199 0.8 0.1000 1.3 2
------------ --- ----- --- ------ --- -
I think missingval is for None value but Pandas convert None to NaN where the column has float dtype so you have to force to replace nan by None to get your expected output.