How to remove zeros in dataframe after being created from dictionary?

April 12, 2022

I build a dictionary of descriptive statistics for my data and convert it to a dataframe:

import pandas as pd


def summary_table(df):
    """
    Return a summary table with the descriptive statistics about the dataframe.
    """

    summary = {
        "Number of Days": [len(df)],
        "Missing Cells": [df.isnull().sum().sum()],
        "Missing Cells (%)": [round(df.isnull().sum().sum() / df.shape[0] * 100, 2)],
        "Duplicated Rows": [df.duplicated().sum()],
        "Duplicated Rows (%)": [round(df.duplicated().sum() / df.shape[0] * 100, 2)],
        "Length of Categorical Variables": [len([i for i in df.columns if df[i].dtype == object])],
        "Length of Numerical Variables": [len([i for i in df.columns if df[i].dtype != object])]
    }
    print(summary.items())
    df = pd.DataFrame(summary.items(), columns=['Description', 'Value'])
    df = df.applymap(lambda x: x[0] if isinstance(x, list) else x)
    return df

df = pd.read_csv('test.csv')
df2 = summary_table(df)
print(df2)

This produces the following output:

dict_items([('Number of Days', [434]), ('Missing Cells', [108]), ('Missing Cells (%)', [24.88]), ('Duplicated Rows', [0]), ('Duplicated Rows (%)', [0.0]), ('Length of Categorical Variables', [1]), ('Length of Numerical Variables', [11])])
                       Description   Value
0                   Number of Days  434.00
1                    Missing Cells  108.00
2                Missing Cells (%)   24.88
3                  Duplicated Rows    0.00
4              Duplicated Rows (%)    0.00
5  Length of Categorical Variables    1.00
6    Length of Numerical Variables   11.00

When I print the dictionary items, the values have no trailing zeros. In the dataframe, however, every value is shown with two decimals (for example 434.00 instead of 434), which is confusing. How can I remove these trailing zeros when converting the dictionary to a dataframe?

Solution:

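The Value column mixes integers and floats, so pandas converts the whole column to float64 and prints every value with the same number of decimals. Use an object dtype so the column can hold ints and floats side by side, and don't wrap each value in a list: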

def summary_table(df):
    """
    Return a summary table with the descriptive statistics about the dataframe.
    """
    nulls = df.isnull().sum().sum()
    dups = df.duplicated().sum()
    summary = {
        "Number of Days": len(df),
        "Missing Cells": nulls,
        "Missing Cells (%)": round(nulls / df.shape[0] * 100, 2),
        "Duplicated Rows": dups,
        "Duplicated Rows (%)": round(dups / df.shape[0] * 100, 2),
        "Length of Categorical Variables": len([i for i in df.columns if df[i].dtype == object]),
        "Length of Numerical Variables": len([i for i in df.columns if df[i].dtype != object])
    }
    df = pd.DataFrame(summary.items(), columns=['Description', 'Value'], dtype=object)
    return df

Example:

print(summary_table(df))
                       Description Value
0                   Number of Days     8
1                    Missing Cells     0
2                Missing Cells (%)   0.0
3                  Duplicated Rows     0
4              Duplicated Rows (%)   0.0
5  Length of Categorical Variables     2
6    Length of Numerical Variables     1
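
To see why the dtype matters, here is a minimal sketch that builds the same kind of table twice, once with pandas' inferred dtype and once with dtype=object. The rows are hypothetical values chosen to mirror the question's output, not data from the original test.csv.

```python
import pandas as pd

# Hypothetical rows mirroring the question: integers mixed with a float.
rows = [
    ("Number of Days", 434),
    ("Missing Cells (%)", 24.88),
    ("Duplicated Rows", 0),
]

# Default behaviour: pandas infers one numeric dtype for the whole column,
# so the integers are upcast to float64 and shown with decimals.
inferred = pd.DataFrame(rows, columns=["Description", "Value"])

# With dtype=object each cell keeps its original Python type.
as_object = pd.DataFrame(rows, columns=["Description", "Value"], dtype=object)

print(inferred["Value"].dtype)   # float64
print(as_object["Value"].dtype)  # object
print(as_object)
```

The trade-off is that an object column is no longer numeric, so it suits a table meant for display rather than one used for further arithmetic.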

You can also avoid computing the same statistic twice by storing it in a variable and reusing it, as the function above does.

For instance:

nulls = df.isnull().sum().sum()
...
        "Missing Cells": nulls,
        "Missing Cells (%)": round(nulls / df.shape[0] * 100, 2),
...

by MR

Published April 12, 2022
