Extract human readable memory usage for Pandas data frame

I have a data frame:

import pandas as pd

df = pd.DataFrame({'A': range(1, 10000)})

df.info() gives me a nice human-readable summary that reports a memory usage of 78.2 KB:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       9999 non-null   int64
dtypes: int64(1)
memory usage: 78.2 KB

df.memory_usage() gives me the same information in a less useful form (a Series of per-column byte counts, which is also what pandas itself sums to compute the figure), but I'd rather not roll my own formatting. I've looked at the df.info source and traced the string all the way to this line.
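For comparison, rolling your own from df.memory_usage() only takes a few lines. This is a minimal sketch, assuming the 1024-based units and one-decimal format that df.info() prints (as far as I can tell from the pandas source); human_readable_memory is a hypothetical name, and unlike df.info() it never appends the '+' qualifier pandas adds when object columns make the shallow figure a lower bound.

```python
import pandas as pd


def human_readable_memory(df: pd.DataFrame, deep: bool = False) -> str:
    """Hypothetical helper: format a DataFrame's memory footprint like df.info().

    Assumes 1024-based units and one decimal place; never adds the '+' qualifier.
    """
    # Sum the per-column byte counts, including the index entry.
    num = float(df.memory_usage(index=True, deep=deep).sum())
    for unit in ["bytes", "KB", "MB", "GB", "TB"]:
        if num < 1024.0:
            return f"{num:3.1f} {unit}"
        num /= 1024.0
    return f"{num:3.1f} PB"


df = pd.DataFrame({"A": range(1, 10000)})
print(human_readable_memory(df))
```

This is exactly the kind of code the question hopes to avoid maintaining, which is why reusing pandas' own formatting is attractive.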

How is this specific string generated, and how can I extract it so I can write it to a log?

N.B. I can't parse the df.info() output because it prints directly to a buffer and returns None, so calling str() on its result just gives 'None'.

N.B. This line doesn't help either: what it initialises is merely a boolean flag for whether memory usage should be printed at all.

Solution:

You can create an instance of pandas.io.formats.info.DataFrameInfo and read its memory_usage_string property, which is exactly what df.info() uses for its last line. The property's value ends with a newline, hence the .strip():

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       9999 non-null   int64
dtypes: int64(1)
memory usage: 78.2 KB
>>> pd.io.formats.info.DataFrameInfo(df).memory_usage_string.strip()
'78.2 KB'
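Since the goal is to write the figure to a log, here is a minimal sketch that feeds memory_usage_string into the standard logging module. log_memory_usage is a hypothetical helper; note that DataFrameInfo lives in pandas.io.formats.info and, as far as I know, is not listed in the pandas API reference, so its location or behaviour could change between versions.

```python
import logging

import pandas as pd
from pandas.io.formats.info import DataFrameInfo

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger(__name__)


def log_memory_usage(df: pd.DataFrame, name: str, memory_usage=None) -> str:
    """Hypothetical helper: log the same memory figure df.info() prints."""
    # memory_usage_string ends with a newline, so strip it before logging.
    usage = DataFrameInfo(df, memory_usage=memory_usage).memory_usage_string.strip()
    logger.info("%s memory usage: %s", name, usage)
    return usage


df = pd.DataFrame({"A": range(1, 10000)})
log_memory_usage(df, "df")
log_memory_usage(df, "df", memory_usage="deep")
```

Returning the string as well as logging it keeps the helper usable in tests or in other log formats.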

If you’re passing memory_usage to df.info, you can pass it directly to DataFrameInfo:

pd.io.formats.info.DataFrameInfo(df, memory_usage='deep').memory_usage_string.strip()
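Alternatively, df.info() takes a buf argument (a writable text buffer, sys.stdout by default), so its output can be captured in an io.StringIO and parsed after all, using only public API. A minimal sketch: info_memory_line is a hypothetical helper, the extra object column is only there to show the '+' qualifier df.info() uses when the shallow figure is a lower bound, and the parsing assumes the 'memory usage: ' prefix stays stable across pandas versions.

```python
import io

import pandas as pd

PREFIX = "memory usage: "


def info_memory_line(df: pd.DataFrame, **info_kwargs) -> str:
    """Hypothetical helper: capture df.info() output and return its memory figure."""
    buf = io.StringIO()
    # With buf given, df.info() writes there instead of stdout (it still returns None).
    df.info(buf=buf, **info_kwargs)
    for line in buf.getvalue().splitlines():
        # Assumes the summary line keeps its "memory usage: " prefix.
        if line.startswith(PREFIX):
            return line[len(PREFIX):]
    raise ValueError("df.info() output contained no memory usage line")


# Illustrative frame: an explicit object column makes the shallow figure a lower bound.
mixed = pd.DataFrame(
    {"A": range(1, 10000), "B": pd.Series(["x"] * 9999, dtype=object)}
)
print(info_memory_line(mixed))                       # shallow, with a '+' qualifier
print(info_memory_line(mixed, memory_usage="deep"))  # deep, no qualifier
```

Parsing text is more brittle than reading memory_usage_string, but it avoids depending on a module outside the documented API.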