All the ways to construct DataFrame() from data

The parameters section of the documentation for DataFrame (as of pandas 2.0.0) begins:

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame

Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order. If a dict contains Series which have an index defined, it is aligned by its index. This alignment also occurs if data is a Series or a DataFrame itself. Alignment is done on Series/DataFrame inputs.

If data is a list of dicts, column order follows insertion-order.
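For concreteness, here is a minimal sketch of the three behaviours the quoted passage does spell out: insertion-ordered columns for a dict, index alignment for a dict of Series, and insertion-ordered columns for a list of dicts. The variable names and sample values are made up for illustration.

```python
import pandas as pd

# Two Series with partially overlapping indexes (illustrative data).
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([10, 20], index=["b", "c"])

# Dict of Series: keys become columns in insertion order ("y" before "x"),
# and values are aligned on the union of the Series indexes.
from_dict = pd.DataFrame({"y": s1, "x": s2})
print(from_dict)

# List of dicts: each dict becomes a row; columns appear in the order the
# keys are first seen, and missing keys are filled with NaN.
from_records = pd.DataFrame([{"b": 1, "a": 2}, {"a": 3, "c": 4}])
print(from_records)
```

Labels missing from one Series (here "a" in `s2` and "c" in `s1`) show up as NaN in that column.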

The description points to valid input types (i.e., ndarray, Iterable, dict, or DataFrame) but does not completely describe how the constructor will turn the data into a DataFrame. It seems like somewhat of a black box. Should I be able to predict, based on the documentation, that, say, passing a list containing a single Series and no other arguments will give a result that looks like Series.to_frame().T (although the dtypes may differ; see this answer and this one)?
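The single-Series case mentioned above can be checked with a short sketch. The Series name "row" and its values are arbitrary choices for illustration, and the comparison deliberately ignores dtypes, since those may differ.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"], name="row")

# A list containing one Series: the Series becomes a single row,
# its index labels become the columns, and its name labels the row.
from_list = pd.DataFrame([s])

# The shape the question compares it to.
transposed = s.to_frame().T

print(from_list)
print(transposed)

# Compare labels and values only; dtypes are intentionally not checked.
same_labels = (
    list(from_list.index) == list(transposed.index)
    and list(from_list.columns) == list(transposed.columns)
)
same_values = from_list.values.tolist() == transposed.values.tolist()
print(same_labels, same_values)
```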

The purpose of this question is to solicit answers that classify the different ways of passing data to a DataFrame() via data, according to how the constructor puts or massages the data into the DataFrame. It is necessarily a broad question, but there should be a finite number of cases given that the constructor is, you know, implemented in code. I’m interested in this question and would be willing to dig through the source code a little to discover the answer; however, I think others with more experience may have insights to share here before I do that.

This is a single question about rules broadly, and I believe its answers belong together in one place. However, since it is broad, I will provide some specific sub-questions to get us started:

  • For iterables, what container and element combinations are valid? Without needing to try it, should I be able to predict what will happen if I pass a list of DataFrames or a Series of Series? Which axis is used when a Series input is "aligned by its index"? Does the treatment depend at all on what its elements are?

  • How do the container and element types passed via data affect how the DataFrame will be put together? Should I be able to predict how the data will be aligned along the axes of the resulting DataFrame based on knowledge of data alone? I don’t know if the answer is obvious, but in either case I do not see it documented.

  • If I think of a DataFrame as "a dict-like container for Series objects" (as docs suggest), what are the intuitive rules governing how data gets interpreted (loosely) into keys and values?
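To make the sub-questions concrete, here is a small experiment that passes the same two Series in different containers. It is only a sketch of observed behaviour with made-up names and values, not a statement of the full rules.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["a", "b"], name="s1")
s2 = pd.Series([10, 20], index=["b", "c"], name="s2")

# List of Series: each Series becomes a row; the Series indexes are
# combined into the columns and the Series names label the rows.
by_rows = pd.DataFrame([s1, s2])

# Dict of Series: each Series becomes a column; the Series indexes are
# combined into the row index.
by_cols = pd.DataFrame({"s1": s1, "s2": s2})

# A bare Series plus an explicit index: the Series becomes one column and
# is reindexed along the rows, so "c" has no value and "a" is dropped.
single = pd.DataFrame(s1, index=["b", "c"])

print(by_rows)
print(by_cols)
print(single)
```

In this experiment the container type decides which axis the Series index lands on: columns for a list, rows for a dict or a bare Series.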

I’m open to suggestions for improving the question, but I do think it’s a question that needs to be asked and I did not find a similar question on this site.

Solution:

Besides the documentation, it's sometimes useful to read the tests; in your case, test_constructors.py in the pandas test suite. There are too many ways to build a DataFrame to describe them all here, so that file is the best place to see each one exercised.
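In the same spirit as reading the tests, a few input kinds can be probed directly with small assertions. The sketch below is not taken from test_constructors.py; the cases and names are my own picks, covering a homogeneous ndarray, a structured ndarray, and a dict of constants.

```python
import numpy as np
import pandas as pd

# Homogeneous 2-D ndarray: rows and columns get default integer labels.
from_2d = pd.DataFrame(np.arange(6).reshape(2, 3))
assert from_2d.shape == (2, 3)
assert list(from_2d.columns) == [0, 1, 2]

# Structured ndarray: field names become column names.
records = np.array([(1, 2.0), (3, 4.0)], dtype=[("x", "i4"), ("y", "f8")])
from_structured = pd.DataFrame(records)
assert list(from_structured.columns) == ["x", "y"]

# Dict of constants only: there is no length to infer, so pandas raises
# unless an index is supplied.
try:
    pd.DataFrame({"a": 1, "b": 2})
except ValueError as exc:
    scalar_error = str(exc)
else:
    scalar_error = None
assert scalar_error is not None

# With an index, each constant is broadcast down its column.
from_scalars = pd.DataFrame({"a": 1, "b": 2}, index=[0, 1])
assert from_scalars.shape == (2, 2)
```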
