The parameters section of the documentation for DataFrame (as of pandas 2.0.0) begins:
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order. If a dict contains Series which have an index defined, it is aligned by its index. This alignment also occurs if data is a Series or a DataFrame itself. Alignment is done on Series/DataFrame inputs.
If data is a list of dicts, column order follows insertion-order.
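As a quick illustration of the two behaviours the excerpt does spell out, here is a minimal sketch. The variable names and sample values are my own, not taken from the docs. It builds one frame from a dict of Series with partly overlapping indexes and one from a list of dicts.

```python
import pandas as pd

# Dict of Series: keys become column labels, and the Series are
# aligned on the row index (missing labels are filled with NaN).
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([3, 4], index=["y", "z"])
from_dict = pd.DataFrame({"a": a, "b": b})
print(from_dict)

# List of dicts: each dict becomes one row, and the columns follow
# the order in which keys are first seen.
from_records = pd.DataFrame([{"b": 1, "a": 2}, {"a": 3, "c": 4}])
print(from_records)
```

Only the label "y" is shared by both Series, so it is the only row of from_dict without a missing value.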
The description points to valid input types (i.e., ndarray, Iterable, dict, or DataFrame) but does not completely describe how the constructor will turn the data into a DataFrame. It seems like somewhat of a black box. Should I be able to predict, based on the documentation, that, say, passing a list containing a single Series and no other arguments will give a result that looks like Series.to_frame().T (although the dtypes may differ; see this answer and this one)?
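To make the single-Series case concrete, here is a small sketch (the Series and its name "row" are made up for illustration) that compares the two results by labels and values only, leaving dtypes aside since those may differ.

```python
import pandas as pd

s = pd.Series([1, 2], index=["a", "b"], name="row")

from_list = pd.DataFrame([s])   # list containing a single Series
transposed = s.to_frame().T     # the result it is being compared to

print(from_list)
print(transposed)

# Compare labels only; dtypes are printed rather than compared.
same_labels = (
    list(from_list.index) == list(transposed.index)
    and list(from_list.columns) == list(transposed.columns)
)
print(same_labels)
print(from_list.dtypes)
print(transposed.dtypes)
```

In both cases the Series index ends up as the column labels and the Series name as the single row label, which is the behaviour I would like to be able to predict from the documentation.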
The purpose of this question is to solicit answers that classify the different ways of passing data to a DataFrame() via data, according to how the constructor puts or massages the data into the DataFrame. It is necessarily a broad question, but there should be a finite number of cases given that the constructor is, you know, implemented in code. I’m interested in this question and would be willing to dig through the source code a little to discover the answer; however, I think others with more experience may have insights to share here before I do that.
This is a single question about rules broadly, and I believe its answers belong together in one place. However, since it is broad, I will provide some specific sub-questions to get us started:
- For iterables, what container and element combinations are valid? Without needing to try it, should I be able to predict what will happen if I pass a list of DataFrames or a Series of Series? Which axis is used when a Series input is "aligned by its index"? Does the treatment depend at all on what its elements are?
- How do the container and element types passed via data affect how the DataFrame will be put together? Should I be able to predict how the data will be aligned along the axes of the resulting DataFrame based on knowledge of data alone? I don’t know if the answer is obvious, but in either case I do not see it documented.
- If I think of a DataFrame as "a dict-like container for Series objects" (as docs suggest), what are the intuitive rules governing how data gets interpreted (loosely) into keys and values?
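To show the kind of axis behaviour I am asking about, here is a sketch of two cases I do feel able to predict. The sample data is hypothetical. It does not cover a list of DataFrames or a Series of Series, which remain open.

```python
import pandas as pd

s1 = pd.Series({"a": 1, "b": 2})
s2 = pd.Series({"b": 3, "c": 4})

# List of Series: each Series becomes a row, and the Series labels
# are aligned along the columns (gaps are filled with NaN).
rows = pd.DataFrame([s1, s2])
print(rows)

# A single named Series plus an explicit index: the Series becomes
# one column and is reindexed to the requested row labels.
col = pd.Series([10, 20], index=["x", "y"], name="v")
reindexed = pd.DataFrame(col, index=["y", "z"])
print(reindexed)
```

So the same Series index can end up on either axis depending on the container: along the columns when the Series sits inside a list, along the rows when the Series is itself the data.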
I’m open to suggestions for improving the question, but I do think it’s a question that needs to be asked and I did not find a similar question on this site.
>Solution :
Besides the documentation, it’s sometimes useful to read the tests, especially test_constructors.py in your case. There are too many ways to build a DataFrame to describe them all here, so take a look at test_constructors.py.
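To give a flavour of that approach, here is a small sketch written in the style of such tests. These assertions are my own and are not copied from the pandas test suite. They pin down two cases: a 2-D ndarray, and an existing DataFrame passed together with index and columns.

```python
import numpy as np
import pandas as pd

# 2-D ndarray: rows and columns are taken positionally and get
# default integer labels unless index/columns are supplied.
arr = np.arange(6).reshape(2, 3)
from_array = pd.DataFrame(arr)
assert from_array.shape == (2, 3)
assert list(from_array.index) == [0, 1]
assert list(from_array.columns) == [0, 1, 2]

labelled = pd.DataFrame(arr, index=["r0", "r1"], columns=list("abc"))
assert labelled.loc["r1", "c"] == 5

# DataFrame input with index/columns: the data is selected by
# label, not by position.
subset = pd.DataFrame(labelled, index=["r1"], columns=["c", "a"])
assert subset.shape == (1, 2)
assert subset.loc["r1", "c"] == 5
```

Writing a handful of assertions like these for the container and element combinations you care about is a practical way to turn the black box into a list of known cases.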