Creating a single-column dataframe in Pandas from a Django queryset

October 6, 2022

I’m trying to create a dataframe containing the values from field_1 and field_2 in a single column. I haven’t used pandas a whole lot before, so I’m sure this is naive.

# Create a dataset in CSV format
field_names = ["description", "comments"]
writer = csv.writer(open("dataset.csv", "w"), quoting=csv.QUOTE_ALL, delimiter=",")
writer.writerow(field_names)
for instance in Order.objects.all():
    writer.writerow([str(getattr(instance, f)) for f in field_names])

# Read CSV
data_frame = pd.read_csv("dataset.csv", index_col=0)
# Combine columns
df2 = data_frame.apply(lambda x: ", ".join(x[x.notnull()]), axis=1)

If I’m working with a fairly large dataset, is there a way I can make this more efficient? I would like to eliminate the step that creates the CSV entirely, if possible. If anybody can point me in the right direction, that’d be fantastic.

>Solution :

You really don’t need.

The order objects and getattr; use .values_list() to get an iterable of 2-tuples (assuming field_names are actual fields on the model).
CSV – now that you have an iterable of 2-tuples, pass them to the DataFrame constructor along with the respective column names.

field_names = ["description", "comments"]
df = pd.DataFrame(Order.objects.all().values_list(field_names), columns=field_names)

Also, you wouldn’t really need Pandas at all…

from django.db.models import F, Value
from django.db.models.functions import Concat

# ... 
my_data = list(
    Order.objects.annotate(
        x=Concat(
            F("description"),
            Value(", "),
            F("comments"),
        )
    ).values_list("x", flat=True)
)

and you have a list of description, comments strings, just like the series you’d get with Pandas.