This might be a very easy to spot mistake but I am not the most familiar with the Django queries so I have been breaking my head for hours trying to solve that.
I have a Django Rest app and there I have 2 models:
class Student(models.Model):
name = models.CharField(max_length=20)
status = models.CharField(max_length=4, blank=True, default="UNK", choices=STATUS_TYPES)
started_date = models.DateTimeField(null=False, auto_now_add=True)
class StudentStatusHistory(BaseTimestampModel):
"""Table of status changes for Students"""
student = models.ForeignKey(
"core.Student", on_delete=models.CASCADE, related_name="StatusHistory"
)
status = models.CharField(
max_length=4, blank=True, default="UNK", choices=STATUS_TYPES
)
created_on = models.DateTimeField(null=False, auto_now_add=True)
For reference:
STATUS_TYPES = [("UNK", "Unknown"), (“PASS", “Passed"), (“FAIL", “Failed")]
I am trying to write a query that calculates for how many students their status changed to Pass
within the time window of a week or less.
So for example if a Student.started_date
is 2023-07-15 12:22:22
and the StudentStatusHistory.created_on
field is 2023-07-21 12:22:22
, for a student with Student.status
that equals PASS
, I want to count this student because they passed in less than a week (in 6 days in this example).
Additionaly, I want to calculate this count only for students that started the current month.
What I have so far is:
current_month_start = datetime(datetime.today().year, datetime.today().month, 1)
# Subquery to filter StudentStatusHistory objects for each Student object within a week or less
pass_within_week_subquery = StudentStatusHistory.objects.filter(
student_id=OuterRef(“pk"),
status="PASS",
created_on__range=(OuterRef("started_date"), OuterRef(“started_date") + timedelta(weeks=1))
)
# Subquery to filter Student objects created within the current month
student_within_month_subquery = Student.objects.filter(
started_date__gte=current_month_start,
pk=OuterRef(“pk"),
)
# Main queryset to annotate Student objects with pass_within_week count
student_objects_with_pass_within_week = Student.objects.annotate(
pass_within_week_count=Subquery(
pass_within_week_subquery.annotate(count_pass_within_week=Count("pk")).values(“count_pass_within_week"),
output_field=IntegerField()),
is_within_month=Exists(student_within_month_subquery),
)
# Count the number of student objects that satisfy the condition
count_student_pass_within_week = student_objects_with_pass_within_week.filter(pass_within_week_count__gt=0,
is_within_month=True).count()
return Response(data= count_student_pass_within_week)
(I am not sure what’s up with the python format, but please ignore or edit that part)
Everything is running seemingly smoothly until the last line about the count that returns the error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
I read a lot about the OuterRef, Subqueries, etc and I thought that by using them in a main queryset would solve this problem but clearly not.
Any ideas?
>Solution :
I think you can do this simpler with:
from datetime import timedelta
from django.db.models import F
Student.objects.filter(
StatusHistory__status='PASS',
StatusHistory__created_on__range=(
F('started_date'),
F('started_date') + timedelta(weeks=1),
),
).distinct().count()
we thus filter on the Student
s to only retain the ones with a related StatusHistory
that has status='PASS'
, and the created_on
in the time range of started_date
and started_date
and a week. We then count the distinct Student
s that match this. We need to use .distinct()
[Django-doc] to prevent counting the student multiple times, if the status is set to PASS
multiple times within that week.
Note: The
related_name=…
[Django-doc]
is the name of the manager to fetch the related objects in reverse. Therefore
normally therelated_name
of aForeignKey
orManyToManyField
is plural,
for examplestatus_changes
instead of.StatusHistory