I had an interview where they asked me: "Find the most popular class among students using their first enrollment date", where the assumption was that a student would pick their favorite class first. For simplicity, no two
EnrollmentDT could be exactly the same and there are no data issues (e.g. a student can’t be enrolled in the same class twice).
They expected me to use a window function, and I’m curious how to do that for this problem.
I quickly setup some seed data as such (I’m aware the seed portion isn’t a perfect representation, but I needed something close enough quickly):
IF OBJECT_ID('StudentClass') IS NOT NULL DROP TABLE StudentClass; IF OBJECT_ID('Class') IS NOT NULL DROP TABLE Class; IF OBJECT_ID('Student') IS NOT NULL DROP TABLE Student; CREATE TABLE Student ( StudentID INT IDENTITY(1,1) PRIMARY KEY, [Name] UNIQUEIDENTIFIER DEFAULT NEWID(), ); CREATE TABLE Class ( ClassID INT IDENTITY(1,1) PRIMARY KEY, [Name] UNIQUEIDENTIFIER DEFAULT NEWID(), ClassLevel INT DEFAULT CAST(CEILING(RAND() * 3) AS INT) ); CREATE TABLE StudentClass ( StudentClassID INT IDENTITY(1,1), StudentID INT FOREIGN KEY REFERENCES Student (StudentID), ClassID INT FOREIGN KEY REFERENCES Class (ClassID), EnrollmentDT DATETIME2 ); GO INSERT INTO Student DEFAULT VALUES GO 50 INSERT INTO Class DEFAULT VALUES GO 5 DECLARE @StudentIndex INT = 1; DECLARE @Cycle INT = 1; WHILE @Cycle <= 5 BEGIN IF RAND() > 0.5 BEGIN INSERT INTO StudentClass (StudentID, ClassID, EnrollmentDT) VALUES (@StudentIndex, @Cycle, DATEADD(SECOND, CAST(CEILING(RAND() * 10000) AS INT), SYSDATETIME())) END SET @StudentIndex = @StudentIndex + 1; IF @StudentIndex = 50 BEGIN SET @Cycle = @Cycle + 1; SET @StudentIndex = 1; END END
But the only thing I could come up with was:
SELECT sc.ClassID, COUNT(*) AS IsFavoriteClassCount FROM StudentClass sc INNER JOIN ( SELECT StudentID, MIN(EnrollmentDT) AS MinEnrollmentDT FROM StudentClass GROUP BY StudentID ) sq ON sc.StudentID = sq.StudentID AND sc.EnrollmentDT = sq.MinEnrollmentDT GROUP BY sc.ClassID ORDER BY IsFavoriteClassCount DESC;
Any guidance on their thinking would be greatly appreciated! If I made any errors in my constructions / query, take that as a proper error and not something intentional.
SELECT ClassID, COUNT(*) AS IsFavoriteClassCount FROM ( SELECT sc.ClassID, sc.StudentID, ROW_NUMBER() OVER (PARTITION BY sc.StudentID ORDER BY sc.EnrollmentDT) AS rn FROM StudentClass sc ) t WHERE rn = 1 GROUP BY ClassID ORDER BY IsFavoriteClassCount DESC;
The query uses ROW_NUMBER() as a window function to assign a unique row number to each student’s first enrollment in a class (based on their enrollment date). The inner query selects the ClassID and the StudentID for each enrollment and adds the row number, partitioned by the StudentID and ordered by the enrollment date. The outer query filters for the rows where the row number is 1, which indicates that these are the first enrollments for each student, and aggregates the data by the ClassID to get the count of how many times each class was the first choice of a student.