I have a dataset of people that includes their assigned id, from one through eight, and their gender. Using Python, how might I use Disproportional Stratified Sampling to make some teams?
I’ve used this code to get the distribution:
from openpyxl import load_workbook
wb = load_workbook("dataset.xlsx")
sh = wb["Page"]
dist = []
for i in range(415):
    gr = sh[f'C{i+1}'].value
    gend = sh[f'D{i+1}'].value
    dist.append(str(int(inc))+onc)
for i in range(8):
    print(f'Grade {i+1} Males:' +str(dist.count(str(i+1)+'M')))
    print(f'Grade {i+1} Females:' +str(dist.count(str(i+1)+'F')))
>Solution :
Based on your dataset, we will do the following steps:
- Load your dataset from Excel.
- Count the number of males and females in each grade.
- Use Disproportional Stratified Sampling to select samples for each team.
Here is how you can achieve these steps:
First, let’s fix the code for loading data and counting the number of males and females in each grade. Your code uses `inc` and `onc`, which are not defined anywhere. I guess you intended to use `gr` and `gend` instead. Here is the corrected code:
from openpyxl import load_workbook
# Load the workbook and select the sheet
wb = load_workbook("dataset.xlsx")
sh = wb["Page"]
# Initialize the distribution list
dist = []
# Iterate over the rows in the sheet and append grade and gender to the distribution list
for i in range(415):
    # Assumes row 1 is a header row, so the data starts at row 2;
    # if your sheet has no header, use i+1 as in your original code
    gr = sh[f'C{i+2}'].value
    gend = sh[f'D{i+2}'].value
    dist.append(str(int(gr)) + gend)
# Print the counts of males and females in each grade
for i in range(1, 9):
    print(f'Grade {i} Males: ' + str(dist.count(str(i) + 'M')))
    print(f'Grade {i} Females: ' + str(dist.count(str(i) + 'F')))
Next, we will use Disproportional Stratified Sampling to select samples for each team. Disproportional Stratified Sampling means we choose a sample in such a way that the sample size of each stratum does not correspond to the proportional size in the population.
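To make the difference concrete, here is a minimal sketch that compares a proportional allocation with an equal (disproportional) one. The stratum sizes, the sample size, and the two function names are made up for illustration and do not come from your dataset.

```python
# Hypothetical stratum sizes (grade + gender label -> head count)
population = {'1M': 40, '1F': 20, '2M': 25, '2F': 15}
sample_size = 20


def proportional_allocation(counts, n):
    """Sample size per stratum in proportion to its share of the population."""
    total = sum(counts.values())
    return {label: round(n * size / total) for label, size in counts.items()}


def equal_allocation(counts, n):
    """Same sample size for every stratum, whatever its size (disproportional)."""
    return {label: n // len(counts) for label in counts}


print(proportional_allocation(population, sample_size))
print(equal_allocation(population, sample_size))
```

With the proportional allocation, larger strata contribute more people; with the equal allocation, every stratum contributes the same number, which over-represents the small strata relative to the population.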
For simplicity, let’s assume we want to form 10 teams, each with 16 members (8 males and 8 females), by choosing one male and one female from each of the 8 grades. This is a very simple form of Disproportional Stratified Sampling where the strata are the combinations of grade and gender: every stratum contributes the same number of people regardless of its size. Here is how you can do it:
import random
from collections import defaultdict
# Group people by stratum (grade + gender). The labels in dist do not identify
# anyone, so each person is represented by their position in dist (0-based)
strata = defaultdict(list)
for person, label in enumerate(dist):
    strata[label].append(person)
# Shuffle each stratum so that popping from it is a random draw without replacement
for members in strata.values():
    random.shuffle(members)
# Initialize teams as empty lists
teams = [[] for _ in range(10)]
# For each team
for team in teams:
    # For each grade
    for grade in range(1, 9):
        # Select one male and one female randomly
        for gender in 'MF':
            label = f'{grade}{gender}'
            # Raises IndexError if a stratum has fewer people than there are teams
            person = strata[label].pop()
            team.append((person, label))
# Print the teams
for i, team in enumerate(teams, 1):
    print(f'Team {i}: {team}')
This code will randomly select one male and one female from each grade to form a team. This ensures that each team has members from all grades and both genders.
Please note that this is a simple form of Disproportional Stratified Sampling: it takes the same number of people from every stratum and doesn’t take into account the actual distribution of grades and genders in the population. A fuller application of Disproportional Stratified Sampling requires deciding in advance how many people you want from each stratum in the sample.
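If you do have a desired sample size per stratum, the idea can be sketched as follows. This is only an illustration under assumptions: `disproportional_sample`, the `allocation` dictionary, and the example labels are hypothetical, and each person is identified by their index in the list of labels.

```python
import random
from collections import defaultdict


def disproportional_sample(labels, allocation, seed=None):
    """Draw allocation[label] people from each stratum, without replacement.

    labels: one stratum label per person, e.g. '3F'; a person is their index.
    allocation: desired sample size per stratum, chosen by you rather than
    derived from the stratum sizes.
    """
    rng = random.Random(seed)
    # Group person indices by stratum label
    strata = defaultdict(list)
    for person, label in enumerate(labels):
        strata[label].append(person)
    sample = {}
    for label, k in allocation.items():
        if k > len(strata[label]):
            raise ValueError(f'stratum {label} has only {len(strata[label])} people')
        sample[label] = rng.sample(strata[label], k)
    return sample


# Hypothetical data: strata of very different sizes, sampled equally
labels = ['1M'] * 30 + ['1F'] * 10 + ['2M'] * 5 + ['2F'] * 15
allocation = {'1M': 3, '1F': 3, '2M': 3, '2F': 3}
picked = disproportional_sample(labels, allocation, seed=0)
print(picked)
```

Passing your own `dist` list as `labels` and an allocation that reflects the team composition you want would give you the pool of people to distribute across teams.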