Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create unique label for repeated units with PROC SURVEYSELECT in SAS

I need to resample from a real (cluster) trial data set. So far, I have used the following PROC SURVEYSELECT procedure in SAS to sample 10 clusters from the trial with replacement, with 50% of clusters coming from the control arm and 50% coming from the treatment arm. I repeat this 100 times to get 100 replicates with 10 clusters each and equal allocation.

proc surveyselect data=mydata out=resamples reps=100 sampsize=10 method=urs outhits;
  cluster site;
  strata rx / alloc=(0.5 0.5);
run;

Since I am using unrestricted random sampling (method=urs) to sample with replacement, I specified outhits so that SAS will inform me when a cluster is sampled more than once in each replication.

However, within each replicate in the output resamples dataset, I have not found a way to easily assign a unique identifier to clusters that appear more than once. If a cluster is sampled m times within a replicate, the observations within that cluster are simply repeated m times.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I attempted to use PROC SQL to identify distinct cluster ids and their occurrences within each replication, thinking I could use that to duplicate IDs as appropriate before joining additional data as necessary.

proc sql;
    create table clusterselect as
    select distinct r.replicate, r.site, r.numberhits from resamples as r;
quit;

However, I cannot figure out how to simply replicate rows in SAS.

Any help is appreciate, whether it be modifying PROC SURVEYSELECT to yield a unique cluster id within each replication or repeating cluster IDs as appropriate based on numberhits.

Thank you!


Here’s what I’ve done:

/* 100 resamples with replacement */
proc surveyselect data=mydata out=resamples reps=100 sampsize=10 method=urs outhits;
  cluster site;
  strata rx / alloc=(0.5 0.5);
run;

/* identify unique sites per replicate and their num of appearances (numberhits) */
proc sql;
    create table clusterSelect as
    select distinct r.replicate, r.site, r.numberhits from resamples as r;
quit;

/* for site, repeat according to numberhits */
/* create unique clusterId */
data uniqueIds;
    set clusterSelect;
    do i = 1 to numberhits;
        clusterId = cat(site, i);
        output;
    end;
    drop i numberhits;
run;

/* append data to cluster, retaining unique id */
proc sql;
    create table resDat as
    select
        uid.replicate,
        uid.clusterId,
        uid.site,
        mydata.*
    from uniqueIds as uid
    left join mydata
    on uid.site = mydata.site
quit;

>Solution :

Are you just asking how to convert one observation into the number of observations indicated in the NUMBERHITS variable?

data want;
  set resamples;
  do _n_=1 to numberhits;
    output;
  end;
run;
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading