Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create new variable x by selecting the value of a cell in a row conditional on the variable name containing the value of a variable z

I have a dataset of individuals. Each individual is observed across different years and countries. I have two independent variables x2002 and x2010 (see Dataex example: Before). These contain the values of a variable X for each year for a specific country. For example, all individuals observed in 2002 in France have a value of 10 for x2002 and a value of 200 for x2002. I want to create a new variable called X. For each row, the value for X should be set to the cell of the variable that contains the equivalent year in the variable name. (My original dataset has variables from x2002 to x2018 and 100.000 observations, so I need to automate this.). I am relatively new to Stata, The result should look like this:

Dataex example: After

* Example generated by -dataex-. For more info, type help dataex
clear
input int id str6 country int(year y1) float(y2 x2002 x2010 x)
7 "France" 2002 7 1.8628873 10 200  10
5 "France" 2002 2  19.52281 10 200  10
6 "France" 2010 7  14.80233 15 500 500
8 "France" 2010 9 15.173367 15 500 500
3 "USA"    2002 8 2.2806208  5 100   5
1 "USA"    2002 6 18.706757  5 100   5
4 "USA"    2010 4  7.421789 30  25  25
2 "USA"    2010 4  9.642876 30  25  25
end

Dataex example: Before

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

* Example generated by -dataex-. For more info, type help dataex
clear
input int id str6 country int(year y1) float(y2 x2002 x2010)
7 "France" 2002 7 1.8628873 10 200
5 "France" 2002 2  19.52281 10 200
6 "France" 2010 7  14.80233 15 500
8 "France" 2010 9 15.173367 15 500
3 "USA"    2002 8 2.2806208  5 100
1 "USA"    2002 6 18.706757  5 100
4 "USA"    2010 4  7.421789 30  25
2 "USA"    2010 4  9.642876 30  25
end

My code looks like this:

foreach i in year {
    gen x = list x`i' in i
}

I do not understand how to dynamically select a variable.

>Solution :

* Example generated by -dataex-. For more info, type help dataex
clear
input int id str6 country int(year y1) float(y2 x2002 x2010)
7 "France" 2002 7 1.8628873 10 200
5 "France" 2002 2  19.52281 10 200
6 "France" 2010 7  14.80233 15 500
8 "France" 2010 9 15.173367 15 500
3 "USA"    2002 8 2.2806208  5 100
1 "USA"    2002 6 18.706757  5 100
4 "USA"    2010 4  7.421789 30  25
2 "USA"    2010 4  9.642876 30  25
end

gen X = . 

quietly forval y = 2002/2018 { 
   capture replace X = x`y' if year == `y' 
}

This is quite concisely done in Stata, but here hinges on substitution of the contents of a local macro both as part of a variable name and as the value of a variable.

Note that foreach i in year is legal, but not at all a loop over the distinct values of year. It’s a loop over one item, the name or text "year". Also you need different syntax for referring to the local macro inside the loop, as just given.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading