I have a dataset of individuals. Each individual is observed across different years and countries. I have two independent variables x2002 and x2010 (see Dataex example: Before). These contain the values of a variable X for each year for a specific country. For example, all individuals observed in 2002 in France have a value of 10 for x2002 and a value of 200 for x2002. I want to create a new variable called X. For each row, the value for X should be set to the cell of the variable that contains the equivalent year in the variable name. (My original dataset has variables from x2002 to x2018 and 100.000 observations, so I need to automate this.). I am relatively new to Stata, The result should look like this:
Dataex example: After
* Example generated by -dataex-. For more info, type help dataex
clear
input int id str6 country int(year y1) float(y2 x2002 x2010 x)
7 "France" 2002 7 1.8628873 10 200 10
5 "France" 2002 2 19.52281 10 200 10
6 "France" 2010 7 14.80233 15 500 500
8 "France" 2010 9 15.173367 15 500 500
3 "USA" 2002 8 2.2806208 5 100 5
1 "USA" 2002 6 18.706757 5 100 5
4 "USA" 2010 4 7.421789 30 25 25
2 "USA" 2010 4 9.642876 30 25 25
end
Dataex example: Before
* Example generated by -dataex-. For more info, type help dataex
clear
input int id str6 country int(year y1) float(y2 x2002 x2010)
7 "France" 2002 7 1.8628873 10 200
5 "France" 2002 2 19.52281 10 200
6 "France" 2010 7 14.80233 15 500
8 "France" 2010 9 15.173367 15 500
3 "USA" 2002 8 2.2806208 5 100
1 "USA" 2002 6 18.706757 5 100
4 "USA" 2010 4 7.421789 30 25
2 "USA" 2010 4 9.642876 30 25
end
My code looks like this:
foreach i in year {
gen x = list x`i' in i
}
I do not understand how to dynamically select a variable.
>Solution :
* Example generated by -dataex-. For more info, type help dataex
clear
input int id str6 country int(year y1) float(y2 x2002 x2010)
7 "France" 2002 7 1.8628873 10 200
5 "France" 2002 2 19.52281 10 200
6 "France" 2010 7 14.80233 15 500
8 "France" 2010 9 15.173367 15 500
3 "USA" 2002 8 2.2806208 5 100
1 "USA" 2002 6 18.706757 5 100
4 "USA" 2010 4 7.421789 30 25
2 "USA" 2010 4 9.642876 30 25
end
gen X = .
quietly forval y = 2002/2018 {
capture replace X = x`y' if year == `y'
}
This is quite concisely done in Stata, but here hinges on substitution of the contents of a local macro both as part of a variable name and as the value of a variable.
Note that foreach i in year is legal, but not at all a loop over the distinct values of year. It’s a loop over one item, the name or text "year". Also you need different syntax for referring to the local macro inside the loop, as just given.