Using cross-sectional data for OLS/logit models

I have a cross-sectional dataset (example below) in which the variable id identifies each individual (a Reddit username) and each row represents one Reddit post written by that username, so the number of rows varies across individuals.
My goal is to use OLS regression to predict average sentiment from individual-level covariates, all of which are measured at the username level. For instance, the indicator "collective_action_prop" measures the proportion of collective action mentions across all posts by a given username.

So far, I have run the OLS model as follows:

regress avg_sentiment avg_response collective_action_prop economic_demand_prop

However, I am not sure whether I am correctly running the OLS regression at the username level given the current data structure, in which each row represents a Reddit post but the variable id refers to usernames:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id double avg_sentiment float avg_response double(collective_action_prop economic_demand_prop)
 1                 -1         1                  0                 1
 2                 -1         0                  0                 1
 3                  0         0                  0                 0
 4                  0         .                  0                 0
 4                  0         .                  0                 0
 5                  1         6                  0                 0
 6 -.2105263157894737  .2105263                  0 .2631578947368421
 6 -.2105263157894737  .2105263                  0 .2631578947368421
 6 -.2105263157894737         .                  0 .2631578947368421
 6 -.2105263157894737         .                  0 .2631578947368421
 6 -.2105263157894737         .                  0 .2631578947368421
 6 -.2105263157894737  .2105263                  0 .2631578947368421
 6 -.2105263157894737  .2105263                  0 .2631578947368421
 6 -.2105263157894737  .2105263                  0 .2631578947368421
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
 7 -.2307692307692307  .6923077 .07692307692307693 .3461538461538461
end
----
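
One way to check the structure described above is sketched here. It assumes the example data has been loaded, and it assumes the username-level variables are meant to be constant within id. The names varies and one_per_id are hypothetical helpers, not part of the original data. Note that avg_response is missing on some rows of an id that has it on others, so the sketch tags a row where it is observed.

```stata
* Sketch: compare a post-level fit with a username-level fit.
* Assumes the dataex example above is in memory.

* Check that avg_sentiment does not vary within a username
bysort id (avg_sentiment): gen byte varies = avg_sentiment[1] != avg_sentiment[_N]
assert varies == 0

* Tag one row per username, choosing a row where avg_response is observed
* (rows excluded by the if condition are set to 0, so "if one_per_id" is safe)
egen byte one_per_id = tag(id) if !missing(avg_response)

* Post-level fit: every post is a row, so usernames with more posts count more
regress avg_sentiment avg_response collective_action_prop economic_demand_prop

* Username-level fit: each username contributes a single row
regress avg_sentiment avg_response collective_action_prop economic_demand_prop if one_per_id
```

If the two fits differ noticeably, that suggests the repeated rows are weighting usernames by their post counts, which may or may not be what is intended.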


>Solution :

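If your average sentiment is bounded (in the example data it lies in [-1, 1]), then OLS may not be appropriate, since a linear model places no bounds on the outcome and can produce fitted values outside that range. You need to re-conceptualize your problem, either by transforming the outcome variable or by using a beta regression, which requires the outcome to be rescaled to lie strictly within (0, 1).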
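
A minimal sketch of the beta regression route follows, assuming avg_sentiment ranges over [-1, 1] as in the example data and that Stata's betareg command is available (Stata 14 or later). The variable name sent01 is hypothetical, and the endpoint adjustment is one common convention rather than something the answer prescribes.

```stata
* Sketch only: rescale the outcome and fit a beta regression.

* Map avg_sentiment from [-1, 1] onto [0, 1]
gen double sent01 = (avg_sentiment + 1) / 2

* betareg requires values strictly inside (0, 1), so pull exact 0s and 1s inward.
* Assumption: the (y*(N-1) + 0.5)/N adjustment, with N the number of non-missing rows.
quietly count if !missing(sent01)
local n = r(N)
replace sent01 = (sent01 * (`n' - 1) + 0.5) / `n'

* Beta regression on the rescaled outcome with the same covariates
betareg sent01 avg_response collective_action_prop economic_demand_prop
```

Coefficients from this model are on the logit scale of the mean of sent01, so they are not directly comparable to the OLS coefficients on avg_sentiment.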
