Can these two SQL queries be combined?

I’m wondering if there’s a way I can combine these two queries into one? I need to get the mean and std dev for each column in the company_feature table. I then need to take those two values and use them in an aggregation query on each row in the company_feature table.

/* Get mean and std dev for each feature column */
SELECT 
    AVG(F1) AS F1_mean, 
    STDDEV(F1) AS F1_std_dev
FROM company_feature_test cft;
/* Add averages for each feature to the following query */
SELECT
    DATA.company_id,
      (
       CASE
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
         THEN 
           1 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 1
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
         THEN 
           2 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 2           
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
         THEN 
           3 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 3   
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
         THEN 
           4 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 4
         ELSE
           5 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev )
       END
      ) AS feature_1
FROM (
  SELECT company.in_ref_set, company.size, cft.*
  FROM company_feature_test cft
  JOIN company ON company.id = cft.company_id
  GROUP BY company.id
) AS DATA
GROUP BY DATA.company_id;

the tables look like the following (below). There is a relation between company.id and company_feature.company_id.

company table

| id | ref_set | size |
| -- | --- | --- |
| 1  | 0   | SMALL |
| 2  | 1   | LARGE |


company_feature table

| company_id | F1 | F2 |
| --- | --- | --- |
| 1   | 5   | 10  |
| 2   | 15  | 20  |

The query outputs the following data:

| company_id | feature_1 |
| --- | --- | 
| 1   | -1.66 |
| 2   | -1.44 | 

>Solution :

Yes, you just cross join them:

SELECT
    DATA.company_id,
      (
       CASE
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
         THEN 
           1 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 1
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
         THEN 
           2 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 2           
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
         THEN 
           3 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 3   
         WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
         THEN 
           4 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 4
         ELSE
           5 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev )
       END
      ) AS feature_1
FROM (
  SELECT company.in_ref_set, company.size, cft.*
  FROM company_feature_test cft
  JOIN company ON company.id = cft.company_id
  GROUP BY company.id
) AS DATA
CROSS JOIN (
  SELECT 
    AVG(F1) AS F1_mean, 
    STDDEV(F1) AS F1_std_dev
  FROM company_feature_test cft
) AS TOTALS

Note that there’s no need to group by in the outer query; there will already only be one row per company.

Note that you still seem to be doing conditional aggregation incorrectly, if that is what you are trying to do; assuming there are multiple rows in cft for each company, you will be selecting an arbitrary F1 for each company. Default settings in newer versions of mysql will prohibit this.

Leave a Reply