Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Convert SQL commands to Python

I have the following code in SQL:

CREATE OR REPLACE TABLE
  `table2` AS
SELECT nmc_lvl, count(*) AS total_nb
FROM (
  SELECT DISTINCT nmc_ray as nmc_lvl, cli_id, date
  FROM `table1`
)
GROUP BY nmc_lvl

I’ve been trying to rewrite it in python like so:

table2 = table1['cli_id', 'date', 'nmc_ray'].unique()
table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()

but I keep getting a generic error message. What am I doing wrong?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

EDIT:
added the error message

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_7989/1297335147.py in <module>
----> 1 table2 = table1['cli_id', 'date', 'nmc_ray'].unique()
      2 table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: ('cli_id', 'date', 'nmc_ray')

>Solution :

IIUC, you could try the following:

table2 = (table1.drop_duplicates(subset=['nmc_ray', 'cli_id', 'date'])[['nmc_ray','cli_id','date']]
          .rename(columns={'nmc_ray':'nmc_lvl'})
          .value_counts('nmc_lvl').reset_index(name='total_nb'))

The equivalent of SELECT DISTINCT col is drop_duplicates(col) and the equivalent of SELECT col, count(*) is value_counts(col).

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading