Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Convert Python 3D dict into a dataframe

I’ve read excel file using pd.read_excel and i also have specified from which sheets i want them using Regex.

def gettingSheetName(_directory,_regex):
    '''reads the file using regex as a pattern for Sheet name'''
    xl = pd.ExcelFile(_directory)
    regex = re.compile(_regex)
    sheets = [n for n in xl.sheet_names if regex.match(n)]
    if sheets:
        return pd.read_excel(xl, sheet_name=sheets,index_col=False)

Therefor in return i get a Dictionary with 3 Dimensions.

{'SheetName':      Col1                 Col2
0             27                someText
1             35                someText
..           ...                         ...
171         4444                someText
172         6666                someText

i’ve tried pd.DataFrame.from_dict(Dic) to convert this to Pandas Data frame. and i got this error

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

If using all scalar values, you must pass an index

i also tried this code that i found pd.DataFrame(list(Sach.items()), columns=['Col1', 'Col2']) but the resualt looks like this

    Col1       col2
0   Sheetname  Col1 Col2 0....

Sheet name goes to value and after that Columns and finally the values that i’m assuming are starting with the first index which was 0
I’ve search everywhere and all i could find was for 2D Dic which doesn’t consider the Sheet name.
although i used this for loop instead of other popular methods which does the job, my Question is is there any way that i can use a Pandas function to achieve this? or possibly can i Not take the Sheetname from the beginning?

for key in Dic:
     Dic=Dic[key]

EDIT 1:

so after using return pd.concat(pd.read_excel(xl, sheet_name=sheets,index_col=False)) and then pd.DataFrame.from_dict(Sach) i got this Data Frame which is almost what i wanted but i only need the Cols and Values

                 Col1    Col2
Sheetname   0    Val1    Val2
            1

>Solution :

Use concat with DataFrame.reset_index:

return pd.concat(pd.read_excel(xl,sheet_name=sheets,index_col=False)).reset_index(drop=True)

d = {'SheetName':  pd.DataFrame({'Col1':[27,25], 'Col2':list('ab')}),
     'SheetName1':  pd.DataFrame({'Col1':[270,205], 'Col2':list('th')})} 
print (d)
{'SheetName':    Col1 Col2
0    27    a
1    25    b, 'SheetName1':    Col1 Col2
0   270    t
1   205    h}

By default get MultiIndex:

df = pd.concat(d)
print (df)
              Col1 Col2
SheetName  0    27    a
           1    25    b
SheetName1 0   270    t
           1   205    h

For convert it to column use:

df = pd.concat(d).droplevel(1).rename_axis('sheets').reset_index()
print (df)
       sheets  Col1 Col2
0   SheetName    27    a
1   SheetName    25    b
2  SheetName1   270    t
3  SheetName1   205    h

For remove MultiIndex use:

df = pd.concat(d).reset_index(drop=True)
print (df)
   Col1 Col2
0    27    a
1    25    b
2   270    t
3   205    h
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading