Python pandas: converting the integers in a CSV file to binary

I have a CSV file like this:

Meme1, Meme2, Meme3, Meme4, Meme5, Meme6
Meme1, Meme2, Meme3, Meme99, Meme5, Meme6
Meme5, Meme2, Meme2, Meme4, Meme10, Meme6
Meme99, Meme3, Meme4, Meme4, Meme5, Meme6

and I want to turn it into this:

00000001, 00000010, 00000011, 00000100, 00000101, 00000110
00000001, 00000010, 01100011, 00000100, 00000101, 00000110
00000100, 00000010, 00000010, 00000100, 00001010, 00000110

That is, every integer should be converted to 8-bit binary and the word "Meme" should be removed.
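For a single cell, the required transformation is: find the digits, parse them as an integer, and format the result as 8 zero-padded binary digits. The following is a minimal standard-library sketch, assuming every cell contains exactly one run of digits; cell_to_binary is a hypothetical helper name, not part of any library.

```python
import re


def cell_to_binary(cell: str, width: int = 8) -> str:
    """Convert a cell such as ' Meme99' to a zero-padded binary string."""
    # Assumption: the cell holds exactly one run of digits
    match = re.search(r"\d+", cell)
    if match is None:
        raise ValueError(f"no integer found in {cell!r}")
    # The format spec '0{width}b' means binary, left-padded with zeros to `width` digits
    return format(int(match.group()), f"0{width}b")


print(cell_to_binary("Meme5"))    # 00000101
print(cell_to_binary(" Meme99"))  # 01100011
```

Note that format() never truncates, so a value above 255 would produce more than 8 digits rather than being cut off.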

This is what I have tried, but I cannot get it to work:
import pandas as pd
import csv
import numpy as np

dataset = pd.read_csv('datsetcoma.txt')
reader = csv.DictReader(dataset)
print(reader)  # print back the headers
for row in reader:
    if row.is_integer:
        b = np.binary_repr(10, width=8)
        print(b)
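The attempt above mixes two readers: csv.DictReader expects an iterable of text lines (such as an open file), not a DataFrame, and each row it yields is a dict, which has no is_integer attribute. Since the question already imports csv, here is a minimal sketch that uses only the csv module, assuming every cell holds exactly one run of digits. io.StringIO stands in for the real input file and for an output file, purely so the example is self-contained.

```python
import csv
import io
import re

# Sample input; for the real file, use open('datsetcoma.txt', newline='') instead
source = io.StringIO(
    "Meme1, Meme2, Meme3, Meme4, Meme5, Meme6\n"
    "Meme1, Meme2, Meme3, Meme99, Meme5, Meme6\n"
)
# Stands in for an output file opened with newline=''
target = io.StringIO()

# skipinitialspace=True drops the space that follows each comma
reader = csv.reader(source, skipinitialspace=True)
writer = csv.writer(target)
for row in reader:
    # Assumption: each cell holds exactly one run of digits
    converted = [format(int(re.search(r"\d+", cell).group()), "08b") for cell in row]
    writer.writerow(converted)

print(target.getvalue())
```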

Solution:

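I load the data with io.StringIO and header=None, stack it into a single Series, and extract the integers with a regular expression, passing expand=False so that str.extract returns a Series rather than a DataFrame. The extracted strings are then cast to int. Because np.binary_repr accepts only one integer at a time, I wrap it with np.vectorize so it can be applied to the whole Series.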
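To see the extraction and vectorisation steps in isolation, the sketch below runs them on a small hand-made Series (the sample values are illustrative only). It shows the Series-versus-DataFrame difference controlled by expand, and np.vectorize looping np.binary_repr over the array.

```python
import numpy as np
import pandas as pd

cells = pd.Series([" Meme5", "Meme99", " Meme10"])

# expand=False with a single capture group returns a Series of strings
digits = cells.str.extract(r"(\d+)", expand=False)
# expand=True (the default) returns a one-column DataFrame instead
as_frame = cells.str.extract(r"(\d+)")

numbers = digits.astype(int)
# np.binary_repr accepts one int at a time; np.vectorize loops it over the array
binary = np.vectorize(np.binary_repr)(numbers, width=8)

print(type(digits).__name__, type(as_frame).__name__)  # Series DataFrame
print(binary)  # ['00000101' '01100011' '00001010']
```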

Because np.vectorize returns a plain NumPy array, the pandas index is lost. I therefore pass the stacked Series' index (a MultiIndex holding each value's row and column position) back to the pd.Series constructor, then unstack it to restore the original DataFrame shape.
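The index round trip can be illustrated on a tiny 2x2 frame (hypothetical data, chosen only to keep the output short). stack produces (row, column) labels, the vectorised call drops them, and re-attaching them before unstack rebuilds the grid.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [99, 10]])

s = df.stack()
print(s.index.tolist())  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# The vectorised call returns a bare ndarray: the (row, column) labels are gone
arr = np.vectorize(np.binary_repr)(s, width=8)

# Re-attach the MultiIndex, then unstack the column level to rebuild the 2-D shape
restored = pd.Series(arr, index=s.index).unstack()
print(restored)
```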

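import io
import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO('''Meme1, Meme2, Meme3, Meme4, Meme5, Meme6
Meme1, Meme2, Meme3, Meme99, Meme5, Meme6
Meme5, Meme2, Meme2, Meme4, Meme10, Meme6
Meme99, Meme3, Meme4, Meme4, Meme5, Meme6'''), header=None)

s = df.stack()  # Series indexed by (row, column)
s = s.str.extract(r'(\d+)', expand=False).astype(int)  # keep only the digits
# np.binary_repr takes scalars, so vectorise it, then restore the index and reshape
result = pd.Series(np.vectorize(np.binary_repr)(s, width=8), index=s.index).unstack()
print(result)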

The final output:

          0         1         2         3         4         5
0  00000001  00000010  00000011  00000100  00000101  00000110
1  00000001  00000010  00000011  01100011  00000101  00000110
2  00000101  00000010  00000010  00000100  00001010  00000110
3  01100011  00000011  00000100  00000100  00000101  00000110
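The code above prints the result, while the question asks for a converted CSV file. One alternative, sketched here under the same one-number-per-cell assumption, skips np.vectorize by formatting column by column with Python's format(n, "08b") and then writes the frame with to_csv. io.StringIO is used in place of real input and output files only to keep the example self-contained; with a real file you would pass a path to to_csv.

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO(
    "Meme1, Meme2, Meme3, Meme4, Meme5, Meme6\n"
    "Meme5, Meme2, Meme2, Meme4, Meme10, Meme6\n"), header=None)

# Column by column: keep the digits, cast to int, format as 8-bit binary.
# Assumption: every cell holds exactly one run of digits.
binary = df.apply(
    lambda col: col.str.extract(r"(\d+)", expand=False)
    .astype(int)
    .map(lambda n: format(n, "08b"))
)

# Write back to CSV without the generated 0..5 header and the row index
out = io.StringIO()
binary.to_csv(out, header=False, index=False)
print(out.getvalue())
```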

Note that the binary conversions in the question are not all accurate. For example, Meme5 is converted to 00000100 when it should be 00000101, and in the second row 01100011 (99) appears in the third column instead of the fourth. The expected output in the question also omits the final row, probably for brevity.
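Individual conversions are easy to check directly with np.binary_repr and int(..., 2). This short sketch lists the encodings for the values in the sample data and decodes the two strings involved in the Meme5 mismatch.

```python
import numpy as np

# Binary encodings of the values that appear in the sample data
encodings = {n: np.binary_repr(n, width=8) for n in (3, 4, 5, 10, 99)}
for n, bits in encodings.items():
    print(n, bits)

# Decoding shows 00000100 is 4, while 5 is 00000101
print(int("00000100", 2), int("00000101", 2))  # 4 5
```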

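Note also that this assumes every cell contains exactly one run of digits. For a hypothetical cell such as foo123bar456 (an example I raised in a comment), str.extract returns only the first match, 123, and silently drops 456. Switching to str.extractall would capture both numbers, but it adds an extra match level to the index, which no longer maps cleanly back to the original row and column positions.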
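The difference can be seen by running both methods on a cell with two numbers. In this sketch (sample values are illustrative), str.extract keeps one match per cell, while str.extractall returns one row per match under an additional index level.

```python
import pandas as pd

s = pd.Series(["Meme5", "foo123bar456"])

# str.extract keeps only the first match per cell, so 456 is silently dropped
first = s.str.extract(r"(\d+)", expand=False)
print(first.tolist())  # ['5', '123']

# str.extractall returns every match, adding a 'match' level to the index
every = s.str.extractall(r"(\d+)")
print(every.index.nlevels)  # 2
print(len(every))           # 3
```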

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading