How to create row wise CSV for vectorized dataframe?

I am extracting keywords from a processed log file and building a vectorized dataframe of their counts. When I write that dataframe to CSV, the words end up as column headers in the first row and their counts in the second row.
I want one word per row instead, with its count in the second column.

trial.py:

import re
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def removeNumbers(list):
   #doing something
   ...

def processFiles(filename):
   #doing something
   ...

def readFile(fileName):
   #doing something
   ...

# Build our text
processFiles("log.txt")
text = readFile("processedFile.txt")


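# Count word occurrences; a single document gives a 1 x n_words matrix
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform([text])

# One row (the document) and one column per word
counts = pd.DataFrame(matrix.toarray(),
                      columns=vectorizer.get_feature_names_out())

counts.to_csv("keywords_count.csv")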

keywords_count.csv looks like this:

,accept,accepted,action,add,address,agent,allocated,api,api_action_sender,api_reader,apihandle,apiinitialize,apiterminate,appl,associate,attempt,available,bd,bdfb,broken,ceased,check_signals,chose,cksm,cl,clcat,client,close,code,complete,conf,configuration,connection,connfd,constructing,control,creating,ctcd,delresp,dereg,deregistering,does,dreg_process,dst,dump,edci,engine,entering,entity,entity_initialize,entries,entry,event,event_establishsessionsend,event_timert_expire,exist,exists,exit,exiting,expect,expired,failed,fc,file,filter,flg,flow,flow_timer_start,flow_timer_stop,forward,gateway,handle,home,hop,if,ifaeddrg_byaddr,ifidx,image,images,index,inf,info,informational,init_policyapi,initialization,initialized,install,interface,ioctl,ip,len,level,lih,link,list,local,locate_configfile,log,loopback,mailbox,mailbox_register,mailslot,mailslot_create,mailslot_send,mailslot_sitter,main,mcast_add,module,msg,necessary,new,node,obj,old,open_socket,operation,os,outgoing,papi_debug,papilogfunc,papiuservalue,path,pathdelta,pathed,pathtear,pipe,policy,process,proterr,proto,qoshandle,qoshd,qosmgr,qosmgr_request,qosmgr_response,query,querying,rapi,raw,rc,read_physical_netif,readbuffer,ready,reason,received,reentering,reg_process,registered,registering,registerwithpolicyapi,registration,remove,req,request,reservation,response,result,resv,resvdelta,resved,resvresp,return,returned,route,router_forward_getoi,rpapi_getpolicydata,rpapi_getspecdata,rpapi_reg_unregflow,rsv,rsvp,rsvp_action_nhop,rsvp_api_open,rsvp_event,rsvp_event_establishsession,rsvp_event_mapsession,rsvp_event_propagate,rsvp_explode_packet,rsvp_flow_statemachine,rsvp_hop,rsvp_parse_objects,rsvpd,rsvpfindactionname,rsvpfindservicedetailsonactname,rsvpgettspec,rsvpputactionname,rsvpremactionname,rthdl,send,sender,sender_withdraw,sending,service,sess,session,sessioned,setsockopt,settcpimage,sigalrm,signal,sigterm,socket,source,specified,src,start,started,state,status,stop,stopped,style,successful,supported,tc,tcp,tcpcs
,term,term_policyapi,terminate,terminated,terminator,timer,tout,tr,trace,traffic,traffic_action_oif,traffic_reader,ttl,type,udp,unregistered,unregisterfrompolicyapi,user,using,vlink,warning,wf,writing
0,1,1,1,1,18,1,28,8,1,6,1,3,2,1,1,2,4,2,1,1,1,1,1,4,1,3,1,1,1,1,1,1,2,1,9,2,22,2,1,1,1,2,3,3,2,5,2,20,7,7,1,7,31,1,6,1,6,1,17,1,6,4,8,1,2,4,4,12,7,2,7,7,1,4,1,2,7,1,1,7,7,147,2,14,1,8,1,18,9,5,4,1,4,2,1,1,1,1,1,24,23,20,27,9,7,3,4,1,2,2,2,1,4,1,2,1,1,1,3,1,1,7,1,2,4,2,2,10,1,3,2,1,2,4,4,6,1,1,4,4,8,12,1,2,12,9,3,1,1,3,2,2,1,4,3,2,6,4,1,20,1,1,1,17,35,11,3,12,4,38,8,1,4,1,7,1,4,26,4,8,2,3,3,3,3,3,1,1,1,1,9,3,3,10,4,4,2,6,8,1,6,12,1,3,4,9,26,2,5,2,4,10,1,2,2,1,1,8,2,2,1,2,6,1,119,2,2,3,4,5,14,1,3,1,1,1,4,4,1

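Solution:

The dataframe has a single row (the one document) and one column per word, so to_csv writes it wide. Transpose it before writing, so that each word becomes a row label and its count goes in the next column: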

counts.T.to_csv("keywords_count.csv")
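
A minimal sketch of the transposed output, assuming a tiny hand-built frame in place of the real CountVectorizer result (three words and their counts taken from the CSV above). The "keyword" and "count" labels are illustrative names, not part of the original code, and the CSV is written to an in-memory buffer so the result can be inspected without touching disk.

```python
import io

import pandas as pd

# Stand-in for the one-row frame built from the vectorizer output (assumption:
# only three of the words from the question are used here).
counts = pd.DataFrame([[1, 18, 28]], columns=["accept", "address", "allocated"])

# Transpose: words become the row index, the single document becomes column 0.
by_row = (
    counts.T
    .rename_axis("keyword")          # header for the first CSV column
    .rename(columns={0: "count"})    # readable name instead of the row label 0
)

# Write to an in-memory buffer; pass a file path instead to write to disk.
buffer = io.StringIO()
by_row.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

This prints a header line `keyword,count` followed by one `word,count` line per keyword. If the rows should be ordered by frequency, calling `by_row.sort_values("count", ascending=False)` before `to_csv` is one option.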