Use pandas to drop values from a CSV

I have a CSV that I want to use to search an API for data, but the column that stores the value used for the API search (ISBN) can contain a second value separated by ";",
like this:

2 Jan Rohls Kunst und Religion zwischen Mittelalter und Barock : von Dante bis Bach -
3 Karl-Markus Ritter    Der Dom zu Speyer : weltliche Macht und christlicher Glaube 9783806241280; 3806241287
4 Hape Kerkeling    Ich bin dann mal weg : meine Reise auf dem Jakobsweg    9783890296005; 3890296009

I want to remove everything from the ";" onward, so that only the 9783XXX value remains in the ISBN column. Ideally the code would fit into the existing process, but I haven't found a way to do that.

Here’s the full extracted csv: https://pastebin.com/frAK9NBG

These are the scripts I use to

a) clean a pre-existing CSV:

import glob
import os

import pandas as pd

# Output folder for the extracted files. The original value "./**/extract/" is a
# glob pattern, not a usable directory name, so a fixed folder is assumed here.
path = "./extract/"
os.makedirs(path, exist_ok=True)

filelist = glob.glob('./**/Reihe A/Reihe*.csv', recursive=True)
print(filelist)
for file in filelist:
    data = pd.read_csv(file, sep="\t", encoding='utf8')
    # keep only the author, title and ISBN columns
    df = data[['Verfasser', 'Titel', 'ISBN']]
    # name the output after the input file, e.g. Reihe1.csv -> Reihe1_extract.csv
    name = os.path.splitext(os.path.basename(file))[0] + '_extract.csv'
    # save inside the loop so that every input file gets its own extract
    df.to_csv(os.path.join(path, name), sep='\t', encoding='utf8')
    print(name + ' saved to ' + path)

and to

b) search the API:

import glob
import pandas as pd
from urllib.request import urlopen

# import the CSVs generated by the other script
filelist = glob.glob('./**/Reihe*_extract.csv', recursive=True)
print(filelist)
for file in filelist:
    # read the CSV and make a list of all ISBNs
    data = pd.read_csv(file, sep="\t", encoding='utf8')
    isbnlist = data['ISBN'].tolist()
    # for each ISBN in the list, build the API query URL
    for isbn in isbnlist:
        url = 'http://sru.k10plus.de/gvk!rec=1?version=1.1&operation=searchRetrieve&query=pica.isb%3D[isbn]&maximumRecords=10&recordSchema=marcxml'
        # the replacement must happen inside the loop, once per ISBN
        url = url.replace('[isbn]', str(isbn))
        print(url)

As of right now, I'm splitting my code into separate scripts based on function (clean CSV, search API, etc.), but I plan to write a launcher script that does everything for the end user.

Any help is appreciated.

Solution:

Try this: split on the separator and keep the part you want:

data['ISBN'] = [x.split(';')[0] for x in data['ISBN']]  # keeps the portion before the first ';'
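To fit this into the existing scripts, the split can be wrapped in a small helper and applied just before the URL is built. The following is only a sketch: the names `first_isbn`, `build_url` and `SRU_TEMPLATE` are hypothetical and not part of the original code, and it assumes that placeholder values such as "-" should pass through unchanged.

```python
from urllib.parse import quote

# Query template taken from the question; "{isbn}" replaces the "[isbn]" marker.
SRU_TEMPLATE = (
    'http://sru.k10plus.de/gvk!rec=1?version=1.1&operation=searchRetrieve'
    '&query=pica.isb%3D{isbn}&maximumRecords=10&recordSchema=marcxml'
)


def first_isbn(value):
    """Return the text before the first ';', with surrounding whitespace removed.

    Hypothetical helper: str() guards against non-string cells such as numbers.
    """
    return str(value).split(';')[0].strip()


def build_url(isbn):
    """Insert a single, URL-quoted ISBN into the query template (hypothetical helper)."""
    return SRU_TEMPLATE.format(isbn=quote(isbn))


if __name__ == '__main__':
    # Sample values copied from the rows shown in the question.
    for raw in ['9783806241280; 3806241287', '9783890296005; 3890296009', '-']:
        print(build_url(first_isbn(raw)))
```

In the cleaning script, the helper could be applied to the whole column with data['ISBN'] = data['ISBN'].map(first_isbn) before the extract is saved, so the search script only ever sees a single ISBN per row. Rows whose ISBN is just "-" would still produce a query URL, so it may be worth skipping them.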