I have multiple txt files like this one:
https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs001672/analyses/phs001672.pha004730.txt
The file is saved as C:\Users\test.txt
How can I remove the leading comment lines (say, the first 20 lines) and save only the table to a new CSV file in Python?
>Solution :
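You can use pandas' read_table with comment="#", which ignores every line that starts with "#", and then write the table out with to_csv. The same call also accepts a local path such as r"C:\Users\test.txt" in place of the URL :
import pandas as pd

# Adjacent string literals inside the parentheses are joined into one URL.
url = ("https://ftp.ncbi.nlm.nih.gov/dbgap/studies/"
       "phs001672/analyses/phs001672.pha004730.txt")
df = pd.read_table(url, comment="#")
df.to_csv("test.csv", index=False)  # "test.csv" is only an example output name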
Output :
print(df)
              ID  Analysis ID       SNP ID  ...  Coded Allele  Sample size  Bin ID
0      506214698         4730    rs1300646  ...             A         8542       6
1      506218329         4730   rs76749734  ...             A          942     158
2      506216207         4730   rs80286553  ...             A        90924      26
...          ...          ...          ...  ...           ...          ...     ...
31662  506245867         4730   rs71334010  ...             A       317118    1422
31663  506245880         4730  rs113480342  ...             A       314121    1422
31664  506245884         4730  rs140069817  ...             T       307546    1422
[31665 rows x 22 columns]
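Since the question mentions several files, a batch version may be useful. The following is only a minimal sketch, and it uses the standard library instead of pandas. The function names txt_to_csv and convert_folder are made up for illustration, and it assumes the files are tab-separated, UTF-8 encoded, and that every comment line starts with "#". Note one difference: comment="#" in read_table also cuts off the rest of any data line that contains "#", whereas this sketch only drops lines that begin with it.

```python
import csv
from pathlib import Path


def txt_to_csv(src: Path, dst: Path, comment: str = "#") -> int:
    """Copy the tab-separated table in src to dst as CSV, dropping comment lines.

    Returns the number of rows written (header row included).
    """
    rows_written = 0
    with src.open(newline="", encoding="utf-8") as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        # Keep only non-blank lines that do not start with the comment prefix.
        kept = (line for line in fin
                if line.strip() and not line.startswith(comment))
        for row in csv.reader(kept, delimiter="\t"):
            writer.writerow(row)
            rows_written += 1
    return rows_written


def convert_folder(folder: Path, pattern: str = "*.txt") -> list:
    """Convert every matching file; each CSV is written next to its source."""
    outputs = []
    for src in sorted(folder.glob(pattern)):
        dst = src.with_suffix(".csv")
        txt_to_csv(src, dst)
        outputs.append(dst)
    return outputs
```

For example, convert_folder(Path(r"C:\Users")) would turn test.txt into test.csv in the same folder; the folder here is taken from the question and should be adjusted to wherever the files actually live.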