How to split an HMM database (Pfam-A.hmm) into individual files?

I have downloaded the Pfam database, but to proceed with my work I need to split it into individual files, one per HMM. I first tried the hmmfetch command:

Usage: hmmfetch [options] -f <hmmfile> <keyfile>  (retrieves all HMMs in <keyfile>)

With this procedure I can retrieve some HMMs, but I have to list each name in the keyfile. That is not convenient, because I need to retrieve every HMM present in the original file.
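
The keyfile does not have to be written by hand. As a rough sketch, assuming every profile has exactly one NAME line, the names can be pulled out with awk and then passed to hmmfetch one at a time. The function names list_hmm_names and fetch_each are hypothetical helpers introduced here for illustration, and fetch_each assumes hmmfetch is on the PATH.

```shell
#!/bin/bash

# list_hmm_names FILE: print the NAME of every profile in FILE, one per line.
list_hmm_names() {
    awk '$1 == "NAME" { print $2 }' "$1"
}

# fetch_each FILE KEYFILE: hypothetical helper that runs hmmfetch once per
# name in KEYFILE and writes each profile to NAME.hmm in the current directory.
# Assumes hmmfetch is installed and that names are safe to use as file names.
fetch_each() {
    local hmmfile=$1 keyfile=$2 name
    while read -r name; do
        hmmfetch "$hmmfile" "$name" > "$name.hmm"
    done < "$keyfile"
}
```

Possible usage: list_hmm_names Pfam-A.hmm > keyfile.txt, followed by fetch_each Pfam-A.hmm keyfile.txt. Building an index first with hmmfetch --index Pfam-A.hmm is likely to make the per-name lookups faster, although one hmmfetch call per profile will probably still be slower than splitting the file in a single pass.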

Next, I tried splitting the original file into individual ones with the following command:

csplit --digits=2  --quiet --prefix=hmm Pfam-A.hmm "////+1" "{*}"

This split the file into individual ones without problems. The only thing I could not figure out is how to give each file the name of its HMM. Each HMM file looks like this:

HMMER3/f [3.1b2 | February 2015]
NAME  120_Rick_ant
ACC   PF12574.11
DESC  120 KDa Rickettsia surface antigen
LENG  238
ALPH  amino
RF    no
MM    no
CONS  yes
CS    no
MAP   yes
DATE  Tue Oct 12 02:07:11 2021
NSEQ  2
EFFN  0.449219
CKSUM 3984216663
GA    25 25;
TC    39.8 39.6;
NC    23.6 21.2;
BM    hmmbuild HMM.ann SEED.ann
SM    hmmsearch -Z 61295632 -E 1000 --cpu 4 HMM pfamseq
STATS LOCAL MSV      -10.8956  0.70336
STATS LOCAL VITERBI  -11.6161  0.70336
STATS LOCAL FORWARD   -5.3029  0.70336
HMM          A        C        D        E        F        G        H        I        K        L        M        N        P        Q        R        S        T        V        W        Y   
            m->m     m->i     m->d     i->m     i->i     d->m     d->d
  COMPO   2.48852  4.43316  2.82069  2.56851  3.39369  2.73712  3.79297  2.89060  2.54228  2.53662  3.76796  3.01951  3.39446  3.08353  3.05948  2.67787  2.83658  2.66102  4.89473  3.44979
          2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503
          0.03268  3.83303  4.55537  0.61958  0.77255  0.00000        *
      1   3.11165  4.58599  4.12585  3.76620  3.12182  3.93147  4.43434  2.32453  3.53431  0.92536  3.15834  4.04543  4.37407  3.91210  3.71656  3.49871  3.40796  2.35149  4.98612  3.70011      1 l - - -
          2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503
          0.03268  3.83303  4.55537  0.61958  0.77255  0.48576  0.95510
      2   1.07216  4.17353  3.42348  3.21371  4.01396  2.99897  4.24029  3.13365  3.22896  3.01700  4.05375  3.37300  3.73453  3.57391  3.48180  2.52446  2.79912  2.79493  5.44509  4.24110      2 a - - -
          2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503
          0.03268  3.83303  4.55537  0.61958  0.77255  0.48576  0.95510
      3   2.91965  5.02079  2.47306  1.08285  4.36227  3.24954  3.83381  3.80837  2.70946  3.43216  4.40865  2.91254  3.85246  3.05076  3.11366  2.90651  3.22382  3.49656  5.54134  4.26436      3 e - - -
...
//

With my approach this file is called "hmm01", but I would like it to be named "120_Rick_ant.hmm". Does anyone know of something that could do the trick? Thanks in advance!

Solution:

A basic solution using GNU/BSD awk:

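#!/bin/bash

# awk prints "<NAME> <csplit file>" for each piece, then skips to the next file.
# The loop only echoes the mv commands (a dry run); drop "echo" to really rename.
while read -r id filename
do
    echo mv "$filename" "$id".hmm
done < <(awk '$1 == "NAME" {print $2,FILENAME; nextfile}' hmm*)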
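
As an alternative sketch, the split and the naming can be done in one awk pass over the original file, without csplit or a rename step. This assumes each record ends with a line containing only "//", that each record has a NAME line, and that every name is usable as a file name. The function name split_hmm and its optional output directory argument are hypothetical, introduced here for illustration.

```shell
#!/bin/bash

# split_hmm INPUT [OUTDIR]: write each profile in INPUT to OUTDIR/<NAME>.hmm.
# OUTDIR defaults to the current directory and is created if missing.
split_hmm() {
    local input=$1 outdir=${2:-.}
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        # Collect every line of the current record in a buffer.
        { buf = buf $0 "\n" }

        # Remember the profile name from its NAME line.
        $1 == "NAME" { name = $2 }

        # A line with only "//" ends the record: write it out and reset.
        $0 == "//" {
            if (name != "") {
                out = outdir "/" name ".hmm"
                printf "%s", buf > out
                close(out)    # avoid running out of open file handles
            }
            buf = ""
            name = ""
        }
    ' "$input"
}
```

Possible usage: split_hmm Pfam-A.hmm pfam_split, which would write files such as pfam_split/120_Rick_ant.hmm. Closing each output file right after writing it matters here, because a database with thousands of profiles could otherwise exceed the limit on open files.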