Run short Python code directly in Snakemake

I have a Snakemake pipeline with a small data-processing step: applying a rolling average to a DataFrame.
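For reference, the processing step is a centered rolling mean. A minimal pandas sketch on made-up data (a window of 3 instead of 83, so the behaviour at the edges is visible; the column name "signal" is purely illustrative):

```python
import pandas as pd

# Toy data; the real pipeline uses window=83, a window of 3 keeps the output readable.
df = pd.DataFrame({"signal": [1.0, 2.0, 3.0, 4.0, 5.0]})

# center=True places each row in the middle of its window; min_periods=1 lets the
# first and last rows be averaged over the partial window instead of becoming NaN.
smoothed = df.rolling(window=3, center=True, min_periods=1).mean()
print(smoothed["signal"].tolist())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```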

I would like to write something like this:

rule average_df:
    input:
        # script = ,
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python
        import pandas as pd
        df=pd.read_csv("{input.df_raw}")
        df=df.rolling(window={params.window}, center=True, min_periods=1).mean()
        df.to_csv("{output.df_avg}")
        """

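However, it does not work: the shell directive hands the block to bash, so the lines after python are executed as shell commands, not as Python.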

Do I have to create a Python file just for those four lines of code? The only alternative I can think of is rather cumbersome, a standalone script:

average_df.py

import argparse

import pandas as pd


def average_df(i_path, o_path, window):
    """Apply a centered rolling mean to the CSV at i_path and write the result to o_path."""
    df = pd.read_csv(i_path)
    df = df.rolling(window=window, center=True, min_periods=1).mean()
    df.to_csv(o_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Apply a centered rolling mean to a CSV file")
    parser.add_argument("-i_path", "--input_path", help="input csv file", required=True)
    parser.add_argument("-o_path", "--output_path", help="output csv file", required=True)
    # type=int: argparse returns strings by default, but rolling() needs an integer window
    parser.add_argument("-w", "--window", type=int, help="window for averaging", required=True)

    args = parser.parse_args()
    average_df(args.input_path, args.output_path, args.window)


And then call it from the Snakemake rule like this:

rule average_df:
    input:
        script = "average_df.py",
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    shell:
        """
        python {input.script} --input_path {input.df_raw} --output_path {output.df_avg} --window {params.window}
        """
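One detail of the script approach: argparse returns option values as strings unless a type is given, and rolling() expects an integer row count here, so the window option needs type=int. A minimal check using the option names from average_df.py and a hypothetical sample name s1 (parse_args is given an explicit list instead of reading sys.argv):

```python
import argparse

# Same option names as average_df.py, with type=int on the window option.
parser = argparse.ArgumentParser(description="Apply a centered rolling mean to a CSV file")
parser.add_argument("-i_path", "--input_path", help="input csv file", required=True)
parser.add_argument("-o_path", "--output_path", help="output csv file", required=True)
parser.add_argument("-w", "--window", type=int, help="window for averaging", required=True)

# Simulate the command line the rule would build for the hypothetical sample "s1".
args = parser.parse_args(
    ["--input_path", "s1_raw.csv", "--output_path", "s1_avg.csv", "--window", "83"]
)
print(args.window, type(args.window).__name__)  # 83 int
```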

Is there a simpler or more efficient way to do this? Any input is appreciated.

Solution:

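This can be done with the run directive, which executes its body as Python code inside the Snakemake process instead of passing it to the shell: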

rule average_df:
    input:
        df_raw = "{sample}_raw.csv"
    params:
        window = 83
    output:
        df_avg = "{sample}_avg.csv"
    run:
        import pandas as pd
        df=pd.read_csv(input.df_raw)
        df=df.rolling(window=params.window, center=True, min_periods=1).mean()
        df.to_csv(output.df_avg)

Note that inside a run block the rule's input, output, params, wildcards, etc. are available directly as Python objects, so no {...} placeholders are needed.
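If the inline code grows, Snakemake's script directive is a middle ground that avoids the argparse boilerplate: the script reads its paths and parameters from a snakemake object that Snakemake injects when it runs the script. A sketch, assuming a hypothetical file scripts/average_df.py; the guard at the bottom is only there so the function can also be imported and tested outside Snakemake:

```python
# scripts/average_df.py (hypothetical path), referenced from a rule such as:
#
#   rule average_df:
#       input:
#           df_raw = "{sample}_raw.csv"
#       params:
#           window = 83
#       output:
#           df_avg = "{sample}_avg.csv"
#       script:
#           "scripts/average_df.py"
import pandas as pd


def average_df(smk):
    """Centered rolling mean; paths and window come from a Snakemake-like object."""
    df = pd.read_csv(smk.input.df_raw)
    df = df.rolling(window=smk.params.window, center=True, min_periods=1).mean()
    df.to_csv(smk.output.df_avg)


# Snakemake defines a global "snakemake" object for scripts run via the script
# directive; outside Snakemake this guard skips the call.
if "snakemake" in globals():
    average_df(snakemake)  # noqa: F821
```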
