scgpm_clean_raw_data.py

This script calls the DNAnexus app I built called SCGPM Clean Raw Data at https://platform.dnanexus.com/app/scgpm_clean_raw_data. The app removes unwanted files (which drive up storage costs) from the raw_data folder of a DNAnexus project containing sequencing results from the SCGPM sequencing workflow. Most of the files in the raw_data folder are removed. In addition, the lane tarball is removed; the XML files RunInfo.xml and runParameters.xml are extracted from Interop.tar, after which that tarball is removed; finally, metadata.tar is removed. The extracted XML files are uploaded back to the raw_data folder.
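The XML extraction step can be pictured with Python's standard tarfile module. This is a minimal sketch, not the app's actual code: it assumes the two XML files may sit under any directory inside Interop.tar and matches them by base name, and the function name extract_run_xml is hypothetical.

```python
import os
import tarfile

# Base names of the files kept from Interop.tar, per the description above.
WANTED = {"RunInfo.xml", "runParameters.xml"}


def extract_run_xml(tar_path, dest_dir):
    """Extract RunInfo.xml and runParameters.xml from tar_path into dest_dir.

    Hypothetical helper: members are matched by base name, so their directory
    inside the tarball does not matter. Contents are written directly to
    dest_dir (never to the member's stored path), which also avoids
    path-traversal issues. Returns the list of written file paths.
    """
    written = []
    with tarfile.open(tar_path) as tar:  # mode "r" also handles compressed tarballs
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if member.isfile() and name in WANTED:
                src = tar.extractfile(member)
                out_path = os.path.join(dest_dir, name)
                with open(out_path, "wb") as out:
                    out.write(src.read())
                written.append(out_path)
    return written
```

Every other member of the tarball is simply skipped, which mirrors the idea that only these two files survive once the tarball itself is deleted.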

Queries DNAnexus for all projects that are billed to the specified org and were created within the last --days-ago days.
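Such a query could be expressed with the dxpy search bindings. The sketch below assumes dxpy.find_projects with its billed_to, created_after and describe keyword arguments; the helper names are hypothetical, and the cutoff is passed as milliseconds since the epoch, the unit DNAnexus uses for timestamps. It is not the script's own query code.

```python
import time

MS_PER_DAY = 24 * 60 * 60 * 1000


def created_after_ms(days_ago, now_ms=None):
    """Return the epoch-millisecond cutoff for projects created in the last days_ago days."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - days_ago * MS_PER_DAY


def find_recent_projects(org, days_ago=30):
    """Yield (project ID, project name) for projects billed to org and created after the cutoff."""
    import dxpy  # imported lazily so the cutoff helper works without dxpy installed

    results = dxpy.find_projects(
        billed_to=org,
        created_after=created_after_ms(days_ago),
        describe=True,  # include each project's describe hash (with its name)
    )
    for result in results:
        yield result["id"], result["describe"]["name"]
```

Keeping the cutoff arithmetic in its own function makes it easy to check without network access; only find_recent_projects needs credentials.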

You must have the environment variable DX_SECURITY_CONTEXT set (described at http://autodoc.dnanexus.com/bindings/python/current/dxpy.html?highlight=token) in order to authenticate with DNAnexus.
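The variable holds a JSON object containing the API token, for example {"auth_token_type": "Bearer", "auth_token": "<token>"}. A pre-flight check that fails fast with a readable message might look like the following; check_security_context is a hypothetical helper, not part of the script.

```python
import json
import os


def check_security_context(environ=None):
    """Return the parsed DX_SECURITY_CONTEXT, or raise a readable error.

    Hypothetical pre-flight check; dxpy itself reads this variable when it
    configures authentication, so this only surfaces problems earlier.
    """
    env = os.environ if environ is None else environ
    raw = env.get("DX_SECURITY_CONTEXT")
    if not raw:
        raise RuntimeError("DX_SECURITY_CONTEXT is not set; cannot authenticate with DNAnexus.")
    try:
        context = json.loads(raw)
    except ValueError as err:  # json.JSONDecodeError subclasses ValueError
        raise RuntimeError("DX_SECURITY_CONTEXT is not valid JSON: %s" % err) from err
    if not isinstance(context, dict) or not context.get("auth_token"):
        raise RuntimeError("DX_SECURITY_CONTEXT must contain an 'auth_token' field.")
    return context
```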

usage: scgpm_clean_raw_data.py [-h] [-d DAYS_AGO] -o ORG

Named Arguments

-d, --days-ago
The number of days ago to query for new projects that are billed to the org specified by --org.

Default: 30

-o, --org
Limits the project search to only those that belong to the specified DNAnexus org. Should begin with 'org-'.
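The usage line and argument descriptions correspond roughly to the argparse setup below. It is a reconstruction rather than the script's source: the 'org-' prefix check (the text only says the value should begin with 'org-') and the helper names org_id and build_parser are assumptions.

```python
import argparse


def org_id(value):
    """argparse type: accept only DNAnexus org IDs, which begin with 'org-'."""
    if not value.startswith("org-"):
        raise argparse.ArgumentTypeError("org must begin with 'org-', got %r" % value)
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog="scgpm_clean_raw_data.py")
    parser.add_argument(
        "-d", "--days-ago", type=int, default=30,
        help="The number of days ago to query for new projects that are billed "
             "to the org specified by --org.")
    parser.add_argument(
        "-o", "--org", required=True, type=org_id,  # required: no brackets in the usage line
        help="Limits the project search to only those that belong to the "
             "specified DNAnexus org. Should begin with 'org-'.")
    return parser
```

With this parser, scgpm_clean_raw_data.py -o org-<name> -d 7 yields days_ago=7, while omitting -d falls back to the default of 30.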