add_r1-r2_fastq_paths.py

Retrieves the FASTQ file names in DNAnexus for the specified sequenced libraries. The tab-delimited input file may be provided in one of two formats.

Format 1:
  1. DNAnexus project name
  2. barcode
Format 2:
  1. uhts run name
  2. lane
  3. barcode

The format is determined by the number of fields in the header line: two fields for Format 1, three for Format 2. The header line must be the very first line of the input file and must begin with a ‘#’.
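The header-based format detection described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `detect_format` and the error messages are assumptions, and the header field names in the usage note are made up.

```python
def detect_format(header_line):
    """Return 1 or 2 based on the number of tab-delimited header fields.

    Hypothetical helper: the real script may detect the format differently.
    """
    if not header_line.startswith("#"):
        raise ValueError("first line must be a header line starting with '#'")
    # Drop the line ending and the leading '#', then split on tabs.
    fields = header_line.rstrip("\r\n").lstrip("#").split("\t")
    if len(fields) == 2:
        return 1  # DNAnexus project name, barcode
    if len(fields) == 3:
        return 2  # uhts run name, lane, barcode
    raise ValueError(f"expected 2 or 3 header fields, found {len(fields)}")
```

Under these assumptions, `detect_format("#project\tbarcode\n")` selects Format 1 and `detect_format("#run\tlane\tbarcode\n")` selects Format 2; any other field count is rejected.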

The output file is identical to the input file, except that two new columns are prepended to each line: the FASTQ file name on the DNAnexus platform and the read number. Thus, the output columns are:

  1. FASTQ file name
  2. Read number (1 for forward reads, 2 for reverse reads)

followed by the input file columns. At present, any of three warnings may be written to stdout. A warning is triggered whenever:

  • No DNAnexus project was found matching the provided criteria.
  • A DNAnexus project was found, but it contained no FASTQ files with the specified barcodes.
  • A DNAnexus project was found, but only the forward-read or the reverse-read FASTQ file was found, not both.

The last warning reflects the script's assumption that all reads are paired-end, which is indeed the case.
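As a rough sketch of how the output rows and the FASTQ-related warnings fit together, the example below prepends the two new columns to one input record and reports a warning when no FASTQ file, or only one read of the pair, is available. The function name `build_output_rows`, the `fastq_by_read` mapping, the warning wording, and the file names in the usage note are all hypothetical. The DNAnexus lookup itself, and with it the project-not-found warning, is not modelled.

```python
def build_output_rows(input_fields, fastq_by_read):
    """Return (rows, warnings) for one input record.

    input_fields  -- the columns of one input line, in order
    fastq_by_read -- assumed mapping of read number (1 or 2) to FASTQ file
                     name, standing in for the result of a DNAnexus lookup
    """
    label = "\t".join(input_fields)
    warnings = []
    if not fastq_by_read:
        warnings.append(f"Warning: no FASTQ files found for: {label}")
        return [], warnings
    # Paired-end data is assumed, so both read 1 and read 2 are expected.
    missing = sorted({1, 2} - set(fastq_by_read))
    if missing:
        warnings.append(
            f"Warning: missing FASTQ file for read {missing[0]} for: {label}"
        )
    # Output columns: FASTQ file name, read number, then the input columns.
    rows = [
        [fastq_by_read[read], str(read)] + list(input_fields)
        for read in sorted(fastq_by_read)
    ]
    return rows, warnings
```

For instance, calling it with `["my_project", "ATCACG"]` and `{1: "lib_R1.fastq.gz", 2: "lib_R2.fastq.gz"}` yields two rows and no warnings, while passing only the read 1 entry yields one row and one warning. A caller would then print each warning to stdout.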

usage: add_r1-r2_fastq_paths.py [-h] -i INFILE -o OUTFILE
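
A usage line of this shape is what Python's `argparse` produces for two required options. The sketch below shows one way such an interface could be declared; the function name `build_parser` and the help strings are placeholders, not the script's actual source.

```python
import argparse


def build_parser():
    """Build a parser with two required options, mirroring the usage line."""
    parser = argparse.ArgumentParser(prog="add_r1-r2_fastq_paths.py")
    # Both options are required, so argparse lists them without brackets.
    parser.add_argument("-i", "--infile", required=True,
                        help="Tab-delimited input file (placeholder help text).")
    parser.add_argument("-o", "--outfile", required=True,
                        help="Output file (placeholder help text).")
    return parser
```

With this sketch, `build_parser().parse_args(["-i", "in.txt", "-o", "out.txt"])` returns a namespace with `infile` and `outfile` set, and omitting either option makes argparse exit with an error.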

Named Arguments

-i, --infile
Tab-delimited input file in one of two formats. In each format, the first line must be a field-header line starting with ‘#’. Empty lines and lines beginning with ‘#’ are ignored. The first format contains two columns: the DNAnexus project name and the barcode. The second format contains three columns: uhts_run name, lane, and barcode. The number of fields in the header line determines the format: two fields for the first format, three for the second.
-o, --outfile