dnanexus_utils

exception scgpm_seqresults_dnanexus.dnanexus_utils.DxMissingAlignmentSummaryMetrics[source]

Bases: Exception

Raised by DxSeqResults.get_alignment_summary_metrics() when it can’t locate a Picard alignment summary metrics file for a given barcoded sample of FASTQ sequencing results.

exception scgpm_seqresults_dnanexus.dnanexus_utils.DxMissingLibraryNameProperty[source]

Bases: Exception

Raised when creating a new DxSeqResults instance for a DNAnexus project that doesn’t have the library_name project property present.

exception scgpm_seqresults_dnanexus.dnanexus_utils.DxProjectMissingQueueProperty[source]

Bases: Exception

exception scgpm_seqresults_dnanexus.dnanexus_utils.DxMultipleProjectsWithSameLibraryName[source]

Bases: Exception

exception scgpm_seqresults_dnanexus.dnanexus_utils.FastqNotFound[source]

Bases: Exception

exception scgpm_seqresults_dnanexus.dnanexus_utils.DnanexusBarcodeNotFound[source]

Bases: Exception

scgpm_seqresults_dnanexus.dnanexus_utils.select_newest_project(dx_project_ids)[source]

Given a list of DNAnexus project IDs, returns the one that is newest as determined by creation date.

Parameters:dx_project_ids – list of DNAnexus project IDs.
Returns:str.
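The selection can be sketched as follows. This is an illustration under stated assumptions, not the library's actual implementation: it assumes each project description carries a `created` timestamp, and the `pick_newest` function and the `describe` callable are hypothetical stand-ins so that the example runs without network access.

```python
from typing import Callable, Dict, Iterable


def pick_newest(project_ids: Iterable[str],
                describe: Callable[[str], Dict]) -> str:
    """Return the project ID whose description has the largest 'created' value."""
    ids = list(project_ids)
    if not ids:
        raise ValueError("project_ids must not be empty")
    return max(ids, key=lambda pid: describe(pid)["created"])


# Hypothetical descriptions standing in for real DNAnexus lookups.
FAKE_DESCRIPTIONS = {
    "project-AAAA": {"created": 1500000000000},
    "project-BBBB": {"created": 1600000000000},
}

newest = pick_newest(FAKE_DESCRIPTIONS, FAKE_DESCRIPTIONS.__getitem__)
print(newest)
```

With dxpy installed, something like `lambda pid: dxpy.DXProject(pid).describe()` could plausibly serve as the `describe` callable.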
scgpm_seqresults_dnanexus.dnanexus_utils.accept_project_transfers(access_level, queue, org, share_with_org=None)[source]
Parameters:
  • access_level – str. Permissions level the new member should have on transferred projects. Should be one of ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]. See https://wiki.dnanexus.com/API-Specification-v1.0.0/Project-Permissions-and-Sharing for more details on access levels.
  • queue – str. The value of the queue property on a DNAnexus project. Only projects pending transfer that have this value for the queue property will be transferred to the specified org.
  • org – str. The name of the DNAnexus org under which to accept the project transfers for projects that have their queue property set to the value of the ‘queue’ argument.
  • share_with_org – str. Set this argument if you’d like to share the transferred projects with the org so that all users of the org will have access to the project. The value you supply should be the access level that members of the org will have.
Returns:

The projects that were transferred to the specified billing account. Keys are the project IDs, and values are the project names.

Return type:

dict

scgpm_seqresults_dnanexus.dnanexus_utils.find_org_projects_by_name_glob(org, glob)[source]
Parameters:glob – str.
Example:
Find the project(s) with SREQ-163 at the end of the project’s name:
find_org_projects_by_name_glob(org="org-someorg", glob="*SREQ-163")
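To show how a trailing-anchored glob such as `*SREQ-163` behaves, here is a small local sketch using Python's standard `fnmatch` module. It only illustrates glob semantics; the real function queries DNAnexus, whose matching may differ in detail. The `match_names` helper and the project names are hypothetical.

```python
from fnmatch import fnmatchcase


def match_names(names, glob):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, glob)]


# Made-up project names for illustration only.
names = [
    "170101_LANE1_SREQ-163",
    "170102_LANE2_SREQ-1630",
    "SREQ-163_rerun",
]

matches = match_names(names, "*SREQ-163")
print(matches)
```

Only the first name matches, because the pattern has no trailing wildcard and so requires the name to end in SREQ-163.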
scgpm_seqresults_dnanexus.dnanexus_utils.share_with_org(project_ids, org, access_level, suppress_email_notification=False)[source]

Shares one or more DNAnexus projects with an organization. DNAnexus appears to require that the user sharing a project with the org first have ADMINISTER access on that project.

Parameters:
  • project_ids – list. One or more DNAnexus project identifiers, where each project ID is in the form "project-FXq6B809p5jKzp2vJkjkKvg3".
  • org – str. The name of the DNAnexus org with which to share the projects.
  • access_level – str. The permission level to give to members of the org; one of ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"].
  • suppress_email_notification – bool. True means to prevent the DNAnexus platform from sending an email notification for each shared project.
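The access levels listed above can be validated before any call is made. The sketch below is hypothetical (neither helper is part of this module) and assumes the four levels are ordered from least to most permissive as written.

```python
# Access levels named in this documentation, least to most permissive.
ACCESS_LEVELS = ("VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER")


def check_access_level(access_level):
    """Return access_level unchanged if valid; raise ValueError otherwise."""
    if access_level not in ACCESS_LEVELS:
        raise ValueError(
            "access_level must be one of {}, got {!r}".format(ACCESS_LEVELS, access_level)
        )
    return access_level


def has_at_least(user_level, required_level):
    """True if user_level is at least as permissive as required_level."""
    user_rank = ACCESS_LEVELS.index(check_access_level(user_level))
    required_rank = ACCESS_LEVELS.index(check_access_level(required_level))
    return user_rank >= required_rank


# A CONTRIBUTE user would not meet the ADMINISTER requirement noted above.
can_share = has_at_least("CONTRIBUTE", "ADMINISTER")
print(can_share)
```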
class scgpm_seqresults_dnanexus.dnanexus_utils.DxSeqResults(dx_project_id=False, dx_project_name=False, uhts_run_name=False, sequencing_lane=False, library_name=False, billing_account_id=None, latest_project=False)[source]

Bases: object

Finds the DNAnexus sequencing results project that was uploaded by GSSC. The project can be precisely retrieved if the project ID is specified (via the dx_project_id argument). Otherwise, you can supply the dx_project_name argument if you know the name, or use the library_name argument if you know the name of the library that was submitted to GSSC. All sequencing result projects uploaded to DNAnexus by GSSC contain a property named ‘library_name’, and projects will be searched on this property for a matching library name when the library_name argument is specified. If both the library_name and the dx_project_name arguments are specified, only the latter is used in finding a project match. The billing_account_id argument can optionally be specified to restrict all project searches to only those that are billed to that particular billing account (unless dx_project_id is specified, in which case the DNAnexus project is directly retrieved).

Parameters:
  • dx_project_id – str. The ID of the DNAnexus project. If specified, no search will be performed, as the project will be directly retrieved.
  • dx_project_name – str. Name of a DNAnexus project containing sequencing results that were uploaded by GSSC.
  • uhts_run_name – str. Name of the sequencing run in UHTS. This is added as a property to all projects in DNAnexus through the ‘seq_run_name’ property.
  • sequencing_lane – int. Lane number of the flowcell on which the library was sequenced. This is in a property named seq_lane_index on all GSSC projects in DNAnexus.
  • library_name – str. Library name of the sample that was sequenced. This is the name of the library that was submitted to GSSC for sequencing, and is added as a property to all GSSC DNAnexus projects via the ‘library_name’ property.
  • billing_account_id – str. Name of the DNAnexus billing account that the project belongs to. This will only be used to restrict the search of projects that the user can see to only those billed by the specified account.
  • latest_project – bool. True indicates that if multiple projects are found given the search criteria, the most recently created project will be returned.
FQEXT = '.fastq.gz'

The extension used for FASTQ files.

get_run_details_json()[source]

Retrieves the JSON object for the stats in the file named run_details.json in the project specified by self.dx_project_id.

Returns:JSON object of the run details.
get_alignment_summary_metrics(barcode)[source]

Parses the metrics in a ${barcode}alignment_summary_metrics file in the DNAnexus project (usually in the qc folder). This file contains metrics produced by the CollectAlignmentSummaryMetrics program in Picard Tools.

get_barcode_stats(barcode)[source]

Loads the JSON in a ${barcode}_stats.json file in the DNAnexus project (usually in the qc folder).

get_sample_stats_json(barcode=None)[source]

Deprecated since version 0.1.0: GSSC has removed the sample_stats.json file since the entire folder it was in has been removed. Use get_barcode_stats() instead.

Retrieves the JSON object for the stats in the file named sample_stats.json in the project specified by self.dx_project_id. This file is located in the DNAnexus folder staged_qc_report.

Parameters:barcode – str. The barcode for the sample. Currently, the sample_stats.json file is of the following form when there isn’t a genome mapping:

[{"Sample name": "AGTTCC"}, {"Sample name": "CAGATC"}, {"Sample name": "GCCAAT"}, …]

When there is a mapping, each dictionary has many more keys in addition to the "Sample name" one.

Returns:list of dicts if barcode=None, otherwise a dict for the given barcode.
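The return behavior described above can be sketched on the sample_stats.json structure shown earlier. The `select_sample_stats` helper is hypothetical, and raising KeyError for an unknown barcode is an assumption; the source does not say what the real method does in that case.

```python
def select_sample_stats(sample_stats, barcode=None):
    """Return all entries when barcode is None, else the entry for that barcode."""
    if barcode is None:
        return sample_stats
    for entry in sample_stats:
        if entry.get("Sample name") == barcode:
            return entry
    # Assumption: the real method's behavior for a missing barcode is unknown.
    raise KeyError(barcode)


# Structure taken from the unmapped example in this documentation.
sample_stats = [
    {"Sample name": "AGTTCC"},
    {"Sample name": "CAGATC"},
    {"Sample name": "GCCAAT"},
]

all_entries = select_sample_stats(sample_stats)
one_entry = select_sample_stats(sample_stats, barcode="CAGATC")
print(one_entry)
```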
download_metadata_tar(download_dir)[source]

Downloads the ${run_name}.metadata.tar file from the DNAnexus sequencing results project.

Parameters:download_dir – str. The local directory path to download the metadata tarball to.
Returns:The filepath to the downloaded metadata tarball.
Return type:str
download_run_details_json(download_dir)[source]

Downloads the run_details.json and the barcodes.json from the DNAnexus sequencing results project.

Parameters:download_dir – str. The local directory path to download the run_details.json file to.
Returns:str. The filepath to the downloaded run_details.json file.
download_barcodes_json(download_dir)[source]

Downloads the run_details.json and the barcodes.json from the DNAnexus sequencing results project.

Parameters:download_dir – str. The local directory path to download the barcodes.json file to.
Returns:str. The filepath to the downloaded barcodes.json file.
download_samplesheet(download_dir)[source]

Downloads the SampleSheet used in demultiplexing from the DNAnexus sequencing results project.

Parameters:download_dir – str. The local directory path to download the SampleSheet to.
Returns:str. The filepath to the downloaded SampleSheet.
download_qc_report(download_dir)[source]

Downloads the QC report from the DNAnexus sequencing results project.

Parameters:download_dir – str. The local directory path to download the QC report to.
Returns:str. The filepath to the downloaded QC report.
download_fastqc_reports(download_dir)[source]

Downloads the FASTQC reports from the DNAnexus sequencing results project.

Parameters:download_dir – str. The local directory path to download the FASTQC reports to.
Returns:str. The filepath to the downloaded FASTQC reports folder.
download_fastqs(dest_dir, barcode, overwrite=False)[source]

Downloads all FASTQ files in the project that match the specified barcode. If a barcode isn’t given, all FASTQ files are downloaded, since the experiment is then assumed not to be multiplexed. Files are downloaded to the directory specified by dest_dir.

Parameters:
  • barcode – str. The barcode sequence used.
  • dest_dir – str. The local directory in which the FASTQs will be downloaded.
  • overwrite – bool. If True, then if the file to download already exists in dest_dir, the file will be downloaded again, overwriting it. If False, the file will not be downloaded again from DNAnexus.
Returns:

The key is the barcode, and the value is a dict with integer keys of 1 for the forward reads file, and 2 for the reverse reads file. If not paired-end,

Return type:

dict

Raises:

Exception – A barcode is specified and the number of FASTQ files found is not exactly 2.
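A short sketch of consuming a return value of the shape described above. The dict contents and file paths are made up for illustration, the `read_pairs` helper is hypothetical, and treating a missing key 2 as single-end is an assumption, since the source does not fully describe the single-end case.

```python
# Hypothetical return value; the file paths are invented for illustration.
downloaded = {
    "AGTTCC": {
        1: "/tmp/fastqs/AGTTCC_R1.fastq.gz",
        2: "/tmp/fastqs/AGTTCC_R2.fastq.gz",
    }
}


def read_pairs(result):
    """Yield (barcode, forward, reverse); reverse is None when key 2 is absent."""
    for barcode, reads in result.items():
        # Key 1 holds the forward reads file, key 2 the reverse reads file.
        yield barcode, reads[1], reads.get(2)


pairs = list(read_pairs(downloaded))
for barcode, forward, reverse in pairs:
    print(barcode, forward, reverse)
```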

get_fastq_dxfile_objects(barcode=None)[source]

Retrieves all the FASTQ files in project self.dx_project_name as DXFile objects.

Parameters:barcode – str. If set, then only FASTQ files having the specified barcode are returned.
Returns:list of DXFile objects representing FASTQ files.
Raises:dnanexus_utils.FastqNotFound – No FASTQ files were found.
revcomp_barcode_in_fastqfile_prop(i7=False, i5=False)[source]

Use this method if you need to update the barcode sequence stored as the value of the barcode property of a FASTQ file on DNAnexus.

Parameters:
  • i7 – bool. True means to reverse complement the i7 barcode.
  • i5 – bool. True means to reverse complement the i5 barcode.
revcomp(seq)[source]

Returns the reverse complement of a DNA sequence.

Parameters:seq – str.
Returns:str.
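Reverse complementing can be sketched as below. This is one common way to do it and not necessarily how this method is implemented; the `reverse_complement` name and the handling of N and lowercase bases are assumptions.

```python
# Map each base to its complement; N and lowercase handling are assumptions.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the resulting string."""
    return seq.translate(_COMPLEMENT)[::-1]


barcode = "AGTTCC"
flipped = reverse_complement(barcode)
print(flipped)
```

Applying the function twice returns the original sequence, which makes a quick sanity check when updating barcode properties.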
get_fastq_files_props(barcode=None)[source]

Returns the DNAnexus file properties for all FASTQ files in the project that match the specified barcode, or all FASTQ files if no barcode is specified.

Parameters:barcode – str. If set, then only FASTQ file properties for FASTQ files having the specified barcode are returned.
Returns:dict. Keys are the FASTQ file DXFile objects; values are the dict of associated properties on DNAnexus on the file. In addition to the properties on the file in DNAnexus, an additional property is added here called ‘fastq_file_name’.
Raises:dnanexus_utils.FastqNotFound exception if no FASTQ files were found.