The Integrated Rapid Infectious Disease Analysis (IRIDA) Platform

From Wikitia
The Integrated Rapid Infectious Disease Analysis Platform
Type of site: Infectious disease analysis platform
Available in: English

The Integrated Rapid Infectious Disease Analysis platform, also known as IRIDA, is a web-based, open-source, decentralized bioinformatics platform for managing and analyzing next-generation sequencing data from communicable pathogens.[1][2] IRIDA is developed by Canadian researchers at the National Microbiology Laboratory, the British Columbia Centre for Disease Control and Simon Fraser University (SFU), representing the Canadian federal government, a provincial government and academia, respectively. IRIDA aims to give public health personnel a user-friendly and secure environment for managing whole genome sequencing data, and it is designed to be installed locally. Users can securely upload, store and share whole genome sequencing data and sensitive metadata through the IRIDA data management system, and can run customizable sequence analysis pipelines on their data. IRIDA is released under the Apache 2.0 licence. A beta version for immediate evaluation is hosted by SFU,[3] while the full version is available on GitHub.[4]

Public Health Significance and Impact

In recent years, whole genome sequencing has been widely recognized as the new technology replacing traditional techniques in the public health surveillance of microbial pathogens. Its increasing adoption is driven by its unprecedented discriminatory power for molecular linkage and strain differentiation, and by the fact that a single molecular assay permits comprehensive pathogen characterization, including subtyping and the prediction of clinically relevant phenotypic traits.[5] However, in many public health settings the ability to manage and process the resulting sequencing data is limited by a lack of computational expertise and resources. Although user-friendly bioinformatic analysis platforms exist, the diversity of public health environments, data sharing policies and computing infrastructures impedes their use. IRIDA is a public health solution for securely integrating cross-jurisdictional whole genome sequencing data and epidemiological information within a harmonized, flexible communication platform, and for helping researchers, public health practitioners and stakeholders interpret genomic information in the context of infectious disease epidemiology and surveillance. By reducing the technical burden of handling high-throughput data while complying with jurisdictional data protection policies, IRIDA provides the effective data management and streamlined analysis needed for rapid outbreak response and timely public health action.

Data Management Architecture

IRIDA adopts a project-centric framework to provide isolated working environments for data management. A "project" organizes a collection of context-specific sequencing data and any accompanying metadata.[1] Each individual set of sequence data is imported as a "sample" within a project. This structure resembles the BioProject architecture used by global centralized repositories (e.g. SRA, ENA, DDBJ), which supports compatible data sharing. To associate additional information with a project, an Excel file containing sample metadata can be uploaded through the web browser, and restrictive permissions are enforced to preserve data integrity. Reference genomes can also be uploaded and assigned to a specific project for data analysis. All previously generated analysis results for the samples, along with descriptions of earlier analysis runs, are archived and visible to the collaborators on a given project. By default, a project is visible only to its project manager, but it can be made accessible to other users of the system by inviting them as project collaborators. Data can be imported in several ways, including automated upload from Illumina sequencers, web upload, synchronization with other IRIDA instances and REST API tools.[1]
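The project and sample structure described above can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not IRIDA's actual schema: the class names, the `sample_name` metadata column and the rule that only the project manager may upload metadata are illustrative choices drawn from the description of projects, samples and permissions in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sample:
    """One imported set of sequence data plus its metadata (hypothetical model)."""
    name: str
    sequence_files: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    """A container for context-specific samples; not IRIDA's actual schema."""
    name: str
    manager: str
    collaborators: set[str] = field(default_factory=set)
    samples: dict[str, Sample] = field(default_factory=dict)
    reference_genomes: list[str] = field(default_factory=list)

    def is_visible_to(self, user: str) -> bool:
        # Visible only to the manager by default; invited collaborators also see it.
        return user == self.manager or user in self.collaborators

    def invite(self, user: str) -> None:
        self.collaborators.add(user)

    def add_sample(self, sample: Sample) -> None:
        self.samples[sample.name] = sample

    def upload_metadata(self, uploader: str, rows: list[dict[str, str]]) -> list[str]:
        """Attach spreadsheet-style metadata rows to existing samples.

        Only the project manager may modify metadata. Each row needs a
        hypothetical 'sample_name' column; names of unknown samples are
        returned instead of silently creating new samples.
        """
        if uploader != self.manager:
            raise PermissionError(f"{uploader} cannot modify metadata in {self.name}")
        unmatched = []
        for row in rows:
            values = dict(row)  # copy so the caller's row is not mutated
            name = values.pop("sample_name")
            if name in self.samples:
                self.samples[name].metadata.update(values)
            else:
                unmatched.append(name)
        return unmatched
```

Returning unmatched sample names, rather than creating new samples from metadata rows, keeps the sketch consistent with the idea that each sample corresponds to imported sequence data.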

Data Sharing Framework

Through user role management, project data can be readily shared among users registered on the same local IRIDA instance. Inviting additional project collaborators grants new users access to the raw sequencing data and metadata.[1] Cross-instance data sharing is also supported, enabling automated, project-specific data synchronization between separate installations. The host instance must grant connection permission to a client instance, which allows data and associated metadata to be copied directly to the client; the host or client administrator can revoke this synchronization privilege at any time. While synchronization is maintained, any modification made to the data on the host instance is reflected on the client instance. This host-client model supports the creation of global or local networks for real-time communication and collaboration between public health agencies. IRIDA also supports voluntary submission of local sequencing data and associated metadata to global sequence repositories such as SRA, ENA and DDBJ, contributing to global research efforts.[1]
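The host-client synchronization described above can be modelled as a revocable connection that copies a project from one instance to another. This is a hedged sketch, not IRIDA's synchronization protocol: the `Instance` and `Connection` classes and the choice to leave the client's last copy in place after revocation are assumptions made for illustration.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One installation; projects map project name -> sample name -> metadata."""
    name: str
    projects: dict[str, dict[str, dict[str, str]]] = field(default_factory=dict)


@dataclass
class Connection:
    """A host-to-client synchronization grant (hypothetical model)."""
    host: Instance
    client: Instance
    active: bool = True

    def revoke(self) -> None:
        # Either the host or the client administrator may end the connection.
        self.active = False

    def synchronize(self, project: str) -> bool:
        """Copy the host's current version of a project to the client.

        Returns False once the connection has been revoked. In this sketch the
        client simply keeps its last copy and stops receiving updates.
        """
        if not self.active:
            return False
        # Deep copy: the client stores its own local data, not a reference to the host's.
        self.client.projects[project] = copy.deepcopy(self.host.projects[project])
        return True
```

The deep copy mirrors the idea that data are copied to the client locally: host edits reach the client only on the next successful synchronization, not through shared references.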

User Management

IRIDA uses user role assignments to manage data permissions and operations. The system user roles are regular user, sequencer, manager and admin. A regular user can create projects and manage the projects to which they have been granted access.[1] The sequencer role can upload data to projects but cannot access or manage data. The manager role can create and manage projects and can add or modify users in the IRIDA system. The admin role has complete access to the entire system, with permissions on all projects and analyses. Within a given project, an additional layer of access control distinguishes project managers from collaborators. The project manager owns the project's collection of data and can import and modify new or existing data and metadata, as well as add new users to the project. A project collaborator can view and analyze the data but cannot modify it.[1]
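The two layers of roles described above (system roles and per-project roles) can be combined into a single permission check. The following is a minimal sketch, assuming hypothetical action names such as `upload_sequences` and `modify_data`; the permission sets paraphrase this article's role descriptions and are not IRIDA's actual access-control code.

```python
from enum import Enum
from typing import Optional


class SystemRole(Enum):
    USER = "user"
    SEQUENCER = "sequencer"
    MANAGER = "manager"
    ADMIN = "admin"


class ProjectRole(Enum):
    OWNER = "project manager"
    COLLABORATOR = "collaborator"


# Hypothetical action names; the sets paraphrase the role descriptions above.
SYSTEM_PERMISSIONS = {
    SystemRole.USER: {"create_project"},
    SystemRole.SEQUENCER: {"upload_sequences"},
    SystemRole.MANAGER: {"create_project", "manage_users"},
}

PROJECT_PERMISSIONS = {
    ProjectRole.OWNER: {"view", "analyze", "modify_data", "add_members"},
    ProjectRole.COLLABORATOR: {"view", "analyze"},
}


def can(system_role: SystemRole, project_role: Optional[ProjectRole], action: str) -> bool:
    """Decide whether a user may perform `action`.

    `project_role` is None when the user is not a member of the project.
    """
    if system_role is SystemRole.ADMIN:
        return True  # complete access to every project and analysis
    if system_role is SystemRole.SEQUENCER:
        # Upload only; sequencer accounts never read or manage data.
        return action in SYSTEM_PERMISSIONS[system_role]
    if action in SYSTEM_PERMISSIONS[system_role]:
        return True
    return project_role is not None and action in PROJECT_PERMISSIONS[project_role]
```

Checking the system role first and the project role second reflects the layering in the text: admins and sequencers are decided globally, while regular users and managers gain data rights only through their membership in a project.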

Sequence Analysis

Bioinformatic Pipelines

IRIDA integrates the Galaxy workflow management system[6] to assemble and run reproducible sequence analysis pipelines. A large collection of pipelines is available for standardized genomic and phylogenetic analysis of raw microbial sequence data, ranging from phylogenetic inference (SNVPhyl[7]), in silico serotyping (SISTR[8]) and antimicrobial resistance prediction (CARD/RGI[9]) to genome assembly (Shovill[10]) and annotation (Prokka[11]). Each pipeline offers adjustable parameters, such as alternative reference sequences or databases, so that users can customize the analysis. To run an analysis, the platform follows an online shopping cart model: users select the desired input samples, stored in one or more projects, for downstream processing. The ability to analyze samples from multiple projects together supports retrospective studies, longitudinal surveillance and the One Health framework. Once a job is submitted, Galaxy can distribute the analysis across a high-performance computing cluster to maximize parallel processing.
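The shopping cart model above amounts to collecting sample references, possibly from several projects, and packaging them with a pipeline name and parameters into a job. This is an illustrative sketch only: `SampleRef`, `AnalysisCart`, the job dictionary layout and the `reference` parameter name are hypothetical, and the pipeline name is treated as a plain label rather than a call into Galaxy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SampleRef:
    """Identifies one sample within one project."""
    project: str
    sample: str


@dataclass
class AnalysisCart:
    """Hypothetical cart collecting input samples, possibly from several projects."""
    items: list[SampleRef] = field(default_factory=list)

    def add(self, project: str, sample: str) -> None:
        ref = SampleRef(project, sample)
        if ref not in self.items:  # ignore duplicate selections
            self.items.append(ref)

    def remove(self, project: str, sample: str) -> None:
        self.items.remove(SampleRef(project, sample))

    def projects(self) -> set[str]:
        # The projects the selected samples come from (one or more).
        return {ref.project for ref in self.items}

    def submit(self, pipeline: str, parameters: dict[str, str] | None = None) -> dict:
        """Package the selection into a job description and empty the cart."""
        if not self.items:
            raise ValueError("cart is empty")
        job = {
            "pipeline": pipeline,
            "parameters": dict(parameters or {}),
            "inputs": list(self.items),  # copy before the cart is cleared
        }
        self.items.clear()
        return job
```

Because each cart entry records its project, a single job can draw on samples from separate projects, which is what enables the cross-project analyses mentioned above.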

Result Report and Visualization

The raw outputs and accompanying quality metrics of an analysis are made fully available through the web browser, leaving users free to interpret them. Clinically or epidemiologically relevant results, such as antimicrobial resistance and serotype, are incorporated directly into the sample metadata, which is presented as an interactive table that can be downloaded in Excel or CSV format.[1] For phylogenetic inference, IRIDA integrates the interactive tree visualization tool Phylocanvas,[12] which supports labelling or colouring the tree by relevant metadata and switching between tree shapes (e.g. radial, rectangular, circular). The tree visualization can be exported in PNG or SVG format. Because public health jurisdictions may require specific report formats, external reporting systems can query analysis results in IRIDA via the REST API and distribute the data in the desired formats.[1]
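Merging analysis results into the sample metadata table and exporting it as CSV can be sketched with Python's standard `csv` module. This is a hedged illustration of the data flow, not IRIDA's export code: the function names and field names are hypothetical, and the sketch assumes a newer result simply overwrites an older value for the same field.

```python
import csv
import io


def merge_results(metadata, results):
    """Copy per-sample analysis results (e.g. serotype, AMR genes) into the metadata table.

    Both arguments map sample name -> {field: value}. The inputs are not
    modified; a newer result overwrites an existing value for the same field.
    """
    merged = {name: dict(values) for name, values in metadata.items()}
    for name, values in results.items():
        merged.setdefault(name, {}).update(values)
    return merged


def to_csv(table):
    """Render the merged table as CSV text, one row per sample, columns sorted by name."""
    columns = sorted({col for values in table.values() for col in values})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["sample", *columns], lineterminator="\n")
    writer.writeheader()
    for name in sorted(table):
        # Missing fields are written as empty cells (DictWriter's default restval).
        writer.writerow({"sample": name, **table[name]})
    return buffer.getvalue()
```

Keeping results and user-supplied metadata in one table per sample is what allows a single download, or a single query from an external reporting system, to return both.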

References


  1. Matthews, Thomas C.; Bristow, Franklin R.; Griffiths, Emma J.; Petkau, Aaron; Adam, Josh; Dooley, Damion; Kruczkiewicz, Peter; Curatcha, John; Cabral, Jennifer; Fornika, Dan; Winsor, Geoffrey L. (2018-07-31). "The Integrated Rapid Infectious Disease Analysis (IRIDA) Platform". bioRxiv: 381830. doi:10.1101/381830.
  2. "IRIDA – Integrated Rapid Infectious Disease Analysis Project". Retrieved 2020-02-18.
  3. "SFU IRIDA beta release".
  4. "phac-nml/irida". National Microbiology Laboratory. GitHub. 2020-02-18. Retrieved 2020-02-18.
  5. Satta, G.; Lipman, M.; Smith, G. P.; Arnold, C.; Kon, O. M.; McHugh, T. D. (2018-06-01). "Mycobacterium tuberculosis and whole-genome sequencing: how close are we to unleashing its full potential?". Clinical Microbiology and Infection. 24 (6): 604–609. doi:10.1016/j.cmi.2017.10.030. ISSN 1198-743X. PMID 29108952.
  6. Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Čech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn (8 July 2016). "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update". Nucleic Acids Research. 44 (W1): W3–W10. doi:10.1093/nar/gkw343. ISSN 1362-4962. PMC 4987906. PMID 27137889.
  7. Petkau, Aaron; Mabon, Philip; Sieffert, Cameron; Knox, Natalie C.; Cabral, Jennifer; Iskander, Mariam; Iskander, Mark; Weedmark, Kelly; Zaheer, Rahat; Katz, Lee S.; Nadon, Celine (2017). "SNVPhyl: a single nucleotide variant phylogenomics pipeline for microbial genomic epidemiology". Microbial Genomics. 3 (6): e000116. doi:10.1099/mgen.0.000116. PMC 5628696. PMID 29026651.
  8. Yoshida, Catherine E.; Kruczkiewicz, Peter; Laing, Chad R.; Lingohr, Erika J.; Gannon, Victor P. J.; Nash, John H. E.; Taboada, Eduardo N. (2016-01-22). "The Salmonella In Silico Typing Resource (SISTR): An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft Salmonella Genome Assemblies". PLOS ONE. 11 (1): e0147101. doi:10.1371/journal.pone.0147101. ISSN 1932-6203. PMC 4723315. PMID 26800248.
  9. Alcock, Brian P.; Raphenya, Amogelang R.; Lau, Tammy T. Y.; Tsang, Kara K.; Bouchard, Mégane; Edalatmand, Arman; Huynh, William; Nguyen, Anna-Lisa V.; Cheng, Annie A.; Liu, Sihan; Min, Sally Y. (2020-01-08). "CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database". Nucleic Acids Research. 48 (D1): D517–D525. doi:10.1093/nar/gkz935. ISSN 1362-4962. PMID 31665441.
  10. "Shovill GitHub repository".
  11. "Prokka GitHub repository".
  12. "Phylocanvas main page".

This article "The Integrated Rapid Infectious Disease Analysis (IRIDA) Platform" is from Wikipedia. The list of its authors can be seen in its history. Articles taken from the Draft namespace on Wikipedia can be accessed in Wikipedia's Draft namespace.