ResistoXplorer

Data Format Overview

ResistoXplorer accepts two main types of input: a list of antimicrobial resistance genes (ARGs), or data tables from metagenomic AMR studies. The list contains ARG names or IDs with optional abundance or fold change values. A data table is a table or matrix in tab-delimited text (.txt) or comma-separated values (.csv) format containing information on features (ARGs or taxa) and samples. The data table input consists of three types of files: an abundance profile (resistome or microbiome), an annotation file (functional or taxonomic) and a metadata file.

Users can explore ARG-microbe (host) associations by entering or pasting a list of ARGs (names or IDs) of interest, with optional fold change or abundance values. Such a list might contain, for example, the significant ARGs detected by differential abundance testing in a metagenomic AMR study, or those identified through high-throughput qPCR. Currently, ResistoXplorer supports five primary reference databases (ResFinder, CARD, ARDB, BacMet and the AMP dataset) for network-based exploratory analysis of ARGs. Together these databases cover a diverse range of ARGs (antibiotic, antifungal, biocide, metal, antimicrobial peptide (AMP) and others), and they differ in the type of information and ARGs they contain. The amount and type of information, gene nomenclature and annotation scheme also vary considerably between databases. As a result, the same input list may yield a different number of hits, or none at all, depending on the database selected for mapping. Please select the database that matches your input ARG type and naming format. Details for each database are given below:

ResFinder database (Download example gene list here || Database link here)

consists of 2395 ARGs.

```
sul2
qnrD1
aph(6)-Id
aph(3'')-Ib
blaCMY-2
sul1
qnrS1
ere(A)
qnrC
mph(E)
dfrA1
erm(F)
ant(2'')-Ia
```

CARD (version: 2.0) (Download example gene list here || Database link here)

consists of 2617 ARGs. Note: microbial host associations and annotation information are collected only for ARGs in the "protein homolog" model files, which are used for BLAST searches of metagenomic datasets.

```
AAC(2'')-Ia
AAC(2'')-Ie
APH(3'''')-Ia
dfrA1
KPC-2
LEN-24
lsaE
MCR-3.5
mphC
OXA-216
OXA-217
OXA-86
OXA-87
```

ARDB (Download example gene list here || Database link here)

consists of 6451 ARGs.

```
aac2i
aac2ia
aac2ib
vanyd
vanz
vate
vand
tet40
tet39
bl2_ges
```

BacMet (version: 2.0) (Download gene list here || Database link here)

consists of 772 experimentally confirmed biocide and metal resistance genes. Note: associations and annotations are collected only for experimentally confirmed biocide and metal resistance genes, and functional annotations follow the simple, acyclic and hierarchical classification scheme designed for BacMet in MegaRes 2.0.

```
abeM
actP
arsB
copC
copZ
merA
merD
ncrC
pcoA
qacF
vcaM
```

AMP (Download gene list here || Dataset link here)

consists of 172 antimicrobial peptide (AMP) resistance genes.

```
CYP51
cytb
DHFR
CYP51a
FKS1
FKS2
FUR1
RTA2
tub2
ERG11
```
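Because the databases use different nomenclature, the same list can produce very different hit counts. The Python sketch below illustrates this under stated assumptions: one ARG per line, optionally followed by a fold change or abundance value separated by a tab, comma or spaces, and exact, case-sensitive name matching. ResistoXplorer's actual parsing and mapping rules are not described here, and the two small sets (`RESFINDER_SUBSET`, `CARD_SUBSET`, hypothetical names) are just a few genes copied from the example lists above, not the full databases.

```python
from typing import Dict, List, Optional, Set, Tuple

# A few names copied from the example lists above (NOT the full databases).
RESFINDER_SUBSET: Set[str] = {"sul1", "sul2", "qnrS1", "dfrA1", "erm(F)"}
CARD_SUBSET: Set[str] = {"AAC(2'')-Ia", "dfrA1", "KPC-2", "mphC"}


def parse_arg_list(text: str) -> Dict[str, Optional[float]]:
    """Parse one ARG per line with an optional numeric value (assumed format)."""
    entries: Dict[str, Optional[float]] = {}
    for raw in text.splitlines():
        # Treat commas like tabs, then split on any whitespace.
        parts = raw.replace(",", "\t").split()
        if not parts:
            continue  # skip blank lines
        entries[parts[0]] = float(parts[1]) if len(parts) > 1 else None
    return entries


def map_to_database(entries: Dict[str, Optional[float]],
                    database: Set[str]) -> Tuple[Dict[str, Optional[float]], List[str]]:
    """Split the input into hits and misses using exact, case-sensitive matching."""
    hits = {name: value for name, value in entries.items() if name in database}
    misses = [name for name in entries if name not in database]
    return hits, misses


if __name__ == "__main__":
    user_list = parse_arg_list("sul1\t2.5\ndfrA1,-1.2\nKPC-2\n")
    for label, db in [("ResFinder", RESFINDER_SUBSET), ("CARD", CARD_SUBSET)]:
        hits, misses = map_to_database(user_list, db)
        print(label, "hits:", sorted(hits), "misses:", misses)
```

With these subsets, `sul1` is found only in the ResFinder set and `KPC-2` only in the CARD set, while `dfrA1` is found in both.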

1. Resistome profile format

A resistome profile derived from whole-genome shotgun metagenomic data can be uploaded as a tab-delimited text (.txt) or comma-separated values (.csv) file. It is a data table (matrix) of abundance values, i.e. raw read counts from the metagenomic data, with rows for features (ARGs) and columns for samples. The file can be created with any spreadsheet or text editor, but it must follow the specific format described below:

The first row should contain the sample names or IDs, beginning with "#NAME" in the first column;
Sample and feature names must be unique and should consist of common English letters, numbers and underscores. Other special characters (e.g. single (') or double (") quotes) can also be used in feature (ARG) names. Latin/Greek letters are not supported;
Data values (read counts) must be non-negative numbers. Blank cells and NA values are not allowed and should be replaced with zero;
Non-specific feature names (e.g. ARG_0001) can also be used in the first column. In that case, a tab-delimited (.txt) or comma-separated (.csv) annotation mapping file containing multi-level functional annotation for each feature (ARG) must also be uploaded;
Lastly, when selecting an already compiled database for functional annotation, make sure that the feature (ARG) names in the abundance table follow the same format as the selected database. For details on the format of each database, please refer to the "Annotation" tab above.
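The rules above can be checked locally before uploading. The following Python sketch is one possible pre-upload check, not ResistoXplorer's own validator; `read_profile` and `delimiter_for` are hypothetical helper names. It assumes the delimiter is chosen from the file extension, treats quote characters as literal text (since they may appear in ARG names), and uses an ASCII check as a rough stand-in for "no Latin/Greek letters".

```python
import csv
import io
import math
import re
from typing import Dict, List, Tuple


def delimiter_for(filename: str) -> str:
    """Comma for .csv files, tab otherwise (e.g. .txt)."""
    return "," if filename.lower().endswith(".csv") else "\t"


def read_profile(text: str, delimiter: str = "\t") -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse and check an abundance profile; raise ValueError on the first problem."""
    # QUOTE_NONE keeps quote characters (allowed in ARG names) as literal text.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows or rows[0][0].strip() != "#NAME":
        raise ValueError('first row must start with "#NAME"')

    samples = [s.strip() for s in rows[0][1:]]
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must be unique")
    for s in samples:
        if not re.fullmatch(r"[A-Za-z0-9_]+", s):
            raise ValueError(f"sample name {s!r}: use only English letters, numbers and underscores")

    profile: Dict[str, List[float]] = {}
    for row in rows[1:]:
        feature, cells = row[0].strip(), row[1:]
        if feature in profile:
            raise ValueError(f"duplicate feature name: {feature}")
        if not feature.isascii():  # rough proxy for "no Latin/Greek letters"
            raise ValueError(f"{feature}: non-ASCII characters are not supported")
        if len(cells) != len(samples):
            raise ValueError(f"{feature}: expected {len(samples)} values, got {len(cells)}")
        counts: List[float] = []
        for cell in cells:
            cell = cell.strip()
            if cell == "" or cell.upper() == "NA":
                raise ValueError(f"{feature}: blank/NA value, replace it with 0")
            try:
                value = float(cell)
            except ValueError:
                raise ValueError(f"{feature}: non-numeric value {cell!r}") from None
            if not math.isfinite(value) or value < 0:
                raise ValueError(f"{feature}: values must be non-negative numbers")
            counts.append(value)
        profile[feature] = counts
    return samples, profile
```

For example, `read_profile(open(path).read(), delimiter_for(path))` returns the sample names and a feature-to-counts mapping, or raises a `ValueError` naming the first offending row.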

Example:

Resistome abundance profile with features (ARGs) annotated using the ResFinder database (Download here)

#NAME                  Sample1  Sample2  Sample3  Sample4  Sample5
dfrA1_2_AJ419168       21       4        4        0        0
tet(O)_1_M18896        424      232      191      786      189
tet(T)_1_L42544        0        45       0        0        1
aph(3')-III_1_M26832   47       48       50       51       46

Resistome abundance profile with non-specific feature (ARG) names (Download here), along with the mapping functional annotation file (Download here)

#NAME                                                                                 Sample1  Sample2  Sample3  Sample4  Sample5
222|JQ394987.1|JQ394987|Multi-drug_resistance|Multi-drug_efflux_pumps|MDFA            21       4        4        0        0
424|D85892.1|D85892|MLS|Macrolide_phosphotransferases|MPHB                            424      232      191      786      189
518|AJ007350.1|AJ007350|betalactams|Class_A_betalactamases|ACI                        0        45       0        0        1
AGly|AY712687.1|gene1|Aminoglycosides|Aminoglycoside_O-nucleotidyltransferases|ANT6   47       48       50       51       46
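In the example above, each feature name already embeds its annotation: the last '|'-separated fields correspond to the Class, Mechanism and Group levels, as in the MegaRes annotation tables further down. When names follow that convention, the mapping annotation file can be generated instead of typed by hand. The Python sketch below assumes (1) the last n fields are the annotation levels, ordered from higher to lower, and (2) underscores stand for spaces. The helpers `megares_levels` and `annotation_table` are hypothetical convenience functions, not part of ResistoXplorer.

```python
from typing import List, Sequence


def megares_levels(feature: str, n_levels: int) -> List[str]:
    """Take the last `n_levels` '|'-separated fields of a MegaRes-style name.

    Assumption: these fields are the annotation levels (higher to lower) and
    underscores stand for spaces, e.g. 'Class_D_betalactamases' -> 'Class D betalactamases'.
    """
    fields = feature.split("|")
    if n_levels < 1 or len(fields) < n_levels:
        raise ValueError(f"{feature!r} does not have {n_levels} '|'-separated fields")
    return [field.replace("_", " ").strip() for field in fields[-n_levels:]]


def annotation_table(features: Sequence[str], level_names: Sequence[str]) -> str:
    """Build a tab-delimited annotation file with an '#ANNOTATION' header row."""
    lines = ["\t".join(["#ANNOTATION", *level_names])]
    for feature in features:
        lines.append("\t".join([feature, *megares_levels(feature, len(level_names))]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    names = [
        "424|D85892.1|D85892|MLS|Macrolide_phosphotransferases|MPHB",
        "518|AJ007350.1|AJ007350|betalactams|Class_A_betalactamases|ACI",
    ]
    print(annotation_table(names, ["Class", "Mechanism", "Group"]), end="")
```

For names in the "MEG_..." style of the full MegaRes 2.0 table, the same helper with four level names (Type, Class, Mechanism, Group) reproduces that table's columns under the same assumptions.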

For the Integration module, users must also upload a taxonomic abundance profile along with the resistome profile.

2. Taxonomic profile format

Taxonomic profiles derived from either 16S rRNA marker gene survey data or whole-genome shotgun metagenomic data can be uploaded. In a taxonomic abundance profile, the data values are the read counts (abundances) of taxa in each sample. The required file formats and formatting rules are exactly the same as those stated above for the resistome profile. Users can also provide a separate taxonomic annotation mapping file to perform analysis at multiple taxonomic levels (e.g. species, genus, phylum). Please note that feature (taxon) names that combine multiple taxonomic levels in the abundance profile cannot be parsed, so a separate annotation file must be provided in such cases.

Example:

Taxonomic abundance profile (Download here), along with the mapping taxonomic annotation file (Download here)

#NAME                         Sample1  Sample2  Sample3  Sample4  Sample5  Sample6  Sample7  Sample8
Acidobacterium capsulatum     219      49       42       50       6        17       22       21
Acidimicrobium ferrooxidans   424      0        191      0        0        0        0        0
Actinomyces oris              32       4        4        22       76       16       1        0
Bifidobacterium animalis      47       0        0        4        0        0        0        0

Users can provide functional annotation for features (ARGs) either by uploading a separate functional annotation file that follows their own annotation scheme, or simply by selecting the database (if available) that was used to annotate the ARGs during upstream analysis of the resistome data. In ResistoXplorer, we have manually collected and curated functional annotation information from 11 (14 in total) of the most widely used antimicrobial resistance (AMR) databases to support analysis and profiling of resistome abundances at various functional levels. The required annotation file format and the structure of each database annotation are described below in detail:

1. Annotation file

The annotation file also uses tab-separated (.txt) or comma-separated values (.csv) format. Its first row should contain the functional (for a resistome profile) or taxonomic (for a microbiome profile) levels, beginning with "#ANNOTATION" in the first column. All feature (ARG or taxon) names are listed in the first column. There is no minimum or maximum number of functional or taxonomic annotation levels that must be included. Please consider the following points when formatting the annotation file:

Use the same feature (ARG or taxa) or row names as in your input resistome or taxonomic abundance table;
Use a simple, hierarchical and acyclic annotation structure, ordered from higher to lower level for each feature, to ensure accurate count-based profiling;
Make sure that data values do not contain tabs or commas, as these are used as delimiters;
Data values may consist of common English letters, numbers and special characters. Latin/Greek letters are not supported;
Blank cells or "NA" values (without quotes) are permitted for missing values in the annotation table.
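One way to check the "simple, hierarchical and acyclic" guideline is sketched below in Python. It assumes the guideline means that every label has exactly one parent label at the level directly above it, so that the levels form a tree; this interpretation and the helper `hierarchy_conflicts` are ours, not ResistoXplorer's. Missing values (blank or "NA") are skipped, since they are permitted in annotation tables.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def is_missing(label: str) -> bool:
    """Blank cells and "NA" are allowed for missing values in annotation tables."""
    return label.strip() in ("", "NA")


def hierarchy_conflicts(annotation: Dict[str, List[str]]) -> List[Tuple[int, str, List[str]]]:
    """Find labels that have more than one parent at the level directly above.

    `annotation` maps each feature to its labels ordered from higher to lower
    level. Returns (level_index, label, sorted_parents) for every conflict.
    """
    parents: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for labels in annotation.values():
        for level in range(1, len(labels)):
            child, parent = labels[level], labels[level - 1]
            if is_missing(child) or is_missing(parent):
                continue
            parents[(level, child)].add(parent)
    return sorted((level, label, sorted(ps))
                  for (level, label), ps in parents.items() if len(ps) > 1)


if __name__ == "__main__":
    # Rows from the functional annotation example below (Class, Mechanism).
    example = {
        "dfrA1_2_AJ419168": ["Trimethoprim", "Folate pathway antagonist"],
        "tet(O)_1_M18896": ["Tetracycline", "Target protection"],
        "tet(T)_1_L42544": ["Tetracycline", "Target protection"],
        "aph(3')-III_1_M26832": ["Aminoglycoside", "Enzymatic modification"],
    }
    print(hierarchy_conflicts(example))  # prints []
    # Hypothetical row that gives "Target protection" a second parent class.
    example["hypothetical_gene"] = ["Aminoglycoside", "Target protection"]
    print(hierarchy_conflicts(example))
```

A reported conflict means counts aggregated at the lower level would be split across several higher-level groups, which is what the guideline is meant to avoid.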

Example:

Resistome functional annotation file (Download here)

#ANNOTATION             Class            Mechanism
dfrA1_2_AJ419168        Trimethoprim     Folate pathway antagonist
tet(O)_1_M18896         Tetracycline     Target protection
tet(T)_1_L42544         Tetracycline     Target protection
aph(3')-III_1_M26832    Aminoglycoside   Enzymatic modification

Taxonomic annotation file (Download here)

#ANNOTATION                     Phylum          Class           Genus
Acidobacterium capsulatum       Acidobacteria   Acidobacteriia  Acidobacterium
Acidimicrobium ferrooxidans     Actinobacteria  Acidimicrobiia  Acidimicrobium
Actinomyces oris                Actinobacteria  Actinobacteria  Actinomyces
Bifidobacterium animalis        Actinobacteria  Actinobacteria  Bifidobacterium
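To illustrate what a taxonomic annotation file makes possible, the Python sketch below sums species-level counts into a higher taxonomic level, using the abundance and annotation examples above. The function `collapse` is a hypothetical helper, and grouping features that lack a label under "Unassigned" is our own choice; how ResistoXplorer handles missing annotations internally is not described here.

```python
from typing import Dict, List

# Counts for Sample1..Sample8, copied from the taxonomic abundance example above.
PROFILE: Dict[str, List[int]] = {
    "Acidobacterium capsulatum": [219, 49, 42, 50, 6, 17, 22, 21],
    "Acidimicrobium ferrooxidans": [424, 0, 191, 0, 0, 0, 0, 0],
    "Actinomyces oris": [32, 4, 4, 22, 76, 16, 1, 0],
    "Bifidobacterium animalis": [47, 0, 0, 4, 0, 0, 0, 0],
}

# Levels copied from the taxonomic annotation example above.
ANNOTATION: Dict[str, Dict[str, str]] = {
    "Acidobacterium capsulatum": {"Phylum": "Acidobacteria", "Class": "Acidobacteriia", "Genus": "Acidobacterium"},
    "Acidimicrobium ferrooxidans": {"Phylum": "Actinobacteria", "Class": "Acidimicrobiia", "Genus": "Acidimicrobium"},
    "Actinomyces oris": {"Phylum": "Actinobacteria", "Class": "Actinobacteria", "Genus": "Actinomyces"},
    "Bifidobacterium animalis": {"Phylum": "Actinobacteria", "Class": "Actinobacteria", "Genus": "Bifidobacterium"},
}


def collapse(profile: Dict[str, List[int]], annotation: Dict[str, Dict[str, str]],
             level: str, unassigned: str = "Unassigned") -> Dict[str, List[int]]:
    """Sum per-sample counts of all features that share a label at `level`."""
    totals: Dict[str, List[int]] = {}
    for feature, counts in profile.items():
        label = annotation.get(feature, {}).get(level, "").strip()
        if label in ("", "NA"):
            label = unassigned  # hypothetical bucket for features without a label
        previous = totals.get(label, [0] * len(counts))
        totals[label] = [p + c for p, c in zip(previous, counts)]
    return totals


if __name__ == "__main__":
    for phylum, counts in collapse(PROFILE, ANNOTATION, "Phylum").items():
        print(phylum, counts)
```

At the phylum level, the three Actinobacteria species are merged, giving 424 + 32 + 47 = 503 reads in Sample1.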

2. Database annotation (only for resistome profile)

Currently, ResistoXplorer supports manually curated functional annotation information from several of the most widely used primary and secondary AMR databases, including ResFinder, CARD, ARDB, ARG-ANNOT, MegaRes, AMRFinder, SARG, DeepARG-DB, ARGminer, BacMet and the AMP database. Each of these databases uses its own naming, annotation and classification scheme for the features (i.e., ARGs) identified in a resistome profile. Consequently, both the functional annotation information and the levels at which a resistome profile can be analyzed vary considerably between databases.

To use a database's hierarchical functional annotations without uploading a separate annotation file, users must make sure that the feature (row) names in their uploaded resistome profile follow the same format as in the selected database. Please note that all feature (row) names in each collected database annotation table are unique.
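A quick way to spot naming mismatches before selecting a database is to compare the profile's feature names with the database's first column. The Python sketch below assumes exact matching (ResistoXplorer's matching rules are not stated here) and adds its own case-insensitive hint, since databases differ in letter case (e.g. CARD's "AAC(2'')-Ia" versus lowercase names elsewhere). The helper `unmatched_features` is hypothetical.

```python
from typing import Dict, Iterable, Optional


def unmatched_features(features: Iterable[str],
                       database_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each feature absent from the database table to a case-only match, or None."""
    names = set(database_names)
    by_lower: Dict[str, str] = {}
    for name in sorted(names):  # sorted so the suggestion is deterministic
        by_lower.setdefault(name.lower(), name)
    return {f: by_lower.get(f.lower()) for f in features if f not in names}


if __name__ == "__main__":
    # A few feature names from the ARDB annotation example below.
    ardb = ["aac2i", "aac2ia", "aac2ib", "aac2ic"]
    for feature, hint in unmatched_features(["aac2i", "AAC2IA", "sul1"], ardb).items():
        print(feature, "-> not found" if hint is None else f"-> did you mean {hint}?")
```

An empty result means every feature name has an exact counterpart in the selected database's annotation table.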

The examples below show, for each database, how the features (ARGs) are named (first column) and how the functional annotation levels (first row) are organized:

ResFinder (version: 4.1) (Download example here || Database link here)

consists of 3152 features annotated at three functional levels (Class, Mechanism and Gene).

#ANNOTATION             Class            Mechanism                    Gene
dfrA1_2_AJ419168        Trimethoprim     Folate pathway antagonist    dfrA1
tet(O)_1_M18896         Tetracycline     Target protection            tet(O)
tet(T)_1_L42544         Tetracycline     Target protection            tet(T)
aph(3')-III_1_M26832    Aminoglycoside   Enzymatic modification       aph(3')-III

CARD (version: 3.1.3) (Download example here || Database link here)

consists of 2979 features annotated at three functional levels (Mechanism, Family and Gene). Note: annotation information is collected only for features (ARGs) in the "nucleotide FASTA protein homolog" model files, which are used for BLAST searches of metagenomic datasets.

#ANNOTATION                                   Mechanism                      Family                                              Gene
gb|GQ343019|+|132-1023|ARO:3002999|CblA-1     antibiotic inactivation        CblA beta-lactamase                                 CblA-1
gb|HQ845196|+|0-861|ARO:3001109|SHV-52        antibiotic inactivation        SHV beta-lactamase                                  SHV-52
gb|AF028812|+|392-887|ARO:3002867|dfrF        antibiotic target replacement  trimethoprim resistant dihydrofolate reductase dfr  dfrF
gb|JX017365|+|244-1120|ARO:3001989|CTX-M-130  antibiotic inactivation        CTX-M beta-lactamase                                CTX-M-130
gb|JN967644|+|0-813|ARO:3002356|NDM-6         antibiotic inactivation        NDM beta-lactamase                                  NDM-6
gb|LC004922|+|0-1146|ARO:3001855|ACT-35       antibiotic inactivation        ACT beta-lactamase                                  ACT-35
gb|AF135373|+|11-908|ARO:3002244|CARB-5       antibiotic inactivation        CARB beta-lactamase                                 CARB-5
gb|AY234334|+|0-846|ARO:3000600|Erm(34)       antibiotic target alteration   Erm 23S ribosomal RNA methyltransferase             Erm(34)
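Each CARD feature name above packs several '|'-separated fields, and the Gene column equals the last field. The Python sketch below splits such a name. The field labels (source, accession, strand, start/end, ARO accession, name) are our reading of the examples shown, not a documented ResistoXplorer or CARD API; the coordinates are parsed as plain integers without assuming any numbering convention. `CardHeader` and `parse_card_header` are hypothetical names.

```python
from typing import NamedTuple


class CardHeader(NamedTuple):
    source: str     # e.g. "gb"
    accession: str  # e.g. "GQ343019"
    strand: str     # "+" or "-"
    start: int      # first number of the "start-end" field
    end: int        # second number of the "start-end" field
    aro: str        # e.g. "ARO:3002999"
    name: str       # e.g. "CblA-1"; equals the Gene column in the table above


def parse_card_header(header: str) -> CardHeader:
    """Split a CARD-style feature name into its six '|'-separated fields."""
    parts = header.split("|", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 '|'-separated fields: {header!r}")
    source, accession, strand, span, aro, name = parts
    start, end = (int(x) for x in span.split("-"))
    return CardHeader(source, accession, strand, start, end, aro, name)


if __name__ == "__main__":
    print(parse_card_header("gb|GQ343019|+|132-1023|ARO:3002999|CblA-1"))
```

This also shows why the full name, not just the gene symbol, must appear in the resistome profile when the CARD annotation is selected: the database table is keyed on the complete string.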

ARDB (Download example here || Database link here)

consists of 377 features annotated at two functional levels (Class and Mechanism).

#ANNOTATION     Class               Mechanism
aac2i           Aminoglycosides     Aminoglycoside N-acetyltransferase
aac2ia          Aminoglycosides     Aminoglycoside N-acetyltransferase
aac2ib          Aminoglycosides     Aminoglycoside N-acetyltransferase
aac2ic          Aminoglycosides     Aminoglycoside N-acetyltransferase
aac2id          Aminoglycosides     Aminoglycoside N-acetyltransferase
aac3ia          Aminoglycosides     Aminoglycoside N-acetyltransferase
aac3iia         Aminoglycosides     Aminoglycoside N-acetyltransferase

ARG-ANNOT (Download example here || Database link here)

consists of 2025 features annotated at the Class functional level.

#ANNOTATION     Class
aac             aminoglycosides
aac2-Ia         aminoglycosides
aac2-Ib         aminoglycosides
aac2-Ic         aminoglycosides
aac(2'''')-Id   aminoglycosides
aac2-Ie         aminoglycosides
aac3-I          aminoglycosides
aac-IIIa        aminoglycosides
aac(3)-IIIb     aminoglycosides

MegaRes (version: 2.0) Full (Download example here || Database link here)

consists of 7868 features annotated at four functional levels (Type, Class, Mechanism and Group). Note: contains annotation information for all ARGs conferring resistance to drugs, biocides and metals.

#ANNOTATION                                                                   Type        Class                 Mechanism                             Group
MEG_13|Drugs|Aminoglycosides|Aminoglycoside_N-acetyltransferases|AAC2-PRIME   Drugs       Aminoglycosides       Aminoglycoside N-acetyltransferases   AAC2-PRIME
MEG_14|Drugs|Aminoglycosides|Aminoglycoside_N-acetyltransferases|AAC2-PRIME   Drugs       Aminoglycosides       Aminoglycoside N-acetyltransferases   AAC2-PRIME
MEG_16|Drugs|Aminoglycosides|Aminoglycoside_N-acetyltransferases|AAC2-PRIME   Drugs       Aminoglycosides       Aminoglycoside N-acetyltransferases   AAC2-PRIME
MEG_3256|Biocides|Acid_resistance|Acid_resistance_protein|HEFA                Biocides    Acid resistance       Acid resistance protein               HEFA
MEG_3257|Biocides|Acid_resistance|Acid_resistance_protein|HEFC                Biocides    Acid resistance       Acid resistance protein               HEFC
MEG_7820|Biocides|Acid_resistance|Acid_resistance_protein|YDEP                Biocides    Acid resistance       Acid resistance protein               YDEP
MEG_723|Metals|Aluminum_resistance|Aluminum_resistance_protein|ALU1P          Metals      Aluminum resistance   Aluminum resistance protein           ALU1P
MEG_3075|Metals|Aluminum_resistance|Aluminum_ATPase_|G2ALT                    Metals      Aluminum resistance   Aluminum ATPase                       G2ALT

MegaRes (version: 2.0) Drugs only (Download example here || Database link here)

consists of 7126 features annotated at three functional levels (Class, Mechanism and Group). Note: contains annotation information only for ARGs conferring resistance to drugs.

#ANNOTATION                                                                       Class                   Mechanism                     Group
Bla|OXA-223|JN248564|1-825|825|betalactams|Class_D_betalactamases|OXA             betalactams             Class D betalactamases        OXA
gi|698174209|gb|KM087859.1|betalactams|Class_C_betalactamases|MIR                 betalactams             Class C betalactamases        MIR
1172|AF317511.1|AF317511|betalactams|Class_B_betalactamases|VIM                   betalactams             Class B betalactamases        VIM
959|M97297.1|TRNVAN|Glycopeptides|VanA-type_accessory_protein|VANZA               Glycopeptides           VanA-type accessory protein   VANZA
Gly|VanY-A|M97297|9052-9963|912|Glycopeptides|VanA-type_accessory_protein|VANYA   Glycopeptides           VanA-type accessory protein   VANYA
Mdr|AY769962.1|gene1|Multi-drug_resistance|Multi-drug_efflux_pumps|ADEAI          Multi-drug resistance   Multi-drug efflux pumps       ADEAI
617|HQ875016.1|HQ875016|Phenicol|Phenicol_efflux_pumps|CML                        Phenicol                Phenicol efflux pumps         CML

MegaRes (version: 1.0.1) (Download example here || Database link here)

consists of 3824 features annotated at three functional levels (Class, Mechanism and Group).

#ANNOTATION                                                                       Class                   Mechanism                     Group
Bla|OXA-223|JN248564|1-825|825|betalactams|Class_D_betalactamases|OXA             betalactams             Class D betalactamases        OXA
gi|698174209|gb|KM087859.1|betalactams|Class_C_betalactamases|MIR                 betalactams             Class C betalactamases        MIR
1172|AF317511.1|AF317511|betalactams|Class_B_betalactamases|VIM                   betalactams             Class B betalactamases        VIM
959|M97297.1|TRNVAN|Glycopeptides|VanA-type_accessory_protein|VANZA               Glycopeptides           VanA-type accessory protein   VANZA
Gly|VanY-A|M97297|9052-9963|912|Glycopeptides|VanA-type_accessory_protein|VANYA   Glycopeptides           VanA-type accessory protein   VANYA
Mdr|AY769962.1|gene1|Multi-drug_resistance|Multi-drug_efflux_pumps|ADEAI          Multi-drug resistance   Multi-drug efflux pumps       ADEAI
617|HQ875016.1|HQ875016|Phenicol|Phenicol_efflux_pumps|CML                        Phenicol                Phenicol efflux pumps         CML

AMRFinder (Download example here || Database link here)

consists of 4156 features annotated at two functional levels (Class and Mechanism).

#ANNOTATION     Class               Mechanism
aac(2')-Ia      AMINOGLYCOSIDE      acetyltransferase
aac(2')-Ib      AMINOGLYCOSIDE      acetyltransferase
aac(2')-Ic      AMINOGLYCOSIDE      acetyltransferase
aac(2')-Id      AMINOGLYCOSIDE      acetyltransferase
aac(2')-Ie      AMINOGLYCOSIDE      acetyltransferase
aac(2')-IIa     AMINOGLYCOSIDE      acetyltransferase
aac(2')-IIb     AMINOGLYCOSIDE      acetyltransferase

SARG (version: 2.0) (Download example here || Database link here)

consists of 12085 features annotated at two functional levels (Type and SubType).

#ANNOTATION         Type                                  SubType
1CIA                chloramphenicol                       chloramphenicol__catA
1QCA                chloramphenicol                       chloramphenicol__catA
1XAT                chloramphenicol                       chloramphenicol__cat_chloramphenicol acetyltransferase
A0KQI1              bacitracin                            bacitracin__bacA
A0RD31              fosfomycin                            fosfomycin__fosB
A15097.gene.p01     macrolide-lincosamide-streptogramin   macrolide-lincosamide-streptogramin__ereB
A1A9B7              macrolide-lincosamide-streptogramin   macrolide-lincosamide-streptogramin__macB

DeepARG-DB (Download example here || Database link here)

consists of 4511 features annotated at two functional levels (Subtypes and Types).

#ANNOTATION      Subtypes            Types
AAC(6')-31       aminoglycoside      AAC(6')-31
AAC(6')-32       aminoglycoside      AAC(6')-32
AAC(6')-33       aminoglycoside      AAC(6')-33
AAC(6')-34       aminoglycoside      AAC(6')-34
AAC(6')-I30      aminoglycoside      AAC(6')-I30
AAC(6')-Ia       aminoglycoside      AAC(6')-Ia
pmrA             acriflavin          pmrA
novA             aminocoumarin       novA

ARGminer (version: 1.1.0) (Download example here || Database link here)

consists of 14872 features annotated at three functional levels (Class, Mechanism and ARG_NAME).

#ANNOTATION         Class           Mechanism                   ARG_NAME
BAE78082.1          multidrug       Unknown                     mdtP
WP_024565805.1      beta-lactam     Unknown                     BlaB
ALX99516.1          multidrug       Multi-drug efflux pumps     AdeC
YP_186749.1         multidrug       Unknown                     sav1866
CAA64891.1          multidrug       Unknown                     mtrE
BAC11911.1          multidrug       Multi-drug efflux pumps     emea
WP_000725529.1      beta-lactam     Unknown                     mecC
AAC75138.1          multidrug       Unknown                     mdtD

BacMet (version: 2.0) (Download example here || Database link here)

consists of 607 features annotated at two functional levels (Mechanism and Class). Note: annotations are collected only for experimentally confirmed biocide and metal resistance genes, and functional annotations follow the simple, acyclic and hierarchical classification scheme designed for BacMet in MegaRes 2.0.

#ANNOTATION     Mechanism                                 Class
abeM            Drug and biocide MATE efflux pumps        Drug and biocide resistance
abeS            Drug and biocide SMR efflux pumps         Drug and biocide resistance
abuO            Multi-biocide RND efflux pump             Multi-biocide resistance
acn             Iron resistance protein                   Iron resistance
acr3            Arsenic resistance membrane transporter   Arsenic resistance
acrA            Drug and biocide RND efflux pumps         Drug and biocide resistance
acrB            Drug and biocide RND efflux pumps         Drug and biocide resistance
acrC            Drug and biocide RND efflux pumps         Drug and biocide resistance

Antimicrobial peptide (AMP) dataset (Download example here || Database link here)

consists of 131 features annotated at the Mechanism functional level.

#ANNOTATION     Mechanism
almE            Target alteration
almF            Target alteration
almG            Target alteration
amiA            Target alteration
amiC            Target alteration
anrA            Efflux
anrB            Efflux
apsS            Regulation

Metadata File (Download: .txt or .csv)

The metadata file also uses tab-separated (.txt) or comma-separated values (.csv) format. Sample names or IDs are listed in the first column, which begins with "#NAME" in the first row. Samples are arranged in rows and metadata types (experimental factors, e.g. Treatment) in columns. Please consider the following points when formatting the metadata file:

Use the same sample names or IDs as in your input resistome or microbiome abundance table;
Data values (metadata labels) should be discrete and qualitative (e.g. HIGH, MED, LOW);
The file must not contain any blank cells or NA values;
Make sure that neither metadata type names nor metadata labels contain tabs or commas, as these are used as delimiters.

Example:

#NAME       Treatment       TimePoint   Gender
Sample1     Control         Day_0       M
Sample2     Control         Day_0       F
Sample3     Control         Day_11      M
Sample4     Control         Day_11      F
Sample5     Antibiotics     Day_0       M 
Sample6     Antibiotics     Day_0       F 
Sample7     Antibiotics     Day_11      M 
Sample8     Antibiotics     Day_11      F
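The metadata rules can be checked together with the abundance table's sample names. The Python sketch below is a hypothetical pre-upload check (`check_metadata` is our own name, not a ResistoXplorer function). It reports a missing "#NAME" header, rows with the wrong number of labels, blank or NA labels, duplicate sample names, and samples present in only one of the two files. It does not try to judge whether labels are "discrete and qualitative", which depends on the study design.

```python
import csv
import io
from typing import List, Sequence


def check_metadata(text: str, abundance_samples: Sequence[str],
                   delimiter: str = "\t") -> List[str]:
    """Return human-readable problems found in a metadata table (empty list if none)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows or rows[0][0].strip() != "#NAME":
        return ['first row must start with "#NAME"']

    n_columns = len(rows[0])
    problems: List[str] = []
    meta_samples: List[str] = []
    for row in rows[1:]:
        sample = row[0].strip()
        meta_samples.append(sample)
        if len(row) != n_columns:
            problems.append(f"{sample}: expected {n_columns - 1} labels, got {len(row) - 1}")
        if any(cell.strip() in ("", "NA") for cell in row[1:]):
            problems.append(f"{sample}: blank or NA label")

    if len(set(meta_samples)) != len(meta_samples):
        problems.append("sample names in the metadata must be unique")
    missing = [s for s in abundance_samples if s not in meta_samples]
    extra = [s for s in meta_samples if s not in abundance_samples]
    if missing:
        problems.append("samples without metadata: " + ", ".join(missing))
    if extra:
        problems.append("metadata samples absent from the abundance table: " + ", ".join(extra))
    return problems
```

Applied to the example above (saved as a tab-delimited file) together with the sample names Sample1 to Sample8, the check returns an empty list.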