Overview

3USS (3'Utr Sequence Seeker) is a web server that retrieves the 3'UTR genomic coordinates and nucleotide sequences of transcripts assembled by standard RNA-seq analysis protocols.

The tool lets users check whether the 3'UTRs in their RNA-seq assemblies differ (are shorter or longer) from the annotated 3'UTRs in a Reference, which extends the annotation-matching functionality of existing assembly pipelines. The alternative 3'UTRs are also compared with those in other databases, in order to identify Putative Novel (not already known) 3'UTRs.
It is also possible to compare the 3'UTRs of one assembly (for a specific experiment, sample or biological condition) with those of another, to detect alternative 3'UTR sequences that occur in a specific biological context.

To understand how 3USS works, it is possible to explore three examples (for one RNA-seq assembly, for two RNA-seq assemblies and for 3'UTRs of already known transcripts). For each of them, you can click on the input files to download them, check their formats and run the job, or you can directly click on the corresponding Results page.

The execution time depends on the volume of input transcriptome data; typical experiments take between 5 and 10 minutes of computing time.
If an email address is provided (optional), the server will send a link to the Results page.

Input Data

From the Home page, you can choose (upper window: Retrieving alternative 3'UTRs (longer or shorter compared to the Reference) from RNA-seq assembled transcripts) whether to retrieve the 3'UTRs of transcripts assembled from ONE RNA-seq experiment only, or from TWO different experiments together with a comparison between them.

Although the tool was designed to handle RNA-seq assembled transcripts, it also provides another utility (bottom window: Retrieving 3'UTRs of already Annotated transcripts) that makes it easy to retrieve 3'UTR genomic coordinates and nucleotide sequences from Transcript Annotation files.

In both cases, you need to select from the Menu the organism and the version of its genome (e.g. Mus musculus (mm9)).
Be aware that this is the genome from which the nucleotide sequences of the 3'UTRs will be extracted. Currently the choice is among 8 species: H. sapiens, M. musculus, R. norvegicus, C. familiaris, B. taurus, G. gallus, D. melanogaster, C. elegans.

Retrieving alternative 3'UTRs (longer or shorter compared to the Reference) from RNA-seq assembled transcripts

You have to upload as input the transcriptome assembly file obtained by standard RNA-seq data analysis protocols (see the scheme of 3USS Input data from RNA-seq experiments at the end of this section).
The uploaded RNA-Seq GTF transcriptome assembly file must already have been compared to a Reference Annotation using Cuffcompare or Cuffmerge, programs that compare your assembled transcripts to a Reference Annotation from UCSC, NCBI, Ensembl or other sources.

The 3USS input files:
     - combined.gtf as the RNA-Seq GTF transcriptome assembly file (Cuffcompare output)
     - Reference_Annotation.gtf as the Reference Annotation GTF file
or:
     - merged.gtf as the RNA-Seq GTF transcriptome assembly file (Cuffmerge output)
     - Reference_Annotation.gtf as the Reference Annotation GTF file

Check the format of the Cuffcompare output file (combined.gtf) or the Cuffmerge output file (merged.gtf): to be used properly, it must contain the class_code field.
For any information about Cuffcompare or Cuffmerge packages, visit the manual pages: http://cole-trapnell-lab.github.io/cufflinks/manual/.
In the standard analysis protocol, Cuffcompare is used to compare one sample assembly file to a Reference.
In the standard analysis protocol, Cuffmerge is used to merge several replicate/sample assembly files; it then handles running Cuffcompare for you.
The typical command lines for both programs are given below.
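Before uploading, the presence of the class_code field can be checked locally. The following is only a minimal sketch, not part of 3USS: it assumes a standard tab-separated GTF whose ninth column holds attributes written as key "value"; pairs, and the helper names open_gtf and class_code_counts are hypothetical.

```python
import gzip
import re
from collections import Counter

# Matches an attribute such as: class_code "=";
CLASS_CODE = re.compile(r'class_code "([^"]*)"')


def open_gtf(path):
    """Open a plain or gzip-compressed GTF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def class_code_counts(lines):
    """Count class_code values over the feature lines of a GTF.

    Lines without a class_code attribute are counted under the key None.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF feature line
        match = CLASS_CODE.search(fields[8])
        counts[match.group(1) if match else None] += 1
    return counts


# Usage (file name taken from the text above):
# with open_gtf("combined.gtf") as handle:
#     print(class_code_counts(handle))
```

If every feature line ends up under the key None, the file was probably not produced by Cuffcompare or Cuffmerge and is unlikely to be usable as input.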

The typical command line to run Cuffcompare (and produce the 3USS input file) is:
cuffcompare -r Reference_Annotation.gtf -o My_Experiment transcripts_cufflinks.gtf

where:
transcripts_cufflinks.gtf is the input (the transcriptome assembly file from programs such as Cufflinks or Scripture)
Reference_Annotation.gtf is the reference annotation transcriptome for comparison
My_Experiment.combined.gtf is the output file, containing the standard comparison between assembled and annotated transcripts.

The 3USS user can use as Input:
My_Experiment.combined.gtf as RNA-seq transcriptome assembly file
and:
Reference_Annotation.gtf as Reference Transcriptome used for the RNA-seq analysis


The typical command line to run Cuffmerge (and produce the 3USS input file) is:
cuffmerge -g Reference_Annotation.gtf assembly_list.txt
Cuffmerge produces merged.gtf, a GTF file that contains an assembly that merges together the input assemblies.

where:
Reference_Annotation.gtf is the Reference Annotation transcriptome for comparison
assembly_list.txt is a text file with a list (one per line) of GTF files that you'd like to merge together into a single GTF assembly file
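The assembly_list.txt file can be written by hand or generated. As a small sketch (the function name assembly_list_text and the sample paths are hypothetical), the one-path-per-line layout described above can be produced like this:

```python
def assembly_list_text(gtf_paths):
    """Return the content of an assembly list: one GTF path per line."""
    paths = [str(path).strip() for path in gtf_paths]
    if any(not path for path in paths):
        raise ValueError("empty path in assembly list")
    return "\n".join(paths) + "\n"


# Usage with hypothetical per-sample assembly files:
# from pathlib import Path
# text = assembly_list_text(["sample_1/transcripts.gtf", "sample_2/transcripts.gtf"])
# Path("assembly_list.txt").write_text(text)
```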

The 3USS user can use as Input:
merged.gtf as RNA-seq transcriptome assembly file
and:
Reference_Annotation.gtf as Reference Transcriptome used for the RNA-seq analysis

The server also accepts both input files in compressed form (.zip or .gz). The recommended approach is to upload the Reference Annotation file that Cuffcompare or Cuffmerge used to produce your input file. Alternatively, you can choose the Annotation Database from a drop-down menu.
The tool provides sequence and annotation data from Illumina iGenomes, the repository of sequence and annotation files recommended by Cufflinks. For each species, the latest genomic build (up to May 15, as reported on the website) and the corresponding annotation data available from NCBI and Ensembl were downloaded. UCSC annotation files were instead downloaded directly from the UCSC Genome Browser website, in order to keep the corresponding UCSC transcript codes (e.g. uc029qmz.1); the UCSC annotation files in the iGenomes repository use different codes (e.g. NM_001195662). The tool also provides Annotations from Gencode: Gencode version 19 for Homo sapiens (hg19) and Gencode version M1 for Mus musculus (mm9).
For the tool to work properly, the selected Annotation source must correspond exactly to the Reference Annotation used during the RNA-seq analysis (by Cuffcompare/Cuffmerge). Otherwise, the Results may be completely inconsistent.

Comparing two experiments

If you want to compare the results of two different biological samples, one combined.gtf (or merged.gtf) file is required as input for each sample (experiment).
Click the button in the upper-right corner to upload the second RNA-seq assembly file.

Examples

As you can see, the upper part of the Input window contains links to two examples: Ex_1 and Ex_2, for one RNA-seq assembly and for two RNA-seq assemblies, respectively. For each of them, you can click on the input files to download them (they contain a README file and the input data files) and run the job, or you can directly click on the corresponding Results page. In particular, Ex_1 analyzes public experimental data downloaded from the NCBI GEO database.

Scheme of 3USS Input data from RNA-seq experiments

Retrieving 3'UTRs of already Annotated transcripts

To retrieve the 3'UTR genomic coordinates and nucleotide sequences of already known transcripts, you can upload your GTF Annotation file, either as is or compressed (.zip or .gz).
The corresponding organism (and genome) must be selected correctly, because this is the genome from which the nucleotide sequences of the 3'UTRs will be extracted. Otherwise, the Results will be completely inconsistent.
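How 3USS derives the 3'UTR internally is not described here. As an illustration only, the sketch below assumes that the 3'UTR is the exonic sequence downstream of the stop codon, that coordinates are 1-based and inclusive as in GTF, and that cds_start and cds_end delimit the coding region including the stop codon. The function names are hypothetical.

```python
def three_prime_utr(strand, exons, cds_start, cds_end):
    """Return the 3'UTR as a list of 1-based inclusive (start, end) segments.

    exons: list of (start, end) tuples for one transcript.
    cds_start, cds_end: lowest and highest genomic coordinate of the coding
    region, assumed here to include the stop codon.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    segments = []
    for start, end in sorted(exons):
        if strand == "+" and end > cds_end:
            # Exonic bases to the right of the coding region
            segments.append((max(start, cds_end + 1), end))
        elif strand == "-" and start < cds_start:
            # On the minus strand, the 3' end lies at lower coordinates
            segments.append((start, min(end, cds_start - 1)))
    return segments


def utr_length(segments):
    """Total number of bases covered by the segments."""
    return sum(end - start + 1 for start, end in segments)
```

With toy exons (100-200, 300-400, 500-600) and a coding region spanning 150-350, the plus-strand 3'UTR would be 351-400 plus 500-600, while the minus-strand 3'UTR would be 100-149.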

As you can see, in the upper part of the Input window there is the link to one example: Ex_3. You can click on the input files to download them, check their formats and run the job, or you can directly click on the corresponding Results page.



Results

Retrieving alternative 3'UTRs (longer or shorter compared to the Reference) from RNA-seq assembled transcripts

The server processes the input files and immediately returns some preliminary Statistics.
First, it shows the total number of assembled transcripts from the RNA-seq experiment(s) and, in particular, how many of them match the intron chain of a protein-coding transcript annotated in the Reference Annotation. It then compares their 3'UTRs, reporting how many assembled transcripts have a different 3'UTR (shorter or longer) from the corresponding Reference transcript and how many of these are also not annotated in other databases (Putative Novel). Each Putative Novel 3'UTR is compared against all the other 3'UTRs in the same Reference and against all the 3'UTRs stored in the Annotation Databases available for each organism (Ensembl, NCBI, UCSC, Gencode, as described in detail above).

In the Results part, some output files, reporting exhaustive information about 3'UTRs, are provided:
- a multi-fasta file of the nucleotide sequences corresponding to the alternative 3'UTRs; each header contains:
the assembled transcript id__the corresponding Reference transcript id__the corresponding gene name;the strand;the assembled 3'UTR genomic coordinates (for each exon);the 3'UTR sequence length
(ex: >TCONS_00000011__FBtr0078104__CG11377;+;chr2L:103764-105332;1569)
- a tab-file containing: Gene_name, Reference_transcript id, Assembled_transcript id and the 3'UTR difference in length (Assembled_3'UTR length - Reference_3'UTR length), with rows reported in descending order of this difference
- a tab-file reporting whether each alternative 3'UTR is annotated in other databases (in which case the database name and the associated transcript id are given) or not (Putative Novel)
- a tab-file listing the transcripts that have no 3'UTR in the RNA-seq assembly file (unlike their Reference counterparts)
- a GTF-file of the assembled transcripts (with retrieved start- and stop-codons) whose alternative 3'UTRs are identified as Putative Novel (this file can be saved and uploaded to a genome browser, where the corresponding transcripts are shown in pink); a link is provided to upload the file to the UCSC Genome Browser and see the results immediately
- a GTF-file of the assembled transcripts (with retrieved start- and stop-codons) whose alternative 3'UTRs are identified in other databases (this file can be saved and uploaded to a genome browser, where the corresponding transcripts are shown in light green); a link is provided to upload the file to the UCSC Genome Browser and see the results immediately.
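The header layout of the alternative 3'UTR multi-fasta file listed above can be unpacked with a few string splits. This is a sketch based only on the example header shown; the names AltUtrHeader, parse_alt_utr_header and single_interval_length are hypothetical, and because the separator used between the coordinates of multi-exon 3'UTRs is not specified here, the length check handles a single interval only.

```python
from typing import NamedTuple


class AltUtrHeader(NamedTuple):
    assembled_id: str
    reference_id: str
    gene_name: str
    strand: str
    coordinates: str  # kept as text; multi-exon layout is not assumed
    length: int


def parse_alt_utr_header(header: str) -> AltUtrHeader:
    """Split a header of the form id__id__gene;strand;coordinates;length."""
    ids, strand, coordinates, length = header.strip().lstrip(">").split(";")
    assembled_id, reference_id, gene_name = ids.split("__", 2)
    return AltUtrHeader(
        assembled_id, reference_id, gene_name, strand, coordinates, int(length)
    )


def single_interval_length(coordinates: str) -> int:
    """Length of one 'chrom:start-end' interval, 1-based and inclusive."""
    _chrom, span = coordinates.rsplit(":", 1)
    start, end = span.split("-")
    return int(end) - int(start) + 1


record = parse_alt_utr_header(
    ">TCONS_00000011__FBtr0078104__CG11377;+;chr2L:103764-105332;1569"
)
```

For the example header, the interval chr2L:103764-105332 spans 1569 bases, which agrees with the declared sequence length and suggests that the coordinates are 1-based and inclusive.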

All these data, where appropriate, are also provided for all the protein-coding transcripts assembled from the RNA-seq experiment (those sharing the intron chain with known protein-coding transcripts). In this case, both transcripts whose 3'UTR exactly matches that of the annotated Reference transcript and transcripts with alternative 3'UTRs are considered. The complete GTF file of the assembled transcripts is provided (this file can be saved and uploaded to a genome browser, where the corresponding transcripts are shown in dark green), together with a link to upload the file to the UCSC Genome Browser and see the results immediately.

Example results for ONE experiment (Ex_1):

Comparing two experiments

If you used two different RNA-seq assembly files as input, the Results page consists of two parts.

The upper part shows the Results for each experiment, as described above.
In addition, the bottom part shows the results for their intersection: a Venn diagram illustrates how the transcripts with alternative 3'UTRs are distributed across the two experiments (to download the picture, right-click on it). You are provided with a list of the transcripts with alternative 3'UTRs in only one specific experiment, a list of the transcripts with alternative 3'UTRs in both experiments and, among the latter, a list of the transcripts with an identical alternative 3'UTR in both experiments. For each case, a list of the corresponding genes is provided, which can be used directly for further analysis (e.g. biological function enrichment analysis).
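The partition shown in the Venn diagram amounts to a few set operations. The sketch below is not the server's implementation: it assumes each experiment is summarized as a dictionary that maps a transcript id shared between the experiments to a description of its alternative 3'UTR (for example its coordinates), and the function name and toy ids are hypothetical.

```python
def compare_experiments(utrs_1, utrs_2):
    """Partition transcripts with alternative 3'UTRs across two experiments.

    utrs_1, utrs_2: dict mapping a transcript id to a hashable description
    of its alternative 3'UTR (assumed comparable across experiments).
    """
    ids_1, ids_2 = set(utrs_1), set(utrs_2)
    shared = ids_1 & ids_2
    return {
        "only_1": ids_1 - ids_2,
        "only_2": ids_2 - ids_1,
        "both": shared,
        # Subset of "both" whose alternative 3'UTR is the same in each experiment
        "identical": {tx for tx in shared if utrs_1[tx] == utrs_2[tx]},
    }


# Toy input, not real data
experiment_1 = {"tx_A": "chr1:100-250", "tx_B": "chr1:900-990"}
experiment_2 = {"tx_B": "chr1:900-990", "tx_C": "chr2:10-80", "tx_A": "chr1:100-300"}
result = compare_experiments(experiment_1, experiment_2)
```

In this toy case tx_A and tx_B have alternative 3'UTRs in both experiments, but only tx_B has the same one in each; tx_C appears in the second experiment only.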

Example results for TWO experiments (Ex_2):

Retrieving 3'UTRs of already Annotated transcripts

The tool processes the input file and returns some preliminary Statistics.
It shows the total number of annotated transcripts and, among them, the protein-coding transcripts. In particular, it indicates how many of them are annotated without 3'UTR.

In the Results part, some output files, reporting exhaustive information about 3'UTRs, are provided:
- a multi-fasta file of the 3'UTR nucleotide sequences; each header contains:
the transcript id__the corresponding gene name;the strand;the 3'UTR genomic coordinates (for each exon);the 3'UTR sequence length
(ex: >NM_000014.4__A2M;-;chr12:9220304-9220418;115)
- the list of the protein-coding transcripts without 3'UTR in the Annotation
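A downloaded multi-fasta file can be sanity-checked by comparing the length declared in each header with the length of the sequence that follows it. This is a minimal sketch that relies only on the length being the last semicolon-separated field of the header, as in the example above; read_fasta and length_mismatches are hypothetical names.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from the lines of a multi-fasta file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def length_mismatches(lines):
    """Return the headers whose declared length differs from the sequence length."""
    mismatches = []
    for header, sequence in read_fasta(lines):
        declared = int(header.rsplit(";", 1)[1])
        if declared != len(sequence):
            mismatches.append(header)
    return mismatches


# Usage (hypothetical file name):
# with open("3utr_sequences.fasta") as handle:
#     print(length_mismatches(handle))
```

An empty result means every record is internally consistent; it does not confirm that the correct genome was selected.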

Example results for Annotated transcripts (Ex_3):

