RBPmap - Motifs Analysis and Prediction of RNA Binding Proteins

RBPmap Manual

Input
Motif selection
Advanced options
Job submission options
Sample data
Results

Input

Genome and Database Assembly: RBPmap enables full functionality for the following genomes:

Human
- December 2013 (GRCh38/hg38) assembly, provided by the Genome Reference Consortium.
- February 2009 (GRCh37/hg19) assembly, provided by the Genome Reference Consortium.
- March 2006 (NCBI36/hg18) assembly, provided by the International Human Genome Sequencing Consortium.
Mouse
- December 2011 (GRCm38/mm10) assembly, provided by the Mouse Genome Reference Consortium.
- July 2007 (NCBI37/mm9) assembly, provided by NCBI and the Mouse Genome Sequencing Consortium.
Drosophila melanogaster
- April 2006 (BDGP R5/dm3) assembly, provided by the Berkeley Drosophila Genome Project (BDGP). Release 5.12 annotations (Oct. 2008) were provided by FlyBase.

For other genomes, the calculation is performed directly on the input sequence, without taking into account genomic information.

Query sequences: RBPmap's mandatory input is a DNA/RNA sequence, or a list of sequences, to which the selected motifs are mapped. The query sequences can be given in one of two formats:

Sequences: In FASTA format (view example).
Genomic Coordinates: In the following format: chromosome:start-end:strand (view example).
Genomic coordinates input is supported for the human, mouse and Drosophila genomes only.
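
The coordinate format above can be checked before submission. The following is a minimal Python sketch, not part of RBPmap itself. It assumes the strand is written as '+' or '-' and that positions are plain integers without separators; the names Region and parse_region are hypothetical.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """One genomic region in chromosome:start-end:strand form (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    strand: str


# Assumption: strand is '+' or '-', and positions contain digits only.
_COORD_RE = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+):(?P<strand>[+-])$"
)


def parse_region(text: str) -> Region:
    """Parse 'chromosome:start-end:strand' and reject malformed entries."""
    match = _COORD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in chromosome:start-end:strand format: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return Region(match["chrom"], start, end, match["strand"])
```

For example, parse_region("chr1:1000-1200:+") returns a Region with chrom "chr1", start 1000, end 1200 and strand "+".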

Sequence length: The minimum sequence length is 21 bp and the maximum is 10,000 bp. If you have longer sequences, please divide them into segments of up to 10,000 bp.
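
One possible way to divide a long sequence is sketched below in Python. This is an illustration only, not an RBPmap tool: the function name split_sequence is hypothetical, and moving bases from the previous segment so that the last segment reaches 21 bp is an assumption of this sketch.

```python
MIN_LEN = 21       # minimum sequence length stated in this manual
MAX_LEN = 10_000   # maximum sequence length stated in this manual


def split_sequence(seq: str, max_len: int = MAX_LEN, min_len: int = MIN_LEN) -> list[str]:
    """Split seq into consecutive, non-overlapping segments of at most max_len."""
    if len(seq) < min_len:
        raise ValueError(f"sequence is shorter than {min_len} bp")
    segments = [seq[i:i + max_len] for i in range(0, len(seq), max_len)]
    # If the last segment is too short, move bases over from the end of the
    # previous segment so that every segment is at least min_len long.
    if len(segments) > 1 and len(segments[-1]) < min_len:
        shortfall = min_len - len(segments[-1])
        segments[-1] = segments[-2][-shortfall:] + segments[-1]
        segments[-2] = segments[-2][:-shortfall]
    return segments
```

Because the segments do not overlap, a binding site that spans a cut point can be missed. Overlapping the segments by a few bases is an alternative if that matters for your data.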

Maximal number of sequences: The maximal number of entries per RBPmap run is 5,000. If you have more sequences, please divide them into several RBPmap jobs.
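
Dividing a large input into several jobs can be sketched in Python as follows. The helper name batch_entries is hypothetical, and the sketch only groups entries; it does not submit anything to RBPmap.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_ENTRIES = 5_000  # maximal number of entries per run stated in this manual


def batch_entries(entries: Sequence[T], size: int = MAX_ENTRIES) -> Iterator[Sequence[T]]:
    """Yield consecutive batches of at most size entries, one batch per job."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(entries), size):
        yield entries[i:i + size]
```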

Motif selection

The user can select the motifs of interest from the RBPmap database (view list of motifs) and/or enter custom motifs.
Note that all combinations are allowed: only database motifs, only user-defined motifs, or both, as long as at least one motif is defined for the RBPmap run.

Search RBPmap motifs database: RBPmap includes a database of 274 RNA-binding motifs from human, mouse and Drosophila melanogaster that were extracted from the literature. Main references:

Dominguez D., Freese P., Alexis M.S., Su A., Hochman M., Palden T., Bazile C., Lambert N.J., Van Nostrand E.L., Pratt G.A., Yeo G.W., Graveley B.R., Burge C.B. (2018). Sequence, Structure, and Context Preferences of Human RNA Binding Proteins. Mol Cell., 70(5):854-867.
Ray,D., Kazan,H., Cook,K.B., Weirauch,M.T., Najafabadi,H.S., Li,X., Gueroussov,S., Albu,M., Zheng,H., Yang,A., et al. (2013) A compendium of RNA-binding motifs for decoding gene regulation. Nature, 499, 172–177.
Huelga,S.C., Vu,A.Q., Arnold,J.D., Liang,T.Y., Liu,P.P., Yan,B.Y., Donohue,J.P., Shiue,L., Hoon,S., Brenner,S., et al. (2012) Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Rep, 1, 167–178.
Hafner,M., Landthaler,M., Burger,L., Khorshid,M., Hausser,J., Berninger,P., Rothballer,A., Ascano,M.,Jr, Jungkamp,A.-C., Munschauer,M., et al. (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141, 129–141.
Akerman,M., David-Eden,H., Pinter,R.Y. and Mandel-Gutfreund,Y. (2009) A computational approach for genome-wide mapping of splicing factor binding sites. Genome Biol., 10, R30.

Searching the motifs database can be done in two ways:

Enter an RNA binding protein name, symbol or common alias and click the 'Search' button.
A pop-up window opens with the search results.
(Note that the browser's pop-up blocker has to be turned off for this site in order to view the search results.)
If the protein or the search phrase is not found in the RBPmap database, the full list of motifs is displayed and the user can select any motif(s) of interest.
Open the full list of motifs stored in the RBPmap database and select any motif(s) of interest.

User-defined motifs: The motifs can be provided as a consensus or as a PSSM (Position Specific Scoring Matrix); both options can be used together. The accepted motif length is 4-10 positions.

Consensus motifs: You may enter one or more motifs that contain IUPAC symbols only. Adding a protein name is optional (by default the protein name is set to 'User1', 'User2', etc.).
PSSM motifs: PSSM motifs (one or more) can be provided as a probability matrix file, in MEME format.
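
A consensus motif can be checked against the rules above (IUPAC symbols only, 4-10 positions) before it is entered. The Python sketch below is illustrative and is not how RBPmap scores motifs: it expands the standard IUPAC nucleotide codes into a regular expression for exact matching, whereas RBPmap uses its Weighted-Rank function. Treating T as U and the function name consensus_to_regex are assumptions of this sketch.

```python
import re

# Standard IUPAC nucleotide codes, written in the RNA alphabet.
# Assumption: T in the input is treated as U.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def consensus_to_regex(motif: str) -> re.Pattern:
    """Validate a consensus motif and compile it into an exact-match regex."""
    motif = motif.upper()
    if not 4 <= len(motif) <= 10:
        raise ValueError("motif length must be 4-10 positions")
    bad = [c for c in motif if c not in IUPAC]
    if bad:
        raise ValueError(f"non-IUPAC symbols in motif: {bad}")
    # A single-base code stays a literal; an ambiguity code becomes a character class.
    parts = (IUPAC[c] if len(IUPAC[c]) == 1 else f"[{IUPAC[c]}]" for c in motif)
    return re.compile("".join(parts))
```

For example, consensus_to_regex("YGCY") compiles the pattern [CU]GC[CU], which matches UGCU and UGCC in the sequence AUGCUUGCC.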

Advanced options

Stringency level: Two thresholds (significant and suboptimal) are used to calculate the Weighted-Rank function and to control how stringently the results are filtered. Select one of the following options to set the stringency level:

Default stringency: P-value[significant] < 0.01 and P-value[suboptimal] < 0.02
High stringency: P-value[significant] < 0.005 and P-value[suboptimal] < 0.01
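
The two stringency levels can be written down as a small lookup table. The Python sketch below only illustrates how a match P-value compares with the two thresholds listed above. How RBPmap combines significant and suboptimal matches inside the Weighted-Rank function is not described in this manual, so the labels returned by the hypothetical classify_match function are an assumption of this sketch.

```python
# (significant, suboptimal) P-value thresholds, as listed in this manual.
STRINGENCY = {
    "default": (0.01, 0.02),
    "high": (0.005, 0.01),
}


def classify_match(p_value: float, level: str = "default") -> str:
    """Label a match P-value relative to the thresholds of a stringency level."""
    significant, suboptimal = STRINGENCY[level]
    if p_value < significant:
        return "significant"
    if p_value < suboptimal:
        return "suboptimal"
    return "below threshold"
```

With these values, a match with P-value 0.008 is labeled significant at default stringency but only suboptimal at high stringency.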

Conservation filter: Because regulatory regions tend to be conserved, RBPmap can apply an additional filter that uses the UCSC phyloP conservation of placental mammals (for the human and mouse genomes) or the phastCons conservation of insects (for the D. melanogaster genome). When the conservation filter is applied, regions whose average conservation score is lower than the average conservation score of regulatory regions are not included in the final results.
By default, this filter is not applied, since it may be time-consuming (depending on the number of input sequences). We recommend applying it to increase the specificity of the results, especially for repetitive motifs.

Job submission options

E-mail address: This field is optional, but it must be filled in if you want to receive a link to the results page by e-mail. If you don't get an e-mail from RBPmap within a reasonable time, check your spam folder; the message may have landed there by mistake.

Job name: An optional parameter that lets you give your job an informative name; otherwise, the job gets a unique numeric identifier.

Note: A user can run up to 5 jobs simultaneously. When this limit is exceeded, the newly submitted job waits until one of the running jobs finishes and is then executed.

Sample data

The 'Load sample data' button lets you try out RBPmap by automatically filling all the fields required for an RBPmap run with sample data. To actually run RBPmap, click the 'Submit' button.
By default, the selected motifs and the advanced parameters are filled in to fit the example sequence, but you can change any parameter to explore the different options of RBPmap.

Results

The RBPmap results page presents the results for each input sequence in two ways: (view results page example)

A summary of the binding sites predictions.
A visualized display of the binding sites using the UCSC Genome Browser.

If there is more than one valid input sequence, an additional file that summarizes the motif predictions for all the input sequences together is provided as well. (View all sequences summary example)

Note: When the input is given as FASTA sequences, RBPmap uses BLAT to determine the query sequence coordinates in the human, mouse and Drosophila genomes. If BLAT does not find a match with at least 95% identity for the query sequence, or if the sequence comes from another genome, RBPmap results cannot be displayed in the Genome Browser.

Binding sites predictions summary (view example):
This is a summary table of the predicted binding sites, their location on the input sequence and a measure of their reliability. Each protein is shown separately, with all occurrences of its ascribed motifs listed together and ordered by sequence position. The table is available both as a web-based presentation and as a downloadable text file.
Note: If there are more than 100 valid entries, the summary table is provided as a downloadable text file only.

RBPmap presents only the hits that have passed all the filtering stages of the algorithm:

The hit's initial match score must pass the significant and suboptimal thresholds, controlled by the stringency level.
The hit's WR score must be significantly greater (with P-value<0.05) than the mean score of the background, calculated for the relevant genomic region.
If the conservation filter is applied, the mean conservation score of the hit's region must not be lower than the mean conservation score of regulatory regions.

The summary table consists of the following information:

Sequence Position: The starting position of the binding site in the input sequence (relative to the sequence itself and not to the genomic position).

Genomic Coordinate: The genomic coordinate from which the binding site starts (the coordinates in RBPmap are 1-based, meaning that the starting and ending positions are both included in the calculated sequence).
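
Because the coordinates are 1-based and inclusive, the length of a region is end - start + 1. The short Python sketch below shows this arithmetic together with a conversion to the 0-based, half-open convention used by formats such as BED. The function names are hypothetical, and the conversion is included only as a convenience for users who combine RBPmap output with such formats.

```python
def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive region: both start and end are counted."""
    return end - start + 1


def to_zero_based_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert 1-based inclusive coordinates to 0-based half-open ones."""
    return start - 1, end
```

For example, the 1-based region 101-121 is 21 bp long and corresponds to the 0-based half-open interval (100, 121).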

Motif: The motif that is mapped to the query sequence (several proteins have more than one motif). All the motifs are in consensus representation to simplify the results display.

K-mer occurrence: The exact occurrence of the motif in the query sequence. In the web-based presentation of the summary table, the occurrence of the k-mer is displayed in the context of its flanking sequences (25 bp on each side). The k-mer itself is color-coded using WebLogo colors.

Z-score and P-value: The Z-score (standard score) measures the deviation of the hit's WR score from the mean. The mean WR score was calculated using specific background datasets.
The P-value represents the probability of obtaining a specific Z-score considering a normal distribution (one-tailed).
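
The relation between the Z-score and the one-tailed P-value can be sketched in Python with the standard library only. This is a generic calculation, not RBPmap's own code. It assumes that the deviation from the background mean is measured in units of the background standard deviation, as in a standard score; the function and parameter names are hypothetical.

```python
import math


def z_score(wr_score: float, background_mean: float, background_sd: float) -> float:
    """Standard score of a WR score (assumes a background mean and standard deviation)."""
    return (wr_score - background_mean) / background_sd


def one_tailed_p(z: float) -> float:
    """Upper-tail probability P(Z >= z) under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Under this calculation, a Z-score of about 1.645 corresponds to a one-tailed P-value of 0.05, and a Z-score of 0 corresponds to 0.5.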

UCSC Genome Browser visualization (view example):

The Genome Browser visualizes the predicted motif binding sites on the input sequence. Each track represents a protein, and the predicted binding sites are displayed at their first genomic position.

Note: The Genome Browser displays only the plus strand. If your sequence is mapped to the minus strand, read the Genome Browser results from right to left.

The custom tracks of the predicted binding sites are initially presented in dense mode. The hits are colored in different shades of gray according to their Z-score (a higher score is darker). Clicking on a track's title opens it in full mode. The crossing line represents the P-value=0.05 cutoff.
In addition to the RBPmap custom tracks, the default presentation includes the 'GENCODE v32' track (or 'UCSC Genes' in older genome assemblies) in dense mode.

For more information about the UCSC Genome Browser options, go to the Genome Browser User Guide.

