About RefEx

RefEx (Reference Expression dataset; https://refex.dbcls.jp) is a web tool for browsing reference gene expression data. It provides access to curated data from several public databases, with expression levels in forty tissues measured by four well-established gene expression quantification technologies. The web interface allows users to browse expression profiles by gene name, various types of ID, chromosomal region on genetic maps, gene family (based on InterPro), gene expression pattern, or biological category (based on Gene Ontology), and to compare expression profiles obtained by different methods at a glance.

RefEx provides reference gene expression datasets for 40 normal tissues and cells of human, mouse, and rat. The forty tissues were selected based on experience gained while constructing the BodyMap database. They are classified into 10 groups: brain, blood, connective, reproductive, muscular, alimentary, liver, lung, urinary, and endo/exocrine. These groups are mainly used to summarize gene expression profiles in the summary view and to infer gene function from expression profiles. The collected gene expression data were measured with four different strategies: ESTs, Affymetrix GeneChip, CAGE, and RNA-Seq. In RefEx, these four types of data are linked by NCBI Gene ID.
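Linking the four data types by NCBI Gene ID amounts to a join on that key. The Python sketch below is illustrative only: it assumes each method's processed data have already been loaded as a mapping from Gene ID to per-tissue values. The function name `link_by_gene_id`, the placeholder gene identifiers, and the values are hypothetical and do not reflect RefEx's actual file format or code.

```python
from typing import Dict

# Hypothetical layout: method -> {Gene ID -> {tissue: expression value}}
TissueValues = Dict[str, float]
Profiles = Dict[str, Dict[str, TissueValues]]


def link_by_gene_id(profiles: Profiles) -> Dict[str, Dict[str, TissueValues]]:
    """Regroup per-method tables so that each Gene ID maps to {method: profile}."""
    linked: Dict[str, Dict[str, TissueValues]] = {}
    for method, table in profiles.items():
        for gene_id, tissue_values in table.items():
            # Genes missing from a method's table simply get no entry for it.
            linked.setdefault(gene_id, {})[method] = tissue_values
    return linked


if __name__ == "__main__":
    # Toy placeholder values, not real RefEx data.
    profiles = {
        "EST": {"gene_A": {"brain": 12.0, "liver": 3.0}},
        "GeneChip": {
            "gene_A": {"brain": 9.1, "liver": 6.4},
            "gene_B": {"brain": 5.0, "liver": 8.2},
        },
        "RNA-seq": {"gene_B": {"brain": 1.5, "liver": 40.0}},
    }
    linked = link_by_gene_id(profiles)
    print(sorted(linked["gene_A"]))  # ['EST', 'GeneChip']
    print(sorted(linked["gene_B"]))  # ['GeneChip', 'RNA-seq']
```

A gene measured by only some of the methods simply ends up with fewer entries, which is what allows profiles from different methods to be shown side by side for the same gene.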

How To Cite RefEx

The RefEx project needs your support!
Please cite the original RefEx paper when you use RefEx; this is critical to sustaining our project funding.

Ono H, Ogasawara O, Okubo K, Bono H
RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes
Scientific Data, 4:170105
DOI: 10.1038/sdata.2017.105


Features of RefEx

    • Reference of gene expression in normal organs throughout the body
      • By displaying side by side the gene expression data for 40 normal organs (10 major groups) obtained with four different experimental methods, RefEx lets you intuitively compare not only expression values but also the methods themselves. Users can examine the expression profiles of unfamiliar genes in normal body tissues, cells, and cell lines from actual measurement data, rather than only from descriptions in journal articles. Recently, we incorporated CAGE data from the FANTOM5 project into RefEx. FANTOM5 produced a broad atlas of gene expression in human and mouse; with these data, it is now possible to search more than five hundred human samples, encompassing cell lines, primary cells, and adult and fetal tissues.
    • Simple search interface for clear purposes
      • RefEx provides incremental search by gene name or gene symbol. Data in RefEx are also organized so that you can search for groups of genes belonging to a particular category, such as "transcription factor" or "G-protein-coupled receptor". In addition, RefEx provides lists of genes that are prominently expressed in a specific tissue relative to other tissues. These tissue-specific genes are identified for every tissue using the ROKU method. Clicking a tissue icon at the top of the RefEx page retrieves the genes with tissue-specific expression in that tissue.
    • Intuitive visualization for new knowledge discovery
      • RefEx shows relative gene expression values as choropleth maps on 3D human body images from BodyParts3D. This type of visualization helps users understand differences in gene expression patterns among tissues more intuitively. In addition, users can add up to three genes to their list and compare all the detailed information about them, including expression data, side by side. This parallel comparison makes differences among the genes easy to spot, so RefEx is also useful for investigating relationships among uncharacterized genes found in gene expression analyses.
    • A practical example of useful and reusable public data
      • The data in RefEx were manually collected from public databases by RefEx curators. Raw data from these databases were reorganized and compared against each other. RefEx is freely available to both academic and for-profit users under a Creative Commons Attribution 4.0 International License. Because this license is permissive, some users may prefer to download the data and analyze them locally with other software; for this purpose, a concatenated version of all the data is available from the Downloads page. The availability of such a reference dataset should benefit biologists who wish to reuse this type of data in their own research.
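    The ROKU method mentioned above ranks genes by how unevenly their expression is distributed across tissues. As published by its authors, ROKU centres each profile with a one-step Tukey biweight, scores the absolute deviations with Shannon entropy (low entropy suggests tissue specificity), and then applies an outlier-detection step to identify the specific tissues. The Python sketch below is a simplified stand-in, not ROKU itself: it centres on the median instead of the Tukey biweight and omits the outlier-detection step. The function name `specificity_entropy` and the toy profiles are hypothetical.

```python
import math
import statistics
from typing import Dict


def specificity_entropy(profile: Dict[str, float]) -> float:
    """Shannon entropy (bits) of absolute deviations from the profile's median.

    Simplified stand-in for ROKU: median instead of a one-step Tukey biweight,
    and no outlier-detection step. Lower entropy means the deviation is
    concentrated in fewer tissues (more tissue-specific).
    """
    values = list(profile.values())
    centre = statistics.median(values)
    deviations = [abs(v - centre) for v in values]
    total = sum(deviations)
    if total == 0:
        # Perfectly flat profile: treated as maximally non-specific (a convention).
        return math.log2(len(values))
    probs = [d / total for d in deviations if d > 0]
    return sum(-p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # Toy profiles, not RefEx data.
    specific = {"brain": 100.0, "liver": 1.0, "lung": 1.0, "muscle": 1.0}
    broad = {"brain": 10.0, "liver": 12.0, "lung": 8.0, "muscle": 11.0}
    print(specificity_entropy(specific))  # 0.0
    print(specificity_entropy(specific) < specificity_entropy(broad))  # True
```

    With only a handful of tissues, median centring can mislead: a gene that is high in exactly half of four tissues receives maximal entropy. This is one reason the sketch should not be read as a substitute for ROKU.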

    Four experimental methods in RefEx

    • EST (Expressed sequence tag)
      • Expressed sequence tag (EST) data were originally obtained from the EST division of the INSD (International Nucleotide Sequence Database, consisting of GenBank, DDBJ, and ENA). Following the BodyMap method, ESTs were counted by source organ according to the cDNA annotation of each EST entry. The EST data in RefEx come from the BodyMap-Xs database, in which gene expression data from the INSD EST division had previously been compiled for reuse.
    • GeneChip
      • GeneChip data were measured with Affymetrix microarrays (GeneChip®) and processed with a standard microarray data analysis method. For our reference dataset (tissue-specific patterns of mRNA expression), we extracted microarray data deposited in the NCBI GEO database. Gene expression values were calculated from the original CEL files using robust multi-array average (RMA) normalization with the affy package in R/Bioconductor.
    • CAGE (cap analysis of gene expression)
      • Cap analysis of gene expression (CAGE) is a technique that produces a snapshot of the 5′ ends of the mRNA population in a biological sample. CAGE data collected in the RIKEN FANTOM5 project were counted by source organ based on the original FANTOM5 data: the CAGE peak expression and annotation tables. CAGE tag counts mapped to the reference genome sequence reflect the expression level of the corresponding transcripts, and are normalized as tags per million (TPM). For the processed data in RefEx, each TPM value of the original FANTOM5 CAGE data was converted to log2(TPM + 1); the values were then organized by sample classification, and values assigned to the same GeneID were summed and averaged.
    • RNA-seq
      • We extracted normal-tissue transcriptome sequence data from the NCBI Sequence Read Archive (SRA). The corresponding expression levels and location data came from the Illumina Body Map 2.0 project for human and mouse transcriptomes. These data were processed with a typical RNA-seq analysis pipeline using TopHat and Cufflinks, and transcript abundances were calculated and normalized as fragments per kilobase of transcript per million mapped reads (FPKM).
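    The EST counting described above is essentially a tally of entries per gene and source organ. This minimal Python sketch assumes each EST has already been assigned to a gene and to an organ taken from its cDNA annotation. The input layout and the function name `count_ests_by_organ` are hypothetical, and the BodyMap method's own organ-assignment rules are not reproduced.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_ests_by_organ(entries: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tally ESTs per gene and source organ.

    Each entry is a hypothetical (gene_id, organ) pair: an EST already mapped
    to a gene, with its organ taken from the cDNA annotation.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for gene_id, organ in entries:
        counts[gene_id][organ] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy entries, not real INSD or BodyMap-Xs records.
    ests = [("gene_A", "brain"), ("gene_A", "brain"),
            ("gene_A", "liver"), ("gene_B", "liver")]
    table = count_ests_by_organ(ests)
    print(table["gene_A"]["brain"], table["gene_A"]["liver"])  # 2 1
    print(table["gene_B"]["brain"])  # 0 (Counter returns 0 for unseen organs)
```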
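    RMA, used above for the GeneChip data, consists of background correction, quantile normalization across arrays, and median-polish summarization of log2 probe intensities into probe-set values. As an illustration of just one of those steps, the Python sketch below performs quantile normalization on a small intensity matrix. It is not the affy implementation: ties are broken by position rather than averaged, and the function name `quantile_normalize` is hypothetical.

```python
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Force every array (column) to share the same intensity distribution.

    columns[j][i] is the intensity of probe i on array j. Ties are broken by
    position instead of being averaged (a simplification).
    """
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n_probes)]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n_probes), key=col.__getitem__)
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # toy intensities
    print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

    After normalization each array contains exactly the same set of values, while every probe keeps its rank within its own array.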
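    The CAGE processing described above can be sketched as follows. The text does not specify exactly how peaks and samples are combined, so this Python sketch assumes that the log2(TPM + 1) values of peaks sharing a GeneID are summed within each sample and then averaged over samples of the same classification. The actual RefEx pipeline, documented on its GitHub page, may differ. The input layout and the function name `cage_gene_profile` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def cage_gene_profile(
    peaks: List[Tuple[str, str, float]],
    sample_class: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Turn peak-level TPM into per-gene, per-classification values.

    peaks: hypothetical (gene_id, sample_id, tpm) rows, one per CAGE peak and sample.
    sample_class: hypothetical mapping sample_id -> sample classification.
    """
    # Step 1 (assumed): log2(TPM + 1) per peak, summed over peaks with the same GeneID.
    per_sample: Dict[Tuple[str, str], float] = defaultdict(float)
    for gene_id, sample_id, tpm in peaks:
        per_sample[(gene_id, sample_id)] += math.log2(tpm + 1)
    # Step 2 (assumed): average over samples sharing a classification.
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for (gene_id, sample_id), value in per_sample.items():
        grouped[(gene_id, sample_class[sample_id])].append(value)
    profile: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (gene_id, cls), values in grouped.items():
        profile[gene_id][cls] = mean(values)
    return dict(profile)


if __name__ == "__main__":
    # Toy values: two peaks for gene_A in sample s1, one peak in s2.
    peaks = [("gene_A", "s1", 3.0), ("gene_A", "s1", 1.0), ("gene_A", "s2", 7.0)]
    print(cage_gene_profile(peaks, {"s1": "liver", "s2": "liver"}))
    # {'gene_A': {'liver': 3.0}}
```

    Adding 1 before taking the logarithm keeps peaks with zero TPM at a value of 0 instead of negative infinity.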
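    FPKM, the unit used for the RNA-seq data above, divides a transcript's fragment count by its length in kilobases and by the total number of mapped fragments in millions, i.e. FPKM = fragments × 10^9 / (length_bp × total_mapped_fragments). The Python sketch below applies that formula to raw counts. Cufflinks itself estimates fragment assignments and effective transcript lengths statistically, so its values will differ. The function name `fpkm` and the example numbers are hypothetical.

```python
from typing import Dict, Optional


def fpkm(
    fragment_counts: Dict[str, int],
    lengths_bp: Dict[str, int],
    total_mapped: Optional[int] = None,
) -> Dict[str, float]:
    """FPKM = count / (length in kb) / (total mapped fragments in millions).

    If total_mapped is omitted, the sum of the given counts is used, which
    assumes every mapped fragment is assigned to one of these transcripts.
    """
    total = sum(fragment_counts.values()) if total_mapped is None else total_mapped
    if total <= 0:
        raise ValueError("total mapped fragments must be positive")
    return {
        tx: count * 1e9 / (lengths_bp[tx] * total)
        for tx, count in fragment_counts.items()
    }


if __name__ == "__main__":
    counts = {"tx1": 500, "tx2": 1500}    # toy fragment counts
    lengths = {"tx1": 1000, "tx2": 3000}  # toy transcript lengths (bp)
    print(fpkm(counts, lengths))  # {'tx1': 250000.0, 'tx2': 250000.0}
```

    The equal values for tx1 and tx2 show the length correction at work: tx2 has three times as many fragments but is also three times longer.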

    Original resources

    • EST
      • human, mouse, rat: INSD
    • GeneChip
      • human: GSE7307 (Human body index - transcriptional profiling)
      • mouse: GSE10246 (GNF Mouse GeneAtlas V3)
      • rat: GSE952 (Transcriptome analysis in rat)
    • CAGE
      • human: PRJDB3010 (A promoter level mammalian expression atlas (human, ChIP-Seq))
      • mouse: PRJDB1100 (FANTOM5 Mouse CAGE)
    • RNA-seq
      • human: PRJEB2445 (RNA-Seq of human individual tissues and mixture of 16 tissues (Illumina Body Map))
      • mouse: PRJNA30467 (Mapping and quantifying mammalian transcriptomes by RNA-Seq; brain, liver, muscle)

    Raw data processing

    Detailed information is available on the GitHub page.

    Processed data and sample annotations

    Processed data and sample annotations are available from the Downloads page of the RefEx website. Versions with DOIs for citation are also deposited in figshare.

    Web Browser Compatibility

    This site is optimized for the latest versions of the following web browsers:

    • Firefox
    • Safari
    • Google Chrome
    • Internet Explorer

    Acknowledgements

    Togo picture gallery, developed by the Database Center for Life Science (DBCLS), is used to graphically display species and organs (Togo picture gallery, © 2012 DBCLS, licensed under CC Attribution 2.1 Japan).

    BodyParts3D, developed by DBCLS, is used to display heatmaps on a 3D human body model (BodyParts3D, © 2012 DBCLS, licensed under CC Attribution-ShareAlike 2.1 Japan).

    We also thank Mehul Bharat Lunagariya, Divyansh singh, Bindiya Sardhara, Yuji Hayamizu, Tushar Vyas, Md. Nur A Alam Dipu, and NARASIMHA REDDY for reporting the cross-site scripting (XSS) vulnerability.

    Contact Us

    Please contact us for further details using our contact form.