CRCdb-A comprehensive database of human core regulatory circuitry

1.What is CRC?

Core transcriptional regulatory circuit (CRC) is comprised of a group of interconnected auto-regulating transcription factors (TFs) forming loops, and core TFs in CRCs have been proved highly valuable for cell-type-specific transcriptional regulation in healthy and disease cells. They can be computationally inferred on the base of enhancer clusters called super-enhancers (SEs).

Recently, one CRC database, dbCoRC, has been built to identify CRCs from mapping of SEs through running the CRC mapper software, which has become an effective data source for CRC investigation. However, dbCoRC only collected the most top ranked CRC for each sample. In fact, other high ranked CRCs should be effectively collected since they form interconnected auto-regulating loops. More importantly, the other effective CRC identification method, coltron, has been presented, which can significantly decrease false positives of CRCs. Especially, TFs can be scored and ranked through counting frequency of TFs, and in-degree and out-degree of TFs can be computed by coltron. It is quite important for identification of core TFs in CRCs.

Furthermore, with the development of researches in human disease and biology processes, more and more human H3K27ac ChIP-Seq data are accumulating rapidly, which promote an urgent need for comprehensive and effective collecting and processing these data to obtain CRCs in various human cell/tissue types. Taken together, more human samples and CRC identification/analysis strategies should be supported. Further detailed information such as TF frequency, in-degree and out-degree should be provided for biologists to deeper understanding of the role of CRCs.

2.What information is available in the CRCdb?

Here, we developed a comprehensive human core transcriptional regulatory circuit database (CRCdb, http://www.licpathway.net/crcdb/), which aims to identify CRCs for various cell/tissue types. The current version of CRCdb documented over 500 samples from NCBI, ENCODE and Roadmap. For all samples, CRCs were identified by using a unified system environment, taking advantages of ‘coltron’ and ‘CRC mapper’. The number of human samples have been increased nearly three-fold compared with dbCoRC and the scale of data is five times larger than dbCoRC. Especially, the number of CRCs in CRCdb is 8,611,549. CRCdb further provides detailed information of CRCs (the most representative CRC and all other CRCs), supporting the display of frequency of TFs in CRCs and in-degree/out-degree of TFs.

3.Database content and construction.

4.How to use the CRCdb?

The ‘Browse’ page is organized as an interactive and alphanumerically sortable table that allows users to quickly browse samples or TFs. Users can use the ‘Show entries’ drop-down menu to change the number of records per page. To view the CRCs for a given sample, users only need to click on the ‘Sample ID’ to view it. Besides, CRCdb also has TF-based browse. Users can browse related information about all TFs, which contains TF name, TF families, ensembl ID, entrez_ID and frequency of TFs.

CRCdb provides three query methods, including sample-based, TF-based, and tissue-category-based queries.

In the sample-based query, users can obtain the CRC result of interests by determining a sample of the query.

Users can see the most representative CRC and click on the gene to see the details of the gene. Usually the most representative CRC is top ranked CRC.

Frequency of TFs that appear in CRCs: The greater the number of CRCs, the higher the frequency at which the TF is represented in all CRCs. Therefore, this TF may be more important under this sample.

Indegree: the sum of the number of other TFs binding in the regulatory sequence of a TF; Outdegree: the number of other nodes a TF s itis regulating; Node connections between TFs are used to discern auto-regulatory cliques that in turn regulate a broader enhancer network.

Users can view the TF of the entire CRC of the sample, the amount of gene expression in tissues or cells. The expression level of transcription factor is FPKM

Users can see the details of a CRC by clicking the CRC detailed button. Including TF in the CRC, and the family of TF, SE associated with TFs, ranked plot for the SEs, expression of TFs in this CRC.

In the TF-based query, users can query a TF of interests, and then CRCdb will return all CRCs that match the TF–CRC relationship and distribution of TFs for all samples.

Genomic distribution of SEs associated with TF: It can roughly understand the genomic regions of SEs associated with TF, and help to carry out the next analysis, such as knocking down the experiment.

As for the tissue query, users can query all CRCs with a particular type of tissue.

In the ‘genomic-regions’ analysis, CRCdb can identify the SEs by inputting the genomic regions, and then the SEs associated TF genes can be obtained in the analysis result table.

In the ‘Downstream gene’ analysis, users can input a gene list, CRCdb will return the TFs associated with the input genes via the TF-gene relationships in CRCs. The input gene types including coding genes, microRNAs, lncRNAs can be supported by CRCdb.

In the ‘Systematic analysis of all samples’, users can click 'Looking for TFs that appear in most samples' button, CRCdb will return the distribution of core TFs in different number of cell/tissues types.

The data of CRCs and the related files of all samples are available to download in the ‘Download’ page. We support the export of query results in other search pages.

5.CGI interface (Overlap with the user-submitted genome location)

In the aspect of data sharing, a CGI program was specially built for CRCdb. Notably, website developers are required to provide the TF name and the link of the CRCdb website to determine which CRCs related with the TF. The data obtained by the feedback can be displayed directly on the platform.

For Example: http://www.licpathway.net/crcdb/search/tf_cgi.php?tf_name=FOXO3

6.Frequently Asked Questions

Why might web pages load slowly?
Reply:CRCdb has advanced storage technology and sufficient bandwidth to meet the needs of most users for the speed of web page loading. However, it is not excluded that few users have poor user experience due to network reasons.

Why do some strategies have no corresponding data?
Reply:Because coltron is a relatively accurate software, it is normal for all samples to have no corresponding data.

What is the difference between the most representative CRC and all CRCs? Reply: the most representative CRC is top ranked CRC; all CRCs is all CRCs identified by the identification strategy of CRCs

7.Development environment

The current version of CRCdb was developed using MySQL 5.7.17 (http://www.mysql.com) and runs on a Linux-based Apache Web server (http://www.apache.org). We used PHP 7.0 (http://www.php.net) for server-side scripting. We designed and built the interactive interface using Bootstrap v3.3.7 (https://v3.bootcss.com) and JQuery v2.1.1 (http://jquery. com). We used ECharts (http://echarts.baidu.com) as a graphical visualization framework. We recommend using a modern web browser that supports the HTML5 standard such as Firefox, Google Chrome, Safari, Opera or IE 9.0+ for the best display.

The CRCdb database is freely available to the research community using the web link (http://www.licpathway.net/crcdb). Users are not required to register or login to access features in the database.