Introduction

LncSEA is a comprehensive platform that provides many types of lncRNA sets and performs annotation and enrichment analysis of these sets based on lncRNA lists submitted by users. We collected lncRNA sets from more than 20 lncRNA-related databases. In addition to lncRNA sets derived from downstream regulatory data sources, we computed a large number of lncRNA sets regulated by upstream transcription factors (TFs) and DNA regulatory elements by integrating TF ChIP-seq, DNase-seq, ATAC-seq and H3K27ac ChIP-seq data from hundreds of human cell types. LncSEA provides both annotation and enrichment analysis of lncRNA sets, and when users select both upstream and downstream reference set categories, enrichment analysis against the upstream regulators and the downstream targets of lncRNAs is performed simultaneously. LncSEA also provides a user-friendly interface to search, browse and visualize detailed information about these lncRNA sets.
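The enrichment analysis mentioned above can be sketched as a one-sided hypergeometric (over-representation) test. This section does not state which statistic LncSEA uses, so the choice of test, the function name `enrichment_p` and the background size in the example are assumptions for illustration only.

```python
from math import comb

def enrichment_p(user_list, reference_set, background_size):
    """One-sided hypergeometric p-value for over-representation.

    Returns P(X >= k), where X is the overlap obtained when len(user_list)
    lncRNAs are drawn at random (without replacement) from a background of
    background_size lncRNAs that contains the reference set.
    Assumes the user list is a subset of the background.
    """
    user = set(user_list)
    ref = set(reference_set)
    k = len(user & ref)          # observed overlap
    n = len(user)                # lncRNAs submitted by the user
    K = len(ref)                 # size of the reference lncRNA set
    N = background_size          # all lncRNAs in the assumed background
    # Sum the hypergeometric probability mass from the observed overlap upward.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

# Hypothetical example: 3 submitted lncRNAs, all in a 3-member reference set,
# against a background of only 10 lncRNAs.
p = enrichment_p(["MALAT1", "NEAT1", "H19"], ["MALAT1", "NEAT1", "H19"], 10)
print(p)  # equals 1/120 (about 0.0083)
```

Because one p-value is computed per reference set, results across many sets would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure); that step is omitted here.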

C1: Disease Several widely used resources now store relationships between lncRNAs and diseases. To build disease sets, we integrated experimentally supported associations between lncRNAs and human cancers or other diseases from Lnc2Cancer2.0, EVLncRNAs, LncRNADisease2.0 and MNDR2.0. The disease category is divided into four sub-categories, one per data source. Each disease set consists of the lncRNAs associated with that disease. Supporting evidence, such as detection methods and PubMed IDs (PMIDs), is also displayed in LncSEA.
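Building the disease sets amounts to grouping association records by source and disease. The minimal sketch below assumes each record is a `(source, disease, lncrna)` tuple; this record layout, the function name `build_sets` and the example records are hypothetical and are not taken from the LncSEA schema or the source databases. The same grouping pattern fits other categories that define one set per miRNA, RBP or drug.

```python
from collections import defaultdict

def build_sets(records):
    """Group (source, disease, lncrna) records into lncRNA sets keyed by
    (source, disease); each source plays the role of one sub-category."""
    sets = defaultdict(set)
    for source, disease, lncrna in records:
        sets[(source, disease)].add(lncrna)   # a set drops duplicate evidence rows
    return dict(sets)

# Illustrative records only, not extracted from the databases.
records = [
    ("Lnc2Cancer2.0", "breast cancer", "HOTAIR"),
    ("Lnc2Cancer2.0", "breast cancer", "MALAT1"),
    ("Lnc2Cancer2.0", "breast cancer", "HOTAIR"),   # second line of evidence
    ("MNDR2.0", "Alzheimer's disease", "BACE1-AS"),
]
disease_sets = build_sets(records)
```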
C2: Drug We downloaded drug-lncRNA relationships from the LncMap database. These relationships were derived from Spearman correlations between lncRNA expression levels and the IC50 values of 24 drugs across cell lines, and the correlation coefficient and P value of each relationship are displayed in our database. For each drug, we defined the correlated lncRNAs as a predicted set. We also obtained experimentally supported drug-resistance-related lncRNAs from Lnc2Cancer2.0. To keep the predicted and experimentally supported sets distinct, we divided the drug category into two sub-categories by data source.
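The Spearman correlation behind the LncMap relationships is the Pearson correlation of ranked values. The sketch below computes it with average ranks for ties, using only the Python standard library; the function names and the example expression and IC50 values are hypothetical, and the P value that LncMap reports is not computed here.

```python
from statistics import mean

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical values for one lncRNA and one drug across four cell lines.
expression = [1.2, 3.4, 2.2, 5.0]
ic50 = [10.0, 4.0, 8.0, 1.5]
rho = spearman_rho(expression, ic50)
```

A negative coefficient means that cell lines with higher lncRNA expression tend to have lower IC50 values, that is, greater sensitivity to the drug. The function divides by zero if either input is constant.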
C3: MicroRNA Many studies have shown that lncRNAs regulate downstream molecules such as microRNAs (miRNAs). By integrating miRNA-target interactions supported by large-scale CLIP-seq data (HITS-CLIP, PAR-CLIP, iCLIP and CLASH) from StarBase2.0 and LncBase2.0, we defined, for each data source, the lncRNAs interacting with each miRNA as a miRNA set. These sets can help to elucidate the mechanisms of lncRNAs; for example, in disease, lncRNAs often act as competing endogenous RNAs (ceRNAs) that bind miRNA family members.
C4: Cancer Phenotype Identifying cancer subtypes has become a major research focus in oncology. We therefore downloaded phenotype-specific lncRNAs from Cancer RNA-Seq Nexus, which covers 40 cancers and 325 phenotypes, and defined 40 cancer sub-categories. Each phenotype set (e.g. breast cancer stage II, ER+ breast cancer or Her2+ breast cancer) contains the lncRNAs differentially expressed between two groups of tissue samples. The average expression in each group and the significance P values are available in LncSEA. These subtype-specific lncRNA sets can provide a basis for precision and personalized medicine.
C5&C6: Enhancer and Super Enhancer To build the ‘Enhancer’ and ‘Super Enhancer’ category sets, we collected and processed H3K27ac ChIP-seq data from NCBI GEO/SRA, ENCODE, Roadmap and GGR (Genomics of Gene Regulation Project). To keep normalization and processing consistent across data sources, we used the streamlined Bowtie-MACS-ROSE pipeline developed by Lovén et al.: raw sequencing reads were aligned to the hg19 reference genome with Bowtie, peaks were called with MACS14, and super-enhancer (SE) regions were identified with ROSE. In total, we obtained more than 330,000 SE regions across 542 cell types/tissues, a resource previously developed by our group. Based on these enhancers and super-enhancers, we identified the lncRNAs regulated by cell-type-specific enhancers and super-enhancers using the GeneMapper program of ROSE, which supports three positional relationships between enhancers and lncRNAs: ‘overlap’, ‘proximal’ and ‘closest’. In addition, for specific cell types, the active lncRNAs closest to super-enhancers were identified with the CRC Mapper program.
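The three positional relationships can be illustrated with a simplified classifier for a single enhancer region. This is a sketch under stated assumptions, not ROSE's GeneMapper code: we assume ‘overlap’ means the lncRNA body intersects the region, ‘proximal’ means the lncRNA transcription start site (TSS) lies within 50 kb of the region, and ‘closest’ means the lncRNA whose TSS is nearest to the region centre. Coordinates are BED-style (0-based, half-open), and the `Gene` type and lncRNA names are hypothetical.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int      # 0-based, half-open [start, end)
    end: int
    strand: str     # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: start on "+", last base on "-".
        return self.start if self.strand == "+" else self.end - 1

def map_region(chrom, start, end, genes, window=50_000):
    """Classify lncRNAs relative to one enhancer region [start, end).
    The 50 kb default window is an assumption."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    # overlap: the lncRNA body intersects the region
    overlap = [g.name for g in same_chrom if g.start < end and start < g.end]
    # proximal: the lncRNA TSS lies within `window` bp of the region
    proximal = [g.name for g in same_chrom
                if start - window <= g.tss < end + window]
    # closest: the lncRNA whose TSS is nearest to the region centre
    centre = (start + end) // 2
    closest = min(same_chrom, key=lambda g: abs(g.tss - centre), default=None)
    return {"overlap": overlap, "proximal": proximal,
            "closest": closest.name if closest else None}

# Hypothetical lncRNAs and one hypothetical enhancer on chr1.
genes = [
    Gene("LNC_A", "chr1", 10_000, 20_000, "+"),
    Gene("LNC_B", "chr1", 60_000, 70_000, "-"),
    Gene("LNC_C", "chr1", 500_000, 510_000, "+"),
    Gene("LNC_D", "chr2", 15_000, 16_000, "+"),
]
result = map_region("chr1", 15_000, 25_000, genes)
```

In this sketch the categories are not mutually exclusive (an overlapping lncRNA can also be proximal and closest); whether GeneMapper reports them the same way is not stated here.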
C7: Accessible Chromatin Chromatin accessibility data, including DNase-seq and ATAC-seq, are available for hundreds of cell types. For DNase-seq, we collected genomic region data for 292 sample types from ENCODE, Roadmap and Cistrome. For ATAC-seq, we collected genomic region data for 105 sample types from Cistrome and NCBI, together with 386 samples covering 23 cancer types from TCGA. We used the UCSC liftOver tool to convert the genomic coordinates of these datasets to hg19. The GeneMapper program of ROSE was then used to predict lncRNAs associated with accessible chromatin regions under the closest, overlapping and proximal rules.
C8: Cell Marker One of the most fundamental questions in biology is which cell types make up different tissues and organs and how they function in a coordinated fashion. Large-scale single-cell sequencing and experimental studies are rapidly opening up new ways to address this question by revealing numerous markers that distinguish cell types within tissues. Some lncRNAs are also used as cell markers. We downloaded potential lncRNA cell markers for various human cell types and tissues from the CellMarker database. Because the number of such lncRNAs is small, we did not divide this category into sub-categories.
C9: Subcellular Localization LncRNAs act through two main mechanisms: transcriptional and post-transcriptional regulation. The former takes place in the nucleus through interactions with nuclear factors, whereas the latter takes place in the cytoplasm through mechanisms such as ceRNA activity. We collected lncRNA subcellular localizations from RNALocate, which provides experimentally confirmed localization data for many lncRNAs, and from iLoc-lncRNA, which provides localizations predicted by bioinformatic methods. The subcellular localization category is therefore divided into two sub-categories according to data source and type of evidence.
C10: Cancer Hallmark With the development of high-throughput sequencing technology, large amounts of multi-omics molecular data have been generated, creating opportunities for research on cancer mechanisms and treatment. Analysing cancer hallmarks helps to characterize tumours and to explore aspects of their biology that remain unknown. We collected lncRNAs associated with cancer hallmarks, including apoptosis, invasion, metastasis, migration, prognosis and proliferation, from the CRlncRNA database. Additional supporting information, such as cancer type, lncRNA expression level and PMID, is also provided in LncSEA.
C11: Transcription Factor Increasing evidence suggests that lncRNAs can be regulated by upstream transcription factors. We collected TF ChIP-seq data for 467 sample types from ENCODE, ReMap, Cistrome, ChIP-Atlas and GTRD, and used the UCSC liftOver tool to convert the ChIP-seq peaks to hg19. We then used BEDTools to identify peaks overlapping lncRNA transcriptional regulatory regions, including super-enhancers, promoters and accessible chromatin regions. TF-lncRNA relationships were built through these lncRNA-related regulatory regions, such as promoter and enhancer regions bound by a TF. Finally, for each TF, we established lncRNA sets annotated with cell/tissue-specific regulatory information.
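The overlap step that links TF peaks to lncRNAs through their regulatory regions can be sketched as follows. The input layout (peaks as `(chrom, start, end)`, regulatory regions already labelled with the lncRNA they belong to) and the function name `tf_target_lncrnas` are assumptions. BEDTools performs the same kind of half-open interval intersection far more efficiently on large files, whereas this version simply compares every peak with every region on the same chromosome.

```python
from collections import defaultdict

def tf_target_lncrnas(peaks, regions):
    """Return lncRNAs with at least one regulatory region overlapped by a peak.

    peaks:   (chrom, start, end) peaks of one TF in one sample
    regions: (chrom, start, end, lncrna) promoter, super-enhancer or
             accessible regions already assigned to lncRNAs
    Intervals are BED-style, 0-based and half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, lnc in regions:
        by_chrom[chrom].append((start, end, lnc))
    targets = set()
    for chrom, p_start, p_end in peaks:
        for r_start, r_end, lnc in by_chrom.get(chrom, []):
            if p_start < r_end and r_start < p_end:   # half-open overlap test
                targets.add(lnc)
    return targets

# Hypothetical peaks and regions.
peaks = [("chr1", 100, 200), ("chr2", 50, 60)]
regions = [
    ("chr1", 150, 300, "LNC_A"),   # overlaps the first peak
    ("chr1", 200, 400, "LNC_B"),   # only touches it at 200, so no overlap
    ("chr2", 10, 20, "LNC_C"),     # same chromosome, disjoint
]
targets = tf_target_lncrnas(peaks, regions)
```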
C12: Methylation Pattern LncRNAs participate in essential epigenetic regulatory processes such as DNA methylation. DNA methylation is a fundamental feature of epigenomes that can affect the expression of both protein-coding and non-coding transcripts. We obtained a manually curated, annotated collection of experimentally supported lncRNA-DNA methylation associations from Lnc2Meth. These associations were divided into five DNA methylation patterns: "methylation", "demethylation", "hypermethylation", "hypomethylation" and "differential methylation".
C13: RNA Binding Protein RNA-binding proteins (RBPs) mediate RNA maturation, transport, localization and translation. A single RBP may have multiple targets, and defects in its expression can cause multiple diseases. In recent years, a number of high-throughput techniques and bioinformatic prediction algorithms have been used to identify RNA-protein binding relationships. By integrating RBP interactions derived from large-scale CLIP-seq data in StarBase, RNAInter and EuRBPDB, we defined the lncRNAs bound by each RBP as a set. These lncRNA sets can help to investigate the regulatory landscape of cellular lncRNAs.
C14: Survival We identified survival-associated lncRNAs by analysing lncRNA expression data and clinical data from TCGA. Univariate Cox regression analysis was used to screen for lncRNAs associated with prognosis, and for each cancer type in the TCGA project, the survival-associated lncRNAs were defined as a set. Cox regression coefficients, P values and log-rank test P values are displayed on the set detail pages so that users can filter and compare results. These survival sets can inform studies of lncRNA expression and prognosis in cancer patients.
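The univariate Cox screen can be sketched for a single lncRNA as a Newton-Raphson fit of the Cox partial likelihood, using the Breslow approximation for tied times and a Wald test for the P value. This standard-library version is for illustration only: the section does not say which software or settings LncSEA used, established implementations (such as `coxph` in the R survival package) would normally be preferred, and the example data are invented.

```python
import math

def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Univariate Cox proportional-hazards fit by Newton-Raphson.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    x: expression of one lncRNA per patient. Ties use the Breslow
    approximation. Returns (beta, two-sided Wald p-value).
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0   # first derivative of the log partial likelihood
        info = 0.0    # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue
            # Risk set: patients still under observation at times[i].
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean
            info += s2 / s0 - mean * mean
        if info <= 0:
            raise ValueError("no information: x is constant within every risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)                     # standard error of beta
    z = beta / se
    wald_p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p-value
    return beta, wald_p

# Invented example: six patients, all deaths observed.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
expr = [2.0, 3.0, 1.0, 1.5, 0.5, 0.2]
beta, p = cox_univariate(times, events, expr)
```

A positive coefficient indicates that higher expression is associated with a higher hazard (poorer survival). Repeating the fit for every lncRNA in a cancer type and keeping those below a significance threshold would yield one survival set; the threshold used by LncSEA is not stated in this section.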
C15: SmORF Recent studies have found many small open reading frames (sORFs or smORFs) in the genome that can encode small peptides. Non-coding genes and non-coding regions such as UTRs have also been found to contain smORFs encoding functional peptides involved in the regulation of muscle function and cell metabolism. We first collected small peptides from sorf.org and SmProt and obtained lncRNA annotations from the GENCODE database. By computing the regions where small peptides and lncRNAs intersect, we obtained lncRNAs that encode small peptides, together with their specific genomic locations. Finally, we classified these sets into subsets according to lncRNA type.
C16: Exosome Exosomes are extracellular vesicles secreted by cells. They carry different kinds of RNA that regulate the behaviour of recipient cells and can serve as circulating biomarkers of disease. We collected experimentally validated lncRNAs found in human blood exosomes to construct an exosome-associated lncRNA set.
C17: eQTL Numerous studies indicate that lncRNAs have critical functions across biological processes and that single nucleotide polymorphisms (SNPs) can contribute to diseases or traits by influencing lncRNA expression. We therefore built sets of lncRNAs whose expression is associated with specific SNPs in different cancer types. We collected eQTLs for 33 human cancer types from the ncRNA-eQTL database, including both cis- and trans-eQTLs, and linked the lncRNA-eQTLs to genome-wide association study (GWAS) data. This yielded lncRNA sets of four types: ‘Common cis’ and ‘Common trans’ eQTL lncRNAs, and ‘Gwas cis’ and ‘Gwas trans’ eQTL lncRNAs. Further details about the SNPs are displayed on the set detail pages of LncSEA.
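The four eQTL set types can be sketched as a two-way classification: cis versus trans by SNP-lncRNA distance, and GWAS-linked versus not. The 1 Mb cis window, the reading of ‘Common’ as eQTLs not linked to GWAS data, the `EQTL` record type and all identifiers below are assumptions; ncRNA-eQTL and LncSEA may define these categories differently (for instance, ‘Common’ may include all eQTLs).

```python
from collections import defaultdict
from typing import NamedTuple, Set

CIS_WINDOW = 1_000_000   # assumed cis cutoff in bp; not stated in the text

class EQTL(NamedTuple):
    snp_id: str
    snp_chrom: str
    snp_pos: int
    lncrna: str
    lnc_chrom: str
    lnc_tss: int

def eqtl_set_type(e: EQTL, gwas_snps: Set[str]) -> str:
    """Assign one lncRNA-eQTL to one of the four (assumed) set types."""
    is_cis = (e.snp_chrom == e.lnc_chrom
              and abs(e.snp_pos - e.lnc_tss) <= CIS_WINDOW)
    prefix = "Gwas" if e.snp_id in gwas_snps else "Common"
    return f"{prefix} {'cis' if is_cis else 'trans'}"

def build_eqtl_sets(eqtls_by_cancer, gwas_snps):
    """Collect lncRNAs into sets keyed by (cancer type, set type)."""
    sets = defaultdict(set)
    for cancer, eqtls in eqtls_by_cancer.items():
        for e in eqtls:
            sets[(cancer, eqtl_set_type(e, gwas_snps))].add(e.lncrna)
    return dict(sets)

# Hypothetical eQTLs in one cancer type; "rsB" is treated as GWAS-linked.
e1 = EQTL("rsA", "chr1", 1_000_000, "LNC_A", "chr1", 1_500_000)  # 500 kb away
e2 = EQTL("rsB", "chr3", 100, "LNC_B", "chr1", 1_500_000)        # other chromosome
eqtl_sets = build_eqtl_sets({"BRCA": [e1, e2]}, {"rsB"})
```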
C18: Conservation Recent studies have revealed that many lncRNAs do not show the high interspecies conservation typical of protein-coding genes. To construct conservation-related lncRNA sets, we obtained the evolutionary conservation of lncRNA exons and promoters from LnCompare, calculated using phastCons elements based on multispecies alignments. We ranked all lncRNAs by conservation score and divided them into three sets: high, middle and low conservation. The conservation category has three sub-categories: ‘100 vertebrates’, ‘20 mammals’ and ‘7 vertebrates’. These conservation sets can contribute to the functional interpretation of lncRNAs.
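The high, middle and low conservation sets can be sketched as a ranking followed by a three-way split. The section does not say where the cut points fall, so equal-sized thirds are an assumption here, as are the function name and the toy scores. Under the same assumption, the split would be applied separately within each sub-category.

```python
def split_by_conservation(scores):
    """scores: dict mapping lncRNA name -> conservation score.

    Sort from most to least conserved and cut into three near-equal groups
    (an assumed rule). Ties keep the dict's insertion order because
    sorted() is stable.
    """
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    n = len(ranked)
    cut1, cut2 = round(n / 3), round(2 * n / 3)
    return {
        "high": ranked[:cut1],
        "middle": ranked[cut1:cut2],
        "low": ranked[cut2:],
    }

# Toy scores for six hypothetical lncRNAs.
groups = split_by_conservation(
    {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7, "e": 0.3, "f": 0.2}
)
```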