LncSEA is a platform that provides many types of lncRNA sets and performs annotation and enrichment analysis on lncRNA lists submitted by users. We collected lncRNA sets from more than 30 lncRNA-associated databases. We not only collected lncRNA sets based on downstream regulatory data sources but also calculated a large number of lncRNA sets regulated by upstream transcription factors (TFs) and DNA regulatory elements by integrating TF ChIP-seq, DNase-seq, ATAC-seq and H3K27ac ChIP-seq data associated with hundreds of human cell types. Moreover, enrichment analysis against upstream regulators and downstream targets of lncRNAs can be performed simultaneously by choosing both upstream and downstream reference set categories. LncSEA also provides a user-friendly interface to search, browse and visualize detailed information about these lncRNA sets.
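
As an illustration of what enrichment analysis of a submitted lncRNA list against one reference set can look like, the sketch below computes a one-sided hypergeometric (over-representation) P value. This is a common choice for set enrichment; that LncSEA uses exactly this test is an assumption, and the lncRNA names, the background size of 20 and the function name are made up for the example.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, query_size, overlap):
    """P(X >= overlap) when query_size lncRNAs are drawn without replacement
    from a background of universe_size lncRNAs, set_size of which belong to
    the reference set."""
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical inputs: a user list, one reference set, a background of 20 lncRNAs.
query = {"MALAT1", "NEAT1", "HOTAIR", "XIST"}
reference_set = {"MALAT1", "NEAT1", "H19"}
shared = query & reference_set

p_value = hypergeom_enrichment_p(20, len(reference_set), len(query), len(shared))
```

In practice one P value would be computed per reference set and then corrected for multiple testing.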

LncSEA2.0 is built with the workflow management tool Snakemake, which makes the database convenient to maintain and update. We also provide the Snakemake file used for database construction for reference.






Accessible_Chromatin: The chromatin accessibility data, including DNase-seq and ATAC-seq, are available for hundreds of cell types. For DNase-seq, we collected genomic region data of 292 sample types from ENCODE, Roadmap and Cistrome. For ATAC-seq, we collected genomic region data of 105 sample types from Cistrome, NCBI and 386 samples in 23 kinds of cancer types from TCGA. We used the liftOver tool of UCSC to convert genomic locations of these genomic region datasets into hg19 version. The ROSE software GeneMapper program was also used to predict chromatin-accessibility-region associated lncRNAs with proximity rules including closest, overlapping and proximal.
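
The three proximity rules can be sketched as follows. This is not the ROSE GeneMapper code; it is a simplified stand-in that assumes 'overlapping' means the lncRNA body overlaps the region, 'proximal' means the lncRNA TSS lies within a window around the region centre, and 'closest' means the lncRNA whose TSS is nearest to the region centre. The 50 kb window, the record layout and all names are assumptions for illustration.

```python
def map_region_to_lncrnas(region, lncrnas, window=50_000):
    """Assign lncRNAs to one genomic region under three proximity rules."""
    chrom, start, end = region
    center = (start + end) // 2
    same_chrom = [g for g in lncrnas if g["chrom"] == chrom]

    # Overlapping: the lncRNA body shares at least one base with the region.
    overlapping = [
        g["name"] for g in same_chrom if g["start"] <= end and g["end"] >= start
    ]
    # Proximal: not overlapping, but the TSS is within `window` of the centre.
    proximal = [
        g["name"]
        for g in same_chrom
        if g["name"] not in overlapping and abs(g["tss"] - center) <= window
    ]
    # Closest: the single lncRNA whose TSS is nearest to the region centre.
    closest = None
    if same_chrom:
        closest = min(same_chrom, key=lambda g: abs(g["tss"] - center))["name"]

    return {"overlapping": overlapping, "proximal": proximal, "closest": closest}


# Hypothetical lncRNA annotations and one accessible-chromatin region.
lncrnas = [
    {"name": "LNC_A", "chrom": "chr1", "start": 1_500, "end": 3_000, "tss": 1_500},
    {"name": "LNC_B", "chrom": "chr1", "start": 10_000, "end": 12_000, "tss": 10_000},
    {"name": "LNC_C", "chrom": "chr1", "start": 500_000, "end": 510_000, "tss": 500_000},
    {"name": "LNC_D", "chrom": "chr2", "start": 1_000, "end": 2_000, "tss": 1_000},
]
result = map_region_to_lncrnas(("chr1", 1_000, 2_000), lncrnas)
```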

Cancer_Functional_State: With the development of high-throughput sequencing technology, a large amount of multi-component molecular data has been generated, which brings opportunities for research into cancer mechanisms and cancer treatment. Analysing the functional states of tumours helps to explore unknown aspects of tumour biology. We collected lncRNAs that act as tumour markers for functional states including apoptosis, invasion, metastasis, migration, prognosis and proliferation from CRlncRNA, LncACTdb and CancerSEA. Additional evidence such as cancer type, lncRNA expression level and PMID is also provided in LncSEA.

Cell_Marker: One of the most fundamental questions in biology is which types of cells form different tissues and organs in a functionally coordinated fashion. Large-scale single-cell sequencing and experimental studies are now rapidly opening up new ways to address this question by revealing cell markers that distinguish different cell types in tissues. Some lncRNAs are also regarded as cell markers. We downloaded lncRNAs with the potential to serve as cell markers for various cell types in human tissues from the CellMarker database. Because the number of lncRNAs is small, we did not divide this category into smaller sub-categories.

Chromatin_Interaction: Chromatin organization in the nucleus can bring specific lncRNA target genes into close proximity to the site of lncRNA transcription. This mechanism would in principle allow extensive interactions between lncRNAs and their target sites located on the same or even different chromosomes. Thus, we collected chromatin interaction data from OncoBase and filtered the interaction pairs involving lncRNAs to obtain sets of potential lncRNA target genes.

Conservation: Recent studies have revealed that many lncRNAs do not show the same pattern of high interspecies conservation as protein-coding genes. To construct conservation-associated lncRNA sets, we obtained the evolutionary conservation of exons and promoters of lncRNAs from LnCompare, which was calculated using phastCons elements based on multispecies alignment. According to the conservation score of each lncRNA, we sorted all lncRNAs and divided them into three sets: high, middle and low conservation. The conservation category is divided into three sub-categories: '100 vertebrates', '20 mammals' and '7 vertebrates'. These conservation sets can contribute to the functional interpretation of lncRNAs.
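
The sort-and-divide step can be sketched as below. The text does not state the cutoffs, so splitting the ranked list into three equal-sized groups is an assumption, as are the scores and lncRNA names.

```python
def split_by_conservation(scores):
    """Rank lncRNAs by conservation score (highest first) and split the
    ranking into three groups of (near-)equal size."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "high": ranked[:cut1],
        "middle": ranked[cut1:cut2],
        "low": ranked[cut2:],
    }


# Hypothetical conservation scores for one alignment (e.g. '20 mammals').
scores = {
    "lncA": 0.9, "lncB": 0.1, "lncC": 0.5,
    "lncD": 0.7, "lncE": 0.3, "lncF": 0.2,
}
conservation_sets = split_by_conservation(scores)
```

The same function would be applied once per sub-category, giving a high, middle and low set for each alignment.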

Disease_Type: In recent years, several widely used resources have been developed to store the relationships between lncRNAs and diseases. To build disease sets, we integrated many experimentally supported associations between lncRNAs and human cancers or diseases from Lnc2Cancer2.0, EVLncRNAs, LncRNADisease2.0 and MNDR2.0. We divided the disease category into four sub-categories by data source. Each disease collection consists of a list of lncRNAs associated with the disease. In addition, supporting evidence such as methods and PMID is also displayed in LncSEA.

Drug: We downloaded the relationships between drugs and lncRNAs from the LncMap database. The relationships were calculated as Spearman correlations between lncRNA expression levels and IC50 values of 24 drugs across cell lines. The correlation coefficient and P value are displayed in our database. We defined the lncRNAs related to each unique drug as a predictive set. We also obtained a batch of experimentally supported drug-resistance-related lncRNAs from Lnc2Cancer2.0. To distinguish the two kinds of sets, we divided the drug category into two sub-categories by data source.
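
For reference, a Spearman correlation between one lncRNA's expression and one drug's IC50 across cell lines can be computed as the Pearson correlation of the ranks. The sketch below is a plain standard-library version (ties get average ranks, no P value is computed); the expression and IC50 values are invented for the example and are not LncMap data.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for one lncRNA and one drug across five cell lines.
expression = [1.2, 3.4, 2.2, 5.0, 4.1]
ic50 = [9.0, 4.0, 7.5, 1.0, 2.5]
rho = spearman(expression, ic50)
```

Here higher expression always goes with lower IC50, so the ranks are perfectly reversed and rho is -1 up to floating-point error.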

eQTL: Numerous studies indicate that lncRNAs have critical functions across biological processes, and that single nucleotide polymorphisms (SNPs) could contribute to diseases or traits by influencing lncRNA expression. We aim to build sets of lncRNAs that have specific mutations in different cancer types. We collected eQTLs of human cancers across 33 cancer types from the ncRNA-eQTL database. Both cis- and trans-eQTL studies were included. We also linked lncRNA-eQTLs to genome-wide association study (GWAS) data. Furthermore, we obtained lncRNA sets of four different types: 'Common cis', 'Common trans', 'Gwas cis' and 'Gwas trans'. Further details about the SNPs are also displayed on the set detail pages of LncSEA.

Enhancer: To build the 'Enhancer' and 'Super Enhancer' category sets, we collected and processed H3K27ac ChIP-seq data from NCBI GEO/SRA, ENCODE, Roadmap and GGR (Genomics of Gene Regulation Project). To ensure normalization and consistency across the various data sources, we used the streamlined Bowtie-MACS-ROSE pipeline developed by Loven et al. Raw sequencing reads were aligned to the hg19 reference genome with Bowtie, peaks were called using MACS14, and super-enhancer (SE) regions were annotated using the ROSE software. In total, our group obtained more than 330 000 SE regions across 542 cells/tissues. Based on these enhancers and super enhancers, we identified the lncRNAs regulated by cell-type-specific enhancers and super enhancers using the ROSE software GeneMapper program. Three positional relationships between enhancers and lncRNAs are supported: 'overlap', 'proximal' and 'closest'. In addition, the closest active lncRNAs of super enhancers in specific cell types were identified with the CRC Mapper program.

Exosome: Exosomes are vesicles secreted by cells. They are a type of extracellular vesicle and contain different kinds of RNA that regulate the behavior of recipient cells. They can also be used as circulating biomarkers of disease. We obtained exosome-associated lncRNAs from exoRBase2.0, DeepBASE3.0 and experimental validations to construct an exosome-associated lncRNA set.

Experimental_Validated_Function: LncRNAs are considered crucial regulators in diseases and have been demonstrated to participate in pathological processes via numerous biological functions, such as cell proliferation, apoptosis and cell metastasis. We downloaded all the experimentally supported lncRNA-regulatory function relationships from LncTarD, which collects key targets and important biological functions driven by disease-related lncRNAs and lncRNA-mediated regulatory mechanisms in human diseases. We divided all the relationships according to whether the biological function is positively (+) or negatively (-) affected by the lncRNA-mediated regulation in human disease.

Gene_Perturbation: LncRNAs are regulated by numerous upstream regulators, such as transcription factors and other transcriptional regulators. Inhibiting the expression of upstream transcription factors can directly repress or enhance the transcriptional activity of lncRNAs. From KnockTF, we obtained TF-lncRNA relationships supported by high-throughput data from knockdown/knockout experiments (siRNA/shRNA/CRISPRi). We divided the lncRNA sets by TF name according to the TF perturbation dataset.

Cancer_Immunology: Perturbations of immune system gene regulation patterns have been considered a major cause of the development of various types of cancer. Identification and characterization of lncRNA-associated potential regulators is critical to cancer immunotherapy. Within this class, we grouped all collections into 5 categories based on immune-related functions and phenotypes: GO, pathway, function, cell and tumor. We collected lncRNA-related immune functions from ImmPort, immune-related GO terms from InnateDB, and immune-related pathways and cells from ImmReg. We also obtained the cancer immunology-related lncRNAs in the TCGA cohort using the Cibersort software.

Inflammation: Inflammation has been considered a main cause of the onset, progression and outcome of multiple diseases, such as cancers and cardiovascular diseases. Recent studies have revealed that lncRNAs can regulate inflammatory agents, such as cytokines and chemokines, and thereby participate in the regulatory processes of the immune system and in immunotherapy. We downloaded all the experimentally validated lncRNA-inflammatory disease relationships from ncRI.

Methylation_Pattern: LncRNAs play an important role in essential epigenetic regulation processes such as DNA methylation. DNA methylation is a fundamental feature of epigenomes that can affect the expression of protein-coding or non-coding transcripts. We obtained a manually curated and annotated collection of experimentally supported lncRNA-DNA methylation associations from Lnc2Meth. They were divided into five patterns of DNA methylation: 'methylation', 'demethylation', 'hyper methylation', 'hypo methylation' and 'differential methylation'.

Mutations: Mutations that occur in lncRNAs have been shown to play an important role in cancer. Mutations can destroy the RNA secondary structure of a lncRNA and affect its molecular function and expression pattern. Changes in lncRNA expression and lncRNA mutations promote the occurrence and metastasis of tumors. Therefore, we collected data on lncRNA-associated mutations from TCGA and ICGC.

RNA_Compound: LncRNAs have been demonstrated to have the ability to bind compounds. Here, we collected all the lncRNA-compound pairs from RNAinter database. We divided the lncRNA sets based on the different compound names.

RNA_Histone_Modification: Numerous studies have demonstrated that lncRNAs can bind histones and thereby take part in the regulation of gene transcription. Here, we collected the lncRNA-histone modification protein pairs from the RNAinter database. In total, we listed 48 types of histone modification proteins, such as H3K27ac, H3K4me1 and H3K27me3.

RNA_Protein_Interaction: Protein binding is considered a crucial mechanism by which lncRNAs act in disease regulatory pathways. Here, we collected all the lncRNA-protein pairs from RNAinter, NPinter and ENCORI. We divided the lncRNA sets by protein name.

RNA_RNA_Interaction: Many studies have shown that lncRNAs perform a variety of regulatory functions on downstream genes such as microRNAs. By integrating RNA-target interactions from large-scale CLIP-Seq (HITS-CLIP, PAR-CLIP, iCLIP, CLASH) data in ENCORI and LncBase2.0, we defined the lncRNAs related to each unique RNA from either of the two sources as an RNA set. These collections will contribute to understanding the mechanisms of lncRNAs. For example, lncRNAs often function as competing endogenous RNAs that bind miRNA family members in disease.

SmORF: Recent studies have found many short or small open reading frames (sORFs or smORFs) in the body that can encode small peptides. In addition, non-coding genes and non-coding regions such as UTRs were also found to contain smORFs encoding functional peptides involved in the regulation of muscle function and cell metabolism. First, we collected smORFs encoding small peptides from sorf.org and SmProt2.0. We then obtained lncRNA annotation information from the GENCODE database. By calculating the intersecting regions of small peptides and lncRNAs, we obtained lncRNAs with the function of encoding small peptides, together with their specific genomic locations. Finally, we classified these sets into several subsets according to the type of lncRNA.
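
The intersection step can be sketched as a plain interval overlap on the same chromosome. The sketch assumes BED-like half-open coordinates and ignores strand, which the text does not specify; the coordinates and names are invented.

```python
def intersect(a, b):
    """Return the shared interval of two (chrom, start, end, name) records,
    or None if they do not overlap. Coordinates are half-open [start, end)."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def smorf_lncrnas(smorfs, lncrnas):
    """Map each lncRNA name to the smORF-overlapping regions it contains."""
    hits = {}
    for lnc in lncrnas:
        for orf in smorfs:
            shared = intersect(lnc, orf)
            if shared is not None:
                hits.setdefault(lnc[3], []).append(shared + (orf[3],))
    return hits


# Hypothetical GENCODE-style lncRNA intervals and small-peptide intervals.
lncrnas = [("chr1", 100, 1000, "LNC_A"), ("chr1", 5000, 6000, "LNC_B")]
smorfs = [
    ("chr1", 900, 1100, "sp1"),
    ("chr1", 2000, 2100, "sp2"),
    ("chr2", 100, 200, "sp3"),
]
coding_lncrnas = smorf_lncrnas(smorfs, lncrnas)
```

Only LNC_A overlaps a small peptide here, and the recorded location is the shared part of the two intervals.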

Subcellular_Localization: There are two main mechanisms of lncRNA action: regulation at the transcriptional level and at the post-transcriptional level. The former takes place in the nucleus through nuclear factors, while the latter takes place in the cytoplasm through the ceRNA mechanism. We collected lncRNAs with different subcellular locations from RNALocate2.0, which provides experimentally confirmed lncRNA location information, and from Iloc-LncRNA2.0, which provides locations predicted by bioinformatic methods. We divided the subcellular localization collection into two categories by data source and data accuracy.

Super_Enhancer: To build the 'Enhancer' and 'Super Enhancer' category sets, we collected and processed H3K27ac ChIP-seq data from NCBI GEO/SRA, ENCODE, Roadmap and GGR (Genomics of Gene Regulation Project). To ensure normalization and consistency across the various data sources, we used the streamlined Bowtie-MACS-ROSE pipeline developed by Loven et al. Raw sequencing reads were aligned to the hg19 reference genome with Bowtie, peaks were called using MACS14, and super-enhancer (SE) regions were annotated using the ROSE software. In total, our group obtained more than 330 000 SE regions across 542 cells/tissues. Based on these enhancers and super enhancers, we identified the lncRNAs regulated by cell-type-specific enhancers and super enhancers using the ROSE software GeneMapper program. Three positional relationships between enhancers and lncRNAs are supported: 'overlap', 'proximal' and 'closest'. In addition, the closest active lncRNAs of super enhancers in specific cell types were identified with the CRC Mapper program.

Survival: Survival-related lncRNAs were predicted by downloading and analysing lncRNA expression data and clinical data. Univariate Cox regression analysis was used to screen for lncRNAs related to prognosis. We defined the survival-related lncRNAs of each cancer in the TCGA project as a set. Cox regression coefficients, p values and log-rank test p values are displayed on the set detail pages of our database for user screening and reference. Our survival sets will inform and guide the study of prognosis and lncRNA expression in cancer patients.

Tissue_Spatial_Expression: LncRNA spatial expression patterns across tissues are important for revealing or investigating lncRNA function in different tissues. More importantly, lncRNA spatial expression patterns also provide important biological cues into disease mechanisms and tissue-specific therapeutic targets. We downloaded the lncRNA-tissue relationships from lncSpA, which aims to provide the spatial atlas of expression for lncRNAs across 38 different normal tissues, 33 adult cancer types and 7 pediatric cancer types of the human body. We divided all the spatial expression patterns based on tissue names.

Transcription_Co_Factor: Numerous studies have reported that the major regulators in the transcription regulation program are transcription factors (TFs), transcription cofactors (TcoFs) and chromatin regulators. TFs typically bind in a cooperative fashion to distal DNA elements and regulate gene expression by recruiting interacting TcoFs. Here we collected TcoF-lncRNA relationships supported by gene expression inference, TcoF ChIP-seq or experimental validation from TcoFBase. We divided the lncRNA sets by TcoF name.

Transcription_Factor: Increasing evidence suggests that lncRNAs can be regulated by upstream transcription factors. We collected transcription factor ChIP-seq data of 467 sample types from ENCODE, Remap, Cistrome, ChIP-Atlas and GTRD. We used the liftOver tool of UCSC to convert the ChIP-seq peak data into the hg19 version. Using BEDTools, we further identified the peaks overlapping with transcriptional regulatory regions of lncRNAs, including super enhancers, promoters and chromatin accessibility regions. The relationships between transcription factors and lncRNAs were then built via the different kinds of lncRNA-related regulatory regions, such as promoter and enhancer regions, bound by transcription factors. Finally, for each transcription factor, we established lncRNA sets with cell/tissue-specific regulatory information.
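
The grouping of overlaps into per-TF sets can be sketched as below. The real pipeline uses BEDTools for the overlap step; this pure-Python stand-in only illustrates the logic. The record layouts, the half-open coordinates, the set key (TF, sample, region type) and all names are assumptions for the example.

```python
from collections import defaultdict


def build_tf_sets(peaks, regions):
    """Group lncRNAs by (TF, sample, region type) whenever a TF peak
    overlaps one of the lncRNA's regulatory regions.

    peaks:   (tf, sample, chrom, start, end)
    regions: (chrom, start, end, lncrna, region_type)
    """
    sets = defaultdict(set)
    for tf, sample, chrom, p_start, p_end in peaks:
        for r_chrom, r_start, r_end, lncrna, region_type in regions:
            # Half-open intervals overlap if each starts before the other ends.
            if chrom == r_chrom and p_start < r_end and r_start < p_end:
                sets[(tf, sample, region_type)].add(lncrna)
    return dict(sets)


# Hypothetical hg19 peaks and lncRNA regulatory regions.
peaks = [
    ("CTCF", "K562", "chr1", 100, 200),
    ("CTCF", "K562", "chr1", 5000, 5100),
    ("GATA1", "K562", "chr2", 50, 150),
]
regions = [
    ("chr1", 150, 650, "LNC_A", "promoter"),
    ("chr1", 4900, 5200, "LNC_B", "super_enhancer"),
    ("chr2", 1000, 2000, "LNC_C", "promoter"),
]
tf_sets = build_tf_sets(peaks, regions)
```

The nested loop is quadratic and only suitable for small examples; interval tools such as BEDTools exist precisely to do this step efficiently at genome scale.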

Tumor_Metastasis: Numerous studies have revealed that lncRNAs are the key regulators or biomarkers in various cancer metastatic events, such as cancer cell invasion, intravasation, extravasation and proliferation, which can cooperatively facilitate malignant tumor spread and cause massive patient deaths. Here we collected lncRNA-tumor metastasis events or cancer types from LncRNAWiki and LncR2metasta.

ceRNA: Competing endogenous RNA (ceRNA) is one of the well-investigated mechanisms of lncRNAs. According to the ceRNA theory, lncRNAs can competitively sponge endogenous miRNAs and thereby rescue downstream mRNAs from degradation. As miRNA sponges, lncRNAs play a vital role in different tissues and cells. Therefore, we divided this class into three categories, namely Cell, Tissue and miRNA. We collected ceRNA pairs from multiple data sources, including LnCeCell and LncACTdb3.0.

m6A_Modification: Methylation of N6 adenosine (m6A) is the most abundant endogenous chemical modification in eukaryotic RNA. A large number of studies have suggested that aberrant m6A modification is the key to tumorigenesis and progression, such as breast cancer, lung cancer, acute myeloid leukemia and hepatocellular carcinoma. The abundances and effects of m6A modification on RNAs are determined by the complex interactions between different types of regulators, including methyltransferases (‘writers’), RNA binding proteins (‘readers’), and demethylases (‘erasers’). Understanding these different m6A regulators could dramatically increase our knowledge about the role of RNA methylation in the regulation of gene expression and various biological processes.

Splicing_Events: Previous studies have demonstrated that different isoforms of lncRNAs exhibit distinct, even opposite, functions in tumorigenesis. The preferred splicing pattern of lncRNAs in a given cancer type could help to explore the modular function of non-coding sequences. We downloaded all the lncRNA-related splicing events in cancers from LncAS2Cancer, which called splicing events from over 30 cancer types. We divided all the lncRNA-related splicing events into eight types: skipped exon (SE), alternative 5′ splice site (A5SS), alternative 3′ splice site (A3SS), retained intron (RI), mutually exclusive exons (MXE), alternative transcription start site (altTSS), alternative transcription termination site (altTTS) and complex splicing (ComplexAS), called by six different methods (rMATS, MAJIQ, SEASTAR, Dapars, SUPPA2 and Brie).

Chromatin_Regulators: Chromatin regulators (CRs) are crucial upstream regulatory factors in epigenetics, which can act as master controllers of gene transcription through regulation of histone modifications and chromatin remodeling. According to their regulatory roles in epigenetics, CRs are usually grouped into three major categories: DNA methylators, histone modifiers and chromatin remodelers. LncRNA transcription is controlled by multiple CRs. Here we collected CR-lncRNA relationships supported by CR ChIP-seq from CRdb. We divided the lncRNA sets by CR name.