A typical enhancer (TE), as illustrated in the top panel of the Figure, is a region of DNA several hundred base pairs long[1][2] that contains sequence motifs to which transcription factors bind. A typical enhancer can come into proximity with its target gene through a large chromosome loop. A Mediator complex (consisting of about 26 proteins in an interacting structure) communicates regulatory signals from the DNA-bound transcription factors at the enhancer to the promoter of the gene, regulating RNA transcription of the target gene.
A super-enhancer, illustrated in the lower panel of the Figure, is a region of the mammalian genome comprising multiple typical enhancers that is collectively bound by an array of transcription factor proteins to drive transcription of genes involved in cell identity,[3][4][5] or of genes involved in cancer.[6] Because super-enhancers frequently occur near genes important for controlling and defining cell identity, they may be used to quickly identify key nodes regulating cell identity.[5][7] Super-enhancers are also central to mediating dysregulation of signaling pathways and promoting cancer cell growth.[6][8] Super-enhancers differ from typical enhancers, however, in that they depend strongly on additional specialized proteins that create and maintain them, including BRD4 (shown in the lower panel of the Figure) and co-factors including p300.[9]
Enhancers have several quantifiable traits that vary over a range of values, and these traits are generally elevated at super-enhancers. Super-enhancers are bound by higher levels of transcription-regulating proteins and are associated with genes that are more highly expressed.[3][10][11][12] Expression of genes associated with super-enhancers is particularly sensitive to perturbations, which may facilitate cell state transitions or explain the sensitivity of super-enhancer-associated genes to small molecules that target transcription.[3][10][11][13][14]
Frequency of super-enhancers
In many cell types, only a minority of activated enhancers are located in super-enhancers (SEs). In specialized tissue, such as skeletal muscle, fewer genes are expressed and only a small number of specialized, activated super-enhancers are found. Human skeletal muscle contains nine identified types of cells, which express 1,331 genes on average.[15] About 22 super-enhancers are specific to skeletal muscle cells among these nine cell types, indicating that specialized super-enhancers in these cells are about 1.7% of the number of typical enhancers (TEs).[16] In immune-system B cells, a study identified 140 SEs and 4,290 TEs in non-stimulated B cells (SEs were 3.2% of activated transcription areas). In stimulated B cells, SEs were 3.6% of activated transcription areas.[17] Similarly, in mouse embryonic stem cells, 231 SEs were found, compared to 8,794 TEs, with SEs comprising 2.6% of activated chromatin regions.[18] A study of neural stem cells found 445 SEs and 9,436 TEs, so that the number of SEs was 4.7% of the number of TEs (about 4.5% of all active enhancer regions).[19]
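The percentages quoted above can be recomputed from the reported counts. The short sketch below does this in two ways, because the cited studies are not explicit here about the denominator: `se_share` assumes the percentage is SEs out of all active regions (SEs plus TEs), while `se_to_te_ratio` assumes it is SEs relative to TEs alone. Both function names are illustrative and do not come from the cited studies.

```python
# Counts as reported in the text: (super-enhancers, typical enhancers).
counts = {
    "non-stimulated B cells": (140, 4290),
    "mouse embryonic stem cells": (231, 8794),
    "neural stem cells": (445, 9436),
}


def se_share(n_se: int, n_te: int) -> float:
    """Percentage of all active enhancer regions (SE + TE) that are SEs."""
    return 100.0 * n_se / (n_se + n_te)


def se_to_te_ratio(n_se: int, n_te: int) -> float:
    """Number of SEs expressed as a percentage of the number of TEs."""
    return 100.0 * n_se / n_te


for cell_type, (n_se, n_te) in counts.items():
    print(
        f"{cell_type}: "
        f"SE/(SE+TE) = {se_share(n_se, n_te):.1f}%, "
        f"SE/TE = {se_to_te_ratio(n_se, n_te):.1f}%"
    )
```

Under the first definition the B-cell and embryonic stem cell figures come out at 3.2% and 2.6%; for neural stem cells the two definitions give 4.5% and 4.7% respectively.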
Formation of super-enhancers
Hundreds of thousands of sites in the human genome can potentially act as enhancers. In one large 2020 study, 78 different types of human cells were examined for links between activated enhancers and genes coding for messenger RNA to produce gene products. Distributed among the 78 types of cells there were a total of 449,627 activated enhancers linked to 17,643 protein-coding genes.[20] With this large number of potentially active enhancers, some genome regions contain a cluster of enhancers that, when all are activated, can all loop to the same promoter and form a super-enhancer, driving a gene to very high messenger RNA output.
One well-studied gene, MYC, has amplified expression in as many as 70% of all cancers.[21] While about 28% of cases of MYC over-expression are due to genetic focal amplifications or translocations,[22] the majority are due to activated super-enhancers.[23] More than 10 different super-enhancers can cause MYC over-expression. In each of four tumor cell lines grown in culture (HCT-116, MCF7, K562 and Jurkat), there were three to five super-enhancers specific to that cell line.
In one 2013 study,[24] the length of typical enhancers was found to be about 700 base pairs while in the case of super-enhancers the length was about 9,000 base pairs (encompassing multiple single enhancers). A later study, in 2020, indicated that typical enhancers were about 200 nucleotides long and that there may be as many as 3.6 million potentially active enhancers occupying 21.55% of the human genome.[25]
In the nucleus of mammalian cells, almost all the DNA is wrapped around regularly spaced protein complexes, called nucleosomes (see top panel in Figure "Chromatin").[26] The protein complexes are composed of four pairs of histones: H2A, H2B, H3 and H4. The DNA plus these protein complexes is called chromatin (see Figure illustrating chromatin). Enhancer regions, as described above, are several hundred nucleotides long. For an enhancer to be activated, the nucleosomes must be evicted from the enhancer DNA so that the multiple transcription factors that bind to it have access to their binding sites (see bottom panel in Figure "Chromatin"). (For an enhancer to be active, more than 10 different binding sites in it must be occupied by different transcription factors.[25])
In the eviction of nucleosomes from enhancer DNA, a pioneer transcription factor first loosens the attachment of DNA to the nucleosome in an enhancer region. One transcription factor that does this is the pioneer transcription factor NF-kB.[28] Five steps follow: (1) NF-kB is acetylated by p300/CBP. (2) Acetylated NF-kB recruits a specific histone acetyltransferase enzyme, BRD4.[29] (3) BRD4 acetylates histone 3 at lysine 122 (see Figure “Nucleosome at enhancer with H3K122 acetylated”). (4) When histone 3 lysine 122 is acetylated, the nucleosome is evicted from the enhancer sequence.[30] (5) Opening up the enhancer DNA allows binding of the other transcription factors needed to form an activated enhancer. Presumably, when the activating signal for NF-kB is very strong, much more NF-kB is activated, and the greatly increased amount of NF-kB can start the process of activating multiple nearby enhancers at the same time, forming a super-enhancer.
Super-enhancers promote high levels of transcription
As described above, in forming a super-enhancer, BRD4 is complexed with NF-kB. This complex also recruits and forms a further complex with cyclin T1 and Cdk9. Cyclin T1/Cdk9 is also known as P-TEFb. P-TEFb acts as a kinase that phosphorylates RNA polymerase II (RNAP II), which then activates (in conjunction with the Mediator complex described below) the polymerase on the promoter of a gene to initiate transcription and to continue transcription (instead of pausing).[31]
The transcription factors, bound to their sites on each enhancer within the super-enhancer, recruit the Mediator complex between each enhancer and the RNA polymerase II that will initiate transcription of the gene to be actively transcribed (see Figure at top of article that illustrates a super-enhancer). The Mediator complex in humans is 1.4 MDa in size and includes 26 sub-units.[32] The tail modules of the Mediator complex protein sub-units interact with the activation domains of transcription factors bound at enhancers, and the head and middle modules interact with the pre-initiation complex (PIC) at gene promoters.[33] When certain of its sub-units are phosphorylated and activated by particular cyclin-dependent kinases (Cdk8, Cdk9, Cdk19, etc.), the Mediator complex promotes higher levels of transcription.
History
The regulation of transcription by enhancers has been studied since the 1980s.[34][35][36][37][38] Large or multi-component transcription regulators with a range of mechanistic properties, including locus control regions, clustered open regulatory elements, and transcription initiation platforms, were observed shortly thereafter.[39][40][41][42] More recent research has suggested that these different categories of regulatory elements may represent subtypes of super-enhancer.[5][43]
In 2013, two labs identified large enhancers near several genes especially important for establishing cell identities. While Richard A. Young and colleagues identified super-enhancers, Francis Collins and colleagues identified stretch enhancers.[3][4] Both super-enhancers and stretch enhancers are clusters of enhancers that control cell-specific genes and may be largely synonymous.[4][44]
As currently defined, the term “super-enhancer” was introduced by Young’s lab to describe regions identified in mouse embryonic stem cells (ESCs).[3] These particularly large, potent enhancer regions were found to control the genes that establish the embryonic stem cell identity, including Oct-4, Sox2, Nanog, Klf4, and Esrrb. Perturbation of the super-enhancers associated with these genes showed a range of effects on their target genes’ expression.[44] Super-enhancers have since been identified near cell identity regulators in a range of mouse and human tissues.[4][5][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59][60][61]
Function
The enhancers comprising super-enhancers share the functions of enhancers, including binding transcription factor proteins, looping to target genes, and activating transcription.[3][5][43][44] Three notable traits of enhancers comprising super-enhancers are their clustering in genomic proximity, their exceptional signal of transcription-regulating proteins, and their high frequency of physical interaction with each other. Perturbing the DNA of enhancers comprising super-enhancers showed a range of effects on the expression of cell identity genes, suggesting a complex relationship between the constituent enhancers.[44] Super-enhancers separated by tens of megabases cluster in three dimensions inside the nucleus of mouse embryonic stem cells.[62][63]
High levels of many transcription factors and co-factors are seen at super-enhancers (e.g., CDK7, BRD4, and Mediator).[3][5][10][11][13][14][43]
This high concentration of transcription-regulating proteins may explain why the target genes of super-enhancers tend to be more highly expressed than other classes of genes. However, housekeeping genes tend to be more highly expressed than super-enhancer-associated genes.[3]
Super-enhancers may have evolved at key cell identity genes to render the transcription of these genes responsive to an array of external cues.[44] The enhancers comprising a super-enhancer can each be responsive to different signals, which allows the transcription of a single gene to be regulated by multiple signaling pathways.[44] Pathways seen to regulate their target genes using super-enhancers include Wnt, TGFb, LIF, BDNF, and NOTCH.[44][64][65][66][67] The constituent enhancers of super-enhancers physically interact with each other and with their target genes over long distances along the DNA sequence.[12][46][68]
Super-enhancers that control the expression of major cell surface receptors with a crucial role in the function of a given cell lineage have also been defined. This is notably the case for B-lymphocytes, whose survival, activation and differentiation rely on the expression of membrane-form immunoglobulins (Ig). The Ig heavy chain locus super-enhancer is a very large (25 kb) cis-regulatory region that includes multiple enhancers and controls several major modifications of the locus (notably somatic hypermutation, class-switch recombination and locus suicide recombination).
Relevance to disease
Mutations in super-enhancers have been noted in various diseases, including cancers, type 1 diabetes, Alzheimer’s disease, lupus, rheumatoid arthritis, multiple sclerosis, systemic scleroderma, primary biliary cirrhosis, Crohn’s disease, Graves’ disease, vitiligo, and atrial fibrillation.[4][5][11][49][56][59][69][70][71][72][73] A similar enrichment in disease-associated sequence variation has also been observed for stretch enhancers.[4]
Super-enhancers may play important roles in the misregulation of gene expression in cancer. During tumor development, tumor cells acquire super-enhancers at key oncogenes, which drive higher levels of transcription of these genes than in healthy cells.[5][10][68][69][74][75][76][77][78][79][80][81][82][83] Altered super-enhancer function is also induced by mutations of chromatin regulators.[84] Acquired super-enhancers may thus be biomarkers that could be useful for diagnosis and therapeutic intervention.[44]
Proteins enriched at super-enhancers include the targets of small molecules that act on transcription-regulating proteins and have been deployed against cancers.[10][11][49][85] For instance, super-enhancers rely on exceptional amounts of CDK7, and, in cancer, multiple papers report the loss of expression of their target genes when cells are treated with the CDK7 inhibitor THZ1.[10][13][14][86] Similarly, super-enhancers are enriched in BRD4, the target of the small molecule JQ1, so treatment with JQ1 causes exceptional losses in expression for super-enhancer-associated genes.[11]
Identification
Super-enhancers have most commonly been identified by locating genomic regions that are highly enriched in ChIP-Seq signal. ChIP-Seq experiments targeting master transcription factors and co-factors such as Mediator or BRD4 have been used, but the most frequently used target is H3K27ac-marked nucleosomes.[3][5][11][87][88][89] The program “ROSE” (Rank Ordering of Super-Enhancers) is commonly used to identify super-enhancers from ChIP-Seq data. This program stitches together previously identified enhancer regions and ranks these stitched enhancers by their ChIP-Seq signal.[3] The stitching distance selected to combine multiple individual enhancers into larger domains can vary. Because some markers of enhancer activity are also enriched in promoters, regions within promoters of genes can be disregarded. ROSE separates super-enhancers from typical enhancers by their exceptional enrichment in a mark of enhancer activity. Homer is another tool that can identify super-enhancers.[90]
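The stitch-and-rank procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ROSE program itself: the function names, the input format (start, end, signal on a single chromosome), the default stitching distance of 12,500 base pairs, and the cutoff rule (the point on the rescaled rank-versus-signal curve where a line of slope 1 touches it) are all choices made here for the example. Promoter exclusion is omitted.

```python
from typing import List, Tuple

# Hypothetical input format: (start, end, ChIP-Seq signal) on one chromosome.
Region = Tuple[int, int, float]


def stitch(enhancers: List[Region], max_gap: int = 12_500) -> List[Region]:
    """Merge enhancers separated by at most max_gap bp, summing their signal."""
    stitched: List[Region] = []
    for start, end, signal in sorted(enhancers):
        if stitched and start - stitched[-1][1] <= max_gap:
            prev_start, prev_end, prev_signal = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def split_super(stitched: List[Region]) -> Tuple[List[Region], List[Region]]:
    """Rank stitched regions by signal and return (super, typical).

    Assumed cutoff: rescale rank and signal to 0..1 and cut at the point
    that minimizes (signal - rank), i.e. where a slope-1 line touches the
    curve. Regions ranked above that point are called super-enhancers.
    """
    ranked = sorted(stitched, key=lambda region: region[2])
    n = len(ranked)
    if n < 2 or ranked[-1][2] <= 0:
        return [], ranked
    top = ranked[-1][2]
    cut = min(range(n), key=lambda i: ranked[i][2] / top - i / (n - 1))
    return ranked[cut + 1:], ranked[:cut + 1]


# Toy data: three nearby enhancers form one cluster; four are isolated.
example = [
    (1_000, 1_500, 5.0), (3_000, 3_600, 7.0), (9_000, 9_800, 6.0),
    (100_000, 100_500, 2.0), (200_000, 200_400, 1.5),
    (300_000, 300_700, 1.0), (400_000, 400_300, 2.5),
]
supers, typical = split_super(stitch(example))
print("super-enhancers:", supers)
print("typical enhancers:", typical)
```

In the toy data, the three clustered enhancers are stitched into a single region with summed signal 18.0, which is the only region ranked above the cutoff; the four isolated enhancers remain typical.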
Cameron A, Wakelin G, Gaulton N, Young LV, Wotherspoon S, Hodson N, Lees MJ, Moore DR, Johnston AP (December 2022). "Identification of underexplored mesenchymal and vascular-related cell populations in human skeletal muscle". Am J Physiol Cell Physiol. 323 (6): C1586–C1600. doi:10.1152/ajpcell.00364.2022. PMID 36342160.
Jang MK, Mochizuki K, Zhou M, Jeong HS, Brady JN, Ozato K (August 2005). "The bromodomain protein Brd4 is a positive regulatory component of P-TEFb and stimulates RNA polymerase II-dependent transcription". Mol Cell. 19 (4): 523–34. doi:10.1016/j.molcel.2005.06.027. PMID 16109376.
Cellier M, Belouchi A, Gros P (June 1996). "Resistance to intracellular infections: comparative genomic analysis of Nramp". Trends in Genetics. 12 (6): 201–4. doi:10.1016/0168-9525(96)30042-5. PMID 8928221.
Koch F, Fenouil R, Gut M, Cauchy P, Albert TK, Zacarias-Cabeza J, Spicuglia S, de la Chapelle AL, Heidemann M, Hintermair C, Eick D, Gut I, Ferrier P, Andrau JC (August 2011). "Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters". Nature Structural & Molecular Biology. 18 (8): 956–63. doi:10.1038/nsmb.2085. PMID 21765417. S2CID 12778976.