C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein (NCBI ID: CAH18189.1,[4] henceforth referred to as C2orf16) is 1,984 amino acids long.[5] The gene contains 1 exon and is located at 2p23.3.[6] Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence.[7]
68 orthologs are known for this gene, including in mice and sheep, but no paralogs have been found.[8]
Gene
Isoform 2 of C2orf16 is encoded by a 6.2 kb, single-exon gene at locus 2p23.3. The encoded protein contains P-S-E-R-S-H-H-S repeats toward its C-terminus, from amino acid 1,559 to 1,903. These repeats appear to have arisen from a transposable element. Primate orthologs contain more P-S-E-R-S-H-H-S repeats than other mammalian orthologs do.[6]
Expression of C2orf16 does not appear to be sensitive to rapamycin.[13] Its expression increases significantly in c-MYC knockdown breast cancer cells.[14]
mRNA
Isoforms
Two isoforms of C2orf16 exist. Isoform 1 is 5,388 amino acids long and is encoded in 5 exons over 16,401 base pairs. Isoform 2 uses an alternate transcription start site and is considerably shorter, at 1,984 amino acids, encoded in 1 exon over 6,200 base pairs.[8]
Expression Regulation
One miRNA is predicted to bind to the 3'UTR of C2orf16, accession number MI0005564.[15][16]
Protein
C2orf16 has a predicted molecular weight of 224 kDa and a predicted isoelectric point of 10.08,[17] values that are relatively constant between orthologs. The protein has a higher than average content of serine, histidine, and arginine and a lower than average content of alanine.[18]
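As a rough plausibility check on the figures above, the following minimal sketch estimates protein mass from length and computes residue composition. It assumes the common rule of thumb of about 110 Da per residue, which is not taken from the cited sources, and the function names are hypothetical. The composition example runs on a single copy of the repeat motif, not on the real C2orf16 sequence.

```python
from collections import Counter

# Assumption: rule-of-thumb average residue mass, not a value from the cited sources.
AVG_RESIDUE_MASS_DA = 110.0


def rough_mass_kda(n_residues):
    """Estimate protein mass in kDa from its length alone."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def composition(seq):
    """Return the fraction of each amino acid (one-letter code) in seq."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: n / total for aa, n in counts.items()}


# Length of isoform 2 as stated in the article.
estimate = rough_mass_kda(1984)

# Composition of one copy of the repeat motif (illustrative input only).
motif_fractions = composition("PSERSHHS")

print(f"rough mass: {estimate:.1f} kDa")
print(motif_fractions)
```

The length-only estimate comes out somewhat below the predicted 224 kDa. That direction is what one would expect for a protein enriched in heavier residues such as arginine and histidine and depleted in alanine, although the sketch itself does not demonstrate this.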
Compositional Features
A positive charge cluster is found from amino acid residues 1,274 to 1,302.[18]
An arginine rich region is found from amino acids 1,545 to 1,933, a serine rich region is found from amino acids 1,568 to 1,934, and a histidine rich region is found from amino acids 1,630 to 1,853.[18]
A dot matrix analysis[19] reveals a heavily repeated region from approximately residue 1,500 to 1,984, corresponding to the P-S-E-R-S-H-H-S repeat. A small band of dots at approximately amino acid 1,200 denotes a half repeat of the P-S-E-R-S-H-H-S sequence.
C2orf16 is predicted to be localized to the nucleus after translation.[8]
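The idea behind a dot matrix self-comparison can be illustrated with a minimal sketch. It is not the tool used in the cited analysis. The function name, window size, and threshold are assumptions, and the toy sequence is made up for illustration (it is not the C2orf16 sequence). A tandem repeat shows up as dots parallel to the main diagonal, offset by multiples of the repeat length.

```python
def dot_matrix(seq_a, seq_b, window=8, min_matches=8):
    """Return (i, j) pairs where the windows starting at i in seq_a and
    j in seq_b share at least min_matches identical positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# Hypothetical toy sequence: arbitrary flank, three tandem repeats, arbitrary flank.
toy = "MKTAYIAKQR" + "PSERSHHS" * 3 + "GLVDE"

dots = dot_matrix(toy, toy)

# Dots off the main diagonal reveal internal repeats.
off_diagonal = [(i, j) for i, j in dots if i != j]

for i, j in off_diagonal:
    print(i, j, j - i)
```

In this toy case every off-diagonal dot sits at an offset that is a multiple of 8, the length of the repeat unit. Lowering `min_matches` below `window` would tolerate mismatches and could pick up partial copies, such as a half repeat.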
Structure
The 3D structure of C2orf16 is predicted to have three major domains. Domain 1 spans amino acids 1 to 662, domain 2 spans amino acids 674 to 1,487, and domain 3 spans amino acids 1,488 to 1,984.[27] Domains 1 and 2 are predicted to be connected by a stretch of 12 amino acids that is not otherwise organized into a secondary structure, allowing flexibility between the two domains. Domain 2 is predicted to have protein-interacting domains for transcription factors.[27] Domain 3 is predicted to follow a "balls on a string" structure[27] and has many possible phosphorylation sites.[28]
68 orthologs are known for C2orf16.[8] The protein seems to have appeared in mammalian evolutionary history 320 million years ago, around the divergence of mammals from reptiles. This history would explain why orthologs do not exist in amphibians, reptiles, birds, or other more distantly related species.[30]
The P-S-E-R-S-H-H-S repeat sequence is conserved in orthologs of C2orf16, and is also conserved in organisms as distantly related as oomycetes, slime molds,[31] and plants, including the chloroplasts of Ashby's Wattle.[32] The S-P-S-E-R portion of the repeat appears to be the most strongly conserved, as seen by alignment with these orthologs and by creation of a Logo.[33]
The conservation analysis of the repeat shows that the initial S-P-S is highly conserved, possibly for phosphorylation (S) and structure (P), and that the R is almost completely conserved, mutating to a lysine in some orthologs,[32] implying that the positive charge is necessary for the purpose of the repeat.
The 3D shape of the repeat sequence is unclear, as it has been predicted to be either a balls-on-a-string[34] or an antiparallel beta-sheet[6] structure.
Function
C2orf16 isoform 2 is predicted to have a possible function in mitosis regulation through its nuclear localization,[8][21] predicted transcription factor binding site,[27] physical association with Myc,[29] and increased expression in c-MYC knockdown breast cancer cells.[14]
Blom N, Gammeltoft S, Brunak S (December 1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.