Protein fold class, typically <100 amino acids long
Small proteins are a diverse fold class of proteins (usually <100 amino acids long).[1][2][3] Their tertiary structure is usually maintained by disulphide bridges,[4] metal ligands,[5] and/or cofactors such as heme. Some small proteins serve important regulatory functions by interacting directly with certain enzymes and are therefore also an interesting tool for biotechnological applications in microorganisms.[6]
Identification of small proteins
For a long time, the small size of these proteins limited their identification and characterization. However, the various examples of functional small proteins have led to the development of methods for their identification.
For larger open reading frames (ORFs), computational identification is based solely on their long uninterrupted coding potential. Computational searches for small proteins instead take into account multiple parameters, such as the presence of a ribosome binding site and amino acid conservation.[7] Available RNA sequencing or mass spectrometric data sets are also incorporated into computational predictions.[8][9]
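The following is a minimal sketch of such a multi-parameter search, not a published tool. It scans the forward strand of a DNA sequence for short ORFs and records whether a Shine-Dalgarno-like motif lies upstream of the start codon. The function name `find_small_orfs`, the set of start codons, the motif pattern, the length limits, and the upstream window are all illustrative assumptions; conservation scoring and the reverse strand are omitted.

```python
import re

# Illustrative assumptions, not values taken from the cited literature.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Simplified Shine-Dalgarno-like core motifs (assumed pattern).
SD_MOTIF = re.compile(r"AGGAGG|GGAGG|AGGAG|GAGG|AGGA|GGAG")


def find_small_orfs(seq, min_codons=10, max_codons=100, rbs_window=(5, 20)):
    """Return candidate small ORFs on the forward strand of `seq`.

    Each hit is (start, end, has_rbs): 0-based start, exclusive end
    (stop codon included), and whether an SD-like motif was found
    between rbs_window[0] and rbs_window[1] nucleotides upstream.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 5):
        if seq[start:start + 3] not in START_CODONS:
            continue
        # Walk codon by codon in the same frame until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # codons before the stop
                if min_codons <= n_codons <= max_codons:
                    lo = max(0, start - rbs_window[1])
                    hi = max(0, start - rbs_window[0])
                    has_rbs = bool(SD_MOTIF.search(seq[lo:hi]))
                    hits.append((start, pos + 3, has_rbs))
                break
    return hits


if __name__ == "__main__":
    # Toy sequence: SD-like motif, spacer, then a 13-codon ORF plus stop.
    toy = "AGGAGG" + "C" * 7 + "ATG" + "GCT" * 12 + "TAA"
    print(find_small_orfs(toy))
```

In practice the motif flag would be only one of several features, combined with conservation and experimental evidence, rather than a hard filter.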
A method extensively used for the identification of small proteins is ribosome profiling (Ribo-seq or ribosome footprinting). Ribosome profiling uses next-generation sequencing to read only the mRNA fragments protected by ribosomes. Binding of a ribosome to an mRNA suggests that the transcript is being actively translated, allowing for the identification of even very small ORFs.[10]
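As a rough illustration of how footprint data can support a candidate ORF, the sketch below counts ribosome-protected reads whose inferred P-site falls inside the ORF and reports how they distribute over the three reading frames. A strong bias toward one frame is commonly taken as a sign of active translation. The function name `footprint_support`, the input format (5' end positions of aligned reads on the forward strand), and the fixed P-site offset of 12 nucleotides are assumptions for this example; real analyses calibrate the offset per read length and organism.

```python
from collections import Counter


def footprint_support(orf, five_prime_ends, psite_offset=12):
    """Summarise footprint evidence for one candidate ORF.

    orf: (start, end), 0-based half-open coordinates on the forward strand.
    five_prime_ends: iterable of 5' end positions of aligned footprints.
    psite_offset: assumed distance from a read's 5' end to the P-site.

    Returns (total, fractions), where fractions[f] is the share of
    in-ORF footprints whose P-site lies in reading frame f (0, 1, or 2).
    """
    start, end = orf
    frames = Counter()
    for five_prime in five_prime_ends:
        psite = five_prime + psite_offset
        if start <= psite < end:
            frames[(psite - start) % 3] += 1
    total = sum(frames.values())
    fractions = [frames[f] / total if total else 0.0 for f in range(3)]
    return total, fractions


if __name__ == "__main__":
    # Toy data: five reads map into the ORF, one falls outside it.
    total, fractions = footprint_support((13, 55), [1, 4, 7, 10, 11, 100])
    print(total, fractions)
```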
Mass spectrometry is the best method thus far for identifying small proteins, but their small size again poses a barrier. However, several adjustments can be made to improve detection and data quality.[11]