Protein motifs are small regions of protein three-dimensional structure or amino acid sequence shared among different proteins. They are recognizable regions of protein structure that may be defined by a unique chemical or biological function. Biological sequence motifs are short, usually fixed-length sequence patterns that may represent important structural or functional features in nucleic acid and protein sequences, such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces. They occur in an exact or approximate form within a family or subfamily of sequences. Motif discovery is therefore an important challenge in bioinformatics, and several methods have been developed to identify motifs shared by a set of functionally related sequences.
Structural motifs, also known as supersecondary structures, are particular arrangements and combinations of two or three secondary structure elements, often with a defined topology.
An example of a motif that generally plays a structural role is the beta-turn. A beta-turn consists of four consecutive residues in which the polypeptide chain folds back on itself by nearly 180 degrees.
The beta-hairpin, or beta-beta motif, is present in most antiparallel beta structures, both as an isolated ribbon and as part of beta sheets.
The helix-loop-helix motif is found in DNA-binding proteins and also in calcium-binding proteins. In calcium-binding proteins, this motif is often called the EF hand.
The zinc finger is another motif commonly found in proteins that bind RNA and DNA. This finger-like structure consists of an α helix and two short antiparallel β strands, all held together by a zinc ion that is coordinated by two conserved cysteine and two conserved histidine side chains.
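Because the zinc-coordinating cysteines and histidines are conserved, a zinc finger can also be sketched as a sequence pattern. The example below is a minimal illustration that assumes a simplified spacing, C-x(2,4)-C-x(12)-H-x(3,5)-H, which is often quoted for this class of finger; it is not a curated database signature. The names `C2H2_PATTERN` and `find_c2h2_candidates` and the test sequence are hypothetical and chosen only for the example.

```python
import re

# Simplified spacing pattern (an assumption for illustration):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_candidates(sequence):
    """Return (start_index, matched_substring) for each non-overlapping match.

    A match only indicates that the residue spacing is compatible with the
    simplified pattern; it does not show that the region folds into a finger.
    """
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    # Made-up toy sequence with the expected Cys/His spacing.
    toy_sequence = "YKCPECGKSFSQKSNLQKHQRTH"
    for start, match in find_c2h2_candidates(toy_sequence):
        print(start, match)
```

A regular expression only captures fixed spacing between conserved residues, so it will report false positives and miss variant fingers; profile-based methods are generally preferred for real annotation.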
Methods used for the prediction of motifs are discussed below.
Alignment‐based methods for motif discovery first construct a multiple sequence alignment of the set of sequences, in which each amino acid sequence is typically represented as a row of a matrix. Gaps are inserted between the amino acids so that identical or similar characters are aligned in the same columns. Once the multiple alignment is constructed, patterns are extracted from it by combining the substrings common to most of the sequences.
The advantage of the alignment‐based approach is that no upper limit has to be imposed on the length of the motifs. Moreover, these algorithms usually do not require, as input, a maximum threshold on the distance between a motif and its occurrences in the sequences.
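The pattern-extraction step of the alignment-based approach can be sketched as follows. This is a minimal illustration that assumes the multiple alignment has already been built by some other tool and is given as equal-length strings with `-` for gaps. The function name `conserved_blocks` and the parameters `min_fraction` and `min_length` are hypothetical, and "common to most of the sequences" is interpreted here as a per-column majority fraction, which is only one possible reading.

```python
from collections import Counter


def conserved_blocks(alignment, min_fraction=0.8, min_length=3):
    """Return (start_column, consensus) for runs of conserved columns.

    A column counts as conserved when its most common character is not a
    gap and appears in at least `min_fraction` of the rows. Runs shorter
    than `min_length` columns are discarded.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    blocks = []
    current = []  # consensus residues of the run being built
    start = 0     # first column of the run being built
    for col in range(width):
        column = [row[col] for row in alignment]
        residue, count = Counter(column).most_common(1)[0]
        conserved = residue != "-" and count / len(alignment) >= min_fraction
        if conserved:
            if not current:
                start = col
            current.append(residue)
        else:
            if len(current) >= min_length:
                blocks.append((start, "".join(current)))
            current = []
    if len(current) >= min_length:
        blocks.append((start, "".join(current)))
    return blocks


if __name__ == "__main__":
    # Made-up toy alignment; each row is one aligned sequence.
    toy_alignment = [
        "MK-CPECGKSF",
        "MR-CPECGKAF",
        "-KACPECGRSF",
        "MKTCEECGKSF",
    ]
    print(conserved_blocks(toy_alignment, min_fraction=0.75))
```

Note that no motif length is fixed in advance: the length of each reported block follows from how far the conserved columns extend, which mirrors the advantage described above.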
The vast majority of motif discovery methods in bioinformatics are alignment‐free approaches that do not depend on the initial construction of a multiple sequence alignment. Instead, they generally search for patterns that are overrepresented in a given set of sequences. The simplest solution is to generate all possible motifs up to a maximum length, and then to search separately for the approximate occurrences of each motif in the set of sequences. Once a list of candidate patterns is obtained, those with the highest significance scores are selected. This approach is guaranteed to find all motifs that satisfy the input constraints.
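The exhaustive strategy described above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the text: approximate occurrence is measured by Hamming distance, a single motif length is searched rather than every length up to a maximum, the alphabet is restricted to the letters present in the input, and the number of supporting sequences stands in for a significance score. The function names and parameters (`enumerate_motifs`, `max_mismatches`, `quorum`) are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approximately(motif, sequence, max_mismatches):
    """True if some window of `sequence` is within `max_mismatches` of `motif`."""
    k = len(motif)
    return any(
        hamming(motif, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def enumerate_motifs(sequences, length, max_mismatches, quorum):
    """Try every string of the given length and keep those with enough support.

    Returns (motif, support) pairs, where support is the number of sequences
    containing an approximate occurrence, sorted by decreasing support and
    then alphabetically.
    """
    alphabet = sorted(set("".join(sequences)))
    hits = []
    for letters in product(alphabet, repeat=length):
        motif = "".join(letters)
        support = sum(
            occurs_approximately(motif, seq, max_mismatches) for seq in sequences
        )
        if support >= quorum:
            hits.append((motif, support))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    # Made-up toy sequences sharing the tripeptide CPE.
    toy_sequences = ["MKCPEA", "ACPEKM", "KCPEAM"]
    print(enumerate_motifs(toy_sequences, length=3, max_mismatches=0, quorum=3))
```

The number of candidates grows as the alphabet size raised to the motif length (20 to the power of the length for the full amino acid alphabet), so this brute-force sketch is only practical for short motifs.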
Tools used for the prediction of motifs
FIRE‐pro stands for finding informative regulatory elements in proteins.
qPMS stands for quorum planted motif search.
The MEME Suite