BioCodeKb - Bioinformatics Knowledgebase

Transcription is the first step in gene expression. It involves copying a gene's DNA sequence to make an RNA molecule, which carries the information needed for protein synthesis. Transcription is performed by enzymes called RNA polymerases, which link nucleotides to form an RNA strand.

Not all genes are transcribed all the time. Instead, transcription is controlled individually for each gene. Cells carefully regulate transcription, transcribing just the genes whose products are needed at a particular moment.

First, pre-messenger RNA (pre-mRNA) is formed by RNA polymerase enzymes. The process relies on Watson-Crick base pairing, so the resulting single strand of RNA is the reverse complement of the DNA template strand, with uracil in place of thymine. The pre-mRNA is then "edited" to produce the desired mRNA molecule in a process called RNA splicing.
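The base-pairing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the enzyme: the function name `transcribe_template` and the example sequence are made up for this sketch, and the input is assumed to contain only the letters A, C, G and T.

```python
# Map each DNA base on the template strand to the RNA base that pairs with it.
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")

def transcribe_template(template_5to3: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 5'->3'."""
    # Reverse first so the template is read 3'->5', then pair each base.
    return template_5to3.upper()[::-1].translate(DNA_TO_RNA_COMPLEMENT)

# Hypothetical template strand, written 5'->3'.
print(transcribe_template("TTACGCAT"))  # AUGCGUAA
```

Reversing before pairing is what makes the result a reverse complement rather than a plain complement.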

As happens in DNA replication, the double helix must partially unwind before transcription can take place.

Only one strand of DNA is transcribed. The strand that contains the gene is called the sense (coding) strand, while the complementary strand is the antisense (template) strand. The mRNA produced in transcription is a copy of the sense strand, with uracil in place of thymine, but it is the antisense strand that is actually read by RNA polymerase.
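A short Python check can make the sense/antisense relationship concrete. The sketch below assumes both strands are written 5'->3' and uses a made-up nine-base sequence; the function names are illustrative only.

```python
def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return dna.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))

def mrna_from_sense(sense: str) -> str:
    """mRNA as a copy of the sense strand, with U in place of T."""
    return sense.upper().replace("T", "U")

def mrna_from_antisense(antisense: str) -> str:
    """mRNA built by pairing ribonucleotides against the antisense strand."""
    return antisense.upper()[::-1].translate(str.maketrans("ACGT", "UGCA"))

sense = "ATGGCCTAA"                    # hypothetical sense strand, 5'->3'
antisense = reverse_complement(sense)  # TTAGGCCAT

# Both routes give the same mRNA: copying the sense strand (T -> U)
# or pairing against the antisense strand.
assert mrna_from_sense(sense) == mrna_from_antisense(antisense)
print(antisense, mrna_from_sense(sense))  # TTAGGCCAT AUGGCCUAA
```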

Stages of Transcription

The process of transcription can be broadly divided into three main stages: initiation, elongation and termination.


Initiation

Transcription is catalyzed by the enzyme RNA polymerase. It attaches to and moves along the DNA molecule until it recognizes a promoter sequence, which marks the starting point of transcription. There may be multiple promoter sequences in a DNA molecule. Transcription factors are proteins that control the rate of transcription. They also bind at promoter sequences, alongside RNA polymerase.

Once bound to the promoter sequence, RNA polymerase unwinds a portion of the DNA double helix, exposing the bases on each of the two DNA strands.
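As a rough computational analogy for promoter recognition, one can scan a sequence for a fixed motif. The sketch below is an assumption-laden simplification: it uses exact string matching, whereas real promoters are recognized through degenerate, protein-mediated binding. The motif `TATAAT` (the bacterial -10 consensus) is chosen here purely for illustration and is not named in the text above; the example sequence is made up.

```python
def find_promoters(dna: str, motif: str = "TATAAT") -> list[int]:
    """Return 0-based start positions of every occurrence of motif in dna."""
    dna, motif = dna.upper(), motif.upper()
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)  # step by 1 to allow overlapping hits
    return positions

sequence = "GGCTATAATGCCATGAAATATAATCC"  # made-up sequence with two motif copies
print(find_promoters(sequence))  # [3, 18]
```

Two hits in one sequence mirror the point that a DNA molecule may carry multiple promoters.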


Elongation

One DNA strand, the template strand, is read in a 3′ to 5′ direction and so provides the template for the new mRNA molecule. The other DNA strand is known as the coding strand, because the base sequence of the new mRNA is identical to it, except that thymine bases are replaced with uracil.

RNA polymerase uses incoming ribonucleotides to form the mRNA strand. It selects them by complementary base pairing with the template (template A pairs with U, T with A, C with G and G with C). RNA polymerase then catalyzes the formation of phosphodiester bonds between adjacent ribonucleotides. Nucleotides can only be added to the 3′ (three-prime) end, so the strand elongates in a 5′ to 3′ direction.
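The directionality described above can be sketched as a loop that walks the template one base at a time. This is a toy model under simple assumptions: the template is supplied already written 3′ to 5′, it contains only A, C, G and T, and the names `PAIRING` and `elongate` are invented for this example.

```python
# Template DNA base -> ribonucleotide that pairs with it.
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}

def elongate(template_3to5: str):
    """Yield the growing RNA (written 5'->3') after each ribonucleotide is added."""
    rna = ""
    for base in template_3to5.upper():  # the template is read 3'->5'
        rna += PAIRING[base]            # each new nucleotide joins the 3' end
        yield rna

# Hypothetical template, written 3'->5'.
for step in elongate("TACGGT"):
    print(step)
# A
# AU
# AUG
# AUGC
# AUGCC
# AUGCCA
```

Because new nucleotides are only ever appended on the right, the first base printed is the 5′ end and the last is the 3′ end.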


Termination

Elongation continues until RNA polymerase encounters a stop (terminator) sequence. At this point, transcription stops and RNA polymerase releases the DNA template and the newly made RNA transcript.
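Termination can be added to a toy transcription model by stopping at the first occurrence of a stop sequence. The sketch below is a deliberate simplification: real terminators are more complex than a fixed string, the terminator `TTTTTT` is a made-up placeholder, and the assumption that the terminator itself is not transcribed is a modelling choice for this example only.

```python
def transcribe_until_terminator(template_3to5: str, terminator: str) -> str:
    """Transcribe a template (given 3'->5') up to, not including, the terminator."""
    template = template_3to5.upper()
    stop = template.find(terminator.upper())
    if stop == -1:
        stop = len(template)  # no terminator found: read to the end of the template
    pairing = str.maketrans("ACGT", "UGCA")
    return template[:stop].translate(pairing)

# Hypothetical template (3'->5') with a placeholder terminator in the middle.
print(transcribe_until_terminator("TACGGTATTTTTTGCA", "TTTTTT"))  # AUGCCAU
```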

In alternative splicing, individual exons are either spliced out (skipped) or included, giving rise to many different possible mRNA products. Each mRNA product codes for a different protein isoform; these protein isoforms differ in their peptide sequence and therefore in their biological activity. Estimates of how many human genes undergo alternative splicing range from about 60% in older studies to more than 90% of multi-exon genes in more recent transcriptome sequencing studies.
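The include-or-skip idea can be sketched by enumerating every combination of optional exons. The exon sequences and the choice of which exons are optional are invented for this illustration, and the model assumes each optional exon is skipped or kept independently of the others.

```python
from itertools import product

def isoforms(exons: list[str], optional: set[int]) -> list[str]:
    """Return every mRNA obtained by including or skipping each optional exon.

    exons    -- exon sequences in genomic order
    optional -- indices of exons that may be skipped; all others are always kept
    """
    optional_sorted = sorted(optional)
    results = []
    for choice in product([True, False], repeat=len(optional_sorted)):
        keep = dict(zip(optional_sorted, choice))
        # Exons not listed as optional default to being kept.
        results.append("".join(e for i, e in enumerate(exons) if keep.get(i, True)))
    return results

exons = ["AUGGCU", "GAA", "CCG", "UAA"]  # made-up exon sequences
for mrna in isoforms(exons, optional={1, 2}):
    print(mrna)
# AUGGCUGAACCGUAA
# AUGGCUGAAUAA
# AUGGCUCCGUAA
# AUGGCUUAA
```

Two optional exons already give four distinct mRNAs from one gene.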

Alternative splicing contributes to protein diversity: a single gene transcript (RNA) can have many, in some cases thousands of, different splicing patterns and can therefore code for many different proteins. In this way, a diverse proteome is generated from a relatively limited genome. Splicing is also important in genetic regulation, because altering the splicing pattern in response to cellular conditions changes protein expression.
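A back-of-the-envelope count shows how quickly splicing patterns multiply. The sketch below gives an upper bound under two assumptions that are not from the text above: cassette exons are included or skipped independently, and each cluster of mutually exclusive exons contributes exactly one exon. The function name and the example numbers are hypothetical.

```python
from math import prod

def count_splice_patterns(cassette_exons: int, exclusive_clusters: list[int]) -> int:
    """Upper bound on distinct splice patterns for one gene.

    cassette_exons     -- exons that are independently included or skipped
    exclusive_clusters -- sizes of clusters from which exactly one exon is chosen
    """
    return 2 ** cassette_exons * prod(exclusive_clusters)

print(count_splice_patterns(10, []))       # 1024
print(count_splice_patterns(3, [12, 20]))  # 1920
```

Ten independent cassette exons are enough to pass a thousand patterns, which is how a single gene can reach the numbers mentioned above.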

