
Conserved Domain Database (CDD)

BioCodeKb - Bioinformatics Knowledgebase

A protein domain is a conserved part of a protein sequence and tertiary structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded.


CDD is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins. CDD content includes NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases such as Pfam, SMART, COG, PRK, and TIGRFAMs.


CD-Search is NCBI's interface for searching the Conserved Domain Database with protein or nucleotide query sequences. It uses RPS-BLAST, a variant of PSI-BLAST, to quickly scan a set of pre-calculated position-specific scoring matrices (PSSMs) with a protein query. The results of CD-Search are presented as an annotation of protein domains on the user query sequence, and can also be viewed as domain multiple sequence alignments with the user query embedded. The CD-Search Help provides additional details, including information about running CD-Search locally.
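To make the idea of scanning a PSSM with a protein query more concrete, here is a minimal Python sketch. It is a toy, gapless sliding-window scan and not the RPS-BLAST algorithm itself, which relies on BLAST heuristics and proper statistics. The three-column matrix, the default penalty of -4, and the function names `score_window` and `best_hit` are all assumptions made up for illustration.

```python
# Toy illustration only: a gapless sliding-window scan, NOT the RPS-BLAST algorithm.

DEFAULT_PENALTY = -4  # assumed score for residues a column does not list


def score_window(pssm, window):
    """Sum the per-position scores of `window` under `pssm`.

    `pssm` is a list of dicts, one per model column, mapping residue -> score.
    """
    return sum(column.get(residue, DEFAULT_PENALTY)
               for column, residue in zip(pssm, window))


def best_hit(pssm, query):
    """Return (best_score, start_index) for the best gapless placement
    of the model on the query sequence."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the model")
    scores = [(score_window(pssm, query[i:i + width]), i)
              for i in range(len(query) - width + 1)]
    # max() keeps the first (leftmost) window when scores tie
    return max(scores, key=lambda pair: pair[0])


# Hypothetical 3-column model that favours the motif "GKS"
TOY_PSSM = [
    {"G": 6, "A": 1},
    {"K": 5, "R": 2},
    {"S": 4, "T": 3},
]

if __name__ == "__main__":
    print(best_hit(TOY_PSSM, "MAGKSLLGRT"))
```

Each column of the matrix has its own scores, so the same residue can be rewarded at one position and penalized at another. That is what distinguishes a position-specific matrix from a single substitution matrix applied uniformly along the sequence.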

Batch CD-Search serves as both a web application and a script interface for running a conserved domain search on multiple protein sequences, accepting up to 4,000 proteins in a single job. It lets you view a graphical display of the concise or full search result for any individual protein from the input list, or download the results for the complete set of proteins. The Batch CD-Search Help provides additional details.
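Because a single job accepts at most 4,000 proteins, a larger protein set has to be divided before submission. The following Python sketch shows one way to do that. It only prepares the batches and does not contact NCBI; the helper names `read_fasta` and `split_into_jobs` are hypothetical and not part of any NCBI tool.

```python
MAX_PROTEINS_PER_JOB = 4000  # per-job limit stated in the text above


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_jobs(records, limit=MAX_PROTEINS_PER_JOB):
    """Split records into consecutive batches of at most `limit` proteins."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [records[i:i + limit] for i in range(0, len(records), limit)]


if __name__ == "__main__":
    # 9,000 made-up records should yield three jobs: 4000, 4000, 1000
    fake = [(f"protein_{i}", "MK") for i in range(9000)]
    print([len(job) for job in split_into_jobs(fake)])
```

Each resulting batch can then be submitted as its own job, through either the web form or the script interface described in the Batch CD-Search Help.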


Features of CDD

  • Identify the putative function of a protein sequence.

  • Identify a protein's classification based on domain architecture.

  • Identify the specific amino acids in a protein sequence that are putatively involved in functions such as binding or catalysis, as mapped from conserved domain annotations to the query sequence.

  • View a protein query sequence embedded within the multiple sequence alignment of a domain model.

  • Interactively visualize the 3D structure of a conserved domain.

  • Find other proteins with similar domain architecture.

  • Interactively view the phylogenetic sequence tree for a conserved domain model of interest, with or without a query sequence embedded.

  • Annotate structural motifs.
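Two of the features above, classifying a protein by its domain architecture and finding other proteins with a similar architecture, can be sketched in a few lines of Python. This is a simplified illustration, not how CDD computes architectures: it assumes an architecture is just the N-to-C order of domain names, and the protein names, domain labels, and coordinates are invented for the example.

```python
from collections import defaultdict


def domain_architecture(hits):
    """Return the domain names of one protein ordered from N- to C-terminus.

    `hits` is an iterable of (domain_name, start, end) tuples.
    """
    ordered = sorted(hits, key=lambda hit: (hit[1], hit[2]))
    return tuple(name for name, _start, _end in ordered)


def group_by_architecture(annotations):
    """Map each architecture to the list of proteins that share it.

    `annotations` maps protein id -> list of (domain_name, start, end).
    """
    groups = defaultdict(list)
    for protein, hits in annotations.items():
        groups[domain_architecture(hits)].append(protein)
    return dict(groups)


# Hypothetical annotations, made up for illustration
EXAMPLE = {
    "protA": [("SH3", 60, 115), ("SH2", 130, 220), ("PKc", 250, 510)],
    "protB": [("PKc", 240, 500), ("SH3", 55, 110), ("SH2", 125, 215)],
    "protC": [("PKc", 10, 270)],
}

if __name__ == "__main__":
    for architecture, proteins in group_by_architecture(EXAMPLE).items():
        print(" + ".join(architecture), "->", proteins)
```

Here protA and protB end up in the same group because their domains occur in the same order, even though the hits were listed differently and the coordinates are not identical.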


The collection is also part of NCBI's Entrez query and retrieval system and is cross-linked to many other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences.


The majority of protein domain models in CDD are "singletons" that do not form clusters with other domain models.


Many large classifications for common and functionally diverse domain families have been updated or added to CDD, such as comprehensive hierarchies of models representing the catalytic domains of protein kinases, type 2 periplasmic binding proteins, globins and globin-like domains, pleckstrin-homology domains, RNA recognition motifs, SH3 domains, immunoglobulin domains, LIM domains, UBA_like domains, and the thioredoxin superfamily.
