Tag: Conserved Domains Database (CDD)

Conserved Domain Database Version 3.21 Now Available!

Check out the newly released Conserved Domain Database (CDD) version 3.21. Updated content is available on the CDD FTP site.

What’s New?

1,174 new or updated NCBI-curated domains
Mirrors Pfam version 35 as well as new models from the NCBIfams collection and revised models from the Clusters of Orthologous Genes (COG) database
Fine-grained classifications of the following domain families:

Continue reading “Conserved Domain Database Version 3.21 Now Available!” →

Comparing Yeast Species Used in Beer Brewing and Bread Making

Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms

The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in other, related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this information may provide valuable insight for other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that. Continue reading “Comparing Yeast Species Used in Beer Brewing and Bread Making” →

Read About NCBI Resources in 2023 Nucleic Acids Research Database Issue

The 2023 Nucleic Acids Research Database Issue features papers from NCBI staff on GenBank, Conserved Domain Database, and more. The citations are available in PubMed, with full text available in PubMed Central (PMC). To read an article, click on the PMCID listed below. Continue reading “Read About NCBI Resources in 2023 Nucleic Acids Research Database Issue” →

RefSeq Release 216

RefSeq release 216 is now available online, from the FTP site, and through NCBI’s new resource, Datasets.

This full release incorporates genomic, transcript, and protein data available as of January 9, 2023, and contains 342,395,932 records, including 249,868,639 proteins, 49,869,497 RNAs, and sequences from 128,299 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings. Continue reading “RefSeq Release 216” →

Conserved Domain Database version 3.20 is available!

A new version of the Conserved Domain Database (CDD) is now available. Version 3.20 contains 1,614 new or updated NCBI/CDD-curated domains and now mirrors Pfam version 34 as well as new models from the NCBIfam collection. Fine-grained classifications of the [(+)ssRNA] virus RNA-dependent RNA polymerase catalytic domain, RING-finger/U-box, dimerization/docking domains of the cAMP-dependent protein kinase regulatory subunit, and Galactose/rhamnose-binding lectin domain superfamily have been added, along with many other new models.

We have significantly increased the fraction of CD-Search and interactive Batch CD-Search queries that yield results showing conserved domain architecture information and attributes that further characterize protein function through links to information-rich resources such as Enzyme Commission (EC) numbers, Gene Ontology (GO) terms, PubMed IDs, and identifiers from the CAZy, TCDB, and MEROPS databases. See our earlier post for additional details. You can access CDD and find updated content on the CDD FTP site at CDD version 3.20.

Database statistics for CDD version 3.20:

Models	Source
64,234	Total models from all source databases, organized into 4,541 multi-model superfamilies
18,882	NCBI CDD curation effort
1,125	NCBIfams
1,009	SMART v6.0
19,178	PFAM v34
4,871	COGs v1.0
10,140	NCBI Protein Clusters
4,488	TIGRFAM v15
59,693	Total models form the default CD-Search database

CD Search is part of the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms.

Join our mailing list to keep up to date with CD Search and other CGR news.

Announcing new links and annotations on Conserved Domain Search results!

Conserved Domain Search (CD Search) results now show domain architecture information and other annotations that further characterize predicted domain and protein function. These include links to PubMed, Gene Ontology (GO) terms, Enzyme Commission (EC) numbers, and the SPARCLE Domain Architecture Viewer. You can use these links on the results to find literature (PubMed), assign biological roles and protein function (GO and EC), and find proteins with the same domain architecture (Domain Architecture Viewer). These annotations are currently available for a limited number of architectures, but we will continue to add them as part of our curation effort.

Figure 1 shows the results of an example CD Search showing these new links. Note that you can use the GO and EC information provided to retrieve protein models with these annotations from the Protein Family Models database, for example GO:0030246[GOTermId] — molecular function carbohydrate binding or 2.7.11.1[ECNumber] — non-specific serine/threonine protein kinase.
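The fielded queries above can also be assembled programmatically. The following is a minimal sketch, not an official client: it only builds an E-utilities ESearch URL for each term and does not send any request. The Entrez database name "protfam" is an assumption on our part for the Protein Family Models database, and the helper names fielded_term and esearch_url are hypothetical; please confirm the database name (for example with EInfo) before relying on it.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_term(value, field):
    """Format an Entrez fielded query, e.g. GO:0030246[GOTermId]."""
    return f"{value}[{field}]"


def esearch_url(term, db="protfam", retmax=20):
    """Build (but do not send) an ESearch URL for a query term.

    db="protfam" is an assumed Entrez name for the Protein Family
    Models database; verify it before use.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # The two example queries from the post: a GO term and an EC number.
    for term in (
        fielded_term("GO:0030246", "GOTermId"),
        fielded_term("2.7.11.1", "ECNumber"),
    ):
        print(esearch_url(term))
```

Running the script prints one URL per query. The same fielded terms can be pasted directly into the search box of the Protein Family Models web page.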

Figure 1. Conserved Domain Database search results for a hypothetical protein (XP_007132600.1) from the common bean (Phaseolus vulgaris). The results classify the protein as a plant receptor-like protein kinase. The results also show the EC number and the GO terms associated with this domain architecture, a link to a PubMed citation for the protein family (receptor-like protein kinases), and a link to the Domain Architecture Viewer for G-type lectin S-receptor-like serine/threonine-protein kinases. The Domain Architecture Viewer shows other proteins from the NCBI databases with the same domain architecture (order, number and types of domains). Continue reading “Announcing new links and annotations on Conserved Domain Search results!” →

Conserved Domain Database version 3.19 is available!

The Conserved Domain Database (CDD) version 3.19 is now available. This version contains 3,148 new or updated NCBI-curated domains and now mirrors Pfam version 33.1 as well as models from the NCBIfam collection. We also included fine-grained classifications of the immunoglobulin, RRM, cytochrome P450, 7-transmembrane GPCRs, KH, calponin homology and C1 domain superfamilies.

Continue reading “Conserved Domain Database version 3.19 is available!” →

The Protein Family Model resource is now available!

The new Protein Family Model resource (Figure 1) provides a way for you to search across the evidence used by the NCBI annotation pipelines to name and classify proteins. You can find protein families by gene symbol, protein function, and many other terms. You have access to related proteins in the family and publications describing members. Protein Family Models includes protein profile hidden Markov models (HMMs) and BlastRules for prokaryotes, and conserved domain architectures for prokaryotes and eukaryotes. The HMMs in the collection include Pfam models and TIGRFAMs, as well as models developed at NCBI either de novo or from NCBI protein clusters. Each of the BlastRules (PMCID: 5753331) consists of one or more model proteins of known biological function with BLAST identity and coverage cutoffs. The conserved domain architectures are based on BLAST-compatible Position Specific Score Matrices (PSSMs) that constitute the NCBI Conserved Domain Database.

Figure 1. Protein Family Model resource pages. Top panel: home page. Middle panel: selected result summaries from a fielded search for the DnaK gene product (DnaK[Gene Symbol]). Bottom panel: a portion of an HMM record for DnaK derived from NCBI Protein Clusters (NF009946). The record also includes PubMed citations and HMMER analyses showing the RefSeq proteins named by this method.

Continue reading “The Protein Family Model resource is now available!” →

New viral protein domain models for annotation of coronaviruses

NLM’s Conserved Domain Database (CDD) has expanded its scope to include 153 new viral protein domain family models for the annotation of coronaviruses, including models for the S1 subunit of coronavirus Spike proteins (cd21527), the nucleocapsid (N) protein of coronavirus (cd21595), and the coronavirus RNA-dependent RNA polymerase (cd21530).

Each curated domain model consists of a multiple sequence alignment containing conserved sequence features that may have been confirmed experimentally, plus links to relevant publications. When available, the domain models include 3D structures with links to interactive 3D views and interacting partners.

Check out this tabular summary of SARS-CoV-2 gene products for links to matching conserved domain models and representative 3D protein structures.

Want to view these alignments in 3D space? We’ve updated iCn3D, a web-based 3D structure viewer, with new rendering, annotation, and alignment features. Read more about how you can use iCn3D to view and analyze SARS-CoV-2-related structures.

Don’t forget to review our SARS-CoV-2 resources page to keep up to date on other coronavirus data at NCBI!

Conserved Domain Database (CDD) v. 3.18 is now available

The latest version of the Conserved Domain Database contains 2,128 new or updated NCBI-curated domains and now mirrors Pfam version 32 as well as models from NCBIfams, a collection of protein family hidden Markov models (HMMs) for improving bacterial genome annotation. We have also added fine-grained classifications of the cupin and PBP1 superfamilies. You can find this updated content on the CDD FTP site. Read on for detailed release statistics.

Continue reading “Conserved Domain Database (CDD) v. 3.18 is now available” →