Apply to attend October 2022 interactive, hands-on workshops
Want to learn more about NCBI resources and how to implement our cutting-edge tools in your research? NCBI offers a variety of educational opportunities, including workshops, webinars, codeathons, tutorials, and more!
We are excited to announce our upcoming virtual workshop series for October 2022. Our interactive, hands-on workshops are taught by experienced NCBI Education Faculty. Applications are open to the public; however, each workshop will accept a limited number of participants to facilitate the best possible educational experience. Continue reading →
NCBI Microbial Pathogen and SARS-CoV-2 Resources in the Cloud
Get hands-on experience with NCBI Pathogen Detection and SARS-CoV-2 Surveillance data in the cloud. No prior cloud experience necessary!
We are excited that our own Stephen Sherry, PhD, is now the new NCBI Director at the National Library of Medicine (NLM), and the NLM Associate Director for Scientific Data Resources. In these roles, Dr. Sherry will oversee the development and deployment of advanced computational solutions to meet life and health science information needs and facilitate open science and scholarship through a growing array of data, literature, and other information offerings and services from NLM.
Dr. Sherry brings a history of innovation and leadership to the NCBI Director position. Most recently, he served as Acting Director of NCBI, bringing a vision of customer engagement and of modular, interoperable, cloud-based approaches to the technical platforms for NLM offerings and services. He is also recognized for his inventiveness in leveraging research for public health emergency response. Dr. Sherry has been central to key innovations at NLM, including the ClinicalTrials.gov modernization effort and development of the NIH Comparative Genomics Resource, ensuring public input and technical innovation in the process. Dr. Sherry positioned NCBI as a strong collaborative force across the NIH and in supporting major NLM projects, including the MEDLINE 2022 initiative, which resulted in 100% automated indexing of the biomedical literature available through NLM’s PubMed and PubMed Central (PMC).
“Dr. Sherry has the skills, knowledge, and insight to deliver creative, forward-thinking scientific and operational leadership for NLM and the communities we serve,” said NLM Director Patricia Flatley Brennan, RN, PhD. “His vast experience, expertise, and vision for NCBI is a great fit for NLM’s eye to the future and its commitment to drive innovation.”
Throughout his tenure at NCBI, Dr. Sherry has participated in many NIH efforts to characterize human genetic diversity and has served on numerous working groups across NIH to address a range of data science issues including the development of the genomic data sharing policy, privacy analysis for risk-sensitive data sets, and advances in scientific publications.
Dr. Sherry earned his PhD in Anthropology at the Pennsylvania State University in 1996 and completed a postdoctoral fellowship at the Louisiana State University Medical Center prior to joining NLM in 1998.
In October 2022, NCBI Datasets will release version 14 of our datasets and dataformat command-line tools. This release will contain breaking changes to the command syntax, content of the data packages and data reports. Thank you for your feedback that inspired these new features. We hope they will improve your experience!
We will continue to support CLI v13.x, although new features and improvements will be exclusive to CLI v14.0.0 and later releases.
How is version 14 of the Datasets command-line tools (CLI v14.x) different from CLI v13.x and previous versions? Continue reading →
A new version of the Conserved Domain Database (CDD) is now available. Version 3.20 contains 1,614 new or updated NCBI/CDD-curated domains and now mirrors Pfam version 34 as well as new models from the NCBIfam collection. Fine-grained classifications of the [(+)ssRNA] virus RNA-dependent RNA polymerase catalytic domain, RING-finger/U-box, dimerization/docking domains of the cAMP-dependent protein kinase regulatory subunit, and Galactose/rhamnose-binding lectin domain superfamily have been added, along with many other new models.
We have significantly increased the fraction of CD-Search and interactive BATCH CD-Search queries that yield results showing conserved domain architecture information and attributes that further characterize protein function through links to information-rich resources such as Enzyme Commission (EC) numbers, Gene Ontology (GO) IDs, and identifiers from the CAZy and MEROPS databases. See our earlier post for additional details. You can access CDD and find updated content on the CDD FTP site at CDD version 3.20.
Database statistics for CDD version 3.20:
RefSeq release 214 is now available online, from the FTP site, and through NCBI’s Entrez programming utilities, E-utilities.
This full release incorporates genomic, transcript, and protein data available as of September 12, 2022, and contains 328,588,569 records, including 239,609,016 proteins, 47,387,931 RNAs, and sequences from 123,394 organisms. The release is provided in several directories as a complete dataset and also as divided by logical groupings.
Introducing the new Foreign Contamination Screen (FCS) tool! If you produce assembled genomes, check out FCS, a tool you can run yourself to improve your genome assemblies and facilitate high-quality data submissions to GenBank. FCS is part of the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms. See our previous blog post to learn how FCS enhances contaminant detection sensitivity. Continue reading →
Learn about the NIH Comparative Genomics Resource (CGR) Project
The Biodiversity Genomics conference will take place virtually, October 2-7, 2022. This event is hosted by the Earth BioGenome Project and is open and free for all to attend.
NCBI staff will present a variety of recorded talks and posters highlighting various elements of the NIH Comparative Genomics Resource (CGR), including NCBI Datasets and the Comparative Genome Viewer (CGV). CGR is a multi-year National Library of Medicine (NLM) project to maximize the impact of eukaryotic research organisms and their genomic data resources to biomedical research. NCBI is charged with leading CGR development and engaging genomics communities. The CGR project will facilitate reliable comparative genomics analyses for all eukaryotic organisms in collaboration with the genomics community.
Release 10.0 of the NCBI Hidden Markov models (HMM) used by the Prokaryotic Genome Annotation Pipeline (PGAP) is now available for download. You can search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
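As a rough sketch of what such a search might look like, the Python snippet below assembles an `hmmsearch` command line (`hmmsearch` is the HMMER program for searching profile HMMs against a protein sequence file). The file names `hmm_PGAP.LIB`, `my_proteins.faa`, and `hits.tbl` are placeholders assumed for illustration, as is the E-value threshold; substitute the library file you actually downloaded and your own protein FASTA file.

```python
def build_hmmsearch_command(hmm_library, proteins_fasta, table_out, evalue=1e-5):
    """Assemble an hmmsearch invocation as an argument list.

    --tblout writes a per-sequence hit table; -E sets the reporting
    E-value threshold. All file names passed in are chosen by the caller.
    """
    return [
        "hmmsearch",
        "--tblout", table_out,
        "-E", str(evalue),
        hmm_library,      # profile HMM library (placeholder name below)
        proteins_fasta,   # protein sequences to annotate (placeholder name below)
    ]


if __name__ == "__main__":
    # Placeholder file names, assumed for illustration only.
    cmd = build_hmmsearch_command("hmm_PGAP.LIB", "my_proteins.faa", "hits.tbl")
    print(" ".join(cmd))
```

The printed command can be pasted into a shell where HMMER is installed, or passed to `subprocess.run(cmd, check=True)` once the input files exist.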
The 10.0 release contains 15,360 models maintained by NCBI, including 228 that are new since 9.0, 99 that were modified significantly, and 205 that were assigned better names, EC numbers, Gene Ontology (GO) terms, gene symbols or publications. You can search and view the details for these in the Protein Family Model collection, which also includes conserved domain architectures and BlastRules, and find all RefSeq proteins they name.
GO terms associated with HMMs are now propagated to CDSs and proteins annotated with PGAP. In case you missed it, see our previous blog post on this topic.
PubMed will be moving to an updated version of the E-utilities API on November 15, 2022. As previously announced, this updated version of E-utilities will use the same technology as the web version of PubMed released in 2020. So, search results returned by the updated ESearch E-utility will now match those of PubMed.gov.
This update only affects E-utility calls with &db=pubmed. There are no changes to the E-utilities for other databases. You can refer to our previous post or watch our recorded webinar for more details on this update. Continue reading →
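To make the scope of the change concrete, here is a minimal Python sketch that builds an ESearch request URL and checks whether a given E-utilities URL carries `db=pubmed`, the only case the update touches. The helper names `build_esearch_url` and `is_affected_by_update` are invented for this example, and the search terms are arbitrary; the snippet only constructs URLs and makes no network calls.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch URL for the given database and search term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_BASE}?{urlencode(params)}"


def is_affected_by_update(url):
    """Return True when the URL's db parameter is 'pubmed' (case-insensitive)."""
    query = parse_qs(urlsplit(url).query)
    return query.get("db", [""])[0].lower() == "pubmed"


if __name__ == "__main__":
    for url in (build_esearch_url("asthma"), build_esearch_url("BRCA1", db="gene")):
        print(is_affected_by_update(url), url)
```

A check like this can help you find which calls in an existing script are worth re-testing after the switch.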
Maps clinically significant variants by gene and position!
ClinVar is a freely accessible, public archive of reports of the relationships between human variations and phenotypes, with supporting evidence at NLM/NCBI. To help you access your variants of interest quickly, ClinVar is offering an experimental release of an all-new visualization tool in the search results. This graphical display provides an overview of variants when you search by gene or genomic region (Figures 1 and 2).
Currently the graphical display is implemented as an experiment and will appear for only 10 percent of searches by gene or genomic region, but the links in this post will show the display so you can try it out. Alternatively, if you would like to bring up the graphical display for your gene or genomic region search, you can edit the URL in the address bar to change the default gr=0 to gr=1. For example, the following URL will show the graphical display:
Note that you can only get the graphical display with gene or genomic region searches. For other types of searches, you will see the table only.
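If you would rather script the gr=0 to gr=1 edit than retype the URL, something along these lines should work. This is a sketch only: the function name `enable_graphical_display` is made up for this example, and the sample ClinVar search URL is an assumed form of a gene search, not one taken from this post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def enable_graphical_display(url):
    """Return the URL with its gr query parameter set to 1.

    Any existing gr value (such as the default gr=0) is dropped and
    gr=1 is appended; all other query parameters are preserved.
    """
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "gr"
    ]
    params.append(("gr", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Assumed example of a ClinVar gene search URL, for illustration only.
    search_url = "https://www.ncbi.nlm.nih.gov/clinvar/?term=DSG2%5Bgene%5D&gr=0"
    print(enable_graphical_display(search_url))
```

The function also adds gr=1 when the parameter is absent, so it can be applied to any gene or genomic region search URL.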
Gene search display
The display for a gene search highlights small variants within the gene. Large structural variants are also marked as a single dot in the middle of the variation. The interactive display shows the placement of variants on the gene and their clinical significance, and allows you to zoom in, pan left or right, and limit results to variants in a chosen gene. Figure 1 shows the graphical display as it appears at the top of the search results for the desmoglein 2 (DSG2) gene and how to filter and navigate to variants of interest (Search ClinVar: DSG2[gene]).
A. Graphical view showing all variants for the DSG2 gene. Results default to the GRCh37 assembly. You can change to the GRCh38 assembly by clicking the arrow at the upper left (circled in red). B. You can zoom in by mousing over the 8th exon in the gene diagram, which activates a pop-up menu that allows you to re-display only this region by following the link (red box). C. Refreshed result for the 8th exon of DSG2 showing a number of variants including pathogenic, benign, and ones with conflicting interpretations of pathogenicity. You can select the filters on the left-hand side of the ClinVar result to limit to variants with characteristics of interest, for example Conflicting Interpretations of pathogenicity. D. Variants in exon 8 of DSG2 filtered for conflicting interpretations of pathogenicity. You can retrieve individual variants by mousing over the graphic to activate the pop-up menu and following the link (red box).
Continue reading →