Channel: BLASTFeed

New nr database available with fewer redundant titles


We have made changes to the nr version 5 database (nr_v5) to improve search results and performance.

We have reduced the number of redundant titles in the nr_v5 database used by webBLAST, which is also available for BLAST+ users.

  • The changes in nr preserve the taxonomic diversity of the entries in the database while reducing the number of titles for identical sequences. GenPept accessions are still accessible via www.ncbi.nlm.nih.gov/protein/$GENBANK_ACCESSION or the IPG website https://www.ncbi.nlm.nih.gov/ipg/. The "Identical Proteins" link in the alignments section of the webBLAST results takes you to a full list of all accessions associated with a sequence.
  • For BLAST+ users downloading nr_v5: the database is now approximately 50% smaller, which means faster downloads, faster BLAST searches, and lower disk space requirements. The database is downloadable at: ftp://ftp.ncbi.nlm.nih.gov/blast/db/v5/
  • For BLAST+ there is a cleanup script to help you manage the transition to this smaller database. The script removes unused database volumes: ftp://ftp.ncbi.nlm.nih.gov/blast/temp/cleanup-blastdb-volumes.py
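
As a rough illustration of the GenPept lookup pattern mentioned above, the following sketch builds the protein page URL for a given accession. The function name `genpept_url`, the `https` scheme, and the sample accession are assumptions made for the example; only the `www.ncbi.nlm.nih.gov/protein/$GENBANK_ACCESSION` pattern comes from the announcement.

```python
from urllib.parse import quote

# Host and path pattern from the announcement; the https scheme is an assumption.
PROTEIN_BASE_URL = "https://www.ncbi.nlm.nih.gov/protein/"


def genpept_url(accession: str) -> str:
    """Return the protein page URL for a GenPept accession (hypothetical helper)."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    # Percent-encode anything unusual so the accession is safe in a URL path.
    return PROTEIN_BASE_URL + quote(accession, safe="")


# "AAA12345.1" is a made-up accession used only to show the call.
example_url = genpept_url("AAA12345.1")
print(example_url)
```

A title that no longer appears in nr_v5 can still be reached this way, as long as you already know its accession.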

Here are the new rules on how we keep titles in nr_v5:

  • We keep all RefSeq, Swiss-Prot, PIR, and PDB titles.
  • We keep any GenPept title whose TAXID has not already been seen in the record.
  • We keep at least five GenPept titles per record, regardless of whether their TAXIDs have already been seen in that record.
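
The rules above can be sketched in a few lines of Python. This is only one possible reading, not NCBI's actual implementation. It assumes that the first five GenPept titles in a record are always kept, that TAXIDs carried by the curated (RefSeq, Swiss-Prot, PIR, PDB) titles count as already seen, and that titles are processed in their stored order. The `Title` type, the source labels, and the sample accessions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical source labels; titles from these sources are always kept.
CURATED_SOURCES = {"refseq", "swissprot", "pir", "pdb"}
MIN_GENPEPT_TITLES = 5


@dataclass(frozen=True)
class Title:
    accession: str
    source: str  # one of CURATED_SOURCES, or "genpept"
    taxid: int


def filter_titles(titles: list[Title]) -> list[Title]:
    """Return the titles of one record that survive the three rules."""
    # Assumption: TAXIDs of curated titles count as "seen" from the start.
    seen_taxids = {t.taxid for t in titles if t.source in CURATED_SOURCES}
    kept = []
    genpept_kept = 0
    for t in titles:
        if t.source in CURATED_SOURCES:
            kept.append(t)  # rule 1: keep every curated title
        elif t.taxid not in seen_taxids or genpept_kept < MIN_GENPEPT_TITLES:
            kept.append(t)  # rule 2 (new TAXID) or rule 3 (first five GenPept)
            seen_taxids.add(t.taxid)
            genpept_kept += 1
    return kept


# Made-up record: one RefSeq title followed by eight GenPept titles.
sample = [
    Title("NP_000001.1", "refseq", 9606),
    Title("AAA00001.1", "genpept", 9606),
    Title("AAA00002.1", "genpept", 9606),
    Title("AAA00003.1", "genpept", 10090),
    Title("AAA00004.1", "genpept", 9606),
    Title("AAA00005.1", "genpept", 9606),
    Title("AAA00006.1", "genpept", 9606),
    Title("AAA00007.1", "genpept", 10116),
    Title("AAA00008.1", "genpept", 10090),
]
kept = filter_titles(sample)
print([t.accession for t in kept])
```

Under these assumptions the sample record keeps the RefSeq title, the first five GenPept titles, and AAA00007.1 (a new TAXID), while AAA00006.1 and AAA00008.1 are dropped because their TAXIDs have already been seen and the five-title minimum is already met.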
