{"id":410,"date":"2015-11-23T02:00:27","date_gmt":"2015-11-23T02:00:27","guid":{"rendered":"http:\/\/www.abyteofcommonsense.com\/?p=410"},"modified":"2015-11-23T02:00:27","modified_gmt":"2015-11-23T02:00:27","slug":"creating-a-custom-local-blast-database-from-fasta","status":"publish","type":"post","link":"http:\/\/www.abyteofcommonsense.com\/?p=410","title":{"rendered":"Creating a custom local Blast database from .fasta"},"content":{"rendered":"<p>All these commands are pretty simple to use, but I couldn&#8217;t find really straightforward answers for why the hell I was getting errors, so here is my quick guide. I hope you find it useful.<\/p>\n<p>When you are creating a custom NCBI blast database to use, there are a couple of things you need to keep in mind. It will depend in part on what kind of sequences you want, nt or aa. Remember that for blastn you need nt sequences (from the Nucleotide section of NCBI) and for blastp\/blastx you need protein sequences (from the Protein section). Simply choose which organisms you want it for, go to &#8216;send to&#8217;, then &#8216;File&#8217;, then &#8216;file type .fasta&#8217;. After that has downloaded, you&#8217;re going to want to copy it into your ncbi\/db folder. I put mine in a new folder inside it. Then, use the command <strong>makeblastdb<\/strong>. 
I spell out the full path to everything I am dealing with, because sometimes my computer just doesn&#8217;t play nice.<\/p>\n<p>&nbsp;<\/p>\n<p class=\"p1\"><strong><span class=\"s1\">ncbi-blast-2.2.31+\/bin\/makeblastdb -in ncbi-blast-2.2.31+\/db\/flavivirus-nt-custom-db\/flavivirus.nogaps.noempty.nt -out ncbi-blast-2.2.31+\/db\/flavivirus-nt-custom-db\/flavivirus.nogaps.noempty.db -dbtype nucl<\/span><\/strong><\/p>\n<p>If you want to do protein sequences, then you need to use -dbtype prot instead.<\/p>\n<p>&nbsp;<\/p>\n<p class=\"p1\"><strong><span class=\"s1\">BLAST Database creation error: FASTA-Reader: No residues given<\/span><\/strong><\/p>\n<p>Blast can give this ugly, rather uninformative error for two reasons that I have run into so far. If there are blank lines between the sequences in your fasta file, then it will fail. And if there are any empty records (a header line with no sequence after it), that will also fail. The first command below strips the blank lines and the second drops the empty records:<\/p>\n<p class=\"p1\"><span class=\"s1\">grep -v &#039;^$&#039; flavivirus.nt &gt; flavivirus.nogaps.nt<\/span><\/p>\n<p class=\"p1\"><span class=\"s1\">awk -v RS=&quot;&gt;&quot; -v FS=&quot;\\n&quot; -v ORS=&quot;&quot; &#039; { if ($2) print &quot;&gt;&quot;$0 } &#039; flavivirus.nogaps.nt &gt; flavivirus.nogaps.noempty.nt<\/span><\/p>\n<p class=\"p1\">\n<p class=\"p1\">Now that makeblastdb has hopefully completed successfully on the cleaned file, you can use your new database as you like. 
In the folder where you have created the database, there should now be 4 files:<\/p>\n<p class=\"p1\"><strong><span class=\"s1\">flavivirus.nogaps.noempty.db<\/span><\/strong><\/p>\n<p class=\"p1\"><strong><span class=\"s1\">flavivirus.nogaps.noempty.db.nsq<\/span><\/strong><\/p>\n<p class=\"p1\"><strong><span class=\"s1\">flavivirus.nogaps.noempty.db.nin<\/span><\/strong><\/p>\n<p class=\"p1\"><strong><span class=\"s1\">flavivirus.nogaps.noempty.db.nhr<\/span><\/strong><\/p>\n<p class=\"p1\">or, if you are using protein sequences, the files will end with .pin, .psq and .phr<\/p>\n<p class=\"p1\">\n<p class=\"p1\">To call upon your database, give -db the name of the database, which is the name you passed to -out in makeblastdb (if you leave -out off, the database takes the name of the input file). Do not add the .nsq, .nin or .nhr extension. With the database built above, the command for a nt search is:<\/p>\n<p class=\"p1\"><strong><span class=\"s1\">ncbi-blast-2.2.31+\/bin\/blastn -query .\/seqs.fasta -db ncbi-blast-2.2.31+\/db\/flavivirus-nt-custom-db\/<span class=\"s1\">flavivirus.nogaps.noempty.db<\/span> -out seqs.blast<\/span><\/strong><\/p>\n<p class=\"p1\">At this point you might get another annoying error message:<\/p>\n<p class=\"p1\"><strong><span class=\"s1\">Error: Too many positional arguments (1), the offending value: db<\/span><\/strong><\/p>\n<p class=\"p1\"><strong><span class=\"s1\">Error: Too many positional arguments (1), the offending value: \/ncbi-blast-2.2.31+\/db\/flavivirus-nt-custom-db\/flavivirus.nogaps.noempty.nt<\/span><\/strong><\/p>\n<p class=\"p1\">These errors will come up for a number of reasons:<\/p>\n<ol>\n<li class=\"p1\">make sure that you are blasting against the right database (e.g. blastn = nucleotide)<\/li>\n<li class=\"p1\">check that every option really starts with a plain hyphen (the offending value db means blast saw db rather than -db, which happens when the hyphen is missing or copy and paste has turned it into a long dash)<\/li>\n<li class=\"p1\">blast may be throwing a hissy fit that you have accidentally called the wrong file (e.g.\u00a0<span class=\"s1\">flavivirus.nogaps.noempty.nt is not the same as\u00a0<span class=\"s1\">flavivirus.nogaps.noempty). The name you give to -db must match the name you gave to -out in makeblastdb exactly. 
It is best not to put a .fasta on the end of your original file, as this also seems to confuse blast sometimes.<\/span><\/span><\/li>\n<\/ol>\n<p>Remember, don&#8217;t worry if this warning comes up:<\/p>\n<p class=\"p1\"><strong><span class=\"s1\">Warning: [blastn] yoursequencenamewillbehere\u00a0Warning: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and\/or filtering options <\/span><\/strong><\/p>\n<p class=\"p1\">This just means that that particular record in your query file is empty (or is nothing but a string of NNNNNNNNNN, quite common in large scale MiSeq datasets). An empty record could also have been removed beforehand by running the awk command above on seqs.fasta, the same way you did for the empty sequences in your custom blast database. You can check whether it is empty or not by using the grep command:<\/p>\n<p class=\"p1\"><strong><span class=\"s1\">grep -C 2 yoursequencenamewithoutgapswillbehere seqs.fasta<\/span><\/strong><\/p>\n<p class=\"p1\">The [-C 2] tells grep to show you the 2 lines before and after each line where the name appears.<\/p>\n<p class=\"p1\">\n","protected":false},"excerpt":{"rendered":"<p>All these commands are pretty simple to use, but I couldn&#8217;t find really straightforward answers for why the hell I was getting errors, so&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"http:\/\/www.abyteofcommonsense.com\/?p=410\">Continue reading<span class=\"screen-reader-text\">Creating a custom local Blast database from 
.fasta<\/span><\/a><\/div>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[69],"tags":[70,72,71],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p6nzXS-6C","jetpack-related-posts":[{"id":400,"url":"http:\/\/www.abyteofcommonsense.com\/?p=400","url_meta":{"origin":410,"position":0},"title":"Local NCBI BLAST problems? Some solutions for the unexperienced bioinformatician","date":"November 16, 2015","format":false,"excerpt":"Is local blast driving you nuts? Blast is a super powerful tool if you download it onto your own computer because you can blast more than one sequence at once, and not have to worry about the server dying on you. Having set it up both with the experience of\u2026","rel":"","context":"In &quot;Bioinformatics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":465,"url":"http:\/\/www.abyteofcommonsense.com\/?p=465","url_meta":{"origin":410,"position":1},"title":"Blast(ed) MEGAN Round 2 - or what to do when you're running yet another blast","date":"April 5, 2016","format":false,"excerpt":"So MEGAN can be a bit annoying at times. As can Blast outputs required by MEGAN. Because I'm particularly interested in the alignments produced, and want to create a consensus sequence for designing primers, I need to run blast again from what I did the other day. This time, I've\u2026","rel":"","context":"In &quot;Bioinformatics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":403,"url":"http:\/\/www.abyteofcommonsense.com\/?p=403","url_meta":{"origin":410,"position":2},"title":"How to deal with Blast output and Fasta to get what you need for MEGAN (using Qiime)","date":"November 18, 2015","format":false,"excerpt":"So... 
You have some sequence data from Illumina sequencing.\u00a0These are the two sets of files I have, one is forward reads, and the other is reverse. PP2_S1_L001_R2_001.fastq.gz PP2_S1_L001_R1_001.fastq.gz ... it's in fastaq.gz format. You want to upzip it, simple right? You want to use this command: gunzip --keep PP2_S1_L001_R2_001.fastq.gz But\u2026","rel":"","context":"In &quot;Bioinformatics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":412,"url":"http:\/\/www.abyteofcommonsense.com\/?p=412","url_meta":{"origin":410,"position":3},"title":"Using the mac OSX command line","date":"November 30, 2015","format":false,"excerpt":"The most sensible way to set up your working environment in the Mac command line when you want to do the same thing in multiple folders is to make sure you have labelled everything in the same way.\u00a0A handy hint here is that if you have terminal open, and you\u2026","rel":"","context":"In &quot;Bioinformatics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":475,"url":"http:\/\/www.abyteofcommonsense.com\/?p=475","url_meta":{"origin":410,"position":4},"title":"Review: from the earth The All Fruit Box","date":"April 27, 2016","format":false,"excerpt":"This is my fourth\u00a0fruit box review! My first was on Aussie Farmers Direct, and I was really disappointed. My second was on Organic Angels, which had great produce but was pretty expensive. My third was on Ceres Fair Food, which had great variety in its produce. Next on the list\u2026","rel":"","context":"In &quot;Food&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":13,"url":"http:\/\/www.abyteofcommonsense.com\/?p=13","url_meta":{"origin":410,"position":5},"title":"Saving money in time for Christmas","date":"December 4, 2014","format":false,"excerpt":"I've just bought a house, with a reasonable mortgage to boot. At the same time, Eli\u00a0and I have moved in together and I've lost some of my income. 
Since I'll be paying mortgage payments as well as maintenance to GD\/GM, I figured I'd best look at other ways to save\u2026","rel":"","context":"In &quot;Finance&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=\/wp\/v2\/posts\/410"}],"collection":[{"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=410"}],"version-history":[{"count":1,"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=\/wp\/v2\/posts\/410\/revisions"}],"predecessor-version":[{"id":411,"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=\/wp\/v2\/posts\/410\/revisions\/411"}],"wp:attachment":[{"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=410"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=410"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.abyteofcommonsense.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=410"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}