Extracting Sequences from a FASTA File Using an ID List

I often have to extract specific sequences from large FASTA files. I found this nice script here, and with some slight modifications you can also use it for other file types.

Save the script as "script.pl".

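Call the script as:

perl script.pl ID-List input.fasta output

The ID list contains one ID per line, and a leading ">" is optional. Each ID must match the entire header line of a FASTA record (everything after the ">"), because the script uses the whole first line of each record as its ID.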

Script:

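#!/usr/bin/perl

use strict;
use warnings;

# Expect exactly three arguments: ID list, input FASTA, output file.
@ARGV == 3 or die "Usage: perl $0 LIST FASTA OUT\n";
my ($list, $fasta, $out) = @ARGV;

# Read the wanted IDs, one per line; any '>' characters are stripped.
my %select;
open my $L, '<', $list or die "Cannot open $list: $!\n";
while (<$L>) {
    chomp;
    s/>//g;
    $select{$_} = 1;
}
close $L;

# Switch the input record separator so each read returns one FASTA record.
$/ = "\n>";
open my $O, '>', $out   or die "Cannot write $out: $!\n";
open my $F, '<', $fasta or die "Cannot open $fasta: $!\n";
while (<$F>) {
    s/>//g;                      # remove the '>' delimiters
    my ($id) = split /\n/, $_;   # the header line is the record's ID
    print {$O} ">$_" if defined $select{$id};
}
close $F;
close $O;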
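The core trick is setting `$/` (Perl's input record separator) to `"\n>"`, so that each read returns a whole FASTA record instead of a single line. The sketch below runs the same logic on a small in-memory FASTA string so you can see what each record looks like. The sequence names and data are made up for illustration, and reading from a string with `open my $fh, '<', \$fasta` is only a convenience for the demo, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example data: three records, one with a multi-line sequence.
my $fasta = ">seq1\nACGT\nACGT\n>seq2\nTTTT\n>seq3\nGGCC\n";

# IDs to keep; '>' is stripped the same way the script does it.
my %select;
for my $id ('seq1', '>seq3') {
    (my $key = $id) =~ s/>//g;
    $select{$key} = 1;
}

# Each read now ends at "\n>" (or at end of input for the last record),
# e.g. the first read returns ">seq1\nACGT\nACGT\n>".
local $/ = "\n>";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my @kept;
while (my $record = <$fh>) {
    $record =~ s/>//g;               # drop the leading/trailing '>'
    my ($id) = split /\n/, $record;  # first line = full header
    push @kept, ">$record" if $select{$id};
}
close $fh;

print @kept;   # prints the seq1 and seq3 records
```

Because the key is the whole header line, an entry `seq1` in the ID list would not match a header such as `>seq1 some description`; for headers like that you would have to split the header on whitespace before the lookup.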
