Sort and reduce redundancy in FASTA files

We had a FASTA file with unwanted redundancy.

Reads were mapped to a reference, and each match was then extracted as an ID together with the sequence to which that ID mapped. This left redundant IDs in the file. Our goal was to keep only one ID for each sequence.
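
A minimal sketch of one way to do this in Python is shown here. It assumes that "redundancy" means several records carrying the identical sequence, that keeping the first ID seen for each sequence is acceptable, and that sorting is by ID. The function names `read_fasta` and `dedupe_by_sequence` and the example records are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_by_sequence(records):
    """Keep the first ID seen for each distinct sequence, sorted by ID.

    Assumption: sequences are compared as exact strings.
    """
    first_id = {}
    for header, seq in records:
        first_id.setdefault(seq, header)
    return sorted((header, seq) for seq, header in first_id.items())


# Hypothetical input: read1 and read2 carry the same sequence.
example = ">read2\nACGT\n>read1\nACGT\n>read3\nGG\nTT\n"
unique = dedupe_by_sequence(read_fasta(io.StringIO(example)))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

For a real file, replace the `io.StringIO` handle with `open("input.fasta")`. Whether the first ID is the right one to keep depends on the data.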

Extracting Sequences from a FASTA File Using an ID List

I often have to extract specific sequences from a big FASTA file. I found this nice script here, and with some slight modifications you can also use it for other file types.
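
To illustrate the general idea (this is not the script mentioned above, only a rough sketch), the extraction could look like this in Python. It assumes the record ID is the first whitespace-delimited token after `>`. The function name `extract_by_ids` and the example data are hypothetical.

```python
import io


def extract_by_ids(fasta_handle, wanted_ids):
    """Return the lines of all FASTA records whose ID is in wanted_ids.

    Assumption: the record ID is the first token after '>' on the
    header line; anything after the first whitespace is a description.
    """
    wanted = set(wanted_ids)  # set gives fast membership checks
    keep = False
    kept_lines = []
    for line in fasta_handle:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            kept_lines.append(line.rstrip("\n"))
    return kept_lines


# Hypothetical FASTA content and ID list.
fasta_text = ">seq1 some description\nACGT\nTTGA\n>seq2\nGGCC\n>seq3\nAAAA\n"
id_list = ["seq1", "seq3"]

selected = extract_by_ids(io.StringIO(fasta_text), id_list)
print("\n".join(selected))
```

Because the file is read line by line and only the ID list is held in memory, this approach should also cope with large inputs. For real data, the ID list would typically be read from a text file with one ID per line.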

