The code below extracts short subsequences from every sequence using a window size of 4. How can I shift the window by a step size of 2 and extract 4 base pairs each time?
from Bio import SeqIO

with open("testA_out.fasta", "w") as f:
    for seq_record in SeqIO.parse("testA.fasta", "fasta"):
        i = 0
        # Use <= so a window that ends exactly on the last base is not skipped
        while (i + 4) <= len(seq_record.seq):
            f.write(">" + str(seq_record.id) + "\n")
            f.write(str(seq_record.seq[i:i+4]) + "\n")
            i += 2  # step size
Example input of testA.fasta
>human1
ACCCGATTT
Example output of testA_out.fasta
>human1
ACCC
>human1
CCGA
>human1
GATT
The problem with this output is that one T is left out, and I would like to include it as well. How can I produce the output below? It adds a reverse extraction (from the end of the sequence towards the start) to cover base pairs that may be left out when extracting from start to end. Can anyone help me?
>human1
ACCC
>human1
CCGA
>human1
GATT
>human1
ATTT
>human1
CGAT
>human1
CCCG
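One possible way to get this output is sketched below. It is a rough idea, not a tested solution: it makes a forward pass from the start, then a second pass anchored at the last base and stepping back towards the start. The names `extract_windows` and `write_windows` are made up for illustration. Skipping reverse windows that start at the same position as a forward window is my own assumption, so that sequences whose length fits the step exactly are not written twice.

```python
def extract_windows(seq, size=4, step=2):
    """Return forward windows, then reverse-anchored windows not already covered."""
    seq = str(seq)
    # Forward pass: window starts at 0, step, 2*step, ... while a full window fits.
    forward_starts = list(range(0, len(seq) - size + 1, step))
    # Reverse pass: the first window ends on the last base, then step backwards.
    reverse_starts = list(range(len(seq) - size, -1, -step))
    # Assumption: drop reverse windows that duplicate a forward start position.
    seen = set(forward_starts)
    starts = forward_starts + [s for s in reverse_starts if s not in seen]
    return [seq[s:s + size] for s in starts]


def write_windows(in_path, out_path, size=4, step=2):
    """Write every window of every FASTA record, reusing the record id as header."""
    # Imported here so extract_windows can be tried without Biopython installed.
    from Bio import SeqIO

    with open(out_path, "w") as out:
        for record in SeqIO.parse(in_path, "fasta"):
            for window in extract_windows(record.seq, size, step):
                out.write(">" + str(record.id) + "\n")
                out.write(window + "\n")
```

Calling `write_windows("testA.fasta", "testA_out.fasta")` would process the file in the same way as the original loop. For a 9-base sequence such as ACCCGATTT (inferred from the windows shown above), the forward pass starts at positions 0, 2, 4 and the reverse pass at 5, 3, 1.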