
SeqIO Reading a Sequence File

Prerequisite Terminologies

To understand the main topic thoroughly, you should be familiar with the following lectures:

Introduction to BioPython
SeqRecord creating Seq Records
SeqRecord Formatting Records

Transcription

By:

Yusra Karim

Introduction:

Bio.SeqIO is the standard sequence input/output interface for BioPython. It provides a simple, uniform interface for reading and writing assorted sequence file formats (including multiple sequence alignments), but it deals with sequences only as SeqRecord objects. Among the functions of the SeqIO module, SeqIO.read(), like SeqIO.parse(), lets us read files in FASTA, GenBank or any other supported format. SeqIO.read() expects a file name and a format. It is used when the file contains exactly one record, which is returned as a single SeqRecord object.


Steps:

  • Import the SeqIO module from BioPython.

              from Bio import SeqIO

       [In this video, we’ve utilized one FASTA file of Gallus gallus tumor necrosis and one GenBank file of mRNA.]

  • Declare a variable (e.g. getFASTA_record), call the SeqIO.read() function, and pass in the FASTA file name and the format.

              getFASTA_record = SeqIO.read("Filename", "fasta")

  • Here, we’re telling the SeqIO.read() function that the file should be interpreted as a FASTA file.

  • Once a file is read using the SeqIO.read() function, its contents are returned as a SeqRecord object. Store this SeqRecord in a variable so that its contents can be printed.

       [Use the print function and pass in the variable followed by a dot (‘.’) and the attribute name (id, description or seq) to print each value.]

              print(getFASTA_record.id)

              print(getFASTA_record.description)

              print(getFASTA_record.seq)

  • To read the GenBank file, declare a variable (e.g. getGenBank_record), call the SeqIO.read() function, and pass in the GenBank file name and the format.

              getGenBank_record = SeqIO.read("Filename", "genbank")

       [Use the print function and pass in the variable followed by a dot (‘.’) and the attribute name (id, description, seq or annotations) to print each value.]

              print(getGenBank_record.id)

              print(getGenBank_record.description)

              print(getGenBank_record.seq)

              print(getGenBank_record.annotations)

       [Run the entire code to observe the results.]

  • Once you’ve run the code, it’ll print out the id, description and sequence from the FASTA file.

  • From the GenBank file, it’ll print out the id, description, sequence and annotations.
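
The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the lecture's actual FASTA and GenBank files are not reproduced here, so the sketch first writes a small hypothetical record (the sequence, the id EX000001.1 and the file names example.fasta and example.gb are all made up) to temporary files and then reads them back with SeqIO.read(). It also assumes a reasonably recent BioPython release, where a molecule_type annotation is required to write GenBank output.

```python
import tempfile
from pathlib import Path

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical single record, created only so the example has files to read.
example = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGCTGA"),
    id="EX000001.1",
    name="EX000001",
    description="hypothetical example mRNA",
    annotations={"molecule_type": "mRNA"},  # assumed necessary for GenBank output
)

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file names standing in for the lecture's files.
    fasta_path = str(Path(folder) / "example.fasta")
    genbank_path = str(Path(folder) / "example.gb")
    SeqIO.write(example, fasta_path, "fasta")
    SeqIO.write(example, genbank_path, "genbank")

    # Each file holds exactly one record, so SeqIO.read() returns one SeqRecord.
    getFASTA_record = SeqIO.read(fasta_path, "fasta")
    getGenBank_record = SeqIO.read(genbank_path, "genbank")

# Attributes of the record read from the FASTA file.
print(getFASTA_record.id)
print(getFASTA_record.description)
print(getFASTA_record.seq)

# Attributes of the record read from the GenBank file.
print(getGenBank_record.id)
print(getGenBank_record.description)
print(getGenBank_record.seq)
print(getGenBank_record.annotations)  # a dictionary of record-level annotations
```

With your own data, the temporary-file part is unnecessary: pass the path of your existing file straight to SeqIO.read() together with the lowercase format name.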

       Note: SeqIO.parse() and SeqIO.read() take the same main arguments (a file name and a format). The difference is that SeqIO.parse() is used when a single file contains multiple records, and it returns an iterator over SeqRecord objects, whereas SeqIO.read() is used when the file contains exactly one record, and it returns that single SeqRecord. SeqIO.read() raises a ValueError if the file contains no records or more than one.
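
The difference between the two functions can be seen in a short sketch. The FASTA text below is hypothetical and is held in memory with io.StringIO (SeqIO functions accept an open handle as well as a file name), so no files are needed to try it.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA text, used only for illustration.
single_fasta = ">seq1 example record one\nATGCATGC\n"
multi_fasta = single_fasta + ">seq2 example record two\nGGCCTTAA\n"

# One record in the input: SeqIO.read() returns a single SeqRecord.
record = SeqIO.read(StringIO(single_fasta), "fasta")
print(record.id, record.seq)

# Two records in the input: SeqIO.read() raises ValueError.
try:
    SeqIO.read(StringIO(multi_fasta), "fasta")
    read_failed = False
except ValueError as error:
    read_failed = True
    print("SeqIO.read() rejected the input:", error)

# SeqIO.parse() iterates over every record in the same input.
records = list(SeqIO.parse(StringIO(multi_fasta), "fasta"))
print([r.id for r in records])
```

In practice, SeqIO.read() is a convenient safeguard when exactly one record is expected, while SeqIO.parse() is the choice for files that may hold any number of records.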


Summary:

In this video, we’ve learned about the SeqIO module of the BioPython package and its SeqIO.read() function. We have learned to read a FASTA file and a GenBank file using the SeqIO.read() function, and the difference between the SeqIO.read() and SeqIO.parse() functions.

