Exercise - Sequence Database Searching for Similar Sequences
First, we will perform a series of successive database searches to retrieve several members of a protein family.
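Two quantities matter most when reading the result lists in these searches: the E value of each hit and the percentage of identity in its alignment. The Python sketch below illustrates both. It uses hypothetical hit records to show how a lower E-value limit such as 0.001 hides the closest matches, and how percent identity can be computed from two aligned sequences. The `Hit` fields, the organism label and the default limits are assumptions made for illustration and are not the output format of the EBI FASTA service. Search programs may also count identity over a different alignment length, so treat the formula as one common convention.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical, simplified search hit (not a real FASTA or BLAST output format)."""
    accession: str
    organism: str
    evalue: float


def select_weak_hits(hits, organism="Tobacco", lower_e=0.001, upper_e=10.0, max_hits=10):
    """Keep hits from one organism whose E value lies in [lower_e, upper_e].

    Raising the lower limit hides close (very low E value) matches, so the
    remaining list is dominated by more distant family members.
    """
    kept = [h for h in hits
            if h.organism == organism and lower_e <= h.evalue <= upper_e]
    kept.sort(key=lambda h: h.evalue)  # best remaining matches first
    return kept[:max_hits]             # e.g. "not more than 10" sequences


def percent_identity(aligned_a, aligned_b):
    """Identical residues as a percentage of alignment columns (gap columns count as mismatches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)
```

With the lower limit set to 0.0, the function returns the closest matches first, which mimics a search with default settings.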
• Use the NCBI protein database search to retrieve the sequence of a protein from the light-harvesting complex (LHC) protein family
• Copy the sequence and paste only its first line into the sequence field of the FASTA web page: http://www.ebi.ac.uk/fasta33/index.html
• Perform the search using the default parameters
• Select a more distant match to a Tobacco protein by checking the box to its left
• Click ‘Show Alignments’ and interpret the results (note the Z-score, the E value and the percentage of identity)
• Go back and click the link to that entry
• Display the entry in FASTA format and copy the sequence
• Paste the first line of the sequence into a newly opened search field, set Gap open = -2 and the lower expectation value limit to 0.001 (the smaller gap-opening penalty loosens the search restrictions, and the lower E-value limit hides close matches so that mostly more weakly aligning sequences are displayed)
• Run the search and then repeat the last steps to find more distant protein family members
• Save the Tobacco sequences found in all searches (no more than 10) in a text file

Next, we will use the blastx program to compare a translated DNA sequence against a protein database.
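blastx translates the nucleotide query in all six reading frames (three on each strand) and compares each translation with the protein database. The minimal Python sketch below shows what a six-frame translation produces. It assumes the standard genetic code (NCBI translation table 1), and the frame labels follow one common convention. The function names are hypothetical and are not part of any BLAST interface.

```python
# Standard genetic code (NCBI translation table 1); codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Reverse complement of a DNA string; any non-ACGT base becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(dna))


def translate(dna):
    """Translate complete codons only; stop codons become '*', ambiguous codons 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations of frames +1..+3 (given strand) and -1..-3 (reverse strand)."""
    dna = "".join(dna.split()).upper().replace("U", "T")  # drop whitespace, RNA -> DNA
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

For example, `six_frame_translation("ATG GCC TAA")["+1"]` gives `"MA*"`. Only frames without internal stop codons over a long stretch are likely to match a protein in the database.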
• Open the file seq2.txt from the download page, then copy the sequence and paste it into the blastx search field (go to http://www.ncbi.nlm.nih.gov/BLAST/ and choose blastx from the ‘Translated’ area)
• Run the search and click the link of the first hit to display its entry
• Display the entry in FASTA format, copy the sequence and use it for a protein-protein search (blastp)
• Save all protein sequences into a text file
• Repeat the last search with the first and second hits that are not annotated as an unknown protein
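The saved results can be handled as plain FASTA text. The minimal Python sketch below assumes that the sequences were saved in FASTA format and that uninformative hits contain the words "unknown protein" in their description line. It parses the saved text, picks the first two records that are not annotated as unknown protein, and formats them back into FASTA. The function names and the example records used to check it are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples; lines before the first '>' are ignored."""
    records = []
    header, lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(lines)))
            header, lines = line[1:], []
        elif header is not None:
            lines.append(line)
    if header is not None:
        records.append((header, "".join(lines)))
    return records


def first_known(records, n=2, exclude="unknown protein"):
    """Return the first n records whose header does not contain `exclude` (case-insensitive)."""
    return [r for r in records if exclude.lower() not in r[0].lower()][:n]


def format_fasta(records, width=60):
    """Format (header, sequence) tuples as FASTA text, wrapping sequences at `width` residues."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

To save the selection, something like `pathlib.Path("proteins.txt").write_text(format_fasta(chosen))` would work. The file name here is only an example.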
Contact Jitendra Narayan