
BLAST Exercise

Part: One

DNA BLAST

You have just cloned a new 189 bp DNA fragment from a rat liver genomic library and want to analyze it.  A friend gave you the library, and you think it was prepared by Sau3A restriction digestion of genomic DNA that was then ligated into a vector.  You need to analyze this sequence and determine which gene it was derived from.  Find out as much as you can about this clone using bioinformatic analysis, because it is the basis of your PhD thesis, which you need to finish soon so that you can accept an amazing job offer from the company GenoRat.  You obtain the sequence of the 189 bp insert using an M13 primer (the sequence is found in the Class Sequences under the title Rat Genomic Sequence).

Do different types of BLAST searches of the sequence against the NCBI DNA databases to help you identify the origin(s) of the DNA sequence.  Analyze your results accounting for all 189 bp of the sequence, i.e., you need to know the origin and similarity of every bp.  Parts of the DNA sequence may have different origins.  You may have to do additional BLAST searches with all or part of your sequence.
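One way to check that every bp has been accounted for is to collect the query coordinates of the alignments you decide to trust and list whatever is left over. The following is a minimal sketch only: the function name and the example coordinates are hypothetical and are not results from the class sequence.

```python
def uncovered_regions(query_length, hit_ranges):
    """Return 1-based inclusive (start, end) stretches of the query
    that are not covered by any of the given hit ranges."""
    covered = [False] * query_length
    for start, end in hit_ranges:
        low, high = sorted((start, end))  # tolerate ranges written end-first
        for pos in range(max(low, 1), min(high, query_length) + 1):
            covered[pos - 1] = True

    gaps = []
    gap_start = None
    for pos in range(1, query_length + 1):
        if not covered[pos - 1]:
            if gap_start is None:
                gap_start = pos          # a new uncovered stretch begins
        elif gap_start is not None:
            gaps.append((gap_start, pos - 1))
            gap_start = None
    if gap_start is not None:            # uncovered stretch runs to the end
        gaps.append((gap_start, query_length))
    return gaps


# Hypothetical query coordinates of two alignments on a 189 bp query.
example_hits = [(1, 40), (52, 180)]
print(uncovered_regions(189, example_hits))  # [(41, 51), (181, 189)]
```

Any stretch reported here still needs an explanation, for example a further BLAST search with just that part of the sequence.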

  1. Identify the origin of all 189 bp of your DNA sequence (i.e., are there any sequences in the database with an exact match to ALL of your insert?).  Give your interpretation of the analysis, remembering that this is your thesis project and you should get it right.  Describe the searches you performed and the databases you used for this analysis.  You will be sending the DNA sequence to GenBank, and you need to annotate it with its important features (see any GenBank accession record to get an idea of what to do).  Think about whether your analysis makes sense.  (Things to keep in mind: your cloning vector, the species of DNA that you started with, potential cloning artifacts.) (50 pts)
  2. (A) Show the portion of the 189 bp sequence that corresponds to your potential rat genomic sequence (5 pts) and (B) do a DNA BLAST search of the GenBank database using the rat genomic sequence.  Does the BLAST search detect an mRNA sequence (i.e., a transcribed sequence, not a genomic sequence) transcribed from the rat gene using this approach?  If so, give the name/locus and accession number of the mRNA sequence (5 pts) and show the % identity and length of the match (5 pts).
  3. Does your rat genomic sequence from 2A above contain protein coding sequences?  Describe how you determined this using your new bioinformatic skills.  You may not rely upon statements in the databases as to coding; you must determine this yourself. (5 pts)
  4. Did your DNA BLAST search in 2B indicate that the rat gene sequence that you cloned was already present in the GenBank nucleotide database (not mRNA or EST), i.e., was there a sequence in the database with a perfect match? (5 pts)  If so, identify the accession number and determine whether your cloned rat sequence contains exons and/or introns.  Show the sequence and indicate the various regions by underlining or color. (5 pts)
  5. Did the BLAST search with the rat genomic sequence in 2B match a human DNA sequence in the database that is highly related to your rat sequence? (5 pts)  If so, what is the accession number, what is the % nucleotide identity, and what is the E-value of the BLAST alignment between the rat and human sequences? (5 pts)
  6. Does your rat sequence in 2A above have a related pseudogene?  (5 pts) If so, indicate the accession number and the E-value and nucleotide identities of the BLAST alignment. (5pts)
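For the coding-sequence question in the list above, one common starting point is to translate the DNA in all six reading frames and look for long stretches without stop codons. The sketch below assumes the standard genetic code and uses a short made-up sequence; the function names are illustrative. An open frame in a short fragment is only a hint, so the translations would still need to be compared against protein databases.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_open_stretch(protein):
    """Longest run of residues between stop codons ('*')."""
    return max(protein.split("*"), key=len)


# Made-up toy sequence, not the class sequence.
toy = "ATGGCCAAGTAAGG"
for label, protein in six_frames(toy).items():
    print(label, protein, longest_open_stretch(protein))
```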

 

Obtain the human ACBP mRNA sequence from the Entrez database (accession number M15887) and do a BLAST search against the human genome project sequences (use the "genome" database, not mRNAs or ESTs).

  7. What significant hits (i.e., similarities that you think are important and are evolutionarily related to the ACBP sequence) are detected?  Describe your criteria for significance. (10 pts)
  8. Which sequence is the gene from which your ACBP mRNA sequence was derived?  Give the accession number.  Discuss the differences between the similarity matches identified in the BLAST search and their relationship to your mRNA. (20 pts)
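Whatever criteria for significance you settle on, it helps to apply them the same way to every hit. The sketch below assumes the hits were saved in the 12-column tab-separated layout that the BLAST+ command-line tools call `-outfmt 6`; the thresholds, subject names and numbers are invented for illustration and are not recommended values or real results.

```python
import csv
import io

# Default column order of the BLAST+ tabular format (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def significant_hits(tabular_text, max_evalue=1e-10, min_length=50):
    """Keep hits whose E-value and alignment length pass the thresholds.
    Both thresholds are illustrative assumptions."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue and int(hit["length"]) >= min_length:
            hits.append(hit)
    return hits


# Invented example rows; "subject_A" and "subject_B" are placeholders.
example = (
    "query\tsubject_A\t100.00\t120\t0\t0\t1\t120\t5001\t5120\t2e-60\t222\n"
    "query\tsubject_B\t87.50\t24\t3\t0\t10\t33\t700\t723\t0.5\t30.2\n"
)
for hit in significant_hits(example):
    print(hit["sseqid"], hit["pident"], hit["length"], hit["evalue"])
```

Filtering on E-value alone can hide short but meaningful matches, so the numbers are best read together with the alignments themselves.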

9. What is BLAST and what does it allow you to do? (10 pts)

  1. What are the five different BLAST programs? (5 pts)  Briefly describe them, indicating the type of query sequence and the type of database searched. (10 pts)
  2. Describe a database searching artifact that produces many false and confounding database matches. (5 pts)
  3. After running a search, why would you see a string of "X"s (or "N"s) in your query sequence that you did not put there? (5 pts)
  4. What is the Expect (E) value of a BLAST search? (5pts)
  5. How would you do a BLAST search with a short nucleotide sequence?  Indicate potential problems with such a search. (10 pts)
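To get a feel for how the Expect value behaves, and why short queries are awkward, the Karlin-Altschul relation can be evaluated directly: E = m * n * 2^(-S'), where m is the query length, n the database length and S' the bit score, with S' = (lambda * S - ln K) / ln 2 for a raw score S. This is a rough sketch: lambda and K depend on the scoring system and are left as parameters, and BLAST itself uses corrected "effective" lengths, so reported values will differ somewhat. The lengths and scores below are made up.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score.
    lam and k are the Karlin-Altschul parameters of the scoring system."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_length, database_length):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_length * database_length * 2.0 ** (-bits)


# Made-up numbers: the same bit score against a database of 10^9 letters.
database_length = 10 ** 9
for query_length in (20, 200):
    e = expect_value(40.0, query_length, database_length)
    print(f"query length {query_length}: E = {e:.3g}")

# A short query cannot reach a high score, so its best possible E-value
# stays relatively large; each extra bit of score halves E.
print(expect_value(41.0, 20, database_length) / expect_value(40.0, 20, database_length))
```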

Part: Two

Protein BLAST and Motifs

  1. Retrieve the Acyl-CoA Binding protein (human) sequence (87 amino acids) from the class sequence link and do a standard Protein BLAST search against the Non-Redundant protein database, limited (IMPORTANT) to Caenorhabditis elegans sequences only.  (In the organism box, start typing “caen” and then choose from the list.  Your analysis should only find a restricted number of C. elegans hits.)  Using only the initial summary analysis, which shows the title and E-values for the C. elegans sequences, decide which sequences are homologous to ACBP and which are not.  Provide your reasons pro and con.  For the sequences that have “ACBP family member” in the title, do you know how this was determined? (10 pts)
  2. Examine the alignments of all the C. elegans hits resulting from your search.  How many of the alignments reveal homology to ACBP?  What are your criteria? (10 pts)
    Do any of them have what you would consider to be ACBP motifs (scan by eye)?  Do any appear not to have the ACBP motif?  If so, state which ones. (10 pts)
  3. Take each of the C. elegans sequences in question 2 that you have determined to be related to ACBP and determine whether it has a recognizable ACBP motif (use the BLAST server against the CD (Conserved Domain) database).  How does this compare to your analysis of the sequence alignments? (10 pts)  Do any of these proteins have more than one recognizable functional domain?  Indicate what they are. (10 pts)  Do the same analysis for the remaining BLAST hits that you determined in questions 1 and 2 not to be related to ACBP.  Indicate the recognizable functional domains for these sequences. (10 pts)
  4. Use the ACBP human protein sequence to do a PSI-BLAST search of the non-redundant protein database.  PSI-BLAST selects all protein matches that pass a chosen E-value threshold and uses them to build a new ACBP-specific scoring matrix.  Use the default settings and perform the second iteration of the PSI-BLAST search.  Is there a new C. elegans sequence with a significant E-value that was not readily detected in the original protein BLAST search?  Indicate the E-values for such a hit in both the initial PSI-BLAST search and the second iteration.  How many C. elegans sequences are related, i.e., homologous, to human ACBP? (10 pts)
  5. Does C. elegans have an ortholog of ACBP?  State your reasons and provide your criteria and the ID.  An E-value alone is not sufficient. (10 pts)  How many proteins containing ACBP "domains" are there in C. elegans? (10 pts)  List the proteins with their ID, size and E-value compared to human ACBP. (5 pts)  Your answer should be consistent with your analyses in 2 and 3.
  6. Provide a short explanation regarding the evolutionary history of the ACBP homologs in C. elegans, discussing gene duplication, conservation of structure and function, and structural similarities between homologs (20 pts).
  7. Determine an amino acid pattern that is conserved in the N-terminal domain of the ACBP orthologs in the linked alignment.  What is your pattern?  Do a PHI-BLAST search using the human ACBP as the probe, and enter your pattern as the pattern to be searched.  Limit the search to C. elegans sequences.  How many of the C. elegans ACBP-like sequences contain your pattern?  Is this pattern specific to ACBP orthologs?  Discuss. (10 pts)
  8. Extra Credit:  Are there orthologs of all of the C. elegans ACBP family members in humans?  In Drosophila?  Identify each possible ortholog and provide evidence to support your conclusions.  (Correct answers will earn from 1 to 50 extra points depending on the excellence of the response.)
  9. Extra Credit:  Follow the “Distance tree of results” link (NEW) just below the graph of the BLAST results from your search in question 1.  Discuss the output from the tree clustering with regard to a potential ortholog of the 87 aa ACBP that you suggested in Question 4. (20 pts)
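Before running the PHI-BLAST search in question 7, it can help to test a candidate pattern locally. PHI-BLAST patterns follow PROSITE conventions (for example `[LIV]` for a choice of residues, `x(2)` for two arbitrary residues, `{P}` for anything except P). The sketch below converts such a pattern to a Python regular expression; the function name, the pattern and the peptide strings are invented for illustration and are not an actual ACBP motif.

```python
import re

# One pattern element: x, a residue, [choices] or {exclusions},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as '[LIV]-x(2)-G' to a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            core = residue                      # literal or [choices]
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


# Invented pattern and toy peptides, for illustration only.
regex = prosite_to_regex("[LIV]-x(2)-G-{P}-x(1,3)-K")
print(regex)  # [LIV].{2}G[^P].{1,3}K
for peptide in ("AALKRGAQQK", "AALKRGPQQK"):
    print(peptide, bool(re.search(regex, peptide)))
```

A pattern that matches every sequence you try is probably too loose to say anything about orthologs, and one that matches only the probe is probably too strict.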

 

© BioinformaticsOnline.com, 2007-09, India. All rights reserved.
Conceptualized & Designed by: Jitendra Narayan Powered by: BCS-InfoSolutions, India