COSMOS Summer 2006
Lab Exercise #5: Using Your Programs to Process Real Data
Today's exercise
In the past couple of weeks, we've written a variety of programs that process DNA sequences in different ways: counting nucleotides, calculating reverse complements, finding candidate genes, and so on. Unfortunately, we haven't yet been able to run them on any real data, such as the data available from GenBank. Today, using the component we built in class yesterday for reading DNA or amino acid sequences stored in FASTA format, I'd like you to enable some of your previous programs to work on real data. Be sure to compare your results with one another; agreement between independently written programs is one way to check whether your results are correct.
Useful functions
In the last two lectures, we learned about functions, which allow you to write pieces of a program separately, then assemble the pieces into complete programs. Yesterday, we wrote a function to read sequence data from a FASTA file; below is a link to that function.
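As a reminder of what a function of this kind does, here is a minimal sketch. It is not necessarily the function we wrote in class: the use of Python and the names parse_fasta and read_fasta are assumptions made for illustration. It relies only on the basic FASTA layout, in which a description line starting with ">" is followed by one or more lines of sequence data.

```python
def parse_fasta(lines):
    """Return a list of (description, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; the in-class function may differ.
    """
    records = []
    description = None
    pieces = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line ends the previous record, if there was one.
            if description is not None:
                records.append((description, "".join(pieces)))
            description = line[1:]
            pieces = []
        elif description is not None:
            # Sequence data may be split across many lines; collect the pieces.
            pieces.append(line.upper())
    if description is not None:
        records.append((description, "".join(pieces)))
    return records


def read_fasta(filename):
    """Read every record from the FASTA file with the given name."""
    with open(filename) as fasta_file:
        return parse_fasta(fasta_file)


# A made-up example, not real GenBank data.
sample_lines = [">example sequence", "ACGTAC", "gtta"]
print(parse_fasta(sample_lines))
```

Separating the parsing from the file handling makes the parsing easy to try out on a short list of lines before pointing it at a real file.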
Previous solutions of yours — or the "official" solutions that I provided after each lab session — will also be useful today, though you may need to make functions out of some of them.
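Making a function out of an earlier solution usually means replacing the part that asked the user for input with a parameter, and replacing the part that printed the answer with a return value. The sketch below shows what that might look like for the reverse complement program. It assumes Python, and the name reverse_complement and the choice to report unrecognized characters as "N" are illustrative assumptions, not the official solution.

```python
# Each nucleotide paired with its complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence given as a string.

    Characters other than A, C, G, and T become "N" (an assumption made
    for this sketch; your own program may handle them differently).
    """
    result = []
    # Walk the sequence backward, complementing each nucleotide as we go.
    for base in reversed(dna.upper()):
        result.append(COMPLEMENT.get(base, "N"))
    return "".join(result)


print(reverse_complement("ATGC"))
```

Because the function takes its sequence as a parameter, it no longer matters whether that sequence was typed in by a user or read from a file.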
Useful data
I visited the GenBank web site and downloaded some real DNA data for us to experiment with today. I took the liberty of copying and pasting the data into files for you, so that it can be used by your program. Links to the data files follow.
Save these files on your desktop as you prepare to work today.
Today's programs
Today, I'd like you to assemble a few programs out of pieces that we've already written. The objective is to make some of our previous programs work with real data, such as the data that I downloaded from GenBank. You may need to write some new code for each, but the difficult part was mostly done in previous labs; today's work is mostly a matter of assembling existing parts, with a couple of modifications to some of them so that they'll fit together.
Today's programs are:
Run each of your programs on the provided FASTA data from GenBank. Be sure to compare your results with others' results.
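To show how the pieces might fit together, here is a rough sketch of one assembled program: it reads a sequence from a FASTA file and counts its nucleotides. Python is assumed, and the function names read_sequence and count_nucleotides are hypothetical. So that the sketch runs on its own, it writes a tiny made-up FASTA file first; in your own program you would pass the name of one of the data files you saved instead.

```python
import os
import tempfile


def read_sequence(filename):
    """Return the sequence from a single-record FASTA file as one string."""
    pieces = []
    with open(filename) as fasta_file:
        for line in fasta_file:
            line = line.strip()
            # Skip blank lines and the ">" description line.
            if line and not line.startswith(">"):
                pieces.append(line.upper())
    return "".join(pieces)


def count_nucleotides(dna):
    """Return a dictionary giving the number of A's, C's, G's, and T's."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in dna:
        if base in counts:
            counts[base] += 1
    return counts


# A made-up stand-in for a real data file (not real GenBank data).
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">made-up example sequence\nAACG\nTTGA\n")
    sample_path = tmp.name

try:
    sequence = read_sequence(sample_path)
    for base, count in sorted(count_nucleotides(sequence).items()):
        print(base, count)
finally:
    os.remove(sample_path)  # clean up the temporary file
```

Notice that count_nucleotides knows nothing about files; the only new code is the few lines that connect the reading step to the counting step.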
Finished early? (Sorry, no games or videos today)
If you're done with the assignment, I'd prefer that you spend the remaining time on productive tasks. In particular, it would be good to spend the time working on your projects, which are due sooner than it may seem.