COSMOS Summer 2006
Lab Exercise #2: Writing a Few Python Programs


Today's exercise

Yesterday in lecture, we wrote a short Python program that counted the number of "A" nucleotides in a DNA sequence. This morning, you'll have the opportunity to write some short Python programs of your own. First, though, I'll need to teach you a couple of new things.


Pair up! (Optional)

As in Tuesday's section, feel free to pair up. I'm making it optional today, but you may find that your experience goes a lot more smoothly if you have someone to bounce ideas off of. If you pair up, work in the same way you did Tuesday: one computer shared between the two of you, one person "driving," the other watching, and the keyboard changing hands at least once every fifteen minutes.


Writing a program outside the interpreter window

One frustration that we faced in lecture yesterday is the problem of retyping a complete program into the interpreter just because there was one mistake in it. This was a minor issue in class, since the program we were writing was so short, but imagine that the program was 8,000 lines long instead of 8; it would be very painful indeed to make changes to an 8,000-line program, if it was necessary to retype it (or even copy-and-paste it) into the interpreter every time! And, besides, we usually want to be able to keep the programs we write, so that we can use them again later; if we type them into the interpreter window, our programs die as soon as we shut down the interpreter.

IDLE provides an easy and effective solution to these problems. It allows you to write a program in a separate window, editing it as much as you'd like and saving it to your hard drive (or an external drive, like a USB drive) periodically. Then, whenever you'd like to run your program, you can send it to the Python interpreter automatically, so that you don't have to type it in.

When you want to start writing a program, first start IDLE. Then, go to the "File" menu and select "New Window." A new, empty window, initially called "Untitled," will pop up as a result. (I'll call this window the "editor window," since it's where you'll be editing your program.) In this editor window, you can write a Python program. Along the way, you should save it periodically, which you can do by going to your new window's "File" menu and selecting "Save".

Try typing in this short program from yesterday's lecture. Type it; don't copy and paste it. I'd like you to see how the editor window behaves as you move from one line to the next. Notice, in particular, how it automatically indents lines for you when they ought to be. (Note that indentation is not just used to make programs pretty in Python; it's actually a requirement of the language and affects a program's meaning.)

    # This program prints the number of A's in a DNA sequence.

    dnaSequence = "atcgatgagagctagcgata"

    aCount = 0
    
    for c in dnaSequence:
        if c == 'a':
            aCount = aCount + 1

    print "The number of A's is", aCount

(The line that begins with a "#" character is called a "comment." It's a note that you write for the benefit of yourself or others who read your program, but that Python will purposefully ignore. You can put comments anywhere you'd like; once you type "#" on a line, the rest of that line will be ignored by Python. It might be good for you to write a brief comment at the top of each of your programs explaining its purpose, because you may need to refer to your programs again later in the course.)

When you're done typing the program, you'll want to run it. Go to your editor window's "Run" menu and select "Run Module." Over in the interpreter window, you'll see two things happen. First, the word "RESTART" will have appeared, indicating that the interpreter has just (intentionally) forgotten everything it knew, as though you'd closed it and restarted it, with all variables losing their values. Second, the program's output (which, in this case, should be 7) is displayed.

So, if you want to count the number of "A"s in several DNA sequences, you can now do so by pasting the sequence into your program and re-running it. In general, it's a good idea to try your programs with several sets of test input, to be sure that there aren't any mistakes. It's usually good to include not only a "typical" case, but also more peculiar ones. In this case, peculiar cases might include a sequence with nothing but "A" nucleotides, a sequence with no "A" nucleotides, or an empty sequence.

Try re-running the program with each of the following sequences and see if the program comes up with the right counts.

    ttagcgataatagcgctaa
    aaaaaaaaa
    tcgcgctgcgcctc

Also, try it with the empty sequence, which can be represented in our program as an empty string (two double-quotes with nothing in between).


Writing some programs of your own

Yesterday in lecture, we learned a variety of building blocks that we can use to construct programs. Today, I'd like you to experiment with putting them together in ways that we didn't in lecture.

Remember, as you work, to consider first how you might solve the problems on paper. Slow yourself down, like we did in lecture; discuss with your partner, or consider on your own, exactly how you would solve the problem on paper, step by step. Then translate your solution to a Python program, using the building blocks we learned previously.

As always, the course staff is more than happy to help at any point. Don't expect all of these to be easy, necessarily, and don't expect that you'll necessarily finish all of them during the lab period. If you and your partner are stuck on something, don't hesitate to ask.

Today's problems

In each of these programs, include the program's input as part of the program, as I did in the example above when I included this line at the top:

    dnaSequence = "atcgatgagagctagcgata"

(Soon, we'll learn ways for our program to get its input from other places, such as a human user or a file.)

  1. Write a program that takes a DNA sequence that may or may not have "gaps" in it, printing "There is a gap in this sequence" if there is one, or "There is not a gap in this sequence" if there isn't. A gap — a portion of the sequence that is unknown — is represented in a DNA sequence by a "-" (dash) character.
  2. Write a program that counts and prints the number of "gaps" in a DNA sequence.
  3. Write a program that counts and prints the number of "A"s, "C"s, "T"s, and "G"s in a DNA sequence.
  4. Write a program that prints the percentage of the nucleotides in a DNA sequence are, respectively, "A"s, "C"s, "T"s, and "G"s. For example, in the sequence "actgactg", there are 8 nucleotides, with 2 A's, 2 C's, 2 T's, and 2 G's. So, your program would say that 25% of the nucleotides are A's, 25% are C's, 25% are T's, and 25% are G's.
  5. Write a program that takes a DNA sequence and reports whether the subsequence "atg" appears somewhere within it. Recall that "atg" is a marker that often indicates the beginning of a gene, so this technique will be of use later when we look for candidate genes.

Save each of your programs in a separate file on your USB drive — or, if you don't have a USB drive, try emailing them to yourself — so that you can refer to them again during a later lab session if you need to.

Note that some of these programs will require bits and pieces that we haven't learned yet. If you know what you want to do, but are unsure how to do it, feel free to ask us, and we'll be glad to help.

Be sure to test each program with several inputs before concluding that it's correct.

Enjoy!