CS 2401 Assignment #3

Due Date: Sunday, September 20, 2009.

Objective: The goal of this assignment is to practice recursion.

Motivation. All the genetic information about every living creature is stored in this creature's DNA. A DNA is a long sequence of "code letters" T, C, A, G; e.g., it may have the form TCAAGTG... Whether a creature will have a tail or blue eyes is determined by parts of the DNA called genes.

In a DNA sequence, there are no blank spaces or other separators; so, how does nature determine when a gene starts and when it ends? One way is by looping. Looping is based on the fact that some letters are attracted to each other: T is attracted to A, and C is attracted to G. So, they easily form pairs: T--A and C--G. Thus, if we have a sequence like AACGT at the beginning of the gene and an "opposite" sequence ACGTT at the end:

```AACGT --------------------------- ACGTT
```
then these code letters will stick to each other: the last letter T of the beginning sequence sticks to the first letter A of the ending sequence, the last-but-one letter G of the beginning sequence sticks to the second letter C of the ending sequence, etc.
```     ------
\    /
T--A
|  |
G--C
|  |
C--G
|  |
A--T
|  |
A--T
```
As a result, the whole gene between the beginning and the ending sequences is encompassed into a loop: In biology, a sequence flanked by such pairs (like AACGT and ACGTT) is called a bio-palindrome. It is important to mention that there is an important difference between these biological "palindromes" and traditional computer science palindromes like acbca:
• in the CS palindrome, the first and the last letters match exactly, while
• in DNA, they are opposite ("attracting") letters.
Because of the biological relevance, the problem of detecting such "bio-palindromes" is important in computational biology.

In this lab, we will analyze a simplified ("toy") version of this problem, when there are no letters between the beginning and the ending sequences. For example, a sequence AACGTACGTT is a bio-palindrome in this sense.

Assignment. Write a recursive program that, given a DNA sequence, checks whether this sequence is a bio-palindrome.

For extra credit:

• In the main assignment, we implicitly assumed that the input is a valid DNA sequence, i.e., that all its symbols are A, T, C, and G. Modify your recursive program in such a way that it also produces an error message if the input is not a valid DNA sequence, i.e., if the input contain any other symbol.
• Make your program not only returns the result "yes" or "no" but also, if the input sequence is a bio-palindrome, make your program draw the corresponding loop (as above).