Tales from the Genome – 1

Human beings are complex multicellular organisms, whereas an amoeba is a single-celled organism. Do you know which has more genetic material? The answer is surprising: one amoeba species is reported to have about 670 billion units (base pairs) of DNA, compared to 3 billion in humans. What do we learn from this? It is not the quantity but the sequence (order) of DNA that matters when it comes to determining the complexity of an organism. Recently I took the Tales from the Genome course from Udacity and I thoroughly enjoyed it. I will be writing several posts on the human genome and this is the first one.

Before the 1900s, blood transfusions saved some lives but killed other patients. Karl Landsteiner, an Austrian biologist and physician, wanted to find out the reason behind this. His work, which began in 1900, led to the classification of human blood into four principal types, A, B, AB, and O, known as the ABO group. A and B are called antigens. What is an antigen? How do antigens get produced? In order to answer these questions we need to look inside our cells.

Inside a cell

We are made up of trillions of cells. A typical cell (1) is protected by a cell membrane, (2) contains a gel-like substance called cytoplasm enclosed within the cell membrane, (3) contains ribosomes floating in the cytoplasm, and (4) contains an inner core called the nucleus.


Inside the nucleus there is a super molecule called DNA. This molecule carries the instructions for building all living organisms that we see on our planet. DNA stands for Deoxyribonucleic Acid. Let us break down the name into chunks so that we can understand it better.

Deoxyribo - Ribose sugar with a missing oxygen.
Nucleic   - First found in the nucleus of the cell.
Acid      - Phosphate groups, which are acidic in nature.


DNA is made up of chains of deoxynucleotides. In the above diagram the parts indicated by 1 to 4 together form one deoxynucleotide. There are four types of deoxynucleotides, each carrying a different nitrogenous base: Adenine, Guanine, Cytosine, and Thymine. For ease of use, we refer to them by the letters A, G, C, and T. The structure of DNA is a double helix in which two single strands of deoxynucleotides are twisted together. Each nitrogenous base on one strand pairs with a base on the other strand. Adenine and Thymine pair with each other. Guanine and Cytosine pair with each other. By reading the nucleotides from a single strand we can derive the contents of the other strand using this base pairing rule.
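The base pairing rule is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `complement` is my own, and it pairs the letters position by position without worrying about the direction in which each strand is read.

```python
# Base pairing rule from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that sits opposite each letter of the given strand."""
    return "".join(PAIRS[base] for base in strand)


print(complement("ATAGCTAC"))  # TATCGATG
```

Applying the rule twice gives back the original strand, which is why one strand is enough to reconstruct the other.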


If we want to write down the letters from a single strand of the human genome, how many books do we need? The human genome contains 3 billion base pairs. This means a single strand contains 3 billion letters. Let us assume that a book contains 300,000 words and each word has 5 letters on average. Hence a book will contain 1,500,000 letters. To write 3 billion letters (the information from a single strand) we need 2,000 books. By reading those 2,000 books we could read the entire genetic makeup of that human being.
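The book estimate can be checked with a quick calculation. The variable names are mine, and the words-per-book and letters-per-word figures are the same assumptions made in the paragraph above.

```python
BASES_PER_STRAND = 3_000_000_000  # letters in a single strand
WORDS_PER_BOOK = 300_000          # assumption from the text
LETTERS_PER_WORD = 5              # assumption from the text

letters_per_book = WORDS_PER_BOOK * LETTERS_PER_WORD
books_needed = BASES_PER_STRAND // letters_per_book

print(letters_per_book)  # 1500000
print(books_needed)      # 2000
```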

The length of a single base pair is 0.34 nanometers. What is the length of 3 billion base pairs? It comes to about 1 meter (0.34 * 10^-9 * 3 * 10^9 = 1.02). We get 3 billion base pairs from our biological mother and another 3 billion from our biological father. Hence we have about 2 meters of DNA inside each cell. How can a cell which is measured in micrometers store 2 meters of DNA?
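Here is the same length calculation as a small Python sketch. The constant names are mine; the numbers are the ones used in the paragraph above.

```python
BASE_PAIR_LENGTH_M = 0.34e-9  # 0.34 nanometers, in meters
BASE_PAIRS_PER_COPY = 3e9     # 3 billion base pairs from one parent

length_one_copy = BASE_PAIR_LENGTH_M * BASE_PAIRS_PER_COPY
length_both_copies = 2 * length_one_copy  # one copy from each parent

print(f"{length_one_copy:.2f} m")     # 1.02 m
print(f"{length_both_copies:.2f} m")  # 2.04 m
```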


DNA is compacted by winding around histone proteins. This DNA and histone material is further organized into chromosomes. By compacting and organizing itself into chromosomes, two meters of DNA fits nicely inside the cell’s nucleus. If this is hard to believe, think of the singularity in the Big Bang: the entire universe came from an infinitesimally small dot.


Thus the DNA that we get from our biological mother and father is organized into 23 pairs of chromosomes. The first 22 pairs are called autosomes. The 23rd pair determines whether we are going to be male or female, and these are called sex chromosomes. This is what makes you and me, and the same chemistry makes all the living organisms that we see. Life is chemistry that crawls.

Ribosomes meet mRNA

Floating in the cytoplasm of a cell are molecular machines called ribosomes. They produce the proteins that make the cell go round. There are several types of proteins and each cell knows what type of protein it needs to make. How does the cell know what type it needs to make? This information is in the DNA. But wait, the DNA is inside the nucleus and the ribosomes float outside the nucleus. How do they access this information from the DNA? This is where mRNA comes in.

mRNA stands for messenger Ribonucleic Acid. What is the difference between DNA and RNA? There are two differences: (1) the oxygen that is missing in DNA is present in RNA, hence RNA does not have the prefix deoxy, and (2) instead of the nitrogenous base Thymine it has the nitrogenous base Uracil.


A cell does not need all the information from the DNA. It needs only some portions of it. Such a portion is copied and sent outside the nucleus so that the ribosomes can access it. This copied information is called mRNA. If you think of the DNA as an entire book, then xeroxing a few pages from that book gives you the RNA. This process of copying information from DNA to RNA is called transcription. Let us do a simple transcription from DNA to RNA.

DNA: ... A T A G C T A C ...
RNA:     A U A G C U A C [Thymine in DNA is changed to Uracil in RNA]
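The same simplified transcription can be written as a one-line Python function. The name `transcribe` is my own, and like the example above it only swaps Thymine for Uracil; it does not model what happens chemically inside the nucleus.

```python
def transcribe(dna: str) -> str:
    """Simplified transcription: copy the letters, writing U in place of T."""
    return dna.replace("T", "U")


print(transcribe("ATAGCTAC"))  # AUAGCUAC
```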

Ribosomes read the nucleotides from RNA in sets of three to create proteins. This process is called translation. Each sequence of three nucleotides is called a codon. Depending on the codon it reads, the ribosome adds the corresponding amino acid (amino acids are the building blocks of proteins). There are twenty amino acids that the codons can specify. Given below is the table which shows the amino acid produced for each codon.



Let us understand this with an example. Imagine that the ribosome sees the following nucleotides in the RNA: UUU AUG UUC UAU UGA UGG

In order to start the translation the ribosome must encounter the START codon AUG (refer to the codon table). It translates this to the amino acid Methionine (Met). It ignores UUU as it appears before the START codon. It then translates UUC to Phenylalanine (Phe). It then translates UAU to Tyrosine (Tyr). It then encounters UGA, which is a STOP codon. This means the ribosome stops translating, so UGG is never read. Thus by translation, the ribosome converts RNA into a chain of amino acids.

[ UUU AUG UUC UAU UGA UGG => Methionine Phenylalanine Tyrosine ]
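The walk-through above can be sketched in Python. Please note the assumptions: `CODON_TABLE` contains only the handful of codons used in this post (the real table has 64 entries), the function name `translate` is mine, and the RNA is expected to be already split into space-separated codons.

```python
# Partial codon table: only the codons that appear in this post.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "UAU": "Tyr", "UAC": "Tyr",
    "UUG": "Leu", "CUC": "Leu", "GGG": "Gly", "UGG": "Trp",
    "GUC": "Val", "GUG": "Val",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(rna: str) -> list[str]:
    """Translate space-separated codons, from the START codon to a STOP codon."""
    codons = rna.split()
    if "AUG" not in codons:
        return []  # no START codon, so nothing gets translated
    amino_acids = []
    for codon in codons[codons.index("AUG"):]:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "STOP":
            break
        amino_acids.append(amino_acid)
    return amino_acids


print(translate("UUU AUG UUC UAU UGA UGG"))  # ['Met', 'Phe', 'Tyr']
```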

This process of transcribing DNA to RNA and translating RNA to proteins is called the Central Dogma and it is found in all life forms.

Back to blood types

One kind of cell in our body is the red blood cell. On the surface of each red blood cell there is a carbohydrate molecule. This molecule is modified by adding an antigen of type A or B, or it is left unmodified (type O). Who does the modification? It is done by a protein called the ABO protein. Where did this protein come from? You guessed it right. Ribosomes in the developing red blood cell translate an RNA sequence to produce this protein.

On chromosome 9 there is a region of DNA called the ABO gene. What is a gene? A DNA sequence that codes for something, typically a protein, is called a gene. Zoom into the ABO gene and look at the deoxynucleotides from position 793 to 801.

In some people you will find the sequence TAC TTG GGG [transcribes to RNA] UAC UUG GGG [translates to protein] Tyrosine Leucine Glycine. The presence of the amino acid leucine makes the ABO protein create the type A antigen, which results in blood type A.

In some other people you will find the sequence TAC ATG GGG [transcribes to RNA] UAC AUG GGG [translates to protein] Tyrosine Methionine Glycine. The presence of the amino acid methionine makes the ABO protein create the type B antigen, which results in blood type B.

For some people position 258 will not have any nucleotide. Their DNA sequence will be GTC CTC GT GTG ACC CCT TGG. Position 258 is the gap in the two-letter group GT. Because of this deletion a frame shift takes place: every codon after the deletion is completed with letters from the next non-deleted positions. This DNA region [transcribes to RNA] GUC CUC GUG UGA. As we can see from the codon table, UGA signals STOP and protein translation stops prematurely. Hence a non-functioning ABO protein gets created. It does not produce any antigen, which results in blood type O.
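Here is a small Python sketch of the frame shift. It is only an illustration under simple assumptions: the helper name `codons` is mine, the sequence is the one quoted above, and reading simply starts at the first letter shown instead of searching for a START codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons(sequence: str) -> list[str]:
    """Split a sequence into complete groups of three letters."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


# Sequence from the text; the two-letter group GT marks the deleted position.
dna_with_gap = "GTC CTC GT GTG ACC CCT TGG"

# Closing the gap shifts every later letter into a new reading frame.
dna = dna_with_gap.replace(" ", "")
rna = dna.replace("T", "U")

read = []
for codon in codons(rna):
    read.append(codon)
    if codon in STOP_CODONS:
        break  # translation stops prematurely here

print(" ".join(read))  # GUC CUC GUG UGA
```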

Why do some people have blood type AB? We have two copies of the ABO gene, one from our biological mother and the other from our biological father. These copies are called alleles. If one of the alleles codes for blood type A and the other codes for blood type B then we end up with blood type AB.
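The way two alleles combine into a blood type can be sketched as follows. The function name `blood_type` is mine. Since the O allele produces no antigen, it stays hidden whenever the other allele is A or B.

```python
def blood_type(allele_1: str, allele_2: str) -> str:
    """Combine two ABO alleles ('A', 'B' or 'O') into a blood type."""
    # Only the A and B alleles put an antigen on the red blood cell.
    antigens = {allele for allele in (allele_1, allele_2) if allele in ("A", "B")}
    if antigens == {"A", "B"}:
        return "AB"
    if antigens:
        return antigens.pop()
    return "O"


print(blood_type("A", "B"))  # AB
print(blood_type("A", "O"))  # A
print(blood_type("O", "O"))  # O
```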

An antigen (short for antibody generator) makes the immune system produce antibodies. These antibodies attack antigens that are foreign (unknown) to the body. A person with blood type A has the A antigen and the anti-B antibody. This anti-B antibody will attack antigen B. This is the reason why a person with blood type A can only accept blood type A or O (no antigens). Blood type O is the universal donor as it does not contain any antigens and hence it can be accepted by all blood types. Blood type AB is the universal acceptor: it carries both antigens, so it produces neither anti-A nor anti-B antibodies and can receive any blood type. For more information about ABO types read this article. If you want to find out the probability of your blood type from your parents’ blood types go here.
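The donor and acceptor rules above boil down to one check: the donor's blood must not carry an antigen that the recipient lacks. A minimal Python sketch, assuming only the A and B antigens matter (other blood group systems are ignored, and the names `ANTIGENS` and `can_receive` are mine):

```python
# Antigens carried by each ABO blood type.
ANTIGENS = {"A": {"A"}, "B": {"B"}, "AB": {"A", "B"}, "O": set()}


def can_receive(recipient: str, donor: str) -> bool:
    """True if every antigen in the donor blood is already known to the recipient."""
    return ANTIGENS[donor] <= ANTIGENS[recipient]


print(can_receive("A", "O"))   # True
print(can_receive("A", "B"))   # False
print(can_receive("AB", "B"))  # True
```

With this rule, type O passes the check for every recipient and type AB passes it for every donor, which matches the universal donor and universal acceptor described above.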


If you want to educate yourself on the human genome, take the Udacity course Tales from the Genome. If you have more time, pick up the book Genome: The Autobiography of a Species in 23 Chapters. It is time well spent as it helps you avoid a lot of mythical nonsense.
