The chemical DNA is the basis of all life on Earth. Evolution has produced some mind-boggling systems to record biological data, replicate it in living cells, and execute that code in order to create and maintain life. Today I want to give you a glimpse of the amazing workings of DNA.
What is DNA?
DNA is a self-replicating material found inside almost every cell of your body. It stands for deoxyribonucleic acid and is analogous to a recipe book for cooking up a human being from scratch. Not just any human being, of course, but specifically you, since your DNA is unique to you. With the exception of any clones you forgot to mention. Or identical twins, which are nature’s clones. But that’s really all.
Inside the recipe book are various chapters which are akin to chromosomes. Humans have 46 chromosomes, half of which come from your mother, and half from your father. When you lay them all out in pairs, you get your karyotype.
Different species have different numbers of chromosomes. For instance, rats have 42 chromosomes. Chickens have 78 chromosomes. And some butterflies have really run away with the whole idea and cooked up 380 chromosomes.
Why do “simpler” species have more chromosomes than humans? It turns out that chromosome number isn’t determined by an organism’s complexity but by the path its evolution happened to take. Throughout the generations, chromosomes go through many duplications and mutations, and because the whole process of mutation is random and accidental, nobody’s doing any housekeeping. So after millions of years, we end up with a fair bit of redundancy and unused stretches of DNA. Incidentally, a lot of other junk DNA is due to retroviral insertions but that’s another rabbit hole for another day.
Back to the recipe book. If chromosomes are chapters, then genes are individual recipes in each chapter. Most physical traits (your phenotype) are shaped by lots of genes working together (your genotype being the underlying genetic make-up). There are also a number of traits that are often taught as single-gene examples, such as whether or not you have long eyelashes, or a chin fissure, or brown eyes, although even eye colour turns out to involve more than one gene. In contrast, traits like intelligence are the result of many complex gene interactions. We’ll look at how genes are expressed in a moment. First let’s take a quick look at how these recipe concepts link together in reality.
How DNA, Chromosomes and Genes Fit Together
Take a moment to visualise the process of DNA, with all the genes it carries, coiling up into the chromosomes that hold your genome.
Starting at the smallest scale, we have genes contained in long stretches of the double helix. The helix is then wound around molecules called histones to form neat little packages called nucleosomes. The condensing strand coils and twists further into increasingly large bundles to form supercoiled chromatin fibres. The supercoils fold into loops which wind even further until we have a chromosome. Collectively, all of this is referred to as DNA, with a single molecule of DNA being a chromosome.
This bundling and coiling is an incredibly fine-tuned process, with different coiling styles in different cells, appropriate to local gene expression. In other words, while every cell in your body contains the genes that code for eye colour, the DNA in iris cells is coiled in such a way as to leave the pigment genes in more accessible parts of the chromosome, ready for expression.
This coiling process happens a lot. Every time a cell in your body divides (every few hours or days, depending on the cell’s job) all 46 chromosomes necessarily uncoil into vast stretches of DNA, exposing all the inside ladder rungs to replicate, then coil back up in a highly specific manner to rest as neat little packages once again.
Extreme Close-Up: The Structure of DNA
From now on, we’re going to zoom in and look at DNA very closely indeed. The double helix shape of DNA was deduced by the (then) young scientists Watson and Crick in 1953. Prior to their discovery, no-one knew what shape DNA took. Some thought it was a single strand; others worked on the hypothesis of a triple helix. It was a lucky coincidence that Watson and Crick were handed Rosalind Franklin’s x-ray crystallography images of DNA, and intuited the double-helix shape. To prove their hypothesis, they built an oversized real-world model with ball and stick molecules to demonstrate how all the pieces fit together. It worked perfectly.
Think of DNA as a jigsaw puzzle, except there are only six different shapes and 18 billion pieces overall. The six shapes to consider are the DNA bases (adenine, thymine, guanine and cytosine) and the DNA backbone (sugar rings and phosphate groups). Because of their chemical structure and physical attractions, the bases and backbone are compelled to join together like a ladder. Since molecules are rule-bound, we find this beautiful order emerging out of apparent chaos.
How Bases Create Genes
How does this series of chemical molecules relate to genes? I’m going to borrow Frank Ryan’s analogy from his wonderful book, The Mysterious World of The Human Genome. Imagine yourself in a landscape with a train track stretching out behind you and in front of you, all the way into the distance. The longitudinal rails are like the sugar-phosphate backbone of DNA, while the latitudinal sleepers are like the complementary base pairs (always A with T and G with C).
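The pairing rule on the sleepers can be sketched in a few lines of Python. This is only an illustration: the sequence is made up, the function name `complementary_strand` is my own, and the sketch ignores the fact that the two real strands run in opposite directions.

```python
# Base-pairing rules: A always sits opposite T, and G opposite C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the base found opposite each base of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand)


rail_one = "ATGCGCTTA"  # a made-up stretch of track, not a real gene
rail_two = complementary_strand(rail_one)

# Print each sleeper as a pair of bases, one from each rail.
for left, right in zip(rail_one, rail_two):
    print(f"{left}-{right}")
```

Because the rule is symmetric, taking the complement twice gives you back the rail you started with, which is essentially why either strand can act as a template for rebuilding the other.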
As you walk along the train track, you can count off the sleepers for miles and miles. The average gene is about 27,000 base pairs long, although some are as long as 2 million base pairs, which is quite a lot of train track. If you were to examine your entire genome (all your DNA) you would have to walk along 3 billion sleepers (or 6 billion bases in total).
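If you like to see the sums written out, here is a small sketch using only the figures quoted in this article. The variable names are my own, and the numbers are the round ones from the text rather than precise measurements.

```python
# Figures quoted in the article (rounded, not precise measurements).
BASE_PAIRS = 3_000_000_000   # sleepers in the whole genome
AVERAGE_GENE = 27_000        # sleepers in an average gene
LONGEST_GENE = 2_000_000     # sleepers in one of the longest genes

# Each sleeper is a pair, so there are two bases per sleeper.
total_bases = BASE_PAIRS * 2

# Each base comes with one sugar ring and one phosphate group,
# which gives the jigsaw-piece count mentioned earlier.
jigsaw_pieces = total_bases * 3

# How much of the track does a single gene occupy?
average_share = AVERAGE_GENE / BASE_PAIRS
longest_share = LONGEST_GENE / BASE_PAIRS

print(f"Sleepers to walk: {BASE_PAIRS:,}")
print(f"Bases in total:   {total_bases:,}")
print(f"Jigsaw pieces:    {jigsaw_pieces:,}")
print(f"Average gene:     {average_share:.4%} of the track")
print(f"Longest genes:    {longest_share:.4%} of the track")
```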
Surprisingly, not every sleeper is considered to be coding DNA. In humans, only about 2% of our DNA actually codes for proteins, which is the whole purpose of DNA. The other 98% – called non-coding DNA or junk DNA – is a mixed bag of knowns and unknowns. While some has been identified as regulatory DNA (telling the cell when and where to switch genes on and off) and viral DNA (inserted into the germ line of our ancestors throughout ancient history), much of it is still a complete mystery.
Gene Expression: Cooking The Recipe
So DNA is like a cook book. Chromosomes are like individual chapters. And genes are like individual recipes, most of which don’t actually make pie or soup or apple crumble but are relics redundant to your current meal plan. In this analogy, who’s the cook? Who’s mixing the ingredients? And what does it actually make?
This is called gene expression, and it’s happening all the time in your body. Genes are expressed on-demand and there’s a great deal of variety going on during rapid growth phases like embryonic development and puberty. But it’s also essential for bog standard daily living, such as producing insulin to “unlock” your cells and allow them to take up sugar from your food. Such responses are regulated on a real-time basis, underpinning the life-sustaining nature of your DNA.
Transcription = Photocopying The Recipe
Gene expression takes place in three stages. The overall flow of information, from DNA to RNA to protein, is known as the Central Dogma of molecular biology, which tells you how very important it is in terms of how DNA works.
The first stage is transcription, where the helical DNA unwinds inside the cell nucleus and is copied into a single-stranded molecule called RNA. This is like taking a photocopy of a recipe to work with, because your kitchen is messy and you often set things on fire and you want to keep your very important recipe book super nice and clean.
A large protein molecule called RNA polymerase (the big pink thing) works its way along the DNA helix, teasing apart the two strands with the help of its enzyme pals (not shown) to make room for the RNA strand (the photocopy) to be generated. RNA building blocks float freely in the surrounding goo and are attracted to their complementary bases on the exposed DNA without much fuss. The one twist is that RNA uses the base uracil (U) wherever DNA would use thymine (T). It’s all very elegant.
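Here is a rough sketch of the photocopying step in Python. It assumes a made-up template strand and a function name of my own (`transcribe`), and it glosses over strand direction and all of the enzymes involved. The one real detail it does capture is that RNA uses uracil (U) in place of thymine (T).

```python
# Pairing rules when building RNA against a DNA template.
# RNA has no thymine, so a DNA "A" attracts an RNA "U".
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the RNA strand that pairs with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)


template_strand = "TACGCGAAT"  # made-up DNA, read by the polymerase
rna_copy = transcribe(template_strand)
print(rna_copy)
```

Notice that the RNA copy ends up spelling the same message as the opposite DNA strand, just with every T swapped for a U.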
RNA Processing = Customising the Recipe
The second stage is called RNA processing, where the new strand of RNA goes through some modifications. Various little enzyme fellas come along and attach a cap and tail to the strand of RNA, which protect it and help determine how long it should “live”. Other molecular machines (spliceosomes) chop out the non-coding sequences of the gene (introns) and leave behind only about 1,200 out of 27,000 bases to be expressed (called exons) in a process known as splicing. Here’s a zoomed out diagram to visualise it.
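As a toy illustration of cutting and joining, the sketch below labels each stretch of a made-up RNA strand as an exon or an intron and keeps only the exons. The sequences and names are invented for the example, and real spliceosomes find intron boundaries from signals in the sequence itself rather than from handy labels.

```python
# A made-up RNA strand, split into labelled stretches.
# Introns are written in lowercase purely so they are easy to spot.
segments = [
    ("exon", "AUGGCC"),
    ("intron", "guaaguuuuag"),
    ("exon", "CGCUUA"),
    ("intron", "guccccag"),
    ("exon", "GUUUGA"),
]

# The freshly transcribed strand contains everything.
pre_mrna = "".join(sequence for _, sequence in segments)

# Splicing drops the introns and joins the exons end to end.
mature_mrna = "".join(sequence for kind, sequence in segments if kind == "exon")

print(pre_mrna)
print(mature_mrna)

# The same idea with the article's figures for an average gene.
kept_fraction = 1_200 / 27_000
print(f"Roughly {kept_fraction:.1%} of an average gene survives splicing")
```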
There is a pretty spectacular bit of biology going on here. The end product of gene expression (a protein) depends on which exons are kept and which are skipped at the RNA processing stage, a trick known as alternative splicing. It means that a single gene can be expressed in numerous different ways, which is super efficient. It’s ironic then that so much of DNA is non-coding junk, which is super inefficient. Damn you, nature.
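To get a feel for how quickly the options multiply, here is a deliberately simplified sketch. It assumes a hypothetical gene with four exons where the first and last are always kept and each middle exon can be either included or skipped. Real genes follow far messier rules, so treat the counts as an illustration only.

```python
from itertools import combinations


def possible_transcripts(exons):
    """List every way to keep the first and last exon plus any subset of the middle ones."""
    first, *middle, last = exons
    transcripts = []
    for how_many in range(len(middle) + 1):
        # combinations() keeps the exons in their original order.
        for kept in combinations(middle, how_many):
            transcripts.append([first, *kept, last])
    return transcripts


exons = ["E1", "E2", "E3", "E4"]  # hypothetical exon labels
for transcript in possible_transcripts(exons):
    print("-".join(transcript))
```

In this toy model every extra optional exon doubles the number of possible recipes, so two optional exons already give four versions of the same gene.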
Translation = Following The Recipe
The third stage of gene expression is called translation because there is a change of language: from bases to amino acids, which are the building blocks of proteins. The spliced RNA strand (shown here in red) pops out of the cell nucleus and into the cell cytoplasm. There, it attaches to a ribosome (grey blobby), a molecular complex that reads the base sequence in groups of three (codons).
With the help of transfer RNA molecules, which have an anti-codon on one end and its amino acid counterpart on the other, the ribosome translates the sequence of codons into amino acids. What emerges is a long purple chain of amino acids, also known as a polypeptide. It’s purple because proteins are always purple in biology textbooks. I don’t know who started that but it’s a nice bit of synaesthesia to help you learn your molecular groupings. Incidentally, if you aren’t using colours in your notes, it’s never too late to start.
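The ribosome’s job can be sketched as a simple lookup loop. The codon table below is the standard genetic code written in a compact form, with amino acids as one-letter codes and `*` marking a stop. The function name `translate` and the example RNA strand are my own inventions, and the sketch ignores transfer RNA and everything else that makes the real process work.

```python
# The standard genetic code in compact form: codons are generated in
# U, C, A, G order and matched against one-letter amino acid codes.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon and return the amino acid chain."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    chain = []
    for position in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[position:position + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        chain.append(amino_acid)
    return "".join(chain)


print(translate("AUGGCCCGCUUAGUUUGA"))  # made-up RNA strand
```

The result is a short chain written in one-letter codes: M for methionine, A for alanine, R for arginine, L for leucine and V for valine.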
Several amino acid chains can fold together to form a protein, the ultimate product of gene expression. Here’s how it would look if the world were made of cartoon. The numbers indicate the timeline of events, and in absolutely no way did I forget to label them.
“Tell me more about the codons!” I hear you scream. And you’d be right. This is a good thing to scream about, if anything is. The relationship between codons and amino acids is detailed by the genetic code, which is shared, with only minor variations, by all life forms on Earth. For instance, the base sequence C-G-C codes for the amino acid arginine. There is one codon which tells the ribosome to start translating, and which also codes for methionine, and three codons which tell the ribosome it’s reached the end. Here’s the complete code if you’re interested.
You’ll note there are 64 possible three-base combinations (4 x 4 x 4) and only 20 amino acids. So there are multiple codons for the same amino acid, creating a fair bit of redundancy in the genetic code. This may well be a very good thing, however, because it dampens the effects of mutations (a switch from G-T-T to G-T-C still codes for valine, for example) which is good for preventing inherited diseases. Check out my article Evolution 101 for more on how mutations can be good, bad, or neutral.
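You can check the redundancy for yourself with a short sketch. The table is the standard genetic code in compact form, written here with DNA letters (T rather than U) to match the valine example, and with `*` standing for a stop codon. The variable names are my own.

```python
from collections import Counter

# The standard genetic code, codons generated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# How many codons point at each amino acid (or at "stop")?
codons_per_product = Counter(CODON_TABLE.values())
stop_codons = codons_per_product.pop("*")

print(f"Codons in total:    {len(CODON_TABLE)}")
print(f"Amino acids:        {len(codons_per_product)}")
print(f"Stop codons:        {stop_codons}")
print(f"Codons for valine:  {codons_per_product['V']}")

# The silent mutation from the text: the third letter changes,
# but the amino acid does not.
print(CODON_TABLE["GTT"], CODON_TABLE["GTC"])
```

Most of the slack sits in the third letter of the codon, which is why so many single-letter swaps in that position change nothing at all.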
As amazing as this naturally-occurring complexity may sound, it’s also really cool how scientists have figured all this out and translated it into comprehensible terms that non molecular biologists can understand. This makes me very happy.
What Are Proteins For?
Now we’ve gone from DNA to protein, what’s next?
Proteins are large complex molecules with myriad essential roles in the body. So once the desired protein is cooked up, it’s often released by the cell to fulfil its destiny elsewhere in the body. That might be insulin, for example, which is sent out into the bloodstream to tell other cells to take up sugar. It’s kinda important.
Sometimes proteins are retained inside the very cell that made them, because cells need proteins too. Haemoglobin is one example: it stays packed inside red blood cells, where it carries oxygen around the body. Some of the elements described above also fall into this category, such as the proteins in the spliceosomes used in RNA processing, or in the ribosome used in translation.
All of this takes place at an astonishing rate. A single ribosome can add several amino acids to a growing polypeptide chain every second. And there are up to 10 million ribosomes in a typical body cell, allowing for mass translation. A single cell can throw out vast numbers of protein molecules when it needs to, and does so alongside countless other cells at once.
And that’s pretty much how DNA works. It’s an extraordinary, complex choreography of biological molecules culminating in the basic normal functioning of a living organism – such as a friendly newt or toad. Isn’t that brilliant?