How DNA Works: Structure, Replication, and the Genetic Code

What Is DNA?

Deoxyribonucleic acid (DNA) is the molecule that carries the genetic instructions for the development, functioning, growth, and reproduction of all known living organisms and many viruses. Found in the nucleus of virtually every cell in the human body, DNA functions as a blueprint — encoding information that tells cells which proteins to build, when to build them, and in what quantities.

The human genome, the complete set of genetic instructions in a human cell, consists of approximately 3.2 billion base pairs per chromosome set; most human cells carry two sets, organized into 23 pairs of chromosomes. If the DNA in a single human cell were uncoiled and stretched end to end, it would measure approximately 2 meters in length, yet it is packed into a nucleus roughly 6 micrometers in diameter by wrapping around histone proteins and folding into compact chromatin.
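The 2-meter figure can be checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes a rise of about 0.34 nm per base pair (typical of B-form DNA) and a diploid cell carrying two copies of a ~3.2 billion base-pair genome. The constant and function names are illustrative.

```python
# Back-of-the-envelope check of the DNA-length figure.
# Assumptions (not from the text): ~0.34 nm rise per base pair (B-form DNA)
# and a diploid cell holding two ~3.2e9 bp chromosome sets.
RISE_PER_BP_M = 0.34e-9        # metres per base pair (assumed)
HAPLOID_BP = 3.2e9             # base pairs per chromosome set
NUCLEUS_DIAMETER_M = 6e-6      # ~6 micrometres


def dna_length_m(base_pairs: float, rise: float = RISE_PER_BP_M) -> float:
    """Contour length of a fully extended double helix, in metres."""
    return base_pairs * rise


diploid_length = dna_length_m(2 * HAPLOID_BP)
compaction = diploid_length / NUCLEUS_DIAMETER_M  # how many nucleus widths fit

print(f"diploid DNA length: {diploid_length:.2f} m")
print(f"length / nucleus diameter: {compaction:,.0f}x")
```

Under these assumptions it prints a length of roughly 2.2 m, several hundred thousand times the width of the nucleus, which conveys the scale of compaction involved.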

The Double Helix Structure

DNA's three-dimensional structure was determined in 1953 by James Watson and Francis Crick, building on X-ray crystallography data produced by Rosalind Franklin and Raymond Gosling. The structure — the double helix — resembles a twisted ladder:

The sides of the ladder are made of alternating sugar (deoxyribose) and phosphate groups, forming the backbone of each DNA strand.
The rungs of the ladder are pairs of nitrogenous bases connected by hydrogen bonds in the interior of the helix.

There are four nitrogenous bases in DNA: adenine (A), thymine (T), guanine (G), and cytosine (C). Base pairing is specific and complementary: adenine always pairs with thymine (A–T), and guanine always pairs with cytosine (G–C). This complementarity is the molecular basis for DNA replication and gene expression.

The two strands of the helix run antiparallel to each other: one runs from its 5' (five-prime) end to its 3' (three-prime) end, while the other runs 3' to 5'. The end labels refer to the numbered carbon atoms of the deoxyribose sugar.
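Complementary, antiparallel pairing means either strand fully determines the other. A minimal Python sketch, assuming sequences are plain strings of A, T, G, C written 5' to 3' (the function names and example sequence are hypothetical): the partner strand is the base-by-base complement, read in reverse order.

```python
# Complementary base-pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner base at each position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5' -> 3'; the order flips because the strands are antiparallel."""
    return complement(strand)[::-1]


top = "ATGCGTAC"  # hypothetical sequence, written 5' -> 3'
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
print("partner strand 5'->3':", reverse_complement(top))
```

Writing the partner strand 5' to 3' is why the reverse complement, rather than the plain complement, is the conventional way to report the opposite strand.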

The Genetic Code

DNA carries information through the sequence of its four bases — A, T, G, C — along each strand. This sequence constitutes the genetic code. The information is read in groups of three consecutive bases called codons. Each codon specifies a particular amino acid or a start/stop signal for protein synthesis. Since there are 4³ = 64 possible codons but only 20 standard amino acids, the code is redundant — most amino acids are specified by more than one codon.

Codon (mRNA)	Amino Acid	Note
AUG	Methionine	Start codon
UUU / UUC	Phenylalanine	—
GGU / GGC / GGA / GGG	Glycine	4-fold degenerate
UAA / UAG / UGA	None	Stop codons
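The 4³ = 64 count and the redundancy shown in the table can be reproduced by enumerating every codon. This sketch assumes the standard genetic code encoded as a 64-character string of one-letter amino-acid codes, listed with codons in U, C, A, G order and with '*' standing for stop; the names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated as UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino-acid codes, '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO))

degeneracy = Counter(CODE.values())  # number of codons per amino acid (or stop)

print(len(CODE), "codons")                     # 4**3
print(len(degeneracy) - 1, "amino acids")      # minus the stop symbol
print("stop codons:", sorted(c for c, a in CODE.items() if a == "*"))
print("glycine codons:", degeneracy["G"])
print("methionine codons:", degeneracy["M"])
```

Glycine comes out 4-fold degenerate and methionine has a single codon (AUG), matching the rows of the table.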

The genetic code is nearly universal: organisms from bacteria to humans use essentially the same codon assignments, with only minor variations (for example, in mitochondria). This near-universality is strong evidence for the common ancestry of all life.

DNA Replication

Before a cell divides, it must duplicate its entire genome so that each daughter cell receives a complete copy. This process — DNA replication — is semi-conservative: each of the two resulting double helices contains one original strand and one newly synthesized strand.
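The semi-conservative scheme can be modeled by tracking only whether each strand is original or newly made. This is a toy sketch under that simplification: sequences are ignored, and the `replicate` function and the "old"/"new" labels are hypothetical.

```python
from collections import Counter

# Each duplex is modelled as a pair of strand labels: "old" (parental) or "new".
Duplex = tuple[str, str]


def replicate(duplexes: list[Duplex]) -> list[Duplex]:
    """One semi-conservative round: each strand is kept and paired with a new partner."""
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "new"))
        daughters.append((strand_b, "new"))
    return daughters


population = [("old", "old")]  # one parental double helix
for generation in range(1, 4):
    population = replicate(population)
    kinds = Counter("hybrid" if "old" in d else "all-new" for d in population)
    print(generation, dict(kinds))
```

After each round the number of duplexes doubles, but exactly two of them still carry an original strand; the rest are entirely new.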

The key steps are:

Unwinding: The enzyme helicase unwinds and separates the two strands of the double helix at sites called origins of replication (typically a single origin on a bacterial chromosome, many along each eukaryotic chromosome), creating replication forks.
Priming: Because DNA polymerase can only extend an existing strand (not start synthesis from scratch), short RNA sequences called primers are synthesized by primase to provide a starting point.
Synthesis: DNA polymerase III (in prokaryotes) or DNA polymerases δ and ε (in eukaryotes) read the template strand in the 3' to 5' direction and synthesize a new complementary strand in the 5' to 3' direction. Because synthesis runs only 5' to 3', one strand (the leading strand) is synthesized continuously toward the fork, while the other (the lagging strand) is synthesized away from the fork in short fragments called Okazaki fragments.
Editing and ligation: DNA polymerase has proofreading activity that removes incorrectly inserted bases. The RNA primers are removed and replaced with DNA, and DNA ligase seals the gaps between Okazaki fragments. Together with mismatch repair after replication, these correction mechanisms bring the overall error rate to approximately 1 mistake per billion base pairs copied.
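Even a one-in-a-billion error rate leaves some mistakes in a genome this size. A rough estimate, assuming about 6.4 billion base pairs copied per division (two copies of the ~3.2 billion bp genome) and independent errors, so that a Poisson approximation applies; the constant names are illustrative.

```python
import math

# Illustrative figures: the ~1e-9 per-base error rate quoted above and an
# assumed diploid genome of two ~3.2e9 bp copies replicated per division.
ERROR_RATE = 1e-9
DIPLOID_BP = 2 * 3.2e9

expected_errors = ERROR_RATE * DIPLOID_BP

# Treating errors as rare and independent, their count is roughly Poisson,
# so the chance of a completely error-free copy is exp(-expected_errors).
p_error_free = math.exp(-expected_errors)

print(f"expected new errors per division: {expected_errors:.1f}")
print(f"probability of an error-free genome copy: {p_error_free:.4f}")
```

Under these assumptions the expected count is about six new errors per division, and a perfectly error-free copy is unlikely.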

From Gene to Protein: Gene Expression

The information stored in DNA is used to build proteins through a two-stage process: transcription and translation.

Transcription

In transcription, the enzyme RNA polymerase reads a gene's template strand and produces a complementary messenger RNA (mRNA) molecule. In eukaryotes (cells with a nucleus), the pre-mRNA undergoes processing — a 5' cap and poly-A tail are added, and non-coding sequences (introns) are spliced out — before the mature mRNA is exported from the nucleus.
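Transcription and splicing can be sketched as string operations. This is a simplified model: it assumes a template strand written 3' to 5', treats intron positions as already known (the coordinates and sequence here are hypothetical), and leaves out capping and polyadenylation. All names are illustrative.

```python
# Template-strand base -> RNA base placed opposite it (RNA uses U instead of T).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """RNA complementary to a template written 3'->5', so the RNA reads 5'->3'."""
    return "".join(TEMPLATE_TO_RNA[b] for b in template_3_to_5)


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Cut out introns given as hypothetical half-open [start, end) coordinates."""
    exons, previous_end = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[previous_end:start])  # exon before this intron
        previous_end = end
    exons.append(pre_mrna[previous_end:])           # final exon
    return "".join(exons)


template = "TACGGTCATTTCAAAACT"  # hypothetical template strand, 3' -> 5'
pre_mrna = transcribe(template)
mature = splice(pre_mrna, [(6, 12)])
print(pre_mrna)
print(mature)
```

Here the pre-mRNA `AUGCCAGUAAAGUUUUGA` loses its six-base intron, leaving the mature sequence `AUGCCAUUUUGA`.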

Translation

In translation, ribosomes read the mRNA sequence in triplets and assemble a chain of amino acids in the corresponding order. Transfer RNA (tRNA) molecules act as adapters: each tRNA carries a specific amino acid and has an anticodon that base-pairs with the corresponding mRNA codon. The ribosome moves along the mRNA codon by codon until it reaches a stop codon, at which point the completed polypeptide chain is released and folds into its functional three-dimensional protein structure.
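Reading triplets from a start codon to a stop codon can be sketched in a few lines. The sketch assumes the standard genetic code, starts at the first AUG in the transcript (real start-codon selection also depends on the surrounding sequence), and pairs anticodons by strict complementarity without wobble pairing. Function names and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code (one-letter amino-acid codes; '*' = stop),
# codons enumerated in U, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product("UCAG", repeat=3), AMINO)}
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """tRNA anticodon, written 5'->3', that pairs with an mRNA codon."""
    return "".join(RNA_PAIR[b] for b in reversed(codon))


def translate(mrna: str) -> str:
    """Start at the first AUG, read codon by codon, stop at a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # release the finished chain
        protein.append(amino_acid)
    return "".join(protein)


mrna = "GGAUGCCAUUUGGCUGAAC"  # hypothetical transcript
print(translate(mrna))
print(anticodon("AUG"))
```

The example transcript yields the four-residue chain `MPFG` (Met-Pro-Phe-Gly) before hitting UGA, and the methionine codon AUG pairs with the anticodon CAU.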

This flow of information — DNA → RNA → Protein — is known as the central dogma of molecular biology, first articulated by Francis Crick in 1958.

Chromosomes and the Human Genome

Feature	Value (human genome)
Total base pairs	~3.2 billion per haploid set (~6.4 billion in a diploid cell)
Number of chromosomes	46 (23 pairs) in somatic cells
Number of protein-coding genes	~20,000–25,000
Protein-coding fraction of genome	~1.5%
Repetitive sequences	~50% of genome
Fraction with reported biochemical activity	~80% of genome (ENCODE project estimate, 2012; how much of this is functional is debated)

One of the surprises of the Human Genome Project (completed in 2003) was that humans have far fewer protein-coding genes than expected: roughly 20,000–25,000, similar in number to the nematode worm Caenorhabditis elegans. The complexity of human biology arises less from the number of genes than from the elaborate regulation of when, where, and how much each gene is expressed, and from alternative splicing, which lets a single gene produce several different proteins.

DNA Mutations and Genetic Variation

Changes in the DNA sequence, called mutations, can arise from replication errors, environmental damage (UV radiation, chemical carcinogens), or the movement of transposable elements. Most mutations are effectively neutral, often because they fall in non-coding regions or do not change the encoded amino acid. Some are beneficial; others are harmful.

Types of mutations include:

Point mutation: Single base change. Can be silent (same amino acid), missense (different amino acid), or nonsense (creates a premature stop codon).
Insertion/deletion (indel): Addition or removal of one or more bases, potentially causing a frameshift.
Copy number variation: Duplication or deletion of larger chromosomal segments.
Chromosomal rearrangement: Inversions, translocations between chromosomes.
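The point-mutation categories follow directly from the codon table. This sketch classifies a single-base substitution in a hypothetical in-frame coding sequence, assuming the standard genetic code written with T (coding-strand convention); it also encodes the rule that an indel shifts the reading frame unless its length is a multiple of three. Function names are illustrative.

```python
from itertools import product

# Standard genetic code on the coding strand (T instead of U);
# one-letter amino-acid codes, '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), AMINO)}


def classify_point_mutation(cds: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution in an in-frame coding sequence."""
    codon_start = position - position % 3
    old_codon = cds[codon_start:codon_start + 3]
    offset = position - codon_start
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODE[old_codon], CODE[new_codon]
    if old_aa == new_aa:
        return "silent"
    if new_aa == "*":
        return "nonsense"
    return "missense"


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


cds = "ATGGGTTGGTAA"  # hypothetical: Met-Gly-Trp-stop
print(classify_point_mutation(cds, 5, "C"))   # GGT -> GGC
print(classify_point_mutation(cds, 3, "A"))   # GGT -> AGT
print(classify_point_mutation(cds, 8, "A"))   # TGG -> TGA
print(is_frameshift(1), is_frameshift(3))
```

The three substitutions give one of each category: GGT to GGC keeps glycine (silent), GGT to AGT swaps in serine (missense), and TGG to TGA creates a premature stop (nonsense).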

The natural variation in DNA sequences between individuals, mostly single-nucleotide polymorphisms (SNPs), underlies differences in traits ranging from eye color to disease susceptibility. Two unrelated humans share approximately 99.9% of their DNA sequence at the single-nucleotide level; the remaining ~0.1%, a few million positions, accounts for much of the heritable variation seen within the human species.
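Percent identity is the share of aligned positions with matching bases. A minimal sketch, assuming the two sequences are already aligned to equal length (real comparisons must also handle insertions and deletions); the sequences are hypothetical. The second part scales the ~0.1% figure to a ~3.2 billion bp genome.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Share of positions with the same base in two equal-length, aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Two hypothetical aligned fragments differing at a single position (a SNP).
person_1 = "ACGTTGCAAT"
person_2 = "ACGTCGCAAT"
print(f"{percent_identity(person_1, person_2):.1f}% identical")

# Scaling the ~0.1% difference quoted above to a ~3.2e9 bp genome.
GENOME_BP = 3.2e9
differing_positions = 0.001 * GENOME_BP
print(f"about {differing_positions:,.0f} differing positions")
```

A 0.1% difference sounds small, but across a ~3.2 billion bp genome it amounts to about 3.2 million positions.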

DNA in Medicine and Biotechnology

Understanding DNA has transformed medicine. Applications include:

Genetic testing: Identifying disease-causing mutations (e.g., BRCA1/2 variants that raise hereditary breast and ovarian cancer risk, CFTR mutations in cystic fibrosis)
Forensic DNA profiling: Using short tandem repeat (STR) analysis to identify individuals from biological samples
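PCR (Polymerase Chain Reaction): Amplifying specific DNA sequences from minute samples; foundational to modern diagnostics, including COVID-19 testing, which uses reverse-transcription PCR (RT-PCR) to detect viral RNA
CRISPR-Cas9 gene editing: Precise targeted modification of DNA sequences in living cells; a CRISPR-based therapy has been approved for sickle cell disease and beta thalassemia, and clinical trials are under way for other genetic disorders
mRNA vaccines: Deliver mRNA encoding a target protein (such as the SARS-CoV-2 spike protein), which the recipient's cells translate; a direct application of the DNA → RNA → protein flow of genetic information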
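PCR's power comes from exponential amplification: each cycle can at most double the number of target copies. A sketch of the common idealized model N = N0 * (1 + E)^n, where E is the per-cycle efficiency; the function name and the 90% efficiency value are illustrative assumptions, and real reactions eventually plateau.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies of the target after n cycles: N0 * (1 + E)**n.

    efficiency=1.0 is the idealised case where every copy is duplicated
    each cycle; real reactions fall below this and eventually plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(f"{pcr_copies(1, 30):,.0f}")          # ideal doubling from one molecule
print(f"{pcr_copies(10, 30, 0.9):,.0f}")    # assumed 90% efficiency
```

With perfect doubling, 30 cycles turn a single molecule into 2^30 = 1,073,741,824 copies, which is why PCR can work from minute samples.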

The ability to read, write, and edit DNA sequences at will is one of the defining technological capabilities of the 21st century, with implications spanning medicine, agriculture, and our fundamental understanding of life.