We’ve established that evolution can be thought of as a game, involving the shuffling of nucleotides in the genome to be passed on to the next generation. This means we can run a thought experiment to see how many organisms it would take to evolve a new protein within a specific evolutionary timescale.
Let’s set up an imaginary bacterial colony in the laboratory of our mind, and have it run a natural search for a protein that bacteria use today. It doesn’t really matter which specific protein it is. We just want it to evolve a protein to see how feasible this is, and how many bacteria it will take.
We’re using bacteria here because they are single-celled organisms, and they don’t take up much space in our thought laboratory, so we will have room for lots of them. Bacteria are also able to replicate very quickly, so we can get ridiculously large numbers of them very fast.
Now, if we round off or simplify any numbers in this experiment, we’ll aim to skew them in favor of evolution, so skeptics can’t say we’re making things more difficult for our imaginary colony than they need to be.
Proteins used by bacteria today are, on average, over 250 amino acids long. Our goal will be to evolve a protein that is 240 amino acids long, for two reasons. First, this is shorter than the average bacterial protein, so it should be easier to evolve. Second, 240 can be divided evenly by many smaller whole numbers, which will be useful later on.
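As a quick check of the divisibility point, the short Python sketch below lists every whole number that divides 240 and, for comparison, 250. The helper name `divisors` is purely illustrative.

```python
def divisors(n: int) -> list[int]:
    """Return every whole number that divides n exactly, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

print(divisors(240))       # [1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 16, 20, 24, 30, 40, 48, 60, 80, 120, 240]
print(len(divisors(240)))  # 20
print(len(divisors(250)))  # 8
```

240 has 20 divisors, including 12 and 20 (twelve blocks of 20 amino acids each), while 250 has only 8, which is why 240 splits so conveniently into equal blocks.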
We can’t tell our colony exactly what to look for in advance, because that would be cheating. On the other hand, we will need to allow a series of intermediate steps, because natural selection is supposed to work through the accumulation of small advantages. Each step must give the organism enough of an advantage in survival or reproduction that the change spreads throughout the population at large.
Each amino acid is coded for in DNA by a sequence of three nucleotides, which for simplicity we’ll call “letters,” so three letters are needed to code for one amino acid. The protein we’re hoping to evolve consists of 240 amino acids, so its DNA sequence would be 720 letters long.
It’s basically impossible for nature to find this exact sequence by a brute-force search, or even a fairly civilized one. There are simply far too many possibilities to search through.
The “alphabet” of DNA consists of only four letters: A, C, G and T. If you wanted to write a two-letter “word” in the language of DNA, you would have 16 possibilities: AA, AC, AG, AT, CA and so on through to TT. For three-letter words, there are 64 variations, ranging from AAA to TTT. For four-letter words, there are 256 possibilities. Each time we add a letter, the number of possibilities grows fourfold.
This might not sound significant, but it is. As I mentioned a few chapters ago, by the time we reach 150 letters in a sequence, which I suppose would make it a “sentence” rather than a “word,” there are more possible “sentences” than there are atoms in the universe.¹ Clearly, then, a protein that takes up 720 letters of DNA can’t evolve all in one go. This is why we need to break up the process into a series of smaller steps.
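The fourfold growth can be confirmed by listing every word for short lengths. This is a minimal Python sketch using the standard library’s `itertools.product`; `dna_words` is a hypothetical helper name.

```python
from itertools import product

DNA_ALPHABET = "ACGT"

def dna_words(length: int) -> list[str]:
    """List every DNA 'word' of the given length, from AA...A to TT...T."""
    return ["".join(letters) for letters in product(DNA_ALPHABET, repeat=length)]

for n in range(2, 5):
    words = dna_words(n)
    # Each added letter multiplies the count by 4, so the count is 4 ** n.
    print(n, len(words), words[0], words[-1])
# 2 16 AA TT
# 3 64 AAA TTT
# 4 256 AAAA TTTT
```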
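Python’s arbitrary-precision integers make it easy to compare these counts directly. The sketch below assumes the commonly quoted estimate of 10⁸⁰ atoms used in footnote 1; the helper names are illustrative.

```python
import math

ATOMS_IN_UNIVERSE = 10 ** 80  # the order-of-magnitude estimate from footnote 1

def sequence_count(letters: int) -> int:
    """Number of distinct DNA sequences of a given length: four choices per letter."""
    return 4 ** letters

def first_length_exceeding(limit: int) -> int:
    """Smallest sequence length whose number of variations is greater than limit."""
    letters = 1
    while sequence_count(letters) <= limit:
        letters += 1
    return letters

print(first_length_exceeding(ATOMS_IN_UNIVERSE))  # 133
for letters in (150, 720):
    exponent = math.log10(sequence_count(letters))
    print(f"{letters} letters: about 10^{exponent:.0f} sequences")
# 150 letters: about 10^90 sequences
# 720 letters: about 10^433 sequences
```

The crossover actually happens at 133 letters, so by 150 letters the count is already about ten billion times the number of atoms.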
Now, to perform this experiment, we will need to specify some things in advance. These parameters could be set differently, and changing them would probably affect the outcome. We will also need to make some initial assumptions, which will likewise affect the result, but whenever possible we will skew these assumptions in favor of evolution. We will talk more about these once the results are in. What I’m really doing here is creating a model for testing the evolution of a protein from scratch.
First, we’re going to hire a whole lot of bacterial cells for a long period of time – a billion years. This should be enough time for them to produce something at least mildly interesting. We’ll fire them if they don’t.
Second, we’ll assume each individual bacterium lives only an hour, and then replaces itself with another bacterium. This is a very simplistic version of how a bacterial colony works, but it skews the experiment heavily in favor of evolution. In reality, an individual bacterium can potentially live forever, but if it doesn’t replicate or replace itself, mutations can’t be tested by natural selection.
We will say that each cell, and then its replacement, fills one “slot” in our colony, so the size of the colony – the number of slots – remains fixed over the entire length of our experiment. This prevents the colony from growing exponentially, which, although useful at first, would quickly cause problems. For example, starting with just two bacterial cells, if the number of cells were allowed to double every hour, within about 270 hours the colony would contain more cells than there are atoms in the universe.² Clearly this would make us very irresponsible thought experimenters.
This is why we’ll assume a fixed colony size. Our experiment will run for a billion years, which is about 9 trillion hours, so one cell “slot” in our colony will host about 9 trillion bacterial cells over the whole period. If it helps, think of the experiment as being conducted in an imaginary hotel designed exclusively for bacteria. Each bacterium gets its own room. The number of rooms never changes, and each hour, every cell in the hotel checks out of its room and its offspring checks in.
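The doubling claim is easy to verify. This Python sketch counts the hours of unchecked doubling, starting from two cells, until the population passes the 10⁸⁰ estimate; `hours_to_exceed` is a hypothetical helper.

```python
ATOMS_IN_UNIVERSE = 10 ** 80  # estimate from footnote 1

def hours_to_exceed(start_cells: int, limit: int) -> int:
    """Hours of unchecked hourly doubling before the cell count first exceeds limit."""
    cells, hours = start_cells, 0
    while cells <= limit:
        cells *= 2  # every cell divides once per hour
        hours += 1
    return hours

print(hours_to_exceed(2, ATOMS_IN_UNIVERSE))  # 265
print(f"{2 ** 269:.3e}")                      # 9.486e+80, matching footnote 2
```

The answer, 265 hours, sits comfortably within the “about 270 hours” quoted above.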
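The hour count can be reproduced in a few lines. This sketch assumes a 365.25-day year; `cells_hosted` and the one-million-slot colony in the usage line are hypothetical, chosen only to show how colony size scales the total number of bacteria.

```python
HOURS_PER_YEAR = 8_766            # 365.25 days * 24 hours
EXPERIMENT_YEARS = 1_000_000_000  # one billion years

# One cell occupies each slot per hour, so this is also the number of
# bacteria that pass through a single slot over the whole experiment.
total_hours = EXPERIMENT_YEARS * HOURS_PER_YEAR

def cells_hosted(slots: int) -> int:
    """Total bacteria that pass through a colony with a fixed number of slots."""
    return slots * total_hours

print(f"{total_hours:,}")                # 8,766,000,000,000
print(f"{cells_hosted(1_000_000):.3e}")  # 8.766e+18
```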
The third assumption we’ll make is a mutation rate of about one in every billion base pairs. In other words, when a genome is copied and passed on to its replacement an hour later, on average one in every billion letters, across the colony as a whole, contains a mutation. In real life, some bacteria have higher mutation rates at times, but this also makes their genomes more unstable.
Fourth, we will allow 12 intermediate steps. What this means is, rather than requiring our bacterial colony to find the desired protein sequence all at once, which we already know is impossible, we will allow the colony to find it in stages, with each stage involving a natural search.
Allowing 12 intermediate steps means we can break up the desired 720-letter DNA sequence into 12 smaller blocks of 60 letters each. Once the colony finds the correct sequence for the first 60-letter block, we will say that the first of twelve “evolutionary milestones” has been reached on the way to our desired protein, and we will allow the colony to keep that block perfectly intact from then on.
We do this because we’re assuming that when an individual bacterium stumbles on the correct sequence for one block, this gives it some kind of advantage in survival or replication. Every cell in the colony then adopts the block into its own genome and moves on to finding the correct sequence for the next block.
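The block arithmetic can be sketched as follows. The target sequence here is a placeholder repeating pattern standing in for the real, unspecified gene, and `split_into_blocks` is a hypothetical helper.

```python
PROTEIN_LENGTH_AA = 240  # amino acids in the target protein
CODON_LENGTH = 3         # DNA letters per amino acid
BLOCK_LENGTH = 60        # letters per evolutionary milestone

dna_length = PROTEIN_LENGTH_AA * CODON_LENGTH  # 720 letters
block_count = dna_length // BLOCK_LENGTH        # 12 blocks

def split_into_blocks(sequence: str, block_length: int) -> list[str]:
    """Cut a DNA sequence into consecutive blocks of equal length."""
    return [sequence[i:i + block_length] for i in range(0, len(sequence), block_length)]

# Placeholder target (hypothetical): a repeating pattern, not a real gene.
target = "ACGT" * (dna_length // 4)
blocks = split_into_blocks(target, BLOCK_LENGTH)
print(dna_length, block_count, len(blocks))               # 720 12 12
print(f"{4 ** BLOCK_LENGTH:.3e} combinations per block")  # 1.329e+36 combinations per block
```

Even a single 60-letter block has about 1.3 × 10³⁶ possible sequences, which is why the colony is only asked to find one block at a time.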
This is a highly simplified way of imitating what natural selection needs to do, but since we’re creating a model of protein evolution from scratch here, I think it’s a reasonably good way of simulating the idea of small but significant advantages that accumulate over time.
In theory, the experiment could continue until the colony has found the right sequence for all 12 blocks, which means it has successfully evolved our desired protein. However, if it can find the first evolutionary milestone within the required timeframe of a billion years, we don’t really need to continue the experiment. We can just assume the remaining milestones could be reached if we allowed more time or enlarged the colony.
We will also assume that the correct sequence in a 60-letter block must be found perfectly. We can think of each block as a 60-letter combination lock, with each letter being A, C, G or T. The colony must find the exact unlock sequence before it can move on to the next block, and the only way it can find it is by testing combinations until it hits upon the right one – what we’ve called a “natural search.”
When a bacterial cell replicates, passes on its genome to its replacement, and then conveniently dies, we will call this a “trial.” To keep things simple, we’ll use a single probability of a mutation happening in each trial. Since each letter has a one-in-a-billion chance of mutating, we’ll just multiply this by the length of the block to get the chance of a mutation occurring somewhere in the block in one trial. For a 60-letter block, that is 60 in a billion, or roughly a 1 in 17 million chance of a mutation in each trial. The math isn’t exact here, but it’s good enough for our purposes.
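To make the block-by-block “natural search” concrete, here is a toy Python simulation of the model on a drastically reduced scale. Everything about it is an illustrative assumption rather than the chapter’s actual experiment: the function names, the tiny 4-letter blocks, the 20-slot colony, the deliberately high 5 percent per-letter mutation rate (at the model’s one-in-a-billion rate a simulation would never finish), and the choice to start each block from random sequences. Mutations are applied letter by letter rather than through a single per-trial probability.

```python
import random

ALPHABET = "ACGT"

def mutate(block: str, rate: float, rng: random.Random) -> str:
    """Copy a block, replacing each letter with a different one with probability `rate`."""
    copied = []
    for letter in block:
        if rng.random() < rate:
            letter = rng.choice([c for c in ALPHABET if c != letter])
        copied.append(letter)
    return "".join(copied)

def random_block(length: int, rng: random.Random) -> str:
    """A random starting sequence for one slot (toy assumption)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def natural_search(target: str, block_length: int, slots: int, rate: float,
                   seed: int = 1, max_hours: int = 1_000_000) -> list[int]:
    """Search block by block; return the hour at which each milestone was reached.

    Blocks already found are treated as locked and perfectly intact, so only the
    block currently being searched is stored and mutated for each slot.
    """
    rng = random.Random(seed)
    goals = [target[i:i + block_length] for i in range(0, len(target), block_length)]
    milestones: list[int] = []
    hour = 0
    for goal in goals:
        colony = [random_block(len(goal), rng) for _ in range(slots)]
        while goal not in colony:
            if hour >= max_hours:
                return milestones  # out of time: later milestones were not reached
            hour += 1
            # Every cell checks out and its possibly mutated offspring checks in.
            colony = [mutate(cell, rate, rng) for cell in colony]
        milestones.append(hour)  # block found; the whole colony adopts it and moves on
    return milestones

print(natural_search("ACGTTGCA", block_length=4, slots=20, rate=0.05))
```

The printed list gives the hour at which each of the two toy milestones was reached; the exact numbers depend on the random seed.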
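Since the shortcut above is admitted to be inexact, this sketch measures how small the error is. It assumes mutations strike each letter independently, which the chapter does not state explicitly; `math.log1p` and `math.expm1` are used only to avoid floating-point rounding loss.

```python
import math

PER_LETTER_RATE = 1e-9  # one mutation per billion letters copied (the third assumption)
BLOCK_LENGTH = 60       # letters in one block

# The chapter's shortcut: multiply the per-letter rate by the block length.
approx = PER_LETTER_RATE * BLOCK_LENGTH

# Exact chance of at least one mutation somewhere in the block,
# assuming each letter mutates independently of the others.
exact = -math.expm1(BLOCK_LENGTH * math.log1p(-PER_LETTER_RATE))

print(f"shortcut: 1 in {1 / approx:,.0f}")  # shortcut: 1 in 16,666,667
print(f"exact:    1 in {1 / exact:,.0f}")
print(f"relative difference: {(approx - exact) / exact:.1e}")
```

The shortcut overstates the chance by only a few parts in a hundred million, so “roughly 1 in 17 million” holds.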
We’ll revisit these assumptions after we have conducted our experiment, but now that we have set everything up, the question we want to answer is: how many bacteria does it take to find the first 60 letters, the first block, of our desired 720-letter DNA sequence within the specified timeframe of a billion years?
What we’re trying to find out here is, how easy or difficult is it to evolve a protein under these assumptions? Would we need to fill every atom in the universe with our bacteria, suggesting it is impossible, or could we do it with much more modest numbers that fit within the evolutionary timescale, and that don’t require us to use whole galaxies for our thought experiments? Let’s look at the results.
¹ There are an estimated 10⁸⁰ atoms in the universe, and 4¹⁵⁰, or about 10⁹⁰, possible permutations of 150 letters of DNA code.
² 2²⁶⁹ = 9.486 × 10⁸⁰ to 3 decimal places.