51. The Microprotein Love Story – Letter To The Atheists

According to evolutionary theory, evolution usually happens in small cumulative steps. Each one should give the organism a survival or reproductive advantage, otherwise the step is likely to be mutated away again.

In our protein evolution thought experiment, the small amino acid blocks we could successfully evolve are the cumulative steps, but they are not the final product. The protein we wanted to arrive at is 240 amino acids in length, but each block must be less than 20 amino acids long. At the same time, each block must have its own useful function in the cell, otherwise mutations could destroy it over time.

What use do smaller parts of a protein have? Biologists have several ideas. For example, proteins usually have regions in them called “domains,” typically about 100 amino acids in length. A domain can act somewhat independently, and tends to fold into its own particular shape. Since many proteins use similar domains, evolutionary theorists think that some proteins evolved by the shuffling and stitching together of different domains.

However, domains can’t be the original building blocks of protein evolution, because a domain made up of 100 amino acids, which is 300 letters of DNA, is still far too long to be found by a natural search. Even if we ran an experiment in which we filled the universe with mutating bacteria, this wouldn’t even begin to be enough.

Researchers have also discovered recurring “themes” found in a variety of proteins. These themes vary from 35 to 200 amino acids in length.¹ However, themes can’t be the evolutionary building blocks of proteins either. They are still much larger than the 20 amino acid blocks in our experiment, making them pretty much impossible to evolve from scratch. You would still need to fill the universe with bacteria to have even a slight chance of one evolving.

There are also “microproteins” floating around in the cell, miniature proteins that have useful roles, and small amino acid chains called “peptides” that perform useful biological functions involving hormones and antibiotics. In other words, it’s certainly possible for smaller amino acid sequences to have a function, and biologists speculate that some proteins were therefore “stitched together” from smaller ones by evolution.

What does “stitched together” actually mean? It would be romantic to imagine two shy microproteins bumping into each other in a quiet corner of a cell, falling in love and committing to stay together forever, or at least until the death of the cell.

But this isn’t how evolution works. It’s a game of Nucleotide Shuffle. In other words, a mutation first has to take place in the genome, with a sequence of letters shuffling into a slightly different sequence of letters. When biologists casually say “larger proteins were stitched together out of smaller proteins,” this sounds easy, but it doesn’t explain how the process actually happens. Is there a genetic grandmother in each cell, knitting protein scarves for the cell to wear on those dark winter nights? The idea of “stitching together” also sounds believable, because we understand this metaphor better than we understand the idea of a sequence of letters mutating.

Let’s see how nature could actually stitch together two microproteins. We’ll call the first one Alice and the second one Bob. We will presume, as does evolutionary theory, that Alice and Bob already have useful functions in the cell. They are transcribed and translated separately as little microproteins.

Let’s also assume they’re already close to each other in the genome. However, between them is Carol, a small sequence of nucleotides that doesn’t really do much, except keep Alice and Bob from meeting each other and perhaps enjoying a little cellular wining and dining together. Fortunately, Carol is only ten letters long. If we, or rather nature, could just make Carol disappear, perhaps the cell could treat Alice and Bob as one protein. They could finally be together, and who knows what might happen next?

Incidentally, evolutionary theorists have an elegant mechanism for making Carol disappear, called the “story.” When a story is invoked, any problem can disappear, just like that. Since letters in a DNA sequence have the potential to drop out as a deletion, theorists simply have to tell a story about the ten inconvenient letters being deleted, and their job is done. Carol is eliminated. Alice and Bob finally get to meet up, and the two microproteins can be “stitched together” into one.

Of course, engineers, computer programmers, mathematicians and other scientists might look at this and ask, “Wait a minute, what’s the actual chance of this happening, compared with all of the other things that could happen, such as Carol never going away, and perhaps even getting bigger and more interfering?”

If we factor in all of the other possibilities that could happen, the probability of Alice and Bob actually getting together is pretty low. But this vague language isn’t very helpful. How low is “pretty low”? We can’t know the exact figure, but we can make a rough estimate that will give us an indication of what “pretty low” means.

If Carol is made up of 10 random letters selected out of the four bases A, C, G and T used in DNA, there are about a million possible permutations of these 10 letters.²

However, we must also allow for the possibility that a letter might disappear or a new one might be added, otherwise we’ll never actually get rid of Carol. We don’t want her to change. We want her to go away. But this makes the math much more tricky, and I won’t work it out here. We could create a computer program to mutate one letter in a 10 letter sequence in a series of rounds, and to keep going until the whole sequence was eliminated. Unlike our previous experiment, what counts as a mutation here wouldn’t just be a letter change, but would also include the deletion or addition of a letter. We could then run this program a large number of times, to find the average number of mutations it would take before Carol disappeared altogether.
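As a rough illustration, the program described above could be sketched like this in Python. The three probabilities (`p_sub`, `p_del`, `p_ins`), the round cap `max_rounds` and the function names are all placeholders I have made up for the sketch, not measured biological rates, and the average it reports depends very heavily on which values are chosen.

```python
import random

BASES = "ACGT"

def rounds_until_gone(length=10, p_sub=0.5, p_del=0.3, p_ins=0.2,
                      max_rounds=1_000_000, rng=random):
    """Apply exactly one mutation per round until the sequence is empty.

    Returns the number of rounds used, or None if max_rounds is reached.
    The probabilities are illustrative assumptions only.
    """
    seq = [rng.choice(BASES) for _ in range(length)]
    for round_no in range(1, max_rounds + 1):
        kind = rng.choices(["sub", "del", "ins"],
                           weights=[p_sub, p_del, p_ins])[0]
        if kind == "ins":
            # Add a random letter at a random position.
            seq.insert(rng.randrange(len(seq) + 1), rng.choice(BASES))
        else:
            pos = rng.randrange(len(seq))
            if kind == "sub":
                # Change one letter into a different letter.
                seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
            else:
                # Delete one letter.
                del seq[pos]
        if not seq:
            return round_no
    return None

def average_rounds(trials=1000, seed=0, **kwargs):
    """Run many simulations; return (mean rounds, number of runs that finished)."""
    rng = random.Random(seed)
    results = [rounds_until_gone(rng=rng, **kwargs) for _ in range(trials)]
    finished = [r for r in results if r is not None]
    if not finished:
        return None, 0
    return sum(finished) / len(finished), len(finished)

if __name__ == "__main__":
    mean, done = average_rounds()
    print(f"average rounds: {mean}, finished runs: {done}")
```

The cap on rounds is there because, when additions are at least as likely as deletions, some runs can wander for an extremely long time before the sequence ever reaches zero length.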

The answer we get would depend on the probabilities of a letter appearing or disappearing. However, if we had one guaranteed mutation every round, then let’s just say for the sake of argument that it takes an average of a million rounds before Carol gets mutated out of existence. This is not in any way an exact figure, but it’s good enough for our purpose. I’ll show you why in just a moment. We could say it’s a “One In A Million” event.

In reality, a 10 letter sequence in the DNA of a living organism isn’t going to mutate every time it is duplicated. If it did, this would be an incredibly high mutation rate, meaning a stable genome couldn’t be passed on, which would almost certainly result in the death of the offspring.

Therefore, let’s also assume it takes a million copies of “Carol” before one mutation creeps in. This is still high compared to the mutation rate for most organisms, but my aim here is to keep the math as simple as possible. In this case, we can say that a single mutation is also a One In A Million event.

How many trials would we need to run in living organisms, in order to wipe Carol from the face of the genome for the sake of our Alice and Bob love story?

If we were evolving this sequence with a computer program, and one mutation occurred each round, we’ve assumed it might take a million rounds on average to eliminate a 10 letter sequence. But in an organism with a one in a million mutation rate, we’d have to run somewhere around a million trials just to get a single mutation.

Therefore, we’d need to run a million multiplied by a million trials in living organisms – that is, a trillion trials – before our 10 nucleotide sequence called “Carol” had a reasonable chance of being mutated away into oblivion.
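The multiplication behind this figure can be written out in a few lines of Python. Both inputs are the round numbers assumed above for the sake of argument, not measurements, and the variable names are my own.

```python
# Both figures are the simplifying assumptions made in the text.
rounds_to_remove_carol = 1_000_000  # mutations needed, on average, to eliminate Carol
copies_per_mutation = 1_000_000     # copies of Carol made before one mutation occurs

# Each of the needed mutations costs about a million copies,
# so the two figures multiply.
trials_needed = rounds_to_remove_carol * copies_per_mutation

print(f"{trials_needed:,}")  # 1,000,000,000,000 (one trillion)
```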

I have kept the numbers as simple as possible in this example, because exact numbers don’t really matter here. I’m just trying to show the magnitude of what is needed.³ I assumed the “Carol” sequence was small, just 10 nucleotides in length, and yet it could take a trillion trials in living organisms just to get rid of Carol. We could call it a “One In A Trillion” event.

What happens if we were to add another 10 nucleotides to Carol? The number of possible permutations of the “Carol” sequence jumps from about a million to about a trillion, or by six orders of magnitude (i.e. six extra zeros), before we even factor in the probability of a mutation occurring.⁴
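To check the jump in size, here is a small Python sketch that counts the possible sequences for 10 and 20 letters and the difference in orders of magnitude. The function name `sequence_count` is just a label chosen for this example.

```python
import math

def sequence_count(length, alphabet_size=4):
    """Number of distinct sequences of a given length over A, C, G and T."""
    return alphabet_size ** length

ten = sequence_count(10)     # 1,048,576 (about a million)
twenty = sequence_count(20)  # 1,099,511,627,776 (about a trillion)

# Difference in orders of magnitude between the two counts.
orders = math.log10(twenty / ten)

print(f"{ten:,} -> {twenty:,} ({orders:.2f} orders of magnitude)")
```

Every extra letter multiplies the count by four, so ten extra letters multiply it by 4¹⁰, which is roughly a million.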

In other words, as the “Carol” sequence gets larger, or rather, the further apart our “Alice” and “Bob” microproteins are in the genome, the probability of them ever being stitched together by mutations also shrinks by orders of magnitude – which means it becomes very improbable, very fast.

The simple point I am making here is: when biologists and popular science writers say that “proteins were stitched together by evolution,” it sounds easy. It sounds plausible. If your grandmother can stitch things together, why not evolution?

But when we look at what this actually involves, we see that even using very simplified math, the “stitching together” of two microproteins virtually next to each other in the genome is in the category of a “One In A Trillion” event, and the odds against this happening grow by orders of magnitude the further apart they are.

Think about the implications of this. It means nature first has to evolve the Alice and Bob microproteins, which I have shown is difficult and time consuming, but could theoretically be done if they are small enough, and if we allow enough time. But then, for nature to arrive at the combined AliceBob protein, it would need to run another trillion or more trials, just to stitch the two microproteins together in the first place, assuming they were a mere 10 nucleotides from each other.

This might sound trivial, compared to the countless trillions of trials we talked about in the process of evolving a small block of amino acids, but nature has to go through all these trillion additional trials just to find out if Alice and Bob are even compatible. After all this, they might not even like each other! In that case, Carol would have “died” in vain. Evolution then has to find another microprotein for Alice to hook up with, if it wants to evolve larger proteins through this “stitching together” process.

But if Bob is the only microprotein in the local area of the genome, how is nature going to find Alice another date? The odds of being stitched up with anyone else drop dramatically the further away the other microprotein is, at least through mutations.

Fortunately, biologists have an answer: magic wands.

1 Nepomnyachiy, Ben-Tal, Kolodny, “Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths”, PNAS, 2017. See also the article “‘Protein archaeology’: Understanding how proteins evolve” published by Tel Aviv University on December 17, 2017.

2 There are 4¹⁰ or 1,048,576 permutations.

3 Scientists often have to deal with ridiculously large numbers involving lots of zeros. When a zero is added to a number, they say the number increases by an “order of magnitude,” so the difference between 1 and 10 is one order of magnitude, and the numbers 10 and 10,000 differ by three orders of magnitude. This idea of “orders of magnitude” helps scientists to grasp the relevance of big numbers in relation to other big numbers.

4 There are 4²⁰ or just over 1,000,000,000,000 (one trillion) permutations. A trillion has 12 zeros.
