The protein we attempted to evolve in our earlier thought experiment faces another big challenge, which was also hinted at in the story of the freezing codfish: how to get the protein to a location where it will actually be useful.
Proteins are made in the main body of the cell, called the “cytoplasm” or “cytosol.”¹ But they often need to get to more specific locations inside or outside the cell. How can they do this? The answer is that a postal system and transportation network is available to them.
Incidentally, there are two main types of cell found in nature. One is called “prokaryotic” and the other “eukaryotic.” I will explain the differences in the next chapter. The cells of plants, animals and humans are eukaryotic, and these are what I will focus on here.
A newly manufactured protein usually comes with one or more address labels in the form of amino acid sequences. Sorting by the postal system starts when a ribosome begins making a protein. As the protein emerges, another machine checks to see if it has a certain type of address label that biologists call a “signal peptide.” If it doesn’t, the protein continues to be made in the cytoplasm. If it carries a different type of address label, it can then be shipped to other places, such as the nucleus. With no address label at all, the protein stays in the cytoplasm.
If a signal peptide is found at the start of the emerging protein chain, both the protein and the ribosome producing it are whisked away to a part of the cell called the “endoplasmic reticulum” (ER), which is like a mail processing center. Once there, the ribosome continues to make the protein, feeding it into the ER. Then the address label can be removed, the protein folds into shape, and certain bits and pieces are added to it, depending on its final destination, to help with stability.
Most of these proteins are then transported to the “Golgi apparatus,” the main sorting center for the protein postal system. Tiny membrane-bound sacs called “vesicles” act like a taxi service, carrying the proteins inside. Motor proteins pull the vesicles along filaments until they arrive at the Golgi apparatus. Once there, proteins may get modified some more, before being sent on to their final destinations, depending on the address label. If they don’t have any specific destination tags, they can be secreted out of the cell.
There are various labeling systems in use by the cell. For a protein meant for the nucleus, the address label is around 6 to 20 amino acids long, and can be anywhere in the protein. One destined for the “chloroplast,” a miniature organ in plant cells for converting light into chemical energy, has a sequence 40 to 50 amino acids long at the beginning of the protein chain. Proteins bound for the ER processing center have a signal peptide 16 to 30 amino acids long at the beginning. Some also have a second sequence, to indicate their destination within a smaller compartment of one of the cell’s miniature organs.
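For readers who think in code, the sorting rules described so far can be caricatured in a few lines of Python. This is only a toy sketch of the postal analogy, not a model of real cell biology. The names `Protein`, `route`, and the three fields are invented for illustration, and the sketch assumes every protein with a signal peptide goes on to the Golgi apparatus, whereas in the cell only most of them do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Protein:
    """A hypothetical stand-in for a newly made protein and its labels."""
    name: str
    signal_peptide: bool = False            # ER-type label at the start of the chain
    nuclear_label: bool = False             # label that can sit anywhere in the chain
    destination_tag: Optional[str] = None   # extra tag read after the Golgi apparatus


def route(protein: Protein) -> List[str]:
    """Return the stops a protein makes, following the chapter's simplified rules."""
    if protein.signal_peptide:
        # Ribosome and protein are moved to the ER, then on to the Golgi apparatus.
        stops = ["ribosome", "endoplasmic reticulum", "Golgi apparatus"]
        # With no specific destination tag, the protein is secreted.
        stops.append(protein.destination_tag or "secreted from cell")
        return stops
    if protein.nuclear_label:
        # Finished in the cytoplasm, then shipped to the nucleus.
        return ["ribosome", "cytoplasm", "nucleus"]
    # No address label at all: the protein stays where it was made.
    return ["ribosome", "cytoplasm"]


if __name__ == "__main__":
    for p in (
        Protein("secreted example", signal_peptide=True),
        Protein("nuclear example", nuclear_label=True),
        Protein("unlabeled example"),
    ):
        print(p.name, "->", " -> ".join(route(p)))
```

The point of the sketch is simply that the destination is decided entirely by the labels: change nothing but a flag, and the same protein ends up somewhere else.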
Now, the whole system presents an interesting challenge for a newly evolving gene. How does it get an address label in the first place? Without one, the protein it codes for will be stuck in the cytoplasm, assuming it is even produced at all. The answer depends on how it supposedly evolved. There are four main theories about protein evolution.
The first is de novo evolution, where the protein evolves from scratch, as in the story of the freezing codfish. In that story, the antifreeze gene didn’t have to do any hard work to get its address label. A signal peptide enabling it to be secreted into the codfish’s blood just happened to be lying around, a mere one nucleotide away. Given that the codfish genome is over half a billion base pairs in length, this is either truly astonishing luck, or the evolutionary story is wrong. But how do other proteins that evolve de novo get their address labels? Do they all get just as lucky?
The second supposed method of protein evolution is by the “stitching together” of smaller proteins. Let’s briefly return to the psychodrama of Alice and Bob, our two microproteins. In the latest episode of the drama, they find themselves working usefully in a eukaryotic cell, and as fate would have it, their genes have been mutated next to each other in the genome, so that they can be “stitched together” into one protein.
For a gene to be preserved intact in the genome, it needs to be produced at some point, and be useful to the organism in some way, so that natural selection can preserve it along with the organism.
But how can we be sure the newly combined AliceBob protein has a useful function to the cell? We can’t. Evolutionary theorists simply have to assume the cell produces the new AliceBob protein, and that it has a function to perform which gives the cell a survival or reproduction advantage.
But the more relevant question for us right now is, what address label does the new AliceBob protein have? As individual microproteins, Alice and Bob were performing useful functions in the cell, which means they must have been produced by ribosomes. If they worked outside of the cytoplasm, they must have had their own address labels.
If they both worked at the exact same location and had the exact same address labels, then the new AliceBob protein would probably be posted to the same place. But if their work was in different locations, where would the new protein be sent? If it’s sent to where Alice worked, Bob’s place of work would notice his absence. If it’s sent to where Bob worked, Alice’s place of work loses a valuable employee.
Maybe the new AliceBob protein is sent to both locations. But what is the likelihood that the new much larger protein turning up at both places of work is an improvement, rather than a hindrance? Unfortunately, it’s impossible to calculate the odds here. I will simply suggest that most of the time the combined AliceBob protein would be worse, and may even break the original function.
Either way, evolution faces a much bigger hurdle if the AliceBob protein is to be used in a different location from either Alice or Bob’s place of work, because then a new address label has to evolve or be acquired somehow.
Since address labels are fairly specific, I would suggest it’s close to impossible for AliceBob to evolve one de novo. If we take a short sequence of DNA coding for just 10 amino acids, this is 30 letters of DNA. There are roughly a million trillion variations of this sequence, which is 1 with 18 zeros.² But since mutations would be on the order of a One In A Million event for a specific sequence like this, the number of trials nature would need to run, just to evolve a specific address label consisting of 10 amino acids, would be something like a trillion trillion, or 1 with 24 zeros. This would be on top of evolving the functional part of the protein.
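The arithmetic behind these figures can be checked with a short Python sketch. It follows the simplifications used in the paragraph above: one exact 30-letter DNA sequence is the target, and each trial has a one-in-a-million chance of producing the relevant mutation. The variable names are chosen only for this illustration.

```python
# Assumptions taken from the text: 3 DNA letters per amino acid,
# 4 possible letters (A, C, G, T) at each position.
LETTERS_PER_AMINO_ACID = 3
DNA_ALPHABET_SIZE = 4

label_length = 10                                      # amino acids in the address label
dna_letters = label_length * LETTERS_PER_AMINO_ACID    # 30 letters of DNA

# Number of distinct DNA sequences of that length: 4 to the power 30.
sequence_variations = DNA_ALPHABET_SIZE ** dna_letters

# The text rounds this down to 1 with 18 zeros.
rounded_variations = 10 ** 18

# Assumed one-in-a-million chance per trial, as stated in the text.
one_in_a_million = 10 ** 6
rounded_trials = rounded_variations * one_in_a_million  # 1 with 24 zeros

print(f"DNA letters needed:  {dna_letters}")
print(f"Sequence variations: {sequence_variations} (about {sequence_variations:.3e})")
print(f"Trials (rounded):    {rounded_trials:.0e}")
```

Python integers have no fixed size limit, so the exact value of 4 to the power 30 is printed in full alongside its rounded form.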
I suggest this would be far too impractical, and so probably wouldn’t happen. While there may be some flexibility built into the address system, the situation would be closer to a “combination lock” scenario. Without a fairly specific and accurate address label, the AliceBob protein risks being labeled “return to sender” and destroyed. The postal system can’t send it to locations that don’t exist. And if AliceBob isn’t useful, then natural selection can’t save it from being mutated away.
In other words, if the AliceBob protein is to serve anywhere else besides where Alice and Bob originally worked, it will need to acquire a different address label. But there are no clear mechanisms in the cell that give newly evolved proteins new labels.
The third method of protein evolution is by gene duplication, where a gene somehow gets copied and pasted in its entirety, and then the duplicate gradually mutates into a new gene.
This method solves the problem of how the new gene gets an address label, along with the tags identifying it as a protein. They are copied from the original gene. Again, this is fine if the new protein works in the exact same location as the old one, but to be delivered somewhere else it needs a different address label. This is the real challenge.
The fourth method of protein evolution assumes a protein has two functions, a primary one and a secondary one. With this method, the secondary function is supposedly freer to evolve than the primary one, and may gradually evolve to become its own protein.
Many proteins do indeed have secondary functions, and even lots of functions. They don’t just do one job. This method of protein evolution has been demonstrated for small proteins where functions can be interchanged because of the similarity of the structures, but whether this method can be extended to all or most larger proteins is a different question.
Either way, the key question for us is, how does a protein evolving through this method acquire its address label? If we call the original protein the parent, and the newly emerging one the daughter, I suppose it would be plausible for the daughter to inherit the same address label as the parent. But for the daughter to be used at a different location in the cell, it would need to acquire a different address label, and there is no biological mechanism that allows for this, except through serendipity wands like “translocation.”
Incidentally, I’m not arguing that these methods of protein evolution are impossible. After all, if they weren’t designed, the existence of a vast library of proteins that organisms have available to them needs to be explained somehow. What I am suggesting is that these methods are a lot more theoretical than biologists like to make out.
My intention here has been to show that there is an additional layer of complexity nature has to deal with, when it comes to evolving a useful protein. It needs an address label to get it somewhere other than the cytoplasm. This makes a huge difference to what evolution can achieve.
From an evolutionary point of view, nature has only so much creative bandwidth to play with. Natural selection can only select from what is available at the time, and there are only a finite number of living organisms that have ever existed on Earth, meaning nature can run a fairly vast but still only a limited number of mutational trials.
It’s true that bacterial colonies can be measured in ridiculously large numbers with plenty of zeros in them, but they are still finite. Once we pass about forty zeros in a number, we exceed the number of bacteria that are assumed to have ever existed on Earth in the evolutionary timescale. We reach a threshold of improbability that is measurable, at least approximately. In other words, if evolving something needs more trials than nature is capable of running, then it probably isn’t going to happen. This is the critical issue we are exploring.
The cellular postal system also poses an interesting riddle. If evolution built it from the ground up, then at some point it must have been simpler than the one used by all eukaryotic cells we know today, with their complex system of address labels and pathways, vesicles as taxis and filaments as highways.
But since the address label for a protein is already written into the genome, the postal system can’t easily evolve a different address system without large numbers of proteins suddenly being sent to the wrong place, which would probably result in chaos and the death of the cell. In other words, the core functions of the postal system must already be in place before a cell can function properly.
This sounds like a chicken and egg problem. Which came first, the postal system or the protein labels used by the postal system? Fortunately, evolutionary theorists have another magic wand to cover dilemmas like this. They call it “co-evolution.” The two evolved at the same time. Inventing a word doesn’t actually explain the phenomenon. It just labels it, making the problem seem to go away, and making it sound more believable to the general public.
Whatever the case, getting a newly evolved protein into a new or different place in the cell is an additional layer of complexity, because not only does the functional part of the sequence have to mutate into something useful, but an address label also has to evolve or be acquired, to get the protein to a place where it can be useful.
1 Technically, the “cytoplasm” is the cell substance between the membrane and the nucleus, while the “cytosol” is the fluid portion of the cytoplasm.
2 4³⁰ = 1.153 × 10¹⁸, to 3 decimal places.