Genetic engineering usually starts with a single chassis strain, whose genome is selected on the basis of the past experience of the organisation or team undertaking the strain development. This chassis is likely to already contain engineered or modified genes. Thereafter, the team uses its judgement and experience to engineer variants of a limited number of genes to optimise the strain.
Viewed in the context of the three observed realities of evolution listed above, the limits of genetic engineering in developing an optimum strain for a given product become clear.
Genetic engineering uses only a fraction of the available genetic diversity
Whatever chassis is selected, its genome will contain only a fraction of the possible genetic diversity. The genes selected for adjustment represent only a small number of the possible candidates, and the type of variation made to each gene is likewise one choice among many.
More genetic combinations than atoms in the Universe
On average, the genomes of two parental baker’s yeast strains used for QTL optimisation differ at over a hundred thousand bases, about one in every 120 base pairs. On this basis, the number of alternative genotypes for a typical yeast genome of 6,000 genes exceeds the number of atoms in the Universe. Consequently, establishing the optimum phenotype by systematically testing the alternatives experimentally is not feasible, even with future advances in AI.
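The scale of this claim can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a genome of roughly 12 Mb (close to the S. cerevisiae reference genome, a figure not stated in the text), treats every differing site or gene as carrying just one of two parental alleles, and uses the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable Universe. Working in log10 avoids building astronomically large integers.

```python
import math

# Assumed/illustrative inputs: the ~12 Mb genome size and the 10^80 atom
# estimate are added here; the 1-in-120 density and 6,000 genes come from the text.
GENOME_SIZE_BP = 12_000_000
BP_PER_DIFFERENCE = 120
GENE_COUNT = 6_000
LOG10_ATOMS_IN_UNIVERSE = 80


def log10_genotype_count(segregating_units: int, alleles_per_unit: int = 2) -> float:
    """log10 of alleles_per_unit ** segregating_units.

    With two parents, each differing site (or gene) carries one of two
    alleles, so the number of possible combinations is 2 ** units.
    """
    return segregating_units * math.log10(alleles_per_unit)


differences = GENOME_SIZE_BP // BP_PER_DIFFERENCE
by_site = log10_genotype_count(differences)   # every differing base counted
by_gene = log10_genotype_count(GENE_COUNT)    # conservative: two alleles per gene

print(f"differing bases: {differences:,}")
print(f"site-level combinations: about 10^{by_site:.0f}")
print(f"gene-level combinations: about 10^{by_gene:.0f}")
print("gene-level count exceeds atoms in the Universe:",
      by_gene > LOG10_ATOMS_IN_UNIVERSE)
```

Even the deliberately conservative gene-level count (two alleles per gene) gives roughly 10^1806 combinations, far beyond 10^80; counting individual differing bases makes the gap vastly larger.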
In practice, the size of the population that must be screened depends on how many factors contribute to the phenotype. The figures below correspond to 2^n: if each progeny has roughly an even chance of inheriting the favourable allele at each independently segregating factor, the chance of inheriting all n favourable alleles is (1/2)^n. Approximately a thousand progeny must therefore be screened if ten factors are responsible for the phenotype, a million for twenty factors and a billion for thirty factors. Given that rarely more than thirty factors, and frequently fewer, segregate for any phenotype, the causative QTLs for complex traits can practically be identified using populations of a billion progeny, which are easily generated with baker’s yeast breeding methods.
Screening strategies vary with the phenotype. For example, to isolate strains that can grow at a higher fermentation temperature, better suited to folding the polypeptide chain of interest, billions of progeny can be screened directly by growth at that temperature. For other phenotypes, flow cytometry of fluorescently tagged proteins, or picking individual progeny to analyse untagged products, is effective. Multiple rounds and/or combinations of phenotypic screens then narrow the population to a manageable number of candidates for product characterisation and scale-up.
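The population sizes quoted for ten, twenty and thirty factors can be reproduced with a simple probability model. This is a minimal sketch under stated simplifying assumptions, not a description of any specific screening pipeline: each factor is assumed to be unlinked from the others, and each progeny is assumed to inherit the favourable allele at each factor with probability 1/2. The function names are hypothetical.

```python
import math


def expected_screen_size(n_factors: int) -> int:
    """Progeny per expected 'all favourable' hit under the 1/2-per-factor model."""
    return 2 ** n_factors


def hit_probability(n_factors: int, n_progeny: int) -> float:
    """Chance that at least one of n_progeny carries every favourable allele."""
    p = 0.5 ** n_factors
    # 1 - (1 - p) ** N, computed with log1p/expm1 to stay accurate when p is tiny
    return -math.expm1(n_progeny * math.log1p(-p))


def progeny_for_confidence(n_factors: int, confidence: float) -> int:
    """Smallest population giving at least `confidence` chance of one hit."""
    p = 0.5 ** n_factors
    # Solve (1 - p) ** N <= 1 - confidence for N; both logs are negative.
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


for n in (10, 20, 30):
    print(n, "factors:",
          f"{expected_screen_size(n):,} progeny per expected hit;",
          f"P(hit in 1e9 progeny) = {hit_probability(n, 10**9):.3f};",
          f"{progeny_for_confidence(n, 0.95):,} progeny for 95% confidence")
```

Under this model the thousand/million/billion figures are orders of magnitude: screening exactly 2^n progeny gives about a 63% chance (1 − 1/e) of at least one hit, and roughly three times as many give about 95%.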
Evolutionary adaptation adjusts multiple bases in multiple genes simultaneously, and the combinations that improve a phenotype are not obvious. The probability of finding this ‘secret sauce’ of optimum adaptation for a single chassis through heuristic engineering is minuscule, irrespective of the engineering team’s skill and experience.
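Why non-obvious combinations defeat step-by-step engineering can be illustrated with a toy model. The sketch below uses an entirely hypothetical three-gene fitness landscape, invented for illustration rather than taken from any data: each change on its own slightly lowers fitness, but all three together give a large gain. A heuristic that changes one gene at a time and keeps only improvements never leaves the starting genotype, whereas evaluating every combination finds the optimum.

```python
from itertools import product

# Hypothetical toy landscape (not data): isolated changes carry a small cost,
# but the full combination of changes is strongly beneficial.
N_GENES = 3


def fitness(genotype: tuple[int, ...]) -> float:
    changed = sum(genotype)
    if changed == len(genotype):
        return 1.5              # the non-obvious winning combination
    return 1.0 - 0.1 * changed  # each isolated change lowers fitness


def greedy_one_gene_at_a_time(start: tuple[int, ...]) -> tuple[int, ...]:
    """Flip one gene per round, keeping a change only if fitness improves."""
    current = start
    while True:
        neighbours = [
            current[:i] + (1 - current[i],) + current[i + 1:]
            for i in range(len(current))
        ]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(current):
            return current      # no single change helps: stuck
        current = best


def exhaustive_best(n_genes: int) -> tuple[int, ...]:
    """Evaluate every combination; feasible only because n_genes is tiny."""
    return max(product((0, 1), repeat=n_genes), key=fitness)


start = (0,) * N_GENES
print("one-gene-at-a-time result:", greedy_one_gene_at_a_time(start))
print("exhaustive result:", exhaustive_best(N_GENES))
```

The exhaustive search works here only because the toy has 2^3 = 8 genotypes; at the genome scale discussed above, enumerating combinations is not possible.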
The genetic diversity accessible through the standard single-genome chassis of traditional host cells, combined with engineering of that chassis, is analogous to the volume of a garden pea. By comparison, the genetic diversity accessible through Phenotypeca libraries is larger than the volume of our Galaxy.
Multiple-gene engineering generates ‘sick’ strains
A further issue is that genetic engineering of already-engineered strains may lead to adverse phenotypes, e.g. ‘sick’ strains with poor growth or increased fragility, which are unsuitable for commercial scale-up. This reflects hidden inter-relations within the genome, whereby modifying genes to improve one phenotype, e.g. product yield, can harm the cell in another way, e.g. increased lysis during centrifugation, leading to a higher host cell protein (HCP) burden on downstream processing. These inter-relations can also be dynamic, existing in some conditions but not in others, and they can affect different phenotypes in opposite ways, improving one while impairing another.
Millions of Years of Evolution
Baker’s yeast is believed to have existed for over a hundred million years, since the Cretaceous period of the Mesozoic era, when flowering plants and fruiting trees first evolved. Over that time the species has been exposed to extremely varied environments across the planet, producing massive genomic diversity, even among strains from similar environments in different locations. The parental genomes are the starting point for any evolutionary event; breeding them gives rise to a population of diverse progeny. Diversity is therefore extremely important: the greater the relevant diversity, the more scope there is for evolving an ideal solution to the challenges being faced.
For strain development in biologics manufacturing, such diversity is not about scattering genomic traits as widely as possible at random, but about ensuring that the parent strains carry as broad a spectrum of diversity relevant to the project as possible. Knowledge of the genomes, in the context of the environments around the world in which the diverse strains evolved, together with knowledge of quantitative trait loci (QTLs: genomic regions whose variants contribute to variation in a trait) and the natural screening of survival of the fittest, provides the basis for creating such relevant genetic diversity.
