The Science Behind the CRISPR Babies Lulu and Nana

Scientists and bioethicists are outraged at the moment. Many are posting about how much they are crying for the babies, the babies that scientist Dr. Jiankui He claimed to have genetically edited with CRISPR.

On Sunday, Nov. 25th, a story leaked that Dr. He claimed to have created the first genetically modified humans. Dr. He claimed to have used CRISPR gene editing to modify the CCR5 gene to make the children HIV resistant. To make a claim like this is crazy, because you know people are going to want some sort of data and proof. Still, the claimant was a Ph.D. scientist with biological science training, so it seemed legit.

On Tuesday night in California I watched as Dr. He gave a scientific talk about what he did and presented the data. The data seems really legit, and while there are still a few conspiracy theories of possible fakery online, most people trust what was presented. If one were to fake the data, they probably wouldn’t also talk about possible mistakes they made, which Jiankui He did. Dr. He also seemed pretty calm and knowledgeable during the Q&A. Considering the pressure Dr. He was under, I was kind of impressed.

If true, this is one of the most monumental things in human history. Never before have we purposefully edited DNA inside a human embryo and had it grow to term and become a baby. We are literally entering a time in history where a new species of human has been created. Homo sapiens’s time on Earth is limited.

 

Who Is This Guy?

Jiankui He received his Ph.D. from Rice University and spent time at Stanford afterward. Dr. He moved back to China and started working at a university in Shenzhen in 2012. In 2018 he took a leave of absence from his job. Dr. He claims that he initially did a bunch of research on monkeys and, after multiple successes modifying monkey embryos, wanted to proceed and try his CRISPR editing techniques in humans. He recruited people who might be interested from an HIV support group and decided that his first attempt would be to modify the CCR5 gene to make the humans HIV resistant. Dr. He claims to have paid for all the research out of pocket, not under any institution or business.

 

One might dare call him a Biohacker.

 

When HIV tries to enter a cell, it binds to a protein in the membrane of the cell called CCR5. Getting into the cell is important because this allows the HIV to replicate. A naturally occurring human mutation known as CCR5 delta32 makes it so that the HIV cannot bind to the receptor. Dr. He wanted to replicate or make similar mutations to disrupt the HIV binding site and CRISPR would be a great tool to try and do that.

People are asking why he chose CCR5, since the mutation does not prevent a genetic disease; it prevents an infectious disease (HIV) that we have medicines to treat.

First of all, CRISPR is not great at making specific changes. As much as we herald its genome editing capability, CRISPR is actually just really effective at making pseudo-random mutations in genes. These mutations arise when the cell repairs the cut DNA through a pathway scientists call Non-Homologous End Joining, or NHEJ. Usually, these mutations end up “knocking out” the function of the gene. There are not many genes where making unknown mutations is useful. CCR5 just happens to be one of them. For HIV resistance, a CCR5 intervention primarily seeks to make any mutation after amino acid 168 that disrupts the loop region where HIV binds. This is easy to target and do.

Second, CCR5 mutations have been well studied, including with CRISPR. A search of Google Scholar shows 14 papers with CRISPR and CCR5 in the title; by comparison, a favorite gene target, myostatin, which regulates muscle growth, has 20 papers. CCR5 may be one of the most studied genes with CRISPR. This makes the CCR5 gene low-hanging fruit, i.e. one that would require the least original work to target. If I were doing CRISPR embryo editing I probably would have chosen something similar or the same thing.

Cue the armchair CRISPR experts coming out of the woodwork, screaming about how terrible so-called off-target effects and CRISPR editing efficiency are, ignoring the fact that Dr. He presented data on all of that, as have other scientists using the same method to target CCR5. The beauty of this story is that it looks as if the experiment was actually really well thought out and executed with every precaution to protect the embryos and the babies they would become. I personally have never seen a CRISPR study that was so thorough, and I sincerely welcome you to post if you have.

No, CRISPR isn’t perfect, and that is the biggest argument against its use. The mistakes are referred to as “off-target effects”. In adult human cells they can be a problem because once a cell is edited it is hard to go back, but they also aren’t such a big deal because adults have lots of cells, so if one dies it doesn’t really harm us. In embryos the story is different. If one cell goes crazy, it eventually develops into many cells inside a human, so the whole human could have a genetic disease. But there is a way around that, and this is one of the most fascinating things about the CRISPR babies. You can edit a bunch of embryos, let them grow a little, and once they have a few cells you can remove one or two for sequencing without harming the embryo. While this doesn’t give an exact picture, it gives a pretty good idea of what is going on. This process is generally termed Preimplantation Genetic Diagnosis (PGD) and can also be done on normal embryos that have not been CRISPRed, but it is not widely used. When PGD becomes widely used it will probably revolutionize birth, as you can basically 23andMe your embryos before you implant them. Anyway, if you do PGD on many CRISPRed embryos you have a good idea of what you are up against in terms of off-target and other bad effects of CRISPR. Modern CRISPR target prediction and experimental procedure, both of which Dr. He made improvements on, can help reduce these effects as well.

You can find Dr. He’s talk here.

 

Below are slides taken directly from the talk, but they are not all the slides. There are many more slides and much more data than what I present below. What I write does not go into all the details, as discussing every topic from first principles would take forever, but I do my best to go in depth when needed.

 

Dr. He did many experiments on mice and monkeys before he moved to humans

Dr. He’s initial experiments showed that animals born using a specific CRISPR target were born healthy and didn’t have any detectable behavioral problems. Dr. He showed that by modifying the concentrations, timing and number of injections you can increase the efficiency of embryo editing. Previous studies by other researchers on the same target in human cells also showed no off-target effects, as Dr. He reminded viewers in his talk.

Another major problem with editing embryos, besides off-target effects, is what is termed mosaicism, meaning the body ends up with two or more genetically different types of cells. This happens when injecting embryos because sometimes not all of the cells in the early embryo get edited. For instance, if there are 4 cells total and only 3 of them get edited, that one unedited cell can end up being ~25% of cells in the adult human body, creating a mosaic of different cells. Dr. He addressed this risk directly and performed a double injection early on to reduce the likelihood of mosaicism.

Dr. He and his team looked at batch-to-batch variation. Scientists usually don’t give a shit about this in experiments like these, because they are often under immense institutional pressure to publish papers as quickly as possible to bolster their professional reputations (which are tenuously dependent on grants and other mysterious forces in academia) and there aren’t really concrete incentives for creating a solid technique that is reproducible.

Dr. He then moved from monkey embryos to human cells and then to human embryos. A slide on DNA sequencing amplification methods and why he chose them kind of blew my mind, because most scientists don’t put this much thought into DNA sequencing. It is just assumed that the technique a scientist uses in a published paper is reasonable, but you can imagine this is not always the case. We call it the MPU, or Minimum Publishable Unit. Scientists try to do the least amount of work they can to get publications out, because you would too. For instance, mycoplasma bacterial contamination in cell culture can be found at incidences of 15%-35% and causes artifactual results in studies done on human cell lines in culture, but mycoplasma testing is not required before publishing papers on these topics. That’s a lot of potentially error-laden and easily preventable artifacts being published in prestigious journals on the regular, and that’s just one of the many ridiculous examples of oversight in the normal production cycle of scientific knowledge. There are many, many more examples.

Generally, when doing DNA sequencing on a human or cells we use a reference. That’s because we can only sequence short lengths of DNA at a time and so need to assemble all these short sections together like a puzzle. The reference is like the guide that comes with IKEA furniture: it helps you put it together. As we all know though, sometimes your furniture is a little bent or missing a screw and so doesn’t exactly conform to the IKEA guide as you struggle to push the bent piece of metal to fit with the screw.
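To make the puzzle analogy a bit more concrete, here is a minimal Python sketch of reference-guided read placement. Everything in it is made up for illustration: the reference string, the three reads and the function names are hypothetical, and it only tries ungapped placements. Real pipelines use dedicated aligners that handle insertions, deletions and base quality scores, none of which this toy attempts.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def place_read(reference, read):
    """Return (position, mismatches) of the best ungapped placement of read."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(reference[pos:pos + len(read)], read)
        if best is None or d < best[1]:
            best = (pos, d)
    return best


def call_differences(reference, reads):
    """Collect reference positions where placed reads disagree with it."""
    diffs = {}
    for read in reads:
        pos, _ = place_read(reference, read)
        for offset, base in enumerate(read):
            if reference[pos + offset] != base:
                diffs.setdefault(pos + offset, []).append(base)
    return diffs


# Toy data (hypothetical): the sample carries a T instead of G at index 9.
REFERENCE = "ACGTTGCAAGGCTTACGATC"
READS = ["ACGTTGCAAT", "TGCAATGCTT", "ATGCTTACGA"]

if __name__ == "__main__":
    # Maps each differing reference position to the bases the reads show there.
    print(call_differences(REFERENCE, READS))
```

The closer the reference is to the sample, the fewer of these disagreements are just ordinary inherited variation, which is the appeal of a reference built from close relatives.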

Dr. He knew that using the standard reference human genome everyone uses would be lazy and sloppy. Instead, Dr. He actually sequenced the children’s parents and used that as a reference so it would be much more accurate. He found 282 sites not present in hg19 (human reference genome 19). Not bad. Certainly worthwhile.

When sequencing cells from embryos for Preimplantation Genetic Diagnosis of diseases and off-target effects, Dr. He went above and beyond. Not only did his team present the exact protocol used for DNA amplification before sequencing, but they showed that putative mutations, off-targets, or abnormalities were validated using Sanger sequencing (a more accurate but lower throughput method) so that the results would be ~99.999% accurate, the standard in a clinical setting.

Dr. He then performed the experiment on 31 non-viable and 19 viable human embryos before proceeding with the 2 embryos for the parents.

The genetic sequencing done in the experiment alone is pretty epic.

There is probably no human being alive today who has had more sequencing done on their genes than the two embryos, now babies, edited by Dr. He. Normally, a whole genome sequencing (WGS) experiment reads each position in the genome about 30 times on average (30x). Reading the genome 30 times over gives us a pretty good idea statistically that a mutation *is actually* a mutation, as sequencing error rates for MiSeq can be 0.1%-1%. Dr. He did 30x WGS on the embryos before implantation and sequenced the babies 3 times while they were in the womb (at weeks 12, 19 and 24), looking at cancer genes, off-targets and gene editing. Then when the babies were born, Dr. He’s team did 100x WGS using cord blood.
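A back-of-the-envelope sketch of why 30x helps. It assumes every read at a position is wrong independently with the same probability (1% here, the upper end of the range quoted above) and asks how likely it is that at least k of the 30 reads are wrong purely by chance. The independence assumption, the threshold k and the function name are mine, and the calculation lumps all errors together instead of requiring them to agree on the same wrong base, so treat it as a rough upper bound.

```python
from math import comb


def prob_at_least_k_errors(depth, k, error_rate):
    """Binomial tail: P(at least k of `depth` reads are erroneous)."""
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )


if __name__ == "__main__":
    # Chance of seeing at least k erroneous reads at one position at 30x.
    for k in (1, 5, 15):
        print(k, prob_at_least_k_errors(30, k, 0.01))
```

Under these assumptions, a single stray error among 30 reads is common (roughly a quarter of positions), but half of the reads being wrong at the same position is astronomically unlikely, which is why a variant seen in about half the reads is taken seriously.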

 

Seriously, this is an amazing work-up of the DNA.

 

From this DNA sequencing no off-target effects or large deletions in the genome of the babies were detected. This is consistent with prior literature on the gRNA CRISPR target.

What Dr. He presented is actually an experiment of epic magnitude. From a technical perspective, his experiment was done carefully and was well thought out. The experiments and data presented leave little room for criticism. If it were to be published in a scientific journal it would be at least 3 papers’ worth of data.

 

So What Happened To The Babies Genetically?

According to the data presented, the baby with the pseudonym “Nana” had mutations in both copies of her CCR5 gene (remember, as humans we have two copies of our genes and chromosomes). The baby with the pseudonym “Lulu” only had a mutation in one copy of her CCR5 gene.

Lulu’s mosaicism has people freaking out because it wasn’t intended. She was supposed to have both copies edited. Mosaicism isn’t the fault of a lazy experimenter but is an issue inherent in DNA sequencing of embryos. In order to sequence the DNA of an embryo, you let it grow until there are enough cells so that removing a few won’t harm the embryo. The problem is that if there are 8 cells in an embryo and you remove and sequence 2 (if you take too many cells the embryo doesn’t grow properly) there is still a chance one of the other 6 might not be edited. There is no known way to solve this problem. You cannot sequence DNA without removing and destroying the cells and you cannot remove all the cells or there won’t be a viable embryo. The testing and batch to batch QC was pretty elaborate and I don’t think people will find a better way to reduce mosaicism any time soon than what Dr. He did.
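The sampling problem can be put into numbers with a small sketch. It assumes the biopsied cells are a uniformly random draw from the embryo, which is a simplification, and the number of unedited cells is a hypothetical input rather than anything reported in the talk. The function name is mine.

```python
from math import comb


def prob_biopsy_misses_unedited(total_cells, unedited_cells, biopsied_cells):
    """P(every biopsied cell is edited) under uniform random sampling."""
    edited = total_cells - unedited_cells
    if biopsied_cells > edited:
        return 0.0
    return comb(edited, biopsied_cells) / comb(total_cells, biopsied_cells)


if __name__ == "__main__":
    # 8-cell embryo with 2 cells removed, as in the example above.
    # The count of unedited cells is hypothetical.
    for unedited in (1, 2, 4):
        print(unedited, prob_biopsy_misses_unedited(8, unedited, 2))
```

With one unedited cell out of eight, a two-cell biopsy comes back looking fully edited in 21 of the 28 possible pairs, i.e. 75% of the time.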

Even so, having only one mutated copy of the gene (heterozygosity) still leads to decreased HIV infection rates, lower viral loads and longer life in those infected. People see this as a mistake, but I see it as winning the lottery and only getting to collect half the money. Still amazing.

 

So if NHEJ CRISPR makes random mutations, what mutations were made in the CCR5 gene and how do they affect its function?

Proteins are strings of amino acids that function like molecular machines. The amino acids spontaneously (or sometimes with a little help from other proteins) fold into 3 dimensional structures that allow them to function. CCR5 is a protein that sits in the outer membrane of cells and works as a receptor in the immune system to detect things. The CCR5 gene can be completely removed from mice, and doing so has been shown to increase some immune functions and decrease others, but the mice don’t have any serious defects.

Below is what is termed a multiple sequence alignment (I modified it slightly to make it clearer). This is used to compare multiple protein amino acid sequences to each other. CCR5WT is the version of the CCR5 gene that most of the population has. The region of the CCR5 protein that was affected is highlighted in light blue.

 

CLUSTAL O(1.2.4) multiple sequence alignment

CCR5WT           MDYQVSSPIYDINYYTSEPCQKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKR 60
LuluDelta15      MDYQVSSPIYDINYYTSEPCQKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKR 60
CCR5delta32      MDYQVSSPIYDINYYTSEPCQKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKR 60
NanaInsert       MDYQVSSPIYDINYYTSEPCQKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKR 60
NanaDelta4       MDYQVSSPIYDINYYTSEPCQKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKR 60
                 ************************************************************

CCR5WT           LKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLTGLYFIGFFSGIFFII 120
LuluDelta15      LKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLTGLYFIGFFSGIFFII 120
CCR5delta32      LKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLTGLYFIGFFSGIFFII 120
NanaInsert       LKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLTGLYFIGFFSGIFFII 120
NanaDelta4       LKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLTGLYFIGFFSGIFFII 120
                 ************************************************************

CCR5WT           LLTIDRYLAVVHAVFALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQKEGLHYTCSS 180
LuluDelta15      LLTIDRYLAVVHAVFALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQKEGLHYTCSS 180
CCR5delta32      LLTIDRYLAVVHAVFALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQKEGLHYTCSS 180
NanaInsert       LLTIDRYLAVVHAVFALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQKEGLHYTCSS 180
NanaDelta4       LLTIDRYLAVVHAVFALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQKEGLHYTCSS 180
                 ************************************************************

CCR5WT           HFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTLLRCRNEKKRHRAVRLIFTI 240
LuluDelta15      -----QYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTLLRCRNEKKRHRAVRLIFTI 235
CCR5delta32      HFPYIKDSHLGAGPAAACHGHLLGNPKNSASVSK-------------------------- 214
NanaInsert       HFPYKSVSILEEFPDIKDSHLGAGPAAACHGHLLGNPKNSASVSK--------------- 225
NanaDelta4       HFPYSINSGRISRH---------------------------------------------- 194
                                                                             
CCR5WT           MIVYFLFWAPYNIVLLLNTFQEFFGLNNCSSSNRLDQAMQVTETLGMTHCCINPIIYAFV 300
LuluDelta15      MIVYFLFWAPYNIVLLLNTFQEFFGLNNCSSSNRLDQAMQVTETLGMTHCCINPIIYAFV 295
CCR5delta32      ------------------------------------------------------------ 214
NanaInsert       ------------------------------------------------------------ 225
NanaDelta4       ------------------------------------------------------------ 194
                                                                             
CCR5WT           GEKFRNYLLVFFQKHIAKRFCKCCSIFQQEAPERASSVYTRSTGEQEISVGL 352
LuluDelta15      GEKFRNYLLVFFQKHIAKRFCKCCSIFQQEAPERASSVYTRSTGEQEISVGL 347
CCR5delta32      ---------------------------------------------------- 214
NanaInsert       ---------------------------------------------------- 225
NanaDelta4       ---------------------------------------------------- 194

 

There are a few arguments people have with these mutations, and the first is that they have never been seen before in humans or animals. While that is true, it doesn’t mean we can’t predict their outcome with high accuracy. Because proteins are so much like little machines, they can be engineered (seriously, check out this video for an idea of what they are like). This is especially true with regard to proteins interacting with each other, like what happens when the HIV protein gp120 tries to interact with the CCR5 protein to gain entry into a cell. People have already figured out where HIV interacts with the CCR5 protein:

Modified from https://www.mdpi.com/1999-4915/2/2/574

 

The two Nana mutations will almost guarantee that HIV will not be able to enter the cells because they are so similar to CCR5 delta32. The CCR5 delta32 mutation removes 32 DNA bases from CCR5 after the red highlighted PY amino acids in the multiple sequence alignment. Because each amino acid is coded for by 3 DNA bases and 32 is not a multiple of 3, the reading frame is shifted, which results in 30 random amino acids being added to the end of the protein before a DNA stop sequence is randomly coded for. Because CCR5 delta32 stops HIV from being able to enter cells, we can assume with extremely high confidence that any CCR5 gene in which there is a string of random amino acids after PY will have the same effect. What if these random amino acids do something?

That Is Extremely Improbable.

Understand, proteins not only have a strict requirement for the exact sequence of amino acids being next to each other for an interaction to occur, but there are additional constraints based on how they are arranged in 3 dimensional space. Now, protein structure prediction is a whole other thing, but suffice it to say that a random string of amino acids will not form a coherent 3 dimensional structure, just as much as connecting random parts of a car will not result in a car. Sure, there is a probabilistically tiny chance that 1 million monkeys putting together a car for eternity will eventually arrive at the desired result, but this is not within the realm of reasonable.
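The frameshift arithmetic behind those random amino acids can be shown with a small sketch. The 60-base sequence below is a toy, not the real CCR5 gene: its codons were picked by me only so that it encodes the first 20 residues shown in the alignment, and the deletion start position is arbitrary. The point is simply that a 32-base deletion scrambles everything downstream, while a 15-base deletion (the Lulu case) drops 5 amino acids and leaves the rest in frame.

```python
BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def is_frameshift(deleted_bases):
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deleted_bases % 3 != 0


def delete(dna, start, length):
    """Remove `length` bases beginning at index `start`."""
    return dna[:start] + dna[start + length:]


# Toy sequence (hypothetical codon choices, NOT the real CCR5 gene).
TOY_DNA = "".join(
    "ATG GAT TAT CAA GTG TCA AGT CCA ATC TAT "
    "GAC ATC AAT TAT TAT ACA TCG GAG CCC TGC".split()
)

if __name__ == "__main__":
    print("wild type:      ", translate(TOY_DNA))
    print("15 bases removed:", translate(delete(TOY_DNA, 6, 15)))
    print("32 bases removed:", translate(delete(TOY_DNA, 6, 32)))
```

In the 15-base case the output is the original protein with five residues missing; in the 32-base case everything after the deletion point turns into unrelated amino acids.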

Ok, so what about the LuluDelta15? This one is a little more complicated. There was a 15 DNA base deletion which didn’t change or ruin the sequence after the PY amino acids much. However, you need to remember that proteins are also 3 dimensional objects that are fitted together like puzzle pieces and so are the interactions between CCR5 and gp120. So if you remove 5 amino acids and shorten the sequence you move everything that comes after it, which disrupts the position of every amino acid after the deletion and the deletion is in the same PY location as CCR5 delta32. But you don’t have to take my word for it. We can actually look at the structure of CCR5.

Proteins are tiny as shit, but using crazy techniques we can actually determine their 3 dimensional structure, and fortunately for us the structure of CCR5 is available at the Protein Data Bank (PDB). In order to view the structure in 3D space, you can find a link on that site or you can download a program like VMD. Using VMD, I highlighted in yellow and pink the part of the structure that the sequence change in Lulu would affect.

The first thing we notice is that the deletion of the residues up to S185 in Lulu is at the boundary of two different structural elements: the protein loop known as ECL-2 in yellow and the alpha helix in pink. Mutations at structural boundaries like this have a high likelihood of disrupting the structure.

 

Let’s do a little trick. What we can actually do is use science and algorithms and all the protein structures and knowledge that exist to predict the 3 dimensional protein structure of Lulu’s CCR5. Using a server called SWISS-MODEL, I took Lulu’s sequence and fed it in, and what popped out is below. Now, what we can do is overlay it with the normal CCR5 structure and see how they match up. These structure predictions, especially when based on known structures (in our case the wildtype or normal CCR5 structure), can be highly accurate.

First, I aligned both structures using the RMSD align tool in VMD from amino acids 20-200. Generally, you skip the amino acids at the beginning and the end because those regions are more unstructured and can cause a strange alignment. The Root Mean Square Deviation (RMSD) between the two structures is ~6Å (angstroms). This is not terrible but it is not good either. Generally, for a mutated protein to function similarly to its wildtype or normal counterpart, it should be under 3Å RMSD, but that’s not always the case.
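For readers who have not met RMSD before, here is a minimal sketch of the number itself: the square root of the average squared distance between paired atoms. It assumes the two structures are already superimposed and that atoms are paired by index. The coordinates are toy values, not CCR5 data, and this is only the formula, not a reimplementation of the VMD tool, which also performs the superposition.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and that point i in
    coords_a corresponds to point i in coords_b. Units follow the input.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms: every atom in `moved` sits 5 A away
    # from its partner in `original`.
    original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    moved = [(x + 3.0, y + 4.0, z) for x, y, z in original]
    print(rmsd(original, moved))
```

Because the deviations are squared before averaging, a few badly displaced regions can dominate the value, which is one reason to leave floppy ends out of the comparison.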

So I aligned and flipped the structures so we are looking down at the top into the binding cleft where gp120 binds. The Lulu and WT or normal CCR5 proteins are in the exact same orientation. This is a surface representation with nitrogen atoms represented as blue and oxygen as red. I highlighted the QFW amino acid residues (see the sequence alignment for where they are located) in orange so the structure can be compared between the two, and put a yellow box around them. You can immediately tell the binding pocket is not the same structurally, so gp120 wouldn’t be able to fit in the same way. The upper left parts look similar to one another, but that is the part of the protein sequence which appears before the Lulu deletion. Basically everything after the deletion is moved. Some reds (negative charge) are where blues (positive charge) should be and blues where reds should be. You can imagine that, like magnets, opposites attract and likes repel, so changing the location of charged amino acids is very likely to inhibit any sort of binding.

What all this tells us is that the chances that Lulu’s 15 base (5 amino acid) deletion creates a CCR5 that is still functional are small.

Even though these mutations protect against HIV some scientists are arguing that the mutations could also lead to other problems such as helping other viruses.

Scientists and Bioethicists are citing a very poorly done study that probably wouldn’t make it through peer review in most journals. The study shows that 20 people with one copy (heterozygous) and 3 people with two copies (homozygous) of the CCR5 delta32 mutation have a statistically higher fatality rate compared to 148 individuals without the mutations. Seriously, people are using this paper as an argument that the children might have an increased chance to die of influenza? A post hoc correlation study with 23 people that have similar genetics to Lulu and Nana or in the case of Nana only 3 people with homozygosity? It probably wouldn’t even win a high school science fair. In order to extrapolate to a whole population of people you need a bigger sample size than 23 people and many many more than 3 homozygous individuals.
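To see how little a handful of people can pin down, here is a sketch using the Wilson score interval for a proportion. The group sizes (3, 23 and 148) come from the paragraph above, but the event counts are hypothetical numbers I chose so that each group has roughly the same observed rate; they are not the study's data. The function name and the choice of the Wilson interval are also mine.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # (hypothetical events, group size); rates are all roughly one third.
    for events, n in ((1, 3), (8, 23), (49, 148)):
        low, high = wilson_interval(events, n)
        print(f"n={n}: observed {events / n:.2f}, interval {low:.2f} to {high:.2f}")
```

With the same observed rate, the interval for 3 people spans most of the range from 0 to 1, while the interval for 148 people is several times narrower.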

People are also saying CCR5 delta32 homo- and heterozygosity increase the chances of West Nile Virus (WNV) infection. I started looking for papers that try to support this claim and found a paper where the authors actually write,

“CCR5 deficiency is not a risk factor for WNV infection per se, but it is a risk factor for both early and late clinical manifestations after infection. Thus, CCR5 may function normally to limit disease due to WNV infection in humans.”

If something is per se not a risk factor then it is not a risk factor. I have never even seen such a sentence written in a scientific publication. Science is about arguing with data, not using it to mislead the public because you don’t believe in it.

Do we know enough to do CRISPR experiments on embryos? I think the answer is yes: a properly chosen CRISPR gRNA or experiment, together with proper screening of embryos using whole genome sequencing, can lead to the creation of humans that are safely and effectively genetically modified. Can we avoid mosaicism? Probably not at this time. Unfortunately, we will probably never be able to prevent mosaicism 100% of the time when using CRISPR to modify embryos.

No medicine is 100%, not even routine surgery. Tylenol causes around 100 unintentional deaths each year. As far as we know, based on the data presented, these babies will have nothing wrong with them and have a high probability of having some resistance to HIV.

The big question is what amount of risk is ok and who gets to decide? Every scientist, medical doctor and bio-ethicist I know says that the general public is too stupid to decide themselves whether a treatment is “worth the risk”. Seriously? 

Surveys in many publications show that only around 10%-12% of people are against gene editing to prevent disease.

The view being shown in the media and by scientists is not the view of the world. The gene editing done by Dr. He was well thought out. I challenge people to propose experiments that do it better.

You can’t stop embryo editing. There are over 40 countries that don’t regulate medical devices, procedures, or medicine. Most of these countries would probably welcome becoming the home of embryo editing and gene therapies.

How do people propose we stop this? By making it illegal? Drugs are illegal and we even had a war against them. But making things illegal doesn’t keep them from happening. It just makes them more dangerous. They went so far as having D.A.R.E. classes when I was a kid to try and indoctrinate us to not do drugs, but still lots of people did. We lost the war on drugs, and drugs just make you feel good. CRISPR gene editing promises to change humanity as we know it.

Do you really think a government is going to win a war against it?

And why on Earth would they want to?