Fourth Annual DNA Grantees' Workshop
Monday, June 23, 2003
MORNING SESSION
Multiplex Snapshot Assay Panels for Increasing Forensic Discrimination of mtDNA Testing
Thomas J. Parsons
Biography
MR. FRANK: Our next speaker is Dr. Tom Parsons. He received his bachelor's degree from the University of Chicago and his Ph.D. in biochemistry from the University of Washington. He has been working as the chief scientist with the Armed Forces DNA Identification Laboratory in Rockville, Maryland. His research interests include molecular evolution, phylogenetics, and mitochondrial DNA biogeography and avian speciation.
The topic he's going to speak to us about today is multiplex SNaPShot assay panels for increasing forensic discrimination of mitochondrial DNA (mtDNA) testing.
DR. PARSONS: Thanks William. Parsons: Slide 1
My hat goes off to NIJ for having the foresight to bring mitochondrial DNA to a panel of products and prototypes. That's an excellent contribution to the field, and in fact I think mitochondrial DNA is going to become ever more firmly established as a result of this and other efforts.
I also would like to draw attention to my fine group of colleagues at the Armed Forces DNA Identification Laboratory (AFDIL), whom I've listed as coauthors here and I'll refer to throughout the course of the talk.
Here is a brief schematic of the mitochondrial DNA genome. Thankfully, my previous colleagues have alleviated much of the need to go into this in detail, especially considering the savvy audience we have here. But I'll simply point out that mtDNA forensics predominantly concentrates on sequence data from hypervariable region I (HVI) and hypervariable region II (HVII) of the noncoding control region. These sections are amplified and sequenced, constituting the type of an individual, and then compared. Parsons: Slide 2
So why do you go to mitochondrial DNA? I think we all know that this is basically because it's a multicopy element that is recovered with greater efficiency from highly degraded sources. In fact, there might be occasions where you have only a maternal relative to whom you want to compare something, so its maternal inheritance comes in handy. Parsons: Slide 3
There is quite a range of forensic DNA sources that intrinsically give little or no nuclear DNA or that are very highly degraded (e.g., the World Trade Center). Much of the work we do at AFDIL is with degraded skeletal remains; thus, mitochondrial DNA is where we go.
So that's what's nifty about it. What's bad about it is it's not all that discriminatory in terms of forensic exclusions. Because it's maternally inherited, you already know that you're going to match all your matrilineal relatives, and generally all of us have quite a few more matrilineal relatives running around out there than we're aware of. In fact, we're all matrilineal relatives if we go back far enough in time and are able to put up with a few mutations that get sprinkled as this molecule transmits itself from generation to generation. In fact, it's that sprinkling of mutations that allows us to discriminate between one maternal lineage and another. Parsons: Slide 4
It's okay that it's not a unique identifier, but how common are the types? Sadly, some of them are pretty common, and we'll see that that's where we need to focus to increase mitochondrial DNA's capability.
So, getting to the significance of mitochondrial matching. You sequence a sample, you sequence a reference, and they match. So that's good. In interpreting this, you have to keep in mind that mitochondrial DNA is a single linkage group. It's a single locus; therefore, you shouldn't be thinking about multiplying polymorphisms and that type of thing. You simply have to relate the rarity of the type that you're encountering in a particular case to the number of times that type has been seen in the database. Parsons: Slide 5
The good news with mitochondrial DNA is that many of the types, even with large databases of up to 2,000 or so individuals, still don't match anybody in the database. In fact, most types are like that. So that's a pretty good thing.
The problem is in the ones that are more common. They happen, and I'll show you about that in a little bit. As far as increasing the strength of mitochondrial DNA testing for the rare types, the trick is making your database grow. That's another area where we need to focus. Unfortunately, though, the growth is arithmetic there. So if you have a database of 4,000 individuals and you put 1,000 more sequences in, taking the size up to 5,000, you're only five-fourths as good. Even though it's a slow increase, it nevertheless needs to happen.
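The "five-fourths" point can be checked with a short calculation. This is only a sketch under an assumption that is not stated in the talk: that a type never seen in a database of N profiles is reported with the commonly used upper confidence bound 1 - alpha^(1/N). The function name is hypothetical.

```python
def upper_bound_unseen(n_database: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the frequency of a type observed
    zero times in a database of n_database profiles.
    Assumption: the bound 1 - alpha**(1/N) is the reporting convention."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_database)


# Database sizes taken from the example in the talk.
for n in (4000, 5000):
    print(n, f"{upper_bound_unseen(n):.6f}")

# How much stronger the statement gets when the database grows from 4,000 to 5,000.
ratio = upper_bound_unseen(4000) / upper_bound_unseen(5000)
print(f"improvement factor: {ratio:.3f}")
```

The improvement factor comes out close to 5/4, which is the slow, arithmetic growth described above.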
So what is the result of this? In a hypothetical case, you recover a hair from a crime scene and genetically sequence all of the HVI and HVII, and you get this result where the only two differences from a reference sequence are these two (315.1C and 263G), which happens to be the most common Caucasian type, matching 7 percent of the population. Well, that's not great, but we can tell the jury that in fact the DNA includes the suspect. But if they're alert, they'll notice that it includes a lot of people as well, leaving you with diminished significance. Parsons: Slide 6
Where it gets really bad is if, for example, it would also be consistent with the victim. Then you're basically wasting your time doing all this DNA testing, so it would be very nice to improve this situation.
Mitochondrial DNA is significant in other areas, for example, identifying victims in the World Trade Center attacks, which—through the KADAP (Kinship and Data Analysis Planning Panel)—has become near and dear to many of our hearts, and identifying those missing in action from the Korean War, where we have large numbers of individuals. Mitochondrial DNA is basically the only place you can look. Parsons: Slide 7
In the World Trade Center analysis, it is known that there are a substantial number of samples that don't give STRs (short tandem repeats). Let's imagine that we have a sample that gave only mitochondrial DNA and we have sequenced HVI and HVII. And let's say that some time in the future, if we're very lucky, maybe only 193 people have not been identified from the World Trade Center bombing. You then compare your mitochondrial DNA with these 193 families with somebody missing and what happens? You match four individuals. So even though this wasn't one of the rarest of the rare types, or even the commonest of the common types, you still match four families and have no further grounds for identification. So it's a sad thing, and it'd be nice if it was better.
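To see why several families can match, here is a rough sketch of the expected number of matching families for a type of a given population frequency. It assumes the families are unrelated and independent, and it uses the 193 open cases from the example together with frequencies quoted elsewhere in the talk (7 percent, 2 percent, 0.5 percent). The function names are hypothetical.

```python
def expected_matching_families(n_families: int, type_frequency: float) -> float:
    """Expected number of families carrying a type by chance alone.
    Assumption: families are unrelated and sampled independently."""
    return n_families * type_frequency


def prob_some_other_family_matches(n_families: int, type_frequency: float) -> float:
    """Chance that at least one of the other families also carries the type."""
    return 1.0 - (1.0 - type_frequency) ** (n_families - 1)


n_open_cases = 193  # figure used in the example above
for freq in (0.07, 0.02, 0.005):
    expected = expected_matching_families(n_open_cases, freq)
    p_other = prob_some_other_family_matches(n_open_cases, freq)
    print(f"frequency {freq:.3f}: expect {expected:.1f} families, "
          f"P(another family matches) = {p_other:.2f}")
```

Under these assumptions, a type at about 2 percent would be expected in roughly four of 193 families, which is in line with the four matches in the example.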
I want to quickly go through this graph regarding the population distribution. The upper left dot indicates that in a population sample of about 1,200 individuals, you have some 750 types that are unique within that sample set. That's the good part. The bad part is over here on this end where you have the most common type occurring in about 7 percent of Caucasians, the second most common type at about 2 percent of Caucasians, and then there are some 20 or so types that are present at 0.5 percent or greater in the population; those are the ones that are really causing us a lot of consternation in mtDNA typing. Parsons: Slide 8
It doesn't happen most of the time, but about one-fifth of the time you're dealing with something like that. So the single biggest limitation of mitochondrial DNA is the low power of discrimination you get with common types about 20 percent of the time. In database or mass-fatality comparisons, you are going to encounter these. Parsons: Slide 9
So what can we do? We don't have to suffer through this because we are only looking at a very small portion of the mitochondrial DNA genome when we do forensic testing. Wouldn't it be nice to go to the coding region, which has all kinds of variation, and determine additional polymorphic sites that are present? In fact, it's a slower evolving region (almost 10-fold slower), but there's 15 times the amount. So you're in real good shape if you're able to wade through that and find the sites of interest. Parsons: Slide 10
With that said, I want to point out one thing we need to keep in mind. Barbara [Levin] gave us a good lead in that there are a zillion different hereditary diseases, including very serious ones, caused by mutations in mitochondrial DNA, and there are many correlations with other universal type things, like Alzheimer's disease, Parkinson's disease, cancer, and so on. So we have to be very careful in screening the coding regions for sites that may have medical genetic significance. Parsons: Slide 11
So I'm going to argue that we want to develop SNP (single nucleotide polymorphism) assays rather than blindly sequence away and discover something that we didn't want to discover about Mrs. Jones, who is a reference for her missing child. We can do SNP assays at sites that have been determined to be neutral, that is, silent third-position codon mutations that have no potential phenotypic effect.
We sold NIJ on sequencing the mitochondrial DNA genome from individuals that match common HVI and HVII types. I will tell you at the outset that we knew going to the literature wasn't going to do us any good, because we knew the sites that discriminate among these very closely related individuals were not going to come out of general population screens. We've proven that to be correct.
We simply need to identify those needles in the haystack that separate people who match common HVI and HVII types, develop SNP assays for them, identify those that discriminate, create databases, validate them on forensic casework samples, and so on. Parsons: Slide 12
We've done a lot of mitochondrial DNA sequencing (the entire genome, in fact). Fortunately we have robotics that makes this a great deal less painful. We also have a very strong bioinformatics section that allows us to keep track of all the data, which is really the hardest part. If you're going to do this, you've got to be able to handle the data. Parsons: Slide 13
So our strategy is not to reinvent the wheel with SNPs, as can be done. Instead, we should take the field as it exists and maximize the discrimination by combining HVI and HVII sequencing information or linear arrays from Roche that are based on the control region and identify and apply select panels of multiplex SNPs that are chosen specifically for the issue at hand. This sort of value-added SNP testing approach takes into account the case in question at any given time. Parsons: Slide 14
We've talked a little bit about haplogroups before. They're closely related individuals. Here are a number of them. Everybody in this table belongs to haplogroup H, and it turns out that HI here, which is what we've named it, is the most common Caucasian type. HII, which is depicted in the second row there, is the second most common, and has an additional polymorphism relative to the HI type. The "N" on the left relates to the number of individuals matching that type for whom we have sequenced the entire mitochondrial DNA genome. These are seven common H types. Parsons: Slide 15
Here are a bunch more common J, T, B, and K types, for a total of 234 mitochondrial genomes sequenced from the 18 most common HVI and HVII types present in the Caucasian population. Individuals matching those types comprise about 20 percent of the database. Parsons: Slide 16
So what happens if you sequence these things? Well, it turns out that it's kind of rare that people match each other over the entire mitochondrial DNA genome, and here's what happens if you look only at neutral sites for the most common HI type (labeled with "n equals 31" in the middle). Sequence the entire genome and you can see how neutral sites divide them out. Parsons: Slide 17
The neutral sites that are highlighted in yellow here represent ones that we've designed for a multiplex SNP panel that we'll be applying to that. So you get a great deal more resolution if you take this approach.
Having done this for all these different haplogroups, we've designed a number of multiplex panels that are specific to the common types one might encounter here. We've put these things together recognizing what we consider to be a reasonable multiplexable number of sites. So that's our target for SNP assay determination. Parsons: Slide 18
So what happens if you do this? Well, as I told you before, the most common type (7 percent) and 18 most common types in the population greater than 0.5 percent comprise about 20 percent of all the individuals. If we apply our batteries of SNP panels to the 234 individuals that match one of the 18 types, the most common type then drops to 1 percent—still not all that great. But significantly, these 18 types now switch to 122 types, 74 of which are now unique within these 234 individuals and in the population database from which they were extracted. Parsons: Slide 19
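The bookkeeping behind numbers like "18 types become 122 types, 74 of them unique" can be sketched as follows. The six samples and their SNP states are invented toy data, not results from the study; only the counting logic is the point.

```python
from collections import Counter

# Hypothetical toy data: (control-region type, states at three coding-region SNPs).
samples = [
    ("HI", "GCA"),
    ("HI", "GCA"),
    ("HI", "ACA"),
    ("HI", "GTA"),
    ("HII", "GCA"),
    ("HII", "GCG"),
]


def summarize(types):
    """Count distinct types, types seen only once, and the share of the most common type."""
    counts = Counter(types)
    return {
        "distinct_types": len(counts),
        "unique_types": sum(1 for c in counts.values() if c == 1),
        "most_common_fraction": counts.most_common(1)[0][1] / len(types),
    }


# Control region alone versus control region plus the SNP panel.
before = summarize([control_type for control_type, _ in samples])
after = summarize(samples)
print("control region only:", before)
print("with SNP panel:     ", after)
```

In this toy set, two control-region types split into five combined types, four of which are unique, and the share of the most common type drops from two-thirds to one-third.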
So we've done a lot of good, I think, to mitochondrial DNA discrimination. But where does that leave us with assays? There are a zillion different assays out there that you can read about and that people will be happy to sell you. But how do you choose among them? Parsons: Slide 20
It's really critical that you keep in mind that we are doing forensic typing. We're doing stuff on actual crime samples. As far as design and application, the assays should be easy and fast. They need to be supersensitive and robust to a wide range of starting templates. They need to be able to deal with mixture and heteroplasmy, and because extract quantities are limited, you have to be able to multiplex them. Parsons: Slide 21
Last year, we described a great deal of work we've done using TaqMan real-time allelic discrimination-type assays, but I'm not going to go into that in any great detail. However, I must say that we've shown this to be an appropriate platform and very sensitive, and it does well with all the criteria I mentioned. Parsons: Slide 22
Like so many of the other people speaking today, we have interacted with John Butler and one of his scientists, Pete Vallone, at NIST, and with their multiplexing capabilities. Just for kicks, they decided to take some of our sites out for a test drive with the SNaPShot method of multiplexing, and it worked very nicely and has attracted our attention to the point where we're doing all our subsequent development on this assay. Parsons: Slide 23
Here's how SNaPShot works: amplification primers encompass the SNP site of interest, and they're pretty short so they're going to work well with forensic casework samples. You put a couple of primers, one on each side of your SNP site of interest, to amplify it. These different sites, in this case 11 for the example I'm going to show here for the most common type, are multiplexed together in a single amplification. Next, you extend primers placed right next to each SNP site in the presence of ddNTPs (dideoxynucleotide triphosphates) and get a mini-sequencing reaction that tells you what the following base is.
There are three good things about the SNaPShot detection system:
- It's going to be widely applicable to many laboratories because it runs on things like 310s, 3100s, and so on.
- You can type the different sites in a single run. As you'll see, the primers differ in the length of a poly-T tail that allows them to separate spatially.
- From the standpoint of wide-scale applicability, the results look a lot like STRs. So you don't really need to think "sequence" quite as much as you would during regular mitochondrial DNA sequencing. Parsons: Slide 24
Here's an illustration of how it works: The blue line is your amplified PCR (polymerase chain reaction) product. The G would be the site you're interested in knowing whether it's a G or a C. You place a primer, which has some Ts on the end of it, in front of the G. Next, you extend this with a fluorescently labeled ddNTP, which puts in a C, and then you run that out on your 3100. Parsons: Slide 25
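The single-base extension step can be written as a toy model. The template sequence, the primer, and the function names are all hypothetical; the sketch only shows that the labeled ddNTP added to the primer is the complement of the template base next to the primer's 3' end.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def single_base_extension(template: str, primer: str) -> str:
    """Return the base of the ddNTP added to the primer's 3' end.
    template and primer are both written 5'->3'; the primer anneals to the
    template, and the poly-T tail is left out because it does not anneal."""
    site = reverse_complement(primer)  # template region the primer pairs with
    start = template.find(site)
    if start <= 0:
        raise ValueError("primer does not anneal next to a template base")
    snp_base = template[start - 1]  # template base just beyond the primer's 3' end
    return COMPLEMENT[snp_base]


# Hypothetical amplicon with a G at the site of interest (index 5).
template = "CATTAGGATCCGTAGCTAGG"
primer = reverse_complement(template[6:16])  # anneals right next to the site
print(single_base_extension(template, primer))
```

With a G in the template at the site of interest, the model reports a C as the incorporated base, matching the illustration.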
Here's the panel that we have from the most common Caucasian type, and again I refer to Pete Vallone who did the initial development of this particular set. You see the different lengths of poly-T tails and the different sizes and colors of sites that stand out from the entire genome. Parsons: Slide 26
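One way to think about the poly-T tails is as padding that puts each extension primer at its own length so the products separate in a single run. The sketch below is an assumption-laden illustration: the site names, primer lengths, the starting size of 20 nucleotides, and the 4-nucleotide spacing are all made up and are not the values of the actual panel.

```python
def assign_tail_lengths(primer_lengths: dict, first_size: int = 20, spacing: int = 4) -> dict:
    """Choose a poly-T tail length for each primer so that the tailed primers
    end up at distinct total lengths, at least `spacing` nucleotides apart.
    Assumption: size alone (not dye color) is used to separate sites."""
    tails = {}
    target = first_size
    # Shortest primers take the smallest size slots.
    for name, length in sorted(primer_lengths.items(), key=lambda item: item[1]):
        while target < length:  # skip slots shorter than the primer itself
            target += spacing
        tails[name] = target - length
        target += spacing
    return tails


# Hypothetical annealing lengths for four sites.
primer_lengths = {"site_a": 18, "site_b": 20, "site_c": 19, "site_d": 22}
tails = assign_tail_lengths(primer_lengths)
for name, tail in tails.items():
    print(name, "tail:", tail, "total:", primer_lengths[name] + tail)
```

In a real design, melting temperatures, primer interactions, and dye-dependent mobility would also have to be considered; this only shows the size-spacing idea.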
So how does it work in terms of forensic assays? Quite well. It's very sensitive and reproducible down to a single-picogram level. This is a real strong signal we're getting from some control DNA at 1 picogram. Parsons: Slide 27
How does it do for mixtures? It does quite well; in fact, it's better than DNA sequencing, which is a good thing. Here we see a 50–50 mixture between two individuals that differ at four of the sites, and you can see they're fairly balanced in peak heights. A 90–10 mixture allows you to detect heteroplasmy as well. That is as good as it gets for sequencing in general. Parsons: Slide 28
One attribute about this for heteroplasmy that's particularly good is the fact that the alternative base has a slightly shifted mobility, so you don't have this spectral bleed-through issue that makes it difficult to distinguish between background and true mixture that you get in DNA sequencing. So that's a real good thing, too. Parsons: Slide 29
Plus, your variants only pop up in a particular site. So even if you have a little background in your assay, if it occurs where no base is supposed to be, you still have a pretty good idea that it can be ignored as you go through your interpretation.
At this point, we're about one-third done with the whole thing, because you haven't gotten anywhere until you've gotten actual casework samples that you would encounter normally. Multiplexing particularly can throw some weird things at you, and you do get the standard kind of things that one sees.
Here we have a particular site that is showing a great deal of reduced amplification. This is just a baby picture that I tossed up there; we'll go through it a little bit more systematically in some of the subsequent work we've done. Parsons: Slide 30
This is a degraded sample extract that wouldn't work for STRs. It's one of AFDIL's reasonably decent cases, one that had to go to primer sets in order to amplify it (a primer set giving an amplicon of about a couple of hundred base pairs in length). Early on in the trials, we tried it with a multiplex: with 1 microliter of DNA we saw a flat line; with 0.5 microliters, a flat line; and with 0.2 microliters, some loci started to come up.
So what does that tell you? Well, you've got an assay that's pretty sensitive to inhibition and darn sensitive in general. You're getting a result from 0.2 microliters of a fairly difficult forensic case, so good and bad. Parsons: Slide 31
But the good news is that these things can be made substantially more robust about as easily as I've ever seen anything get optimized.
Here we have simply increased the Taq and brought things up to a 25-microliter reaction. As you can see, we're getting very robust results from the very same extract of casework samples throughout the range of concentrations. And instead of less being more, we have more being slightly more, which means we've gotten over the inhibition hurdle in this particular case. Parsons: Slide 32
But you have to use the type of savvy you would always use in trying to balance the amount of sample that you would use. This is where we'll get into discussing how interpretation thresholds and that kind of thing can be kind of similar to STRs. Here is a sample that had one of the higher dilutions, partly inhibited, and resulted in a partial profile. You blow that up and look down at the baseline, and you can start to see other background peaks pulling up around the same height as some of the real alleles. Parsons: Slide 33
As I mentioned earlier, you know where the real alleles are going to be, so you're pretty comfortable ignoring these little things. But nevertheless, you need to develop some kind of a validated threshold. We're working through forensic protocols to tighten all this up, but it's really quite straightforward. I think any lab would need to go through standard internal validation to make those kinds of conclusions themselves, just as they do for STRs.
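The interpretation logic described here, where only peaks at expected positions and above a validated threshold are called, might look roughly like this. The bin boundaries, dye labels, peak heights, and the threshold of 100 are hypothetical placeholders; each lab would set its own values through internal validation.

```python
def call_alleles(peaks, bins, threshold):
    """Call bases for one SNP site.
    peaks: list of (size, dye, height) tuples from the electropherogram.
    bins:  dict mapping base -> (dye, min_size, max_size) where a real allele is expected.
    Peaks outside every bin, or below the threshold, are treated as background."""
    calls = []
    for base, (dye, low, high) in bins.items():
        for size, peak_dye, height in peaks:
            if peak_dye == dye and low <= size <= high and height >= threshold:
                calls.append(base)
                break
    return calls


# Hypothetical bins for a C/T site and three observed peaks.
bins = {"C": ("yellow", 27.5, 28.5), "T": ("red", 28.0, 29.0)}
peaks = [
    (28.1, "yellow", 850),  # strong peak in the C bin
    (31.4, "blue", 60),     # background where no allele is expected
    (28.4, "red", 40),      # in the T bin but below threshold
]
print(call_alleles(peaks, bins, threshold=100))
```

Two calls at one site would flag a possible mixture or heteroplasmy, and an empty result would correspond to a dropped-out site in a partial profile.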
So back to this panel with the multiplexes. Here we have been focusing on multiplex panel A for the most common type. We have developed, designed, and obtained primers for all these different sites throughout the multiplexes and tested about 50 percent of those in singleplex. We're trying to get through this by September [2003] so that we can supply you with readily available forensic protocols. Parsons: Slide 34
I should mention that the SNaPShot assay, in terms of the basic mastermix, is an ABI (Applied Biosystems) product that you can buy as a kit and is quite easy to apply. One would then need to order the specific primers and probes that would be specified in the publication. Maybe someday somebody will put these things together as a kit. We're very eager to see Roche get into it with their linear arrays. In fact, I've been focusing so much on SNP assays, that I may not have stressed the most critical part of the project: genetic data.
It's important for groups like NIJ to continue to fund the quest to ferret out this information. This can be used for any possible SNP assay further down the road. So get the information right and get it out there where people can use it.
As far as publications, our lab, with lead author Pete Vallone at NIST, is presently publishing the SNaPShot assay. This will be ready for immediate submission as soon as I get out of meetings and get back to finishing it up. Then, we are also going to cosubmit the 234 mitochondrial DNA genomes that will deliver the genetic data. Parsons: Slide 35
One direct spinoff of the NIJ funding that we've had here is the collaboration with a lab in Innsbruck, Austria, where we developed SNaPShot assays that allow you to distinguish each of the Caucasian haplogroups using coding region SNP assays. Furthermore, this has a stand-alone discrimination potential that is really quite good, almost similar to the Roche linear array strips.
I will note that the multiplex panels I've been speaking about in this talk are no good by themselves. They're only good as value added to knowing that you already have one of the common types. So they've been very specifically selected.
We'll be very happy and are looking forward to providing the forensic community with forensic-grade validated protocols from AFDIL, so you can go to ABI and get this going in your own labs.
I'd like to acknowledge my coworkers at AFDIL who have done all the work: Mike Coble is a Ph.D. student under my direction, and Jodi Irwin, a research scientist, has been very helpful in bioinformatics. Then this group of people here have all been enabled by the NIJ funding to work as super techs: Rebecca Hamm, Jennifer O'Callaghan, Elona Letmanyi, and Christine Harvie. Timothy McMahon actually is a validation coordinator who's helping us go through the degraded samples, and Jessica Towne has just recently been hired. There is a small fleet of interns from George Washington University and others at AFDIL who are essential. Parsons: Slide 36
I'd like to also acknowledge my colleagues in the field: of course, John Butler and Pete Vallone at NIST, and Anita Brandstaetter, Harold Niederstaetter, and Walt Parson at the Innsbruck lab. For samples, I'd like to thank Connie Fisher at the FBI, as well as others throughout the field who have contributed. And finally, I'd like to thank the National Institute of Justice for funding. Parsons: Slide 37
Thanks very much. Parsons: Slide 38
MR. FRANK: We thank Tom and all the other presenters.

