Fourth Annual DNA Grantees' Workshop
Monday, June 23, 2003
MORNING SESSION
Research Update Briefings: Prototypes and Products (continued)
William Frank, Moderator
MR. FRANK: Good morning, everybody. My name is William Frank. I'm with the Research and Development Laboratory of the Illinois State Police. I'm going to be your moderator for this session.
Our first speaker is Cassandra Calloway. Cassandra received her bachelor's and master's degrees in genetics from the University of Georgia. She is presently working on her Ph.D. in George Sensabaugh's laboratory at the University of California at Berkeley. Her major interests include mitochondrial genome analysis, characterization of polymorphisms, and heteroplasmy in mitochondrial DNA. Cassandra.
Applications and Validation of the Linear Array Mitochondrial DNA HVI/HVII Region-Sequence Typing Kit
Cassandra Calloway
(Note: The PowerPoint presentation that supplements Ms. Calloway's transcript is not available.)
MS. CALLOWAY: Thank you. I'd like to thank the meeting organizers and Lisa and Lois for allowing me to speak to you guys today. When Lisa mentioned earlier that the new grantees would be speaking first, I thought I might have had the wrong date today, because many of you know that we've been working on this project for quite a while now, and we're wrapping things up now.
Some of you may remember from last year that we initiated beta trials with approximately 10 laboratories. These began as training studies; once those were completed, the laboratories went on to independent projects. Currently we're working with nearly 20 laboratories worldwide. What I'd like to do today is use some of these collaborative projects, as well as some of our own research, to illustrate several points, including automation; alternative, simplified, or low-cost methods; and developmental validation.
As many of you may know, we are using linear array technology, which is similar to the PM/DQA1 technology, or reverse dot blot, to target mitochondrial DNA variation in hypervariable regions I and II (HVI and HVII). The assay works on a variety of samples, including hair, bone, teeth, blood, and saliva, which may be targeted for mitochondrial analysis when no nuclear DNA is available or when it is highly degraded.
Once you extract the DNA, you can proceed to amplification. We use biotinylated primers in a duplex primer system, which reduces the amount of material needed so that you consume less DNA. Once you amplify your DNA, you quantitate the product. We use and recommend a basic gel method, comparing the product with a mass ladder to determine the quantity of PCR (polymerase chain reaction) product. In a few moments I'll talk about alternative methods that some of our collaborators have come up with to increase throughput.
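The mass-ladder comparison is normally made by eye. Where a lab records band intensities (for example, by densitometry), the same comparison can be written as interpolation between the two ladder bands that bracket the sample band. The sketch below assumes hypothetical intensity readings and a ladder whose intensity rises monotonically with DNA mass; the function name and the ladder values are illustrative, not part of the kit protocol.

```python
from typing import List, Tuple


def estimate_product_ng(band_intensity: float,
                        ladder: List[Tuple[float, float]]) -> float:
    """Estimate PCR product mass (ng) by comparison with a mass ladder.

    `ladder` holds hypothetical (intensity, mass_ng) pairs read from the
    ladder lane. Intensity is assumed to rise monotonically with mass, so the
    sample band is placed between the two ladder bands that bracket it and
    its mass is linearly interpolated.
    """
    points = sorted(ladder)
    for (i_lo, m_lo), (i_hi, m_hi) in zip(points, points[1:]):
        if i_lo <= band_intensity <= i_hi:
            if i_hi == i_lo:
                return m_lo
            fraction = (band_intensity - i_lo) / (i_hi - i_lo)
            return m_lo + fraction * (m_hi - m_lo)
    # Outside the ladder range: extrapolating would be unreliable.
    raise ValueError("band intensity is outside the ladder range")


# Hypothetical ladder with 10, 20, 40 and 80 ng bands.
LADDER = [(100.0, 10.0), (200.0, 20.0), (400.0, 40.0), (800.0, 80.0)]
print(estimate_product_ng(300.0, LADDER))  # 30.0
```

Refusing to extrapolate mirrors the gel practice of re-running a sample whose band falls outside the ladder rather than guessing its quantity.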
Once you quantitate your PCR product and confirm that you have enough, you hybridize the denatured product to the array, interpret the results, make exclusions, and sequence if necessary. Enough PCR product remains after typing that you can go straight to sequencing without a second amplification.
As I mentioned, we would like to extend the assay to all laboratories and offer it as a low-cost alternative to sequencing, since you need only a thermocycler, a water bath, and a gel apparatus. However, some labs would like to use higher-throughput methods. I think one of the great things about working with all these collaborators is that everyone has their own ideas and their own ways of doing things, and they have come up with ways to increase the throughput of PCR quantitation, typing, and interpretation.
One method of quantitation, used by Margaret Kline at NIST (National Institute of Standards and Technology), was the FMBio system. It simply gives a value for the signal intensity, which helps determine whether you can proceed with typing or need to increase the cycle number.
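The decision described here (type now, or re-amplify with more cycles) can be sketched as a small rule. This is a hypothetical illustration: `min_typing_signal` stands for a lab-specific threshold that would have to be set during validation on the instrument in use, and the 38-cycle ceiling is taken from the recommendation made later in this talk for the OCME samples.

```python
from enum import Enum


class NextStep(Enum):
    TYPE = "proceed to linear array typing"
    MORE_CYCLES = "re-amplify with more cycles"
    REVIEW = "cycle limit reached; review the sample"


def next_step(signal: float, min_typing_signal: float, cycles: int,
              max_cycles: int = 38) -> NextStep:
    """Choose the next step after quantitating a PCR product.

    `signal` is an instrument reading of PCR yield (e.g. a band intensity);
    `min_typing_signal` is a hypothetical, lab-validated threshold.
    """
    if signal >= min_typing_signal:
        return NextStep.TYPE
    if cycles < max_cycles:
        return NextStep.MORE_CYCLES
    return NextStep.REVIEW


print(next_step(signal=120.0, min_typing_signal=200.0, cycles=34).value)
# re-amplify with more cycles
```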
Similarly, the Phoenix Police Department used the Kodak imaging system, which also gives values for PCR yield. NIST and the FBI use the Agilent 2100 Bioanalyzer, a DNA chip system. It also gives more quantitative values than simply comparing the gel bands to a mass ladder.
The linear array hybridization step can also be automated using the Tecan Profiblot system, which many of you may have used with the PM typing strips or typing arrays. This process was also validated by Margaret Kline at NIST. She initially typed control samples and compared the results with the manual method to make sure everything was working properly, and then typed 666 samples for a population database using the Profiblot.
Interpretation of the linear arrays can also be automated, and we are currently working with our bioinformatics department to develop simple image-analysis software. You would place the developed arrays in a plastic holder and invert them on a scanner. The scanned image (in JPEG format) would then be loaded into the software, which would identify where the probe signals should be and measure the signal intensity at each position, giving a value whether there is no signal or some signal. Once you have those values, a set of rules we have developed, implemented in another software program, converts them into a mitotype. This software would also ease interpretation and increase throughput.
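The talk does not spell out the rules that turn probe values into a mitotype, so the following is only a hypothetical sketch of what such a step could look like. It assumes each probe region has numbered sequence-specific probes (single digits), intensities already normalized to 0 to 1, and an illustrative positive threshold of 0.5. A region with no signal is called "0"; a region with more than one signal (heteroplasmy or a mixture) is written in brackets, a notation invented here for clarity.

```python
from typing import Dict, List


def call_region(intensities: Dict[int, float], threshold: float) -> str:
    """Call one probe region from normalized probe intensities.

    Keys are probe numbers within the region (assumed single digits 1-9);
    values are scanned signal intensities scaled to 0-1.
    """
    positive = sorted(p for p, v in intensities.items() if v >= threshold)
    if not positive:
        return "0"                      # no probe signal in this region
    if len(positive) == 1:
        return str(positive[0])         # single, clean signal
    # More than one probe lit: heteroplasmy or a mixture.
    return "[" + "/".join(str(p) for p in positive) + "]"


def mitotype(regions: List[Dict[int, float]], threshold: float = 0.5) -> str:
    """Concatenate per-region calls into a string such as 1111111."""
    return "".join(call_region(r, threshold) for r in regions)


anderson_like = [{1: 0.9, 2: 0.05}] * 7
print(mitotype(anderson_like))                          # 1111111
print(mitotype([{1: 0.8, 2: 0.6}, {1: 0.1, 2: 0.2}]))   # [1/2]0
```

A single global threshold is the simplest possible rule; a validated program would likely need per-probe thresholds and a separate category for weak signals.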
The Phoenix Police Department, as part of their independent project, suggested using the chemiluminescence method coupled with the Kodak imaging system to increase throughput and ease interpretation. When nuclear DNA STR (short tandem repeat) analysis failed on two airbag samples, they turned to mitochondrial amplification and typing. Even at 34 cycles, they had sufficient product to type both airbag samples. One was a mixture; the other gave a single type that did not match any of the known reference samples.
One hurdle they had to overcome with the Kodak imaging system was that they needed a ladder: a strip or array on which every probe signal is visualized. So they mixed 10 different DNA samples that together light up each of the probe signals on a single array and used this as a ladder. If you'd like to know more about this, you can speak to Dave Johnson of the Phoenix Police Department.
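Building such a composite ladder amounts to choosing samples whose pooled signals cover every probe. The sketch below uses greedy set cover, a standard heuristic swapped in here for illustration; the talk does not say how the 10 samples were chosen. Sample IDs and probe names are hypothetical.

```python
from typing import Dict, List, Set


def pick_ladder_samples(candidates: Dict[str, Set[str]],
                        all_probes: Set[str]) -> List[str]:
    """Greedily choose samples whose pooled signals light every probe.

    `candidates` maps a sample ID to the set of probes that sample lights up.
    Greedy set cover always succeeds when some combination of candidates
    covers all probes, but it does not guarantee the smallest pool.
    """
    uncovered = set(all_probes)
    chosen: List[str] = []
    while uncovered:
        if not candidates:
            raise ValueError("no candidate samples supplied")
        # Pick the sample that lights the most still-uncovered probes.
        best = max(candidates, key=lambda s: len(candidates[s] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError(f"no sample lights probes {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


samples = {"S1": {"1A", "2A"}, "S2": {"2A", "3A"}, "S3": {"3A", "4A"}}
print(pick_ladder_samples(samples, {"1A", "2A", "3A", "4A"}))  # ['S1', 'S3']
```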
Now I'd like to shift to developmental validation, again drawing on the collaborative studies as well as some of our own research. I'll quickly go through characterization of the genetic markers, population studies, species specificity, and degraded DNA, including World Trade Center samples from the Office of the Chief Medical Examiner (OCME) in New York City.
As I mentioned earlier, we are targeting hypervariable regions I and II of the mitochondrial genome; specifically, we look at 19 positions in these regions. As you may have read, back in 1991 Mark Stoneking analyzed 50 samples and identified the most polymorphic regions. These are included in the array, and we've since added more polymorphisms.
As part of the SWGDAM (Scientific Working Group on DNA Analysis Methods) guidelines, we want to be able to determine the frequency of the different polymorphisms. To do this, we looked at a population database of 674 samples of our own. In addition, as I mentioned, Margaret Kline at NIST looked at 666 samples. So we have a total of 1,340 samples.
Here is just a portion of the allele frequencies, specifically for the IIB probes. There are seven probes in this region, but there are more than seven possible types. That is because you can get a zero signal, where multiple polymorphisms underlie the probe and destabilize the binding. Alternatively, a weakly destabilizing mismatch can produce a weak signal, but such signals are rare because we went to great lengths to minimize the potential for weak signals.
Another category is multiple signals within a region, reflecting heteroplasmy or a mixture. For this particular region, around 1 percent of the samples were observed to be heteroplasmic.
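Because a region can yield a probe call, a zero signal, or multiple signals, a frequency table for that region simply treats each outcome as its own category. A minimal sketch, assuming calls are already recorded as strings (the bracketed notation for multiple signals is hypothetical) and using invented counts for illustration only:

```python
from collections import Counter
from typing import Dict, List


def call_frequencies(calls: List[str]) -> Dict[str, float]:
    """Relative frequency of each call observed for one probe region.

    Calls such as "1", "2", "0" (no signal) or "[1/2]" (multiple signals)
    are all counted as separate categories.
    """
    if not calls:
        raise ValueError("no calls supplied")
    counts = Counter(calls)
    total = len(calls)
    return {call: n / total for call, n in counts.items()}


# Hypothetical region typed in 100 samples, one showing two signals.
calls = ["1"] * 70 + ["2"] * 25 + ["0"] * 4 + ["[1/2]"]
freqs = call_frequencies(calls)
print(freqs["[1/2]"])  # 0.01
```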
We also used the population database to calculate the discrimination power, which we found to be relatively high for four different populations. We also found a number of unique types in each population, which increases the discrimination power.
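The talk does not state the formula used. One widely used definition of discrimination power is the probability that two randomly chosen profiles have different types, 1 - sum(p_i^2) over the type frequencies p_i, sometimes multiplied by n/(n - 1) so that it equals that probability for two distinct individuals drawn without replacement. The sketch below assumes that definition; it also shows why unique types raise the value, since a type seen once contributes only (1/n)^2 to the sum.

```python
from collections import Counter
from typing import Iterable


def discrimination_power(types: Iterable[str], unbiased: bool = False) -> float:
    """Probability that two randomly drawn profiles differ in type.

    Uses 1 - sum(p_i^2) over observed type frequencies p_i. With
    unbiased=True the result is scaled by n / (n - 1), giving the probability
    for two *different* individuals drawn without replacement.
    """
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two profiles")
    match_probability = sum((c / n) ** 2 for c in counts.values())
    dp = 1.0 - match_probability
    return dp * n / (n - 1) if unbiased else dp


print(discrimination_power(["A", "A", "B", "C"]))  # 0.625
```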
I also looked at the different types, the number of samples, and the unique samples among the 666 samples from NIST, and they are similar to ours, as expected. Interestingly, many types in this combined population of 666 are unique: 185 of the 666 samples had a type observed only once. One type (the common Caucasian type) was observed 51 times, or about 7.7 percent. You may hear more about this from Tom Parsons later in the session.
In our population, the common type (the 1111111, or Anderson, type) was observed in around 10 percent of samples. We sequenced these samples to see how well the array discriminated within this type. Although sequencing gives more information, only an additional 50 percent, or 11 of those samples, were further differentiated by sequence analysis. The linear array is therefore a good screening tool. In some cases, common types can't be discriminated even by sequencing HVI and HVII, so there is a need to look outside these regions.
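Summary figures like these (samples with a type seen only once, frequency of the most common type) follow directly from a list of mitotypes. A minimal sketch; the input below is synthetic, built only to reproduce the arithmetic of one type seen 51 times among 666 samples, and is not the NIST data.

```python
from collections import Counter
from typing import Dict, List, Union


def summarize_types(types: List[str]) -> Dict[str, Union[int, float, str]]:
    """Basic population-database summary for a list of mitotypes."""
    if not types:
        raise ValueError("no types supplied")
    counts = Counter(types)
    n = len(types)
    top_type, top_count = counts.most_common(1)[0]
    return {
        "samples": n,
        "distinct_types": len(counts),
        # Types observed exactly once (each is one sample).
        "singleton_types": sum(1 for c in counts.values() if c == 1),
        "most_common_type": top_type,
        "most_common_freq": top_count / n,
    }


# Synthetic example: one type 51 times plus 615 types seen once each.
synthetic = ["TYPE_A"] * 51 + [f"U{i}" for i in range(615)]
summary = summarize_types(synthetic)
print(round(summary["most_common_freq"], 3))  # 0.077
```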
To determine species specificity, we amplified 14 different species of animals that you are most likely to come across in a typical crime scene, plus humans, chimpanzees, gorillas, and orangutans.
We looked at DNA inputs of 500, 50, and 5 picograms. Although 5 picograms was our typical input, we wanted to see whether anything would appear at the higher inputs. We did see product on the gel for the chimpanzees, orangutans, and gorillas. Although there is a product visible on the gel for the rat at the highest input, it is not the same size as the human mitochondrial DNA products; it is much smaller.
The chimpanzee and gorilla products are similar in size to the human HVI and HVII products. Specifically, the gorilla HVII product is similar in size to the human HVII product, but its HVI product is much smaller.
We proceeded to type the 14 species at the 3 inputs to determine whether any signals would be visible on the linear array. There were visible signals for the chimpanzee and the gorilla. With the orangutan, there was a very weak signal at 500 picograms. The other 11 species, including the rat, showed no signals; the rat product seen on the gel is most likely nonspecific, since we see it only at very high input. We're now sequencing these products and hope to report what they match at another time.
In collaboration with the California Department of Justice, we looked at DNA degraded by DNase I treatment; longer treatment times give more degraded DNA. After about 120 minutes, most of the DNA fragments were smaller than 500 base pairs, the lowest band on the ladder, and at 300 minutes (5 hours) no products were visible on the gel and most of the DNA was degraded.
We then amplified at 34 and 38 cycles. As I mentioned, after 120 minutes most of the DNA was degraded below the size of our HVI and HVII targets. Even at 34 cycles, there is visible product on the gel; however, the product from the most completely degraded DNA falls below the detection limit of the linear array. At 38 cycles, which is still below the standard 40 cycles, even the completely degraded DNA produced product visible on the gel.
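Why a few extra cycles matter can be seen from the idealized exponential model of PCR, in which product grows as N0 * (1 + E)^n for per-cycle efficiency E between 0 and 1. This is a textbook approximation, not a model used in the talk: real reactions plateau and can be inhibited, so the figure is a best case. Under that assumption, going from 34 to 38 cycles multiplies the yield by at most 2^4 = 16.

```python
def fold_increase(extra_cycles: int, efficiency: float = 1.0) -> float:
    """Best-case extra yield from additional PCR cycles.

    Assumes exponential amplification, N_n = N_0 * (1 + E)^n, with a constant
    per-cycle efficiency E between 0 and 1. Plateau effects and inhibition
    are ignored, so real gains will be smaller.
    """
    if extra_cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("extra_cycles must be >= 0 and efficiency in [0, 1]")
    return (1.0 + efficiency) ** extra_cycles


print(fold_increase(38 - 34))  # 16.0
print(fold_increase(4, efficiency=0.9))
```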
These samples were then typed on the linear arrays. As expected, at 38 cycles all products gave types. At 34 cycles there was no signal for the most degraded DNA, but this wasn't unexpected, since that product was below our lower limit of detection.
The samples were then sequenced. At 34 cycles, even though it gave no type on the array, the most degraded DNA was sequenced because it showed product on the gel. And even though it was below the 10 nanograms of PCR product suggested for sequencing analysis, an interpretable sequence was still obtained.
One thing to note: we tend to assume that a degraded DNA sample will give poorer sequence quality. But even for the degraded DNA samples, you can still get a good signal-to-noise ratio, with little or no background noise.
Let's move on to degraded DNA samples from the New York City OCME. As you know, after September 11th they faced the daunting task of identifying the remains of nearly 3,000 individuals from more than 19,000 samples. They have analyzed far more than this, but for this particular project they looked at 300 cases and obtained usable STR profiles for 75 percent of them. A subset of the remaining 25 percent was then amplified and analyzed on the linear array using the HVI mtDNA kit, without consuming any additional bone matrix.
They analyzed 54 cases and were able to successfully amplify 19 samples at 34 cycles. (We recommend that you go up to 38 cycles.) Twenty-four additional samples yielded visible product on the gel, and there was no visible product in 12 samples.
We are working together on several approaches to get the 12 samples with no visible product to amplify, including adding BSA and lowering the sample input in case an inhibitor is preventing amplification. Among the samples that gave visible product, 28 showed full types when typed, and 5 of these were mixtures, which is not unexpected for the kind of samples being identified. Nine were inconclusive; even though there was visible product on the gel, some of them were below our target level. There were 12 negatives, as expected, since there was no visible product on the gel. We don't recommend proceeding to linear array typing when there is no visible product; we did so here only for this study, to see what we could find.
Among the results in this group of 15 samples, 6 had the same type. I found this interesting because, as you'll remember from the population data, it's not unexpected for some samples to share a type, but shared types are usually very common ones: the very common Caucasian type, the 111, or some other common type.
But looking at this type, I realized I had never seen it before. I queried our database and the combined database that includes the 666 samples from NIST. Of the samples that shared profiles, 7 of the 28 had this type, and it wasn't observed at all among the 1,340 individuals. So it's likely that these samples come from the same individual rather than from seven separate individuals or even a few different individuals.
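A type that never appears in a database of 1,340 profiles is rare, but zero observations do not mean a frequency of zero. One standard way to put a number on "not observed" (not a method described in the talk) is a one-sided binomial upper confidence bound: the largest frequency p for which seeing no copies in n profiles still has probability at least 1 - confidence, i.e. solving (1 - p)^n = 1 - confidence. The sketch assumes the database is a random, representative sample.

```python
import math


def upper_bound_unobserved(n: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the frequency of a type seen
    0 times among n independent, representative profiles.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    if n <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("n must be positive and confidence in (0, 1)")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


bound = upper_bound_unobserved(1340)
print(f"{bound:.5f}")  # 0.00223
# The "rule of three", 3 / n, is a close approximation for large n.
print(math.isclose(bound, 3 / 1340, rel_tol=0.01))  # True
```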
Again, another type was observed four times, and it also wasn't observed among the 1,340 samples. In fact, none of the types observed multiple times here appear in the population database. So these types are thought to be unique, and from this the researchers concluded that the array would be a very useful screening tool, especially for commingled samples where you're trying to differentiate individuals, or where multiple samples may come from the same individual.
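The screening logic described here, grouping samples that share a type and flagging shared types that are absent from the population database, can be sketched as follows. Sample IDs, type labels, and database counts are hypothetical. A flagged group is only a lead: because mitochondrial DNA is maternally inherited, maternal relatives share a type as well.

```python
from collections import defaultdict
from typing import Dict, List


def shared_unseen_types(sample_types: Dict[str, str],
                        database_counts: Dict[str, int]) -> Dict[str, List[str]]:
    """Return types shared by 2+ samples that never occur in the database.

    `sample_types` maps sample ID -> mitotype; `database_counts` maps
    mitotype -> number of database profiles with that type.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, mtype in sample_types.items():
        groups[mtype].append(sample_id)
    # Sharing a common type is expected; sharing an unseen one is notable.
    return {
        mtype: sorted(ids)
        for mtype, ids in groups.items()
        if len(ids) > 1 and database_counts.get(mtype, 0) == 0
    }


samples = {"R1": "TYPE_X", "R2": "TYPE_X", "R3": "1111111",
           "R4": "1111111", "R5": "TYPE_Y"}
db_counts = {"1111111": 100}  # hypothetical database count
print(shared_unseen_types(samples, db_counts))  # {'TYPE_X': ['R1', 'R2']}
```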
In conclusion, as I mentioned earlier, although we'd like to provide a very low-cost method and make this assay available to anyone, some laboratories would like to increase the throughput of PCR quantitation, linear array typing, and interpretation.
Additionally, the duplex amplification system is very sensitive and robust, it successfully amplifies degraded DNA, and it could be useful for screening samples from mass disasters or mass graves.
Finally, the array is a very sensitive method and has a high discrimination power, as you saw from the population studies.
I'd like to thank all the current and former members of our group, as well as the members of Roche Applied Science who are working to transfer this to manufacturing, and of course each of the 10 laboratories that served as beta sites. Unfortunately, I wasn't able to share everything from each of the labs today, but maybe another time we'll get to share the other data.
Of course, I'd like to thank the NIJ, particularly Lois and Lisa, for the long-lived grant here.
Finally, in case you would like to get in touch with the individuals who generated the data, they are Rom Kishore, Mavis, and Margaret Kline at NIST; Zoren at the New York City OCME; and Dave Johnson at the Phoenix Police Department.
From here, I'll end, and Mark Timken will be coming up here to discuss his work at the California Department of Justice using the linear array. Thanks.
MR. FRANK: To get back on schedule, we're going to hold questions for the speakers until the end of this session and then take questions for all of the speakers.

