Fourth Annual DNA Grantees' Workshop
Monday, June 23, 2003
AFTERNOON SESSION
Validation Study of the TrueAllele Automated Data Review System
Barry W. Duceman
Biography
MS. TOMSEY: Next we have Barry Duceman from New York State Police Investigation Center. He's currently an adjunct professor in the Departments of Biological Science and of Biomedical Science at the State University of New York.
Barry also is a member of the World Trade Center DNA Data Analysis Panel and a past member of SWGDAM (Scientific Working Group on DNA Analysis Methods), the National Transportation Board of Mass Fatality Planning Panel, and the FBI's CODIS Design Review Panel. Dr. Duceman's credentials include a doctorate from Penn State University and 5 years' experience as a postdoctoral researcher in the Department of Human Genetics at Yale University.
Today, Dr. Duceman will tell us about his validation study of the TrueAllele Automated Data Review System.
DR. DUCEMAN: Thank you. Duceman: Slide 1
Nine months of full-time validation effort compressed into five minutes. No time for lawyer jokes.
We've just completed our validation of TrueAllele for use in our convicted-offender DNA database. The TrueAllele study is now with the FBI, and after all this time, they'll tell us whether or not we'll actually be able to use it. We're very optimistic. The process has been going very well. Cybergenetics is a great company to work with, and I think, as we go through this, you'll see that this databank application will have some utility in casework as well. Duceman: Slide 2
Our goal in validation is basically to automate our process as much as we can. We're trying to become a high-throughput lab. Our definition of a high-throughput lab right now is about 50,000 convicted-offender DNA patterns on a yearly basis. We quickly recognized that there are some significant bottlenecks, and one of them is the quality data review.
It makes no sense to be generating thousands of profiles just to have forensic scientists sitting in front of the computer screen, dooming them day in and day out to look at the quality of these data. We think that's a tedious, potentially error-prone process, and we're trying to make their life easier and also make sure that we're confident of the data we're bringing into CODIS.
My folks are very protective about their data, as most of us are. So we wanted to make sure that when we brought this online, everybody was going to be comfortable. That's why we undertook an extensive large-scale validation, and that's what I'd like to share with you for the next few minutes.
Before I do that, let me give you a quick, 30-second introduction to TrueAllele. TrueAllele is basically a bunch of subroutines. There are several others, but these are some I thought might be the most interesting. Duceman: Slide 3
The control check, for example, which satisfies DNA Advisory Board standard 9.5.1, looks at various control lanes, positive and negative controls, and 9947 values and tells you if everything is copacetic in a particular gel.
Allele Call is particularly interesting. The Allele Call program identifies and quantitates the peaks and makes the call. But what's really helpful is it prioritizes the quality of the peak. Right now we use a scale between zero and one, and the higher the quality, the better the gels. The whole point of this validation is to determine where in that scale we're comfortable with letting the data scoot right into CODIS. Duceman: Slide 4
The idea is to look only at the poor-quality data. It makes no sense to look at good data. We look at the poor-quality data in a program called Allele View.
The first part of our validation is broken down into two phases: optimization and a concordance study. In the optimization phase, we establish the quality-control rules. What's nice about TrueAllele is it's custom configured. When we begin the process with them, we give them our standard operating procedures and some data, we tell them what we look at, and we tell them our RFU cutoffs and what we use for heterozygous imbalance. They then build that into the program for us, so it's customized. Duceman: Slide 5
Basically, what they set up for us is 24 rules that define what's a good gel and what's a bad gel. If you go through those rules and spend time studying them, you'll see that everything you're looking at is in those rules. The challenge is to decide when those rules should fire on your data, and to do that we started this large-scale optimization study.
As I mentioned, we gave them our SOP and a bunch of data, and they gave us a template. The template basically was those rules with our settings and the quality review value. The Allele Call program assigns a quality score to each run. Duceman: Slide 6
In the first dataset, we're looking to determine whether or not we agree with the calls of the program. So, we have an analyst looking at every allele plus TrueAllele looking at every allele, and we're tweaking the software. In the first dataset, we looked at 7,000 alleles. We gave that information back to Cybergenetics, they did the tweaking of the software, and they gave us the second dataset. We looked at an additional 7,000 alleles plus the original, so at that particular stage we're looking at 14,000 alleles. Duceman: Slide 7
We made some final tweaks to the software, came back with template three, and added another 7,000 alleles to the original 14,000. By the time we were done, we had looked at 42,000 alleles. At that point, we were comfortable that we could define a quality score above which we did not have to look at the data any more. The quality score was 0.3 on a scale of zero to one.
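[Editor's illustration] The review cutoff described above can be sketched as a simple triage step. This is a minimal sketch, not TrueAllele's actual interface: the `AlleleCall` record, the `triage` function, and the sample data are hypothetical, and sending a score of exactly 0.3 to review is an assumption, since the talk only says that scores above the cutoff no longer needed to be looked at.

```python
from dataclasses import dataclass

# Cutoff reported in the talk: scores above 0.3 (scale 0 to 1) skip human review.
REVIEW_THRESHOLD = 0.3


@dataclass
class AlleleCall:
    """Hypothetical record for one allele call; not a TrueAllele data structure."""
    sample_id: str
    locus: str
    alleles: tuple
    quality: float  # 0.0 (worst) to 1.0 (best)


def triage(calls, threshold=REVIEW_THRESHOLD):
    """Split calls into those accepted automatically and those needing review.

    Assumption: a score exactly equal to the threshold is sent to review.
    """
    accepted = [c for c in calls if c.quality > threshold]
    needs_review = [c for c in calls if c.quality <= threshold]
    return accepted, needs_review


# Made-up example data for illustration only.
calls = [
    AlleleCall("S1", "D3S1358", ("15", "16"), 0.92),
    AlleleCall("S2", "vWA", ("17",), 0.30),
    AlleleCall("S3", "FGA", ("21", "24"), 0.12),
]
accepted, needs_review = triage(calls)
```

Only the calls in `needs_review` would go to an analyst; everything in `accepted` would pass straight to the output file.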
So we had optimized software, we knew the quality scale, and we thought we were ready to start the concordance phase. In this case, we're processing a large amount of data (2,048 convicted-offender profiles), and we're doing it by our old process, which is Genotyper and human review twice. In our databank, we do Genescan, Genotyper, and a human review and go back and complete that process for all the data; or we use TrueAllele with one reviewer only looking at the bad data. So the comparisons are either TrueAllele with one person looking at 10 or 15 percent of the data before they go into an output file or Genescan-Genotyper looking at all the data twice. Duceman: Slides 8 and 9
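[Editor's illustration] The difference in review workload between the two processes can be worked out from the figures above. This is a rough sketch that assumes one "review" means one analyst looking at one profile; the function name and the per-profile counting are assumptions, not something stated in the talk.

```python
def review_workload(profiles, old_passes=2, flagged_fractions=(0.10, 0.15)):
    """Return (old, new_low, new_high) counts of profile reviews.

    old: every profile reviewed `old_passes` times (Genotyper plus two human reviews).
    new: one reviewer looks only at the flagged fraction (10 to 15 percent).
    """
    old = profiles * old_passes
    new_low = round(profiles * flagged_fractions[0])
    new_high = round(profiles * flagged_fractions[1])
    return old, new_low, new_high


# 2,048 convicted-offender profiles, as in the concordance study.
old, new_low, new_high = review_workload(2048)
print(old, new_low, new_high)
```

Under these assumptions, the old process needs 4,096 profile reviews, against roughly 205 to 307 with TrueAllele.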
Now, Cybergenetics was very accommodating and actually developed for us a piece of software called Autovalidate. Autovalidate looks at the overlapping loci in Cofiler and Profiler and makes sure they're all copacetic. It also looks at the output files for the concordance in allele calls from the Genotyper and TrueAllele programs. Are the same alleles called? It also looks at reasons for rejecting and accepting and prints out results in some very nice reports. Duceman: Slide 10
The bottom line is that we had 99.8 percent concordance of the calls between the two programs. That's actually a very conservative estimate because, in some cases, what we're calling nonconcordance is actually differences in the way the software presents the data. Duceman: Slide 11
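[Editor's illustration] A concordance figure like the one above can be computed by comparing the two programs' calls locus by locus. The sketch below is not the Autovalidate software; the function names and data are hypothetical. The `normalize` step shows one assumed example of a presentation difference (a homozygote written as one allele or as the same allele twice), which may or may not match the differences the speaker had in mind.

```python
def normalize(alleles):
    """Assumption: ("17",) and ("17", "17") are the same homozygous call."""
    return tuple(sorted(set(alleles)))


def concordance(calls_a, calls_b):
    """Compare two {(sample, locus): alleles} mappings.

    Returns the fraction of shared (sample, locus) keys with matching calls
    and the list of keys where the calls differ.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    discordant = [
        key for key in shared
        if normalize(calls_a[key]) != normalize(calls_b[key])
    ]
    rate = (len(shared) - len(discordant)) / len(shared) if shared else 0.0
    return rate, discordant


# Made-up calls for illustration only.
genotyper = {
    ("S1", "D3S1358"): ("15", "16"),
    ("S1", "vWA"): ("17", "17"),
    ("S2", "FGA"): ("21", "24"),
    ("S2", "TH01"): ("6", "9.3"),
}
trueallele = {
    ("S1", "D3S1358"): ("15", "16"),
    ("S1", "vWA"): ("17",),
    ("S2", "FGA"): ("21", "25"),
    ("S2", "TH01"): ("6", "9.3"),
}
rate, discordant = concordance(genotyper, trueallele)
```

In this toy data the vWA calls differ only in presentation and count as concordant, while the FGA calls are a genuine disagreement that would need a human decision.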
The only significant difference is in one case where we had a spike that was actually called a peak by our reviewer using Genotyper. However, when it was looked at in TrueAllele, it was detected as a spike and called correctly. So, as far as I'm concerned, we have 100 percent concordance between these two programs, and where there was a difference, we sided with TrueAllele. Duceman: Slides 12 and 13
In terms of why samples are rejected and accepted, there are differences between the two programs. Part of that is because we put some constraints on our Genotyper approach because we had to follow our standard operating procedures. All in all, we found that TrueAllele allowed us more leeway in terms of the RFU range and that it always brought the bad data to attention. Duceman: Slide 14
We had 100 percent concordance, and whenever there was bad data, it was brought to our attention for further examination. Bottom line: We figure this program enables accurate calling of the alleles and successfully performs the quality review. It uses your own criteria, which is nice, and it eliminates the need for a full technical review, so the analyst only has to look at the questionable data. Duceman: Slide 15
As I mentioned, we're at the point right now where we're waiting for a green light from the FBI. Duceman: Slide 16
Thank you.

