National Commission on the Future of DNA Evidence

P R O C E E D I N G S
Sunday, April 9, 2000

Research and Development Working Group Report
Dr. James Crow
Chair

DR. CROW: I do want to get to the substance of what's been written in the report, but I want to make some more general kind of comments first. Our job is, as you know -- was to look for ten years into the future and try to make some kind of predictions as to what will happen.
I want to start out, however, by explaining why you get this report in two parts. It's a short story but I'm going to make it long. I had written a draft of this statistic section and then one of our committee members, Bruce Weir, read it and didn't like it and suggested a large number of amendments, which added to its length, so I, being a good Joe, put those in.
And then I got -- well, to coin an expression first invented by Isaac Newton, I got an equal and opposite reaction from Bruce Budowle. In fact, equal and opposite is a gross understatement of the reaction that I got. So I then tried to write something that both Bruces would agree to. Well, I wrote a lot but it turned out neither of them agreed to it at all.
So I finally got bailed out of this by Lisa. She said let's take out everything that's at all controversial and that's at all difficult and put it in the appendices, then whoever wants to read the appendix can read it, and make a baby version, you might say, for the report itself.
So I sat down the other evening and did that, and it's a rush job and you have it in front of you. I suggested -- I'm going to talk mostly about other things now because if you have a chance tomorrow -- and you can look at this overnight and make whatever comments you would like to make about what's really otherwise a rather rough draft. This has not been seen by the working group. This particular chapter is my own writing and I have no guarantee that they'll approve of it.
A little bit of philosophical remarks, if I can so dignify them. This Human Genome Project is going to be finished in a few years, and we'll have an enormous amount of information and an enormous number of new loci that probably will be cheap, but they'll be there. There are 3 billion base pairs per gamete, as you people have heard time and again, and if we look at single nucleotide differences, they occur about one in a 1,000.
That means that if we could sequence 10,000 bases, we'd have 100 differences, and a 100 differences conservatively guessed as to would have a probability of the reciprocal of ten to the 31st power. So we're getting into astronomical numbers in a hurry whenever DNA sequencing becomes something that's practical. And if you read the papers, they tell you that in ten years from now every school boy is going to be able to sequence his own DNA.
I'm not sure we want this. He may sequence his own or he may sequence his father's and find some discrepancy. That's almost certain to happen sooner or later. But the -- I say all this not to just impress you with gee whiz stuff, but I do think the science of this is moving very fast, and although I think the 13 loci are here to stay for a while, they're not going to be the state of the art for very long.
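The sequencing arithmetic in these remarks can be checked with a short Python sketch. The function names are illustrative, and the only inputs are the figures quoted above (about one difference per 1,000 bases, and a combined probability of one in ten to the 31st power). At the stated rate, 100 expected differences corresponds to roughly 100,000 sequenced bases, and 10,000 bases to about 10, so the figures as transcribed may not line up exactly.

```python
# Rate quoted in the remarks: about one single-nucleotide difference per 1,000 bases.
SNP_RATE = 1 / 1000


def expected_differences(bases_sequenced, rate=SNP_RATE):
    """Expected number of single-nucleotide differences in a stretch of sequence."""
    return bases_sequenced * rate


def implied_per_site_probability(combined_probability, sites):
    """Per-site match probability that, multiplied over `sites` sites,
    gives the combined probability (assumes the sites are independent)."""
    return combined_probability ** (1 / sites)


if __name__ == "__main__":
    print(expected_differences(10_000))              # differences in 10,000 bases
    print(expected_differences(100_000))             # differences in 100,000 bases
    print(implied_per_site_probability(1e-31, 100))  # per-site value behind 10**-31
```

Under these assumptions, a combined probability of ten to the minus 31st over 100 independent sites implies a per-site match probability a little under one half.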
To go on though, STRs really work very well, and I tried to say that in the report. And the 13 core loci are established, and they're certainly established as far as databases are concerned, and there's a large investment by laboratories and by the databases in these 13 loci, so they're here to stay whether they're the best or not. And actually, as I said, I think they're pretty good, but that doesn't mean that there won't be better things coming along, and it's almost certain that individual laboratories will be taking advantage of the better things as they come.
It's not feasible to try to predict everything about it, but I think certain things can be predicted.
First, about the 13 loci themselves, we're certainly going to have advances in automation and speed, and I hope with automation and speed comes a reduction in price, but that's harder to guarantee than the automation and speed. I've read quite a bit about hand-held gadgets that will do the equivalent of the analysis that's done by a laboratory now, do it in a few minutes time, and do it with far less materials, and as far as working in the laboratory is concerned, it works very well.
I'm aware, we all are, that it's a big step from something that can be done in a refined laboratory where you don't mind making mistakes, and something that's being done in the forensic situation where you demand much more perfection and much more robustness and much more ability to work, no matter how badly abused the gadget is. Nonetheless, I'm sure we're going to have within ten years -- at least our committee is -- hand-held machinery that will permit analysis at the crime scene itself and perhaps tie it into a database and find immediate innocence or immediate possibility of a suspect.
A little bit about the kinds of additional loci -- they're here now but they're increasing in numbers very rapidly. The one I just mentioned, what we call SNP -- these are just single nucleotide changes that can be discerned by any technique that can do DNA sequencing, and a lot of this depends on how effective that particular technique is.
Mitochondrial DNA is transmitted from mother to child -- has a big advantage, as we've said several times now, of occurring many times within the cell so that trace amounts of material are often more useful for analysis in this way, and the y chromosome is becoming useful. It used to be that there weren't any markers on the y chromosome and it couldn't be used at all.
And I want to take time to be personal for a moment, not that I haven't already, and that is that about 40 years ago I invented a method of using the person's name as a way of studying migration and inbreeding in populations, and we applied it to some of these isolated religious groups like the Hutterites, and it worked very well. And it's been taken up by anthropologists and became very popular. Well, slightly popular is a much better statement. Two or three anthropologists used it.
But at the time I made the statement somewhere that this technique of using names will be replaced whenever the y chromosome has markers on it, and that state is now here. And those of you that looked at this morning's New York Times saw an article in which a person's done what I'd love to have been able to do myself, really just test to see how well the analysis of names follows the real biological ancestry. We all know it isn't perfect, but we're also impressed by the fact that it's remarkably perfect.
And I find two instances of what seemed to me unusually strict monogamy. One is in the Jewish priesthood, where the Cohen name seems to be transmitted generation after generation without a break. And let me tell you one joke too. I'm trying to stall to keep from getting into the nitty-gritty.
But I told one of my physicist friends that the fact that the Cohens and the related names were perpetuated through -- ever since Aaron with hardly any exceptions must mean that the priesthood was remarkably nearly monogamous and that I had the greatest admiration for their morality. This physicist told his wife about this, and she said that means nothing at all. It only means that if there's any hanky-panky going on, it involves other priests.
(General laughter.)
DR. CROW: That woman has a career in genetics.
The serious part of this message though is that we can use y chromosome markers very effectively for anthropological and evolutionary studies and we can identify people as has in fact happened to these people in the middle of Africa which have some Jewish y chromosomes. But it also gives us a caution that we shouldn't use the y chromosome to determine anything important about anything else except just the y chromosome, because the rest of the genes are randomized and play very little role.
So although I think we'll have a large -- there's a lot of talk now about the y chromosome. I think it's going to play an important role in anthropology and evolution, but it won't very often tell us the kind of things we want to know for forensic purposes.
In the controversy within this committee, part of it had to do with whether you used -- as the FBI advocates as the 1996 committee essentially did -- whether you use essentially the product rule to calculate match probabilities or whether you use these corrections based on assumptions about the population structure. I have both kinds of formulae, and maybe I'll ask you to turn to it for just a moment. This is on page 6 of the new handout. There's an impressive-looking formula at the bottom of the page which comes from Balding and Weir and this group of people, and these are measures of match probability in terms of the allele frequency, which is P here, and then this magic quantity theta, which is a measure of population structure. I think you can see that if theta is equal to zero, this formula up here is simply P squared. And if theta is equal to zero, the second formulae is simply 2P1P2.
So what this is is a correction that is much beloved by some statistical groups and thought unnecessary by others.
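The handout formulas are not reproduced in this transcript. As a sketch, the population-structure correction can be written in the form given in the 1996 NRC report, which is assumed here to be the form on the handout. The function names are illustrative. Setting theta to zero recovers P squared for the first formula and 2P1P2 for the second, as described above.

```python
def match_probability_homozygote(p, theta):
    """Match probability for a homozygous genotype with allele frequency p,
    corrected for population structure by theta (assumed NRC 1996 form)."""
    numerator = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return numerator / ((1 + theta) * (1 + 2 * theta))


def match_probability_heterozygote(p1, p2, theta):
    """Match probability for a heterozygous genotype with allele
    frequencies p1 and p2, corrected by theta (assumed NRC 1996 form)."""
    numerator = 2 * (theta + (1 - theta) * p1) * (theta + (1 - theta) * p2)
    return numerator / ((1 + theta) * (1 + 2 * theta))


if __name__ == "__main__":
    # Illustrative allele frequencies, not values from the report.
    for theta in (0.0, 0.01, 0.03):
        print(theta,
              match_probability_homozygote(0.1, theta),
              match_probability_heterozygote(0.1, 0.2, theta))
```

With theta above zero, both values come out larger than the plain product-rule figures, which is the direction of the correction under discussion.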
Well, is it unnecessary or not? The ones who point out how important it is point out that by not using this, you could perhaps make an error by a factor of a hundred. The rejoinder to that is that if you compute a probability that's 100 quadrillion or that is one in a 100 quadrillion, if that's 100 times too low, who cares. It's still a probability of one in a quadrillion.
And the point that I'm interested in saying about this is that you can make this correction. It makes what looks like a big difference if the probabilities are very, very small. It makes a small difference if they're large, and in any case it would hardly ever cause you to change your conclusion. So what probably sounds like a trivial issue for you people is enough for statisticians to argue about by the hour, and it happened within our committee.
A few other things before we come down to the subjects of the report. One thing that is certainly true now and it's going to be increasingly true as the numbers go on -- the 13 loci, which are now carved in stone are capable of telling us a lot more than identity because you can quickly recognize brothers and sisters by the pattern that these show, and you can also recognize other relatives and usually you can distinguish between brothers and other kinds of relatives. So it's possible with the 13 loci to make a pretty good guess, if you find another sample, that these persons might well be brothers or they might be relatives other than brothers.
I raise it here because, as you well know better than anyone else here probably, states differ in their view as to this particular point. But certainly the technology is here and it could frequently be found that -- to identify from partial matches close relatives, which could be searched out or whatever the proper procedure is and whatever the law allows in that particular state.
It's also possible now to take a sample of DNA of the kind that's ordinarily analyzed and make a reasonable guess as to what group that particular DNA came from, and with the existing 13 loci the chances of your -- let me say it this way -- and you make an assessment as to which group it came from, Black or White say, the likelihood ratio of getting the right answer versus the wrong answer is about 45, on the average. So you can make a pretty good guess as to the source of the DNA, and it could be useful in further investigations.
But again, that raises questions of whether we want to do that or not. I point out the possibility of it rather than what's the right thing to do. This would be done using just existing markers, the 13 loci. Once you get out of this step and start bringing in other kinds of things that are not part of forensics, that is some of the blood group markers which are not used in forensic purposes, or very common in some groups, very rare in other groups, and the other kind of genes of this type too.
If we follow the kind of rule that's been done up until now most of the time, which is to do your forensic studies on traits that have nothing to do with anything interesting, that is only the parts of DNA that's mainly junk, the 13 loci is about as far as you can go. But if you want to do other things, there are a lot of things that could become possible. And again, in trying to foresee the future, I think we point out that these will become possible.
As far as individualization is concerned, we have reported, and I hope accurately, what the FBI's procedure is. If it isn't accurate I want to make it accurate. And we neither endorse or not that particular procedure, but we do state it. We do mention that it's been criticized by some people for not using the most sophisticated statistics, which is a proper criticism. I think it's the wrong criticism in this particular case, and I think that that principle is rather likely to be adopted. It's already being used, of course, by the FBI and by others.
I don't think you'll get a committee of size to ever say that this person has to be -- these two samples have to come from the same person with absolute certainty. All they'll say is that the probability is small. And I think it's necessary for the courts or the legal community or the social community or someone else to say that once the probability gets to be small enough, we're willing to say that this constitutes a unique identification.
The FBI's procedure is a very sensible one. They took anything from 1996 -- the NRC report is sensible by definition, by my definition. They took the form of calculation and they asked what is the probability -- or how small must it be to be smaller than the reciprocal of the population of the United States? If the probability is less than the number of people in the United States, it's pretty likely there's only one such person.
Then they built in three factors of safety into this. They took a factor of ten. That factor of ten comes from us, because we did a study that showed that the actual best guess as to the accuracy of the VNTR profile is about a factor of ten. Ten too large, ten too small, hundred all the way. Exactly the same thing hasn't been done for STRs, but there's every reason to think it would be exactly the same, and so the FBI took that, quite reasonably.
The other thing they did was used a 95 percent confidence level, which again is the level of conservatism, and then they made one other level of conservatism, and that is you use whatever database is most favorable to the defendant, you might say. That is you use either White, Black, Hispanic, whatever is necessary.
So I think almost anybody would agree that this is a procedure that's safe in the sense it would be very, very rarely would you make the mistake in the wrong direction; that is in the direction of wrongly convicting an innocent person.
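The threshold logic described in these remarks can be sketched as follows. This is not the FBI's published procedure, only an illustration of the reasoning: it asks how small a profile probability must be so that, with a chosen confidence, no one else in a population of a given size shares the profile, and then divides by the factor of ten. The function name is hypothetical, the population figure in the usage line is an illustrative assumption, and the choice of the most favorable database is not modeled.

```python
import math


def uniqueness_threshold(population, confidence=0.95, safety_factor=10.0):
    """Largest profile probability p for which (1 - p) ** population >= confidence,
    divided by a safety factor.

    Solving (1 - p) ** N = confidence gives p = 1 - confidence ** (1 / N);
    expm1 is used so the tiny result is not lost to rounding.
    """
    p_max = -math.expm1(math.log(confidence) / population)
    return p_max / safety_factor


if __name__ == "__main__":
    n = 280_000_000  # illustrative population size, an assumption
    print(1 / n)                    # plain reciprocal of the population
    print(uniqueness_threshold(n))  # stricter threshold after both adjustments
```

The result is well below the plain reciprocal of the population, which is the sense in which the stacked safety factors make the procedure conservative.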
Within our committee -- the real controversy has nothing to do with whether this is a sensible thing to do. It has to do with whether it's the most elegant thing to do from a scientific standpoint, and I guess we'll have to ultimately take the position -- I would myself at least -- that the FBI's procedure is the right thing to do. I'm happy to say that the DAB said the same thing. And I might go on to say that I received the DAB's report about a week or two ago, and I find that most of what we've written was actually exactly the same, so we -- I see no conflict between my personal views, which I think most of the committee share, and the DAB.
One thing -- now a little about database searches -- one of the things we need to emphasize is that a database search needs to be treated as that particular database search. There are going to be many considerations come into it other than a pure probabilistic calculation, and specifically, a database of convicted felons is going to have a different probability of finding a match than a database taken from the population as a whole. Recidivism is high enough that the prior probability of a person being found in a database of convicted felons, if he himself is the convicted felon, is an appreciable one.
On the other hand, if we move into the realm where the databases become essentially a random sample of the population, then that really is not a viable consideration, and then we need to do something to take into account that the size of the database influences your probability of finding an innocent person. And we also -- it was publicized very heavily a couple of weeks ago, this statement from Britain where a six locus match was found and then it turned out they picked out the wrong person.
The New York Times regarded that as surprising. I regard it as not at all surprising. With six loci, the probability of a match -- of their particular match in this particular case was about one in 37 million. That's pretty small, of course. On the other hand, the database consisted of 700,000 samples, so if you go through a database of 700,000 with an individual probability of one in 37 million, you end up with a probability of about one in 50, about 2 percent of finding a match. And now suppose you follow this procedure 50 times, you're almost certainly going to find a mismatch, that is find a correct match of the wrong people.
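The database arithmetic above can be reproduced with a short sketch. It assumes every database entry is an independent chance to match and that the true source is not in the database. The function name is illustrative.

```python
import math


def prob_at_least_one_match(match_probability, trials):
    """Probability of at least one chance match in `trials` independent
    comparisons, each with the given match probability.

    Computed as 1 - (1 - p) ** trials, using log1p/expm1 for small p.
    """
    return -math.expm1(trials * math.log1p(-match_probability))


if __name__ == "__main__":
    # Figures quoted in the remarks: one in 37 million, 700,000 samples.
    per_search = prob_at_least_one_match(1 / 37_000_000, 700_000)
    print(per_search)
    # Chance of at least one coincidental match across 50 such searches.
    print(prob_at_least_one_match(per_search, 50))
```

Under these assumptions the single-search figure comes out a little under 2 percent, in line with the one-in-50 figure quoted, and across 50 such searches the chance of at least one coincidental match is roughly 60 percent.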
So clearly, we have to -- that has to be taken into account, and I'm happy to say that I agree with -- I personally agree and I hope our committee will with what the DAB says to do about this, which is to say to follow the policy of the 1996 report. Alternatively, you can deliberately choose not too large a number to use in your original search and then regard that as investigatory, and then use other things for the conclusions to be reached.
In any case, the database problem, as a problem, is not going to get easier as the sizes of the databases become larger and as the databases become more representative of the population as a whole rather than of people who have previously been convicted of some crime.
Looking to the future, further in the future -- now I'm afraid to say this, but I will say it anyhow, and that is that eventually DNA is become more like fingerprints in the sense of being trustworthy without the necessity for a great deal of statistication. I think almost certainly that will happen if we look to the kind of numbers that I was talking about when I first started talking. It doesn't really make any difference how much substructure there is in the population or anything else. Finding two matches when the probability is one over ten to the 31st power, as I said, is not going to -- well, it's equivalent to certainty.
What are we going to do until that stage is reached? I have a suggestion. I'm not sure if it's going to be accepted by people or not. But I think there's something that is -- would be available in the future that would be an improvement of what we do now -- which I guess I should repeat that what we do now is pretty good -- but would satisfy critics whose -- hypercritics, and that is that I would like to have a procedure that would distinguish between any two individuals, even if they're closely related. Right now we assume that when we make these tests that the individuals are unrelated or we make some adjustment for possible close relatives.
It would be nice to have a calculation in which any individuals, no matter how closely related, would be identified as being separate or the same, and there is an approach to this. It comes from a happy genetic fact. And that is the brothers share exactly one-fourth of their -- they're going to be alike one-fourth of the time, and that one-fourth is determined mainly by just Mendel's rules and has nothing to do with population genetics considerations. It has a little bit to do with it but not nearly so much.
So if we in the future go to this now non-sacrosanct 20 loci, we can easily distinguish brothers, and that means you can distinguish anybody less related than brothers, and I rather suspect that five or ten years down the road something like this will become common. The 13 loci, obviously, we think, is here to stay, but it may -- individual laboratories may very well want to add to it either additional STR loci or new and better techniques as they come along.
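The brother calculation can be sketched as follows, using the standard Mendelian result that full siblings inherit the same two alleles at a locus one-fourth of the time. The per-locus expressions are the usual full-sibling match probabilities. The function names and the allele frequencies in the usage lines are illustrative assumptions, not data from the report.

```python
def sibling_match_homozygote(p):
    """Probability that a full sibling shares a homozygous genotype
    whose allele has population frequency p."""
    return (1 + 2 * p + p * p) / 4


def sibling_match_heterozygote(p1, p2):
    """Probability that a full sibling shares a heterozygous genotype
    whose alleles have population frequencies p1 and p2."""
    return (1 + p1 + p2 + 2 * p1 * p2) / 4


def sibling_match_floor(num_loci):
    """Lower bound on a full-sibling match over independent loci:
    one-fourth per locus, however rare the alleles are."""
    return 0.25 ** num_loci


if __name__ == "__main__":
    print(sibling_match_homozygote(0.1))
    print(sibling_match_heterozygote(0.1, 0.2))
    print(sibling_match_floor(13))  # floor for the 13 core loci
    print(sibling_match_floor(20))  # floor for 20 loci
```

Because the one-fourth floor comes from Mendel's rules rather than allele frequencies, adding loci shrinks the sibling match probability by at most a factor of four per locus, which is why more loci are needed to separate brothers than to separate unrelated people.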
And -- oh. Almost certainly it's going to be possible in the future to identify from a blood sample some physical characteristics of the person who left the blood sample. If you're lucky enough, you might be able to identify baldness, for example, if you found the gene for baldness in the sample, and there are half a dozen other traits. This is now a long way from being -- from having enough traits to be at all useful, but I don't think we can be at all sure what's going to happen in the next five or ten years as growing out of the genome project more and more individual loci will be discovered that could be put to use for this kind of purpose.
It might be nicer from a purely social standpoint for a sample of blood or semen or something to identify physical properties of the person rather than biological ancestry of the person, but we'll see what the future political opinion holds as to this question.
And finally, every time anybody raises the question -- we can always tell two people apart unless they're identical twins. The identical twins don't happen very often in the crime scene, but it happens once in a while, and it's worth asking what the possibilities are of making an identification of identical twins.
There are certainly possibilities, and I suspect that a really well-equipped laboratory probably could do it right now. This is a guess, but it may be a good guess. For one thing, the genes that produce antibodies vary a great deal during the development of the individual, so that means that even though two individuals, as identical twins are, start out with the same genes, their antibody-producing capacities will be somewhat different.
They'll be especially different if these people have been exposed to different diseases, but they'll be different anyhow. So I think that's one possibility of looking at these particular loci. They're not part of the natural normal forensic identification, but it could be done.
Or maybe they're going to have different virus infections. We all have different virus infections, even identical twins do, and you can often identify the site in which the virus introduced itself into the genes. And mitochondria mutate pretty rapidly and probably it would be possible in many cases to identify identical twins by a sufficiently careful study of mitochondrial DNA.
And then finally the newest thing on the horizon from my somewhat belated view is that it's now possible to study genes, not just whether they're there or not, which is what DNA analysis does, but whether the genes are expressed or not, which means they follow the protein products or the RNA products of the gene. And identical twins who have the same genes are not going to express them the same way. We all know identical twins are not exactly alike, and that might be another possibility of this.
So I think our committee feels pretty strongly that within ten years identifying identical twins will be about as accurate as identifying unrelated people is now.
Now back to the report itself. What I want from you people -- and I'm sure you want to give it to me -- are suggested revisions in this. I tried to write something that the people who were critical earlier in the day would not be critical of -- not that I had them in mind -- but I tried to write something that would be non-prescriptive but descriptive and predictive. And if you can find sentences in here in which my attempts to be predictive look as if they're prescriptive and possibly could be used, please let me know.
And with that, I'll stop talking, Shirley, and entertain questions or comments or whatever it is.
JUSTICE ABRAHAMSON: I just -- thank you, Jim. I'm just going to move for the moment to see if there's public comment, because it's 4:30, so I'd like to do that, and then if there is none or whatever there is, we'll hear and then move back, if that's all right?
DR. CROW: Yes.
JUSTICE ABRAHAMSON: Okay.
DR. CROW: And it may very well be that people haven't had a chance to read this yet and would like to have the overnight opportunity to do --
JUSTICE ABRAHAMSON: Well, they will not have read the new --
DR. CROW: The new version and the new version that I particularly want comments on are the Chapter 7.
JUSTICE ABRAHAMSON: Is that the one you mostly want -- and that has not been -- that was handed out today, so I doubt the --
DR. CROW: I want to commend Jim Wooley, who read this with such close reading to discover that there were 35 pages missing. He's the only person who pointed it out.
MR. WOOLEY: That's what you get from all those years of scientific training I have in stats and stuff --
JUSTICE ABRAHAMSON: He only reads page numbers, Jim.
MR. WOOLEY: Yes.


