
Fourth Annual DNA Grantees' Workshop

Wednesday, June 25, 2003

MORNING SESSION

Development of an Amplified Fragment Length Polymorphism Marijuana Database for Forensics
Heather Miller Coyle

MR. DELLA MANNA: Our next talk is from Dr. Heather Miller Coyle. Dr. Coyle is a DNA casework analyst and the research and validation coordinator at the Connecticut State Forensic Science Laboratory. She also serves as an adjunct professor in the Molecular Biology and Biochemistry Department at the University of Connecticut. Today, she will be speaking about the development of an amplified fragment length polymorphism (AFLP) marijuana database for forensics. Please join me in welcoming Dr. Coyle.

DR. COYLE: Good morning. I want to thank NIJ for inviting me to speak and to give you an update on our project. I would say that we're partway through the second phase of this project. Coyle: Slides 1 and 2

I'm going to speak mostly about the construction of our marijuana database, but first I want to give you a little background on the first portion. Amplified fragment length polymorphism (AFLP) is a method used to individualize samples. I frequently get the question: Why are you working on this when we already know how to identify marijuana? Well, that's true; there are plenty of methods out there to do that.

We're looking at two potential applications. The first is a typical forensic application, for example, linking a leaf fragment perhaps found on a victim to a leaf fragment found in a suspect's car. That applies to seed-grown plants, which are expected to have unique, individualized profiles, just like everyone in this room, unless you're an identical twin.

We're particularly interested in hydroponics (i.e., growing plants without soil), for example, high-tech clonal marijuana. It's just like a spider plant: if you take one of the little plants off the end and plant it, the two are genetically identical (like identical twins), and you can create a large population that is then easily traced by identical DNA profiles. That is one of the applications that we see being very useful for law enforcement and narcotics enforcement: being able to track sources, distributors, and users and to link them all together.

AFLP can be used on any single-source sample, and it can certainly be applied to other plant species, potentially in all those cases where you might not have much human evidence but might have other plant species, animals, or even bacterial species as evidence.

At the DNA level, this technique is a combination of RFLP (restriction fragment length polymorphism) and perhaps RAPD (random amplified polymorphic DNA). It is also most similar to the multilocus probes originally designed by Alec Jeffreys, which produce highly discriminating but complex band patterns. Here is an example of one of the more complex band patterns. It looks a little different from what people are used to seeing, but we're working on some software modifications to Genotyper, creating macros that will help score these complex patterns. I will talk about that a little today, so it's not as frightening as it might look in this profile. Coyle: Slide 3

There are two sources of AFLP kits available right now. The Plant Mapping Kit by Applied Biosystems is the one that we're using; LI-COR is the other source.

The Applied Biosystems kit comes in three components. The first component is a digestion-ligation kit: you take two restriction enzymes, cut your DNA into fragments of different sizes, and ligate adapters to the fragment ends. The second part of the kit has preselective primers that recognize the adapter sequences and allow you to amplify those fragments. That's why you can use it on any species: the primers recognize the adapters, not any internal DNA specific to your organism, so you don't need any prior sequence information. The third component of the kit is a box of primers labeled with different dyes. With these, there are 64 possible primer combinations for identifying or individualizing your sample.
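The cut, ligate, and selectively amplify steps can be made concrete with a small in silico sketch. It assumes EcoRI (GAATTC) and MseI (TTAA) as the two restriction enzymes, the pair used in the original AFLP method; the talk does not name the kit's enzymes. Each selective primer is modeled only by the few bases it adds beyond the adapter and site, so one "primer combination" is a pair of short strings. Sizes run from site to site and ignore adapters, and all function names are hypothetical.

```python
# Sketch only: assumes EcoRI (GAATTC) and MseI (TTAA), the enzyme pair of the
# original AFLP method. Adapters, overhangs, and primer lengths are ignored.
ECORI, MSEI = "GAATTC", "TTAA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """Start index of every occurrence of a recognition site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def eco_mse_fragments(seq: str):
    """Yield (eco_inner, mse_inner, size) for every EcoRI...MseI fragment.

    eco_inner is read inward from the EcoRI site; mse_inner is read inward from
    the MseI site on the opposite strand. Both strands are scanned, keeping only
    EcoRI-left/MseI-right neighbours, so each mixed fragment is reported once.
    """
    for strand in (seq, revcomp(seq)):
        sites = sorted([(i, "E") for i in find_sites(strand, ECORI)]
                       + [(i, "M") for i in find_sites(strand, MSEI)])
        for (i, left), (j, right) in zip(sites, sites[1:]):
            if left == "E" and right == "M":
                inner = strand[i + len(ECORI):j]
                yield inner, revcomp(inner), j + len(MSEI) - i


def selective_profile(seq: str, eco_sel: str, mse_sel: str) -> list[int]:
    """Fragment sizes amplified by one selective primer pair: the bases just
    inside each site must match that primer's selective extension."""
    return sorted(size for eco_inner, mse_inner, size in eco_mse_fragments(seq)
                  if eco_inner.startswith(eco_sel) and mse_inner.startswith(mse_sel))


# Toy sequence containing one EcoRI...MseI fragment, 21 bp site to site.
toy = "GAATTC" + "ACGTTTTTCGA" + "TTAA"
print(selective_profile(toy, "ACG", "TCG"))  # [21]
print(selective_profile(toy, "ACT", "TCG"))  # []
```

Typing one sample with several primer pairs just repeats `selective_profile` with different extensions on the same digest, which is why a single digestion-ligation can feed many primer combinations.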

Currently we are using four of those primer combinations, and we've found that they have been more than sufficient in individualizing the samples, other than the clones.

We're using 10 to 20 nanograms of DNA. In the March issue of the Journal of Forensic Sciences, we published a technical note that documents the DNA extraction procedure we use, which is also kit based: the DNeasy kit produced by Qiagen. The note also gives a brief introduction to the AFLP technique with the four selective primer sets. Coyle: Slide 4

We also have a longstanding collaboration with Dr. Gary Shutler, who's in the audience. He has made clonal-reference samples for us, so that we have known clones to test our technique, and he has also provided samples from seized material. The equipment recovered with those seizures suggests that the plants were being cloned, but we aren't sure. We have actually processed multiple seizure samples from indoor growth operations and have been able to use the genetics to determine how many clonal lines were being produced in those cases.
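Counting clonal lines within a seizure amounts to grouping plants that share a profile. A minimal sketch, assuming each plant's composite profile has already been reduced to a presence/absence string and treating exact identity of the scored string as the clone criterion; plant IDs and profiles are invented for illustration.

```python
from collections import defaultdict


def clonal_lines(profiles: dict[str, str]) -> list[list[str]]:
    """Group plant IDs whose scored composite profiles are identical.

    Each group is treated as one putative clonal line. Exact identity of the
    scored binary string is an assumption of this sketch; unscored peaks would
    still need to be checked in the raw data.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for plant_id, profile in profiles.items():
        groups[profile].append(plant_id)
    return sorted(sorted(ids) for ids in groups.values())


# Hypothetical plants from one indoor seizure, profiles already binary-coded.
seizure = {"p1": "1011", "p2": "1011", "p3": "0110", "p4": "1011"}
lines = clonal_lines(seizure)
print(len(lines), lines)  # 2 [['p1', 'p2', 'p4'], ['p3']]
```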

I want to introduce you to some of the terminology that we use. When we run all four selective PCR (polymerase chain reaction) primer sets on a sample, we call the combined result a composite AFLP profile. We are also looking at our sampling strategy for building a comparative database. We're now officially validated for AFLP typing, but we also need the comparative database to estimate random match frequencies; that's the second portion that we're working on. Coyle: Slide 5

We have access to a lot of samples in our home State of Connecticut, so we have a pretty good representation there; over the past year we've screened a lot of the seizure material coming into our toxicology lab, which handles those types of samples. We also have a pretty good representation from the area in Canada where our clonal-reference samples were made, so we have a good idea of what might be going on in that small region. However, we want to build more of a national database that is representative of what's going on in your areas.

This is what we have so far. The States presented in yellow (Connecticut, Florida, Kentucky, Tennessee, Vermont, Virginia, and Wyoming) have contributed samples to us. Down here are the big producer States with connections to Mexico; some of their material may be imported. Samples are pending from these States. Coyle: Slide 6

Dr. Gary Shutler has generously offered to provide us with some samples from his home State of Washington. We have been talking to some other people, particularly a woman in Kentucky who has a good collection of samples representing what might be going on in California.

Although I'm up here explaining some of the research, I'm also here to ask you to give us samples for our national database so that you all can use it in the future if you want to implement this technology.

The other thing that we're going to look at is the concept of regional databases, because working at the national level can be overwhelming. Breaking it down into representative regional groups may make our task of building an accurate and representative database a little easier.

My e-mail is listed here. You can certainly contact me or anyone at our laboratory about sending samples. We also hope to post a Web site soon that will describe what we're doing and include a sample submission form, additional information, and perhaps updates on where we are, so that you can see the progress of your contributions.

We also have some outgroups. As I mentioned, we have Canada, and we also have an individual in Taiwan who sent us more than 100 samples to test.

So what goes into our database? Sampling is certainly an issue: if you're building a representative database, you have to look at how you're sampling. If the cases are known to be linked or the plants are known to be clonal, one representative profile goes into the database. For example, if people in a city are sharing clonal plants across two sites, that is considered one case: one sample from each site is examined, but only one representative profile is included in the database. If samples are not known to be linked ahead of time but their profiles are identical, they all get included. That's equivalent to an investigative lead: they could be linked, but we don't know for sure. Coyle: Slide 7
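The inclusion rule can be written as a short filter. This sketch assumes each submitted sample carries a hypothetical `link_group` label shared by samples already known to be linked (same case, sites known to share clones, or known clones). Entering each distinct profile once per linked group is an interpretation of "one representative profile"; identical profiles from groups not known to be linked are all kept.

```python
def build_database(samples):
    """Apply the inclusion rule to (link_group, profile) pairs.

    Within a known-linked group, each distinct profile is entered once.
    Identical profiles from different groups are all kept, since a match
    between them is only an investigative lead.
    """
    seen = set()
    database = []
    for link_group, profile in samples:
        if (link_group, profile) not in seen:
            seen.add((link_group, profile))
            database.append(profile)
    return database


samples = [
    ("city_A", "1011"),  # site 1 of a case known to share clones
    ("city_A", "1011"),  # site 2, same clone: collapsed to one entry
    ("case_B", "1011"),  # identical but not known to be linked: kept
    ("case_C", "0110"),
]
print(build_database(samples))  # ['1011', '1011', '0110']
```

If informant information later shows two groups are linked, relabeling them with a common `link_group` and rebuilding removes the duplicate, matching the clean-up step described next.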

You always have to keep in mind that we're working with a difficult system here: illegal material. We have informants who are sometimes very happy to tell us what they're breeding because they're very proud of it, but we don't consider them definitely reliable sources, because a variety they think is true gold may have been misrepresented to them when they were growing it. That's why we're careful about whom we consider good informative sources, and why we actually base our decisions on the genetics that we see.

If it's proven to be clonal or linked afterward by informant information, then we can remove the duplicates. But we actually have multiple levels of databases going on right now.

From the Canadian region in which we're sampling, we did have three street seizure locations that were linked by a common sample type. We also had two seizures, separated by a fairly significant amount of time, that were linked by common AFLP profiles. That gives you an idea of how we're generating our database.

This is what we have so far as a breakdown. In Connecticut, we have a combination of indoor and street seizures: 57 individual cases, from which 173 samples were submitted (individual plants or bags from the same case that we can test). All of them have been processed. Coyle: Slide 8

In Vermont, Eric Buel was very generous in giving us street seizure samples from different cases, and they've all been processed. One sample did not yield any DNA, which is why the number processed in Vermont is one less than the number submitted.

Samples from Kentucky, Tennessee, and Virginia were donated by Margaret Singer as a sampling for us to test. We have two samples from Florida that have a variety of names associated with them. We're a little cautious about what's considered a variety, but they've been processed.

In Iowa, we had one case in which 500 or more plants from an outdoor growth operation were submitted to see whether they were clonal. Multiple plants were also sent from an indoor grow operation in Wyoming, and, as I mentioned, we have good representation from Canada, including some known clones that were propagated in-house.

The most recent sample was submitted to us from Jim Lee's lab in Taiwan. It comes from an import from China that he described as 245 tons. Thankfully, he didn't send all of that to us. He sent us a representative sample of already-extracted DNA, which we have not yet processed.

For the second portion of our NIJ grant, we said that we would process 500 or more samples that were representative. Although we're close to that number, we're still trying to get better representation.

We're trying to automate the scoring to make it easier for people to implement the technique for these more complicated profiles. We're using very inexpensive technology. I know that some computer programming people out there may cringe when they see this, but it was done in-house with readily available software. Software costs for anyone who wants to do this will be $100 if you already own Genotyper; the $100 is for a phylogenetic grouping program that can be downloaded from the Internet. Coyle: Slide 9

We're running samples on ABI 377 sequencers and looking at the data in Genescan. We then run the data through Genotyper, using modified Genotyper macros to score peaks that are very clear and easily separated and to convert the presence or absence of each peak to binary code. This lets you search a database far more conveniently than looking through screen after screen of AFLP profiles.

Just briefly, these grey bars that a lot of people take for granted are already matched up with an allelic ladder in your kits and can be adjusted. They're called user-defined categories. You can go into your Applied Biosystems manual or whatever software you're using and adjust them—make them bigger or smaller or move them around.

We took 100 samples that we felt were fairly different, lined them up in Genotyper, looked for peaks that varied from sample to sample, and created our bins, or grey boxes, for the binary coding. So when we were screening samples, we looked at only a subset of the peaks in the profile, and then at the end, we went back and looked at the full profile. Basically, you run your profile through the macro: if it has a peak present in a grey box, that box is converted to a one; if the peak is absent, it is converted to a zero. You need an analyst to quickly screen through and make sure that no peak falls half in and half out of a box; we have some criteria that we're going to write out about that. If a peak is more than half in, it's in; if it's less than half in, it's out. As long as you score consistently, this is useful, and it allows you to semi-automate searching.
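The grey-box rule can be sketched as an overlap test. This assumes each peak is represented by its extent in base pairs (start and end of the peak) and each bin by its size range; the peak and bin values are hypothetical. A peak is scored into a bin only if more than half of its width lies inside; the talk does not say how an exact half is scored, so this sketch scores it out.

```python
def overlap_fraction(peak, bin_):
    """Fraction of a peak's width (bp) lying inside a bin; peak width must be > 0."""
    (p_lo, p_hi), (b_lo, b_hi) = peak, bin_
    inside = max(0.0, min(p_hi, b_hi) - max(p_lo, b_lo))
    return inside / (p_hi - p_lo)


def score_profile(peaks, bins):
    """One '0'/'1' character per bin: '1' if any peak is more than half inside.

    A peak exactly half in is scored out here (an assumption of this sketch).
    """
    return "".join(
        "1" if any(overlap_fraction(p, b) > 0.5 for p in peaks) else "0"
        for b in bins
    )


bins = [(100.0, 101.0), (150.0, 151.0), (200.0, 201.0)]   # hypothetical bins (bp)
peaks = [(100.2, 100.8), (150.8, 151.4), (199.9, 200.6)]  # hypothetical peak extents
print(score_profile(peaks, bins))  # 101
```

The second peak is only one third inside its bin and is scored out; the third is mostly inside and is scored in, which is exactly the borderline case the analyst review is meant to catch.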

We selected and created bins around peaks that were variable in that set of 100. We certainly looked at peak height: we wanted peaks with fairly high RFU (relative fluorescence unit) values for the 20 nanograms of input DNA that we were using, so nothing that was on the margin. Coyle: Slide 10

We looked for peaks that did not have much interference from neighboring peaks, so no big shoulders or fragments that were comigrating close together that would be hard to separate. We looked for amplification consistency by doing reproducibility experiments, and we certainly looked at fragment size. So we have bins in the small area for degraded samples, and we have bins in the larger areas as well.
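The bin-selection criteria above can be expressed as a filter over candidate peaks from the training panel. This is a sketch under stated assumptions: the `CandidatePeak` record and its fields are hypothetical summaries of the panel data, and the 500 RFU and 1 bp thresholds are placeholders, not values from the talk. Spreading bins across small and large fragment sizes is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CandidatePeak:
    size_bp: float
    calls: list[int]            # 1/0 presence across the training panel
    min_height_rfu: float       # lowest height among samples where present
    nearest_neighbor_bp: float  # distance to the closest other peak
    replicates_agree: bool      # same call in duplicate reactions


def select_bins(candidates, min_rfu=500.0, min_gap_bp=1.0):
    """Keep candidates passing every criterion; thresholds are hypothetical."""
    return [
        c.size_bp for c in candidates
        if 0 < sum(c.calls) < len(c.calls)       # variable across the panel
        and c.min_height_rfu >= min_rfu          # not marginal in height
        and c.nearest_neighbor_bp >= min_gap_bp  # no shoulders or comigration
        and c.replicates_agree                   # reproducible amplification
    ]


panel = [
    CandidatePeak(120.0, [1, 0, 1, 0], 1800.0, 2.5, True),  # passes
    CandidatePeak(150.0, [1, 1, 1, 1], 2500.0, 3.0, True),  # not variable
    CandidatePeak(180.0, [0, 1, 1, 0], 150.0, 4.0, True),   # too weak
    CandidatePeak(210.0, [1, 0, 0, 1], 2200.0, 0.4, True),  # crowded
]
print(select_bins(panel))  # [120.0]
```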

Here's where you can modify your bins: you just go into these macros and type in the values. For example, here's the size, and you set how big or small you want the bin to be. Coyle: Slide 11

Then this is what you get at the end. This shows two primer sets (half of the composite profile): you get strings of ones and zeros, with a subset of the peaks converted for screening. Coyle: Slide 12
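Joining the per-primer-set strings into one composite string only works for searching if every sample is assembled in the same order and against the same bin lists, so that position k always means the same bin. A minimal sketch; the primer-set labels and bin counts are hypothetical.

```python
# Hypothetical labels for the four selective primer pairs, in a fixed order.
PRIMER_SETS = ("set1", "set2", "set3", "set4")


def composite_profile(calls_by_set: dict[str, str], bins_per_set: dict[str, int]) -> str:
    """Concatenate per-primer-set binary strings in a fixed order, checking
    that each set was scored against its full bin list."""
    parts = []
    for name in PRIMER_SETS:
        calls = calls_by_set[name]
        if len(calls) != bins_per_set[name]:
            raise ValueError(f"{name}: expected {bins_per_set[name]} calls, got {len(calls)}")
        parts.append(calls)
    return "".join(parts)


bins_per_set = {"set1": 3, "set2": 2, "set3": 4, "set4": 3}
calls = {"set1": "101", "set2": "01", "set3": "1100", "set4": "011"}
print(composite_profile(calls, bins_per_set))  # 101011100011
```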

This is the $100 investment that you have to make: Phylogenetic Analysis Using Parsimony (PAUP) software. We used it because we couldn't figure out how to make the Excel spreadsheets that hold the binary code search for a one-, two-, or three-peak mismatch. Coyle: Slide 13

So, if you convert a sample to binary code, screen it through your database, and plug it into this software, you get a group of related profiles, and you only have to go back and look at the whole profile for the samples in that group. It works pretty well. Usually you have to go back and look at three, four, or five samples to see whether it's a real match at the peaks that are not scored in those grey boxes.
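The one-, two-, or three-peak mismatch screen can also be approximated without a phylogenetics package by counting differing bins (Hamming distance). This is a simpler, swapped-in technique, not a reimplementation of PAUP's parsimony grouping; the sample IDs and profiles are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of scored bins at which two binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must be scored against the same bins")
    return sum(x != y for x, y in zip(a, b))


def candidate_matches(query: str, database: dict[str, str], max_mismatch: int = 3):
    """Database entries within max_mismatch scored peaks of the query, closest first."""
    hits = [(sample_id, mismatches(query, prof)) for sample_id, prof in database.items()]
    return sorted((h for h in hits if h[1] <= max_mismatch), key=lambda h: (h[1], h[0]))


db = {"CT-001": "10110010", "CT-002": "10110011", "VT-004": "01001101"}  # hypothetical
print(candidate_matches("10110010", db))  # [('CT-001', 0), ('CT-002', 1)]
```

The short candidate list plays the same role as the group returned by PAUP: those few samples are the ones whose raw data get checked at the unscored peaks.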

It's inexpensive and low tech. I'm sure a software person could really help out in making something even more user friendly or combining things, but you can certainly do it in-house with very little expense.

So here's the general AFLP procedure:

  • Examine the evidence under a microscope, making sure that there is not a lot of mold or contamination or contaminating organisms on the outside.
  • Perform DNA extraction using the Qiagen kit.
  • Run the sample out on an agarose gel to check its quantity and quality. If it's degraded beyond a certain point, you should not perform AFLP. But we've been surprised at how good an AFLP profile you can get from a several-year-old seizure sample.
  • Perform AFLP typing.
  • Take a quick look in Genescan to check the quality of the data and controls.
  • Run Genotyper and score the subset of peaks.
  • Import to Excel and search your database.
  • Use the PAUP software to locate candidate matching profiles.
  • Look back at the raw data of a couple of samples to confirm match of non-scored peaks.
  • Calculate statistics. Right now we're struggling with how to do statistics. Because these are potentially lineage-based samples, we think we would use a counting method—something more like what mitochondrial DNA uses—which is another reason why I'm asking for samples. We want a lot of samples in our database so that the statistics will improve. We're probably going to use a confidence interval around whatever observed value we see in the database. Coyle: Slide 14
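For the counting method, one standard choice (an assumption here, not the laboratory's stated method) is to report the observed frequency x/N of a profile in an N-profile database together with a one-sided exact binomial (Clopper-Pearson) upper bound. When the profile has never been seen, that bound reduces to 1 - alpha**(1/N). The sketch finds the bound by bisection using only the standard library; the database size in the example is hypothetical.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(x + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def profile_frequency(x: int, n: int, alpha: float = 0.05):
    """Counting estimate x/n plus a one-sided exact (Clopper-Pearson) upper
    bound: the p at which observing x or fewer copies has probability alpha."""
    if x >= n:
        return 1.0, 1.0
    lo, hi = x / n, 1.0
    for _ in range(200):  # bisection; binom_cdf decreases as p increases
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return x / n, hi


# A profile not yet seen in a hypothetical 500-profile database.
estimate, upper = profile_frequency(0, 500)
print(estimate, round(upper, 5))  # 0.0 0.00597
```

As the database grows, the upper bound for an unseen profile shrinks roughly in proportion to 1/N, which is the statistical reason more submitted samples improve the numbers.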

We are putting in some quality-control measures. Once you get comfortable with AFLP, you don't have to do this every time, but for beginners, duplicate reactions give you a very good feel for what a good reaction looks like and what a bad one looks like. We're also setting parameters so that the highest peaks fall within 2,000 to 4,000 RFUs, so you can basically create the box, shoot the sample into the box, and get normalized data that you can compare adequately in the database. Coyle: Slide 15
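Both checks can be automated per sample: the 2,000 to 4,000 RFU window comes from the talk, while comparing the binary calls of the duplicate reactions is one plausible way (an assumption) to decide whether a reaction is good. The function name and example values are hypothetical.

```python
def qc_duplicates(heights_1, heights_2, calls_1, calls_2, low=2000.0, high=4000.0):
    """List QC problems for a pair of duplicate reactions of one sample.

    Checks that each reaction's tallest peak lies within low..high RFU and
    that the two reactions give identical binary calls.
    """
    if len(calls_1) != len(calls_2):
        raise ValueError("duplicates must be scored against the same bins")
    problems = []
    for n, heights in enumerate((heights_1, heights_2), start=1):
        top = max(heights)
        if not low <= top <= high:
            problems.append(f"reaction {n}: tallest peak {top:.0f} RFU outside {low:.0f}-{high:.0f}")
    diffs = [i for i, (a, b) in enumerate(zip(calls_1, calls_2)) if a != b]
    if diffs:
        problems.append(f"duplicates disagree at bins {diffs}")
    return problems


print(qc_duplicates([2500, 1200], [3100, 900], "1011", "1011"))  # []
print(qc_duplicates([4500, 800], [3000, 700], "1011", "1001"))
# ['reaction 1: tallest peak 4500 RFU outside 2000-4000', 'duplicates disagree at bins [2]']
```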

Here's my request for samples. Again, it's a theme: We want more. You can certainly contact Major Tim Palmbach; Elaine Pagliaro, who's in the audience; or myself, and we can get you copies of our licenses and sample sheets to use for submissions. Coyle: Slide 16

Once we get a good representative database, we want to ask some interesting population genetic questions about this illicitly grown crop. We want to ask how much genetic diversity is out there, because historically we have information that says marijuana has been in cultivation for more than 10,000 years. It has been moved all over the place by humans and has potentially multiple origins: Asia and Africa. There is even a book that I recently read that said most of the U.S. varieties came out of a few genetic lines from California in the 1970s. Coyle: Slide 17

Therefore, the crop may have gone through a bottleneck, with potentially a lot of very intensive inbreeding. So we're interested in looking at how much genetic diversity is out there and where it is going.

We are also going to look at ploidy levels. Marijuana is not naturally polyploid, but as anyone with an ornamental propagation background will know, you get bigger flowers or bigger buds if you increase the ploidy level. So we're wondering whether some growers are using colchicine or something similar to raise the normal ploidy level and produce tetraploids. We're also going to take a look at geographic trends.

Because this project has been going on for a while, there are many acknowledgments. The National Institute of Justice has been funding us for quite some time, and we're very grateful for that funding, because without it, we wouldn't be doing this work. Gary Shutler originally provided us with the reference samples, got us off the ground, and continues to help through ongoing collaboration, and Applied Biosystems donated a few kits to get us started way back in the beginning stages. Coyle: Slides 18–20

Jose Almaral, Eric Buel, Sandra Mays, Margaret Singer, and Sandra Stottenow have all made the effort to get us samples. The Connecticut State Narcotics Task Force is making the extra effort to get us samples and put them into bags; that's a lot of work for them. They don't get paid anything extra, but they're doing it to help us out, in the hope that they'll have a tool in the future. Then there are the many people who work at the Connecticut lab, primarily Joselle Germano Presty, who did all of the AFLP validation work this past year and got a good start on the database, and Eric Carita, who has taken over Joselle's position and is going to finish the database. We had a good intern, Elizabeth Baker, who helped out for 4 months and normalized all of the data. And we have a variety of people who have been helping with DNA extractions.

This is a collaborative effort between the University of New Haven, NIJ, and us, and so these are the people that we have to thank for actually getting this all together.

I'll stop there. Thank you very much.


Date Entered: January 17, 2008