Fourth Annual DNA Grantees' Workshop
Wednesday, June 25, 2003
MORNING SESSION
LINE Elements: A New Source of Genomic Variation for DNA Profiling
David A. Ray
Biography
MR. DELLA MANNA: Our next speaker comes to us from Louisiana State University (LSU) of the Southeastern Conference. Dr. Ray is a postdoctoral fellow in the Laboratory of Comparative Genomics with Mark Batzer, and he is going to be speaking to us today on LINEs (long interspersed elements). So please join me in welcoming Dr. Ray.
DR. RAY: Thank you and good morning. Mark [Batzer] apologizes for not being here himself. He is at a meeting in Australia, so I'm sure he's just heartbroken that he couldn't be here. Ray: Slide 1
This morning I'll be talking about LINEs. LINEs are a class of mobile elements in the human genome, actually in all mammalian genomes, and the Batzer lab focuses on these and ALU elements in characterizing human and other primate genomic diversity.
The first little thing I'm going to do is talk about what mobile elements are and how LINEs specifically work. Most of you are probably familiar with this, but mobile elements are retroposons that act through a copy-and-paste mechanism using an RNA intermediate. For example, a source or a master gene serves as a template and is then copied using RNA polymerase II or III into an RNA transcript. That RNA transcript is then reverse transcribed and inserted back into the genome in various locations. So each time one of these elements is copied and put back into the genome, the genome gets a little bit larger. Ray: Slide 2
Mobile elements are found in just about every eukaryotic species that has been examined. Some, such as the fly, have a lot of relatively new elements, whereas humans have relatively few new elements. The amplification of most mobile elements occurred in the past; only about 5 percent of the mobile elements that are currently in the genome are relatively young. Fortunately, though, that 5 percent represents several thousand potential loci for use during forensic and human identification work. Ray: Slide 3
Specifically, LINE1 (L1) mobile elements are examined through this grant. LINEs are long interspersed elements, as opposed to SINEs, which are short interspersed elements. LINEs are greater than 500 base pairs in length, and L1s, the ones I'm going to be talking about in just a while, are 6.5 kb (kilobases) in length, if you get a full copy from one portion of the genome to another. Most of them are truncated, however, at the 5' end. They are present in the genome at very high copy numbers, more than 100,000. Can you imagine 6.5 kb times 100,000? It ends up that about 10 percent of the human genome is actually LINEs, and that does not include ALU and MIR (mammalian interspersed repetitive) elements, long terminal repeats (LTRs), and all sorts of other mobile elements. So, a significant portion of the genome, about half, is actually represented by mobile-element DNA. Ray: Slide 4
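The "6,500 base pairs times 100,000 copies" arithmetic can be sketched as follows. This is only a rough upper bound, and the genome size used here (about 3.2 billion base pairs) is an assumption that does not come from the talk.

```python
# Naive upper bound on how much of the genome L1 copies could occupy
# if every copy were full length (most are not; they are 5' truncated).
FULL_LENGTH_L1_BP = 6_500      # full-length L1 size quoted in the talk
COPY_NUMBER = 100_000          # "more than 100,000" copies, per the talk
GENOME_BP = 3_200_000_000      # ASSUMPTION: approximate haploid genome size


def upper_bound_fraction(length_bp: int, copies: int, genome_bp: int) -> float:
    """Fraction of the genome covered if all copies were full length."""
    return length_bp * copies / genome_bp


upper_bound_bp = FULL_LENGTH_L1_BP * COPY_NUMBER
fraction = upper_bound_fraction(FULL_LENGTH_L1_BP, COPY_NUMBER, GENOME_BP)
print(f"{upper_bound_bp:,} bp, about {fraction:.0%} of the genome")
# prints: 650,000,000 bp, about 20% of the genome
```

Because most copies are truncated at the 5' end, the real contribution is expected to fall below this full-length bound, which is in line with the "about 10 percent" figure quoted above.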
A LINE is made up of several parts. First of all, there are two open reading frames. ORF1 encodes a protein that has single-stranded binding protein properties. ORF2 encodes an endonuclease with reverse transcriptase activity. These two are required for the element to be mobile. Ray: Slide 5
There are two untranslated regions (UTRs), one on either end: the 5' and 3' UTRs. These UTRs are useful for classifying the elements into particular subfamilies. Each end also includes direct repeats. These direct repeats are artifacts of either a nick, which naturally occurs in the DNA, or endonuclease activity of ORF2. One of these events must occur in order for the sequence to insert itself back into the genome. As a result, we end up getting a direct copy of part of the genomic sequence on both ends.
L1Ta and preTa are the subfamilies, with "Ta" standing for "transcriptionally active." These are elements that are actually still moving around in the human genome. That's wonderful for us, because by definition, that means that they are polymorphic in the human genome, and we can use them to identify geographic groups, populations, or subgroups of populations.
There are diagnostic bases (three in a row) for these L1Ta and preTa subfamilies in the 3' UTR. There are approximately 800, almost 900, copies in the human genome. We're not sure of the exact number because they're still moving and active. Forty-four of 124 are full-length elements, and any of the 44 could be the source gene or several source genes for the mobility of the subfamilies of elements. We have been able to locate several insertions that have occurred very recently in the human genome. Ray: Slide 6
It's interesting that mobile elements are there, but why bother using them? First of all, they are neutral genetic loci, making them useful for population genetic studies because we now have a locus with potentially 6,500 base pairs that evolve at a neutral rate. We can actually date each insertion and find the time at which it occurred in the human genome. Ray: Slide 7
They are identical by descent, making them different from other loci that have been used before, for example microsatellite loci. With microsatellite loci, you may have a CA (cytosine-adenine dinucleotide) repeated 15 times, but how did it get to 15 copies? Did it go from 16 down to 15, from 14 up to 15, or some other combination? With mobile elements, we know that the ancestral state lacks an element; therefore, we know that individuals who share an insertion inherited it from a common ancestor, and that the insertion is going to be transferred faithfully from generation to generation.
Because of different population dynamics, including founder effect and population subdivision, we can have population-specific alleles. For example, say a European individual has a certain probability of having one particular LINE, whereas someone from Egypt may have a very small probability of having that same LINE. They're also very easy to genotype. Basically, all you need is a thermal cycler and an agarose gel.
For a couple of reasons, our lab tends to use a combination of ALU and L1 elements. First of all, there are a lot more ALU insertions than there are L1 elements, so that gives us two possibly different sets of markers for each study that we're doing. They have different genomic distributions. ALUs tend to insert themselves in guanine-cytosine-rich regions, whereas L1s tend to insert themselves in adenine-thymine-rich regions. Therefore, we're going to have different distributions of each set of elements. Ray: Slide 8
They have different genetic ages. L1s are older than ALUs, and they start moving around in the genome first. As a matter of fact, ALUs require L1s to be there in order for them to move. ALUs use the machinery because they do not encode their own endonuclease or reverse transcriptase. Using both sets of elements gives us additional population-specific alleles.
Here is an example of the different genomic distributions. LINE1s are shown in yellow, and as you can see, they tend to be clustered in regions of low guanine-cytosine content. ALUs are shown in light blue, and they tend to be clustered in regions of high guanine-cytosine content. Ray: Slide 9
There were four goals to this particular project. One was to identify 50 new polymorphic LINE insertions. Second was to determine the chromosomal location and phylogenetic distribution of each LINE, in other words make sure it's human-specific and moving around just in the human genome. We do that by using a phylogenetic panel using everything from green monkeys, which are Old World monkeys, to New World monkeys, chimpanzees, orangutans, and gorillas. The third and fourth were to develop, or attempt to develop, multiplex assays for the analysis of LINE insertion polymorphisms and to determine the human genetic variation associated with each LINE repeat. Ray: Slide 10
This was accomplished by using a combination of both computer and traditional wet-bench work. Ray: Slide 11
The completion of the Human Genome Project was very important for this project. We had the human genome; therefore, we could search the human genome for LINEs that were specific to the Ta and preTa subfamilies, use that sequence and the flanking sequence to design the primers, and go back and test different populations. The process basically involves searching GenBank in silico using an oligonucleotide probe, finding exact matches to that probe which are specific to the Ta and preTa subfamilies, pulling out those sequences, then characterizing each sequence using either RepeatMasker or Censor, two different pieces of software which can classify it as an L1, or a particular type of L1, or an L2. We then used Primer3 to design primers on either end of the element and BLAST to make sure those primers fell in unique sequences in the genome so we didn't get multiple bands. You end up with new elements that can be characterized in various individuals. Ray: Slide 12
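The first step of that process, finding exact matches to a probe and pulling out the sequence around each match, might look something like the sketch below. It swaps a GenBank search for a plain string search over a toy sequence. The probe shown is a made-up placeholder, not the actual Ta or preTa diagnostic sequence, and the function names are illustrative only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_probe_hits(sequence: str, probe: str, flank: int = 10):
    """List (strand, start, left_flank, right_flank) for each exact match.

    Both strands are searched: the probe itself ("+") and its reverse
    complement ("-"). Flanks are reported in forward-strand coordinates.
    """
    sequence = sequence.upper()
    hits = []
    for strand, query in (("+", probe.upper()), ("-", reverse_complement(probe))):
        start = sequence.find(query)
        while start != -1:
            end = start + len(query)
            left = sequence[max(0, start - flank):start]
            right = sequence[end:end + flank]
            hits.append((strand, start, left, right))
            start = sequence.find(query, start + 1)
    return hits


# HYPOTHETICAL probe and toy contig, for illustration only.
PROBE = "GGATTACA"
CONTIG = "TTTTCCCC" + "GGATTACA" + "AAAAGGGG" + "TGTAATCC" + "CCGG"

hits = find_probe_hits(CONTIG, PROBE, flank=4)
for hit in hits:
    print(hit)
```

In practice the extracted flanks would then go to repeat classification, primer design, and a uniqueness check, which this sketch does not attempt.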
Here is an example of the polymorphic LINE insertion assay. We used three different primers. The two primers in red and green are the flanking primers that were designed using GenBank, and there is a primer (indicated in blue) particular to the LINE element, either Ta or preTa. Ray: Slide 13
Two PCRs (polymerase chain reactions) are required for this. Although we could have, we chose not to amplify the entire LINE element, because that's more than 7,000 base pairs in some cases, and using generic Taq, it's not that practical. So, we did two different PCRs: one using the LINE element primer and a flanking primer, and a second one using the two flanking primers. It's possible to get as many as two bands. If you're homozygous for having the element in your genome, then you would get just a single band, the filled site, with no empty-site fragment, because the two flanking primers simply wouldn't amplify; it's too far to go across using generic Taq. If you're heterozygous, you get both bands, with a fragment showing the empty site and another fragment showing the filled site. If you're homozygous for not having the repeat, you would just get the empty site band.
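The band-reading logic described here reduces to a small decision table. A minimal sketch, assuming each reaction is scored simply as band present or band absent (the function name and genotype labels are illustrative, not from the talk):

```python
def call_genotype(filled_band: bool, empty_band: bool) -> str:
    """Call an insertion genotype from the two PCR results.

    filled_band: band seen in the LINE-primer + flanking-primer reaction.
    empty_band:  band seen in the flanking-primer-only reaction.
    """
    if filled_band and empty_band:
        return "heterozygous (+/-)"
    if filled_band:
        return "homozygous present (+/+)"
    if empty_band:
        return "homozygous absent (-/-)"
    # Neither reaction amplified, e.g. failed PCR or primer mismatch.
    return "no call"


for filled, empty in [(True, False), (True, True), (False, True), (False, False)]:
    print(filled, empty, "->", call_genotype(filled, empty))
```

The "no call" branch covers the case where neither reaction gives a product, which cannot be told apart from a failed reaction on band pattern alone.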
Here's an example of a primate panel using a human, several chimps, a gorilla, orangutans, a macaque, and a tamarin. All of these happened to be with the empty site. There are no filled sites in this case, but you can see there is no amplification if the element is not there because there is no place for the Ta primer to bind. Ray: Slide 14
Here's an example of a population panel using African-American DNA in which we've got several individuals who are polymorphic. Most of them do not have the insertion. However, several individuals are heterozygous for the insertion. Ray: Slide 15
We can also see that it is human specific. We only get the lower band for the chimpanzee and gorilla, and we don't get any amplification at all for the owl monkey, probably because of primer mismatches in the binding site. But this is a human-specific allele, and we do have polymorphic aspects of this allele.
We found several population-specific alleles. For example, L1HS149 occurs in Asian populations with 80 percent frequency, whereas in Europeans, African Americans, and Egyptians, it's only 20 to 30 percent. So these are alleles that can be used to help identify an individual's geographic origin. Ray: Slide 16
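Frequencies like these come from counting insertion alleles across genotyped individuals. A minimal sketch of that count follows; the genotype counts are invented for illustration so that they land near the quoted figures, and they are not data from the study.

```python
def insertion_allele_frequency(n_present_hom: int, n_het: int, n_absent_hom: int) -> float:
    """Frequency of the insertion allele in a sample of diploid individuals.

    Each +/+ individual carries two insertion alleles, each +/- carries one,
    and each -/- carries none.
    """
    n_individuals = n_present_hom + n_het + n_absent_hom
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_present_hom + n_het) / (2 * n_individuals)


# HYPOTHETICAL counts for two samples of 20 individuals each.
high_freq_sample = insertion_allele_frequency(13, 6, 1)   # 32 of 40 alleles
low_freq_sample = insertion_allele_frequency(1, 8, 11)    # 10 of 40 alleles
print(high_freq_sample, low_freq_sample)
```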
In review, there are certain properties of mobile elements that make them useful:
- They actually are very simple to assay. All that you need is a thermal cycler, an agarose gel, and a couple of primers.
- They are stable polymorphisms and are essentially homoplasy free. Again, with microsatellites, you could go from 15 to 16 repeats or 16 to 15 repeats. With LINE insertions and ALU element insertions, you know the ancestral state. You know whether there was or was not an insertion. Only partial deletions of an element have ever been shown to occur, so the chances of going back to the ancestral state of no insertion are basically nil.
- They are identical by descent, and they get faithfully passed from generation to generation.
- There are population-specific alleles. Ray: Slide 17
This project analyzed more than 800 L1 elements. Unfortunately, almost half of those were monomorphic in human populations. They were fixed present in all humans so far as we could tell. Unfortunately, also, 75 of those were at the end of sequencing contigs in the human genome database, so the primers at either the 5' or 3' end couldn't be designed. Ray: Slide 18
Nearly 250 of these LINEs had actually inserted in either other LINEs or ALU repeats or some other type of repeats, so we were unable to design primers for those simply because we would end up getting multiple banding from the other repeats. One actually amplified in a non-human primate. That turned out to be an example of gene conversion. The details of that we can go into later if we need to, but 148 of them were polymorphic at various frequencies.
One of our goals was to identify 50 new polymorphic LINE insertions. This involved more than 100,000 individual PCRs. We managed to get 148 LINEs that were polymorphic; therefore, we reached about three times our goal. We also were able to determine the chromosomal location and phylogenetic distribution of each LINE; all but one were human specific. Ray: Slide 19
We attempted to develop multiplex assays for the analysis of LINE insertion polymorphisms, but for various reasons that simply wasn't practical. We also attempted to determine the human genetic variation associated with each one, and yes, we have 148 polymorphic LINE insertions that vary by different amounts in different human populations.
Several papers came out of this work. You're welcome to look at any of these. One I didn't get a chance to put up here is a Bamshad et al. paper using ALUs to type DNA to geographic origin. Right now we're working on using these LINE elements to add to that and using DNA to get even more specific identification of human origin. Ray: Slide 20
In conclusion, the L1 Ta and preTa subfamilies have amplified within the human genome. The allele frequencies and levels of heterozygosity of the polymorphic LINEs were variable and diverse across human population groups, and polymorphic interspersed repeats are novel identical-by-descent markers with population-specific alleles for forensic identity testing. Ray: Slide 21
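For a two-allele marker such as insertion present or absent, the link between allele frequency and heterozygosity can be sketched as below. This assumes Hardy-Weinberg proportions, which is an assumption of the sketch and not something stated in the talk, and the frequencies passed in are illustrative values only.

```python
def expected_heterozygosity(insertion_freq: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus.

    ASSUMPTION: Hardy-Weinberg proportions in the population sampled.
    """
    if not 0.0 <= insertion_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 2.0 * insertion_freq * (1.0 - insertion_freq)


# Illustrative frequencies: fixed absent, intermediate, high, fixed present.
for p in (0.0, 0.5, 0.8, 1.0):
    print(p, round(expected_heterozygosity(p), 2))
```

Heterozygosity peaks at a frequency of 0.5 and drops to zero for loci that are fixed present or fixed absent, which is why monomorphic elements carry no information for identity testing.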
I'm new to the lab since I only got there in September of last year, but these people did most of the work for this. DNA samples, of course, were contributed by other people. Ray: Slide 22
Thank you very much.
MR. DELLA MANNA: Do we have any questions for any member of the panel?
(No response.)
With that, we'll break for 15 minutes. Thank you again to the whole panel.

