Introduction

The X chromosome is unique in being present in a single copy in males. This gives it a series of special characteristics that explain the growing interest in it for human population genetics. Mutations occur less frequently on this chromosome because every X chromosome spends two-thirds of its time in females, in whom the nucleotide mutation rate is much lower than in males. In addition, the effective population size of the X chromosome is three-quarters that of the autosomes. Together, these facts contribute to the lower diversity of this chromosome (estimated at about half that of the autosomes). However, the smaller effective population size of the X chromosome also makes genetic drift act faster than in other chromosomal regions, so population structure is expected to be more pronounced. Consequently, populations should differ more in their X chromosomes than in their autosomes. Linkage disequilibrium is also greater on the X chromosome, because only the two-thirds of X chromosomes carried by females recombine in each generation. Segments with a single genetic history are therefore expected to be longer than on the autosomes, which again makes this chromosome well suited to human population genetic studies. Finally, haplotypes on the X chromosome are easy to determine in males, who carry a single copy, and it has therefore proved to be a good choice for haplotype-based phylogenetic studies.1
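The "about half" figure can be roughly reproduced from the two factors just described. The sketch below is a minimal illustration under stated assumptions: equal numbers of breeding males and females, neutral variation, and a male-to-female mutation-rate ratio `alpha` supplied by the user. `alpha` is a hypothetical input that this study does not estimate. The sex-averaged X rate weights females by 2/3 and males by 1/3, while autosomes weight each sex by 1/2. The effective-size factor of 3/4 is then applied on top.

```python
"""Expected X-to-autosome diversity ratio under simple neutral assumptions.

Assumptions (illustrative, not from the study): equal numbers of breeding
males and females, neutral sites, and a male-to-female mutation-rate
ratio `alpha` chosen by the user.
"""


def x_to_autosome_mutation_ratio(alpha: float) -> float:
    """Sex-averaged mutation rate on the X relative to the autosomes.

    Rates are in units of the female rate (female rate = 1, male = alpha).
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    mu_x = (2.0 + alpha) / 3.0  # X: 2/3 of generations in females, 1/3 in males
    mu_a = (1.0 + alpha) / 2.0  # autosome: half of generations in each sex
    return mu_x / mu_a


def x_to_autosome_diversity_ratio(alpha: float) -> float:
    """Expected theta_X / theta_A, with theta = 4 * Ne * mu and Ne_X = 3/4 Ne_A."""
    return 0.75 * x_to_autosome_mutation_ratio(alpha)


if __name__ == "__main__":
    for alpha in (1.0, 2.0, 5.0, 10.0):
        ratio = x_to_autosome_diversity_ratio(alpha)
        print(f"alpha={alpha:>4}: theta_X/theta_A = {ratio:.3f}")
```

With alpha = 1, the ratio is 0.75, reflecting effective size alone. As alpha grows, the mutation-rate ratio tends to 2/3 and the diversity ratio tends to 1/2, which is consistent with the estimate quoted above.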

X chromosome variation has been applied successfully in large-scale geographical studies addressing the origin of non-African populations, with results supporting the 'Out of Africa' model.2, 3 However, the use of the X chromosome in fine-scale geographical studies has been negligible compared with the large number of human population studies based on Y chromosome and mtDNA variation.

Alu repeats are a family of short interspersed nuclear elements (SINEs) widely distributed in the genomes of all primates. At less than 500 bp in length, Alu elements are the most abundant mobile elements in the human genome (about 1.1 million copies). Twenty-five percent of the 'young' human-specific Alu elements were inserted into the human genome so recently that they are dimorphic for the presence or absence of the insertion. Consequently, individuals can be polymorphic for the presence or absence of an Alu element at a particular chromosomal location. Alu elements have proved useful in human evolution studies because they offer two important advantages over other polymorphic markers. First, they are identical by descent: individuals who share an Alu insertion polymorphism have inherited it from a common ancestor. Second, the ancestral state of each Alu insertion polymorphism is known to be the absence of the element, so these polymorphisms can be used to 'polarize' the evolutionary process.4

The first thorough assessment of Alu-associated variability on the X chromosome was carried out in 2003 by Callinan et al.5 That study identified 345 Alu elements from eight young Alu subfamilies, 264 of which were on the X chromosome. Of these, 16 were polymorphic, with heterozygosity levels that varied with the origin of each population. These heterozygosity data suggested that X chromosome Alu elements would be useful genetic markers for human population genetics. The data also showed a slight reduction in Alu polymorphism on the human sex chromosomes, consistent with the lower diversity expected for the X chromosome. Pereira et al6 described a new polymorphic Alu element of the Ya5 subfamily, named DXS225. It lies within a LINE-1 retrotransposon, a type of region not previously examined, which suggests that more polymorphic Alu elements remain to be found on the X chromosome.

So far, the potential of X chromosome Alu polymorphisms has not been tested with a large panel of such markers in several anthropologically well-defined human populations. The Mediterranean region is an interesting setting in which to test the efficacy of these markers. Previous anthropological studies of Alu polymorphisms in this region have reached conflicting conclusions: some authors support gene flow among Mediterranean groups, whereas others infer weaker genetic relationships between populations.7, 8, 9 Two levels of gene flow in this region merit investigation: among Mediterranean groups, and between Mediterranean and sub-Saharan groups.

In this study, 13 Alu elements distributed along the whole X chromosome were analyzed for the first time in six anthropologically well-defined groups selected on geographical, historical and ethnic criteria. These were the Basques, several Mediterranean groups and a sub-Saharan sample from the Ivory Coast. The objectives were: (i) to explore the variation of polymorphic X chromosome Alu repeats in well-defined human populations; (ii) to apply this variation to the study of population relationships among South European and North African groups, and of sub-Saharan influences on these populations; and (iii) to provide new evidence of the usefulness of these markers for uncovering genetic variation between human populations.

Materials and methods

DNA samples

Blood samples for DNA extraction and Alu amplification were collected from 525 individuals belonging to six populations, each from a different country. Samples were obtained, with informed consent, from healthy, unrelated participants of both sexes. All four grandparents of every participant were born in the same region. The work was approved by the Ethical Committee of the University of Barcelona. Two of the populations were anthropologically well-defined Berber groups from Morocco and Egypt (High Atlas and Siwa Oasis, respectively). The other samples came from Monastir (a region in north-central Tunisia), the Basque Country (northern Spain) and Crete Island (Greece). Finally, a sample from the Ahizi ethnic group of the Ivory Coast was genotyped to represent sub-Saharan African variation.

Genetic determinations

PCR amplification of the 13 X chromosome Alu sequences (Ya5DP62, Ya5DP57, Yb8DP49, Ya5a2DP1, Yb8DP2, Ya5DP3, Ya5NBC37, Yd3JX437, Ya5DP77, Ya5NBC491, Yb8NBC578, Ya5DP4 and Ya5DP13) was carried out in 20-μl reactions. Primer sequences are given in Callinan et al,5 whose amplification and electrophoresis conditions were followed with minor modifications.

Statistical analysis

Allele frequencies were calculated by direct gene counting. χ2 and Fisher's exact tests were used to detect significant differences between male and female allele frequencies and to check Hardy–Weinberg equilibrium at each locus.10 These tests were run with the GENEPOP v3.3 statistical package.11 Reynolds' distances12 between pairs of populations were calculated with the PHYLIP package,13 and their consistency was checked by bootstrap resampling (100 replicates). Principal component analysis (PCA) was carried out with the R-MATRIX program.14 Polymorphism and population structure within and between groups were tested by analysis of molecular variance (AMOVA) using ARLEQUIN v2.0.15 Global FST values were estimated by averaging the per-locus values, and the associated probability was obtained by combining the probabilities of the individual tests.16
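The next sketch illustrates the distance and bootstrap step for presence/absence loci. It uses the large-sample coancestry estimate θ associated with Reynolds, Weir and Cockerham (1983) and D = −ln(1 − θ). It also computes a bootstrap standard error obtained by resampling loci with replacement. PHYLIP's implementation may differ in detail, for example in whether the log transform is applied, so this code is not expected to reproduce Table 2. The function names and the toy frequencies are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reynolds_distance(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Reynolds-Weir-Cockerham distance D = -ln(1 - theta) for biallelic loci.

    p1[l] and p2[l] are Alu+ frequencies at locus l in two populations.
    theta is the large-sample estimate
        sum_l sum_u (x_lu - y_lu)^2 / (2 * sum_l (1 - sum_u x_lu * y_lu)),
    where u runs over the two alleles (Alu+ and Alu-) of each locus.
    """
    if len(p1) != len(p2):
        raise ValueError("both populations need the same loci")
    num = 0.0
    den = 0.0
    for x, y in zip(p1, p2):
        xs, ys = (x, 1.0 - x), (y, 1.0 - y)
        num += sum((a - b) ** 2 for a, b in zip(xs, ys))
        den += 1.0 - sum(a * b for a, b in zip(xs, ys))
    if den == 0.0:
        return 0.0  # every locus fixed for the same allele in both samples
    theta = num / (2.0 * den)
    if theta >= 1.0:
        return math.inf  # fixed for different alleles at every locus
    return -math.log(1.0 - theta)


def bootstrap_se(p1: Sequence[float], p2: Sequence[float],
                 n_reps: int = 100, seed: int = 1) -> float:
    """Standard error of the distance, resampling loci with replacement."""
    rng = random.Random(seed)
    n = len(p1)
    reps: List[float] = []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(reynolds_distance([p1[i] for i in idx],
                                      [p2[i] for i in idx]))
    mean = sum(reps) / n_reps
    return math.sqrt(sum((d - mean) ** 2 for d in reps) / (n_reps - 1))


# Toy Alu+ frequencies at five loci (made-up numbers, not Table 1 values)
pop_a = [0.30, 0.55, 0.80, 0.10, 0.65]
pop_b = [0.35, 0.50, 0.70, 0.05, 0.40]
print(f"D = {reynolds_distance(pop_a, pop_b):.4f} "
      f"(bootstrap SE {bootstrap_se(pop_a, pop_b):.4f})")
```

The loci are resampled rather than individuals. This is because the distance is built from per-locus allele frequencies, and locus-to-locus variation is the main source of uncertainty.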

In addition to the distance-based methods described above, a model-based method implemented in STRUCTURE 2.1 was used to infer population structure.17 The method assumes a model with K population groups (where K may be unknown). This model was tested for several values of K using a Markov chain Monte Carlo algorithm (the Gibbs sampler). For each value of K, STRUCTURE estimates the natural logarithm of the probability of the data, Ln P(X|K). Because these values are negative, the K with the highest Ln P(X|K), that is, the lowest absolute value, best describes the data. We ran the Gibbs sampler under the admixture model (INFERALPHA=1.0), using prior population information and assuming correlated allele frequencies. Each run comprised a burn-in period of 50 000 iterations followed by 1 000 000 iterations. Each run was repeated three times to check the consistency of the results.
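Choosing K can be made concrete. One approach described in the STRUCTURE documentation converts Ln P(X|K) into approximate posterior probabilities of K under a uniform prior over the K values tried. The sketch below does this with a numerically stable normalization. The Ln P(X|K) values are hypothetical and are not taken from Table 3.

```python
import math
from typing import Dict


def best_k(ln_p_by_k: Dict[int, float]) -> int:
    """K with the highest Ln P(X|K), i.e. the smallest absolute value."""
    return max(ln_p_by_k, key=ln_p_by_k.get)


def posterior_over_k(ln_p_by_k: Dict[int, float]) -> Dict[int, float]:
    """Pr(K | X), assuming a uniform prior over the K values that were run."""
    top = max(ln_p_by_k.values())  # subtract the maximum before exponentiating
    weights = {k: math.exp(v - top) for k, v in ln_p_by_k.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical Ln P(X|K) values, for illustration only (not Table 3)
ln_p = {1: -4210.5, 2: -4190.2, 3: -4181.7, 4: -4195.0}
print(best_k(ln_p))
print({k: round(v, 4) for k, v in posterior_over_k(ln_p).items()})
```

Differences of only a few log units translate into very lopsided posteriors. For this reason, repeating each run, as was done here, helps confirm that the ranking of K values is stable.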

Results

Allele frequencies, Hardy–Weinberg equilibrium and heterozygosity

Table 1 shows the frequencies of the 13 Alu insertion alleles (Alu+) in the six populations. Some Alu elements appear to be fixed in several populations. In most cases, the genotype distributions fitted Hardy–Weinberg equilibrium. Nominal deviations were found for Ya5DP62 in the Ivory Coast (P=0.003), Ya5DP57 in the Basques (P=0.007) and Ya5NBC491 in Tunisia (P=0.043). None of these was significant after Bonferroni correction. Heterozygosity levels per locus and per population are also shown in Table 1. Most Alu elements show moderate to high diversity, except Ya5NBC491, Yb8NBC578, Ya5DP4 and Ya5DP13. The Ya5NBC37 Alu shows the highest heterozygosity (0.436), followed by the Ya5DP57, Yb8DP49, Ya5a2DP1 and Yb8DP2 Alu elements. Likewise, the most diverse population is the Ivory Coast (H=0.342), followed by the North African (average H=0.250) and South European populations (average H=0.209). No significant linkage disequilibrium was found between any pair of Alu markers. This is in accordance with the large physical distances between them, which range from 560 kb to 59.65 Mb.
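Males are hemizygous for the X chromosome, so direct gene counting at these loci must count one allele per male and two per female. The sketch below shows that count, expected heterozygosity 2p(1 − p) for a presence/absence locus (with an optional small-sample correction), and a Bonferroni threshold. Several choices here are assumptions rather than details from the study: the genotype encoding is hypothetical, the exact heterozygosity estimator used for Table 1 is not stated in the text, and the size of the Bonferroni family (13 loci × 6 populations = 78 tests) is assumed.

```python
from typing import Iterable, Optional, Tuple


def x_linked_alu_frequency(genotypes: Iterable[Tuple[str, int]]) -> float:
    """Alu+ frequency by direct gene counting at an X-linked locus.

    Each genotype is (sex, number of Alu+ copies). Males ('M') are
    hemizygous and contribute one allele (0 or 1 copies); females ('F')
    contribute two alleles (0, 1 or 2 copies).
    """
    alu_plus = 0
    n_alleles = 0
    for sex, copies in genotypes:
        if sex == "M" and copies in (0, 1):
            n_alleles += 1
        elif sex == "F" and copies in (0, 1, 2):
            n_alleles += 2
        else:
            raise ValueError(f"invalid genotype {(sex, copies)}")
        alu_plus += copies
    return alu_plus / n_alleles


def expected_heterozygosity(p: float, n_alleles: Optional[int] = None) -> float:
    """2p(1-p); optionally with Nei's n/(n-1) small-sample correction."""
    h = 2.0 * p * (1.0 - p)
    if n_alleles is not None and n_alleles > 1:
        h *= n_alleles / (n_alleles - 1)
    return h


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level controlling the family-wise error rate."""
    return alpha / n_tests


# Toy sample (made up): two males and three females
sample = [("M", 1), ("M", 0), ("F", 2), ("F", 1), ("F", 0)]
p = x_linked_alu_frequency(sample)
print(p, expected_heterozygosity(p))

# Assumed family of 13 loci x 6 populations = 78 tests
threshold = bonferroni_threshold(0.05, 13 * 6)
print([raw_p < threshold for raw_p in (0.003, 0.007, 0.043)])
```

Under this assumed family size, the per-test threshold is about 0.00064. All three nominal P values above are larger than that, which is consistent with their loss of significance after correction.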

Table 1 X chromosome Alu insertion frequencies and heterozygosity per locus and per population

Reynolds' distances and principal components analysis

Pairwise Reynolds' distances between populations are given in Table 2. Bootstrap resampling showed that the distance values were highly consistent. The lowest distance was found between Crete Island and the Basque Country (0.0163), and this is the only value not significantly different from zero. The mean distance from the Ivory Coast to the other five samples is 0.2650. By contrast, the mean distance among the Mediterranean and Basque samples is considerably lower (0.0453).

Table 2 Reynolds' distances (below diagonal) and respective distance errors (above diagonal) for the six populations studied

The PCA provides a graphical representation of population relationships (Figure 1). The first two axes account for 94.31% of the total variance when all six samples are considered. The first axis clearly separates the sub-Saharan Ivory Coast group from the rest. The two European samples lie at the opposite extreme. The three North African samples occupy an intermediate position, although always closer to the Europeans. The second component partially separates the non-sub-Saharan groups, with Siwa Oasis and Tunisia at the most distant positions. When the analysis was repeated without the sub-Saharan sample (not shown), the pattern of relationships among the remaining populations was essentially unchanged. In that analysis, the first axis (64.79% of the total variance) separates Siwa Oasis and High Atlas from the Basques, Crete Island and Tunisia. Notably, Tunisia is the North African group genetically closest to the European samples.
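The percentages of variance per axis can be illustrated without reproducing R-MATRIX.14 The sketch below is a plain PCA, computed by singular value decomposition, on a populations × loci matrix of allele frequencies. Each column is centered on its mean frequency p̄ and divided by √(p̄(1 − p̄)), a scaling in the spirit of relationship-matrix methods. Whether R-MATRIX uses exactly this weighting is an assumption. The function name and the frequencies are hypothetical.

```python
import numpy as np


def pca_allele_frequencies(freqs):
    """PCA of a populations x loci matrix of Alu+ frequencies.

    Each column is centered on its mean frequency p_bar and divided by
    sqrt(p_bar * (1 - p_bar)); loci fixed in every sample are dropped.
    Returns (population scores, fraction of variance per axis).
    """
    f = np.asarray(freqs, dtype=float)
    p_bar = f.mean(axis=0)
    scale = np.sqrt(p_bar * (1.0 - p_bar))
    keep = scale > 0  # monomorphic loci carry no information
    z = (f[:, keep] - p_bar[keep]) / scale[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u * s                      # coordinates of each population
    var_frac = s ** 2 / np.sum(s ** 2)  # share of total variance per axis
    return scores, var_frac


# Toy frequencies for four populations at five loci (made up)
toy = [
    [0.10, 0.60, 0.30, 0.90, 0.00],
    [0.40, 0.50, 0.35, 0.80, 0.05],
    [0.45, 0.45, 0.40, 0.85, 0.10],
    [0.50, 0.40, 0.45, 0.75, 0.00],
]
scores, var_frac = pca_allele_frequencies(toy)
print(np.round(var_frac[:2].sum() * 100, 2), "% of variance on the first two axes")
```

With only a handful of populations, the centered matrix has rank at most one less than the number of samples. The first two axes can therefore capture most of the variance, as in Figure 1.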

Figure 1

Principal components analysis of the Ivory Coast, Basque Country, Crete Island, High Atlas, Siwa Oasis and Tunisia. The first two axes account for 94.31% of the total variance. The percentage of variance accounted for by each axis is shown in parentheses.

Analysis of molecular variance

As a first approach to population diversity, the FST statistic yields a global value of 9.88% (P<0.001) when all six populations are considered together (data not shown). In other words, almost 10% of the detected genetic variance lies between populations. Per-locus FST values range from 0.99% (P<0.05, Ya5NBC37) to 40.38% (P<0.0001, Ya5DP77). The global FST of 9.88% is mainly attributable to two factors: the variation at the Ya5DP57, Ya5a2DP1, Yb8DP2, Yd3JX437 and Ya5DP77 Alu markers, and the inclusion of the Ivory Coast sample. When the Ivory Coast is excluded, the global FST falls to a more moderate 3.81% (P<0.001). This value mainly reflects variation at the Ya5DP57 Alu polymorphism (individual FST=11.50%, P<0.0001), whose allele frequencies in Siwa Oasis are clearly distinct.
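The per-locus and global FST values above come from ARLEQUIN's AMOVA, which this sketch does not reproduce. To show what "averaging partial values" and "combining probabilities" can mean in practice, the code below makes two substitutions. It uses a simpler Nei-style estimator, FST = (H_T − H_S)/H_T, without sample-size correction. It also combines independent P values with Fisher's method. Whether reference 16 used Fisher's method is an assumption. Function names and toy numbers are hypothetical.

```python
import math
from typing import Sequence


def locus_fst(freqs: Sequence[float]) -> float:
    """FST = (H_T - H_S) / H_T at one biallelic locus (populations weighted equally).

    freqs holds the Alu+ frequency in each population. No sample-size
    correction is applied, unlike ARLEQUIN's AMOVA-based estimator.
    """
    k = len(freqs)
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k  # mean within-population
    p_bar = sum(freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)                   # pooled expectation
    if h_t == 0.0:
        raise ValueError("locus is monomorphic across all populations")
    return (h_t - h_s) / h_t


def global_fst(freq_table: Sequence[Sequence[float]]) -> float:
    """Average of per-locus FST over loci polymorphic in the pooled sample."""
    values = [locus_fst(row) for row in freq_table
              if 0.0 < sum(row) / len(row) < 1.0]
    return sum(values) / len(values)


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Fisher's method: -2 * sum(ln p) ~ chi-square with 2k df under H0."""
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half the test statistic
    # Closed-form chi-square survival function for even df = 2k
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy frequencies: three loci x three populations (made up)
table = [[0.20, 0.30, 0.80], [0.50, 0.55, 0.45], [0.90, 0.70, 0.95]]
print(f"global FST = {global_fst(table):.4f}")
print(f"combined P = {fisher_combined_p([0.04, 0.20, 0.01]):.4g}")
```

The closed-form survival function works here because Fisher's statistic always has an even number of degrees of freedom. This avoids any dependency beyond the standard library.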

A hierarchical AMOVA was run with two geographical groups ('South European' and 'North African') and with the Ivory Coast excluded. It reveals no significant differences between the two shores of the Mediterranean. The variance between the two groups (FCT=1.24%, P=0.338) is clearly lower than the variance among populations within groups (FSC=3.06%, P<0.001). Separate AMOVA tests for the South European and North African samples gave FST values of 0.38% and 3.45%, respectively, both significant (P<0.001).
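In a hierarchical AMOVA, the three F-statistics are ratios of the same variance components: among groups, among populations within groups, and within populations. They therefore satisfy (1 − FST) = (1 − FSC)(1 − FCT). The short sketch below spells this out. The function names are illustrative. The printed value is arithmetic derived from the FCT and FSC reported above, not an FST reported by ARLEQUIN in this study.

```python
def hierarchical_f(var_groups: float, var_pops_in_groups: float,
                   var_within_pops: float):
    """AMOVA-style F-statistics from three variance components."""
    total = var_groups + var_pops_in_groups + var_within_pops
    f_ct = var_groups / total                                     # among groups
    f_sc = var_pops_in_groups / (var_pops_in_groups + var_within_pops)
    f_st = (var_groups + var_pops_in_groups) / total              # among all pops
    return f_ct, f_sc, f_st


def implied_fst(f_ct: float, f_sc: float) -> float:
    """Overall FST implied by (1 - FST) = (1 - FSC) * (1 - FCT)."""
    return 1.0 - (1.0 - f_sc) * (1.0 - f_ct)


# FCT and FSC from the two-group AMOVA above, as proportions
print(f"{implied_fst(0.0124, 0.0306):.4f}")  # 0.0426
```

The identity also shows why a nonsignificant FCT can coexist with a significant FSC. Most of the between-population variance sits at the within-group level.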

Model-based inference of population structure

To explore the degree of genetic structure among our samples, we created a STRUCTURE data file containing all six populations. We then estimated the posterior probabilities of K from the Ln P(X|K) values. As shown in Table 3, the model with K=3 fits the data for all six populations best. Table 3 also gives the proportion of membership of each pre-defined population in each of the three clusters inferred for this data set. The membership pattern of the Ivory Coast is the most differentiated, with 77% membership in cluster 1. The European and Tunisian samples show a similar pattern of membership in the three clusters (1:2:2), whereas Siwa Oasis and High Atlas follow a 1:1:1 pattern. As expected, no cluster is exclusive to any one population, although cluster 1 can be regarded as representative of sub-Saharan variation. To examine the two clusters not directly related to sub-Saharan variation, we repeated the runs with only the five non-sub-Saharan groups (Table 3). In this case, the model with K=1 best described the data.
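STRUCTURE reports, for each pre-defined population, its proportion of membership in each cluster. This sketch assumes that this proportion corresponds to averaging the estimated individual ancestry proportions (the Q matrix) over the individuals of that population. The Q matrix and population labels below are made up, and `population_membership` is a hypothetical helper, not part of STRUCTURE.

```python
import numpy as np


def population_membership(q, labels):
    """Mean cluster membership of each pre-defined population.

    q: (n_individuals, K) array of individual ancestry proportions
       (each row sums to 1).
    labels: pre-defined population of each individual.
    """
    q = np.asarray(q, dtype=float)
    labels = np.asarray(labels)
    return {pop: q[labels == pop].mean(axis=0) for pop in np.unique(labels)}


# Made-up Q matrix for four individuals and K = 3 clusters
q = [[0.80, 0.10, 0.10],
     [0.74, 0.16, 0.10],
     [0.20, 0.40, 0.40],
     [0.20, 0.38, 0.42]]
pops = ["A", "A", "B", "B"]
for pop, share in population_membership(q, pops).items():
    print(pop, np.round(share, 2))
```

Because each row of Q sums to 1, each population's averaged membership also sums to 1. This is why patterns such as 1:2:2 or 1:1:1 can be read directly as relative shares.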

Table 3 Estimated natural logarithm of the probability of the data (Ln P(X|K)) for each value of K in three data sets and proportion of membership of each pre-defined population in each of the three clusters

Discussion

This paper describes the frequency distribution of 13 polymorphic X chromosome Alu insertions in five well-defined groups from the Mediterranean region, including the Basque Country, and in one sub-Saharan group from the Ivory Coast. In general, the allele frequencies fall within the ranges described previously,5 but with remarkable variation between populations. The six samples show a pattern of diversity decreasing from south to north: mean heterozygosity is 0.342 in the Ivory Coast, 0.250 in the North African samples and 0.209 in the European samples.

The hierarchical AMOVA of the five Mediterranean groups (Basques included) showed that only a small, nonsignificant part of the genetic variance could be attributed to variation between the northern and southern groups (FCT=1.24%). This indicates no particular genetic differentiation between the two sides of the Mediterranean Sea. However, the markers examined indicate greater diversity within North Africa (FST=3.45%, P<0.001) than within South Europe (FST=0.38%, P<0.001). Comas et al7 and González-Pérez et al8 had previously studied population relationships in the western Mediterranean basin using polymorphic autosomal Alu elements. Those two surveys found a North versus South differentiation (FCT values of 1.80 and 1.96%, respectively) slightly higher than ours. However, the within-group population variation they found (FSC values of 2.30 and 0.47%, respectively) was clearly lower than that revealed by the X chromosome Alu markers (FSC=3.06%). Moreover, the 13 Alu markers of this study reveal a population variation within North Africa (FST=3.45%) about six times higher than that obtained from a similar number of autosomal Alu elements (FST=0.57%; Comas et al, 2000). Thus, X chromosome markers reveal greater population differentiation than the same kind of markers on the autosomes. This possibly reflects the effect of the smaller effective population size of the X chromosome. As noted in the Introduction, this makes populations differ more in their X chromosomes than in their autosomes.

The Reynolds' distances reveal generally accepted relationships among the Mediterranean populations. Notably, the Arabic-speaking sample from Tunisia occupies a particular genetic position compared with the other North African groups: its genetic distances to the European samples are smaller than those to the North African groups. This proximity of Tunisia to the Europeans also appears in the PCA plot (Figure 1). It could be explained by the history of the Tunisian population. According to historical records, this history reflects the influence of the ancient Phoenician settlers of Carthage, followed by Roman, Byzantine, Arab and French occupations, among others. Nevertheless, other explanations cannot be ruled out. These include the relative heterogeneity of present-day Tunisian populations18 and a more limited sub-Saharan genetic influence in this region compared with other North African areas. Genetic drift is another possibility, and its effect may be particularly amplified on the X chromosome.

Another interesting finding is the close relationship between the Basque Country and Crete Island. These two populations have distinct historical, anthropological and cultural backgrounds, yet no significant differences were found between them in a locus-by-locus χ2 comparison. Among the remaining populations, Siwa Oasis appears to be the most differentiated (Table 2 and Figure 1). The differentiation of Siwa Oasis, and also of High Atlas, could be related to greater external genetic contributions: from the Western Sahara into the High Atlas, and from Nile groups into Siwa Oasis. Esteban et al19 described similar GGC allele frequencies of the androgen receptor gene (located on the X chromosome) in the Ivory Coast and Siwa Oasis samples. This provides evidence of sub-Saharan genetic influence in this Berber group.

When all populations were examined, the model-based method inferred three clusters. Cluster 1 (Table 3) is clearly the 'sub-Saharan' one. The other two clusters seem to have biological meaning only in relation to cluster 1. All European and North African samples show a 1:1 membership proportion in clusters 2 and 3, and differences appear only when cluster 1 is taken into account. The Basque Country, Crete Island and Tunisia show a similar pattern of membership in the three clusters (1:2:2), whereas Siwa Oasis and High Atlas follow a 1:1:1 pattern. This may explain why the software failed to detect population structure when only the Mediterranean populations were analyzed (K=1). It implies that there are no population-specific genetic patterns distinctive enough to assign individuals to populations with certainty.

The failure to detect population structure among the Mediterranean groups might seem to contradict our other results. We have already discussed the marked differences between, for example, the Siwa and High Atlas Berbers and the other groups. Such differences do exist, but they are not strong enough to define separate clusters within the Mediterranean region. Clustering appears only when a clearly distinct human group, such as the sub-Saharan Ahizi from the Ivory Coast, is added.

In summary, our X chromosome data generally support the differentiation patterns of Mediterranean populations described by other investigators. They also provide detailed data on the frequency distribution of X chromosome Alu elements. To our knowledge, this is the first time these specific markers have been used in such a study. X chromosome Alu elements appear to perform well in fine-scale population differentiation studies. No fully comparable data yet exist to assess the effect of the smaller effective population size of the X chromosome on genetic distances between populations. Future work should therefore include autosomal markers typed in the same populations. Linkage disequilibrium studies are also advisable, including STR markers, which have higher mutation rates, located close to the X chromosome Alu insertions.