Introduction

Over the last decade, the analysis of Y-chromosomal short tandem repeat (STR) loci has emerged as a powerful tool for paternity testing and forensic casework [1]. The need for information on the distribution of Y-STR haplotypes in human populations, required to obtain reliable frequency estimates, has produced an enormous amount of literature, most of which has been compiled in online reference databases such as the Y Haplotype Reference Database (YHRD) [2]. As novel Y-STRs become conveniently available in multiplex polymerase chain reaction (PCR) format, in addition to those traditionally included in forensic databases and forming the so-called minimal haplotype (minHt), population data on extended Y-chromosomal haplotypes are required. In recent years, there has been increasing interest in the forensic community in complementing Y-STR data with information on Y-chromosomal single nucleotide polymorphisms (SNPs) [3]. These binary markers are characterized by a low mutation rate, so each occurrence can be regarded as a unique event in human evolution. These properties, combined with the absence of recombination in most parts of the Y chromosome and the consequent accumulation of genetic diversity among lineages and populations, make Y-SNPs a valuable tool for predicting the geographic or ethnic origin of unknown samples found in forensic casework [4, 5]. In this study, combined Y-chromosomal SNP and STR variation was analyzed in a sample of Algerian individuals of Arab descent from the Oran area (northwest Algeria).

Materials and methods

Blood samples were drawn from 102 healthy adult men of Arab descent from the Oran area (northwest Algeria). Informed consent was obtained from all subjects, and sampling was anonymous. DNA was extracted by the salting-out method [6]. Y-SNP genotyping was performed by heteroduplex analysis of PCR products, using the primers described by Cinnioglu et al. [7] and the Transgenomic WAVE dHPLC system (Transgenomic, Omaha, NE, USA) [8]. The set of 22 binary markers used in the study and the complete Y-SNP typing results are shown in supplementary Table S1. Data are reported by haplogroup, according to a standardized nomenclature [9]. The diagnostic Y-SNPs defining each haplogroup are indicated in supplementary Table S1. Y-STRs were amplified with the AmpFlSTR Yfiler system (Applied Biosystems, Foster City, CA), according to the manufacturer's instructions, and typed by capillary electrophoresis on the ABI PRISM 310 Genetic Analyzer using GeneScan and Genotyper software (Applied Biosystems). Sequencing of DYS458 intermediate alleles was performed using the Big Dye Terminator Sequencing kit (Applied Biosystems) with primers described by Schoske et al. [10]. Standard diversity indices, pairwise genetic distances (RST) and analysis of molecular variance (AMOVA) were calculated with the software Arlequin version 2.000 [11]. Discrimination capacity (DC) was determined by dividing the number of observed haplotypes by the number of sampled individuals. The neighbor-joining (NJ) tree illustrating the relationship between Algerian Arabs and 11 reference populations based on RST values was constructed using the program PHYLIP 3.67 [12]. Haplotype-frequency surfaces were reconstructed graphically following the Kriging procedure [13] using the Surfer System version 8.05 (Golden Software).

Results and discussion

The list of observed haplogroups is provided in supplementary Table S1. E3b2 was the most frequent haplogroup (45.1%) in the northwest Algerian population sample. As shown by previous studies, E3b2 is very common in northwest Africa, but its frequency declines sharply eastward across north Africa [14], and it is almost completely absent in sub-Saharan Africa. Individuals carrying this haplogroup are also found, at much lower frequencies than in northwest Africa, in Turkey, the Near East, the Balkans and southern Europe, and in Iberia, where its presence is possibly a consequence of recent gene flow due to the Islamic occupation of the peninsula [15]. Another common haplogroup in the Algerian population is J1 (22.5%). Outside the Maghreb, haplogroup J1 was shown to have its highest concentration in Egypt and the Middle East, with declining frequencies towards the east (Indian subcontinent), south (Horn of Africa) and northwest (Turkey, the Balkans and southern Europe) [16, 17]. Haplogroups typically found in European (R1b3) and sub-Saharan African (E3a) populations were also observed, with frequencies of 10.8 and 7.8%, respectively.

The list of observed haplotypes has been submitted to the YHRD and is provided in supplementary Table S2. By Y-STR typing, a total of 93 different 17-loci haplotypes were found in the Algerian population sample, and 88 were unique. Three haplotypes were shared by four, three and two individuals, respectively, all belonging to haplogroup E3b2. Two haplotypes, differing from each other by a single one-repeat unit mismatch at locus DYS389II, were shared by three and two subjects, respectively, carrying haplogroup J1. No cases of haplotype sharing among different haplogroups were observed, even when only minHt loci were considered. The contribution of additional Y-STRs to the haplotype diversity and power of discrimination provided by minHt loci is shown in Table 1. There was good correlation between the gene diversity (GD) of additional loci and their ability to distinguish between Y lineages, as assessed by DC. Such a strict correspondence was not seen for haplotype diversity (h): for example, locus YGATAH4, although less diverse than DYS439, provided a larger percentage increase in h. With the exception of E3b2 and J1, all haplogroups were already subdivided into unique Y lineages by minHt loci. Among individuals sharing haplogroup E3b2, h as determined by minHt markers (0.900 ± 0.029) was low compared with the general sample (0.979 ± 0.007), but after typing additional Y-STRs the difference was almost completely compensated. Diversity between minHts within haplogroup J1 (0.972 ± 0.026) was closer to that observed in the whole population sample, although among the additional loci only DYS458 further contributed to the increase in h and DC.
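The haplotype counts reported above are sufficient to recompute the two summary statistics for the 17-loci haplotypes. The following is a minimal sketch, not the authors' pipeline: DC follows the definition given in Materials and methods, while the diversity function assumes the commonly used unbiased estimator h = n/(n − 1) · (1 − Σ p_i²), which may differ in detail from what Arlequin reports. The function and variable names are illustrative.

```python
def discrimination_capacity(counts):
    """Number of distinct haplotypes divided by number of sampled individuals."""
    return len(counts) / sum(counts)


def haplotype_diversity(counts):
    """Unbiased diversity estimator (assumed here): n/(n-1) * (1 - sum of p_i^2)."""
    n = sum(counts)
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Counts taken from the text: 88 unique haplotypes, plus haplotypes shared by
# 4, 3 and 2 individuals (E3b2) and by 3 and 2 individuals (J1).
counts_17_loci = [1] * 88 + [4, 3, 2, 3, 2]

print("individuals:", sum(counts_17_loci))
print("haplotypes:", len(counts_17_loci))
print("DC:", round(discrimination_capacity(counts_17_loci), 3))
print("h:", round(haplotype_diversity(counts_17_loci), 4))
```

The counts add up to the 102 sampled individuals and 93 distinct haplotypes stated in the text, which is a useful consistency check before comparing the results with Table 1.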

Table 1 Haplotype diversity (h) and discrimination capacity (DC) of minimal and 17-loci haplotypes in the whole Algerian population sample and within major haplogroups

Pairwise genetic distances (RST) calculated between minHts (with the exclusion of DYS385) found in the Arab population sample from Algeria and geographical neighbours from north Africa [14, 18–20] and southern Europe [21–23] are shown in Table S3. Significant RST values were observed between Algerian Arabs and Berbers from Morocco (RST = 0.05962; p < 0.01) and Tunisia (RST = 0.02791; p < 0.01), whereas the only significant comparisons with non-Berber populations from north Africa were with northern Egyptians (RST = 0.01996; p < 0.05) and Tunisian Andalusians (RST = 0.00962; p < 0.05). The NJ tree illustrating the relationship between populations based on RST values is shown in Fig. S1.

The apportionment of haplotype diversity among and within haplogroups in the Algerian population sample, as determined by AMOVA, showed that 71.5% (p < 0.0001) of the total genetic variation was attributable to differences between haplogroups. The most frequent Y-STR alleles within major Algerian haplogroups are shown in supplementary Table S4. The ability of single Y-STRs included in the minHt to clearly differentiate between haplogroups has been described previously; for example, African haplogroup E3a was shown to be typically associated with alleles of fewer than 22 repeat units at locus DYS390 [24]. Among the additional Y-STRs included in the 17-loci multiplex PCR system used in this study, DYS635 completely differentiated the European haplogroup R1b3 (alleles 23 and 24) from Mediterranean and African haplogroups, in which only alleles with fewer than 23 repeats were observed. Similarly, genotypes at locus DYS448 discriminated between R1b3 (alleles 18 and 19) and the African haplogroup E3a (alleles 20–22). Also noteworthy was the high frequency of intermediate alleles (18.2 and 19.2) at locus DYS458 and their constant association with haplogroup J1 in the Algerian population. Although there are data suggesting that locus DYS458 may be affected by a higher mutation rate than other tetrameric Y-STR loci [25], the observed microvariants seem to reflect a single mutational event. They all shared a unique Y-SNP background and, according to the sequencing results, a common repeat sequence structure, [GAAA]16–17AA[GAAA]2. These findings are consistent with other recent population studies indicating that individuals carrying intermediate alleles at locus DYS458 are confined to a distinct subclade of haplogroup J [21, 26]. Further collection of joint Y-SNP and Y-STR population data will help to clarify the meaning of this association in terms of human evolutionary history.
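The link between the sequenced repeat structure and the intermediate allele names can be checked with a short calculation. The sketch below assumes the usual convention that an allele is named by its number of complete repeat units, followed after a decimal point by the number of bases in any incomplete unit, and that only the blocks listed in the reported structure count towards the allele. The function name and input format are illustrative.

```python
def allele_designation(repeat_blocks, repeat_len=4):
    """Name an STR allele from its repeat structure.

    repeat_blocks: list of (motif, copies) tuples in sequence order.
    Blocks whose motif has the full repeat length count as complete units;
    shorter motifs contribute extra bases (an incomplete unit).
    """
    complete = 0
    extra_bases = 0
    for motif, copies in repeat_blocks:
        if len(motif) == repeat_len:
            complete += copies
        else:
            extra_bases += len(motif) * copies
    # Fold any extra bases that add up to a whole unit into the repeat count.
    complete += extra_bases // repeat_len
    extra_bases %= repeat_len
    return f"{complete}.{extra_bases}" if extra_bases else str(complete)


# The two DYS458 structures reported in the text: [GAAA]16-17 AA [GAAA]2
shorter = [("GAAA", 16), ("AA", 1), ("GAAA", 2)]
longer = [("GAAA", 17), ("AA", 1), ("GAAA", 2)]

print(allele_designation(shorter))  # 18.2
print(allele_designation(longer))   # 19.2
```

Both structures carry the same two-base AA insertion, so the two microvariants differ only by one complete GAAA unit, which is in line with the single mutational origin suggested above.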

The absence of recombination in the Y chromosome implies that haplotype diversity within subjects carrying a certain diagnostic SNP is limited to mutations accumulated over generations in the founder haplotype. As a consequence, specific allele combinations observed in a haplotype can be used to a certain extent to predict haplogroup, and hence the geographic or ethnic origin of an unknown biological sample. For instance, the frequency distribution of modal minHts (with the exclusion of DYS385) seen in the Algerian population sample among E3b2 and J1 individuals closely resembles that of the corresponding haplogroups (Figs. 1 and 2).

Fig. 1 Contour maps created by the Kriging method showing the frequency distributions of: (a) haplogroup E3b2, based on this study and literature data [14–16]; (b) the combination of the two most common haplotypes observed in Algerians carrying haplogroup E3b2 (DYS19:13; DYS389I:14; DYS389II:30; DYS390:24; DYS391:9/10; DYS392:11; DYS393:13), based on this study and data from three continental metapopulations (Europe, Asia, Africa) included in the YHRD (release 21). For geographical areas not covered in the YHRD, like Morocco and Palestine, population data from Quintana-Murci et al. [18] and Nebel et al. [28] were used, respectively. Data from Nebel et al. on Jews and Palestinian Arabs do not include locus DYS389I/II.

Fig. 2 Contour maps created by the Kriging method showing the frequency distributions of: (a) haplogroup J1, based on this study and literature data [16, 17]; (b) the combination of the two most common haplotypes observed in Algerians carrying haplogroup J1 (DYS19:14; DYS389I:13; DYS389II:29/30; DYS390:23; DYS391:11; DYS392:11; DYS393:12), based on this study and data from three continental metapopulations (Europe, Asia, Africa) included in the YHRD (release 21). For geographical areas not covered in the YHRD, like Morocco and Palestine, population data from Quintana-Murci et al. [18] and Nebel et al. [28] were used, respectively. Data from Nebel et al. on Jews and Palestinian Arabs do not include locus DYS389I/II.
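The way a modal haplotype can point to a candidate haplogroup can be sketched as a simple lookup. The example below encodes only the two modal minHt patterns (without DYS385) given in the captions of Figs. 1 and 2; the dictionary layout, function name and query profile are hypothetical. An exact match is treated here only as a hint, not as a probability of haplogroup membership, and a practical predictor would also weigh haplotype frequencies from reference data.

```python
# Modal minHt allele patterns (without DYS385) as listed in the figure captions.
# A set with two values encodes the two most common haplotypes (e.g. 9/10).
MODAL_MINHT = {
    "E3b2": {"DYS19": {13}, "DYS389I": {14}, "DYS389II": {30}, "DYS390": {24},
             "DYS391": {9, 10}, "DYS392": {11}, "DYS393": {13}},
    "J1": {"DYS19": {14}, "DYS389I": {13}, "DYS389II": {29, 30}, "DYS390": {23},
           "DYS391": {11}, "DYS392": {11}, "DYS393": {12}},
}


def matching_haplogroups(haplotype):
    """Return haplogroups whose modal pattern the haplotype matches at every locus.

    haplotype: dict mapping locus name to the observed allele.
    A locus missing from the haplotype counts as a mismatch.
    """
    hits = []
    for haplogroup, pattern in MODAL_MINHT.items():
        if all(haplotype.get(locus) in allowed for locus, allowed in pattern.items()):
            hits.append(haplogroup)
    return hits


# Hypothetical query profile, for illustration only.
query = {"DYS19": 13, "DYS389I": 14, "DYS389II": 30, "DYS390": 24,
         "DYS391": 9, "DYS392": 11, "DYS393": 13}
print(matching_haplogroups(query))  # ['E3b2']
```

Profiles lacking DYS389I/II, as in some of the reference data cited in the captions, would never match under this strict rule; relaxing it to the loci actually typed is a design choice left open here.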

In this respect, the availability of detailed information on the joint distribution of Y-SNPs and Y-STRs in human populations can be of great benefit to forensic experts. The analysis of large Y-SNP panels for in-depth resolution of haplogroups is still a complex and DNA-consuming procedure [5, 27] and cannot be routinely performed on small stains, like those commonly analyzed in forensic casework, which have to be preserved for STR multilocus profiling and individual identification. However, a priori knowledge about the most likely SNP background of a certain Y-chromosomal haplotype can help investigators to infer the provenance of the subject who left the evidence. At the same time, information on the geographic and ethnic distribution of Y-SNP haplogroups and their internal Y-STR variability can assist the expert in estimating the frequency of a particular haplotype in populations for which a specific reference database of Y-STR haplotypes is not yet available. To integrate Y-STR and Y-SNP data in convenient, online databases, issues such as the selection of binary polymorphisms to generate a standard level of tree resolution, the choice of suitable technologies for Y-SNP typing, Y-SNP nomenclature, and quality control of participating laboratories will have to be addressed by the forensic community.