
Saturday, August 19, 2017

Genetic and archaeological continuity from Khvalynsk to Yamnaya

Over a year ago, using the D-stats/nMonte method of mixture modeling (see here), I noticed that Yamnaya did not appear to be simply a two-way mixture between Eastern European and Caucasus Hunter-Gatherers (EHG and CHG, respectively), but the result of a much more complex process:

Using the most plausible reference samples currently available - almost all of them older than Yamnaya, and thus unlikely to skew the results with Yamnaya admixture - reveals the following models for the two Yamnaya sets from Kalmykia and Samara, respectively.

Khvalynsk 57.7
Kotias 28.3
Hungary_EN 12.9
Ulchi 1.1
AfontovaGora3 0
Anatolia_Neolithic 0
Karelia_HG 0
Loschbour 0
MA1 0
Motala_HG 0

distance%=1.9125 / distance=0.019125

Khvalynsk 56.75
Kotias 26.4
Hungary_EN 10.85
Karelia_HG 4.4
Loschbour 1.6
AfontovaGora3 0
Anatolia_Neolithic 0
MA1 0
Motala_HG 0
Ulchi 0

distance%=2.1354 / distance=0.021354

Very interesting but hardly surprising. Essentially what we're seeing there is potentially very strong genetic continuity from the Eneolithic to the Early Bronze Age on the Pontic-Caspian Steppe. In other words, from Khvalynsk to Yamnaya.

However, at some point between the Eneolithic and the Early Bronze Age, the steppes saw a major influx of extra CHG, represented by the ~27% of Kotias-related admixture. Considering the relevant uniparental data, with lots of Y-HG R1b and no Y-HG J among Yamnaya males, I'd say this CHG came with women.

Also, the relatively high admixture related to early Hungarian Plain farmers (Hungary_EN) is a fairly curious detail that has not been reported before. If real, it probably represents gene flow from the Neolithic and/or Chalcolithic Balkans to the Pontic-Caspian Steppe. Again, in all likelihood it mostly came with women, perhaps from Tripolye-Cucuteni and/or Varna communities.
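For anyone new to this kind of modeling, the distance figures in the models above describe how far the fitted mixture sits from the target, with distance% being simply that distance multiplied by 100. Below is a minimal Python sketch of the general idea, not the actual nMonte script: the source names and coordinates are invented toy values, and the exhaustive search in 1% steps is a stand-in chosen for simplicity rather than the Monte Carlo approach the real tool takes.

```python
import math
from itertools import product

# Toy coordinates, invented for illustration only (not real D-stats).
SOURCES = {
    "SourceA": [0.10, 0.02, -0.05, 0.03],
    "SourceB": [-0.04, 0.08, 0.01, -0.02],
    "SourceC": [0.00, -0.06, 0.07, 0.05],
}

def mix(weights, sources):
    """Weighted average of the source vectors."""
    dims = len(next(iter(sources.values())))
    return [sum(w * vec[d] for w, vec in zip(weights, sources.values()))
            for d in range(dims)]

def distance(target, fitted):
    """Euclidean distance between the target and the fitted mixture."""
    return math.sqrt(sum((t - f) ** 2 for t, f in zip(target, fitted)))

def best_mixture(target, sources, step=100):
    """Try every set of weights in 1/step increments that sums to 1."""
    best = None
    for combo in product(range(step + 1), repeat=len(sources) - 1):
        last = step - sum(combo)
        if last < 0:
            continue
        counts = combo + (last,)
        weights = [c / step for c in counts]
        d = distance(target, mix(weights, sources))
        if best is None or d < best[0]:
            best = (d, counts)
    return best

# Build a toy target as a known 60/30/10 mix, then check that it is recovered.
target = mix([0.6, 0.3, 0.1], SOURCES)
dist, counts = best_mixture(target, SOURCES)
for name, c in zip(SOURCES, counts):
    print(f"{name} {c}")
print(f"distance%={dist * 100:.4f} / distance={dist:.6f}")
```

With real data the target is never an exact mix of the references, so the distance stays above zero, as it does in the two Yamnaya models.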

The reason I mention this now is because I can reproduce basically the same model using the updated qpAdm methodology described recently in Lazaridis et al. 2017, which relies on a relatively large number (≥16) of ancient genomes/populations as outgroups (see here), and, in my experience, causes many formerly successful models to fail miserably (P-value dives from >0.05 to <0.05). Note that in my dataset Khvalynsk is now labeled Samara_Eneolithic, Kotias as CHG, and Hungary_EN as Hungary_N.

CHG 0.334±0.044
Hungary_N 0.115±0.031
Samara_Eneolithic 0.550±0.032
P-value 0.419775785
chisq 13.368
Full output

CHG 0.267±0.040
Hungary_N 0.130±0.027
Samara_Eneolithic 0.603±0.030
P-value 0.300777879
chisq 15.106
Full output

Here's a formerly successful model in which Steppe_EMBA (a grouping which includes Afanasievo, Poltavka, Russia_EBA and Yamnaya) is posited as a mixture between EHG and Chalcolithic farmers from the Zagros Mountains in what is now Iran. It clearly fails when I use CHG as one of the outgroups.

EHG 0.544±0.020
Iran_ChL 0.456±0.020
P-value 0.00279643007
chisq 31.553
Full output


CHG 0.310±0.034
Hungary_N 0.121±0.023
Samara_Eneolithic 0.568±0.025
P-value 0.50194795
chisq 12.316
Full output
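As a rough sanity check on the numbers above, the P-value in each block is just the upper-tail probability of the chi-square statistic. The sketch below uses only the Python standard library. The degrees of freedom are not stated in this post, so the value of 13 is an assumption on my part, picked because it reproduces the reported P-values; the linked full outputs would be the place to confirm it.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    a = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-a) * sum_{k=0}^{df/2-1} a^k / k!
        total = sum(a ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-a) * total
    # Odd df: erfc(sqrt(a)) + exp(-a) * sum_{k=1}^{(df-1)/2} a^(k-1/2) / Gamma(k+1/2)
    total = sum(a ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(a)) + math.exp(-a) * total

ASSUMED_DF = 13  # assumption, not stated in the post

# (chisq, reported P-value) pairs copied from the four models above.
REPORTED = [
    (13.368, 0.419775785),
    (15.106, 0.300777879),
    (31.553, 0.00279643007),
    (12.316, 0.50194795),
]

for chisq, p_reported in REPORTED:
    p = chi2_sf(chisq, ASSUMED_DF)
    verdict = "passes" if p > 0.05 else "fails"
    print(f"chisq={chisq}: P={p:.6f} (reported {p_reported:.6f}), model {verdict}")
```

The third model is the only one that drops below the 0.05 threshold, which is the failure described in the text.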

Now, tight statistical fits are great, but they don't always reflect reality, especially when fine scale genetic structure is being tested. So does my model have any support from archaeology? In other words, does archaeological data show continuity between Khvalynsk and Yamnaya (Pit-Grave culture)? According to Morgunova and Turetskij 2016 it does. Emphasis is mine:

Abstract: The aim of the paper is to provide the research results concerning the Pit-Grave culture sites of the south Ural region, which is a part of the Volga-Ural interfluve. The Pit-Grave culture developed mostly out of the Khvalynsk Eneolithic culture at the turn of the 5th–4th millennium cal BC. People of the Sredny Stog and forest-steppe Eneolithic cultures from the Middle Volga region also influenced the Pit-Grave culture. The paper considers the radiocarbon data (more than 120 dates), specifies the periodization of the Pit-Grave culture of the Volga-Ural interfluve, singles out the three stages of its development. The chronology of the culture is determined 3900–2300 cal BC. The authors provide new information about the Pit-Grave economy. Paleopedology, palynology, anthropology, metallography, ceramic technical, and technological analyses were used together with archaeological methods to make a more detailed description of the culture.


A number of steppe Eneolithic features remained at the Repin stage. The cultural continuity between the Pit-Grave, Khvalynsk, and Sredny Stog Eneolithic cultures was proved by the following features: skeletons in crouched supine position with bent legs to the left or to the right, heads at the eastern sector of burials, ochre coverage with high or low density, multiple burials, egg-shaped ceramics with neck and crushed shell impurity. Technical and technological analysis of pottery was another evidence demonstrating the pottery continuity between the Khvalynsk and Repin traditions (Vasilyeva 2002; Salugina 2005). Big soil burial grounds were substituted by individual burials under the barrow. The spread of local production copper articles was a distinctive feature of the Pit-Grave culture. This was the phenomenon, which archaeologists consider to be the beginning of the Early Bronze Age in steppe of Eastern Europe.

Morgunova N. and Turetskij M., Archaeological and natural scientific studies of Pit-Grave culture barrows in the Volga-Ural interfluve, Estonian Journal of Archaeology, Vol. 20, Issue 2, doi: 10.3176/arch.2016.2.02

Friday, August 18, 2017

So far so good for the Kurgan hypothesis

This is basically what I'm seeing in the ancient DNA published to date. Thus, the Kurgan hypothesis or steppe theory, which, of course, posits that the Proto-Indo-European homeland was on the Pontic-Caspian steppe, is looking really good at this stage. Indeed, unless there are some ancient DNA shocks on the way from, say, Anatolia or the Caucasus, that might buck the trend, then this one's in the bag.

See also...

A Bronze Age dominion from the Atlantic to the Altai

A homeland, but not the homeland #2

The Out-of-India Theory (OIT) challenge: can we hear a viable argument for once?

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, August 16, 2017

A homeland, but not the homeland #2

Back in May, in a post titled A homeland but not the homeland, I said this:

It seems increasingly likely that ancient DNA has identified a massive expansion, or a series of expansions, from Mesopotamia and/or surrounds in basically all directions dating to the Chalcolithic (ChL) and Bronze Age (BA). This phenomenon is mainly characterized by the simultaneous spread of:

- Iran_ChL-related genome-wide ancestry

- Y-haplogroup J

- South Caspian-specific mitochondrial haplogroups such as R2 and U7

In the same post I also included a list of ancient populations that showed at least two of these characteristics. I can now add two more populations to this list: the Minoans and Mycenaeans.

- Anatolia_BA, Western Turkey, 2836-1800 calBCE (Lazaridis et al. 2017)

- Egyptian mummies, Middle Egypt, 776-2 calBCE (Schuenemann et al. 2017)

- Iran_ChL, Western Iran, 4839-3796 calBCE (Lazaridis et al. 2016)

- Levant_BA, Northwestern Jordan, 2489-1966 calBCE (Lazaridis et al. 2016)

- Minoans, Crete, Greece, 2900-1700 BCE (Lazaridis et al. 2017)

- Mycenaeans, Greece, 1700-1200 BCE (Lazaridis et al. 2017)

- Sidon_BA, Southern Lebanon, 1750-1600 BCE (Haber et al. 2017)

Out of all of these groups, only the Mycenaeans are generally accepted to have been speakers of an Indo-European language. However, they differ from the others in that they harbor minor but significant ancestry from a source, or multiple sources, closely related to Yamnaya, Sintashta and other Bronze Age peoples of the Pontic-Caspian steppe (see here).

Possible question for the discussion in the comments: what does this say about where the Mycenaeans got their Indo-European language? Also, who wants to bet that Bronze Age samples from the Indus Valley Civilization will also make it onto my list?

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Monday, August 14, 2017

CHG or no CHG in Bronze Age western Iberia?

Here's what Martiniano et al. had to say recently in regards to the genetic shifts in what is now Portugal, western Iberia, during the Bronze Age that they saw in their ancient DNA data:

A recurring feature of ADMIXTURE analyses of ancient northern Europeans is the appearance and subsequent dissemination within the Bronze Age of a component (teal) that is earliest identified in our dataset in HGs from the Caucasus (CHG). Unlike contemporaries elsewhere (but similarly to earlier Hungarian BA), Portuguese BA individuals show no signal of this component, although a slight but discernible increase in European HG ancestry (red component) is apparent. D-Statistic tests would suggest this increase is associated not with Western HG ancestry, but instead reveal significant introgression from several steppe populations into the Portuguese BA relative to the preceding LNCA (S4 Text, S6 Table).


In the present analysis, fineSTRUCTURE has identified the 3 Portuguese Bronze Age individuals as a genetically distinct population (S23 Fig). When compared to Central or Northern European populations such as Ireland [11], the degree of discontinuity between the Neolithic and Bronze Age in Portugal is not pronounced. However, despite the small sample size we have evidence suggesting complete discontinuity at the level of Y-chromosome lineages with all 3 male Bronze Age samples presenting derived alleles at marker M269.

Although in ADMIXTURE analysis we were not able to observe the presence of the CHG-related cluster in the ancestry proportions of the Portuguese Bronze Age samples, with D(Mbuti, X; Portuguese MN/LNCA, Portuguese BA) we find support for CHG/Yamnaya related introgression and also an increase in EHG [Eastern European Hunter-Gatherer] ancestry.

Despite the authors' conclusion that steppe-related admixture was present in their Portuguese BA samples, the ambiguity created by their ADMIXTURE analysis encouraged some heated debates in the comments at this blog and elsewhere about whether their findings were legitimate, and also suggestions that the Portuguese BA R1b-M269 Y-chromosomes were not derived from the steppe.

To try and put this debate to bed, at least on this blog, let's run the same samples with the qpAdm mixture modeling algorithm. I don't want to get into the details here about the difference between ADMIXTURE and qpAdm, because I don't feel it's something that I can explain accurately. But suffice it to say that qpAdm is a more direct way of estimating ancestry proportions, so, in my experience, it's less likely to lose minor but significant admixture signals in a well-designed analysis.

First up, I need to test whether these Portuguese BA (Portugal_BA) individuals can be modeled as a two-way mixture between EHG and Portuguese Late Neolithic farmers (Portugal_LN).

EHG 0.093±0.036
Portugal_LN 0.907±0.036
P-value 0.0102798873
chisq 20.015
Full output

Nope, they can't. But what happens if I add CHG to the model?

CHG 0.106±0.048
EHG 0.042±0.042
Portugal_LN 0.852±0.042
P-value 0.0367007784
chisq 14.946
Full output

The statistical fit improves, but it's still lousy, which perhaps suggests that I need a temporally more proximate CHG-related reference sample. How about Yamnaya?

Portugal_LN 0.849±0.045
Yamnaya_Samara 0.151±0.045
P-value 0.0725988319
chisq 14.371
Full output

That's not too bad. But let's try a more proximate Yamnaya-related population: Bell Beakers from Germany. Note that some of these Beakers belonged to Y-haplogroup R1b-M269(P312+), which is the most common Y-chromosome lineage among present-day Iberians.

Bell_Beaker_Germany 0.328±0.089
Portugal_LN 0.672±0.089
P-value 0.109643502
chisq 13.065
Full output

Somewhat better, and we could probably keep going like this, improving the fits each time, with more relevant reference samples if they were available, like, say, late Beakers from what is now France. I suspect also that using more westerly Hunter-Gatherers than EHG, perhaps from what is now Ukraine, might significantly improve the second model. In any case, my qpAdm analysis provides strong evidence that, unlike Portugal_LN, Portugal_BA harbored CHG-related ancestry that was probably mediated via Yamnaya- and Beaker-related groups.
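One quick way to read the four models above is to divide each admixture estimate by its standard error, which gives a rough Z-score for whether that ancestry stream differs from zero. This is only a back-of-the-envelope check and not something qpAdm reports in this form. The numbers are copied from the results above, while the helper names are made up for the sketch.

```python
# (model, non-local source, estimate, standard error, model P-value),
# all copied from the qpAdm results above.
MODELS = [
    ("EHG + Portugal_LN", "EHG", 0.093, 0.036, 0.0102798873),
    ("CHG + EHG + Portugal_LN", "CHG", 0.106, 0.048, 0.0367007784),
    ("CHG + EHG + Portugal_LN", "EHG", 0.042, 0.042, 0.0367007784),
    ("Portugal_LN + Yamnaya_Samara", "Yamnaya_Samara", 0.151, 0.045, 0.0725988319),
    ("Bell_Beaker_Germany + Portugal_LN", "Bell_Beaker_Germany", 0.328, 0.089, 0.109643502),
]

def z_score(estimate, std_err):
    """Estimate expressed in units of its standard error."""
    return estimate / std_err

def summarize(models, alpha=0.05):
    """Return (model, source, Z, passes) rows, where passes means P > alpha."""
    rows = []
    for name, source, est, se, p in models:
        rows.append((name, source, round(z_score(est, se), 2), p > alpha))
    return rows

for name, source, z, passes in summarize(MODELS):
    status = "passes" if passes else "fails"
    print(f"{name}: {source} Z={z:.2f}, model {status} at P>0.05")
```

The Yamnaya_Samara and Bell_Beaker_Germany coefficients both sit more than three standard errors above zero, and they belong to the only two models that clear the 0.05 threshold.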


Martiniano R, Cassidy LM, Ó'Maoldúin R, McLaughlin R, Silva NM, Manco L, et al. (2017) The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods. PLoS Genet 13(7): e1006852.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Steppe admixture in Mycenaeans, lots of Caucasus admixture already in Minoans (Lazaridis et al. 2017)

Saturday, August 12, 2017

The Iron Age Iranian (?)

After the recent publication of Bronze Age genomes from present-day Greece and Portugal, you'd have to be a desperate fool not to accept that the Pontic-Caspian steppe in Eastern Europe is the most likely homeland of all surviving branches of the Indo-European language family. I don't want to say I told you so, but, well, I told you so (see here).

Yes, we're still waiting for those ancient genomes from South Asia. But don't expect any surprises when they do arrive, probably in a couple of months. Indeed, if you've still got a thing for the Out-of-India Theory (OIT), then it might be time to start looking around for a different hobby than following ancient DNA results. My advice is try meditation.

Thus, pending the sequencing of Hittite and other bona fide early Indo-European genomes from Bronze Age Anatolia, which should be able to help pinpoint the Proto-Indo-European (as opposed to just the Late Proto-Indo-European) Urheimat to the satisfaction of most, I suggest that we shift focus in the comments here in a big way, and, instead of wasting time arguing whether the early Indo-European expansions from the steppe happened, we get stuck into the details of how they happened.

Worthy subjects of discussion in this context, I'd say, are a couple of intriguing ancient West Asian individuals whose genotypes are now available for download at the Reich Lab website: Kumtepe4 from Chalcolithic Anatolia and F38 from an Iron Age burial at Tepe Hasanlu in what is now Iran.

Let's start with F38, whose genome was originally published back in 2016 as part of Broushaki et al. (see here):

Furthermore, our male Iron Age genome (F38; 971-832 BCE; sequenced to 1.9x) from Tepe Hasanlu in NW-Iran shares greatest similarity with Kumtepe6 (fig. S21) even when compared to Neolithic Iranians (table S20). We inferred additional non-Iranian or non-Anatolian ancestry in F38 from sources such as European Neolithics and even post-Neolithic Steppe populations (table S20). Consistent with this, F38 carried a N1a sub-clade mtDNA, which is common in early European and NW-Anatolian farmers (3). In contrast, his Y-chromosome belongs to sub-haplogroup R1b1a2a2, also found in five Yamnaya individuals (17) and in two individuals from the Poltavka culture (3). These patterns indicate that post-Neolithic homogenization in SW-Asia involved substantial bidirectional gene flow between the East and West of the region, as well as possible gene flow from the Steppe.

In other words, it's almost certain that F38 had recent ancestry from somewhere other than the South Caspian region, and probably from the Pontic-Caspian steppe.

However, interestingly, when F38 was alive, Tepe Hasanlu was more likely to have been an ethnically Hurrian or Urartian site, rather than an Iranian one, and the Iron Age settlement there has a fascinating and tragic final story (see here).

Also, F38 shows a great deal of genetic similarity to three Early Bronze Age (EBA) samples from Kura-Araxes culture burials in what is now Armenia (labeled together as Armenia_EBA). Indeed, one of these Kura-Araxes individuals belongs to Y-haplogroup R1b, albeit to a different subclade than F38. Moreover, Kura-Araxes people are hypothesized to have been early speakers of Hurro-Urartian languages.

This is where Armenia_EBA and F38 cluster in my Principal Component Analysis (PCA) of ancient and present-day West Eurasian populations.

Like four peas in a pod, right? Not necessarily, because this outcome might be a simple coincidence. And, in fact, that's what my qpAdm analysis suggests. Using no fewer than 16 ancient outgroups, I found that the models below produced the best fits. Obviously, Anatolia_BA stands for Anatolia Bronze Age, CHG for Caucasus Hunter-Gatherer, Iran_ChL for Iran Chalcolithic, and Tepecik_Ciftlik_N for Tepecik Ciftlik Neolithic.

Iran_IA F38 (2-way)
Iran_ChL 0.815±0.066
Poltavka_outlier 0.185±0.066
P-value 0.72807065
chisq 10.457
Full output

Iran_IA F38 (3-way)
Anatolia_BA 0.122±0.107
Iran_ChL 0.717±0.098
Poltavka_outlier 0.161±0.070
P-value 0.773758066
chisq 8.989
Full output

Armenia_EBA (2-way)
CHG 0.582±0.042
Tepecik_Ciftlik_N 0.418±0.042
P-value 0.817374811
chisq 9.210
Full output

Admittedly, a more systematic and exhaustive search might be able to dig up even better fitting models and show that F38 does share recent ancestry with Armenia_EBA. But in any case, after running these tests, I'm now certain that F38 had significant admixture from the European steppe, probably via a population very similar to Poltavka_outlier.

On the other hand, I'd say that if Armenia_EBA had any steppe ancestry, then it's only a few per cent, and likely from a less northern-shifted source than Poltavka_outlier. This is what the 2-way models look like on the same PCA as above. Armenia_EBA and F38: so similar, yet potentially so different.

F38's probable steppe connection, of course, suggests that he was at least partly of Indo-European origin, and possibly a speaker of an Iranic language, because the Poltavka culture has been associated by some scholars with early Indo-Iranians.

Unfortunately, I don't have a decent enough diploid version of F38's genome to test his fine scale genetic affinities with a haplotype analysis. So I'd say that the most useful thing I can do, that wasn't already done in Broushaki et al., is to run an Identical-by-State (IBS) affinity test. This method is generally pretty good at picking up recent ethnic-specific genetic drift. These are F38's top 25 matches out of over 100 present-day populations:

Georgian 0.676468
Armenian 0.676024
Abkhasian 0.675791
Iranian_Jew 0.675418
Iraqi_Jew 0.675224
Lezgin 0.675124
Cypriot 0.674942
Greek 0.674824
Kurdish 0.674795
Uzbek_Jew 0.674770
Azeri_Jew 0.674701
Greek_Macedonia 0.674700
Italian_South 0.674556
Kosovar 0.674489
Chechen 0.674463
Sicilian_East 0.674334
Turkish 0.674315
Sicilian_West 0.674247
Sephardic_Jew 0.674198
North_Ossetian 0.674125
Kumyk 0.674045
Romanian 0.674017
Greek_Peloponnese 0.673945
Iranian 0.673911
Yemenite_Jew 0.673875
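For readers unfamiliar with scores like those listed above, here is a minimal sketch of one common way to compute Identical-by-State similarity between two individuals. The exact procedure behind the list is not described in this post, so treat the definition below (the share of alleles matching at sites called in both samples) and the toy genotypes as assumptions for illustration.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared IBS across jointly called sites.

    Genotypes are counts of the alternate allele (0, 1 or 2); None = missing.
    """
    shared = 0
    sites = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites missing in either sample
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this site
        sites += 1
    if sites == 0:
        raise ValueError("no overlapping called sites")
    return shared / (2 * sites)

# Toy genotypes, invented for illustration.
ancient = [0, 1, 2, 2, None, 1]
modern = [0, 2, 0, 2, 1, 1]
print(f"IBS = {ibs_similarity(ancient, modern):.3f}")
```

In practice a population score would be an average over many individuals and a very large number of markers, which is why the differences between populations in the list are so small.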

The top three hits are from the Caucasus, which I suspect is due to F38's high ratio of CHG-related ancestry. Iranian and Iraqi Jews are both in the top five, probably because they're relatively similar to Iran_ChL. Armenians are the highest scoring Indo-European speakers, but Kurds also make the top ten, and it's interesting to see several different Greek and Italian groups in the top 25. No idea what that might mean, though. To wrap things up, I'll suggest a few questions for the ensuing discussion in the comments:

- Was F38 a Hurro-Urartian or Indo-European, or a Hurro-Urartian with some Indo-European ancestry? If Indo-European or partly Indo-European, then what type? Armenian, Cimmerian, Iranian, or...?

- Is F38's R1b1a2a2 lineage a reflection of his potential Poltavka ancestry from the steppe or Kura-Araxes ancestry from the Caucasus?

- What explains F38's strong affinity to many modern-day European groups?

- Does the southern, non-Eastern European Hunter-Gatherer (EHG), part of Yamnaya's ancestry perhaps derive from a Bronze Age South Caspian population closely related to F38 and rich in R1b1a2a2?

Nah, I'm just trolling with that last one. I thought I'd save some of you the trouble. Let's be honest, what are the chances that this will ever pan out? I'll give it a probability of 5%.

See also...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Steppe admixture in Mycenaeans, lots of Caucasus admixture already in Minoans (Lazaridis et al. 2017)

Yamnaya-related migrations into Iberia: infiltration rather than invasion (Martiniano et al. 2017)

Thursday, August 10, 2017

Basal-rich K7 & Global 10 updates (10/08/2017)

I've updated the Basal-rich K7 spreadsheet and the Global 10 datasheets with a plethora of ancient individuals and populations, including Anglo-Saxons, British Celts (labeled England_IA), Minoans, Mycenaeans, Bronze Age Iberians and many more.

Basal-rich K7 spreadsheet

Global 10 main datasheet

Global 10 ancient averages datasheet

Please keep in mind that the K7 can be somewhat conservative with minor ancestry proportions, especially Ancient North Eurasian (ANE) admixture, and low coverage samples can behave in odd ways in the Global 10. So when modeling ancestry with ancient samples it might be useful to stick to high coverage individuals that show consistent results. If you don't know what the Basal-rich K7 and Global 10 are, then these links will be useful.

The Basal-rich K7

Global 10: A fresh look at global genetic diversity

An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests

Tuesday, August 8, 2017

Pots were people in Bronze Age southern Central Asia too

New archaeological evidence of potentially significant Bronze Age migrations from the Eurasian steppe into present-day southern Turkmenistan is coming to light thanks to the Archaeological Map of the Murghab Delta (AMMD) project. The new findings are discussed in a paper in Quaternary International available here or here. From the paper:

Adding to the number of questions was the fact that the AMMD project also recorded hundreds of small campsites, particularly in the northern distal reaches of the fan, that bore ceramics [my note: called Incised Coarse Ware or steppe ware] unlike those of other Murghab communities, but with unmistakable affinities to the so-called Andronovo cultural group occupying regions to the north and east during this same period (Cattani, 2008; Cattani et al., 2008; Cerasetti, 2008, 2012). These campsites are interpreted as representing the influx of a new socio-cultural group of mobile pastoralists who began to occupy first more remote areas and gradually move toward more physical and subsistence integration with settled farming groups in the Murghab (Cerasetti et al., in press; see also; Rouse and Cerasetti, 2014). However, the question of whether such encounters upset a careful ecological balance struck by Murghab farming settlements for over a millennium, or whether they were merely coincidental with environmental changes, could not be sufficiently addressed with the coarseness of survey data; targeted research agendas were (and are) still needed to address such questions specifically. Nonetheless, up to this point, it is clear that at the end of the Bronze Age, major social, demographic, and environmental changes were coinciding.

Southern Turkmenistan is, of course, not too far away from South Asia, which was also potentially a target of large scale Bronze Age migrations from the Eurasian steppe that may have brought Indo-European languages to the region. Archaeological evidence of such population movements into South Asia is, for now, apparently minimal or, as some claim, even non-existent. However, ancient DNA evidence in favor of the so-called Aryan Invasion or Migration Theory (AIT/AMT) is rapidly building up (see here). By the way, if you're wondering about the title of this post then this might help: "Kossinna's Smile" (Heyd, 2017).


Rouse and Cerasetti, Micro-dynamics and macro-patterns: Exploring new archaeological data for the late Holocene human-water relationship in the Murghab alluvial fan, Turkmenistan, Quaternary International, Volume 437, Part B, 5 May 2017, Pages 20-34.

See also...

Maybe first direct hints of Yamnaya-related gene flow into South Central Asia

Swat Valley "early Indo-Aryans" at the lab

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, August 2, 2017

Steppe admixture in Mycenaeans, lots of Caucasus admixture already in Minoans (Lazaridis et al. 2017)

Over at Nature at this LINK. Why is the presence of steppe admixture in Mycenaeans important? And why does it matter if the Minoans already had a lot of ancestry from the Caucasus or surrounds? Because Mycenaeans were Indo-Europeans and Minoans weren't. I'm still reading the paper and will update this entry regularly over the next few days. Below is the abstract and, in my opinion, a key quote. Emphasis is mine.

The origins of the Bronze Age Minoan and Mycenaean cultures have puzzled archaeologists for more than a century. We have assembled genome-wide data from 19 ancient individuals, including Minoans from Crete, Mycenaeans from mainland Greece, and their eastern neighbours from southwestern Anatolia. Here we show that Minoans and Mycenaeans were genetically similar, having at least three-quarters of their ancestry from the first Neolithic farmers of western Anatolia and the Aegean [1, 2], and most of the remainder from ancient populations related to those of the Caucasus [3] and Iran [4, 5]. However, the Mycenaeans differed from Minoans in deriving additional ancestry from an ultimate source related to the hunter–gatherers of eastern Europe and Siberia [6, 7, 8], introduced via a proximal source related to the inhabitants of either the Eurasian steppe [1, 6, 9] or Armenia [4, 9]. Modern Greeks resemble the Mycenaeans, but with some additional dilution of the Early Neolithic ancestry. Our results support the idea of continuity but not isolation in the history of populations of the Aegean, before and after the time of its earliest civilizations.


The simulation framework also allows us to compare different models directly. Suppose that there are two models (Simulated1, Simulated2) and we wish to examine whether either of them is a better description of a population of interest (in this case, Mycenaeans). We test f4(Simulated1, Simulated2; Mycenaean, Chimp), which directly determines whether the observed Mycenaeans shares more alleles with one or the other of the two models. When we apply this intuition to the best models for the Mycenaeans (Extended Data Fig. 6), we observe that none of them clearly outperforms the others as there are no statistics with |Z|>3 (Table S2.28). However, we do notice that the model 79%Minoan_Lasithi+21%Europe_LNBA tends to share more drift with Mycenaeans (at the |Z|>2 level). Europe_LNBA is a diverse group of steppe-admixed Late Neolithic/Bronze Age individuals from mainland Europe, and we think that the further study of areas to the north of Greece might identify a surrogate for this admixture event – if, indeed, the Minoan_Lasithi+Europe_LNBA model represents the true history.

Lazaridis, Mittnik et al., Genetic origins of the Minoans and Mycenaeans, Nature, Published online 02 August 2017, doi:10.1038/nature23310
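The f4 test described in the quote can be illustrated with a small sketch. It uses the standard definition of f4 as the average over SNPs of (pA - pB) × (pC - pD), plus a simple leave-one-block-out jackknife for the Z-score. The allele frequencies are invented toy values, the equal-sized blocks are a simplification, and none of this is the code used in the paper.

```python
import math

def f4(pa, pb, pc, pd):
    """Mean over SNPs of (pA - pB) * (pC - pD), from allele frequencies."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(terms) / len(terms)

def block_jackknife_z(pa, pb, pc, pd, n_blocks=5):
    """Z-score of f4 from a jackknife over equal-sized blocks of SNPs."""
    size = len(pa) // n_blocks
    used = n_blocks * size
    estimates = []
    for i in range(n_blocks):
        lo, hi = i * size, (i + 1) * size
        keep = [j for j in range(used) if not lo <= j < hi]
        estimates.append(f4([pa[j] for j in keep], [pb[j] for j in keep],
                            [pc[j] for j in keep], [pd[j] for j in keep]))
    mean = sum(estimates) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return f4(pa[:used], pb[:used], pc[:used], pd[:used]) / math.sqrt(var)

# Toy allele frequencies at ten SNPs, invented for illustration.
sim1 = [0.10, 0.40, 0.55, 0.80, 0.25, 0.60, 0.35, 0.90, 0.15, 0.70]
sim2 = [0.20, 0.35, 0.50, 0.70, 0.30, 0.65, 0.25, 0.85, 0.20, 0.60]
myc = [0.12, 0.42, 0.52, 0.78, 0.28, 0.58, 0.33, 0.88, 0.18, 0.72]
chimp = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

stat = f4(sim1, sim2, myc, chimp)
z = block_jackknife_z(sim1, sim2, myc, chimp)
print(f"f4(Sim1, Sim2; Myc, Chimp) = {stat:.4f}, Z = {z:.2f}")
```

A clearly positive or negative Z would mean the Mycenaeans share more alleles with one simulated model than the other. Real analyses use hundreds of thousands of SNPs and blocks of contiguous markers, so the toy Z-score here carries no meaning.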

Update 03/08/2017: This is my own Principal Component Analysis (PCA) of the Minoan and Mycenaean samples, which are freely available at the Reich Lab website here. The Armenian angle for the eastern admixture in Mycenaeans looks forced. The trajectory of this admixture obviously runs from Northern or Eastern Europe to the Minoans. If it did arrive from Armenia, then realistically only via a heavily steppe-admixed population.

Update 05/08/2017: Much like Lazaridis et al., I ran a series of qpAdm analyses to find the best mixture model for the Mycenaeans. However, just to see what would happen, unlike Lazaridis et al., I didn't group any of the archaeological populations into larger clusters based on their genetic affinities. The three models below stood out from the rest in terms of their statistical fits.

Minoan_Lasithi 0.786±0.049
Sintashta 0.214±0.049
P-value 0.96574059
chisq 6.030
Full output

Corded_Ware_Germany 0.210±0.043
Minoan_Lasithi 0.790±0.043
P-value 0.961238695
chisq 6.198
Full output

Minoan_Lasithi 0.791±0.043
Srubnaya 0.209±0.043
P-value 0.950419642
chisq 6.558
Full output

So it's essentially the same outcome as the one obtained by Lazaridis et al., because Sintashta and Srubnaya are part of their Steppe_MLBA cluster, while Corded Ware is part of their Europe_LNBA cluster, and it's these clusters that, along with Minoan_Lasithi, provided their most successful mixture models for the Mycenaeans. But it's nice to see Sintashta at the top of my results, because it fits so well with the long postulated archaeological links between Sintashta and the Mycenaeans (for instance, see here).

By the way, here's what I said back in May when the Mathieson et al. 2017 preprint came out (see here). So things are falling into place rather nicely.

The same paper also includes the following individual from present-day Bulgaria dated to the start of the Late Bronze Age (LBA), which is roughly when the Mycenaeans appeared nearby in what is now Greece:

Bulgaria_MLBA I2163: Y-hg R1a1a1b2 mt-hg U5a2 1750-1625 calBCE

This guy is the most Yamnaya-like of all of the Balkan samples in Mathieson et al. 2017, and, as far as I can see based on his overall genome-wide results, probably indistinguishable from the contemporaneous Srubnaya people of the Pontic-Caspian steppe. He also belongs to Y-haplogroup R1a-Z93, which is a marker typical of Srubnaya and other closely related steppe groups such as Andronovo, Potapovka and Sintashta. So there's very little doubt that he's either a migrant or a recent descendant of migrants to the Balkans from the Pontic-Caspian steppe.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Tuesday, August 1, 2017

A Bronze Age dominion from the Atlantic to the Altai

The BEAGLE analysis that I foreshadowed a couple of days ago (see here) is finally done. The output is available for download as a matrix of shared genomic tracts in centimorgans (cM) here.

I haven't yet had a chance to look at the results in detail, but I'd say that the outcomes for the three Early Bronze Age (EBA) Afanasievo and Yamnaya individuals make a lot of sense. The high affinity that these individuals show to the Irish EBA samples is not at all surprising, but striking nonetheless. The Afanasievo people, after all, lived in the Altai Mountains deep in Asia, more than 6,000 km from Ireland.

Update 02/08/2017: Interestingly, the graphs below, based on the cM values in my coancestry matrix, suggest that upper caste Indo-Aryan-speaking Brahmins from Northern India share relatively more ancestry with the Afanasievo genome than Iranic-speakers such as Pamir Tajiks, who generally share relatively more ancestry with the younger Andronovo and Sintashta samples. The relevant datasheet is available here.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Saturday, July 29, 2017

New resource: 67 diploid ancient genomes

Published this week along with Martiniano et al. 2017, a dataset of 67 new and publicly available genomes, genotyped and imputed for 30 million markers:

Data from: The population genomics of archaeological transition in west Iberia: investigation of ancient substructure using imputation and haplotype-based methods

Martiniano R, Cassidy LM, Ó'Maoldúin R, McLaughlin R, Silva NM, Manco L, Fidalgo D, Pereira T, Coelho MJ, Serra M, Burger J, Parreira R, Moran E, Valera AC, Porfirio E, Boaventura R, Silva AM, Bradley DG

Date Published: July 28, 2017


Keep in mind, however, that this dataset is specifically designed for haplotype-based tests, like those done with Chromopainter (for more details, see S5 Text in Martiniano et al. 2017). As far as I know, it should also perform well in ADMIXTURE runs.

On the other hand, the diploid and imputed genotype calls are likely to slightly skew results in formal statistics and formal statistics-based modeling analyses. So it's best to use pseudo-haploid genomes for such tests, and/or high coverage diploid genomes if available, with 100% observed calls.

I'm about to run a quick and dirty haplotype/Principal Component analysis with this dataset using BEAGLE, mainly to check whether South Asians show greater recent genetic affinity to Afanasievo/Yamnaya over Andronovo/Sintashta (for more on this controversy, see here). It's a pity that this dataset doesn't include any genomes from Neolithic Iran, because then I'd also be able to try haplotype-based mixture models for South Asians.

By the way, I won't be using all of the 30 million markers. I've only kept the SNPs that overlap with the Harvard Medical School's 1240K SNP ancient capture array, which should mean that only a small minority of the calls in my analysis won't be real.
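
The marker filtering step amounts to a set intersection on SNP IDs. Here's a rough sketch of the idea, assuming each marker list has the SNP ID in its first whitespace-separated column; the IDs and the column layout are placeholders, not the actual files I used.

```python
def read_snp_ids(lines):
    """Collect SNP IDs from the first whitespace-separated column of each line."""
    return {line.split()[0] for line in lines if line.strip()}


# Tiny in-memory stand-ins for the two marker lists (hypothetical IDs and layout).
imputed_markers = [
    "rs1 1 0.0 1000 A G",
    "rs2 1 0.0 2000 C T",
    "rs3 1 0.0 3000 G A",
]
capture_array_markers = [
    "rs2 1 0.0 2000 C T",
    "rs3 1 0.0 3000 G A",
    "rs4 1 0.0 4000 T C",
]

# Keep only the markers present in both lists.
keep = read_snp_ids(imputed_markers) & read_snp_ids(capture_array_markers)
print(sorted(keep))
```

With real files, the same function can be fed an open file handle instead of a list, since it only iterates over lines.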

Update 02/08/2017: The BEAGLE run is complete and the analysis is unfolding. See post and comments here.

Thursday, July 27, 2017

Yamnaya-related migrations into Iberia: infiltration rather than invasion (Martiniano et al. 2017)

The Martiniano et al. preprint that appeared at bioRxiv more than two months ago was published at PLoS Genetics today (see here). The paper packs a lot of supplementary information that wasn't included with the preprint. Below is the press release about the paper from the Public Library of Science (PLoS). Emphasis is mine.

The genomes of individuals who lived on the Iberian Peninsula in the Bronze Age had minor genetic input from Steppe invaders, suggesting that these migrations played a smaller role in the genetic makeup and culture of Iberian people, compared to other parts of Europe. Daniel Bradley and Rui Martiniano of Trinity College Dublin, in Ireland, and Ana Maria Silva of University of Coimbra, Portugal, report these findings July 27, 2017 in PLOS Genetics.

Between the Middle Neolithic (4200-3500 BC) and the Middle Bronze Age (1740-1430 BC), Central and Northern Europe received a massive influx of people from the Steppe regions of Eastern Europe and Asia. Archaeological digs in Iberia have uncovered changes in culture and funeral rituals during this time, but no one had looked at the genetic impact of these migrations in this part of Europe. Researchers sequenced the genomes of 14 individuals who lived in Portugal during the Neolithic and Bronze Ages and compared them to other ancient and modern genomes.

In contrast with other parts of Europe, they detected only subtle genetic changes between the Portuguese Neolithic and Bronze Age samples resulting from small-scale migration. However, these changes are more pronounced on the paternal lineage. "It was surprising to observe such a striking Y chromosome discontinuity between the Neolithic and the Bronze Age, such as would be consistent with a predominantly male-mediated genetic influx" says first author Rui Martiniano. Researchers also estimated height from the samples, based on relevant DNA sequences, and found that genetic input from Neolithic migrants decreased the height of Europeans, which subsequently increased steadily through later generations.

The study finds that migration into the Iberian Peninsula occurred on a much smaller scale compared to the Steppe invasions in Northern, Central and Northwestern Europe, which likely has implications for the spread of language, culture and technology. These findings may provide an explanation for why Iberia harbors a pre-Indo-European language, called Euskera, spoken in the Basque region along the border of Spain and France. It has been suggested that Indo-European spread with migrations through Europe from the Steppe heartland; a model that fits these results.

Daniel Bradley says "Unlike further north, a mix of earlier tongues and Indo-European languages persist until the dawn of Iberian history, a pattern that resonates with the real but limited influx of migrants around the Bronze Age."

Martiniano R, Cassidy LM, Ó'Maoldúin R, McLaughlin R, Silva NM, Manco L, et al. (2017) The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods. PLoS Genet 13(7): e1006852.

See also...

New resource: 67 diploid ancient genomes

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Map: Eastern Europe c. 4000-2500 BC

Interesting map, don't you think? It comes from here. But how accurate is it? And does it match ancient DNA?


Bluhm, Lara, *dhéĝhōm, *héshr, and *wek (earth, blood, and speech): an archaeological, genetic, and linguistic exploration of Indo-European origins (2017). Honors Projects. 80.

Monday, July 24, 2017

The crisis

Correct me if I'm straying from the facts, but the 4300–3800 YBP date mentioned in this new paper at Eurasian Soil Science, on the "catastrophic aridization" of the steppes in the Lower Volga region, is roughly the time when big, tall, round headed folks rich in Yamnaya-related ancestry basically hijack the Beaker phenomenon, and just before the collapse of the Indus Valley Civilization and, according to most sane people, the arrival of Indo-Europeans in South Asia. Coincidence?

Abstract: Diagnostic features of a catastrophic aridization of climate, desertification, and paleoecological crisis in steppes of the Lower Volga region have been identified on the basis of data on the morphological, chemical, and microbiological properties of paleosols under archeological monuments (burial mounds) of the Middle Bronze Age. These processes resulted in a certain convergence of the soil cover with transformation of zonal chestnut (Kastanozems) paleosols and paleosolonetzes (Solonetz Humic) into specific chestnut-like eroded saline calcareous paleosols analogous to the modern brown desert-steppe soils (Calcisols Haplic) that predominated in this region 4300–3800 years ago. In the second millennium BC, humidization of the climate led to the divergence of the soil cover with secondary formation of the complexes of chestnut soils and solonetzes. This paleoecological crisis had a significant effect on the economy of the tribes in the Late Catacomb and Post-Catacomb time stipulating their higher mobility and transition to the nomadic cattle breeding.

Demkina et al., Paleoecological crisis in the steppes of the Lower Volga region in the Middle of the Bronze Age (III–II centuries BC), July 2017, Volume 50, Issue 7, pp 791–804

See also...

Swat Valley "early Indo-Aryans" at the lab

The Bell Beaker Behemoth (Olalde et al. 2017 preprint)

Corded Ware origin of a big chunk of Finnish mtDNA (Oversti et al. 2017)

Over at Scientific Reports at this LINK. Emphasis is mine. Corded Ware people were in all likelihood early Indo-European speakers and belonged, perhaps almost exclusively, to Y-chromosome haplogroup R1a, while present-day Finns obviously speak a Uralic language and mostly belong to Y-chromosome haplogroups N1c and I1. But Finns do show a lot of Corded Ware- or Yamnaya-related genome-wide ancestry, so it shouldn't be surprising that a large part of their maternal ancestry is derived from the Corded Ware population.

Abstract: In Europe, modern mitochondrial diversity is relatively homogeneous and suggests an ubiquitous rapid population growth since the Neolithic revolution. Similar patterns also have been observed in mitochondrial control region data in Finland, which contrasts with the distinctive autosomal and Y-chromosomal diversity among Finns. A different picture emerges from the 843 whole mitochondrial genomes from modern Finns analyzed here. Up to one third of the subhaplogroups can be considered as Finn-characteristic, i.e. rather common in Finland but virtually absent or rare elsewhere in Europe. Bayesian phylogenetic analyses suggest that most of these attributed Finnish lineages date back to around 3,000–5,000 years, coinciding with the arrival of Corded Ware culture and agriculture into Finland. Bayesian estimation of past effective population sizes reveals two differing demographic histories: 1) the ‘local’ Finnish mtDNA haplotypes yielding small and dwindling size estimates for most of the past; and 2) the ‘immigrant’ haplotypes showing growth typical of most European populations. The results based on the local diversity are more in line with that known about Finns from other studies, e.g., Y-chromosome analyses and archaeology findings. The mitochondrial gene pool thus may contain signals of local population history that cannot be readily deduced from the total diversity.

Oversti et al., Identification and analysis of mtDNA genomes attributed to Finns reveal long-stagnant demographic trends obscured in the total diversity, Scientific Reports, Published online: 21 July 2017, doi:10.1038/s41598-017-05673-7

See also...

Baltic Corded Ware: rich in R1a-Z645

Neolithic transition in the Baltic

The genetic history of Northern Europe (or rather the South Baltic)

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, July 20, 2017

The Out-of-India Theory (OIT) challenge: can we hear a viable argument for once?

Recent weeks have seen a rash of activity from OIT proponents defending their "truth", largely as a response to a news feature in The Hindu on new genetic evidence backing the Aryan Invasion or Migration Theory (AIT/AMT). A few examples:

Genetics Might Be Settling The Aryan Migration Debate, But Not How Left-Liberals Believe

Genetics and the Aryan invasion debate

Propagandizing the Aryan Invasion Debate: A Rebuttal to Tony Joseph

Here We Go Again: Why They Are Wrong About The Aryan Migration Debate This Time As Well

The problematics of genetics and the Aryan issue

Too early to settle the Aryan migration debate?

The people who wrote these articles are able to string sentences together in a reasonable way, but apart from that, their efforts are clumsy at best. Not only do they not appear to completely understand what they're attempting to debunk, but they also fail to offer an OIT that realistically incorporates new findings from ancient and modern-day DNA.

AIT/AMT is now firmly backed by ancient DNA from Eastern Europe and high resolution modern-day DNA from South Asia. To quote myself from a week ago:

During the past couple of years ancient DNA has revealed the presence of Y-chromosome haplogroup R1a in Eastern European remains dated to the Mesolithic, Neolithic, Eneolithic and Bronze Age. Moreover, the Bronze Age remains, packed in ancestry derived from Eastern European hunter-gatherers (or EHG) and totally lacking any sort of South Asian admixture, belong to R1a-Z645, which is the ancestral clade of by far the most common types of R1a in Europe and South Asia today: R1a-Z282 and R1a-Z93, respectively. And on top of that, South Asians, especially those speaking Indo-European languages, show significant admixture derived from EHG.

The conclusion from this data is self-evident: during the Bronze Age R1a-Z645 became a very important Y-chromosome lineage in Europe and quickly moved to South Asia, in all likelihood on the back of the Indo-European expansion.

Pre-Indo-European Eastern Europe and South Asia were not the same world; they were worlds apart. Thus, you will never read anything like this, no matter how much ancient DNA from South Asia is sequenced:

During the past couple of years ancient DNA has revealed the presence of Y-chromosome haplogroup R1a in South Asian remains dated to the Mesolithic, Neolithic, Eneolithic and Bronze Age. Moreover, the Bronze Age remains, packed in ancestry derived from South Asian hunter-gatherers, and totally lacking any sort of European admixture, belong to R1a-Z645, which is the ancestral clade of by far the most common types of R1a in Europe and South Asia today: R1a-Z282 and R1a-Z93, respectively. And on top of that, Europeans, especially those speaking Indo-European languages, show significant admixture derived from South Asian hunter-gatherers.

So, OIT proponents, what counter-arguments can you offer? And can you come up with a new vision for OIT that coherently takes into account ancient DNA from Eastern Europe?

However, to ensure that the debate is a fruitful one not derailed regularly by anti-AIT/pro-OIT red herrings, let's take care of the most obvious of these red herrings now. I reserve the right to delete any comments that attempt to go down these tired, irrelevant avenues without a very good excuse for doing so.

You: So and so found Y-haplogroup P* and other basal clades upstream of R1a in Papuans, therefore R1a and Indo-Europeans are from South Asia. Me: Nonsense. R1 and R1a are found in the remains of Eastern European Mesolithic foragers. Were these individuals recently arrived Indo-European-speakers from South Asia? Try harder.

You: It doesn't matter that Eastern European Mesolithic foragers belonged to R1a, because the most common form of R1a in the world is R1a-M417, and if it originated in India then OIT is a reality. Me: But what are the chances realistically that R1a-M417 is from India or South Asia, considering that prehistoric European samples, with absolutely no signals of ancestry from South Asia, belong to both M417+ and M417- lineages? In fact, Europe is the most likely homeland of R1a-M417.

You: India has incredible diversity in R1a, therefore it's the R1a and Indo-European homeland. Me: No it doesn't. India, and indeed, South Asia as a whole are dominated by one fairly young subclade: R1a-Z93. Europe is home to three different subclades that show up at perceptible frequencies: R1a-Z282, found throughout much of the continent; R1a-L664, mostly confined to Northwestern Europe; and R1a-Z93, mostly confined to far Eastern Europe.

You: Many unique Indian ethnic groups are yet to be tested genetically. They may show surprising results, including new subclades of R1a. Me: If you dig hard enough, you'll always find some exceptions to the rule. But how do you know where the ancestral lineages of such exceptions in South Asia were during, say, the Neolithic? What makes you think they were in South Asia? To prove that South Asia is the homeland of its by far most dominant R1a subclade, R1a-Z93, at the very least you need to show that other subclades, both closely and distantly related, are also found at perceptible frequencies in whole regions of South Asia, and therefore that they have some sort of history there. Otherwise we can safely assume that R1a-Z93 and the few exceptions to the R1a-Z93 rule in South Asia are relative latecomers from somewhere else.

You: But we have no ancient DNA from South Asia yet, and it may produce a huge shock. Me: For you yes, but not for me. What are the chances realistically that R1a was present among both European and South Asian foragers? I'd say practically zero. Feel free to raise it to a few per cent to make yourself feel better, but we both know the hard reality.

You: Ancient DNA from South Asia might show that Northern India was home to a population very similar to Yamnaya, and if so, then the Yamnaya-related ancestry in modern-day Indians is native to India. Me: There's no logic behind this. Yamnaya and other closely related Bronze Age groups were very specific mixtures of Mesolithic foragers and Neolithic farmers living in Eastern Europe and surrounds. There's absolutely no reason to assume that such unique mixtures would also form independently in South Asia, or even outside of Europe's generally accepted borders.

You: Bronze Age Europeans who belonged to R1a also carried southern admixture from Iran, or maybe even India. Me: In prehistoric samples, R1a is always highly correlated with Eastern European Hunter-Gatherer (EHG) ancestry, so positing that it also arrived in Europe with a southern population makes no sense. And why would this southern ancestry be from Iran or India? Why not the Caucasus? We know from ancient DNA that the type of southern ancestry that these ancient Europeans carried has been sitting in the Caucasus since the Upper Paleolithic. Moreover, they lack South Caspian- and South Asian-specific markers such as mtDNA haplogroup U7. How were such markers purged from their gene pool if they or their recent ancestors arrived in Europe from Iran or India?

You: Chickens and mice came from South Asia, therefore Indo-Europeans came from South Asia too. Me: Bullshit. Do better or go away.

Does anyone want to claim that I don't know what I'm talking about? Or perhaps that I'm just putting out Eurocentric propaganda? If you don't understand my arguments, and that they're indeed very solid arguments, then there's no hope for you. Go and find a new hobby or profession, because you're not cut out for this.

OK, now that we have the formalities out of the way, who wants to have a go at salvaging OIT in the comments? Don't be shy.

See also...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Monday, July 17, 2017

On the Mesolithic colonization of Scandinavia (Günther et al. 2017 preprint)

Over at bioRxiv at this LINK. The main takeaway point from this preprint is that Scandinavia was a more happening place than most of the rest of Europe during the Mesolithic, because at the time it was the meeting place between two relatively divergent forager groups, West European hunter-gatherers (WHG) and East European hunter-gatherers (EHG), that entered the peninsula from different directions, the southwest and northeast, respectively, and mixed to form Scandinavian hunter-gatherers (SHG). Other key points:

- EHG probably dispersed across Scandinavia in a counter-clockwise direction via an ice-free route along the Atlantic coast in what is now Norway, because SHG samples from northern and western Scandinavia show more EHG ancestry than those from southern and eastern Scandinavia

- at least 17% of the SNPs that are common in SHG are not found in present-day Europeans, suggesting that a large part of European variation has been lost since the Mesolithic

- although it's unlikely that SHG made a significant contribution to the present-day Northern European gene pool, some gene-variants common in SHG that appear to be associated with metabolic, cardiovascular, developmental and psychological traits are carried at high frequencies by present-day Northern Europeans, especially compared to present-day Southern Europeans, probably due to strong selective pressures specific to northern latitudes in Europe

- SHG is inferred to have had fair skin and varied blue to light-brown eye color, which makes sense considering that it was a mixture of apparently fair-skinned/brown-eyed EHG and dark-skinned/blue-eyed WHG, except that the frequencies of blue-eyed variants and one fair-skinned variant in SHG are much higher than expected from its EHG/WHG mixture ratios, again pointing to strong selective pressures specific to northern latitudes in Europe acting upon certain gene-variants

- a 3D computer generated facial reconstruction of an SHG female based on data from a very high (57x) coverage genome sequence looks, at least to me, like a fairly typical present-day Northern European woman (see Figure S9.1 in the supp info here), though I suspect that the result might be biased in some way, simply because it's impossible to know whether variants associated with specific facial traits in present-day Northern Europeans were also associated with the same facial traits in SHG.
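
The "higher than expected from its EHG/WHG mixture ratios" point rests on a simple calculation: under neutral mixing, the expected frequency of a variant in SHG is the mixture-weighted average of its frequencies in the two source groups. Here's a minimal sketch of that arithmetic. All of the frequencies and the mixture proportion below are hypothetical placeholders, not values from the preprint.

```python
def expected_frequency(freq_a, freq_b, proportion_a):
    """Expected allele frequency in a two-way mixture with no selection or drift."""
    return proportion_a * freq_a + (1.0 - proportion_a) * freq_b


# Hypothetical inputs, chosen only to illustrate the calculation.
ehg_freq = 0.10   # variant frequency in EHG (assumed)
whg_freq = 0.90   # variant frequency in WHG (assumed)
ehg_share = 0.40  # EHG ancestry proportion in SHG (assumed)

expected = expected_frequency(ehg_freq, whg_freq, ehg_share)
observed = 0.80   # observed frequency in SHG (assumed)

# A clearly positive excess is the kind of signal that can point to selection,
# although drift and sampling error would need to be ruled out first.
excess = observed - expected
print(f"expected={expected:.2f} observed={observed:.2f} excess={excess:+.2f}")
```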


Günther et al., Genomics of Mesolithic Scandinavia reveal colonization routes and high-latitude adaptation, bioRxiv, Posted July 17, 2017, doi:

Sunday, July 16, 2017

North European admixture in the Han Chinese (Charleston et al. 2017 preprint)

Over at bioRxiv at this LINK. Emphasis is mine. The estimated date of the North European-related admixture signal is probably much too late. These sorts of estimates always look way off. And I doubt that it's largely the result of the Silk Road, which linked China to the Near East and Mediterranean rather than to Northern Europe. More likely it reflects gene flow from the Pontic-Caspian steppe in Eastern Europe during the Bronze and Iron ages, via the Afanasievo, Andronovo, and other closely related steppe peoples (see here).

Abstract: As are most non-European populations around the globe, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our dataset. Individuals from our study come from 19 out of 22 provinces across China, allowing us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identify previously unrecognized population structure along the East-West axis of China and report unique signals of admixture across geographical space, such as European influences among the Northwestern provinces of China. Finally, we identified a number of highly differentiated loci, indicative of local adaptation in the Han Chinese. In particular, we detected extreme differentiation among the Han Chinese at MTHFR, ADH7, and FADS loci, suggesting that these loci may not be specifically selected in Tibetan and Inuit populations as previously suggested. On the other hand, we find that Neandertal ancestry does not vary significantly across the provinces, consistent with admixture prior to the dispersal of modern Han Chinese. Furthermore, contrary to a previous report, Neandertal ancestry does not explain a significant amount of heritability in depression. Our findings provide the largest genetic data set so far made available for Han Chinese and provide insights into the history and population structure of the world's largest ethnic group.


One finding from our analysis of admixture signals that most likely fit a one-pulse admixture model is our observation of admixture from Northern European populations to the Northwestern provinces of China (Gansu, Shaanxi, Shanxi), but not other parts of China. Previous analysis of the HGDP data, based on patterns of haplotype sharing among 10 Han Chinese from Northern China, estimated a single pulse of ~6% West Eurasian ancestry among the Northern Han Chinese. The estimated date of admixture was around 1200 CE. This signal is also observed among the Tu people, an ethnic minority also from Northwestern China; the authors attributed this signal to contact through the Silk Road (Hellenthal et al. 2014). We estimate a lower bound of admixture proportion due to Northern Europeans at approximately 2%-5%, with an admixture date of about 26 +/-3 generations for Gansu, and 47 +/-3 generations for Shaanxi [Table S8]. Using a generation time of about 26-30 years (Moorjani et al. 2016), these estimates correspond to admixture events occurring at around 700 CE and 1300 CE, respectively, corresponding roughly to the Tang and Yuan dynasty in China. However, these estimated dates should be interpreted with caution, as both the violation of a single pulse admixture model and the additional noise in inter-marker LD estimates due to low coverage data could bias the estimates.

Charleston et al., A comprehensive map of genetic variation in the world's largest ethnic group - Han Chinese, bioRxiv, Posted July 13, 2017, doi:
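
For anyone who wants to check the generation-to-date conversion in the quote above, here's a quick sketch of the arithmetic. It uses the point estimates (26 and 47 generations) and the 26-30 year generation time from the quote; the reference year of 2017 is my own assumption, since the passage doesn't say what year the dates are counted back from. Going by these numbers, the Gansu estimate lands nearer 1300 CE and the Shaanxi estimate nearer 700 CE, so the "respectively" in the quote seems to pair them in the opposite order.

```python
REFERENCE_YEAR = 2017  # assumption: the quoted passage does not state a reference year


def admixture_year_range(generations, gen_time_min=26, gen_time_max=30,
                         reference_year=REFERENCE_YEAR):
    """Convert generations since admixture into an (oldest, youngest) range in CE."""
    oldest = reference_year - generations * gen_time_max
    youngest = reference_year - generations * gen_time_min
    return oldest, youngest


gansu = admixture_year_range(26)    # 26 generations, as quoted for Gansu
shaanxi = admixture_year_range(47)  # 47 generations, as quoted for Shaanxi

print("Gansu:", gansu)
print("Shaanxi:", shaanxi)
```

The +/-3 generation uncertainty isn't propagated here; adding it would widen both ranges by roughly a century on each side.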

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, July 12, 2017

Indian confirmation bias

In a largely fact free but obfuscation rich comment piece at The Hindu, Indian scientists Gyaneshwer Chaubey and Kumarasamy Thangaraj ask whether it's too early to settle the Aryan migration debate. See here.

No, it's not too early. It's game over chaps, and has been for a while.

During the past couple of years ancient DNA has revealed the presence of Y-chromosome haplogroup R1a in Eastern European remains dated to the Mesolithic, Neolithic, Eneolithic and Bronze Age. Moreover, the Bronze Age remains, packed in ancestry derived from Eastern European hunter-gatherers (or EHG) and totally lacking any sort of South Asian admixture, belong to R1a-Z645, which is the ancestral clade of by far the most common types of R1a in Europe and South Asia today: R1a-Z282 and R1a-Z93, respectively. And on top of that, South Asians, especially those speaking Indo-European languages, show significant admixture derived from EHG.

The conclusion from this data is self-evident: during the Bronze Age R1a-Z645 became a very important Y-chromosome lineage in Europe and quickly moved to South Asia, in all likelihood on the back of the Indo-European expansion. Yet, in spite of this, Gyaneshwer and Kumarasamy make the following claim in their article.

Moreover, there is evidence which is consistent with the early presence of several R1a branches in India (our unpublished data).

Potentially powerful stuff, you might say. But hang on, what are Gyaneshwer and Kumarasamy seeing in their data that could possibly reverse the current reality about R1a? Did they find R1a in South Asian remains from the Mesolithic and Neolithic? Or perhaps they've uncovered South Asian Bronze Age remains that belong to R1a-Z645 and lack any signals of ancestry from Eastern Europe?

This is impossible. The ancient DNA from Eastern Europe says so. That's because pre-Indo-European Eastern Europe and South Asia were not the same world; they were worlds apart. Thus, you will never read anything like this, no matter how much ancient DNA from South Asia is sequenced:

During the past couple of years ancient DNA has revealed the presence of Y-chromosome haplogroup R1a in South Asian remains dated to the Mesolithic, Neolithic, Eneolithic and Bronze Age. Moreover, the Bronze Age remains, packed in ancestry derived from South Asian hunter-gatherers, and totally lacking any sort of European admixture, belong to R1a-Z645, which is the ancestral clade of by far the most common types of R1a in Europe and South Asia today: R1a-Z282 and R1a-Z93, respectively. And on top of that, Europeans, especially those speaking Indo-European languages, show significant admixture derived from South Asian hunter-gatherers.

See also...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

The Out-of-India Theory (OIT) challenge: can we hear a viable argument for once?

Tuesday, July 11, 2017

Working topology for Eurasian population structure

Here's my new "basic" qpGraph topology that I'll be using to test phylogenetic and mixture models for Eurasians. I think it reconciles a few key findings from recent scientific literature. Please note that since my main interest is post-Neolithic prehistory of West Eurasia, and in particular the early Indo-European expansions, I don't want to make this model unnecessarily complex by adding "dead end" Upper Paleolithic genomes.

But I welcome ideas on how to improve and make use of this topology, so if, say, adding Ust_Ishim helps, then let's do it. The ancient samples featured in the above graph are listed here and the graph file is available here. Feel free to post your own versions of the graph file in the comments and I'll run them as soon as possible. But please remember to label the samples correctly at all times.

Update 13/07/2017: Thanks to Matt in the comments, here's a neater version of the same model, with a lower (highest) Z score and slightly different mixture coefficients. It includes a couple of zero edges, which are generally undesirable, but these might disappear when more populations are added to the topology. The graph file is available here.

Monday, July 10, 2017

Armenian confirmation bias

Current Biology recently published a paper by Margaryan and Derenko et al. titled Eight Millennia of Matrilineal Genetic Continuity in the South Caucasus. I wasn't going to bother calling out the authors on their, unfortunately I have to say, rather dubious claim, but then I saw this ScienceNordic article enthusiastically attempting to drive home their misguided point, so a few words are now in order.

“It’s basically the same female population in the region over the past 8,000 years. It’s very surprising considering the many waves of migration and cultural shifts,” says lead-author Ashot Margaryan from the Centre for GeoGenetics at the Natural History Museum of Denmark, University of Copenhagen.

Genetics have remained constant for 8,000 years in world’s melting pot

I'm at a loss as to why Ashot Margaryan is very surprised. I'm not even mildly surprised. Why? Let's take a closer look at what we're dealing with here:

- the authors sequenced just 52 full mitogenomes to represent 8,000 years of prehistory and early history in the South Caucasus

- they lumped all of these sequences together into an "Ancient" sample set as if they were from a single time slice (I know, pretty crazy)

- they then ran a few complex models on this neither here nor there sample set, and concluded that it resembled the maternal gene pool of present-day Armenians.

Well, duh, present-day Armenians are more or less the end product of the population history of the last eight thousand years in what is now Armenia and surrounds. Is anyone still as surprised about this as Ashot? Surely not.

Obviously, the problem here is that the authors have mistaken their none too surprising outcome to mean that the South Caucasus has not experienced any major upheavals in its maternal gene pool over the past 8,000 years, which, if actually true, would indeed be very surprising, and even shocking.

But the haplogroup assignments of the 52 mitogenomes are reported in the spreadsheet here, and just by eyeballing these results, I can tell that they suggest an influx of foreign ancestry, probably from the Pontic-Caspian steppe, to the South Caucasus after the Early Bronze Age (EBA). Note, for instance, the presence of what appear to be typically steppe haplogroups U4a, U2e1e and U5a1b in the samples dated to the Middle Bronze Age (MBA), Late Bronze Age (LBA) and Early Iron Age (EIA), respectively.


Margaryan and Derenko et al., Eight Millennia of Matrilineal Genetic Continuity in the South Caucasus, Current Biology 27, 1–6 July 10, 2017, DOI: 10.1016/j.cub.2017.05.087

Tuesday, July 4, 2017

Out-of-India chickens coming home to roost

Razib has posted a spacious but none-too-technical review of the ongoing Aryan Invasion Theory (AIT) controversy, along with some personal anecdotes and predictions about how ancient DNA from South Asia might shape the debate in the near future (see here).

It should be a useful guide to the topic for those of you who aren't quite as excited about reading about my latest adventures with qpGraph as many of the regulars in the comments here.

One thing that I'd perhaps add to Razib's post is that the ancient DNA record now boasts Late Neolithic Yamnaya-like Corded Ware Culture individuals from the East Baltic region that belong to Y-haplogroup R1a-Z645. And that's usually as far as their lineages go (see here).

This is important, because the Z645 mutation is directly and recently ancestral to the pair of likely post-Neolithic mutations that define the two R1a subclades most common in Europe and South Asia today: Z282 and Z93, respectively.

So not only are the "European" R1a-Z282 and "South Asian" R1a-Z93 relatively young sister clades, but their ancestral clade has now been found in ancient samples from Northeastern Europe that probably predate their appearance by only a few generations, if that. Of course, the upshot of all of this is that R1a-Z93 could not have originated very far from the East Baltic, which makes South Asia look about as likely as the homeland of this subclade as the goddamn moon. Conversely, it makes AIT look very plausible indeed.

However, granted, this might seem very confusing to anyone who hasn't been studying the R1a topology for years, and perhaps better left out of the more mainstream debates on AIT for the sake of simplicity. By the way, I found this part of Razib's post especially intriguing:

One scientist who holds to the position that most South Asian ancestry dates to the Pleistocene argued to me that we don’t know if ancient Indian samples from the northwest won’t share even more ancestry than the Iranian Neolithic and Pontic steppe samples. In other words, ANI was part of some genetic continuum that extended to the west and north. This is possible, but I do not find it plausible.

I suspect that this scientist's rather fanciful suggestion (which really flies in the face of very solid models based on ancient genomic data from Europe and surrounds) is a hint of the direction that the debate will take right after the publication of ancient genomes from South Asia. Because when that happens, obfuscators like this guy (usually hopeless Out-of-India proponents) will either have to concede defeat and quit the debate, or ramp up their obfuscations to spectacular new highs.

And please don't mistake my confidence on this issue for bluff and bluster. It's not exactly the best kept secret out there that ancient samples from India and Pakistan are now ready, and...oops I probably can't say more than that for now. Pity.

See also...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Indian confirmation bias

Europeans: genetically homogeneous on a global scale

From SMBE 2017 via benmpeter on Twitter:

Also at SMBE 2017, David Reich is "sad to leave space of f-statistics", presumably because they don't offer enough resolution when analyzing more recent ancient data from such genetically homogeneous regions as Europe. Via jgschraiber on Twitter.

Update 04/07/2017: A PDF of the Benjamin Peter poster is available at figshare here (30MB).

See also...

SMBE 2017 abstracts

Matters of geography

Monday, July 3, 2017

The Indo-Europeanization of South Asia: migration or invasion?

The recent avalanche of ancient DNA data from across Eastern Europe, including modern-day Bulgaria, Estonia, Latvia, Romania, Ukraine and western Russia, has revealed prehistoric hunter-gatherer populations indigenous to the region harboring a remarkable diversity in Y-chromosome lineages belonging to haplogroups R1, R1a and R1b.

Neolithic transition in the Baltic

Baltic Corded Ware: rich in R1a-Z645

The genetic history of Northern Europe

The genomic history of Southeastern Europe

A few more ancient genomes from the Balkans and Iberia

So the once popular idea that these Y-haplogroups were instead native to Central Asia, the Near East and/or South Asia now looks very wrong.

R1a probably first arrived in South Asia during the Bronze Age with highly mobile Yamnaya-related pastoralists. These people were expanding in almost all directions from the Pontic-Caspian steppe at the time, and it's difficult to imagine that they weren't the ones who first spread Indo-European languages to peninsular Europe and the Indian subcontinent.

It's likely that almost all interested parties will soon agree that this was indeed the case. So the focus in the debate on the expansion of the Indo-Europeans, including Indo-Aryans, into South Asia will soon have to shift from whether it actually happened to how it happened. For instance, was it simply a migration or a potentially violent invasion?

I already strongly believe that it was an invasion, or rather a series of invasions. I'll change my mind if, at the end of the day, the evidence says otherwise. But if you favor a migration scenario, then consider these points:

- the population in the northern part of the Indian subcontinent during the Bronze Age, even after the collapse of the Indus Civilization, was likely to have been very large for its time, and yet there was a massive pulse of admixture across South Asia from the steppe and a turnover in Y-chromosomes, especially amongst the ruling classes, suggesting that something very dramatic took place that had a major impact on the social and political fabric of the region

- early Indo-Europeans in the Near East, from the Hittites to the Scythians, are often recorded as warlike and expansionist, with a habit of invading and subjugating other peoples, like the Hattians, Hurrians and Mitanni (who apparently ended up with an Aryan elite)

- if early Indo-Europeans outside of South Asia had a penchant for invasions, then there's no reason to believe that the M.O. of the early Indo-Europeans in South Asia would have been any different, unless some sort of direct empirical evidence says so, but what kind of direct empirical evidence?

Please note, I agree that the suggestion of a potentially violent invasion of South Asia by Indo-Europeans, and, indeed, Aryans, sounds provocative, and will always be politically controversial no matter how much evidence is gathered in its favor. But what if it really happened?

See also...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Indian confirmation bias

Friday, June 30, 2017

At the half-way mark

It's been a huge first six months of the year, with the publication of at least five major ancient DNA preprints and papers (depending on how you define major in this context). Here are the five most popular posts at this blog in 2017 thus far:

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts - over 10,000 hits and counting

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but... - over 10,000 hits

The Bell Beaker Behemoth (Olalde et al. 2017 preprint) - almost 7,000 hits

Latest on Bell Beaker and Corded Ware - almost 6,000 hits

The genomic history of Southeastern Europe (Mathieson et al. 2017 preprint) - almost 6,000 hits

All of these posts are, one way or another, concerned with what ancient DNA says about the expansions of the Proto-Indo-Europeans and/or Indo-Aryans. In other words, the combo of ancient DNA and the Indo-Europeans is what really brings in the crowds here. Conversely, as far as I can tell based on the blog stats, few people nowadays care much about population history papers based solely on present-day DNA.

So what are we to expect in the second half of 2017? Probably quite a lot, including all of that awesome genotype data from the Olalde et al. and Mathieson et al. preprints, as well as a few more ancient DNA preprints and papers. I'm pretty sure that we'll soon see a paper on the origins of the Minoans and Mycenaeans, and another one on the population history of South Asia, with samples from Harappan and Swat culture sites. Somewhere amongst all of that there will also probably be genomes from BMAC and Maikop. Below, a pic of South Asia and surrounds courtesy of NASA.

It's hard to predict what will happen in the comments here when the paper on South Asia comes out. But apparently there are five stages of grief (denial, anger, bargaining, depression and acceptance), and I expect our anti-Kurgan and anti-Aryan Invasion/Migration regulars to go through all five of these stages before they finally accept reality as dictated by the ancient DNA from South Asia. It'll be a hoot whatever happens. So please stay tuned, and remember to behave in the comments.

SMBE 2017 abstracts

The abstract book is available here. Lots of interesting stuff this year, although nothing really earth-shattering as far as I can see, and a couple of the ancient DNA talks are based on preprints that have already appeared at bioRxiv. I'd check out these talks:

Genome wide data from the Iron Age provides insights into the population history of Finland

Lamnidis et al.

Abstract: The population history of Finland is subject of an ongoing debate, in particular with respect to the relationship and origins of modern Finnish and Saami people. Here we analyse genome-wide data, extracted from three teeth found in the archaeological site of Levänluhta, in southern Ostrobothnia. The site dates back to the Iron Age between 550-800 AD, according to the artefacts recovered, while radiocarbon dating on scattered femurs from the site span 350-730 AD. When analysed together with previously published ancient European samples and with modern European populations, the ancient Finnish samples lack a genetic component found in early Neolithic Farmers and all modern European populations today. Instead, we find that they are more closely related to modern Siberian and East Asian populations than modern Finnish are, a pattern also observed in genetic data from modern Saami. Our results suggest that the ancestral Saami population 1500 years ago, inhabited a larger region than today, extending as far south as Levänluhta. Such a scenario is also supported by linguistic evidence suggesting most of Finland to have been speaking Saami languages before 1000 AD. We also observe genetic differences between modern Saami and our ancient samples, which are likely to have arisen due to admixture with Finnish people during the last 1500 years.

40,000-year-old individual from Asia provides insight into early population structure in Eurasia

Yang et al.

Abstract: To date, very few ancient genomic studies have been conducted in Asia. Genome-wide studies using ancient individuals from Europe have revealed complex ancestry and genetic structure in ancient populations that could not be observed studying only present-day populations, suggesting similar approaches may also aid in elucidating the demographic history in Asia. Here, we present genome-wide data for a 40,000-year-old individual from Tianyuan Cave near Beijing, China. We show that he is more related to present-day Asians than present-day and ancient Europeans. However, unlike present-day Asians, he shows potential relationships with some present-day South Americans and a 35,000-year-old European individual. Our results suggest that there was extensive population structure in Asia by 40,000 years ago that persisted over an extended period of time.

Bridging the Divide Between Modern and Ancient DNA

David Reich

Abstract: Genome-wide studies of human variation have for the most part focused either on DNA from present-day individuals, or from individuals who lived prior to 4,000 years ago. However, developing a detailed understanding of how the peoples who lived in the early Bronze Age contributed to Iron Age populations who in turn contributed to Medieval populations who in turn contributed to people living today, has been difficult. One challenge is that by the beginning of the Bronze Age (at least in Western Eurasia where the most ancient DNA data have been collected), the ancestry composition of many populations was very similar to that of populations that live in the same regions today. As a result, the powerful methods that have been developed for learning about population history based on allele frequency correlation patterns are sometimes not able to discern the often subtle differences in ancestry composition between past populations. In this talk, I will describe work in which my colleagues and I have tried to begin to bridge this divide, both by studying ancient samples from intermediate time points, and by deploying more sensitive statistical methods.

See also...

Europeans: genetically homogeneous on a global scale