Monday, June 26, 2017

Matters of geography

The steppe north of the Black Sea has basically always been considered part of Europe, and around the year 1900 some guy with a map decided that the steppe between the Black Sea and the Ural River in western Kazakhstan should also be Europe. So nowadays, right or wrong, it's generally accepted that the entire steppe region west of the Ural River, known as the Pontic-Caspian steppe, is in Eastern Europe. Here's a map courtesy of Wikipedia showing how the official boundary between Eastern Europe and Asia has shifted since the 18th century.

But this decision wasn't entirely arbitrary, because the current boundary between Eastern Europe and Asia by and large follows several major geographic barriers, including the Caucasus Mountains, the Caspian Sea and the Ural Mountains. It'd be hard to argue that these barriers haven't had a profound impact across the ages on the character of Europe and Europeans, and this has probably been known for much longer than a couple hundred years.

For instance, if we're to believe the most common interpretations of the works of ancient geographers like Hecataeus and Herodotus, then their worlds in some important ways resembled the typical Principal Component Analysis (PCA) of West Eurasian genetic variation. And it seems that they had a pretty good idea where both the strong continental boundaries and fuzzy areas were located.

Below, on the geographic map inspired by Herodotus, Europa or Europe is delineated from much of Asia by the Black Sea, the Caucasus Mountains and the Caspian Sea, while on the genetic map, most European and Asian populations form two, more or less parallel, clusters fairly cleanly separated by empty space (this was first noted in Lazaridis et al. 2013). Indeed, this empty space is the work of the Black Sea, the Caucasus Mountains and the Caspian Sea acting as rather effective barriers to gene flow between Eastern Europe and Asia (see Yunusbayev et al. 2012).

However, on the genetic map, the Iranic Scythians of the Asian steppes straddle my somewhat arbitrary red line separating Europa and Asia, and this is echoed on the Herodotus map by Iranic and related peoples like the Massagetae and Issedones, who inhabit the seemingly undefined part of the world between Europa and Asia east of the Caspian Sea (Mare Caspium).

Nothing really groundbreaking, but pretty cool stuff.

On a related note, I've seen the term "mainland Europe" used recently in at least one of the big ancient DNA papers to describe the part of Europe west of Eastern Europe, and especially west of the Pontic-Caspian steppe. It seems that the authors wanted to underline the fairly stark genetic difference that existed between most of Europe and the steppe just prior to the expansion of Yamnaya and related steppe herder groups that initiated the formation of the present-day European gene pool.

I can see why they did this, but to my mind they got things backwards. That's because the term mainland implies the opposite of peninsula, and of course the part of Europe west of Eastern Europe is a relatively narrow strip of land largely surrounded by water, so it's a peninsula. Let's visualize this on a map of Europe courtesy of Wikipedia:

I understand that this might cause heart palpitations for some readers, especially those from Western Europe, who generally see their part of Europe as core Europe, but I feel that it makes good sense from a purely geographic POV.

Monday, June 19, 2017

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

It's now more than obvious that South Asia experienced an almighty pulse of admixture from an Early Bronze Age (EBA) population originally from somewhere on the Pontic-Caspian steppe in Eastern Europe. This is fairly easy to demonstrate thanks to ancient DNA from Europe and West Asia. One way of doing it is with the qpGraph algorithm.

Moreover, the widespread presence of Y-chromosome haplogroup R1a in South Asia is, at least in large part, linked to this event, because:

- Mesolithic Eastern European foragers belonging to basal clades of R1a do not show any South Asian or even Near Eastern ancestry, so it's likely that R1a is native to Eastern Europe and surrounds

- If R1a is native to Eastern Europe then it can't also be native to South Asia, which is not only thousands of miles away, but also ecologically a different world

- The most common R1a subclades in the world today, R1a-M417 and one of its main daughter branches R1a-Z93, appear in Late Neolithic and Bronze Age European pastoralist groups (Corded Ware, Srubnaya and closely related peoples) that harbor high levels of Eastern European forager ancestry and no signs of South Asian admixture

- Practically 100% of the R1a in South Asia today belongs to the R1a-Z93 subclade, which, based on full Y-chromosome sequencing data, looks like it began expanding rapidly only during the EBA, eventually making its way to South Asia, and this is in line with the available ancient DNA evidence

- In South Asia, R1a and ancient steppe admixture peak in groups that speak Indo-European, including Indo-Aryan, languages, suggesting that both are genetic signals of the Indo-European expansions into the Indian subcontinent

So we're now at a stage where anyone with at least moderate thinking capacity, whose mind isn't poisoned by extreme bias, has to agree that there was a rather large movement of people from the Eurasian steppes into South Asia during the Bronze Age. No ifs or buts.

Ancient DNA from South Asia is on the way. It might throw up a few surprises and force a new model of how the Indo-Europeans and R1a got to South Asia, but it won't turn things upside down. In other words, don't expect the Out-of-India or "indigenous Aryans" theory to suddenly come into the picture as a viable alternative to the Aryan Invasion Theory (AIT), occasionally presented as the more politically correct Aryan Migration Theory (AMT).

Many Indians still don't get this, or rather they refuse to get it, which is very frustrating, especially if you're a regular in the comments section here. But admittedly it can also be very entertaining.

Last week The Hindu published an interesting piece on the latest developments in South Asian population genetics that were making the AIT, or at least AMT, look like a sure thing:

How genetics is settling the Aryan migration debate

Soon after came this peculiarly titled retort in the Swarajya online magazine, in which unfortunately it's impossible to find a single coherent argument:

Genetics Might Be Settling The Aryan Migration Debate, But Not How Left-Liberals Believe

Generally hilarious stuff, except the parts where the author abuses blogger Razib Khan for moving with the latest genetic data and arguing in favor of the Aryan expansion into India (see here and here).

So what are we to expect when the first big paper with ancient DNA from South Asia comes out, probably in the next few months? For starters, accusations of racism and maybe even hate speech against anyone who claims that the results support the AIT or AMT, or anything even close. And lots of shouting and carrying on. But also a lot more comic relief.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, June 16, 2017

Cypriot Y-chromosomes (Heraclides et al. 2017)

Over at PLoS ONE at this link. Note the fairly high levels of Y-haplogroups R1a and/or R1b in many of the Greek and Turkish populations in the figure below. Much of this might be of fairly recent European (mostly Slavic) and Central Asian (Turkic nomad and Ottoman) provenance, but I'd say some of it has to date back to the Bronze Age, and potentially to the expansions of the Proto-Anatolians, Proto-Armenians and Proto-Greeks into the Balkans and Anatolia from the Pontic-Caspian steppe. Emphasis is mine:

Abstract: Genetics can provide invaluable information on the ancestry of the current inhabitants of Cyprus. A Y-chromosome analysis was performed to (i) determine paternal ancestry among the Greek Cypriot (GCy) community in the context of the Central and Eastern Mediterranean and the Near East; and (ii) identify genetic similarities and differences between Greek Cypriots (GCy) and Turkish Cypriots (TCy). Our haplotype-based analysis has revealed that GCy and TCy patrilineages derive primarily from a single gene pool and show very close genetic affinity (low genetic differentiation) to Calabrian Italian and Lebanese patrilineages. In terms of more recent (past millennium) ancestry, as indicated by Y-haplotype sharing, GCy and TCy share much more haplotypes between them than with any surrounding population (7–8% of total haplotypes shared), while TCy also share around 3% of haplotypes with mainland Turks, and to a lesser extent with North Africans. In terms of Y-haplogroup frequencies, again GCy and TCy show very similar distributions, with the predominant haplogroups in both being J2a-M410, E-M78, and G2-P287. Overall, GCy also have a similar Y-haplogroup distribution to non-Turkic Anatolian and Southwest Caucasian populations, as well as Cretan Greeks. TCy show a slight shift towards Turkish populations, due to the presence of Eastern Eurasian (some of which of possible Ottoman origin) Y-haplogroups. Overall, the Y-chromosome analysis performed, using both Y-STR haplotype and binary Y-haplogroup data puts Cypriot in the middle of a genetic continuum stretching from the Levant to Southeast Europe and reveals that despite some differences in haplotype sharing and haplogroup structure, Greek Cypriots and Turkish Cypriots share primarily a common pre-Ottoman paternal ancestry.


Y-haplogroup frequencies within GCy and TCy can be found in S6 Table. Y-haplogroup frequencies of Cypriots, Greeks, and Turks, as well as other surrounding populations can be found in Fig 1 (as well as S7 Table). GCy and TCy showed very similar frequencies for the major Y-haplogroups, differentiating both from Greek and Turkish sub-populations (Fig 3). The most frequent major Y-haplogroup subclade in both GCy and TCy was J2a-M410 (23.8% and 20.3% among GCy and TCy, respectively), followed by E-M78 (12.8% Vs 13.9%) and G2-P287 (12.5% Vs 13.7%). R1b-M343 was found in higher frequency among GCy (11.9%) than TCy (6.8%), while the same applies for E-M123 (13.1% Vs 6.3%). Finally, although in much lower frequencies than the aforementioned haplogroups, haplogroup I2 was somewhat higher among TCy (6.8%), than among GCy (2.3%), while haplogroup J2b was higher among GCy (5.8%) than TCy (1.8%). Other, less common haplogroups (i.e. I1, R1a, L, and T) showed similar frequencies (in the range of 1–5%) between GCy and TCy.

One additional difference between GCy and TCy was the presence of moderate numbers of East Eurasian (primarily Central Asian) Y-haplogroups and small numbers of North African Y-haplogroups among TCy but not among GCy. The frequency of East Eurasian haplogroups among TCy was C-M130 (0.5%), H-L901 (0.3%), N-M231 (2.4%), O-M175 (0.8%) and Q-M242 (1.3%), reaching a total of 5.6%, but only totalling 0.6% among GCy. North African haplogroups (E-M81, E-V38) were only found among TCy (2.1%) (S6 and S7 Figs).

A major feature differentiating Cypriots from Greeks, is the much lower frequency of haplogroups I (2.9% GCy, 7.3% TCy, ~10–21% mainland Greeks) and R1a (2.9% GCy, 3.2% TCy, ~10–22% mainland Greeks) among the former. All differences in haplogroup frequencies between populations were statistically significant (Fisher’s Exact test, p<0.001).


In terms of Y-haplogroup distribution, Cypriots (GCy and TCy) show substantial differences from Greeks, characterized by much lower frequency of haplogroups I2, R1a, and R1b in the former. These haplogroup differences indicate differential migrations into Cyprus and mainland Greece, at different points in history and prehistory. I2 is considered the major haplogroup among Mesolithic European Hunter-Gatherers[60], who apparently were either absent from Cyprus or were totally diluted (nearly extinguished) by subsequent migrations. Although the exact origins and migratory patterns of R1a and R1b are still under rigorous investigation, it seems that they are linked to Bronze Age migrations from the Western Eurasian Steppe and Eastern Europe into Southern (including Greece) and Western Europe[61]. Apparently, such migrations (especially as regards R1a) into Cyprus were limited.

Additionally, the Greek population has received considerable migrations during the Byzantine era and the Middle Ages from other Balkanic populations, such as Slavs[62,63], Aromanians (Vlachs)[64], and Albanians (Arvanites)[65,66]. The former, is very likely to have increased R1a frequencies among Greeks. In fact, Fig 3 (also S7 Table) indicate that R1a increases gradually with increasing latitude in Greece. There is no historical evidence for such migrations into Cyprus during the same period.

Heraclides A, Bashiardes E, Fernández-Domínguez E, Bertoncini S, Chimonas M, Christofi V, et al. (2017) Y-chromosomal analysis of Greek Cypriots reveals a primarily common pre-Ottoman paternal ancestry with Turkish Cypriots. PLoS ONE 12(6): e0179474.
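
The excerpt above says that the haplogroup frequency differences were checked with Fisher's exact test. For anyone curious what that involves, here's a minimal pure-Python sketch of the two-sided test on a 2x2 table of carriers versus non-carriers. The function name and the counts in the usage example are hypothetical; the excerpt gives percentages only, not the underlying sample sizes, so don't read the example as a re-run of the paper's analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are carriers / non-carriers of a haplogroup.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x carriers in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is no more likely than the observed one
    # (small tolerance guards against floating point ties)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: 10 carriers out of 344 in one population,
# 30 carriers out of 200 in another (NOT the paper's data)
p_value = fisher_exact_two_sided(10, 334, 30, 170)
print(f"p = {p_value:.3g}")
```

With real data you'd plug in the actual carrier counts per population, which would have to come from the paper's supplementary tables.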

Tuesday, June 13, 2017

qpGraph models for the Kalash & Yamnaya

I'm pretty happy with this effort, but it's a very complex topology with a lot of admixture edges. Moreover, its highest Z score of nearly 3 suggests that it can be improved (Z >3 would mean a failed model). Indeed, I'd say that the Basal Eurasian admixture coefficients are a little too high, and perhaps Steppe_EBA is a few per cent more West Asian/Caucasian than it should be. More details about all of the graphs in this post are available here.
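
To make the pass/fail rule concrete, here's a minimal Python sketch. It assumes that the Z scores for the fitted f-statistics have already been pulled out of a qpGraph run by some other means; the function name, the tuple layout and the example values are all hypothetical, and the threshold of 3 is simply the rule of thumb mentioned above, applied to the absolute value.

```python
def worst_z(residuals, threshold=3.0):
    """Return the worst-fitting statistic and whether the graph passes.

    residuals: list of (label, z) tuples. The labels and values are
    hypothetical stand-ins for whatever a qpGraph run reports.
    """
    # The worst fit is the statistic whose Z score is furthest from zero
    label, z = max(residuals, key=lambda item: abs(item[1]))
    return label, z, abs(z) <= threshold


# Hypothetical residuals, not taken from any real run
example = [
    ("f4(A,B;C,D)", 1.2),
    ("f4(A,C;B,D)", -2.9),
    ("f4(A,D;B,C)", 0.4),
]
label, z, ok = worst_z(example)
print(label, z, "pass" if ok else "fail")
```

A graph that passes with its worst Z score sitting just under the threshold, like the hypothetical one here, is the kind of marginal fit that probably has room for improvement.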

Obviously, the labels for the inferred ancestral populations, like North Caucasian, are speculative. In hindsight, it may have been better to use something like single letter labels.

But now that I have a fairly robust topology, I can try and ask some questions. For instance, is the inferred Caspian pop a better source of West Asian ancestry in Yamnaya than the so-called North Caucasian one? The answer is probably no.

My main graph is also a decent statistical fit for at least a number of Indian groups, such as one of the Gujarati subpopulations labeled GujaratiD in the Human Origins dataset. But it fails marginally for Pathans, so it's not a robust solution for all of South Asia. Incredibly, using Andronovo instead of Yamnaya in the Pathan model makes it work. Tajiks can also be modeled in this way using Andronovo. I say incredibly, because Pathans and Tajiks are obviously Iranic speakers, and their Iranic ancestors in all likelihood arrived in South Asia from the Eurasian steppe much later than the Indo-Aryan ancestors of the Kalash and most Indians.

So what we might be seeing here is substructure within the steppe-related admixture amongst South Asians, with Indo-Aryan speakers apparently showing Yamnaya-related (Catacomb?) ancestry, and Iranic speakers, as well as possibly groups with significant Iranic ancestry, showing a preference for later Andronovo-related ancestry. I need to have a closer look at this. But it won't happen overnight; my brain is fried as it is after this effort, and I need to get some fresh air.

Update 14/06/2017: I've now had the chance to test many more Indo-Aryan and Iranic groups with my model. Most of these groups show a slight, non-significant, preference for Yamnaya_Samara as the steppe reference population. However, those that show a slight, and again non-significant, preference for Andronovo are usually Iranic, such as the Balochi in the graphs below. I'm not claiming that this proves anything, but I do think that it hints at something, and I'll try testing a few different hypotheses in the near future with qpGraph.

See also...

qpGraph open thread

Thursday, June 8, 2017

qpGraph open thread

I managed to put together a simple qpGraph model for the Kalash using present-day populations. It's largely based on the model for the Paniya by Nakatsuka et al. (see Supplementary Figure 5 here). The graph and pops files for my model can be downloaded here and here, respectively. I'm now working on a more complex model for the Kalash that includes ancient genomes from Eastern Europe and West Asia.

I'm willing to take a few requests for qpGraph models in the comments below. Please note, however, that these requests will have to be accompanied by graph and pops files, and the graph files must be correctly set out; if they don't work, then they don't work, and you won't get your graph. On the other hand, you only need to supply pops files with the correct populations and I'll do the rest.

See also...

qpGraph models for the Kalash & Yamnaya

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Wednesday, June 7, 2017

The pigtailed figures

Reconstructed Proto-Indo-European (PIE) vocabulary suggests that the speakers of PIE, who probably lived on the Pontic-Caspian steppe during the Eneolithic, were familiar with wool. Interestingly, ancient DNA suggests that Near Eastern-related ancestry first appeared on the Pontic-Caspian steppe during the Eneolithic, because Neolithic samples from the Pontic steppe in what is now Ukraine lack this type of admixture. Perhaps it first arrived there with women from south of the Caucasus who knew how to spin wool? Below are a couple of interesting quotes from Becker et al. 2016. Emphasis is mine:

For ancient Mesopotamia McCorriston has proposed a fundamental shift from linen-based to woollen textile production. [4] Drawing on evidence from cuneiform texts as well as faunal and botanical remains, she suggests that it was in the 3rd or perhaps late 4th millennium BCE that wool became the fibre of choice for everyday use. Recent archaeological and archaeozoological research, however, suggests a considerably earlier date, before the advent of writing. Written sources from the mid- to late 3rd millennium BCE demonstrate that sheep and goats were maintained in herds of some dozens to a few hundred and herded in large flocks up to several thousand animals. In fact, cuneiform records provide ample evidence for the usage of wool in textile manufacture, whereas linen appears only rarely. The growth of a large-scale woollen textile industry rested on women as the main source of labour.


During the Late Uruk and Jemdat Nasr periods in Mesopotamia, scenes appear on cylinder seals that have been interpreted as showing textile production carried out by so-called pigtailed figures. [93] A specific raw material cannot be deduced from these depictions, but the substantial number of scenes indicates a significant concern with cloth manufacture.

Becker et al., The Textile Revolution. Research into the Origin and Spread of Wool Production between the Near East and Central Europe, eTopoi, Special Volume (6) 2016, (ISSN 2192-2608)

See also...

A plausible model for the formation of the Yamnaya genotype

A homeland, but not the homeland

Monday, June 5, 2017

Ancient human genomes from Southern Africa (Schlebusch et al. 2017 preprint)

Over at bioRxiv at this LINK. Emphasis is mine:

Abstract: Southern Africa is consistently placed as one of the potential regions for the evolution of Homo sapiens. To examine the region's human prehistory prior to the arrival of migrants from East and West Africa or Eurasia in the last 1,700 years, we generated and analyzed genome sequence data from seven ancient individuals from KwaZulu-Natal, South Africa. Three Stone Age hunter-gatherers date to ~2,000 years ago, and we show that they were related to current-day southern San groups such as the Karretjie People. Four Iron Age farmers (300-500 years old) have genetic signatures similar to present day Bantu-speakers. The genome sequence (13x coverage) of a juvenile boy from Ballito Bay, who lived ~2,000 years ago, demonstrates that southern African Stone Age hunter-gatherers were not impacted by recent admixture; however, we estimate that all modern-day Khoekhoe and San groups have been influenced by 9-22% genetic admixture from East African/Eurasian pastoralist groups arriving >1,000 years ago, including the Ju|'hoansi San, previously thought to have very low levels of admixture. Using traditional and new approaches, we estimate the population divergence time between the Ballito Bay boy and other groups to beyond 260,000 years ago. These estimates dramatically increases the deepest divergence amongst modern humans, coincide with the onset of the Middle Stone Age in sub-Saharan Africa, and coincide with anatomical developments of archaic humans into modern humans as represented in the local fossil record. Cumulatively, cross-disciplinary records increasingly point to southern Africa as a potential (not necessarily exclusive) 'hot spot' for the evolution of our species.

Schlebusch et al., Ancient genomes from southern Africa pushes modern human divergence beyond 260,000 years ago, bioRxiv, Posted June 5, 2017, doi:

Friday, June 2, 2017

The healthy Kurgan pastoralist

Just in at bioRxiv, a new preprint on the genomic health of ancient hominins, at this LINK. Obviously, if it's true that the Yamnaya and other closely related Kurgan culture pastoralists of the ancient Eurasian steppe had unusually healthy genomes, then it becomes easier to understand why they made such a massive impact on the ancestry of present-day Europeans and Central and South Asians, because populations that enjoy good health are likely to grow faster than those that don't. From the preprint, emphasis is mine:

Abstract: The genomes of ancient humans, Neandertals, and Denisovans contain many alleles that influence disease risks. Using genotypes at 3180 disease-associated loci, we estimated the disease burden of 147 ancient genomes. After correcting for missing data, genetic risk scores were generated for nine disease categories and the set of all combined diseases. These genetic risk scores were used to examine the effects of different types of subsistence, geography, and sample age on the number of risk alleles in each ancient genome. On a broad scale, hereditary disease risks are similar for ancient hominins and modern-day humans, and the GRS percentiles of ancient individuals span the full range of what is observed in present day individuals. In addition, there is evidence that ancient pastoralists may have had healthier genomes than hunter-gatherers and agriculturalists. We also observed a temporal trend whereby genomes from the recent past are more likely to be healthier than genomes from the deep past. This calls into question the idea that modern lifestyles have caused genetic load to increase over time. Focusing on individual genomes, we find that the overall genomic health of the Altai Neandertal is worse than 97% of present day humans and that Otzi the Tyrolean Iceman had a genetic predisposition to gastrointestinal and cardiovascular diseases. As demonstrated by this work, ancient genomes afford us new opportunities to diagnose past human health, which has previously been limited by the quality and completeness of remains.


Both the allergy/autoimmune and gastrointestinal/liver disease categories (which share many of the same disease-associated loci) show significantly lower genetic risk in pastoralists than agriculturalists and hunter gatherers. Pastoralists also have significantly reduced risk for cancer compared to agriculturalists. Agriculturalists have a higher genetic risk for dental/periodontal diseases than hunter-gatherers and pastoralists. In general, pastoralists possess extremely healthy genomes, especially for cancers and immune-related, periodontal, and gastrointestinal diseases.


It is unclear why pastoralists would have the lowest risk in these specific disease categories. We caution that this pattern may be the result of technical issues, as pastoralists have the smallest sample size (only 19 individuals) and geographic range (between 40-90°E longitude and 45-55°N latitude, Figure 1B). Because populations that have different subsistence types also differ in other ways, the lower GRS of pastoral populations may be due to other factors, including demographic history.

Ali J. Berens, Taylor L. Cooper, Joseph Lachance, The Genomic Health Of Ancient Hominins, bioRxiv, Posted June 2, 2017, doi:
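
To give a feel for what a genetic risk score with a missing data correction might look like, here's a minimal Python sketch. It's a simplification under stated assumptions, not the preprint's method: it treats every locus equally and just takes the share of risk alleles among the loci that were actually called, whereas the excerpt above doesn't spell out how the authors weighted loci or handled gaps. The function name and the toy genotypes are hypothetical.

```python
def genetic_risk_score(genotypes):
    """Share of risk alleles among called loci, from 0.0 to 1.0.

    genotypes: one entry per locus, giving the number of risk alleles
    carried (0, 1 or 2), or None if the locus has no call.
    Returns None if no locus was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called diploid locus contributes two alleles to the denominator,
    # so low-coverage genomes aren't penalized for missing loci
    return sum(called) / (2 * len(called))


# Toy genome with five loci, two of them missing (hypothetical values)
print(genetic_risk_score([0, 1, None, 2, None]))
```

Scaling by the number of called loci is the simplest way to keep low-coverage ancient genomes comparable with high-coverage ones, which matters a lot when many samples are missing a big chunk of the loci.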

Wednesday, May 31, 2017

A homeland, but not the homeland

It seems increasingly likely that ancient DNA has identified a massive expansion, or a series of expansions, from Mesopotamia and/or surrounds in basically all directions dating to the Chalcolithic (ChL) and Bronze Age (BA). This phenomenon is mainly characterized by the simultaneous spread of:
- Iran_ChL-related genome-wide ancestry

- Y-haplogroup J

- South Caspian-specific mitochondrial haplogroups such as R2 and U7

At least two of these characteristics are shared by five groups that have appeared in the Near Eastern and African ancient DNA record as probable post-Neolithic newcomers, at least in part, at their respective sampling sites:

- Anatolia_BA, Western Turkey, 2836-1800 calBCE (Lazaridis et al. 2017)

- Egyptian mummies, Middle Egypt, 776-2 calBCE (Schuenemann et al. 2017)

- Iran_ChL, Western Iran, 4839-3796 calBCE (Lazaridis et al. 2016)

- Levant_BA, Northwestern Jordan, 2489-1966 calBCE (Lazaridis et al. 2016)

- Sidon_BA, Southern Lebanon, 1750-1600 BCE (Haber et al. 2017)

I'm confident that many more such groups will soon be added to the ancient DNA record, probably including Levant_ChL from the upcoming Harney et al. 2017 (a teaser of the paper can be seen here). Below, a map of Mesopotamia courtesy of Wikipedia.

It's an interesting and important question who these likely Mesopotamian migrants and their descendants were in terms of linguistic affinities. It seems that they left a massive genetic imprint on the Near East and much of North Africa, and perhaps also Central Asia and Southeastern Europe, so they probably also left some sort of linguistic legacy.

Obviously, it's highly improbable that most of them were Indo-European speakers. So if most of them weren't Indo-Europeans, then the phenomenon I'm describing here can't be related to the Proto-Indo-European (PIE) expansion. Forget the idea of a West Asian linguistic hot spot spewing out different, distantly related language families, including Indo-European, via the migrations of closely related Iran_ChL-like populations over a span of a few thousand years; it's plain stupid.

So who were they?

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Tuesday, May 30, 2017

Ancient Egyptians less Sub-Saharan than present-day Egyptians

Over at Nature Communications at this LINK. Emphasis is mine:

Abstract: Egypt, located on the isthmus of Africa, is an ideal region to study historical population dynamics due to its geographic location and documented interactions with ancient civilizations in Africa, Asia and Europe. Particularly, in the first millennium BCE Egypt endured foreign domination leading to growing numbers of foreigners living within its borders possibly contributing genetically to the local population. Here we present 90 mitochondrial genomes as well as genome-wide data sets from three individuals obtained from Egyptian mummies. The samples recovered from Middle Egypt span around 1,300 years of ancient Egyptian history from the New Kingdom to the Roman Period. Our analyses reveal that ancient Egyptians shared more ancestry with Near Easterners than present-day Egyptians, who received additional sub-Saharan admixture in more recent times. This analysis establishes ancient Egyptian mummies as a genetic source to study ancient human history and offers the perspective of deciphering Egypt’s past at a genome-wide level.

Schuenemann et al., Ancient Egyptian mummy genomes suggest an increase of Sub-Saharan African ancestry in post-Roman periods, Nature Communications 8, Article number: 15694 (2017), doi:10.1038/ncomms15694

See also...

A homeland, but not the homeland

Friday, May 26, 2017

Canaanite genomes (Haber et al. 2017 preprint)

Over at bioRxiv at this LINK:

Abstract: The Canaanites inhabited the Levant region during the Bronze Age and established a culture which became influential in the Near East and beyond. However, the Canaanites, unlike most other ancient Near Easterners of this period, left few surviving textual records and thus their origin and relationship to ancient and present-day populations remain unclear. In this study, we sequenced five whole-genomes from ~3,700-year-old individuals from the city of Sidon, a major Canaanite city-state on the Eastern Mediterranean coast. We also sequenced the genomes of 99 individuals from present-day Lebanon to catalogue modern Levantine genetic diversity. We find that a Bronze Age Canaanite-related ancestry was widespread in the region, shared among urban populations inhabiting the coast (Sidon) and inland populations (Jordan) who likely lived in farming societies or were pastoral nomads. This Canaanite-related ancestry derived from mixture between local Neolithic populations and eastern migrants genetically related to Chalcolithic Iranians. We estimate, using linkage-disequilibrium decay patterns, that admixture occurred 6,600-3,550 years ago, coinciding with massive population movements in the mid-Holocene triggered by aridification ~4,200 years ago. We show that present-day Lebanese derive most of their ancestry from a Canaanite-related population, which therefore implies substantial genetic continuity in the Levant since at least the Bronze Age. In addition, we find Eurasian ancestry in the Lebanese not present in Bronze Age or earlier Levantines. We estimate this Eurasian ancestry arrived in the Levant around 3,750-2,170 years ago during a period of successive conquests by distant populations such as the Persians and Macedonians.


However, the present-day Lebanese, in addition to their Levant_N and Iranian ancestry, have a component (11-22%) related to EHG and Steppe populations not found in Bronze Age populations (Figure 3A). We confirm the presence of this ancestry in the Lebanese by testing f4(Sidon_BA, Lebanese; Ancient Eurasian, Chimpanzee) and find that Eurasian hunter-gatherers and Steppe populations share more alleles with the Lebanese than with Sidon_BA (Figure 3B). We next tested a model of the present-day Lebanese as a mixture of Sidon_BA and any other ancient Eurasian population using qpAdm. We found that the Lebanese can be best modelled as Sidon_BA 93±1.6% and a Steppe Bronze Age population 7±1.6% (Figure 3C; Table S6).

Haber et al., Continuity and admixture in the last five millennia of Levantine history from ancient Canaanite and present-day Lebanese genome sequences, bioRxiv, Posted May 26, 2017, doi:
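
For readers who haven't met the f4 statistic used in the excerpt above, here's a minimal Python sketch of the basic calculation: the average across SNPs of (a - b) * (c - d), where a, b, c and d are allele frequencies in the four populations. The function name and the toy frequencies are hypothetical and chosen purely for illustration. Real analyses also need a standard error, usually from a block jackknife, to turn the estimate into a Z score, and that part is left out here.

```python
def f4(p_a, p_b, p_c, p_d):
    """Plain f4(A, B; C, D) estimate from per-SNP allele frequencies."""
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all populations need the same number of SNPs")
    if not p_a:
        raise ValueError("need at least one SNP")
    products = [
        (a - b) * (c - d)
        for a, b, c, d in zip(p_a, p_b, p_c, p_d)
    ]
    return sum(products) / len(products)


# Toy allele frequencies at four SNPs (hypothetical, for illustration only)
sidon_ba = [0.2, 0.5, 0.1, 0.8]
lebanese = [0.3, 0.6, 0.2, 0.8]
steppe = [0.9, 0.9, 0.7, 0.5]
chimp = [0.0, 0.0, 0.0, 0.0]

value = f4(sidon_ba, lebanese, steppe, chimp)
print("f4(Sidon_BA, Lebanese; Steppe, Chimp) is",
      "negative" if value < 0 else "non-negative")
```

With the populations in this order, a negative value means that the second population (here the toy Lebanese) shares more alleles with the third (the toy Steppe) than the first population does, which is the pattern that the excerpt describes.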

See also...

Yamnaya-related ancestry proportions in Europe and west Asia

A homeland, but not the homeland

Thursday, May 25, 2017

A few more ancient genomes from the Balkans and Iberia

Open access at Current Biology:

Our results show major Western hunter-gatherer (WHG) ancestry in a Romanian Eneolithic sample [GB1_Eneo] with a minor, but sizeable, contribution from Anatolian farmers, suggesting multiple admixture events between hunter-gatherers and farmers.

González-Fortes et al., Paleogenomic Evidence for Multi-generational Mixing between Neolithic Farmers and Mesolithic Hunter-Gatherers in the Lower Danube Basin, Current Biology, Published Online: May 25, 2017, DOI:

See also...

The genomic history of Southeastern Europe (Mathieson et al. 2017 preprint)

Anywhere but the steppe

Last week Scientific Reports put out a paper by Sarno et al. on the population history of Sicily and South Italy. I didn't blog about it at the time because I felt that it was generally a weak effort and not worth advertising. But people keep bringing it up in the comments section, so here goes.

If you download the PDF and do a search for "Africa", you'll see that the only time it comes up is in the bibliography. "Maghreb" doesn't come up at all.

Can anyone explain this? I can't. If you're doing a paper on the population history of Sicily and South Italy and you don't take a close look at the fairly recent North African admixture there, then at best you're naive and confused.

Also, the authors try to enter the Proto-Indo-European (PIE) homeland debate. They basically argue that Indo-European (IE) languages could not have arrived in Southeastern Europe from the Pontic-Caspian steppe because modern-day Southeastern Europeans overall don't pack much Bronze Age steppe admixture. They also claim that based on their admixture dating efforts (which may or may not be accurate) the steppe ancestry by and large arrived in the east Mediterranean during the early Middle Ages with Slavic migrations. Thus, they suggest that a better PIE homeland alternative to the Pontic-Caspian steppe might be West Asia.

These are very weak arguments for a number of important reasons. For instance, language change can happen without massive migrations from afar. Case in point: the Etruscans were a sizable non-IE speaking population in Southeastern Europe until historic times, and discarded their Etruscan language in favor of the IE Latin by being subsumed into the Roman Empire. Indeed, Southeastern Europe has been a bit of a hotspot for this type of thing; Razib has a little more on that and the admixture dating here.

Also worth positing is the likely scenario in which much of the Bronze Age steppe ancestry in Southeastern Europe has been diluted by more recent admixture from the Near East and North Africa. It's hard to say for sure to what extent without direct evidence from ancient DNA, but this is something that should have been considered in the paper.

I won't be blogging much from now on about population history papers based on modern-day samples, because such papers aren't usually worth blogging about.


Sarno et al., Ancient and recent admixture layers in Sicily and Southern Italy trace multiple migration routes along the Mediterranean, Scientific Reports 7, Article number: 1984, (2017), doi:10.1038/s41598-017-01802-4

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Sunday, May 21, 2017

Steppe invaders in the Bronze Age Balkans

In a recent blog post announcing the end of the search for the Late Proto-Indo-European (PIE) homeland I wrote this:

But of course I2a has also been recorded in prehistoric samples from the Pontic-Caspian steppe. So, you might ask, why did the populations migrating out of the steppe belong to R1a and R1b, and why did some of them seemingly carry only R1a while others only R1b? This can be explained by local founder effects on the steppe due to patrilocality. Moreover, it's possible that some groups moving out of the steppe did carry high frequencies of I2a, but they're yet to enter the ancient DNA record.

Actually, in hindsight, such a population has probably already shown up in the ancient DNA record, via two Early Bronze Age (EBA) individuals from the Balkans in the Mathieson et al. 2017 preprint:

Balkans_BronzeAge I2165: Y-hg I2a2a1b1b mt-hg T2f 3020-2895 calBCE

Yamnaya_Bulgaria Bul4: Y-hg I2a2a1b1b mt-hg ? 3012-2900 calBCE

Both samples are from burial sites in present-day South-Central Bulgaria. Apart from sharing I2a2a1b1b, they each pack a fair bit of Yamnaya-related ancestry and are dated to a very similar time period. Unlike Bul4, I2165 does not make the cut archaeologically as a Yamnaya sample, but he does come from a Tumulus (Kurgan-like) burial, so perhaps he's from a group influenced by Yamnaya?

By the way, the I2a2a1b1b lineage is also shared by Yamnaya_Kalmykia RISE552, and as far as I can tell, the oldest individual sampled to date belonging to this line is Ukraine_Neolithic I1738, dated to 5473-5326 calBCE. So I2a2a1b1b appears to be a Pontic-Caspian steppe marker.

The same paper also includes the following individual from present-day Bulgaria dated to the start of the Late Bronze Age (LBA), which is roughly when the Mycenaeans appeared nearby in what is now Greece:

Bulgaria_MLBA I2163: Y-hg R1a1a1b2 mt-hg U5a2 1750-1625 calBCE

This guy is the most Yamnaya-like of all of the Balkan samples in Mathieson et al. 2017, and, as far as I can see based on his overall genome-wide results, probably indistinguishable from the contemporaneous Srubnaya people of the Pontic-Caspian steppe. He also belongs to Y-haplogroup R1a-Z93, which is a marker typical of Srubnaya and other closely related steppe groups such as Andronovo, Potapovka and Sintashta. So there's very little doubt that he's either a migrant or a recent descendant of migrants to the Balkans from the Pontic-Caspian steppe.

The presence of multiple individuals like this in the still rather spotty Balkan Bronze Age ancient DNA record suggests that this part of Europe experienced sustained and possibly at times large scale incursions of various peoples from the Pontic-Caspian steppe throughout the Bronze Age.

Here's one of the Principal Component Analysis (PCA) plots from Mathieson et al. 2017, edited by me to highlight the three above-mentioned samples, as well as the anything but weak impact of gene flow from the Pontic-Caspian steppe on the Balkans during the Bronze Age. Just in case some of you are confused, I added an arrow pointing to the cluster that most of the Balkan Bronze Age samples are pulling towards.

Of course, many of us are now eagerly awaiting a paper on the genetic origins of the Minoans and Mycenaeans. The latter are one of the few attested Indo-European speakers from prehistory, so their genetic structure may prove pivotal in the Indo-European homeland debate.

I know for a fact that a couple of ancient DNA labs have been working on such a paper for a while now, but I haven't heard anything about the results. However, just looking at the PCA above, I'd be shocked if the Mycenaean samples did not show a strong signal of gene flow from the Pontic-Caspian steppe. If so, the implications of this will be obvious.


Mathieson et al., The Genomic History Of Southeastern Europe, bioRxiv, Posted May 9, 2017, doi:

Saturday, May 20, 2017

A plausible model for the formation of the Yamnaya genotype

Strictly speaking, not just the Yamnaya genotype, but also Afanasievo, early Corded Ware and Poltavka. In other words, what has been referred to in recent scientific literature as Steppe_EMBA:

- From the Eneolithic onwards, human populations on the Pontic-Caspian steppe in Eastern Europe became increasingly mobile (as evidenced by the downsizing of cemeteries, the appearance of Kurgan burial mounds all over this part of the Eurasian steppe, and the presence of increasingly sophisticated wagons and eventually also chariots as grave goods in burials).

- Greater mobility led to new contacts and more intense contacts between populations once separated by distance, but now practically neighbors, and thus also to a homogenization of culture across vast areas, and the appearance of the Yamnaya horizon across the entire Pontic-Caspian steppe during the Early Bronze Age.

- When humans are mobile and they share culture and lifestyle, they usually mix in a big way, so the Pontic-Caspian steppe was probably one big melting pot from the Eneolithic onwards, and especially during the Yamnaya period.

- It's likely that low population densities in Eastern Europe during the Eneolithic ensured the rapid spread and rise of admixture from the Caucasus across much of the Pontic-Caspian steppe, which then plateaued at around 50% during the Yamnaya period, when population densities on the steppe may have become high enough so that continued gene flow from the Caucasus no longer had much of an impact.

- The process that led to the Yamnaya genotype eventually led to its extinction by the Late Bronze Age, due to the large scale spread of Middle Neolithic European farmer ancestry across the entire Pontic-Caspian steppe, probably from its western half, resulting in the formation of the Steppe_MLBA genotype, exemplified by the Sintashta and Srubnaya people.

- Ancient DNA suggests that Bronze Age steppe groups were highly patrilocal, and if so, it's likely that most of the mixture on the steppe at this time was facilitated via female exogamy (i.e. foreign brides), which would explain the lack of typically Caucasian Y-haplogroups, such as J2, in Bronze Age steppe and derived ancient groups sampled to date, such as the Corded Ware people and eastern Bell Beakers.

My theory that most of the mixture on the Eneolithic/Bronze Age steppe was facilitated via female exogamy has proved to be a somewhat controversial one in the comments section here. It's usually vehemently opposed by people who prefer to see the Indo-European homeland in the Caucasus or Iran rather than Eastern Europe, because they realize that a female mediated spread of southern admixture into the steppe lessens the chance that it was accompanied by the introduction of the patriarchal language and culture of the early Indo-Europeans.

But there's nothing in the data currently available to suggest that I'm talking nonsense. In fact, the recent Mathieson et al. 2017 preprint on the population history of Southeastern Europe and surrounds includes several ancient female samples from the Pontic-Caspian steppe that appear to back up my theory:

- Yamnaya_Ukraine_outlier I1917: by far the most West Asian-shifted Yamnaya individual to date, sitting about half way between the Yamnaya cluster and present-day Caucasians in a Principal Component Analysis (PCA) of West Eurasian populations, and belonging to the typically Near Eastern mtDNA haplogroup R0a1. What this strongly suggests is that her father was from the Pontic-Caspian steppe and mother probably from the Near East, perhaps from the Caucasus, or at least of fully Near Eastern origin; an obvious smoking gun for what I've been arguing.

- Ukraine_Neolithic_outlier I4110: by far the most West Asian-shifted Ukraine Neolithic/Eneolithic individual to date, sitting about 1/3 of the way from the Ukraine Mesolithic/Neolithic cluster to present-day Caucasians in a PCA of West Eurasian populations, and belonging to the typically Near Eastern mtDNA haplogroup J2b1. What this strongly suggests is that her mother was largely of Near Eastern origin, possibly from the southern periphery of the Pontic-Caspian steppe; another smoking gun for what I've been arguing.

- Yamnaya_Ukraine I2105 & I3141: both from just north of the Sea of Azov, and yet both practically indistinguishable from Yamnaya samples from sites several hundred kilometers to the east in Kalmykia and Samara. These individuals are potential evidence of female exogamy amongst far flung Yamnaya groups.
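
To make statements like "about half way between" or "about 1/3 of the way" a little more concrete, here's a rough sketch of how the position of a sample between two clusters in PCA space can be expressed as a fraction. The coordinates are made up and the function name is hypothetical. Keep in mind that a position in a PCA is only a crude proxy for admixture, so treat this as a back-of-the-envelope check and not as a substitute for formal mixture modeling.

```python
def fraction_along(source, target, sample):
    """Project `sample` onto the line running from `source` to `target`.

    Returns 0.0 if the sample sits at the source centroid, 1.0 if it sits
    at the target centroid, and intermediate values in between.
    All three arguments are coordinates in the same PCA space.
    """
    direction = [t - s for s, t in zip(source, target)]
    offset = [x - s for s, x in zip(source, sample)]
    norm_sq = sum(v * v for v in direction)
    if norm_sq == 0:
        raise ValueError("source and target centroids are identical")
    return sum(o * v for o, v in zip(offset, direction)) / norm_sq


# Made-up (PC1, PC2) coordinates, NOT taken from any real analysis.
steppe_centroid = (0.04, 0.02)
caucasus_centroid = (-0.02, -0.04)
outlier = (0.01, -0.01)

frac = fraction_along(steppe_centroid, caucasus_centroid, outlier)
print(frac)
```

A value near 0.5 is what you'd expect to see, on average, for the child of one parent from each cluster, assuming the two clusters are reasonable stand-ins for the parental populations.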

Below is a PCA from Mathieson et al. 2017 showing where these samples cluster in respect to other ancients, slightly edited by me to highlight the two outliers.

Let me just reiterate that I'm not using these four genomes to claim that I'm right. All I'm saying is that they appear to support my arguments. The fact that they're all in one paper is either a pretty amazing coincidence or a sign of things to come. Let's wait and see.


Mathieson et al., The Genomic History Of Southeastern Europe, bioRxiv, Posted May 9, 2017, doi:

See also...

The pigtailed figures

Women on the move

Thursday, May 18, 2017

Two early Slavs from Bohemia

Two Bohemian Bell Beaker genomes from Allentoft et al. 2015 - RISE568 and RISE569 - are labeled as early Czech Slavs in the new Mathieson et al. 2017 preprint (see rows 148 and 149 in the spreadsheet here).

Obviously these samples were initially wrongly dated to the Bronze Age and misidentified. They really date to 600-900 CE and 660-770 calCE, respectively. It's an unfortunate mistake, but also an interesting situation, because they've been analyzed in great detail in several papers and on this blog, and no one suspected that anything was wrong.

So the fact that these two Medieval Slavs from East Central Europe passed so convincingly for eastern Bell Beakers is a hint of very strong genetic continuity in the region since the Bronze Age. Indeed, they're very similar to present-day Czechs, western Poles (from Poznan), and eastern Germans, except perhaps with lower excess Western Hunter-Gatherer (WHG) ancestry and higher Yamnaya-related ancestry.

This is where RISE569, the higher coverage of the two genomes, clusters in my Principal Component Analysis (PCA) of West Eurasian populations.

Unfortunately, both are females, so there's no Y-DNA data. But I suspect that if there were, we'd probably know something was wrong, because their Y-chromosome haplogroups may have turned out to be relatively young Slavic-specific subclades of R1a-M458 and/or R1a-Z280.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

PCA projection bias fix

A new version of EIGENSOFT has just been posted at GitHub (see here). It offers two flags to minimize the problem of Principal Component Analysis (PCA) projection bias or shrinkage: shrinkmode: YES and autoshrink: YES. For more details refer to the contents of the tarball here.

Thus, when running the new EIGENSOFT to project a sample or a set of samples onto the variation of another set of samples, include the lsqproject: YES flag to account for missing data, and then either shrinkmode: YES or autoshrink: YES. I haven't tried this myself yet, but according to the README file in the tarball linked to above, shrinkmode: YES gives better results but takes up much more CPU time.
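
As a rough illustration, here's a small Python sketch that writes out a smartpca parameter file with these flags switched on. The file names and the helper function are hypothetical, and I'm assuming the usual EIGENSTRAT-format inputs plus a poplistname file listing the populations used to compute the eigenvectors. Check the README for the exact behavior of each option before relying on this.

```python
def build_parfile(prefix, poplist, use_shrinkmode=True):
    """Return the text of a smartpca parameter file for a projection run.

    `prefix` is the (hypothetical) shared prefix of the .geno/.snp/.ind
    input files, and `poplist` is a text file naming the populations that
    define the PCA axes. All other samples get projected onto those axes.
    """
    # shrinkmode is reportedly more accurate but slower; autoshrink is faster.
    shrink_flag = "shrinkmode" if use_shrinkmode else "autoshrink"
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {prefix}.evec",
        f"evaloutname: {prefix}.eval",
        f"poplistname: {poplist}",
        "lsqproject: YES",  # least squares projection, to cope with missing data
        f"{shrink_flag}: YES",  # correct for projection bias (shrinkage)
    ]
    return "\n".join(lines) + "\n"


parfile = build_parfile("west_eurasia", "modern_pops.txt")
print(parfile)
```

The output can be saved to a file and passed to smartpca with the -p option.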

PCA projection bias is a problem that I've been whining about for a while now (for instance, see here). I actually have my own simple techniques to get around it that appear to work very well, so I'm not sure if I'll be using the new flags. But I might after I try them out. I'd certainly urge the authors of upcoming ancient DNA papers to do so.

Wednesday, May 17, 2017

European blond hair may have originated on the North Eurasian Mammoth steppe

The quote below is from the recent Mathieson et al. 2017 preprint on the population history of Southeastern Europe and surrounds. Surprisingly, this titbit hasn't received much attention yet considering the fascination that many people have with blond hair and blonds.

The derived allele of the KITLG SNP rs12821256 that is associated with – and likely causal for – blond hair in Europeans [4,5] is present in one hunter-gatherer from each of Samara, Motala and Ukraine (I0124, I0014 and I1763), as well as several later individuals with Steppe ancestry. Since the allele is found in populations with EHG but not WHG ancestry, it suggests that its origin is in the Ancient North Eurasian (ANE) population. Consistent with this, we observe that earliest known individual with the derived allele is the [Siberian] ANE individual Afontova Gora 3 which is directly dated to 16130-15749 cal BCE (14710±60 BP, MAMS-27186: a previously unpublished date that we newly report here).

Here's a really nice shot of one of the last remnants of the Mammoth steppe on the border of Mongolia and the Republic of Tuva (courtesy of Александр Лещёнок at Wikipedia). All it needs is a few mammoths grazing on the horizon and it's like we're back in 15,000 BCE.

I'd say a strong case can be made that modern-day European populations with the highest frequencies of blond hair also show the highest levels of ANE ancestry in Europe (for instance, Baltic Finns, Scandinavians and Balts). You can check the ANE levels in hundreds of modern-day and ancient individuals in my Basal-rich K7 spreadsheet here. The K7 is not a perfect measure of ANE admixture, but I'd say it's accurate enough, especially in relative terms.

On a related note, the Swedish web portal has an article on the latest ancient DNA research on the peopling of Scandinavia, focusing on the migrations of Western European Hunter-Gatherers (WHG) and Eastern European Hunter-Gatherers (EHG) into the region during the Mesolithic.

De var de första svenskarna

Basically, the article broadly supports the findings of Mathieson et al. 2017, pointing out that WHG were likely blue eyed, dark haired and dark skinned, while EHG probably had variable eye coloring, but lighter hair and skin than WHG. I suppose what this implies is that the blue eyed blond phenotype most common today amongst Northern Europeans, like the Polish Danish tennis player below (picture courtesy of Wikipedia), is a relatively recent, perhaps post-Mesolithic, phenomenon.

What I don't get is why the Early Bronze Age Yamnaya people of the Pontic-Caspian Steppe were apparently so dark haired despite their extreme level of ANE ancestry and relatively close genetic relationship to modern-day Northern Europeans? On the other hand, the Middle Bronze Age Andronovo people of the Kazakh Steppe and South Siberia, who were largely derived from Yamnaya or a closely related group from the Pontic-Caspian Steppe, were probably often blue eyed and blond haired (see here). It's unlikely that natural selection alone could have lightened up the steppe people in such a relatively short time. Or is it?

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, May 12, 2017

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

All of the post-Middle Neolithic samples from the recent Mittnik et al. and Saag et al. preprints on the ancient population history of the Baltic region belonged to Y-chromosome haplogroup R1a. And most of them belonged to the R1a-M417 (R1a1a) subclade that makes up almost 100% of the R1a lineages in the world today. This is what the results look like in a table (the sample IDs are of my own design):

Earlier samples from the same region belonged to Y-haplogroups I2a and R1a, but this was a subclade of R1a defined by the YP1272 mutation that is extremely rare today even in Northeastern Europe.

And now shifting our focus west of Scandinavia: all but two of the post-Middle Neolithic samples from around the North Sea from the recent Olalde et al. preprint on the Bell Beaker phenomenon and ancient population history of Northwest Europe belonged to Y-chromosome R1b, and more specifically to the R1b-M269 (R1b1a1a2) subclade, which makes up almost 100% of the R1b lineages in the world today. Here's a table:

Earlier samples from the same region belonged to Y-haplogroups I2a, I, G2a and CF, and most of the instances of I and the CF would probably be classified as I2a if not for missing data.

Interestingly, despite the R1a vs R1b dichotomy between these post-Middle Neolithic obvious newcomers to the Baltic and North Sea regions, respectively, they were very similar in terms of overall genetic structure, obviously closely related, starkly different from Middle Neolithic Northern Europeans, and in all likelihood mainly derived from the same homeland that was not located in Northern Europe.

So can we locate this homeland with any degree of certainty, you might wonder? In fact, you might ask, isn't this a futile search for the time being, as we await ancient DNA from many prehistoric Eurasian populations?

Not at all, because when attempting to answer this question we're bounded by two key constraints: the exceptionally high frequencies of R1a and R1b in the post-Middle Neolithic Baltic and North Sea samples, and their close genetic affinity to earlier and contemporaneous populations from the Pontic-Caspian Steppe, part of which is due to significant Caucasus Hunter-Gatherer (CHG) admixture that was lacking in Middle Neolithic Northern Europeans.

Indeed, to date, the Pontic-Caspian Steppe is the only region where both R1a and R1b have been found in ancient remains from the same sites dating to the Mesolithic, Neolithic and Eneolithic. Here's a table based on results from Mathieson et al. 2015 and 2017. The R and R1 might really be R1a or R1b if not for missing data.

The Pontic-Caspian Steppe also abuts the Caucasus foothills, and we know that CHG admixture was a major feature of its inhabitants from at least the Eneolithic. So odds are, and make no mistake, these are indeed excellent odds, that the homeland we're looking for was on the Pontic-Caspian Steppe.

But of course I2a has also been recorded in prehistoric samples from the Pontic-Caspian steppe. So, you might ask, why did the populations migrating out of the steppe belong to R1a and R1b, and why did some of them seemingly carry only R1a while others only R1b? This can be explained by local founder effects on the steppe due to patrilocality. Moreover, it's possible that some groups moving out of the steppe did carry high frequencies of I2a, but they're yet to enter the ancient DNA record. [Edit: Maybe they already have? See here]

Now, the aforementioned post-Middle Neolithic newcomers to the Baltic and North Sea regions are most certainly in large part the direct ancestors of modern-day Northern Europeans, who speak languages belonging to three daughter branches of Late Proto-Indo-European (PIE): Balto-Slavic, Celtic and Germanic. It's highly unlikely that languages ancestral to these present-day languages were spoken by Middle Neolithic farmers, or that they were introduced into Northern Europe after it was colonized by the migrants from the Pontic-Caspian Steppe.

What this strongly suggests is that the Pontic-Caspian Steppe was also the late PIE homeland.

But, you might argue, the Pontic-Caspian Steppe may have just been the expansion point for some of the late PIE language branches. No, that won't work. For one, modern-day populations speaking languages belonging to all other late PIE branches, such as Armenian, Greek, Indo-Iranian and Italic, show signals of the same population expansion from the Pontic-Caspian Steppe that gave rise to modern-day Northern Europeans, in the form of Yamnaya-related genome-wide genetic admixture and appreciable frequencies of Y-chromosome haplogroups R1a-M417 and/or R1b-M269.

Some of these signals are certainly due to fairly recent admixture from Northern Europeans, like in much of Greece as a result of the Slavic expansions during the Early Middle Ages, but most cannot be explained in this way.

Secondly, Balto-Slavic, Celtic and Germanic are not more closely related to each other than to some of the other late PIE branches. For instance, Balto-Slavic is considered far more closely related to Indo-Iranian than to Celtic, which is generally seen as a sister branch to Italic. Therefore, if Balto-Slavic and Celtic derive from a homeland on the Pontic-Caspian Steppe, then logically this is also where we should look for the origins of Indo-Iranian and Italic.

So as far as the late PIE homeland is concerned, thanks to ancient DNA, the debate is now practically over. But the PIE homeland debate is still wide open, or so we're told.

Apparently, Mathieson et al. 2017 aren't comfortable with putting the PIE homeland on the Pontic-Caspian Steppe because they can't find any evidence in their ancient DNA dataset of a significant migration through the Balkans that would potentially bring Anatolian languages from the Pontic-Caspian Steppe to Anatolia. From the paper:

One version of the Steppe Hypothesis of Indo-European language origins suggests that Proto-Indo European languages developed in the steppe north of the Black and Caspian seas, and that the earliest known diverging branch – Anatolian – was spread into Asia Minor by movements of steppe peoples through the Balkan peninsula during the Copper Age around 4000 BCE, as part of the same incursions from the steppe that coincided with the decline of the tell settlements. [51] If this were correct, then one way to detect evidence of it would be the appearance of large amounts of characteristic steppe ancestry first in the Balkan Peninsula, and then in Anatolia. However, our genetic data do not support this scenario. While we find steppe ancestry in Balkan Copper Age and Bronze Age individuals, this ancestry is sporadic across individuals in the Copper Age, and at low levels in the Bronze Age. Moreover, while Bronze Age Anatolian individuals have CHG/Iran Neolithic related ancestry, they have neither the EHG ancestry characteristic of all steppe populations sampled to date [20] , nor the WHG ancestry that is ubiquitous in southeastern Europe in the Neolithic (Figure 1A, Supplementary Data Table 2, Supplementary Information section 1). This pattern is consistent with that seen in northwestern Anatolia [11] and later in Copper Age Anatolia [23], suggesting continuing migration into Anatolia from the East rather than from Europe.

And this...

On the other hand, our data could still be consistent with the Steppe-Balkans-Anatolia route hypothesis model, albeit with constraints. It remains possible that populations dating to around 1600 BCE in the regions where the Indo-European Luwian, Hittite and Palaic languages were spoken did have European hunter-gatherer ancestry. However, our results would require that such ancestry was not ubiquitous in Bronze Age Anatolia, and was perhaps tightly linked to Indo-European speaking groups. We predict that additional insight about the genetic origins of the potential speakers of early Indo-European languages will be obtained when ancient DNA data become available from additional sites in this key period in Anatolia and the Caucasus.

But I'd say the authors are taking that one particular version of the Steppe Hypothesis way too seriously. They might even be implying things that the creator(s) of the said hypothesis never posited.

Why do they seemingly expect a massive surge of steppe admixture into the Balkans during the Copper Age? If the steppe people are just shooting through the Balkans on their way to Anatolia, why would they leave a lot of admixture along the way? And if the locals are abandoning their tell settlements and running for the hills as far away from the oncoming steppe invaders as they can, how exactly would they acquire steppe admixture? Osmosis or what?

The Balkans is not Northern Europe, and the hypothesized migration of the proto-Anatolians from the Pontic-Caspian Steppe to Anatolia through the Balkans was never, as far as I know, meant to parallel the massive Corded Ware expansion across Northern Europe. In other words, why should all of the early Indo-European expansions have been of the same character, especially considering that they moved into such starkly different areas of Eurasia?

Indeed, as Mathieson et al. 2017 point out in the quote above, the evidence for the fleeting presence of steppe peoples in the Copper Age Balkans is in their dataset. For instance, in their Varna 1 sample set from Bulgaria, three out of the five individuals show significant steppe admixture. One of these individuals is almost 50% Yamnaya-like. Surely, there's really no need to expect anything more than that when looking for signals of a proto-Anatolian migration from the Pontic-Caspian Steppe to Anatolia.

In fact, even though I do appreciate the incredible work these guys are doing and the data they're making available to myself and everyone else, I suspect that there's a little bit of, shall we say, schadenfreude going on here.

They sequenced all of three Early Bronze Age Anatolians of obscure origin (are they actually suspected Anatolian speakers, like Luwians?), and apparently it's a big deal that they can't find any steppe admixture in Early Bronze Age Anatolia. Come on.

And then we're offered just three Yamnaya samples from the Pontic Steppe in Ukraine. One happens to be a massive outlier towards the Caucasus. Wow, what are the chances of that? And guess what, all three of these Yamnayans are females, so of course we're left wondering about the Y-haplogroups of the Yamnaya males on the Pontic Steppe. What happened to the males? Next paper, that's what.

Update 19/05/2017: Please note that the authors are not holding back any Yamnaya males from Ukraine for a future paper, as per my claim in the last paragraph above. They used what they had for the time being.

Update 21/05/2017: Actually, I suspect that we already have a population from the Bronze Age steppe in the ancient DNA record with a high frequency of Y-haplogroup I2a. See here.

See also...

Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts

Eastern Europe as a bifurcation hotspot for Y-hg R1

Globular Amphora people starkly different from Yamnaya people

Wednesday, May 10, 2017

Ancient population shifts in western Iberia (Martiniano et al. 2017 preprint)

Over at bioRxiv at this LINK:

Abstract: We analyse new genomic data (0.05-2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200-3500 BC) to the Middle Bronze Age (1740-1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third Millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.

Martiniano et al., The Population Genomics Of Archaeological Transition In West Iberia, bioRxiv, Posted May 10, 2017, doi:

The Bell Beaker Behemoth (Olalde et al. 2017 preprint)

Over at bioRxiv at this LINK:

Abstract: Bell Beaker pottery spread across western and central Europe beginning around 2750 BCE before disappearing between 2200-1800 BCE. The mechanism of its expansion is a topic of long-standing debate, with support for both cultural diffusion and human migration. We present new genome-wide ancient DNA data from 170 Neolithic, Copper Age and Bronze Age Europeans, including 100 Beaker-associated individuals. In contrast to the Corded Ware Complex, which has previously been identified as arriving in central Europe following migration from the east, we observe limited genetic affinity between Iberian and central European Beaker Complex-associated individuals, and thus exclude migration as a significant mechanism of spread between these two regions. However, human migration did have an important role in the further dissemination of the Beaker Complex, which we document most clearly in Britain using data from 80 newly reported individuals dating to 3900-1200 BCE. British Neolithic farmers were genetically similar to contemporary populations in continental Europe and in particular to Neolithic Iberians, suggesting that a portion of the farmer ancestry in Britain came from the Mediterranean rather than the Danubian route of farming expansion. Beginning with the Beaker period, and continuing through the Bronze Age, all British individuals harboured high proportions of Steppe ancestry and were genetically closely related to Beaker-associated individuals from the Lower Rhine area. We use these observations to show that the spread of the Beaker Complex to Britain was mediated by migration from the continent that replaced >90% of Britain's Neolithic gene pool within a few hundred years, continuing the process that brought Steppe ancestry into central and northern Europe 400 years earlier.

Olalde et al., The Beaker Phenomenon And The Genomic Transformation Of Northwest Europe, bioRxiv, Posted May 9, 2017, doi:

See also...

The genomic history of Southeastern Europe (Mathieson et al. 2017 preprint)

The genomic history of Southeastern Europe (Mathieson et al. 2017 preprint)

Over at bioRxiv at this LINK:

Abstract: Farming was first introduced to southeastern Europe in the mid-7th millennium BCE - brought by migrants from Anatolia who settled in the region before spreading throughout Europe. However, the dynamics of the interaction between the first farmers and the indigenous hunter-gatherers remain poorly understood because of the near absence of ancient DNA from the region. We report new genome-wide ancient DNA data from 204 individuals - 65 Paleolithic and Mesolithic, 93 Neolithic, and 46 Copper, Bronze and Iron Age - who lived in southeastern Europe and surrounding regions between about 12,000 and 500 BCE. We document that the hunter-gatherer populations of southeastern Europe, the Baltic, and the North Pontic Steppe were distinctive from those of western Europe, with a West-East cline of ancestry. We show that the people who brought farming to Europe were not part of a single population, as early farmers from southern Greece are not descended from the Neolithic population of northwestern Anatolia that was ancestral to all other European farmers. The ancestors of the first farmers of northern and western Europe passed through southeastern Europe with limited admixture with local hunter-gatherers, but we show that some groups that remained in the region mixed extensively with local hunter-gatherers, with relatively sex-balanced admixture compared to the male-biased hunter-gatherer admixture that we show prevailed later in the North and West. After the spread of farming, southeastern Europe continued to be a nexus between East and West, with intermittent steppe ancestry, including in individuals from the Varna I cemetery and associated with the Cucuteni-Trypillian archaeological complex, up to 2,000 years before the Steppe migration that replaced much of northern Europe's population.

Mathieson et al., The Genomic History Of Southeastern Europe, bioRxiv, Posted May 9, 2017, doi:

See also...

Globular Amphora people starkly different from Yamnaya people

The Bell Beaker Behemoth (Olalde et al. 2017 preprint)

Monday, May 8, 2017

ESHG 2017 abstracts

The titles are already up but the abstracts will only be available this Saturday, May 13. The programme planner and abstract search engine are here. Below are links to a few random abstracts that caught my eye.

To be brutally honest, I suspect that the Rai et al. presentation on South Asian population history (first link below) won't amount to much more than a preemptive strike against the impending confirmation via ancient DNA that the Aryan invasion really did happen. In other words, I expect them to argue for strong genetic continuity in South Asia since at least the Neolithic and against the Aryan Invasion Theory (AIT).

Perhaps I'm being overly cynical and I'll apologize if I'm wrong, but I think it's a good bet, considering the many papers put out by Indian scientists over the past 15 years or so arguing that both the Indo-Aryans and "Aryan" Y-chromosome haplogroup R1a are native to South Asia. At best this is naive, and at worst plain crazy, but that doesn't seem to bother many of our Indian friends. Nevertheless, the ancient DNA sequenced as part of the Rai et al. study, when analyzed properly, should be very useful and I look forward to seeing it.

E-P18.02 - Reconstructing the human population history of the Indian subcontinent using ancient population genomics

C14.5 - Complex spatio-temporal distribution and genogeographic affinity of mitochondrial DNA haplogroups in 24,216 Danes

E-P18.03 - Genomic analysis of ethnic regions in Armenia

E-P18.21 - Detailed study of the genetic structure of the Volga-Ural region populations

P18.28D - The migrations and barriers that shaped the Central Asian Y-chromosomal pool

I don't have the time right now to do a detailed search of the database, so there might be many more titles that deserve attention. Feel free to post your favorite abstract in the comments below.

Update 13/05/2017: The Rai et al. abstract is up. It doesn't reveal any results, but it does list the types of ancient samples that they're testing. Emphasis is mine.

The more than 1.3 billion people who live in Indian subcontinent correspond to several large ethnic groups who are highly diverse and complex. Importantly, India’s genetic past remains a subject a great debate due to numerous hypotheses surrounding population origins and migrations within and from outside India. In order to reconstruct and explain the patterns of genetic diversity evident in modern humans, an understanding of both past and present population dynamics is crucial. Several studies have shown that genetic data from ancient individuals are indispensable when reconstructing past population histories. We for the first time use the ancient genomics approach in South Asia to reconstruct the complex human population history of Indian Sub continent. We are exploring the recent technological advancement to directly test these hypotheses using ancient and modern human DNA in India. We have collected several ancient skeletal remains from different time scale of human civilization ranging from early Mesolithic, Neolithic, Harappan (Indus Valley civilization) and Megalithic culture. With the whole/partial genome NGS data, we are reconstructing the prehistoric peopling and migration of modern human in the Indian subcontinent. We are also testing the pervasive founder events and gradient of recessive genes accumulation by comparing the ancient genome with the modern human population of India.

Sunday, May 7, 2017

Through time AND space?

Ever since the publication of Lazaridis et al. 2016, the comments section here has seen regular debates about the nature and source of steppe-related ancestry in South Asia.

According to mixture models featured in that paper, the populations that brought steppe ancestry to South Asia probably lacked early European farmer (EEF) admixture. In other words, they were more like the samples from Early to Middle Bronze Age (EMBA) cultures Yamnaya, Afanasievo, and Poltavka, than those from Middle to Late Bronze Age (MLBA) cultures Sintashta, Andronovo and Srubnaya.

This of course poses a major dilemma to those of us interested in early Indo-European expansions, because the consensus amongst historical linguists is that Indo-Iranian languages were introduced into South Asia during the Late Bronze Age from the Andronovo horizon.

So how do we reconcile ancient genomics with historical linguistics in this case? Should we assume that the linguists are way off, and posit that Indo-Iranian languages were introduced into South Asia straight from the Poltavka or even Yamnaya culture, much earlier than generally accepted?

Not necessarily.

Lazaridis et al. 2016 identified three post-Poltavka steppe individuals in their dataset that lacked EEF ancestry and were thus more similar to samples from Poltavka than Andronovo: Potapovka I0246, Potapovka I0418 and Srubnaya_outlier I0354. So where did these outliers come from and how is it that their steppe ancestors managed to stay free of EEF admixture?

One possible explanation is that most of the population on the MLBA steppe didn't carry significant levels of EEF admixture, because it was largely limited to the elites. As a result, we might be getting a skewed picture of the genetic structure of the steppe at this time, because for obvious reasons the vast majority of Bronze Age steppe samples being tested are from the best preserved burials, which are usually elite Kurgan burials.

So why would these elites harbor EEF ancestry and the commoners lack it? Perhaps because the former migrated from deep within the European part of the steppe, and imposed their culture on populations derived from, say, Afanasievo, Catacomb, Poltavka and late Yamnaya? Potential evidence of such an expansion exists in the form of chariot burials with similar horse cheek pieces found all the way from the Carpathian Basin to Central Asia (refer to the third map from Allentoft et al. 2015 here).

In any case, one way or another Poltavka-like people managed to survive on the steppe, perhaps in considerable numbers, well into the Andronovo period and probably beyond. So if this type of genetic structure was transmitted on the steppe across the millennia, then why not also across space into South Asia?

Interestingly, it's often claimed that some of the rituals described in the early Indo-Aryan Rig Veda hymns are very similar to the Kurgan burial rituals practiced by Potapovka people (see here). This is open to interpretation and impossible to prove, but I can test whether the three above-mentioned post-Poltavka steppe outliers, including the two Potapovka individuals, show the right type of genetic structure to be potentially ancestral to modern-day South Asians.

So using the qpAdm algorithm, let's test a model in which the descendants or close relatives of these three samples, labeled as Potapovka2-Srubnaya_outlier, move into the Andronovo horizon and then on to South Asia, contributing significantly to the genetic structure of modern-day South Asians.
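
For anyone who wants to try something along these lines, here's a minimal sketch of how the inputs for a qpAdm run of this kind could be prepared. It is not the setup behind the models in this post. The helper `write_qpadm_inputs`, the file names, and any population labels passed to it other than Potapovka2-Srubnaya_outlier are hypothetical placeholders. The parameter keys (`genotypename`, `snpname`, `indivname`, `popleft`, `popright`, `details`) are the ones the qpAdm program from ADMIXTOOLS expects in its parameter file.

```python
from pathlib import Path


def write_qpadm_inputs(workdir, prefix, target, sources, outgroups):
    """Write the population lists and parameter file for one qpAdm model.

    All arguments are illustrative: `prefix` is the assumed base name of an
    EIGENSTRAT dataset (prefix.geno / prefix.snp / prefix.ind), and the
    population labels must match those in the .ind file.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # "Left" populations: the target comes first, then the proposed sources.
    left = workdir / "left.txt"
    left.write_text("\n".join([target, *sources]) + "\n")

    # "Right" populations: the outgroups used to judge the fit.
    right = workdir / "right.txt"
    right.write_text("\n".join(outgroups) + "\n")

    # Parameter file pointing qpAdm at the dataset and the two lists.
    par = workdir / "qpadm.par"
    par.write_text(
        f"genotypename: {prefix}.geno\n"
        f"snpname: {prefix}.snp\n"
        f"indivname: {prefix}.ind\n"
        f"popleft: {left}\n"
        f"popright: {right}\n"
        "details: YES\n"
    )
    return par
```

The resulting file would then be passed to the program with `qpAdm -p qpadm.par`. A model like the one described above would put a South Asian population as the target, with Potapovka2-Srubnaya_outlier among the sources, and the choice of outgroups is left to the user.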

These models look fine in terms of the statistical fits. In fact, much more than just fine in most cases. My prediction is that a population like Potapovka2-Srubnaya_outlier will eventually be discovered on the Late Bronze Age steppe, perhaps even at a site linked to the Andronovo horizon, and it'll fit the bill as a main player in the story of the peopling of South Asia.

However, this population might not necessarily be isolated from its EEF-rich neighbors by geography, but rather by culture and even social class. In other words, we should expect significant substructure on the steppe at this late stage of the game, after a couple of millennia of intense mobility, and we should expect it to be complex, not simply defined by geography.