Population genetic analysis in P. oryzae
To detect genetic lineages within the 46 P. oryzae isolates for which we had full-genome information, we combined these data with 48 worldwide rice-infecting P. oryzae genomes published by Gladieux et al. (2018). That study described six rice-infecting lineages, two of which (lineages 5 and 6) were each represented by a single isolate. Since then, two studies based on larger sets of fully sequenced and/or genome-wide genotyped isolates (Latorre et al., 2020; Thierry et al., 2021) have shown that lineages 5 and 6 are in fact part of lineage 1. We used the neighbor-net phylogenetic network approach as implemented in SplitsTree 4.13 (Bryant & Moulton, 2004), which allows evolutionary relationships to be visualized while taking into account the possibility of recombination within or between lineages. We also assessed the genealogical relationships among the 46 fully sequenced YYT P. oryzae isolates by analyzing pseudo-assembled genomic sequences (i.e. genomic sequences generated from the table of SNPs and reference sequences) with RAxML (Stamatakis, 2014). We used the General Time-Reversible model of nucleotide substitution with the Γ model of rate heterogeneity, and performed 100 bootstrap replicates to assess branch support. To assess population subdivision in the genomic data of the 46 P. oryzae isolates, we performed sNMF analyses in the R statistical environment after converting the data to a genlight object. To assess population subdivision without considering a priori information about the origin of samples, and without assuming random mating, discriminant analyses of principal components (DAPC) were conducted on the microsatellite data for the 512 YYT and 45 worldwide isolates. DAPC analyses were done using the adegenet package in R (Jombart, Devillard, & Balloux, 2010), varying the number of inferred genetic clusters (K) from 2 to 10.
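The construction of pseudo-assembled sequences from a SNP table and a reference, as used for the RAxML analysis above, can be illustrated with a minimal Python sketch. It is not the script used in this study. The function names (pseudo_assemble, to_fasta), the layout of the SNP table, and the coding of missing calls as "N" are assumptions made for illustration only.

```python
def pseudo_assemble(reference, snp_table, isolates):
    """Build one pseudo-assembled sequence per isolate (hypothetical helper).

    reference : str, reference sequence of a single contig
    snp_table : dict mapping 0-based position -> {isolate name: allele}
    isolates  : list of isolate names
    Assumption: an isolate with no call at a SNP position receives 'N'.
    """
    sequences = {}
    for isolate in isolates:
        seq = list(reference)  # start from a copy of the reference
        for pos, calls in snp_table.items():
            seq[pos] = calls.get(isolate, "N")  # substitute the called allele
        sequences[isolate] = "".join(seq)
    return sequences


def to_fasta(sequences):
    """Format the sequences as a FASTA alignment (all have equal length)."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sequences.items())


# Toy data: a 10 bp reference and two SNP positions (0-based).
reference = "ACGTACGTAC"
snp_table = {2: {"iso1": "G", "iso2": "T"}, 7: {"iso1": "C"}}
aligned = pseudo_assemble(reference, snp_table, ["iso1", "iso2"])
fasta = to_fasta(aligned)
```

Because every sequence is derived from the same reference coordinates, the output is already an alignment and can be passed to a phylogenetic program without a separate alignment step.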
For microsatellite data, a distance-based neighbour-joining tree was generated with Population (Langella, 2008), and within-population diversity and linkage disequilibrium parameters were estimated using the poppr package in R (Kamvar, Tabima, & Grünwald, 2014). Nucleotide diversity (π) within the lineages identified from full-genome sequencing data was estimated using EggLib 3.0.0b10 (De Mita & Siol, 2012), and divergence among lineages was estimated in 10 kb windows using the dxy statistic as implemented in scikit-allel (https://zenodo.org/record/4759368#.YW1duxBBzwQ). LD decay along the genome was assessed within each genetic lineage using PopLDdecay (Zhang, Dong, Xu, He, & Yang, 2019). We determined the mating type of each resequenced isolate by a BLAST search for the Mat1 and Mat2 idiomorph sequences in each genome, assembled de novo using ABySS 2.0 with default parameters (Jackman et al., 2017). Ancestral relationships and admixture among the YYT and worldwide lineages were assessed with TreeMix (Pickrell & Pritchard, 2012), assuming various admixture events. The input files were extracted from the VCF file with in-house scripts, and the results were plotted in R using the script provided with the software.
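For readers unfamiliar with the π and dxy statistics mentioned above, the following pure-Python sketch shows how they can be computed from haplotypes at SNP positions. It does not reproduce the EggLib or scikit-allel implementations. It assumes that each isolate is represented by one haploid sequence, that all sites other than the listed SNPs are invariant and callable, and that windows are non-overlapping. All function names and the toy data are hypothetical.

```python
from itertools import combinations, product


def pairwise_diffs(a, b):
    """Number of SNP positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(haplotypes, n_sites):
    """pi: mean pairwise differences within a group, per site."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(pairwise_diffs(a, b) for a, b in pairs)
    return total / (len(pairs) * n_sites)


def dxy(pop1, pop2, n_sites):
    """dxy: mean pairwise differences between two groups, per site."""
    pairs = list(product(pop1, pop2))
    total = sum(pairwise_diffs(a, b) for a, b in pairs)
    return total / (len(pairs) * n_sites)


def windowed_dxy(positions, pop1, pop2, window, seq_len):
    """dxy in non-overlapping windows; positions are 1-based SNP coordinates."""
    results = []
    for start in range(0, seq_len, window):
        stop = min(start + window, seq_len)
        # indices of the SNPs that fall inside the window (start, stop]
        idx = [i for i, p in enumerate(positions) if start < p <= stop]
        sub1 = [[h[i] for i in idx] for h in pop1]
        sub2 = [[h[i] for i in idx] for h in pop2]
        results.append((start + 1, stop, dxy(sub1, sub2, stop - start)))
    return results


# Toy data: three SNPs on a 20 bp sequence, two lineages, 10 bp windows.
positions = [5, 12, 18]
lineage_a = ["ACG", "ACG", "ATG"]
lineage_b = ["GCA", "GCA"]
pi_a = nucleotide_diversity(lineage_a, n_sites=20)
d_ab = dxy(lineage_a, lineage_b, n_sites=20)
windows = windowed_dxy(positions, lineage_a, lineage_b, window=10, seq_len=20)
```

In a real analysis the window size would be 10 kb rather than 10 bp, and the per-window denominator would normally be restricted to sites that pass the callability filters rather than the full window length.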