Protein-coding gene annotation
Protein-coding genes of A. flavipes and Didelphis
virginiana were annotated using homology-based prediction, de novo
prediction, and RNA-seq-assisted prediction methods. Sequences of
homologous proteins from five mammals [human (Homo sapiens), M. domestica, P. cinereus, S. harrisii, and V. ursinus] were downloaded from NCBI. These protein sequences
were aligned to the repeat-masked genome using BLAT v0.36 (Kent, 2002).
GeneWise v2.4.1 (Birney, Clamp, & Durbin, 2004) was used to
generate gene structures based on the alignments of proteins to the
genome assembly. De novo gene prediction was performed using AUGUSTUS
v3.2.3 (Stanke et al., 2006), GENSCAN v1.0 (Burge & Karlin, 1997), and
GlimmerHMM v3.0.1 (Majoros, Pertea, & Salzberg, 2004) with a human
training set. Transcriptome data were mapped to the assembled genome
using HISAT2 v2.1.0 (Kim, Paggi, Park, Bennett, & Salzberg, 2019) and
SAMtools v1.9 (Li et al., 2009), and coding regions were predicted using
TransDecoder v5.5.0 (Grabherr et al., 2011; Haas et al., 2013). A final
non-redundant reference gene set was generated by merging the three
annotated gene sets using EVidenceModeler v1.1.1 (EVM) (Haas et al.,
2008) and excluding EVM gene models with only ab initio support. The
gene models were translated into amino acid sequences and used in local
BLASTp (Camacho et al., 2009) searches against the public databases
Kyoto Encyclopedia of Genes and Genomes (KEGG; v89.1) (Kanehisa & Goto,
2000), Clusters of Orthologous Groups (COG) (Tatusov, Galperin, Natale,
& Koonin, 2000), NCBI non-redundant protein sequences (NR; v20170924)
(O’Leary et al., 2016), Swiss-Prot (release-2018_07) (UniProt, 2012),
TrEMBL (TRanslation of EMBL [nucleotide sequences that are not in
Swiss-Prot]; release-2018_07) (O’Donovan et al., 2002), and InterPro
(v69.0) (A. L. Mitchell et al., 2019). In total, 18,068 A. flavipes genes (93.5%) could be functionally annotated. Where specific
genes are named in this manuscript, human nomenclature assignments are
used unless otherwise noted.
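
The filtering step described above, in which EVM gene models with only ab initio support are excluded, can be illustrated with a minimal sketch. It is not the pipeline used in this study. The dictionary of evidence labels, the label strings, and the function names are all hypothetical, and the input is a simplified stand-in rather than the EVidenceModeler output format.

```python
# Hypothetical labels for the three de novo predictors named in the text.
AB_INITIO_SOURCES = frozenset({"AUGUSTUS", "GENSCAN", "GlimmerHMM"})


def has_non_ab_initio_support(sources):
    """Return True if at least one evidence source is not an ab initio predictor.

    A model with no recorded evidence at all returns False and is dropped.
    """
    return any(source not in AB_INITIO_SOURCES for source in sources)


def filter_gene_models(evidence_by_gene):
    """Keep only gene models backed by homology or transcript evidence.

    evidence_by_gene maps a gene model ID to the collection of evidence
    source labels supporting it (an assumed, simplified representation).
    """
    return {
        gene_id: sources
        for gene_id, sources in evidence_by_gene.items()
        if has_non_ab_initio_support(sources)
    }


if __name__ == "__main__":
    # Toy input: gene2 is supported by de novo predictors only.
    example = {
        "gene1": {"AUGUSTUS", "GeneWise"},
        "gene2": {"AUGUSTUS", "GENSCAN"},
        "gene3": {"TransDecoder"},
    }
    kept = filter_gene_models(example)
    print(sorted(kept))  # ['gene1', 'gene3']
```

In this sketch a model is retained as soon as one homology-based or RNA-seq-derived source supports it, whether or not de novo predictors also agree with it.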