A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F60076658%3A12310%2F21%3A43904322" target="_blank" >RIV/60076658:12310/21:43904322 - isvavai.cz</a>
Alternative codes found
RIV/60077344:_____/21:00547831
Result on the web
<a href="https://onlinelibrary.wiley.com/doi/10.1111/mec.16240" target="_blank" >https://onlinelibrary.wiley.com/doi/10.1111/mec.16240</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1111/mec.16240" target="_blank" >10.1111/mec.16240</a>
Alternative languages
Result language
English
Original language name
A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics
Original language description
The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30x) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and locus extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein-coding genes, using four data sets: a WGS data set at three average coverages (10x, 5x and 2x) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that using multiple k-mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5x recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10618 - Ecology
Result continuities
Project
<a href="/en/project/GJ20-18566Y" target="_blank" >GJ20-18566Y: The role of species interactions in the diversification of Neotropical butterflies at the macroevolutionary and microevolutionary scales</a>
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Molecular Ecology
ISSN
0962-1083
e-ISSN
—
Volume of the periodical
30
Issue of the periodical within the volume
23
Country of publishing house
US - UNITED STATES
Number of pages
15
Pages from-to
6021-6035
UT code for WoS article
000712974600001
EID of the result in the Scopus database
2-s2.0-85118255763