Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development
Authors:
Jian Zhou aff001; Ignacio E. Schor aff004; Victoria Yao aff001; Chandra L. Theesfeld aff001; Raquel Marco-Ferreres aff004; Alicja Tadych aff001; Eileen E. M. Furlong aff004; Olga G. Troyanskaya aff001
Author affiliations:
aff001: Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
aff002: Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America
aff003: Center for Computational Biology, Flatiron Institute, New York, New York, United States of America
aff004: Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
aff005: Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
Published in:
Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development. PLoS Genet 15(9): e1008382. doi:10.1371/journal.pgen.1008382
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1008382
Abstract
Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach, training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated its performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with exploratory tools, is freely available at our webserver, http://find.princeton.edu, and supports a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
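The last application mentioned above, recognizing tissue signatures from a differential expression profile, can be illustrated with a minimal sketch. This is not the structured in silico nano-dissection algorithm, and it does not use the webserver's interface. It assumes only that per-tissue prediction scores are available as a genes-by-tissues matrix, and it substitutes a plain Pearson correlation for whatever scoring the resource actually applies. The function name `rank_tissues`, the tissue labels, and all numbers are hypothetical toy values.

```python
import numpy as np

def rank_tissues(pred_scores, diff_expr, tissue_names):
    """Rank tissues by Pearson correlation between each tissue's predicted
    expression scores (one column per tissue) and a differential
    expression profile (one value per gene).

    Assumption: every column and the profile have non-zero variance;
    otherwise the correlation is undefined (division by zero).
    """
    scores = np.asarray(pred_scores, dtype=float)   # shape: genes x tissues
    profile = np.asarray(diff_expr, dtype=float)    # shape: genes
    centered_scores = scores - scores.mean(axis=0)  # center each tissue column
    centered_profile = profile - profile.mean()     # center the profile
    denom = np.linalg.norm(centered_scores, axis=0) * np.linalg.norm(centered_profile)
    corr = (centered_scores.T @ centered_profile) / denom
    order = np.argsort(-corr)                       # highest correlation first
    return [(tissue_names[i], float(corr[i])) for i in order]

# Hypothetical toy data: 5 genes, 3 tissues.
tissues = ["muscle", "trachea", "nervous_system"]
pred = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.9],
]
# Hypothetical log fold changes from a non-tissue-dissected comparison.
diff = [2.0, 1.5, -0.5, -1.0, -0.2]

ranking = rank_tissues(pred, diff, tissues)
for name, r in ranking:
    print(f"{name}\t{r:.3f}")
```

In this toy example the genes that go up are the ones with high predicted muscle scores, so muscle ranks first. A rank-based correlation or an enrichment test could be swapped in for the Pearson step without changing the overall structure.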
Keywords:
Drosophila melanogaster – Embryos – Gene expression – Gene prediction – Machine learning algorithms – Transcriptome analysis – Pharyngeal muscles
Journal issue:
PLOS Genetics, 2019, Issue 9