Comparative analysis of genomic features in industrial and fast-growing trees: A study of poplar and eucalypt

Document Type : Scientific article

Author

Assistant Professor, Ahar Faculty of Agriculture and Natural Resources, University of Tabriz, Ahar, Iran

Abstract

Extended abstract
Background and objectives: Identifying genetic similarities and gene orthology between species helps in understanding genome evolution and supports conservation and breeding. Comparative genomics studies can reveal a great deal about genome function in forest trees. Various economically important crop species have been well studied in this field, but forest trees have received less attention. Comprehensive genome comparisons between the industrial, fast-growing trees poplar (Populus trichocarpa) and eucalypt (Eucalyptus grandis), which share a common ancestor within the Rosids clade, appear to be relatively limited, even though both species serve as model plants and have up-to-date biological data. The aim of this study is to compare the complete genome sequences of eucalypt and poplar in terms of genomic characteristics such as genome size, chromosome number, gene content, microsatellite markers, and the number of genes in the terpene synthase gene family, and to identify genes related to two traits of interest to forest tree breeders: wood formation and cell wall quality.
Methodology: This research used the whole-genome sequences of eucalypt (E. grandis; NCBI accession number GCF_016545825.1) and poplar (P. trichocarpa; NCBI accession number GCF_000002775.5). Both tree species are model plants, and their genomes have been assembled at the chromosome level. We investigated genomic characteristics of the two fast-growing species, including genome size, chromosome number, total GC content, gene count, protein-coding genes, small non-coding RNAs (sncRNAs), pseudogenes, and microsatellite sequences, and constructed a corresponding Venn diagram to illustrate the findings. Microsatellite sequences were extracted with the Perl-based MISA software, and sequences related to tandem duplication in the genomes were also extracted. The number of genes in the terpene synthase gene family was compared between the two species. Finally, genes related to two traits of interest to breeders, wood formation and cell wall quality, were studied.
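The microsatellite extraction step can be illustrated with a minimal Python sketch. This is not MISA itself and not the pipeline used in the study: it is a simple regular-expression scan for perfect repeats, and the minimum repeat counts in `MIN_REPEATS`, the function name `find_ssrs`, and the toy sequence are all assumptions chosen for illustration, since the abstract does not state the search settings. Compound microsatellites are not handled.

```python
import re

# Minimum number of repeats required for each motif length (1-6 nt).
# These thresholds are assumed for illustration; the study's settings are not stated.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return perfect microsatellites as (start, end, motif, repeat_count) tuples.

    Coordinates are 0-based with an exclusive end.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A motif of length k followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" is really "A").
            if k > 1 and any(
                motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0
            ):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGC" + "A" * 12 + "TTC" + "AG" * 7 + "C"
    for start, end, motif, count in find_ssrs(toy):
        print(start, end, motif, count)
```

Restricting the character class to `[ACGT]` means that runs of ambiguous bases such as `N` are never reported as repeats, which is one reasonable design choice for assembled genomes that contain gaps.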
Results: The eucalypt genome is larger than that of poplar and contains 42,619 genes, of which 33,352 are protein-coding. The poplar genome contains 34,621 genes, of which 29,617 are protein-coding. The number of pseudogenes in the eucalypt genome is 2.9 times that in poplar. Eucalypt has 11 chromosomes and poplar has 19. The eucalypt and poplar genomes contain 1,507 and 1,347 small RNAs, respectively. According to the genome annotation available at NCBI, some genes were found only in eucalypt and others only in poplar: the Venn diagram identified 14,484 genes unique to eucalypt, 12,114 genes unique to poplar, and 9,133 genes shared between the two species. A total of 136,147 microsatellite markers were identified in the eucalypt genome and 77,024 in the poplar genome. The results showed that the genomes of eucalypt and poplar are composed of 3.8 Mb and 10.2 Mb of microsatellite sequences, respectively. The eucalypt genome has 1.8 times as many microsatellite markers as the poplar genome and a 1.2 times greater marker density (total microsatellite length in kilobases divided by genome size in megabases, kb/Mb). In total, 4,067 motif types were identified in the eucalypt genome and 2,898 in the poplar genome. Microsatellite frequency was inversely related to motif length (the number of nucleotides per repeat unit) in both species: frequency decreased significantly as motif length increased. Accordingly, mono- and dinucleotide microsatellites had the highest frequency, while eight- and nine-nucleotide microsatellites had the lowest.
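As a small worked illustration of the marker density definition above (total microsatellite length in kb divided by genome size in Mb), the following Python sketch computes a density and the fold difference in marker counts. The function names are hypothetical, and the base-pair figures in the density example are made-up round numbers used only to show the unit conversion; only the two marker counts are taken from the results reported here.

```python
def ssr_density_kb_per_mb(total_ssr_bp, genome_bp):
    """Marker density: total microsatellite length (kb) per genome size (Mb)."""
    return (total_ssr_bp / 1_000) / (genome_bp / 1_000_000)


def fold_difference(a, b):
    """How many times larger a is than b."""
    return a / b


if __name__ == "__main__":
    # Marker counts reported in the results.
    eucalypt_ssrs = 136_147
    poplar_ssrs = 77_024
    print(round(fold_difference(eucalypt_ssrs, poplar_ssrs), 1))

    # Hypothetical round numbers (not from the study): 5 Mb of repeats in a 500 Mb genome.
    print(ssr_density_kb_per_mb(5_000_000, 500_000_000))
```

Expressing density in kb/Mb normalizes for genome size, so two genomes of different lengths can be compared on the same scale.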
The terpene synthase gene family also differed between the two species: 112 genes were identified in eucalypt and 7 in poplar. The number of clusters identified was 3,185 in eucalypt and 2,575 in poplar. The total number of retained tandem genes in the eucalypt genome was 16% higher than in the poplar genome. The numbers of both functional and non-functional genes in eucalypt also exceed those in poplar. Alternative splicing has occurred in a large number of genes related to wood formation in the two studied trees, with different patterns. A total of 59 candidate genes for cell wall quality were identified for poplar and eucalypt. The insights obtained from such comparative genomics studies have the potential to facilitate plant breeding and conservation genetics efforts.
Conclusion: Comparative genomics can speed up tree breeding programs by providing diverse alleles related to important economic and ecological traits, and can also help preserve endangered and genetically distinct species.

Keywords

Main Subjects