Tracking avian influenza virus variation using gene phylogenetic trees and geographic databases

Time: 2022-06-21 04:17:09


Chinese Journal of Emergency Medicine (Chin J Emerg Med), August 2012, Vol. 21, No. 8

pp. 887-891

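【Abstract】Objective Outbreaks and the spread of avian influenza are affected by many natural factors. This study attempts to combine geographic information systems with phylogenetic tree analysis to establish a technique for tracking the geographic spread of avian influenza virus in China from gene sequence variation. Methods Avian influenza virus gene data were obtained from the databases of the U.S. National Library of Medicine (NLM). The genomic data were converted with the E-Utilities package into structures readable by Matlab. The main fields of each structure are the eight segments PB2, PB1, PA, HA, NP, NA, M1, and NS1, which represent the eight gene segments of the influenza virus. Based on these fields, computational biology methods were used to compare the nonsynonymous-to-synonymous substitution ratio (Ka/Ks) among avian influenza viruses with different transmission capacities and to determine the gene mutation patterns of type A avian influenza virus under different selection pressures. The gene segment with the largest Ka/Ks ratio was then selected, the Jukes-Cantor algorithm was used to estimate the evolutionary distance of amino acid sequence variation, and H5N1 avian influenza viruses from different outbreak sites were clustered into a phylogenetic tree. The clustering information was imported into Google Earth, and geographic information from different layers was used for a single-factor analysis of the factors affecting the distribution of outbreak sites. Results Comparison of all eight gene sequences of type A avian influenza showed that the NS1, HA, and NA proteins had relatively large Ka/Ks ratios. Of the three, the HA gene had the largest Ka/Ks ratio and can represent the transmission capacity of the virus. Hierarchical clustering of the amino acid sequences encoded by the HA gene showed that the relationships among H5N1 avian influenza outbreaks in Asia since 2003 can be represented as a phylogenetic tree of 30 nodes, of which 14 are branch nodes and 16 are leaf nodes. Taking the first three nodes of the tree as classification criteria, all 16 virus strains can be divided into four groups. The geographic distribution of these four groups shows a certain regularity. The geographic factors associated with avian influenza outbreaks were ranked as: inland water bodies > major railway lines > poultry density. Conclusions Analysis of the geographic distribution of gene sequence variation in Chinese H5N1 strains shows that avian influenza outbreaks are closely related to migratory bird movements and poultry transportation.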

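【Keywords】Avian influenza; Viral gene variation; Google Maps; Geographic information system; Phylogenetic tree; Nonsynonymous/synonymous substitution ratio (Ka/Ks); Jukes-Cantor algorithm; China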
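The Ka/Ks comparison described in the Methods can be illustrated with a small sketch. The abstract does not say which Ka/Ks estimator was used, so the code below is only an assumption: a simplified counting approach in the spirit of Nei-Gojobori, written in Python rather than Matlab. It assumes codon-aligned DNA sequences (T, not U), skips codons that contain gaps, stop codons, or more than one nucleotide difference, and counts changes to stop codons as non-synonymous. The function names `syn_sites`, `jc`, and `ka_ks` and the two toy sequences are hypothetical, not data from the study.

```python
import math
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                count += 1
    return count / 3  # each position has 3 possible single-base changes


def jc(p):
    """Jukes-Cantor correction of a nucleotide difference proportion p."""
    return -0.75 * math.log(1 - 4 * p / 3) if p < 0.75 else math.inf


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two codon-aligned coding sequences (simplified)."""
    syn_total = non_total = syn_diff = non_diff = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # gap or ambiguous base
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # stop codon
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            continue  # simplification: multi-difference codons are ignored
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        syn_total += s
        non_total += 3 - s
        if n_diff == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diff += 1
            else:
                non_diff += 1
    p_s = syn_diff / syn_total if syn_total else 0.0
    p_n = non_diff / non_total if non_total else 0.0
    return jc(p_n), jc(p_s)


# Toy example: one synonymous change (GCT -> GCC) and one
# non-synonymous change (AAA -> AGA).
ka, ks = ka_ks("ATGGCTAAA", "ATGGCCAGA")
print(ka, ks, ka / ks)
```

A ratio above 1 is conventionally read as positive selection and a ratio below 1 as purifying selection. On real segments the ratio is undefined when Ks is zero, so the function returns Ka and Ks separately and leaves the division to the caller.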

Tracking the spread of avian influenza in China: a model based on evolutionary genetics analysis and geographic visualization

CAI Bin, PENG Jin, JIANG Hua, YANG Hao, SUN Mingwei, Charles Damien Lu, HU Wei-jian, ZENG Jun. Computational Biology Team, Metabolomics and Multidisciplinary Laboratory for Trauma Research, Sichuan Provincial People’s Hospital, Sichuan Academy of Medical Sciences, Chengdu 610101, China

Corresponding author: JIANG Hua, Email:

【Abstract】Objective Diverse natural and human factors affect the outbreak and spread of avian influenza. We integrated geographic visualization with evolutionary genetic analysis to establish a method for tracking the spread of avian influenza in China. Methods Sequence data for type A avian influenza virus were obtained from the NCBI Nucleotide and Protein databases. We used the E-Utilities software to transform the original data into structures readable by MATLAB. These structures represented the 8 genes of the virus: polymerase basic 2 (PB2), polymerase basic 1 (PB1), polymerase acidic (PA), hemagglutinin (HA), nucleoprotein (NP), neuraminidase (NA), matrix (M1), and non-structural (NS1) proteins. Based on these structures, we compared the Ka/Ks ratios of different virus strains and identified the gene mutation patterns under different selection pressures. We then selected the gene with the highest Ka/Ks ratio and performed a phylogenetic analysis using the Jukes-Cantor algorithm. Google Earth layer tools were then used to integrate the gene variation and geographic transmission information. Results Among the 8 virus genes, NS1, HA, and NA exhibited high Ka/Ks ratios. Of these, the HA gene had the highest Ka/Ks ratio and could be taken to represent the transmission capacity of the virus. Clustering analysis of the amino acid sequences encoded by the HA gene showed that the relationships among H5N1 avian influenza strains in Asia since 2003 formed an evolutionary tree of 30 nodes (14 branch nodes and 16 leaf nodes). The first 3 nodes divided all 16 strains into 4 major groups, which exhibited clear geographic patterns in their spread. The impact of geographic factors on avian influenza outbreaks in China was ranked as: inland water bodies (lakes, reservoirs) > major railway lines > poultry density. Conclusions Analysis of gene mutations in the dominant H5N1 strains in China found that avian influenza outbreaks correlate with avian migration and poultry transportation.
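The distance and clustering steps in the Methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' Matlab workflow: the distance is the Jukes-Cantor-type correction for a 20-letter amino acid alphabet, the linkage rule is average linkage (UPGMA), which the abstract does not specify, and the four short peptides are invented toy data. The names `jc_protein_distance` and `upgma_groups` are hypothetical. Stopping the merging at k clusters is equivalent to cutting the tree at its top k - 1 branch nodes, which mirrors how the first 3 nodes yield 4 groups.

```python
import math


def jc_protein_distance(a, b):
    """Jukes-Cantor-type distance between two aligned amino acid sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(x != y for x, y in pairs) / len(pairs)  # observed difference
    if p >= 19 / 20:
        return math.inf  # correction is undefined at saturation
    return -(19 / 20) * math.log(1 - (20 / 19) * p)


def upgma_groups(names, dist, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(names))]

    def avg(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(names[i] for i in c) for c in clusters]


# Toy aligned peptides (hypothetical, not HA sequences from the study).
toy = {
    "s1": "MKTIIALSYI",
    "s2": "MKTIIALSYV",
    "s3": "MEAIVGLCYI",
    "s4": "MEAIVGLCYV",
}
names = list(toy)
dist = [[jc_protein_distance(toy[m], toy[n]) for n in names] for m in names]
groups = upgma_groups(names, dist, k=2)
print(groups)
```

For the 16 strains described above, the same call with `k=4` would return the four groups. Each group label could then be attached to the outbreak coordinates before loading them into a mapping layer.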
