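Mining and characterization of preterm birth-related genes
Xuanshi Liu, Wei Li
National Center for Children's Health; Genetics and Birth Defects Control Center, Beijing Children's Hospital, Capital Medical University; Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children, Beijing 100045, China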
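Preterm birth (PTB), defined as birth before 37 completed weeks of gestation, is the leading cause of neonatal death and is associated with a range of neonatal diseases and adult-onset chronic diseases. Twin and family studies estimate that genetic factors account for about 15%–35% of PTB risk, yet the molecular epidemiology of PTB remains unclear. In this study, we mined PTB-related publications from a literature database and disease databases and applied a two-layer filter, which yielded 355 PTB-related genes. Enrichment analysis showed that the main molecular functions of these genes include receptor ligand activity, cytokine receptor binding, cytokine activity, and growth factor activity. The main enriched KEGG pathways include the AGE-RAGE signaling pathway in diabetic complications, Chagas disease, the IL-17 signaling pathway, and the TNF signaling pathway; several immune-related pathways were enriched in Reactome. Compared with other genes in the genome, PTB-related genes differed in the number of transcripts (α = 0.1, P = 0.06) but showed no significant difference in GC content or gene length. These results suggest that PTB genes are concentrated in immune-related pathways and have molecular functions closely tied to immune processes, and they provide an important resource for studying the genetic mechanisms of PTB.
preterm birth; data mining; enrichment analysis; gene features; transcript number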
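Preterm birth is birth before 37 completed weeks of gestation. In 2010, a survey of 184 countries by the World Health Organization and other international organizations found that preterm birth rates among newborns ranged from about 5% to 18% [1]. China's preterm birth rate is about 7%, corresponding to roughly 1.2 million preterm infants each year, the second-highest number in the world after India [2]. Beyond the risk of death, preterm birth may be accompanied by cerebral palsy, lung disease, and hearing and visual impairment [1,2], and some studies have linked it to chronic diseases in adulthood, such as cardiovascular disease and diabetes [3]. The mechanism underlying preterm birth remains unclear. Twin and family studies estimate that genetic factors account for about 15%–35% of preterm birth risk [4–6]. Early studies of the genetic mechanism typically selected candidate genes on the basis of the pathology of preterm birth, for example genes related to neonatal birth weight and the menstrual period [7], genes involved in the inflammatory response [8,9], and genes related to angiogenesis [10,11]. More recently, high-throughput sequencing studies of genetic factors in preterm birth have identified many associated loci and genes. These include three loci associated with spontaneous preterm birth found by genome-wide association analysis (rs17053026, rs17527054, and rs3777722) [12], as well as preterm birth-associated loci located in three genes [13]. Whole-exome sequencing found that the variants most significantly associated with preterm birth fall within the exons of a single gene [14], and combined whole-genome, transcriptome, and methylation data implicated two genes in preterm birth [15]. Although research on genetic factors in preterm birth has accumulated a large amount of data, the genetic mechanism is complex and existing findings have not been well summarized or integrated. For example, the Database for Preterm Birth (dbPTB) was last updated in 2014. This makes it difficult to mine genetic information on preterm birth with bioinformatic methods or to build genetic models of preterm birth [16]. In this study, we therefore used bioinformatic methods to mine preterm birth-related genes reported in a literature database and in disease gene databases, and we integrated and analyzed the features of these genes to provide an important resource for genetic research on preterm birth.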
(1) Literature database: PubMed, U.S. National Library of Medicine (https://www.ncbi.nlm.nih.gov/pubmed/). (2) Disease databases: Online Mendelian Inheritance in Man (OMIM, https://www.omim.org/, downloaded 18 January 2019), the ClinVar database of human genomic variation (ClinVar, https://www.ncbi.nlm.nih.gov/clinvar/, downloaded 11 February 2019), and the Comparative Toxicogenomics Database (CTD, http://ctdbase.org/, downloaded 6 February 2019). (3) Gene feature data were collected from the Ensembl database (http://grch37.ensembl.org/biomart/martview/b3df3ce0609b9d96d3347ff1d09e4348, downloaded 10 March 2019); all gene data use the human reference genome GRCh37/hg19. (4) Statistical software: R version 3.5.1; the R package ClusterProfiler (version 3.10.1) was used for enrichment analysis [17]. (5) The web-based text-mining tool SciMiner (http://hurlab.med.und.edu/SciMiner/, accessed 10 March 2019) [18].
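On 8 March 2019, we searched PubMed with the query "preterm birth" AND "gene", covering records from database inception to March 2019. The PMIDs of all retrieved articles were compiled and submitted to the text-mining tool SciMiner. Using the keyword "preterm birth" together with its built-in regular-expression rules and gene dictionaries, SciMiner extracted genes associated with preterm birth from the literature. To limit over-matching, we applied two layers of filtering to the SciMiner output: a threshold and manual review. First, genes that appeared in two or fewer articles were removed. Second, abstracts were checked manually, and genes whose abstracts did not directly mention preterm birth were removed. The remaining genes formed the list used in subsequent analyses.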
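The threshold layer of the filter can be sketched as follows. This is a minimal illustration, assuming the text-mining output has been flattened into (PMID, gene symbol) pairs; that input layout, the function name `filter_genes_by_paper_count`, and the toy PMIDs are hypothetical and are not SciMiner's actual output format.

```python
from collections import defaultdict

def filter_genes_by_paper_count(hits, min_papers=3):
    """Keep genes reported in at least `min_papers` distinct papers.

    `hits` is an iterable of (pmid, gene_symbol) pairs, a hypothetical
    flattened form of the text-mining output.
    """
    papers_per_gene = defaultdict(set)
    for pmid, gene in hits:
        # A set ignores repeated mentions of a gene within one paper.
        papers_per_gene[gene].add(pmid)
    return {
        gene: len(pmids)
        for gene, pmids in papers_per_gene.items()
        if len(pmids) >= min_papers
    }

# Toy input: the PMIDs and counts are made up for illustration only.
hits = [
    ("1001", "TNF"), ("1002", "TNF"), ("1003", "TNF"), ("1003", "TNF"),
    ("1001", "IL6"), ("1004", "IL6"),
    ("1005", "TLR4"), ("1006", "TLR4"), ("1007", "TLR4"), ("1008", "TLR4"),
]
kept = filter_genes_by_paper_count(hits)
print(kept)
```

With `min_papers=3`, a gene found in two or fewer articles is dropped, which matches the threshold described above. The manual review of abstracts is a separate step that this sketch does not attempt to automate.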
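Using shell scripts, we searched the disease databases OMIM, ClinVar, and CTD for records matching "preterm birth" or its synonyms, extracted the gene information from those records, and merged it into the gene list obtained from the literature database.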
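The database search and merge can be illustrated with a short sketch. The study used shell scripts; Python is used here purely for illustration. The synonym list, the (gene, disease name) record layout, and the placeholder gene symbols are assumptions, since the source does not list the synonyms or the export formats it used.

```python
import re

# Hypothetical synonym list; the source does not enumerate the synonyms used.
SYNONYMS = ["preterm birth", "premature birth", "preterm delivery", "preterm labor"]
PATTERN = re.compile("|".join(re.escape(s) for s in SYNONYMS), re.IGNORECASE)

def genes_for_phenotype(records):
    """Return sorted gene symbols whose disease field matches a synonym.

    `records` is an iterable of (gene_symbol, disease_name) pairs, a
    simplified stand-in for rows exported from OMIM, ClinVar, or CTD.
    """
    return sorted({gene for gene, disease in records if PATTERN.search(disease)})

# Toy records with placeholder gene symbols.
records = [
    ("GENE_A", "Preterm birth, susceptibility to"),
    ("GENE_B", "Cystic fibrosis"),
    ("GENE_A", "Premature birth"),
    ("GENE_C", "Term delivery"),
]
matched = genes_for_phenotype(records)

# Merge with a (toy) literature-derived list; a set union removes duplicates.
literature_genes = ["GENE_A", "GENE_D"]
merged = sorted(set(literature_genes) | set(matched))
print(matched, merged)
```

Because the merge is a set union, a database gene that is already in the literature-derived list leaves the total unchanged.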
Using the R package ClusterProfiler, we performed enrichment analysis of the selected genes for Gene Ontology (GO) functions, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, and Reactome pathways [19]. After correction for multiple testing, functions and pathways with a false discovery rate (FDR) < 0.05 were considered significant.
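To make the statistics behind this kind of over-representation analysis concrete, here is a minimal sketch. It assumes a one-sided hypergeometric test followed by Benjamini-Hochberg correction; the source states only that multiple-testing correction was applied with an FDR cutoff of 0.05. The sketch is not the ClusterProfiler implementation, and the term names and counts are toy values.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy numbers, not taken from the study: each term maps to
# (genes from the list annotated to the term, genes in the background annotated).
N_BACKGROUND = 20000
N_LIST = 355
terms = {"term_A": (30, 400), "term_B": (8, 400), "term_C": (2, 50)}

names = list(terms)
pvals = [hypergeom_pvalue(k, N_LIST, K, N_BACKGROUND) for k, K in terms.values()]
fdr = benjamini_hochberg(pvals)
significant = [name for name, q in zip(names, fdr) if q < 0.05]
print(significant)
```

In this toy input only `term_A` has far more annotated genes than its expected count of about 7, so it is the only term that passes the FDR < 0.05 cutoff.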
Using Ensembl BioMart, we collected the length, number of transcripts, and GC content of 20,320 genes (human genome build GRCh37.p13/hg19). Shell scripts were then used to extract the features of the genes in the selected list from the BioMart data.
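The extraction step can be sketched as below. The study used shell scripts; Python is used here only for illustration. The sketch assumes a tab-separated export, and the column names (`gene_name`, `gene_start`, `gene_end`, `transcript_count`, `gc_percent`) are hypothetical and would need to be adapted to the actual BioMart export. Computing gene length as end minus start plus one is also an assumption.

```python
import csv
import io

def extract_features(biomart_tsv, wanted_genes):
    """Return {gene: feature dict} for the genes listed in `wanted_genes`.

    `biomart_tsv` is a file-like object holding a tab-separated export.
    The column names used here are hypothetical.
    """
    wanted = set(wanted_genes)
    features = {}
    for row in csv.DictReader(biomart_tsv, delimiter="\t"):
        if row["gene_name"] in wanted:
            features[row["gene_name"]] = {
                # Assumed definition: inclusive genomic span of the gene.
                "length_bp": int(row["gene_end"]) - int(row["gene_start"]) + 1,
                "transcript_count": int(row["transcript_count"]),
                "gc_percent": float(row["gc_percent"]),
            }
    return features

# Toy export with placeholder gene symbols and made-up values.
toy_export = (
    "gene_name\tgene_start\tgene_end\ttranscript_count\tgc_percent\n"
    "GENE_A\t1000\t5999\t8\t41.5\n"
    "GENE_B\t200\t1199\t3\t55.0\n"
)
features = extract_features(io.StringIO(toy_export), ["GENE_A"])
print(features)
```

The same function works on a real file by passing an open file handle instead of the `io.StringIO` wrapper.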
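The PubMed search returned abstracts of 2,264 relevant articles from 800 journals, and SciMiner, given their PMIDs, extracted 2,149 genes potentially associated with preterm birth. Most of the journals in the top 5% by number of articles were clinical specialty journals (Supplementary Table 1). After the two-layer filter of thresholding and manual review, 355 genes appearing in 1,274 articles were retained (Supplementary Table 2). Table 1 lists the genes ranked in the top 5% by number of articles.
Mining the disease databases OMIM, ClinVar, and CTD identified one preterm birth-related gene. Because this gene was already among the 355 genes above, the final number of genes used for analysis was unchanged.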
GO enrichment analysis identified 174 significant functions (FDR < 0.05). Ranked from most to least significant, the top 10 were receptor ligand activity, cytokine receptor binding, cytokine activity, growth factor activity, growth factor binding, protease binding, heme binding, growth factor receptor binding, tetrapyrrole binding, and lipopolysaccharide binding (Figure 1, Supplementary Table 3). Receptor ligand activity included the largest number of genes, 61 in total.
KEGG enrichment analysis identified 158 significant signaling pathways (FDR < 0.05). Ranked from most to least significant, the top 10 were the AGE-RAGE signaling pathway in diabetic complications, Chagas disease (American trypanosomiasis), the IL-17 signaling pathway, the TNF signaling pathway, the PI3K-Akt signaling pathway, the Toll-like receptor signaling pathway, tuberculosis, inflammatory bowel disease (IBD), hepatitis B, and fluid shear stress and atherosclerosis (Figure 2, Supplementary Table 4).
In the Reactome pathway enrichment analysis, the top 10 significant pathways were Signaling by Interleukins, Interleukin-4 and Interleukin-13 signaling, Interleukin-10 signaling, Toll-like Receptor Cascades, Toll Like Receptor 4 (TLR4) Cascade, Toll Like Receptor TLR1:TLR2 Cascade, Toll Like Receptor 2 (TLR2) Cascade, Diseases of Immune System, Diseases associated with the TLR signaling cascade, and MyD88:MAL (TIRAP) cascade initiated on plasma membrane (Figure 3, Supplementary Table 5).
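Table 1 Top 5% of preterm birth-related genes in the filtered gene list
Figure 1 GO enrichment of gene molecular functions
Color indicates the FDR value, which decreases from blue to red; dot area indicates the number of genes.
Figure 2 KEGG pathway enrichment results
Color indicates the FDR value, which decreases from blue to red; dot area indicates the number of genes.
Figure 3 Reactome pathway enrichment
Color indicates the FDR value, which decreases from blue to red; dot area indicates the number of genes.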
Comparing the number of transcripts per gene, preterm birth genes had a higher mean (8.2) than genes across the whole genome (7.5) (Figure 4A). At a significance level of α = 0.1, the difference was significant (P = 0.06). GC content did not differ appreciably between preterm birth genes and whole-genome genes (P = 0.70, α = 0.1) (Figure 4B).
Comparing gene length between preterm birth genes and protein-coding genes across the genome, the mean length was 63,100 bp for preterm birth genes and 61,191 bp for whole-genome genes (Figure 5). At α = 0.1, the difference was not significant (P = 0.73).
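The name of the statistical test used for these comparisons is not recoverable from the text. As an illustration only, the sketch below compares a feature between two gene sets with a two-sided permutation test on the difference in means. This is a substitute technique chosen for clarity, not the study's method, and the transcript counts are toy values.

```python
import random
from statistics import mean

def permutation_test(sample_a, sample_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Toy transcript counts per gene, made up for illustration.
ptb_genes = [9, 12, 7, 10, 8, 11]
genome_genes = [7, 5, 8, 6, 9, 7, 6, 8]

ALPHA = 0.1  # significance level used in the comparisons above
p_value = permutation_test(ptb_genes, genome_genes)
print(p_value, p_value < ALPHA)
```

A permutation test makes no distributional assumption, which is convenient for skewed features such as gene length, but a rank-based or parametric test could equally have been used.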
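Preterm birth is a critically important topic in neonatal health research. Although the molecular mechanisms underlying its onset and progression remain unclear, many studies have shown that preterm birth has a genetic component, and a large amount of data has been generated. In this study, we used a text-mining tool to extract genes from 2,264 preterm birth-related articles retrieved from PubMed and, combining a two-layer filter of thresholding and manual review with disease database records, arrived at 355 preterm birth-related genes. This is, to date, the most recent dataset of preterm birth-related genes mined from the literature. Enrichment analysis showed that these genes are concentrated in immune-related pathways, and gene feature analysis found that, compared with whole-genome genes, they do not differ in GC content or gene length but do differ in the number of transcripts.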
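Previous studies have found that immune and inflammatory responses play important roles in maintaining pregnancy and determining the timing of parturition [8,20,21]. Because paternal and maternal antigens coexist, maintaining maternal-fetal immune tolerance is important during pregnancy, and disruption of this homeostasis may lead to preterm birth [20]. Innate immune cells influence the course of pregnancy and the timing of parturition by releasing inflammatory factors; for example, inflammatory factors released by macrophages may promote oxytocin production, causing uterine contractions in preparation for delivery [22]. An imbalance between innate and adaptive immunity may also lead to preterm birth [23]. In this study, KEGG and Reactome enrichment analyses of the mined genes showed that preterm birth genes are concentrated in pathways related to immune and inflammatory responses, consistent with previous findings. The innate immune system mediates the response to infection and includes, among other components, macrophages, Toll-like receptors, neutrophils, and cytokines; the adaptive immune system consists mainly of T and B lymphocytes [24]. The GO enrichment results likewise show that preterm birth-related genes have molecular functions closely tied to immune processes, including receptor ligand activity and cytokine receptor binding. Most of the top 20 preterm birth-related genes identified here are directly or indirectly related to immunity. The gene with the largest number of publications has been studied in contexts including fetal intestinal development and inflammation associated with preterm birth [25] and environmental endocrine disruptors and inflammatory biomarkers during pregnancy [26].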
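Figure 4 Comparison of transcript number and GC content between preterm birth genes and whole-genome genes
A: distribution of transcript number (count); B: distribution of GC content (%). The red curve represents the whole genome and the black curve represents preterm birth genes.
Figure 5 Comparison of gene length between preterm birth genes and whole-genome protein-coding genes
The red curve represents the whole genome and the black curve represents preterm birth genes.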
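According to the literature, disease genes in the human genome may share characteristic features [27,28]. For example, genes associated with chronic obstructive pulmonary disease differ significantly from controls in transcript complexity [29], genes encoding intrinsically disordered protein regions have high GC content [30], gene length plays an important role in neurodevelopmental and neurodegenerative diseases [31], and many candidate genes for autism are long genes [32]. To further explore the genomic features of preterm birth-related genes, we compared them with whole-genome genes in terms of transcript number, GC content, and gene length. Of these, transcript number differed. Genes with more transcripts have been reported to be mostly housekeeping or essential genes with important biological roles [33], but no study has yet examined preterm birth-related genes with high transcript numbers, and the role of these genes in preterm birth requires further investigation. GC content in this study is the proportion of guanine and cytosine in each gene. We found no significant difference in GC content between preterm birth-related genes and whole-genome genes. Likewise, preterm birth genes did not differ appreciably in length from all genes in the genome.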
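This study has some limitations. First, regarding database selection, Chinese-language databases such as CNKI could also be included when mining preterm birth-related genes from the literature, which would capture more studies and genes related to preterm birth in Chinese populations. Second, the gene feature analysis could incorporate more variables, such as ethnicity. Studies across ethnic groups might identify disease-related genetic backgrounds that are specific to particular populations [34].
In summary, by combining text mining with a two-layer filter and disease database records, this study arrived at 355 preterm birth-related genes, which, as of submission, constitute the most recent integrated record of preterm birth-related genes. Enrichment analysis showed that these genes are concentrated in immune-related signaling pathways, and gene feature analysis suggested that their transcript number differs to some extent from that of whole-genome genes. The mining and integration of preterm birth genes presented here can serve as an important resource for genetic research on preterm birth and point to directions for related studies.
Supplementary Tables 1–5 are available in the online version of the article at www.chinagene.cn.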
[1] Liu L, Oza S, Hogan D, Chu Y, Perin J, Zhu J, Lawn JE, Cousens S, Mathers C, Black RE. Global, regional, and national causes of under-5 mortality in 2000-15: an updated systematic analysis with implications for the sustainable development goals, 2016, 388(10063): 3027–3035.
[2] Blencowe H, Cousens S, Oestergaard MZ, Chou D, Moller AB, Narwal R, Adler A, Vera Garcia C, Rohde S, Say L, Lawn JE. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications, 2012, 379(9832): 2162–2172.
[3] Sipola-Leppänen M, Vääräsmäki M, Tikanmäki M, Matinolli HM, Miettola S, Hovi P, Wehkalampi K, Ruokonen A, Sundvall J, Pouta A, Eriksson JG, Järvelin MR, Kajantie E. Cardiometabolic risk factors in young adults who were born preterm, 2015, 181(11): 861–873.
[4] Wu W, Witherspoon DJ, Fraser A, Clark EA, Rogers A, Stoddard GJ, Manuck TA, Chen K, Esplin MS, Smith KR, Varner MW, Jorde LB. The heritability of gestational age in a two-million member cohort: implications for spontaneous preterm birth, 2015, 134(7): 803–808.
[5] Kistka ZA, DeFranco EA, Ligthart L, Willemsen G, Plunkett J, Muglia LJ, Boomsma DI. Heritability of parturition timing: an extended twin design analysis, 2008, 199(1): 43.e1–5.
[6] York TP, Eaves LJ, Lichtenstein P, Neale MC, Svensson A, Latendresse S, Långström N, Strauss JF 3rd. Fetal and maternal genes' influence on gestational age in a quantitative genetic analysis of 244,000 Swedish births, 2013, 178(4): 543–550.
[7] Liang HY, Wu BY, Chen DF, Yang F, Hu HY, Chen L, Xu XP. Association of PON2 gene polymorphisms in neonates with preterm birth. Yi Chuan (Hereditas), 2002, 24(5): 515–518.
[8] Annells MF, Hart PH, Mullighan CG, Heatley SL, Robinson JS, Bardy P, McDonald HM. Interleukins-1, -4, -6, -10, tumor necrosis factor, transforming growth factor-beta, FAS, and mannose-binding protein C gene polymorphisms in australian women: risk of preterm birth, 2004, 191(6): 2056–2067.
[9] Krediet TG, Wiertsema SP, Vossers MJ, Hoeks SB, Fleer A, Ruven HJ, Rijkers GT. Toll-like receptor 2 polymorphism is associated with preterm birth, 2007, 62(4): 474–476.
[10] Papazoglou D, Galazios G, Koukourakis MI, Kontomanolis EN, Maltezos E. Association of -634G/C and 936C/T polymorphisms of the vascular endothelial growth factor with spontaneous preterm delivery, 2004, 83(5): 461–465.
[11] Chen BH, Carmichael SL, Shaw GM, Iovannisci DM, Lammer EJ. Association between 49 infant gene polymorphisms and preterm delivery, 2007, 143A(17): 1990–1906.
[12] Zhang H, Baldwin DA, Bukowski RK, Parry S, Xu Y, Song C, Andrews WW, Saade GR, Esplin MS, Sadovsky Y, Reddy UM, Ilekis J, Varner M, Biggio JR Jr. A genome-wide association study of early spontaneous preterm delivery, 2015, 39(3): 217–226.
[13] Zhang GB, Feenstra B, Bacelis J, Liu X, Muglia LM, Juodakis J, Miller DE, Litterman N, Jiang PP, Russell L, Hinds DA, Hu Y, Weirauch MT, Chen X, Chavan AR, Wagner GP, Pavličev M, Nnamani MC, Maziarz J, Karjalainen MK, Rämet M, Sengpiel V, Geller F, Boyd HA, Palotie A, Momany A, Bedell B, Ryckman KK, Huusko JM, Forney CR, Kottyan LC, Hallman M, Teramo K, Nohr EA, Davey Smith G, Melbye M, Jacobsson B, Muglia LJ. Genetic associations with gestational duration and spontaneous preterm birth, 2017, 377(12): 1156–1167.
[14] McElroy JJ, Gutman CE, Shaffer CM, Busch TD, Puttonen H, Teramo K, Murray JC, Hallman M, Muglia LJ. Maternal coding variants in complement receptor 1 and spontaneous idiopathic preterm birth, 2013, 132(8): 935–942.
[15] Knijnenburg TA, Vockley JG, Chambwe N, Gibbs DL, Humphries C, Huddleston KC, Klein E, Kothiyal P, Tasseff R, Dhankani V, Bodian DL, Wong WSW, Glusman G, Mauldin DE, Miller M, Slagel J, Elasady S, Roach JC, Kramer R, Leinonen K, Linthorst J, Baveja R, Baker R, Solomon BD, Eley G, Iyer RK, Maxwell GL, Bernard B, Shmulevich I, Hood L, Niederhuber JE. Genomic and molecular characterization of preterm birth, 2019, 116(12): 5819–5827.
[16] Uzun A, Laliberte A, Parker J, Andrew C, Winterrowd E, Sharma S, Istrail S, Padbury JF. DbPTB: a database for preterm birth, 2012, 2012: bar069.
[17] Yu G, Wang LG, Han Y, He QY. ClusterProfiler: an R package for comparing biological themes among gene clusters, 2012, 16(5): 284–287.
[18] Hur J, Schuyler AD, States DJ, Feldman EL. SciMiner: web-based literature mining tool for target identification and functional enrichment analysis, 2009, 25(6): 838–840.
[19] Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, Haw R, Jassal B, Korninger F, May B, Milacic M, Roca CD, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Viteri G, Weiser J, Wu G, Stein L, Hermjakob H, D'Eustachio P. The reactome pathway knowledgebase, 2018, 46(D1): D649–D655.
[20] Romero R, Dey SK, Fisher SJ. Preterm labor: one syndrome, many causes, 2014, 345(6198): 760–765.
[21] Macones GA, Parry S, Elkousy M, Clothier B, Ural SH, Strauss JF 3rd. A polymorphism in the promoter region of TNF and bacterial vaginosis: preliminary evidence of gene-environment interaction in the etiology of spontaneous preterm birth, 2004, 190(6): 1509–1519.
[22] Fang X, Wong S, Mitchell BF. Effects of LPS and IL-6 on oxytocin receptor in non-pregnant and pregnant rat uterus, 2000, 44(2): 65–72.
[23] Gomez-Lopez N, StLouis D, Lehr MA, Sanchez-Rodriguez EN, Arenas-Hernandez M. Immune cells in term and preterm labor, 2014, 11(6): 571–581.
[24] Melville JM, Moss TJ. The immune consequences of preterm birth, 2013, 7: 79.
[25] Schreurs R, Baumdick ME, Sagebiel AF, Kaufmann M, Mokry M, Klarenbeek PL, Schaltenberg N, Steinert FL, van Rijn JM, Drewniak A, The SML, Bakx R, Derikx JPM, de Vries N, Corpeleijn WE, Pals ST, Gagliani N, Friese MA, Middendorp S, Nieuwenhuis EES, Reinshagen K, Geijtenbeek TBH, van Goudoever JB, Bunders MJ. Human fetal TNF-α-Cytokine-Producing CD4+effector memory T cells promote intestinal development and mediate inflammation early in life, 2019, 50(2): 462–476.e8.
[26] Ferguson KK, Cantonwine DE, Rivera-González LO, Loch-Caruso R, Mukherjee B, Anzalota Del Toro LV, Jiménez-Vélez B, Calafat AM, Ye X, Alshawabkeh AN, Cordero JF, Meeker JD. Urinary phthalate metabolite associations with biomarkers of inflammation and oxidative stress across pregnancy in Puerto Rico, 2014, 48(12): 7018–7025.
[27] Collins A. The genomic and functional characteristics of disease genes, 2014, 16(1): 16–23.
[28] Pengelly RJ, Vergara-Lope A, Alyousfi D, Jabalameli MR, Collins A. Understanding the disease genome: gene essentiality and the interplay of selection, recombination and mutation, 2019, 20(1): 267–273.
[29] Lackey L, McArthur E, Laederach A. Increased transcript complexity in genes associated with chronic obstructive pulmonary disease, 2015, 10(10): e0140885.
[30] Peng Z, Uversky VN, Kurgan L. Genes encoding intrinsic disorder in Eukaryota have high GC content, 2016, 4(1): e1262225.
[31] Zylka MJ, Simon JM, Philpot BD. Gene length matters in neurons, 2015, 86(2): 353–355.
[32] King IF, Yandava CN, Mabb AM, Hsiao JS, Huang HS, Pearson BL, Calabrese JM, Starmer J, Parker JS, Magnuson T, Chamberlain SJ, Philpot BD, Zylka MJ. Topoisomerases facilitate transcription of long genes linked to autism, 2013, 501(7465): 58–62.
[33] Ryu JY, Kim HU, Lee SY. Human genes with a greater number of transcript variants tend to show biological features of housekeeping and essential genes, 2015, 11(10): 2798–2807.
[34] Rappoport N, Toung J, Hadley D, Wong RJ, Fujioka K, Reuter J, Abbott CW, Oh S, Hu D, Eng C, Huntsman S, Bodian DL, Niederhuber JE, Hong X, Zhang G, Sikora-Wohfeld W, Gignoux CR, Wang H, Oehlert J, Jelliffe-Pawlowski LL, Gould JB, Darmstadt GL, Wang X, Bustamante CD, Snyder MP, Ziv E, Patsopoulos NA, Muglia LJ, Burchard E, Shaw GM, O'Brodovich HM, Stevenson DK, Butte AJ, Sirota M. A genome-wide association study identifies only two ancestry specific variants associated with spontaneous preterm birth, 2018, 8(1): 226.
Mining and characterization of preterm birth related genes
Xuanshi Liu, Wei Li
Preterm birth (PTB) refers to birth before 37 completed gestational weeks. PTB is the leading cause of neonatal death and is associated with various neonatal complications and adult-onset chronic diseases. According to twin and family studies, genetic factors account for about 15% to 35% of the risk of PTB. However, the molecular epidemiology of PTB is still unclear. By mining PTB-related publications in a literature database and disease databases and applying a two-layer filter, we selected 355 PTB-related genes. Enrichment analysis of molecular function revealed that the main functions of PTB-related genes include receptor ligand activity, cytokine receptor binding, cytokine activity, and growth factor activity. The main pathways from KEGG enrichment were the AGE-RAGE signaling pathway in diabetic complications, Chagas disease, the IL-17 signaling pathway, and the TNF signaling pathway, and several immune-related pathways were enriched in Reactome. The number of transcripts differed between PTB-related genes and other genes in the genome (α = 0.1, P = 0.06), but there was no significant difference in GC content or gene length. The results suggest that PTB-related genes are mostly in immune-related pathways and have molecular functions closely related to immunity. Our work provides an important resource for the study of the genetic mechanisms of PTB.
preterm birth; data mining; enrichment analysis; gene features; transcript number
2019-03-21;
2019-05-08
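Xuanshi Liu, doctoral graduate, assistant researcher; field: bioinformatics. E-mail: liuxs2017bioinf@163.com
Wei Li, PhD, professor, doctoral supervisor; research interests: medical biochemistry, medical genetics, cell biology, prenatal diagnosis, and genetic counseling. E-mail: liwei@bch.com.cn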
10.16288/j.yczz.19-078
2019/5/10 15:23:07
URI: http://kns.cnki.net/kcms/detail/11.1913.R.20190510.1522.002.html
(Editor in charge: Xiangdong Fang)