魏大勇 崔藝馨 熊 清 湯青林 梅家琴 李加納 錢 偉,*
?
用全基因組關(guān)聯(lián)作圖和共表達(dá)網(wǎng)絡(luò)分析鑒定油菜種子硫苷含量的候選基因
魏大勇1,2,3崔藝馨3熊 清4湯青林1,2梅家琴3李加納3錢 偉3,*
1西南大學(xué)園藝園林學(xué)院, 重慶 400715;2南方山地園藝學(xué)教育部重點(diǎn)實(shí)驗(yàn)室, 重慶 400715;3西南大學(xué)農(nóng)學(xué)與生物科技學(xué)院, 重慶 400715;4西南大學(xué)計(jì)算機(jī)與信息科學(xué)學(xué)院, 重慶 400715
油菜籽餅粕是畜禽養(yǎng)殖中重要的蛋白原料, 但餅粕中的硫苷是一種抗?fàn)I養(yǎng)物質(zhì), 食用過(guò)多會(huì)對(duì)禽畜產(chǎn)生毒害, 因此挖掘油菜籽粒硫苷含量的候選基因?qū)τ筒朔N子低硫苷育種具有重要現(xiàn)實(shí)意義。本研究連續(xù)4年種植1個(gè)含157份材料的油菜自然群體, 結(jié)合重測(cè)序數(shù)據(jù)對(duì)種子硫苷含量進(jìn)行全基因組關(guān)聯(lián)分析(GWAS), 并對(duì)15份低硫苷和15份高硫苷材料進(jìn)行種子發(fā)育早期的轉(zhuǎn)錄組測(cè)序, 通過(guò)權(quán)重基因共表達(dá)網(wǎng)絡(luò)分析(WGCNA)鑒定種子硫苷含量的候選基因。用GWAS共檢測(cè)到45個(gè)與種子硫苷含量顯著相關(guān)的SNP, 單個(gè)位點(diǎn)解釋的表型變異為13.5%~23.3%, 主要分布在A09、C02和C09染色體的3個(gè)區(qū)間中, 覆蓋5個(gè)已知的硫苷代謝基因。用WGCNA分析發(fā)現(xiàn)高、低硫苷材料之間的2275個(gè)差異表達(dá)基因, 可分為12個(gè)基因模塊, 其中1個(gè)模塊的基因顯著富集在已知的硫苷生物合成途徑, 對(duì)該模塊內(nèi)163個(gè)基因的權(quán)重分析得到13個(gè)候選基因。經(jīng)檢測(cè), GWAS和WGCNA共得到的18個(gè)候選基因中, 有14個(gè)候選基因的表達(dá)量與種子硫苷含量顯著相關(guān)(= 0.376~0.638,<0.05)。用兩種方法鑒定到1個(gè)共同的候選基因(基因名), 與該基因連鎖的5個(gè)SNP構(gòu)成5種單體型, 等位基因效應(yīng)分析發(fā)現(xiàn), 自然群體中63%的材料(99/157)為Hap 5, 平均硫苷含量為50.79 μmol g–1, 與另外4種單體型(95.04~110.28 μmol g–1)存在極顯著差異(<0.01)。本研究結(jié)合GWAS和WGCNA兩種方法鑒定了油菜種子硫苷含量的候選基因, 可為復(fù)雜性狀候選基因的篩選提供參考。
甘藍(lán)型油菜; 全基因組關(guān)聯(lián)分析; 權(quán)重基因共表達(dá)網(wǎng)絡(luò)分析; 重測(cè)序; 轉(zhuǎn)錄組測(cè)序; 種子硫苷含量
油菜(L., 基因組為AACC)由白菜(L., 基因組為AA)和甘藍(lán)(L., 基因組為CC)種間自然雜交并染色體加倍形成[1-2], 是我國(guó)乃至世界主要的油料作物。油菜籽粒榨油后的餅粕含豐富的蛋白質(zhì), 是良好的天然動(dòng)物飼料[3]。然而硫代葡萄糖苷(glucosinolate, 簡(jiǎn)稱硫苷)及其降解產(chǎn)物是油菜餅粕中主要的抗?fàn)I養(yǎng)因子, 嚴(yán)重影響其作為動(dòng)物飼料的食用安全和適口性, 限制了菜籽餅粕蛋白資源的開(kāi)發(fā)利用[4], 因此降低油菜籽粒中硫苷的含量是油菜育種的重要目標(biāo)之一。
硫苷是一類含氮和硫的次生代謝產(chǎn)物, 廣泛存在于十字花科植物中[5]。硫苷的種類根據(jù)側(cè)鏈R基團(tuán)的來(lái)源不同, 分為脂肪族、芳香族和吲哚族三大類[6]。硫苷的生物合成途徑大致可以分為前體氨基酸側(cè)鏈延長(zhǎng)、核心結(jié)構(gòu)合成和次級(jí)修飾3個(gè)主要階段, 主要基因在模式植物擬南芥中均已被證實(shí)[6-10]。隨著油菜及其親本種(白菜和甘藍(lán))參考基因組的釋放, 大量的硫苷合成和降解基因已被鑒定, 其中在白菜[11]、甘藍(lán)[12]和油菜[1]中分別鑒定出123、127和181個(gè)硫苷代謝相關(guān)基因。目前對(duì)油菜種子硫苷含量的研究主要是通過(guò)QTL定位[13-15]、全基因組關(guān)聯(lián)分析(GWAS)[16-17]和關(guān)聯(lián)轉(zhuǎn)錄組分析(associative transcriptomic)[18-19]挖掘候選基因和功能位點(diǎn)。
隨著測(cè)序技術(shù)的不斷更新和測(cè)序費(fèi)用的不斷降低, 海量的測(cè)序數(shù)據(jù)需要分析, 由此一種新的方法應(yīng)運(yùn)而生,即權(quán)重基因共表達(dá)網(wǎng)絡(luò)分析(WGCNA)[20]。其首先假定基因網(wǎng)絡(luò)服從無(wú)尺度分布, 根據(jù)功能相似的基因往往具有類似的表達(dá)變化, 構(gòu)建基因模塊。然后對(duì)模塊進(jìn)行深入挖掘, 比如篩選模塊的樞紐基因、關(guān)聯(lián)模塊與性狀、模塊富集分析和建立基因表達(dá)網(wǎng)絡(luò)等[20-21]。該技術(shù)已被廣泛應(yīng)用于人類的疾病研究, 比如Farber[22]結(jié)合GWAS和WGCNA分析策略, 找到調(diào)控人類骨密度的關(guān)鍵樞紐基因。
本研究將結(jié)合GWAS和WGCNA研究方法, 從全基因組和轉(zhuǎn)錄組水平挖掘油菜種子硫苷總量候選基因, 為油菜種子低硫苷分子育種和農(nóng)作物復(fù)雜性狀候選基因鑒定提供參考。
157份具有廣泛變異的甘藍(lán)型油菜(L.)自交種由重慶市油菜工程技術(shù)研究中心提供, 于2013?2016連續(xù)4年種植于西南大學(xué)北碚實(shí)驗(yàn)基地, 3行區(qū)播種, 每行10株, 每年2次田間重復(fù)。采用當(dāng)?shù)爻R?guī)田間管理方式。用福斯(FOSS)近紅外光譜分析儀(NIRSystem, TR-3750)測(cè)定157份油菜自然群體的自交種籽粒硫苷總量, 取每份材料3株自交種, 每袋樣品重復(fù)測(cè)定2次。以SAS軟件(版本9.13)[23]對(duì)表型數(shù)據(jù)進(jìn)行方差分析和相關(guān)性分析。根據(jù)4年數(shù)據(jù)從上述自然群體中選擇穩(wěn)定遺傳的15份極端低硫苷和15份極端高硫苷材料繼續(xù)種植, 于次年始花期統(tǒng)一剝蕾自交, 并于自交后第15天取角果提取RNA。
于八葉期選擇鮮嫩葉片迅速冷凍在液氮中, 由北京百邁客生物科技公司完成總DNA的提取及全基因組重測(cè)序。質(zhì)檢合格的文庫(kù)用Illumina HiSeq 4000平臺(tái)的雙末端150 bp (paired-end, PE150)模式測(cè)序, 共得到34.5億對(duì)reads, 平均每個(gè)材料的測(cè)序深度大約為5倍, 每個(gè)材料的覆蓋率達(dá)到85%。用BWA (版本0.6.1-r104)軟件[24]將過(guò)濾后得到的干凈有效的reads (clean reads)與油菜參考基因組v4.1 (http://www.genoscope.cns.fr/brassicanapus/data/)進(jìn)行比對(duì); 用GATK (版本2.4-7-g5e89f01)軟件[25]檢測(cè)單核苷酸多態(tài)性(single nucleotide polymorphisms, SNP)和插入缺失(insertions and deletions, Indel)。
采用R軟件的GenABEL包[26]進(jìn)行GWAS分析。通過(guò)Merk等[27]方法對(duì)4年油菜種子的硫苷含量數(shù)據(jù)進(jìn)行最佳線性無(wú)偏預(yù)測(cè)(best linear unbiased prediction, BLUP), 估計(jì)種子硫苷含量的BLUP值作為表型數(shù)據(jù)。采用PCA+K (控制主成分和親緣關(guān)系)的混合線性模型對(duì)BLUP和SNP標(biāo)記進(jìn)行關(guān)聯(lián)位點(diǎn)的檢測(cè), 設(shè)定閾值為<8.56×10–8(0.05/所使用的標(biāo)記, –lg () = 7)。將與顯著SNP處于同一單體型塊(2>0.5)的區(qū)間, 定義為候選關(guān)聯(lián)區(qū)間, 在此區(qū)間按以下標(biāo)準(zhǔn)預(yù)測(cè)候選基因: (1)在甘藍(lán)型油菜或擬南芥參考基因組上與性狀相關(guān)的已知功能的基因; (2)SNP直接落在基因內(nèi)部; (3)參考已報(bào)道QTL定位的結(jié)果。
采用植物總RNA提取試劑盒DP432 (天根生化科技有限公司, 北京)提取30份極端材料發(fā)育15 d角果的總RNA, 然后將樣品送北京百邁客生物科技有限公司構(gòu)建文庫(kù), 并以HiSeq 2000基因分析系統(tǒng)(Illumina公司, USA)進(jìn)行RNA-Seq PE100測(cè)序分析。
采用Tophat-Cufflinks-Cuffmerge-Cuffdiff分析流程鑒定DEGs, 利用TopHat 2.0.0調(diào)用bowtie 2將reads比對(duì)到甘藍(lán)型油菜參考基因組, 以Cufflinks 2.0.0對(duì)測(cè)序reads進(jìn)行組裝[28], 分析30個(gè)樣品中所有轉(zhuǎn)錄本的表達(dá)量, 以每1百萬(wàn)個(gè)map上的reads中map到外顯子的每千個(gè)堿基上的reads數(shù)(reads per kilobase of exon per million mapped reads, RPKM)表示。采用Cuffdiff進(jìn)行DEGs的篩選鑒定和統(tǒng)計(jì)檢驗(yàn), 篩選標(biāo)準(zhǔn)如下: (1)過(guò)濾掉在30份材料中基因表達(dá)量最大值小于1的低豐度基因; (2)差異表達(dá)倍數(shù)大于2, 即log2絕對(duì)值大于1; (3)錯(cuò)誤發(fā)現(xiàn)率(false discovery rate,) q-value小于0.05。
通過(guò)基于R語(yǔ)言的WGCNA包[20]對(duì)篩選后得到的2275個(gè)DEGs構(gòu)建基因模塊, 經(jīng)過(guò)計(jì)算設(shè)加權(quán)參數(shù)軟閾值為9, 按照基因間兩兩不相似性聚類得到基因的系統(tǒng)聚類樹(shù)(附圖1-a), 然后根據(jù)動(dòng)態(tài)混合剪切法對(duì)聚類樹(shù)切割修剪, 得到17個(gè)分支模塊(附圖1-b), 計(jì)算每個(gè)模塊的模塊特征值(module eigengene, ME), 并對(duì)ME聚類, 通過(guò)相似度高的ME合并分支模塊, 最終得到12個(gè)整合的基因模塊(附圖1-c)。每個(gè)模塊含有50個(gè)到534個(gè)基因不等, 有200個(gè)基因未能分配到任何1個(gè)基因模塊中。隨后計(jì)算ME與樣本種子硫苷含量的相關(guān)性和顯著性, 確定模塊在樣本中的表達(dá)情況, 正(負(fù))相關(guān)越高(低), 說(shuō)明在這個(gè)模塊內(nèi)的基因表達(dá)量越高(低)。通過(guò)基因的連通性確定每個(gè)基因的權(quán)重值, 連通性高的基因在該模塊中可能起樞紐作用。因此, 我們將顯著模塊內(nèi)單個(gè)基因的平均連通性排名前10%的基因作為樞紐基因(hub gene)。
以BLASTP將每個(gè)模塊的油菜蛋白序列與擬南芥蛋白參考序列(TAIR10)比對(duì), 閾值為1×10–10。根據(jù)比對(duì)結(jié)果, 分別采用在線軟件agriGO (http://bioinfo. cau.edu.cn/agriGO/)和在線工具DAVID (https://david. ncifcrf.gov/), 以及共同采用Fisher精確檢測(cè)(<0.05)和Benjamini Hochberg多重比對(duì)檢驗(yàn)(FDR<0.05), 以同源性最高的擬南芥蛋白為油菜相對(duì)應(yīng)的蛋白進(jìn)行GO富集和KEGG代謝途徑分析。
157份油菜自然群體的種子硫苷含量存在廣泛的遺傳變異, 介于27.38~169.20, 平均為(70.13± 36.50) μmol g–1(表1和附表1)。該性狀4年均表現(xiàn)頻率為連續(xù)近正態(tài)分布, 表明油菜種子硫苷含量是1個(gè)數(shù)量性狀, 受多基因控制。
表1 關(guān)聯(lián)分析群體中油菜種子硫苷含量的表型變異
方差分析結(jié)果表明, 基因型、環(huán)境以及基因型與環(huán)境互作間都存在顯著差異(<0.01), 但年度內(nèi)的重復(fù)間不存在顯著差異(附表2)。4年種子硫苷含量相關(guān)性在0.92~0.96之間。
以全基因組重測(cè)序共檢測(cè)到5 294 158個(gè)SNP和1 307 151個(gè)InDel。過(guò)濾最小等位基因頻率小于5%和雜合率大于25%的位點(diǎn), 剩下690 953個(gè)特異的SNP位點(diǎn)用于GWAS分析, GWAS共檢測(cè)到45個(gè)與油菜種子硫苷含量顯著相關(guān)的SNP, 主要分布在A09 (10個(gè), 物理位置位于2 372 597~3 118 196 bp)、C02 (5個(gè), 物理位置位于44 926 609~44 991 771 bp)和C09 (29個(gè), 物理位置位于2 375 598~3 198 893 bp)染色體的3個(gè)區(qū)間(圖1), 單個(gè)位點(diǎn)解釋的表型變異介于13.45%~23.34%。這個(gè)結(jié)果與以前報(bào)道的QTL結(jié)果一致[14,16,18-19], 表明上述3個(gè)區(qū)間存在控制油菜種子硫苷含量的主效位點(diǎn)。
圖1 全基因組關(guān)聯(lián)分析曼哈頓圖
虛線代表矯正后的閾值–lg () = 7。
The dashed lines represent the bonferroni-adjusted significance threshold –lg () = 7.
與油菜參考基因組的基因注釋比對(duì), 在上述3個(gè)區(qū)間共發(fā)現(xiàn)285個(gè)基因, 其中5個(gè)為已知的硫苷代謝基因。候選基因位于A09染色體2.699 Mb處(顯著SNP區(qū)間內(nèi)); 候選基因和分別位于C09染色體2.927 Mb和3.100 Mb處(顯著SNP區(qū)間內(nèi)); 候選基因和分別位于C02染色體44.600 Mb和44.703 Mb處(顯著SNP上游大約0.2~0.3 Mb處)。與模式植物擬南芥的參考基因組比對(duì)后發(fā)現(xiàn), 候選基因與擬南芥基因0 ()同源, 候選基因與()同源, 其他3個(gè)候選基因均與()同源, 反映出油菜在進(jìn)化過(guò)程中的基因組復(fù)制事件。
WGCNA分析發(fā)現(xiàn)ME與種子硫苷含量的相關(guān)性在–0.76~0.76之間, 其中顯著正和負(fù)相關(guān)的模塊分別有4個(gè)(Brown、Blue、Purple和Yellow)和2個(gè)(Greenyellow和Magenta)(<0.001)。進(jìn)一步對(duì)顯著相關(guān)模塊內(nèi)的基因進(jìn)行GO富集和KEGG分析, 只有1個(gè)模塊(Brown)內(nèi)的基因顯著富集在硫苷生物合成進(jìn)程(GO: 0019761)和硫苷生物合成代謝通路(KEGG: ko00966)(表2), 說(shuō)明該模塊內(nèi)存在控制油菜種子硫苷含量的關(guān)鍵基因。
進(jìn)一步對(duì)Brown模塊內(nèi)163個(gè)基因進(jìn)行權(quán)重分析, 確定樞紐基因。一共檢測(cè)到6100條節(jié)點(diǎn)線, 每條線的節(jié)點(diǎn)連接著1個(gè)基因。單個(gè)基因的權(quán)重值在0.02~0.28之間, 平均值為0.10。刪除位于隨機(jī)染色體的基因, 將剩下的126個(gè)基因蛋白序列與擬南芥蛋白參考序列進(jìn)行同源比對(duì)和功能注釋分析, 共檢測(cè)到33個(gè)(占26.2%)已知的硫苷代謝基因, 包括13對(duì)同源基因, 每對(duì)同源基因分別來(lái)自于油菜的A和C亞基因組, 說(shuō)明這些同源基因共表達(dá)可能參與種子硫苷的代謝。權(quán)重分析中, 連通性排名前10% (權(quán)重值>0.20)的基因有13個(gè), 功能注釋顯示, 其中9個(gè)為已知的硫苷代謝基因, 且3對(duì)為同源基因, 這些基因在油菜種子硫苷代謝中可能起樞紐作用(表3)。
表2 Brown模塊GO富集和KEGG通路分析
表3 Brown模塊樞紐基因分析
對(duì)GWAS鑒定到的5個(gè)已知硫苷代謝基因和利用WGCNA發(fā)現(xiàn)的13個(gè)樞紐基因進(jìn)行表達(dá)分析發(fā)現(xiàn), 其中14個(gè)基因的表達(dá)量與種子硫苷含量的積累顯著正相關(guān)(= 0.376~0.638,<0.05), 1個(gè)基因顯著負(fù)相關(guān)(= –0.489,<0.01)(附圖2)。此外, 兩種方法鑒定到1個(gè)共同的基因(), 該基因與C02染色體上通過(guò)GWAS得到的5個(gè)顯著SNP構(gòu)成1個(gè)單體型塊(2>0.5)(圖2-a), 等位基因效應(yīng)分析發(fā)現(xiàn), 刪除自然群體中11份單體型數(shù)目小于3的材料, 剩下146份材料被分為五種單體型, 其中99份材料(占群體總數(shù)的63%)均為Hap 5, 其平均硫苷含量為50.79 μmol g–1, 與另外四種單體型材料的種子硫苷含量(95.04~110.28 μmol g–1)存在極顯著差異(<0.01)(圖2-b)。上述結(jié)果表明, 兩種方法鑒定到的候選基因與表型性狀具有較高的相關(guān)性, 并且結(jié)合兩種方法可能鑒定到比較重要的關(guān)鍵基因。
基于全基因組重測(cè)序相對(duì)高的圖譜密度, 本研究通過(guò)GWAS共檢測(cè)到5個(gè)已知的硫苷代謝基因。其中, 調(diào)控吲哚族硫苷代謝的候選基因的3個(gè)同源拷貝分別分布在染色體A09 ()、C02 ()和C09 ()。另外, 控制脂肪族硫苷生物合成的基因僅僅1個(gè)拷貝在染色體C09 ()被檢測(cè)到, 該候選基因的低表達(dá)與種子低硫苷顯著相關(guān), 這個(gè)結(jié)果與以前報(bào)道的結(jié)果一致。研究發(fā)現(xiàn)在A09和C02染色體上的2個(gè)拷貝在低硫苷油菜中已經(jīng)丟失[18], 而本研究重測(cè)序數(shù)據(jù)比對(duì)的甘藍(lán)型油菜參考基因組是1個(gè)法國(guó)雙低冬性油菜品種“Darmor-”, 所以通過(guò)GWAS在A09和C02染色體上未檢測(cè)到的SNP變異。事實(shí)上, 基因組的插入和缺失在作物上是很普遍的, 并且是性狀變異的重要機(jī)制。在玉米上, 基因啟動(dòng)子上117 bp的插入和內(nèi)含子35 bp的缺失, 導(dǎo)致了生育酚含量顯著的提升[29]。
圖2 連鎖不平衡和單體型效應(yīng)分析
a: 候選基因與C02染色體5個(gè)顯著SNP構(gòu)成的單體型連鎖不平衡分析, 箭頭指示核心基因和顯著SNP的物理位置。b: C02染色體5個(gè)顯著SNP構(gòu)成的單體型效應(yīng)分析; **表示在0.01水平顯著。
a: the LD analysis between the candidate geneand five significant SNPs from GWAS, the arrows shows physical location of the candidate gene and five significant SNPs on chromosome C02. b: the haplotype effect analysis for five significant SNPs in chromosome C02; ** indicates significant difference at<0.01.
WGCNA對(duì)差異表達(dá)基因進(jìn)行模塊化分類, 可以有效分離出真實(shí)與性狀相關(guān)的基因, 提高功能富集的檢測(cè)功效, 對(duì)復(fù)雜性狀的候選基因挖掘提供新的途徑[22]。本研究通過(guò)WGCNA共檢測(cè)到13個(gè)樞紐基因, 其中9個(gè)已被證實(shí)參與硫苷的代謝途徑[1], 但只有1個(gè)基因()在GWAS中被重復(fù)檢測(cè)到, 該基因在擬南芥中已被證實(shí)編碼甲硫烷基化蘋果酸合酶(methylthioalkylmalate synthase), 是控制甲硫氨酸起源的脂肪族硫苷側(cè)鏈延長(zhǎng)的1個(gè)主要基因[30]。轉(zhuǎn)錄組表達(dá)分析發(fā)現(xiàn), 該基因的表達(dá)與種子硫苷含量的積累顯著正相關(guān)(=0.376,<0.05)。比較五種單體型發(fā)現(xiàn), Hap 5與另外4種單體型的差別主要是C02染色體上SNP位點(diǎn)44 943 847等位基因由C變?yōu)锳, 使得種子硫苷含量降低52%。根據(jù)該位點(diǎn)開(kāi)發(fā)基于PCR的功能標(biāo)記, 將使油菜進(jìn)一步降低種子硫苷含量成為可能。
兩種方法只檢測(cè)到1個(gè)重復(fù)的基因, 原因可能是: (1) WGCNA中的其他12個(gè)樞紐基因在本研究油菜自然群體的DNA水平上沒(méi)有差異, 導(dǎo)致通過(guò)GWAS檢測(cè)不到顯著的位點(diǎn); (2) WGCNA中單一樞紐基因?qū)τ筒朔N子硫苷的貢獻(xiàn)度太小, 它們的顯著性淹沒(méi)在背景噪音中, 通過(guò)GWAS難以準(zhǔn)確挖掘出來(lái), 對(duì)微效多基因控制的位點(diǎn)檢測(cè)能力不足, 正是GWAS面臨的主要問(wèn)題。因此, 在植物復(fù)雜農(nóng)藝性狀候選基因挖掘中, 結(jié)合GWAS和WGCNA分析是一種新的思路, 可以提高對(duì)微效位點(diǎn)的檢測(cè)能力和構(gòu)建關(guān)鍵基因的調(diào)控網(wǎng)絡(luò)。另外, 硫苷總量是由各分量構(gòu)成, 控制硫苷各組分的相關(guān)基因的變異(包括轉(zhuǎn)錄水平的差異)均能影響硫苷總量的表現(xiàn)型。本研究通過(guò)GWAS和WGCNA方法試圖挖掘影響硫苷總量的基因, 但這些基因可能參與硫苷分量的代謝, 需進(jìn)一步研究確定。
另外, 用WGCNA可以深入挖掘顯著模塊內(nèi)樞紐基因, 并對(duì)功能未知的基因進(jìn)行功能注釋預(yù)測(cè)。因?yàn)? 在權(quán)重網(wǎng)絡(luò)中, 節(jié)點(diǎn)線連接的2個(gè)基因表達(dá)模式是相似的, 且有潛在的相似功能, 若節(jié)點(diǎn)線一端基因功能已知的話, 就可以預(yù)測(cè)另一端基因的功能, 為以后驗(yàn)證功能未知基因提供參考。
通過(guò)GWAS和WGCNA共鑒定到18個(gè)控制油菜種子硫苷含量的候選基因, 其中多數(shù)為已知的硫苷基因, 且其表達(dá)量與種子硫苷含量顯著相關(guān)。1個(gè)控制脂肪族硫苷側(cè)鏈延長(zhǎng)的關(guān)鍵基因()被兩種方法共同檢測(cè)出, 與該基因連鎖的5個(gè)SNP構(gòu)成5種單體型, 其中Hap 5覆蓋了63%的材料, 其種子硫苷含量顯著低于含其他4種單體型的材料。因此, 結(jié)合GWAS和WGCNA是一種鑒定復(fù)雜性狀候選基因的新策略, 可兼顧微效位點(diǎn)的檢出率以及對(duì)關(guān)鍵基因的確定。
[1] Chalhoub B, Denoeud F, Liu S, Parkin I A, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B, Correa M, Da Silva C, Just J, Falentin C, Koh C S, Le Clainche I, Bernard M, Bento P, Noel B, Labadie K, Alberti A, Charles M, Arnaud D, Guo H, Daviaud C, Alamery S, Jabbari K, Zhao M, Edger P P, Chelaifa H, Tack D, Lassalle G, Mestiri I, Schnel N, Le Paslier M C, Fan G, Renault V, Bayer P E, Golicz A A, Manoli S, Lee T H, Thi V H, Chalabi S, Hu Q, Fan C, Tollenaere R, Lu Y, Battail C, Shen J, Sidebottom C H, Wang X, Canaguier A, Chauveau A, Berard A, Deniot G, Guan M, Liu Z, Sun F, Lim Y P, Lyons E, Town C D, Bancroft I, Wang X, Meng J, Ma J, Pires J C, King G J, Brunel D, Delourme R, Renard M, Aury J M, Adams K L, Batley J, Snowdon R J, Tost J, Edwards D, Zhou Y, Hua W, Sharpe A G, Paterson A H, Guan C, Wincker P. Early allopolyploid evolution in the post-Neolithicoilseed genome., 2014, 345: 950–953
[2] Nagaharu U. Genomic analysis inwith special reference to the experimental formation ofand peculiar bode of fertilization., 1935, 7: 389–452
[3] 劉后利. 油菜遺傳育種學(xué). 北京: 中國(guó)農(nóng)業(yè)大學(xué)出版社, 2000. pp 146–154 Liu H L. Genetics and Breeding in Rrapeseed. Beijing: Chinese Agricultural Universitatis Press, 2000. pp 146–154 (in Chinese)
[4] Mithen R. Glucosinolates-biochemistry, genetics and biological activity., 2001, 34: 91–103
[5] Fahey J W, Zalcmann A T, Talalay P. The chemical diversity and distribution of glucosinolates and isothiocyanates among plants., 2001, 56: 5–51
[6] Halkier B A, Gershenzon J. Biology and biochemistry of glucosinolates., 2006, 57: 303–333
[7] Bak S, Feyereisen R. The involvement of two p450 enzymes, CYP83B1 and CYP83A1, in auxin homeostasis and glucosinolate biosynthesis., 2001, 127: 108–118
[8] Grubb C D, Abel S. Glucosinolate metabolism and its control., 2006, 11: 89–100
[9] Mikkelsen M D, Naur P, Halkier B A.mutants in the C-S lyase of glucosinolate biosynthesis establish a critical role for indole-3-acetaldoxime in auxin homeostasis., 2004, 37: 770–777
[10] Wittstock U, Halkier B A. Cytochrome P450 CYP79A2 fromL. Catalyzes the conversion of L-phenylalanine to phenylacetaldoxime in the biosynthesis of benzylglucosinolate., 2000, 275: 14659–14666
[11] Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun J H, Bancroft I, Cheng F, Huang S, Li X, Hua W, Wang J, Wang X, Freeling M, Pires J C, Paterson A H, Chalhoub B, Wang B, Hayward A, Sharpe A G, Park B S, Weisshaar B, Liu B, Li B, Liu B, Tong C, Song C, Duran C, Peng C, Geng C, Koh C, Lin C, Edwards D, Mu D, Shen D, Soumpourou E, Li F, Fraser F, Conant G, Lassalle G, King G J, Bonnema G, Tang H, Wang H, Belcram H, Zhou H, Hirakawa H, Abe H, Guo H, Wang H, Jin H, Parkin I A, Batley J, Kim J S, Just J, Li J, Xu J, Deng J, Kim J A, Li J, Yu J, Meng J, Wang J, Min J, Poulain J, Wang J, Hatakeyama K, Wu K, Wang L, Fang L, Trick M, Links M G, Zhao M, Jin M, Ramchiary N, Drou N, Berkman P J, Cai Q, Huang Q, Li R, Tabata S, Cheng S, Zhang S, Zhang S, Huang S, Sato S, Sun S, Kwon S J, Choi S R, Lee T H, Fan W, Zhao X, Tan X, Xu X, Wang Y, Qiu Y, Yin Y, Li Y, Du Y, Liao Y, Lim Y, Narusaka Y, Wang Y, Wang Z, Li Z, Wang Z, Xiong Z, Zhang Z. The genome of the mesopolyploid crop species., 2011, 43: 1035–1039
[12] Liu S Y, Liu Y M, Yang X H, Tong C B, Edwards D, Parkin I A P, Zhao M X, Ma J X, Yu J Y, Huang S M, Wang X Y, Wang J Y, Lu K, Fang Z Y, Bancroft I, Yang T J, Hu Q, Wang X F, Yue Z, Li H J, Yang L F, Wu J, Zhou Q, Wang W X, King G J, Pires J C, Lu C X, Wu Z Y, Sampath P, Wang Z, Guo H, Pan S K, Yang L M, Min J M, Zhang D, Jin D C, Li W S, Belcram H, Tu J X, Guan M, Qi C K, Du D Z, Li J N, Jiang L C, Batley J, Sharpe A G, Park B S, Ruperao P, Cheng F, Waminal N E, Huang Y, Dong C H, Wang L, Li J P, Hu Z Y, Zhuang M, Huang Y, Huang J Y, Shi J Q, Mei D S, Liu J, Lee T H, Wang J P, Jin H Z, Li Z Y, Li X, Zhang J F, Xiao L, Zhou Y M, Liu Z S, Liu X Q, Qin R, Tang X, Liu W B, Wang Y P, Zhang Y Y, Lee J, Kim H H, Denoeud F, Xu X, Liang X M, Hua W, Wang X W, Wang J, Chalhoub B, Paterson A H. Thegenome reveals the asymmetrical evolution of polyploid genomes., 2014, 5: 3930
[13] Fu Y, Lu K, Qian L W, Mei J Q, Wei D Y, Peng X H, Xu X F, Li J N, Frauen M, Dreyer F, Snowdon R J, Qian W. Development of genic cleavage markers in association with seed glucosinolate content in canola., 2015, 128: 1029–1037
[14] Howell P M, Sharpe A G, Lydiate D J. Homoeologous loci control the accumulation of seed glucosinolates in oilseed rape ()., 2003, 46: 454–460
[15] Zhao J, Meng J. Detection of loci controlling seed glucosinolate content and their association with Sclerotinia resistance in., 2003, 122: 19–23
[16] Li F, Chen B Y, Xu K, Wu J F, Song W L, Bancroft I, Harper A L, Trick M, Liu S Y, Gao G Z, Wang N, Yan G X, Qiao J W, Li J, Li H, Xiao X, Zhang T Y, Wu X M. Genome-wide association sudy dissects the genetic architecture of seed weight and seed quality in rapeseed (L.)., 2014, 21: 355–367
[17] Qu C M, Li S M, Duan X J, Fan J H, Jia L D, Zhao H Y, Lu K, Li J N, Xu X F, Wang R. Identification of candidate genes for seed glucosinolate content using association mapping inL., 2015, 6: 1215–1229
[18] Harper A L, Trick M, Higgins J, Fraser F, Clissold L, Wells R, Hattori C, Werner P, Bancroft I. Associative transcriptomics of traits in the polyploid crop species., 2012, 30: 798–802
[19] Lu G, Harper A L, Trick M, Morgan C, Fraser F, O'Neill C, Bancroft I. Associative transcriptomics study dissects the genetic architecture of seed glucosinolate content in., 2014, 21: 613–625
[20] Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis., 2008, 9: 559
[21] 宋長(zhǎng)新, 雷萍, 王婷. 基于WGCNA算法的基因共表達(dá)網(wǎng)絡(luò)構(gòu)建理論及其R軟件實(shí)現(xiàn). 基因組學(xué)與應(yīng)用生物學(xué), 2013, 32: 135–141 Song C X, Lei P, Wang T. Gene co-expression network analysis ased onWGCNA algorithm-theory and implementation in R Software., 2013, 32: 135–141 (in Chinese with English abstract)
[22] Farber C R. Systems-level analysis of genome-wide association data., 2013, 3: 119–129
[23] SAS V9.13 software. SAS Institute, Cary, NC, USA, 2005
[24] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform., 2009, 25: 1754–1760
[25] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo M A. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data., 2010, 20: 1297–1303
[26] Aulchenko Y S, Ripke S, Isaacs A, Van Duijn C M. GenABEL: an R library for genome-wide association analysis., 2007, 23: 1294–1296
[27] Merk H L, Yarnes S C, Van Deynze A. Trait diversity and potential for selection indices based on variation among regionally adapted processing tomato germplasm., 2012, 137: 427–437
[28] Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley D R, Pimentel H, Salzberg S L, Rinn J L, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks., 2012, 7: 562–578
[29] Li Q, Yang X H, Xu S T, Cai Y, Zhang D L, Han Y J, Li L, Zhang Z X, Gao S B, Li J S, Yan J B. Genome-wide association studies identified three independent polymorphisms associated with alpha-tocopherol content in maize kernels., 2012, 7: e36807
[30] Kroymann J, Textor S, Tokuhisa J G, Falk K L, Bartram S, Gershenzon J, Mitchell-Olds T. A gene controlling variation in Arabidopsis glucosinolate composition is part of the methionine chain elongation pathway., 2001, 127: 1077–1088
附表1 材料來(lái)源
Supplementary table 1 Source of the accessions in this study
編號(hào)Code名稱Name種子硫苷含量 Seed GSL content (μmol g–1) 2016201520142013 1自交種 Inbreed line28.0632.2525.2528.06 2自交種 Inbreed line24.5727.1128.8828.06 3自交種 Inbreed line28.4129.2126.7930.40 4自交種 Inbreed line30.4527.6127.2030.33 5蘇油3號(hào) Suyou 328.1833.8329.8230.46 6Rivette30.0027.0532.5029.12 7自交種 Inbreed line38.1430.5033.0127.38 8自交種 Inbreed line34.6333.4130.8531.50 9自交種 Inbreed line29.2828.7228.7933.88 10華雙5號(hào) Huashuang 530.9531.0733.4729.52 11自交種 Inbreed line35.3337.4130.1233.00 12自交種 Inbreed line28.7425.6228.6734.74 13自交種 Inbreed line29.8431.2031.0632.71 14自交種 Inbreed line32.1530.7132.5032.08 15自交種 Inbreed line30.2230.7831.6233.59 16Larissa35.1225.3834.8230.92 17自交種Inbreed line33.4038.2328.4137.88 18自交種 Inbreed line32.2428.4336.9529.40 19Altex33.0428.3432.6635.42 20自交種Inbreed line36.7534.0136.2332.31 21Bronowski32.0233.4932.1136.81 22西南大學(xué)7號(hào) SWU 734.1036.7736.1732.75 23自交種 Inbreed line42.7538.8837.3031.90 24自交種 Inbreed line33.7133.6135.5833.78 25Korall37.6630.1938.6931.61 26自交種 Inbreed line37.0832.6835.0536.10 27兩優(yōu)586 (F2)-6-3 Liangyou 586 (F2)-6-328.3933.3429.6641.63 28自交種 Inbreed line35.9033.5737.5234.14 29Pauline34.9625.2233.8938.58 30ACS N4531.4824.1735.2937.59 31Ability38.1930.5035.6937.34 32自交種 Inbreed line34.6028.8339.0134.57 33Campino31.9732.1938.1535.81 34自交種 Inbreed line34.6439.2938.9035.67 35自交種 Inbreed line51.3831.8538.7337.54 36自交種 Inbreed line35.5334.2937.1339.33 37自交種 Inbreed line37.1539.5633.9342.90
(續(xù)附表1)
編號(hào)Code名稱Name種子硫苷含量 Seed GSL content (μmol g–1)2016201520142013 38自交種 Inbreed line41.5338.4435.1341.86 39自交種 Inbreed line42.8773.0037.9439.10 40Omega104.2391.6138.3838.71 41自交種 Inbreed line37.4738.8242.3635.69 42Q290.6456.7336.8244.85 43自交種 Inbreed line31.3130.9344.4834.73 44自交種 Inbreed line31.2933.7751.0329.67 45SW Sinatra39.6126.6730.6351.22 46Andor36.1725.7840.1542.52 47Tenor37.5938.3345.8937.66 48Aurora36.0434.3447.4836.77 49Clipper39.7845.4140.6443.75 50自交種 Inbreed line41.8045.2844.5240.35 51自交種 Inbreed line41.5936.3245.4839.77 52自交種 Inbreed line41.3041.8444.0942.22 53Granit38.6634.0639.9244.87 54自交種 Inbreed line33.5934.0442.2344.32 55自交種 Inbreed line38.7348.8843.7643.72 56Montego42.9334.9444.6743.51 57Allure49.9141.9946.3942.28 58自交種 Inbreed line40.0841.4146.0442.97 59自交種 Inbreed line38.5445.5044.6244.85 60NK Passion48.4946.2046.1343.47 61Campari46.7352.8548.1242.50 62Lord61.0981.4745.9545.95 63自交種 Inbreed line43.3641.1553.4139.47 64DRAKKAR33.9730.3048.1445.05 65中雙5號(hào) Zhongshuang 553.7147.4548.1445.61 66Lilian48.6642.9450.8743.28 67Rapid50.2749.0049.9244.69 68Aragon54.0443.7955.1539.59 69Lisandra48.3731.8554.8740.51 70821選×品93-498 F8 821 Xuan × Pin 93-498 F857.5246.8243.1753.81 71油研2號(hào)× 84-24016 Youyan 2 × 84-2401676.7079.3041.6756.24 72Pivot31.0531.3558.4241.61 73SLM 051256.1730.5853.7248.36 74Musette61.0559.7353.4149.47 75Lion92.53103.6555.5848.03 76Baros52.4338.7359.5445.92 77Korinth58.7750.0047.2057.10 78Odin68.9760.0060.2050.25 79Beluga59.4252.5954.8058.19 80Express 61783.7846.9760.9052.10 81SWGospel65.6452.6662.8652.86 82Fortis77.7245.9765.6653.94 83Remy68.0353.2965.3654.30 84BRISTOL76.1964.9471.3649.38 85Jantar69.9147.8161.9860.50 86Binera86.6881.6062.2460.40 87Boston74.5552.6064.1359.12 88Licapo60.2344.5466.0857.42 89Alesi73.3250.7566.7859.90 90Nugget73.9852.7668.2758.44 91Jessica71.9273.4769.8057.61 92HANNA76.3576.0262.7867.88 93Duplo74.1876.9374.0557.65 94Wesroona71.1979.2867.3764.48 95Laser76.8158.8775.3357.31 96Recital70.9452.2572.3662.24 97Escort82.0465.8075.9565.02 98Darmor96.64141.0078.3863.00 99Ww 128676.3997.9773.0870.24 100(D57×Oro)×油研2號(hào)-F6 (D57×Oro)×Youyan 2-F672.9256.8167.0776.86 101Falcon72.2173.1176.7571.03 102華油6號(hào) Huayou 684.0491.2873.5078.47 103貴農(nóng)78-6-112 Guinong 78-6-11278.6389.8671.6185.40 104Chuosenshu86.26108.8488.6488.43 105Lirabon109.95104.4998.2886.98 106Pera91.5298.4776.04111.30 107Manitoba87.8994.70114.9973.71 108Nugget105.0194.40104.7187.41 109Regina II89.1895.8393.3399.70 110Oro74.4088.1585.84108.59 111Galant94.2273.9287.67106.82 112V8108.7089.26109.7785.73 113Nakate Chousen126.30124.3691.44104.60 114CRESOR92.3684.7389.55106.61 115Hankkija’s Lauri99.37128.2896.18100.79 116TANTAL103.22115.3495.86104.73 117云油14 Youyou 1496.84102.85106.2196.13 118K26-96114.75102.00103.92100.34 119Spaeths Zollerngold104.30106.79100.98108.09 120Baltia112.0193.61106.74102.92 121Mali101.14105.85102.20114.77 122RESYN-H048112.25106.00104.38113.97 123Daichousen (fuku)114.21106.34114.41106.62 124Orpal95.2496.20110.46111.72 125Orriba110.66112.91112.52111.31 126EMERALD110.66112.91109.30114.72 127湘油16 Xiangyou 16100.0594.64110.22116.66 128Daichousen (mizuyasu)102.05100.97113.01114.26 129Zephyr114.07105.58117.34112.79 130Wesreo114.03101.87118.95113.23 131Miyauchi Na106.39138.65119.01110.30 132Olivia115.30115.82119.21113.47 133Taisetsu107.13116.00111.10122.19 134JANETZKIS133.00129.88115.00119.73 135JetNeuf112.14132.64120.52114.68 136Gulliver110.21102.68113.37119.94 137Leonessa123.22112.63114.65122.08 138Ceska Krajova107.40105.93123.48115.89 139Skrzeszowicki143.05131.53119.89119.54 140CANARD74.88122.32114.91125.84 141Mansholt102.00107.84122.01121.82 142Palu140.92128.05126.79118.74 143Parapluie131.00142.78124.71122.51 144Kruglik119.41114.38118.23129.02 145Czyzowska108.55122.55116.10131.66 146Edita147.21115.40131.10119.89 147Sonnengold128.77144.22133.00125.70 148Conny139.65122.66134.30128.54 149湘農(nóng)油-1 Xiangnongou-1120.54101.40135.57123.31 150西農(nóng)長(zhǎng)角×((D57×Oro)×85-64)F7Xinongchangjiao×((D57×Oro)×85-64)F7106.86128.13122.10141.81 151MOANA139.74145.62138.69125.34 152Mlochowski118.99120.69133.42131.32 153Samo169.19152.80137.73132.82 154Suigenshu124.93144.33147.49126.56 155Nunsdale134.00151.62144.68131.49 156Gisora150.75148.00131.50159.43 157Dippes149.04153.73149.56143.58
附表2 關(guān)聯(lián)分析種子硫苷含量表型方差分析
Supplementary table 2 ANOVA of phenotype for seed GSL in association population
變異來(lái)源Source自由度df均方MS概率值P-value 基因型 Genotype1569238.54<0.001 環(huán)境 Environment3406.700.0002 基因型×環(huán)境 Genotype×Environment459165.83<0.001 重復(fù) Repeat10.180.9561
附圖1 基因聚類樹(shù)及模塊切割
Supplementary fig. 1 Gene cluster dendrogram and module splitting
a: 基因聚類樹(shù), 縱坐標(biāo)表示各基因間的聚類距離; b: 動(dòng)態(tài)混合切割得到的分支模塊; c: 合并相似度高的模塊。
a: gene cluster dendrogram tree,-coordinate indicates the cluster distance between genes; b: modules assignment cut using dynamic tree; c: merging of modules whose expression profiles are very similar.
附圖2 核心基因在30份極端材料中表達(dá)水平與種子硫苷含量的相關(guān)性分析
Supplementary fig. 2 Correlation analysis between core gene expression levels and seed GSL contents in 30 extreme accessions
*和**分別表示在0.05和0.01水平上相關(guān)顯著。
*and**indicate significant correlation at<0.05 and 0.01, respectively.
Identification of Candidate Genes for Seed Glucosinolate Content of Rapeseed by Using Genome-wide Association Mapping and Co-expression Networks Analysis
WEI Da-Yong1,2,3, CUI Yi-Xin3, XIONG Qing4, TANG Qing-Lin1,2, MEI Jia-Qin3, LI Jia-Na3, and QIAN Wei3,*
1College of Horticulture and Landscape Architecture, Southwest University, Chongqing 400715, China;2Key Laboratory of Horticulture Science for Southern Mountainous Regions, Ministry of Education, Chongqing 400715, China;3College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China;4School of Computer and Information Science, Chongqing 400715, China
Seed meal of rapeseed (L.) is a valuable protein source for livestock raising. However, high seed glucosinolates (GSL) content is harmful and toxic to livestock. Therefore, identifying candidate genes of seed GSL content is important in rapeseed breeding for low seed GSL. In this study, a genome-wide association study (GWAS) for seed GSL content was conducted using 157 rapeseed lines grown in four consecutive years. Meanwhile, a weighted gene co-expression network analysis (WGCNA) was carried out in early seed development stage of 15 low and 15 high seed GSL content lines for detecting candidate genes. In total, 45 SNPs found by GWAS significantly associated with seed GSL contents, explaining 13.5%–23.3% of the phenotypic variance per SNP. These SNPs were mainly detected from three intervals on chromosomes A09, C02, and C09, covering five known GSL metabolism genes. A total of 2275 differentially expressed genes (DEGs) were identified by RNA-Seq between rapeseed lines with low and high seed GSL contents. These DEGs were clustered into 12 modules by WGCNA, of which one module (contains 163 DEGs) was mainly enriched in the GSL biosynthetic process. By using a weighted analysis for this module, 13 hub-genes were detected including nine known GSL metabolic genes. Among the 18 candidate genes identified by GWAS and WGCNA, 14 genes showed significant correlation between their expressions and the seed GSL contents (= 0.376–0.638,< 0.05). Furthermore, one gene,(), was detected by both GWAS and WGCNA. Five haplotypes were formed by five SNPs that significantly linked with, and 63% of the rapeseed population (99/157) were found to carry Hap 5 with significant lower seed GSL contents (an average of 50.79 μmol g–1) compared with those carrying the other four haplotypes (95.04–110.28 μmol g–1). By GWAS and WGCNA, our study not only identified the candidate genes for seed GSL content of rapeseed, but also provided a guidance for digging candidate genes for other complex traits.
;GWAS; WGCNA; Re-sequencing; RNA-seq; Seed GSL content
2017-12-10;
2018-03-15;
2018-03-16.
10.3724/SP.J.1006.2018.00629
本研究由國(guó)家自然科學(xué)基金項(xiàng)目(31601333), 國(guó)家重點(diǎn)基礎(chǔ)研究發(fā)展計(jì)劃(973計(jì)劃)項(xiàng)目(2015CB150201)和中央高校基本科研業(yè)務(wù)專項(xiàng)(XDJK2017B036)資助。
This study was supported by National Natural Science Foundation of China (31601333), the National Basic Research Program of China (973 Program) (2015CB150201), and the Fundamental Research Funds for the Central Universities (XDJK2017B036).
錢偉, E-mail: qianwei666@hotmail.com, Tel: 023-68250701
E-mail: swuwdy@swu.edu.cn
http://kns.cnki.net/kcms/detail/11.1809.S.20180316.0853.002.html