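The Effect of Intelligent Tutoring Systems on Academic Achievement: A Quantitative Meta-Analysis Perspective

2019-12-12 10:04 Wang Weifu, Mao Meijuan, Yan Hanbing

Distance Education in China, 2019, Issue 10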


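[Abstract] Intelligent tutoring systems use artificial intelligence to simulate the tutoring behavior of human teachers and provide adaptive, interactive feedback during learning; they are a landmark product of information technology in support of personalized learning. Whether they actually improve academic achievement, however, remains contested. This study therefore applies quantitative meta-analysis to 58 empirical studies, published internationally since 1990, on the effect of intelligent tutoring systems on academic achievement. The findings are as follows. The mean effect size of intelligent tutoring systems on academic achievement is 0.492, a moderate positive effect that would raise a student at the 50th percentile to about the 68th percentile. A positive effect appears across student characteristics, publication characteristics, and research design characteristics. Test type, duration, and sample size significantly moderate the mean effect size: locally developed tests yield larger mean effect sizes than standardized tests, longer treatments yield larger mean effect sizes, and the mean effect size drops significantly once the sample exceeds 200. We recommend that the educational technology field in China scale up empirical research and establish rigorous intervention standards and practice guides to promote the healthy development of the discipline.

[Keywords] intelligent tutoring system; meta-analysis; academic achievement; personalized learning; computer-aided instruction; effect size; artificial intelligence; cognitive tutor

[CLC number] G434 [Document code] A [Article ID] 1009-458x(2019)10-0040-12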

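I. Introduction

From Confucius's one-to-one heuristic teaching in private schools to the one-to-many instruction of the modern classroom, and from the one-way transmission of knowledge by mass media to human-computer dialogue powered by artificial intelligence, tutoring has a long history, and machine-based tutoring is one of its most recent forms. Attempts to use machines to assist or replace human teachers predate the digital electronic computer and artificial intelligence (Zhang Zhizhen, et al., 2019), but it was the electronic computer that made large-scale exploration and application of machine tutoring possible. The teaching machines and programmed instruction systems that emerged in the 1950s were grounded in behaviorist learning theory: they presented fixed, pre-prepared content in chunks arranged in linear or branching sequences, gave a moderate amount of interactive feedback, and then led on to the next piece of content. Such systems are collectively called computer-aided instruction (CAI) and are mechanical, programmed, preset, and closed. In the 1970s a new kind of machine tutoring system appeared. Rooted in artificial intelligence and cognitive theory, it generates hints and feedback automatically from an expert knowledge base and provides fine-grained adaptive scaffolding according to each student's knowledge level and style. These systems are collectively called intelligent tutoring systems (ITSs). ITSs differ clearly from pure CAI: they make the learning environment more adaptive and have some capacity for context awareness, natural language understanding, reasoning, intelligent adaptation, and self-learning.

Intelligent tutoring systems are designed to emulate human tutoring (Woolf, 2009) and to help students with different foundations, styles, and backgrounds learn in a personalized way. As artificial intelligence advances, ITSs can track students' knowledge level, learning strategies, affective states, and learning styles and respond with adaptive tutoring at a level of granularity that even human tutors may be unable to match, although there are still limits to how far AI can automate teaching (Zhang Zhizhen, et al., 2019). At this stage, can intelligent tutoring systems improve learning outcomes? Are they more effective than conventional classroom teaching? Which factors affect how much they raise academic achievement? These are important questions for the field of educational technology.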

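II. Review of Related Research

VanLehn (2011) summarized the effects of different forms of tutoring on academic achievement, concluding that computer-aided instruction improves on conventional teaching by about 0.3 standard deviations, that intelligent tutoring systems improve on it by about 1 standard deviation, and that human tutoring, regarded as the most effective, improves on it by 2 standard deviations, which would lift a student from the 50th to the 98th percentile. This was VanLehn's inference from several decades of earlier research. Does it hold up?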
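The conversion from an effect size in standard deviations to a percentile can be sketched as follows. This is a minimal illustration that assumes test scores are normally distributed; the function name `percentile_after_shift` is made up for this example and does not come from the article.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size_sd: float) -> float:
    """Percentile reached by a 50th-percentile student who gains
    `effect_size_sd` standard deviations.

    Assumption of this sketch: scores follow a normal distribution,
    so the new percentile is the standard normal CDF at the shift.
    """
    return 100.0 * NormalDist().cdf(effect_size_sd)


# Effect sizes quoted in the paragraph above (in standard deviations).
for label, d in [("CAI", 0.3), ("ITS", 1.0), ("human tutoring", 2.0)]:
    print(f"{label}: d = {d:.2f} -> percentile {percentile_after_shift(d):.1f}")
```

Under the same normality assumption, the pooled effect size of 0.492 reported in the abstract maps to just under the 69th percentile, in line with the "about the 68th percentile" quoted there.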

The view on the effectiveness of computer-aided instruction is fairly credible. From the overall perspective of meta-analysis, effect sizes for early CAI fell between 0.37 and 0.42 (Burns, 1981; Hartley, 1978). Kulik and colleagues (Kulik, et al., 1991; Kulik, 1994, 2002) carried out a series of follow-up analyses and found that CAI yields gains of 0.30 to 0.38 standard deviations. Bayraktar (2001) likewise found that CAI raised science achievement among secondary and college students by 0.273 standard deviations. More recently, Sosa et al. (2011) found that CAI raised college students' performance by 0.33 standard deviations. Tamim et al. (2011) conducted a second-order review of 25 meta-analyses on the effectiveness of instructional technologies and systems and found a mean effect size of 0.26 standard deviations across the 14 that addressed CAI. The effect of CAI does vary considerably by domain: it is low in beginning reading (0.19; Blok, et al., 2002) and high in special education (0.75; Kroesbergen, et al., 2003). Overall, though, a gain of about one third of a standard deviation is a fairly consistent conclusion about CAI.

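Consistent direct evidence for the effectiveness of human tutoring is hard to find. The claim that human tutoring produces a gain of 2 standard deviations comes from an influential article by Bloom (1984), who posed the famous "two-sigma" problem and argued that any student could gain 2 standard deviations through one-to-one personalized tutoring. In his experiments, without tutoring, the mastery learning group outperformed the conventional instruction group by 1.2 standard deviations, and the mastery learning group with teacher tutoring gained a further 0.8 standard deviations, for a total effect of 2 standard deviations. This improvement results from combining human tutoring with a mastery learning strategy; the direct effect of human tutoring was not assessed separately. VanLehn (2011) found an effect size of 0.79 standard deviations for human tutoring, well below Bloom's figure. In addition, Cohen et al. (1982) found that peer tutoring in primary and secondary schools raised achievement by only 0.4 standard deviations, adult tutoring raised primary school students' achievement by 0.3 standard deviations (Ritter, 2009), Liu Shanshan and Yang Xiangdong (2015) found an effect size of only 0.27 for after-school tutoring, and Li Jiali (2017) even found that one-to-one private tutoring had a significant negative effect on primary school students' achievement. These results differ greatly from Bloom's 2 standard deviations and call for a more rigorous synthesis.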

From their inception, intelligent tutoring systems have carried the expectation of delivering personalized tutoring, yet their effect on academic achievement remains highly contested. Cognitive Tutor (CT) is a mathematics ITS widely used in K-12 education in the United States. Anderson et al. (1995) found that early versions of CT produced a gain of 1 standard deviation, whereas Ritter et al. (2007) found that a newer version produced only 0.38 standard deviations. By the standards of the What Works Clearinghouse (WWC, 2017) of the U.S. Department of Education, a gain of 0.25 standard deviations counts as substantively important, so CT was judged to have a substantively important effect. Among recent meta-analyses, VanLehn (2011) found that ITSs raise achievement by 0.76 standard deviations on average. Kulik and Fletcher (2016) analyzed 50 independent studies of ITSs applied to academic improvement and found a gain of 0.66 standard deviations. Ma et al. (2014) meta-analyzed 73 independent empirical studies and found a mean effect size of 0.43 standard deviations, with significant moderation by product type, type of comparison group, test type, and other factors. Steenbergen-Hu and Cooper (2014) found that ITSs raised college students' academic performance by 0.37 standard deviations. However, another comprehensive evaluation report from the What Works Clearinghouse (WWC, 2016) found a mean effect size for CT close to zero, and another meta-analysis by Steenbergen-Hu and Cooper (2013) found that ITSs raised K-12 mathematics achievement by only 0.05 standard deviations, which is negligible.

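ITSs evolved from CAI and offer more intelligent, more personalized tutoring. Given the basic consensus that CAI yields a gain of one third of a standard deviation, ITSs should add to that effect when compared with conventional teaching. Yet evaluations of ITS effectiveness show a surprising lack of consensus. Artificial intelligence is now moving from 1.0 to 2.0 (Pan Yunhe, 2018), its applications in education keep advancing, and ITSs, as a typical educational AI product, are attracting growing public attention. If no basic consensus can be reached on the key question of whether ITSs are effective, large-scale application of AI in education will be hindered. This study therefore uses quantitative meta-analysis to address this question.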

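III. Research Process

Quantitative meta-analysis (hereafter "meta-analysis") is a method of research synthesis widely used internationally. It pools the results of multiple independent experimental studies on one topic, examines how far different experimental characteristics influence the overall result, and yields more generalizable conclusions. Following the work of Glass et al. (1981) on meta-analysis, this study proceeds in four stages: literature preparation, coding of characteristics, effect size calculation and analysis, and discussion of results.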

(1) Literature Preparation

1. Literature search

We searched broadly for experimental and quasi-experimental studies of intelligent tutoring systems. First, we searched major international databases (ERIC, ProQuest Dissertations and Theses, ScienceDirect, Wiley, SpringerLink) and Google Scholar for candidate literature, crossing technology keywords (such as intelligent tutoring system, artificial tutor, and pedagogical agents) with achievement keywords (such as learning gains, outcome, achievement, performance, improvement, and evaluate) and restricting the period to January 1, 1990 through November 30, 2018. We also searched CNKI with the Chinese terms for "intelligent tutoring system," "intelligent tutoring," and "intelligent teaching system." Second, we traced back through the reference lists of related reviews (VanLehn, 2011; Steenbergen-Hu & Cooper, 2013; Kulik & Fletcher, 2016) and ran targeted searches for ITS products widely used in schools (such as Cognitive Tutor, AutoTutor, and WWC). Finally, after preliminary screening that removed software development reports, satisfaction surveys, technical optimization studies, and other irrelevant items, 489 candidate documents remained for further analysis.

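2. Literature screening

After a rough review of a small share of the candidates, we drew up screening criteria. To be included, a study had to meet all of the following conditions. ① It used an experimental or quasi-experimental design with a control group and an experimental group and reported no significant pretest difference between the two; single-group studies and studies that did not report pretest equivalence were excluded. ② The experimental group received an intelligent tutoring system intervention. With CAI systems now proliferating, a precise understanding of what counts as an ITS is essential. The main difference is that CAI provides only fixed, mechanical feedback, whereas an ITS provides finer-grained adaptive feedback during learning. We also identified ITSs by the feedback mechanism described by VanLehn (2006), which comprises an outer loop and an inner loop. ③ The control group received conventional teaching and learning on the same content and did not receive any other kind of computer-aided instruction or individual human tutoring. ④ Both groups were measured on academic outcomes such as knowledge and skills, whether through written tests, on-site skill assessments, or transfer to authentic tasks; measures of satisfaction or affective outcomes were not accepted. ⑤ The participants were learners without special needs; studies of learners with physical, mental, or learning disabilities were excluded, but studies of learners who differed in prior knowledge or ability were acceptable. ⑥ The study reported the data needed to calculate an effect size. In the end, 58 independent studies meeting these criteria were selected from the candidates, yielding 83 effect sizes for analysis, since some studies contributed more than one effect size, as shown in Table 1 (the number in parentheses in the "Author" column is the number of effect sizes contributed).

All 58 included studies are in English; the earliest dates from 1990, and the participants span primary school, secondary school, and university. Overall the sample is representative and rigorous: ① the total sample across the included studies is 10,764, with the largest study at 2,279 and the smallest at 21; ② by educational level, university studies make up more than half (53.45%) and primary school studies the fewest (12.07%); ③ by subject, computer science (29.31%), mathematics (29.31%), and language (18.97%) dominate, with physics, economics, psychological research methods, biology, law, and other subjects also represented.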

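(2) Coding of Characteristics

To analyze how study characteristics affect effect size, we examined the characteristics of a subset of the included studies over several rounds and developed a coding scheme covering 10 variables (see Table 2). The characteristics were first recorded in detail and then recoded as ordered categorical variables.

The coding was designed this way for three reasons. First, regarding the time of the experiment, the WWC established acceptance standards for empirical education research, and intervention studies of the most widely used system, Cognitive Tutor, that meet these strict standards began to appear in 2007; allowing for the research cycle, 2004 was taken as the first cut-off year. Second, because meta-analysis pays close attention to publication bias, publication type distinguishes formally published work (journal articles, conference papers, book chapters) from work not formally published (project reports, dissertations, unpublished manuscripts). Third, standardized tests use highly reliable and valid content developed by authoritative bodies, whereas local tests are designed by the researchers themselves, and earlier research found that test type significantly moderates effect size (Kulik & Fletcher, 2016).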

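(3) Effect Size Calculation

This study used Comprehensive Meta-Analysis 3.0 as the main tool for data processing and analysis. Raw data from the independent studies were combined, each effect size was calculated with the standardized mean difference (SMD) formula proposed by Hedges and Olkin (1985), and the pooled mean effect size and subgroup effect sizes were then calculated in turn. Calculating each effect size requires at least the sample size (N), posttest mean (Mean), and posttest standard deviation (SD) of both groups. SPSS was then used to analyze the effect sizes against the coded characteristics to determine how study characteristics affect effect size.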
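The per-study calculation can be illustrated with a short sketch of the bias-corrected standardized mean difference (Hedges' g) computed from the N, Mean, and SD of the two groups. The article does not print its formula, so the small-sample correction and the variance approximation below are the commonly cited textbook forms and are an assumption here; the input numbers are hypothetical.

```python
import math


def hedges_g(n_t, mean_t, sd_t, n_c, mean_c, sd_c):
    """Return (g, variance of g) for a treatment vs. control comparison.

    Uses the pooled posttest SD, the usual small-sample correction
    J = 1 - 3 / (4 * df - 1), and a common large-sample variance
    approximation. These forms are assumed, not quoted from the article.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd            # uncorrected SMD
    j = 1.0 - 3.0 / (4.0 * df - 1.0)             # small-sample correction
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, j ** 2 * var_d


# Hypothetical study: 30 students per group, posttest means 80 vs. 75, SD 10.
g, var_g = hedges_g(30, 80.0, 10.0, 30, 75.0, 10.0)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

Each of the 83 effect sizes in the analysis would correspond to one such (g, variance) pair; the variance is what later determines a study's weight in the pooled estimate.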

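IV. Testing and Analysis of Results

(1) Publication Bias Test

Publication bias arises when the included empirical studies do not represent the full population of empirical studies that may exist. Before effect sizes are calculated, the presence of publication bias therefore needs to be estimated and tested. Common methods include the funnel plot, Begg's rank correlation test, and the fail-safe number (Li Yu, et al., 2018). The funnel plot for this study is shown in Figure 1. Most studies lie around the center line and the overall shape is a funnel, but there are somewhat fewer studies on the left and some scattered points near the bottom (with larger standard errors), so some bias may be present. A funnel plot represents the result only qualitatively and its interpretation is highly subjective (Greenland, 1994); moreover, the included studies already contain some manuscripts that were not formally published, so further tests are needed. Begg's rank correlation test checks for bias by examining whether effect sizes correlate with their standard errors. The correlation (Tau) was 0.123 and not significant (p=0.09); with the informally published studies removed, the correlation was lower still (Tau=0.03, p=0.64), indicating no significant publication bias. The fail-safe number estimates the minimum number of studies with null or negative results that would be needed to overturn the conclusion; the larger the fail-safe number, the less likely bias is. The fail-safe number here is 9806, far above the "5N+10" threshold (N=83), indicating that publication bias is very unlikely. Taken together, the likelihood of publication bias is very small.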
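The fail-safe check can be sketched as below. The article does not say which fail-safe variant its software applied, so this example assumes Rosenthal's classic formulation (a one-tailed test on the combined Z scores); the per-study Z values in the usage lines are hypothetical, while 9806 and N=83 are the figures reported above.

```python
from statistics import NormalDist


def rosenthal_fail_safe_n(z_scores, alpha=0.05):
    """Number of unpublished zero-effect studies needed to pull the
    combined (Stouffer) Z below the one-tailed critical value.

    Assumed variant: Rosenthal's fail-safe N, (sum Z)^2 / z_crit^2 - k.
    """
    k = len(z_scores)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return (sum(z_scores) ** 2) / (z_crit ** 2) - k


def exceeds_tolerance(fail_safe_n, k):
    """Compare a fail-safe number with the 5k + 10 rule of thumb."""
    return fail_safe_n > 5 * k + 10


# Threshold for the article's 83 effect sizes, and its reported value.
print("tolerance level:", 5 * 83 + 10)
print("reported 9806 exceeds it:", exceeds_tolerance(9806, 83))

# Hypothetical example: ten studies, each with Z = 2.0.
print("fail-safe N:", round(rosenthal_fail_safe_n([2.0] * 10), 1))
```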

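(2) Basic Statistical Analysis

Effect size calculation showed that in 75 of the 83 independent effect sizes (90.36%), the experimental group's posttest scores were significantly higher than the control group's. Under Cohen's (1992) conventions, an effect size around 0.2 is small, around 0.5 is medium, and 0.8 or above is large. More than half of the effect sizes in this study (42) reached 0.5 or above. The great majority of studies therefore indicate that intelligent tutoring systems improve academic achievement, and half show a positive effect of at least medium size.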

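(3) Mean Effect Size

To characterize the effect of intelligent tutoring systems on academic achievement more precisely, we calculated the mean effect size, shown in Table 3. The Q test was significant (p<0.001) and I² was well above 50%, indicating clear heterogeneity among the included studies, so a random-effects model is appropriate for pooling. Under that model, the mean effect size is 0.492 with a 95% confidence interval of 0.408 to 0.577 (p<0.001), indicating a moderate positive effect of intelligent tutoring systems on academic achievement.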
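The heterogeneity check and random-effects pooling described here can be sketched as follows. The article does not name the estimator of between-study variance, so the DerSimonian-Laird method is assumed; the effect sizes and variances in the usage lines are hypothetical and do not come from Table 3.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes with a random-effects model.

    Returns the pooled estimate, its 95% confidence interval, Cochran's Q,
    I^2 (in percent), and tau^2. Assumes the DerSimonian-Laird estimator.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical effect sizes (g) and variances for five studies.
result = random_effects_pool([0.9, 0.2, 0.6, 0.1, 0.7],
                             [0.04, 0.01, 0.05, 0.02, 0.03])
print({key: result[key] for key in ("pooled", "ci", "I2")})
```

When I² is large, as the article reports, tau² is positive and the weights become more even across studies, which is why the random-effects interval is wider than a fixed-effect one would be.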

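(4) Characteristics That Influence Effect Size

Although the mean effect size is moderate, effect sizes are very large in some studies and small in others; that is, there is heterogeneity. To identify the characteristics that influence effect size, we ran subgroup analyses and linear regression on student characteristics, publication characteristics, and research design.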

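1. Effect of student characteristics

We examined the effect of students' country, prior knowledge level, and educational level. Applying the heterogeneity test described above (the same procedure is used below to choose between models), these three factors call for a random-effects, a fixed-effect, and a random-effects model, respectively, as shown in Table 4. ITSs significantly improved academic achievement in both developing and developed countries. The effect appeared larger in developing countries (g=0.777) than in developed countries (g=0.465), but the difference was not significant under the random-effects model (p>0.05). ITSs had a significant positive effect both for students with average prior knowledge and for students with weak prior knowledge, and the difference between the two groups was significant (p<0.05): the mean effect size for students with weak prior knowledge (g=0.568) was almost twice that for students with average prior knowledge (g=0.291). ITSs had a significant positive effect at every educational level, with significant differences between levels (p<0.05): the effect was larger for university and primary school students and smaller for secondary school students. In short, ITSs have a significant positive effect for all kinds of students, and a stronger one for students with weak prior knowledge and for university students.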

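2. Effect of publication characteristics

To examine the effect of publication characteristics, we computed subgroup effect sizes by time of implementation and publication type; the results are shown in Table 5. Every period showed a significant positive effect. Although the effect size appeared higher for 2005 to 2011, the difference was not significant, so the time of the experiment has no real bearing on the variation in effect size. Every publication type showed a significant positive effect; formally published work (journals, conferences, and books) had a higher effect size, but the difference from informally published work was not significant.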

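3. Effect of research design

Research design is often an important influence on experimental results. We examined five aspects: sample size, subject, product, duration, and test type (see Table 6). ITSs showed a significant positive effect in every sample size band, with significant differences between bands: the larger the sample, the smaller the effect size. The effect size changed little below a sample size of 200 and fell by about half above 200. By subject, ITSs had a significant positive effect in both science and engineering subjects and the humanities, but the effect size was significantly higher in science and engineering. Experiments with both categories of product showed a positive effect, but the effect of other products differed significantly from that of Cognitive Tutor, being three times as large. Experiments of every duration showed a positive effect, but the differences between durations were significant: short experiments (under 1 week) and long experiments (over 15 weeks) did less well than medium-length ones (1 to 15 weeks). Both test types showed a significant positive effect, but local tests significantly outperformed standardized tests, with an effect size more than twice as large.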
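A two-group comparison such as local versus standardized tests can be sketched as a z-test on the difference between two pooled subgroup estimates; with two groups, the squared z equals the between-group Q statistic on one degree of freedom. The subgroup values below are hypothetical placeholders, because Table 6 is not reproduced in the text, and the article does not state exactly which between-group test its software ran.

```python
import math
from statistics import NormalDist


def compare_two_subgroups(g1, se1, g2, se2):
    """Test whether two pooled subgroup effect sizes differ.

    Returns (z, Q_between, two-sided p). For two groups, Q_between is
    z squared and follows a chi-square distribution with 1 df.
    """
    z = (g1 - g2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, z ** 2, p


# Hypothetical pooled estimates: local tests vs. standardized tests.
z, q_between, p = compare_two_subgroups(0.60, 0.05, 0.30, 0.08)
print(f"z = {z:.2f}, Q_between = {q_between:.2f}, p = {p:.4f}")
```

Comparisons with more than two groups, such as the three duration bands, would need the general Q statistic with more degrees of freedom, which this sketch does not cover.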

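4. Moderator analysis of the mean effect size

The analyses above show that ITSs have a significant positive effect on academic achievement at every level of every characteristic, and that some student characteristics (prior knowledge level and educational level) and research design characteristics (subject, product, sample size, duration, and test type) show significant between-group differences in effect size. Whether these characteristics moderate the mean effect size still needs to be verified, however, because the characteristics may be correlated with one another. We therefore used linear regression to test how these characteristics affect the variation in effect size. The best-fitting model contained three independent variables, as shown in Table 7.

Three characteristics had a significant moderating effect: test type, duration, and sample size. Although the model explains only a modest share of the variation in effect size (R²=22.7%), each of the three is a significant moderator. In order of magnitude they are test type (-0.467), duration (0.356), and sample size (-0.191), which indicates that local tests, longer experiments, and smaller samples produce larger effect sizes (sample size here is the actual number, not the recoded ordered category). Prior knowledge, educational level, subject, and product all showed significant differences in effect size, so why did they not significantly affect the variation in the mean effect size? We analyzed the correlations between these factors and the others. Product type correlated significantly with test type, sample size, and duration; educational level correlated significantly with test type and sample size; prior knowledge level correlated significantly with educational level; and subject correlated significantly with product type. It is reasonable to conclude that the significant differences by prior knowledge, educational level, subject, and product are in fact driven by the three moderators. For example, experiments using Cognitive Tutor mostly used large samples (80%), standardized tests (90%), and medium to long durations (70% ran for more than 15 weeks), which lowered the effect size for Cognitive Tutor. Only test type, duration, and sample size can therefore be regarded as the key characteristics that significantly moderate the mean effect size.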
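A regression of this kind can be sketched as an ordinary least-squares fit on z-scored variables, which yields standardized coefficients comparable to those quoted above. This is only an illustration: the article ran its regression in SPSS and does not say whether studies were weighted, the data below are hypothetical, and coding standardized tests as 1 and local tests as 0 is an assumption suggested by the negative sign of the test-type coefficient.

```python
import math


def zscores(values):
    """Standardize a list to mean 0 and sample SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def standardized_regression(y, predictors):
    """Return (standardized betas, R^2) for y regressed on the predictors."""
    zy = zscores(y)
    zx = [zscores(col) for col in predictors]
    p = len(zx)
    xtx = [[sum(a * b for a, b in zip(zx[i], zx[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(a * b for a, b in zip(zx[i], zy)) for i in range(p)]
    betas = solve_linear_system(xtx, xty)
    fitted = [sum(betas[j] * zx[j][i] for j in range(p)) for i in range(len(zy))]
    sse = sum((obs - fit) ** 2 for obs, fit in zip(zy, fitted))
    sst = sum(v ** 2 for v in zy)
    return betas, 1.0 - sse / sst


# Hypothetical data for eight effect sizes (not taken from the article).
g = [0.9, 0.7, 0.8, 0.3, 0.35, 0.6, 0.1, 0.4]
test_type = [0, 0, 0, 1, 1, 0, 1, 1]        # assumed coding: 1 = standardized
weeks = [1, 4, 8, 16, 20, 2, 30, 12]        # duration of the experiment
sample_n = [40, 60, 35, 300, 500, 80, 1200, 220]
betas, r2 = standardized_regression(g, [test_type, weeks, sample_n])
print("standardized betas:", [round(b, 3) for b in betas], "R^2:", round(r2, 3))
```

Because the predictors in this toy data are strongly correlated with one another, individual coefficients are unstable, which mirrors the article's point that correlated characteristics can mask or mimic one another's effects.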

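V. Summary and Discussion

This study conducted a quantitative meta-analysis of 58 independent international empirical studies on the effect of intelligent tutoring systems on academic achievement, covering the publication bias test, the mean effect size, and the characteristics that moderate effect size.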

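(1) Intelligent tutoring systems have a moderate positive effect on academic achievement

Intelligent tutoring systems are significantly and positively related to academic achievement. More than 90% of the independent effect sizes showed a significant positive effect, and the pooled mean effect size is 0.492, which would raise a student at the 50th percentile to about the 68th percentile, with a 95% confidence interval of 0.408 to 0.577. This is consistent with Ma et al. (2014) but differs somewhat from other meta-analyses (VanLehn, 2011; Kulik & Fletcher, 2016; WWC, 2016; Steenbergen-Hu & Cooper, 2013). VanLehn (2011) distinguished two types of ITS, step-based tutoring and substep-based tutoring: the former gives one overall hint and explanation for a single problem, while the latter provides finer scaffolding, breaking the hints for a problem into several micro-hints delivered one substep at a time. Step-based tutoring produced a gain of 0.76 standard deviations and substep-based tutoring 0.4 standard deviations; this study did not distinguish the two, so an effect size between them is reasonable. In Kulik and Fletcher (2016), 82% of effect sizes came from local tests, clearly more than the 66% in this study; given the moderating effect of test type, it is unsurprising that their effect size is somewhat higher. The What Works Clearinghouse (WWC, 2016) examined the effectiveness of Cognitive Tutor in large-scale use, and all of its tests were standardized. This study found that large samples and standardized tests both significantly lower effect size, which explains why the WWC result is clearly lower than ours. As for the even lower effect size in Steenbergen-Hu and Cooper (2013), Kulik and Fletcher (2016) offered an explanation: that analysis defined ITSs broadly (including some CAI) and set low inclusion standards, with some studies lacking a proper control group or pretests for both groups. That explanation applies equally here. An effect size of about 0.5 is therefore a realistic estimate.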

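(2) Test type, duration, and sample size significantly moderate the mean effect size

Across all characteristics, ITSs had a significant positive effect on academic achievement, and seven characteristics (prior knowledge, educational level, subject, product, sample size, duration, and test type) showed significant differences in effect size. Linear regression showed that test type, duration of the experiment, and sample size significantly moderate the mean effect size.

Test type significantly moderates the mean effect size, a finding also reported in earlier research (Rosenshine & Meister, 1994; Koedinger, et al., 1997; Liu Shanshan & Yang Xiangdong, 2015; Kulik & Fletcher, 2016). Locally developed tests show larger effects because their content is more closely aligned with the teaching objectives and learning content, whereas standardized tests are usually developed by third parties and cover broader content with higher reliability and validity. Both test types have value as evidence, and including both may give a more objective picture.

Effects were positive and significant for every duration, but the longer the experiment ran, the more ITSs improved academic achievement. Behind this may lie a combined improvement in students' acceptance of the technology, the fidelity of teachers' implementation, proficiency in organizing the intervention, and the fit of teaching strategies. Naser (2009) found in a small experiment that the effect size in the second phase was 0.65 higher than in the first. Pane et al. (2013) found a similar significant difference in a large-scale intervention: the mean effect size was -0.06 in the first year and 0.20 for a different cohort in the second year, the latter a significant positive effect. Several other studies (Koedinger & Anderson, 1993; VanLehn, et al., 2005; Le, et al., 2009) report similar findings.

Effects were also positive and significant for every sample size. However, the mean effect size decreased significantly as sample size increased, with 200 as a threshold; Steenbergen-Hu and Cooper (2013) found a similar threshold effect. Once the sample exceeds 200, the important experimental condition of having one teacher teach the different classes is usually hard to guarantee, which may affect the results.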

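VI. Recommendations and Outlook

The data in this study cover 58 independent empirical studies carried out over 28 years at various educational levels and in multiple subject areas. The mean effect size of intelligent tutoring systems on academic achievement is 0.492 (95% confidence interval 0.408 to 0.577), a moderate positive effect that is broadly consistent with other meta-analyses. The study also identified three factors with a relatively large influence on the variation in effect size, but their overall explanatory power is limited, possibly because research conditions were not considered fully enough, including teacher competence, students' technical proficiency, the technical features of the ITS, the classification of learning objectives, and pedagogy. We therefore make two recommendations. First, continue to expand and follow up empirical studies of intelligent tutoring systems, adaptive learning systems, and other advanced tutoring technologies, and increase the scale and influence of experimental research. Experimental research has yet to become mainstream in educational technology in China; for intelligent tutoring systems, for instance, we found not a single domestic experiment or quasi-experiment that met the inclusion criteria. At a time when educational informatization is developing rapidly, educational technology as a discipline risks decline. Making evidence-based research one of its core methods would markedly strengthen the discipline's scientific standing, give it a distinctive "evidence-based" voice in national educational informatization, and let it play a guiding, leading, and transformative role in the modernization of education. Second, standardize empirical research methods in educational technology, make research designs more rigorous, and build a complete, high-quality chain of research evidence, so that policymakers and practitioners know "under what conditions a given educational technology achieves what kind of improvement." The What Works Clearinghouse of the U.S. Department of Education (WWC, 2017), mentioned above, has established intervention standards, reporting standards, and practice guides for empirical education research, providing researchers, educators, policymakers, and learners with a sustainable framework for accumulating evidence. It is a model worth drawing on.

Although the effect size of ITSs is not much higher than that of CAI, it provides a solid evidence base for the development and application of today's intelligent, precise adaptive learning systems. As deep learning, speech recognition, big data, and related technologies advance, intelligent technology will bring still deeper changes to teaching and learning (Wang Zhuzhu, 2018). We look forward to adaptive learning systems, supported by a new generation of artificial intelligence, achieving a "two-sigma" improvement in academic achievement, giving Confucius's ancient ideal of teaching students according to their aptitude the wings of educational technology, and truly realizing personalized learning that is large-scale, equitable, and high-quality.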

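[References]

(* marks an independent empirical study included in the meta-analysis)

Li Jiali. 2017. The effects of different types of shadow education on primary school students' academic performance, and implications for educational inequality [J]. Education Science (5): 16-25.

Li Yu, Chai Yangli, Yan Hanbing. 2018. The effect of mind mapping on students' academic achievement: a meta-analysis of international educational applications of mind mapping over the past decade [J]. Distance Education in China (1): 16-27, 28.

Liu Shanshan, Yang Xiangdong. 2015. A meta-analysis of the effect of after-school tutoring on students' academic performance [J]. Research in Educational Development (22): 55-64.

Pan Yunhe. 2018. Artificial intelligence 2.0 and the development of education [J]. Distance Education in China (5): 7-10, 46, 81.

Wang Zhuzhu. 2018. Education informatization 2.0: core essentials and implementation suggestions [J]. Distance Education in China (7): 5-8.

Zhang Zhizhen, Zhang Lingling, Luo Qionglingzi, Zheng Wei. 2019. An analysis of the actual state of AI applications in education: methods and limits of teaching automation [J]. Distance Education in China (3): 1-13, 92.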

*Albacete， P. L.， & Vanlehn， K. A. （2000）. Evaluating the Effectiveness of a Cognitive Tutor for Fundamental Physics Concepts. Proceedings of the 22nd Annual Meeting of the Cognitive Science Society， 25-30.

Anderson， J. R.， Corbett， A. T.， Koedinger， K. R.， & Pelletier， R. （1995）. Cognitive tutors： lessons learned. Journal of the Learning Sciences， 4（2）， 167-207.

*Arbuckle， W. J. （2005）. Conceptual understanding in a computer assisted Algebra I classroom. ProQuest Dissertations Publishing， University of Oklahoma， Norman.

*Arnott， E.， Hastings， P.， & Allbritton， D. （2008）. Research methods tutor： evaluation of a dialogue-based tutoring system in the classroom. Behavior Research Methods， 40（3）， 694-698.

*Arroyo， I.， Royer， J. M.， & Woolf， B. P. （2011）. Using an intelligent tutor and math fluency training to improve math performance. International Journal of Artificial Intelligence in Education， 21（1）， 135-152.

*Baghaei， N.， Mitrovic， A.， & Irwin， W. （2007）. Supporting collaborative learning and problem-solving in a constraint-based CSCL environment for UML class diagrams. International Journal of Computer-Supported Collaborative Learning， 2（2）， 159-190.

Bayraktar， S. （2001）. A Meta-analysis of the Effectiveness of Computer-Assisted Instruction in Science Education， Journal of Research on Technology in Education， 34（2）， 173-188.

*Beal， C. R.， Arroyo， I. M.， Cohen， P. R.， & Woolf， B. P. （2010）. Evaluation of animal watch： an intelligent tutoring system for arithmetic and fractions. Journal of Interactive Online Learning， 9（1）， 64-77.

Blok， H.， Oostdam， R.， Otter， M. E.， & Overmaat， M. （2002）. Computer-assisted instruction in support of beginning reading instruction： a review. Review of Educational Research， 72（1）， 101-130.

Bloom， B. S. （1984）. The 2 sigma problem： The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher， 13（6）， 4-16.

Burns， P. （1981）. A quantitative synthesis of research findings relative to the pedagogical effectiveness of computer-assisted instruction in elementary and secondary schools. Dissertation Abstracts International， 42， 2946A.

*Cabalo， J. V.， Jaciw， A.， & Vu， M. （2007）. Comparative effectiveness of Carnegie Learning's Cognitive Tutor Algebra I curriculum： A report of a randomized experiment in the Maui School District. Palo Alto， CA： Empirical Education， Inc.

*Campuzano， L.， Dynarski， M.， Agodini， R.， & Rall， K. （2009）. Effectiveness of reading and mathematics software products： Findings from two student cohorts. Washington， DC： U.S. Department of Education， Institute of Education Sciences.

*Carlson， P. A.， & Miller， T. M. （1996）. Beyond word processing： Using an interactive learning environment to teach writing. Brooks AFB， TX： Human Resources Directorate， Technical Training Research Division.

*Chambers， B.， Abrami， P.， Tucker， B.， Slavin， R. E.， Madden， N. A.， & Cheung， A.， et al. （2008）. Computer-assisted tutoring in success for all： reading outcomes for first graders. Journal of Research on Educational Effectiveness， 1（2）， 120-137.

Cohen， P. A.， Kulik， J. A.， & Kulik， C. L. C. （1982）. Educational outcomes of tutoring： A meta-analysis of findings. American Educational Research Journal， 19（2）， 237–248.

Cohen， J. （1992）. A power primer. Psychological Bulletin， 112（1）： 155-159.

*Dolenc， K.， & Abersek， B. （2015）. TECH8 intelligent and adaptive e-learning system： Integration into Technology and Science classrooms in lower secondary schools. Computers & Education， 82， 354-365.

*Fletcher， J. D. （2011）. DARPA Education Dominance Program： April 2010 and November 2010 Digital Tutor assessments. Alexandria， VA： Institute for Defense Analysis.

*Fossati， D.， Eugenio， B. D.， Brown， C.， & Ohlsson， S. （2008）. Learning Linked Lists： Experiments with the iList System. International Conference on Intelligent Tutoring Systems， 5091， 80-89.

*Fossati， D.， Di Eugenio， B.， Ohlsson， S.， Brown， C.， Chen， L.， & Cosejo， D. （2009）. I learn from you， you learn from me： How to make iList learn from students. In： Dimitrova， V.， Mizoguchi， R.， Du Boulay， B.， & Graesser， A. （Eds.）， Proceedings of The 14th International Conference on Artificial Intelligence in Education （pp.186-195） IOS Press， Brighton.

Glass， G. V.， McGaw， B.， & Smith， M. L. （1981）. Meta-analysis in social research. Beverly Hills， CA： Sage.

*Graesser， A. C.， Jackson， G. T.， Mathews， E. C.， Mitchell， H. H.， Olney， A.， Ventura， M.， & Chipman， P. （2003a）. Why/AutoTutor： A test of learning gains from a physics tutor with natural language dialog. Proceedings of the Twenty Fifth Annual Conference of the Cognitive Science Society， 1-6.

*Graesser， A. C.， Moreno， K.， Marineau， J.， Adcock， A.， Olney， A.， Person， N.， & the Tutoring Research Group （2003b）. AutoTutor improves deep learning of computer literacy： Is it the dialogue or the talking head？ Proceedings of artificial intelligence in education， 47-54.

*Graff， M.， Mayer， P.， & Lebens， M. （2008）. Evaluating a web based intelligent tutoring system for mathematics at German lower secondary schools. Education and Information Technologies， 13（3）， 221-230.

Greenland， S. （1994）. Invited commentary： a critical look at some popular meta-analytic methods. Am J Epidemiol， 140（3）， 290-296.

*Grubišić， A.， Stankov， S.， Rosić， M.， & Žitko， B. （2009）. Controlled experiment replication in evaluation of e-learning systems educational influence. Computers & Education， 53， 591-602.

*Hastings， P.， Arnott-Hill， E.， & Allbritton， D. （2010）. Squeezing out gaming behavior in a dialog-based ITS. In V. Aleven， H. Kay， & J. Mostow （Eds.）， Intelligent tutoring systems （pp. 204-213）， Berlin， Germany： Springer-Verlag.

Hedges， L. V.， & Olkin， I. （1985）. Statistical methods for meta-analysis. New York， NY： Academic Press.

Ji， X. R.， Beerwinkle， A.， Wijekumar， K. K.， Lei， P.， Malatesha Joshi， R.， & Zhang， S. （2018）. Using latent transition analysis to identify effects of an intelligent tutoring system on reading comprehension of seventh-grade students. Reading and Writing， 31（9）， 2095-2113.

*Johnson， B. G.， Phillips， F.， & Chase， L. G. （2009）. An intelligent tutoring system for the accounting cycle： enhancing textbook homework with artificial intelligence. Journal of Accounting Education， 27（1）， 30-39.

Koedinger， K. R.， & Anderson， J. R. （1993）. Effective use of intelligent software in high school math classrooms. Proceedings of the World Conference on AI in Education， 1993， 241-248.

*Koedinger， K. R.， Aleven， V.， Heffernan， N.， Mclaren， B.， & Hockenberry， M. （2004）. Opening the Door to Non-programmers： Authoring Intelligent Tutor Behavior by Demonstration. Intelligent Tutoring Systems. Springer Berlin Heidelberg.

*Koedinger， K. R.， Anderson， J. R.， Hadley， W. H.， & Mark， M. A. （1997）. Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education， 8（1）， 30-43.

Kroesbergen， E. H.， & Van Luit， J. E. H. （2003）. Mathematics interventions for children with special educational needs： a meta-analysis. Remedial and Special Education， 24（2）， 97-114.

Kulik， J. A.， & Fletcher， J. D. （2016）. Effectiveness of intelligent tutoring systems： a meta-analytic review. Review of Educational Research， 86（1）， 42-78.

Kulik， C. C.， & Kulik， J. A. （1991）. Effectiveness of computer-based instruction： An updated analysis. Computers in Human Behavior， 7， 75-94.

Kulik， J. A. （1994）. Meta-analytic studies of findings on computer-based instruction. In E. L. Baker & H. F. O'Neil Jr. （Eds.）， Technology assessment in education and training （pp. 9-33）， Hillsdale， NJ： Erlbaum.

Kulik， J. A. （2002）. School mathematics and science programs benefit from instruction technology （Info Brief NSF 03-301）. Washington， DC： National Science Foundation.

*Lane， H. C.， & VanLehn， K. （2005）. Teaching the tacit knowledge of programming to novices with natural language tutoring. Computer Science Education， 15（3）， 183-201.

*Lanzilotti， R.， & Roselli， T. （2007）. An experimental evaluation of logiocando， an intelligent tutoring hypermedia system. International Journal of Artificial Intelligence in Education， 17（1）， 41-56.

*Le， N. T.， Menzel， W.， & Pinkwart， N. （2009）. Evaluation of a constraint-based homework assistance system for logic programming. Proceedings of the 6th International Conference on Web-based Learning， Edinburgh， UK， 367-379.

*Long， Y.， & Aleven， V. （2017）. Enhancing learning outcomes through self-regulated learning support with an open learner model. User Modeling and User-Adapted Interaction， 27（1）， 55-88.

Ma， W.， Adesope， O. O.， Nesbit， J. C.， & Liu， Q. （2014）. Intelligent tutoring systems and learning outcomes： A meta-analysis. Journal of Educational Psychology， 106（4）， 901-918.

*McCarthy， K. S.， Likens， A. D.， Johnson， A. M.， Guerrero， T. A.， & Mcnamara， D. S. （2018）. Metacognitive overload： positive and negative effects of metacognitive prompts in an intelligent tutoring system. International Journal of Artificial Intelligence in Education， 28（3）， 420-438.

*D'Mello， S.， Olney， A.， Williams， C.， & Hays， P. （2012）. Gaze tutor： a gaze-reactive intelligent tutoring system. International Journal of Human-Computer Studies， 70（5）， 377-398.

*Mendicino， M.， & Heffernan， N. （2007）. Comparing the learning from intelligent tutoring systems， non-intelligent computer-based versions， and traditional classroom instruction. ProQuest Dissertations Publishing， West Virginia University， Morgantown.

*Mendicino， M.， Razzaq， L.， & Heffernan， N. T. （2009）. A comparison of traditional homework to computer-supported homework. Journal of Research on Technology in Education， 41（3）， 331-359.

*Mills-Tettey， G. A.， Mostow， J.， Dias， M. B.， Sweet， T. M.， Belousov， S. M.， & Dias， M. F.， et al. （2009）. Improving child literacy in Africa： experiments with an automated reading tutor. International Conference on Information and Communication Technologies and Development， 6， 129-138.

*Mitrovic， A.， Martin， B.， Suraweera， P.， Zakharov， K.， Milik， N.， & Holland， J. （2009）. Aspire： an authoring system and deployment environment for constraint-based tutors. International Journal of Artificial Intelligence in Education， 19（2）， 155-188.

*Mohammadzadeh， A.， & Sarkhosh， M. （2018）. The Effects of Self-Regulatory Learning through Computer-Assisted Intelligent Tutoring System on the Improvement of EFL Learner Speaking Ability. International Journal of Instruction， 11（2）， 167-184.

*Morgan， P.， & Ritter， S. （2002）. An experimental study of the effects of Cognitive Tutor Algebra I on student knowledge and attitude. Available from Carnegie Learning， Inc.

*Mostow， J.， Nelson-Taylor， J.， & Beck， J. E. （2013）. Computer-guided oral reading versus independent practice： comparison of sustained silent reading to an automated reading tutor that listens. Journal of Educational Computing Research， 49（2）， 249-276.

*Naser， S. （2009）. Evaluating the effectiveness of the CPP-Tutor， an intelligent tutoring system for students learning to program in C++. Journal of Applied Sciences Research， 5（1）， 109-114.

*Nguyen， D. H. C.， Arch-Int， N.， & Arch-Int， S. （2018）. FUSE： A Fuzzy-Semantic Framework for Personalizing Learning Recommendations. International Journal of Information Technology & Decision Making， 17（4）， 1173-1201.

*Nye， B. D.， Pavlik， P. I.， Windsor， A.， Olney， A. M.， Hajeer， M.， & Hu， X. （2018）. Skope-IT： overlaying natural language tutoring on an adaptive learning system for mathematics. International Journal of STEM Education， 5（1）， 1-20.

Pane， J. F.， Griffin， B. A.， McCaffrey， D. F.， & Karam， R. （2013）. Effectiveness of cognitive tutor algebra I at scale. Educational Evaluation and Policy Analysis， 36（2）， 127-144.

*Parvez， S. M. （2008）. A pedagogical framework for integrating individual learning style into an intelligent tutoring system. [Unpublished manuscript]， Lehigh University， Bethlehem.

*Pek， P. K.， & Poh， K. L. （2005）. Making decisions in an intelligent tutoring system. International Journal of Information Technology & Decision Making， 4（02）， 207-233.

*Pinkwart， N.， Ashley， K.， Lynch， C.， & Aleven， V. （2009）. Evaluating an intelligent tutoring system for making legal arguments with hypotheticals. International Journal of Artificial Intelligence in Education， 19（4）， 401-424.

*Poulsen， R. （2004）. Tutoring bilingual students with an automated reading tutor that listens： results of a two-month pilot study. Journal of Educational Computing Research， 36（2）， 191-221.

Ritter， G. W.， Barnett， J. H.， Denny， G. S.， & Albin， G. R. （2009）. The effectiveness of volunteer tutoring programs for elementary and middle school students： A meta-analysis. Review of Educational Research， 79（1）， 3-38.

*Ritter， S.， Kulikowich， J.， Lei， P. W.， Mcguire， C. L.， & Morgan， P. （2007）. What Evidence Matters？ A randomized field trial of Cognitive Tutor Algebra I. International Conference on Supporting Learning Flow Through Integrative Technologies， 13-20.

Rosenshine， B.， & Meister， C. （1994）. Reciprocal teaching： A review of the research. Review of Educational Research， 64（4）， 479–530.

*Serrano， M. A.， Vidal-Abarca， E.， & Ferrer， A. （2018）. Teaching self-regulation strategies via an intelligent tutoring system （TuinLECweb）： Effects for low-skilled comprehenders. Journal of Computer Assisted Learning， 34（5）， 515-525.

*Shute， V. J.， & Glaser， R. （1990）. A large-scale evaluation of an intelligent discovery world： Smithtown. Interactive Learning Environments， 1（1）， 51-77.

*Smith， J. E. （2001）. The effect of the Carnegie Algebra Tutor on student achievement and attitude in introductory high school algebra. ProQuest Dissertations Publishing， Virginia Polytechnic Institute and State University， Blacksburg.

Sosa， G. W.， Berger， D. E.， Saw， A. T.， & Mary， J. C. （2011）. Effectiveness of computer-assisted instruction in statistics： A meta-analysis. Review of Educational Research， 81（1）， 97-127.

*Stankov， S.， Glavinic， V.， & Grubišić， A. （2004）. What is our effect size： Evaluating the educational influence of a web-based intelligent authoring shell？ Eighth IEEE International Conference on Intelligent Engineering Systems， 545-550.

Steenbergen-Hu， S.， & Cooper， H. （2013）. A meta-analysis of the effectiveness of intelligent tutoring systems on K-12 students' mathematical learning. Journal of Educational Psychology， 105（4）， 970-987.

Steenbergen-Hu， S.， & Cooper， H. （2014）. A meta-analysis of the effectiveness of intelligent tutoring systems on college students' academic learning. Journal of Educational Psychology， 106（2）， 331-347.

*Stylianou， D. A.， & Shapiro， L. （2002）. Revitalizing Algebra： the effect of the use of a cognitive tutor in a remedial course. Journal of Educational Media， 27（3）， 147-171.

*Suraweera， P.， & Mitrovic， A. （2002）. KERMIT： A constraint-based tutor for database modeling. 6th International Conference of ITSs， 377-387.

*Suraweera， P.， & Mitrovic， A. （2004）. An Intelligent Tutoring System for Entity Relationship Modelling. International Journal of Artificial Intelligence in Education， 14， 375-417.

Tamim， R. M.， Bernard， R. M.， Borokhovski， E.， Abrami， P. C.， & Schmid， R. F. （2011）. What forty years of research says about the impact of technology on learning： A second-order meta-analysis and validation study. Review of Educational Research， 81（1）， 4-28.

VanLehn， K. （2006）. The behavior of tutoring systems. International Journal of Artificial Intelligence in Education， 16（3）， 227-265.

VanLehn， K. （2011）. The relative effectiveness of human tutoring， intelligent tutoring systems， and other tutoring systems. Educational Psychologist， 46（4）， 197-221.

*VanLehn， K.， Lynch， C.， Schulze， K.， Shapiro， J. A.， Shelby， R.， Taylor， L.， et al. （2005）. The Andes physics tutoring system： five years of evaluations. International Journal of Artificial Intelligence in Education， 15（3）， 147-204.

*Vidal-Abarca， E.， Gilabert， R.， et al. （2014）. TuinLEC， an intelligent tutoring system to improve reading literacy skills. Journal for the Study of Education and Development， 37（1）， 25-56.

*Weber， W. A. （2003， August 10）. An Evaluation of the Reasoning Mind Program at Hogg Middle School. Retrieved December 23， 2018， from http://www.reasoningmind.org/wp-content/uploads/2013/11/2003_Pilot_Independent_Evaluation.pdf

What Works Clearinghouse. （2016）. WWC Intervention Report for Cognitive Tutor. Retrieved December 12， 2018， from http://ies.ed.gov/ncee/wwc/Intervention/818

What Works Clearinghouse. （2017）. WWC Version 4.0 Standards Handbook. Retrieved December 20， 2018， from http://ies.ed.gov/ncee/wwc/multimedia/40

*Wijekumar， K. K.， Meyer， B. J.， & Lei， P. （2012）. Large-scale randomized controlled trial with 4th graders using intelligent tutoring of the structure strategy to improve nonfiction reading comprehension. Educational Technology Research and Development， 60（6）， 987-1013.

*Woo， C. W.， Evens， M. W.， Freedman， R.， Glass， M.， Shim， L. S.， Zhang， Y.， et al. （2006）. An intelligent tutoring system that generates a natural language dialogue using dynamic multi-level planning. Artificial Intelligence in Medicine， 38（1）， 25-46.

Woolf， B. P. （2009）. Building intelligent interactive tutors： student-centered strategies for revolutionizing e-learning. Telearn， 59（5）， 337-379.

*Zafar， A.， & Albidewi， I. （2015）. Evaluation study of eLGuide： a framework for adaptive e-learning. Computer Applications in Engineering Education， 23（4）， 542-555.

*Zakharov， K.， Mitrovic， A.， & Ohlsson， S. （2005）. Feedback Micro-engineering in EER-Tutor. International Conference on Artificial Intelligence in Education， IOS Press， 718-725.

Responsible editor: Han Shimei
