劉莉
And, as a new paper proposes, should the data-providers unionise?
You have multiple jobs, whether you know it or not. Most begin first thing in the morning, when you pick up your phone and begin generating the data that make up Silicon Valley’s most important resource. That, at least, is how we ought to think about the role of data-creation in the economy, according to a fascinating new economics paper. We are all digital labourers, helping make possible the fortunes generated by firms like Google and Facebook, the authors argue. If the economy is to function properly in the future—and if a crisis of technological unemployment is to be avoided—we must take account of this, and change the relationship between big internet companies and their users.
Artificial intelligence (AI) is getting better all the time, and stands poised to transform a host of industries, say the authors (Imanol Arrieta Ibarra and Diego Jiménez Hernández, of Stanford University, Leonard Goff, of Columbia University, and Jaron Lanier and Glen Weyl, of Microsoft). But, in order to learn to drive a car or recognise a face, the algorithms that make clever machines tick must usually be trained on massive amounts of data. Internet firms gather these data from users every time they click on a Google search result, say, or issue a command to Alexa. They also hoover up valuable data from users through the use of tools like reCAPTCHA, which ask visitors to solve problems that are easy for humans but hard for AIs, such as deciphering text from books that machines are unable to parse. That does not just screen out malicious bots, but also helps digitise books. People “pay” for useful free services by providing firms with the data they crave.
These data become part of the firms’ capital, and, as such, a fearsome source of competitive advantage. Would-be startups that might challenge internet giants cannot train their AIs without access to the data only those giants possess. Their best hope is often to be acquired by those very same titans, adding to the problem of uncompetitive markets.
The authors say that AI’s contributions to productivity growth are small for now partly because of the free-data model, which limits the quality of data gathered. Firms trying to develop useful applications for AI must hope that the data they have are sufficient, or come up with ways to coax users into providing them with better information at no cost. For example, they must pester random people, like those blur-deciphering visitors to websites, into labelling data, and hope that in their annoyance and haste they do not make mistakes.
Even so, as AI improves, the amount of work made vulnerable to displacement by technology grows, and ever more of the value generated in the economy accrues to profitable firms rather than workers. As the authors point out, the share of GDP paid out to workers in wages and salaries—once thought to be relatively stable—has already been declining over the past few decades.
To tackle these problems, they have a radical proposal. Rather than being regarded as capital, data should be treated as labour—and, more specifically, regarded as the property of those who generate such information, unless they agree to provide it to firms in exchange for payment. In such a world, user data might be sold multiple times, to multiple firms, reducing the extent to which data sets serve as barriers to entry. Payments to users for their data would help spread the wealth generated by AI. Firms could also potentially generate better data by paying. Rather than guess what a person is up to as they wander around a shopping centre, for example, firms could ask individuals to share information on which shops were visited and which items were viewed, in exchange for payment. Perhaps most ambitiously, the authors muse that data labour could come to be seen as useful work, conferring the same sort of dignity as paid employment: a desirable side-effect in a possible future of mass automation.
The authors’ ideas need fleshing out; their paper, thought-provoking though it is, runs to only five pages. Parts of the envisioned scheme seem impractical. Would people really be interested in taking the time to describe their morning routine or office habits without a substantial monetary inducement (and would their data be valuable enough for firms to pay a substantial amount)? Might not such systems attract data mercenaries, spamming firms with useless junk data simply to make a quick buck?
Nothing to use but your brains
Still, the paper contains essential insights which should frame discussion of data’s role in the economy. One concerns the imbalance of power in the market for data. That stems partly from concentration among big internet firms. But it is also because, though data may be extremely valuable in aggregate, an individual’s personal data typically are not. For one Facebook user to threaten to deprive Facebook of his data is no threat at all. So effective negotiation with internet firms might require collective action: and the formation, perhaps, of a “data-labour union”.
This might have drawbacks. A union might demand too much in compensation for data, for example, impairing the development of useful AIs. It might make all user data freely available and extract compensation by demanding a share of firms’ profits; that would rule out the pay-for-data labour model the authors see as vital to improving data quality. Still, a data union holds potential as a way of solidifying worker power at a time when conventional unions struggle to remain relevant.
Most important, the authors’ proposal puts front and centre the collective nature of value in an AI world. Each person becomes something like an oil well, pumping out the fuel that makes the digital economy run. Both fairness and efficiency demand that the income generated by that fuel be shared more evenly, according to our contributions. The tricky part is working out how.