Composite question: Passage 2

The reams of data that many modern businesses collect, dubbed “big data,” can provide powerful insights. It is the key to Netflix’s recommendation engines, Facebook’s social ads, and even Amazon’s methods for speeding up the new Web browser, Silk, which comes with its new Fire tablet. But big data is like any powerful tool: using it carelessly can have dangerous results.

A new paper by Kate Crawford, an associate professor at the University of New South Wales, and Microsoft senior researcher Danah Boyd spells out the reasons that businesses and academics should proceed with caution. While privacy invasions, both deliberate and accidental, are obvious issues, the paper also warns that data can easily be incomplete and distorted. “With big data comes big responsibilities,” says Crawford. “There’s been the emergence of a philosophy that big data is all you need,” she adds. “We would suggest that, actually, numbers don’t speak for themselves.”

Google is a poster child for the power of data. The company has transformed a massive amount of information, gathered through its search engine, into a commanding ad network and a powerful role as the gatekeeper of much of the world’s information. Google’s director of research, Peter Norvig, demonstrated the true power of a large data set, using the example of machine translation. With enough data, Norvig said, even the worst training algorithm performs far better than what can be achieved with a smaller data set.

But Crawford and Boyd’s work shows that studying large data still requires finesse. Twitter, which is commonly scrutinized for insights about people’s moods, attitudes toward politics, and other aspects of daily life, presents a number of problems, the researchers say. About 40 percent of Twitter’s active users sign in to listen, not to post, which, Crawford and Boyd say, suggests that posts could come from a certain type of person rather than a random sample. They also note that few researchers have access to all Twitter posts; most use smaller samples provided by the company. Without better information about how those samples were collected, studies could arrive at skewed results, they argue.

Crawford notes that many big data sets—particularly social data—come from companies that have no obligation to support scientific inquiry. Getting access to the data might mean paying for it, or keeping the company happy by not performing certain types of studies. 

The researchers add that big data can also raise serious ethical concerns. Many times, Crawford notes, combining data from different sources can lead to unexpected results for the people involved. For example, other researchers have previously shown that they can identify individuals by using social media data in combination with supposedly anonymized behavioral data provided by companies.

Handling big data sets takes almost impossible care. Given the quantity of information now available on the internet, Crawford argues, researchers need to slow down and think about the methods they use.

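Single-choice question: By “numbers don’t speak for themselves” (paragraph 2), Crawford means “big data” _____.
[Correct answer] D
[Explanation] This is a detail-comprehension question. “Numbers don’t speak for themselves” means that however much data there is, some of it is not needed. Since the phrase rebuts the preceding claim that “big data is all you need,” D is correct.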
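Single-choice question: According to Norvig, the decisive factor for the effect of machine translation is _____.
[Correct answer] C
[Explanation] This is a detail-comprehension question. The third paragraph uses machine translation as its example, and its last sentence states that even the worst algorithm, given a large amount of data, produces far better results than can be achieved with a small amount of data. The answer is therefore C.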
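Single-choice question: The views of Crawford and Norvig on big data can be said to be _____.
[Correct answer] B
[Explanation] This is a detail-comprehension question. The second paragraph and the second-to-last paragraph indicate that Crawford considers the internet unsafe, that the information is not necessarily accurate or useful, and that it may even raise ethical problems. Norvig, by contrast, holds that big data has brought him convenience. The two views are diametrically opposed, so the answer is B.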
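Single-choice question: Crawford and Boyd think Twitter presents problems because _____.
[Correct answer] A
[Explanation] This is a detail-comprehension question. The fourth paragraph points out two problems with Twitter. First, the people who post are a certain type of person rather than all types, so studies based on samples are not comprehensive. Second, few people can see all the posts, so the amount of data available is small. Taken together, the posts you see on Twitter are only the tip of the iceberg, and it is hard to gain an overall understanding of Twitter users from these data alone, so the answer is A.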
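Single-choice question: To which of the following about big data might the author possibly agree?
[Correct answer] C
[Explanation] This is a detail-discrimination question. According to the second-to-last paragraph, researchers can identify an individual by combining anonymized behavioral data provided by certain companies with social media data, and thereby obtain private information, so C is correct. The first paragraph indicates that big data is used in many fields, so A is wrong. The second-to-last paragraph indicates that combining information from different sources can violate privacy, so B is wrong. The third-to-last paragraph states only that some companies charge for providing data, not that all institutions do.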