
    Better manage risks inherent in Big Data

    By Ernest Davis (China Daily) Updated: 2017-02-13 08:36

    In the last 15 years, we have witnessed an explosion in the amount of digital data available - from the Internet, social media, scientific equipment, smart phones, surveillance cameras, and many other sources - and in the computer technologies used to process it. "Big Data", as it is known, will undoubtedly deliver important scientific, technological, and medical advances. But Big Data also poses serious risks if it is misused or abused.

    But having more data is no substitute for having high-quality data. For example, a recent article in Nature reports that election pollsters in the United States are struggling to obtain representative samples of the population, because they are legally permitted to call only landline telephones, whereas Americans increasingly rely on cellphones. And while one can find countless political opinions on social media, these aren't reliably representative of voters, either. In fact, a substantial share of tweets and Facebook posts about politics are computer-generated.

    A Big Data program that used this search result to evaluate hiring and promotion decisions might penalize black candidates who resembled the pictures in the results for "unprofessional hairstyles," thereby perpetuating traditional social biases. And this isn't just a hypothetical possibility. Last year, a ProPublica investigation of "recidivism risk models" demonstrated that a widely used methodology to determine sentences for convicted criminals systematically overestimates the likelihood that black defendants will commit crimes in the future, and underestimates the risk that white defendants will do so.

    Another hazard of Big Data is that it can be gamed. When people know that a data set is being used to make important decisions that will affect them, they have an incentive to tip the scales in their favor. For example, teachers who are judged according to their students' test scores may be more likely to "teach to the test," or even to cheat.

    Similarly, college administrators who want to move their institutions up in the U.S. News & World Report rankings have made unwise decisions, such as investing in extravagant gyms at the expense of academics. Worse, they have made grotesquely unethical decisions, such as the effort by Mount Saint Mary's University to boost its "retention rate" by identifying and expelling weaker students in the first few weeks of school.

    A third hazard is privacy violations, because so much of the data now available contains personal information. In recent years, enormous collections of confidential data have been stolen from commercial and government sites, and researchers have shown how people's political opinions or even sexual preferences can be accurately gleaned from seemingly innocuous online postings, such as movie reviews - even when they are published pseudonymously.

    Finally, Big Data poses a challenge for accountability. Someone who feels that he or she has been treated unfairly by an algorithm's decision often has no way to appeal it, either because specific results cannot be interpreted, or because the people who have written the algorithm refuse to provide details about how it works. And while governments or corporations might intimidate anyone who objects by describing their algorithms as "mathematical" or "scientific," they, too, are often awed by their creations' behavior. The European Union recently adopted a measure guaranteeing people affected by algorithms a "right to an explanation"; but only time will tell how this will work in practice.

    When people who are harmed by Big Data have no avenues for recourse, the results can be toxic and far-reaching, as data scientist Cathy O'Neil demonstrates in her recent book Weapons of Math Destruction.

    The good news is that the hazards of Big Data can be largely avoided. But they won't be unless we zealously protect people's privacy, detect and correct unfairness, use algorithmic recommendations prudently, and maintain a rigorous understanding of algorithms' inner workings and the data that informs their decisions.

    The author is a professor of computer science at the Courant Institute of Mathematical Sciences, New York University.

    Project Syndicate
