
About Data Mining Results and Ethics

Editor: zuli720 2004-08-23

Data mining is a very important technique for us. For the user-facing web interface we adopt a B/S (browser/server) structure, and for the back-end management logic layer that requires data mining we adopt a C/S (client/server) structure. Every art and every inquiry is thought to aim at some good, and for this reason the good has rightly been declared to be that at which all things aim. But a key target is found among ends. . .

Knowledge discovery needs insight into data from everywhere. But this brings with it the inherent risk that what is inferred may be private or ethically sensitive. The process of generating rules through a mining operation is becoming an ethical issue. Significantly, the sensitivity of a rule may not be apparent to the miner, particularly since the volume and diversity of rules can often be large. However, given the subjective nature of such sensitivity, rather than prohibit the production of ethically and privacy-sensitive rules, we present here an alerting process that detects and highlights the sensitivity of the discovered rules. The process caters for differing sensitivities at the attribute value level and allows a variety of sensitivity combination functions to be employed.

These functions have been tested empirically and the results of these tests are reported.
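As a rough illustration of the alerting idea described above, the sketch below rates each attribute value, combines the ratings of a rule's items into one score, and flags rules whose score reaches a threshold. Everything in it is an assumption made for illustration: the ratings table, the `"*"` wildcard, the three combination functions (maximum, mean, and a probabilistic sum), the `Rule` class and the 0.5 threshold are hypothetical and are not taken from the original work.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, Dict, List, Sequence, Tuple

Item = Tuple[str, str]  # (attribute, value)

# Hypothetical sensitivity ratings in [0, 1], set by a domain expert.
# The "*" wildcard rates every value of an attribute.
SENSITIVITY: Dict[Item, float] = {
    ("diagnosis", "HIV"): 0.9,
    ("ethnicity", "*"): 0.7,
    ("postcode", "*"): 0.2,
}


def item_sensitivity(attribute: str, value: str) -> float:
    """Look up the exact value first, then the wildcard, else 0."""
    if (attribute, value) in SENSITIVITY:
        return SENSITIVITY[(attribute, value)]
    return SENSITIVITY.get((attribute, "*"), 0.0)


# Three illustrative combination functions (assumed, not from the source).
def combine_max(scores: Sequence[float]) -> float:
    return max(scores, default=0.0)


def combine_mean(scores: Sequence[float]) -> float:
    return sum(scores) / len(scores) if scores else 0.0


def combine_prob_sum(scores: Sequence[float]) -> float:
    # 1 - product of (1 - s): grows as more sensitive items co-occur.
    return 1.0 - prod(1.0 - s for s in scores)


@dataclass
class Rule:
    antecedent: List[Item]
    consequent: List[Item]


def rule_sensitivity(rule: Rule, combine: Callable[[Sequence[float]], float]) -> float:
    items = rule.antecedent + rule.consequent
    return combine([item_sensitivity(a, v) for a, v in items])


def alert(rules: List[Rule], combine: Callable[[Sequence[float]], float],
          threshold: float = 0.5) -> List[Tuple[Rule, float]]:
    """Return the rules whose combined sensitivity reaches the threshold."""
    flagged = []
    for rule in rules:
        score = rule_sensitivity(rule, combine)
        if score >= threshold:
            flagged.append((rule, score))
    return flagged


rules = [
    Rule([("ethnicity", "A")], [("diagnosis", "HIV")]),
    Rule([("postcode", "5000")], [("product", "milk")]),
]
flagged = alert(rules, combine_max)
```

Because the ratings live in a table rather than in the mining code, the same set of rules can be re-scored for a different context simply by swapping the table or the combination function, which is one way to accommodate the subjective nature of sensitivity.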

Introduction

Knowledge discovery, like many other powerful technologies, lends itself both to abuse and to benefit. The publication of a rule which subsequently has a negative impact on the community bears significant risks, through litigation, adverse publicity, loss of reputation and so on. However, the number and complexity of rules generated by many data mining systems mean that the human post-processing of a data mining run can be long and potentially complex, leading to suspect rules being overlooked.

Unlike most other methods, which adopt objective statistical measures to determine interestingness, in this data mining work we propose a subjective system for rating a rule's interestingness.

To properly discuss issues of privacy and ethics in data mining, the terms privacy and ethics need to be clearly defined. Privacy will be referred to as an individual's desire and ability to keep certain information about themselves hidden from others. Defining privacy in a legal context has historically been a difficult process, which still hampers new privacy laws. Ethics will be referred to as a set of moral principles or a system of values which guides the behaviour of individuals and organisations. It is the correct way of doing things as judged by society, and it is often enforced through law (such as anti-discrimination legislation).

To act ethically involves acting for the benefit of the community. It is entirely possible to act unethically yet legally. For the individuals covered by a sensitive rule, there can be a negative impact through stereotyping and an invasion of privacy. For a publisher, there is the risk of a loss of reputation and of litigation. Two approaches can be taken to mitigate acts of ethical compromise. Firstly, privacy-preservation mechanisms can be put in place that limit access to data, restrict the scope of queries, or perturb, hide or delete data so that undesired responses do not occur. Unfortunately, this can also affect the capacity of a mining system to generate beneficial results. The second approach is thus to allow unrestricted mining but to employ an alerting process that informs users of the potential sensitivities of rules, i.e. to manage rather than eliminate the risk. A major problem that then needs to be overcome with this approach is that sensitivity is context dependent, and thus global measures of sensitivity cannot be adopted. This is the problem tackled by this work.

Discussion and Related Work

1. KDD and Ethics

Both inside and outside of the KDD community there is growing concern regarding the use of sensitive information. Recent surveys of public opinion surrounding personal privacy, for example, show a raised level of concern about the use of private information. There is some justification for this concern: a recent survey in InfoWeek found that over 20% of companies store data on their customers with information about medical profile, a similar proportion store customer demographics with salary and credit information, and over 15% store information about their customers' legal history. With this increasing level of storage of personal information there is a greater risk that misleading, erroneous or even defamatory rules might be generated.

To demonstrate the potentially misleading nature of data mining, Leinweber mined United Nations data combined with stock market data (Leinweber 1997). It was found that the best indicator for the S&P 500 Index was the estimated level of butter production in Bangladesh. It is obvious that this is a statistical coincidence, but other correlations are more difficult to refute, so it is important to consider this difficulty in other situations. The use of more statistically appropriate interestingness measures can help address this problem. Moreover, the ability to judge that a generated rule is sensitive is highly dependent on the knowledge and experience of the domain expert, rather than the data miner. Since knowledge discovery techniques are increasingly being applied in areas in which the data miner is unlikely to possess the required domain knowledge, this is becoming an important aspect.
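The statistical coincidence described above can be reproduced with purely random data. The sketch below does not use Leinweber's data; it assumes a short random "target" series and a pool of unrelated random candidate series, and reports the strongest absolute Pearson correlation found by searching the pool. The function names, the series length, the pool size and the seed are all illustrative choices.

```python
import random
from math import sqrt
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_spurious_match(n_points: int = 10, n_candidates: int = 1000,
                        seed: int = 0) -> float:
    """Strongest |correlation| between a random target and random candidates.

    Every series is independent noise, so any strong match is coincidence.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


strongest = best_spurious_match()
```

With only ten observations and a thousand candidates, the strongest match is usually well above 0.7 even though no series is related to any other. Searching many indicators over a short history makes an impressive-looking "best indicator" almost inevitable, which is why a generated rule needs review by someone with domain knowledge before it is trusted.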

The first workshop focusing on privacy and data mining was recently held in Japan. In common with much research in the area, the papers on the topic of privacy preservation in data mining generally focused on issues surrounding the sharing of data between organisations or on mechanisms to prohibit access to data during sharing. The difference in this work is that it instead automates the alerting of users when data mining systems produce potentially sensitive results (as opposed to either screening potentially sensitive data or manually checking for sensitive rules), and highlights these sensitive rules so that they can be reviewed before use or publication.

In data mining research, particularly in areas such as medical and health research, there are a considerable number of databases that could be considered ethically sensitive. Access to these datasets is usually tightly controlled, with approval for the use of the data only granted where there is a clear and definable benefit to the research and a strong adherence to agreed research ethics. The problem for data mining researchers is that investigations using knowledge discovery tools are commonly open ended: it is not possible to know what will be found until it is discovered. Moreover, many useful investigations require the use of non-anonymised data (for example, to link episodes of treatment). It is hoped that the use of systems such as that described in this paper will help relieve concerns about using data mining on ethically sensitive datasets and open them up for further research.

2. Related Work

Until recently, privacy protection and ethical alerting have received relatively little interest in mainstream KDD research. However, over the past few years there has been some important work, some of which is discussed below. The recent concern over homeland defence, for example, has heightened awareness of the need to find a balance between protecting the privacy of individuals and detecting terrorist threats. In addition, privacy protection for statistical databases is a related disci
