Major Issues In Data Mining
Issues in the data mining process are broadly divided into three categories.
- Mining Methodology
- User Interaction
- Applications & Social Impacts
Mining Methodology
These issues concern the mining techniques themselves and the factors that affect them.
- Mining different kinds of knowledge from diverse data types, e.g., biological, stream, and Web data.
- Handling noise and incomplete data: data cleaning and data analysis methods that can handle noise are required, along with outlier mining methods for the discovery and analysis of exceptional cases.
- Incorporation of background knowledge: domain knowledge is required to guide the discovery process and express patterns in concise terms and at different levels of abstraction.
- Pattern evaluation: the interestingness problem
- Performance (efficiency, effectiveness, and scalability): the running time of a data mining algorithm must be predictable and acceptable.
- Parallel, distributed, and incremental mining methods.
- Integration of the discovered knowledge with existing knowledge.
User Interaction
These issues concern how the end user specifies mining tasks and interprets the mined data.
- It involves data mining query languages and ad hoc mining.
- Data mining query languages need to be developed to allow users to describe ad hoc data mining tasks.
- Interpretation, expression, and visualization of data mining results.
- Interactive mining of knowledge at multiple levels of abstraction.
Applications & Social Impacts
These issues concern how mined or interpreted data can be applied in real-world scenarios.
- Performing domain-specific data mining & invisible data mining
- For example, companies like Amazon keep track of customer profiles.
- Protection of data security, integrity, and privacy
- We need to observe data sensitivity and preserve people's privacy while performing successful data mining.
A data mining system has the potential to generate thousands or even millions of patterns, insights, or rules. This raises the question: “Are all of the patterns interesting?” Typically not: only a small fraction of the patterns potentially generated would actually be of interest to any given user.
What makes a pattern interesting?
A pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel.
A pattern is also interesting if it validates a concept that the user sought to confirm. An interesting pattern represents knowledge.
Several objective measures of pattern interestingness exist.
One objective measure for an association rule of the form X => Y is rule support, representing the percentage of transactions from a transaction database that the given rule satisfies, i.e., P(X ∪ Y), where X ∪ Y indicates that a transaction contains both X and Y.
Another objective measure for association rules is confidence, which assesses the degree of certainty of the detected association. This is taken to be the conditional probability P(Y|X), i.e., the probability that a transaction containing X also contains Y.
Each interestingness measure is associated with a threshold, which may be controlled by the user.
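To make the two measures concrete, here is a minimal Python sketch that estimates support and confidence from a small transaction list and checks them against user-controlled thresholds. The function names, the toy grocery transactions, and the default threshold values are all assumptions made for illustration; they do not come from any particular data mining library.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of both X and Y."""
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions: List[Transaction], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Estimate of P(Y|X): among transactions containing X, the fraction that also contain Y."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        return 0.0  # X never occurs, so the rule has no evidence
    return sum(1 for t in with_x if y <= t) / len(with_x)


def is_interesting(transactions, x, y, min_support=0.4, min_confidence=0.7) -> bool:
    """Apply both thresholds (illustrative defaults, meant to be set by the user)."""
    return (support(transactions, x, y) >= min_support
            and confidence(transactions, x, y) >= min_confidence)


# Hypothetical toy data: five shopping baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]

x, y = frozenset({"bread"}), frozenset({"milk"})
print(support(transactions, x, y))         # 3 of 5 baskets contain both items
print(confidence(transactions, x, y))      # 3 of the 4 bread baskets contain milk
print(is_interesting(transactions, x, y))
```

For the rule bread => milk on this toy data, support is 3/5 and confidence is 3/4, so the rule passes the default thresholds. Raising `min_confidence` above 0.75 would reject it, which is how a user-controlled threshold narrows the set of reported patterns.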
Can a data mining system generate all of the interesting patterns?
- The answer depends on the completeness of the data mining algorithm.
- Generating every possible pattern is often unrealistic and inefficient.
- Instead, the search should be focused using user-provided constraints and interestingness measures.
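One way to picture a constraint-focused search is the sketch below. It enumerates only single-item rules a => b, applies a user-supplied constraint before doing any counting, and then keeps rules that meet the support and confidence thresholds. The function `mine_rules`, the constraint signature, and the toy data are hypothetical and deliberately simplified; a real system would use a proper frequent-itemset algorithm rather than brute-force enumeration.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List, Tuple


def mine_rules(
    transactions: List[FrozenSet[str]],
    min_support: float,
    min_confidence: float,
    constraint: Callable[[str, str], bool],
) -> List[Tuple[str, str, float, float]]:
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        # Check the user constraint first so excluded rules are never counted.
        if not constraint(a, b):
            continue
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        if count_a == 0:
            continue
        sup = count_ab / n
        conf = count_ab / count_a
        if sup >= min_support and conf >= min_confidence:
            rules.append((a, b, sup, conf))
    return rules


# Hypothetical toy data: five shopping baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]

# Assumed user constraint: only rules that predict "milk" are of interest.
rules = mine_rules(
    transactions,
    min_support=0.4,
    min_confidence=0.7,
    constraint=lambda a, b: b == "milk",
)
for a, b, sup, conf in rules:
    print(f"{a} => {b}  support={sup:.2f}  confidence={conf:.2f}")
```

On this toy data only bread => milk survives. The rules butter => milk and eggs => milk satisfy the constraint but fall below the support threshold, and every rule with a different consequent is skipped without being counted at all.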
Can a data mining system generate only interesting patterns?
- This is an optimization problem.
- It is highly desirable for a system to generate only interesting patterns.
- However, it remains a challenging issue in data mining.
Summary
Issues in the data mining process are broadly divided into three categories.
- Mining Methodology
- User Interaction
- Applications & Social Impacts