The functionalities of data mining and the variety of knowledge they discover are briefly presented in the following list:
- Class/Concept Description: Characterization and Discrimination
- Classification
- Prediction
- Association Analysis
- Cluster Analysis
- Outlier Analysis
- Evolution & Deviation Analysis
Characterization
Data characterization is a summarization of the general features of objects in a target class, and it produces what are called characteristic rules.
The data relevant to a user-specified class are normally retrieved by a database query and run through a summarization module to extract the essence of the data at different levels of abstraction.
For example, one may want to characterize the "ProVideo(Company)" customers who regularly rent more than 30 movies a year. With concept hierarchies on the attributes describing the target class, the attribute-oriented induction method can be used to carry out the data summarization.
Note that with a data cube containing a summarization of the data, simple OLAP operations fit the purpose of data characterization.
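The summarization idea can be sketched in a few lines of Python. This is only a simplified illustration in the spirit of attribute-oriented induction, not a full implementation of the method. The field names (`age`, `favorite_genre`, `rentals_per_year`) and the two concept hierarchies are assumptions made up for the example.

```python
from collections import Counter

# Hypothetical concept hierarchy for genres: low-level value -> higher-level concept.
GENRE_HIERARCHY = {"thriller": "fiction", "comedy": "fiction", "nature": "documentary"}


def age_group(age):
    """Hypothetical concept hierarchy for age: exact age -> age group."""
    if age < 20:
        return "teen"
    return "adult" if age < 60 else "senior"


def characterize(customers, min_rentals=30):
    """Summarize the target class (customers renting more than min_rentals a year).

    Each target customer is generalized to (age group, genre category), and the
    function returns the share of the target class covered by each combination.
    """
    target = [c for c in customers if c["rentals_per_year"] > min_rentals]
    counts = Counter(
        (age_group(c["age"]), GENRE_HIERARCHY.get(c["favorite_genre"], "other"))
        for c in target
    )
    return {combo: n / len(target) for combo, n in counts.items()}
```

Each key of the returned dictionary can be read as a small characteristic rule, for instance "two thirds of the heavy renters are teens who prefer fiction".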
Discrimination
Data discrimination produces what are called discriminant rules. It is basically a comparison of the general features of objects between two classes, referred to as the target class and the contrasting class.
For example, one may want to compare the general characteristics of the customers who rented more than 25 movies in the past year with those whose rental count is lower than 5.
The techniques used for data discrimination are very similar to those used for data characterization, with the exception that data discrimination results include comparative measures.
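As a rough sketch of such a comparative measure, the Python snippet below compares how one attribute is distributed in the target class and in the contrasting class. The field names (`rentals`, `age_group`) are hypothetical. The thresholds 25 and 5 come from the example above.

```python
from collections import Counter


def distribution(rows, attribute):
    """Return the share of rows taking each value of the given attribute."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def discriminate(customers, attribute, high=25, low=5):
    """Compare an attribute between heavy renters (target) and light renters (contrast).

    Returns {value: (share in target class, share in contrasting class)}.
    """
    target = [c for c in customers if c["rentals"] > high]
    contrast = [c for c in customers if c["rentals"] < low]
    t = distribution(target, attribute)
    c = distribution(contrast, attribute)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}
```

A value whose share is high in the target class and low in the contrasting class is a good candidate for a discriminant rule.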
Classification
Classification is a data analysis method that can be used to extract models describing important data classes or to predict future data trends and patterns.
Classification predicts categorical class labels, while prediction models continuous-valued functions.
For example, a classification model may be built to categorize credit card transactions as either real or fake, while a prediction model may be built to predict the expenditures of potential customers on furniture and equipment given attributes such as their income.
Classification approaches normally use a training set where all objects are already associated with known class labels.
The classification algorithm learns from the training set and builds a model.
The model is used to classify new objects.
As another example, after starting a credit policy, the "ProVideo(Company)" managers could analyze the customers' behavior vis-à-vis their credit and label the customers who received credit with one of three possible labels: "safe", "risky", and "very risky".
The classification analysis would generate a model that could be used to either accept or reject credit requests in the future.
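The train-then-classify workflow can be illustrated with a very small model. The sketch below uses a nearest-centroid classifier, chosen here only because it is short. The article does not prescribe any particular algorithm, and the two numeric features in the usage note are invented for the example.

```python
import math
from collections import defaultdict


def train(training_set):
    """Build a nearest-centroid model from (features, label) pairs.

    The model is simply the mean feature vector of each class label.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(model, features):
    """Assign a new object to the class whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], features))
```

For instance, if each customer is described by a hypothetical pair (late payments, outstanding balance) and labeled "safe", "risky", or "very risky", `train` builds one centroid per label and `classify` returns the label for a new credit request.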
Prediction
Prediction has attracted considerable attention given the potential implications of successful forecasting in a business context.
There are two major types of prediction: one can either try to predict some unavailable data values or pending trends, or predict a class label for some data. The latter is considered classification.
Once a classification model is built based on a training set, the class label of an object can be predicted based on the attribute values of the object and the attribute values of the classes.
Prediction, however, more often refers to the forecast of missing numerical values or of increase/decrease trends in time-related data.
The primary idea is to use a large number of past values to estimate probable future values.
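One simple way to turn past values into an estimate of future values is to fit a straight-line trend by least squares and extrapolate it. This is a minimal sketch, and it assumes equally spaced observations and a roughly linear trend. Real forecasting models are usually more elaborate.

```python
def fit_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ...

    Requires at least two past values.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_t


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend steps_ahead periods past the last value."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)
```

With the past values 10, 12, 14, 16, the fitted slope is 2 and the next value is forecast as 18.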
Association Analysis
Association analysis is the discovery of what are commonly called association rules.
It studies the frequency of items occurring together in transactional databases and, based on a threshold called support, identifies the frequent itemsets.
Another threshold, confidence, which is the conditional probability that an item appears in a transaction when another item appears, is used to identify association rules.
Association analysis is commonly used for market basket analysis.
For example, it could be useful for the "ProVideo(Company)" manager to know what movies are often rented together, or whether there is a relationship between renting a certain type of movie and buying popcorn or pop.
The discovered association rules are of the form: A -> B [s,c], where A and B are conjunctions of attribute-value pairs, s (for support) is the probability that A and B appear together in a transaction, and c (for confidence) is the conditional probability that B appears in a transaction when A is present.
For example, the hypothetical association rule: RentType(X, "game") AND Age(X, "13-19") -> Buys(X, "pop") [s=2%,c=55%] would indicate that 2% of the transactions considered are of customers aged between 13 and 19 who are renting a game and buying pop, and that there is a certainty of 55% that teenage customers who rent a game also buy pop.
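The two measures s and c can be computed directly from their definitions. The sketch below assumes each transaction is represented as a set of items. The function names are illustrative and do not come from any particular library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    support_a = support(transactions, antecedent)
    if support_a == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / support_a
```

A rule A -> B would then be reported only if `support` and `confidence` both reach the thresholds chosen by the analyst.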
Cluster Analysis
Similar to classification, clustering is the organization of data in groups. However, unlike classification, in clustering the class labels are unknown and it is up to the clustering algorithm to discover acceptable classes.
Clustering is also called unsupervised classification because the classification is not dictated by given class labels.
There are many clustering approaches, all based on the principle of maximizing the similarity between objects in the same class (intra-class similarity) and minimizing the similarity between objects of different classes (inter-class similarity).
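As one concrete instance of this principle, here is a minimal k-means sketch for one-dimensional data. K-means is only one of many clustering approaches. The starting centroids are passed in explicitly, which is a simplification made here to keep the example deterministic.

```python
def k_means(points, centroids, iterations=10):
    """Group 1-D points around centroids by alternating two steps.

    1. Assign every point to its nearest centroid.
    2. Move every centroid to the mean of the points assigned to it.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```

No class labels are supplied anywhere: the groups emerge only from the distances between the points.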
Outlier Analysis
Outliers are data elements that cannot be grouped in a given class or cluster.
They are also known as exceptions or surprises, and they are often very important to identify.
While outliers can be considered noise and discarded in some applications, they can reveal important knowledge in other domains, and thus their analysis can be very valuable.
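A common statistical way to flag such elements in numeric data is the modified z-score based on the median absolute deviation (MAD). Note that this is a different technique from the class- or cluster-based view described above. It is shown here only as a short sketch, and the threshold 3.5 is a conventional default rather than a value taken from this article.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be ranked
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]
```

Whether the flagged values are discarded as noise or investigated further depends on the application.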
Evolution & Deviation Analysis
Evolution and deviation analysis pertains to the study of time-related data that changes over time.
Evolution analysis models evolutionary trends in data, which makes it possible to characterize, compare, classify, or cluster time-related data.
Deviation analysis, on the other hand, considers differences between measured values and expected values, and attempts to find the cause of the deviations from the anticipated values.
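The first step of deviation analysis, comparing measured values with expected values, can be sketched as follows. The `tolerance` parameter is an assumption introduced for the example. Finding the cause of a deviation is left to the analyst.

```python
def deviations(measured, expected, tolerance):
    """Return (position, measured - expected) for every deviation beyond tolerance."""
    return [
        (i, m - e)
        for i, (m, e) in enumerate(zip(measured, expected))
        if abs(m - e) > tolerance
    ]
```

For example, with measured weekly rentals of 100, 105, 150, 98, an expected value of 100 each week, and a tolerance of 10, only the third week is reported, with a deviation of 50.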
Summary
This article briefly presented the following data mining functionalities and the kinds of knowledge they discover:
- Class/Concept Description: Characterization and Discrimination
- Classification
- Prediction
- Association Analysis
- Cluster Analysis
- Outlier Analysis
- Evolution & Deviation Analysis