Introduction To Data Mining Query Language(DMQL)
It was proposed by Han, Fu, Wang, et al. for the Database Miner data mining system. Although it was based on the structured Data Mining Query Language, These query languages are designed to support ad hoc and interactive data mining. Also, it provides commands for specifying primitives.Â
We can use Data Mining Query Language(DMQL) to work with databases and data warehouses as well. We can also use it to define data mining tasks. Particularly we examine how to define data warehouses and data marts in DMQL.Â
Data Mining Query Language(DMQL) adopts SQL-like syntax.
Hence, it can be easily integrated with relational query languages.
(Check Out The Data Science Course On Udemy)
Â
(Check Out The Data Science Course On Udemy)
Â
A Data Mining Query Language (DMQL) can be designed to incorporate these primitives, allowing users to flexibly interact with data mining systems.
It is defined in BNF grammar
It is defined in BNF grammar
- [ ] represents 0 or one occurrence.
- { } represents 0 or more occurrences.
- Words in sans serif represent keywords.
Motivation To Data Mining Query Language
A Data Mining Query Language (DMQL) can provide the ability to support ad-hoc and interactive data mining.
By providing a standardized language like SQL
- We can hope to achieve a similar effect like that SQL has on the relational database.
- It helps in the foundation for system development and evolution.
- It facilitates information exchange, technology transfer, commercialization, and wide acceptance.
Syntax For DMQL
DMQL is designed for Data Mining Task Primitives.
Its syntax goes for all of the primitives.
Syntax for the specification of
- Task-relevant data
- The kind of knowledge to be mined
- Concept hierarchy specification
- Interestingness measure
- Pattern presentation and visualization
Putting it all together — a DMQL query is formed.
Â
DMQL-Syntax for task-relevant data specificationÂ
Names of the relevant database or data warehouse, conditions, and relevant attributes or dimensions must be specified:
use database ‹database_name› or use data warehouse ‹data_warehouse_name›     Â
from ‹relation(s)/cube(s)› [where condition] (data cubes and tables)
in relevance to ‹attribute_or_dimension_list› (attributes or dimension for exploration)
order by ‹order_list› (sorting order)
group by ‹grouping_list› (specifies criteria to group)
having ‹condition› (it represent which group of data are considered relevant)
Syntax for Specifying Kind of Knowledge to be Mined
Characterization:
 ‹Mine_Knowledge_Specification›::= Â
 mine characteristics [as ‹pattern_name›]
 analyze ‹measure(s)›
 analyze ‹measure(s)›
Example:Â
mine characteristics as customerPurchasing analyze count%
Â
Discrimination:
 ‹Mine_Knowledge_Specification› ::=   Â
 mine comparison [as ‹ pattern_name›]   Â
 for ‹target_class› where ‹target_condition {versus ‹contrast_class_i where ‹contrast_condition_i›} analyze ‹measure(s)›
Example:Â
Â
   Mine comparison as purchaseGroups
   for bigspenders where avg(I.price) >= $100
   versus budgetspenders where avg(I.price) < $100
   analyze count     Â
   Mine comparison as purchaseGroups
   for bigspenders where avg(I.price) >= $100
   versus budgetspenders where avg(I.price) < $100
   analyze count     Â
Association:Â
This specifies the mining of patterns of association
‹Mine_Knowledge_Specification›::=    Â
mine associations [as ‹pattern_name›] [matching ‹metapattern›]
Example:Â
mine associations as buyingHabits matching P(X: customer, W) ^ Q(X,Y) => buys (X,Z)
Classification:
‹Mine_Knowledge_Specification› ::= mine classification [as ‹pattern_name›] analyze ‹classifying_attribute_or_dimension›
Example:Â
mine classification as classifyCustomerCreditRating analyze credit_rating
Syntax for concept hierarchy specification
More than one concept per attribute can be specified
Use hierarchy ‹hierarchy_name› for ‹attribute_or_dimension›
Examples:
Use hierarchy ‹hierarchy_name› for ‹attribute_or_dimension›
Examples:
Schema concept hierarchy for relation address as the total order (ordering is important).
define hierarchy location_hierarchy on address as [street,city,province_or_state,country]
Â
Set-Grouping Concept Hierarchy:
define hierarchy age_hierarchy for age on customer as
    level1: {young, middle_aged, senior}< level0: all
    level2: {20, ..., 39} < level1: young
    level2: {40, ..., 59} < level1: middle_aged
    level2: {60, ..., 89} < level1: senior
Operation-Derived Concept Hierarchy:
define hierarchy age_hierarchy for age on customer as
   {age_category(1), ..., age_category(5)} := cluster (default, age, 5) < all(age)
Â
Rule-Based Concept Hierarchy:
define hierarchy profit_margin_hierarchy on item as
  level_1: low_profit_margin < level_0: all
      if (price - cost)< $50
  level_1: medium-profit_margin < level_0: all
     if ((price - cost) > $50) and ((price - cost) <= $250))Â
  level_1: high_profit_margin < level_0: all
     if (price - cost) > $250
Â
Syntax for interestingness measure specification
The user can help control the number of uninteresting patterns returned by the data mining system by specifying measures of pattern interestingness and corresponding thresholds.
Interestingness measures include the confidence, support, noise, and novelty measures.
with [‹interest_measure_name›] threshold = ‹threshold_value›
Example:Â
   with support threshold = 5%
   with confidence threshold = 70%
Syntax for pattern presentation and visualization specification
display as ‹result_form›The result form can be rules, tables, cubes, crosstabs, pie or bar charts, decision trees, curves or surfaces.
To facilitate interactive viewing at different concept levels or different angles, the following syntax is defined:
‹Multilevel_Manipulation›::= roll up on ‹attribute_or_dimension› | drill down on ‹attribute_or_dimension› | add ‹attribute_or_dimension› | drop ‹attribute_or_dimension›Â
Putting it all together: A DMQL query
Here is an example DMQL for AllElectronics Database
use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age, I.type, I.place_made
from customer C, item I, purchases P, items_sold S, works_at W, branch
where I.item_ID = S.item_IDÂ and S.trans_ID = P.trans_ID
   and P.cust_ID = C.cust_ID and P.method_paid = ""AmEx""Â
   and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID and B.address = ""Canada"" and I.price >= 100
with noise threshold = 0.05
display as table
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age, I.type, I.place_made
from customer C, item I, purchases P, items_sold S, works_at W, branch
where I.item_ID = S.item_IDÂ and S.trans_ID = P.trans_ID
   and P.cust_ID = C.cust_ID and P.method_paid = ""AmEx""Â
   and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID and B.address = ""Canada"" and I.price >= 100
with noise threshold = 0.05
display as table
This above query represents the whole Data Mining Query for a Database or Data Warehouse.
Summary
Data Mining Query Language(DMQL) is used to work with databases and data warehouses as well. We can also use it to define data mining tasks. Particularly
we examine how to define data warehouses and data marts in DMQL.Â
Subscribe us for more content on Data. Â
 Â
Subscribe us for more content on Data. Â
 Â
0 Comments