Textbook: Data Mining: Concepts and Techniques (2nd edition), by Jiawei Han and Micheline Kamber, China Machine Press (2007)
Lecture 1: Introduction
1) Why data mining?
Necessity is the mother of invention
2) What is data mining?
Data mining (knowledge discovery from data): extracting or mining knowledge from large amounts of data
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
Alternative names: Knowledge discovery (mining) in databases (KDD)
Steps of a KDD Process
Learning the application domain: relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation
Choosing functions of data mining: summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
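The steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the textbook's implementation: the transaction data, the function names (`select`, `clean`, `transform`, `mine`, `evaluate`, `run_kdd`), and the choice of co-occurring item pairs as the pattern type are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions; None and "" stand for dirty entries.
RAW = [
    ["bread", "milk", None],
    ["bread", "butter", "milk"],
    ["milk", "", "butter"],
    ["bread", "milk"],
    [],
]

def select(records):
    """Create the target data set: keep only non-empty records."""
    return [r for r in records if r]

def clean(records):
    """Data cleaning: drop missing or blank items."""
    return [[item for item in r if item] for r in records]

def transform(records):
    """Reduction/transformation: one sorted tuple of unique items per record."""
    return [tuple(sorted(set(r))) for r in records]

def mine(records):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for r in records:
        counts.update(combinations(r, 2))
    return counts

def evaluate(counts, n_records, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: c / n_records
        for pair, c in counts.items()
        if c / n_records >= min_support
    }

def run_kdd(raw, min_support=0.5):
    """Chain the steps: selection, cleaning, transformation, mining, evaluation."""
    data = transform(clean(select(raw)))
    return evaluate(mine(data), len(data), min_support)

patterns = run_kdd(RAW)
print(patterns)
```

In practice the steps are iterative: a poor result at the evaluation stage usually sends the analyst back to cleaning or feature selection rather than straight on to using the patterns.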
Architecture: Typical Data Mining System
![](https://img.laitimes.com/img/_0nNw4CM6IyYiwiM6ICdiwiIn5GcuQmYiFWM5YmN0U2NkR2MhRjZzQWM4MmMmNTM4UDZ2UWZfdWbp9CXt92Yu4GZjlGbh5SZslmZxl3Lc9CX6MHc0RHaiojIsJye.png)
3) On what kind of data?
Traditional database and applications
Relational database, data warehouse, transactional database
Advanced database and advanced applications
Object-relational databases
Temporal database, sequence data (incl. biosequences), time-series data
Spatial database and spatiotemporal database
Text databases and multimedia databases
Heterogeneous databases and legacy databases
Data streams and sensor data
Structured data, graphs, social networks and link databases
The World-Wide Web
4) Data Mining Functionalities
Class/concept description: characterization and discrimination
Frequent patterns, association, correlation and causality
Classification and prediction
Cluster analysis
Outlier analysis
Trend and evolution analysis
5) Are all the patterns interesting?
6) Classification of data mining systems