R and Data Mining: Model Implementation Examples and How to Learn R

Contents

List of Figures

List of Abbreviations

1 Introduction

1.1 Data Mining

1.2 R

1.3 Datasets

1.3.1 The Iris Dataset

1.3.2 The Bodyfat Dataset

2 Data Import and Export

2.1 Save and Load R Data

2.2 Import from and Export to .CSV Files

2.3 Import Data from SAS

2.4 Import/Export via ODBC

2.4.1 Read from Databases

2.4.2 Output to and Input from EXCEL Files

3 Data Exploration

3.1 Have a Look at Data

3.2 Explore Individual Variables

3.3 Explore Multiple Variables

3.4 More Explorations

3.5 Save Charts into Files

4 Decision Trees and Random Forest

4.1 Decision Trees with Package party

4.2 Decision Trees with Package rpart

4.3 Random Forest

5 Regression

5.1 Linear Regression

5.2 Logistic Regression

5.3 Generalized Linear Regression

5.4 Non-linear Regression

6 Clustering

6.1 The k-Means Clustering

6.2 The k-Medoids Clustering

6.3 Hierarchical Clustering

6.4 Density-based Clustering

7 Outlier Detection

7.1 Univariate Outlier Detection

7.2 Outlier Detection with LOF

7.3 Outlier Detection by Clustering

7.4 Outlier Detection from Time Series

7.5 Discussions

8 Time Series Analysis and Mining

8.1 Time Series Data in R

8.2 Time Series Decomposition

8.3 Time Series Forecasting

8.4 Time Series Clustering

8.4.1 Dynamic Time Warping

8.4.2 Synthetic Control Chart Time Series Data

8.4.3 Hierarchical Clustering with Euclidean Distance

8.4.4 Hierarchical Clustering with DTW Distance

8.5 Time Series Classification

8.5.1 Classification with Original Data

8.5.2 Classification with Extracted Features

8.5.3 k-NN Classification

8.6 Discussions

8.7 Further Readings

9 Association Rules

9.1 Basics of Association Rules

9.2 The Titanic Dataset

9.3 Association Rule Mining

9.4 Removing Redundancy

9.5 Interpreting Rules

9.6 Visualizing Association Rules

9.7 Discussions and Further Readings

10 Text Mining

10.1 Retrieving Text from Twitter

10.2 Transforming Text

10.3 Stemming Words

10.4 Building a Term-Document Matrix

10.5 Frequent Terms and Associations

10.6 Word Cloud

10.7 Clustering Words

10.8 Clustering Tweets

10.8.1 Clustering Tweets with the k-means Algorithm

10.8.2 Clustering Tweets with the k-medoids Algorithm

10.9 Packages, Further Readings and Discussions

11 Social Network Analysis

11.1 Network of Terms

11.2 Network of Tweets

11.3 Two-Mode Network

11.4 Discussions and Further Readings

12 Case Study I: Analysis and Forecasting of House Price Indices

13 Case Study II: Customer Response Prediction and Profit Optimization

14 Case Study III: Predictive Modeling of Big Data with Limited Memory

15 Online Resources

15.1 R Reference Cards

15.2 R

15.3 Data Mining

15.4 Data Mining with R

15.5 Classification/Prediction with R

15.6 Time Series Analysis with R

15.7 Association Rule Mining with R

15.8 Spatial Data Analysis with R

15.9 Text Mining with R

15.10 Social Network Analysis with R

15.11 Data Cleansing and Transformation with R

15.12 Big Data and Parallel Computing with R

Bibliography

General Index

Package Index

Function Index

New Book Promotion

R语言与数据挖掘最佳实践和经典案例（英文版）.pdf (R and Data Mining: Best Practices and Classic Case Studies, English edition)
