Aggregating Large Data in R, Compared with MySQL
2019-06-14

Application scenario:

MySQL structure:

Table t_company (maps each user to a company)
uid, company
========
1, tianji
2, tianji
3, tianji
4, ganji
5, ganji
6, ganji
7, ganji
8, 58
….

Aggregation:
select company,count(company) as num
from t_company group by company
having num>3 and num<=300
order by num desc;

Result (for the sample rows above):
company,num
===========
ganji,4

(tianji has exactly 3 rows, so it does not pass num>3.)

10 million rows, 800 MB. MySQL execution time: 2 minutes.

Processing the data in R
Read in the CSV (user-to-company table):
1, tianji
2, tianji
3, tianji
4, ganji
5, ganji
6, ganji
7, ganji
8, 58

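  library(plyr)  # provides ddply()

  file <- 'comapng'  # path to the CSV file, as given in the original
  companyData <- read.table(file = file, header = FALSE, sep = ",", quote = "\"'",
                            na.strings = "NA", fileEncoding = "utf-8", encoding = "utf-8")
  names(companyData) <- c('uid', 'company')
  print(paste('Total Company =>', nrow(companyData)))
  # Count rows per company; the count column is named "nrow".
  nset <- ddply(companyData, .(company), "nrow")
  # Keep companies with 3 < count <= 300 (same as the HAVING clause).
  nset <- nset[which(nset$nrow <= 300 & nset$nrow > 3), ]
  # Collect the row indices of every kept company: one full scan per company.
  include <- c()
  for (i in seq_len(nrow(nset))) {
    t <- which(companyData$company == nset$company[i])
    include <- c(include, t)
  }
  print(paste('Available Company =>', length(include)))
  companyData <- companyData[include, ]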


10 million rows, 800 MB, 1.5 GB of memory used. R execution time: 30+ minutes.

====================
Need to find a way to optimize this!!
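
One possible direction, offered as a sketch rather than a measured fix: the loop above scans the whole company column once per kept company and grows the index vector on every pass. Base R can count with table() and select rows with a single %in% test instead. The small data frame below is a hypothetical stand-in for the real CSV, and the names counts, keep and filtered are illustrative only. No timing is claimed for the 10-million-row file.

```r
# Hypothetical sample standing in for the 10-million-row CSV.
companyData <- data.frame(
  uid = 1:8,
  company = c("tianji", "tianji", "tianji",
              "ganji", "ganji", "ganji", "ganji", "58"),
  stringsAsFactors = FALSE
)

# Count rows per company in one pass (same role as the ddply() call).
counts <- table(companyData$company)

# Names of the companies whose count satisfies 3 < num <= 300.
keep <- names(counts)[counts > 3 & counts <= 300]

# One vectorized membership test replaces the per-company which() loop.
filtered <- companyData[companyData$company %in% keep, ]

# Aggregated view, ordered like the SQL query (num descending).
nset <- data.frame(company = keep,
                   num = as.integer(counts[keep]),
                   stringsAsFactors = FALSE)
nset <- nset[order(-nset$num), ]

print(nset)
print(paste('Available Company =>', nrow(filtered)))
```

Unlike the loop, this keeps the rows in their original file order rather than grouping them by company; whether that matters depends on what is done with companyData afterwards.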
