PivotalR

The PivotalR package is an R front end to PostgreSQL and the Pivotal (Greenplum) Database, and a wrapper for MADlib, the open-source machine learning library. It also works with Pivotal HD/HAWQ for big data analytics. In each case it exposes operations on database tables and views through an interface similar to R's data.frame, so R users can work with objects in the database without having to learn SQL.

The package lets R users work with data sets too large to fit in R's memory, and lets them use R scripts to leverage an MPP (massively parallel processing) database as well as in-database analytics libraries. It also minimizes the data transferred between R and the database, because the big data stays in the database. When the user enters R commands, PivotalR translates them into SQL queries and sends those queries to the database for parallel execution. After execution, only the computed result is returned to R. This combines the analytical power of the database with R's graphics capabilities for plotting the results.
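To make the translation step concrete, here is a minimal Python sketch of the general idea. It is not PivotalR's code (PivotalR is written in R). The hypothetical `LazyTable` class records subset and column-selection operations and renders them as a single SQL statement, so no data is computed or moved until the query is sent to the database. The table name `abalone` and its column names are illustrative assumptions.

```python
class LazyTable:
    """Hypothetical stand-in for a database-backed data frame.

    It holds the pieces of a query, not any rows of data.
    """

    def __init__(self, table, columns=None, filters=None):
        self.table = table
        self.columns = columns or ["*"]
        self.filters = filters or []

    def select(self, *columns):
        # Analogous to x[, c("a", "b")] in R: returns a new lazy object, runs nothing.
        return LazyTable(self.table, list(columns), self.filters)

    def where(self, condition):
        # Analogous to x[x$a > 1, ] in R: appends a predicate to the WHERE clause.
        return LazyTable(self.table, self.columns, self.filters + [condition])

    def to_sql(self):
        # Render all accumulated operations as one SQL statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.filters)
        return sql


if __name__ == "__main__":
    x = LazyTable("abalone")
    y = x.where("rings > 10").select("sex", "length")
    print(y.to_sql())
    # prints: SELECT sex, length FROM abalone WHERE (rings > 10)
```

Chaining operations only builds up the query. The database does the filtering in parallel, and R would receive just the (much smaller) result.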

PivotalR provides the core R infrastructure and over 50 analytical functions in R that leverage in-database execution. These include:

Data Connectivity – db.connect, db.disconnect, db.Rquery
Data Exploration – db.data.frame, subsets
R language features – dim, names, min, max, nrow, ncol, summary, etc.
Reorganization Functions – merge, by (group-by), samples
Transformations – as.factor, null replacement
Algorithms – linear regression and logistic regression wrappers for MADlib
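Reorganization functions such as merge and by map naturally onto SQL joins and GROUP BY queries. The hypothetical Python helpers below are a rough illustration of the kind of statement an R-style `by` aggregation or `merge` could translate into. They are assumptions for illustration only, and the SQL that PivotalR actually generates may differ. The table and column names are also made up.

```python
def by_sql(table, group_col, value_col, fun="avg"):
    """Hypothetical: an R-style by(x[value], x[group], mean) as a GROUP BY query."""
    return (
        f"SELECT {group_col}, {fun}({value_col}) AS {value_col}_{fun} "
        f"FROM {table} GROUP BY {group_col}"
    )


def merge_sql(left, right, key, how="inner"):
    """Hypothetical: an R-style merge(x, y, by = key) as a JOIN ... USING query.

    how="inner" mirrors merge()'s default (all = FALSE);
    how="left" mirrors merge(..., all.x = TRUE).
    """
    join = {"inner": "INNER JOIN", "left": "LEFT JOIN"}[how]
    return f"SELECT * FROM {left} {join} {right} USING ({key})"


if __name__ == "__main__":
    print(by_sql("abalone", "sex", "rings"))
    # prints: SELECT sex, avg(rings) AS rings_avg FROM abalone GROUP BY sex
    print(merge_sql("orders", "customers", "customer_id"))
    # prints: SELECT * FROM orders INNER JOIN customers USING (customer_id)
```

In both cases the grouping or joining happens inside the database, and only the aggregated or joined result would need to come back to R.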

This entry was posted in R and tagged Pivotal, R.

Mercy’s Tweets