IBM  Data Science  Experience

Using Db2 Warehouse on Cloud with RStudio in Data Science Experience

Connecting to a database is a crucial step for most data science workflows. In this post, I'll share a script for connecting to Db2 Warehouse on Cloud in RStudio in Data Science Experience (DSX). If you are coming from Db2 Warehouse on Cloud, please login to Data Science Experience RStudio using your Bluemix id.

Advanced Users: Scroll to the bottom to grab the full script and figure it out yourself ;-)

Find your credentials

After logging in to DSX, click the user profile icon and select Settings:

Then click the Services tab, where you should find your Db2 Warehouse on Cloud service:

Now you need to open the Manage on Bluemix page in order to see the service credentials:

Finally, you need to click on the Service credentials tab on the left side and view all credentials of Db2 Warehouse on Cloud.

If you already had the instance created, click Service Credentials, then click either New Credential or View Credentials if you have that option:

Once you have created your credentials, copy the JSON blob so you can paste it into RStudio.

Once you have this copied, paste it into the script where you installed or loaded the ibmdbR package. Then you just need to set the values for the variables shown below -- take these from your credentials. The default protocol and port are filled in, but these might change based on your set up:

dsn_driver <- #your_db  
dsn_database <- #your_db  
dsn_hostname <- #your_hostname  
dsn_port <- "50000"  
dsn_protocol <- "TCPIP"  
dsn_uid <- #your_uid  
dsn_pwd <- #your_pwd  

It may seem repetitive to have your DB name set to two different variables for authentication. This was done to have explicit variables for each variable in our connection path. Our connection path needs this value in two separate places.

Package installation

First, the obligatory installation of the R package. You can use the ibmdbR package to easily connect to Db2 Warehouse on Cloud. If you already installed this package you can skip the first line and just call library() to load it.

install.packages('ibmdbR')  
library(ibmdbR)

Making the connection

Now the hard part is done; you just need to create a connection path variable by pasting your credentials together like so:

conn_path <- paste(dsn_driver,  
                   ";DATABASE=",dsn_database,
                   ";HOSTNAME=",dsn_hostname,
                   ";PORT=",dsn_port,
                   ";PROTOCOL=",dsn_protocol,
                   ";UID=",dsn_uid,
                   ";PWD=",dsn_pwd,sep="")

Once you have the connection path, use two methods from the ibmdbR package to establish the connection (idaConnect()) then initialize it (idaInit()).

Here you are calling the connection to the DB ch.

ch <- idaConnect(conn_path)  
idaInit(ch)  

Run a Query

With the connection established, you can easily create an ida.data.frame. Here you create a data frame for a table named GOSALES.PRODUCT

table_string = `GOSALES.PRODUCT`  
df = ida.data.frame(table_string)  

If you are curious about the data structure, here is an excerpt from the ibmdbR documentation:

This function creates an IDA data frame (that is, an object of the class ida.data.frame). It does not store any data in local memory, but aggregates metadata used to determine the exact table subset (columns - SELECT clause; and/or rows - WHERE clause) and creates a pointer to a table located in the database.

You are now ready to use Db2 Warehouse on Cloud in RStudio.

Full Script

Here's the complete script for reference:

install.packages('ibmdbR')  
library(ibmdbR)  
dsn_driver <- #your_db  
dsn_database <- #your_db  
dsn_hostname <- #your_hostname  
dsn_port <- "50000"  
dsn_protocol <- "TCPIP"  
dsn_uid <- #your_uid  
dsn_pwd <- #your_pwd

conn_path <- paste(dsn_driver,  
                   ";DATABASE=",dsn_database,
                   ";HOSTNAME=",dsn_hostname,
                   ";PORT=",dsn_port,
                   ";PROTOCOL=",dsn_protocol,
                   ";UID=",dsn_uid,
                   ";PWD=",dsn_pwd,sep="")
ch <- idaConnect(conn_path)  
idaInit(ch)  
table_string = 'GOSALES.PRODUCT'  
idadf = ida.data.frame(table_string)  
dim(idadf)  
head(idadf)  

Greg Filla

Read more posts by this author.

Chicago

Subscribe to IBM Data Science Experience Blog

Get the latest posts delivered right to your inbox.

or subscribe via RSS with Feedly!