IBM  Data Science  Experience

Using dashDB with RStudio in Data Science Experience

Connecting to a database is a crucial step for most data science workflows. In this post I'll share a script for connecting to DashDB in RStudio in Data Science Experience.

Advanced Users: Scroll to the bottom to grab the full script and figure it out yourself ;-)

Package installation

First, the obligatory installation of the R package. You can use the ibmdbR package to easily connect to DashDB. If you already installed this package you can skip the first line and just call library() to load it.

install.packages('ibmdbR')  
library(ibmdbR)

Find your credentials

If you are starting in Data Science Experience, you need to switch the context to be Watson Data Platform. Click the down arrow to the right of Data Science Experience then click Watson Data Platform to change the context.

Now, find your DashDB instance by going to Data Services in the left side bar, which is opened by clicking the hamburger icon in the top left.

From Data Services you can locate your DashDB instance. Click the name to be taken to the Bluemix page for that service. If you need to create a DashDB service instance, click the plus sign in the action-bar:

If you already had the instance created, click Service Credentials, then click either New Credential or View Credentials if you have that option:

Once you have created your credentials, copy the JSON blob so you can paste it into RStudio.

Once you have this copied, paste it into the script where you installed or loaded the ibmdbR package. Then you just need to set the values for the variables shown below -- take these from your credentials. The default protocol and port are filled in, but these might change based on your set up:

dsn_driver <- #your_db  
dsn_database <- #your_db  
dsn_hostname <- #your_hostname  
dsn_port <- "50000"  
dsn_protocol <- "TCPIP"  
dsn_uid <- #your_uid  
dsn_pwd <- #your_pwd  

It may seem repetitive to have your DB name set to two different variables for authentication. This was done to have explicit variables for each variable in our connection path. Our connection path needs this value in two separate places.

Making the connection

Now the hard part is done; you just need to create a connection path variable by pasting your credentials together like so:

conn_path <- paste(dsn_driver,  
                   ";DATABASE=",dsn_database,
                   ";HOSTNAME=",dsn_hostname,
                   ";PORT=",dsn_port,
                   ";PROTOCOL=",dsn_protocol,
                   ";UID=",dsn_uid,
                   ";PWD=",dsn_pwd,sep="")

Once you have the connection path, use two methods from the ibmdbR package to establish the connection (idaConnect()) then initialize it (idaInit()).

Here you are calling the connection to the DB ch.

ch <- idaConnect(conn_path)  
idaInit(ch)  

Run a Query

With the connection established, you can easily create an ida.data.frame. Here you create a data frame for a table named GOSALES.PRODUCT

table_string = `GOSALES.PRODUCT`  
df = ida.data.frame(table_string)  

If you are curious about the data structure, here is an excerpt from the ibmdbR documentation:

This function creates an IDA data frame (that is, an object of the class ida.data.frame). It does not store any data in local memory, but aggregates metadata used to determine the exact table subset (columns - SELECT clause; and/or rows - WHERE clause) and creates a pointer to a table located in the database.

You are now ready to use dashDB in RStudio.

Full Script

Here's the complete script for reference:

install.packages('ibmdbR')  
library(ibmdbR)  
dsn_driver <- #your_db  
dsn_database <- #your_db  
dsn_hostname <- #your_hostname  
dsn_port <- "50000"  
dsn_protocol <- "TCPIP"  
dsn_uid <- #your_uid  
dsn_pwd <- #your_pwd

conn_path <- paste(dsn_driver,  
                   ";DATABASE=",dsn_database,
                   ";HOSTNAME=",dsn_hostname,
                   ";PORT=",dsn_port,
                   ";PROTOCOL=",dsn_protocol,
                   ";UID=",dsn_uid,
                   ";PWD=",dsn_pwd,sep="")
ch <- idaConnect(conn_path)  
idaInit(ch)  
table_string = 'GOSALES.PRODUCT'  
idadf = ida.data.frame(table_string)  
dim(idadf)  
head(idadf)  

Greg Filla

Read more posts by this author.

Chicago

Subscribe to IBM Data Science Experience Blog

Get the latest posts delivered right to your inbox.

or subscribe via RSS with Feedly!