Connecting to a database is a crucial step for most data science workflows. In this post, I'll share a script for connecting to Db2 Warehouse on Cloud in RStudio in Data Science Experience (DSX). If you are coming from Db2 Warehouse on Cloud, please login to Data Science Experience RStudio using your Bluemix id.
Advanced Users: Scroll to the bottom to grab the full script and figure it out yourself ;-)
Find your credentials
After logging in to DSX, click the user profile icon and select Settings:
Then click the Services tab, where you should find your Db2 Warehouse on Cloud service:
Now you need to open the Manage on Bluemix page in order to see the service credentials:
Finally, you need to click on the Service credentials tab on the left side and view all credentials of Db2 Warehouse on Cloud.
If you already had the instance created, click Service Credentials, then click either New Credential or View Credentials if you have that option:
Once you have created your credentials, copy the JSON blob so you can paste it into RStudio.
Once you have this copied, paste it into the script where you installed or loaded the ibmdbR package. Then you just need to set the values for the variables shown below -- take these from your credentials. The default protocol and port are filled in, but these might change based on your set up:
dsn_driver <- #your_db dsn_database <- #your_db dsn_hostname <- #your_hostname dsn_port <- "50000" dsn_protocol <- "TCPIP" dsn_uid <- #your_uid dsn_pwd <- #your_pwd
It may seem repetitive to have your DB name set to two different variables for authentication. This was done to have explicit variables for each variable in our connection path. Our connection path needs this value in two separate places.
First, the obligatory installation of the R package. You can use the ibmdbR package to easily connect to Db2 Warehouse on Cloud. If you already installed this package you can skip the first line and just call
library() to load it.
Making the connection
Now the hard part is done; you just need to create a connection path variable by pasting your credentials together like so:
conn_path <- paste(dsn_driver, ";DATABASE=",dsn_database, ";HOSTNAME=",dsn_hostname, ";PORT=",dsn_port, ";PROTOCOL=",dsn_protocol, ";UID=",dsn_uid, ";PWD=",dsn_pwd,sep="")
Once you have the connection path, use two methods from the ibmdbR package to establish the connection (
idaConnect()) then initialize it (
Here you are calling the connection to the DB
ch <- idaConnect(conn_path) idaInit(ch)
Run a Query
With the connection established, you can easily create an ida.data.frame. Here you create a data frame for a table named
table_string = `GOSALES.PRODUCT` df = ida.data.frame(table_string)
If you are curious about the data structure, here is an excerpt from the ibmdbR documentation:
This function creates an IDA data frame (that is, an object of the class ida.data.frame). It does not store any data in local memory, but aggregates metadata used to determine the exact table subset (columns - SELECT clause; and/or rows - WHERE clause) and creates a pointer to a table located in the database.
You are now ready to use Db2 Warehouse on Cloud in RStudio.
Here's the complete script for reference:
install.packages('ibmdbR') library(ibmdbR) dsn_driver <- #your_db dsn_database <- #your_db dsn_hostname <- #your_hostname dsn_port <- "50000" dsn_protocol <- "TCPIP" dsn_uid <- #your_uid dsn_pwd <- #your_pwd conn_path <- paste(dsn_driver, ";DATABASE=",dsn_database, ";HOSTNAME=",dsn_hostname, ";PORT=",dsn_port, ";PROTOCOL=",dsn_protocol, ";UID=",dsn_uid, ";PWD=",dsn_pwd,sep="") ch <- idaConnect(conn_path) idaInit(ch) table_string = 'GOSALES.PRODUCT' idadf = ida.data.frame(table_string) dim(idadf) head(idadf)