IBM Data Science Experience provides an environment that brings together all of a data scientist's tools into one location. It includes the most popular Open Source tools and IBM value-add functionality with integration of community and social features to make data scientists more successful.
A wide variety of cloud data sources are easily connected to DSX, or a user could upload data from DB2 on z/OS (by creating a copy) into the environment. However, the ability to access to data in-place on the system of record, has great appeal.
In this guide, I will outline the steps needed to configure DSX to access an on-prem DB2 for z/OS database using JDBC and provide the ability to utilize RStudio to perform advanced analytics operations on this data.
Here is an outline of the steps we will go through in the guide to set up the DSX environment for DB2 for z/OS connectivity:
- Examine the RStudio interface.
- Identify the JDBC driver files and configure the initial environment with RStudio.
- If required, setup a Bluemix Secure Gateway.
- Use the RJDBC package to connect with a DB2 for z/OS database, and run a query.
Data Science Experience – RStudio Interface
(Note: The screenshots in this tutorial are from a Mac OSX environment running Firefox. If you are running from Windows or Linux, there may some differences in the appearance of the native dialogs and browser interfaces.)
Once setup with your account, click on the DSX Logo at the top of your screen, and this will take you to the main Community page. Once there, if you click the far left drop-down menu, you will see your project options, including RStudio. Click on the the RStudio option, and this will open up your RStudio instance.
If you have opened DSX RStudio before, the instance opened will have retained any history or enviroment settings that you had saved/autosaved from before. If you are familiar with RStudio and have used a native version of RStudio on Windows/Mac OSX, it will look exactly the same. The only difference is that the underlying infrastucture is running in a Bluemix virtual machine. There is limited access to the underlying operating system, but we can access the filesystem to upload the JDBC driver and license file and configure the connection for DB2 for z/OS.
The RStudio interface should look similar to what is shown below......
(For a complete guide to using RStudio, please see the documentation linked to in the ‘Help’ Section)
Next, we will confirm that the correct packages are installed/loaded that enable JDBC access to our on-prem database.
In the bottom right panel of the RStudio desktop you will see the ‘Packages’ tab, and within that is an ‘Install’ tab. If you click “Install”, you will get a dialog box that allows you to load the required packages from the CRAN repo. You will need to install ‘RJDBC’. If the ‘Install dependencies’ box is checked, you will also be installing ‘rJava’ and ‘DBI’ (which load automatically). Once you install these packages, you should see them in the User Library. If they are unchecked, please check them, and they will be enabled in the environment.
Next, we will need to upload the appropriate DB2 for z/OS data server driver .jar files to the DSX environment. The files we will need are: db2jcc.jar and db2jcc_license_cisuz.jar
If you already have a licensed version of DB2 Connect installed, these driver files are already available on one of your existing client environment setups. Ask your DB2 system administrator for the location of these files in your client environment.
This guide is based on DB2Connect Version 11.1 for Linux, Unix and Windows (client) and DB2 for z/OS Unlimited Edition (server). If you are not currently licensed for DB2Connect, see the Appendix at the end of this guide for instructions on how to obtain the correct data server drivers and license files that are required.
From the RStudio desktop (shown above), go to the lower right panel again, to the “Files” section. Then, click “Upload” and you will get a dialog box that allows you to browse your local file system to upload the driver files. For both db2jcc.jar and db2jcc_license_cisuz.jar, upload these to the current directory of your DSX/RStudio environment.
Connecting to the z Server over TCPIP – Secure Gateway
With the JDBC driver files in place, we now need to configure access to the server which is running the DB2 for z/OS instance. The assumption for most IBM customers is that their target z server is running on a secure internal network, which is accessed externally through a VPN or from an internal private network access point. If this is the case, you will also need to set up a Bluemix Secure Gateway, that will allow the Bluemix/DSX cloud app to access your z server through a firewall, using the same VPN/internal network acces point that you use for other client connections from a desktop.
For more general information on Bluemix Secure Gateways, please go to: https://console.ng.bluemix.net/docs/services/SecureGateway/secure_gateway.html
If you do not need to configure a Secure Gateway, you can skip to the "Connecting to DB2/zOS with JDBC" section.
To configure the secure gateway, please go to https://console.ng.bluemix.net/dashboard/services/, log in to Bluemix with the same credentials you are using for DSX, and then use the drop-down menu on the left to go to the ‘Services’ and then ‘Dashboard’ menu.
Once within the Services dashboard, click the “Create Service +” button.
Next, we will go to “Integrate” and then choose the “Secure Gateway” service from the menu.
Leave the service “Unbound” and click “Create”.
And, then “+ Add Gateway’ Choose a name for your gateway, uncheck both of the boxes listed, and then click “Add Gateway”.
Now, we can setup the destination details, and the client. Click on the gateway.
Then click on “+ Add Destination”. Choose “On-Premises” and click Next. On the next screen add your host name/IP , and the DB Port of your DB2z Subsystem. Click ‘Next’
Then, select TCP protocol on the next screen, and click “Next” again.
The next screen setting is related to server side authentication. If you typically connect with just a user and password, then choose “None”. If you are not sure whether or not your server requires additional authentication, ask your System Administrator, and you can change this later. Click ‘Next’.
You can skip over the next setting, unless you want to privatize the IP address and port or your connection. Click ‘Next’.
To complete the destination settings for the back-end server, we just need to give a name for our connection. Choose a name for your destination that describes the DB2 Subsystem you are connecting to, and then click “Finish”.
Now let’s install the client for our secure gateway. Choose “+ Add Clients”
Choose your preferred method of hosting the gateway client. For this demonstration, we will be using IBM Installer - Mac OSX version on a laptop that has a VPN connection to access an on-prem z server. The details for different OS installs and Using Docker or IBM DataPower are covered in detail in the online instructions. For specific install details, See the “Bluemix Documentation” link.
At this time, copy the Gateway ID, for input later into the gateway client command-line interface.
After Downloading the client, double-clicking the image, you might have to go to your ‘Finder’ to see the mount point for the image.
Click on the mount-point and then drag the ‘ibm’ folder into the Applications Folder.
Now, we can go into the ‘IBM’ folder within Applicatons, and double-click to run the segw.command to open the CLI for the gateway.
Enter the gateway ID you copied earlier, and then enter ‘none’ for the security token.
Next, note the web address of the gateway interface - localhost:9003/dashboard
Enter that web address in your web browser.
Below, we see the gateway web UI. You can see all the details of your secure gateway from this interface. There is one more setting, for the access control list. Please click “Access Control List”, and then enter your destination hostname/IP and port once again.
Below, once you have added the host/IP and port, you can close the browser window, or minimize it.
We are almost done. The Secure Gateway config is complete.
Now we will return to the main Secure Gateway dashboard to copy the cloud hostname/port for the gateway. It will be used for the JDBC connection string in DSX/RStudio. From the Services Dashboard, click on the gateway you created.
On this screen, click on the ‘settings gear’ of the destination server connection details. A dialog box will open.
Copy the ‘Cloud Host : Port’ address to use in your JDBC connection in DSX/RStudio.
Return to the RStudio desktop, and we can setup the JDBC driver and run a query.
Connecting to DB2/zOS with JDBC
In the RStudio desktop, we are now ready to enter the JDBC commands at the console prompt to setup the connection to our DB2 for z/OS database. Make sure the RJDBC, DBI, and rJava packages have loaded.
Also, at this time, make sure you are running any VPN connection that is normally required to access your backend system from a desktop client.
Enter the following commands in the RStudio console, noting the results you get for the classpath display:
drv <- JDBC("com.ibm.db2.jcc.DB2Driver","/home <br> /rstudio/db2jcc.jar")
(* note the addition of the license file….its position in the classpath is important *)
Now, we will specify the connection itself, and use either the host/port of our external facing backend server (if we did not configure a secure gateway), or the cloud host/port that we set up with our secure gateway client.
In either case, the DB LOCATION name and the login/password will be the same.
(* In our example this is NDCDB202 *)
So, it might look like this for a direct server connection:
conn <-dbConnect(drv,"jdbc:db2://demomvs.cc9.pok.ibm.com:446/NDCDB202", user="user", password="secret")
It will look like this if we are using the secure gateway:
conn <- dbConnect(drv,"jdbc:db2://cap-sg-prd4.integration.ibmcloud.com:16912/NDCDB202", user="user", password="secret")
Then, we specify the query we want to run, and fetch the results.
rs <- dbSendQuery(conn, "SELECT AGE,OCCUPATION,INCOME FROM DDS4487.ADULT WHERE AGE < 30 AND EDUCATION = 'HS-grad'")
Alternatively, you can store the results in a data frame, to then further process and analyze the query results using other algorithms in RStudio.
(In the dbSendQuery() command we reference a DB2 table ADULT. This test table was created from the UCI Machine Learning Data Repository, and was loaded into DB2 through Data Studio. The data prep and loading of a test table is left as an exercise for the user of this guide.
DB2Connect and License Certificate Activation:
Linux, Unix, and Windows
If you have an IBM id, you can login and download the Data Server Driver package from here: https://www-01.ibm.com/support/docview.wss?uid=swg21385217
and then, download the appropriate license file activation from IBM Passport Advantage:
Please take note of the version of DB2 Connect that you are licensed for, listed below:
Once at the website for Data Server Driver downloads, you will be presented with several options to download. For the purposes of what we are doing here, you will want to download this package.
Once downloaded and unarchived, you will see the following files.
For connecting to DB2 for z/OS, you will also need the license certificate for DB2Connect. This is downloaded separately from the PassPort Advantage site, as noted above.
For more information about the Data Server Drivers for JDBC and SQLJ, go to the following link: http://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.apdv.gs.doc/doc/t0010264.html