IBM Data Science Experience overview

Data Science Experience (DSX) provides you with the environment and tools to solve your business problems by collaboratively analyzing data. This illustration shows how the architecture of DSX is centered around the project. A project is how you organize your resources for solving a business problem.

Shows DSX architecture, as described in the text.

Projects

When you create a project for analyzing data, you associate it with a compute engine and storage. Then, you add collaborators, data assets, and analytic assets to your project. You can also add bookmarks to important resources and associate other services with your project.
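
The relationships described above can be pictured as a small data model. The following is a minimal sketch only: the `Project` class, its field names, and the sample values are hypothetical and are not part of any DSX API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    """Hypothetical model of a project; not a DSX API."""
    name: str
    compute_engine: str   # associated when the project is created
    storage: str          # associated when the project is created
    collaborators: List[str] = field(default_factory=list)
    data_assets: List[str] = field(default_factory=list)
    analytic_assets: List[str] = field(default_factory=list)
    bookmarks: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)


# Illustrative values only.
project = Project(name="churn-analysis", compute_engine="spark", storage="object-storage")
project.collaborators.append("analyst@example.com")
project.data_assets.append("customers.csv")
project.analytic_assets.append("churn-notebook")
```

The compute engine and storage are required at creation time in this sketch, while everything else starts empty and is added afterward, mirroring the order described in the text.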

Here's what a project looks like:

An example project page in Data Science Experience.

Community

The Community contains resources to help you learn more about data science:

  • Read articles from many sources to keep current with data science trends.
  • Read tutorials for multiple skill levels to learn how to do specific data science tasks.
  • Run sample notebooks to learn new techniques or to use as templates for your own notebooks.
  • Analyze data sets in sample notebooks or in your own notebooks.

Here's what the Community looks like:

The community includes notebooks, articles, tutorials, and data sets.

Watch this video to see a tour of the Community section.

Figure 1. Community Tour
This video provides a tour of the Community section in DSX.

Data assets

You can analyze data from the following sources:

  • You can load files from your local computer to use in notebooks and models.
  • You can create connections to these kinds of data sources:
    • Cloud data services, such as Db2 Warehouse on Cloud, Amazon S3, Cloudant, and PostgreSQL
    • On-premises databases, such as Db2, Netezza, Oracle, and SQL Server
    • Streaming data services, such as IBM Message Hub and IBM Streaming Analytics
  • You can use DSX Community data sets.
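
As an illustration of the first option, once a file is available to a notebook it can be read with standard Python tools. This is a sketch under stated assumptions: the sample CSV contents and the `read_rows` helper are made up for the example, and an in-memory buffer stands in for the uploaded file. How DSX exposes an uploaded file to a notebook is not shown here.

```python
import csv
import io


def read_rows(file_obj):
    """Parse CSV text from a file-like object into a list of dicts."""
    return list(csv.DictReader(file_obj))


# Stand-in for a file loaded from your local computer (hypothetical data).
sample = io.StringIO("city,temp_c\nAustin,31\nOslo,12\n")
rows = read_rows(sample)

# Compute a simple summary over one column.
average_temp = sum(float(r["temp_c"]) for r in rows) / len(rows)
print(rows[0]["city"], average_temp)
```

Because `read_rows` accepts any file-like object, the same helper works with a file opened from disk in place of the in-memory buffer.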

Collaborators

Depending on how your DSX account is set up, you can add the following types of people as collaborators to your project and control their permissions:

  • People who already have DSX accounts, as well as people who don't have DSX accounts yet.
  • People who belong to your DSX enterprise account.
  • People in your company, if your company set up SAML federation in IBM Cloud.

Tools

You can analyze data with these tools:

  • Write Jupyter notebooks in Python, Scala, or R.
  • Use the Flow Editor to create models that use machine learning for predictive analytics.
  • Use RStudio within DSX as an alternative to running R notebooks.
  • Use Streams Designer to design stream flows to collect and analyze large amounts of streaming data.

Here's what a notebook looks like:

When you open a notebook, the action bar includes the previously listed features.

Libraries and APIs

You can use these libraries and APIs within your analytic assets to analyze data and display your results:

  • Visualization libraries and tools help you tell a story with your data. DSX includes open source libraries like Brunel and PixieDust, as well as the SPSS Model visualization tool.
  • Machine learning algorithms and tools provide predictive analytics. DSX includes SPSS machine learning algorithms and open source machine learning and deep learning APIs.
  • Open source libraries and packages provide computation, analytics, and visualization methods. DSX includes some popular open source libraries, such as PySpark, matplotlib, and SparkML.
  • You can install other open source libraries that you need.
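
Before installing an additional library, it can help to check whether it is already available in the notebook's Python environment. The following is a minimal sketch that uses only the Python standard library. The `missing_packages` helper is hypothetical, and the package names are examples, not a statement of what a given DSX runtime contains.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# "json" ships with Python; the second name is deliberately nonexistent.
to_install = missing_packages(["json", "no_such_package_xyz"])
print(to_install)
```

The helper expects top-level module names; any name it reports is a candidate for installation with the package manager for your notebook language.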

Runtime environment

You use a Spark runtime compute engine to run your notebooks and models:

  • IBM Apache Spark is the default service that you configure when you create a project.
  • If you have an Enterprise account, you can add Amazon EMR to a project.
  • If you need a Spark cluster, you can add IBM Analytics Engine to a project.

Platform integrations

You can use IBM Cloud services within your analytic assets:

  • Watson Machine Learning provides a Flow Editor within DSX to create models and IBM Cloud integration to deploy models.
  • Decision Optimization provides an optimization engine for running prescriptive analytics APIs.
  • Message Hub ingests Kafka topics.
  • Streaming Analytics ingests, analyzes, monitors, and correlates data from real-time data sources.