IBM Data Science Experience overview

Data Science Experience (DSX) provides you with the environment and tools to solve your business problems by collaboratively analyzing data. This illustration shows how the architecture of DSX is centered around the project. A project is how you organize your resources for solving a business problem.

Shows DSX architecture, as described in the text.

Projects

When you create a project, it is associated with an analytic engine and storage. Then, you add collaborators, data assets, and analytic assets to your project. You can also add bookmarks to important resources and associate other services with your project. Here's what a project looks like:

An example project page in Data Science Experience. The Overview page includes a list of notebooks and data assets in the project.

Community

The Community contains resources to help you learn more about data science:

  • Read articles from many sources to keep current with data science trends.
  • Read tutorials for multiple skill levels to learn how to do specific data science tasks.
  • Run sample notebooks to learn new techniques or to use as templates for your own notebooks.
  • Analyze data sets in sample notebooks or in your own notebooks.

Here's what the community looks like:

The community includes notebooks, articles, tutorials, and data sets.

Watch this video to see a tour of the Community section.

Data assets

You can analyze data from the following sources:

  • You can load files from your local computer to use in notebooks and models.
  • You can create connections to these kinds of data sources:
    • Cloud data services, such as Db2 Warehouse on Cloud, Amazon S3, Cloudant, and PostgreSQL
    • On-premises databases, such as Db2, Netezza, Oracle, and Microsoft SQL Server
    • Streaming data services, such as IBM Message Hub and IBM Streaming Analytics
  • You can use DSX Community data sets.
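As a minimal sketch of the first option, a file that has been loaded into a project can be read in a Python notebook with standard library tools. The sample data and the `summarize_csv` helper are hypothetical, and an in-memory string stands in for the uploaded file. How DSX exposes an uploaded file to a notebook is not covered here.

```python
import csv
import io

# Hypothetical sample data standing in for a file loaded from a local computer.
SAMPLE_CSV = """city,temperature
Austin,31.5
Boston,22.0
Chicago,24.5
"""


def summarize_csv(text_stream, column):
    """Return (row_count, mean) for a numeric column of a CSV text stream."""
    reader = csv.DictReader(text_stream)
    values = [float(row[column]) for row in reader]
    if not values:
        raise ValueError("no rows found")
    return len(values), sum(values) / len(values)


# io.StringIO makes the string behave like an open text file.
count, mean = summarize_csv(io.StringIO(SAMPLE_CSV), "temperature")
print(f"{count} rows, mean temperature {mean:.1f}")
```

In a real notebook, the `io.StringIO` object would be replaced by whatever file handle or stream the project provides for the data asset.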

Collaborators

Depending on how your DSX account is set up, you can add the following types of people as collaborators to your project and control their permissions:

  • People who already have DSX accounts, as well as people who don't have DSX accounts yet.
  • People who belong to your DSX enterprise account.

Tools

You can analyze data with these tools:

  • You can write Jupyter notebooks in Python, Scala, or R.
  • The Flow Editor guides you through creating models that use machine learning for predictive analytics.
  • You can use RStudio within DSX as an alternative to running R notebooks.

Here's what a notebook looks like:

When you open a notebook, the action bar includes the previously listed features.

Libraries and APIs

You can use these libraries and APIs within your analytic assets to analyze data and display your results:

  • Visualization libraries and tools help you tell a story with your data. DSX includes open source libraries like Brunel and PixieDust, as well as the SPSS Model visualization tool.
  • Machine learning algorithms and tools provide predictive analytics. DSX includes SPSS machine learning algorithms and open source machine learning and deep learning APIs.
  • Open source libraries and packages provide computation, analytics, and visualization methods. DSX includes some popular open source libraries, such as PySpark, matplotlib, and SparkML.
  • You can install other open source libraries that you need.
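Before installing an extra library from a Python notebook, it can help to check whether it is already available. The following is a minimal sketch: the helper names `is_installed` and `pip_install_command` are hypothetical, and the use of pip with the `--user` flag is an assumption about the environment, not a documented DSX procedure. The command is only built, not run.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package_name):
    """Build (but do not run) a pip command for the current interpreter.

    Assumption: pip and per-user installs are permitted in the environment.
    """
    return [sys.executable, "-m", "pip", "install", "--user", package_name]


for name in ["json", "matplotlib"]:
    if is_installed(name):
        print(f"{name}: available")
    else:
        command = " ".join(pip_install_command(name))
        print(f"{name}: missing, could be installed with: {command}")
```

Using `sys.executable` ties the install to the interpreter that runs the notebook kernel, which avoids installing into a different Python environment by accident.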

Runtime engine

You use a Spark runtime engine to run your notebooks and models:

  • IBM Apache Spark is the default service that is configured when you sign up for DSX.
  • If you have an Enterprise account, you can add Amazon EMR to a project.

Platform integrations

You can use IBM Bluemix services within your analytic assets:

  • Watson Machine Learning provides a Flow Editor within DSX to create models and Bluemix integration to deploy models.
  • Decision Optimization provides an optimization engine for running prescriptive analytic APIs.
  • Message Hub ingests Kafka topics.
  • Streaming Analytics ingests, analyzes, monitors, and correlates data from real-time data sources.