IBM Data Science Experience overview
Data Science Experience (DSX) provides you with the environment and tools to solve your business problems by collaboratively analyzing data. The architecture of DSX is centered around the project, which is how you organize your resources for solving a business problem.
When you create a project for analyzing data, you associate it with a compute engine and storage. Then, you add collaborators, data assets, and analytic assets to your project. You can also add bookmarks to important resources and associate other services with your project.
The Community contains resources to help you learn more about data science:
- Read articles from many sources to keep current with data science trends.
- Read tutorials for multiple skill levels to learn how to do specific data science tasks.
- Run sample notebooks to learn new techniques or to use as templates for your own notebooks.
- Analyze data sets in sample notebooks or in your own notebooks.
You can analyze data from the following sources:
- You can load files from your local computer to use in notebooks and models.
- You can create connections to these kinds of data sources:
  - Cloud data services, such as Db2 Warehouse on Cloud, Amazon S3, Cloudant, and PostgreSQL
  - On-premises databases, such as Db2, Netezza, Oracle, and SQL Server
  - Streaming data services, such as IBM MessageHub and IBM Streaming Analytics
- You can use DSX Community data sets.
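As a minimal sketch of working with a file loaded from your computer in a Python notebook, the code below parses CSV data with the standard library and computes a simple summary. The column names (`region`, `sales`) and values are hypothetical, and the data is inlined through `io.StringIO` so the example runs anywhere; how DSX makes an uploaded file available to a notebook is not shown here and is left as an assumption. For a real file, you would pass an open file object to `csv.DictReader` instead.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a CSV file loaded from your computer.
SAMPLE_CSV = """region,sales
east,120
west,95
east,80
north,110
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def mean_sales(rows):
    """Average the numeric 'sales' column (values arrive as strings)."""
    return statistics.mean(float(row["sales"]) for row in rows)

rows = load_rows(SAMPLE_CSV)
print(len(rows), mean_sales(rows))  # prints "4 101.25"
```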
Depending on how your DSX account is set up, you can add the following types of people as collaborators to your project and control their permissions:
- People who already have DSX accounts, as well as people who don't have DSX accounts yet.
- People who belong to your DSX enterprise account.
- People in your company, if your company set up SAML federation in IBM Cloud.
You can analyze data with these tools:
- Write Jupyter notebooks in Python, Scala, or R.
- Use the Flow Editor to create models that use machine learning for predictive analytics.
- Use RStudio within DSX as an alternative to running R notebooks.
- Use Streams Designer to design stream flows to collect and analyze large amounts of streaming data.
Libraries and APIs
You can use these libraries and APIs within your analytic assets to analyze data and display your results:
- Visualization libraries and tools help you tell a story with your data. DSX includes open source libraries like Brunel and PixieDust, as well as the SPSS Model visualization tool.
- Machine learning algorithms and tools provide predictive analytics. DSX includes SPSS machine learning algorithms and open source machine learning and deep learning APIs.
- Open source libraries and packages provide computation, analytics, and visualization methods. DSX includes some popular open source libraries, such as PySpark, matplotlib, and SparkML.
- You can install other open source libraries that you need.
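Before installing another library, it can help to check which ones are already available in the notebook's environment. The sketch below does that with `importlib.util.find_spec` from the Python standard library; the package names in `wanted` are hypothetical, and the helper only handles top-level module names (a dotted name whose parent is missing would raise `ModuleNotFoundError`).

```python
import importlib.util

def missing_packages(names):
    """Return the top-level module names in `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Hypothetical list of libraries a notebook depends on.
wanted = ["json", "csv", "some_package_that_is_not_installed"]
print(missing_packages(wanted))  # prints "['some_package_that_is_not_installed']"
```

In a Python notebook cell, any reported names could then be installed with a shell escape such as `!pip install --user <package>`; whether `--user` is needed depends on how your environment is configured.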
You use a Spark runtime compute engine to run your notebooks and models:
- IBM Apache Spark is the default service that you configure when you create a project.
- If you have an Enterprise account, you can add Amazon EMR to a project.
- If you need a Spark cluster, you can add IBM Analytics Engine to a project.
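A Spark engine runs work by splitting it into map and reduce steps that execute in parallel across the engine's workers. As a rough single-process stand-in (plain Python, not Spark), the code below performs a word count with the same map and reduce-by-key shape. In a PySpark notebook the equivalent would be roughly `sc.parallelize(lines).flatMap(lambda l: l.lower().split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is assumed to be a preconfigured `SparkContext`.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit a (word, 1) pair for every word, like Spark's flatMap followed by map."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_by_key(pairs):
    """Sum the values for each key, like Spark's reduceByKey(lambda a, b: a + b)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["Spark runs notebooks", "Spark runs models"]
counts = reduce_by_key(map_phase(lines))
print(counts)  # prints "{'spark': 2, 'runs': 2, 'notebooks': 1, 'models': 1}"
```

Spark's advantage over this loop is that each partition of `lines` can be mapped on a different worker, with only the per-key partial sums shuffled for the reduce step.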
You can use IBM Cloud services within your analytic assets:
- Watson Machine Learning provides the Flow Editor within DSX for creating models, and integrates with IBM Cloud for deploying them.
- Decision Optimization provides an optimization engine that you can call through APIs to run prescriptive analytics.
- MessageHub provides Apache Kafka-based messaging for ingesting data from Kafka topics.
- Streaming Analytics ingests, analyzes, monitors, and correlates data from real-time data sources.