Welcome to PixieDust

This notebook features an introduction to PixieDust, the Python library that makes data visualization easy.

This notebook runs on Python 2.7 and 3.5, with Spark 2.0.

Table of Contents


Get started

This introduction is pretty straightforward, but it wouldn't hurt to load up the PixieDust documentation so it's handy.

New to notebooks? Don't worry. Here's all you need to know to run this introduction:

  1. Make sure this notebook is in Edit mode.
  2. To run code cells, put your cursor in the cell and press Shift + Enter.
  3. The cell number will change to [*] to indicate that it is currently executing. (When starting with notebooks, it's best to run cells in order, one at a time.)
In [1]:
# To confirm you have the latest version of PixieDust on your system, run this cell
!pip install --user --upgrade pixiedust
Requirement already up-to-date: pixiedust in /gpfs/global_fs01/sym_shared/YPProdSpark/user/s2b7-790f27d2e466b6-772f4e1cd93d/.local/lib/python2.7/site-packages
Requirement already up-to-date: mpld3 in /gpfs/global_fs01/sym_shared/YPProdSpark/user/s2b7-790f27d2e466b6-772f4e1cd93d/.local/lib/python2.7/site-packages (from pixiedust)
Requirement already up-to-date: lxml in /gpfs/global_fs01/sym_shared/YPProdSpark/user/s2b7-790f27d2e466b6-772f4e1cd93d/.local/lib/python2.7/site-packages (from pixiedust)
Requirement already up-to-date: geojson in /gpfs/global_fs01/sym_shared/YPProdSpark/user/s2b7-790f27d2e466b6-772f4e1cd93d/.local/lib/python2.7/site-packages (from pixiedust)

Now that you have PixieDust installed and up-to-date on your system, you need to import it into this notebook. This is the last dependency before you can play with PixieDust.

In [2]:
import pixiedust
Pixiedust database opened successfully
Pixiedust version 1.0.9

If you get a message telling you that you're not running the latest version of PixieDust, restart the kernel from the Kernel menu and rerun the import pixiedust command. (Any time you restart the kernel, rerun the import pixiedust command.)

Behold, display()

In the next cell, build a simple dataset and store it in a variable.

In [3]:
# Build the SQL context; sc (the SparkContext) is typically predefined in Spark notebooks
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
# Create the Spark dataframe from the SparkSession (spark), passing in some data, and assign it to a variable
df = spark.createDataFrame(
[("Green", 75),
 ("Blue", 25)],
["Colors","%"])

The data in the variable df is ready to be visualized, without any further code other than the call to display().

In [4]:
# display the dataframe above as a pie chart
display(df)
Hey, there's something awesome here! To see it, open this notebook outside GitHub, in a viewer like Jupyter
Colors in this pie chart, by %

After running the cell above, you should see a Spark DataFrame displayed as a pie chart, along with some controls to tweak the display. All that came from passing the DataFrame variable to display().

In the next cell, you'll pass more interesting data to display(), which will also offer more advanced controls.

In [5]:
# create another DataFrame, in a new variable
df2 = spark.createDataFrame(
[(2010, 'Camping Equipment', 3),
 (2010, 'Golf Equipment', 1),
 (2010, 'Mountaineering Equipment', 1),
 (2010, 'Outdoor Protection', 2),
 (2010, 'Personal Accessories', 2),
 (2011, 'Camping Equipment', 4),
 (2011, 'Golf Equipment', 5),
 (2011, 'Mountaineering Equipment',2),
 (2011, 'Outdoor Protection', 4),
 (2011, 'Personal Accessories', 2),
 (2012, 'Camping Equipment', 5),
 (2012, 'Golf Equipment', 5),
 (2012, 'Mountaineering Equipment', 3),
 (2012, 'Outdoor Protection', 5),
 (2012, 'Personal Accessories', 3),
 (2013, 'Camping Equipment', 8),
 (2013, 'Golf Equipment', 5),
 (2013, 'Mountaineering Equipment', 3),
 (2013, 'Outdoor Protection', 8),
 (2013, 'Personal Accessories', 4)],
["year","category","unique_customers"])

# This time, we've combined the dataframe and display() call in the same cell
# Run this cell 
display(df2)
Customers by Category clustered by Year

display() controls

Renderers

The chart above, like the first one, is rendered by matplotlib. With PixieDust, you have other options. To toggle between renderers, use the Renderers control at top right of the display output:

  1. Bokeh is interactive; play with the controls along the top of the chart, for example, to zoom or save.
  2. Matplotlib is static; you can save the image as a PNG.

Chart options

  1. Chart types: At top left, you should see an option to display the dataframe as a table. You should also see a dropdown menu with other chart options, including bar charts, pie charts, scatter plots, and so on.
  2. Options: Click the Options button to explore other display configurations; for example, clustering and aggregation.

Here's more on customizing display() output.

Load External Data

So far, you've worked with data hard-coded into the notebook. Now, load external data (CSV) from a URL.

In [6]:
# load a CSV with pixiedust.sampleData()
df3 = pixiedust.sampleData("https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv")
display(df3)
Distribution of MPG per Horsepower

You should see a scatterplot above, rendered again by matplotlib. Find the Renderer menu at top-right. You should see options for Bokeh and Seaborn. If you don't see Seaborn, it's not installed on your system. No problem, just install it by running the next cell.

In [7]:
# To install Seaborn, uncomment the next line, and then run this cell
#!pip install --user seaborn

If you installed Seaborn, you'll need to also restart your notebook kernel, and run the cell to import pixiedust again. Find Restart in the Kernel menu above.

End of chapter. Return to table of contents


Data files commonly reside in remote sources, such as public or private marketplaces or GitHub repositories. You can load comma-separated values (CSV) data files using PixieDust's sampleData method.

Prerequisites

If you haven't already, import PixieDust. Follow the instructions in Get started.

Load data

To load a data set, run pixiedust.sampleData and specify the data set URL:

In [8]:
homes = pixiedust.sampleData("https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv")
Downloading 'https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv' from https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv
Downloaded 102051 bytes
Creating pySpark DataFrame for 'https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv'. Please wait...
Loading file using 'SparkSession'
Successfully created pySpark DataFrame for 'https://openobjectstore.mybluemix.net/misc/milliondollarhomes.csv'

The pixiedust.sampleData method loads the data into an Apache Spark DataFrame, which you can inspect and visualize using display().
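To make the loading step less of a black box, here is a minimal plain-Python sketch of the general idea: parse CSV text, then guess a type for each column. It is an illustration only, not PixieDust's or Spark's implementation. The helper names infer_type and infer_schema are hypothetical, and the inline sample is a four-column excerpt of the home sales data rather than the downloaded file.

```python
import csv
import io

# A tiny inline stand-in for a downloaded CSV file (assumption: only four
# columns and three rows of the home sales data are kept, for brevity).
CSV_TEXT = """CITY,PRICE,BATHS,YEAR BUILT
Windham,2450000,7.5,2008
Wellesley,1909847,4.5,2016
Cambridge,1100000,2.0,1920
"""

def infer_type(values):
    """Guess 'integer', 'double', or 'string' for a column of raw strings."""
    for name, cast in (("integer", int), ("double", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

def infer_schema(text):
    """Return (column name, guessed type) pairs for CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = zip(*data)  # transpose rows into columns
    return [(name, infer_type(col)) for name, col in zip(header, columns)]

schema = infer_schema(CSV_TEXT)
for name, type_name in schema:
    print(name, type_name)
```

A real loader also has to deal with missing values, quoting, and much larger files, which this sketch ignores.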

Inspect and preview the loaded data

To inspect the automatically inferred schema and preview a small subset of the data, you can use the DataFrame Table view, as shown in this preconfigured example:

In [9]:
display(homes)
type: struct
field:
{'metadata': {}, 'type': 'string', 'name': 'PROPERTY TYPE', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'ADDRESS', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'CITY', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'STATE', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'ZIP', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'PRICE', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'BEDS', 'nullable': True}
{'metadata': {}, 'type': 'double', 'name': 'BATHS', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'LOCATION', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'SQFT', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'LOT SIZE', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'YEAR BUILT', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'DAYS ON MARKET', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'URL', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'SOURCE', 'nullable': True}
{'metadata': {}, 'type': 'integer', 'name': 'LISTING ID', 'nullable': True}
{'metadata': {}, 'type': 'double', 'name': 'LATITUDE', 'nullable': True}
{'metadata': {}, 'type': 'double', 'name': 'LONGITUDE', 'nullable': True}
Showing 100 of 500
PROPERTY TYPE | ADDRESS | CITY | STATE | ZIP | PRICE | BEDS | BATHS | LOCATION | SQFT | LOT SIZE | YEAR BUILT | DAYS ON MARKET | URL | SOURCE | LISTING ID | LATITUDE | LONGITUDE
Single Family Residential 4 Newbury Road Rd Windham NH 3087 2450000 5 7.5 Windham 13461 139392 2008 84 http://www.redfin.com/NH/Windham/4-Newbury-Rd-03087/home/96548208 NEREN 58467283 42.83153747 -71.27639808
Single Family Residential 25 Marshall Rd Wellesley MA 2482 1909847 5 4.5 Wellesley 4900 12228 2016 71 http://www.redfin.com/MA/Wellesley/25-Marshall-Rd-02482/home/105557102 MLS PIN 61782463 42.2997542 -71.3088256
Single Family Residential 15 E Meadow Ln Middleton MA 1949 1177500 None 2.5 None 4263 40281 2015 None http://www.redfin.com/MA/Middleton/15-E-Meadow-Ln-01949/home/67981805 None None 42.585715 -71.012888
Condo/Co-op 983 Memorial Dr #302 Cambridge MA 2138 1100000 3 2.0 Harvard Square 1606 None 1920 74 http://www.redfin.com/MA/Cambridge/983-Memorial-Dr-02138/unit-302/home/105594755 MLS PIN 61690710 42.3722656 -71.1252212
Condo/Co-op 1 Franklin St Ph 2E Boston MA 2110 8950000 3 4.5 Midtown 3435 None 2016 86 http://www.redfin.com/MA/Boston/1-Franklin-St-02108/unit-2E/home/102070369 MLS PIN 55818606 42.35631 -71.05945
Condo/Co-op 18 Yarmouth St #1 Boston MA 2116 2600000 3 3.5 South End 2522 None 1880 88 http://www.redfin.com/MA/Boston/18-Yarmouth-St-02116/unit-1/home/9313347 MLS PIN 59168291 42.3458731 -71.0767967
Single Family Residential 128 Lowell St Lexington MA 2420 1185000 5 3.5 Lexington 3275 6300 2016 88 http://www.redfin.com/MA/Lexington/128-Lowell-St-02420/home/8553025 MLS PIN 59375875 42.436932 -71.190511
Single Family Residential 20 Jackson Rd Wellesley MA 2481 2165000 4 4.5 Wellesley 5199 16321 2016 88 http://www.redfin.com/MA/Wellesley/20-Jackson-Rd-02481/home/8964864 MLS PIN 51221892 42.307657 -71.252257
Condo/Co-op 30 Winchester St #3 Brookline MA 2446 1400000 3 3.0 Coolidge Corner 1504 None 1915 66 http://www.redfin.com/MA/Brookline/30-Winchester-St-02446/unit-3/home/105251020 MLS PIN 58480309 42.3420632 -71.1257602
Condo/Co-op 30 Winchester St #4 Brookline MA 2446 1500000 3 3.0 Coolidge Corner 1584 None 1915 66 http://www.redfin.com/MA/Brookline/30-Winchester-St-02446/unit-4/home/105251022 MLS PIN 58480311 42.3420632 -71.1257602
Condo/Co-op 30 Winchester St #5 Brookline MA 2446 1600000 3 3.0 Coolidge Corner 1686 None 1915 66 http://www.redfin.com/MA/Brookline/30-Winchester-St-02446/unit-5/home/105251023 MLS PIN 58480312 42.3420632 -71.1257602
Condo/Co-op 576 Washington St #206 Wellesley MA 2482 2500000 2 2.5 Wellesley 2221 None 2015 59 http://www.redfin.com/MA/Wellesley/576-Washington-St-02482/unit-206/home/109083144 MLS PIN 62060226 42.295421 -71.292739
Single Family Residential 2 Wellington Way Bedford MA 1730 1150000 4 3.5 Wellington Way 3531 43560 2012 58 http://www.redfin.com/MA/Bedford/2-Wellington-Way-01730/home/41363649 MLS PIN 59806880 42.5029123 -71.2849657
Condo/Co-op 267 Humphrey St #1 Swampscott MA 1907 1700000 3 2.5 Swampscott 2140 None 2016 58 http://www.redfin.com/MA/Swampscott/267-Humphrey-St-01907/unit-1/home/105944789 MLS PIN 61004497 42.466738 -70.913681
Condo/Co-op 1 Franklin St #3602 Boston MA 2110 2825000 2 2.5 Midtown 1486 None 2016 59 http://www.redfin.com/MA/Boston/1-Franklin-St-02108/unit-3602/home/108795642 MLS PIN 61519341 42.35631 -71.05945
Condo/Co-op 33 Monument Sq #1 Boston MA 2129 1250000 2 3.0 Charlestown 1923 None 1896 59 http://www.redfin.com/MA/Boston/33-Monument-Sq-02129/unit-1/home/105149354 MLS PIN 58437300 42.3764066 -71.0618387
Townhouse 343 Commercial St Unit TH20 Boston MA 2109 3100000 2 2.5 Waterfront 2290 2290 1978 58 http://www.redfin.com/MA/Boston/343-Commercial-St-02109/unit-TH20/home/105711342 MLS PIN 59226234 42.3658992 -71.0511488
Condo/Co-op 46 Shepard St #24 Cambridge MA 2138 1120000 3 2.0 Harvard Square 1524 None 1900 58 http://www.redfin.com/MA/Cambridge/46-Shepard-St-02138/unit-24/home/11585901 MLS PIN 61655663 42.3807083 -71.123601
Condo/Co-op 1 Franklin St #1808 Boston MA 2110 2038888 2 2.0 Midtown 1566 None 2016 58 http://www.redfin.com/MA/Boston/1-Franklin-St-02108/unit-1808/home/109247975 MLS PIN 62369735 42.35631 -71.05945
Condo/Co-op 117 Beacon St Unit A Boston MA 2116 4750000 4 4.5 Back Bay 3052 None 1864 58 http://www.redfin.com/MA/Boston/117-Beacon-St-02116/unit-A/home/108973963 MLS PIN 61672595 42.354921 -71.073407
Condo/Co-op 1 Franklin St #1008 Boston MA 2110 2049000 2 2.0 Midtown 1476 None 2016 59 http://www.redfin.com/MA/Boston/1-Franklin-St-02108/unit-1008/home/109481369 MLS PIN 62725868 42.35631 -71.05945
Condo/Co-op 341 Marlborough St #4 Boston MA 2116 3005000 3 3.0 Back Bay 1618 None 1900 67 http://www.redfin.com/MA/Boston/341-Marlborough-St-02115/unit-4/home/56787595 MLS PIN 61970237 42.351212 -71.085754
Condo/Co-op 485-495 Harrison Ave #401 Boston MA 2118 1390000 2 2.0 South End 1409 None 1914 67 http://www.redfin.com/MA/Boston/495-Harrison-Ave-02118/unit-401/home/108978383 MLS PIN 61691728 42.3417737 -71.0667754
Single Family Residential 415 Concord Rd Weston MA 2493 1725000 4 3.5 Weston 3185 217687 1974 59 http://www.redfin.com/MA/Weston/415-Concord-Rd-02493/home/8779249 MLS PIN 47855399 42.386891 -71.320689
Single Family Residential 18 Lorena Rd Winchester MA 1890 2375000 6 7.0 Winchester 5812 11988 2016 67 http://www.redfin.com/MA/Winchester/18-Lorena-Rd-01890/home/103985838 MLS PIN 56019944 42.4458374 -71.1285051
Single Family Residential 1 Wilshire Rd Newbury MA 1951 2225000 4 5.5 Wilshire Road 4214 18138 2014 58 http://www.redfin.com/MA/Newbury/1-Wilshire-Rd-01951/home/105539600 MLS PIN 59011440 42.7796754 -70.8476708
Single Family Residential 25 Lorena Rd Winchester MA 1890 2075000 5 4.5 Winchester 5600 16820 2016 66 http://www.redfin.com/MA/Winchester/25-Lorena-Rd-01890/home/108267308 MLS PIN 60054516 42.4456575 -71.1279728
Single Family Residential Lot 1 Monsen Rd Concord MA 1742 1454900 4 4.5 Monsen Farm 4555 21509 2015 58 http://www.redfin.com/MA/Concord/1-Monsen-Rd-01742/home/102153489 MLS PIN 51940855 42.4693342 -71.3266466
Townhouse 170 Harvard St Unit 1 Newton MA 2460 1100000 4 3.5 Newtonville 2388 10089 1910 66 http://www.redfin.com/MA/Newton/170-Harvard-St-02460/unit-1/home/109313528 MLS PIN 62550577 42.3468986 -71.2005455
Condo/Co-op Zero Worcester Sq Ph 2 Boston MA 2118 1665000 2 2.5 South End 1515 None 2013 58 http://www.redfin.com/MA/Boston/0-Worcester-Sq-02118/unit-2/home/109286549 MLS PIN 62470232 42.3372471 -71.0753519
Single Family Residential 1 Jerusalem Ln Cohasset MA 2025 1437000 4 3.5 Jerusalem Road/Atlantic Avenue/Jerusalem Lane Cul De Sac 2724 9443 2000 66 http://www.redfin.com/MA/Cohasset/1-Jerusalem-Ln-02025/home/8835487 MLS PIN 60777396 42.259862 -70.811424
Condo/Co-op 183 Massachusetts Ave #803 Boston MA 2115 1082500 2 2.0 Back Bay 1220 None 2002 74 http://www.redfin.com/MA/Boston/183-Massachusetts-Ave-02115/unit-803/home/12402725 MLS PIN 62071287 42.3455392 -71.0871356
Townhouse 31 Day St Unit B Somerville MA 2144 1351000 3 3.0 Davis Square 1874 None 2012 88 http://www.redfin.com/MA/Somerville/31-Day-St-02144/unit-B/home/40314152 MLS PIN 61798074 42.3951863 -71.1242988
Condo/Co-op 83 Newton St Somerville MA 2143 1240000 8 4.0 None 4034 None 1900 None http://www.redfin.com/MA/Somerville/83-Newton-St-02143/home/8730062 None None 42.3771375 -71.0976651
Single Family Residential 34 Crestwood Rd Marblehead MA 1945 2997000 1 5.0 None 8509 34400 2012 None http://www.redfin.com/MA/Marblehead/34-Crestwood-Rd-01945/home/11768413 None None 42.501777 -70.877002
Single Family Residential 78 Bonad Rd Brookline MA 2467 1070000 3 2.5 South Brookline 1806 4982 2016 71 http://www.redfin.com/MA/Brookline/78-Bonad-Rd-02467/home/109052310 MLS PIN 61948749 42.3178198 -71.1626756
Single Family Residential 217 Forest St Winchester MA 1890 1155000 4 3.5 Muraco School District 3779 9234 2013 72 http://www.redfin.com/MA/Winchester/217-Forest-St-01890/home/110057079 MLS PIN 56198065 42.4714541 -71.1156268
Single Family Residential 31 Mussell Point Way Gloucester MA 1930 2075000 3 3.5 None 2878 39400 2001 None http://www.redfin.com/MA/Gloucester/31-Mussell-Point-Way-01930/home/11301633 None None 42.586906 -70.687225
Condo/Co-op 25 Piedmont St #2 Boston MA 2116 1660000 2 2.0 South End 1328 None 2015 70 http://www.redfin.com/MA/Boston/25-Piedmont-St-02116/unit-2/home/109296598 MLS PIN 62502554 42.3498964 -71.0684372
Single Family Residential 1 Denny St Westborough MA 1581 1100000 6 3.5 Westborough 3394 239580 1917 79 http://www.redfin.com/MA/Westborough/1-Denny-St-01581/home/16634032 MLS PIN 60054100 42.259306 -71.611866
Single Family Residential 192 Claybrook Rd Dover MA 2030 3780000 5 7.5 Dover 11500 110016 2000 81 http://www.redfin.com/MA/Dover/192-Claybrook-Rd-02030/home/11706025 MLS PIN 57585881 42.2640769 -71.3030216
Condo/Co-op 36 Mount Vernon St #36 Cambridge MA 2140 1460000 4 2.0 Cambridge 2006 None 1873 78 http://www.redfin.com/MA/Cambridge/36-Mount-Vernon-St-02140/unit-36/home/109048192 MLS PIN 61935510 42.3869743 -71.1206802
Condo/Co-op 67 Harvard Ave #67 Brookline MA 2446 1789000 3 3.5 None 3584 None 1909 None http://www.redfin.com/MA/Brookline/67-Harvard-Ave-02446/unit-67/home/26768959 None None 42.3370986 -71.1238246
Townhouse 7 Marlborough St #1 Boston MA 2116 2570000 3 3.5 Back Bay 2179 2179 1900 88 http://www.redfin.com/MA/Boston/7-Marlborough-St-02116/unit-1/home/9299190 MLS PIN 59800618 42.354279 -71.072681
Condo/Co-op 78 Waltham St #3 Boston MA 2118 1135000 2 1.0 None 850 None 1900 None http://www.redfin.com/MA/Boston/78-Waltham-St-02118/unit-3/home/9310881 None None 42.343133 -71.0706771
Condo/Co-op 32 Union Park #4 Boston MA 2118 1235000 2 2.0 None 1028 None 1900 None http://www.redfin.com/MA/Boston/32-Union-Park-02118/unit-4/home/9282029 None None 42.3429262 -71.0717198
Condo/Co-op 109 Chandler St #1 Boston MA 2116 1925000 3 3.5 None 1803 None 1900 None http://www.redfin.com/MA/Boston/109-Chandler-St-02116/unit-1/home/9322251 None None 42.3464393 -71.0736695
Condo/Co-op 141 Dorchester Ave #119 Boston MA 2127 1150000 2 2.0 South Boston 1701 1701 2006 78 http://www.redfin.com/MA/Boston/141-Dorchester-Ave-02127/unit-119/home/18982472 MLS PIN 59722577 42.3419516 -71.0574969
Townhouse 113 Commonwealth Ave #4 Boston MA 2116 2330000 3 2.5 Back Bay 1884 1884 1930 78 http://www.redfin.com/MA/Boston/113-Commonwealth-Ave-02116/unit-4/home/9250390 MLS PIN 61017855 42.3524353 -71.0770045
Single Family Residential 75 Brookline St Cambridge MA 2139 1399900 3 2.5 Cambridgeport 1810 3016 1854 79 http://www.redfin.com/MA/Cambridge/75-Brookline-St-02139/home/11558783 MLS PIN 59593814 42.3620583 -71.1033835
Single Family Residential 23 Laurel Hill Ln Winchester MA 1890 1475000 5 4.0 Winchester 4037 12475 2014 80 http://www.redfin.com/MA/Winchester/23-Laurel-Hill-Ln-01890/home/11439968 MLS PIN 58047231 42.4724111 -71.1196572
Single Family Residential 14 Hillside Ave Cambridge MA 2140 5350000 3 2.5 None 4557 15526 1905 None http://www.redfin.com/MA/Cambridge/14-Hillside-Ave-02140/home/110093849 None None 42.3853181 -71.1238296
Single Family Residential 22 Garfield Rd Belmont MA 2478 1560000 5 3.5 Belmont 3137 12614 1936 80 http://www.redfin.com/MA/Belmont/22-Garfield-Rd-02478/home/8452209 MLS PIN 61669441 42.4033462 -71.1745363
Single Family Residential 721 Pleasant St Belmont MA 2478 1480000 7 5.0 Belmont Center 6000 43900 1837 78 http://www.redfin.com/MA/Belmont/721-Pleasant-St-02478/home/11774971 MLS PIN 61594748 42.39572 -71.179743
Single Family Residential 129 I St Boston MA 2127 1290000 4 2.5 South Boston 2385 None 1890 77 http://www.redfin.com/MA/Boston/129-I-St-02127/home/9187855 MLS PIN 59630321 42.3337549 -71.040101
Single Family Residential 20 Authors Rd Concord MA 1742 1935000 4 4.0 Concord 5010 25102 2016 80 http://www.redfin.com/MA/Concord/20-Authors-Rd-01742/home/11594499 MLS PIN 57898084 42.461323 -71.338927
Single Family Residential 297 Heaths Bridge Rd Concord MA 1742 1400000 3 3.0 Conantum 5135 61344 1995 78 http://www.redfin.com/MA/Concord/297-Heaths-Bridge-Rd-01742/home/11600408 MLS PIN 61928887 42.434909 -71.362825
Single Family Residential 73 Monument St Concord MA 1742 2250000 5 4.0 Concord 3799 40075 1880 78 http://www.redfin.com/MA/Concord/73-Monument-St-01742/home/11590552 MLS PIN 57869520 42.463184 -71.348692
Single Family Residential 162 Highland St Weston MA 2493 4800000 5 5.0 South side Estate Area 6741 129260 2001 80 http://www.redfin.com/MA/Weston/162-Highland-St-02493/home/8775137 MLS PIN 58298305 42.355043 -71.319191
Single Family Residential 12 Thoreau Rd Lexington MA 2420 1403000 4 2.0 Burnham Farms Estates 2263 38712 1959 81 http://www.redfin.com/MA/Lexington/12-Thoreau-Rd-02420/home/8577537 MLS PIN 61594796 42.463883 -71.211304
Single Family Residential 229 Manning St Needham MA 2492 1660706 4 1.5 None 1678 10454 1949 None http://www.redfin.com/MA/Needham/229-Manning-St-02492/home/8914592 None None 42.2863162 -71.2283038
Single Family Residential 252 Manning St Needham MA 2492 1649000 5 5.5 Needham 6163 9583 2016 78 http://www.redfin.com/MA/Needham/252-Manning-St-02492/home/8914358 MLS PIN 61580545 42.285505 -71.227655
Single Family Residential 228 Upham St Melrose MA 2176 1500000 7 4.5 East Side 6639 23418 1908 79 http://www.redfin.com/MA/Melrose/228-Upham-St-02176/home/11783107 MLS PIN 61488166 42.458383 -71.054305
Single Family Residential 26 Coolidge Hill Rd Cambridge MA 2138 1225000 3 2.5 None 2682 4615 1925 None http://www.redfin.com/MA/Cambridge/26-Coolidge-Hill-Rd-02138/home/11594928 None None 42.374227 -71.1395469
Condo/Co-op 221 Mount Auburn St #302 Cambridge MA 2138 1110000 1 1.0 Harvard Square 988 None 1960 79 http://www.redfin.com/MA/Cambridge/221-Mount-Auburn-St-02138/unit-302/home/11587449 MLS PIN 59440749 42.3749516 -71.129983
Single Family Residential 197 Countryside Rd Newton MA 2459 1415000 4 2.5 Newton Center 2764 25878 1979 80 http://www.redfin.com/MA/Newton/197-Countryside-Rd-02459/home/11490231 MLS PIN 61707285 42.304059 -71.2010034
Single Family Residential 321 Central St Newton MA 2466 1030000 4 2.5 Newton 2724 30033 1874 81 http://www.redfin.com/MA/Newton/321-Central-St-02466/home/11440793 MLS PIN 59477712 42.3428608 -71.2532636
Single Family Residential 13 Utica Rd Needham MA 2494 1131250 4 2.5 Needham 3923 9000 2005 78 http://www.redfin.com/MA/Needham/13-Utica-Rd-02494/home/11714925 MLS PIN 61553852 42.2994493 -71.2294273
Single Family Residential 6 Sawyer Rd Wellesley MA 2481 2745000 5 8.0 Cliff Estates 7000 20000 2014 81 http://www.redfin.com/MA/Wellesley/6-Sawyer-Rd-02481/home/8976461 MLS PIN 59710146 42.311576 -71.280358
Single Family Residential 22 Leighton Rd Wellesley MA 2482 1530000 5 3.5 Dana Hall 3574 10000 1916 78 http://www.redfin.com/MA/Wellesley/22-Leighton-Rd-02482/home/8986596 MLS PIN 59747229 42.290604 -71.295376
Single Family Residential 123 Abbott Rd Wellesley MA 2481 2184000 6 5.5 Country Club 6119 34050 1905 78 http://www.redfin.com/MA/Wellesley/123-Abbott-Rd-02481/home/11724812 MLS PIN 56084352 42.303269 -71.267922
Single Family Residential 46 White Oak Rd Wellesley MA 2481 1630000 5 3.5 Wellesley 2777 27440 1946 78 http://www.redfin.com/MA/Wellesley/46-White-Oak-Rd-02481/home/8978449 MLS PIN 60874162 42.323326 -71.286699
Single Family Residential 94 Albion Rd Wellesley MA 2481 1457500 4 3.5 Cliff Estates 2918 26021 1978 81 http://www.redfin.com/MA/Wellesley/94-Albion-Rd-02481/home/8984365 MLS PIN 58888275 42.315699 -71.291925
Single Family Residential 79 Overlook Dr Carlisle MA 1741 1080000 4 3.5 Carlisle 3800 182516 1997 87 http://www.redfin.com/MA/Carlisle/79-Overlook-Dr-01741/home/8467988 MLS PIN 60090312 42.544212 -71.330903
Single Family Residential 3 Grey Ln Lynnfield MA 1940 1050000 5 3.5 King James Grant 4500 40140 1960 87 http://www.redfin.com/MA/Lynnfield/3-Grey-Ln-01940/home/11330976 MLS PIN 61233631 42.542049 -71.031766
Single Family Residential 6 Bennett Rd Wayland MA 1778 1100000 4 3.0 Wayland 3768 35840 1893 87 http://www.redfin.com/MA/Wayland/6-Bennett-Rd-01778/home/11685599 MLS PIN 59097036 42.36189 -71.353183
Condo/Co-op 10 Bowdoin St #417 Boston MA 2114 1050000 2 2.0 Beacon Hill 1139 1306 2003 87 http://www.redfin.com/MA/Boston/10-Bowdoin-St-02114/unit-417/home/11837836 MLS PIN 61320832 42.3607583 -71.062671
Townhouse 9 Joy St #3 Boston MA 2114 1175000 2 1.0 None 1250 1250 1890 None http://www.redfin.com/MA/Boston/9-Joy-St-02114/unit-3/home/9236297 None None 42.3587067 -71.0649545
Condo/Co-op 1 Garden St Unit 12 Boston MA 2114 1592500 3 2.0 None 1435 None 1899 None http://www.redfin.com/MA/Boston/1-Garden-St-02114/unit-12/home/9329550 None None 42.3611202 -71.066975
Single Family Residential 210 Meadowbrook Rd Weston MA 2493 3938000 4 5.0 Weston 6300 60000 1961 87 http://www.redfin.com/MA/Weston/210-Meadowbrook-Rd-02493/home/8785940 MLS PIN 59777195 42.358451 -71.282983
Single Family Residential 8 Hidden Rd Weston MA 2493 1700000 5 4.0 Country Club 4160 71326 1926 85 http://www.redfin.com/MA/Weston/8-Hidden-Rd-02493/home/8784603 MLS PIN 59640638 42.362089 -71.286346
Condo/Co-op 12 Museum Way #2304 Cambridge MA 2141 1175000 2 2.0 None 1266 None 1998 None http://www.redfin.com/MA/Cambridge/12-Museum-Way-02141/unit-2304/home/11602082 None None 42.3703968 -71.0712157
Condo/Co-op 1 Franklin St Ph 1A Boston MA 2110 7995000 3 4.5 Midtown 3172 None 2016 60 http://www.redfin.com/MA/Boston/1-Franklin-St-02108/unit-1A/home/101613053 MLS PIN 50956978 42.35631 -71.05945
Condo/Co-op 41 Milford St #2 Boston MA 2118 2175000 3 2.5 South End 2150 2150 1890 84 http://www.redfin.com/MA/Boston/41-Milford-St-02118/unit-2/home/28465481 MLS PIN 61799764 42.344312 -71.069765
Condo/Co-op 250 Boylston St Unit 5 Boston MA 2116 11500000 4 4.5 None 4841 None 1900 None http://www.redfin.com/MA/Boston/250-Boylston-St-02116/unit-5/home/9313437 None None 42.351883 -71.0693583
Condo/Co-op 137 Marlborough St #6 Boston MA 2116 2950000 3 3.0 Back Bay 2128 2128 1930 84 http://www.redfin.com/MA/Boston/137-Marlborough-St-02116/unit-6/home/9250255 MLS PIN 62113842 42.3531759 -71.078579
Condo/Co-op 128 Beacon St Unit G Boston MA 2116 3395000 2 2.5 Back Bay 2170 None 1899 87 http://www.redfin.com/MA/Boston/128-Beacon-St-02116/unit-G/home/11838759 MLS PIN 61916911 42.35518 -71.074644
Condo/Co-op 65 E India Row Apt 17A Boston MA 2110 2050000 1 1.0 None 754 None 1972 None http://www.redfin.com/MA/Boston/65-E-India-Row-02110/unit-17A/home/9260829 None None 42.3576699 -71.0504859
Condo/Co-op 110 Stuart St Unit 25J Boston MA 2116 1110000 1 1.5 Midtown 1103 None 2009 88 http://www.redfin.com/MA/Boston/110-Stuart-St-02116/unit-25J/home/39913276 MLS PIN 60206455 42.3509321 -71.0653685
Condo/Co-op 165 Tremont St Unit 1502 Boston MA 2111 2365000 2 2.0 None 1054 None 2003 None http://www.redfin.com/MA/Boston/165-Tremont-St-02111/unit-1502/home/9329174 None None 42.3539223 -71.0636921
Condo/Co-op 2 Avery St Unit 18E Boston MA 2111 3055000 3 3.5 Midtown 2344 None 2000 87 http://www.redfin.com/MA/Boston/2-Avery-St-02111/unit-18E/home/11736816 MLS PIN 54707422 42.3527843 -71.0630991
Condo/Co-op 1 Franklin St #1108 Boston MA 2110 1750000 2 2.0 Midtown 1566 999 2016 60 http://www.redfin.com/MA/Boston/1-Franklin-St-02108/unit-1108/home/108154689 MLS PIN 62285782 42.35631 -71.05945
Single Family Residential 142 Farm St Dover MA 2030 1125000 4 2.5 None 3273 89734 1859 None http://www.redfin.com/MA/Dover/142-Farm-St-02030/home/11708174 None None 42.221839 -71.321119
Single Family Residential 143 Pine St Dover MA 2030 1077000 4 2.5 Dover 3736 87466 1995 86 http://www.redfin.com/MA/Dover/143-Pine-St-02030/home/11708952 MLS PIN 60627103 42.219392 -71.284986
Condo/Co-op 223 Morrison Ave #223 Somerville MA 2144 1375000 4 3.5 Davis Square 2923 None 1900 60 http://www.redfin.com/MA/Somerville/223-Morrison-Ave-02144/unit-223/home/109084304 MLS PIN 62063986 42.3978318 -71.1204787
Single Family Residential 85 CHANDLER St Somerville MA 2144 1000000 5 1.0 Somerville 2054 4500 1900 87 http://www.redfin.com/MA/Somerville/85-Chandler-St-02144/home/8692230 MLS PIN 61671952 42.4004857 -71.1201065
Single Family Residential 32 Hancock Rd Weston MA 2493 1070000 4 3.0 Weston 2403 64792 1971 84 http://www.redfin.com/MA/Weston/32-Hancock-Rd-02493/home/8776937 MLS PIN 59921546 42.397893 -71.284612
Single Family Residential 294 Central Ave Needham MA 2494 1049900 4 2.5 Needham 2938 10019 2011 87 http://www.redfin.com/MA/Needham/294-Central-Ave-02494/home/8923881 MLS PIN 61048054 42.306322 -71.237639
Single Family Residential 46 Sudbury Rd Concord MA 1742 1562500 4 3.5 Concord 2646 7840 1812 85 http://www.redfin.com/MA/Concord/46-Sudbury-Rd-01742/home/11593642 MLS PIN 61869457 42.458425 -71.353752
Single Family Residential 18 Calumet Rd Winchester MA 1890 1705000 6 2.5 Winchester 3860 14850 1896 87 http://www.redfin.com/MA/Winchester/18-Calumet-Rd-01890/home/11457397 MLS PIN 61460482 42.448535 -71.150249

Simple visualization using bar charts

With PixieDust display(), you can visually explore the loaded data using built-in charts, such as bar charts, line charts, scatter plots, or maps.

To explore a data set:

  • choose the desired chart type from the dropdown menu
  • configure chart options
  • configure display options

You can analyze the average home price for each city by choosing:

  • chart type: bar chart
  • chart options
    • Options > Keys: CITY
    • Options > Values: PRICE
    • Options > Aggregation: AVG

Run the next cell to review the results.

In [10]:
display(homes)
Average home price by city
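The chart options above (Keys: CITY, Values: PRICE, Aggregation: AVG) amount to a group-by average. As a rough plain-Python sketch of what that aggregation computes, here is the same calculation on five (CITY, PRICE) pairs copied from the table preview. The helper name average_by_key is hypothetical, and this is not how PixieDust or Spark performs the aggregation internally.

```python
from collections import defaultdict

# (CITY, PRICE) pairs copied from a few rows of the table preview above.
rows = [
    ("Windham", 2450000),
    ("Wellesley", 1909847),
    ("Wellesley", 2165000),
    ("Cambridge", 1100000),
    ("Cambridge", 1120000),
]

def average_by_key(pairs):
    """Group values by key and return {key: mean of its values}."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) / float(len(values)) for key, values in groups.items()}

avg_price = average_by_key(rows)
for city in sorted(avg_price):
    print(city, avg_price[city])
```

Because only five rows are used, these averages will differ from the ones in the chart, which covers the whole data set.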

Explore the data

You can change the display Options so you can continue to explore the loaded data set without having to pre-process the data.

For example, change:

  • Options > Keys to YEAR BUILT and
  • Options > Aggregation to COUNT

Now you can find out how old the listed properties are:

In [11]:
display(homes)
Property age
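The COUNT aggregation simply tallies how many rows share each key. A minimal sketch of that tally in plain Python, using the YEAR BUILT values from the first eight rows of the table preview (an excerpt, so the counts are not those of the full data set):

```python
from collections import Counter

# YEAR BUILT values copied from the first eight rows of the table preview.
year_built = [2008, 2016, 2015, 1920, 2016, 1880, 2016, 2016]

counts = Counter(year_built)
for year in sorted(counts):
    print(year, counts[year])
```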

Use sample data sets

PixieDust comes with a set of curated data sets that you can use to get familiar with the different chart types and options.

Type pixiedust.sampleData() to display those data sets.

In [12]:
pixiedust.sampleData()
Id Name Topic Publisher
1 Car performance data transportation IBM
2 Sample retail sales transactions, January 2009 Economy & Business IBM Cloud Data Services
3 Total population by country Society IBM Cloud Data Services
4 GoSales Transactions for Naive Bayes Model Leisure IBM
5 Election results by County Society IBM
6 Million dollar home sales in NE Mass late 2016 Economy & Business Redfin.com
7 Boston Crime data, 2-week sample Society City of Boston

The home sales data set you loaded earlier is one of the samples. Therefore, you could have loaded it by specifying the displayed data set ID as the parameter: homes = pixiedust.sampleData(6)

If your data isn't stored in CSV files, you can load it into a DataFrame from any supported Spark data source. See these Python code snippets for more information.

End of chapter. Return to table of contents


Mix Scala and Python on the same notebook

Python has a rich ecosystem of modules, including matplotlib for plotting, pandas for data structures and analysis, and libraries for machine learning and natural language processing. However, data scientists working with Spark might occasionally need to call out to code written in Scala or Java, for example, one of the hundreds of libraries available on spark-packages.org. Unfortunately, Jupyter Python notebooks do not currently provide a way to call Scala or Java code. As a result, a typical workaround is to first use a Scala notebook to run the Scala code, persist the output somewhere like a Hadoop Distributed File System, create another Python notebook, and reload the data. This is inefficient and awkward.

As you'll see in this notebook, PixieDust provides a solution to this problem by letting users write and run scala code directly in its own cell. It also lets variables be shared between Python and Scala and vice-versa.

Define a few simple variables in Python

In [13]:
pythonString = "Hello From Python"
pythonInt = 20

Import the PixieDust module

If you haven't already, import PixieDust. Follow the instructions in Get started.

Use the Python variables in Scala code

PixieDust makes all variables defined in the Python scope available to Scala using the following rules:

  • Primitive types are mapped to their Scala equivalents: for example, Python strings become Scala Strings, Python integers become Scala integers, and so on.
  • Some complex types are mapped as follows: PySpark SQLContext, DataFrame, and RDD objects are mapped to their Scala Spark equivalents, and Python GraphFrames are mapped to their Scala equivalents. PixieDust will add more mappings as needed.
  • Python classes are currently not converted and therefore cannot be used in Scala.
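The rules above can be turned into a quick pre-flight check on your own variables. The sketch below is only an illustration of the first and third rules: `PRIMITIVE_TYPES` and `is_primitive_for_scala` are hypothetical names, the inclusion of `float` and `bool` is an assumption (the text names only strings and integers), and the Spark and GraphFrames mappings are not covered because they need a running Spark session.

```python
# Hypothetical helper that mirrors the rules listed above; it is not PixieDust code.
# Assumption: float and bool count as primitives alongside the str and int named in the text.
PRIMITIVE_TYPES = (str, int, float, bool)


def is_primitive_for_scala(value):
    """Return True if the value has one of the primitive types assumed to be mapped."""
    return isinstance(value, PRIMITIVE_TYPES)


class Home(object):
    """A plain Python class; per the rules, instances are not converted."""

    def __init__(self, price):
        self.price = price


candidates = {
    "pythonString": "Hello From Python",
    "pythonInt": 20,
    "home": Home(1000000),
}

for name in sorted(candidates):
    if is_primitive_for_scala(candidates[name]):
        print("%s: primitive, expected to be mapped" % name)
    else:
        print("%s: not a primitive, check the other rules" % name)
```

A check like this is most useful before a long Scala cell, where a missing variable would otherwise only surface as a compile error.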

The PixieDust Scala Bridge requires the environment variable SCALA_HOME to be defined and to point at a Scala installation. With that in place, you can use the Python variables from a Scala cell:

In [14]:
%%scala
print(pythonString)
print(pythonInt + 10)
Hello From Python
30
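If a %%scala cell like the one above fails to start, the SCALA_HOME requirement is a reasonable first thing to check. The following sketch uses only the Python standard library; `scala_home_problem` is a hypothetical helper name, and it only verifies that the variable is set and names an existing directory, not that the directory holds a working Scala installation.

```python
import os


def scala_home_problem(env=None):
    """Return None if SCALA_HOME looks usable, otherwise a short description of the problem."""
    env = os.environ if env is None else env
    scala_home = env.get("SCALA_HOME")
    if not scala_home:
        return "SCALA_HOME is not defined"
    if not os.path.isdir(scala_home):
        return "SCALA_HOME does not point at a directory: %s" % scala_home
    return None


problem = scala_home_problem()
print(problem or "SCALA_HOME is defined and points at a directory")
```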

Define a variable in Scala and use it in Python

In this section, you'll create a variable in Scala and use it in Python.

Note: only variables that are prefixed with two underscores ( __ ) are available for use in Python.

In [15]:
%%scala
val __scalaString = "Hello From Scala"
val __scalaInt = 5
In [16]:
# Use the Scala variables in Python
# print() with a single argument works on both Python 2.7 and 3.5
print(__scalaString)
print(__scalaInt + 10)
Hello From Scala
15
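The naming rule is easy to get wrong when a Scala cell declares many values. As a plain-Python illustration of the convention (not of how PixieDust implements it), the sketch below filters a list of hypothetical Scala variable names down to those that would be visible in Python. `exported_to_python` and the names other than `__scalaString` and `__scalaInt` are made up for the example.

```python
def exported_to_python(scala_names):
    """Keep only the names that follow the two-underscore convention."""
    return [name for name in scala_names if name.startswith("__")]


# Hypothetical list of names declared in a Scala cell.
declared_in_scala = ["__scalaString", "__scalaInt", "tempCounter", "_oneUnderscore"]
print(exported_to_python(declared_in_scala))
```

One Python detail to keep in mind: inside a class body, identifiers that start with two underscores are name-mangled, so variables like these are easiest to read at the top level of a cell.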

In this chapter, you've seen how easy it is to intersperse Scala and Python in the same notebook. Continue exploring this powerful functionality by using more complex Scala libraries!

End of chapter. Return to table of contents


Add Spark packages and run inside your notebook

The PixieDust Package Manager helps you install Spark packages inside your notebook. This is especially useful when you're working in a hosted cloud environment without access to configuration files. Use the PixieDust Package Manager to install:

  • a Spark package from spark-packages.org
  • a package from the Maven search repository
  • a jar file directly from a URL

Note: After you install a package, you must restart the kernel and import PixieDust again.

View list of packages

To see the packages installed on your system, run the following command:

In [17]:
import pixiedust
pixiedust.printAllPackages()
direct.download:https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar:1.0 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/streaming-twitter-assembly-1.6.jar
direct.download:https://github.com/ibm-watson-data-lab/spark.samples/blob/master/dist/helloSpark-assembly-2.1.jar?raw=true:1.0 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/helloSpark-assembly-2.1.jar?raw=true
com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/scala-logging-api_2.11-2.1.2.jar
com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/scala-logging-slf4j_2.11-2.1.2.jar
graphframes:graphframes:0.5.0-spark2.1-s_2.11 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/graphframes-0.5.0-spark2.1-s_2.11.jar
direct.download:https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar:1.0 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/streaming-twitter-assembly-1.6.jar

Add a package from spark-packages.org

The command you use to install GraphFrames depends on your Spark version.

In [18]:
if sc.version.startswith('1.6.'):  # Spark 1.6
    pixiedust.installPackage("graphframes:graphframes:0.5.0-spark1.6-s_2.11")
elif sc.version.startswith('2.'):  # Spark 2.1, 2.0
    pixiedust.installPackage("graphframes:graphframes:0.5.0-spark2.1-s_2.11")


pixiedust.installPackage("com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2")
pixiedust.installPackage("com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2")
Package already installed: graphframes:graphframes:0.5.0-spark2.1-s_2.11
Package already installed: com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2
Package already installed: com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2
Out[18]:
<pixiedust.packageManager.package.Package at 0x7f35f4143490>

Note: After you install a package, you must restart the kernel and import PixieDust again. You also need to run pixiedust.installPackage again before the package can be used. To do both, rerun the two code cells above after you have restarted the kernel.
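The version check in the install cell can also be written as a small function, which makes the unhandled case explicit: the original branches do nothing for Spark versions that are neither 1.6.x nor 2.x. This is a sketch only. `graphframes_coordinate` is a hypothetical helper, and the two package coordinates are copied from the cell above rather than checked against spark-packages.org.

```python
def graphframes_coordinate(spark_version):
    """Return the GraphFrames coordinate used in the cell above, or None if the version is not covered."""
    if spark_version.startswith("1.6."):
        return "graphframes:graphframes:0.5.0-spark1.6-s_2.11"
    if spark_version.startswith("2."):
        return "graphframes:graphframes:0.5.0-spark2.1-s_2.11"
    return None


for version in ["1.6.3", "2.0.2", "2.1.0", "3.0.0"]:
    print("%s -> %s" % (version, graphframes_coordinate(version)))

# In the notebook, the result would feed the install call from the cell above:
# coordinate = graphframes_coordinate(sc.version)
# if coordinate is not None:
#     pixiedust.installPackage(coordinate)
```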

View the updated list of packages

Run printAllPackages again to see that GraphFrames is now in your list:

In [19]:
pixiedust.printAllPackages()
direct.download:https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar:1.0 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/streaming-twitter-assembly-1.6.jar
direct.download:https://github.com/ibm-watson-data-lab/spark.samples/blob/master/dist/helloSpark-assembly-2.1.jar?raw=true:1.0 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/helloSpark-assembly-2.1.jar?raw=true
com.typesafe.scala-logging:scala-logging-api_2.11:2.1.2 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/scala-logging-api_2.11-2.1.2.jar
com.typesafe.scala-logging:scala-logging-slf4j_2.11:2.1.2 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/scala-logging-slf4j_2.11-2.1.2.jar
graphframes:graphframes:0.5.0-spark2.1-s_2.11 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/graphframes-0.5.0-spark2.1-s_2.11.jar
direct.download:https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar:1.0 => /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/streaming-twitter-assembly-1.6.jar

Display a GraphFrames data sample

Even if GraphFrames is already installed, running the install command loads the Python module that comes with the package. Run the following cell, and PixieDust displays a sample graph data set. On the upper left of the display, click the table dropdown to switch between views of nodes and edges.

In [20]:
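from graphframes import GraphFrame
from pyspark.sql import SQLContext

try:
    # Spark 2.x: SparkSession is the entry point
    from pyspark.sql import SparkSession
    sqlcontext = SparkSession.builder.getOrCreate()
except ImportError:
    # Spark 1.6: SparkSession does not exist, so fall back to SQLContext
    sqlcontext = SQLContext(sc)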

# Vertex DataFrame
v = sqlcontext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)
], ["id", "name", "age"])

# Edge DataFrame
e = sqlcontext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend")
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

display(g)
type: struct
field:
{'metadata': {}, 'type': 'string', 'name': 'src', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'dst', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'relationship', 'nullable': True}
Showing 8 of 8
src  dst  relationship
a    b    friend
b    c    follow
c    b    follow
f    c    follow
e    f    follow
e    d    friend
d    a    friend
a    e    friend

Install from Maven

To install a package from the Apache Maven search repository, visit the repository and find the groupId and artifactId of the package that you want, then enter them in the installation command in the form groupId:artifactId:version. See the instructions for the installPackage command. For example, the following cell installs Apache Commons CSV; as its output shows, the version 0 given here resolved to version 1.4:

In [21]:
pixiedust.installPackage("org.apache.commons:commons-csv:0")
Downloading package org.apache.commons:commons-csv:1.4 to /gpfs/fs01/user/s2b7-790f27d2e466b6-772f4e1cd93d/data/libs/commons-csv-1.4.jar
Starting download...
Package org.apache.commons:commons-csv:1.4 downloaded successfully
Please restart Kernel to complete installation of the new package
Successfully added package org.apache.commons:commons-csv:1.4
Out[21]:
<pixiedust.packageManager.package.Package at 0x7f35f7fa8b90>
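Because the coordinate is a single colon-separated string, a typo in it is easy to miss. The sketch below splits a coordinate into its three parts and rejects anything that does not have exactly that shape. `MavenCoordinate` and `parse_coordinate` are hypothetical names for this illustration, not part of PixieDust, and the check is purely about the string format; it does not contact any repository.

```python
from collections import namedtuple

# Hypothetical container for the three parts of a coordinate.
MavenCoordinate = namedtuple("MavenCoordinate", ["group_id", "artifact_id", "version"])


def parse_coordinate(text):
    """Split 'groupId:artifactId:version' into its parts, or raise ValueError."""
    parts = text.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected groupId:artifactId:version, got %r" % text)
    return MavenCoordinate(*parts)


coordinate = parse_coordinate("org.apache.commons:commons-csv:0")
print(coordinate.group_id)
print(coordinate.artifact_id)
print(coordinate.version)
```

Jar URLs do not fit this format, so a helper like this would only apply to the Maven and spark-packages.org style of install.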

Install a jar file directly from a URL

To install a jar file that is not packaged in a Maven repository, provide its URL.

In [22]:
pixiedust.installPackage("https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar")
Package already installed: https://github.com/ibm-watson-data-lab/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar
Out[22]:
<pixiedust.packageManager.package.Package at 0x7f35f7fa8290>

Follow the tutorial

To understand what you can do with this jar file, read David Taieb's latest Realtime Sentiment Analysis of Twitter Hashtags with Spark tutorial.

Uninstall a package

It's just as easy to remove a package you installed: run the command pixiedust.uninstallPackage("<<mypackage>>"). For example, you can uninstall Apache Commons CSV:

In [23]:
pixiedust.uninstallPackage("org.apache.commons:commons-csv:0")
Successfully deleted package org.apache.commons:commons-csv:1.4

Restart the kernel and import pixiedust

After uninstalling a package, restart the kernel and import pixiedust before continuing.

In [1]:
# import pixiedust after restarting kernel
import pixiedust
Pixiedust database opened successfully
Pixiedust version 1.0.9

End of chapter. Return to table of contents


Stash Your Data

With PixieDust, you also have the option to export the data from your notebook to external sources. The output of the display API includes a toolbar that contains a Download button.

Stash to Cloudant

You can save the data directly into a Cloudant or CouchDB database.

Prerequisite: Collect your database connection information: the database host, user name, and password.

If your Cloudant instance was provisioned in Bluemix, you can find the connectivity information in the Service Credentials tab.

To stash to Cloudant:

  1. From the toolbar in the display output, click the Download button.
  2. Choose Stash to Cloudant from the menu.
  3. Click the dropdown to see the list of available connections and select an existing connection or add a new connection:
    1. Click the + plus button to add a new connection.
    2. Enter your Cloudant database credentials in JSON format.
    3. If you are stashing to CouchDB, include the protocol. See the sample credentials format below.
    4. Click OK.
    5. Select the new connection.
  4. Click Submit.

Sample Credentials Format

CouchDB

{
    "name": "local-couchdb-connection",
    "credentials": {
        "username": "couchdbuser",
        "password": "password",
        "protocol": "http",
        "host": "127.0.0.1:5984",
        "port": 5984,
        "url": "http://couchdbuser:password@127.0.0.1:5984"
    }
}

Cloudant

{
    "name": "remote-cloudant-connection",
    "credentials": {
        "username": "username-bluemix",
        "password": "password",
        "host": "host-bluemix.cloudant.com",
        "port": 443,
        "url": "https://username-bluemix:password@host-bluemix.cloudant.com"
    }
}
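If you prefer to generate the credentials JSON rather than type it by hand, the sketch below assembles both sample documents from their parts. `build_connection` is a hypothetical helper, not a PixieDust function. It follows the two samples above: the url is built as protocol://username:password@host, and the "protocol" key is written only when a protocol is passed explicitly, as in the CouchDB sample.

```python
import json


def build_connection(name, username, password, host, port, protocol=None):
    """Build a connection document shaped like the samples above."""
    scheme = protocol or "https"  # the Cloudant sample uses https and has no protocol key
    credentials = {
        "username": username,
        "password": password,
        "host": host,
        "port": port,
        "url": "%s://%s:%s@%s" % (scheme, username, password, host),
    }
    if protocol is not None:
        credentials["protocol"] = protocol
    return {"name": name, "credentials": credentials}


couchdb = build_connection("local-couchdb-connection", "couchdbuser", "password",
                           "127.0.0.1:5984", 5984, protocol="http")
cloudant = build_connection("remote-cloudant-connection", "username-bluemix", "password",
                            "host-bluemix.cloudant.com", 443)

# Paste the printed JSON into the new-connection dialog.
print(json.dumps(couchdb, indent=4, sort_keys=True))
print(json.dumps(cloudant, indent=4, sort_keys=True))
```

The printed documents contain the password in clear text, so clear the cell output before sharing the notebook.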

Download as a file

Alternatively, you can choose to save the data set to various file formats (for example, CSV, JSON, XML, and so on).

To save a data set as a file:

  1. From the toolbar in the display output, click the Download button.
  2. Choose Download as File.
  3. Choose the desired format.
  4. Specify the number of records to download.
  5. Click OK.

End of chapter. Return to table of contents


Contribute

By now, you've walked through PixieDust's intro notebooks and seen PixieDust in action. If you like what you saw, join the project!

Anyone can get involved. Here are some ways you can contribute:

Write a visualization

Contribute your own custom visualization. Here's a taste of how it works.

Run the next 4 cells to do the following:

  1. Import PixieDust.
  2. Generate a sample DataFrame.
  3. Create a custom table display option called NewSample.
  4. Display the DataFrame and see your new custom option under the Table dropdown menu.

This is just one small example you can quickly do within this notebook. Read how to create a custom visualization.

In [2]:
import pixiedust

Now, create a simple DataFrame:

In [3]:
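from pyspark.sql import SQLContext

# Create the DataFrame through the same context that is defined here
sqlContext = SQLContext(sc)
d1 = sqlContext.createDataFrame(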
[(2010, 'Camping Equipment', 3),
 (2010, 'Golf Equipment', 1),
 (2010, 'Mountaineering Equipment', 1),
 (2010, 'Outdoor Protection', 2),
 (2010, 'Personal Accessories', 2),
 (2011, 'Camping Equipment', 4),
 (2011, 'Golf Equipment', 5),
 (2011, 'Mountaineering Equipment', 2),
 (2011, 'Outdoor Protection', 4),
 (2011, 'Personal Accessories', 2),
 (2012, 'Camping Equipment', 5),
 (2012, 'Golf Equipment', 5),
 (2012, 'Mountaineering Equipment', 3),
 (2012, 'Outdoor Protection', 5),
 (2012, 'Personal Accessories', 3),
 (2013, 'Camping Equipment', 8),
 (2013, 'Golf Equipment', 5),
 (2013, 'Mountaineering Equipment', 3),
 (2013, 'Outdoor Protection', 8),
 (2013, 'Personal Accessories', 4)],
["year","zone","unique_customers"])

The following cell creates a new custom table visualization plugin called NewSample:

In [4]:
from pixiedust.display.display import *

class TestDisplay(Display):
    def doRender(self, handlerId):
        self._addHTMLTemplateString(
"""
NewSample Plugin
<table class="table table-striped">
    <thead>                 
        {%for field in entity.schema.fields%}
        <th>{{field.name}}</th>
        {%endfor%}
    </thead>
    <tbody>
        {%for row in entity.take(100)%}
        <tr>
            {%for field in entity.schema.fields%}
            <td>{{row[field.name]}}</td>
            {%endfor%}
        </tr>
        {%endfor%}
    </tbody>
</table>
"""
        )

@PixiedustDisplay()
class TestPluginMeta(DisplayHandlerMeta):
    @addId
    def getMenuInfo(self,entity,dataHandler):
        if entity.__class__.__name__ == "DataFrame":
            return [
                {"categoryId": "Table", "title": "NewSample Table", "icon": "fa-table", "id": "newsampleTest"}
            ]
        else:
            return []
    def newDisplayHandler(self,options,entity):
        return TestDisplay(options,entity)

Next, run display() to show the data. Click the Table dropdown. You now see the NewSample Table option, the custom visualization you just created!

In [5]:
display(d1)
type: struct
field:
{'metadata': {}, 'type': 'long', 'name': 'year', 'nullable': True}
{'metadata': {}, 'type': 'string', 'name': 'zone', 'nullable': True}
{'metadata': {}, 'type': 'long', 'name': 'unique_customers', 'nullable': True}
Showing 20 of 20
year  zone                      unique_customers
2010  Camping Equipment         3
2010  Golf Equipment            1
2010  Mountaineering Equipment  1
2010  Outdoor Protection        2
2010  Personal Accessories      2
2011  Camping Equipment         4
2011  Golf Equipment            5
2011  Mountaineering Equipment  2
2011  Outdoor Protection        4
2011  Personal Accessories      2
2012  Camping Equipment         5
2012  Golf Equipment            5
2012  Mountaineering Equipment  3
2012  Outdoor Protection        5
2012  Personal Accessories      3
2013  Camping Equipment         8
2013  Golf Equipment            5
2013  Mountaineering Equipment  3
2013  Outdoor Protection        8
2013  Personal Accessories      4
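To see what markup the template in the NewSample plugin generates, it can help to reproduce its two loops in plain Python. The sketch below is not how PixieDust renders the template. `render_table` is a hypothetical function that takes a list of field names and a list of dictionary rows, mirrors the header loop, the row loop, and the 100-row limit, and escapes cell values as an added precaution.

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape   # Python 2.7


def render_table(field_names, rows, limit=100):
    """Build the same table structure as the NewSample template, without Jinja or Spark."""
    head = "".join("<th>%s</th>" % escape(str(name)) for name in field_names)
    body = ""
    for row in rows[:limit]:  # mirrors entity.take(100) in the template
        cells = "".join("<td>%s</td>" % escape(str(row[name])) for name in field_names)
        body += "<tr>%s</tr>" % cells
    return ('<table class="table table-striped">'
            "<thead>%s</thead><tbody>%s</tbody></table>" % (head, body))


sample_rows = [
    {"year": 2010, "zone": "Camping Equipment", "unique_customers": 3},
    {"year": 2010, "zone": "Golf Equipment", "unique_customers": 1},
]
print(render_table(["year", "zone", "unique_customers"], sample_rows))
```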

Error? If you changed the plugin's name or id in the third of these cells (the NewSample plugin definition), you might get an error when you try to display. You can fix this by updating the metadata in the display() cell. To do so, go to the Jupyter menu above the notebook and choose View > Cell Toolbar > Edit Metadata. Then scroll down to the display(d1) cell, click its Edit Metadata button, and change the handlerId.
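For orientation, the sketch below shows the kind of change the Edit Metadata dialog makes, expressed on a plain Python dictionary. The metadata layout used here (a "pixiedust" entry holding "displayParams" with a "handlerId") is an assumption about how the cell metadata is stored, so compare it with what the dialog actually shows. `set_handler_id` is a hypothetical helper, "oldHandlerName" is a made-up stale value, and "newsampleTest" is the id from the plugin cell above.

```python
import copy


def set_handler_id(cell_metadata, handler_id):
    """Return a copy of the cell metadata with the handlerId replaced.

    Assumes the layout {"pixiedust": {"displayParams": {"handlerId": ...}}}.
    """
    updated = copy.deepcopy(cell_metadata)
    params = updated.setdefault("pixiedust", {}).setdefault("displayParams", {})
    params["handlerId"] = handler_id
    return updated


# Hypothetical metadata that still points at a handler id that no longer exists.
stale = {"pixiedust": {"displayParams": {"handlerId": "oldHandlerName"}}}
fixed = set_handler_id(stale, "newsampleTest")
print(fixed)
```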

Build a renderer

PixieDust lets you switch between renderers for charts and maps. We'd love to add more to the list. It's easy to get started. Try the generate tool to create a boilerplate renderer using a quick CLI wizard. Read how to build a renderer.

Enter an issue

Found a bug? Thought of a great enhancement? Enter an issue to let us know. Tell us what you think.

Share PixieDust

If you think someone you know would be interested in PixieDust, spread the word.

Learn more

Ready to pitch in? We can't wait to see what you share. More on how to contribute.

End of chapter. Return to table of contents

Authors

  • Jose Barbosa
  • Mike Broberg
  • Inge Halilovic
  • Jess Mantaro
  • Brad Noble
  • David Taieb
  • Patrick Titzler


Copyright © IBM Corp. 2017. This notebook and its source code are released under the terms of the MIT License.