Recently I found Apache Zeppelin, an Apache Incubator project that seems to bring a new paradigm to the data science game, and to other areas as well.
Something I really like about Zeppelin is how easy it makes interacting with Spark. I use the spark-shell all the time, but it’s tedious to re-evaluate commands I’ve already entered. Zeppelin fixes this: it lets me go back and forth across the script I’m building on Spark, which is nice.
At the time of writing, the latest release of Zeppelin is 0.5.6, which comes bundled with Spark 1.4.1. For my own reasons I want to use Spark 1.6, and to get Zeppelin running with Spark 1.6 you have to build it from source.
1.- Download the latest stable source code from Zeppelin’s download page:
2.- Extract the source archive, then change into the extracted directory:
tar -zxvf zeppelin-0.5.6-incubating.tgz
3.- Compile with support for Spark 1.6:
mvn clean package -Pspark-1.6 -Dspark.version=1.6.0 -Dhadoop.version=2.6.0-cdh5.4.8 -Phadoop-2.6 -Pyarn -Ppyspark -Pvendor-repo -DskipTests
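If you end up rebuilding against a different Spark or Hadoop version, it can help to keep the flags from step 3 in a small script. This is only a sketch under my own assumptions: the variable names (SPARK_PROFILE, SPARK_VERSION, HADOOP_PROFILE, HADOOP_VERSION) and the RUN_BUILD switch are hypothetical, and the defaults simply reproduce the mvn command above. It prints the full Maven command first and only runs Maven when RUN_BUILD=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: assemble the Maven arguments for building Zeppelin from source.
# Variable names and the RUN_BUILD switch are hypothetical; the defaults
# reproduce the step 3 command (Spark 1.6.0, CDH 5.4.8 flavoured Hadoop 2.6).
set -euo pipefail

SPARK_PROFILE="${SPARK_PROFILE:-spark-1.6}"
SPARK_VERSION="${SPARK_VERSION:-1.6.0}"
HADOOP_PROFILE="${HADOOP_PROFILE:-hadoop-2.6}"
HADOOP_VERSION="${HADOOP_VERSION:-2.6.0-cdh5.4.8}"

# Fill the global array BUILD_ARGS from the current variable values.
# -Pyarn, -Ppyspark and -Pvendor-repo are copied unchanged from step 3;
# -DskipTests skips the unit tests to shorten the build.
zeppelin_build_args() {
  BUILD_ARGS=(
    clean package
    "-P${SPARK_PROFILE}" "-Dspark.version=${SPARK_VERSION}"
    "-Dhadoop.version=${HADOOP_VERSION}" "-P${HADOOP_PROFILE}"
    -Pyarn -Ppyspark -Pvendor-repo
    -DskipTests
  )
}

zeppelin_build_args

# Show the command so it can be checked before starting a long build.
echo "mvn ${BUILD_ARGS[*]}"

# Only invoke Maven when explicitly asked to (hypothetical RUN_BUILD switch).
if [[ "${RUN_BUILD:-0}" == "1" ]]; then
  mvn "${BUILD_ARGS[@]}"
fi
```

Run it from the root of the extracted source directory, for example as SPARK_VERSION=1.6.1 RUN_BUILD=1 ./build-zeppelin.sh (the script name is hypothetical). Whether a particular Spark or Hadoop version works with these profiles is something to confirm in Zeppelin’s README.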
For more information on the other parameters you can tweak, check out Zeppelin’s README file.