content/releases/spark-release-0-8-1.html [1:320]:
Apache Spark 0.8.1 is a maintenance and performance release for the Scala 2.9 version of Spark. It also adds several new features, such as standalone mode high availability, that will appear in Spark 0.9 but that developers wanted to have available on Scala 2.9. Contributions to 0.8.1 came from 41 developers.
Support has been added for running Spark on YARN 2.2 and newer. Due to a change in the YARN API between previous versions and 2.2+, this was not supported in Spark 0.8.0. See the YARN documentation for specific instructions on how to build Spark for YARN 2.2+. We’ve also included a pre-compiled binary for YARN 2.2.
The standalone cluster manager now has a high availability (H/A) mode that can tolerate master failures. This is particularly useful for long-running applications such as streaming jobs and the Shark server, where the scheduler master previously represented a single point of failure. Instructions for deploying H/A mode are included in the documentation. The current implementation uses ZooKeeper for coordination.
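As a rough illustration of what deploying H/A mode involves, the sketch below assembles the two strings an operator typically needs: the JVM options for the master daemons and the multi-master URL that applications connect to. The property names (`spark.deploy.recoveryMode`, `spark.deploy.zookeeper.url`, `spark.deploy.zookeeper.dir`) and the comma-separated `spark://` URL form follow the standalone mode documentation as best recalled; the host names, ports, and helper functions are hypothetical, so check the documentation for your exact release before relying on them.

```python
# Minimal sketch: build the settings for ZooKeeper-based master recovery.
# Helper names and host names are illustrative, not part of Spark itself.

def daemon_java_opts(zk_hosts, zk_dir="/spark"):
    """Return a value suitable for SPARK_DAEMON_JAVA_OPTS on each master."""
    props = [
        ("spark.deploy.recoveryMode", "ZOOKEEPER"),      # enable standby recovery
        ("spark.deploy.zookeeper.url", ",".join(zk_hosts)),
        ("spark.deploy.zookeeper.dir", zk_dir),           # where recovery state is kept
    ]
    return " ".join("-D%s=%s" % (key, value) for key, value in props)


def master_url(masters):
    """Return a spark:// URL listing every master so clients can fail over."""
    return "spark://" + ",".join("%s:%d" % (host, port) for host, port in masters)


if __name__ == "__main__":
    print(daemon_java_opts(["zk1:2181", "zk2:2181"]))
    print(master_url([("master1", 7077), ("master2", 7077)]))
```

Listing every master in the URL means an application does not need to know which one is currently the leader.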
This release adds several performance optimizations:
Set spark.shuffle.consolidateFiles to “true” to enable shuffle file consolidation.
add method
local:// URIs, which allow users to specify files already present on slaves as dependencies
transformWith, leftInnerJoin, rightOuterJoin
repartition
local:// scheme for dependency jars
add method for Python accumulators, support for setting config properties in Python
mapPartitionsWithIndex
repartition operator, shuffle write metrics, various fixes and release management
Thanks to everyone who contributed!
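To give a feel for why the `spark.shuffle.consolidateFiles` setting listed above can matter, here is a back-of-the-envelope sketch of shuffle file counts. It assumes a simplified model that is not stated in these notes: without consolidation each map task writes one file per reduce task, and with consolidation map tasks that run one after another on the same core share a set of files. The function name and the example numbers are illustrative only.

```python
# Simplified model of hash-shuffle file counts; an assumption for illustration,
# not a description of the exact 0.8.1 implementation.

def shuffle_file_counts(map_tasks, reduce_tasks, concurrent_cores):
    """Return (files without consolidation, files with consolidation)."""
    # One file per (map task, reduce task) pair.
    unconsolidated = map_tasks * reduce_tasks
    # One file per (concurrently running map slot, reduce task) pair.
    consolidated = min(map_tasks, concurrent_cores) * reduce_tasks
    return unconsolidated, consolidated


if __name__ == "__main__":
    plain, merged = shuffle_file_counts(map_tasks=1000, reduce_tasks=200,
                                        concurrent_cores=16)
    print("without consolidation:", plain)   # 200000
    print("with consolidation:   ", merged)  # 3200
```

Under this model the saving grows with the ratio of map tasks to cores, which is why the option is most relevant for shuffles with many tasks.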