We are happy to announce the availability of Spark 2.2.3! Visit the release notes to read about the new features, or download the release today.
The agenda for Spark + AI Summit 2019 is now available! The summit kicks off on April 23rd with a full day of Apache Spark training followed by more than 100 talks featuring speakers from Netflix, Facebook, Uber, Yelp, Target, Apple and more! Check out the full schedule and register to attend!
We are happy to announce the availability of Spark 2.4.0! Visit the release notes to read about the new features, or download the release today.
We are happy to announce the availability of Spark 2.3.2! Visit the release notes to read about the new features, or download the release today.
The agenda for Spark+AI Summit Europe is now available! The summit kicks off on October 2nd with a full day of Spark training followed by over 100 talks featuring speakers from Databricks, Facebook, Intel, IBM, CERN, Uber and Google. Check out the full schedule and register to attend!
We are happy to announce the availability of Spark 2.2.2! Visit the release notes to read about the new features, or download the release today.
We are happy to announce the availability of Spark 2.1.3! Visit the release notes to read about the new features, or download the release today.
We are happy to announce the availability of Spark 2.3.1! Visit the release notes to read about the new features, or download the release today.
The agenda for Spark+AI Summit is now available! The summit kicks off on June 4th with a full day of Spark training followed by over 180 talks featuring speakers from Databricks, Facebook, Microsoft, Intel, IBM, Salesforce, Uber and UC Berkeley. Check out the full schedule and register to attend!
We are happy to announce the availability of Spark 2.3.0! Visit the release notes to read about the new features, or download the release today.
We are happy to announce the availability of Apache Spark 2.2.1! Visit the release notes to read about the changes, or download the release today.
We are happy to announce the availability of Apache Spark 2.1.2! Visit the release notes to read about the changes, or download the release today.
The agenda for Spark Summit EU 2017 is now available! The summit kicks off on October 24 with a full day of Apache Spark training followed by more than 80 talks featuring speakers from Shell, Netflix, Intel, IBM, Facebook, Toon and many more. Check out the full schedule and register to attend!
We are happy to announce the availability of Spark 2.2.0! Visit the release notes to read about the new features, or download the release today.
We are happy to announce the availability of Apache Spark 2.1.1! Visit the release notes to read about the changes, or download the release today.
The agenda for Spark Summit is now available! The summit kicks off on June 5th with a full day of Spark training followed by more than 110 talks featuring speakers from Databricks, Facebook, Airbnb, Yelp, Salesforce and UC Berkeley. Check out the full schedule and register to attend!
The agenda for Spark Summit East is now available! The summit kicks off on February 7th with a full day of Spark training followed by more than 100 talks featuring speakers from Netflix, Walmart Labs, Databricks, MIT, IBM, Microsoft, Facebook, Capital One and UC Berkeley. Check out the full schedule and register to attend!
We are happy to announce the availability of Spark 2.1.0! Visit the release notes to read about the new features, or download the release today.
We are proud to announce that Apache Spark won the 2016 CloudSort Benchmark (both Daytona and Indy category). A joint team from Nanjing University, Alibaba Group, and Databricks Inc. entered the competition using NADSort, a distributed sorting program built on top of Spark, and set a new world record as the most cost-efficient way to sort 100TB of data.
We are happy to announce the availability of Apache Spark 2.0.2! This maintenance release includes fixes across several areas of Spark, as well as Kafka 0.10 and runtime metrics support for Structured Streaming.
We are happy to announce the availability of Spark 1.6.3! This maintenance release includes fixes across several areas of Spark.
We are happy to announce the availability of Apache Spark 2.0.1! Visit the release notes to read about the new features, or download the release today.
We are happy to announce the availability of Spark 2.0.0! Visit the release notes to read about the new features, or download the release today.
We are happy to announce the availability of Spark 1.6.2! This maintenance release includes fixes across several areas of Spark.
The call for presentations is now open for Spark Summit EU! The event will take place on October 25-27 in Brussels. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, the Spark ecosystem and research. Please submit by July 1 to be considered.
To enable wide-scale community testing of the upcoming Spark 2.0 release, the Apache Spark team has posted a preview release of Spark 2.0. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 2.0. If you would like to test the release, simply download it, and send feedback using either the mailing lists or JIRA.
The agenda for Spark Summit 2016 is now available! The summit kicks off on June 6th with a full day of Spark training followed by more than 90 talks featuring speakers from Airbnb, Baidu, Bloomberg, Databricks, Duke, IBM, Microsoft, Netflix, Uber and UC Berkeley. Check out the full schedule and register to attend!
We are happy to announce the availability of Spark 1.6.1! This maintenance release includes fixes across several areas of Spark, including significant updates to the experimental Dataset API.
The call for presentations is now open for Spark Summit San Francisco! The event will take place on June 6-8 in San Francisco. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, business value, the Spark ecosystem and research. Please submit by February 29th to be considered.
The agenda for Spark Summit East is now posted, with 60 talks from organizations including Netflix, Comcast, Blackrock, Bloomberg and others. The 2nd annual Spark Summit East will run February 16-18th in NYC and feature a full program of speakers along with Spark training opportunities. More details are available on the Spark Summit East website, where you can also register to attend.
We are happy to announce the availability of Spark 1.6.0! Spark 1.6.0 is the seventh release on the API-compatible 1.X line. With this release the Spark community continues to grow, with contributions from 248 developers!
Call for presentations is closing soon for Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research. Please submit by November 22nd to be considered.
We are happy to announce the availability of Spark 1.5.2! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.
Abstract submissions are now open for the 2nd Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research.
We are happy to announce the availability of Spark 1.5.1! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.
We are happy to announce the availability of Spark 1.5.0! Spark 1.5.0 is the sixth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 230 developers and more than 1,400 commits!
The agenda for Spark Summit Europe is now posted, with 38 talks from organizations including Barclays, Netflix, Elsevier, Intel and others. This inaugural Spark conference in Europe will run October 27th-29th 2015 in Amsterdam and feature a full program of speakers along with Spark training opportunities. More details are available on the Spark Summit Europe website, where you can also register to attend.
We are happy to announce the availability of Spark 1.4.1! This is a maintenance release that includes contributions from 85 developers. Spark 1.4.1 includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, Spark SQL, and MLlib.
The videos and slides for Spark Summit 2015 are now all available online! The talks include technical roadmap discussions, deep dives on Spark components, and use cases built on top of Spark.
We are happy to announce the availability of Spark 1.4.0! Spark 1.4.0 is the fifth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 210 developers and more than 1,000 commits!
Abstract submissions are now open for the first-ever Spark Summit Europe. The event will take place on October 27th to 29th in Amsterdam. Submissions are welcome across a variety of Spark-related topics, including use cases and ongoing development.
There is one month left until Spark Summit 2015, which will be held in San Francisco on June 15th to 17th. The Summit will contain presentations from over 50 organizations using Spark, focused on use cases and ongoing development.
The videos and slides for Spark Summit East 2015 are now all available online. Watch them to get the latest news from the Spark community as well as use cases and applications built on top.
We are happy to announce the availability of Spark 1.2.2 and Spark 1.3.1! These are both maintenance releases that collectively feature the work of more than 90 developers.
We are happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 174 developers and more than 1,000 commits!
We are happy to announce the availability of Spark 1.2.1! This is a maintenance release that includes contributions from 69 developers. Spark 1.2.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.
The agenda for Spark Summit East is now posted, with 38 talks from organizations including Goldman Sachs, Baidu, Salesforce, Novartis, Cisco and others. This inaugural Spark conference on the US East Coast will run March 18th-19th 2015 in New York City. More details are available on the Spark Summit East website, where you can also register to attend.
We are happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 172 developers and more than 1,000 commits!
We are happy to announce the availability of Spark 1.1.1! This is a maintenance release that includes contributions from 55 developers. Spark 1.1.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.
Registration is now open for Spark Summit East 2015, to be held on March 18th and 19th in New York City. The conference will be a great chance to meet people from throughout the Spark community as well as attend training workshops on Spark. If you haven’t been to previous Spark Summits, you can find content from previous events on the Spark Summit website.
We are proud to announce that Spark won the 2014 Gray Sort Benchmark (Daytona 100TB category). A team from Databricks, including Spark committers Reynold Xin, Xiangrui Meng, and Matei Zaharia, entered the benchmark using Spark. Spark tied with the Themis team from UCSD, and the two jointly set a new world record in sorting.
After successful events in the past two years, the Spark Summit conference has expanded for 2015, offering both an event in New York on March 18-19 and one in San Francisco on June 15-17. The conference is a great chance to meet people from throughout the Spark community and see the latest news, tips and use cases.
We are happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 171 developers!
We are happy to announce the availability of Spark 1.0.2! This release includes contributions from 30 developers. Spark 1.0.2 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, and MLlib.
We are happy to announce the availability of Spark 0.9.2! Apache Spark 0.9.2 is a maintenance release with bug fixes. We recommend that all 0.9.x users upgrade to this stable release. Contributions to this release came from 28 developers.
The videos and slides for Spark Summit 2014 are now all available online. Watch them to see the latest news from the Spark community as well as use cases and applications built on top. In addition, training materials from the Summit, including hands-on exercises, are all available freely as well.
We are happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark’s (alpha) SQL library, including support for JSON data and performance and stability fixes.
There are now two weeks left to Spark Summit 2014, which will be held in San Francisco on June 30th to July 2nd. The Summit will contain presentations from over 50 organizations using Spark, focused on use cases and ongoing development.
We are happy to announce the availability of Spark 1.0.0! Spark 1.0.0 is the first in the 1.X line of releases, providing API stability for Spark’s core interfaces. It is Spark’s largest release ever, with contributions from 117 developers. This release expands Spark’s standard libraries, introducing a new SQL package (Spark SQL) that lets users integrate SQL queries into existing Spark workflows. MLlib, Spark’s machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark’s core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements.
The agenda for the Spark Summit 2014 conference is now available online. With talks from more than 50 organizations, it will be the biggest Spark event yet, bringing the developer and user communities together. Join us in person or tune in online to learn about the latest happenings in Spark.
We are happy to announce the availability of Spark 0.9.1! Apache Spark 0.9.1 is a maintenance release with bug fixes, performance improvements, better stability with YARN and improved parity of the Scala and Python APIs. We recommend that all 0.9.0 users upgrade to this stable release. Contributions to this release came from 37 developers.
After last year’s successful first Spark Summit, registrations and talk submissions are now open for Spark Summit 2014. This will be a 3-day event in San Francisco organized by multiple companies in the Spark community. The event will run June 30th to July 2nd in San Francisco, CA.
The Apache Software Foundation announced today that Spark has graduated from the Apache Incubator to become a top-level Apache project, signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles. This is a major step for the community and we are very proud to share this news with users as we complete Spark’s move to Apache. Read more about Spark’s growth during the past year and from contributors and users in the ASF’s press release.
We are happy to announce the availability of Spark 0.9.0! Spark 0.9.0 is a major release and Spark’s largest release ever, with contributions from 83 developers. This release expands Spark’s standard libraries, introducing a new graph computation package (GraphX) and adding several new features to the machine learning and stream-processing packages. It also makes major improvements to the core engine, including external aggregations, a simplified H/A mode for long-lived applications, and hardened YARN support.
We’ve just posted Spark Release 0.8.1, a maintenance and performance release for the Scala 2.9 version of Spark. 0.8.1 includes support for YARN 2.2, a high availability mode for the standalone scheduler, optimizations to the shuffle, and many other improvements. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.
The Spark Summit 2013, held in early December 2013 in downtown San Francisco, was a success! Over 450 Spark developers and enthusiasts from 13 countries and more than 180 companies came to learn from project leaders and production users of Spark, Shark, Spark Streaming and related projects about use cases, recent developments, and the Spark community roadmap.
We are excited to announce the first Spark Summit on Dec 2, 2013 in Downtown San Francisco. Come hear from key production users of Spark, Shark, Spark Streaming and related projects. Also find out where the development is going, and learn how to use the Spark stack in a variety of applications. The summit is being organized and sponsored by leading organizations in the Spark community.
We’re proud to announce the release of Apache Spark 0.8.0. Spark 0.8.0 is a major release that includes many new capabilities and usability improvements. It’s also our first release under the Apache incubator. It is the largest Spark release yet, with contributions from 67 developers and 24 companies. Major new features include an expanded monitoring framework and UI, a machine learning library, and support for running Spark inside of YARN.
As we continue developing Spark, we would love to get feedback from users and hear what you’d like us to work on next. We’ve decided that a good way to do that is a survey – we hope to run this at regular intervals. If you have a few minutes to participate, fill in the survey here. Your time is greatly appreciated.
We have released the next screencast, A Standalone Job in Scala that takes you beyond the Spark shell, helping you write your first standalone Spark job.
Want to learn how to use Spark, Shark, GraphX, and related technologies in person? The AMP Lab is hosting a two-day training workshop for them on August 29th and 30th in Berkeley. The workshop will include tutorials, talks from users, and over four hours of hands-on exercises. Registration is now open on the AMP Camp website, for a price of $250 per person. We recommend signing up early because last year’s workshop was sold out.
As part of the Spark project's recent move to Apache, we are planning to migrate the mailing lists to Apache infrastructure this month, so that the existing Google groups will become read-only on September 1, 2013. To keep receiving updates about Spark or to participate in development discussions, please subscribe to the following lists:
Most users will probably want the User list, but individuals interested in contributing code to the project should also subscribe to the Dev list.
We’ve just posted Spark Release 0.7.3, a maintenance release that contains several fixes, including streaming API updates and new functionality for adding JARs to a spark-shell session. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.
Spark, its creators at the AMP Lab, and some of its users were featured in a Wired Enterprise article a few days ago. Read on to learn a little about how Spark is being used in industry.
Spark was recently accepted into the Apache Incubator, which will serve as the long-term home for the project. While moving the source code and issue tracking to Apache will take some time, we are excited to be joining the community at Apache. Stay tuned on this site for updates on how the project hosting will change.
We’re happy to announce the release of Spark 0.7.2, a new maintenance release that includes several bug fixes and improvements, as well as new code examples and API features. We recommend that all users update to this release. Head over to the release notes to read about the new features, or download the release today.
We have released the first two screencasts in a series of short hands-on video training courses we will be publishing to help new users get up and running with Spark in minutes.
At this year’s Strata conference, the AMP Lab hosted a full day of tutorials on Spark, Shark, and Spark Streaming, including online exercises on Amazon EC2. Those exercises are now available online, letting you learn Spark and Shark at your own pace on an EC2 cluster with real data. They are a great resource for learning the systems. You can also find slides from the Strata tutorials online, as well as videos from the AMP Camp workshop we held at Berkeley in August.
We’re proud to announce the release of Spark 0.7.0, a new major version of Spark that adds several key features, including a Python API for Spark and an alpha of Spark Streaming. This release is the result of the largest group of contributors yet behind a Spark release – 31 contributors from inside and outside Berkeley. Head over to the release notes to read more about the new features, or download the release today.
This weekend, Amazon posted an article and code that make it easy to launch Spark and Shark on Elastic MapReduce. The article includes examples of how to run both interactive Scala commands and SQL queries from Shark on data in S3. Head over to the Amazon article for details. We’re very excited because, to our knowledge, this makes Spark the first non-Hadoop engine that you can launch with EMR.
We recently released Spark 0.6.2, a new version of Spark. This is a maintenance release that includes several bug fixes and usability improvements (see the release notes). We recommend that all users upgrade to this release.
Quantifind, one of the Bay Area companies that has been using Spark for predictive analytics, recently posted two useful entries on working with Spark in their tech blog:
Thanks for sharing this, and we look forward to seeing others!
On December 18th, we held the first of a series of Spark development meetups, for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an introduction to Spark internals. Thanks to one of the attendees, there’s now a video of the meetup on YouTube. We’ve also posted the slides. Look to see more development meetups on Spark and Shark in the future.
Recently, we’ve seen quite a bit of coverage of Spark in the news. I wanted to list some of the more recent articles, for readers interested in learning more.
In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS Tuesday morning, and a three-hour hands-on exercise session.
Today we’ve made available two maintenance releases for Spark: 0.6.1 and 0.5.2. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend 0.6.1.
Spark version 0.6.0 was released today, a major release that brings a wide range of performance improvements and new features, including a simpler standalone deploy mode and a Java API. Read more about it in the release notes.
Our paper on Spark won the Best Paper Award at the USENIX NSDI conference. You can see a video of the talk, as well as slides, online on the NSDI website.
We’ve started hosting a regular Bay Area Spark User Meetup. Sign up on the meetup.com page to be notified about events and meet other Spark developers and users.