News | Apache Spark

Spark 2.2.3 released

January 11, 2019

We are happy to announce the availability of Spark 2.2.3! Visit the release notes to read about the new features, or download the release today.

Spark+AI Summit (April 23-25th, 2019, San Francisco) agenda posted

December 19, 2018

The agenda for Spark + AI Summit 2019 is now available! The summit kicks off on April 23rd with a full day of Apache Spark training followed by over 100 talks featuring speakers from Netflix, Facebook, Uber, Yelp, Target, Apple and more! Check out the full schedule and register to attend!

Spark 2.4.0 released

November 2, 2018

We are happy to announce the availability of Spark 2.4.0! Visit the release notes to read about the new features, or download the release today.

Spark 2.3.2 released

September 24, 2018

We are happy to announce the availability of Spark 2.3.2! Visit the release notes to read about the new features, or download the release today.

Spark+AI Summit (October 2-4th, 2018, London) agenda posted

July 24, 2018

The agenda for Spark+AI Summit Europe is now available! The summit kicks off on October 2nd with a full day of Spark training followed by over 100 talks featuring speakers from Databricks, Facebook, Intel, IBM, CERN, Uber and Google. Check out the full schedule and register to attend!

Spark 2.2.2 released

July 2, 2018

We are happy to announce the availability of Spark 2.2.2! Visit the release notes to read about the new features, or download the release today.

Spark 2.1.3 released

June 29, 2018

We are happy to announce the availability of Spark 2.1.3! Visit the release notes to read about the new features, or download the release today.

Spark 2.3.1 released

June 8, 2018

We are happy to announce the availability of Spark 2.3.1! Visit the release notes to read about the new features, or download the release today.

Spark+AI Summit (June 4-6th, 2018, San Francisco) agenda posted

March 1, 2018

The agenda for Spark+AI Summit is now available! The summit kicks off on June 4th with a full day of Spark training followed by over 180 talks featuring speakers from Databricks, Facebook, Microsoft, Intel, IBM, Salesforce, Uber and UC Berkeley. Check out the full schedule and register to attend!

Spark 2.3.0 released

February 28, 2018

We are happy to announce the availability of Spark 2.3.0! Visit the release notes to read about the new features, or download the release today.

Spark 2.2.1 released

December 1, 2017

We are happy to announce the availability of Apache Spark 2.2.1! Visit the release notes to read about the changes, or download the release today.

Spark 2.1.2 released

October 9, 2017

We are happy to announce the availability of Apache Spark 2.1.2! Visit the release notes to read about the changes, or download the release today.

Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted

August 28, 2017

The agenda for Spark Summit EU 2017 is now available! The summit kicks off on October 24 with a full day of Apache Spark training followed by over 80 talks featuring speakers from Shell, Netflix, Intel, IBM, Facebook, Toon and many more. Check out the full schedule and register to attend!

Spark 2.2.0 released

July 11, 2017

We are happy to announce the availability of Spark 2.2.0! Visit the release notes to read about the new features, or download the release today.

Spark 2.1.1 released

May 2, 2017

We are happy to announce the availability of Apache Spark 2.1.1! Visit the release notes to read about the changes, or download the release today.

Spark Summit (June 5-7th, 2017, San Francisco) agenda posted

March 31, 2017

The agenda for Spark Summit is now available! The summit kicks off on June 5th with a full day of Spark training followed by over 110 talks featuring speakers from Databricks, Facebook, Airbnb, Yelp, Salesforce and UC Berkeley. Check out the full schedule and register to attend!

Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted

January 4, 2017

The agenda for Spark Summit East is now available! The summit kicks off on February 7th with a full day of Spark training followed by over 100 talks featuring speakers from Netflix, Walmart Labs, Databricks, MIT, IBM, Microsoft, Facebook, Capital One and UC Berkeley. Check out the full schedule and register to attend!

Spark 2.1.0 released

December 28, 2016

We are happy to announce the availability of Spark 2.1.0! Visit the release notes to read about the new features, or download the release today.

Spark wins CloudSort Benchmark as the most efficient engine

November 15, 2016

We are proud to announce that Apache Spark won the 2016 CloudSort Benchmark (both Daytona and Indy categories). A joint team from Nanjing University, Alibaba Group, and Databricks Inc. entered the competition using NADSort, a distributed sorting program built on top of Spark, and set a new world record as the most cost-efficient way to sort 100TB of data.

Spark 2.0.2 released

November 14, 2016

We are happy to announce the availability of Apache Spark 2.0.2! This maintenance release includes fixes across several areas of Spark, as well as Kafka 0.10 and runtime metrics support for Structured Streaming.

Spark 1.6.3 released

November 7, 2016

We are happy to announce the availability of Spark 1.6.3! This maintenance release includes fixes across several areas of Spark.

Spark 2.0.1 released

October 3, 2016

We are happy to announce the availability of Apache Spark 2.0.1! Visit the release notes to read about the new features, or download the release today.

Spark 2.0.0 released

July 26, 2016

We are happy to announce the availability of Spark 2.0.0! Visit the release notes to read about the new features, or download the release today.

Spark 1.6.2 released

June 25, 2016

We are happy to announce the availability of Spark 1.6.2! This maintenance release includes fixes across several areas of Spark.

Call for Presentations for Spark Summit EU is Open

June 16, 2016

The call for presentations is now open for Spark Summit EU! The event will take place on October 25-27 in Brussels. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, the Spark ecosystem and research. Please submit by July 1 to be considered.

Preview release of Spark 2.0

May 26, 2016

To enable wide-scale community testing of the upcoming Spark 2.0 release, the Apache Spark team has posted a preview release of Spark 2.0. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 2.0. If you would like to test the release, simply download it, and send feedback using either the mailing lists or JIRA.

Spark Summit (June 6, 2016, San Francisco) agenda posted

April 17, 2016

The agenda for Spark Summit 2016 is now available! The summit kicks off on June 6th with a full day of Spark training followed by over 90 talks featuring speakers from Airbnb, Baidu, Bloomberg, Databricks, Duke, IBM, Microsoft, Netflix, Uber and UC Berkeley. Check out the full schedule and register to attend!

Spark 1.6.1 released

March 9, 2016

We are happy to announce the availability of Spark 1.6.1! This maintenance release includes fixes across several areas of Spark, including significant updates to the experimental Dataset API.

Submission is open for Spark Summit San Francisco

February 11, 2016

The call for presentations is now open for Spark Summit San Francisco! The event will take place on June 6-8 in San Francisco. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, business value, the Spark ecosystem and research. Please submit by February 29th to be considered.

Spark Summit East (Feb 16, 2016, New York) agenda posted

January 14, 2016

The agenda for Spark Summit East is now posted, with 60 talks from organizations including Netflix, Comcast, Blackrock, Bloomberg and others. The 2nd annual Spark Summit East will run February 16-18th in NYC and feature a full program of speakers along with Spark training opportunities. More details are available on the Spark Summit East website, where you can also register to attend.

Spark 1.6.0 released

January 4, 2016

We are happy to announce the availability of Spark 1.6.0! Spark 1.6.0 is the seventh release on the API-compatible 1.X line. With this release the Spark community continues to grow, with contributions from 248 developers!

CFP for Spark Summit East 2016 is closing soon!

November 19, 2015

Call for presentations is closing soon for Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research. Please submit by November 22nd to be considered.

Spark 1.5.2 released

November 9, 2015

We are happy to announce the availability of Spark 1.5.2! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.

Submission is open for Spark Summit East 2016

October 14, 2015

Abstract submissions are now open for the 2nd Spark Summit East! The event will take place on February 16th-18th in New York City. Submissions are welcome across a variety of Spark-related topics, including applications, development, data science, enterprise, and research.

Spark 1.5.1 released

October 2, 2015

We are happy to announce the availability of Spark 1.5.1! This maintenance release includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, R, Spark SQL, and MLlib.

Spark 1.5.0 released

September 9, 2015

We are happy to announce the availability of Spark 1.5.0! Spark 1.5.0 is the sixth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 230 developers and more than 1,400 commits!

Spark Summit Europe agenda posted

September 7, 2015

The agenda for Spark Summit Europe is now posted, with 38 talks from organizations including Barclays, Netflix, Elsevier, Intel and others. This inaugural Spark conference in Europe will run October 27th-29th 2015 in Amsterdam and feature a full program of speakers along with Spark training opportunities. More details are available on the Spark Summit Europe website, where you can also register to attend.

Spark 1.4.1 released

July 15, 2015

We are happy to announce the availability of Spark 1.4.1! This is a maintenance release that includes contributions from 85 developers. Spark 1.4.1 includes fixes across several areas of Spark, including the DataFrame API, Spark Streaming, PySpark, Spark SQL, and MLlib.

Spark Summit 2015 Videos Posted

June 29, 2015

The videos and slides for Spark Summit 2015 are now all available online! The talks include technical roadmap discussions, deep dives on Spark components, and use cases built on top of Spark.

Spark 1.4.0 released

June 11, 2015

We are happy to announce the availability of Spark 1.4.0! Spark 1.4.0 is the fifth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 210 developers and more than 1,000 commits!

Announcing Spark Summit Europe

May 15, 2015

Abstract submissions are now open for the first-ever Spark Summit Europe. The event will take place on October 27th to 29th in Amsterdam. Submissions are welcome across a variety of Spark-related topics, including use cases and ongoing development.

One month to Spark Summit 2015 in San Francisco

May 15, 2015

There is one month left until Spark Summit 2015, which will be held in San Francisco on June 15th to 17th. The Summit will contain presentations from over 50 organizations using Spark, focused on use cases and ongoing development.

Spark Summit East 2015 Videos Posted

April 20, 2015

The videos and slides for Spark Summit East 2015 are now all available online. Watch them to get the latest news from the Spark community as well as use cases and applications built on top.

Spark 1.2.2 and 1.3.1 released

April 17, 2015

We are happy to announce the availability of Spark 1.2.2 and Spark 1.3.1! These are both maintenance releases that collectively feature the work of more than 90 developers.

Spark 1.3.0 released

March 13, 2015

We are happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 174 developers and more than 1,000 commits!

Spark 1.2.1 released

February 9, 2015

We are happy to announce the availability of Spark 1.2.1! This is a maintenance release that includes contributions from 69 developers. Spark 1.2.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.

Spark Summit East agenda posted, CFP open for West

January 21, 2015

The agenda for Spark Summit East is now posted, with 38 talks from organizations including Goldman Sachs, Baidu, Salesforce, Novartis, Cisco and others. This inaugural Spark conference on the US East Coast will run March 18th-19th 2015 in New York City. More details are available on the Spark Summit East website, where you can also register to attend.

Spark 1.2.0 released

December 18, 2014

We are happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 172 developers and more than 1,000 commits!

Spark 1.1.1 released

November 26, 2014

We are happy to announce the availability of Spark 1.1.1! This is a maintenance release that includes contributions from 55 developers. Spark 1.1.1 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, SQL, GraphX, and MLlib.

Registration open for Spark Summit East 2015

November 26, 2014

Registration is now open for Spark Summit East 2015, to be held on March 18th and 19th in New York City. The conference will be a great chance to meet people from throughout the Spark community as well as attend training workshops on Spark. If you haven’t been to previous Spark Summits, you can find content from previous events on the Spark Summit website.

Spark wins Daytona Gray Sort 100TB Benchmark

November 5, 2014

We are proud to announce that Spark won the 2014 Gray Sort Benchmark (Daytona 100TB category). A team from Databricks, including Spark committers Reynold Xin, Xiangrui Meng, and Matei Zaharia, entered the benchmark using Spark. Spark tied with the Themis team from UCSD, and the two jointly set a new world record in sorting.

Submissions open for Spark Summit East 2015 in New York

October 18, 2014

After successful events in the past two years, the Spark Summit conference has expanded for 2015, offering both an event in New York on March 18-19 and one in San Francisco on June 15-17. The conference is a great chance to meet people from throughout the Spark community and see the latest news, tips and use cases.

Spark 1.1.0 released

September 11, 2014

We are happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark’s largest release ever, with contributions from 171 developers!

Spark 1.0.2 released

August 5, 2014

We are happy to announce the availability of Spark 1.0.2! This release includes contributions from 30 developers. Spark 1.0.2 includes fixes across several areas of Spark, including the core API, Streaming, PySpark, and MLlib.

Spark 0.9.2 released

July 23, 2014

We are happy to announce the availability of Spark 0.9.2! Apache Spark 0.9.2 is a maintenance release with bug fixes. We recommend that all 0.9.x users upgrade to this stable release. Contributions to this release came from 28 developers.

Spark Summit 2014 videos posted

July 18, 2014

The videos and slides for Spark Summit 2014 are now all available online. Watch them to see the latest news from the Spark community as well as use cases and applications built on top. In addition, training materials from the Summit, including hands-on exercises, are all available freely as well.

Spark 1.0.1 released

July 11, 2014

We are happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark’s (alpha) SQL library, including support for JSON data and performance and stability fixes.

Two weeks to Spark Summit 2014

June 16, 2014

There are now two weeks left to Spark Summit 2014, which will be held in San Francisco on June 30th to July 2nd. The Summit will contain presentations from over 50 organizations using Spark, focused on use cases and ongoing development.

Spark 1.0.0 released

May 30, 2014

We are happy to announce the availability of Spark 1.0.0! Spark 1.0.0 is the first in the 1.X line of releases, providing API stability for Spark’s core interfaces. It is Spark’s largest release ever, with contributions from 117 developers. This release expands Spark’s standard libraries, introducing a new SQL package (Spark SQL) that lets users integrate SQL queries into existing Spark workflows. MLlib, Spark’s machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark’s core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements.

Spark Summit agenda posted

May 11, 2014

The agenda for the Spark Summit 2014 conference is now available online. With talks from more than 50 organizations, it will be the biggest Spark event yet, bringing the developer and user communities together. Join us in person or tune in online to learn about the latest happenings in Spark.

Spark 0.9.1 released

April 9, 2014

We are happy to announce the availability of Spark 0.9.1! Apache Spark 0.9.1 is a maintenance release with bug fixes, performance improvements, better stability with YARN and improved parity of the Scala and Python APIs. We recommend that all 0.9.0 users upgrade to this stable release. Contributions to this release came from 37 developers.

Submissions and registration open for Spark Summit 2014

March 20, 2014

After last year’s successful first Spark Summit, registrations and talk submissions are now open for Spark Summit 2014. This will be a 3-day event in San Francisco organized by multiple companies in the Spark community. The event will run June 30th to July 2nd in San Francisco, CA.

Spark becomes top-level Apache project

February 27, 2014

The Apache Software Foundation announced today that Spark has graduated from the Apache Incubator to become a top-level Apache project, signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles. This is a major step for the community and we are very proud to share this news with users as we complete Spark’s move to Apache. Read more about Spark’s growth during the past year and from contributors and users in the ASF’s press release.

Spark 0.9.0 released

February 2, 2014

We are happy to announce the availability of Spark 0.9.0! Spark 0.9.0 is a major release and Spark’s largest release ever, with contributions from 83 developers. This release expands Spark’s standard libraries, introducing a new graph computation package (GraphX) and adding several new features to the machine learning and stream-processing packages. It also makes major improvements to the core engine, including external aggregations, a simplified H/A mode for long-lived applications, and hardened YARN support.

Spark 0.8.1 released

December 19, 2013

We’ve just posted Spark Release 0.8.1, a maintenance and performance release for the Scala 2.9 version of Spark. 0.8.1 includes support for YARN 2.2, a high availability mode for the standalone scheduler, optimizations to the shuffle, and many other improvements. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.

Spark Summit 2013 is a Wrap

December 15, 2013

The Spark Summit 2013, held in early December 2013 in downtown San Francisco, was a success! Over 450 Spark developers and enthusiasts from 13 countries and more than 180 companies came to learn from project leaders and production users of Spark, Shark, Spark Streaming and related projects about use cases, recent developments, and the Spark community roadmap.

Announcing the first Spark Summit: December 2, 2013

October 8, 2013

We are excited to announce the first Spark Summit on Dec 2, 2013 in Downtown San Francisco. Come hear from key production users of Spark, Shark, Spark Streaming and related projects. Also find out where the development is going, and learn how to use the Spark stack in a variety of applications. The summit is being organized and sponsored by leading organizations in the Spark community.

Spark 0.8.0 released

September 25, 2013

We’re proud to announce the release of Apache Spark 0.8.0. Spark 0.8.0 is a major release that includes many new capabilities and usability improvements. It’s also our first release under the Apache Incubator. It is the largest Spark release yet, with contributions from 67 developers and 24 companies. Major new features include an expanded monitoring framework and UI, a machine learning library, and support for running Spark inside of YARN.

Spark user survey and "Powered By" page

September 5, 2013

As we continue developing Spark, we would love to get feedback from users and hear what you’d like us to work on next. We’ve decided that a good way to do that is a survey – we hope to run this at regular intervals. If you have a few minutes to participate, fill in the survey here. Your time is greatly appreciated.

Fourth Spark screencast released

August 27, 2013

We have released the next screencast, A Standalone Job in Scala, which takes you beyond the Spark shell and helps you write your first standalone Spark job.

Registration open for AMP Camp training camp in Berkeley

July 23, 2013

Want to learn how to use Spark, Shark, GraphX, and related technologies in person? The AMP Lab is hosting a two-day training workshop on these technologies on August 29th and 30th in Berkeley. The workshop will include tutorials, talks from users, and over four hours of hands-on exercises. Registration is now open on the AMP Camp website, for a price of $250 per person. We recommend signing up early because last year’s workshop was sold out.

Spark mailing lists moving to Apache

July 22, 2013

As part of the Spark project's recent move to Apache, we are planning to migrate the mailing lists to Apache infrastructure this month, so that the existing Google groups will become read-only on September 1, 2013. To keep receiving updates about Spark or to participate in development discussions, please subscribe to the following lists:

user@spark.incubator.apache.org -- for usage questions, help, and announcements. (subscribe) (archives)
dev@spark.incubator.apache.org -- for people who want to contribute code to Spark. (subscribe) (archives)

Most users will probably want the User list, but individuals interested in contributing code to the project should also subscribe to the Dev list.

Spark 0.7.3 released

July 16, 2013

We’ve just posted Spark Release 0.7.3, a maintenance release that contains several fixes, including streaming API updates and new functionality for adding JARs to a spark-shell session. We recommend that all users update to this release. Visit the release notes to read about the new features, or download the release today.

Spark featured in Wired

June 21, 2013

Spark, its creators at the AMP Lab, and some of its users were featured in a Wired Enterprise article a few days ago. Read on to learn a little about how Spark is being used in industry.

Spark accepted into Apache Incubator

June 21, 2013

Spark was recently accepted into the Apache Incubator, which will serve as the long-term home for the project. While moving the source code and issue tracking to Apache will take some time, we are excited to be joining the community at Apache. Stay tuned on this site for updates on how the project hosting will change.

Spark 0.7.2 released

June 2, 2013

We’re happy to announce the release of Spark 0.7.2, a new maintenance release that includes several bug fixes and improvements, as well as new code examples and API features. We recommend that all users update to this release. Head over to the release notes to read about the new features, or download the release today.

Spark screencasts published

April 16, 2013

We have released the first two screencasts in a series of short hands-on video training courses we will be publishing to help new users get up and running with Spark in minutes.

Strata exercises now available online

March 17, 2013

At this year’s Strata conference, the AMP Lab hosted a full day of tutorials on Spark, Shark, and Spark Streaming, including online exercises on Amazon EC2. Those exercises are now available online, letting you learn Spark and Shark at your own pace on an EC2 cluster with real data. They are a great resource for learning the systems. You can also find slides from the Strata tutorials online, as well as videos from the AMP Camp workshop we held at Berkeley in August.

Spark 0.7.0 released

February 27, 2013

We’re proud to announce the release of Spark 0.7.0, a new major version of Spark that adds several key features, including a Python API for Spark and an alpha of Spark Streaming. This release is the result of the largest group of contributors yet behind a Spark release – 31 contributors from inside and outside Berkeley. Head over to the release notes to read more about the new features, or download the release today.

Spark/Shark Tutorial for Amazon EMR

February 24, 2013

This weekend, Amazon posted an article and code that make it easy to launch Spark and Shark on Elastic MapReduce. The article includes examples of how to run both interactive Scala commands and SQL queries from Shark on data in S3. Head over to the Amazon article for details. We’re very excited because, to our knowledge, this makes Spark the first non-Hadoop engine that you can launch with EMR.

Spark 0.6.2 released

February 7, 2013

We recently released Spark 0.6.2, a new version of Spark. This is a maintenance release that includes several bug fixes and usability improvements (see the release notes). We recommend that all users upgrade to this release.

Spark tips from Quantifind

January 12, 2013

Quantifind, one of the Bay Area companies that has been using Spark for predictive analytics, recently posted two useful entries on working with Spark in their tech blog.

Thanks for sharing this, and we look forward to seeing others!

Video up from first Spark development meetup

December 21, 2012

On December 18th, we held the first of a series of Spark development meetups, for people interested in learning the Spark codebase and contributing to the project. There was quite a bit more demand than we anticipated, with over 80 people signing up and 64 attending. The first meetup was an introduction to Spark internals. Thanks to one of the attendees, there’s now a video of the meetup on YouTube. We’ve also posted the slides. Look to see more development meetups on Spark and Shark in the future.

Spark in the news

December 21, 2012

Recently, we’ve seen quite a bit of coverage of Spark in the news. I wanted to list some of the more recent articles, for readers interested in learning more.

Curt Monash, editor of the popular DBMS2 blog, wrote a great introduction to Spark and Shark, as well as a more detailed technical overview.
Silicon Angle covered Spark and Shark after our presentation at Amazon re:Invent.
Datanami highlighted Shark in its survey of big data research projects.
O'Reilly's Strata blog recently covered Spark, Shark, and the Spark 0.6 release.
DataInformed interviewed two Spark users and wrote about their applications in anomaly detection, predictive analytics and data mining.

In other news, there will be a full day of tutorials on Spark and Shark at the O’Reilly Strata conference in February. They include a three-hour introduction to Spark, Shark and BDAS on Tuesday morning, and a three-hour hands-on exercise session.

Spark 0.6.1 and 0.5.2 out

November 22, 2012

Today we’ve made available two maintenance releases for Spark: 0.6.1 and 0.5.2. They both contain important bug fixes as well as some new features, such as the ability to build against Hadoop 2 distributions. We recommend that users update to the latest version for their branch; for new users, we recommend 0.6.1.
