Name | Organization |
---|---|
Sameer Agarwal | |
Michael Armbrust | Databricks |
Joseph Bradley | Databricks |
Matthew Cheah | Palantir |
Felix Cheung | Uber |
Mosharaf Chowdhury | University of Michigan, Ann Arbor |
Bryan Cutler | IBM |
Jason Dai | Intel |
Tathagata Das | Databricks |
Ankur Dave | UC Berkeley |
Aaron Davidson | Databricks |
Thomas Dudziak | |
Erik Erlandson | Red Hat |
Robert Evans | Oath |
Wenchen Fan | Databricks |
Joseph Gonzalez | UC Berkeley |
Thomas Graves | Oath |
Stephen Haberman | |
Mark Hamstra | ClearStory Data |
Seth Hendrickson | Cloudera |
Herman van Hovell | Databricks |
Yin Huai | Databricks |
Shane Huang | Intel |
Dongjoon Hyun | Hortonworks |
Kazuaki Ishizaki | IBM |
Xingbo Jiang | Databricks |
Holden Karau | |
Shane Knapp | UC Berkeley |
Cody Koeninger | Nexstar Digital |
Andy Konwinski | Databricks |
Hyukjin Kwon | Hortonworks |
Ryan LeCompte | Quantifind |
Haoyuan Li | Alluxio |
Xiao Li | Databricks |
Yinan Li | |
Davies Liu | Juicedata |
Cheng Lian | Databricks |
Yanbo Liang | Hortonworks |
Sean McNamara | Oracle |
Xiangrui Meng | Databricks |
Mridul Muralidharan | Hortonworks |
Andrew Or | Princeton University |
Kay Ousterhout | LightStep |
Sean Owen | Databricks |
Tejas Patil | |
Nick Pentreath | IBM |
Anirudh Ramanathan | |
Imran Rashid | Cloudera |
Charles Reiss | University of Virginia |
Josh Rosen | Databricks |
Sandy Ryza | Remix |
Kousuke Saruta | NTT Data |
Saisai Shao | Tencent |
Prashant Sharma | IBM |
Ram Sriharsha | Databricks |
DB Tsai | Apple |
Takuya Ueshin | Databricks |
Marcelo Vanzin | Cloudera |
Shivaram Venkataraman | University of Wisconsin, Madison |
Zhenhua Wang | Huawei |
Patrick Wendell | Databricks |
Andrew Xia | Alibaba |
Reynold Xin | Databricks |
Takeshi Yamamuro | NTT |
Burak Yavuz | Databricks |
Matei Zaharia | Databricks, Stanford |
Shixiong Zhu | Databricks |
To get started contributing to Spark, learn how to contribute – anyone can submit patches, documentation and examples to the project.
The PMC regularly adds new committers from the active contributors, based on their contributions to Spark. The qualifications for new committers include:
The type and level of contributions considered may vary by project area – for example, we greatly encourage contributors who want to work mainly on the documentation, or mainly on platform support for specific OSes, storage systems, etc.
The PMC also adds new PMC members. PMC members are expected to carry out PMC responsibilities as described in Apache Guidance, including helping vote on releases, enforce Apache project trademarks, take responsibility for legal and license issues, and ensure the project follows Apache project mechanics. The PMC periodically adds committers to the PMC who have shown they understand and can help with these activities.
All contributions should be reviewed before merging as described in Contributing to Spark. In particular, if you are working on an area of the codebase you are unfamiliar with, look at the Git history for that code to see who reviewed patches before. You can do this using `git log --format=full <filename>`, by examining the “Commit” field to see who committed each patch.
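As a rough sketch of the history lookup above, the committer names can be tallied per file. The helper name `recent_committers` and the example path are illustrative only and are not part of Spark's tooling. `%cn` is git's format placeholder for the committer name, which is the same information the “Commit” field shows.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark's tooling): list the people who
# committed changes touching a path, most frequent first.
# %cn prints the committer name, i.e. the "Commit" field that
# `git log --format=full` displays.
recent_committers() {
    path="$1"
    count="${2:-5}"   # how many names to show; defaults to 5
    git log --format='%cn' -- "$path" | sort | uniq -c | sort -rn | head -n "$count"
}

# Example (the path is only an illustration):
# recent_committers core/src/main/scala/org/apache/spark/SparkContext.scala
```

The committer is the person who merged the patch, which is not always the person who reviewed it most closely, so treat the output as a starting point for finding reviewers.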
Changes pushed to the master branch on Apache cannot be removed; that is, we can’t force-push to it. So please don’t add any test commits or anything like that, only real patches.
To use the `merge_spark_pr.py` script described below, you will need to add a git remote called `apache` at `https://github.com/apache/spark`, as well as one called `apache-github` at `git://github.com/apache/spark`. You will likely also have a remote `origin` pointing to your fork of Spark, and `upstream` pointing to the `apache/spark` GitHub repo.

If correct, your `git remote -v` should look like:
apache https://github.com/apache/spark.git (fetch)
apache https://github.com/apache/spark.git (push)
apache-github git://github.com/apache/spark (fetch)
apache-github git://github.com/apache/spark (push)
origin https://github.com/[your username]/spark.git (fetch)
origin https://github.com/[your username]/spark.git (push)
upstream https://github.com/apache/spark.git (fetch)
upstream https://github.com/apache/spark.git (push)
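One way to reach that configuration is sketched below. The function names `add_remote_if_missing` and `setup_merge_remotes` are hypothetical. The URLs are the ones listed above. Adding a remote does not contact the network, so this is safe to run before authentication is set up.

```shell
#!/bin/sh
# Sketch: add the two remotes the merge script expects, skipping any that
# are already configured. Both function names are hypothetical.
add_remote_if_missing() {
    name="$1"
    url="$2"
    if git config --get "remote.$name.url" >/dev/null; then
        echo "remote '$name' already configured"
    else
        git remote add "$name" "$url"
    fi
}

setup_merge_remotes() {
    add_remote_if_missing apache https://github.com/apache/spark.git
    add_remote_if_missing apache-github git://github.com/apache/spark
    git remote -v   # print the result for comparison with the listing above
}

# Run from inside your Spark clone:
# setup_merge_remotes
```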
For the `apache` repo, you will need to set up command-line authentication to GitHub. This may include setting up an SSH key and/or personal access token. See:

Ask `dev@spark.apache.org` if you have trouble with these steps, or want help doing your first merge.
All merges should be done using the `dev/merge_spark_pr.py` script, which squashes the pull request’s changes into one commit.
The script is fairly self-explanatory and walks you through steps and options interactively.
If you want to amend a commit before merging (which should be done only for trivial touch-ups), simply let the script wait at the point where it asks you if you want to push to Apache. Then, in a separate window, modify the code and push a commit. Run `git rebase -i HEAD~2` and “squash” your new commit. When the editor then opens the combined commit message, remove your own commit message from it. You can verify the result is one change with `git log`. Then resume the script in the other window.
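The verification step can be made less error-prone with a small check, sketched here. `commits_ahead_of` is a hypothetical helper name, and using `apache/master` as the base is an assumption about how your checkout relates to the remote. Adjust the base ref to whatever the merge branch was created from.

```shell
#!/bin/sh
# Sketch: count the commits reachable from HEAD but not from a base ref,
# to confirm the squash left exactly one change.
# `commits_ahead_of` is a hypothetical helper name.
commits_ahead_of() {
    base="$1"
    git rev-list --count "$base"..HEAD
}

# The interactive steps from the text, for reference:
#   git rebase -i HEAD~2   # mark the newer commit as "squash" in the editor
#   git log                # confirm a single combined commit remains
#
# Example check, assuming the base is the apache remote's master branch:
# [ "$(commits_ahead_of apache/master)" -eq 1 ] && echo "one commit, ready to resume"
```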
Also, please remember to set Assignee on JIRAs where applicable when they are resolved. The script can do this automatically in most cases. However, if the contributor is not yet part of the Contributors group for the Spark project in ASF JIRA, the automatic assignment won’t work until they are added. Ask an admin to add the person to Contributors at https://issues.apache.org/jira/plugins/servlet/project-config/SPARK/roles .
Once a PR is merged, please leave a comment on the PR stating which branch(es) it has been merged into.
From `pwendell`:
The trade-off when backporting is that you get to deliver the fix to people running older versions (great!), but you risk introducing new or even worse bugs in maintenance releases (bad!). The decision point is when you have a bug fix and it’s not clear whether it is worth backporting.
I think the following facets are important to consider:
For me, the consequence of these is that we should backport in the following situations:
We tend to avoid backports in the converse situations: