Most development is done on the master branch, which is named
master
in the Git repository. Previously, HBase used Subversion, in
which the master branch was called TRUNK
. Branches exist for minor
releases, and important features and bug fixes are often back-ported.
Each maintained release branch has a release manager, who volunteers to coordinate new features and bug fixes are backported to that release. The release managers are committers. If you would like your feature or bug fix to be included in a given release, communicate with that release manager. If this list goes out of date or you can't reach the listed person, reach out to someone else on the list.
End-of-life releases are not included in this list.
See Section 18.3.1.1, “Code Formatting” and Section 18.10.3.2, “Code Formatting Conventions”.
Interfaces are classified both by audience and by stability level. These labels appear at the head of a class. The conventions followed by HBase are inherited by its parent project, Hadoop.
The following interface classifications are commonly used:
@InterfaceAudience
@InterfaceAudience.Public
APIs for users and HBase applications. These APIs will be deprecated through major versions of HBase.
@InterfaceAudience.Private
APIs for HBase internals developers. No guarantees on
compatibility or availability in future versions. Private interfaces
do not need an @InterfaceStability
classification.
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)
APIs for HBase coprocessor writers. As of HBase 0.92/0.94/0.96/0.98 this api is still unstable. No guarantees on compatibility with future versions.
@InterfaceAudience
ClassificationPackages without an @InterfaceAudience
label are
considered private. Mark your new packages if publicly
accessible.
Only interfaces classified @InterfaceAudience.Public
should
be included in API documentation (Javadoc). Committers must add new package
excludes ExcludePackageNames
section of the
pom.xml
for new packages which do not contain
public classes.
@InterfaceStability
@InterfaceStability
is important for packages marked
@InterfaceAudience.Public
.
@InterfaceStability.Stable
Public packages marked as stable cannot be changed without a deprecation path or a very good reason.
@InterfaceStability.Unstable
Public packages marked as unstable can be changed without a deprecation path.
@InterfaceStability.Evolving
Public packages marked as evolving may be changed, but it is discouraged.
@InterfaceStability
LabelPublic classes with no @InterfaceStability
label are
discouraged, and should be considered implicitly unstable.
If you are unclear about how to mark packages, ask on the development list.
Please adhere to the following guidelines so that your patches can be reviewed more quickly. These guidelines have been developed based upon common feedback on patches from new contributors.
See the Code Conventions for the Java Programming Language for more information on coding conventions in Java.
Do not use extra spaces around brackets. Use the second style, rather than the first.
if ( foo.equals( bar ) ) { // don't do this
if (foo.equals(bar)) {
foo = barArray[ i ]; // don't do this
foo = barArray[i];
Auto-generated code in Eclipse often uses bad variable names such as
arg0
. Use more informative variable names. Use code
like the second example here.
public void readFields(DataInput arg0) throws IOException { // don't do this foo = arg0.readUTF(); // don't do this
public void readFields(DataInput di) throws IOException { foo = di.readUTF();
Keep lines less than 100 characters. You can configure your IDE to do this automatically.
Bar bar = foo.veryLongMethodWithManyArguments(argument1, argument2, argument3, argument4, argument5, argument6, argument7, argument8, argument9); // don't do this
Bar bar = foo.veryLongMethodWithManyArguments( argument1, argument2, argument3,argument4, argument5, argument6, argument7, argument8, argument9);
Trailing spaces are a common problem. Be sure there is a line break after the end of your code, and avoid lines with nothing but whitespace. This makes diffs more meaningful. You can configure your IDE to help with this.
Bar bar = foo.getBar(); <--- imagine there is an extra space(s) after the semicolon.
This is also a very common feedback item. Don't forget Javadoc!
Javadoc warnings are checked during precommit. If the precommit tool gives you a '-1', please fix the javadoc issue. Your patch won't be committed if it adds such warnings.
Findbugs
is used to detect common bugs pattern. It is checked
during the precommit build by Apache's Jenkins. If errors are found, please
fix them. You can run findbugs locally with mvn
findbugs:findbugs, which will generate the
findbugs
files locally. Sometimes, you may have to write
code smarter than findbugs
. You can annotate your code to tell
findbugs
you know what you're doing, by annotating your
class with the following annotation:
@edu.umd.cs.findbugs.annotations.SuppressWarnings( value="HE_EQUALS_USE_HASHCODE", justification="I know what I'm doing")
It is important to use the Apache-licensed version of the annotations.
Don't just leave the @param arguments the way your IDE generated them.:
/** * * @param bar <---- don't do this!!!! * @return <---- or this!!!! */ public Foo getFoo(Bar bar);
Either add something descriptive to the @param
and
@return
lines, or just remove them. The preference is to
add something descriptive and useful.
If you submit a patch for one thing, don't do auto-reformatting or unrelated reformatting of code on a completely different area of code.
Likewise, don't add unrelated cleanup or refactorings outside the scope of your Jira.
Make sure that you're clear about what you are testing in your unit tests and why.
In 0.96, HBase moved to protocol buffers (protobufs). The below section on Writables applies to 0.94.x and previous, not to 0.96 and beyond.
Every class returned by RegionServers must implement the
Writable
interface. If you are creating a new class that
needs to implement this interface, do not forget the default constructor.
We don't have many but what we have we list below. All are subject to challenge of course but until then, please hold to the rules of the road.
ZooKeeper state should transient (treat it like memory). If ZooKeeper state is deleted, hbase should be able to recover and essentially be in the same state.
Exceptions
There are currently a few exceptions that we need to fix around whether a table is enabled or disabled.
Replication data is currently stored only in ZooKeeper. Deleting
ZooKeeper data related to replication may cause replication to be
disabled. Do not delete the replication tree,
/hbase/replication/
.
Replication may be disrupted and data loss may occur if you delete
the replication tree (/hbase/replication/
) from
ZooKeeper. Follow progress on this issue at HBASE-10295.
If you are developing Apache HBase, frequently it is useful to test your changes against a more-real cluster than what you find in unit tests. In this case, HBase can be run directly from the source in local-mode. All you need to do is run:
${HBASE_HOME}/bin/start-hbase.sh
This will spin up a full local-cluster, just as if you had packaged up HBase and installed it on your machine.
Keep in mind that you will need to have installed HBase into your local maven repository for the in-situ cluster to work properly. That is, you will need to run:
mvn clean install -DskipTests
to ensure that maven can find the correct classpath and dependencies. Generally, the above command is just a good thing to try running first, if maven is acting oddly.
After adding a new feature a developer might want to add metrics. HBase exposes metrics using the Hadoop Metrics 2 system, so adding a new metric involves exposing that metric to the hadoop system. Unfortunately the API of metrics2 changed from hadoop 1 to hadoop 2. In order to get around this a set of interfaces and implementations have to be loaded at runtime. To get an in-depth look at the reasoning and structure of these classes you can read the blog post located here. To add a metric to an existing MBean follow the short guide below:
Inside of the source interface the corresponds to where the metrics are generated (eg MetricsMasterSource for things coming from HMaster) create new static strings for metric name and description. Then add a new method that will be called to add new reading.
Inside of the implementation of the source (eg. MetricsMasterSourceImpl in the above example) create a new histogram, counter, gauge, or stat in the init method. Then in the method that was added to the interface wire up the parameter passed in to the histogram.
Now add tests that make sure the data is correctly exported to the metrics 2 system. For this the MetricsAssertHelper is provided.
HBase moved to GIT from SVN. Until we develop our own documentation for how to contribute patches in our new GIT context, caveat the fact that we have a different branching model and that we don't currently do the merge practice described in the following, the accumulo doc on how to contribute and develop after our move to GIT is worth a read.
If you are new to submitting patches to open source or new to submitting patches to Apache, start by reading the On Contributing Patches page from Apache Commons Project. It provides a nice overview that applies equally to the Apache HBase Project.
Patching Workflow
Always patch against the master branch first, even if you want to patch in another branch. HBase committers always apply patches first to the master branch, and backport if necessary.
Submit one single patch for a fix. If necessary, squash local commits to merge local commits into a single one first. See this Stack Overflow question for more information about squashing commits.
The patch should have the JIRA ID in the name. If you are generating from a branch, include the target branch in the filename. A common naming scheme for patches is:
HBASE-XXXX
.patch
HBASE-XXXX
-0.90.patch # to denote that the patch is against branch 0.90
HBASE-XXXX
-v3.patch # to denote that this is the third version of the patch
To submit a patch, first create it using one of the methods in Methods to Create Patches. Next, attach the patch to the JIRA (one patch for the whole fix), using the → dialog. Next, click the button, which triggers the Hudson job which checks the patch for validity.
Please understand that not every patch may get committed, and that feedback will likely be provided on the patch.
If your patch is longer than a single screen, also attach a Review Board to the case. See Section 18.10.7.4, “ReviewBoard”.
If you need to revise your patch, leave the previous patch file(s)
attached to the JIRA, and upload the new one, following the naming
conventions in Section 18.10.7.1, “Create Patch”. Cancel the
Patch Available flag and then re-trigger it, by toggling the
button in JIRA. JIRA sorts
attached files by the time they were attached, and has no problem with
multiple attachments with the same name. However, at times it is easier
to refer to different version of a patch if you add
-vX
, where the X
is
the version (starting with 2).
If you need to submit your patch against multiple branches, rather than just master, name each version of the patch with the branch it is for, following the naming conventions in Section 18.10.7.1, “Create Patch”.
Methods to Create Patches
Select the
→ menu item.git format-patch
is preferred because it preserves
commit messages. Use git squash
first, to combine
smaller commits into a single larger one.
git format-patch --no-prefix origin/master --stdout > HBASE-XXXX.patch
git diff --no-prefix origin/master > HBASE-XXXX.patch
svn diff > HBASE-XXXX.patch
Make sure you review Section 18.3.1.1, “Code Formatting” and Section 18.10.3.2, “Code Formatting Conventions” for code style.
Yes, please. Please try to include unit tests with every code patch (and especially new classes and large changes). Make sure unit tests pass locally before submitting the patch.
Also, see Section 19.2, “Mockito”.
If you are creating a new unit test class, notice how other unit test classes have classification/sizing annotations at the top and a static method on the end. Be sure to include these in any new unit test files you generate. See Section 18.9, “Tests” for more on how the annotations work.
Significant new features should provide an integration test in addition to unit tests, suitable for exercising the new feature at different points in its configuration space.
Patches larger than one screen, or patches that will be tricky to review, should go through ReviewBoard.
Procedure 18.3. Use ReviewBoard
Register for an account if you don't already have one. It does not use the credentials from issues.apache.org. Log in.
Click New Review Request.
Choose the hbase-git
repository. Click Choose File
to select the diff and optionally a parent diff. Click .
Fill in the fields as required. At the minimum, fill in the
Summary and choose hbase
as
the Review Group. If you fill in the
Bugs field, the review board links back to the
relevant JIRA. The more fields you fill in, the better. Click
to make your review request public.
An email will be sent to everyone in the hbase
group,
to review the patch.
Back in your JIRA, click
→ → , and paste in the URL of your ReviewBoard request. This attaches the ReviewBoard to the JIRA, for easy access.To cancel the request, click
→ .For more information on how to use ReviewBoard, see the ReviewBoard documentation.
New committers are encouraged to first read Apache's generic committer documentation:
HBase committers should, as often as possible, attempt to review patches submitted by others. Ideally every submitted patch will get reviewed by a committer within a few days. If a committer reviews a patch they have not authored, and believe it to be of sufficient quality, then they can commit the patch, otherwise the patch should be cancelled with a clear explanation for why it was rejected.
The list of submitted patches is in the HBase Review Queue, which is ordered by time of last modification. Committers should scan the list from top to bottom, looking for patches that they feel qualified to review and possibly commit.
For non-trivial changes, it is required to get another committer to review
your own patches before commit. Use the +1
response from another committer before committing.
Patches which do not adhere to the guidelines in HowToContribute and to the code review checklist should be rejected. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for review.
Committers commit patches to the Apache HBase GIT repository.
Make sure your local configuration is correct, especially your identity and email. Examine the output of the $ git config --list command and be sure it is correct. See this GitHub article, Set Up Git if you need pointers.
When you commit a patch, please:
Include the Jira issue id in the commit message, along with a short description of the change and the name of the contributor if it is not you. Be sure to get the issue id right, as this causes Jira to link to the change in Subversion (use the issue's "All" tab to see these).
Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but please only set a single fix version, the earliest release in which the change will appear.
The commit message should contain the JIRA ID and a description of what the patch does. The preferred commit message format is:
<jira-id> <jira-title> (<contributor-name-if-not-commit-author>)
HBASE-12345 Fix All The Things (jane@example.com)
If the contributor used git format-patch to generate the patch, their commit message is in their patch and you can use that, but be sure the JIRA ID is at the front of the commit message, even if the contributor left it out.
We've established the practice of committing to trunk and then cherry
picking back to branches whenever possible. When there is a minor
conflict we can fix it up and just proceed with the commit. The
resulting commit retains the original author. When the amending author
is different from the original committer, add notice of this at the end
of the commit message as: Amending-Author: Author
<committer&apache>
See discussion at HBase, mail # dev
- [DISCUSSION] Best practice when amending commits cherry picked
from master to branch.
If a committer commits a patch, it is their responsibility to make sure it passes the test suite. It is helpful if contributors keep an eye out that their patch does not break the hbase build and/or tests, but ultimately, a contributor cannot be expected to be aware of all the particular vagaries and interconnections that occur in a project like HBase. A committer should.
In the thread HBase, mail # dev - ANNOUNCEMENT: Git Migration In Progress (WAS => Re: Git Migration), it was agreed on the following patch flow
Develop and commit the patch against trunk/master first.
Try to cherry-pick the patch when backporting if possible.
If this does not work, manually commit the patch to the branch.
Committers should hang out in the #hbase room on irc.freenode.net for real-time discussions. However any substantive discussion (as with any off-list project-related discussion) should be re-iterated in Jira or on the developer list.
Misspellings and/or bad grammar is preferable to the disruption a JIRA comment edit causes: See the discussion at Re:(HBASE-451) Remove HTableDescriptor from HRegionInfo