Use the bulk load tool if you can. See Section 9.8, “Bulk Loading”. Otherwise, pay attention to the below.
Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions. Be somewhat conservative in this, because too-many regions can actually degrade performance.
There are two different approaches to pre-creating splits. The first approach is to rely
        on the default HBaseAdmin strategy (which is implemented in
          Bytes.split)... 
byte[] startKey = ...;   	// your lowest key
byte[] endKey = ...;   		// your highest key
int numberOfRegions = ...;	// # of regions to create
admin.createTable(table, startKey, endKey, numberOfRegions);
      And the other approach is to define the splits yourself...
byte[][] splits = ...; // create your own splits admin.createTable(table, splits);
See Section 6.3.7, “Relationship Between RowKeys and Region Splits” for issues related to understanding your keyspace and pre-creating regions. See Section 9.7.5, “Manual Region Splitting” for discussion on manually pre-splitting regions.
 The default behavior for Puts using the Write Ahead Log (WAL) is that
          HLog edits will be written immediately. If deferred log flush is
        used, WAL edits are kept in memory until the flush period. The benefit is aggregated and
        asynchronous HLog- writes, but the potential downside is that if the
        RegionServer goes down the yet-to-be-flushed edits are lost. This is safer, however, than
        not using WAL at all with Puts. 
 Deferred log flush can be configured on tables via HTableDescriptor.
        The default value of hbase.regionserver.optionallogflushinterval is
        1000ms. 
When performing a lot of Puts, make sure that setAutoFlush is set to false on your HTable
        instance. Otherwise, the Puts will be sent one at a time to the RegionServer. Puts added via
           htable.add(Put) and  htable.add( <List> Put) wind up in
        the same write buffer. If autoFlush = false, these messages are not sent until
        the write-buffer is filled. To explicitly flush the messages, call
          flushCommits. Calling close on the
          HTable instance will invoke
        flushCommits.
A frequent request is to disable the WAL to increase performance of Puts. This is only appropriate for bulk loads, as it puts your data at risk by removing the protection of the WAL in the event of a region server crash. Bulk loads can be re-run in the event of a crash, with little risk of data loss.
If you disable the WAL for anything other than bulk loads, your data is at risk.
In general, it is best to use WAL for Puts, and where loading throughput is a concern to use bulk loading techniques instead. For normal Puts, you are not likely to see a performance improvement which would outweigh the risk. To disable the WAL, see Section 9.6.5.4, “Disabling the WAL”.
In addition to using the writeBuffer, grouping Puts by
        RegionServer can reduce the number of client RPC calls per writeBuffer flush. There is a
        utility HTableUtil currently on TRUNK that does this, but you can
        either copy that or implement your own version for those still on 0.90.x or earlier. 
When writing a lot of data to an HBase table from a MR job (e.g., with TableOutputFormat), and specifically where Puts are being emitted from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase.
For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the the above case.
If all your data is being written to one region at a time, then re-read the section on processing timeseries data.
Also, if you are pre-splitting regions and all your data is still winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a variety of reasons that regions may appear "well split" but won't work with your data. As the HBase client communicates directly with the RegionServers, this can be obtained via HTable.getRegionLocation.
See Section 14.8.2, “ Table Creation: Pre-Creating Regions ”, as well as Section 14.4, “HBase Configurations”