HBase

HBase is a distributed, column-oriented NoSQL database that runs on top of the Hadoop framework and normally stores its data in HDFS.

In HBase a table has rows and column families, each of which groups several columns together. It resembles a normal SQL table in which related columns are bundled into named groups.

The column families must be defined as part of the table schema, while the columns within them can be created on demand.
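To make the row / column family / column relationship concrete, here is a minimal sketch in plain Java. It is only a conceptual model built from nested sorted maps, not the HBase client API: the class name DataModelSketch and its put/get methods are made up for illustration, values are plain strings (real HBase stores byte arrays), and the timestamped cell versions that HBase keeps are left out.

```java
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual model only, NOT the HBase client API (all names here are hypothetical).
// Real HBase stores byte arrays and timestamped cell versions; both are omitted.
class DataModelSketch {
    // Column families are fixed when the table is created.
    private final Set<String> families;
    // row key -> column family -> column qualifier -> value, each level sorted by key.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> rows =
            new TreeMap<>();

    public DataModelSketch(String... families) {
        this.families = new TreeSet<>(Arrays.asList(families));
    }

    // Stores a value under "family:qualifier"; the qualifier is created on demand.
    public void put(String row, String column, String value) {
        String[] parts = column.split(":", 2);
        String family = parts[0];
        String qualifier = parts.length > 1 ? parts[1] : "";
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns the stored value, or null if the row or column does not exist.
    public String get(String row, String column) {
        String[] parts = column.split(":", 2);
        String qualifier = parts.length > 1 ? parts[1] : "";
        NavigableMap<String, NavigableMap<String, String>> familiesOfRow = rows.get(row);
        if (familiesOfRow == null) {
            return null;
        }
        NavigableMap<String, String> columns = familiesOfRow.get(parts[0]);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        DataModelSketch people = new DataModelSketch("personal data", "professional data");
        people.put("42", "personal data:name", "Mr. John Doe");
        people.put("42", "professional data:company", "Example inc.");
        System.out.println(people.get("42", "personal data:name"));
        try {
            // Fails: the family "hobbies" was not part of the table definition.
            people.put("42", "hobbies:sport", "chess");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is the asymmetry: a new column inside a known family just appears on the first put, while an unknown family is rejected because it is not part of the schema.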

Download HBase, extract the folder somewhere, add its bin folder to the PATH, and set the JAVA_HOME variable.

Set at least the JAVA_HOME value in conf/hbase-env.sh within the installation folder. In conf/hbase-site.xml set the following to folders of your choice (/tmp is usually cleared on reboot, so you may prefer another location).

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/zookeeper</value>
  </property>
</configuration>

Test if it works

# hbase

Now start it

# start-hbase.sh

This failed for me with

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss FOR /hbase/hbaseid

It helped to make sure that JAVA_HOME is set in hbase-env.sh and that hbase-site.xml points to valid folders (ones that do not exist yet but can be created), and then to stop and start HBase again.

The following commands are run inside the HBase shell (start it with hbase shell). Create the table 'people' with the column families 'personal data' and 'professional data'. There are no columns in them yet.

# create 'people', 'personal data', 'professional data'

Show the new table

# describe 'people'

List of all tables

# list

Count the rows in the table

# count 'people'

Put some entries into the table. The columns are created on the fly within the specified column families.

# put 'people','42','personal data:name','Mr. John Doe'
# put 'people','42','personal data:city','New York'
# put 'people','42','professional data:company','Example inc.'

List the whole table

# scan 'people'

Get the row with the named id

# get 'people', '42'
COLUMN                          CELL
 personal data:city             timestamp=1431089145062, value=New York
 personal data:name             timestamp=1431089144977, value=Mr. John Doe
 professional data:company      timestamp=1431089146357, value=Example inc.
3 row(s) in 0.0160 seconds

Delete a full row

# deleteall 'people', 'abc'

Disable a table, required before you can delete it

# disable 'test'

Enable a disabled table again

# enable 'test'

Drop table

# drop 'test'

Read commands line by line from a file

./hbase shell ./MyCommands.txt

Table variables

Normally you would create and use a table like this

# create 'myTable', 'colA', 'colB'
# put 'myTable', 'id', 'colA', 'value'
# scan 'myTable'
# describe 'myTable'
# disable 'myTable'
# drop 'myTable'

You can also assign the table to a variable when creating it and use that variable instead, which saves you from repeating the table name in every command

# t = create 'myTable', 'colA', 'colB'
# t.put 'id', 'colA', 'value'
# t.scan
# t.describe
# t.disable
# t.drop

You can also get such a variable from an existing table

# t2 = get_table 'myTable'

You can even fetch more than one table and issue commands for each of them

# tables = list('MyT.*')
# tables.map { |t| describe t; scan t }

HBase Filter

From the HBase Shell

# scan 'my_table',{ FILTER => "PageFilter(10)" }

From Java code

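import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PageFilter;
final Filter filter = new PageFilter(10);
final Scan scan = new Scan();
scan.setFilter(filter);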

Converting between dates and HBase timestamps in the shell

# import java.text.SimpleDateFormat
# import java.text.ParsePosition
# SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime()
1218920189000
# import java.util.Date
# Date.new(1218920189000).toString()
"Sat Aug 16 20:56:29 UTC 2008"
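The same conversion can be done outside the shell in plain Java, since HBase cell timestamps are by default milliseconds since the Unix epoch. The sketch below is an illustration under assumptions, not part of HBase: the class name TimestampSketch and its two helper methods are made up, and the time zone is pinned to UTC explicitly, whereas the shell session above depends on the JVM's default time zone (which happened to be UTC).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class; only java.text and java.util are used.
class TimestampSketch {
    private static final String PATTERN = "yy/MM/dd HH:mm:ss";

    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    private static SimpleDateFormat utcFormat() {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: timestamps are read as UTC
        return format;
    }

    // Parses text such as "08/08/16 20:56:29" into milliseconds since the epoch.
    public static long toEpochMillis(String text) {
        try {
            return utcFormat().parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not in format " + PATTERN + ": " + text, e);
        }
    }

    // Formats milliseconds since the epoch back into the same text pattern.
    public static String toText(long epochMillis) {
        return utcFormat().format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long millis = toEpochMillis("08/08/16 20:56:29");
        System.out.println(millis);         // 1218920189000
        System.out.println(toText(millis)); // 08/08/16 20:56:29
    }
}
```

Pinning the time zone makes the result reproducible on any machine; without it the same text would map to a different timestamp depending on where the code runs.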