Understanding HBase

Put simply, HBase is a distributed datastore in which the basic container for data is called a table, as in the RDBMS world. A record in an HBase table consists of a RowKey, ColumnFamily, ColumnQualifier, Timestamp and Value.

RowKey - The unique identifier for a record.

ColumnFamily - A way to group related columns/attributes together.

ColumnQualifier - The actual attribute/field that holds a value.

Timestamp - Each value is associated with a timestamp, which acts as its version. HBase can keep several versions of a value, ordered by timestamp; older releases keep 3 versions by default (from 0.96 onward the default is 1).

Value - The value of the attribute/field.
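
To make the five parts concrete, here is a minimal in-memory sketch in Python. It is not how HBase is implemented; ToyTable and its methods are hypothetical names used only for illustration, and max_versions defaults to 3 to match the figure quoted above (the real default depends on the HBase version).

```python
import time
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase table (illustrative only, not an HBase API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)      # column families are fixed up front
        self.max_versions = max_versions   # assumption: 3, as quoted in the text
        # row key -> "FAMILY:QUALIFIER" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp=None):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = int(time.time() * 1000) if timestamp is None else timestamp
        versions = self.rows[row_key][column]
        # A write with an existing timestamp replaces that version.
        versions[:] = [tv for tv in versions if tv[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # drop versions beyond the limit

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it was never written."""
        versions = self.rows.get(row_key, {}).get(column, [])
        return versions[0][1] if versions else None


table = ToyTable(["WORKING", "OFF"])
for ts, hours in enumerate(["7", "8", "9", "10"], start=1):
    table.put("20131103:101", "WORKING:OFFICE", hours, timestamp=ts)
print(table.get("20131103:101", "WORKING:OFFICE"))
```

After four writes to the same cell only the three newest versions remain, and a read returns the most recent one.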

Say we need to store employee attendance details in HBase. On any given day an employee is either working or off. If working, the employee is either at the office or working from home. If off, it could be a public holiday, a vacation day, or a comp-off taken against some other date.

In JSON, the attendance data for employee 101 on a day spent in the office would look like this:

{"101":{
    "working":{
        "home":"",
        "office":"9"
    },
    "off":{
        "vacation":"",
        "publicHoliday":"",
        "comp-off":""
    }
}}


Let us look at the same data from the HBase perspective.

Say the table name is 'ATTENDANCE'.

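RowKey - <date>:101
ColumnFamilies (CF) - working, off
ColumnQualifiers (CQ) - home, office, vacation, publicHoliday, comp-off
Value - 9 for the CQ office

Note that HBase names are case-sensitive; the shell commands below use the uppercase forms (WORKING, OFF, OFFICE, VACATION).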
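
The mapping above can be sketched in Python as a function that flattens the JSON record into HBase-style cells. The function name record_to_cells is hypothetical, the date is passed in separately because the JSON does not carry it, and uppercasing the family and qualifier names is an assumption made to match the shell commands used in this post.

```python
import json


def record_to_cells(date, record_json):
    """Flatten {"<emp id>": {"<family>": {"<qualifier>": "<value>"}}} into
    (row_key, "FAMILY:QUALIFIER", value) tuples, skipping empty values."""
    cells = []
    for emp_id, families in json.loads(record_json).items():
        row_key = f"{date}:{emp_id}"          # RowKey layout: <date>:<employee id>
        for family, qualifiers in families.items():
            for qualifier, value in qualifiers.items():
                if value == "":
                    continue                  # empty attribute: no cell to write
                column = f"{family.upper()}:{qualifier.upper()}"
                cells.append((row_key, column, value))
    return cells


RECORD = (
    '{"101": {"working": {"home": "", "office": "9"}, '
    '"off": {"vacation": "", "publicHoliday": "", "comp-off": ""}}}'
)
print(record_to_cells("20131103", RECORD))
# -> [('20131103:101', 'WORKING:OFFICE', '9')]
```

Only the office attribute carries a value, so the whole JSON record reduces to a single cell.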

HBase Shell - Create the ATTENDANCE table in HBase

HBase Shell is a command-line interface to HBase that provides commands to interact with it. To start it, type hbase shell on the terminal.

To create the table we would issue the following command.

create 'ATTENDANCE' , 'WORKING' , 'OFF'

For employee 101 working 9 hours in the office on 20131103, we put the data as below

put 'ATTENDANCE' , '20131103:101' , 'WORKING:OFFICE' , '9'

In case he takes a vacation on 20131104, we put the data as below

put 'ATTENDANCE' , '20131104:101' , 'OFF:VACATION' , '9'
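
When there are many records, typing each put by hand gets tedious. As a rough sketch, the commands can be generated as text and pasted into the shell. The helper put_command below is a hypothetical name, not part of HBase, and it assumes the values contain no single quotes (it does no escaping).

```python
def put_command(table, date, emp_id, family, qualifier, value):
    """Render one HBase shell put line for the <date>:<employee id> row key."""
    row_key = f"{date}:{emp_id}"
    return f"put '{table}', '{row_key}', '{family}:{qualifier}', '{value}'"


# The two examples from this post: a day in the office, then a vacation day.
entries = [
    ("20131103", "WORKING", "OFFICE", "9"),
    ("20131104", "OFF", "VACATION", "9"),
]
for date, family, qualifier, value in entries:
    print(put_command("ATTENDANCE", date, "101", family, qualifier, value))
```

This prints one put line per entry, with the same table, row keys, columns and values as the commands above.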











