Saturday, August 29, 2015

First time learning Apache HBase

Today we will look at another big data technology: Apache HBase. Before we dip our toes in, let's find out what Apache HBase actually is.

Apache HBase [1] is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. [2]. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop [3].

In this article, we will set up a single node for this adventure. Before we begin, download a copy of Apache HBase from the project's download page. Once downloaded, extract the compressed archive. At the time of writing, I'm using Apache HBase version 1.1.1 for this learning experience.

 user@localhost:~/Desktop/hbase-1.1.1$ ls  
 bin CHANGES.txt conf     docs hbase-webapps lib LICENSE.txt NOTICE.txt README.txt  

If you have not installed Java, go ahead and install it. Pick a recent Java, or at least Java 7. Make sure the terminal reports the correct version of Java, for example:

 user@localhost:~/Desktop/hbase-1.1.1$ java -version  
 java version "1.7.0_55"  
 Java(TM) SE Runtime Environment (build 1.7.0_55-b13)  
 Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)  
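
If you want to check the "at least Java 7" requirement in a script rather than by eye, here is a minimal sketch. The helper name java_major is my own invention, not something that ships with HBase, and it only assumes the version line looks like the output above.

```shell
# Hypothetical helper: extract the major Java version from the first line of
# `java -version` output, e.g. 'java version "1.7.0_55"' gives 7.
java_major() {
  # Pull out the quoted version string.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.} ;;           # legacy scheme: 1.7.0_55 becomes 7.0_55
  esac
  printf '%s\n' "${ver%%[!0-9]*}"   # keep the leading digits only
}

java_major 'java version "1.7.0_55"'   # prints 7
```

On a real machine you could feed it the live output with java_major "$(java -version 2>&1 | head -n 1)", since java -version writes to standard error.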

If you cannot change the system-wide Java configuration, uncomment the JAVA_HOME variable in the HBase configuration file under conf/ and set it to the Java you installed. The main configuration file for HBase is conf/hbase-site.xml, and we will now edit this file so that it looks like the following. Adjust it for your environment as required.

 user@localhost:~/Desktop/hbase-1.1.1$ cat conf/hbase-site.xml   
 <?xml version="1.0"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <!--
 /**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements. See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership. The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License. You may obtain a copy of the License at
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
 -->
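
For a single node, hbase-site.xml usually tells HBase and its embedded ZooKeeper where to keep their data. The sketch below writes such a file into a scratch directory so you can inspect it before copying anything into conf/. The property names hbase.rootdir and hbase.zookeeper.property.dataDir are real HBase settings, but the data directory is an assumption of mine and not necessarily what I used on my machine.

```shell
# Sketch: write a minimal standalone hbase-site.xml into a scratch directory.
# The data directory below is an assumption; adjust it to your environment.
conf_dir=$(mktemp -d)
data_dir="$HOME/hbase-data"          # hypothetical location for HBase data

cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$data_dir/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$data_dir/zookeeper</value>
  </property>
</configuration>
EOF

# Show the generated file.
cat "$conf_dir/hbase-site.xml"
```

Writing to a temporary directory first keeps the sketch from overwriting a configuration you already have.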

Okay, we are ready to start HBase. Start it with the helper script in bin/

 user@localhost:~/Desktop/hbase-1.1.1$ bin/   
 starting master, logging to /home/user/Desktop/hbase-1.1.1/bin/../logs/hbase-user-master-localhost.out  
 user@localhost:~/Desktop/hbase-1.1.1/logs$ tail -F hbase-user-master-localhost.out SecurityAuth.audit hbase-user-master-localhost.log  
 ==> hbase-user-master-localhost.out <==  
 ==> SecurityAuth.audit <==  
 2015-08-18 17:49:41,533 INFO Connection from port: 36745 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:46,812 INFO Connection from port: 53042 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:48,309 INFO Connection from port: 53043 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:49,317 INFO Connection from port: 53044 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 ==> hbase-user-master-localhost.log <==  
 2015-08-18 17:49:49,281 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=831688, freeSize=808983544, maxSize=809815232, heapSize=831688, minSize=769324480, minFactor=0.95, multiSize=384662240, multiFactor=0.5, singleSize=192331120, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false  
 2015-08-18 17:49:49,282 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000  
 2015-08-18 17:49:49,295 INFO [RS_OPEN_REGION-localhost:60631-0] regionserver.HRegion: Onlined 78a2a3664205fcf679d2043ac3259648; next sequenceid=2  
 2015-08-18 17:49:49,303 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] regionserver.HRegionServer: Post open deploy tasks for hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648.  
 2015-08-18 17:49:49,322 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] hbase.MetaTableAccessor: Updated row hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648. with server=localhost,60631,1439891378840  
 2015-08-18 17:49:49,332 INFO [AM.ZK.Worker-pool3-t6] master.RegionStates: Transition {78a2a3664205fcf679d2043ac3259648 state=OPENING, ts=1439891389276, server=localhost,60631,1439891378840} to {78a2a3664205fcf679d2043ac3259648 state=OPEN, ts=1439891389332, server=localhost,60631,1439891378840}  
 2015-08-18 17:49:49,603 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d5 zxid:0x44 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/default Error:KeeperErrorCode = NodeExists for /hbase/namespace/default  
 2015-08-18 17:49:49,625 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d8 zxid:0x46 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/hbase Error:KeeperErrorCode = NodeExists for /hbase/namespace/hbase  
 2015-08-18 17:49:49,639 INFO [localhost:51452.activeMasterManager] master.HMaster: Master has completed initialization  
 2015-08-18 17:49:49,642 INFO [localhost:51452.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled  

As you can see, log files are also available, and jps shows that an HMaster process is running.

 user@localhost: $ jps  
 22144 Jps  
 21793 HMaster  
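
The same check can be scripted. This is a minimal sketch, assuming the two-column "pid name" layout that jps printed above. The helper name has_hmaster is hypothetical.

```shell
# Hypothetical helper: succeed if the jps-style listing on stdin has an
# HMaster entry in its second column.
has_hmaster() {
  awk '$2 == "HMaster" { found = 1 } END { exit !found }'
}

# Sample listing copied from the output above. With a JDK installed you could
# pipe the real thing instead: jps | has_hmaster
if printf '22144 Jps\n21793 HMaster\n' | has_hmaster; then
  echo "HMaster is running"
else
  echo "HMaster is not running"
fi
```

Matching on the second column avoids false positives from other process names that merely contain the word HMaster.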

Okay, let's try Apache HBase using the HBase shell.

 user@localhost:~/Desktop/hbase-1.1.1$ ./bin/hbase shell  
 2015-08-18 17:55:25,134 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
 HBase Shell; enter 'help<RETURN>' for list of supported commands.  
 Type "exit<RETURN>" to leave the HBase Shell  
 Version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015  

The help command shows a very helpful description, such as the following.

 hbase(main):001:0> help  
 HBase Shell, version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015  
 Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.  
 Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.  
  Group name: general  
  Commands: status, table_help, version, whoami  
  Group name: ddl  
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters  
  Group name: namespace  
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables  
  Group name: dml  
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve  
  Group name: tools  
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump  
  Group name: replication  
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs  
  Group name: snapshots  
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot  
  Group name: configuration  
  Commands: update_all_config, update_config  
  Group name: quotas  
  Commands: list_quotas, set_quota  
  Group name: security  
  Commands: grant, revoke, user_permission  
  Group name: visibility labels  
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility  
 Quote all names in HBase Shell such as table and column names. Commas delimit  
 command parameters. Type <RETURN> after entering a command to run it.  
 Dictionaries of configuration used in the creation and alteration of tables are  
 Ruby Hashes. They look like this:  
  {'key1' => 'value1', 'key2' => 'value2', ...}  
 and are opened and closed with curley-braces. Key/values are delimited by the  
 '=>' character combination. Usually keys are predefined constants such as  
 NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type  
 'Object.constants' to see a (messy) list of all constants in the environment.  
 If you are using binary keys or values and need to enter them in the shell, use  
 double-quote'd hexadecimal representation. For example:  
  hbase> get 't1', "key\x03\x3f\xcd"  
  hbase> get 't1', "key\003\023\011"  
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"  
 The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.  
 For more on the HBase Shell, see  

To create a table named test with a single column family named cf:

 hbase(main):002:0> create 'test', 'cf'  
 0 row(s) in 1.5700 seconds  
 => Hbase::Table - test  

List information about the table.

 hbase(main):001:0> list 'test'  
 1 row(s) in 0.3530 seconds  
 => ["test"]  

Let's put something into the table we have just created.

 hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'  
 0 row(s) in 0.2280 seconds  
 hbase(main):003:0> put 'test', 'row2', 'cf:b', 'value2'  
 0 row(s) in 0.0140 seconds  
 hbase(main):004:0> put 'test', 'row3', 'cf:c', 'value3'  
 0 row(s) in 0.0060 seconds  

Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
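
To make the family:qualifier naming concrete, here is a small sketch that splits a column name at its first colon. The variable names are mine, and this is only string handling in plain shell, not an HBase API.

```shell
# Split an HBase column name of the form family:qualifier.
column='cf:a'
family=${column%%:*}      # text before the first colon: cf
qualifier=${column#*:}    # text after the first colon: a
echo "family=$family qualifier=$qualifier"
```

Splitting at the first colon matters because a column family name cannot contain a colon, while a qualifier may.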

To read all the rows from the table, use scan.

 hbase(main):005:0> scan 'test'  
 ROW                       COLUMN+CELL                                                                   
  row1                      column=cf:a, timestamp=1439892359305, value=value1                                                
  row2                      column=cf:b, timestamp=1439892363921, value=value2                                                
  row3                      column=cf:c, timestamp=1439892369775, value=value3                                                
 3 row(s) in 0.0420 seconds  

To fetch a single row only, use get.

 hbase(main):006:0> get 'test', 'row1'  
 COLUMN                      CELL                                                                       
  cf:a                      timestamp=1439892359305, value=value1                                                      
 1 row(s) in 0.0340 seconds  

Something really interesting about Apache HBase: if you want to delete a table or change its settings, you need to disable it first. After that, you can enable it again.

 hbase(main):007:0> disable 'test'  
 0 row(s) in 2.3610 seconds  
 hbase(main):008:0> enable 'test'  
 0 row(s) in 1.2790 seconds  

Okay, now let's delete this table.

 hbase(main):009:0> drop 'test'  
 ERROR: Table test is enabled. Disable it first.  
 Here is some help for this command:  
 Drop the named table. Table must first be disabled:  
  hbase> drop 't1'  
  hbase> drop 'ns1:t1'  
 hbase(main):010:0> disable 'test'  
 0 row(s) in 2.2640 seconds  
 hbase(main):011:0> drop 'test'  
 0 row(s) in 1.2800 seconds  
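
The disable-then-drop order can also be captured in a small command file instead of typing it at the prompt. This is a sketch only. The temporary file is my own choice, and the block just writes and prints the commands without contacting HBase.

```shell
# Sketch: write the disable-then-drop sequence to a command file,
# one HBase shell command per line.
script=$(mktemp)
cat > "$script" <<'EOF'
disable 'test'
drop 'test'
exit
EOF

# Show what would be executed.
cat "$script"

# With HBase running, the file could then be passed to the HBase shell:
#   ./bin/hbase shell "$script"
```

The trailing exit keeps the shell from staying at the interactive prompt once the commands have run.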

Okay, we are done with this basic learning session. Let's quit for now.

 hbase(main):012:0> quit  

To stop the Apache HBase instance:

 user@localhost:~/Desktop/hbase-1.1.1$ ./bin/   
 stopping hbase.................  
 user@localhost:~/Desktop/hbase-1.1.1$ jps  
 23399 Jps  
 5445 org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar  

If, like me, you came from Apache Cassandra, Apache HBase looks very similar. If this interests you, I shall leave you with the following three links, which will take you further.
