Introduction to HBase Data Migration Approaches

Published: 2019-08-09 12:15:30  Category: MySql Tutorials  Source: ballwql

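Summary: 1. Preface. HBase data migration is a very common operation. The migration approaches currently used in the industry fall into the following categories: Figure 1. HBase data migration approaches. As the figure shows, there are four main categories: one at the Hadoop layer and three at the HBase layer. Each is introduced below. 2. Hadoop-layer data migration. 2.1 Overview. Had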

Let's look at the usage parameters of CopyTable:

Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>  
Options:  
 rs.class     hbase.regionserver.class of the peer cluster  
              specify if different from current cluster  
 rs.impl      hbase.regionserver.impl of the peer cluster  
 startrow     the start row  
 stoprow      the stop row  
 starttime    beginning of the time range (unixtime in millis)  
              without endtime means from starttime to forever  
 endtime      end of the time range.  Ignored if no starttime specified.  
 versions     number of cell versions to copy  
 new.name     new table's name  
 peer.adr     Address of the peer cluster given in the format  
              hbase.zookeeer.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent  
 families     comma-separated list of families to copy  
              To copy from cf1 to cf2, give sourceCfName:destCfName.   
              To keep the same name, just give "cfName"  
 all.cells    also copy delete markers and deleted cells  
Args:  
 tablename    Name of the table to copy  
Examples:  
 To copy 'TestTable' to a cluster that uses replication for a 1 hour window:  
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable  
 For performance consider the following general options:  
-Dhbase.client.scanner.caching=100  
-Dmapred.map.tasks.speculative.execution=false

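As these parameters show, CopyTable can restrict a copy to a time range, a number of cell versions, specific column families, and a start/stop row key range, and it can target a peer (destination) cluster by address. The options are quite flexible.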
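As a sketch of how these options combine, the snippet below assembles a CopyTable command restricted to a row range, a version count, and a renamed column family. The table names, row keys, and family names are hypothetical placeholders; only the flag names come from the usage text above. The command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical values for illustration only; replace them with your own.
SRC_TABLE="TestTable"        # source table (the tablename argument)
DST_TABLE="TestTableCopy"    # destination table (--new.name), assumed to exist already
START_ROW="row0100"          # assumed row key at which the copy starts
STOP_ROW="row0200"           # assumed row key at which the copy stops
VERSIONS=3                   # number of cell versions to copy
FAMILIES="cf1:cf1_new,cf2"   # copy cf1 into cf1_new, keep cf2 under the same name

# Print the assembled command on a single line.
build_copytable_cmd() {
  printf 'hbase org.apache.hadoop.hbase.mapreduce.CopyTable'
  printf ' --startrow=%s --stoprow=%s' "$START_ROW" "$STOP_ROW"
  printf ' --versions=%s --families=%s' "$VERSIONS" "$FAMILIES"
  printf ' --new.name=%s %s\n' "$DST_TABLE" "$SRC_TABLE"
}

build_copytable_cmd
```

Once the printed command looks right, it can be pasted into a shell on a node where the `hbase` launcher is available.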

CopyTable supports the following scenarios:

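1. Deep table copy: this is comparable to a snapshot, except that the copy contains the source table's actual data. Versions before 0.94.x did not support the snapshot command, so CopyTable was the way to make a copy of the source table. Usage: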

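# HBase shell: create a table with a single column family "i"
create 'table_snapshot',{NAME=>"i"}  
# OS shell: copy table_snapshot (the tablename argument) into tableCopy (--new.name)
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=tableCopy table_snapshot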
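The two steps of a same-cluster deep copy can be sketched as a small helper that prints them in order. According to the usage text, `--new.name` names the destination and the final argument names the source. The HBase reference guide states that the target table must already exist, so this sketch creates the destination first. The function name `deep_copy_steps` and its arguments are hypothetical.

```shell
#!/bin/sh
# Print the two steps of a same-cluster deep copy.
# Arguments (hypothetical examples): source table, destination table, column family.
deep_copy_steps() {
  src="$1"; dst="$2"; family="$3"
  # Step 1 (run in the HBase shell): create the destination with the same column family.
  printf "create '%s',{NAME=>\"%s\"}\n" "$dst" "$family"
  # Step 2 (run in the OS shell): copy the source table's rows into the destination.
  printf 'hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=%s %s\n' "$dst" "$src"
}

deep_copy_steps table_snapshot tableCopy i
```

Printing the steps instead of running them keeps the helper safe to try on a machine without HBase installed.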

2. Copy between clusters: synchronize one table's data from one cluster to another, at table granularity. Usage:

create 'table_test',{NAME=>"i"}   # first, on the destination cluster, create a table with the same schema as the source table  
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zk-addr1,zk-addr2,zk-addr3:2181:/hbase table_test
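The `--peer.adr` value has three colon-separated parts: the ZooKeeper quorum hosts, the client port, and the znode parent. A minimal sketch of a helper that assembles it is shown below. The function name `peer_adr` is hypothetical, and the defaults of 2181 and /hbase are assumptions based on common HBase settings, so check them against the destination cluster's configuration.

```shell
#!/bin/sh
# Build a --peer.adr value: <quorum hosts>:<client port>:<znode parent>.
# Arguments: comma-separated ZooKeeper hosts, client port (assumed default 2181),
# znode parent (assumed default /hbase).
peer_adr() {
  quorum="$1"
  port="${2:-2181}"
  znode="${3:-/hbase}"
  printf '%s:%s:%s\n' "$quorum" "$port" "$znode"
}

PEER="$(peer_adr zk-addr1,zk-addr2,zk-addr3 2181 /hbase)"
# Print the resulting command for review instead of running it.
echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=${PEER} table_test"
```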

3. Incremental backup: back up table data incrementally. The time-range parameters specify the window to back up. Usage:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable ... --starttime=start_timestamp --endtime=end_timestamp
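Because `--starttime` and `--endtime` are unix times in milliseconds, the window usually has to be computed first. The sketch below derives a window covering the last N hours. The function name `window_millis` is hypothetical, and the optional second argument (the current time in seconds) exists only to make the calculation reproducible.

```shell
#!/bin/sh
# Compute a time window in epoch milliseconds covering the last N hours.
# Arguments: number of hours, optional "now" in epoch seconds (defaults to the current time).
window_millis() {
  hours="$1"
  now_s="${2:-$(date +%s)}"
  end_ms=$(( now_s * 1000 ))
  start_ms=$(( end_ms - hours * 3600 * 1000 ))
  printf '%s %s\n' "$start_ms" "$end_ms"
}

WINDOW="$(window_millis 1)"
START_TS="${WINDOW% *}"   # text before the space: window start
END_TS="${WINDOW#* }"     # text after the space: window end
# Print the time-range flags to append to a CopyTable command.
echo "--starttime=${START_TS} --endtime=${END_TS}"
```

For a recurring backup, one possible design is to record each run's end value and use it as the next run's start, so that consecutive windows leave no gap.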

