Fast, Efficient Queries in HBase Using Filters
 

Published 2013-4-28

 

The main filter categories

1. Comparison Filters

1.1 RowFilter

1.2 FamilyFilter

1.3 QualifierFilter

1.4 ValueFilter

1.5 DependentColumnFilter

2. Dedicated Filters

2.1 SingleColumnValueFilter

2.2 SingleColumnValueExcludeFilter

2.3 PrefixFilter

2.4 PageFilter

2.5 KeyOnlyFilter

2.6 FirstKeyOnlyFilter

2.7 TimestampsFilter

2.8 RandomRowFilter

3. Decorating Filters

3.1 SkipFilter

3.2 WhileMatchFilter

A simple example: SingleColumnValueFilter

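import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// hbaseConfig is a Configuration field defined elsewhere in the enclosing class.
public static void selectByFilter(String tablename, List<String> arr) throws IOException {
    HTable table = new HTable(hbaseConfig, tablename);
    // FilterList defaults to MUST_PASS_ALL, so the conditions are ANDed together.
    FilterList filterList = new FilterList();
    Scan s1 = new Scan();
    for (String v : arr) {
        // Each entry has the form "family,qualifier,value".
        String[] s = v.split(",");
        filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes(s[0]),
                Bytes.toBytes(s[1]),
                CompareOp.EQUAL, Bytes.toBytes(s[2])));
        // Uncommenting the next line returns only the specified cell;
        // the other cells in the same row are not returned.
        // s1.addColumn(Bytes.toBytes(s[0]), Bytes.toBytes(s[1]));
    }
    s1.setFilter(filterList);
    ResultScanner scanner = table.getScanner(s1);
    try {
        for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
            for (KeyValue kv : rr.list()) {
                System.out.println("row : " + Bytes.toString(kv.getRow()));
                System.out.println("column : " + Bytes.toString(kv.getFamily())
                        + ":" + Bytes.toString(kv.getQualifier()));
                System.out.println("value : " + Bytes.toString(kv.getValue()));
            }
        }
    } finally {
        // Release the scanner and the table even if the scan fails.
        scanner.close();
        table.close();
    }
}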
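
To make the AND relationship between the conditions concrete, here is a small plain-Java model of what the filter list does to each row. It is only a sketch and does not call the HBase API: the class name AndConditionsSketch, the in-memory row map, and the sample rows are assumptions made for illustration, while the "family,qualifier,value" strings mirror the argument format of selectByFilter. One difference worth knowing: this model treats a missing column as a mismatch, whereas SingleColumnValueFilter by default lets rows that lack the column through unless setFilterIfMissing(true) is called.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, not HBase code: a row is a map from
// "family:qualifier" to the stored value.
final class AndConditionsSketch {

    // Returns the keys of the rows for which EVERY "family,qualifier,value"
    // condition matches, mirroring a filter list in which all filters must pass.
    static List<String> select(Map<String, Map<String, String>> rows, List<String> conditions) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            boolean allMatch = true;
            for (String condition : conditions) {
                String[] s = condition.split(",");
                String actual = row.getValue().get(s[0] + ":" + s[1]);
                // In this model a missing column counts as a mismatch.
                if (!s[2].equals(actual)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                matched.add(row.getKey());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();

        Map<String, String> row1 = new HashMap<>();
        row1.put("info:city", "beijing");
        row1.put("info:type", "a");
        rows.put("row1", row1);

        Map<String, String> row2 = new HashMap<>();
        row2.put("info:city", "beijing");
        row2.put("info:type", "b");
        rows.put("row2", row2);

        List<String> conditions = new ArrayList<>();
        conditions.add("info,city,beijing");
        conditions.add("info,type,a");

        // Only row1 satisfies both conditions.
        System.out.println(select(rows, conditions)); // prints [row1]
    }
}
```

With a single condition "info,city,beijing" both rows would be selected; adding the second condition narrows the result, which is the behaviour the comment in selectByFilter describes.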

MultipleColumnPrefixFilter

The API documentation describes it as follows:

This filter is used for selecting only those keys with columns that matches a particular prefix. 
For example, if prefix is 'an', it will pass keys will columns like 'and', 'anti' but not keys with columns like 'ball', 'act'.   

The constructor is:

public MultipleColumnPrefixFilter(byte[][] prefixes)  

It takes multiple prefixes.

The source code shows how they are handled:

public MultipleColumnPrefixFilter(final byte [][] prefixes) {  
if (prefixes != null) {
for (int i = 0; i < prefixes.length; i++) {
if (!sortedPrefixes.add(prefixes[i]))
throw new IllegalArgumentException ("prefixes must be distinct");
}
}
}
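
The constructor above relies on a sorted set to reject duplicate prefixes. The following plain-Java sketch reproduces that idea without HBase. The class name DistinctPrefixesSketch and the hand-written comparator are assumptions for illustration; the comparator is meant to approximate the unsigned lexicographic ordering HBase applies to byte arrays, and a comparator is needed at all because byte[] has no natural ordering or content-based equality.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-alone model of the duplicate check in the constructor.
final class DistinctPrefixesSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Collects the prefixes into a sorted set; TreeSet.add returns false for a
    // prefix that compares equal to one already present, which triggers the exception.
    static TreeSet<byte[]> sortedPrefixes(byte[][] prefixes) {
        TreeSet<byte[]> sorted = new TreeSet<>(BYTES);
        if (prefixes != null) {
            for (byte[] prefix : prefixes) {
                if (!sorted.add(prefix)) {
                    throw new IllegalArgumentException("prefixes must be distinct");
                }
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortedPrefixes(new byte[][] {{'q'}, {'p'}}).size()); // prints 2
        try {
            sortedPrefixes(new byte[][] {{'p'}, {'p'}});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints prefixes must be distinct
        }
    }
}
```

Keeping the prefixes sorted also means the order in which they are passed to the constructor does not matter.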

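Sample code follows. I found it online (the leading + on each line appears to be left over from a diff); having read through it, nothing in it is hard to understand.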

+public class TestMultipleColumnPrefixFilter {  
+
+ private final static HBaseTestingUtility TEST_UTIL = new
+ HBaseTestingUtility();
+
+ @Test
+ public void testMultipleColumnPrefixFilter() throws IOException {
+ String family = "Family";
+ HTableDescriptor htd = new HTableDescriptor("TestMultipleColumnPrefixFilter");
+ htd.addFamily(new HColumnDescriptor(family));
+ // HRegionInfo info = new HRegionInfo(htd, null, null, false);
+ HRegionInfo info = new HRegionInfo(htd.getName(), null, null, false);
+ HRegion region = HRegion.createHRegion(info, HBaseTestingUtility.
+ getTestDir(), TEST_UTIL.getConfiguration(), htd);
+
+ List<String> rows = generateRandomWords(100, "row");
+ List<String> columns = generateRandomWords(10000, "column");
+ long maxTimestamp = 2;
+
+ List<KeyValue> kvList = new ArrayList<KeyValue>();
+
+ Map<String, List<KeyValue>> prefixMap = new HashMap<String,
+ List<KeyValue>>();
+
+ prefixMap.put("p", new ArrayList<KeyValue>());
+ prefixMap.put("q", new ArrayList<KeyValue>());
+ prefixMap.put("s", new ArrayList<KeyValue>());
+
+ String valueString = "ValueString";
+
+ for (String row: rows) {
+ Put p = new Put(Bytes.toBytes(row));
+ for (String column: columns) {
+ for (long timestamp = 1; timestamp <= maxTimestamp; timestamp++) {
+ KeyValue kv = KeyValueTestUtil.create(row, family, column, timestamp,
+ valueString);
+ p.add(kv);
+ kvList.add(kv);
+ for (String s: prefixMap.keySet()) {
+ if (column.startsWith(s)) {
+ prefixMap.get(s).add(kv);
+ }
+ }
+ }
+ }
+ region.put(p);
+ }
+
+ MultipleColumnPrefixFilter filter;
+ Scan scan = new Scan();
+ scan.setMaxVersions();
+ byte [][] filter_prefix = new byte [2][];
+ filter_prefix[0] = new byte [] {'p'};
+ filter_prefix[1] = new byte [] {'q'};
+
+ filter = new MultipleColumnPrefixFilter(filter_prefix);
+ scan.setFilter(filter);
+ List<KeyValue> results = new ArrayList<KeyValue>();
+ InternalScanner scanner = region.getScanner(scan);
+ while(scanner.next(results));
+ assertEquals(prefixMap.get("p").size() + prefixMap.get("q").size(), results.size());
+ }
+
+ @Test
+ public void testMultipleColumnPrefixFilterWithManyFamilies() throws IOException {
+ String family1 = "Family1";
+ String family2 = "Family2";
+ HTableDescriptor htd = new HTableDescriptor("TestMultipleColumnPrefixFilter");
+ htd.addFamily(new HColumnDescriptor(family1));
+ htd.addFamily(new HColumnDescriptor(family2));
+ HRegionInfo info = new HRegionInfo(htd.getName(), null, null, false);
+ HRegion region = HRegion.createHRegion(info, HBaseTestingUtility.
+ getTestDir(), TEST_UTIL.getConfiguration(), htd);
+
+ List<String> rows = generateRandomWords(100, "row");
+ List<String> columns = generateRandomWords(10000, "column");
+ long maxTimestamp = 3;
+
+ List<KeyValue> kvList = new ArrayList<KeyValue>();
+
+ Map<String, List<KeyValue>> prefixMap = new HashMap<String,
+ List<KeyValue>>();
+
+ prefixMap.put("p", new ArrayList<KeyValue>());
+ prefixMap.put("q", new ArrayList<KeyValue>());
+ prefixMap.put("s", new ArrayList<KeyValue>());
+
+ String valueString = "ValueString";
+
+ for (String row: rows) {
+ Put p = new Put(Bytes.toBytes(row));
+ for (String column: columns) {
+ for (long timestamp = 1; timestamp <= maxTimestamp; timestamp++) {
+ double rand = Math.random();
+ KeyValue kv;
+ if (rand < 0.5)
+ kv = KeyValueTestUtil.create(row, family1, column, timestamp,
+ valueString);
+ else
+ kv = KeyValueTestUtil.create(row, family2, column, timestamp,
+ valueString);
+ p.add(kv);
+ kvList.add(kv);
+ for (String s: prefixMap.keySet()) {
+ if (column.startsWith(s)) {
+ prefixMap.get(s).add(kv);
+ }
+ }
+ }
+ }
+ region.put(p);
+ }
+
+ MultipleColumnPrefixFilter filter;
+ Scan scan = new Scan();
+ scan.setMaxVersions();
+ byte [][] filter_prefix = new byte [2][];
+ filter_prefix[0] = new byte [] {'p'};
+ filter_prefix[1] = new byte [] {'q'};
+
+ filter = new MultipleColumnPrefixFilter(filter_prefix);
+ scan.setFilter(filter);
+ List<KeyValue> results = new ArrayList<KeyValue>();
+ InternalScanner scanner = region.getScanner(scan);
+ while(scanner.next(results));
+ assertEquals(prefixMap.get("p").size() + prefixMap.get("q").size(), results.size());
+ }
+
+ @Test
+ public void testMultipleColumnPrefixFilterWithColumnPrefixFilter() throws IOException {
+ String family = "Family";
+ HTableDescriptor htd = new HTableDescriptor("TestMultipleColumnPrefixFilter");
+ htd.addFamily(new HColumnDescriptor(family));
+ HRegionInfo info = new HRegionInfo(htd.getName(), null, null, false);
+ HRegion region = HRegion.createHRegion(info, HBaseTestingUtility.
+ getTestDir(), TEST_UTIL.getConfiguration(),htd);
+
+ List<String> rows = generateRandomWords(100, "row");
+ List<String> columns = generateRandomWords(10000, "column");
+ long maxTimestamp = 2;
+
+ String valueString = "ValueString";
+
+ for (String row: rows) {
+ Put p = new Put(Bytes.toBytes(row));
+ for (String column: columns) {
+ for (long timestamp = 1; timestamp <= maxTimestamp; timestamp++) {
+ KeyValue kv = KeyValueTestUtil.create(row, family, column, timestamp,
+ valueString);
+ p.add(kv);
+ }
+ }
+ region.put(p);
+ }
+
+ MultipleColumnPrefixFilter multiplePrefixFilter;
+ Scan scan1 = new Scan();
+ scan1.setMaxVersions();
+ byte [][] filter_prefix = new byte [1][];
+ filter_prefix[0] = new byte [] {'p'};
+
+ multiplePrefixFilter = new MultipleColumnPrefixFilter(filter_prefix);
+ scan1.setFilter(multiplePrefixFilter);
+ List<KeyValue> results1 = new ArrayList<KeyValue>();
+ InternalScanner scanner1 = region.getScanner(scan1);
+ while(scanner1.next(results1));
+
+ ColumnPrefixFilter singlePrefixFilter;
+ Scan scan2 = new Scan();
+ scan2.setMaxVersions();
+ singlePrefixFilter = new ColumnPrefixFilter(Bytes.toBytes("p"));
+
+ scan2.setFilter(singlePrefixFilter);
+ List<KeyValue> results2 = new ArrayList<KeyValue>();
+ InternalScanner scanner2 = region.getScanner(scan2);
+ while(scanner2.next(results2));
+
+ assertEquals(results1.size(), results2.size());
+ }
+
+ List<String> generateRandomWords(int numberOfWords, String suffix) {
+ Set<String> wordSet = new HashSet<String>();
+ for (int i = 0; i < numberOfWords; i++) {
+ int lengthOfWords = (int) (Math.random()*2) + 1;
+ char[] wordChar = new char[lengthOfWords];
+ for (int j = 0; j < wordChar.length; j++) {
+ wordChar[j] = (char) (Math.random() * 26 + 97);
+ }
+ String word;
+ if (suffix == null) {
+ word = new String(wordChar);
+ } else {
+ word = new String(wordChar) + suffix;
+ }
+ wordSet.add(word);
+ }
+ List<String> wordList = new ArrayList<String>(wordSet);
+ return wordList;
+ }
+}

ColumnPrefixFilter

public class ColumnPrefixFilter extends FilterBase

This filter is used for selecting only those keys with columns that matches a particular prefix. For example, if prefix is 'an', it will pass keys will columns like 'and', 'anti' but not keys with columns like 'ball', 'act'.

The above is the class description.

It has a single constructor that takes an argument: ColumnPrefixFilter(byte[] prefix)
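
The matching rule in the class description can be sketched in plain Java as a byte-wise "starts with" test on the column name. This is an illustration rather than the HBase implementation; the class ColumnPrefixSketch and its hasPrefix helpers are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of the prefix test applied to each column name.
final class ColumnPrefixSketch {

    // True when the first prefix.length bytes of qualifier equal prefix.
    static boolean hasPrefix(byte[] qualifier, byte[] prefix) {
        if (qualifier.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (qualifier[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Convenience overload for string column names, encoded as UTF-8.
    static boolean hasPrefix(String qualifier, String prefix) {
        return hasPrefix(qualifier.getBytes(StandardCharsets.UTF_8),
                prefix.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The four column names from the class description, tested against 'an'.
        for (String column : new String[] {"and", "anti", "ball", "act"}) {
            System.out.println(column + " -> " + hasPrefix(column, "an"));
        }
    }
}
```

Running it reports true for 'and' and 'anti' and false for 'ball' and 'act', matching the example in the description. Note that 'act' fails even though it shares the first letter, because the whole prefix must match.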

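The class looks simple to use. I assumed it would match row keys that start with prefix, but when I used it that way it had no effect. That is consistent with the class description: the prefix is compared against column names (qualifiers), not row keys. If you have it working in your own setup, please let me know how you used it.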

With no other option, I switched to PrefixFilter.

PrefixFilter

Class description:

Pass results that have same row prefix.
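
Because the two class descriptions read so similarly, a small plain-Java model may help separate them: one filter tests the row key, the other tests the column name. This sketch does not use HBase; the class RowVsColumnPrefixSketch, the sample row keys, and the qualifier "keyword" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not HBase code: each cell is a {rowKey, qualifier} pair.
final class RowVsColumnPrefixSketch {

    // Models a row-prefix filter: keep cells whose ROW KEY starts with the prefix.
    static List<String[]> byRowPrefix(List<String[]> cells, String prefix) {
        List<String[]> kept = new ArrayList<>();
        for (String[] cell : cells) {
            if (cell[0].startsWith(prefix)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    // Models a column-prefix filter: keep cells whose QUALIFIER starts with the prefix.
    static List<String[]> byColumnPrefix(List<String[]> cells, String prefix) {
        List<String[]> kept = new ArrayList<>();
        for (String[] cell : cells) {
            if (cell[1].startsWith(prefix)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> cells = new ArrayList<>();
        cells.add(new String[] {"3274980668:001", "keyword"});
        cells.add(new String[] {"3274980668:002", "keyword"});
        cells.add(new String[] {"9999999999:001", "keyword"});

        System.out.println(byRowPrefix(cells, "3274980668:").size());    // prints 2
        System.out.println(byColumnPrefix(cells, "3274980668:").size()); // prints 0
    }
}
```

In this model the same prefix selects two cells when applied to row keys and none when applied to column names, which is the kind of "no effect" result one would expect from pointing a column-prefix test at a row-key prefix.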

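Its constructor has the same shape as ColumnPrefixFilter's and it is used the same way, except that the prefix is matched against the row key. Those are basically the main filters; I will keep updating this article over time. Here is a piece of code I wrote myself that is in actual use: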

// tableKeyword (an HTable) and log are fields defined elsewhere in the enclosing class.
public static String getKeywordTableRowkeyUseFilter(String filterString1, String filterString2) {
    // FilterList defaults to MUST_PASS_ALL, so every filter below must pass.
    FilterList filterList = new FilterList();
    StringBuilder rowkeyValue = new StringBuilder();
    Scan s1 = new Scan();
    // Each filter string has the form "family,qualifier,value".
    String[] sf1 = filterString1.split(",");
    filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes(sf1[0]),
            Bytes.toBytes(sf1[1]),
            CompareOp.EQUAL, Bytes.toBytes(sf1[2])));
    String[] sf2 = filterString2.split(",");
    filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes(sf2[0]),
            Bytes.toBytes(sf2[1]),
            CompareOp.EQUAL, Bytes.toBytes(sf2[2])));
    // ColumnPrefixFilter compares the prefix against column qualifiers...
    filterList.addFilter(new ColumnPrefixFilter(Bytes.toBytes("3274980668:")));
    // ...while PrefixFilter compares it against row keys.
    filterList.addFilter(new PrefixFilter(Bytes.toBytes("3274980668:")));

    s1.setFilter(filterList);
    ResultScanner scanner = null;
    try {
        scanner = tableKeyword.getScanner(s1);
        for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
            // Join the matching row keys with "##" as the separator.
            rowkeyValue.append("##").append(Bytes.toString(rr.getRow()));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Always release the scanner's server-side resources.
        if (scanner != null) {
            scanner.close();
        }
    }
    log.warn("rowkeyValue" + rowkeyValue);
    return rowkeyValue.toString();
}