Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓƵ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Modeler   Code  
»áÔ±   
 
   
 
 
     
   
 ¶©ÔÄ
  ¾èÖú
´ï¹ÛÊý¾Ý·ÖÎöƽ̨¼Ü¹¹ºÍHiveʵ¼ù
 
×÷ÕߣºÎÄ»Ô À´Ô´£ºInfoQ ·¢²¼ÓÚ 2016-4-5
  3030  次浏览      19
 

HadoopÓÚ2006Äê1ÔÂ28ÈÕµ®Éú£¬ÖÁ½ñÒÑÓÐ10Ä꣬Ëü¸Ä±äÁËÆóÒµ¶ÔÊý¾ÝµÄ´æ´¢¡¢´¦ÀíºÍ·ÖÎöµÄ¹ý³Ì£¬¼ÓËÙÁË´óÊý¾ÝµÄ·¢Õ¹£¬ÐγÉÁË×Ô¼ºµÄ¼«Æä»ð±¬µÄ¼¼ÊõÉú̬Ȧ£¬²¢Êܵ½·Ç³£¹ã·ºµÄÓ¦Óá£ÔÚ2016ÄêHadoopÊ®ËêÉúÈÕÖ®¼Ê£¬InfoQ²ß»®ÁËÒ»¸öHadoopÈȵãϵÁÐÎÄÕ£¬Îª´ó¼ÒÊáÀíHadoopÕâÊ®ÄêµÄ±ä»¯£¬¼¼ÊõȦµÄÉú̬״¿ö£¬»Ø¹ËÒÔÇ°£¬¼¤ÀøÒÔºó¡£

½üÊ®ÄêÀ´£¬Ëæ×ÅHadoopÉú̬ϵͳµÄ²»¶ÏÍêÉÆ£¬HadoopÔçÒѳÉΪ´óÊý¾ÝÊÂʵÉϵÄÐÐÒµ±ê×¼Ö®Ò»¡£Ãæ¶Ôµ±½ñ»¥ÁªÍø²úÉúµÄ¾Þ´óµÄTBÉõÖÁPB¼¶Ô­Ê¼Êý¾Ý£¬ÀûÓûùÓÚHadoopµÄÊý¾Ý²Ö¿â½â¾ö·½°¸HiveÔçÒÑÊÇHadoopµÄÈȵãÓ¦ÓÃÖ®Ò»¡£´ï¹ÛÊý¾ÝÍŶӳ¤ÆÚÖÂÁ¦ÓÚÑо¿ºÍ»ýÀÛHadoopϵͳµÄ¼¼ÊõºÍ¾­Ñ飬²¢¹¹½¨ÆðÁË·Ö²¼Ê½´æ´¢¡¢·ÖÎö¡¢ÍÚ¾òÒÔ¼°Ó¦ÓõÄÕûÌ×´óÊý¾Ý´¦Àíƽ̨¡£

±¾ÎĽ«´ÓHiveÔ­Àí¡¢Êý¾Ý·ÖÎöƽ̨¼Ü¹¹¡¢Êý¾Ý·ÖÎöʵս¡¢HiveÓÅ»¯µÈËĸö·½ÃæÀ´·ÖÏíһЩ¹ØÓÚϵͳ¼Ü¹¹ºÍHiveµÄÐĵúÍʵս¾­Ñ飬ϣÍû´ó¼ÒÓÐËùÊÕ»ñ¡£

1 HiveÔ­Àí

HadoopÊÇÒ»¸öÁ÷ÐеĿªÔ´¿ò¼Ü£¬ÓÃÀ´´æ´¢ºÍ´¦ÀíÉÌÓÃÓ²¼þÉϵĴó¹æÄ£Êý¾Ý¼¯¡£¶ÔÓÚHDFSÉϵĺ£Á¿ÈÕÖ¾¶øÑÔ£¬±àдMapreduce³ÌÐò´úÂë¶ÔÓÚÀàËÆÊý¾Ý²Ö¿âµÄÐèÇóÀ´Ëµ×ÜÊÇÏÔµÃÏà¶ÔÓÚÄÑÒÔά»¤ºÍÖØÓã¬Hive×÷ΪһÖÖ»ùÓÚHadoopµÄÊý¾Ý²Ö¿â½â¾ö·½°¸Ó¦Ô˶øÉú£¬²¢µÃµ½Á˹㷺ӦÓá£

HiveÊÇ»ùÓÚHadoopµÄÊý¾Ý²Ö¿âƽ̨£¬ÓÉFacebook¹±Ï×£¬ÆäÖ§³ÖÀàËÆSQLµÄ½á¹¹»¯²éѯ¹¦ÄÜ¡£FacebookÉè¼Æ¿ª·¢HiveµÄ³õÖÔ¾ÍÊÇÈÃÄÇЩÊìϤsql±à³Ì·½Ê½µÄÈËÒ²¿ÉÒÔ¸üºÃµÄÀûÓÃhadoop£¬hive¿ÉÒÔÈÃÊý¾Ý·ÖÎöÈËÔ±Ö»¹Ø×¢ÓÚ¾ßÌåÒµÎñÄ£ÐÍ£¬¶ø²»ÐèÒªÉîÈëÁ˽âMap/ReduceµÄ±à³Ìϸ½Ú£¬µ«ÊÇÕâ²¢²»Òâζ×ÅʹÓÃhive²»ÐèÒªÁ˽âºÍѧϰMap/Reduce±à³ÌÄ£ÐͺÍhadoop¡£¶ÔÓÚHive·ÖÎöÈËÔ±À´Ëµ£¬ÉîÈëÁ˽âHadoopºÍHiveµÄÔ­ÀíºÍMapreduceÄ£ÐÍ£¬¶ÔÓÚÓÅ»¯²éѯ×ÜÓÐÒæ´¦¡£

1.1 Hive×é¼þÓëÄ£ÐÍ

HiveµÄ×é¼þ×ÜÌåÉÏ¿ÉÒÔ·ÖΪÒÔϼ¸¸ö²¿·Ö£ºÓû§½Ó¿Ú£¨UI£©¡¢Çý¶¯¡¢±àÒëÆ÷¡¢ÔªÊý¾Ý£¨Hiveϵͳ²ÎÊýÊý¾Ý£©ºÍÖ´ÐÐÒýÇæ¡£HiveÖаüº¬4ÖÐÊý¾ÝÄ£ÐÍ£ºTabel¡¢ExternalTable¡¢Partition¡¢Bucket¡£

ͼ£ºhiveÊý¾ÝÄ£ÐÍ

a) Table£ºÃ¿Ò»¸öTableÔÚHiveÖж¼ÓÐÒ»¸öÏàÓ¦µÄĿ¼À´´æ´¢Êý¾Ý£»

b) Partition£º±íÖеÄÒ»¸öPartition¶ÔÓ¦ÓÚ±íϵÄÒ»¸öĿ¼£¬ËùÓеÄPartitionÊý¾Ý¶¼´æ´¢ÔÚ¶ÔÓ¦µÄĿ¼ÖУ»

c) Buckets£º¶ÔÖ¸¶¨ÁмÆËãµÄhash£¬¸ù¾ÝhashÖµÇзÖÊý¾Ý£¬Ä¿µÄÊÇΪÁ˱ãÓÚ²¢ÐУ¬Ã¿Ò»¸öBuckets¶ÔÓ¦Ò»¸öÎļþ£»

d) External TableÖ¸ÏòÒÑ´æÔÚHDFSÖеÄÊý¾Ý£¬¿É´´½¨Partition¡£

¶ÁʱÑéÖ¤»úÖÆ

Ó봫ͳÊý¾Ý¿â¶Ô±íÊý¾Ý½øÐÐдʱÑÏÖز»Í¬£¬Hive¶ÔÊý¾ÝµÄÑéÖ¤·½Ê½Îª¶Áʱģʽ£¬¼´Ö»ÓÐÔÚ¶Á±íÊý¾ÝµÄʱºò£¬hive²Å¼ì²é½âÎö¾ßÌåµÄ×ֶΡ¢shemaµÈ£¬´Ó¶ø±£Ö¤ÁË´óÊý¾ÝÁ¿µÄ¿ìËÙ¼ÓÔØ¡£

Èç¹û±íschemaÓë±íÎļþÄÚÈݲ»Æ¥Å䣬Hive»á¾¡ÆäËùÄܵÄÈ¥¶ÁÊý¾Ý¡£Èç¹ûschemaÖбíÓÐ10¸ö×ֶΣ¬¶øÎļþ¼Ç¼ȴֻÓÐ3¸ö×ֶΣ¬ÄÇôÆäÖÐ7¸ö×ֶν«Îªnull£»Èç¹ûijЩ×Ö¶ÎÀàÐͶ¨Î»ÎªÊýÖµÀàÐÍ£¬µ«ÊǼǼÖÐȴΪ·ÇÊýÖµ×Ö·û´®£¬ÕâЩ×Ö¶ÎÒ²½«»á±»×ª»»Îªnull¡£Hive»áŬÁ¦catch¶ÁÊý¾ÝʱÓöµ½µÄ´íÎ󣬲¢Å¬Á¦·µ»Ø¡£¼ÈÈ»Hive±íÊý¾Ý´æ´¢ÔÚHDFSÖÐÇÒHive²ÉÓõÄÊǶÁʱÑéÖ¤·½Ê½£¬¶¨ÒåÍê±íµÄschema»á×Ô¶¯Éú³É±íÊý¾ÝµÄHDFSĿ¼£¬ÇÒÎÒÃÇ¿ÉÒÔÒÔÈκοÉÄܵķ½Ê½À´¼ÓÔرíÊý¾Ý»òÕßÀûÓÃHDFS API½«Êý¾ÝдÈëÎļþ£¬Í¬Àí£¬µ±ÎÒÃÇÈôÐèÒª½«hiveÊý¾ÝдÈëÆäËû¿â£¨Èçoracle£©£¬Ò²¿ÉÒÔÖ±½Óͨ¹ýapi¶ÁÈ¡Êý¾ÝÔÙдÈëÄ¿±ê¿â¡£

ÔÙ´Î×¢Ò⣬¼ÓÔØ»òÕßдÈëµÄÊý¾ÝÄÚÈÝÒªºÍ±í¶¨ÒåµÄschemaÒ»Ö£¬·ñÔò½«»áÔì³É×ֶλòÕß±íΪ¿Õ¡£

1.2 HQL·­Òë³ÉMapReduce Job

Hive±àÒëÆ÷½«HQL´úÂëת»»³ÉÒ»×é²Ù×÷·û£¨operator£©£¬²Ù×÷·ûÊÇHiveµÄ×îС²Ù×÷µ¥Ôª£¬Ã¿¸ö²Ù×÷·û´ú±íÁËÒ»ÖÖHDFS²Ù×÷»òÕßMapReduce×÷Òµ¡£HiveÖеIJÙ×÷·û°üÀ¨£ºTableScanOperator¡¢ReduceSinkOperator¡¢JoinOperator¡¢SelectOperator¡¢FileSinkOperator¡¢FilterOperator¡¢GroupByOperator¡¢MapJoinOperatorµÈ¡£

HiveÓï¾ä

INSERT OVERWRITE TABLE read_log_tmp
SELECT a.userid,a.bookid,b.author,b.categoryid 
FROM user_read_log a JOIN book_info b ON a.bookid = b.bookid;

ÆäÖ´Ðмƻ®Îª£º

ͼ£ºjoinµÄÈÎÎñÖ´ÐÐÁ÷³Ì

1.3 ÓëÒ»°ãSQLµÄÇø±ð

Hive ÊÓͼÓëÒ»°ãÊý¾Ý¿âÊÓͼ

HiveÊÓͼֻ֧³ÖÂß¼­ÊÓͼ£¬²»Ö§³ÖÎﻯÊÓͼ£¬¼´Ã¿´Î¶ÔÊÓͼµÄ²éѯhive¶¼½«Ö´ÐвéѯÈÎÎñ£¬Òò´ËÊÓͼ²»»á´øÀ´ÐÔÄÜÉϵÄÌáÉý¡£×÷ΪHive²éѯÓÅ»¯µÄÒ»²¿·Ö£¬¶ÔÊÓͼµÄ²éѯÌõ¼þÓï¾äºÍÊÓͼµÄ¶¨Òå²éѯÌõ¼þÓï¾ä½«»á¾¡¿ÉÄܵĺϲ¢³ÉÒ»¸öÌõ¼þ²éѯ¡£

HiveË÷ÒýÓëÒ»°ãÊý¾Ý¿âË÷Òý

Hive1.2.1°æ±¾Ä¿Ç°Ö§³ÖµÄË÷ÒýÀàÐÍÓÐCompactIndexHandlerºÍBitmap¡£

CompactIndexHandler ѹËõË÷Òýͨ¹ý½«ÁÐÖÐÏàͬµÄÖµµÃ×ֶνøÐÐѹËõ´Ó¶ø¼õС´æ´¢ºÍ¼Ó¿ì·ÃÎÊʱ¼ä¡£ÐèҪעÒâµÄÊÇHive´´½¨Ñ¹ËõË÷Òýʱ»á½«Ë÷ÒýÊý¾ÝÒ²´æ´¢ÔÚHive±íÖС£¶ÔÓÚ±ítb_index (id int, name string) ¶øÑÔ£¬½¨Á¢Ë÷ÒýºóµÄË÷Òý±íÖÐĬÈϵÄÈýÁÐÒ»´ÎΪË÷ÒýÁУ¨id£©¡¢hdfsÎļþµØÖ·(_bucketname)¡¢Æ«ÒÆÁ¿(offset)¡£

Bitmap λͼË÷Òý×÷ΪһÖÖ³£¼ûµÄË÷Òý£¬Èç¹ûË÷ÒýÁÐÖ»Óй̶¨µÄ¼¸¸öÖµ£¬ÄÇô¾Í¿ÉÒÔ²ÉÓÃλͼË÷ÒýÀ´¼ÓËÙ²éѯ¡£ÀûÓÃλͼË÷Òý¿ÉÒÔ·½±ãµÄ½øÐÐAND/OR/XORµÈ¸÷Àà¼ÆË㣬Hive0.8°æ±¾¿ªÊ¼ÒýÈëλͼË÷Òý£¬Î»Í¼Ë÷ÒýÔÚ´óÊý¾Ý´¦Àí·½ÃæµÄÓ¦Óù㷺£¬±ÈÈç¿ÉÒÔÀûÓÃbitmapÀ´¼ÆËãÓû§Áô´æÂÊ£¨Ë÷Òý×öÓëÔËË㣬ЧÂÊÔ¶ºÃÓÚjoinµÄ·½Ê½£©¡£Èç¹ûBitmapË÷ÒýºÜÏ¡Ê裬ÄÇô¾ÍÐèÒª¶ÔË÷ÒýѹËõÒÔ½ÚÊ¡´æ´¢¿Õ¼äºÍ¼Ó¿ìIO¡£HiveµÄBitmap Handler²ÉÓõÄÊÇEWAH£¨https://github.com/lemire/javaewah£©Ñ¹Ëõ·½Ê½¡£

2 Êý¾Ý·ÖÎöƽ̨

2.1 ¼Ü¹¹ÓëÄ£¿é

´ï¹ÛÊý¾Ý·ÖÎöƽ̨°üÀ¨Êý¾ÝÊÕ¼¯¼ÓÔØÄ£¿é¡¢Êý¾Ý·ÖÎö¼ÆËãÄ£¿é¡¢ÈÎÎñµ÷¶ÈϵͳÒÔ¼°¿ÉÊÓ»¯ÏµÍ³¡£

ͼ£ºÊý¾Ý·ÖÎöƽ̨»ù±¾¿ò¼Ü

Êý¾ÝÊÕ¼¯Ä£¿é

Êý¾ÝÄ£¿é¸ºÔðÊÕ¼¯Òƶ¯¶Ëapp¡¢ÍøÒ³¶ËÒÔ¼°·þÎñÆ÷¶Ë´óÁ¿µÄÈÕÖ¾Êý¾Ý¡£Òƶ¯¶Ë¿É×ÔÐпª·¢Êý¾ÝÉϱ¨¹¦ÄÜ»òÕßʹÓÃsdkÀ´Éϱ¨Êý¾Ý¡£ÍøÒ³¶ËÀûÓÃÖ²ÈëµÄjs½«Óû§µÄÐÐΪ½øÐÐÉϱ¨£¬·þÎñÆ÷¶Ëͨ¹ýhttp serverÀ´ÊÕ¼¯Éϱ¨µÄÊý¾Ý¡£·þÎñÆ÷¶ËµÄÈÕÖ¾ÐÅÏ¢¿ÉÒÔͨ¹ýDXÄ£¿é(Ò»¸ö¿ç¿âµÄÊý¾Ý½»»»ÏµÍ³)À´½«´ý´¦ÀíÊý¾ÝÍÆÈëhiveÊý¾Ý·ÖÎöƽ̨¡£³ý´ËÖ®Í⣬Êý¾ÝÀ´Ô´»¹°üÀ¨´óÁ¿µÄuser ¡¢item»ù±¾Êý¾ÝµÈµÈ¡£Êý¾ÝÊÕ¼¯Íê³É½«ËùÓÐÐèÒª´¦Àí·ÖÎöµÄԭʼÊý¾ÝÍÆÈëhadoopƽ̨¡£´ÓÎïÀíÐÎʽÀ´¿´£¬¼´½«´ý·ÖÎöÊý¾ÝдÈëHDFS¡£

Êý¾ÝETLÄ£¿é

Ò»°ã¶øÑÔ£¬Éϱ¨µÄÊý¾Ý¶¼ÊǷǽṹ»¯»òÕß°ë½á¹¹»¯µÄ¡£ETL£¨³éÈ¡¡¢×ª»»¡¢¼ÓÔØ£©Ä£¿é¸ºÔð½«ËùÓеķǽṹ»òÕß°ë½á¹¹µÄÊý¾Ýת»»³É½á¹¹»¯µÄÊý¾Ý²¢¼ÓÔص½hive¿â±íÖС£ÀýÈç¶ÔÓÚÓû§·ÃÎÊÈÕÖ¾£¨¿ÉÄÜÊÇweb serverÈÕÖ¾£©£¬ÎÒÃÇÐèÒª´ÓÿÐÐÈÕÖ¾ÖгéÈ¡³öÓû§µÄ±êʶ£¨cookie¡¢imei»òÕßuserid£©£¬ipÀ´Ô´¡¢urlµÈ¡£´ÓÐÎʽÉÏÀ´¿´£¬ETL½«HDFSµÄԭʼÊý¾Ý½á¹¹»¯£¬ÒÔ±íµÄÐÎʽÌṩ·ÖÎö¡£

Êý¾Ý·ÖÎöÓë¼ÆËã

¸ù¾ÝÒµÎñÐèÇóºÍ¹¦ÄÜ£¬ÀûÓÃHQLʵÏÖ¸÷ÖÖͳ¼Æ·ÖÎö¡£Ò»¸öHiveÈÎÎñµÄÀ´Ô´±í¿ÉÄÜÊǶà¸ö£¬½á¹ûÊý¾ÝÒ²ÓпÉÄÜ»áдÈë¶àÕÅ±í¡£

ͼ£ºHiveÈÎÎñÖ´ÐÐÊäÈëÊä³ö

ÈÎÎñµ÷¶Èϵͳ

´ÓÉÏͼ¿ÉÒÔ¿´³ö£¬HiveÈÎÎñÖ®¼ä´æÔÚÒÀÀµ¹Øϵ£¬²»ÖÁÓÚHiveÈÎÎñÖ®¼ä´æÔÚÒÀÀµ£¬HiveÈÎÎñÓëDXÈÎÎñÖ®¼ä¡¢DXÈÎÎñÖ®¼ä¶¼¿ÉÄÜ´æÔÚijÖÖÒÀÀµ¹Øϵ£¬´ï¹ÛÊý¾Ý·ÖÎöƽ̨֧³ÖµÄÈÎÎñÀàÐÍ»¹°üÀ¨MRÈÎÎñ¡¢shellÈÎÎñµÈ£¬´ï¹ÛÊý¾Ý·ÖÎöƽ̨×ÔÐпª·¢Ë¾Äϵ÷¶ÈϵͳÀ´Íê³Éƽ̨ÖÐËùÓÐÈÎÎñµÄµ÷¶È¡£¹ØÓÚ˾Äϵ÷¶Èϵͳ¿É¼ûºóÐøÌÖÂÛ¡£

Êý¾Ý·ÖÎöƽ̨ģ¿é

ͼ£ºÊý¾Ý·ÖÎöƽ̨»ù±¾Ä£¿é

½ÓÏÂÀ´½«Â½Ðø½éÉÜ£¬Êý¾Ý·ÖÎöƽ̨ÖеÄÁ½¸öÖØҪģ¿é£ºDXÊý¾Ý½»»»ÏµÍ³ÒÔ¼°ÈÎÎñµ÷¶Èϵͳ¡£

2.2 DXÊý¾Ý½»»»

DXϵͳ¿ÉÒÔÔÚ¹ØϵÐÍÊý¾Ý¿â¡¢Hive¡¢FTPµÈϵͳ֮¼äʵÏÖÊý¾ÝµÄ½»»»¡£DX¶¨ÒåÁËWriterºÍReader½Ó¿ÚÀ´³éÏó¶ÔÊý¾ÝµÄ¶Áд²Ù×÷£¬¶ÔÓÚ¸÷ÖÖ´æ´¢ÀàÐ͵ÄÊý¾Ý£¬Ð趨ÖÆËûÃǵÄʵÏÖ·½·¨¡£

¹ØϵÐÍÊý¾Ý¿âÀûÓÃJDBCʵÏÖÆä¶Áд¹¦ÄÜ£»¶ÔÓÚHive¶øÑÔ£¬Ö±½ÓÀûÓÃHDFS APIʵÏÖ¶ÔHDFSÎļþµÄ¶Áд£¬ÓÉÓÚHiveµÄ¶ÁʱÑéÖ¤»úÖÆ£¬ÐèÒªÔÚ¶ÁдHive±íÎļþʱ£¬¶¨ÒåÆä×ֶθöÊý¡¢Ãû³ÆµÈÐÅÏ¢£¬±£Ö¤Óë±í¶¨ÒåÒ»Ö£»FTPÎļþÄ¿Ç°µÄ´¦Àí·½·¨ÊÇÏȽ«Êý¾Ý´ÓFTP·þÎñÆ÷À­ÏÂÀ´£¬È»ºó½«¶ÁÈ¡ÎļþÄÚÈÝ£¬Ð´ÈëHiveÊý¾Ý¿â¡£

ÒÔÉϹý³ÌÊÇÆäËûÊý¾ÝÔ´µ½HiveµÄÊý¾Ý´«Êä¹ý³Ì£¬HiveÊý¾ÝͬÑù¿ÉÒÔͨ¹ýDXϵͳдÈëÆäËûÊý¾ÝÔ´¡£

2.3 ÈÎÎñµ÷¶È

´ï¹ÛÊý¾Ý·ÖÎöƽ̨¿ª·¢µÄ˾Äϵ÷¶Èϵͳ½«ÈÎÎñ·ÖΪ×ÊÔ´ÒÀÀµÐͺÍʵ¼ùÒÀÀµÐÍ¡£Ê±¼äÒÀÀµÐÍÈÎÎñÀàËÆÓÚcrontab¶¨Ê±ÈÎÎñÒ»Ñù£¬µ½Ê±´¥·¢ÆäÖ´ÐС£×ÊÔ´ÒÀÀµÐÍÈÎÎñÐèÒªÆäÒÀÀµµÄ×ÊÔ´¶¼Âú×ãʱ²Å»á´¥·¢ÆäÖ´ÐС£¿Éµ÷¶ÈµÄÈÎÎñÀàÐÍ°üÀ¨DXÈÎÎñ¡¢HiveÈÎÎñ¡¢MRÈÎÎñ¡¢shellÈÎÎñµÈ¡£
˾ÄÏϵͳÖÐ×îΪ¹Ø¼üµÄÊÇdispatcherÄ£¿é£¬¸ÃÄ£¿éͨ¹ýzookeeperÀ´µ÷¶ÈÈÎÎñÔÚagent£¨Ö´ÐÐÈÎÎñµÄ´úÀí·þÎñÆ÷£¬ÐèÒªÉèÖöà¸ö£©ÉϵÄÔËÐУ¬¹ØÓÚzookeeperÈçºÎЭµ÷·Ö²¼Ê½Ó¦ÓõÄÒ»ÖÂÐÔÔڴ˲»ÔÙÀÛÊö¡£

2.4 ¼Ü¹¹ÑÝ»¯

´ï¹ÛÊý¾Ý·ÖÎöƽ̨ÔÚʹÓùý³ÌÖУ¬²»¶ÏÌá¸ßÆäÒ×ÓÃÐÔºÍÎȶ¨ÐÔ¡£ÔÚ´óÁ¿µÄÑо¿ºÍ¿ª·¢¹ý³ÌÖУ¬Æ½Ì¨´ÓÎÞµ½ÓУ¬×ß³öµÚÒ»²½µ½¹¦ÄÜÍêÉÆ¡¢·¢»Ó¾Þ´óµÄÒµÎñ¼ÛÖµ¡£

´Ó·ÖÉ¢µÄÊý¾Ý½»»»µ½¼¯ÖеÄÊý¾Ý½»»»ÏµÍ³

ÔÚʹÓÃͳһµÄÊý¾Ý½»»»ÏµÍ³DXºó£¬¸÷ÒµÎñϵͳµÄÊý¾Ý¿ÉÒÔ¸üºÃµÄ½øÐлã¾ÛºÍ´òͨ£¬½øÐÐͳһµÄ·ÖÎöºÍ´¦Àí¡£

´Ó·ÖÉ¢µÄ×÷Òµµ÷¶Èµ½¼¯ÖеÄÈÎÎñµ÷¶Èϵͳ

ÿÌ켸ǧ¹æÄ£µÄÈÎÎñÊýʹµÃÈÎÎñµÄµ÷¶È¼«ÆäÀ§ÄÑ£¬ÌرðÊǵ±ÈÎÎñÖ®¼ä´æÔÚÒÀÀµ¹Øϵʱ£¬ÏÔÈ»¼òµ¥µÄͨ¹ýcrontabÒѾ­ÎÞ·¨Âú×ãÒµÎñµÄÐèÇó¡£Ë¾Äϵ÷¶Èϵͳ±£Ö¤ËùÓÐÈÎÎñÓÐÐòÕýÈ·µÄÔËÐС£

´ÓÅúÁ¿Ê½´¦Àíµ½¼¯³ÉÁ÷ʽ´¦Àí

Ëæ×Åʵʱͳ¼Æ·ÖÎöµÄÐèÇóÔ½À´Ô½¶à£¬hive²éѯ»ùÓÚMRÈÎÎñÀ´ÊµÏÖµÄȱµãÈÕÒæÃ÷ÏÔ£¨ÈÎÎñÆô¶¯¿ªÏú´ó£©¡£ÎªÁËÌṩʵʱµÄÊý¾Ý·ÖÎöÇëÇó£¬Æ½Ì¨¿ªÊ¼ÒýÈëstormÁ÷ʽ¼ÆËãÄ£ÐÍ¡£StormÒÔÊý¾ÝÁ÷ΪÇý¶¯¡£´¥·¢¼ÆË㣬ÿÀ´Ò»ÌõÊý¾Ý¾Í²úÉúÒ»´Î¼ÆËã½á¹û£¬Ê±Ð§ÐԷdz£¸ß£¬ÔÚÒµ½çÒ²µÃµ½Á˷ḻµÄÓ¦Óá£

´Ó¹ØϵÐÍÊý¾Ý¿âµ½Hbase

³õÆÚ£¬Êý¾Ý·ÖÎöµÄ½á¹ûÊý¾Ý¶¼ÊÇͨ¹ýDXµ¼Èë¹ØϵÐÍÊý¾Ý¿â£¬ÒÔ±ãÊý¾Ý¿ÉÊÓ»¯Æ½Ì¨µ÷ÓûòÕßÆäËûϵͳʹÓ㬴óÁ¿µÄÊý¾ÝÔì³É¹ØϵÊý¾Ý¿âµÄÈÕÒæÅӴ󣬴øÀ´ÑÏÖصÄÐÔÄÜÎÊÌâ¡£HBaseÊÇÒ»¸ö¿ªÔ´¡¢ÁÐʽ·Ö²¼Ê½µÄÊý¾Ý¿â£¬»ùÓÚHDFSÎļþϵͳ£¬¿ÉÒÔ·½ÃæµÄºÍHive½øÐм¯³É¡£¾­¹ý¼¯³ÉHBase£¬Îª¿ÉÊÓ»¯Æ½Ì¨ºÍÏßÉÏϵͳÌṩ·þÎñ£¬½µµÍDXÈÎÎñÁ¿£¬½µµÍ·ÃÎÊÑÓ³Ù¡£

3 Hive·ÖÎöʵ¼ù

3.1 SchemaÉè¼Æ

ûÓÐͨÓõÄschema£¬Ö»ÓкÏÊʵÄschema¡£ÔÚÉè¼ÆHiveµÄschemaµÄʱºò£¬ÐèÒª¿¼Âǵ½´æ´¢¡¢ÒµÎñÉϵĸßƵ²éѯÔì³ÉµÄ¿ªÏúµÈµÈ£¬Éè¼ÆÊʺÏ×Ô¼ºµÄÊý¾ÝÄ£ÐÍ¡£

ÉèÖ÷ÖÇø±í

¶ÔÓÚHiveÀ´Ëµ£¬ÀûÓ÷ÖÇøÀ´Éè¼Æ±í×ÜÊDZØÒªµÄ£¬·ÖÇøÌṩÁËÒ»ÖÖ¸ôÀëÊý¾ÝºÍÓÅ»¯²éѯµÄ±ãÀûµÄ·½Ê½¡£ÉèÖ÷ÖÇøʱ£¬ÐèÒª¿¼ÂDZ»ÉèÖóɷÖÇøµÄ×ֶΣ¬°´ÕÕʱ¼ä·ÖÇøÒ»°ã¶øÑÔ¾ÍÊÇÒ»¸öºÃµÄ·½°¸£¬ÆäºÃ´¦ÔÚÓÚÆäÊÇ°´ÕÕ²»Í¬Ê±¼äÁ£¶ÈÀ´È·¶¨ºÏÊÊ´óСµÄÊý¾Ý»ýÀÛÁ¿£¬Ëæ×Åʱ¼äµÄÍÆÒÆ£¬·ÖÇøÊýÁ¿µÄÔö³¤ÊǾùÔȵģ¬·ÖÇøµÄ´óСҲÊǾùÔȵġ£

±ÜÃâСÎļþ

ËäÈ»·ÖÇøÓÐÀûÓÚ¸ôÀëÊý¾ÝºÍ²éѯ£¬ÉèÖùý¶à¹ýϸµÄ·ÖÇøÒ²»á´øÀ´Æ¿¾±£¬Ö÷ÒªÊÇÒòΪ¹ý¶àµÄ·ÖÇøÒâζ×ÅÎļþµÄÊýÄ¿¾ÍÔ½¶à£¬¹ý¶àÔö³¤µÄСÎļþ»á¸ønamecode´øÀ´¾Þ´óµÄÐÔÄÜѹÁ¦¡£Í¬Ê±Ð¡Îļþ¹ý¶à»áÓ°ÏìJOBµÄÖ´ÐУ¬hadoop»á½«Ò»¸öjobת»»³É¶à¸ötask£¬¼´Ê¹¶ÔÓÚÿ¸öСÎļþÒ²ÐèÒªÒ»¸ötaskÈ¥µ¥¶À´¦Àí£¬´øÀ´ÐÔÄÜ¿ªÏú¡£Òò´Ë£¬hive±íÉè¼ÆµÄ·ÖÇø²»Ó¦¸Ã¹ý¶à¹ýϸ£¬Ã¿¸öĿ¼ÏµÄÎļþ×ã¹»´ó£¬Ó¦¸ÃÊÇÎļþϵͳÖпé´óСµÄÈô¸É±¶¡£

Ñ¡ÔñÎļþ¸ñʽ

HiveÌṩµÄĬÈÏÎļþ´æ´¢¸ñʽÓÐtextfile¡¢sequencefile¡¢rcfileµÈ¡£Óû§Ò²¿ÉÒÔͨ¹ýʵÏÖ½Ó¿ÚÀ´×Ô¶¨ÒåÊäÈëÊäµÄÎļþ¸ñʽ¡£

ÔÚʵ¼ÊÓ¦ÓÃÖУ¬textfileÓÉÓÚÎÞѹËõ£¬´ÅÅ̼°½âÎöµÄ¿ªÏú¶¼ºÜ´ó£¬Ò»°ãºÜÉÙʹÓá£SequencefileÒÔ¼üÖµ¶ÔµÄÐÎʽ´æ´¢µÄ¶þ½øÖƵĸñʽ£¬ÆäÖ§³ÖÕë¶Ô¼Ç¼¼¶±ðºÍ¿é¼¶±ðµÄѹËõ¡£rcfileÊÇÒ»ÖÖÐÐÁнáºÏµÄ´æ´¢·½Ê½£¨text fileºÍsequencefile¶¼ÊÇÐбí[row table]£©£¬Æ䱣֤ͬһÌõ¼Ç¼ÔÚͬһ¸öhdfs¿éÖУ¬¿éÒÔÁÐʽ´æ´¢¡£rcfileµÄ¾ÛºÏÔËËã²»Ò»¶¨×ÜÊÇ´æÔÚ£¬µ«ÊÇrcfileµÄ¸ßѹËõÂÊȷʵ¼õÉÙÎļþ´óС£¬Òò´Ëʵ¼ÊÓ¦ÓÃÖУ¬rcfile×ÜÊdzÉΪ²»¶þµÄÑ¡Ôñ£¬´ï¹ÛÊý¾Ýƽ̨ÔÚÑ¡ÔñÎļþ´æ´¢¸ñʽʱҲ´óÁ¿Ñ¡ÔñÁËrcfile·½°¸¡£

3.2 ͳ¼Æ·ÖÎö

±¾½Ú½«´ÓÅÅÐòºÍ´°¿Úº¯ÊýÁ½¸ö·½ÃæµÄ½éÉÜHiveµÄͳ¼Æ·ÖÎö¹¦ÄÜ¡£

ÅÅÃûÈÈÃÅÅÅÃûÔÚʵ¼ÊµÄÒµÎñ³¡¾°Öо­³£Óö¼û¡£ÀýÈç×îÊÜ»¶Ó­µÄÊé¼®¡¢ÏúÁ¿TOP100µÄÉÌÆ·µÈµÈ¡£ÔÙʵ¼ÊÇé¿öÏ£¬ÎÒÃDz»½öÐèÒª¿¼ÂǸ÷Á¿»¯Ö¸±ê£¬»¹ÐèÒª¿¼ÂÇÖÃÐŶÈÎÊÌâ¡£

×î¼òµ¥µÄÅÅÃû£ºORDER BY value LIMIT n

ÉÏÊö²éѯ½ö½ö¿¼ÂÇÁËÁ¿»¯Ö¸±ê£¬ÅÅÃû²»¹»Æ½»¬£¬²¨¶¯½Ï´ó¡£

¸÷ÖÖÅÅÃû·½·¨Öڶ࣬´ï¹ÛÊý¾Ý·ÖÎöƽ̨ÔÚ½øÐÐitem ÅÅÃû¶à²ÉÓûùÓÚÓû§Í¶Æ±µÄÅÅÃûËã·¨¡£Èç»ùÓÚÍþ¶ûÑ·Çø¼äµÄÅÅÃûËã·¨£¬¸ÃËã·¨¿ÉÒԽϺõĽâ¾öСÑù±¾µÄ²»×¼È·ÎÊÌâ¡£

ͼ£ºÍþ¶ûÑ·Çø¼ä

´°¿Ú·ÖÎöº¯Êý

HiveÌṩÁ˷ḻÁËÊýѧͳ¼Æº¯Êý£¬Í¬Ê±Ò²ÌṩÁËÓû§×Ô¶¨Ò庯ÊýµÄ½Ó¿Ú£¬Óû§¿ÉÒÔ×Ô¶¨ÒåUDF¡¢UDAF¡¢UDTF Hive 0.11°æ±¾¿ªÊ¼Ìṩ´°¿ÚºÍ·ÖÎöº¯Êý£¨Windowing and Analytics Functions£©£¬°üÀ¨LEAD¡¢LAG¡¢FIRST_VALUE¡¢LAST_VALUE¡¢RANK¡¢ROW_NUMBER¡¢PERCENT_RANK¡¢CUBE¡¢ROLLUPµÈ¡£´°¿Úº¯ÊýÓë¾ÛºÏº¯ÊýÒ»Ñù£¬¶¼ÊǶԱí×Ó¼¯µÄ²Ù×÷£¬´Ó½á¹ûÉÏ¿´£¬Çø±ðÔÚÓÚ´°¿Úº¯ÊýµÄ½á¹û²»»á¾ÛºÏ£¬Ô­ÓеÄÿÐмǼÒÀÈ»»á´æÔÚ¡£´°¿Úº¯ÊýµÄµäÐÍ·ÖÎöÓ¦ÓðüÀ¨£º°´·ÖÇø¾ÛºÏ£¨ÅÅÐò£¬top nÎÊÌ⣩¡¢Ðмä¼ÆË㣨ʱ¼äÐòÁзÖÎö£©¡¢¹ØÁª¼ÆË㣨¹ºÎïÀº·ÖÎö£©¡£

ÎÒÃÇÒÔÒ»¸ö¼òµ¥µÄÐмä¼ÆËãµÄÀý×Ó˵Ã÷´°¿Úº¯ÊýµÄÓ¦Ó㨹ØÓÚÆäËûº¯ÊýµÄ¾ßÌå˵Ã÷£¬Çë²Î¿¼hiveÎĵµ£©¡£Óû§ÔĶÁÐÐΪµÄͳ¼Æ·ÖÎöÐèÒª´Óµã»÷Êé¼®ÐÐΪÖйéÄÉͳ¼Æ³öÀ´¡£Óû§ä¯ÀÀÈÕÖ¾½á¹¹ÈçϱíËùʾ£¬Ã¿Ìõ¼Ç¼ΪÓû§µÄµ¥´Îµã»÷ÐÐΪ¡£

ͨ¹ý¶ÔÁ¬ÐøµÄÓû§µã»÷ÈÕÖ¾·ÖÎö£¬Í¨¹ýHiveÌṩµÄ´°¿Ú·ÖÎöº¯Êý¿ÉÒÔ¼ÆËã³öÓû§¸÷Õ½ڵÄÔĶÁʱ¼ä¡£

SELECT userid, bookid, chapterid, end_time ¨C start_time as read_time
FROM
(
SELECT userid, bookid, chapterid, log_time as start_time,
lead(log_time,1,null) over(partition by userid, bookid order by log_time) as end_time
FROM user_read_log where pt=¡¯2015-12-01¡¯
) t;

ͨ¹ýÉÏÊö²éѯ¼È¿ÉÒÔÕÒ³ö2015-12-01ÈÕËùÓÐÓû§¶ÔÿһÕ½ڵÄÔĶÁʱ¼ä¡£Ö»ÄÜͨ¹ý¿ª·¢mr´úÂë»òÕßʵÏÖudafÀ´ÊµÏÖÉÏÊö¹¦ÄÜ¡£

´°¿Ú·ÖÎöº¯Êý¹Ø¼üÔÚÓÚ¶¨ÒåµÄ´°¿ÚÊý¾Ý¼¯¼°Æä¶Ô´°¿ÚµÄ²Ù×÷£¬Í¨¹ýover£¨´°¿Ú¶¨ÒåÓï¾ä£©À´¶¨Òå´°¿Ú¡£ÈÕ³£·ÖÎöºÍʵ¼ÊÓ¦ÓÃÖУ¬¾­³£»áÓд°¿Ú·ÖÎöÓ¦Óõij¡¾°£¬ÀýÈç»ùÓÚ·ÖÇøµÄÅÅÐò¡¢¼¯ºÏ¡¢Í³¼ÆµÈ¸´ÔÓ²Ù×÷¡£ÀýÈçÎÒÃÇÐèҪͳ¼Æÿ¸öÓû§ÔĶÁʱ¼ä×î¶àµÄ3±¾Êé:

ͼ£ºÐмä¼ÆËãʾÒâͼ¼°´úÂë

´°¿Úº¯ÊýʹµÃHiveµÄ¾ß±¸ÁËÍêÕûµÄÊý¾Ý·ÖÎö¹¦ÄÜ£¬ÔÚʵ¼ÊµÄÓ¦Óû·¾³ÖУ¬´ï¹ÛÊý¾Ý·ÖÎöÍŶӴóÁ¿Ê¹ÓÃhive´°¿Ú·ÖÎöº¯ÊýÀ´ÊµÏÖ½ÏΪ¸´ÔÓµÄÂß¼­£¬Ìá¸ß¿ª·¢ºÍµü´úЧÂÊ¡£

3.3 Óû§»­Ïñ

Óû§»­Ïñ¼´»ùÓÚÕæʵÊý¾ÝµÄÓû§Ä£ÐÍ¡£¼òµ¥À´Ëµ£¬Óû§»­ÏñÌáÈ¡ÁËÓû§µÄÊôÐÔÐÅÏ¢¡¢ÐÐΪÐÅÏ¢£¬´Ó¶ø¹éÄÉͳ¼Æ³öÆäÈË¿ÚѧÌØÕ÷¡¢Æ«ºÃÌØÕ÷µÈ¡£½¨Á¢Óû§Ä£Ð͵ÄÊ×ÒªÈÎÎñ¾ÍÊÇÌáÈ¡ÌØÕ÷£¬¼È°üÀ¨Óû§»ù±¾ÌØÕ÷£¬Ò²°üÀ¨ÐÐΪÌØÕ÷ºÍͳ¼ÆÌØÕ÷¡£

Óû§Ä£Ðͱ¾ÖÊÉϾÍÊÇ¿Ì»­Óû§ÐËȤµÄÄ£ÐÍ£¬¶øÓû§µÄÐËȤģÐÍÊǶàά¶È¡¢¶à³ß¶ÈµÄ¡£¿Ì»­Óû§Ä£ÐÍ»¹ÐèÒª´Óʱ¼äÉϽøÐжÈÁ¿£¬ÉõÖÁÊǽøÐжà³ß¶ÈµÄ×éºÏ£¬¸ù¾ÝÓû§ÐÐΪͳ¼Æʱ¼äµÄ³¤¶Ì£¬¿ÉÒÔ½«Óû§µÄÆ«ºÃ·ÖΪ¶ÌÆÚÆ«ºÃºÍ³¤ÆÚÆ«ºÃ¡£Æ«ºÃµÄȨÖؼ´ÎªÓû§µÄÆ«ºÃ³Ì¶ÈµÄ¶ÈÁ¿¡£

¶ÔÓû§Æ«ºÃµÄÃèÊö£¬»¹ÐèÒª¿¼ÂÇÖÃÐŶȵÄÎÊÌ⣬ÀýÈç¶ÔÓÚÒ»¸öÔĶÁÐÐΪ¼«ÆäÏ¡ÊèµÄÓû§À´Ëµ£¬¿Ì»­ÆäÔĶÁÀà±ðÆ«ºÃÊǺÁÎÞÒâÒåµÄ¡£

ͼ£ºÓû§»­Ïñ¿Ì»­

3.4 ·´×÷±×·ÖÎö

ÖÚËùÖÜÖª£¬´æÔÚÅÅÃû¾Í¿ÉÄÜ´æÔÚ×÷±×¡£ËÑË÷¹ã¸æ¡¢Ë÷»¥ÁªÍøË¢µ¥¡¢Ë¢°ñÏÖÏó²ã³ö²»Çî¡£Ò»°ãÀ´Ëµ£¬×÷±×µÄÄ¿µÄ¶¼ÊÇΪÁËÌá¸ß×Ô¼ºµÄÅÅÃû£¬»òÕßÊǽµµÍ¶ÔÊÖµÄÅÅÃû¡£ÀûÓÃHive¶ÔÊý¾Ý½øÐзÖÎö¿ÉÒÔ¹ýÂ˵ô½ÏÃ÷ÏÔµÄ×÷±×Êý¾Ý£¬´ïµ½Êý¾ÝÇåÏ´µÄÄ¿µÄ¡£

ÀýÈç¶ÔÓÚÒ»¸öË¢°ñ×÷±×ÐÐΪ£¬ÐèÒª×÷±××Ų»¶ÏË¢ÈÕÖ¾ÐÐΪÀ´Ìá¸ßÆäÅÅÃû£¬ÎÒÃÇ¿ÉÒÔÖ¸¶¨Èô¸É¹æÔòÀ´¹ýÂË×÷±×Êý¾Ý¡£ÈçͬIPͬÎïƷͬÐÐΪÊýÄ¿Òì³£¡¢Í¬Óû§IDÐÐΪƵ´ÎÒì³£¡¢Í¬ÎïÆ·IDÐÐΪƵ´ÎÒì³£µÈµÈ¡£ÈçÏÂͼ£¬Èç¹ûÏà±ÈÓÚËùÓÐitemµÄƽ¾ùÔö³¤Ç÷ÊÆ£¬Èç¹ûijitemµÄÔö³¤Ç÷ÊÆÏà¶Ôƽ¾ùˮƽ¹ý´ó£¬ÄÇôÆä×÷±×µÄ¸ÅÂʾͱȽϸߡ£

ͼ£º×÷±×Êý¾ÝÇ÷ÊÆÓëƽ¾ùÇ÷ÊÆÊý¾Ý¶Ô±È

×÷±×·ÖÎö»¹ÐèÒª½áºÏÒµÎñÐèÇóºÍÌص㣬²ÉÓúÏÊʵĻúÆ÷ѧϰËã·¨À´½øÐиü½øÒ»²½µÄÅжϺ͹ýÂË£¬´ïµ½·´×÷±×µÄÄ¿±ê¡£

4 HiveÓÅ»¯

´ï¹ÛµÄÊý¾Ý²Ö¿â»ùÓÚHive´î½¨£¬Ã¿ÈÕÐèÒª´¦Àí´óÁ¿µÄ¼ÆËãÁ÷³Ì£¬HiveµÄÎȶ¨ÐÔºÍÐÔÄÜÖÁ¹ØÖØÒª¡£ÖÚ¶àµÄÈÎÎñÐèÒªÎÒÃǺÏÀíµÄµ÷½Ú·ÖÅ伯Ⱥ×ÊÔ´£¬ºÏÀíµÄÅäÖø÷²ÎÊý£¬ºÏÀíµÄÓÅ»¯²éѯ¡£HiveÓÅ»¯°üº¬¸÷¸ö·½Ã棬Èçjob¸öÊýÓÅ»¯¡¢jobµÄmap/reducer¸öÊýÓÅ»¯¡¢²¢ÐÐÖ´ÐÐÓÅ»¯µÈµÈ£¬±¾½Ú½«Ö÷ÒªÌÖÂÛHQLÖеÄÎÞʱ²»ÔÚµÄJOINµÄÓÅ»¯¾­Ñé¡£

4.1 JoinÓï¾ä

¶ÔÓÚÉÏÊöµÄjoinÓï¾ä£¬ÆäÖÐbook_info±íÊýÁ¿ÎªÇ§¹æÄ££¬

INSERT OVERWRITE TABLE read_log_tmp
SELECT a.userid,a.bookid,b.author
FROM user_read_log a JOIN book_info b ON a.bookid = b.bookid;

¸ÃÓï¾äµÄÖ´Ðмƻ®Îª£º

ͼ£ºmap joinµÄÈÎÎñÖ´ÐÐÁ÷³Ì

¶ÔÓÚСÊý¾ÝÁ¿£¬hive»á×Ô¶¯²ÉÈ¡map joinµÄ·½Ê½À´ÓÅ»¯join£¬´ÓmapreduceµÄ±à³ÌÄ£ÐÍÀ´¿´£¬ÊµÏÖjoinµÄ·½Ê½Ö÷ÒªÓÐmap¶Ëjoin¡¢reduce¶Ëjoin¡£Map¶ËjoinÀûÓÃhadoop ·Ö²¼Ê½»º´æ¼¼Êõͨ¹ý½«Ð¡±í±ä»»³ÉhashtableÎļþ·Ö·¢µ½¸÷¸ötask£¬map´ó±íʱ¿ÉÒÔÖ±½ÓÅжÏhashtableÀ´Íê³Éjoin£¬×¢ÒâС±íµÄhashtableÊÇ·ÅÔÚÄÚ´æÖеģ¬ÔÚÄÚ´æÖÐ×÷Æ¥Å䣬Òò´Ëmap joinÊÇÒ»Öַdz£¿ìµÄjoin·½Ê½£¬Ò²ÊÇÒ»ÖÖ³£¼ûµÄÓÅ»¯·½Ê½¡£Èç¹ûС±í¹»Ð¡£¬ÄÇô¾Í¿ÉÒÔÒÔmap joinµÄ·½Ê½À´Íê³ÉjoinÍê³É¡£Hiveͨ¹ýÉèÖÃhive.auto.convert.join=true(ĬÈÏÖµ)À´×Ô¶¯Íê³Émap joinµÄÓÅ»¯£¬¶øÎÞÐèÏÔʾָʾmap join¡£È±Ê¡Çé¿öÏÂmap joinµÄÓÅ»¯ÊÇ´ò¿ªµÄ¡£

Reduce¶ËjoinÐèÒªreducerÀ´Íê³Éjoin¹ý³Ì£¬¶ÔÓÚÉÏÊöjoin´úÂ룬reduce ¶ËjoinµÄmrÁ÷³ÌÈçÏ£¬

ͼ£ºreduce¶ËjoinµÄmapreduce¹ý³Ì

Ïà±ÈÓÚmap join, reduce ¶ËjoinÎÞ·¨ÔÙmap¹ý³ÌÖйýÂËÈκμǼ£¬Ö»Äܽ«joinµÄÁ½ÕűíµÄËùÓÐÊý¾Ý°´ÕÕjoin key½øÐÐshuffle/sort£¬²¢°´ÕÕjoin keyµÄhashÖµ½«<key,value>¶Ô·Ö·¢µ½Ìض¨µÄreducer¡£Reducer¶ÔÓÚËùÓеļüÖµ¶ÔÖ´ÐÐjoin²Ù×÷£¬ÀýÈç0ºÅ£¨bookidµÄhashֵΪ0£©reducerÊÕµ½µÄ¼üÖµ¶ÔÈçÏ£¬ÆäÖÐT1¡¢T2±íʾ¼Ç¼µÄÀ´Ô´±í£¬Æðµ½±êʶ×÷Óãº

ͼ£ºreduce¶ËjoinµÄreducer join

Reducer¶ËjoinÎÞ·¨±ÜÃâµÄreduce½Ø¶ÏÒÔ¼°´«ÊäµÄ´óÁ¿Êý¾Ý¶¼»á¸ø¼¯ÈºÍøÂç´øÀ´Ñ¹Á¦£¬´ÓÉÏͼ¿ÉÒÔ¿´³öËùÓÐhash(bookid) % reducer_numberµÈÓÚ0µÄkey-value¶Ô¶¼»áͨ¹ýshuffle±»·Ö·¢µ½0ºÅreducer£¬Èç¹û·Öµ½0ºÅreducerµÄ¼Ç¼ÊýÄ¿Ô¶´óÓÚÆäËûreducerµÄ¼Ç¼ÊýÄ¿£¬ÏÔÈ»0ºÅµÄreducerµÄÊý¾Ý´¦ÀíÁ¿½«»áÔ¶´óÓÚÆäËûreducer£¬Òò´Ë´¦Àíʱ¼äÒ²»áÔ¶´óÓÚÆäËûreducer£¬ÉõÖÁ»á´øÀ´ÄÚ´æµÈÆäËûÎÊÌ⣬Õâ¾ÍÊÇÊý¾ÝÇãбÎÊÌâ¡£¶ÔÓÚjoinÔì³ÉµÄÊý¾ÝÇãбÎÊÌâÎÒÃÇ¿ÉÒÔͨ¹ýÉèÖòÎÊýset Hive.optimize.skewjoin=true£¬ÈÃhive×Ô¼º³¢ÊÔ½â¾öjoin¹ý³ÌÖвúÉúµÄÇãбÎÊÌâ¡£

4.2 Group by

Óï¾äÎÒÃǶÔuser_read_log±í°´userid goup byÓï¾äÀ´¼ÌÐø̽ÌÖÊý¾ÝÇãбÎÊÌ⣬Ê×ÏÈÎÒÃÇexplain group byÓï¾ä£º

explain select userid,count(*) from user_read_log group by userid

ͼ£ºgoup byµÄÖ´Ðмƻ®

Group byµÄÖ´Ðмƻ®°´ÕÕuseridµÄhashÖµ·Ö·¢Êý¾Ý£¬Í¬Ê±ÔÚmap¶ËÒ²×öÁ˱¾µØreduce£¬group byµÄshuffle¹ý³ÌÊÇ°´ÕÕhash(userid)À´·Ö·¢µÄ£¬Êµ¼ÊÓ¦ÓÃÖÐÈÕÖ¾ÖкܶàÓû§¶¼ÊÇδע²áÓû§»òÕßδµÇ¼£¬userid×Ö¶ÎΪ¿ÕµÄ¼Ç¼ÊýÔ¶´óÓÚuserid²»Îª¿ÕµÄ¼Ç¼Êý£¬µ±ËùÓеĿÕuserid¼Ç¼¶¼·Ö·¢µ½Ìض¨Ä³Ò»¸öreducerºó£¬Ò²»á´øÀ´ÑÏÖصÄÊý¾ÝÇãбÎÊÌâ¡£Ôì³ÉÊý¾ÝÇãбµÄÖ÷ÒªÔ­ÒòÔÚÓÚ·Ö·¢µ½Ä³¸ö»òij¼¸¸öreducerµÄÊý¾ÝÁ¿Ô¶´óÓÚÆäËûreducerµÄÊý¾ÝÁ¿¡£

¶ÔÓÚgroup byÔì³ÉµÄÊý¾ÝÇãбÎÊÌ⣬ÎÒÃÇ¿ÉÒÔͨ¹ýÉèÖòÎÊý

set hive.map.aggr=true (¿ªÆômap¶Ëcombiner);

set hive.groupby.skewindata=true£»

Õâ¸ö²ÎÊýµÄ×÷ÓÃÊÇ×öreduce²Ù×÷µÄʱºò£¬Äõ½µÄkey²¢²»ÊÇËùÓÐÏàֵͬ¸øͬһ¸öReduce£¬¶øÊÇËæ»ú·Ö·¢£¬È»ºóreduce×ö¾ÛºÏ£¬×öÍêÖ®ºóÔÙ×öÒ»ÂÖMR£¬ÄÃÇ°Ãæ¾ÛºÏ¹ýµÄÊý¾ÝÔÙËã½á¹û¡£ËäÈ»¶àÁËÒ»ÂÖMRÈÎÎñ£¬µ«ÊÇ¿ÉÒÔÓÐЧµÄ¼õÉÙÊý¾ÝÇãбÎÊÌâ¿ÉÄÜ´øÀ´µÄΣÏÕ¡£

Hive½â¾öÊý¾ÝÇãб

ÕýÈ·µÄÉèÖÃHive²ÎÊý¿ÉÒÔÔÚijÖ̶ֳÈÉϱÜÃâµÄÊý¾ÝÇãбÎÊÌ⣬ºÏÊʵIJéѯÓï¾äÒ²¿ÉÒÔ±ÜÃâÊý¾ÝÇãбÎÊÌâ¡£Òª¾¡ÔçµÄ¹ýÂËÊý¾ÝºÍ²Ã¼ôÊý¾Ý£¬¼õÉÙºóÐø´¦ÀíµÄÊý¾ÝÁ¿£¬Ê¹µÃjoin keyµÄÊý¾Ý·Ö²¼½ÏΪ¾ùÔÈ£¬½«¿Õ×Ö¶ÎËæ»ú¸³ÓèÖµ£¬ÕâÑù¼È¿ÉÒÔ¾ùÔÈ·Ö·¢ÇãбµÄÊý¾Ý£º

select userid,name from user_info a
join (
select case when userid is null then cast(rand(47)*100000 as int)
else userid
from user_read_log
) b on a.userid = b.userid

Èç¹ûÓû§ÔÚ¶¨ÒåschemaµÄʱºò¾ÍÒѾ­Ô¤Áϵ½±íÊý¾Ý¿ÉÄÜ»á´æÔÚÑÏÖصÄÊý¾ÝÇãбÎÊÌ⣬Hive×Ô0.10.0ÒýÈëÁËskew tableµÄ¸ÅÄÈ罨±íÓï¾ä

CREATE TABLE user_read_log (userid int,bookid, ¡­)
SKEWED BY (userid) ON (null) [STORED AS DIRECTORIES];

ÐèҪעÒâµÄÊÇ£¬skew tableÖ»Êǽ«ÇãбÌرðÑÏÖصÄÁеķֿª´æ´¢Îª²»Í¬µÄÎļþ£¬Ã¿¸öÖƶ¨µÄÇãбֵÖƶ¨ÎªÒ»¸öÎļþ»òÕßĿ¼£¬Òò´ËÔÚ²éѯµÄʱºò¿ÉÒÔͨ¹ý¹ýÂËÇãбֵÀ´±ÜÃâÊý¾ÝÇãбÎÊÌ⣺

select userid,name from user_info a
join (
select userid from user_read_log where pt=¡¯2015¡¯ and userid is not null
) b on a.userid = b.userid

¿ÉÒÔ¿´³ö£¬Èç¹û²»¼Ó¹ýÂËÌõ¼þ£¬ÇãбÎÊÌ⻹ÊÇ»á´æÔÚ£¬Í¨¹ý¶Ôskew table¼Ó¹ýÂËÌõ¼þµÄºÃ´¦ÊDZÜÃâÁËmapperµÄ±íɨÃè¹ýÂ˲Ù×÷¡£

4.3 JoinµÄÎïÀíÓÅ»¯

HiveÄÚ²¿ÊµÏÖÁËMapJoinResolver£¨´¦ÀíMapJoin£©¡¢SkewJoinResolver£¨´¦ÀíÇãбjoin£©¡¢CommonJoinResolver£¨´¦ÀíÆÕͨJoin£©µÈÀàÀ´ÊµÏÖjoinµÄ²éѯÎïÀíÓÅ»¯£¨/org/apache/hadoop/hive/ql/optimizer/physical£©¡£

CommonJoinResolverÀฺÔð½«ÆÕͨJoinת»»³ÉMapJoin£¬Hiveͨ¹ýÕâ¸öÀàÀ´ÊµÏÖmapjoinµÄ×Ô¶¯ÓÅ»¯¡£¶ÔÓÚ±íAºÍ±íBµÄjoin²éѯ£¬»á²úÉú3¸ö·ÖÖ§£º

1) ÒÔ±íA×÷Ϊ´ó±í½øÐÐMapjoin£»

2) ÒÔ±íA×÷Ϊ´ó±í½øÐÐMapjoin£»

3) Map-reduce join

ÓÉÓÚ²»ÖªµÀÊäÈëÊý¾Ý¹æÄ££¬Òò´Ë±àÒëʱ²¢²»»á¾ö¶¨×ßÄǸö·ÖÖ§£¬¶øÊÇÔÚÔËÐÐʱÅжÏ×ßÄǸö·ÖÖ§¡£ÐèҪעÒâµÄÊÇÒªÏñÍê³ÉÉÏÊö×Ô¶¯×ª»»£¬ÐèÒª½«hive.auto.convert.join.noconditionaltaskÉèÖÃΪtrue£¨Ä¬ÈÏÖµ£©£¬Í¬Ê±¿ÉÒÔÊÖ¹¤¿ØÖÆתÔؽøÄÚ´æµÄС±íµÄ´óС£¨hive.auto.convert.join.noconditionaltask.size£©¡£

MapJoinResolver ÀฺÔðµü´ú¸÷¸ömrÈÎÎñ£¬¼ì²éÿ¸öÈÎÎñÊÇ·ñ´æÔÚmap join²Ù×÷£¬Èç¹ûÓУ¬»á½«local map workת»»³Élocal map join work¡£

SkewJoinResolverÀฺÔðµü´úÓÐjoin²Ù×÷µÄreducerÈÎÎñ£¬Ò»µ©µ¥¸öreducer²úÉúÁËÇãб£¬ÄÇô¾Í»á½«ÇãбֵµÃÊý¾ÝдÈëhdfs£¬È»ºóÓÃÒ»¸öеÄmap joinµÄÈÎÎñÀ´´¦ÀíÇãбֵµÄ¼ÆËã¡£ËäÈ»¶àÁËÒ»ÂÖmrÈÎÎñ£¬µ«ÊÇÓÉÓÚ²ÉÓõÄmap join£¬Ð§ÂÊÒ²ÊǺܸߵġ£Á¼ºÃµÄmrģʽºÍÖ´ÐÐÁ÷³Ì×ÜÊÇÖÁ¹ØÖØÒªµÄ¡£

5 ×ܽá

±¾ÎÄÏêϸ½éÉÜÁË´ï¹Û´óÊý¾Ý·ÖÎöƽ̨µÄ»ù±¾¼Ü¹¹ºÍÔ­Àí£¬»ùÓÚhadoop/hiveµÄ´óÊý¾Ý·ÖÎöƽ̨ʹº£Á¿Êý¾ÝµÄ´æ´¢¡¢·ÖÎö¡¢ÍÚ¾òÖð²½³ÉΪÏÖʵ£¬²¢´øÀ´ÒâÏë²»µ½µÄÒæ´¦¡£×÷ΪÊý¾Ý·ÖÎöƽ̨Ö÷Á¦¾üµÄHiveÈÔÈ»´¦ÔÚ²»¶ÏµÄ·¢Õ¹Ö®ÖУ¬½«HQLÀí½â³ÉMapreduce³ÌÐò¡¢Àí½âHadoopµÄºËÐÄÄÜÁ¦ÊǸüºÃµÄʹÓúÍÓÅ»¯HiveµÄ¸ù±¾¡£´ï¹ÛÊý¾ÝÍŶÓÒ²½«½ô¸ú¼¼Êõ·¢Õ¹³±Á÷£¬½áºÏ×ÔÉíµÄÒµÎñÐèÇ󣬲ÉÈ¡ºÏÀíµÄ¿ò¼Ü¼Ü¹¹£¬ÌáÉýÊý¾Ýƽ̨µÄ´¦ÀíÄÜÁ¦¡£

   
3030 ´Îä¯ÀÀ       19
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ
×îл¼Æ»®
Èí¼þ¼Ü¹¹Éè¼Æ·½·¨¡¢°¸ÀýÓëʵ¼ù 8-23[ÌØ»Ý]
LinuxÄں˱à³Ì¼°É豸Çý¶¯ 8-15[±±¾©]
Python¡¢Êý¾Ý·ÖÎöÓë»úÆ÷ѧϰ 8-23[ÌØ»Ý]
ǶÈëʽÈí¼þ¼Ü¹¹Éè¼Æ 8-22[ÏßÉÏ]
QTÓ¦Óÿª·¢ 9-5[±±¾©]

MySQLË÷Òý±³ºóµÄÊý¾Ý½á¹¹
MySQLÐÔÄܵ÷ÓÅÓë¼Ü¹¹Éè¼Æ
SQL ServerÊý¾Ý¿â±¸·ÝÓë»Ö¸´
ÈÃÊý¾Ý¿â·ÉÆðÀ´ 10´óDB2ÓÅ»¯
oracleµÄÁÙʱ±í¿Õ¼äдÂú´ÅÅÌ
Êý¾Ý¿âµÄ¿çƽ̨Éè¼Æ

²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿â
¸ß¼¶Êý¾Ý¿â¼Ü¹¹Éè¼Æʦ
HadoopÔ­ÀíÓëʵ¼ù
Oracle Êý¾Ý²Ö¿â
Êý¾Ý²Ö¿âºÍÊý¾ÝÍÚ¾ò
OracleÊý¾Ý¿â¿ª·¢Óë¹ÜÀí

GE Çø¿éÁ´¼¼ÊõÓëʵÏÖÅàѵ
º½Ìì¿Æ¹¤Ä³×Ó¹«Ë¾ Nodejs¸ß¼¶Ó¦Óÿª·¢
ÖÐÊ¢Ò滪 ׿Խ¹ÜÀíÕß±ØÐë¾ß±¸µÄÎåÏîÄÜÁ¦
ijÐÅÏ¢¼¼Êõ¹«Ë¾ PythonÅàѵ
ij²©²ÊITϵͳ³§ÉÌ Ò×ÓÃÐÔ²âÊÔÓëÆÀ¹À
ÖйúÓÊ´¢ÒøÐÐ ²âÊÔ³ÉÊì¶ÈÄ£Ðͼ¯³É(TMMI)
ÖÐÎïÔº ²úÆ·¾­ÀíÓë²úÆ·¹ÜÀí