Solutions for Unbalanced Disk Usage Across DataNodes in a Hadoop Cluster
 
×÷ÕߣºÖìÁÖº£ À´Ô´£ºÊý¾ÝÔÓ»õÆÌ·¢²¼ÓÚ£º 2016-10-24
 

I. Introduction

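HDFS clusters are very prone to uneven disk utilization across machines, for example when new DataNodes are added to the cluster or when nodes have disks of different sizes. An unbalanced HDFS causes many problems: MapReduce jobs cannot take full advantage of data locality, network bandwidth between machines is used less efficiently, disk space on some machines goes unused, and so on.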

II. Problem

For business reasons we had to build a new Hadoop cluster and migrate the data from the old cluster into it. Not all DataNodes could be brought online at once, and nodes might be added or taken offline along the way, which easily leaves disk usage unbalanced between machines, as shown below:

ÉÏͼÖпÉÒÔ¿´³ömaxÊÇ94.18%£¬¶øminÊÇ0.37%£¬ÆäÖÐÓÐ600¶ą̀ÊÇ´ïµ½94%µÄ£¬Õâ¸öʱºòÔÚÅÜmapredµÄʱºòÍùÍù»á±¨´íÎó£º

Logging in to one of these machines and checking its disks shows that they are almost 100% full:

Because we set dfs.datanode.du.reserved in hdfs-site.xml, each disk keeps a certain amount of space in reserve:

<property>  
<name>dfs.datanode.du.reserved</name>
<value>107374182400</value>
</property>

What this parameter means:

Reserved space in bytes per volume. Always leave this much space free for non dfs use.
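Since the value is in bytes and applies to each volume (each directory listed in dfs.datanode.data.dir), the arithmetic is easy to get wrong. A minimal bash sketch of it follows; the function names and the 12-volume example are illustrative assumptions, not part of the original setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert GiB to bytes, the unit dfs.datanode.du.reserved uses.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical helper: the reservation applies per volume, so a node with
# several data directories holds back (volumes x reserved_bytes) in total.
node_reserved_bytes() {
  local reserved_bytes=$1 volumes=$2
  echo $(( reserved_bytes * volumes ))
}
```

For example, `gib_to_bytes 100` prints 107374182400, the value configured above, and on a hypothetical node with 12 data directories `node_reserved_bytes 107374182400 12` prints 1288490188800, i.e. 1200 GiB held back across the node.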

Next we checked the DataNode log, hoping to find a reliable clue:

This error cannot be avoided through the NameNode, because after the failure it does not retry the write on another node. Our first workaround was to shut down the DataNode on that node, after which the MapReduce job ran to completion.

ÔÙÕ߲鿴namenodeµÄÒ³Ãæ£¬¿´µ½ÓкöàdatanodeµÄ½ÚµãµÄRemaining¿ìÒªÇ÷ÓÚ0BÁË£¬Õâ¸öʱºò¾ÍºÜÈÝÒ׳öÏÖÉÏÃæµÄ±¨´í¡£

To keep this error from recurring and to stop HDFS data from becoming unbalanced, balancing the Hadoop cluster had become unavoidable!

III. Solutions

1. Balancer

The first thing most people think of is the balancer that ships with Hadoop, so let's start with that!

This is how Balancer.java describes the balancer:

The balancer is a tool that balances disk space usage on an HDFS cluster when some datanodes become full or when new empty nodes join the cluster.

The tool is deployed as an application program that can be run by the cluster administrator on a live HDFS cluster while applications adding and deleting files.

The figure below, from the official documentation, explains the balancer command in detail:

Since balancing has recently become a frequent operation for us, we built our own page for monitoring the balancer; it looks like this:

The figure above shows how the balancer is progressing on each cluster.

The balancer can successfully move roughly 10-20 TB of data per day, which falls well short of what a very large cluster needs.

ĿǰÎÒÃǵ÷ÓÃbalance»áʹÓÃÈçÏÂÃüÁ

start-balancer.sh -threshold 20 -policy blockpool -include -f /tmp/ip.txt 

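Here the nodes with high disk usage and those with low disk usage are picked out by hand and listed in ip.txt, so the balancer only moves data among the nodes in that file. A suitable threshold value also has to be chosen, and because the cluster has multiple namespaces, the blockpool policy must be selected.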
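The manual selection can be approximated with a small script. The sketch below assumes a hypothetical input file with one `host usage_percent` pair per line (for instance, copied by hand from `hdfs dfsadmin -report`) and prints the hosts whose usage differs from the average by more than the threshold, which mirrors how the balancer classifies over- and under-utilized DataNodes. Unlike the balancer, it uses an unweighted mean of the per-node percentages rather than a capacity-weighted cluster average.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Input: one "host usage_percent" pair per line,
# percentages as bare numbers (e.g. "dn1 94.18").
# Prints hosts above (average + threshold) or below (average - threshold).
select_unbalanced() {
  local usage_file=$1 threshold=$2
  awk -v t="$threshold" '
    { host[NR] = $1; pct[NR] = $2 + 0; sum += $2 }
    END {
      if (NR == 0) exit
      avg = sum / NR   # unweighted mean (simplification)
      for (i = 1; i <= NR; i++)
        if (pct[i] > avg + t || pct[i] < avg - t) print host[i]
    }' "$usage_file"
}
```

Running `select_unbalanced usage.txt 20 > /tmp/ip.txt` would then produce an include file for the start-balancer.sh command above.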

Bandwidth is another factor that limits the balancer; it is configured in hdfs-site.xml:

<property>  
<name>dfs.datanode.balance.bandwidthPerSec</name>
<value>10485760</value>
</property>

Changing this setting requires a restart, but Hadoop provides a command to adjust it dynamically:

hdfs dfsadmin -fs hdfs://ns1:8020 -setBalancerBandwidth 104857600 
hdfs dfsadmin -fs hdfs://ns2:8020 -setBalancerBandwidth 104857600
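Both values are plain byte counts: 10485760 is 10 MiB/s and 104857600 is 100 MiB/s. The sketch below does the conversion and applies one value to every namespace; the helper names and the DRY_RUN switch are assumptions for illustration, while the namespaces and port 8020 come from the commands above. A value set with -setBalancerBandwidth is not persistent: a restarted DataNode falls back to dfs.datanode.balance.bandwidthPerSec.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# 1 MiB = 1024 * 1024 bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: apply one balancer bandwidth (in MiB/s) to every
# namespace given as an argument, e.g. ns1 ns2.
set_balancer_bandwidth() {
  local mib=$1 ns bytes
  shift
  bytes=$(mib_to_bytes "$mib")
  for ns in "$@"; do
    run hdfs dfsadmin -fs "hdfs://${ns}:8020" -setBalancerBandwidth "$bytes"
  done
}
```

`DRY_RUN=1 set_balancer_bandwidth 100 ns1 ns2` prints the two commands above; without DRY_RUN it runs them.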

2. Decommissioning and recommissioning nodes

In fact, forcibly decommissioning the nodes with high disk usage is the fastest and most effective approach.

During decommissioning, some namespaces may fail to finish taking the node offline. By that point most of the node's data has already been moved off, but a few blocks may be stuck there.

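The only option then is to go node by node, stopping the DataNode process on each node that is already Decommissioned. If the NameNode web UI then reports missing blocks, first get the affected file to local disk and then put it back. For example: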

hdfs dfs -get hdfs://ns1/test1/dt=2016-07-24/000816_0.lzo 

hdfs dfs -put -f 000816_0.lzo hdfs://ns1/test1/dt=2016-07-24/000816_0.lzo

hdfs dfs -chown test1:test1 hdfs://ns1/test1/dt=2016-07-24/000816_0.lzo

As a prerequisite, the DataNode on that node must be restarted first.
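The get, put -f, and chown steps can be wrapped so they are applied identically to every affected file. This is a sketch with a hypothetical function name; it assumes the file is still readable (that is, the DataNode on that node has been restarted) and uses a DRY_RUN switch so the commands can be reviewed before they run.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: copy an HDFS file to a local work directory,
# overwrite it in HDFS with put -f, restore its owner, then clean up.
# Each step runs only if the previous one succeeded.
rewrite_hdfs_file() {
  local hdfs_path=$1 owner=$2 workdir=${3:-.}
  local local_file="${workdir}/$(basename "$hdfs_path")"
  run hdfs dfs -get "$hdfs_path" "$local_file" &&
    run hdfs dfs -put -f "$local_file" "$hdfs_path" &&
    run hdfs dfs -chown "$owner" "$hdfs_path" &&
    run rm -f "$local_file"
}
```

`DRY_RUN=1 rewrite_hdfs_file hdfs://ns1/test1/dt=2016-07-24/000816_0.lzo test1:test1 /tmp` prints the get, put, chown, and cleanup commands without executing them.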

3. Lowering and raising the replication factor

Changing the replication factor is a last resort: while replicas are reduced, any DataNode failure increases the chance of losing blocks.

The command to lower the replication factor:

hdfs dfs -setrep -R -w 2 hdfs://ns1/tmp/test.db 

The command to raise it again:

hdfs dfs -setrep -R -w 3 hdfs://ns1/tmp/test.db 

ÉÏÃæµÄÃüÁîÊǽ«ns1ϵÄ/tmp/test.db¸±±¾Êý½µÖÁ2¸ö£¬È»ºóÓÖ½«ËüÉýÖÁ3¸ö¸±±¾¡£¾ßÌåµÄhdfs dfs -setrepÃüÁîÈçÏÂͼ£º

Dynamically lowering and raising the replication factor in this way resolves the imbalance.
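When several directories need the same treatment, the lower-then-raise cycle can be looped. A minimal sketch, assuming a hypothetical function name and a DRY_RUN switch; the setrep flags are exactly those used above. Paths are handled one at a time because -w blocks until the target replication is reached.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: for each path, lower replication to LOW, wait,
# then raise it back to HIGH and wait. Stops at the first failure.
cycle_replication() {
  local low=$1 high=$2 path
  shift 2
  for path in "$@"; do
    run hdfs dfs -setrep -R -w "$low" "$path" || return 1
    run hdfs dfs -setrep -R -w "$high" "$path" || return 1
  done
}
```

`cycle_replication 2 3 hdfs://ns1/tmp/test.db` reproduces the two commands above.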

We also ran into a bug while changing the replication factor:

We suspect that the NameNode's replication module hangs. When this happens, kill the process, skip that block, and run again!
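If the hang shows up while the client is waiting on -w, one way to automate "kill it and move on" is GNU coreutils timeout, which kills a command after a time limit and exits with status 124. The wrapper below is a sketch under that assumption; the function name and the one-hour limit in the example are hypothetical. Killing the client only stops the wait, since the new replication factor has already been sent to the NameNode.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command with a time limit (GNU timeout).
# On timeout (status 124) report the skip on stderr and return 124,
# so a calling loop can continue with the next path.
run_or_skip() {
  local secs=$1 status=0
  shift
  timeout "$secs" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "skipped after ${secs}s: $*" >&2
  fi
  return "$status"
}
```

A batch could then run `for p in hdfs://ns1/tmp/test.db; do run_or_skip 3600 hdfs dfs -setrep -R -w 2 "$p"; done`, moving past any path that hangs.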

×ܽ᣺֮ËùÒÔÑ¡ÔñʹÓÃÉý½µ¸±±¾ÊÇÒòΪËü²»ÊÜ´ø¿íµÄ¿ØÖÆ£¬ÁíÍâÔÚÉý½µ¸±±¾µÄʱºòhadoopÊÇÐèÒªÖØÐÂдÊýµÄ£¬Õâ¸öʱºòËü»áÓÅÏÈÍù´ÅÅ̵ÍдÊý¾Ý£¬ÕâÑù¾ÍÄܽ«´ÅÅ̸ߵÄÊý¾ÝÇ¨ÒÆÖÁ´ÅÅ̵͵ġ£

4. DistCp

DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list. Its MapReduce pedigree has endowed it with some quirks in both its semantics and execution. The purpose of this document is to offer guidance for common tasks and to elucidate its model.

Here is an example:

DistCp runs a MapReduce job to copy the data in /tmp/output12 to /tmp/zhulh. The data originally under /tmp/output12 is still there, but the copy consists of newly written blocks.
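The example can be sketched as follows: run DistCp, then use hdfs fsck to list the blocks and their DataNode locations for both directories so the placement can be compared. The function name and DRY_RUN switch are assumptions; the paths come from the example above.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: copy SRC to DST with DistCp (a MapReduce job),
# then print block locations of both trees for comparison.
distcp_and_compare() {
  local src=$1 dst=$2
  run hadoop distcp "$src" "$dst" || return 1
  run hdfs fsck "$src" -files -blocks -locations
  run hdfs fsck "$dst" -files -blocks -locations
}
```

`distcp_and_compare /tmp/output12 /tmp/zhulh` copies the data and then shows, per block, which DataNodes hold the original and which hold the copy.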

Some may ask: why not just use the cp command?

The differences are:

cp does not go through MapReduce; DistCp does, so it preferentially writes to machines that run a NodeManager;

CPÊǵ¥Ï̵߳ģ¬ÀàËÆscpµÄģʽ£¬ÔÚÖ´ÐÐËÙ¶ÈÉϱÈDISTCPÒªÂýºÜ¶à¡£

5. Raising dfs.datanode.du.reserved

The official documentation describes it as: Reserved space in bytes per volume. Always leave this much space free for non dfs use.

As mentioned above, dfs.datanode.du.reserved was set to 100 GB. Because the NameNode believes the node still has free space, it assigns blocks to it. Suppose a block is 128K but only 100K of space actually remains: the write fails with the error above. If dfs.datanode.du.reserved is raised to 300 GB, the NameNode learns that the node has no space left and stops writing data to it.
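The effect of raising the reservation can be illustrated with a simplified model, which is an assumption rather than HDFS's exact accounting: for one volume, the space offered for new blocks is roughly the disk size minus what is already used minus dfs.datanode.du.reserved, never below zero. The function name and disk sizes are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model (assumption): remaining space offered to HDFS on one
# volume = disk - used - reserved, floored at 0. All values in GiB.
dfs_remaining_gib() {
  local disk=$1 used=$2 reserved=$3
  local remaining=$(( disk - used - reserved ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}
```

With a hypothetical 4000 GiB volume at 94% usage (3760 GiB), `dfs_remaining_gib 4000 3760 100` prints 140, so the node still looks writable, while `dfs_remaining_gib 4000 3760 300` prints 0 and the node stops receiving new blocks. In hdfs-site.xml, 300 GiB is 322122547200 bytes.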

6. Stopping the NodeManager process

ÔÚÏÖÓмÆËã×ÊÔ´¶àÓàµÄÇé¿öÏ£¬¿ÉÒÔ¿¼ÂǹرոߴÅÅ̽ڵãµÄnodemanager£¬±ÜÃâÔڸýڵãÆðYarnChild£¬ÒòΪÈç¹ûÔڸýڵãÉϽøÐмÆËãµÄ»°£¬Êý¾Ý´æ´¢Ê×ÏÈ»áÍù±¾µØÐ´Ò»·Ý£¬ÕâÑù¸ü¼Ó¼ÓÖØÁ˱¾µØ½ÚµãµÄ¸ºµ£¡£

7. Deleting old data

This is a last resort, because deleted data may have to be restored later, which costs extra time.

When deleting, you must bypass the trash for the space to actually be freed. You can use the following command:
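One standard way to delete without going through the trash is the `-skipTrash` option of `hdfs dfs -rm`. The wrapper below is a minimal sketch under that assumption, not necessarily the exact command the author used; the function name and the example path are hypothetical.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: permanently delete HDFS paths, bypassing the
# trash so the space is freed at once. Refuses empty paths and "/".
purge_hdfs_paths() {
  local path
  for path in "$@"; do
    if [ -z "$path" ] || [ "$path" = "/" ]; then
      echo "refusing to delete '${path}'" >&2
      return 1
    fi
    run hdfs dfs -rm -r -skipTrash "$path" || return 1
  done
}
```

Running `DRY_RUN=1 purge_hdfs_paths /tmp/old_data` first shows exactly what would be removed.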

IV. Choosing a solution

As many as 600 machines had reached 94% disk usage, and these machines were all in the same data center, so decommissioning nodes was not an option. The best approach was:

1. Raise dfs.datanode.du.reserved;

2. Stop the NodeManager process;

3. Lower and raise the replication factor;

4. Run Hadoop's built-in balancer.

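Check on the cluster manually at regular intervals, and restore the original settings once the desired effect is reached. Raising dfs.datanode.du.reserved requires a rolling restart of the DataNodes, so the interval between restarts matters: too short and blocks may go missing, too long and the rollout takes a lot of time.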
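A rolling restart with a fixed pause between hosts can be sketched as below. It assumes passwordless ssh, a host file with one name per line, and the Hadoop 2.x hadoop-daemon.sh script on the remote PATH; the function name is hypothetical, and the interval is the trade-off described above, to be chosen per cluster.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: restart DataNodes one host at a time so the new
# dfs.datanode.du.reserved takes effect, sleeping INTERVAL seconds
# between hosts. ssh -n keeps ssh from consuming the host list.
rolling_restart_datanodes() {
  local interval=$1 hosts_file=$2 host
  while read -r host; do
    [ -z "$host" ] && continue
    run ssh -n "$host" hadoop-daemon.sh stop datanode
    run ssh -n "$host" hadoop-daemon.sh start datanode
    run sleep "$interval"
  done < "$hosts_file"
}
```

`DRY_RUN=1 rolling_restart_datanodes 600 /tmp/ip.txt` prints the full restart sequence with a 10-minute pause per host before anything is touched.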

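It is a bit like a highway toll station over a holiday: when one exit is swamped with cars, the authorities close it and reopen it once most of the traffic has passed. Our solution works in a similar way. Once a host's dfs.datanode.du.reserved exceeds its remaining free disk space, the NameNode stops assigning data to it, and changing the replication factor together with the balancer quickly moves that host's data elsewhere.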

V. Conclusion

This article has covered the methods available when Hadoop data becomes unbalanced, and the combination we used in our own situation!

   