
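I. Introduction:
Hadoop HDFS clusters are very prone to unbalanced disk utilization between machines, for example when new data nodes are added to the cluster or when disk sizes differ from node to node. An unbalanced hdfs causes many problems: MR programs cannot take good advantage of local computation, network bandwidth between machines is used less efficiently, some machines' disks cannot be used, and so on.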
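II. The Problem:
For business reasons we had to build a new hadoop cluster and migrate the data from the old hadoop cluster to it. Not all datanode nodes could be brought online at once, and nodes could also go online or offline along the way, so disk usage easily became unbalanced between machines, as shown below:
The figure above shows that max is 94.18% while min is 0.37%, and more than 600 machines have reached 94%. At this point mapred jobs often report errors:
Logging in to the machine and checking the server's disks shows that they are all close to 100%, as follows: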

Because we set dfs.datanode.du.reserved in hdfs-site.xml, each disk keeps a certain amount of reserved space:
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>107374182400</value>
</property>
The meaning of this parameter is:
Reserved space in bytes per volume. Always leave this
much space free for non dfs use.
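The value is given in bytes, which is easy to misread. As a small sketch (the helper name `gib_to_bytes` is our own and not part of hadoop), the conversion from GiB can be checked like this:

```shell
# Convert a size in GiB to bytes, the unit dfs.datanode.du.reserved expects.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 100   # prints 107374182400
```

The 107374182400 in the configuration above is therefore 100 GiB per volume.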
Next we checked the datanode log, hoping to find a reliable clue:

This kind of error cannot be avoided through the namenode, because on failure it does not retry the write on another node. Our initial workaround was to shut down the datanode on that node, after which the mapreduce job ran to completion.

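Looking at the namenode page, we also saw many datanodes whose Remaining was approaching 0 B, which is exactly when the error above tends to appear.
To keep the error from recurring and to avoid unbalanced hdfs data, balancing the hadoop cluster had become unavoidable!
III. Solutions
1. balancer
The first thing that comes to mind is the balancer that ships with hadoop, so let us introduce it first.
Balancer.java describes the balancer as follows: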
The balancer is a tool that balances disk space usage
on an HDFS cluster when some datanodes become full or
when new empty nodes join the cluster.
The tool is deployed as an application program that
can be run by the cluster administrator on a live HDFS
cluster while applications adding and deleting files.
The figure below is the detailed explanation of the balancer command from the official site:

Since balancing is something we have had to do frequently of late, we developed our own page for viewing balancer status. The result looks like this:

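The figure above shows how the balancer is running on each cluster.
In one day balance can successfully move roughly 10-20T of data, which is hardly enough for a very large cluster.
We currently invoke balance with the following command:
start-balancer.sh -threshold 20 -policy blockpool -include -f /tmp/ip.txt
For this command we manually pick out the nodes with high disk usage and the nodes with low disk usage and put them in ip.txt, so that balance only works among the nodes in that file. An appropriate threshold value also has to be set, and because the cluster has multiple namespaces, the blockpool policy is needed.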
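The manual screening can be sketched as a small filter. This assumes a hypothetical input of one `host used_percent` pair per line (how that list is produced is left open); the function name and the 90/10 cut-offs are illustrative only:

```shell
# Print hosts whose DFS usage is at or above HIGH or at or below LOW.
# Input on stdin: one "host used_percent" pair per line (hypothetical format).
select_balance_hosts() {
  awk -v high="$1" -v low="$2" '
    NF >= 2 {
      used = $2 + 0   # force a numeric value; also tolerates "94.18%"
      if (used >= high + 0 || used <= low + 0) print $1
    }'
}

# Example with made-up addresses; prints 10.0.0.1 and 10.0.0.3.
printf '%s\n' '10.0.0.1 94.18' '10.0.0.2 55.20' '10.0.0.3 0.37' \
  | select_balance_hosts 90 10
```

Redirecting the output to /tmp/ip.txt gives a file that can be passed to -include -f.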
Bandwidth is another factor that limits balance. It is set in hdfs-site.xml:
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <value>10485760</value>
</property>
Changing it there requires a restart, however, so hadoop provides a command to adjust it dynamically:
hdfs dfsadmin -fs hdfs://ns1:8020 -setBalancerBandwidth 104857600
hdfs dfsadmin -fs hdfs://ns2:8020 -setBalancerBandwidth 104857600
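Both numbers are bytes per second: 10485760 is 10 MiB/s and 104857600 is 100 MiB/s. A minimal sketch for generating the per-nameservice commands from a MiB/s figure follows. The helper names are our own, port 8020 is taken from the commands above, and the commands are only printed, not run:

```shell
# Convert MiB/s to bytes/s, the unit -setBalancerBandwidth expects.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Print (not run) one dfsadmin command per nameservice for review.
print_bandwidth_cmds() {
  local bytes ns
  bytes="$(mib_to_bytes "$1")"
  shift
  for ns in "$@"; do
    echo "hdfs dfsadmin -fs hdfs://${ns}:8020 -setBalancerBandwidth ${bytes}"
  done
}

print_bandwidth_cmds 100 ns1 ns2
```

Printing first makes it easy to review the values before piping the output to a shell.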
2. Taking nodes offline and online:
In fact, forcibly decommissioning the nodes with high disk usage is the fastest and most effective solution.
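Decommissioning is normally driven by the exclude file that dfs.hosts.exclude points to, followed by a refresh on each namenode. A minimal sketch, assuming that setup; the helper name is hypothetical and the file path depends on your configuration:

```shell
# Append a host to an exclude file unless it is already listed.
# The file is whatever dfs.hosts.exclude points to on the namenode.
add_to_exclude() {
  local file="$1" host="$2"
  touch "$file"
  grep -qxF -- "$host" "$file" || echo "$host" >> "$file"
}

# Afterwards each namenode has to re-read the file, for example:
#   hdfs dfsadmin -fs hdfs://ns1:8020 -refreshNodes
#   hdfs dfsadmin -fs hdfs://ns2:8020 -refreshNodes
```

Because the helper skips hosts that are already listed, it is safe to run more than once for the same node.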
When taking a node offline, it can happen that one ns fails to decommission it cleanly. By then most of the node's data has already been moved off; a few blocks may simply be stuck there.
In that case the only option is to go through the Decommissioned nodes one at a time and stop the datanode process. If the namenode page then shows missing blocks, the affected file has to be fetched to local disk with get and uploaded again with put. For example:
hdfs dfs -get hdfs://ns1/test1/dt=2016-07-24/000816_0.lzo
hdfs dfs -put -f 000816_0.lzo hdfs://ns1/test1/dt=2016-07-24/000816_0.lzo
hdfs dfs -chown test1:test1 hdfs://ns1/test1/dt=2016-07-24/000816_0.lzo
The prerequisite is that the datanode on this node has been started again.
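3. Lowering and raising the replica count:
Lowering and raising replicas is a last resort, because if a datanode fails while the count is lowered, the chance of losing blocks goes up.
The command to lower the replica count:
hdfs dfs -setrep -R -w 2 hdfs://ns1/tmp/test.db
The command to raise it:
hdfs dfs -setrep -R -w 3 hdfs://ns1/tmp/test.db
The commands above lower the replica count of /tmp/test.db under ns1 to 2 and then raise it back to 3. The details of the hdfs dfs -setrep command are shown in the figure below: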

Dynamically lowering and raising the replica count in this way can solve the problem.
We also ran into a BUG while lowering and raising replicas:

We suspect that the replication module of the namenode got stuck. When this happens, kill the process, skip that block and run again!
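One way to avoid babysitting a stuck `-setrep -w` is to give each call a time limit and move on when it is exceeded. This is only a sketch of that idea, assuming GNU coreutils `timeout` is available; `run_or_skip`, the 2-hour limit and the second path are made up for illustration:

```shell
# Run a command with a time limit. Report and return non-zero if it fails
# or is killed by the limit, so the caller can move on to the next path.
run_or_skip() {
  local limit="$1"
  shift
  if timeout "$limit" "$@"; then
    return 0
  fi
  echo "skipped (failed or exceeded ${limit}s): $*" >&2
  return 1
}

# Intended use (not executed here):
#   for p in hdfs://ns1/tmp/test.db hdfs://ns1/tmp/other.db; do
#     run_or_skip 7200 hdfs dfs -setrep -R -w 2 "$p" &&
#     run_or_skip 7200 hdfs dfs -setrep -R -w 3 "$p"
#   done
```

A path that was skipped still needs to be checked by hand afterwards, since its replica count may have been left at the lowered value.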
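Summary: we chose to lower and raise replicas because this is not subject to the bandwidth limit. In addition, hadoop has to write the data again when the replica count is raised, and it prefers nodes with low disk usage for those writes, so data moves from the nodes with high disk usage to those with low usage.
4. distcp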
DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution,
error handling and recovery, and reporting. It expands
a list of files and directories into input to map tasks,
each of which will copy a partition of the files specified
in the source list. Its MapReduce pedigree has endowed
it with some quirks in both its semantics and execution.
The purpose of this document is to offer guidance for
common tasks and to elucidate its model.
Here is an example:


Here distcp uses mapreduce to copy the data under /tmp/output12 into the /tmp/zhulh directory. The original data under /tmp/output12 still exists, but the copied files are stored in newly written blocks.
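A minimal sketch of the copy described here, followed by an fsck call that lists where the new blocks ended up. The helper name is our own, the two paths are the ones from the text, and the commands are only printed so they can be reviewed before running:

```shell
# Print (not run) a distcp copy and a follow-up check of block placement.
print_distcp_check() {
  local src="$1" dst="$2"
  echo "hadoop distcp ${src} ${dst}"
  echo "hdfs fsck ${dst} -files -blocks -locations"
}

print_distcp_check /tmp/output12 /tmp/zhulh
```

The fsck output is what shows whether the new blocks landed on the nodes with low disk usage.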
At this point someone may ask: why not use the cp command?
The differences between the two are:
cp does not go through mapreduce, while distcp does, so distcp preferentially writes to machines that run a nodemanager;
cp is single-threaded, similar to scp, and is much slower than distcp.
5. Raise the dfs.datanode.du.reserved value
The official site says: Reserved space in bytes per volume. Always leave this much space free for non dfs use.
As mentioned above, dfs.datanode.du.reserved was set to 100G. The namenode believes the node still has space left, so it assigns blocks to it. Suppose the block is 128K but only 100K is actually left: the write then fails with the error above. If dfs.datanode.du.reserved is raised to 300G, the namenode sees that the node has no space left and stops writing data to it.
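The effect can be illustrated with a deliberately simplified model: remaining space is capacity minus DFS usage minus the reserved amount, never below zero. The real datanode accounting has more terms (non-DFS files, for instance), and the 4000 GiB volume at 94% usage is a made-up example:

```shell
# Simplified model of the space a volume still offers to HDFS, in GiB.
# Ignores non-DFS files and other details of the real datanode accounting.
hdfs_remaining_gib() {
  local capacity="$1" dfs_used="$2" reserved="$3"
  local remaining=$(( capacity - dfs_used - reserved ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}

hdfs_remaining_gib 4000 3760 100   # prints 140: the node still looks writable
hdfs_remaining_gib 4000 3760 300   # prints 0: the node looks full
```

Under this model, raising the reserved value from 100G to 300G is enough to make a 94% full volume report no remaining space.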
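6. Shut down the nodemanager process
When there are spare compute resources, consider shutting down the nodemanager on the nodes with high disk usage so that no YarnChild starts there. If computation runs on such a node, the first copy of the output is written locally, which adds to the burden on that node.
7. Delete old data
This is a last resort, because the deleted data may have to be restored later, which costs time again.
Also, a deletion only truly frees space if it skips the trash. The command that can be used is as follows: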
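A minimal sketch of such a deletion, assuming the standard -skipTrash option of hdfs dfs -rm. The helper name and the example path are hypothetical, and the command is only printed so that it can be reviewed before it is run:

```shell
# Print (not run) a delete command that bypasses the trash.
# Refuses an empty path or "/" as a basic safety check.
print_purge_cmd() {
  case "$1" in
    ""|"/")
      echo "refusing to build a delete command for '$1'" >&2
      return 1
      ;;
  esac
  echo "hdfs dfs -rm -r -skipTrash $1"
}

print_purge_cmd hdfs://ns1/tmp/old_data
```

Data removed this way cannot be recovered from the trash, so the printed command deserves a careful look first.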

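IV. Choosing a Solution
More than 600 machines had reached 94% disk usage, and all of them were in the same machine room, so taking nodes offline and online was not an option. The best approach was as follows:
1. Raise the dfs.datanode.du.reserved value;
2. Shut down the nodemanager process;
3. Lower and raise the replica count;
4. Start hadoop's built-in balance;
Check progress manually at regular intervals and restore the original settings once the desired effect is reached. Raising dfs.datanode.du.reserved requires a rolling restart of the datanodes, so the interval between restarts matters: if it is too short, blocks may go missing; if it is too long, the whole process takes a long time.
An analogy: during a holiday, when one toll gate has far too many vehicles, the officers close off that toll station and reopen it once most of the vehicles have passed. Our approach is similar. Once a host's dfs.datanode.du.reserved value exceeds the space currently left on its disks, the namenode stops assigning data to it, and lowering and raising replicas together with balance quickly moves data off that machine.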
V. Conclusion
This article introduced the methods that can be used when hadoop data becomes unbalanced, and the combination we used in our own situation!