DistCp
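Overview
DistCp (distributed copy) is a tool for large inter-cluster and intra-cluster copying. It uses Map/Reduce for file distribution, error handling and recovery, and reporting. It takes a list of files and directories as the input to map tasks, each of which copies a portion of the files in the source list. Because it is built on Map/Reduce, the tool has some quirks in both its semantics and its execution. This document offers guidance for common DistCp operations and describes its working model.
Usage
Basic usage
The most common use of DistCp is copying between clusters: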
bash$ hadoop distcp hdfs://nn1:8020/foo/bar \
hdfs://nn2:8020/bar/foo
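This command expands the names of all files and directories under /foo/bar on the nn1 cluster into a temporary file, partitions the work of copying their contents among multiple map tasks, and then has each TaskTracker carry out its part of the copy from nn1 to nn2. Note that DistCp operates on absolute paths.
Multiple source directories can be specified on the command line: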
bash$ hadoop distcp hdfs://nn1:8020/foo/a \
hdfs://nn1:8020/foo/b \
hdfs://nn2:8020/bar/foo
Alternatively, use the -f option to read multiple sources from a file:
bash$ hadoop distcp -f hdfs://nn1:8020/srclist \
hdfs://nn2:8020/bar/foo
Here srclist contains:
hdfs://nn1:8020/foo/a
hdfs://nn1:8020/foo/b
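When copying from multiple sources, DistCp aborts the copy with an error message if two sources collide; collisions at the destination are resolved according to the options specified. By default, files that already exist at the destination are skipped (that is, they are not replaced by the source file). The number of skipped files is reported at the end of each job, but the count may be inaccurate if some copy operations failed and then succeeded on a later attempt (see the Appendix).
Every TaskTracker must be able to reach and communicate with both the source and destination file systems. For HDFS, the source and destination must run the same version of the protocol or use a backward-compatible protocol (see Copying between different versions of HDFS).
After a copy, it is recommended to generate listings of the source and destination files and cross-check them to confirm that the copy really succeeded. Because DistCp uses both Map/Reduce and the FileSystem API, a problem in any of the three, or between them, can affect the copy. Some users have completed a DistCp copy successfully by running the command again with -update, but they should be familiar with the semantics of that option before doing so.
Note that if another client is still writing to a source file, the copy will very likely fail. Attempting to overwrite a file that is being written on HDFS also fails. If a source file is moved or deleted before it is copied, the copy fails with a FileNotFoundException.
Options
Option index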
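The cross-check recommended above can be sketched as follows. This is a minimal illustration, not part of DistCp: it assumes each listing has already been collected into a dictionary from relative path to file size, and the function name cross_check and the sample listings are hypothetical. Matching sizes do not prove that file contents are identical.

```python
def cross_check(source_listing, dest_listing):
    """Compare two listings given as {relative_path: size_in_bytes}."""
    missing = sorted(p for p in source_listing if p not in dest_listing)
    extra = sorted(p for p in dest_listing if p not in source_listing)
    size_mismatch = sorted(
        p for p in source_listing
        if p in dest_listing and source_listing[p] != dest_listing[p]
    )
    return {"missing": missing, "extra": extra, "size_mismatch": size_mismatch}


# Hypothetical listings, keyed by path relative to the copy root.
source = {"a/aa": 32, "a/ab": 32, "b/ba": 64}
dest = {"a/aa": 32, "a/ab": 16, "old/file": 8}

report = cross_check(source, dest)
print(report)
```

An empty report in all three categories is a necessary condition for a successful copy, not a sufficient one.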
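Flag | Description | Notes |
---|---|---|
-p[rbugp] | Preserve r: replication number b: block size u: user g: group p: permission | Modification times are not preserved. Also, when -update is specified, status updates are not synchronized unless the file sizes also differ (for example, when the file is re-created). |
-i | Ignore failures | As mentioned in the Appendix, this option provides more accurate statistics about the copy than the default case. It also preserves the logs of failed copies, which can be used for debugging. Finally, a failing map does not cause the job to fail before all splits have been attempted. |
-log <logdir> | Write logs to <logdir> | DistCp logs every attempt to copy each file and emits the log as map output. If a map fails, its log is not retained when the map is re-executed. |
-m <num_maps> | Maximum number of simultaneous copies | Specifies the number of maps used to copy data. Note that more maps do not necessarily improve throughput. |
-overwrite | Overwrite destination | If a map fails and -i is not specified, all files in the split are recopied, not only those that failed. As discussed in the following section, this option also changes the semantics for generating destination paths, so use it carefully. |
-update | Overwrite if the source and destination sizes differ | As noted earlier, this is not a "sync" operation. The only criterion examined is whether the source and destination file sizes are equal; if they differ, the source file replaces the destination file. As discussed in the following section, this option also changes the semantics for generating destination paths, so use it carefully. |
-f <urilist_uri> | Use the list at <urilist_uri> as the source list | This is equivalent to listing every source on the command line. The urilist_uri list should be a fully qualified URI. |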
Update and overwrite
Here are some examples of -update and -overwrite. Consider a copy from /foo/a and /foo/b to /bar/foo, where the sources contain:
hdfs://nn1:8020/foo/a
hdfs://nn1:8020/foo/a/aa
hdfs://nn1:8020/foo/a/ab
hdfs://nn1:8020/foo/b
hdfs://nn1:8020/foo/b/ba
hdfs://nn1:8020/foo/b/ab
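If either -update or -overwrite is set, both sources map an entry to /bar/foo/ab at the destination, because with these options the contents of each source directory are compared with the contents of the destination directory. DistCp aborts rather than permit this conflict.
In the default case, both /bar/foo/a and /bar/foo/b are created, so there is no collision.
Now consider a valid operation using -update: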
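The difference between the two path-mapping rules can be modelled in a few lines. This is only a sketch of the behaviour described above, not DistCp's implementation: the function names are hypothetical, and each source directory is simplified to a flat list of file names.

```python
import posixpath


def destination_paths(sources, dest, use_update_or_overwrite):
    """Map each destination path to the source files that would land there.

    sources: {source_dir: [file names directly under it]} (one level only).
    """
    targets = {}
    for src_dir, names in sources.items():
        if use_update_or_overwrite:
            # Contents of the source directory go directly under dest.
            base = dest
        else:
            # Default: the source directory itself is created under dest.
            base = posixpath.join(dest, posixpath.basename(src_dir))
        for name in names:
            target = posixpath.join(base, name)
            targets.setdefault(target, []).append(posixpath.join(src_dir, name))
    return targets


def conflicts(targets):
    """Destination paths claimed by more than one source file."""
    return {t: srcs for t, srcs in targets.items() if len(srcs) > 1}


sources = {"/foo/a": ["aa", "ab"], "/foo/b": ["ba", "ab"]}

default_targets = destination_paths(sources, "/bar/foo", False)
flat_targets = destination_paths(sources, "/bar/foo", True)

print(sorted(default_targets))
print(conflicts(flat_targets))
```

In this model the default mapping yields four distinct destination paths, while the -update/-overwrite mapping sends both ab files to /bar/foo/ab, which is the conflict that makes DistCp abort.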
distcp -update hdfs://nn1:8020/foo/a \
hdfs://nn1:8020/foo/b \
hdfs://nn2:8020/bar
With sources/sizes:
hdfs://nn1:8020/foo/a
hdfs://nn1:8020/foo/a/aa 32
hdfs://nn1:8020/foo/a/ab 32
hdfs://nn1:8020/foo/b
hdfs://nn1:8020/foo/b/ba 64
hdfs://nn1:8020/foo/b/bb 32
And destination/sizes:
hdfs://nn2:8020/bar
hdfs://nn2:8020/bar/aa 32
hdfs://nn2:8020/bar/ba 32
hdfs://nn2:8020/bar/bb 64
The result is:
hdfs://nn2:8020/bar
hdfs://nn2:8020/bar/aa 32
hdfs://nn2:8020/bar/ab 32
hdfs://nn2:8020/bar/ba 64
hdfs://nn2:8020/bar/bb 32
Only the aa file on nn2 is not overwritten. If -overwrite were specified, all files would be overwritten.
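The example above can be reproduced with a small model of the size-only rule. This is a sketch under stated assumptions rather than DistCp's code: plan_copy is a hypothetical helper, and file names are given relative to the destination directory /bar.

```python
def plan_copy(source_sizes, dest_sizes, mode):
    """Return (resulting destination sizes, names copied) for a flat copy.

    mode is "update" (copy when missing or sizes differ) or
    "overwrite" (copy everything).
    """
    if mode not in ("update", "overwrite"):
        raise ValueError("mode must be 'update' or 'overwrite'")
    result = dict(dest_sizes)
    copied = []
    for name in sorted(source_sizes):
        size = source_sizes[name]
        if mode == "overwrite" or dest_sizes.get(name) != size:
            result[name] = size
            copied.append(name)
    return result, copied


# Sizes taken from the example: /foo/a and /foo/b copied into /bar.
source_sizes = {"aa": 32, "ab": 32, "ba": 64, "bb": 32}
dest_sizes = {"aa": 32, "ba": 32, "bb": 64}

updated, copied_by_update = plan_copy(source_sizes, dest_sizes, "update")
overwritten, copied_by_overwrite = plan_copy(source_sizes, dest_sizes, "overwrite")

print(updated, copied_by_update)
print(copied_by_overwrite)
```

Because size is the only criterion, a destination file with the same size as its source but different contents is left untouched under -update.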
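Appendix
Number of maps
DistCp tries to divide the content to be copied evenly, so that each map copies roughly the same amount of data. However, because files are the smallest unit of copying, configuring more simultaneous copiers (that is, maps) does not necessarily increase the number of copies actually running at the same time, nor the overall throughput.
If -m is not specified, DistCp tries to schedule the job with min(total_bytes / bytes.per.map, 20 * num_task_trackers) maps, where bytes.per.map defaults to 256MB.
For long-running or regularly run jobs, it is recommended to tune the number of maps to the size of the source and destination clusters, the size of the copy, and the available bandwidth.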
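As a worked illustration of the default sizing rule, the following sketch evaluates the formula for a few inputs. It assumes integer division and a floor of one map for very small copies; both are assumptions of this example, and the function name is hypothetical.

```python
BYTES_PER_MAP = 256 * 1024 * 1024  # default bytes.per.map stated in the text
MAPS_PER_TRACKER = 20              # the factor 20 in the formula


def default_map_count(total_bytes, num_task_trackers, bytes_per_map=BYTES_PER_MAP):
    """min(total_bytes / bytes.per.map, 20 * num_task_trackers)."""
    by_size = total_bytes // bytes_per_map  # assumption: integer division
    by_cluster = MAPS_PER_TRACKER * num_task_trackers
    # Assumption: at least one map is scheduled even for tiny copies.
    return max(1, min(by_size, by_cluster))


GIB = 1024 ** 3

print(default_map_count(10 * GIB, 4))     # limited by the data size
print(default_map_count(1024 * GIB, 10))  # limited by the cluster size
print(default_map_count(1024 ** 2, 3))    # tiny copy
```

With 10 GiB and 4 TaskTrackers the data size is the limit (40 maps); with 1 TiB and 10 TaskTrackers the cluster size is the limit (200 maps).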
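Copying between different versions of HDFS
To copy between different versions of Hadoop, use HftpFileSystem. It is a read-only file system, so DistCp must run on the destination cluster (more precisely, on TaskTrackers that can write to the destination cluster). Each source is specified as hftp://<dfs.http.address>/<path> (by default, dfs.http.address is <namenode>:50070).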
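The source URI format above can be assembled programmatically. The sketch below only builds strings and an argument list; it does not run DistCp. The helper names are hypothetical, and the host names nn1 and nn2 and the paths are borrowed from the earlier examples as placeholders.

```python
def hftp_source(namenode_host, path, http_port=50070):
    """Build hftp://<dfs.http.address>/<path> using the default port 50070."""
    if not path.startswith("/"):
        raise ValueError("DistCp expects absolute paths")
    return "hftp://{}:{}{}".format(namenode_host, http_port, path)


def distcp_command(sources, destination):
    """Argument list for a distcp invocation, to be run on the destination cluster."""
    return ["hadoop", "distcp"] + list(sources) + [destination]


src = hftp_source("nn1", "/foo/bar")
cmd = distcp_command([src], "hdfs://nn2:8020/bar/foo")

print(src)
print(" ".join(cmd))
```

If the source cluster sets dfs.http.address to a non-default port, pass it as http_port.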
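Map/Reduce and side effects
As mentioned earlier, when a map fails to copy one of its input files, there are several side effects.
- Unless -i is specified, the logs generated by that task attempt are replaced by those of the new attempt.
- Unless -overwrite is specified, files that a previous map attempt copied successfully are marked as "skipped" when the copy is executed again.
- If a map fails mapred.map.max.attempts times, the remaining map tasks are killed (unless -i is set).
- If mapred.speculative.execution is set to final and true, the result of the copy is undefined.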