DistCp

Overview

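DistCp (distributed copy) is a tool for large-scale copying within and between clusters. It uses Map/Reduce to distribute the work, to handle and recover from errors, and to produce reports. It turns a list of files and directories into the input of map tasks, each of which copies a portion of the files in the source list. Because it is built on Map/Reduce, the tool has some peculiarities in both its semantics and its execution. This document offers guidance for common DistCp tasks and explains its working model.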

Usage

Basic Usage

The most common use of DistCp is copying between clusters:

bash$ hadoop distcp hdfs://nn1:8020/foo/bar \
                    hdfs://nn2:8020/bar/foo

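This command expands the namespace under /foo/bar on the nn1 cluster (every file and directory name beneath it) into a temporary file, divides the work of copying their contents among a set of map tasks, and then has each TaskTracker copy its share from nn1 to nn2. Note that DistCp operates on absolute paths.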
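The "expand the namespace" step can be pictured with a small sketch that flattens a directory tree into the kind of (path, size) list that is then divided among maps. The tree is modeled as nested Python dicts, a hypothetical stand-in for the file system; the sketch records only files and is not DistCp's actual listing format.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical model of a namespace: a directory maps child names to subtrees,
# and a file is represented by its size in bytes.
Tree = Dict[str, Union["Tree", int]]


def expand_namespace(root: str, tree: Tree) -> List[Tuple[str, int]]:
    """Flatten a directory tree into (absolute path, size) records, one per file."""
    records: List[Tuple[str, int]] = []
    for name in sorted(tree):  # sorted only to make the output deterministic
        node = tree[name]
        path = f"{root.rstrip('/')}/{name}"
        if isinstance(node, dict):
            records.extend(expand_namespace(path, node))  # recurse into a subdirectory
        else:
            records.append((path, node))
    return records


listing = expand_namespace("/foo/bar", {"x": 10, "sub": {"y": 20, "z": 30}})
for path, size in listing:
    print(path, size)
```

Each record is one unit of work; the Appendix explains how such units are spread over maps.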

Multiple source directories can be specified on the command line:

bash$ hadoop distcp hdfs://nn1:8020/foo/a \
                    hdfs://nn1:8020/foo/b \
                    hdfs://nn2:8020/bar/foo

Alternatively, use the -f option to read the sources from a file:
bash$ hadoop distcp -f hdfs://nn1:8020/srclist \
                       hdfs://nn2:8020/bar/foo

where srclist contains:
    hdfs://nn1:8020/foo/a
    hdfs://nn1:8020/foo/b

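When copying from multiple sources, DistCp aborts the copy with an error message if two sources collide. Collisions at the destination are resolved according to the options specified. By default, files that already exist at the destination are skipped (that is, they are not replaced by the source file). The number of skipped files is reported at the end of each job, but this count may be inaccurate if a copy attempt failed for some of its files and a later attempt succeeded (see the Appendix).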
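The default rules for multiple sources can be modeled in a few lines. This is a behavioural sketch, not DistCp code: it assumes each source lands at dest/<basename> (the layout shown for the default case under Update and Overwrite), treats each source as a single entry, and uses paths without the hdfs://host:port prefix. The function name plan_default_copy is hypothetical.

```python
import posixpath
from typing import Dict, List, Set


def plan_default_copy(sources: List[str], dest: str, existing: Set[str]) -> Dict[str, List[str]]:
    """Hypothetical planner for the default mode: each source lands at dest/<basename>."""
    targets: Dict[str, str] = {}  # destination path -> source path
    for src in sources:
        target = posixpath.join(dest, posixpath.basename(src.rstrip("/")))
        if target in targets:
            # Two sources collide: abort instead of choosing one of them.
            raise ValueError(f"{targets[target]} and {src} both map to {target}")
        targets[target] = src
    # Default behaviour: targets that already exist are skipped, not replaced.
    return {
        "copy": [t for t in targets if t not in existing],
        "skipped": [t for t in targets if t in existing],
    }


plan = plan_default_copy(["/foo/a", "/foo/b"], "/bar/foo", existing={"/bar/foo/a"})
print(plan)  # {'copy': ['/bar/foo/b'], 'skipped': ['/bar/foo/a']}
```

Calling it with ["/foo/a", "/baz/a"] raises ValueError, mirroring the abort on colliding sources.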

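Every TaskTracker must be able to reach and communicate with both the source and destination file systems. For HDFS, the source and destination must run the same version of the protocol or use a backward-compatible protocol (see Copying Between Versions of HDFS).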

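After a copy, it is recommended to generate listings of the source and destination and cross-check them to verify that the copy really succeeded. Because DistCp relies on both Map/Reduce and the FileSystem API, a problem in any of the three, or in the interaction between them, can silently affect the copy. Some users have completed a copy successfully by running the same command again with -update as a second pass, but they should be familiar with the semantics of that option before doing so.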
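A minimal sketch of the cross-check, assuming each listing has already been reduced to a mapping from a path relative to the copy root to a file size in bytes (how the listings are produced on each cluster is left open). Comparing sizes only is a weaker check than comparing checksums; cross_check is a hypothetical helper.

```python
from typing import Dict, List


def cross_check(src: Dict[str, int], dst: Dict[str, int]) -> List[str]:
    """Report files missing at the destination, size mismatches, and unexpected extras."""
    problems: List[str] = []
    for path, size in sorted(src.items()):
        if path not in dst:
            problems.append(f"missing: {path}")
        elif dst[path] != size:
            problems.append(f"size mismatch: {path} ({size} != {dst[path]})")
    for path in sorted(set(dst) - set(src)):
        problems.append(f"extra: {path}")
    return problems


print(cross_check({"a": 1, "b": 2}, {"a": 1, "b": 3, "c": 4}))
# ['size mismatch: b (2 != 3)', 'extra: c']
```

An empty result means the two listings agree on every path and size.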

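Note that if another client is still writing to a source file, the copy is likely to fail. Attempting to overwrite a file on HDFS that is currently being written will also fail. If a source file is moved or deleted before it is copied, the copy fails with a FileNotFoundException.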

Options

Option Index

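Flag | Description | Notes
-p[rbugp] | Preserve r: replication number, b: block size, u: user, g: group, p: permission | Modification times are not preserved. Also, when -update is specified, status updates are not synchronized unless the file sizes also differ (e.g. the file is re-created).
-i | Ignore failures | As explained in the Appendix, this option keeps more accurate statistics about the copy than the default case. It also preserves logs from failed copies, which can be useful for debugging. Finally, a failing map does not cause the job to fail before all splits have been attempted.
-log <logdir> | Write logs to <logdir> | DistCp logs every attempt to copy each file and writes the log as map output. If a map fails, its log output is not retained when it is re-executed.
-m <num_maps> | Maximum number of simultaneous copies | Specifies the number of maps used to copy data. Note that more maps do not necessarily mean higher throughput.
-overwrite | Overwrite destination | If a map fails and -i is not specified, all files in its split are recopied, not only those that failed. As discussed below, this option also changes the semantics for generating destination paths, so use it carefully.
-update | Overwrite if source and destination sizes differ | As noted earlier, this is not a "sync" operation. The only criterion examined is whether the source and destination file sizes match; if they differ, the source file replaces the destination file. As discussed below, this option also changes the semantics for generating destination paths, so use it carefully.
-f <urilist_uri> | Use the list at <urilist_uri> as the source list | This is equivalent to listing every source on the command line. Each entry in the urilist_uri list must be a fully qualified URI.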
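As a quick reference for the -p[rbugp] row, the sketch below decodes a -p flag into the attributes listed in the table. It only interprets the flag string; it does not call DistCp. Because the table does not describe a bare -p, the sketch rejects it rather than guess. decode_preserve is a hypothetical helper.

```python
from typing import List

# Attribute letters accepted by -p, as listed in the option table.
PRESERVE_FLAGS = {
    "r": "replication number",
    "b": "block size",
    "u": "user",
    "g": "group",
    "p": "permission",
}


def decode_preserve(flag: str) -> List[str]:
    """Translate a -p[rbugp] flag into the attributes it asks DistCp to preserve."""
    if not flag.startswith("-p"):
        raise ValueError(f"not a -p flag: {flag}")
    letters = flag[2:]
    if not letters:
        # The table does not describe a bare -p, so this sketch refuses to guess.
        raise ValueError("bare -p is not handled by this sketch")
    unknown = sorted(set(letters) - set(PRESERVE_FLAGS))
    if unknown:
        raise ValueError(f"unknown attribute letters: {''.join(unknown)}")
    # dict.fromkeys drops repeated letters while keeping their first-seen order.
    return [PRESERVE_FLAGS[c] for c in dict.fromkeys(letters)]


print(decode_preserve("-pugp"))  # ['user', 'group', 'permission']
```

Whatever is selected, modification times are still not preserved, as the Notes column says.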

鏇存柊鍜岃鐩

Here are some examples of -update and -overwrite. Consider a copy from /foo/a and /foo/b to /bar/foo, where the source paths include:

    hdfs://nn1:8020/foo/a
    hdfs://nn1:8020/foo/a/aa
    hdfs://nn1:8020/foo/a/ab
    hdfs://nn1:8020/foo/b
    hdfs://nn1:8020/foo/b/ba
    hdfs://nn1:8020/foo/b/ab

濡傛灉娌¤缃-update-overwrite閫夐」锛 閭d箞涓や釜婧愰兘浼氭槧灏勫埌鐩爣绔殑 /bar/foo/ab銆 濡傛灉璁剧疆浜嗚繖涓や釜閫夐」锛屾瘡涓簮鐩綍鐨勫唴瀹归兘浼氬拰鐩爣鐩綍鐨 鍐呭 鍋氭瘮杈冦侱istCp纰板埌杩欑被鍐茬獊鐨勬儏鍐典細缁堟鎿嶄綔骞堕鍑恒

榛樿鎯呭喌涓嬶紝/bar/foo/a/bar/foo/b 鐩綍閮戒細琚垱寤猴紝鎵浠ュ苟涓嶄細鏈夊啿绐併

Now consider a valid copy using -update:
bash$ hadoop distcp -update hdfs://nn1:8020/foo/a \
                           hdfs://nn1:8020/foo/b \
                           hdfs://nn2:8020/bar

With source paths/sizes:

    hdfs://nn1:8020/foo/a
    hdfs://nn1:8020/foo/a/aa 32
    hdfs://nn1:8020/foo/a/ab 32
    hdfs://nn1:8020/foo/b
    hdfs://nn1:8020/foo/b/ba 64
    hdfs://nn1:8020/foo/b/bb 32

And destination paths/sizes:

    hdfs://nn2:8020/bar
    hdfs://nn2:8020/bar/aa 32
    hdfs://nn2:8020/bar/ba 32
    hdfs://nn2:8020/bar/bb 64

The copy produces:

    hdfs://nn2:8020/bar
    hdfs://nn2:8020/bar/aa 32
    hdfs://nn2:8020/bar/ab 32
    hdfs://nn2:8020/bar/ba 64
    hdfs://nn2:8020/bar/bb 32

鍙湁nn2鐨aa鏂囦欢娌℃湁琚鐩栥傚鏋滄寚瀹氫簡 -overwrite閫夐」锛屾墍鏈夋枃浠堕兘浼氳瑕嗙洊銆

Appendix

Map Sizing

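DistCp attempts to divide the content to be copied evenly, so that each map copies roughly the same number of bytes. However, because a file is the smallest unit of copying, increasing the number of simultaneous copiers (i.e. maps) does not always increase the number of files actually copied in parallel, nor the overall throughput.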
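The granularity limit is easy to see with a sketch that assigns whole files to maps. It uses a greedy largest-first heuristic, swapped in for illustration; it is not DistCp's own splitting code. Because a file is never split, one large file bounds the busiest map no matter how many maps are added.

```python
import heapq
from typing import List


def assign_to_maps(sizes: List[int], num_maps: int) -> List[List[int]]:
    """Give each file (largest first) to the currently least-loaded map."""
    maps: List[List[int]] = [[] for _ in range(num_maps)]
    heap = [(0, i) for i in range(num_maps)]  # (bytes assigned, map index); already a valid heap
    for size in sorted(sizes, reverse=True):
        load, i = heapq.heappop(heap)
        maps[i].append(size)
        heapq.heappush(heap, (load + size, i))
    return maps


def busiest_map_bytes(sizes: List[int], num_maps: int) -> int:
    return max(sum(m) for m in assign_to_maps(sizes, num_maps))


files = [1000, 10, 10, 10]
print(busiest_map_bytes(files, 2), busiest_map_bytes(files, 8))  # 1000 1000
```

Going from 2 to 8 maps leaves the busiest map unchanged, so the copy takes no less time.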

If -m is not specified, DistCp tries to schedule min (total_bytes / bytes.per.map, 20 * num_task_trackers) maps, where bytes.per.map defaults to 256MB.
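The default map count is just this formula. The sketch below evaluates it; integer division and the lower bound of one map are assumptions added here, not details stated in this guide.

```python
MB = 1024 * 1024
GB = 1024 * MB


def default_num_maps(total_bytes: int, num_task_trackers: int, bytes_per_map: int = 256 * MB) -> int:
    """min(total_bytes / bytes.per.map, 20 * num_task_trackers).

    Assumptions: floor division, and at least one map even for tiny copies.
    """
    return max(1, min(total_bytes // bytes_per_map, 20 * num_task_trackers))


print(default_num_maps(10 * GB, 5))    # 40: 10 GB / 256 MB = 40, below the cap of 100
print(default_num_maps(1024 * GB, 5))  # 100: 4096 by size, capped at 20 * 5
```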

寤鸿瀵逛簬闀挎椂闂磋繍琛屾垨瀹氭湡杩愯鐨勪綔涓氾紝鏍规嵁婧愬拰鐩爣闆嗙兢澶у皬銆佹嫹璐濇暟閲忓ぇ灏忎互鍙婂甫瀹借皟鏁磎ap鐨勬暟鐩

Copying Between Versions of HDFS

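To copy between different versions of Hadoop, use HftpFileSystem. Because it is a read-only file system, DistCp must run on the destination cluster (more precisely, on TaskTrackers that can write to the destination cluster). Sources are specified as hftp://<dfs.http.address>/<path> (dfs.http.address defaults to <namenode>:50070).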
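A small sketch of assembling such a command. It only builds strings; the host names reuse nn1 and nn2 from the earlier examples, the helper names are hypothetical, and launching the command (on the destination cluster, since hftp is read-only) is left to the reader.

```python
from typing import List, Optional

DEFAULT_HTTP_PORT = 50070  # port in the default dfs.http.address, per the text above


def hftp_source(namenode: str, path: str, http_address: Optional[str] = None) -> str:
    """Build hftp://<dfs.http.address>/<path>, falling back to <namenode>:50070."""
    address = http_address or f"{namenode}:{DEFAULT_HTTP_PORT}"
    return f"hftp://{address}/{path.lstrip('/')}"


def distcp_argv(sources: List[str], dest: str) -> List[str]:
    """Argument vector for a DistCp run on the destination cluster."""
    return ["hadoop", "distcp", *sources, dest]


print(" ".join(distcp_argv([hftp_source("nn1", "/foo/bar")], "hdfs://nn2:8020/bar/foo")))
# hadoop distcp hftp://nn1:50070/foo/bar hdfs://nn2:8020/bar/foo
```

Pass http_address explicitly when the source cluster overrides dfs.http.address.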

Map/Reduce and Other Side Effects

As mentioned earlier, a map that fails to copy one of its inputs causes several side effects.

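  • Unless -i is specified, the logs generated by the failed task attempt are replaced by those of the new attempt.
  • Unless -overwrite is specified, files successfully copied by a previous attempt of the map are marked as "skipped" when the map is re-executed.
  • If a map fails mapred.map.max.attempts times, the remaining map tasks are killed (unless -i is specified).
  • If mapred.speculative.execution is set to true and marked final, the result of the copy is undefined.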
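The second side effect is why the skipped-file count can be inaccurate. The simulation below is a hypothetical model, not DistCp code: a map's split is processed in order, a failure stops the attempt, and files already written stay at the destination, so the retry counts them as "skipped" even though none existed before the job.

```python
from typing import List, Set, Tuple


def run_attempt(files: List[str], dest: Set[str], fail_on: Set[str], overwrite: bool) -> Tuple[int, int, bool]:
    """Simulate one map attempt over its split; returns (copied, skipped, succeeded)."""
    copied = skipped = 0
    for f in files:
        if f in dest and not overwrite:
            skipped += 1  # already at the destination, so counted as "skipped"
            continue
        if f in fail_on:
            return copied, skipped, False  # attempt fails; earlier copies stay on the destination
        dest.add(f)
        copied += 1
    return copied, skipped, True


split = ["f1", "f2", "f3"]
dest: Set[str] = set()  # destination starts empty
first = run_attempt(split, dest, fail_on={"f3"}, overwrite=False)
retry = run_attempt(split, dest, fail_on=set(), overwrite=False)
print(first, retry)  # (2, 0, False) (1, 2, True)
```

With overwrite=True the retry recopies the whole split and reports nothing as skipped, matching the -overwrite note in the option table.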