Hadoop Environment Setup


Hadoop has two main components: the distributed file system HDFS and the MapReduce computing model. The following walks through setting up a Hadoop environment.

Preparation Before Deploying Hadoop

[collapse title="Click to expand"]
1 Hadoop depends on Java and SSH
Java 1.5.x or later must be installed.
ssh must be installed and sshd must be kept running, so that the Hadoop scripts can manage the remote Hadoop daemons.
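
A quick way to confirm both prerequisites on a node (a minimal sketch; the service name for sshd may vary by distribution):
java -version          // must report 1.5 or later
service sshd status    // sshd must be running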

2 Create a common Hadoop account
All nodes should use the same username, which can be added with:
useradd hadoop
passwd hadoop
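
If root SSH access to the other nodes is available, the same account can be created everywhere in one loop (a sketch; PASSWORD is a placeholder, and the hostnames assume the /etc/hosts entries from step 3):
for node in datanode1 datanode2 datanode3; do
    ssh root@$node "useradd hadoop; echo 'hadoop:PASSWORD' | chpasswd"
done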

3 Configure hostnames in /etc/hosts
tail -n 3 /etc/hosts
192.168.57.75 namenode
192.168.57.76 datanode1
192.168.57.78 datanode2
192.168.57.79 datanode3

4 All of the above settings must be identical on every node (namenode and datanodes)
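
One way to keep these files identical is to maintain /etc/hosts on the namenode and push it out (a sketch; assumes root SSH access to the datanodes):
for node in datanode1 datanode2 datanode3; do
    scp /etc/hosts root@$node:/etc/hosts
done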

[/collapse]

SSH Configuration

[collapse title="Click to expand"]

1 Generate the private key id_rsa and public key id_rsa.pub
[hadoop@hadoop1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
d6:63:76:43:e2:5b:8e:85:ab:67:a2:7c:a6:8f:23:f9 hadoop@hadoop1.test.com

2 Check the private key id_rsa and public key id_rsa.pub files
[hadoop@hadoop1 ~]$ ls .ssh/
authorized_keys id_rsa id_rsa.pub known_hosts

3 Copy the public key to the datanode servers
[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1
hadoop@datanode1's password:
Now try logging into the machine, with "ssh 'hadoop@datanode1'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode2
hadoop@datanode2's password:
Now try logging into the machine, with "ssh 'hadoop@datanode2'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode3
hadoop@datanode3's password:
Now try logging into the machine, with "ssh 'hadoop@datanode3'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost
hadoop@localhost's password:
Now try logging into the machine, with "ssh 'hadoop@localhost'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

4 Verify
[hadoop@hadoop1 ~]$ ssh datanode1
Last login: Thu Feb 2 09:01:16 2012 from 192.168.57.71
[hadoop@hadoop2 ~]$ exit
logout

[hadoop@hadoop1 ~]$ ssh datanode2
Last login: Thu Feb 2 09:01:18 2012 from 192.168.57.71
[hadoop@hadoop3 ~]$ exit
logout

[hadoop@hadoop1 ~]$ ssh datanode3
Last login: Thu Feb 2 09:01:20 2012 from 192.168.57.71
[hadoop@hadoop4 ~]$ exit
logout

[hadoop@hadoop1 ~]$ ssh localhost
Last login: Thu Feb 2 09:01:24 2012 from 192.168.57.71
[hadoop@hadoop1 ~]$ exit
logout
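
Instead of logging in to each node by hand, all of them can be checked in one loop (a small sketch; each ssh should print the remote hostname without prompting for a password):
for node in datanode1 datanode2 datanode3 localhost; do
    ssh $node hostname
done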

Java Environment Configuration

1 Download a suitable JDK
// this is the RPM package for 64-bit Linux systems
wget http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm 
 
2 Install the JDK
rpm -ivh jdk-7-linux-x64.rpm 
 
3 Verify Java
[root@hadoop1 ~]# java -version 
java version "1.7.0" 
Java(TM) SE Runtime Environment (build 1.7.0-b147) 
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) 
[root@hadoop1 ~]# ls /usr/java/ 
default  jdk1.7.0  latest 
 
4 Configure the Java environment variables
# vim /etc/profile   // add the following to /etc/profile:
 
#add for hadoop 
export JAVA_HOME=/usr/java/jdk1.7.0 
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/ 
export PATH=$PATH:$JAVA_HOME/bin 
 
// make the environment variables take effect
source /etc/profile 
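
To confirm the variables took effect (a quick check):
echo $JAVA_HOME    // should print /usr/java/jdk1.7.0
java -version      // should report 1.7.0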
 
5 Copy /etc/profile to the datanodes
[root@hadoop1 src]# scp /etc/profile root@datanode1:/etc/ 
The authenticity of host 'datanode1 (192.168.57.86)' can't be established. 
RSA key fingerprint is b5:00:d1:df:73:4c:94:f1:ea:1f:b5:cd:ed:3a:cc:e1. 
Are you sure you want to continue connecting (yes/no)? yes 
Warning: Permanently added 'datanode1,192.168.57.86' (RSA) to the list of known hosts. 
root@datanode1's password: 
profile                                       100% 1624     1.6KB/s   00:00    
[root@hadoop1 src]# scp /etc/profile root@datanode2:/etc/ 
The authenticity of host 'datanode2 (192.168.57.87)' can't be established. 
RSA key fingerprint is 57:cf:96:15:78:a3:94:93:30:16:8e:66:47:cd:f9:cd. 
Are you sure you want to continue connecting (yes/no)? yes 
Warning: Permanently added 'datanode2,192.168.57.87' (RSA) to the list of known hosts. 
root@datanode2's password: 
profile                                       100% 1624     1.6KB/s   00:00    
[root@hadoop1 src]# scp /etc/profile root@datanode3:/etc/ 
The authenticity of host 'datanode3 (192.168.57.88)' can't be established. 
RSA key fingerprint is 31:73:e8:3c:20:0c:1e:b2:59:5c:d1:01:4b:26:41:70. 
Are you sure you want to continue connecting (yes/no)? yes 
Warning: Permanently added 'datanode3,192.168.57.88' (RSA) to the list of known hosts. 
root@datanode3's password: 
profile                                       100% 1624     1.6KB/s   00:00
  
6 Copy the JDK package to every datanode and install it on each (see the install loop after the transfer below)
[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode1:/home/hadoop/ 
hadoop@datanode1's password: 
hadoop-0.20.203.0rc1.tar.gz                   100%   58MB  57.8MB/s   00:01    
jdk-7-linux-x64.rpm                           100%   78MB  77.9MB/s   00:01    
[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode2:/home/hadoop/ 
hadoop@datanode2's password: 
hadoop-0.20.203.0rc1.tar.gz                   100%   58MB  57.8MB/s   00:01    
jdk-7-linux-x64.rpm                           100%   78MB  77.9MB/s   00:01    
[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode3:/home/hadoop/ 
hadoop@datanode3's password: 
hadoop-0.20.203.0rc1.tar.gz                   100%   58MB  57.8MB/s   00:01    
jdk-7-linux-x64.rpm                           100%   78MB  77.9MB/s   00:01    
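
With the package copied, the JDK can be installed on every datanode over SSH (a sketch; rpm requires root privileges on each datanode):
for node in datanode1 datanode2 datanode3; do
    ssh root@$node "rpm -ivh /home/hadoop/src/jdk-7-linux-x64.rpm"
done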
[/collapse]

Hadoop Configuration

[collapse title="Click to expand"]

1 Directory layout
[hadoop@hadoop1 ~]$ pwd
/home/hadoop
[hadoop@hadoop1 ~]$ ll
total 59220
lrwxrwxrwx  1 hadoop hadoop       17 Feb  1 16:59 hadoop -> hadoop-0.20.203.0
drwxr-xr-x 12 hadoop hadoop     4096 Feb  1 17:31 hadoop-0.20.203.0
-rw-r--r--  1 hadoop hadoop 60569605 Feb  1 14:24 hadoop-0.20.203.0rc1.tar.gz

2 Configure hadoop-env.sh to point at the Java installation
vim hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0

3 Configure core-site.xml   // points to the filesystem namenode
[hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>

4 Configure mapred-site.xml   // points to the master node running the jobtracker
[hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>namenode:9001</value>
</property>
</configuration>

5 Configure hdfs-site.xml   // sets the number of HDFS replicas
[hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>

6 Configure the masters and slaves files
[hadoop@hadoop1 ~]$ cat hadoop/conf/masters
namenode
[hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
datanode1
datanode2
datanode3

7 Copy the hadoop directory to all datanodes
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop/
8 Format HDFS
[hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop1.test.com/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N)  Y  // type an uppercase Y here
12/02/02 11:31:17 INFO util.GSet: VM type       = 64-bit
12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
12/02/02 11:31:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
************************************************************/
[hadoop@hadoop1 hadoop]$
9 Start the Hadoop daemons
[hadoop@hadoop1 hadoop]$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out

10 Verify
// namenode
[hadoop@hadoop1 logs]$ jps
2883 JobTracker
3002 Jps
2769 NameNode
// datanodes
[hadoop@hadoop2 ~]$ jps
2743 TaskTracker
2670 DataNode
2857 Jps
[hadoop@hadoop3 ~]$ jps
2742 TaskTracker
2856 Jps
2669 DataNode
[hadoop@hadoop4 ~]$ jps
2742 TaskTracker
2852 Jps
2659 DataNode

The Hadoop web monitoring page is at:
http://192.168.57.75:50070/dfshealth.jsp
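
Besides jps and the web page, the cluster can also be checked from the command line; the report should list three live datanodes:
[hadoop@hadoop1 hadoop]$ bin/hadoop dfsadmin -report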

[/collapse]

Basic HDFS Verification

[collapse title="Click to expand"]
Hadoop file system commands take the following form:
hadoop fs -cmd <args>
// create a directory
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -mkdir /test-hadoop
// list a directory
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
// list a directory recursively, including subdirectories
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
// add a file
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -put /home/hadoop/hadoop-0.20.203.0rc1.tar.gz /test-hadoop
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:34 /test-hadoop
-rw-r--r-- 2 hadoop supergroup 60569605 2012-02-02 13:34 /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
// retrieve a file
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -get /test-hadoop/hadoop-0.20.203.0rc1.tar.gz /tmp/
[hadoop@hadoop1 hadoop]$ ls /tmp/*.tar.gz
/tmp/1.tar.gz /tmp/hadoop-0.20.203.0rc1.tar.gz
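
As an extra integrity check (a small sketch; assumes md5sum is installed), the original and the retrieved copy should produce identical checksums:
[hadoop@hadoop1 hadoop]$ md5sum /home/hadoop/hadoop-0.20.203.0rc1.tar.gz /tmp/hadoop-0.20.203.0rc1.tar.gz
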
// delete a file
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -rm /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
Deleted hdfs://namenode:9000/test-hadoop/hadoop-0.20.203.0rc1.tar.gz
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:57 /test-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
-rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop
// delete a directory
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -rmr /test-hadoop
Deleted hdfs://namenode:9000/test-hadoop
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
-rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop

// hadoop fs help (excerpt)
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -help
hadoop fs is the command to execute fs commands. The full syntax is:

hadoop fs [-fs <local | file system URI>] [-conf <configuration file>]
[-D <property=value>] [-ls <path>] [-lsr <path>] [-du <path>]
[-dus <path>] [-mv <src> <dst>] [-cp <src> <dst>] [-rm [-skipTrash] <src>]
[-rmr [-skipTrash] <src>] [-put <localsrc> ... <dst>] [-copyFromLocal <localsrc> ... <dst>]
[-moveFromLocal <localsrc> ... <dst>] [-get [-ignoreCrc] [-crc] <src> <localdst>]
[-getmerge <src> <localdst> [addnl]] [-cat <src>]
[-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>] [-moveToLocal <src> <localdst>]
[-mkdir <path>] [-report] [-setrep [-R] [-w] <rep> <path/file>]
[-touchz <path>] [-test -[ezd] <path>] [-stat [format] <path>]
[-tail [-f] <path>] [-text <path>]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-chgrp [-R] GROUP PATH...]
[-count[-q] <path>]
[-help [cmd]]

[/collapse]