1. Copy the cluster client and configuration files, and set up environment variables
Notes:
1. Run the following operations on the execution-agent server with root privileges.
2. $cdh_ip is the IP of any node in the cluster (see the example below).
3. The exact steps may vary across versions; adjust them to your actual environment.
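The commands below assume $cdh_ip is already set in the current shell; a minimal sketch (the IP is a placeholder, substitute a real cluster node):
cdh_ip=192.168.1.10   # placeholder IP of any cluster node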
# Pull the CDH parcels from a cluster node and hand them to the deploy user
sudo mkdir -p /opt/cloudera/parcels
sudo rsync -av root@$cdh_ip:/opt/cloudera/parcels /opt/cloudera/
sudo chown -R deploy:deploy /opt/cloudera/parcels
# Copy the client configuration directories and CLI wrappers
sudo rsync -av root@$cdh_ip:/etc/alternatives/* /etc/alternatives/
sudo rsync -av root@$cdh_ip:/etc/hadoop /etc/
sudo rsync -av root@$cdh_ip:/etc/hive /etc/
sudo rsync -av root@$cdh_ip:/etc/spark* /etc/
sudo rsync -av root@$cdh_ip:/etc/hbase* /etc/
sudo rsync -av root@$cdh_ip:/usr/bin/spark* /usr/bin/
sudo rsync -av root@$cdh_ip:/usr/bin/h* /usr/bin/
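After the sync finishes, a quick sanity check that the parcel layout landed intact:
ls /opt/cloudera/parcels/CDH/lib   # should list hadoop, hive, spark, hbase, ...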
echo '
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin' | sudo tee -a /home/deploy/.bashrc > /dev/null
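To pick up the new variables and confirm the client tools resolve, switch to the deploy user:
su - deploy
source ~/.bashrc
hadoop version          # should print the CDH Hadoop build
spark-submit --version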
# Recreate the configuration symlinks under /etc/alternatives
cd /etc/alternatives
ln -s /etc/hive/conf.cloudera.hive hive-conf
ln -s /etc/hadoop/conf.cloudera.yarn hadoop-conf
ln -s /etc/spark/conf.cloudera.spark spark-conf
# Point each service's conf directory at its alternatives entry
cd /etc/hive
ln -s /etc/alternatives/hive-conf conf
cd /etc/hadoop
ln -s /etc/alternatives/hadoop-conf conf
cd /etc/hbase
ln -s /etc/alternatives/hbase-conf conf
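A quick check that each conf symlink resolves (a broken link makes ls report an error):
ls -lL /etc/hive/conf /etc/hadoop/conf /etc/hbase/conf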
# Make the Hive metastore settings visible to the Spark and Hadoop clients
sudo cp /etc/hive/conf/hive-site.xml /etc/spark/conf/
sudo cp /etc/hive/conf/hive-site.xml /etc/spark2/conf/
sudo cp /etc/hive/conf/hive-site.xml /etc/hadoop/conf/
# If Spark is a 2.x release, the following is also required on the execution-agent node (the exact version string in the path depends on your installation).
# Note: if the execution-agent server is itself a cluster node, do NOT perform this step!!!
mv /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/jars/commons-logging-1.1.3.jar /tmp/
2. After the steps above are complete, manually copy the cluster's hosts entries into /etc/hosts on the execution-agent server.
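One possible sketch, pulling a node's /etc/hosts while dropping loopback lines (review the output before appending):
ssh root@$cdh_ip "grep -v -e '^127\.' -e '^::1' /etc/hosts" | sudo tee -a /etc/hosts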
3. Create the Shuqi platform's default project resource directory on HDFS, create the deploy user on all cluster nodes, and add the deploy user to HDFS's superuser group
ssh root@$cdh_ip 'sudo su - hdfs -c "hadoop fs -mkdir -p /user/shuqi"'
ssh root@$cdh_ip 'sudo su - hdfs -c "hadoop fs -chown -R deploy:deploy /user/shuqi"'
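To confirm the directory was created with the right owner:
ssh root@$cdh_ip 'sudo su - hdfs -c "hadoop fs -ls /user"'   # /user/shuqi should show owner deploy:deploy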
# Create the deploy user on all cluster nodes (the command below must be run on every node; a loop sketch follows)
useradd deploy
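A sketch for creating the user on every node, assuming a hypothetical $cluster_ips variable that lists all node IPs:
for ip in $cluster_ips; do ssh root@$ip 'useradd deploy'; done   # $cluster_ips is illustrative, not defined above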
# Create the supergroup group
ssh root@$cdh_ip 'sudo groupadd supergroup'
# Add the deploy user to supergroup
ssh root@$cdh_ip 'sudo usermod -a -G supergroup deploy'
# Refresh HDFS's user-to-group mappings from the OS
ssh root@$cdh_ip 'sudo su - hdfs -s /bin/bash -c "hdfs dfsadmin -refreshUserToGroupsMappings"'
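To verify the refresh took effect, hdfs groups prints a user's resolved groups:
ssh root@$cdh_ip 'sudo su - hdfs -s /bin/bash -c "hdfs groups deploy"'
# expected output: deploy : deploy supergroup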
4. Verification
# Enter the Spark directory
cd /opt/cloudera/parcels/CDH/lib/spark
# Run SparkPi, submitting to the root.default scheduler queue. Note: the spark-examples jar version varies with the cluster version
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --queue root.default --name sparkPi lib/spark-examples-1.6.0-cdh5.15.2-hadoop2.6.0-cdh5.15.2.jar 100
# Result:
......
17/10/14 14:52:30 INFO Submitted application application_1507947630013_0003
17/10/14 14:53:12 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.976050 s
Pi is roughly 3.1411539141153915
......