Outline
Overview
On Windows, a Java program authenticates to a TDH cluster via Kerberos and writes data to HDFS in ORC format, so that a Quark/Inceptor ORC table can read the files directly.
Prerequisites
| Item | Requirement |
|---|---|
| JDK | 1.8 (verified with 1.8.0_391) |
| Network | The Windows machine must be able to reach the HDFS NameNode port (8020 by default) |
| hosts | The local C:\Windows\System32\drivers\etc\hosts must map the cluster node names |
| TDH server | SSH access to any cluster node, to copy the dependency files |
Procedure
Step 1: Configure the local hosts file
Add the cluster node mappings to C:\Windows\System32\drivers\etc\hosts on Windows:
172.18.131.171 kv1
172.18.131.172 kv2
172.18.131.173 kv3
172.18.131.174 kv4
Note: the hostnames must match the cluster's actual configuration; confirm them in /etc/hosts on the server.
Step 2: Copy the dependency files from the server
Create the following directory structure on the Windows machine:
C:\Users\<username>\tdh-libs\
├── conf\ ← HDFS configuration files
└── jars\ ← dependency JARs
2.1 Copy the configuration files
Download the following files from /etc/hdfs1/conf/ on the TDH server into the local conf\ directory:
| File | Source path |
|---|---|
| core-site.xml | /etc/hdfs1/conf/core-site.xml |
| hdfs-site.xml | /etc/hdfs1/conf/hdfs-site.xml |
| krb5.conf | /etc/hdfs1/conf/krb5.conf |
| hdfs.keytab | /etc/hdfs1/conf/hdfs.keytab |
2.2 Copy the dependency JARs
Download the following JARs from the TDH Client directory on the server into the local jars\ directory:
Core JARs (required):
| JAR file | Server path |
|---|---|
| hadoop-common-transwarp-9.3.3.jar | /root/TDH-Client/hadoop/hadoop/share/common/ |
| inceptor-exec-8.43.2.jar | /root/TDH-Client/inceptor/lib/ |
| hadoop-hdfs-transwarp-9.3.3.jar | /root/TDH-Client/hadoop/hadoop/share/hdfs/ |
| hadoop-hdfs-client-transwarp-9.3.3.jar | /root/TDH-Client/hadoop/hadoop/share/hdfs/ |
| hadoop-mapreduce-client-core-transwarp-9.3.3.jar | /root/TDH-Client/hadoop/hadoop/share/mapreduce/ |
| hadoop-mapreduce-client-common-transwarp-9.3.3.jar | /root/TDH-Client/hadoop/hadoop/share/mapreduce/ |
| hadoop-auth-transwarp-9.3.3.jar | /root/TDH-Client/hadoop/hadoop/share/common/lib/ |
Shortcut: instead of downloading the JARs one by one, copy every JAR from the following three server locations into the local jars\ directory:
- /root/TDH-Client/hadoop/hadoop/share/common/lib/*.jar
- /root/TDH-Client/hadoop/hadoop/share/hdfs/lib/*.jar
- /root/TDH-Client/inceptor/lib/inceptor-exec-8.43.2.jar
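The copies in 2.1 and 2.2 can also be scripted from Git Bash (or any OpenSSH client) on the Windows machine. A minimal sketch, assuming SSH access as root to node kv1 (adjust user and host to your cluster); the commands are only echoed as a dry run, so remove `echo` to actually transfer:

```shell
# Dry run: print the copy commands; delete `echo` to execute them.
# Assumes SSH access as root@kv1 -- an assumption, adjust to your environment.
mkdir -p conf jars
for f in core-site.xml hdfs-site.xml krb5.conf hdfs.keytab; do
  echo scp "root@kv1:/etc/hdfs1/conf/$f" conf/
done
for d in common/lib hdfs/lib; do
  echo scp "root@kv1:/root/TDH-Client/hadoop/hadoop/share/$d/*.jar" jars/
done
echo scp "root@kv1:/root/TDH-Client/inceptor/lib/inceptor-exec-8.43.2.jar" jars/
```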
Step 3: Write the Java code
Create HdfsOrcWriter.java:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import java.util.ArrayList;
import java.util.List;

public class HdfsOrcWriter {
    public static void main(String[] args) throws Exception {
        // Config directory (first command-line argument, or a local default path)
        String confDir = args.length > 0 ? args[0] : "C:\\Users\\17171\\tdh-libs\\conf";
        Configuration conf = new Configuration();
        conf.addResource(new Path(new java.io.File(confDir, "core-site.xml").getAbsolutePath()));
        conf.addResource(new Path(new java.io.File(confDir, "hdfs-site.xml").getAbsolutePath()));
        // Disable OAuth2 (the TDH Guardian plugin cannot work from a local client environment)
        conf.set("hadoop.security.authentication.oauth2.enabled", "false");
        conf.set("hadoop.security.authentication.web.oauth2.enabled", "false");
        // Kerberos login
        org.apache.hadoop.security.UserGroupInformation.setConfiguration(conf);
        String keytabPath = new java.io.File(confDir, "hdfs.keytab").getAbsolutePath();
        org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab("hdfs/kv1@KTDH", keytabPath);
        FileSystem fs = FileSystem.get(conf);
        // HDFS output path
        Path hdfsDir = new Path("/user/hdfs/orc_test");
        fs.mkdirs(hdfsDir);
        Path hdfsOrcPath = new Path(hdfsDir, "output.orc");
        if (fs.exists(hdfsOrcPath)) {
            fs.delete(hdfsOrcPath, false);
        }
        // Define the schema: id INT, name STRING, age INT
        List<String> colNames = new ArrayList<>();
        colNames.add("id");
        colNames.add("name");
        colNames.add("age");
        List<ObjectInspector> colOIs = new ArrayList<>();
        colOIs.add(PrimitiveObjectInspectorFactory.writableIntObjectInspector);
        colOIs.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        colOIs.add(PrimitiveObjectInspectorFactory.writableIntObjectInspector);
        StructObjectInspector soi = ObjectInspectorFactory.getStandardStructObjectInspector(colNames, colOIs);
        // Create the ORC writer
        Writer orcWriter = OrcFile.createWriter(
                hdfsOrcPath,
                OrcFile.writerOptions(conf)
                        .inspector(soi)
                        .stripeSize(67108864)   // 64 MB stripes
                        .bufferSize(262144)     // 256 KB buffer
                        .rowIndexStride(10000)
                        .fileSystem(fs)
        );
        // Write the rows
        Object[][] data = {
                {new IntWritable(1), new Text("zhangsan"), new IntWritable(20)},
                {new IntWritable(2), new Text("lisi"), new IntWritable(25)},
                {new IntWritable(3), new Text("wangwu"), new IntWritable(30)},
                {new IntWritable(4), new Text("zhaoliu"), new IntWritable(28)},
                {new IntWritable(5), new Text("tianqi"), new IntWritable(22)}
        };
        for (Object[] row : data) {
            orcWriter.addRow(row);
        }
        orcWriter.close();
        fs.close();
        System.out.println("ORC file written to HDFS successfully!");
        System.out.println("Path: " + hdfsOrcPath.toString());
    }
}
Note: the principal passed to loginUserFromKeytab (hdfs/kv1@KTDH) must be adjusted to your cluster; run `klist -kt /etc/hdfs1/conf/hdfs.keytab` on the server to list the principals contained in the keytab.
Step 4: Compile and run
Build the classpath
Add every JAR under jars\ plus the conf\ directory to the classpath (PowerShell):
$jarDir = "C:\Users\<username>\tdh-libs\jars"
$confDir = "C:\Users\<username>\tdh-libs\conf"
$jars = Get-ChildItem -Path $jarDir -Filter "*.jar" | Select-Object -ExpandProperty FullName
$cp = ($jars -join ";") + ";" + $confDir
Compile
javac -cp $cp HdfsOrcWriter.java
Run
java -cp $cp `
-Djava.security.krb5.conf="$confDir\krb5.conf" `
-Dhadoop.home.dir=C:\Users\<username>\tdh-libs `
HdfsOrcWriter $confDir
| JVM option | Description |
|---|---|
| -Djava.security.krb5.conf | Path to the Kerberos configuration file |
| -Dhadoop.home.dir | Hadoop home directory (avoids the missing-WINUTILS warning on Windows) |
On success the program prints:
ORC file written to HDFS successfully!
Path: /user/hdfs/orc_test/output.orc
Step 5: Verify the result
5.1 Create an ORC table via Beeline and query it
Run on the TDH server:
source /root/TDH-Client/init.sh n n
beeline -u "jdbc:hive2://<server>:10000/default" -n hive -p <password> \
-e "CREATE EXTERNAL TABLE IF NOT EXISTS test_orc_table(
id INT,
name STRING,
age INT
) STORED AS ORC
LOCATION '/user/hdfs/orc_test';
SELECT * FROM test_orc_table;"
Query result:
+-----+-----------+------+
| id | name | age |
+-----+-----------+------+
| 1 | zhangsan | 20 |
| 2 | lisi | 25 |
| 3 | wangwu | 30 |
| 4 | zhaoliu | 28 |
| 5 | tianqi | 22 |
+-----+-----------+------+
5.2 Inspect the HDFS file directly
kinit -kt /etc/hdfs1/conf/hdfs.keytab hdfs/kv1@KTDH
hdfs dfs -ls -h /user/hdfs/orc_test/
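As an extra check, the file's schema, stripe layout, and row count can be dumped with stock Hive's orcfiledump service. Whether the TDH/Inceptor CLI bundles this tool is an assumption, so the sketch below guards for a missing `hive` binary and should be run on a cluster node:

```shell
# Optional: dump ORC metadata (schema, stripe stats, row count).
# `hive --service orcfiledump` is the stock Apache Hive tool; its presence
# on the TDH client is an assumption -- run this on a TDH node.
ORC_PATH=/user/hdfs/orc_test/output.orc
if command -v hive >/dev/null 2>&1; then
  hive --service orcfiledump "$ORC_PATH"
else
  echo "hive not on PATH; on a TDH node run: hive --service orcfiledump $ORC_PATH"
fi
```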