Summary
When a partition directory is moved into a Hive partitioned table's directory with hdfs dfs -put or the HDFS API, the data in it cannot be queried from Hive.
The partition must be added to the table manually. Usually this is done with ALTER TABLE ... ADD PARTITION (<partition_key>=<value>);
this article introduces a simpler way to add such partitions.
The method applies only to single-value partitioned tables;
it supports multi-level partitioned tables;
it does not apply to range-partitioned tables;
it does not apply to INTERVAL-partitioned tables.
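These restrictions follow from the key=value directory naming that single-value (and multi-level) partitions use on HDFS; range and INTERVAL partitions do not encode their specs in directory names this way. A minimal Python sketch of parsing that convention (illustrative only, not Hive's actual implementation):

```python
# Parse Hive-style partition directory names (key=value), including
# multi-level partitions such as "year=2020/month=01". Range and
# INTERVAL partitions do not follow this convention, which is why
# directory-based repair cannot handle them.

def parse_partition_path(path: str) -> dict:
    """Return the partition spec encoded in a relative partition path."""
    spec = {}
    for part in path.strip("/").split("/"):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"not a key=value partition directory: {part!r}")
        spec[key] = value
    return spec

print(parse_partition_path("par=partition_2"))     # {'par': 'partition_2'}
print(parse_partition_path("year=2020/month=01"))  # {'year': '2020', 'month': '01'}
```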
Details
- Environment: TDH 6.2.1
MSCK REPAIR TABLE table_name is a metastore check command. It works by comparing the partition metadata in the metastore against the partition directories under the table's HDFS path, and registers in the metastore any partition that exists on HDFS but is missing from the metastore.
Note: MSCK REPAIR does not accept the db_name.tab_name form; only tab_name is allowed, i.e.:
USE db_name; MSCK REPAIR TABLE tab_name;
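Conceptually, the repair is a set difference between the partition directories found on HDFS and the partition specs recorded in the metastore. A hedged Python sketch of that comparison (the input lists are stand-ins for the real HDFS listing and metastore lookup):

```python
# Sketch of the comparison MSCK REPAIR TABLE performs: any partition
# directory present on HDFS but absent from the metastore gets added.
# Both input lists below are illustrative stand-ins, matching the
# example table in this article.

def find_missing_partitions(hdfs_dirs, metastore_partitions):
    """Partitions that exist on HDFS but not in the metastore."""
    return sorted(set(hdfs_dirs) - set(metastore_partitions))

hdfs_dirs = ["par=partition_1", "par=partition_2"]  # from listing the table dir
metastore = ["par=partition_1"]                     # from the metastore

print(find_missing_partitions(hdfs_dirs, metastore))  # ['par=partition_2']
```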
1. Create a partitioned table
> DROP TABLE IF EXISTS repair_test;
> CREATE TABLE repair_test (col_a STRING) PARTITIONED BY (par STRING);
2. Insert data into one partition
> INSERT INTO TABLE repair_test PARTITION(par="partition_1") SELECT "test" FROM system.dual;
1 row affected (7.051 seconds)
> SHOW PARTITIONS repair_test;
+------------------+
| partition |
+------------------+
| par=partition_1 |
+------------------+
1 row selected (0.23 seconds)
> select * from repair_test;
+--------+--------------+
| col_a | par |
+--------+--------------+
| test | partition_1 |
+--------+--------------+
3. Manually create a partition directory and upload a data file with the HDFS put command
$ echo '123123' > test.txt
$ hadoop fs -mkdir hdfs://nameservice1/inceptor1/user/hive/warehouse/default.db/hive/repair_test/par=partition_2
$ hadoop fs -put -f ./test.txt hdfs://nameservice1/inceptor1/user/hive/warehouse/default.db/hive/repair_test/par=partition_2
2020-01-21 10:52:18,688 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
$ hadoop fs -ls -R hdfs://nameservice1/inceptor1/user/hive/warehouse/default.db/hive/repair_test/
2020-01-21 10:52:38,205 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
drwx--x--x - hive hive 0 2020-01-21 10:50 hdfs://nameservice1/inceptor1/user/hive/warehouse/default.db/hive/repair_test/par=partition_1
-rwx--x--x 3 hive hive 5 2020-01-21 10:50 hdfs://nameservice1/inceptor1/user/hive/warehouse/default.db/hive/repair_test/par=partition_1/000000_0
drwxr-xr-x - hdfs hive 0 2020-01-21 10:52 hdfs://nameservice1/inceptor1/user/hive/warehouse/default.db/hive/repair_test/par=partition_2
-rw-r--r-- 3 hdfs hive 7 2020-01-21 10:52 hdfs://nameservice1/inceptor1/user/hive/warehouse/default.db/hive/repair_test/par=partition_2/test.txt
4. Check the partitions: partition_2 has not been added to Hive
> SHOW PARTITIONS repair_test;
+------------------+
| partition |
+------------------+
| par=partition_1 |
+------------------+
1 row selected (0.215 seconds)
5. Run MSCK REPAIR TABLE and check the partitions again; the partition uploaded via put can now be queried
> MSCK REPAIR TABLE repair_test;
No rows affected (0.472 seconds)
> SHOW PARTITIONS repair_test;
+------------------+
| partition |
+------------------+
| par=partition_1 |
| par=partition_2 |
+------------------+
2 rows selected (0.166 seconds)
> select * from repair_test;
+---------+--------------+
| col_a | par |
+---------+--------------+
| test | partition_1 |
| 123123 | partition_2 |
+---------+--------------+
2 rows selected (0.809 seconds)
Appendix: adding a partition with ALTER TABLE ... ADD PARTITION (<partition_key>=<value>)
> alter table repair_test add partition(par='partition_2');
No rows affected (0.374 seconds)
> SHOW PARTITIONS repair_test;
+------------------+
| partition |
+------------------+
| par=partition_1 |
| par=partition_2 |
+------------------+
2 rows selected (0.241 seconds)
> select * from repair_test;
+---------+--------------+
| col_a | par |
+---------+--------------+
| test | partition_1 |
| 123123 | partition_2 |
+---------+--------------+
2 rows selected (0.908 seconds)
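If MSCK is unavailable, or you prefer explicit DDL, the missing partitions found by a directory scan can be turned into ALTER TABLE statements like the one above. A hypothetical Python helper (purely illustrative; the table and partition names match this article's example):

```python
# Generate ALTER TABLE ... ADD PARTITION statements from a list of
# Hive-style partition directory names (key=value, possibly multi-level).
# Illustrative only; values are assumed to need single quotes in the DDL.

def add_partition_ddl(table, partition_dirs):
    """Build one ADD PARTITION statement per partition directory."""
    stmts = []
    for d in partition_dirs:
        # "par=partition_2" -> "par='partition_2'"
        spec = ", ".join(
            f"{k}='{v}'"
            for k, v in (p.split("=", 1) for p in d.strip("/").split("/"))
        )
        stmts.append(f"ALTER TABLE {table} ADD PARTITION({spec});")
    return stmts

for s in add_partition_ddl("repair_test", ["par=partition_2"]):
    print(s)  # ALTER TABLE repair_test ADD PARTITION(par='partition_2');
```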