How to migrate Search data with the HDFS plugin repository-hdfs


Overview


The main ways to migrate Search data are:

  1. The Data Migration Tool (see the <search运维手册>). This approach can be unstable, especially with large data volumes.
  2. The HDFS plugin repository-hdfs, which is what this article covers: Search data is exported to HDFS as a snapshot and then restored from HDFS back into Search.

Notes before you start:

  1. Whether you back up and restore within a single cluster or migrate between clusters, the clusters in this procedure all have Kerberos security authentication disabled.
  2. Starting from TDH 5.1, Search is based on open-source Elasticsearch 5.4.1, so data can be migrated this way between any TDH versions 5.1 and above.
  3. This article demonstrates migrating data from TDH 5.2.3 to TDH 6.2.1.
  4. Environment:
    TDH 5.2.3 source cluster: 172.22.33.33/xixia01, 172.22.33.34/xixia02, 172.22.33.35/xixia03
    TDH 6.2.1 target cluster: 172.22.33.1/mll01, 172.22.33.2/mll02, 172.22.33.3/mll03

Detailed steps


Install the HDFS plugin repository-hdfs

Note:
Unless stated otherwise, every step below must be performed on all Search-related nodes of each cluster.
This section demonstrates installing the plugin on TDH 6.2.1; the steps for other versions are similar.

  • Unzip the plugin package and rename the extracted directory to repository-hdfs

unzip repository-hdfs-5.4.1.zip
mv elasticsearch repository-hdfs
  • Copy repository-hdfs into the Search plugin path (see the example below)
    Path on TDH 6.2.1: /etc/search1/instancegroup1/conf/plugins/
    Path on TDH 5.2.3: /etc/search1/conf/plugins/
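
For example, on a TDH 6.2.1 node the copy could look like this (a sketch, assuming the plugin was unzipped in the current directory; use the TDH 5.2.3 path on the source cluster):

# copy the unzipped plugin directory into the Search plugin path
cp -r repository-hdfs /etc/search1/instancegroup1/conf/plugins/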


  • On the Manager node, modify the jvm.options template file to avoid errors caused by the JVM's JSM (Java Security Manager) checks during the following steps
    The template file path is: /var/lib/transwarp-manager/master/content/meta/services/SEARCH/transwarp-6.2.1-final/templates/jvm.options (adjust the version number for other TDH versions)

Append the following line at the end of the file:
-Djava.security.policy=/etc/search1/instancegroup1/conf/plugins/repository-hdfs/plugin-security.policy

Adjust the path of plugin-security.policy to match your actual environment.


  • Configure the service (apply the configuration) and restart the Search service; the plugin installation is now complete
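
Optionally, verify on each node that the plugin has been loaded after the restart (a quick check via the standard _cat/plugins API, assuming the Search REST port is 9200):

# list the plugins loaded by the local Search node; repository-hdfs should appear in the output
curl -X GET "localhost:9200/_cat/plugins?v"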


Create the repository: create a repository on both the source cluster and the target cluster

  1. Create the repository on the TDH 6.2.1 target cluster
curl -X PUT "localhost:9200/_snapshot/geo_repository" -H 'Content-Type:application/json' -d '
{
"type":"hdfs",
"settings":{
"uri":"hdfs://mll02:8020",
"path":"/user/backup",
"max_snapshot_bytes_per_sec":"500mb",
"max_restore_bytes_per_sec":"500mb",
"compress":"true",
"conf_location":"/root/TDH-Client/conf/hdfs1/hdfs-site.xml"}}'


- uri: IP or hostname of the active HDFS NameNode;
- path: directory on HDFS under which the snapshots are stored; make sure the **elasticsearch** user has **write** permission on it;
- max_snapshot_bytes_per_sec, max_restore_bytes_per_sec: throttle the snapshot and restore throughput; compress: enable compression of the snapshot metadata files;
- conf_location: path of the HDFS configuration file (hdfs-site.xml) to use;
  2. Create the repository on the TDH 5.2.3 source cluster
curl -X PUT "localhost:9200/_snapshot/geo_repository" -H 'Content-Type:application/json' -d '
{
"type":"hdfs",
"settings":{
"uri":"hdfs://xixia02:8020",
"path":"/user/backup",
"max_snapshot_bytes_per_sec":"500mb",
"max_restore_bytes_per_sec":"500mb",
"compress":"true",
"conf_location":"/root/TDH-Client/conf/hdfs1/hdfs-site.xml"}}'


Create the snapshot

  1. Create a snapshot on the TDH 5.2.3 source cluster

This example migrates the three indices transwarp, news_analyze_zh, and default.es_driver1 from the TDH 5.2.3 source cluster to the TDH 6.2.1 target cluster.

The three indices can be seen on the Search page of the TDH 5.2.3 source cluster.

curl -XPUT "172.22.33.33:9200/_snapshot/geo_repository/geo_snapshot_v?wait_for_completion=true" -H 'Content-Type:application/json' -d '{
"indices":"transwarp,news*,default.es*",
"ignore_unavailable":"true",
"include_global_state":"false"
}'
- indices: the indices to snapshot, comma-separated; wildcard patterns such as news* are also supported;
- wait_for_completion=true: by default the snapshot runs in the background; add this parameter if you want the request to wait in the foreground until the snapshot completes;
- ignore_unavailable: if set to true, indices that do not exist are skipped; by default it is unset, and a missing index causes the snapshot to fail;
- include_global_state: set to false to prevent the cluster's global state from being included in the snapshot;
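
If you run the snapshot without wait_for_completion=true, its progress can be polled with the snapshot status API (a sketch; geo_snapshot_v is the snapshot name created above):

# show shard-level progress of the snapshot while it is running
curl -X GET "172.22.33.33:9200/_snapshot/geo_repository/geo_snapshot_v/_status?pretty"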

If the command returns without errors, the snapshot succeeded, and the generated snapshot files can be seen under the corresponding HDFS path.

You can also list all snapshots with curl:

curl -X GET localhost:9200/_snapshot/geo_repository/_all?pretty

The output shows that geo_snapshot_v now contains snapshots of the three indices transwarp, news_analyze_zh, and default.es_driver1.

Copy the snapshot

  1. Copy all files under /user/backup on the TDH 5.2.3 source cluster's HDFS to the HDFS path specified when the repository was created on the TDH 6.2.1 target cluster (see the DistCp sketch below).
    In this example, the HDFS path specified when creating the repository is /user/backup on both the TDH 5.2.3 source cluster and the TDH 6.2.1 target cluster.
    After the copy, the contents of /user/backup on the target cluster's HDFS should be identical to the source cluster's.
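
One way to do the copy is DistCp from a node with the TDH-Client installed (a sketch, not the only option; the NameNode addresses follow the repository URIs used above):

# copy the snapshot files from the source cluster's HDFS into the target cluster's /user/backup
# -update copies the contents of the source directory into the existing target directory
hadoop distcp -update hdfs://xixia02:8020/user/backup hdfs://mll02:8020/user/backup

After the copy, make sure the elasticsearch user still has write permission on /user/backup on the target cluster (see the setfacl command in the Common problems section).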


After the HDFS data has been synced, you can list all snapshots on the target cluster with curl:

curl -X GET localhost:9200/_snapshot/geo_repository/_all?pretty


Restore data from the snapshot on the TDH 6.2.1 target cluster

Before the restore, check the index list on the Search page of the TDH 6.2.1 target cluster.

Restore the data with the following command:

curl -X POST "172.22.33.1:9200/_snapshot/geo_repository/geo_snapshot_v/_restore?wait_for_completion=true" -H 'Content-Type:application/json' -d '{
"indices":"news_analyze_zh,transwarp,default.es_driver1",
"ignore_unavailable":"true",
"include_global_state":"false"
}'
- wait_for_completion=true: like the snapshot, the restore runs in the background by default; add this parameter to wait in the foreground until it finishes;
- indices: the indices to restore, comma-separated; wildcard patterns are also supported.
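
After the restore completes, the restored indices can also be checked from the command line (a sketch using the _cat/indices API on the target cluster):

# list the restored indices with their health, document count and size
curl -X GET "172.22.33.1:9200/_cat/indices/transwarp,news_analyze_zh,default.es_driver1?v"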


Then check the data on the Search page.

Common problems

  1. After deploying the plugin, the Search service fails to start when restarted, and the pod log shows the following error:
Caused by: java.nio.file.FileSystemException: /usr/lib/elasticsearch/elasticsearch-5.4.1-transwarp/plugins/repository-hdfs-5.4.1.zip/plugin-descriptor.properties: Not a directory


Solution: check whether the repository-hdfs-5.4.1.zip archive was left in the Search plugin directory. The directory must not contain the zip archive; only the extracted folder should remain.
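
For example, on TDH 6.2.1 the leftover archive could be removed like this (a sketch, assuming the plugin path used earlier; adjust the path for TDH 5.2.3):

# remove the leftover plugin archive; only the extracted repository-hdfs directory should remain
rm -f /etc/search1/instancegroup1/conf/plugins/repository-hdfs-5.4.1.zip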


  2. After deploying the plugin and restarting Search, netstat -anpl | grep 9200 shows that the port is not listening, and the Search Server pod reports:
Caused by: java.lang.IllegalStateException: jar hell!
class: org.apache.commons.logging.impl.AvalonLogger
jar1: /usr/lib/elasticsearch/elasticsearch-5.4.1-transwarp/plugins/elasticsearch/commons-logging-1.1.3.jar
jar2: /usr/lib/elasticsearch/elasticsearch-5.4.1-transwarp/lib/commons-logging-1.2.jar


Solution: there is a jar conflict. Compare the jars in the two directories below and move every duplicated jar out of the Search plugin directory (see the sketch below).
TDH 6.2.1 Search plugin directory: /etc/search1/instancegroup1/conf/plugins/repository-hdfs
Jars shipped with TDH 6.2.1, path inside the pod: /usr/lib/elasticsearch/elasticsearch-5.4.1-transwarp/lib/
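
A sketch of how the duplicates could be found and moved aside (the version-stripping pattern, the /tmp/dup-jars backup directory, and the pre-built listing file /tmp/pod-lib.txt are illustrative assumptions, not part of the product):

# /tmp/pod-lib.txt is assumed to hold a file listing of /usr/lib/elasticsearch/elasticsearch-5.4.1-transwarp/lib/ taken inside the pod
PLUGIN_DIR=/etc/search1/instancegroup1/conf/plugins/repository-hdfs
mkdir -p /tmp/dup-jars
for jar in "$PLUGIN_DIR"/*.jar; do
    # strip the version suffix, e.g. commons-logging-1.1.3.jar -> commons-logging
    base=$(basename "$jar" | sed 's/-[0-9][0-9.]*\.jar$//')
    if grep -q "^${base}-" /tmp/pod-lib.txt; then
        echo "duplicate jar: $jar"
        mv "$jar" /tmp/dup-jars/
    fi
done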

  3. Creating the repository fails with the following error:
{"error":{"root_cause":[{"type":"repository_exception","reason":"[geo_repository] failed to create repository"}],"type":"repository_exception","reason":"[geo_repository] failed to create repository","caused_by":{"type":"unchecked_i_o_exception","reason":"Cannot create HDFS repository for uri [hdfs://xixia01:8020]","caused_by":{"type":"remote_exception","reason":"Operation category WRITE is not supported in state standby\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)

Solution: the uri specified when creating the repository must point to the HDFS NameNode that is currently active.
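
To find out which NameNode is currently active, you can query the HA state from a node with the TDH-Client (a sketch; nn1 and nn2 are placeholder NameNode IDs, check dfs.ha.namenodes.* in hdfs-site.xml for the real ones):

# print the HA state (active / standby) of each configured NameNode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2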

  4. Creating the repository fails with the following error:
{"error":{"root_cause":[{"type":"repository_exception","reason":"[geo_repository] failed to create repository"}],"type":"repository_exception","reason":"[geo_repository] failed to create repository","caused_by":{"type":"unchecked_i_o_exception","reason":"Cannot create HDFS repository for uri [hdfs://xixia02:8020]","caused_by":{"type":"i_o_exception","reason":"com.google.protobuf.ServiceException: java.security.AccessControlException: access denied (\"javax.security.auth.PrivateCredentialPermission\" \"org.apache.hadoop.security.Credentials\" \"read\")","caused_by":{"type":"service_exception","reason":"java.security.AccessControlException: access denied (\"javax.security.auth.PrivateCredentialPermission\" \"org.apache.hadoop.security.Credentials\" \"read\")","caused_by":{"type":"access_control_exception","reason":"access denied (\"javax.security.auth.PrivateCredentialPermission\" \"org.apache.hadoop.security.Credentials\" \"read\")"}}}}},"status":500}

Solution: check that the plugin-security.policy path specified in the template file /var/lib/transwarp-manager/master/content/meta/services/SEARCH/transwarp-6.2.1-final/templates/jvm.options on the Manager node is correct.

  5. Creating the repository fails with the following error:
{"error":{"root_cause":[{"type":"exception","reason":"failed to create blob container"}],"type":"exception","reason":"failed to create blob container","caused_by":{"type":"access_control_exception","reason":"Permission denied: user=elasticsearch, access=WRITE, inode=\"/user/backup\":hdfs:hadoop:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:324)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:223)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:199)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:69)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3946)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:981)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:634)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1988)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)\n","caused_by":{"type":"remote_exception","reason":"Permission denied: user=elasticsearch, access=WRITE, inode=\"/user/backup\":hdfs:hadoop:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:324)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:223)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:199)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:69)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3946)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:981)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:634)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1988)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)\n"}}},"status":500}

Solution: the elasticsearch user does not have write permission on the path specified for the repository.
Grant the permission with the following command:

hadoop fs -setfacl -m user:elasticsearch:rwx /user/backup

