Tablet Server threw HDFS replication error (Accumulo 1.7.2)


Takashi Sasaki
Hello,

We encountered an error on Accumulo 1.7.2.
It looks like an HDFS replication issue, but HDFS is not full.

The actual log is below:
2017-05-15 06:18:40,751 [log.TabletServerLogger] ERROR: Unexpected
error writing to log, retrying attempt 43
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
  at org.apache.accumulo.tserver.log.DfsLogger$LoggerOperation.await(DfsLogger.java:235)
  at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:330)
  at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:270)
  at org.apache.accumulo.tserver.log.TabletServerLogger.log(TabletServerLogger.java:405)
  at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.update(TabletServer.java:1043)
  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
  at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
  at com.sun.proxy.$Proxy20.update(Unknown Source)
  at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2470)
  at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2454)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
  at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
  at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.accumulo.tserver.log.DfsLogger$LogSyncingTask.run(DfsLogger.java:181)
  ... 2 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2
could only be replicated to 0 nodes instead of minReplication (=1).
There are 5 datanode(s) running and no node(s) are excluded in this
operation.
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
  at org.apache.hadoop.ipc.Client.call(Client.java:1475)
  at org.apache.hadoop.ipc.Client.call(Client.java:1412)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
  at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
  at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
2017-05-15 06:18:40,852 [log.DfsLogger] WARN : Exception syncing
java.lang.reflect.InvocationTargetException
2017-05-15 06:18:40,852 [log.DfsLogger] ERROR: Failed to close log file
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2
could only be replicated to 0 nodes instead of minReplication (=1).
There are 5 datanode(s) running and no node(s) are excluded in this
operation.
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
  at org.apache.hadoop.ipc.Client.call(Client.java:1475)
  at org.apache.hadoop.ipc.Client.call(Client.java:1412)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
  at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
  at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

The HDFS web UI info is below:
 Security is off.
 Safemode is off.

 17461 files and directories, 14873 blocks = 32334 total filesystem object(s).
 Heap Memory used 62.81 MB of 91 MB Heap Memory. Max Heap Memory is 1.6 GB.
 Non Heap Memory used 67.14 MB of 69.06 MB Committed Non Heap Memory.
Max Non Heap Memory is -1 B.

 Configured Capacity: 132.43 GB
 DFS Used: 12.44 GB (9.39%)
 Non DFS Used: 58.07 GB
 DFS Remaining: 61.92 GB (46.76%)
 Block Pool Used: 12.44 GB (9.39%)
 DataNodes usages% (Min/Median/Max/stdDev):  5.74% / 9.94% / 11.01% / 1.91%
 Live Nodes 5 (Decommissioned: 0)
 Dead Nodes 0 (Decommissioned: 0)
 Decommissioning Nodes 0
 Total Datanode Volume Failures 0 (0 B)
 Number of Under-Replicated Blocks 0
 Number of Blocks Pending Deletion 0
 Block Deletion Start Time 2017/4/19 11:16:31

The Accumulo configuration is below:
 config -s table.cache.block.enable=true
 config -s tserver.memory.maps.native.enabled=true
 config -s tserver.cache.data.size=1G
 config -s tserver.cache.index.size=2G
 config -s tserver.memory.maps.max=2G
 config -s tserver.client.timeout=5s
 config -s table.durability=flush
 config -t accumulo.metadata -d table.durability
 config -t accumulo.root -d table.durability

The Accumulo Monitor web UI info is below:
 Accumulo Overview
 Disk Used 904.26M
 % of Used DFS 100.00%
 Tables 57
 Tablet Servers 5
 Dead Tablet Servers 0
 Tablets 1.86K
 Entries 22.60M
 Lookups 35.62M
 Uptime 28d 3h

If you have seen a similar error in the past, could you tell me how to fix it?

Thanks,
Takashi

Re: Tablet Server threw HDFS replication error (Accumulo 1.7.2)

Josh Elser
Hi Takashi,

By default, Accumulo TabletServers create WALs with a size of ~1 GB
(think of it as pre-allocating the file). The error you show often
occurs because a DataNode cannot actually allocate that much space
given its reserved space threshold. See dfs.datanode.du.reserved in
hdfs-site.xml.

To help confirm the problem, you can try temporarily reducing
tserver.walog.max.size from 1G to 128M (or similar).

I'd also recommend taking a look at the DataNode logs; you might find a clue.
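A minimal sketch of these checks, assuming shell access to the cluster
(the 10 GB reserved value and 128M WAL size here are illustrative
diagnostic settings, not recommendations):

```shell
# Per-DataNode free space as HDFS sees it -- cluster-wide totals can
# look healthy while individual nodes reject a ~1 GB block allocation:
hdfs dfsadmin -report

# In hdfs-site.xml, dfs.datanode.du.reserved is the number of bytes
# each DataNode holds back for non-HDFS use, e.g.:
#   <property>
#     <name>dfs.datanode.du.reserved</name>
#     <value>10737418240</value>   <!-- 10 GB, illustrative -->
#   </property>

# Temporarily shrink the pre-allocated WAL size from the Accumulo shell
# to see whether the errors stop:
#   config -s tserver.walog.max.size=128M
```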

- Josh


Re: Tablet Server threw HDFS replication error (Accumulo 1.7.2)

Takashi Sasaki
Hi Josh,

The problem is solved, and the cause was quite simple.

We were using a small HDFS on an AWS EMR cluster (total disk size
about 120 GB with replication 2, so the actual allocatable maximum is
about 40 GB).

We will increase the disk size accordingly.
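As a back-of-the-envelope check, replication divides the usable space:
every block is stored once per replica. A quick sketch using the "DFS
Remaining" figure from the NameNode web UI earlier in the thread (the
replication factor is the one mentioned in this message; the per-node
average is illustrative, since real nodes are never perfectly balanced):

```python
# Effective writable capacity of an HDFS cluster: each block is stored
# `replication` times, so the writable payload is remaining space
# divided by the replication factor.
dfs_remaining_gb = 61.92   # "DFS Remaining" from the NameNode web UI
datanodes = 5
replication = 2

writable_gb = dfs_remaining_gb / replication
per_node_free_gb = dfs_remaining_gb / datanodes

# A ~1 GB pre-allocated WAL can still be rejected by a single DataNode
# whose local free space (minus dfs.datanode.du.reserved) is too small,
# even when the cluster-wide totals look healthy.
print(f"writable: {writable_gb:.2f} GB, avg per node: {per_node_free_gb:.2f} GB")
```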

Thanks for the tips,
Takashi

2017-05-18 0:42 GMT+09:00 Josh Elser <[hidden email]>:
