[jira] [Commented] (ACCUMULO-623) Data lost with hdfs write ahead log


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/ACCUMULO-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415434#comment-13415434 ]

Keith Turner commented on ACCUMULO-623:
---------------------------------------

John V and I were discussing this. One possibility is that a tserver will only start if dfs.durable.sync OR dfs.support.append is set to true.  This is somewhat awkward because at some point the property dfs.support.append (which defaults to false) will go away and the property dfs.durable.sync (which defaults to true) will appear.  However, I do not think there is a way to determine what a property defaults to in Hadoop, because the default is just hardcoded into the code that uses the property.  So a user would need to explicitly set dfs.durable.sync to true in their config even though this is the default.  See HADOOP-8365.
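
A minimal sketch of the startup check described above, to make the "explicitly set" requirement concrete. This is not the actual tserver code: the class name SyncConfigCheck and the method walSyncExplicitlyEnabled are hypothetical, and a plain Map stands in for the Hadoop configuration so the example has no dependencies. Only the two property names come from the comment.

```java
import java.util.HashMap;
import java.util.Map;

public class SyncConfigCheck {
    // Property names as given in the comment above.
    static final String DURABLE_SYNC = "dfs.durable.sync";
    static final String SUPPORT_APPEND = "dfs.support.append";

    /**
     * Returns true only if at least one of the two properties is explicitly
     * set to "true". A missing property counts as false, because the real
     * default is hardcoded in the code that reads the property and cannot
     * be looked up from the configuration.
     */
    public static boolean walSyncExplicitlyEnabled(Map<String, String> conf) {
        return isTrue(conf.get(DURABLE_SYNC)) || isTrue(conf.get(SUPPORT_APPEND));
    }

    // Treats null (unset) as false; "true" is matched case-insensitively.
    private static boolean isTrue(String value) {
        return value != null && Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the loaded HDFS configuration.
        Map<String, String> conf = new HashMap<>();

        // Nothing set: the check refuses, even if the effective default is true.
        System.out.println(walSyncExplicitlyEnabled(conf)); // prints "false"

        // The user sets the property explicitly: the check passes.
        conf.put(DURABLE_SYNC, "true");
        System.out.println(walSyncExplicitlyEnabled(conf)); // prints "true"
    }
}
```

The first call in main shows the awkward case: with neither property set, the check fails even on a Hadoop version where dfs.durable.sync would default to true.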
               

> Data lost with hdfs write ahead log
> -----------------------------------
>
>                 Key: ACCUMULO-623
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-623
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: MacOSX, Hadoop 1.0.3, zookeeper 3.3.3
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> I shut my machine down with Accumulo, Zookeeper, and HDFS running.  When I restarted it, Accumulo failed to recover its write ahead log because it was zero length.  I wondered if this was because I shut down HDFS, so I tried the following on my single-node Accumulo instance.
>  * start HDFS and zookeeper
>  * init & start Accumulo
>  * create a table and insert some data
>  * pkill -f java
>  * restart everything
>  * Accumulo fails to start because walog is zero length
> Saw exceptions like the following:
> {noformat}
> 06 18:58:44,581 [log.SortedLogRecovery] INFO : Looking at mutations from /accumulo/recovery/def72721-5c64-4755-87cc-2e8cfc3002b7 for !0;!0<<
> 06 18:58:44,590 [tabletserver.TabletServer] WARN : exception trying to assign tablet !0;!0<< /root_tablet
> java.lang.RuntimeException: java.io.IOException: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1458)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1295)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1134)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1121)
>         at org.apache.accumulo.server.tabletserver.TabletServer$AssignmentHandler.run(TabletServer.java:2477)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:680)
> Caused by: java.io.IOException: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.recover(TabletServerLogger.java:428)
>         at org.apache.accumulo.server.tabletserver.TabletServer.recover(TabletServer.java:3206)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1426)
>         ... 6 more
> Caused by: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.log.SortedLogRecovery.findLastStartToFinish(SortedLogRecovery.java:125)
>         at org.apache.accumulo.server.tabletserver.log.SortedLogRecovery.recover(SortedLogRecovery.java:89)
>         at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.recover(TabletServerLogger.java:426)
>         ... 8 more
> {noformat}
> When trying to run LogReader on the files, it prints nothing.  
> {noformat}
> $ ./bin/accumulo org.apache.accumulo.server.logger.LogReader /accumulo/recovery/def72721-5c64-4755-87cc-2e8cfc3002b7
> 06 19:04:37,147 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> $ ./bin/accumulo org.apache.accumulo.server.logger.LogReader /accumulo/wal/127.0.0.1+40200/def72721-5c64-4755-87cc-2e8cfc3002b7
> $
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira