Re-init Accumulo over existing installation


Re-init Accumulo over existing installation

Krishmin Rai
Hi All,
  We've recently encountered a strange situation on a small test cluster: after an awkward crash, our ZooKeeper data was erased and we no longer have the [accumulo] znode. The HDFS accumulo directory is intact, so all the RFiles etc. are still there, but it's not clear how best to bring Accumulo back up to its previous state. Obviously, just starting Accumulo as-is complains about the missing znode ("Waiting for accumulo to be initialized"), whereas re-initializing is not possible over existing HDFS directories ("It appears this location was previously initialized, exiting").

  A couple of questions about recovery strategies:

1) Is there any way to re-create the znode for a previous instance-id? My understanding is that ZK is mostly used to store ephemeral data (such as which tserver is currently responsible for which tablets) and things like users (which we could re-create), so perhaps this is plausible?

2) I imagine that I could init Accumulo with a new instance.dfs.dir, then import the RFiles from the old installation back in. I see Patrick just asked a related question, so, with the data integrity caveats, I would essentially be following the last of the steps in ACCUMULO-456.

3) This is a vague question, but have any of you had experience with the [accumulo] znode being entirely deleted? Aside from stopping/starting ZK (3.3.5) and Accumulo 1.4.0 (possibly with a force-quit), I'm not sure what we could have done to actually delete it.

This is just a test instance, and the data could easily be recreated, but I want to take this opportunity to learn a little more about Accumulo plumbing and maintenance.

Thanks,
Krishmin

 



Re: Re-init Accumulo over existing installation

John Vines
Responding inline

On Thu, Jul 5, 2012 at 11:16 AM, Krishmin Rai <[hidden email]> wrote:
> 1) Is there any way to re-create the znode for a previous instance-id? My understanding is that ZK is mostly used to store ephemeral data (such as which tserver is currently responsible for which tablets) and things like users (which we could re-create), so perhaps this is plausible?

Theoretically, this is possible, but there could be issues with the ACLs involved with some pieces of the user space, and ZooKeeper also stores table configuration info. Rather than trying to regenerate ZooKeeper by hand, I suggest you move the Accumulo HDFS directory to the side, create a new instance, recreate the users and tables, and then bulk import the old instance's table files into the new Accumulo instance.
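From the command line that would look roughly like the following. All paths and the table name/id are placeholders, and the importdirectory syntax is from my memory of the 1.4 shell, so double-check it before running anything:

```shell
hadoop fs -mv /accumulo /accumulo-old   # park the old instance's files
./bin/accumulo init                     # initialize a fresh instance
# start Accumulo, recreate users and tables, then for each table run
# a bulk import from inside the accumulo shell, e.g.:
#   table mytable
#   importdirectory /accumulo-old/tables/2/default_tablet /tmp/import-failures false
```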
 

> 2) I imagine that I could init Accumulo with a new instance.dfs.dir, then import the RFiles from the old installation back in. I see Patrick just asked a related question, so, with the data integrity caveats, I would essentially be following the last of the steps in ACCUMULO-456.

Answered above.
 

> 3) This is a vague question, but have any of you had experience with the [accumulo] znode being entirely deleted? Aside from stopping/starting ZK (3.3.5) and Accumulo 1.4.0 (possibly with a force-quit), I'm not sure what we could have done to actually delete it.

The biggest cause for this I've seen is people leaving their zookeeper data directory in /tmp. I would start there.
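For reference, the setting in question is dataDir in zoo.cfg; the sample config that ships with ZooKeeper points it at /tmp/zookeeper, and many systems clean /tmp on reboot. Something like this (the path is just an example) avoids the problem:

```
# conf/zoo.cfg -- keep ZooKeeper snapshots and transaction logs
# somewhere persistent, never under /tmp
dataDir=/var/lib/zookeeper
```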
 

This is just a test instance, and the data could easily be recreated, but I want to take this opportunity to learn a little more about Accumulo plumbing and maintenance.

Thanks,
Krishmin






Re: Re-init Accumulo over existing installation

Keith Turner
In reply to this post by Krishmin Rai
I have never seen this happen; I have found ZooKeeper to be very
reliable. I think Accumulo needs a utility to handle this case of
reinitializing just ZooKeeper. Would you like to open a ticket?

ZooKeeper does store some important persistent info, like mappings of
table names to table IDs, table config, user data, and FATE ops.



Re: Re-init Accumulo over existing installation

Krishmin Rai
Keith, I created ACCUMULO-671 about a ZK re-initialization process.

John: definitely not using /tmp as the ZK data dir, but if no one else has experienced this kind of data loss with ZK, I'm ready to chalk it up to some kind of one-time user error on our part. If it happens again, I'll dig in further.

Thanks for the quick responses!
Krishmin



Re: Re-init Accumulo over existing installation

Adam Fuchs
In reply to this post by Krishmin Rai
Let me start by saying that everything here is experimental, and this information carries no warranty of correctness.

You basically have two avenues of recovery when zookeeper data is lost:
1. Create new tables and bulk import your old RFiles.
2. Try to recreate the Zookeeper data.

The first option has been done before, and is not too hard. You basically just move the old HDFS directory, initialize a new instance, create all your tables, find the RFiles from the old tables, and bulk load them into the new tables. The risk here is that you will lose information that was only in the write-ahead logs, and the conditions described in ACCUMULO-456 may cause you trouble.

The second option has never been done to our knowledge. The hard part there is to create all of the tables that you used to have with the same table IDs that they used to have, and with the same configuration. If you knew the mapping of table ID to table name, you could probably write a script that did something like:
1. Move old HDFS directory.
2. Initialize new instance.
3. Bring new instance online (except for the garbage collector).
4. Create tables in the same order that you created them with the old instance (including creating and deleting tables that were deleted in the old instance).
5. Take the new instance offline.
6. Create references to the correct write-ahead log files for the root tablet of the old instance in zookeeper.
7. Delete the new HDFS directory.
8. Copy the old HDFS directory into the location of the new HDFS directory. (as long as this is a copy and you don't start the garbage collector you should be able to repeat these steps until you get them right)
9. Bring the system online and hope everything worked.
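The HDFS shuffling in steps 1, 7, and 8 would look roughly like this (paths are placeholders; the copy rather than move in the last step is what keeps the old directory pristine so you can retry):

```shell
hadoop fs -mv /accumulo /accumulo-old   # 1. move the old directory aside
./bin/accumulo init                     # 2. initialize a new instance
# ... steps 3-6: start it, replay table creations in order, take it
#     offline, and fix the root tablet's WAL references in zookeeper ...
hadoop fs -rmr /accumulo                # 7. delete the new HDFS directory
hadoop fs -cp /accumulo-old /accumulo   # 8. copy (do not move) the old one in
```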

Cheers,
Adam

