Fix "Table x has a hole" [SEC=UNOFFICIAL]

Dickson, Matt MR

UNOFFICIAL

Running CheckForMetadataProblems on Accumulo reports:
 
Table xxx has a hole 11111111 != 2222222
 
Is there a correct way to repair this?
 
Thanks in advance.

RE: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Dickson, Matt MR

UNOFFICIAL

This issue also appears to be blocking ingest.

So far I haven't found a way to alter the metadata to correct the issue.



Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Marc P.
Matt,
  You can replace or insert the key extents where the hole exists. The check simply looks at the previous end row (the ~pr column) and ensures it matches the end row of the preceding tablet. You can insert the keys for that extent. I would back up the table just in case. I thought there was a utility to fill in this gap, but I'm on my phone, so I will look later. I wouldn't normally advise the nuclear step of filling in the hole, but since ingest is backing up and you are struggling to maintain homeostasis in your ingest pipeline, I would fill the hole now and determine the cause later.
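To make the check concrete, here is a minimal Python sketch of the contiguity test described above. It is a simplified model, not the CheckForMetadataProblems source: each tablet is a hypothetical `(end_row, prev_end_row)` pair, `None` means "unbounded" (the first tablet's previous end row and the last tablet's end row), and rows are plain ASCII strings so Python's string ordering matches Accumulo's byte ordering. The "has a hole" message mirrors the one in Matt's report; the null-prev wording is an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical model of one tablet: (end_row, prev_end_row).
# None = unbounded: the first tablet has no previous end row and the
# last tablet has no end row.
Tablet = Tuple[Optional[str], Optional[str]]


def tablet_order(tablet: Tablet) -> Tuple[bool, str]:
    """Sort by end row, with the last tablet (end row None) at the end."""
    end_row = tablet[0]
    return (end_row is None, end_row or "")


def find_holes(table: str, tablets: List[Tablet]) -> List[str]:
    """Report every tablet whose prev end row does not match the end row
    of the tablet sorted immediately before it."""
    problems: List[str] = []
    ordered = sorted(tablets, key=tablet_order)
    if not ordered:
        return problems
    last_end_row = ordered[0][0]
    for end_row, prev_end_row in ordered[1:]:
        if prev_end_row is None:
            # Wording of this message is an assumption.
            problems.append(f"Table {table} has null prev end row in the middle of the table")
        elif prev_end_row != last_end_row:
            problems.append(f"Table {table} has a hole {prev_end_row} != {last_end_row}")
        last_end_row = end_row
    return problems


# Example data: in BROKEN, tablet 3333333 points back at 11111111
# instead of at its real predecessor 2222222.
HEALTHY = [("11111111", None), ("2222222", "11111111"),
           ("3333333", "2222222"), (None, "3333333")]
BROKEN = [("11111111", None), ("2222222", "11111111"),
          ("3333333", "11111111"), (None, "3333333")]

for line in find_holes("xxx", BROKEN):
    print(line)  # Table xxx has a hole 11111111 != 2222222
```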

Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Marc P.
Since I'm on my phone, I'm likely to be terse: the extent covered by the hole probably referenced files on HDFS that are recoverable. If you can locate those files, you can rebuild that hole, which I and others on this list have done. I will try to respond with more info when I'm at a computer, if needed, unless someone else fills in my holey phone responses.


Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Michael Wall
Matt,

Did you get past this issue?  Any thoughts on what happened? Did a metadata tablet get deleted?  Once you are well again, I'd like to try to figure out how you got into this state.

Mike


Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Keith Turner
Below are some commands that show how to recreate this problem and how
to fix it.  Each tablet's entry in the metadata table has a pointer to
the previous tablet.  Adding or removing splits changes these pointers.

  root@uno> createtable test

Get the table's ID below; we will need it later.

  root@uno test> tables -l
  accumulo.metadata    =>        !0
  accumulo.replication =>      +rep
  accumulo.root        =>        +r
  test                 =>         3
  trace                =>         1
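If you script these steps, a small helper can pull the ID out of the `tables -l` listing and build the metadata scan range. This is a Python sketch that assumes the `name => id` layout shown above; the helper names are hypothetical. Metadata rows for a table are `<id>;<end row>` plus `<id><` for the last tablet, and since `;` (0x3B) sorts just before `<` (0x3C), the range `<id>;` to `<id><` covers every tablet of that table.

```python
from typing import Dict, Tuple

# Output copied from the `tables -l` example above.
TABLES_OUTPUT = """\
accumulo.metadata    =>        !0
accumulo.replication =>      +rep
accumulo.root        =>        +r
test                 =>         3
trace                =>         1
"""


def parse_table_ids(output: str) -> Dict[str, str]:
    """Map table name -> table ID from `name => id` lines."""
    ids: Dict[str, str] = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # skip blank or unexpected lines
        name, table_id = line.split("=>", 1)
        ids[name.strip()] = table_id.strip()
    return ids


def metadata_scan_range(table_id: str) -> Tuple[str, str]:
    """Begin/end rows for `scan -t accumulo.metadata -b ... -e ...`."""
    return f"{table_id};", f"{table_id}<"


ids = parse_table_ids(TABLES_OUTPUT)
print(ids["test"])                       # 3
print(metadata_scan_range(ids["test"]))  # ('3;', '3<')
```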

Add some splits and then scan the metadata table.  The pointers to the
previous tablet are in the ~tab:~pr column.  The scan below uses the
table id above.

  root@uno test> addsplits 11111111 3333333
  root@uno test> scan -t accumulo.metadata -c ~tab:~pr -b 3; -e 3<
  3;11111111 ~tab:~pr []    \x00
  3;3333333 ~tab:~pr []    \x0111111111
  3< ~tab:~pr []    \x013333333
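Reading that output: each row is the table ID and the tablet's end row separated by `;` (or the ID followed by `<` for the last tablet), and the `~pr` value is one flag byte plus, optionally, the previous end row. A `\x00` byte means there is no previous tablet; `\x01` is followed by the previous tablet's end row. A minimal Python sketch of decoding both (function names are hypothetical):

```python
from typing import Optional, Tuple


def parse_metadata_row(row: str) -> Tuple[str, Optional[str]]:
    """Split a metadata row into (table_id, end_row); end_row is None
    for the last tablet, whose row is just the ID followed by '<'."""
    if ";" in row:
        table_id, _, end_row = row.partition(";")
        return table_id, end_row
    if row.endswith("<"):
        return row[:-1], None
    raise ValueError(f"not a tablet row: {row!r}")


def decode_prev_row(value: bytes) -> Optional[bytes]:
    """Decode a ~tab:~pr value: a single 0x00 byte means no previous
    tablet; 0x01 followed by a row is the previous tablet's end row."""
    if value == b"\x00":
        return None
    if value[:1] == b"\x01":
        return value[1:]
    raise ValueError(f"unexpected ~pr encoding: {value!r}")


# The three entries from the first scan above.
for row, value in [("3;11111111", b"\x00"),
                   ("3;3333333", b"\x0111111111"),
                   ("3<", b"\x013333333")]:
    print(parse_metadata_row(row), decode_prev_row(value))
```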

Add another split and rescan the metadata table.

  root@uno test> addsplits 2222222
  root@uno test> scan -t accumulo.metadata -c ~tab:~pr -b 3; -e 3<
  3;11111111 ~tab:~pr []    \x00
  3;2222222 ~tab:~pr []    \x0111111111
  3;3333333 ~tab:~pr []    \x012222222
  3< ~tab:~pr []    \x013333333
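Comparing the two scans shows what a split does to the pointers: the tablet that contained the new split point is divided in two, the new tablet inherits the old previous end row, and the old tablet's `~pr` is repointed at the new split. Below is a simplified Python model of that end result only. A real split is carried out by the tablet server in several metadata updates, and the `end_row -> prev_end_row` dict is an assumption made for illustration.

```python
from typing import Dict, Optional

Row = Optional[str]  # None = unbounded (first prev row / last end row)


def add_split(tablets: Dict[Row, Row], split: str) -> None:
    """Apply one split to a map of end_row -> prev_end_row.

    A tablet covers rows in (prev_end_row, end_row], so the tablet that
    contains `split` has the smallest end row >= split, or is the last
    tablet (end row None) if no such end row exists."""
    end_row = min((e for e in tablets if e is not None and split <= e),
                  default=None)
    if end_row == split:
        return  # already a split point, nothing to do
    prev_row = tablets[end_row]
    tablets[split] = prev_row  # new tablet: (prev_row, split]
    tablets[end_row] = split   # old tablet now starts after split


# A new table is a single tablet covering everything.
table: Dict[Row, Row] = {None: None}
for split in ["11111111", "3333333"]:  # addsplits 11111111 3333333
    add_split(table, split)
add_split(table, "2222222")            # addsplits 2222222

for end_row in [*sorted(e for e in table if e is not None), None]:
    print(end_row, "->", table[end_row])
```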

Grant permission to write to the metadata table and then recreate the
problem you have.

  root@uno test> grant Table.WRITE -u root -t accumulo.metadata
  root@uno test> table accumulo.metadata
  root@uno accumulo.metadata> insert 3;3333333 ~tab ~pr \x0111111111
  root@uno accumulo.metadata> scan -t accumulo.metadata -c ~tab:~pr -b 3; -e 3<
  3;11111111 ~tab:~pr []    \x00
  3;2222222 ~tab:~pr []    \x0111111111
  3;3333333 ~tab:~pr []    \x0111111111
  3< ~tab:~pr []    \x013333333
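Why a plain `insert` can both break and repair the pointer: writing the same row, family and qualifier again adds a newer version of that cell, and the scans above only ever show one value per cell, the newest. A toy Python model of that "latest timestamp wins" view, assuming the table keeps a single version per cell; the timestamps are made up for the example.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str, str]      # (row, column family, column qualifier)
Write = Tuple[Cell, int, bytes]  # (cell, timestamp, value)


def latest_view(writes: List[Write]) -> Dict[Cell, bytes]:
    """Return the newest value per cell, as a scan shows it when only
    one version is kept. Equal timestamps go to the later write here,
    which is a simplification."""
    newest: Dict[Cell, Tuple[int, bytes]] = {}
    for cell, ts, value in writes:
        if cell not in newest or ts >= newest[cell][0]:
            newest[cell] = (ts, value)
    return {cell: value for cell, (_, value) in newest.items()}


PR = ("3;3333333", "~tab", "~pr")
split_write = (PR, 100, b"\x012222222")  # written by addsplits 2222222
bad_insert = (PR, 200, b"\x0111111111")  # the insert that creates the hole
fix_insert = (PR, 300, b"\x012222222")   # the insert that repairs it

print(latest_view([split_write, bad_insert])[PR])              # b'\x0111111111'
print(latest_view([split_write, bad_insert, fix_insert])[PR])  # b'\x012222222'
```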

If you ran CheckForMetadataProblems here, you should see the error
message you saw.  Below, the pointer is fixed and write permission is
revoked (to prevent accidental writes in the future).

  root@uno accumulo.metadata> insert 3;3333333 ~tab ~pr \x012222222
  root@uno accumulo.metadata> revoke Table.WRITE -u root -t accumulo.metadata
  root@uno accumulo.metadata>

After running the command above to fix the pointer,
CheckForMetadataProblems should be happy.
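The same fix can be generalized: if the set of tablet end rows is right and only some `~pr` pointers are wrong, the correct pointer for each tablet is the end row of the tablet sorted before it. The hypothetical Python helper below compares recorded pointers against that expectation and prints shell `insert` commands in the form used above. It only prints suggestions, it does not write to the metadata table, and it cannot bring back files that were referenced by a deleted tablet entry.

```python
from typing import Dict, List, Optional

Row = Optional[str]  # None = unbounded (first prev row / last end row)


def shell_pr_value(prev_row: Row) -> str:
    """Format a ~pr value the way the shell insert command takes it."""
    return "\\x00" if prev_row is None else "\\x01" + prev_row


def suggest_pr_fixes(table_id: str, tablets: Dict[Row, Row]) -> List[str]:
    """tablets maps end_row -> recorded prev_end_row (None key = last
    tablet). Returns an insert command for every pointer that does not
    match the end row of the preceding tablet."""
    ordered: List[Row] = list(sorted(e for e in tablets if e is not None))
    if None in tablets:
        ordered.append(None)
    fixes: List[str] = []
    expected_prev: Row = None
    for end_row in ordered:
        if tablets[end_row] != expected_prev:
            row = f"{table_id}<" if end_row is None else f"{table_id};{end_row}"
            fixes.append(f"insert {row} ~tab ~pr {shell_pr_value(expected_prev)}")
        expected_prev = end_row
    return fixes


# The broken state from the scan above.
BROKEN = {"11111111": None, "2222222": "11111111",
          "3333333": "11111111", None: "3333333"}
for command in suggest_pr_fixes("3", BROKEN):
    print(command)  # insert 3;3333333 ~tab ~pr \x012222222
```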

It would be nice to track down the cause of this.  Splitting a
tablet involves three metadata operations.  For fault tolerance, the
columns ~tab:oldprevrow and ~tab:splitRatio are temporarily written.
If a tablet server dies in the middle of splitting a tablet, then
Accumulo will see these temporary columns and attempt to continue the
split.  So I am curious: do you see these columns?
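One way to look for leftovers of an interrupted split is to scan the metadata table for those two columns, for example with `scan -t accumulo.metadata -c ~tab:oldprevrow,~tab:splitRatio`, and see which rows come back. If you export the scan results instead, a minimal Python filter can do the same; the `(row, family, qualifier)` entry format and the sample data are assumptions, not an Accumulo API.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[str, str, str]  # (row, column family, column qualifier)

# Temporary columns written during a split, as named above.
SPLIT_MARKERS = {("~tab", "oldprevrow"), ("~tab", "splitRatio")}


def rows_with_split_markers(entries: Iterable[Entry]) -> List[str]:
    """Return the sorted, de-duplicated rows that still carry a
    temporary split column."""
    return sorted({row for row, family, qualifier in entries
                   if (family, qualifier) in SPLIT_MARKERS})


# Hypothetical scan results: one tablet still has split markers.
SAMPLE = [
    ("3;11111111", "~tab", "~pr"),
    ("3;2222222", "~tab", "~pr"),
    ("3;2222222", "~tab", "oldprevrow"),
    ("3;2222222", "~tab", "splitRatio"),
    ("3<", "~tab", "~pr"),
]
print(rows_with_split_markers(SAMPLE))  # ['3;2222222']
```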


RE: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Dickson, Matt MR
UNOFFICIAL

Thanks for that, Keith,

That's got it working again.  As for the cause, I had an error in the logs stating a tablet was hosted and assigned.  I've always removed the referenced tablet in the metadata table to fix this and had no issues in the past.  It looks like I fat-fingered the deletion and removed the wrong entry, so it's not an issue with Accumulo.

Thanks.


Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Michael Wall
Matt,

This sentence concerns me: "I've always removed the referenced tablet in the metadata table to fix this and had no issues in the past."  I rarely edit the metadata table and am very, very cautious when I do.  This should not be part of normal operating procedures.  Can you provide more context?

Mike


Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Marc P.
Mike,
  That's a good point. My view is that we lack the utilities to help, since the five largish instances I've seen recently have required their maintainers to edit the metadata table manually. Perhaps CheckForMetadataProblems could prompt the user with suggested fixes for certain issues? I'd love to have more context too, but I'd be more eager to learn of reasonably sized instances that have not had to do this type of triage manually.


Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]

Keith Turner
On Wed, Mar 1, 2017 at 12:23 AM, Dickson, Matt MR
<[hidden email]> wrote:
> UNOFFICIAL
>
> Thanks for that Keith,
>
> That's got it working again.  As for the cause, I had an error in the logs stating a tablet was hosted and assigned.  I've always removed the referenced tablet in the metadata table to fix this and had no issues in the past.  It looks like I fat fingered the deletion which removed the wrong entry so not an issue with Accumulo.

Hosted and assigned, that's not good. What version of Accumulo are you
running?  Do you have any ideas on what may cause this? How often do
you see this?  Is there any event that usually precedes this?
