
Check split points of a given table


Check split points of a given table

Jeff N
Is there a command or some other way to check the split points of a given table? I'm trying to determine how full a tablet is relative to its capacity, which, as I understand it, is the configured or default split threshold (when splitting is based on a set size). The data distribution over my tablets is drastically skewed, with a few tables containing hundreds of GBs while others store only hundreds of MBs.

Re: Check split points of a given table

Eric Newton
There's an API call to get the split points, but not to get the sizes of
each tablet.

You can scan the !METADATA table for file size information, though,
strictly speaking, this isn't part of the public API.

-Eric


On Fri, Sep 20, 2013 at 12:05 PM, Mastergeek <[hidden email]> wrote:

> Is there a command or way to check the  split points of a given table? I'm
> attempting to determine how full a tablet is with respect to its capacity,
> which is, as I understand it, the set/default split point (when based
> splitting on at a set size). The data distribution over my tablets is
> drastically skewed with a few tables containing hundreds of GBs while
> others
> only store hundreds of MBs.

Re: Check split points of a given table

David Medinets
In reply to this post by Jeff N
I run a map-reduce job that counts the entries seen by each mapper (and
therefore in each split). In the mapper's cleanup method I read the range
from the RangeInputSplit, then write the start row, end row, and count to
some persistent storage.
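
The bookkeeping described above can be illustrated without a cluster. The following is only a minimal sketch in plain Python, not the map-reduce job itself: it assumes you already have the table's split points and a list of row ids in memory, and the function name count_per_tablet is hypothetical. Each tablet covers the range (previous split, split], so a row equal to a split point is counted in the tablet that ends at that split.

```python
from bisect import bisect_left
from collections import Counter


def count_per_tablet(split_points, rows):
    """Return (start_row, end_row, count) for every tablet range.

    start_row is exclusive and end_row is inclusive; None means unbounded.
    """
    splits = sorted(split_points)
    counts = Counter()
    for row in rows:
        # bisect_left finds the first split >= row, i.e. the tablet
        # whose end row covers this row.
        counts[bisect_left(splits, row)] += 1

    report = []
    for i in range(len(splits) + 1):
        start = splits[i - 1] if i > 0 else None
        end = splits[i] if i < len(splits) else None
        report.append((start, end, counts[i]))
    return report


for start, end, count in count_per_tablet(["g", "p"], ["a", "g", "h", "z", "zz"]):
    print(start, end, count)
```

In a real job each mapper would only see its own range, so the per-range count and the range bounds would come from the mapper itself rather than from a lookup like this.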



Re: Check split points of a given table

Jeff N
In reply to this post by Eric Newton
So I basically blanket scanned the !METADATA table, but I'm having trouble interpreting the information. I can't seem to find a clear definition of what is in that table so I'm having issues reading the data. A link, if you have one, or any kind of elaboration would be greatly appreciated.

Thanks,
Jeff

Re: Check split points of a given table

Eric Newton
The !METADATA table is neither well documented nor part of the public API,
and it will change in future releases.

The row id consists of the table id, a semicolon, and the end-row.  When
there is no end-row, the row id ends with "<".

7;endrow
7;lastrow
7;zzzzzz
7<
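
As a rough illustration of the row id layout just described, here is a minimal Python sketch that splits a row id into its table id and end-row. The function name parse_metadata_row is hypothetical, and the sketch assumes the row id has already been read as a string.

```python
def parse_metadata_row(row_id):
    """Split a !METADATA row id into (table_id, end_row).

    end_row is None for the last tablet, whose row id ends with "<".
    """
    if ";" in row_id:
        # Split on the first semicolon only; the end-row itself may
        # contain further semicolons.
        table_id, end_row = row_id.split(";", 1)
        return table_id, end_row
    if row_id.endswith("<"):
        return row_id[:-1], None
    raise ValueError("not a tablet row id: %r" % row_id)


for row in ["7;endrow", "7;lastrow", "7;zzzzzz", "7<"]:
    print(parse_metadata_row(row))
```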

Every tablet will have a "prev row" entry, which points to the previous
tablet's end row.  This entry contains a value for the prev row, which
starts with \x01 if the entry has a previous row, or \x00 if this is the
first tablet in the table.

7;endrow  ~tab:~pr \x00
7;lastrow ~tab:~pr \x01endrow
7;zzzzzz  ~tab:~pr \x01lastrow
7<        ~tab:~pr \x01zzzzzz

BTW, the tilde (~) is used to make sure that this entry sorts last among
the tablet's entries.  The !METADATA table should always have chains of
end-row/prev-row entries, except during splits and merges.
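
The prev-row encoding and the chain property can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the values are handled as Python bytes, the tablets are supplied in !METADATA sort order, and both function names are hypothetical.

```python
def decode_prev_row(value):
    """Return the previous tablet's end row, or None for the first tablet."""
    if value[:1] == b"\x00":
        return None
    if value[:1] == b"\x01":
        return value[1:]
    raise ValueError("unexpected prev-row value: %r" % value)


def chain_is_consistent(tablets):
    """Check that each tablet's prev row equals the preceding end row.

    tablets is a list of (end_row, prev_row_value) in sort order;
    end_row is None for the last tablet (the row id ending in "<").
    """
    expected_prev = None  # the first tablet has no previous row
    for end_row, value in tablets:
        if decode_prev_row(value) != expected_prev:
            return False
        expected_prev = end_row
    return True


tablets = [
    (b"endrow", b"\x00"),
    (b"lastrow", b"\x01endrow"),
    (b"zzzzzz", b"\x01lastrow"),
    (None, b"\x01zzzzzz"),
]
print(chain_is_consistent(tablets))
```

A False result is not necessarily an error, since the chain can be broken briefly while a split or merge is in progress.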

Tablets contain file references, which contain the file size, and estimated
key count.  Due to splits and bulk imports, the number of keys that apply
to a given tablet for a file reference is not precise.  The entry looks
like this:

7;endrow file:/t-000000/F000000j.rf 9999,123

This tablet points to
hdfs://namenode/accumulo/tables/7/t-000000/F000000j.rf.  The file is 9999
bytes long (compressed) and contains 123 key/value entries.
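
Reading the value of a file entry is a matter of splitting on the comma. A minimal sketch, assuming the value is available as a string; the name parse_file_value is hypothetical, and ignoring any fields after the first two is a defensive assumption rather than something stated in this thread.

```python
def parse_file_value(value):
    """Parse a file entry value such as "9999,123".

    Returns (size_in_bytes, estimated_entries). Any additional
    comma-separated fields are ignored (an assumption, see above).
    """
    fields = value.split(",")
    return int(fields[0]), int(fields[1])


size, entries = parse_file_value("9999,123")
print(size, entries)
```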

Since 1.4, file names (F000000.rf) should be universally unique.  The file
naming scheme is:

F- Result of a Flush, a minor compaction
C- Result of a major Compaction, but not over all files
A- Major compaction of All files, in which delete entries were removed.
M- Result of a Merging minor compaction.  A flush that was combined with
the smallest file because there were already too many files in the tablet.
B- A file that was bulk imported
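
The prefix letters above map naturally onto a small lookup table. This is only an illustrative sketch; the names FILE_KINDS and file_kind are hypothetical, and any prefix not in the list above is reported as "unknown" rather than guessed at.

```python
import posixpath

# Prefix letter -> origin of the file, as listed in this message.
FILE_KINDS = {
    "F": "flush (minor compaction)",
    "C": "major compaction (not all files)",
    "A": "major compaction of all files",
    "M": "merging minor compaction",
    "B": "bulk import",
}


def file_kind(file_ref):
    """Classify a file reference by the first letter of its file name."""
    name = posixpath.basename(file_ref)
    return FILE_KINDS.get(name[:1], "unknown")


print(file_kind("/t-000000/F000000j.rf"))
print(file_kind("../5/t-1234567/C00000f.rf"))
```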

So, if you scan the !METADATA table, looking for prev-row entries, and file
entries, you can get a reasonable estimate of size of each tablet,
including those that are empty.
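
Putting the pieces together, a size estimate per tablet might be computed as follows. This is a minimal sketch, not an official tool: it assumes the scan results have already been collected as (row, column family, column qualifier, value) string tuples, and the function name tablet_sizes is hypothetical. The prev-row entry is used only to register every tablet, so tablets with no files show up with a size of 0.

```python
def tablet_sizes(entries):
    """Sum file sizes per tablet row from scanned !METADATA entries.

    entries: iterable of (row, family, qualifier, value) tuples.
    Returns a dict mapping row id to the total file size in bytes.
    """
    sizes = {}
    for row, family, qualifier, value in entries:
        if family == "~tab" and qualifier == "~pr":
            # Every tablet has a prev-row entry, so this also
            # records tablets that have no files at all.
            sizes.setdefault(row, 0)
        elif family == "file":
            # The first number in the value is the file size.
            sizes[row] = sizes.get(row, 0) + int(value.split(",")[0])
    return sizes


entries = [
    ("7;endrow", "file", "/t-000000/F000000j.rf", "9999,123"),
    ("7;endrow", "file", "/t-000000/C000001a.rf", "1234,56"),
    ("7;endrow", "~tab", "~pr", "\x00"),
    ("7;lastrow", "~tab", "~pr", "\x01endrow"),
    ("7<", "file", "/default_tablet/F000002b.rf", "500,10"),
    ("7<", "~tab", "~pr", "\x01lastrow"),
]
for row, size in tablet_sizes(entries).items():
    print(row, size)
```

The file names and sizes in the sample data are made up for illustration. Comparing each total against the table's split threshold then gives a rough idea of how full each tablet is.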

When tables are cloned, the filenames are relative:

7;endrow file:../5/t-1234567/C00000f.rf 1234,56

In 1.6, the filenames will be absolute:

7;endrow file:hdfs://namenode:port/accumulo/tables/7/t-1234567/C00000f.rf 1234,56
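
The three forms of file reference (relative to the tablet's own table, relative to another table after a clone, and absolute) can be resolved with one small helper. This is a sketch under stated assumptions: the function name resolve_file is hypothetical, the default tables directory /accumulo/tables is taken from the example path earlier in this message, and relative references are assumed to be relative to the table's directory. Only the path part is normalized, so the hdfs://namenode prefix is left for the caller to prepend.

```python
import posixpath


def resolve_file(table_id, file_ref, tables_dir="/accumulo/tables"):
    """Resolve a !METADATA file reference to a path.

    Absolute references (containing "://") are returned unchanged.
    Other references are resolved against tables_dir/table_id.
    """
    if "://" in file_ref:
        return file_ref  # already absolute (the 1.6 form)
    # Strip the leading slash so join() does not discard the base path.
    joined = posixpath.join(tables_dir, table_id, file_ref.lstrip("/"))
    # normpath collapses "7/../5" into "5" for cloned tables.
    return posixpath.normpath(joined)


print(resolve_file("7", "/t-000000/F000000j.rf"))
print(resolve_file("7", "../5/t-1234567/C00000f.rf"))
```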

tl;dr - use the file entries: the first number in the value is the file
size.

-Eric


On Mon, Sep 23, 2013 at 5:15 PM, Mastergeek <[hidden email]> wrote:

> So I basically blanket scanned the !METADATA table, but I'm having trouble
> interpreting the information. I can't seem to find a clear definition of
> what is in that table so I'm having issues reading the data. A link, if you
> have one, or any kind of elaboration would be greatly appreciated.
>
> Thanks,
> Jeff

Re: Check split points of a given table

Jeff N
Thank you for your detailed description of the contents of the !METADATA table! This has been extremely useful and I do greatly appreciate it.

Thanks,
Jeff