Delete a range of tablets

Krzysztof Martyn
Hi Accumulo,

I want to build a system that stores data for a fixed period of time,
say 2 days, after which the data should be deleted.
New data is ingested continuously and in large volumes.
The key is the time of arrival, and splits are generated every 1s so that
each second of data has its own tablet.
Is there any way to remove a range of tablets without having to run a
major compaction and a merge?

I have tested AgeOffFilter; however, it requires manually launching a major
compaction, which makes it almost impossible to scan the database.
I have also tested BatchDeleter and deleteRows from TableOperations;
however, they are even worse than AgeOffFilter.

Krzysztof



--
Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html

Re: Delete a range of tablets

Andrew Hulbert-2
We do this on large tables by setting our own iterator to age things off
based on our key structure but then use compact range to delete specific
days with a cron.

https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#compact-java.lang.String-org.apache.hadoop.io.Text-org.apache.hadoop.io.Text-boolean-boolean-

However, the tables in question have keys that can be computed from
a date range, which is why it works. The compaction then only rewrites
(deletes) that specific date range.

Since you have the time of arrival in the key you could likely do the
same thing.
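
As a rough sketch of that idea (not anyone's production code), the end row of
the expired range can be derived from the current time and the retention
period. The key layout used here, zero-padded epoch seconds, and the names
ExpiredRange, rowFor, and expiredEndRow are assumptions for illustration; the
thread only says that the key is the time of arrival.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;

public class ExpiredRange {
    // Assumption: row keys are epoch seconds, zero-padded to 10 digits
    // so that lexicographic order matches chronological order.
    static String rowFor(Instant t) {
        return String.format(Locale.ROOT, "%010d", t.getEpochSecond());
    }

    // Last row (inclusive) that is older than the retention period.
    static String expiredEndRow(Instant now, Duration retention) {
        return rowFor(now.minus(retention));
    }

    public static void main(String[] args) {
        String end = expiredEndRow(Instant.now(), Duration.ofDays(2));
        // Everything from the start of the table up to this row has expired.
        System.out.println("expired range ends at row " + end);
    }
}
```

The resulting row could then be passed as the end argument of the
TableOperations.compact call linked above, with a null start row (which the
Javadoc describes as meaning the first tablet in the table), so that the
age-off iterator only rewrites tablets in the expired range.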

Andrew

On 2/14/19 2:57 AM, Krzysztof Martyn wrote:

> [...]

Re: Delete a range of tablets

Christopher Tubbs-2
In reply to this post by Krzysztof Martyn
Have you tried the "deleteRows" API?
http://static.javadoc.io/org.apache.accumulo/accumulo-core/1.9.2/org/apache/accumulo/core/client/admin/TableOperations.html#deleteRows-java.lang.String-org.apache.hadoop.io.Text-org.apache.hadoop.io.Text-

If I remember correctly, it will efficiently delete all the data in
the range by splitting on the end points of the range specified, and
simply dropping the tablets in the middle, stitching the endpoints
back together with a merge.
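
To illustrate which tablets would count as "in the middle", here is a toy
model in plain Java. It is not Accumulo's implementation; it only works out,
from a sorted list of split points, which tablets lie entirely inside a
range (start exclusive, end inclusive). The class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DroppableTablets {
    // Toy model: each tablet is identified by its end row and covers
    // (previous end row, end row]. Returns the end rows of the tablets
    // that lie entirely inside the range (start, end].
    static List<String> whollyInside(List<String> sortedSplits, String start, String end) {
        List<String> inside = new ArrayList<>();
        String prev = null; // null stands for the beginning of the table
        for (String split : sortedSplits) {
            boolean beginsAtOrAfterStart = prev != null && prev.compareTo(start) >= 0;
            boolean endsAtOrBeforeEnd = split.compareTo(end) <= 0;
            if (beginsAtOrAfterStart && endsAtOrBeforeEnd) {
                inside.add(split);
            }
            prev = split;
        }
        return inside;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("10", "20", "30", "40");
        // Range endpoints that coincide with existing splits.
        System.out.println(whollyInside(splits, "10", "30"));
        // Range endpoints that fall inside tablets.
        System.out.println(whollyInside(splits, "15", "35"));
    }
}
```

In this model, a range whose endpoints already coincide with split points,
as with one split per second, leaves no partially covered tablets at the
edges.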

On Thu, Feb 14, 2019 at 7:59 AM Krzysztof Martyn <[hidden email]> wrote:

> [...]

Re: Delete a range of tablets

Krzysztof Martyn
In reply to this post by Andrew Hulbert-2
I am actually doing that.
I can calculate which tablets are to be removed, and I run a range major
compaction. Because the tablets are balanced across the tablet servers, this
results in a major compaction running on every tablet server.

The "deleteRows" API works terribly for me because while I am deleting rows
I can do neither ingests nor scans; they are all queued.

Is there any possibility to delete an RFile without a major compaction?


Andrew Hulbert-2 wrote

> [...]




