Improving Accumulo Replication Latency

Adam J. Shook
I'm currently scoping what it would take to improve the latency of Accumulo's replication feature.  What work, if any, is being done to improve replication latency?  If work is underway, would there be interest in collaborating on that effort?

If nothing is currently being planned, I'd be interested in design ideas and pointers from the community for improvements to the existing implementation.  We're looking to get replication down to less than five minutes and are willing to put in the effort to implement the improvements.

Thank you for your time!

Cheers,
--Adam

Re: Improving Accumulo Replication Latency

Josh Elser
Hi Adam,

I'm not presently working on anything (too many irons in other fires),
but I'd be happy to help work through a design doc for improvements.

Do you have a list of pain-points which are the primary causes of
latency? That would help in identifying the changes to make and how best
to implement them.

- Josh

Adam J. Shook wrote:

> I'm currently scoping what it would take to improve the latency of
> Accumulo's replication feature.  What work, if any, is being done to
> improve replication latency?  If work is underway, would there be
> interest in collaborating on that effort?
>
> If nothing is currently being planned, I'd be interested in design ideas
> and pointers from the community for improvements to the existing
> implementation.  We're looking to get replication down to less than five
> minutes and are willing to put in the effort to implement the improvements.
>
> Thank you for your time!
>
> Cheers,
> --Adam

Re: Improving Accumulo Replication Latency

Adam J. Shook
Thanks, Josh.  I think the main pain-point is that replication doesn't occur until the WAL is closed.  We've made some aggressive configuration changes to Accumulo, shortening the WAL rollover time and the minor compaction interval, to force replication to go faster.  That has brought it down to around 20 minutes on our production clusters, but we are at our limit: Accumulo is spending a lot more time on bookkeeping tasks, and it is starting to affect our query performance.

My initial thoughts are to increase the replication parallelism and start replicating the WAL before it is closed (I see a few JIRAs open already that mention these things), but I haven't done enough digging in the code base to see what is really available.
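
To make the trade-off concrete, here is a minimal Python sketch of a toy latency model. It is not Accumulo code, and none of the field names are Accumulo properties; they are hypothetical knobs chosen for illustration. Only the 20-minute rollover figure is taken from the numbers above; the 30-second ship delay and 60-second poll interval are assumed values. The sketch compares the worst-case wait when replication only starts once a WAL is closed against a design that tails the open WAL, and uses the number of WALs closed per day as a rough proxy for bookkeeping load.

```python
from dataclasses import dataclass


@dataclass
class ReplicationModel:
    """Toy model. Every field is a hypothetical knob, not an Accumulo property."""

    wal_rollover_s: float   # how long a WAL stays open before it is closed
    ship_delay_s: float     # assumed time to assign and ship work to the peer
    poll_interval_s: float  # assumed tailing interval if open WALs were replicated

    def worst_case_close_based(self) -> float:
        # A mutation written just after the WAL opens waits the full rollover
        # period before it becomes eligible, then pays the ship delay.
        return self.wal_rollover_s + self.ship_delay_s

    def worst_case_incremental(self) -> float:
        # A mutation written just after a poll waits one full poll interval,
        # then pays the ship delay.
        return self.poll_interval_s + self.ship_delay_s

    def wals_closed_per_day(self) -> float:
        # Bookkeeping proxy: every closed WAL implies more updates to the
        # metadata and replication tables.
        return 86_400 / self.wal_rollover_s


if __name__ == "__main__":
    model = ReplicationModel(wal_rollover_s=20 * 60, ship_delay_s=30, poll_interval_s=60)
    print("close-based worst case (s):", model.worst_case_close_based())
    print("incremental worst case (s):", model.worst_case_incremental())
    print("WALs closed per day:", model.wals_closed_per_day())
```

In this model, shrinking `wal_rollover_s` lowers the close-based latency but raises the number of WALs closed per day in direct proportion, whereas the incremental variant decouples latency from rollover entirely. Real behavior depends on many factors the model ignores, such as work-assignment scheduling and peer throughput.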

Are you free in the near future to meet up for a bit and talk replication?  I'll buy lunch!

Cheers,
--Adam

On Wed, Feb 15, 2017 at 2:52 PM, Josh Elser <[hidden email]> wrote:

> Hi Adam,
>
> I'm not presently working on anything (too many irons in other fires), but I'd be happy to help work through a design doc for improvements.
>
> Do you have a list of pain-points which are the primary causes of latency? That would help in identifying the changes to make and how best to implement them.
>
> - Josh



Re: Improving Accumulo Replication Latency

Josh Elser
Gotcha. That's definitely the biggest factor that I was aware of. I
wasn't sure if you knew more than I did by now ;). I can respect the
implications of too much bookkeeping going on. That might really start
pounding the metadata and replication tables.

Happy to do lunch, or to just have a video call if that's
more convenient.

Adam J. Shook wrote:

> Thanks, Josh.  I think the main pain-point is that replication doesn't
> occur until the WAL is closed.  We've made some aggressive configuration
> changes to Accumulo, shortening the WAL rollover time and the minor
> compaction interval, to force replication to go faster.  That has brought
> it down to around 20 minutes on our production clusters, but we are at
> our limit: Accumulo is spending a lot more time on bookkeeping tasks,
> and it is starting to affect our query performance.
>
> My initial thoughts are to increase the replication parallelism and
> start replicating the WAL before it is closed (I see a few JIRAs open
> already that mention these things), but I haven't done enough digging in
> the code base to see what is really available.
>
> Are you free in the near future to meet up for a bit and talk
> replication?  I'll buy lunch!
>
> Cheers,
> --Adam

Re: Improving Accumulo Replication Latency

Adam J. Shook
Thanks -- I'll reach out offline to get something set up.

On Wed, Feb 15, 2017 at 3:21 PM, Josh Elser <[hidden email]> wrote:

> Gotcha. That's definitely the biggest factor that I was aware of. I wasn't sure if you knew more than I did by now ;). I can respect the implications of too much bookkeeping going on. That might really start pounding the metadata and replication tables.
>
> Happy to do lunch, or to just have a video call if that's more convenient.
