Scanner / Batch Scanner Reuse

gtotsline
Hi -

Can Scanners / Batch Scanners be reused?  Is there any downside to reusing a
scanner (e.g. poorer performance)?  I assume creating a scanner takes time,
so asking this question to see if I can avoid needlessly recreating a
BatchScanner every time I do a query.

Thanks!



--
Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html

Re: Scanner / Batch Scanner Reuse

Christopher Tubbs-2
They can be reused, if they haven't been closed (they are
AutoCloseable and should be closed when you're done with them). I
wouldn't recommend it, though. While most of the work is done in the
scanner's iterator() method, the close() method on the scanner API
implicitly associates the scanner's resources with the entire scanner,
and not just its iterator. This can be hard to reason about if you're
trying to predict performance impact. It's probably better to think of
each instance of scanner having one set of resources for one
iterator/query. The behavior could also change between versions. (For
example, regular, non-batch scanners currently don't do anything in
their close() method, but that is expected to change in the future to
help clean up unused server-side resources when a scan is finished.)

In short: yes, they can be reused (if not closed), but I wouldn't do
it; you'll probably not see much benefit, and there may be unintended
consequences.
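
A minimal sketch of the "one scanner per query" pattern described above, in
plain Java so it runs without a cluster. FakeScanner and runQuery are
hypothetical stand-ins invented for illustration, not Accumulo classes; the
only property borrowed from the real scanners is that they are AutoCloseable
and hand out results through iterator(). With the real API the same
try-with-resources shape would presumably wrap the scanner you create for
each query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ScannerPerQuery {

    // Hypothetical stand-in for a scanner: AutoCloseable, results come from
    // iterator(), and it refuses to be used once it has been closed.
    static final class FakeScanner implements AutoCloseable, Iterable<String> {
        private final List<String> rows;
        private boolean closed = false;

        FakeScanner(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public Iterator<String> iterator() {
            if (closed) {
                throw new IllegalStateException("scanner already closed");
            }
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // One scanner per query: create it, drain its iterator, and let
    // try-with-resources close it before the method returns.
    static List<String> runQuery(List<String> table, String prefix) {
        List<String> hits = new ArrayList<>();
        try (FakeScanner scanner = new FakeScanner(table)) {
            for (String row : scanner) {
                if (row.startsWith(prefix)) {
                    hits.add(row);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("a1", "a2", "b1");
        System.out.println(runQuery(table, "a"));
        System.out.println(runQuery(table, "b"));
    }
}
```

Each call to runQuery gets a fresh scanner, so the resources tied to one
query are released when that query finishes and nothing depends on what
close() happens to do in a particular version.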


Re: Scanner / Batch Scanner Reuse

Keith Turner
In reply to this post by gtotsline
On Mon, Jan 14, 2019 at 2:05 PM gtotsline <[hidden email]> wrote:
>
> Hi -
>
> Can Scanners / Batch Scanners be reused?  Is there any downside to reusing a
> scanner (e.g. poorer performance)?  I assume creating a scanner takes time,
> so asking this question to see if I can avoid needlessly recreating a
> BatchScanner every time I do a query.

The main difference I can think of is that each batch scanner instance
creates a thread pool.  Other resources like tablet server connection
pools and tablet location caches are shared between batch scanner
instances.
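
To make the per-instance thread pool cost concrete, here is a rough sketch
in plain Java. FakeBatchScanner is a hypothetical stand-in that owns a fixed
thread pool, which is only an assumption about how a batch scanner might
hold its threads; the counting ThreadFactory exists purely to show how many
threads get started when a new instance is created for every query.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class PerInstancePoolSketch {

    // Counts every thread created by any pool built in this sketch.
    static final AtomicInteger THREADS_CREATED = new AtomicInteger();

    static final ThreadFactory COUNTING_FACTORY = runnable -> {
        THREADS_CREATED.incrementAndGet();
        return new Thread(runnable);
    };

    // Hypothetical stand-in for a batch scanner that owns its own thread pool.
    static final class FakeBatchScanner implements AutoCloseable {
        private final ExecutorService pool;
        private final int numThreads;

        FakeBatchScanner(int numThreads) {
            this.numThreads = numThreads;
            this.pool = Executors.newFixedThreadPool(numThreads, COUNTING_FACTORY);
        }

        // Submits one no-op task per thread so the pool starts all its workers.
        void query() {
            for (int i = 0; i < numThreads; i++) {
                pool.submit(() -> { });
            }
        }

        @Override
        public void close() {
            // The pool belongs to this instance, so it is shut down with it.
            pool.shutdown();
        }
    }

    // Runs the given number of queries, each with a fresh scanner, and
    // returns how many threads were created along the way.
    static int threadsForQueries(int queries, int threadsPerScanner) {
        THREADS_CREATED.set(0);
        for (int q = 0; q < queries; q++) {
            try (FakeBatchScanner scanner = new FakeBatchScanner(threadsPerScanner)) {
                scanner.query();
            }
        }
        return THREADS_CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println(threadsForQueries(3, 2));
    }
}
```

In this sketch three queries with two threads each start six threads in
total, since no pool outlives its scanner. Whether that matters in practice
depends on how often queries run and how many threads each one asks for.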


Re: Scanner / Batch Scanner Reuse

gtotsline
In reply to this post by Christopher Tubbs-2
Great reasons not to reuse, thank you very much!

Re: Scanner / Batch Scanner Reuse

gtotsline
In reply to this post by Keith Turner
Very good point, I did not consider the thread pool associated with each
instance.  Thanks for responding.

Re: Scanner / Batch Scanner Reuse

Andrew Hulbert-2
In reply to this post by Keith Turner
Keith,

I was thinking about that a year or so ago... do you think there'd be any
problems with modifying the batch scanners to share a thread pool per
JVM, or to accept a thread pool passed in by the caller? I think the
reason was that we were trying to limit the client scan threads globally
in the JVM. But it hasn't been a huge issue.

Andrew

On 1/16/19 10:27 AM, Keith Turner wrote:

> The main difference I can think of is that each batch scanner instance
> creates a thread pool.  Other resources like tablet server connection
> pools and tablet location caches are shared between batch scanner
> instances.

Re: Scanner / Batch Scanner Reuse

Keith Turner
On Fri, Jan 25, 2019 at 4:22 PM Andrew Hulbert <[hidden email]> wrote:
>
> Keith,
>
> I was thinking about that a year or so ago...do you think there'd be any
> problems with modifying the batch scanners to share a thread pool per
> JVM? Or be able to pass in a thread pool. I think the reason was we were
> trying to limit the client scans threads globally in the JVM. But it
> hasn't been a huge issue.

I think passing in an ExecutorService makes sense.  It would be good
to do this before 2.0.0 for the new AccumuloClient API.   The
following method could be removed ....

https://github.com/apache/accumulo/blob/7e2e6bd2d1afff7fde0223463f84b0c70f9cab7f/core/src/main/java/org/apache/accumulo/core/client/AccumuloClient.java#L80

... and replaced with something like

createBatchScanner(String tableName, Authorizations authorizations,
BatchScannerConfig config);

On BatchScannerConfig, an executor could be set.  We could also allow
executors to be set for the batch writer and conditional writers.
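
A rough sketch of how the idea above might look, in plain Java so it runs
on its own. Everything here is hypothetical: the BatchScannerConfig class
below, its setExecutor method, and FakeBatchScanner are invented for
illustration and are not part of any released Accumulo API. The point is
only that two scanner instances can borrow one caller-owned ExecutorService,
which caps the scan threads for the whole JVM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedExecutorSketch {

    // Hypothetical config object carrying a caller-supplied executor.
    static final class BatchScannerConfig {
        private ExecutorService executor;

        BatchScannerConfig setExecutor(ExecutorService executor) {
            this.executor = executor;
            return this;
        }

        ExecutorService getExecutor() {
            return executor;
        }
    }

    // Hypothetical stand-in for a batch scanner that borrows the executor
    // from its config instead of creating a pool of its own.
    static final class FakeBatchScanner implements AutoCloseable {
        private final ExecutorService executor;

        FakeBatchScanner(BatchScannerConfig config) {
            this.executor = config.getExecutor();
        }

        // "Scans" each range on the shared pool and returns results in order.
        List<String> scan(List<String> ranges) {
            List<Future<String>> pending = new ArrayList<>();
            for (String range : ranges) {
                pending.add(executor.submit(() -> "scanned:" + range));
            }
            List<String> results = new ArrayList<>();
            try {
                for (Future<String> future : pending) {
                    results.add(future.get());
                }
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("scan failed", e);
            }
            return results;
        }

        @Override
        public void close() {
            // The pool belongs to the caller, so closing the scanner leaves
            // it running for other scanners.
        }
    }

    static List<String> demo() {
        // One pool of four threads bounds all scan work in this JVM.
        ExecutorService shared = Executors.newFixedThreadPool(4);
        try {
            BatchScannerConfig config = new BatchScannerConfig().setExecutor(shared);
            List<String> all = new ArrayList<>();
            try (FakeBatchScanner first = new FakeBatchScanner(config);
                 FakeBatchScanner second = new FakeBatchScanner(config)) {
                all.addAll(first.scan(Arrays.asList("a", "b")));
                all.addAll(second.scan(Arrays.asList("c")));
            }
            return all;
        } finally {
            shared.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One consequence of this design is that the caller, not the scanner, becomes
responsible for shutting the executor down, so close() on the scanner can no
longer be relied on to release the threads.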
