Accumulo Table Scanning Taking Time!!!

7 messages

Accumulo Table Scanning Taking Time!!!

Suresh Prajapati
Hello Team

I am developing a client in Accumulo to store geospatial information, using
GeoMesa for indexing on top of it. However, I found that scanning *~1
million* records takes *2-3 sec*. I looked at GeoMesa's indexes and query
plan but was not able to find the cause of the problem. I am running Accumulo
as a single tablet server (including the master). I want to know:
what factors can affect an Accumulo scan operation, and how can I
optimise this time?

Thank You
Suresh Prajapati

Fwd: Accumulo Table Scanning Taking Time!!!

Suresh Prajapati
---------- Forwarded message ----------
From: Suresh Prajapati <[hidden email]>
Date: Thu, Apr 27, 2017 at 4:39 PM
Subject: Accumulo Table Scanning Taking Time!!!
To: [hidden email]


Re: Fwd: Accumulo Table Scanning Taking Time!!!

Dave Marion-2
You could add more tablet servers and add splits to the table.
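A minimal sketch of the "add splits" idea, under loudly stated assumptions: the class `SplitPoints` and its helper `evenSplits` are hypothetical, and they assume every row in the shared GeoMesa table begins with the 0x01 table-sharing prefix shown in the feature type later in this thread, and that the byte after it is spread roughly evenly over 0x00-0xFF. Real GeoMesa row layouts may not satisfy that, so inspect actual row keys before choosing split points. With more tablets, a batch scanner has more independent ranges it can read in parallel, even on a single tablet server.

```java
import java.util.ArrayList;
import java.util.List;

class SplitPoints {
    /**
     * Hypothetical helper: returns n - 1 split rows that divide the byte
     * following a fixed row prefix into n roughly equal ranges.
     * Assumes rows after the prefix are spread evenly over 0x00-0xFF.
     */
    static List<byte[]> evenSplits(byte[] prefix, int n) {
        if (n < 2 || n > 256) {
            throw new IllegalArgumentException("n must be between 2 and 256");
        }
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < n; i++) {
            byte[] row = new byte[prefix.length + 1];
            System.arraycopy(prefix, 0, row, 0, prefix.length);
            row[prefix.length] = (byte) (i * 256 / n); // lower bound of range i
            splits.add(row);
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[] sharingPrefix = {0x01}; // assumption: shared-table prefix byte
        for (byte[] split : evenSplits(sharingPrefix, 4)) {
            System.out.printf("%02x %02x%n", split[0] & 0xff, split[1] & 0xff);
        }
    }
}
```

Each resulting row would then be wrapped in `org.apache.hadoop.io.Text`, collected into a `SortedSet`, and passed to Accumulo's `TableOperations.addSplits(tableName, splits)`.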

> On April 27, 2017 at 7:17 AM Suresh Prajapati <[hidden email]> wrote:

Re: Accumulo Table Scanning Taking Time!!!

Marc P.
In reply to this post by Suresh Prajapati
Suresh,
   There are a lot of configuration points that can have an impact. For
example, a configuration option called table.scan.max.memory [0] dictates
how much data is returned in each "iteration." Increasing it causes more
work to be done in each RPC call to get data. Lowering it can give the
illusion of improved response time, since you get the first data sooner.
Tuning this might affect your use case. If your keys/values are large, you
might try increasing this value.
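As a rough illustration of the point above, the sketch below estimates how many batches a scan needs if each batch is cut off once its buffered key/value bytes reach table.scan.max.memory. The class name, the 100-byte average entry size and the two limits are hypothetical, and the model ignores entry-count limits on a batch, tablet boundaries and compression, so it only shows the trend: a larger limit means fewer round trips, each carrying more data.

```java
class ScanBatchEstimate {
    /**
     * Back-of-envelope estimate: a batch stops after the entry that brings
     * its buffered bytes to (or past) maxMemoryBytes. Assumes fixed-size
     * entries and no other limit on batch size.
     */
    static long estimateBatches(long entries, long bytesPerEntry, long maxMemoryBytes) {
        if (bytesPerEntry <= 0 || maxMemoryBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Entries per batch = ceil(maxMemory / bytesPerEntry).
        long entriesPerBatch = (maxMemoryBytes + bytesPerEntry - 1) / bytesPerEntry;
        // Batches = ceil(entries / entriesPerBatch).
        return (entries + entriesPerBatch - 1) / entriesPerBatch;
    }

    public static void main(String[] args) {
        long entries = 1_000_000L; // ~1 million records, from the thread
        long bytesPerEntry = 100L; // hypothetical average key+value size
        for (long maxMem : new long[] {512L * 1024, 4L * 1024 * 1024}) {
            System.out.println(maxMem + " bytes -> "
                    + estimateBatches(entries, bytesPerEntry, maxMem) + " batches");
        }
    }
}
```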

Further, scanning can be affected by the size of the data and the way it is
stored. Table block caching might improve things [1], but I'm curious about
how the data is stored. Do you have example keys? Are you returning all 1
million records from Accumulo through the scanner to perform some logic
client side, or is the logic server side in an iterator? Could you do more
work in an iterator? Iterating over 1 M keys likely won't take 2-3 seconds
when executed at the tablet server, depending on the size of the keys.
Knowing the key structure might give us more insight into how to better
configure your tablet server properties.
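To illustrate why pushing work into an iterator can help, here is a plain-JDK analogy. It is NOT Accumulo's SortedKeyValueIterator API, and the class name is hypothetical. The decorator drains its source and yields one element, the count; if equivalent logic ran inside the tablet server, a single number would cross the network instead of every matching key/value.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Plain-JDK analogy, NOT Accumulo's SortedKeyValueIterator API:
 * a decorator that drains its source and yields a single element, the count.
 */
class CountingDecorator<T> implements Iterator<Long> {
    private final Iterator<T> source;
    private boolean emitted = false;

    CountingDecorator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return !emitted; // exactly one result: the count
    }

    @Override
    public Long next() {
        if (emitted) {
            throw new NoSuchElementException();
        }
        long count = 0;
        while (source.hasNext()) {
            source.next(); // the entry is consumed here, never returned
            count++;
        }
        emitted = true;
        return count;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("ride1|a", "ride1|b", "ride1|c"); // stand-in entries
        Iterator<Long> counted = new CountingDecorator<>(rows.iterator());
        while (counted.hasNext()) {
            System.out.println("count = " + counted.next());
        }
    }
}
```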

   Finally, is the 2-3 seconds just the time to get the data, or does that
include the time to inspect keys?
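One way to answer that question is to time the two phases separately. This plain-JDK harness is hypothetical: `scan` stands for whatever produces the results (for example, a feature iterator) and `inspect` for the per-record client logic. It buffers every result in memory between the phases, which is only reasonable for a diagnostic run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class PhaseTimer {
    /** Returns {fetchNanos, inspectNanos} for one run. */
    static <T> long[] time(Supplier<Iterator<T>> scan, Consumer<T> inspect) {
        long start = System.nanoTime();
        List<T> fetched = new ArrayList<>();
        Iterator<T> it = scan.get();
        while (it.hasNext()) {
            fetched.add(it.next()); // phase 1: pull every result to the client
        }
        long fetchedAt = System.nanoTime();
        for (T item : fetched) {
            inspect.accept(item); // phase 2: client-side inspection only
        }
        long done = System.nanoTime();
        return new long[] {fetchedAt - start, done - fetchedAt};
    }

    public static void main(String[] args) {
        List<Integer> fakeResults = Arrays.asList(1, 2, 3, 4, 5); // stand-in data
        long[] t = time(fakeResults::iterator, x -> { });
        System.out.printf("fetch: %d ns, inspect: %d ns%n", t[0], t[1]);
    }
}
```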

[0] http://accumulo.apache.org/1.6/accumulo_user_manual#_table_scan_max_memory
[1] http://accumulo.apache.org/1.6/accumulo_user_manual#_block_cache

On Thu, Apr 27, 2017 at 7:09 AM, Suresh Prajapati <[hidden email]> wrote:

Re: Accumulo Table Scanning Taking Time!!!

Suresh Prajapati
Hello Marc

Thanks for pointing out the problem areas. I tried changing
*table.scan.max.memory* but didn't see any change in performance.
I am trying to fetch the count of records matching a given query by using
the AccumuloDataStore (ds) stats. Here is my sample code:

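import org.geotools.filter.text.cql2.CQL;

public int getRideCount(Long rideId) throws Exception {
    if (rideId != null) {
        // Exact count (third argument = true) of features whose attribute r
        // equals rideId, taken from the stats API of the data store ds.
        // getCount returns an Option; get() fails if no count is available.
        return ((Long) ds.stats().getCount(sft, CQL.toFilter("r=" + rideId), true).get()).intValue();
    }
    return 0;
}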

I also tried using an iterator, but this is even worse. Below is the sample
code:

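import org.geotools.data.Query;
import org.geotools.data.simple.SimpleFeatureIterator;
import org.geotools.filter.text.cql2.CQL;

public int getRideCount(Long rideId) throws Exception {
    int count = 0;
    if (rideId != null) {
        // Query expects the feature type name as its first argument.
        Query q = new Query(tableName, CQL.toFilter("r=" + rideId));
        SimpleFeatureIterator it = sfs.getFeatures(q).features();
        try {
            // Every matching feature is returned to the client just to be counted.
            while (it.hasNext()) {
                it.next();
                count++;
            }
        } finally {
            it.close(); // release the underlying scan even if iteration fails
        }
    }
    return count;
}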


To show the *key structure*, here is my feature type specification:


r:Long:cardinality=high:index=join,*g:Point:srid=4326,di:Integer:index=join,al:Float,s:Float,b:Float,an:Float,he:Float,ve:Float,t:Float,m:Boolean,i:Boolean,ts:Long;geomesa.table.sharing='true',geomesa.indices='attr:4:3,records:2:3,z2:3:3',geomesa.table.sharing.prefix='\\u0001'
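To make the specification above easier to read, the hypothetical helper below lists which attributes carry an index option and which one is the default geometry. It assumes the spec conventions visible here (comma-separated name:Type:option entries, a leading '*' marking the default geometry, user data after ';'), skips the user-data section, and is not GeoMesa's own parser; GeoMesa's `SimpleFeatureTypes.createType` is what actually parses such strings.

```java
import java.util.ArrayList;
import java.util.List;

class SpecSummary {
    /**
     * Hypothetical helper: reports attributes with an index=... option and the
     * default geometry (marked with a leading '*'). Ignores everything after
     * the first ';' (the user-data section) and assumes attribute options
     * contain no ',' or ':' of their own.
     */
    static List<String> describe(String spec) {
        String attributes = spec.split(";", 2)[0];
        List<String> out = new ArrayList<>();
        for (String attr : attributes.split(",")) {
            String[] parts = attr.trim().split(":");
            if (parts.length < 2) {
                continue; // not a name:Type entry
            }
            boolean defaultGeometry = parts[0].startsWith("*");
            String name = defaultGeometry ? parts[0].substring(1) : parts[0];
            if (defaultGeometry) {
                out.add(name + " (" + parts[1] + ") default geometry");
            }
            for (int i = 2; i < parts.length; i++) {
                if (parts[i].startsWith("index=")) {
                    out.add(name + " (" + parts[1] + ") " + parts[i]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Attribute section of the spec above (user data omitted).
        String spec = "r:Long:cardinality=high:index=join,*g:Point:srid=4326,"
                + "di:Integer:index=join,al:Float,s:Float,b:Float,an:Float,"
                + "he:Float,ve:Float,t:Float,m:Boolean,i:Boolean,ts:Long";
        describe(spec).forEach(System.out::println);
    }
}
```

For this spec it reports `r` and `di` as join-indexed and `g` as the default geometry.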


Please feel free to ask for any further clarifications.

Thank You

Suresh Prajapati

On Thu, Apr 27, 2017 at 7:05 PM, Marc P. <[hidden email]> wrote:

Re: Accumulo Table Scanning Taking Time!!!

Keith Turner
In reply to this post by Suresh Prajapati
Do you know if the tablet server and/or client is CPU bound?  When you
run the query, do you see either go to 100% CPU?

For the *~1 million* records, what is the data size? I ask because I
am curious what the data rate is. For example, is it 2 MB/sec or
500 KB/sec?
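The data-rate question is simple arithmetic: throughput is the amount read divided by the elapsed time. A minimal sketch; the class name is hypothetical, the record count and 2-3 s timing come from the original post, and the 100 MB figure is a made-up placeholder because the size of the scanned records had not been reported yet.

```java
class ScanRate {
    /** Throughput: amount read divided by elapsed seconds. */
    static double perSecond(double amount, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return amount / seconds;
    }

    public static void main(String[] args) {
        // ~1 million records in 2-3 s (from the original post).
        System.out.printf("records/s: %.0f to %.0f%n",
                perSecond(1_000_000, 3), perSecond(1_000_000, 2));
        // Hypothetical placeholder: suppose those records totalled 100 MB.
        System.out.printf("MB/s: %.1f to %.1f%n", perSecond(100, 3), perSecond(100, 2));
    }
}
```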

On Thu, Apr 27, 2017 at 7:09 AM, Suresh Prajapati
<[hidden email]> wrote:

Re: Accumulo Table Scanning Taking Time!!!

Suresh Prajapati
No, I don't see CPU utilisation reach 100% (it peaks at ~40%). Here are the Accumulo table data sizes:
Table Name - aj_join
aj_join_attr_v4 - 79.79 MB
aj_join_records_v2 - 58.25 MB

Scan (Entries/s) goes up to 200000.
Disk Usage shows ~10 Mbps for reads, while the scan rate on the Accumulo web interface is very low. Here is a screenshot of the same:


On Mon, May 1, 2017 at 8:22 PM, Keith Turner <[hidden email]> wrote: