benchmarking

11 messages

benchmarking

guy sharon
hi,

I've just started working with Accumulo and I think I'm experiencing slow reads/writes. I'm aware of the recommended configuration. Does anyone know of any standard benchmarks and benchmarking tools I can use to tell if the performance I'm getting is reasonable?



Re: benchmarking

Michael Wall-2
Hi Guy,

Here are a couple of links I found. Can you tell us more about your setup and what you are seeing?

https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
https://www.youtube.com/watch?v=Ae9THpmpFpM

Mike


On Sat, Aug 25, 2018 at 5:09 PM guy sharon <[hidden email]> wrote:

Re: benchmarking

Mike Walch-2
Hi Guy,

If you are looking to improve performance, you should also check out the 2.0 documentation below:


On Mon, Aug 27, 2018 at 9:43 AM Michael Wall <[hidden email]> wrote:

Re: benchmarking

guy sharon
In reply to this post by Michael Wall-2
hi Mike,

Thanks for the links.

My current setup is a 4 node cluster (tserver, master, gc, monitor) running on Alpine Docker containers on a laptop with an i7 processor (8 cores) with 16GB of RAM. As an example I'm running a count of all entries for a table with 6.3M entries with "accumulo shell -u root -p secret  -e "scan -t benchmark_table -np" | wc -l" and it takes 43 seconds. Not sure if this is reasonable or not. Seems a little slow to me. What do you think?

BR,
Guy.




On Mon, Aug 27, 2018 at 4:43 PM Michael Wall <[hidden email]> wrote:

Re: benchmarking

Sean Busbey-6
Hi Guy,

Apache Accumulo is designed to scale out horizontally for large-scale workloads that need to do random reads and writes. There's a non-trivial amount of overhead that comes with a system aimed at doing that on thousands of nodes.

If your use case works for a single laptop with such a small number of entries and exhaustive scans, then Accumulo is probably not the correct tool for the job.

For example, on my laptop (i7 2 cores, 8GiB memory) with that dataset size you can just rely on a file format like Apache Avro:

busbey$ time java -jar avro-tools-1.7.7.jar random --codec snappy --count 6300000 --schema '{ "type": "record", "name": "entry", "fields": [ { "name": "field0", "type": "string" } ] }' ~/Downloads/6.3m_entries.avro
Aug 28, 2018 12:31:13 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
test.seed=1535441473243

real 0m5.451s
user 0m5.922s
sys 0m0.656s
busbey$ ls -lah ~/Downloads/6.3m_entries.avro
-rwxrwxrwx  1 busbey  staff   186M Aug 28 00:31 /Users/busbey/Downloads/6.3m_entries.avro
busbey$ time java -jar avro-tools-1.7.7.jar tojson ~/Downloads/6.3m_entries.avro | wc -l
 6300000

real 0m4.239s
user 0m6.026s
sys 0m0.721s

I'd recommend that you start at >= 5 nodes if you want to look at rough per-node throughput capabilities.


On 2018/08/28 06:59:38, guy sharon <[hidden email]> wrote:


Re: benchmarking

Jeremy Kepner
FYI, single-node Accumulo instances are our most popular deployment.
We have hundreds of them. Accumulo is so fast that it can replace
what would normally require 20 MySQL servers.

Regards.  -Jeremy

On Tue, Aug 28, 2018 at 07:38:37AM +0000, Sean Busbey wrote:


Re: benchmarking

Michael Wall
In reply to this post by guy sharon
Hi Guy,

I can't say if that is reasonable without more info. How are you running datanodes, namenodes, and zookeepers? Also, what are the JVM options for each process? Can you share your Dockerfiles? What OS are you on? How much of the host's resources can Docker use? What is the data in your benchmark_table?

Like Sean mentioned, running multiple tservers will help to distribute the load.  You may or may not have headroom.  It is possible to run multiple tservers on the same host, even without docker.

Like Jeremy mentioned, I have seen better performance than you are getting on a single node cluster, but I usually use the standalone mini Accumulo for that, not a full cluster setup with HDFS.
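(For reference, a minimal sketch of that kind of standalone instance using the MiniAccumuloCluster test harness; the class name, directory, and password below are placeholders, not anything from this thread:)

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.accumulo.minicluster.MiniAccumuloCluster;

public class MiniDemo {
    public static void main(String[] args) throws Exception {
        // MiniAccumuloCluster runs ZooKeeper and a tserver in-process,
        // rooted in an empty directory.
        File dir = Files.createTempDirectory("mini-accumulo").toFile();
        MiniAccumuloCluster mac = new MiniAccumuloCluster(dir, "secret");
        mac.start();
        // Clients connect with the instance name and ZooKeeper address
        // the mini cluster reports:
        System.out.println(mac.getInstanceName() + " @ " + mac.getZooKeepers());
        mac.stop();
    }
}
```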

Mike

On Tue, Aug 28, 2018 at 2:59 AM guy sharon <[hidden email]> wrote:

Re: benchmarking

guy sharon
In reply to this post by Sean Busbey-6
hi Sean,

Thanks for the advice. I tried bringing up a 5 tserver cluster on AWS with Muchos (https://github.com/apache/fluo-muchos). My first attempt was using servers with 2 vCPU, 8GB RAM (m5d.large on AWS). The Hadoop datanodes were colocated with the tservers and the Accumulo master was on the same server as the Hadoop namenode. I populated a table with 6M entries using a modified version of org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter from Accumulo (the only thing I modified was the number of entries as it usually inserts 50k). I then did a count with "bin/accumulo shell -u root -p secret -e "scan -t hellotable -np" | wc -l". That took 15 seconds. I then upgraded to m5d.xlarge instances (4vCPU, 16GB RAM) and got the exact same result, so it seems upgrading the servers doesn't help.

Is this expected or am I doing something terribly wrong?

BR,
Guy.



On Tue, Aug 28, 2018 at 10:38 AM Sean Busbey <[hidden email]> wrote:

Re: benchmarking

Mike Miller
Measuring scan performance by piping output from the shell is not the best way; a lot of time is wasted printing output to the terminal. You are better off measuring using the BatchScanner API directly. An example can be found here: https://accumulo.apache.org/tour/batch-scanner/
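As a rough sketch of that approach with a 2.x-style client (the instance name, ZooKeeper address, and credentials below are placeholders; the 1.x Connector API is analogous):

```java
import java.util.Collections;
import java.util.Map;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class CountEntries {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- substitute your own.
        try (AccumuloClient client = Accumulo.newClient()
                .to("myinstance", "zookeeper1:2181")
                .as("root", "secret").build();
             // 10 query threads: a BatchScanner reads from many tablets in parallel.
             BatchScanner scanner = client.createBatchScanner("benchmark_table",
                     Authorizations.EMPTY, 10)) {
            // One unbounded range = the whole table.
            scanner.setRanges(Collections.singleton(new Range()));
            long count = 0;
            long start = System.nanoTime();
            for (Map.Entry<Key,Value> entry : scanner) {
                count++;  // count entries without printing them
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(count + " entries in " + elapsedMs + " ms");
        }
    }
}
```

This removes the shell's per-entry formatting and terminal I/O from the measurement, so the timing reflects the servers and the client RPC path.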


On Tue, Aug 28, 2018 at 2:50 PM guy sharon <[hidden email]> wrote:

Re: benchmarking

guy sharon
In reply to this post by Jeremy Kepner
hi Jeremy,

Do you have any information on how you configure them and what kind of hardware they run on?

Thanks,
Guy.



On Tue, Aug 28, 2018 at 3:44 PM Jeremy Kepner <[hidden email]> wrote:

Re: benchmarking

Jeremy Kepner
Our nodes are usually 20+ cores and 100+ GB RAM.

On Tue, Aug 28, 2018 at 10:18:24PM +0300, guy sharon wrote:
