Data missing when using RowIterator

Data missing when using RowIterator

Lu Q

I am using Accumulo 1.8.0, and I am developing an ORM framework that converts scan results into objects.

Previously I used RowIterator, because it is faster than iterating over the scan directly:

RowIterator rows = new RowIterator(scan);
rows.forEachRemaining(rowIterator -> {
    while (rowIterator.hasNext()) {
        Map.Entry<Key, Value> entry = rowIterator.next();
        ...
    }
});

It works fine until I query more than 1000 at once. I found that when the range size is bigger than 1000, some data is missing.
I thought maybe my conversion was wrong, so I changed it to a map structure, with the row id as the map key and the other fields as the map value, but the problem still exists.

Then I stopped using RowIterator, and it works fine:

for (Map.Entry<Key, Value> entry : scan) {
    ...
}

Is this a bug, or an error in my program?
Thanks.
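
For reference, a minimal self-contained sketch of the setup described above (the scanner turns out to be a BatchScanner, as clarified later in the thread); the instance, credentials, table name, and range below are placeholders, not details from the original post:

import java.util.Collections;
import java.util.Map;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.RowIterator;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class RowIteratorReport {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
                .getConnector("user", new PasswordToken("secret"));

        // One wide range scanned with a BatchScanner, as in the report.
        BatchScanner scan = conn.createBatchScanner("mytable", Authorizations.EMPTY, 4);
        scan.setRanges(Collections.singleton(new Range("row_0000", "row_2000")));

        // RowIterator groups consecutive entries that share the same row id.
        RowIterator rows = new RowIterator(scan);
        rows.forEachRemaining(row -> {
            while (row.hasNext()) {
                Map.Entry<Key, Value> entry = row.next();
                // ... convert the entry into the ORM object here ...
            }
        });

        scan.close();
    }
}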
Re: Data missing when using RowIterator

Christopher Tubbs-2
Does it matter if your scanner is a BatchScanner or a Scanner?
I wonder if this is due to the way BatchScanner could split rows up.

--
Christopher
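
One way to check this is to run the same range through both scanner types and compare how many rows the RowIterator reports from each. A rough sketch; the table name, range, and thread count are placeholders:

import java.util.Collections;
import java.util.Map;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.RowIterator;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class CompareScanners {

    // Count how many "rows" RowIterator reports for any scanner.
    static long countRows(Iterable<Map.Entry<Key, Value>> scanner) {
        long rows = 0;
        RowIterator it = new RowIterator(scanner);
        while (it.hasNext()) {
            it.next().forEachRemaining(e -> { }); // drain the row's entries
            rows++;
        }
        return rows;
    }

    public static void compare(Connector conn) throws Exception {
        Range range = new Range("row_0000", "row_2000"); // placeholder range

        Scanner scanner = conn.createScanner("mytable", Authorizations.EMPTY);
        scanner.setRange(range);

        BatchScanner batch = conn.createBatchScanner("mytable", Authorizations.EMPTY, 4);
        batch.setRanges(Collections.singleton(range));

        // If the BatchScanner count differs, the same row is being split across
        // batches and reported more than once by RowIterator.
        System.out.println("Scanner rows:      " + countRows(scanner));
        System.out.println("BatchScanner rows: " + countRows(batch));

        batch.close();
    }
}

A plain Scanner returns entries in sorted order, so its row count is the baseline; a higher BatchScanner count would mean the same row is being reported more than once.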
Re: Data missing when using RowIterator

Lu Q
I use BatchScanner


Re: Data missing when using RowIterator

Christopher Tubbs-2
I suspected that was the case. BatchScanner does not guarantee ordering of entries, which is needed for the behavior you're expecting with RowIterator. This means that the RowIterator could see the same row multiple times with different subsets of the row's columns. This is probably affecting your count.

--
Christopher
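
The thread does not discuss workarounds, but if the throughput of a BatchScanner is needed, one common alternative (not mentioned above) is to attach the WholeRowIterator, which packs each row into a single key/value pair on the server side so that out-of-order delivery can no longer split a row. A sketch with placeholder names; note that each row must then fit in memory:

import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.WholeRowIterator;
import org.apache.accumulo.core.security.Authorizations;

public class WholeRowExample {
    public static void scanWholeRows(Connector conn) throws Exception {
        BatchScanner batch = conn.createBatchScanner("mytable", Authorizations.EMPTY, 4);
        batch.setRanges(Collections.singleton(new Range("row_0000", "row_2000")));

        // Attach the WholeRowIterator (priority 25 is arbitrary) so each row arrives
        // as a single key/value pair, regardless of how the BatchScanner interleaves
        // results from different tablets.
        batch.addScanIterator(new IteratorSetting(25, WholeRowIterator.class));

        for (Map.Entry<Key, Value> rowEntry : batch) {
            // Decode the packed value back into the row's individual columns.
            SortedMap<Key, Value> row =
                    WholeRowIterator.decodeRow(rowEntry.getKey(), rowEntry.getValue());
            for (Map.Entry<Key, Value> entry : row.entrySet()) {
                // ... convert the entry into the ORM object here ...
            }
        }

        batch.close();
    }
}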
Re: Data missing when using RowIterator

Josh Elser-2
Just to be clear, Lu, for now stick to using a Scanner with the
RowIterator :)

It sounds like we might have to re-think how the RowIterator works with
the BatchScanner...

Re: Data missing when using RowIterator

Lu Q
Thanks


Re: Data missing when using RowIterator

Keith Turner
In reply to this post by Josh Elser-2

I opened: https://issues.apache.org/jira/browse/ACCUMULO-4586
