[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

classic Classic list List threaded Threaded
36 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
GitHub user joshelser opened a pull request:

    https://github.com/apache/accumulo/pull/247

    ACCUMULO-3208 Integration test for the OrIterator and cleanup

    The OrIterator was in very bad shape, with next to no documentation
    about what it actually does.
   
    Put this against 1.8 for fun. Can rebase wherever.
   
    FYI @milleruntime @keith-turner @phrocker

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joshelser/accumulo 3208-OrIterator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #247
   
----
commit d42e4207aeba60eb4d6c0fac0a48113d1a0ea1a7
Author: Josh Elser <[hidden email]>
Date:   2017-04-10T02:45:56Z

    ACCUMULO-3208 Integration test for the OrIterator and cleanup
   
    The OrIterator was in very bad shape, with next to no documentation
    about what it actually does.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110790242
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 steve:3
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    --- End diff --
   
    I was talking to Marc about this one: I'm doubting everything because I never realized that this iterator doesn't actually return data in sorted-order (violating the contract of SKVI). Best as I can tell, that is just how it works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/247
 
    👍  Thanks Josh, this is great!  I didn't want to have to follow through with my threatening PR :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/247
 
    > this is great! I didn't want to have to follow through with my threatening PR :)
   
    Hah!  Did you happen to look at this with an eye for clarity/simplification? One of my goals was to simplify the code paths (without breaking "optimizations") while also leaving better comments explaining what each path was doing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/247
 
    > Hah! Did you happen to look at this with an eye for clarity/simplification? One of my goals was to simplify the code paths (without breaking "optimizations") while also leaving better comments explaining what each path was doing.
   
    I am looking deeper.  I just cloned your repo and I'm looking at the code through my IDE... the true clarity test!  I also want to step through the IT you created.  I wanted to give my initial response to the PR above since it was very positive.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #247: ACCUMULO-3208 Integration test for the OrIterator and c...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/247
 
    > I also want to step through the IT you created.
   
    So the problem with this is that the TabletServer, and thus the OrIterator, are running in a separate JVM. It's not easy to just debug this IT like it is with any other unit test.
   
    Creating a MiniAccumuloCluster that can run the Accumulo services in the same JVM would be extremely beneficial (https://issues.apache.org/jira/browse/ACCUMULO-2739). I will say, debugging "minicluster" tests in HBase is monumentally easier than it is in Accumulo. I just don't have the time to revisit this one, sadly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110950549
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    --- End diff --
   
    > This code has been adapted/merged from Heap and Multi Iterators.
   
    It is not clear to me what this means.  Does this add anything for someone with a deep understanding of Heap and Multi iterators?  If not, maybe just remove this sentence.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110951131
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
    --- End diff --
   
    Should state this iterator requires setting of the COLUMNS_KEY option.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110951592
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
    --- End diff --
   
    Good point. The way is was written before required you to write your own iterator to use it (adding `TermSource`'s on your own). I'll expand the class-level javadoc to better describe these two usages.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110951958
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    --- End diff --
   
    Yeah, that's a good point since the big-picture was to make this "user-facing" which HeapIterator and MultiIterator are not (they're actually "system" iterators, lol). That comment likely predates the package separation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110965861
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
   
    Why ColumnQualifer and not ColumnFamily?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110971254
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
   
    See the class-level javadoc "This Iterator will return a sorted iteration of docIDs for a given set of terms."
   
    It's the same issue that I pinged @phrocker about. This is actually violating the SKVI contract (returning records out of order), but the implementation just happens to allow it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110991222
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 steve:3
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    --- End diff --
   
    I assume this was done as an optimization.  Since it fully implements SKVI (with that one exception), you could rename the class to something like "OrIteratorUnsorted" or "OptimizedOrIterator" making the exception more obvious.  Then someone could extend it or create another (albeit less optimized) iterator that returns sorted data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110992809
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
   
    You could add a comment to TermSource stating that the CQ is the docID for the term and is used for sorting.  Then note in the class comments that the inner class TermSource, is sortable by docID but this implementation is optimized and returns an unsorted list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110995255
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
   
    Since this queue is not actually sorted, this name is very misleading.  I would rename to something like "validSources".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110996691
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
   
    uhh, I don't follow what you mean. The priority queue is implicitly sorted by the `Comparator` of the class being stored (TermSource)...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110996811
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -80,59 +113,103 @@ public int compareTo(TermSource o) {
           // sorted after they have been determined to be valid.
           return this.iter.getTopKey().compareColumnQualifier(o.iter.getTopKey().getColumnQualifier());
    --- End diff --
   
    > this implementation is optimized and returns an unsorted list.
   
    It's not an optimization. It's the implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r110997152
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 steve:3
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    --- End diff --
   
    > I assume this was done as an optimization.
   
    No, it's the design of the iterator. It wouldn't function correctly if it returned in non-docId sorted order.
   
    >  Since it fully implements SKVI
   
    Yes, it implements all the methods of SKVI (I mean, it wouldn't compile otherwise). The issue is that it doesn't adhere to the runtime contract. While iterating over any SKVI, the Keys should always be increasing in sort-order. This iterator, by design, does not do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111001139
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
   
    Ah OK I missed that.  I don't see how this iterator is NOT sorting then...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #247: ACCUMULO-3208 Integration test for the OrIterato...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/247#discussion_r111001594
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java ---
    @@ -30,36 +33,66 @@
     import org.apache.accumulo.core.data.Key;
     import org.apache.accumulo.core.data.Range;
     import org.apache.accumulo.core.data.Value;
    +import org.apache.commons.lang.StringUtils;
     import org.apache.hadoop.io.Text;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
     
     /**
      * An iterator that handles "OR" query constructs on the server side. This code has been adapted/merged from Heap and Multi Iterators.
    + *
    + * The table structure should have the following form:
    + *
    + * <pre>
    + * row term:docId =&lt; value
    + * </pre>
    + *
    + * This Iterator will return a sorted iteration of docIDs for a given set of terms.
    + *
    + * For example, given the data and an OR'ing of "bob,steve":
    + *
    + * <pre>
    + * row1 bob:4
    + * row1 george:2
    + * row1 steve:3
    + * </pre>
    + *
    + * This Iterator will return:
    + *
    + * <pre>
    + * row1 steve:3
    + * row1 bob:4
    + * </pre>
      */
     
    -public class OrIterator implements SortedKeyValueIterator<Key,Value> {
    +public class OrIterator implements SortedKeyValueIterator<Key,Value>, OptionDescriber {
    +  private static final Logger LOG = LoggerFactory.getLogger(OrIterator.class);
    +  public static final String COLUMNS_KEY = "or.iterator.columns";
     
       private TermSource currentTerm;
    -  private ArrayList<TermSource> sources;
    +  private List<TermSource> sources;
       private PriorityQueue<TermSource> sorted = new PriorityQueue<>(5);
    --- End diff --
   
    Look at the class level javadoc again. It's sorting by row and then colqual, not row then colfam. e.g. "row1 bob:4" should come before "row1 steve:3" but it will not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
12