[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

joshelser
GitHub user milleruntime opened a pull request:

    https://github.com/apache/accumulo/pull/237

    ACCUMULO-3521: minor improvements to iterators

    Updated iterators mentioned in [ACCUMULO-3521](https://issues.apache.org/jira/browse/ACCUMULO-3521), added tests to cover untested methods and deprecated OrIterator.  Couldn't find IteratorUtil.getMaxPriority and .findIterator methods.  StatsCombiner is in examples.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/milleruntime/accumulo ACCUMULO-3521

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #237
   
----
commit 96c1e4ff9ebbefc00c57c7c5c76aeac737ca314d
Author: Mike Miller <[hidden email]>
Date:   2017-03-29T20:45:56Z

    ACCUMULO-3521: deprecate OrIterator

commit 808b10278e20f9222c2c398a003bee4c39676bc5
Author: Mike Miller <[hidden email]>
Date:   2017-03-29T20:49:18Z

    ACCUMULO-3521: improvements to FirstEntryInRowIterator and Test

commit 4366d58e60b641f4ace075d7812fc947e2e18910
Author: Mike Miller <[hidden email]>
Date:   2017-03-29T20:49:55Z

    ACCUMULO-3521: improvements to TypedValueCombiner and added test to CombinerTest

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    If there is no opposition to deprecating OrIterator, I will merge these changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    > If there is no opposition to deprecating OrIterator
   
    I'm iffy on this. It seems like you're just deprecating it without replacement because it doesn't have any tests. Is this accurate?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    Mostly... its also not clear whether it actually functions correctly or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    > Mostly... its also not clear whether it actually functions correctly or not
   
    Sorry to be pedantic -- again, is this because of the lack of unit tests? Or, did you try to use it and didn't have success?
   
    Backstory is that I have a lot experience with this iterator and, while it's been years, I know it did work. I can't think of any changes that would have caused it to not work -- this is why I'm asking.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    I haven't tried to use it, mainly is because its very confusing and not documented.  I do believe you that it worked at one point.  But no one has touched the code ever (other than formatting) and without a good test, its likely to rot if it hasn't already.  Since you have experience with it, would you be able to write a test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    >  Since you have experience with it, would you be able to write a test?
   
    I have the ability and knowledge, just can't guarantee the time :)
   
    IMO, leaving this iterator out of the `user` package was intended to note that it's not great for users to directly consume. I don't see any value added to slapping a "Deprecated" on it too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    > MO, leaving this iterator out of the user package was intended to note that it's not great for users to directly consume. I don't see any value added to slapping a "Deprecated" on it too.
   
    Isn't that something that Deprecation does?  Tells users to use at your own risk.  Just having the class up in the "iterator" package as opposed to the "user" package doesn't portray this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    > Isn't that something that Deprecation does? Tells users to use at your own risk
   
    Verbatim from JDK8
   
    "A program element annotated @Deprecated is one that programmers are discouraged from using, typically because it is dangerous, or because a better alternative exists. Compilers warn when a deprecated program element is used or overridden in non-deprecated code."
   
    Personally, I wouldn't call the OrIterator's possible code-rot/negligence, "dangerous". IIRC, the {{o.a.a.c.i.user}} package was introduced to bridge the gap between "Iterators we expect users to pull off the shelf" and "Iterators which YMMV using" (a nod towards SKVI not being in public API). I think the OrIterator's lack of love is adequately advertised by not being in {{o.a.a.c.i.user}}.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    > Isn't that something that Deprecation does?
   
    I'm not a fan of using deprecation tags to signal "YMMV". Not being in the public API does that all by itself. Currently, (for better or worse) no iterators are in the public API. Some are more risky than others, but I don't think we can use deprecation tags to meaningfully distinguish between continuous ranges of risk.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    On the PR for adding close to iterators we noticed the OrIterator seemed to be doing something incorrect.  However, without knowing what the iterators is supposed to actually do, we were not sure :)
   
    Under some case in seek it removes an iterator and then later adds it.  Does anyone know whats going on here?
   
    https://github.com/milleruntime/accumulo/blob/4366d58e60b641f4ace075d7812fc947e2e18910/core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java#L225


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    @keith-turner it remove sources that don't match the term or don't have any values. Thus removing it from the TermSources. This seems reasonable based on dealing with a source as a term amongst a list of iterable sources.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    @phrocker  do you think it should still call `sorted.add(TS)` when it removes TS from the iterator? The comments make it sound like maybe it shouldn't,  but I am not sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    > should still call sorted.add(TS) when it removes TS from the iterator?
   
    That does look like a bug to me. When it is heap-ifying each column (termsource), it's saying that for the given `Range` it was seek'ed to, there are no results. However, there is no reason that this instance of the `OrIterator` couldn't be seek'ed to a new `Range` without being torn-down.
   
    However, most of the time, I would guess that the above is not actually triggered (they would get a new instance of the iterator that would re-construct the term sources).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    @keith-turner I agree it seems to break the SKVI contract by trying to make some optimization that likely isn't of great benefit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    @keith-turner I'll admit I'm still trying to wrap my head around the why. I'm trying to reverse engineer it to figure out if there's something I'm missing...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    > I agree it seems to break the SKVI contract
   
    +1
   
    I think that removal breaks the contract, but, for the wikisearch use-case with very large rows/shards, this bug wouldn't have been noticed. I'm not sure of anyone who has ever used the OrIterator/AndIterator for general purpose table schemas. But again, a quick unit test would likely be the proof in the puddin'.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/237
 
    There is also another version of OrIterator in the wikisearch repo: https://github.com/apache/accumulo-wikisearch/blob/master/query/src/main/java/org/apache/accumulo/examples/wikisearch/iterator/OrIterator.java


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/237#discussion_r109517652
 
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/user/CombinerTest.java ---
    @@ -922,4 +923,49 @@ public void testDeleteHandling() throws Exception {
         runDeleteHandlingTest(input, expected, null, paritalMajcIe, ".*ERROR.*SummingCombiner.*ACCUMULO-2232.*");
         runDeleteHandlingTest(input, expected, null, fullMajcIe, ".*ERROR.*SummingCombiner.*ACCUMULO-2232.*");
       }
    +
    +  /**
    +   * Tests the Lossy option will ignore errors in TypedValueCombiner. Uses SummingArrayCombiner to generate error.
    +   */
    +  @Test
    +  public void testLossyOption() throws IOException, IllegalAccessException, InstantiationException {
    +    Encoder<List<Long>> encoder = SummingArrayCombiner.VarLongArrayEncoder.class.newInstance();
    --- End diff --
   
    Is there a reason could not do `new VarLongArrayEncoder()`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #237: ACCUMULO-3521: minor improvements to iterators

joshelser
In reply to this post by joshelser
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/237#discussion_r109519640
 
    --- Diff: core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java ---
    @@ -53,30 +54,33 @@ private static long process(TreeMap<Key,Value> sourceMap, TreeMap<Key,Value> res
       public void test() throws IOException {
         TreeMap<Key,Value> sourceMap = new TreeMap<>();
         Value emptyValue = new Value("".getBytes());
    +    IteratorSetting iteratorSetting = new IteratorSetting(1, FirstEntryInRowIterator.class);
    +    FirstEntryInRowIterator.setNumScansBeforeSeek(iteratorSetting, 10);
    --- End diff --
   
    Will the test fail if this is not set to properly?  If not, could call iteratorSetting.getOptions() after this and check if it was really set.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
12
Loading...