[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
GitHub user ivakegg opened a pull request:

    https://github.com/apache/accumulo/pull/275

    ACCUMULO-4667 Reworked the LocalityGroupIterator to more efficiently …

    …determine which groups to search when seeking.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ivakegg/accumulo ACCUMULO-4667

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #275
   
----
commit 58a9dfc3d0fc34e748141bbac53851d1b875bedb
Author: Ivan Bella <[hidden email]>
Date:   2017-06-28T14:00:27Z

    ACCUMULO-4667 Reworked the LocalityGroupIterator to more efficiently determine which groups to search when seeking.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
Github user brianloss commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r124557591
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,118 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Set<LocalityGroup> groupsToUse = new HashSet<LocalityGroup>();
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse.addAll(Arrays.asList(groups.groups));
    +      }
    +    } else {
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
               // must include, if all excluded column families are in other locality groups
    --- End diff --
   
    I know this wasn't your change, but this comment is misleading. We haven't tested that all excluded column families are in other locality groups here. I think that for !inclusive, you have to use the default locality group no matter what.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #275: ACCUMULO-4667 Reworked the LocalityGroupIterator to mor...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/275
 
    I created a [little program](https://gist.github.com/keith-turner/18aeadfad1f46e919cd4649fa59b3787) to experiment with the changes in this branch.  I am seeing significantly improved performance (this PR vs master) when fetching two columns from a file with 10K column families across 10 loc groups.  I plan to do more experimentation and review the code tomorrow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125072047
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,117 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Set<LocalityGroup> groupsToUse = new HashSet<LocalityGroup>();
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse.addAll(Arrays.asList(groups.groups));
    --- End diff --
   
    For this case we could have an immutable set that is always used.  Create this once in `LocalityGroupContext`.  Then we can avoid allocating the `HashSet` for `groupsToUse`, just allocate it in the else.  We can also avoid populating the hashset.  Could use Guava immutable set


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125072391
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,117 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Set<LocalityGroup> groupsToUse = new HashSet<LocalityGroup>();
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse.addAll(Arrays.asList(groups.groups));
    +      }
    +    } else {
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    --- End diff --
   
    This is very nice!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125073040
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,117 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Set<LocalityGroup> groupsToUse = new HashSet<LocalityGroup>();
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse.addAll(Arrays.asList(groups.groups));
    +      }
    +    } else {
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +    Set<ByteSequence> cfSet;
    +    if (columnFamilies.size() > 0)
    +      if (columnFamilies instanceof Set<?>) {
    +        cfSet = (Set<ByteSequence>) columnFamilies;
    +      } else {
    +        cfSet = new HashSet<>();
    +        cfSet.addAll(columnFamilies);
    +      }
    +    else
    +      cfSet = Collections.emptySet();
    +
    +    if (lastUsed != null && cfSet.equals(lastColumnFamilies) && inclusive == lastInclusive) {
    +      clear();
    +      for (LocalityGroup lgr : lastUsed) {
    +        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +        addSource(lgr.getIterator());
    +      }
    +    } else {
    +      if (lastUsed == null) {
    +        lastUsed = new ArrayList<LocalityGroup>(lgContext.groups.length);
    +      } else {
    +        lastUsed.clear();
    +      }
    +      lastColumnFamilies = cfSet;
    --- End diff --
   
    Need to be careful with this, it could be a ref to a set a user passed.  The user could call `seek(setA)` then modify `setA` and then call `seek(setA)`.  In this use case it seems like the wrong loc groups could be used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125076821
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,117 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Set<LocalityGroup> groupsToUse = new HashSet<LocalityGroup>();
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse.addAll(Arrays.asList(groups.groups));
    --- End diff --
   
    Good observation.  Thanks.  Will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125081955
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,117 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Set<LocalityGroup> groupsToUse = new HashSet<LocalityGroup>();
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse.addAll(Arrays.asList(groups.groups));
    +      }
    +    } else {
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    --- End diff --
   
    Thank you


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125082364
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,117 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Set<LocalityGroup> groupsToUse = new HashSet<LocalityGroup>();
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse.addAll(Arrays.asList(groups.groups));
    +      }
    +    } else {
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +    Set<ByteSequence> cfSet;
    +    if (columnFamilies.size() > 0)
    +      if (columnFamilies instanceof Set<?>) {
    +        cfSet = (Set<ByteSequence>) columnFamilies;
    +      } else {
    +        cfSet = new HashSet<>();
    +        cfSet.addAll(columnFamilies);
    +      }
    +    else
    +      cfSet = Collections.emptySet();
    +
    +    if (lastUsed != null && cfSet.equals(lastColumnFamilies) && inclusive == lastInclusive) {
    +      clear();
    +      for (LocalityGroup lgr : lastUsed) {
    +        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +        addSource(lgr.getIterator());
    +      }
    +    } else {
    +      if (lastUsed == null) {
    +        lastUsed = new ArrayList<LocalityGroup>(lgContext.groups.length);
    +      } else {
    +        lastUsed.clear();
    +      }
    +      lastColumnFamilies = cfSet;
    --- End diff --
   
    good point....I will try and rework this part.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #275: ACCUMULO-4667 Reworked the LocalityGroupIterator to mor...

ctubbsii
In reply to this post by ctubbsii
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/275
 
    @ivakegg I recently created a performance test suite using JMH to measure the work I did with iterators: https://github.com/milleruntime/accumulo-benchmark
    It is pretty specific but could be useful for measuring your improvements.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #275: ACCUMULO-4667 Reworked the LocalityGroupIterator to mor...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/275
 
    Squashing commits while the review is in progress makes it hard to know what changed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125107098
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,116 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Collection<LocalityGroup> groupsToUse = Collections.EMPTY_LIST;
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse = groups.groups;
    +      }
    +    } else {
    +      groupsToUse = new HashSet<LocalityGroup>();
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +    Set<ByteSequence> cfSet;
    +    if (columnFamilies.size() > 0)
    +      cfSet = new HashSet<>(columnFamilies);
    --- End diff --
   
    Always doing this copy is annoying. I wonder if the cost of this copy is made up by the cost savings of checking against the last families.  I suspect it may be, but not sure.   I think using the same set of columns for each seek is common, so its nice to use the same decision.  In the use case were seek is repeatedly being called with a different set of families, this copy and later equality check would be a lot of overhead with no gain. However I think this an exotic use case??
   
    I like Guava's `cfSet = ImmutableSet.of(columnFamilies)` because if the input is already an ImmutableSet it just returns it.   This would be beneficial if code calling seek passed in ImmutableSet.  We could avoid the copy without worrying about correctness.  Also another nice thing about using immutable set is the equality check below will be super quick (because its the same instance it can check for reference equality).  Comparing two different hash set instances that have the same content requires examining the content (I looked and it calls `containsAll()`).
   
    I am thinking we should change this to  `cfSet = ImmutableSet.of(columnFamilies)` and then have a follow on issue to make Accumulo code that constructs these sets in the tablet server use `ImmutableSet`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125117476
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +133,116 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Collection<LocalityGroup> groupsToUse = Collections.EMPTY_LIST;
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse = groups.groups;
    +      }
    +    } else {
    +      groupsToUse = new HashSet<LocalityGroup>();
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +    Set<ByteSequence> cfSet;
    +    if (columnFamilies.size() > 0)
    +      cfSet = new HashSet<>(columnFamilies);
    --- End diff --
   
    I like this idea.  The change to make the cfset an ImmutableSet in the first place is trivial in the LocalityGroupUtil.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125126212
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +134,122 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Collection<LocalityGroup> groupsToUse = Collections.EMPTY_LIST;
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse = groups.groups;
    +      }
    +    } else {
    +      groupsToUse = new HashSet<LocalityGroup>();
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +
    +    // determine if the arguments have changed since the last time
    +    boolean sameArgs = false;
    +    ImmutableSet<ByteSequence> cfSet = null;
    +    if (lastUsed != null && inclusive == lastInclusive) {
    --- End diff --
   
    I was curious if the RFile unit test would cover these different branches.  So I ran those test with code coverage.  None of the code in this function was touched.  I was surprised and dug into it, this function is only used by the iin memory map.  Rfile only calsee the static seek method in this code.
   
    It would be nice to test that seeking the same iterator with different families and inclusiveness works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125325164
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +134,122 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Collection<LocalityGroup> groupsToUse = Collections.EMPTY_LIST;
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse = groups.groups;
    +      }
    +    } else {
    +      groupsToUse = new HashSet<LocalityGroup>();
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +
    +    // determine if the arguments have changed since the last time
    +    boolean sameArgs = false;
    +    ImmutableSet<ByteSequence> cfSet = null;
    +    if (lastUsed != null && inclusive == lastInclusive) {
    --- End diff --
   
    @keith-turner OK, I will work on that....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r125359256
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +134,122 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Collection<LocalityGroup> groupsToUse = Collections.EMPTY_LIST;
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse = groups.groups;
    +      }
    +    } else {
    +      groupsToUse = new HashSet<LocalityGroup>();
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +
    +    // determine if the arguments have changed since the last time
    +    boolean sameArgs = false;
    +    ImmutableSet<ByteSequence> cfSet = null;
    +    if (lastUsed != null && inclusive == lastInclusive) {
    --- End diff --
   
    It appears that the InMemoryMapTest.testLocalityGroups test does go through this logic and indeed changes the column families and changes the inclusiveness testing all of these branches.  I am not sure how the code coverage missed this.  Can you verify please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r127018155
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +134,122 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Collection<LocalityGroup> groupsToUse = Collections.EMPTY_LIST;
    --- End diff --
   
    `Collections.emptyList()` instead? From the method javadoc:
   
    > Using this method is likely to have comparable cost to using the like-named field.  (Unlike this method, the field does not provide type safety.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r127019587
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/util/LocalityGroupUtil.java ---
    @@ -50,16 +52,16 @@
     
       // private static final Logger log = Logger.getLogger(ColumnFamilySet.class);
     
    -  public static final Set<ByteSequence> EMPTY_CF_SET = Collections.emptySet();
    +  public static final Set<ByteSequence> EMPTY_CF_SET = ImmutableSet.of();
     
       public static Set<ByteSequence> families(Collection<Column> columns) {
         if (columns.size() == 0)
           return EMPTY_CF_SET;
    -    Set<ByteSequence> result = new HashSet<>(columns.size());
    +    Builder<ByteSequence> builder = ImmutableSet.builder();
         for (Column col : columns) {
    -      result.add(new ArrayByteSequence(col.getColumnFamily()));
    +      builder.add(new ArrayByteSequence(col.getColumnFamily()));
         }
    -    return result;
    +    return builder.build();
    --- End diff --
   
    Could make this a bit more concise with a Java8 lambda since this is slated for master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r127017065
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -16,23 +16,28 @@
      */
     package org.apache.accumulo.core.iterators.system;
     
    +import com.google.common.collect.ImmutableSet;
    --- End diff --
   
    Can you please revert the reordering of imports?
   
    If the imports are actually in the wrong order, please open up another issue and lets squash them all at once.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #275: ACCUMULO-4667 Reworked the LocalityGroupIterator...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/275#discussion_r127019184
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/system/LocalityGroupIterator.java ---
    @@ -97,75 +134,122 @@ public static final int seek(HeapIterator hiter, LocalityGroup[] groups, Set<Byt
         else
           cfSet = Collections.emptySet();
     
    -    for (LocalityGroup lgr : groups) {
    -      // when include is set to true it means this locality groups contains
    -      // wanted column families
    -      boolean include = false;
    +    // determine the set of groups to use
    +    Collection<LocalityGroup> groupsToUse = Collections.EMPTY_LIST;
     
    -      if (cfSet.size() == 0) {
    -        include = !inclusive;
    -      } else if (lgr.isDefaultLocalityGroup && lgr.columnFamilies == null) {
    -        // do not know what column families are in the default locality group,
    -        // only know what column families are not in it
    +    // if no column families specified, then include all groups unless !inclusive
    +    if (cfSet.size() == 0) {
    +      if (!inclusive) {
    +        groupsToUse = groups.groups;
    +      }
    +    } else {
    +      groupsToUse = new HashSet<LocalityGroup>();
     
    +      // do not know what column families are in the default locality group,
    +      // only know what column families are not in it
    +      if (groups.defaultGroup != null) {
             if (inclusive) {
    -          if (!nonDefaultColumnFamilies.containsAll(cfSet)) {
    +          if (!groups.groupByCf.keySet().containsAll(cfSet)) {
                 // default LG may contain wanted and unwanted column families
    -            include = true;
    +            groupsToUse.add(groups.defaultGroup);
               }// else - everything wanted is in other locality groups, so nothing to do
             } else {
    -          // must include, if all excluded column families are in other locality groups
    -          // then there are not unwanted column families in default LG
    -          include = true;
    +          // must include the default group as it may include cfs not in our cfSet
    +          groupsToUse.add(groups.defaultGroup);
    +        }
    +      }
    +
    +      /*
    +       * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are disjoint
    +       * lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    +       */
    +      if (!inclusive) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (!cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
    +        }
    +      } else if (groups.groupByCf.size() <= cfSet.size()) {
    +        for (Entry<ByteSequence,LocalityGroup> entry : groups.groupByCf.entrySet()) {
    +          if (cfSet.contains(entry.getKey())) {
    +            groupsToUse.add(entry.getValue());
    +          }
             }
           } else {
    -        /*
    -         * Need to consider the following cases for inclusive and exclusive (lgcf:locality group column family set, cf:column family set) lgcf and cf are
    -         * disjoint lgcf and cf are the same cf contains lgcf lgcf contains cf lgccf and cf intersect but neither is a subset of the other
    -         */
    -
    -        for (Entry<ByteSequence,MutableLong> entry : lgr.columnFamilies.entrySet())
    -          if (entry.getValue().longValue() > 0)
    -            if (cfSet.contains(entry.getKey())) {
    -              if (inclusive)
    -                include = true;
    -            } else if (!inclusive) {
    -              include = true;
    -            }
    +        for (ByteSequence cf : cfSet) {
    +          LocalityGroup group = groups.groupByCf.get(cf);
    +          if (group != null) {
    +            groupsToUse.add(group);
    +          }
    +        }
           }
    +    }
     
    -      if (include) {
    -        lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    -        hiter.addSource(lgr.getIterator());
    -        numLGSeeked++;
    -      }// every column family is excluded, zero count, or not present
    +    for (LocalityGroup lgr : groupsToUse) {
    +      lgr.getIterator().seek(range, EMPTY_CF_SET, false);
    +      hiter.addSource(lgr.getIterator());
    +      numLGSeeked++;
    +    }
    +
    +    if (used != null) {
    +      used.addAll(groupsToUse);
         }
     
         return numLGSeeked;
       }
     
       @Override
       public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException {
    -    seek(this, groups, nonDefaultColumnFamilies, range, columnFamilies, inclusive);
    +
    +    // determine if the arguments have changed since the last time
    +    boolean sameArgs = false;
    +    ImmutableSet<ByteSequence> cfSet = null;
    +    if (lastUsed != null && inclusive == lastInclusive) {
    +      if (columnFamilies instanceof Set) {
    +        sameArgs = lastColumnFamilies.equals(columnFamilies);
    +      } else {
    +        cfSet = ImmutableSet.copyOf(columnFamilies);
    +        sameArgs = lastColumnFamilies.equals(cfSet);
    --- End diff --
   
    `sameArgs` is essentially computed the same way in both branches. Could pull it out.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
123
Loading...