[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

joshelser
GitHub user joshelser opened a pull request:

    https://github.com/apache/accumulo/pull/168

    ACCUMULO-4501 (work in progress) Very simple implementation of a visib…

    …ility histogram

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joshelser/accumulo 4501-rfile-viz-histograms

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #168
   
----
commit 317be5399d85c785f21fec2e5b2ac6dc425df57b
Author: Josh Elser <[hidden email]>
Date:   2016-10-17T04:15:42Z

    ACCUMULO-4501 (work in progress) Very simple implementation of a visibility histogram

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #168: ACCUMULO-4501 (work in progress) Very simple implementa...

joshelser
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/168
 
    One option to consider instead of modifying RFile is to make it a decorator like [BloomFilterLayer][1].  BloomFilterLayer stores its information in RFile metadata.  I'm think will be problems with this approach, but I would not know what they are w/o actually trying it.  
   
    Are you considering making this a generic histogram functionality, where the user can configure a function that emits counts for a given Key Value?
   
    [1]: https://github.com/apache/accumulo/blob/e900e67425d950bd4c0c5288a6270d7b362ac458/core/src/main/java/org/apache/accumulo/core/file/BloomFilterLayer.java


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

joshelser
In reply to this post by joshelser
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/168#discussion_r83632592
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/histogram/HashMapVisibilityHistogram.java ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.file.rfile.histogram;
    +
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.util.AbstractMap;
    +import java.util.HashMap;
    +import java.util.Iterator;
    +import java.util.Map.Entry;
    +import java.util.Objects;
    +import java.util.concurrent.atomic.AtomicLong;
    +
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.hadoop.io.Text;
    +
    +import com.google.common.base.Function;
    +import com.google.common.collect.Iterators;
    +
    +/**
    + * A basic unbounded hash map implementation.
    + */
    +public class HashMapVisibilityHistogram implements VisibilityHistogram {
    +
    +  private HashMap<Text,AtomicLong> histogram;
    +  private ThreadLocal<Text> buffer = new ThreadLocal<Text>() {
    +    @Override public Text initialValue() {
    +      return new Text();
    +    }
    +  };
    +
    +  public HashMapVisibilityHistogram() {
    +    this.histogram = new HashMap<>();
    +  }
    +
    +  @Override
    +  public long increment(Key k) {
    +    final Text t = buffer.get();
    +    Objects.requireNonNull(k.getColumnVisibility(t));
    +    AtomicLong count = histogram.get(t);
    --- End diff --
   
    Is an AtomicLong needed for concurrency reasons?  If not, then could create a simple class like MutableLong in [MapCounter][1] to avoid volatile.
   
    [1]: https://github.com/apache/accumulo/blob/e900e67425d950bd4c0c5288a6270d7b362ac458/core/src/main/java/org/apache/accumulo/core/util/MapCounter.java


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

joshelser
In reply to this post by joshelser
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/168#discussion_r83634597
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java ---
    @@ -374,7 +380,42 @@ public void flushIfNeeded() throws IOException {
         }
       }
     
    -  private static class LocalityGroupWriter {
    +  private static class VisibilityHistogramLocalityGroupWriter {
    +    private final HashMap<Text,AtomicLong> histogram;
    +    private final LocalityGroupWriter lgr;
    +
    +    private ThreadLocal<Text> buffer = new ThreadLocal<Text>() {
    +      @Override public Text initialValue() {
    +        return new Text();
    +      }
    +    };
    +
    +    public VisibilityHistogramLocalityGroupWriter(LocalityGroupWriter lgr) {
    +      this.lgr = lgr;
    +      this.histogram = new HashMap<>();
    +    }
    +
    +    public void append(Key key, Value value) throws IOException {
    +      Text _text = buffer.get();
    +      key.getColumnFamily(_text);
    --- End diff --
   
    get the family?  Is this duplicate code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

joshelser
In reply to this post by joshelser
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/168#discussion_r83634090
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/histogram/VisibilityHistogramLocalityGroupWriter.java ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.file.rfile.histogram;
    +
    +import java.io.IOException;
    +import java.util.HashMap;
    +import java.util.concurrent.atomic.AtomicLong;
    +
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.accumulo.core.data.Value;
    +import org.apache.accumulo.core.file.rfile.RFile.LocalityGroupWriter;
    +import org.apache.hadoop.io.Text;
    +
    +/**
    + *
    + */
    +public class VisibilityHistogramLocalityGroupWriter {
    +  private final HashMap<Text,AtomicLong> histogram;
    +  private final LocalityGroupWriter lgr;
    +
    +  private ThreadLocal<Text> buffer = new ThreadLocal<Text>() {
    +    @Override public Text initialValue() {
    +      return new Text();
    +    }
    +  };
    +
    +  public VisibilityHistogramLocalityGroupWriter(LocalityGroupWriter lgr) {
    +    this.lgr = lgr;
    +    this.histogram = new HashMap<>();
    +  }
    +
    +  public void append(Key key, Value value) throws IOException {
    +    Text _text = buffer.get();
    +    key.getColumnVisibility(_text);
    +    AtomicLong count = histogram.get(_text);
    --- End diff --
   
    I think this could use the ByteSequence returned by getColumnVisibilityData() to do the lookup in the map (would require keying map on ByteSequence).  This would avoid the copy for lookup.   Only copy when inserting into the map.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

joshelser
In reply to this post by joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/168#discussion_r83653920
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/histogram/HashMapVisibilityHistogram.java ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to you under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.accumulo.core.file.rfile.histogram;
    +
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.util.AbstractMap;
    +import java.util.HashMap;
    +import java.util.Iterator;
    +import java.util.Map.Entry;
    +import java.util.Objects;
    +import java.util.concurrent.atomic.AtomicLong;
    +
    +import org.apache.accumulo.core.data.Key;
    +import org.apache.hadoop.io.Text;
    +
    +import com.google.common.base.Function;
    +import com.google.common.collect.Iterators;
    +
    +/**
    + * A basic unbounded hash map implementation.
    + */
    +public class HashMapVisibilityHistogram implements VisibilityHistogram {
    +
    +  private HashMap<Text,AtomicLong> histogram;
    +  private ThreadLocal<Text> buffer = new ThreadLocal<Text>() {
    +    @Override public Text initialValue() {
    +      return new Text();
    +    }
    +  };
    +
    +  public HashMapVisibilityHistogram() {
    +    this.histogram = new HashMap<>();
    +  }
    +
    +  @Override
    +  public long increment(Key k) {
    +    final Text t = buffer.get();
    +    Objects.requireNonNull(k.getColumnVisibility(t));
    +    AtomicLong count = histogram.get(t);
    --- End diff --
   
    I don't think it would be needed. Using something faster would probably be better.
   
    Sam applies to using an unbounded HashMap for storage (e.g. handling tables with 100's to 1000's of visibilities per file).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

joshelser
In reply to this post by joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/168#discussion_r83653991
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java ---
    @@ -374,7 +380,42 @@ public void flushIfNeeded() throws IOException {
         }
       }
     
    -  private static class LocalityGroupWriter {
    +  private static class VisibilityHistogramLocalityGroupWriter {
    +    private final HashMap<Text,AtomicLong> histogram;
    +    private final LocalityGroupWriter lgr;
    +
    +    private ThreadLocal<Text> buffer = new ThreadLocal<Text>() {
    +      @Override public Text initialValue() {
    +        return new Text();
    +      }
    +    };
    +
    +    public VisibilityHistogramLocalityGroupWriter(LocalityGroupWriter lgr) {
    +      this.lgr = lgr;
    +      this.histogram = new HashMap<>();
    +    }
    +
    +    public void append(Key key, Value value) throws IOException {
    +      Text _text = buffer.get();
    +      key.getColumnFamily(_text);
    --- End diff --
   
    Lol, bad copy-paste :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #168: ACCUMULO-4501 (work in progress) Very simple implementa...

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/168
 
    > One option to consider instead of modifying RFile is to make it a decorator like BloomFilterLayer. BloomFilterLayer stores its information in RFile metadata. I'm think will be problems with this approach, but I would not know what they are w/o actually trying it.
   
    This is my first foray into the RFile codebase, so I am very happy to be redirected into a different implementation :). My liberal use of increasing visibility on classes ought to be apparent haha.
   
    > Are you considering making this a generic histogram functionality, where the user can configure a function that emits counts for a given Key Value?
   
    If we can abstract this specific feature into something more generic without it blowing up, I'm ok with that. I just don't have a big picture view right now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #168: ACCUMULO-4501 (work in progress) Very simple implementa...

joshelser
In reply to this post by joshelser
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/168
 
    @joshelser I have not forgotten about this. We had discussed writing up a design doc on IRC.  I have just been busy.  I plan to take a crack at that Thur.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #168: ACCUMULO-4501 (work in progress) Very simple implementa...

joshelser
In reply to this post by joshelser
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/168
 
    @keith-turner nbd. This has fallen by the way-side for me too :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #168: ACCUMULO-4501 (work in progress) Very simple implementa...

joshelser
In reply to this post by joshelser
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/168
 
    I just posted #180 for review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #168: ACCUMULO-4501 (work in progress) Very simple imp...

joshelser
In reply to this post by joshelser
Github user asfgit closed the pull request at:

    https://github.com/apache/accumulo/pull/168


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Loading...