[GitHub] accumulo pull request #269: ACCUMULO-4656

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #269: ACCUMULO-4656

ctubbsii
GitHub user matthpeterson opened a pull request:

    https://github.com/apache/accumulo/pull/269

    ACCUMULO-4656

    ACCUMULO-4656 new option for PrintInfo that provides index information

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/matthpeterson/accumulo ACCUMULO-4656

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #269
   
----
commit 6b3af73752091bdd20ac348177b9245f2be58086
Author: matthpeterson <[hidden email]>
Date:   2017-06-19T14:34:07Z

    Update PrintInfo.java

commit ed3cd47d133fb8f13a1bea5a03504ce37d7284c3
Author: matthpeterson <[hidden email]>
Date:   2017-06-19T14:37:16Z

    Update MultiLevelIndex.java

commit d95c28086b2c2d0a62220b3d527982e4b790751c
Author: matthpeterson <[hidden email]>
Date:   2017-06-19T14:42:34Z

    Update RFile.java

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
Github user dlmarion commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    I tried running this and though it was strange that the index was printed in the middle of the locality group info w/ no context.
   
    ```
    Locality group           : <DEFAULT>
    Num   blocks           : 1
    Index level 0          : 29 bytes  1 blocks
    First key              : r1 f1:q1 [] 1497902390579 false
    Last key               : r1 f1:q5 [] 1497902395419 false
    Key: r1 f1:q5 [] 1497902395419 false NumEntries: 4 Offset: 16 CompressedSize: 53 RawSize: 66
    Num entries            : 4
    Column families        : [f1]
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    Currently this only prints the lowest level of the index tree.  Could also print index entries for higher levels.  If interested, I could post a function that shows how this might be done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user mjwall commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    It would be nice to see all levels, maybe indented appropriately.  Could go in a new section like the dump option does.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user matthpeterson commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    RE: Formatting comments
    Agreed, the formatting is jarring as-is.  Does the below format look better to everyone?  I've added an "Index Entries" header, put the index of the IndexEntry on the left (double indented) and the IndexEntry information on the right, indented to line up with the other values in the print out.  
   
    I'd prefer to get consensus before changing the formatting so please let me know if you would like something other than what I've proposed below.  I'll try to make the code changes tomorrow.
   
    ```
    Locality group           : <DEFAULT>
    Num   blocks           : 1
    Index level 0          : 29 bytes  1 blocks
    First key              : r1 f1:q1 [] 1497902390579 false
    Last key               : r1 f1:q5 [] 1497902395419 false
    Index Entries
    0              : Key: r1 f1:q5 [] 1497902395419 false NumEntries: 4 Offset: 16 CompressedSize: 53 RawSize: 66
    1              : Key: r2 f1:q5 [] 1497902395419 false NumEntries: 4 Offset: 16 CompressedSize: 53 RawSize: 66
    Num entries            : 4
    Column families        : [f1]I can add the indenting.
    ```
   
    RE: printing multiple levels
    I was interested in getting only the lowest level of the index for this ticket so that I could get some statistics on the rfile contents without reading the entire file.  Having multiple indexes might simplify certain types of analysis and complicate others.  It would be nice to be able to easily filter the lowest level.  Perhaps the level could be printed on the left hand side, like this:
    ```
    Index Entries
    L0,0              : Key: r1 f1:q5 [] 1497902395419 false NumEntries: 4 Offset: 16 CompressedSize: 53 RawSize: 66
    L0,1              : Key: r2 f1:q5 [] 1497902395419 false NumEntries: 4 Offset: 16 CompressedSize: 53 RawSize: 66
    ```
   
    Overall it seems worthwhile though to provide more information.  I'd welcome Keith's offer of posting a function.  Feel free to adjust the PR with the suggested code if you'd like or I can modify it.  
   
    Thanks for the feedback.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user phrocker commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    @matthpeterson I tend to agree with other commenters. In my opinion you may want to add options to support showing all levels, lowest level, or an arbitrary level of the index tree. I've certainly needed the latter and it helps to print the information vice using a profiler to get it. The ticket's title is displaying index information, so that seems we should make it flexible to support more than just the lowest level.
   
    I'm always supportive of @keith-turner writing functions...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    I created matthpeterson/accumulo#1 which has a comment showing output


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user matthpeterson commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    Thanks, Keith for the changes.  I think all comments are now addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/269#discussion_r123017014
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java ---
    @@ -309,11 +309,18 @@ public void printInfo(boolean isSample) throws IOException {
           long numKeys = 0;
           IndexIterator countIter = indexReader.lookup(new Key());
           while (countIter.hasNext()) {
    -        numKeys += countIter.next().getNumEntries();
    +        IndexEntry indexEntry = countIter.next();
    +        numKeys += indexEntry.getNumEntries();
           }
     
           out.printf("\t%-22s : %,d\n", "Num entries", numKeys);
           out.printf("\t%-22s : %s\n", "Column families", (isDefaultLG && columnFamilies == null ? "<UNKNOWN>" : columnFamilies.keySet()));
    +
    +      if (includeIndexDetails) {
    +        out.printf("\t%-22s :\n", "Index Entries", lastKey);
    +        String prefix = String.format("\t   ", "");
    --- End diff --
   
    This used to do something different, then i changed it.  Now its just goofy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/269#discussion_r123023639
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java ---
    @@ -55,6 +55,8 @@
         boolean hash = false;
         @Parameter(names = {"--histogram"}, description = "print a histogram of the key-value sizes")
         boolean histogram = false;
    +    @Parameter(names = {"--printIndex"}, description = "prints information about all the index entries")
    +    boolean printIndex = false;
    --- End diff --
   
    Given some of the other discussion, I was thinking that a `--maxDepth` option might also be nice to add. Food for thought.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/269#discussion_r123025719
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java ---
    @@ -55,6 +55,8 @@
         boolean hash = false;
         @Parameter(names = {"--histogram"}, description = "print a histogram of the key-value sizes")
         boolean histogram = false;
    +    @Parameter(names = {"--printIndex"}, description = "prints information about all the index entries")
    +    boolean printIndex = false;
    --- End diff --
   
    May also want a min depth.   I think both of the use cases can be also be achieved with a follow on grep.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user mjwall commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    This is the check that failed in travis
   
    > [INFO] Format-string method java.io.PrintStream.printf(String, Object[]) called with format string " %-22s :\n" wants 1 arguments but is given 2 in org.apache.accumulo.core.file.rfile.RFile$LocalityGroupMetadata.printInfo(boolean, boolean) [org.apache.accumulo.core.file.rfile.RFile$LocalityGroupMetadata] At RFile.java:[line 320] VA_FORMAT_STRING_EXTRA_ARGUMENTS_PASSED
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user mjwall commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    I am fixing the issue noted above and will merge this now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user asfgit closed the pull request at:

    https://github.com/apache/accumulo/pull/269


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo issue #269: ACCUMULO-4656

ctubbsii
In reply to this post by ctubbsii
Github user mjwall commented on the issue:

    https://github.com/apache/accumulo/pull/269
 
    @matthpeterson would you like to be added to the contributors list?  Either make a PR against https://github.com/apache/accumulo-website or let know what you listed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---