[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
GitHub user ivakegg opened a pull request:

    https://github.com/apache/accumulo/pull/271

    ACCUMULO-4655 Added a Response Time column to the monitor which shows…

    … how long it took to get status from that tserver.  This is useful in determining whether long Last Contact times are dues to one or more tservers or whether this is due to churn in the master.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ivakegg/accumulo ACCUMULO-4655

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #271
   
----
commit 2c9a569ab3b6c96790ef68a034ecafda7172dcab
Author: Ivan Bella <[hidden email]>
Date:   2017-06-19T16:55:01Z

    ACCUMULO-4655 Added a Response Time column to the monitor which shows how long it took to get status from that tserver.  This is useful in determining whether long Last Contact times are dues to one or more tservers or whether this is due to churn in the master.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123024171
 
    --- Diff: server/base/src/main/java/org/apache/accumulo/server/master/LiveTServerSet.java ---
    @@ -120,11 +120,17 @@ public TabletServerStatus getTableMap(boolean usePooledConnection) throws TExcep
           if (usePooledConnection == true)
             throw new UnsupportedOperationException();
     
    +      long start = System.currentTimeMillis();
    --- End diff --
   
    Using `System.nanoTime()` would be better here. More accurate and not subject to system clock change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123024693
 
    --- Diff: server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/TServersServlet.java ---
    @@ -314,13 +314,22 @@ static void doTserverList(HttpServletRequest req, StringBuilder sb, List<TabletS
         long now = System.currentTimeMillis();
     
         double avgLastContact = 0.;
    +    double avgResponseTime = 0.;
    +    int count = 0;
         for (TabletServerStatus status : tservers) {
    +      count++;
           avgLastContact += (now - status.lastContact);
    +      avgResponseTime += status.responseTime;
    --- End diff --
   
    Did you give any thought to including other statistics than just avg? I am thinking about the case where there are one or a few outliers which significantly skew the mean. p95/p99 are nice to give some context, but admittedly require a little more work to compute.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123024829
 
    --- Diff: server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java ---
    @@ -2852,6 +2852,7 @@ public void run() {
       }
     
       public TabletServerStatus getStats(Map<String,MapCounter<ScanRunState>> scanCounts) {
    +    long start = System.currentTimeMillis();
    --- End diff --
   
    Can use nanoTime here again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123024306
 
    --- Diff: server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/TServersServlet.java ---
    @@ -314,13 +314,22 @@ static void doTserverList(HttpServletRequest req, StringBuilder sb, List<TabletS
         long now = System.currentTimeMillis();
     
         double avgLastContact = 0.;
    +    double avgResponseTime = 0.;
    +    int count = 0;
         for (TabletServerStatus status : tservers) {
    +      count++;
           avgLastContact += (now - status.lastContact);
    +      avgResponseTime += status.responseTime;
         }
    -    final long MINUTES = 3 * 60 * 1000;
    +    avgResponseTime /= count;
    +    final long SECOND = 1000;
    --- End diff --
   
    Make these static final at the class-level, please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123051986
 
    --- Diff: server/base/src/main/java/org/apache/accumulo/server/master/LiveTServerSet.java ---
    @@ -120,11 +120,17 @@ public TabletServerStatus getTableMap(boolean usePooledConnection) throws TExcep
           if (usePooledConnection == true)
             throw new UnsupportedOperationException();
     
    +      long start = System.currentTimeMillis();
    --- End diff --
   
    interesting...I will have to look that one up.  Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123054651
 
    --- Diff: server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/TServersServlet.java ---
    @@ -314,13 +314,22 @@ static void doTserverList(HttpServletRequest req, StringBuilder sb, List<TabletS
         long now = System.currentTimeMillis();
     
         double avgLastContact = 0.;
    +    double avgResponseTime = 0.;
    +    int count = 0;
         for (TabletServerStatus status : tservers) {
    +      count++;
           avgLastContact += (now - status.lastContact);
    +      avgResponseTime += status.responseTime;
    --- End diff --
   
    I was going with the mechanism already used for the lastContact time.  I will think on this.....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123054747
 
    --- Diff: server/base/src/main/java/org/apache/accumulo/server/master/LiveTServerSet.java ---
    @@ -120,11 +120,17 @@ public TabletServerStatus getTableMap(boolean usePooledConnection) throws TExcep
           if (usePooledConnection == true)
             throw new UnsupportedOperationException();
     
    +      long start = System.currentTimeMillis();
    --- End diff --
   
    Ok, it appears to be more accurate however it is in general slower depending on how the nanoTime is implemented (hardware dependent).  Not too sure we really need that time of accuracy here.....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123054881
 
    --- Diff: server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/TServersServlet.java ---
    @@ -314,13 +314,22 @@ static void doTserverList(HttpServletRequest req, StringBuilder sb, List<TabletS
         long now = System.currentTimeMillis();
     
         double avgLastContact = 0.;
    +    double avgResponseTime = 0.;
    +    int count = 0;
         for (TabletServerStatus status : tservers) {
    +      count++;
           avgLastContact += (now - status.lastContact);
    +      avgResponseTime += status.responseTime;
         }
    -    final long MINUTES = 3 * 60 * 1000;
    +    avgResponseTime /= count;
    +    final long SECOND = 1000;
    --- End diff --
   
    right, of course.  Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123055910
 
    --- Diff: server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/TServersServlet.java ---
    @@ -314,13 +314,22 @@ static void doTserverList(HttpServletRequest req, StringBuilder sb, List<TabletS
         long now = System.currentTimeMillis();
     
         double avgLastContact = 0.;
    +    double avgResponseTime = 0.;
    +    int count = 0;
         for (TabletServerStatus status : tservers) {
    +      count++;
           avgLastContact += (now - status.lastContact);
    +      avgResponseTime += status.responseTime;
    --- End diff --
   
    What I cannot figure out here is that the avgLastContact is not actually an average...it is never divided by count (which I added).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123056495
 
    --- Diff: server/base/src/main/java/org/apache/accumulo/server/master/LiveTServerSet.java ---
    @@ -120,11 +120,17 @@ public TabletServerStatus getTableMap(boolean usePooledConnection) throws TExcep
           if (usePooledConnection == true)
             throw new UnsupportedOperationException();
     
    +      long start = System.currentTimeMillis();
    --- End diff --
   
    I've started preferring to use nanoTime() for small, single-threaded timings. I know that's the sort of thing that it's meant to do well (local timings), but I'm not sure if there's a downside to it. Just thought I'd mention it. Like you say, at the end of the day, it probably won't matter.
   
    I think @keith-turner had originally turned me onto it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123056601
 
    --- Diff: server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/TServersServlet.java ---
    @@ -314,13 +314,22 @@ static void doTserverList(HttpServletRequest req, StringBuilder sb, List<TabletS
         long now = System.currentTimeMillis();
     
         double avgLastContact = 0.;
    +    double avgResponseTime = 0.;
    +    int count = 0;
         for (TabletServerStatus status : tservers) {
    +      count++;
           avgLastContact += (now - status.lastContact);
    +      avgResponseTime += status.responseTime;
    --- End diff --
   
    Hahaha, that's a good catch :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo pull request #271: ACCUMULO-4655 Added a Response Time column to th...

joshelser
In reply to this post by joshelser
Github user ivakegg commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/271#discussion_r123057896
 
    --- Diff: server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/TServersServlet.java ---
    @@ -309,18 +309,28 @@ public static void doDeadServerTable(HttpServletRequest req, StringBuilder sb, T
         }
       }
     
    +  static final long SECOND = 1000;
    +  static final long RESPONSE_TIME_MAX_ERR = 3 * SECOND;
    +  static final long MINUTE = 60 * SECOND;
    +  static final long LAST_CONTEXT_MAX_ERR = 3 * MINUTE;
    +
       static void doTserverList(HttpServletRequest req, StringBuilder sb, List<TabletServerStatus> tservers, String tableId, Table tServerList) {
         int guessHighLoad = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
         long now = System.currentTimeMillis();
     
         double avgLastContact = 0.;
    +    double avgResponseTime = 0.;
    +    int count = 0;
         for (TabletServerStatus status : tservers) {
    +      count++;
           avgLastContact += (now - status.lastContact);
    +      avgResponseTime += status.responseTime;
         }
    -    final long MINUTES = 3 * 60 * 1000;
    +    avgResponseTime /= count;
         tServerList.addSortableColumn("Server", new TServerLinkType(), null);
         tServerList.addSortableColumn("Hosted&nbsp;Tablets", new NumberType<>(0, Integer.MAX_VALUE), null);
    -    tServerList.addSortableColumn("Last&nbsp;Contact", new DurationType(0l, (long) Math.min(avgLastContact * 4, MINUTES)), null);
    +    tServerList.addSortableColumn("Last&nbsp;Contact", new DurationType(0l, (long) Math.min(avgLastContact * 4, LAST_CONTEXT_MAX_ERR)), null);
    --- End diff --
   
    ...and why are we multiplying the (not so) average by 4?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #271: ACCUMULO-4655 Added a Response Time column to the monit...

joshelser
In reply to this post by joshelser
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/271
 
    @ivakegg Are you planning on merging this forward to master?  We have yet to merge the new Monitor so I am wondering what would be the best technique to make sure this doesn't get lost.  Maybe open another PR for master and keep it open until the Monitor PR gets merged?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #271: ACCUMULO-4655 Added a Response Time column to the monit...

joshelser
In reply to this post by joshelser
Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/271
 
    @milleruntime Yes, I would like this feature to persist.  I expect this would be merged into the master when complete.  If that is before the Monitor PR gets merge, then the person to merge the Monitor PR will have to resolve the conflicts.  Otherwise the person that merges this PR will have to resolve the conflicts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[GitHub] accumulo issue #271: ACCUMULO-4655 Added a Response Time column to the monit...

joshelser
In reply to this post by joshelser
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/271
 
    Ok sounds good.  I was just thinking ahead since TServersServlet.java is removed in that PR.  It shouldn't be a problem finding a spot for your new stuff though... it should actually be easier with the new REST code.  A quick glance and it seems like here might be a good spot: https://github.com/apache/accumulo/pull/242/files#diff-a5f1cdfdb1d00ef6ab74e7e25cd35ce9


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Loading...