[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

classic Classic list List threaded Threaded
44 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
GitHub user mikewalch opened a pull request:

    https://github.com/apache/accumulo/pull/235

    ACCUMULO-4612 - Simplify Accumulo memory configuration

    * Tablet server memory setttings now default to a percentage
      of Java heap space given to process.  Default settings will
      work for various heap space settings and fixed memory
      settings are still supported.
    * Remove 'accumulo create-config' command as it no longer needed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mikewalch/accumulo config-refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #235
   
----
commit d156d7008edeaaa219edc35ea10a14d43abe6a6e
Author: Mike Walch <[hidden email]>
Date:   2017-03-20T20:36:45Z

    ACCUMULO-4612 - Simplify Accumulo memory configuration
   
    * Tablet server memory setttings now default to a percentage
      of Java heap space given to process.  Default settings will
      work for various heap space settings and fixed memory
      settings are still supported.
    * Remove 'accumulo create-config' command as it no longer needed.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107680034
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/Property.java ---
    @@ -253,13 +252,13 @@
           + " max number of concurrent writer when configuring. When using Hadoop 2, Accumulo will call hsync() on the WAL . For a small number of "
           + "concurrent writers, increasing this buffer size decreases the frequncy of hsync calls. For a large number of concurrent writers a small buffers "
           + "size is ok because of group commit."),
    -  TSERV_TOTAL_MUTATION_QUEUE_MAX("tserver.total.mutation.queue.max", "50M", PropertyType.MEMORY,
    +  TSERV_TOTAL_MUTATION_QUEUE_MAX("tserver.total.mutation.queue.max", "5%", PropertyType.MEMORY,
           "The amount of memory used to store write-ahead-log mutations before flushing them."),
       TSERV_TABLET_SPLIT_FINDMIDPOINT_MAXOPEN("tserver.tablet.split.midpoint.files.max", "300", PropertyType.COUNT,
           "To find a tablets split points, all index files are opened. This setting determines how many index "
               + "files can be opened at once. When there are more index files than this setting multiple passes "
               + "must be made, which is slower. However opening too many files at once can cause problems."),
    -  TSERV_WALOG_MAX_SIZE("tserver.walog.max.size", "1G", PropertyType.MEMORY,
    +  TSERV_WALOG_MAX_SIZE("tserver.walog.max.size", "33%", PropertyType.MEMORY,
    --- End diff --
   
    This is a file size limit.   Should probably be left at 1G


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107684327
 
    --- Diff: assemble/conf/accumulo-env.sh ---
    @@ -36,6 +36,22 @@ export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_PREFIX}/etc/hadoop}"
     ## Zookeeper installation
     export ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/path/to/zookeeper}"
     
    +##########################
    +# Build CLASSPATH variable
    +##########################
    +
    +## Adds external Hadoop & Zookeeper dependencies to CLASSPATH. See "Vendor configuration" section of Accumulo user manual
    +## for different settings if you installed vendor's distribution of Hadoop or Zookeeper.
    --- End diff --
   
    If we put info about vendor specific configs on website, then we could put a link to that info here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107684856
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/Property.java ---
    @@ -170,10 +170,9 @@
       // general properties
       GENERAL_PREFIX("general.", null, PropertyType.PREFIX,
           "Properties in this category affect the behavior of accumulo overall, but do not have to be consistent throughout a cloud."),
    -  GENERAL_CLASSPATHS(AccumuloClassLoader.CLASSPATH_PROPERTY_NAME, AccumuloClassLoader.ACCUMULO_CLASSPATH_VALUE, PropertyType.STRING,
    +  GENERAL_CLASSPATHS(AccumuloClassLoader.CLASSPATH_PROPERTY_NAME, "", PropertyType.STRING,
    --- End diff --
   
    Should we deprecate this property?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107686386
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/AccumuloConfiguration.java ---
    @@ -209,13 +209,19 @@ static public long getMemoryInBytes(String str) {
             case 'B':
               multiplier = 0;
               break;
    +        case '%':
    +          int percent = Integer.parseInt(str.substring(0, str.length() - 1));
    +          if (percent <= 0 || percent >= 100) {
    +           throw new IllegalArgumentException("The value of '" + str + "' is not a valid memory percentage setting.");
    +          }
    +          return Runtime.getRuntime().maxMemory() * percent / 100;
             default:
               return Long.parseLong(str);
           }
           return Long.parseLong(str.substring(0, str.length() - 1)) << multiplier;
         } catch (Exception ex) {
           throw new IllegalArgumentException("The value '" + str + "' is not a valid memory setting. A valid value would a number "
    -          + "possibily followed by an optional 'G', 'M', 'K', or 'B'.");
    +          + "possibly followed by an optional 'G', 'M', 'K', or 'B'.");
    --- End diff --
   
    Since this is catching exception, it possible that something could go wrong parsing a `%`.  So error message should include `%` as valid suffix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107687845
 
    --- Diff: docs/src/main/asciidoc/chapters/administration.txt ---
    @@ -364,17 +357,19 @@ consideration. There is no enforcement of these warnings via the API.
     
     ==== Configuring the ClassLoader
     
    -Accumulo loads classes from the locations specified in the +general.classpaths+ property. Additionally, Accumulo will load classes
    -from the locations specified in the +general.dynamic.classpaths+ property and will monitor and reload them if they change. The reloading
    -feature is useful during the development and testing of iterators as new or modified iterator classes can be deployed to Accumulo without
    -having to restart the database.
    +Accumulo builds its Java classpath in +accumulo-env.sh+.  After Accumulo application have started, they will loads classes from the locations
    --- End diff --
   
    `they will loads classes`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107685017
 
    --- Diff: docs/src/main/asciidoc/chapters/administration.txt ---
    @@ -125,6 +124,12 @@ and specify the following:
     . Enter the location of ZooKeeper for +$ZOOKEEPER_HOME+
     . Optionally, choose a different location for Accumulo logs using +$ACCUMULO_LOG_DIR+
     
    +Accumulo uses +HADOOP_PREFIX+ and +ZOOKEEPER_HOME+ to locate Hadoop and Zookeeper jars
    +and add them the +CLASSPATH+ variable. If you are running a vendor-specific release of Hadoop
    +or Zookeeper, see the `Vendor configuration` section as you may need to change how your `CLASSPATH`
    --- End diff --
   
    if info is on website, then could link to it from here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107683551
 
    --- Diff: INSTALL.md ---
    @@ -36,62 +36,49 @@ to manage Accumulo:
     These scripts will be used in the remaining instructions to configure and run Accumulo.
     For convenience, consider adding `accumulo-X.Y.Z/bin/` to your shell's path.
     
    -## Configuring
    +## Configuring Accumulo
     
    -Accumulo has some optional native code that improves its performance and
    -stability. Before configuring Accumulo, attempt to build this native code
    -with the following command.
    +Accumulo requires running [Zookeeper][3] and [HDFS][4] instances which should be set up
    +before configuring Accumulo.
     
    -    accumulo-util build-native
    +The primary configuration files for Accumulo are `accumulo-env.sh` and `accumulo-site.xml`
    +which are located in the `conf/` directory.
     
    -If the command fails, its OK to continue with setup and resolve the issue later.
    +Follow the steps below to configure `accumulo-site.xml`:
     
    -Accumulo is configured by the files `accumulo-site.xml` and `accumulo-env.sh` in the `conf/`
    -directory. You can either edit these files for your environment or run the command below which will
    -overwrite them with files configured for your environment.
    +1. Run `accumulo-util build-native` to build native code.  If this command fails, disable
    +   native maps by setting `tserver.memory.maps.native.enabled` to `false`.
     
    -    accumulo-util create-config
    +2. Set `instance.volumes` to HDFS location where Accumulo will store data. If your namenode
    +   is running at 192.168.1.9:9000 and you want to store data in `/accumulo` in HDFS, then set
    +   `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
     
    -The script will ask you questions about your set up. Below are some suggestions:
    +3. Set `instance.zookeeper.host` to the location of your Zookeepers
     
    -* When the script asks about memory-map type, choose Native if the build native script
    -  was successful. Otherwise, choose Java.
    -* The script will prompt for memory usage. Please note that the footprints are
    -  only for the Accumulo system processes, so ample space should be left for other
    -  processes like Hadoop, Zookeeper, and the Accumulo client code.  If Accumulo
    -  worker processes are swapped out and unresponsive, they may be killed.
    +4. (Optional) Change `instance.secret` (which is used by Accumulo processes to communicate)
    +   from the default. This value should match on all servers.
     
    -While `accumulo-util create-config` creates  `accumulo-env.sh` and `accumulo-site.xml` files
    -targeted for your environment, these files still require a few more edits before starting Accumulo.
    +Follow the steps below to configure `accumulo-env.sh`:
     
    -### Secret
    +1. Set `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to help Accumulo locate Hadoop and Zookeeper jars
    +   and add them to the `CLASSPATH` variable. If you are running a vendor-specific release of Hadoop
    +   or Zookeeper, see the `Vendor-specific configuration` documentation in the Administration section
    +   of the Accumulo user manual as you may need to change how your `CLASSPATH` is built. If Accumulo
    +   has problems later on finding jars, run `accumulo classpath -d` to print Accumulo's classpath.
     
    -Accumulo coordination and worker processes can only communicate with each other
    -if they share the same secret key.  To change the secret key set
    -`instance.secret` in `accumulo-site.xml`.  Changing this secret key from
    -the default is highly recommended.
    +2. Accumulo tablet servers are configured by default to use 1GB of memory (768MB is allocated to
    +   JVM and 256MB is allocated for native maps). Native maps are allocated memory equal to 33% of
    +   the tserver JVM heap. The table below can be used if you would like to change tsever memory
    +   usage in the `JAVA_OPTS` section of `accumulo-env.sh`:
     
    -### Dependencies
    +    | Native? | 512MB             | 1GB               | 2GB                 | 3GB           |
    +    |---------|-------------------|-------------------|---------------------|---------------|
    +    | Yes     | -Xmx384m -Xms384m | -Xmx768m -Xms768m | -Xmx1536m -Xms1536m | -Xmx2g -Xms2g |
    +    | No      | -Xmx512m -Xms512m | -Xmx1g -Xms1g     | -Xmx2g -Xms2g       | -Xmx3g -Xms3g |
     
    -Accumulo requires running [Zookeeper][3] and [HDFS][4] instances.  Also, the
    -Accumulo binary distribution does not include jars for Zookeeper and Hadoop.
    -When configuring Accumulo the following information about these dependencies
    -must be provided.
    -
    - * **Location of Zookeepers** :  Provide this by setting `instance.zookeeper.host`
    -   in `accumulo-site.xml`.
    - * **Where to store data** :  Provide this by setting `instance.volumes` in
    -   `accumulo-site.xml`.  If your namenode is running at 192.168.1.9:9000
    -   and you want to store data in `/accumulo` in HDFS, then set
    -  `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
    - * **Location of Zookeeper and Hadoop jars** :  Setting `ZOOKEEPER_HOME` and
    -   `HADOOP_PREFIX` in `accumulo-env.sh` will help Accumulo find these jars
    -   when using the default setting for `general.classpaths` in accumulo-site.xml.
    -
    -If Accumulo has problems later on finding jars, then run `bin/accumulo
    -classpath` to print out info about where Accumulo is finding jars.  If the
    -settings mentioned above are correct, then inspect `general.classpaths` in
    -`accumulo-site.xml`.
    +3. (Optional) The Accumulo master is configured by default to use 512MB while the garbage collector
    +   and monitor is configure to use 256MB. These settings can be changed in the `JAVA_OPTS` section
    --- End diff --
   
    `monitor is configure to use`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107684054
 
    --- Diff: INSTALL.md ---
    @@ -36,62 +36,49 @@ to manage Accumulo:
     These scripts will be used in the remaining instructions to configure and run Accumulo.
     For convenience, consider adding `accumulo-X.Y.Z/bin/` to your shell's path.
     
    -## Configuring
    +## Configuring Accumulo
     
    -Accumulo has some optional native code that improves its performance and
    -stability. Before configuring Accumulo, attempt to build this native code
    -with the following command.
    +Accumulo requires running [Zookeeper][3] and [HDFS][4] instances which should be set up
    +before configuring Accumulo.
     
    -    accumulo-util build-native
    +The primary configuration files for Accumulo are `accumulo-env.sh` and `accumulo-site.xml`
    +which are located in the `conf/` directory.
     
    -If the command fails, its OK to continue with setup and resolve the issue later.
    +Follow the steps below to configure `accumulo-site.xml`:
     
    -Accumulo is configured by the files `accumulo-site.xml` and `accumulo-env.sh` in the `conf/`
    -directory. You can either edit these files for your environment or run the command below which will
    -overwrite them with files configured for your environment.
    +1. Run `accumulo-util build-native` to build native code.  If this command fails, disable
    +   native maps by setting `tserver.memory.maps.native.enabled` to `false`.
     
    -    accumulo-util create-config
    +2. Set `instance.volumes` to HDFS location where Accumulo will store data. If your namenode
    +   is running at 192.168.1.9:9000 and you want to store data in `/accumulo` in HDFS, then set
    +   `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
     
    -The script will ask you questions about your set up. Below are some suggestions:
    +3. Set `instance.zookeeper.host` to the location of your Zookeepers
     
    -* When the script asks about memory-map type, choose Native if the build native script
    -  was successful. Otherwise, choose Java.
    -* The script will prompt for memory usage. Please note that the footprints are
    -  only for the Accumulo system processes, so ample space should be left for other
    -  processes like Hadoop, Zookeeper, and the Accumulo client code.  If Accumulo
    -  worker processes are swapped out and unresponsive, they may be killed.
    +4. (Optional) Change `instance.secret` (which is used by Accumulo processes to communicate)
    +   from the default. This value should match on all servers.
     
    -While `accumulo-util create-config` creates  `accumulo-env.sh` and `accumulo-site.xml` files
    -targeted for your environment, these files still require a few more edits before starting Accumulo.
    +Follow the steps below to configure `accumulo-env.sh`:
     
    -### Secret
    +1. Set `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to help Accumulo locate Hadoop and Zookeeper jars
    +   and add them to the `CLASSPATH` variable. If you are running a vendor-specific release of Hadoop
    +   or Zookeeper, see the `Vendor-specific configuration` documentation in the Administration section
    +   of the Accumulo user manual as you may need to change how your `CLASSPATH` is built. If Accumulo
    +   has problems later on finding jars, run `accumulo classpath -d` to print Accumulo's classpath.
     
    -Accumulo coordination and worker processes can only communicate with each other
    -if they share the same secret key.  To change the secret key set
    -`instance.secret` in `accumulo-site.xml`.  Changing this secret key from
    -the default is highly recommended.
    +2. Accumulo tablet servers are configured by default to use 1GB of memory (768MB is allocated to
    +   JVM and 256MB is allocated for native maps). Native maps are allocated memory equal to 33% of
    +   the tserver JVM heap. The table below can be used if you would like to change tsever memory
    +   usage in the `JAVA_OPTS` section of `accumulo-env.sh`:
     
    -### Dependencies
    +    | Native? | 512MB             | 1GB               | 2GB                 | 3GB           |
    +    |---------|-------------------|-------------------|---------------------|---------------|
    +    | Yes     | -Xmx384m -Xms384m | -Xmx768m -Xms768m | -Xmx1536m -Xms1536m | -Xmx2g -Xms2g |
    +    | No      | -Xmx512m -Xms512m | -Xmx1g -Xms1g     | -Xmx2g -Xms2g       | -Xmx3g -Xms3g |
     
    -Accumulo requires running [Zookeeper][3] and [HDFS][4] instances.  Also, the
    -Accumulo binary distribution does not include jars for Zookeeper and Hadoop.
    -When configuring Accumulo the following information about these dependencies
    -must be provided.
    -
    - * **Location of Zookeepers** :  Provide this by setting `instance.zookeeper.host`
    -   in `accumulo-site.xml`.
    - * **Where to store data** :  Provide this by setting `instance.volumes` in
    -   `accumulo-site.xml`.  If your namenode is running at 192.168.1.9:9000
    -   and you want to store data in `/accumulo` in HDFS, then set
    -  `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
    - * **Location of Zookeeper and Hadoop jars** :  Setting `ZOOKEEPER_HOME` and
    -   `HADOOP_PREFIX` in `accumulo-env.sh` will help Accumulo find these jars
    -   when using the default setting for `general.classpaths` in accumulo-site.xml.
    -
    -If Accumulo has problems later on finding jars, then run `bin/accumulo
    -classpath` to print out info about where Accumulo is finding jars.  If the
    -settings mentioned above are correct, then inspect `general.classpaths` in
    -`accumulo-site.xml`.
    +3. (Optional) The Accumulo master is configured by default to use 512MB while the garbage collector
    --- End diff --
   
    Can you word this differently so that the exact memory amounts are not specified here?  If those defaults are changed in the env file, then the person making the change would need to know to make the changes here.  I think its better to not duplicate the info here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107682883
 
    --- Diff: docs/src/main/asciidoc/chapters/administration.txt ---
    @@ -404,6 +399,51 @@ and BatchScanner passing in the name of the context, app1 in the example above.
     iterators to load classes from the locations defined by the context. Passing the context name to the scanners allows you to override the table setting
     to load only scan time iterators from a different location.
     
    +=== Vendor-specific configuration
    --- End diff --
   
    I think this info would better on the website, where it can be easily updated and appended to.
   
    We could create the web page now w/o any links to it on the website .. and a disclaimer about the info being for 2.0.0-SNAP.  Then 2.0.0-SNAP docs could point to it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107685917
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/PropertyType.java ---
    @@ -45,8 +45,8 @@
               + "Examples of invalid durations are '1w', '1h30m', '1s 200ms', 'ms', '', and 'a'.\n"
               + "Unless otherwise stated, the max value for the duration represented in milliseconds is " + Long.MAX_VALUE),
     
    -  MEMORY("memory", boundedUnits(0, Long.MAX_VALUE, false, "", "B", "K", "M", "G"),
    -      "A positive integer optionally followed by a unit of memory (whitespace disallowed), as in 2G.\n"
    +  MEMORY("memory", boundedUnits(0, Long.MAX_VALUE, false, "", "B", "K", "M", "G", "%"),
    +      "A positive integer optionally followed by a unit of memory or percentage (whitespace disallowed), as in 2G.\n"
    --- End diff --
   
    Need to describe behavior of percentage.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user mikewalch commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107690464
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/Property.java ---
    @@ -170,10 +170,9 @@
       // general properties
       GENERAL_PREFIX("general.", null, PropertyType.PREFIX,
           "Properties in this category affect the behavior of accumulo overall, but do not have to be consistent throughout a cloud."),
    -  GENERAL_CLASSPATHS(AccumuloClassLoader.CLASSPATH_PROPERTY_NAME, AccumuloClassLoader.ACCUMULO_CLASSPATH_VALUE, PropertyType.STRING,
    +  GENERAL_CLASSPATHS(AccumuloClassLoader.CLASSPATH_PROPERTY_NAME, "", PropertyType.STRING,
    --- End diff --
   
    I think it should be deprecated. I think users should avoid setting up the classpath in accumulo-site.xml.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user mikewalch commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107690518
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/Property.java ---
    @@ -253,13 +252,13 @@
           + " max number of concurrent writer when configuring. When using Hadoop 2, Accumulo will call hsync() on the WAL . For a small number of "
           + "concurrent writers, increasing this buffer size decreases the frequncy of hsync calls. For a large number of concurrent writers a small buffers "
           + "size is ok because of group commit."),
    -  TSERV_TOTAL_MUTATION_QUEUE_MAX("tserver.total.mutation.queue.max", "50M", PropertyType.MEMORY,
    +  TSERV_TOTAL_MUTATION_QUEUE_MAX("tserver.total.mutation.queue.max", "5%", PropertyType.MEMORY,
           "The amount of memory used to store write-ahead-log mutations before flushing them."),
       TSERV_TABLET_SPLIT_FINDMIDPOINT_MAXOPEN("tserver.tablet.split.midpoint.files.max", "300", PropertyType.COUNT,
           "To find a tablets split points, all index files are opened. This setting determines how many index "
               + "files can be opened at once. When there are more index files than this setting multiple passes "
               + "must be made, which is slower. However opening too many files at once can cause problems."),
    -  TSERV_WALOG_MAX_SIZE("tserver.walog.max.size", "1G", PropertyType.MEMORY,
    +  TSERV_WALOG_MAX_SIZE("tserver.walog.max.size", "33%", PropertyType.MEMORY,
    --- End diff --
   
    Ok.  I will change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107713794
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/AccumuloConfiguration.java ---
    @@ -209,13 +209,19 @@ static public long getMemoryInBytes(String str) {
             case 'B':
               multiplier = 0;
               break;
    +        case '%':
    --- End diff --
   
    I'm wondering if it would make more sense to create a new `PropertyType` instead of piggy-backing on `MEMORY`. Is it confusing to allow percent-values for some of these other properties which we really never expect users to do?
   
    For example, some properties like `table.file.blocksize` or `tserver.compaction.major.throughput` make no sense to allow a percentage of total heap.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user mikewalch commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107715725
 
    --- Diff: docs/src/main/asciidoc/chapters/administration.txt ---
    @@ -404,6 +399,51 @@ and BatchScanner passing in the name of the context, app1 in the example above.
     iterators to load classes from the locations defined by the context. Passing the context name to the scanners allows you to override the table setting
     to load only scan time iterators from a different location.
     
    +=== Vendor-specific configuration
    --- End diff --
   
    I would prefer if it stayed in the user manual as these settings could vary between minor versions of Accumulo. If vendors release new versions and these docs need to be updated,  the user manual can be updated in the Accumulo repo and pushed to the website without creating a new bugfix release.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user mikewalch commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107717380
 
    --- Diff: INSTALL.md ---
    @@ -36,62 +36,49 @@ to manage Accumulo:
     These scripts will be used in the remaining instructions to configure and run Accumulo.
     For convenience, consider adding `accumulo-X.Y.Z/bin/` to your shell's path.
     
    -## Configuring
    +## Configuring Accumulo
     
    -Accumulo has some optional native code that improves its performance and
    -stability. Before configuring Accumulo, attempt to build this native code
    -with the following command.
    +Accumulo requires running [Zookeeper][3] and [HDFS][4] instances which should be set up
    +before configuring Accumulo.
     
    -    accumulo-util build-native
    +The primary configuration files for Accumulo are `accumulo-env.sh` and `accumulo-site.xml`
    +which are located in the `conf/` directory.
     
    -If the command fails, its OK to continue with setup and resolve the issue later.
    +Follow the steps below to configure `accumulo-site.xml`:
     
    -Accumulo is configured by the files `accumulo-site.xml` and `accumulo-env.sh` in the `conf/`
    -directory. You can either edit these files for your environment or run the command below which will
    -overwrite them with files configured for your environment.
    +1. Run `accumulo-util build-native` to build native code.  If this command fails, disable
    +   native maps by setting `tserver.memory.maps.native.enabled` to `false`.
     
    -    accumulo-util create-config
    +2. Set `instance.volumes` to HDFS location where Accumulo will store data. If your namenode
    +   is running at 192.168.1.9:9000 and you want to store data in `/accumulo` in HDFS, then set
    +   `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
     
    -The script will ask you questions about your set up. Below are some suggestions:
    +3. Set `instance.zookeeper.host` to the location of your Zookeepers
     
    -* When the script asks about memory-map type, choose Native if the build native script
    -  was successful. Otherwise, choose Java.
    -* The script will prompt for memory usage. Please note that the footprints are
    -  only for the Accumulo system processes, so ample space should be left for other
    -  processes like Hadoop, Zookeeper, and the Accumulo client code.  If Accumulo
    -  worker processes are swapped out and unresponsive, they may be killed.
    +4. (Optional) Change `instance.secret` (which is used by Accumulo processes to communicate)
    +   from the default. This value should match on all servers.
     
    -While `accumulo-util create-config` creates  `accumulo-env.sh` and `accumulo-site.xml` files
    -targeted for your environment, these files still require a few more edits before starting Accumulo.
    +Follow the steps below to configure `accumulo-env.sh`:
     
    -### Secret
    +1. Set `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to help Accumulo locate Hadoop and Zookeeper jars
    +   and add them to the `CLASSPATH` variable. If you are running a vendor-specific release of Hadoop
    +   or Zookeeper, see the `Vendor-specific configuration` documentation in the Administration section
    +   of the Accumulo user manual as you may need to change how your `CLASSPATH` is built. If Accumulo
    +   has problems later on finding jars, run `accumulo classpath -d` to print Accumulo's classpath.
     
    -Accumulo coordination and worker processes can only communicate with each other
    -if they share the same secret key.  To change the secret key set
    -`instance.secret` in `accumulo-site.xml`.  Changing this secret key from
    -the default is highly recommended.
    +2. Accumulo tablet servers are configured by default to use 1GB of memory (768MB is allocated to
    +   JVM and 256MB is allocated for native maps). Native maps are allocated memory equal to 33% of
    +   the tserver JVM heap. The table below can be used if you would like to change tsever memory
    +   usage in the `JAVA_OPTS` section of `accumulo-env.sh`:
     
    -### Dependencies
    +    | Native? | 512MB             | 1GB               | 2GB                 | 3GB           |
    +    |---------|-------------------|-------------------|---------------------|---------------|
    +    | Yes     | -Xmx384m -Xms384m | -Xmx768m -Xms768m | -Xmx1536m -Xms1536m | -Xmx2g -Xms2g |
    +    | No      | -Xmx512m -Xms512m | -Xmx1g -Xms1g     | -Xmx2g -Xms2g       | -Xmx3g -Xms3g |
     
    -Accumulo requires running [Zookeeper][3] and [HDFS][4] instances.  Also, the
    -Accumulo binary distribution does not include jars for Zookeeper and Hadoop.
    -When configuring Accumulo the following information about these dependencies
    -must be provided.
    -
    - * **Location of Zookeepers** :  Provide this by setting `instance.zookeeper.host`
    -   in `accumulo-site.xml`.
    - * **Where to store data** :  Provide this by setting `instance.volumes` in
    -   `accumulo-site.xml`.  If your namenode is running at 192.168.1.9:9000
    -   and you want to store data in `/accumulo` in HDFS, then set
    -  `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
    - * **Location of Zookeeper and Hadoop jars** :  Setting `ZOOKEEPER_HOME` and
    -   `HADOOP_PREFIX` in `accumulo-env.sh` will help Accumulo find these jars
    -   when using the default setting for `general.classpaths` in accumulo-site.xml.
    -
    -If Accumulo has problems later on finding jars, then run `bin/accumulo
    -classpath` to print out info about where Accumulo is finding jars.  If the
    -settings mentioned above are correct, then inspect `general.classpaths` in
    -`accumulo-site.xml`.
    +3. (Optional) The Accumulo master is configured by default to use 512MB while the garbage collector
    --- End diff --
   
    Should be fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107720701
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/AccumuloConfiguration.java ---
    @@ -209,13 +209,19 @@ static public long getMemoryInBytes(String str) {
             case 'B':
               multiplier = 0;
               break;
    +        case '%':
    --- End diff --
   
    Could have a new type `BYTES` which has everything but the `%`.  And `MEMORY` could be what `BYTES` supports plus `%`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user mikewalch commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107721029
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/AccumuloConfiguration.java ---
    @@ -209,13 +209,19 @@ static public long getMemoryInBytes(String str) {
             case 'B':
               multiplier = 0;
               break;
    +        case '%':
    --- End diff --
   
    I would rather not create another `PropertyType`. If the defaults are fixed memory, users probably won't switch to percentage.  If you are worried about certain properties, I could add some text to the description of those properties that fixed memory amounts should be used.  Even if we limit percentage, users can still misconfigure things by setting fixed memory values that are too high or low for that property.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107721682
 
    --- Diff: docs/src/main/asciidoc/chapters/administration.txt ---
    @@ -404,6 +399,51 @@ and BatchScanner passing in the name of the context, app1 in the example above.
     iterators to load classes from the locations defined by the context. Passing the context name to the scanners allows you to override the table setting
     to load only scan time iterators from a different location.
     
    +=== Vendor-specific configuration
    --- End diff --
   
    > If vendors release new versions and these docs need to be updated, the user manual can be updated in the Accumulo repo and pushed to the website without creating a new bugfix release.
   
    That addresses part of my concern.  Still does not allow linking to the documentation from accumulo-env.sh.  However, the inability to link between different kinds of Accumulo documentation is a more general issue that should not be addresses here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
Reply | Threaded
Open this post in threaded view
|

[GitHub] accumulo pull request #235: ACCUMULO-4612 - Simplify Accumulo memory configu...

ctubbsii
In reply to this post by ctubbsii
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/235#discussion_r107727382
 
    --- Diff: core/src/main/java/org/apache/accumulo/core/conf/AccumuloConfiguration.java ---
    @@ -209,13 +209,19 @@ static public long getMemoryInBytes(String str) {
             case 'B':
               multiplier = 0;
               break;
    +        case '%':
    --- End diff --
   
    > If you are worried about certain properties, I could add some text to the description of those properties that fixed memory amounts should be used
   
    That would also assuage my fears. I am not worried about the general misconfiguration -- just users trying to set percentages where it makes no sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---
123