HDFS disk usage grows too much

HDFS disk usage grows too much

Anton Puzanov
Hi all,
 
I have a large cluster with 10 Data Nodes and 2 writers.
Currently HDFS disk usage is at 60%. I am ingesting key-value pairs at a rate of ~7.5M.
Each Data Node has ~10 TB of total disk capacity.
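For reference, the raw numbers above work out roughly as follows (illustration only; the 3x replication factor is the HDFS default and assumed here, as is treating the full ~10 TB per node as usable):

// Back-of-the-envelope capacity math for the cluster described above.
// Assumes the HDFS default replication factor of 3.
public class CapacityEstimate {
    public static void main(String[] args) {
        double perNodeTb = 10.0;      // ~10 TB of disk per Data Node
        int dataNodes = 10;           // 10 Data Nodes
        double replication = 3.0;     // HDFS default replication factor

        double rawTb = perNodeTb * dataNodes;       // 100 TB raw capacity
        double usedRawTb = rawTb * 0.60;            // 60% used -> 60 TB raw
        double logicalTb = usedRawTb / replication; // ~20 TB of pre-replication data

        System.out.printf("raw=%.0f TB, used=%.0f TB, logical=~%.0f TB%n",
                rawTb, usedRawTb, logicalTb);
    }
}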
I observed very strange disk usage behavior, shown in the attached Grafana screenshot:

[Grafana screenshot: HDFS disk usage over time]
As you can see, disk usage increased very quickly. This increase cannot be explained by the ingested data alone, since previous runs of the writers wrote only a few hundred gigabytes per day!
 
At the peak of the disk usage increase, several tablet servers failed and the system froze (CPU, disk, network...). A screenshot of the CPU usage is attached:

[Grafana screenshot: CPU usage]
The GC configuration was not changed for this run, so the Accumulo GC should run every 5 minutes; the Accumulo GC heap is 8192 MB.
Possibly relevant configurations:
      "tserver.wal.blocksize": "1G",
      "tserver.walog.max.size": "2G",
      "tserver.memory.maps.max": "4G",
      "tserver.compaction.minor.concurrent.max": "50",
      "tserver.compaction.major.concurrent.max": "20",
 
My question is whether this increase in disk consumption is normal. Should I always keep disk usage below 50%?
What can cause such failures, and how can they be avoided?
 
Thanks,
Anton P.


Re: HDFS disk usage grows too much

Michael Wall
Hi Anton,

What is your interval for ingesting ~7.5M? Are you writing mutations or bulk ingesting? Assuming you are writing mutations, those write-ahead logs will all be replicated 3 times by default in HDFS, so ~22.5M. Then the data gets flushed to disk at some point, also replicated 3 times by default, for another ~22.5M.
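To make that concrete, here is a rough sketch of the write amplification described above. The 100-byte average entry size is a guess purely for illustration, and this ignores compression and the extra copies made by compactions; the write-ahead logs are eventually reclaimed by the Accumulo GC, so this is a peak footprint rather than a permanent one.

// Rough write-amplification estimate for one batch of ~7.5M mutations:
// the data lands in write-ahead logs (replicated 3x by HDFS) and is later
// flushed to RFiles (also replicated 3x). Entry size is assumed.
public class WriteAmplification {
    public static void main(String[] args) {
        long entries = 7_500_000L;   // ~7.5M key-value pairs per ingest interval
        long avgEntryBytes = 100L;   // assumed average key-value pair size
        int replication = 3;         // HDFS default replication factor

        long logical = entries * avgEntryBytes;
        long walBytes = logical * replication;    // copies in the write-ahead logs
        long rfileBytes = logical * replication;  // copies in the flushed RFiles

        System.out.printf("logical=%d MB, WAL=%d MB, RFiles=%d MB, peak total=%d MB%n",
                logical >> 20, walBytes >> 20, rfileBytes >> 20, (walBytes + rfileBytes) >> 20);
    }
}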

So a couple more questions.

1 - Do you have HDFS Trash enabled? While I consider that a best practice, it will keep the data around longer.
2 - Is all the HDFS storage in /accumulo?
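If it helps, both can be checked from code as well. This sketch uses the Hadoop FileSystem API; it assumes the cluster's client configuration is on the classpath and that /accumulo is the Accumulo volume (the same numbers are reported by the hdfs command-line tools).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUsageCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // 1 - Trash is enabled when fs.trash.interval > 0 (minutes deleted files are kept).
        System.out.println("fs.trash.interval = " + conf.get("fs.trash.interval", "0"));

        // 2 - How much of HDFS sits under /accumulo, before and after replication.
        ContentSummary cs = fs.getContentSummary(new Path("/accumulo"));
        System.out.println("/accumulo logical bytes: " + cs.getLength());
        System.out.println("/accumulo raw bytes (incl. replication): " + cs.getSpaceConsumed());
    }
}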

Mike



Attachments:
Image.1499602654658.png (511K)
Image.1499602667180.png (135K)