Thrift Transport Exception on Master


buttercream
This post has NOT been accepted by the mailing list yet.
Hi all,

I'm getting a transport exception on only the master node. Once it starts up, the logs fill with a stream of these exceptions:

DEBUG: Error getting transport to 127.0.0.1:9996 org.apache.thrift.transport.TTransportException

I ran netstat on the machine and saw that nothing was using port 9996. I could, however, see the connections that the master was making to port 9996 on the slave machine. I was originally running a 3-node cluster when I saw this message appearing on the master. Thinking there was an issue with the machine, I dropped the master out and switched one of the slaves to be the master. I then started seeing the same error on the new master. So I'm guessing there must be some missing or incorrect configuration on my end, although I find it strange that only the master nodes have this problem.

I tried changing the tserver client port in the accumulo-site.xml, but saw the same issue with the new port. I also tried using the tserver.port.search=true property in case there was just an issue with the ports themselves. That also did not work.
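One way to confirm which port settings a node actually picks up is to read its site file directly. Below is a minimal Python sketch, assuming the usual Hadoop-style property layout in accumulo-site.xml. The sample XML and the helper name read_site_properties are illustrative only and are not taken from the cluster discussed here.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; a real accumulo-site.xml holds many more properties.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>tserver.port.client</name>
    <value>9996</value>
  </property>
  <property>
    <name>tserver.port.search</name>
    <value>true</value>
  </property>
</configuration>"""


def read_site_properties(xml_text):
    """Return a {name: value} dict for every <property> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_site_properties(SAMPLE_SITE_XML)
for key in ("tserver.port.client", "tserver.port.search"):
    print(key, "=", props.get(key, "<not set>"))
```

Running the same helper against the file on each node makes it easy to spot a node whose settings differ from what was intended.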

Anyone seen this one before or have any ideas?

buttercream
This post has NOT been accepted by the mailing list yet.
Forgot to add: it is a connection refused exception.
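A refused connection usually means nothing is listening at the address being dialed, which matches the netstat observation. A small Python sketch of that check follows. The function name probe is hypothetical, and the host and port are simply the ones from the log line above.

```python
import socket


def probe(host, port, timeout=2.0):
    """Try a TCP connect and report 'open', 'refused', or another error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is accepting connections on that host/port.
        return "refused"
    except OSError as exc:
        # Timeouts, unreachable hosts, name resolution failures, and so on.
        return "error: %s" % exc


if __name__ == "__main__":
    print(probe("127.0.0.1", 9996))
```

Run from the master, a "refused" result for 127.0.0.1 alongside an "open" result for the slave's address would suggest the master is being pointed at the wrong host rather than at a broken port.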

buttercream
I am new to Accumulo, so you'll have to forgive some of the naive questions. When I see the master trying to connect to its own 9996, does that indicate that it thinks there should be a tablet server running on that node? If so, what do I need to configure to specify that a tablet server is not running on the master node? If that is not the case, what exactly is the master trying to do connecting to its own 9996 port?

John Vines-3
More likely than not, your tservers are registering themselves in ZooKeeper as 127.0.0.1. This will cause the master, or any other node, to think the tserver is local.

When a tserver starts, it registers its location in ZooKeeper, using its entry in the slaves file as guidance for which IP address to report. So if you update the slaves file to use actual IPs/hostnames instead of localhost, it should resolve this issue.
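The slaves-file check described above can be sketched in Python. This assumes one host per line and that lines starting with # are comments. The function name loopback_entries and the sample addresses are made up for illustration.

```python
import ipaddress


def loopback_entries(slaves_text):
    """Return slaves-file entries that name the local loopback interface."""
    flagged = []
    for line in slaves_text.splitlines():
        host = line.strip()
        if not host or host.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if host.lower() == "localhost":
            flagged.append(host)
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(host)
        except ValueError:
            pass  # a regular hostname, not an IP literal
    return flagged


# Hypothetical file contents, not the poster's actual slaves file.
sample = "# tablet servers\nlocalhost\n10.0.0.12\n127.0.0.1\n"
print(loopback_entries(sample))  # prints ['localhost', '127.0.0.1']
```

An empty result means the file itself is not the source of a 127.0.0.1 registration, which is worth knowing before looking elsewhere.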

John


On Sat, Sep 21, 2013 at 5:36 PM, buttercream <[hidden email]> wrote:
I am new to Accumulo, so you'll have to forgive some of the naive questions.
When I see the master trying to connect to its own 9996, does that indicate
that it thinks there should be a tablet server running on that node? If so,
what do I need to configure to specify that a tablet server is not running
on the master node? If that is not the case, what exactly is the master
trying to do connecting to its own 9996 port?



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Thrift-Transport-Exception-on-Master-tp5488p5490.html
Sent from the Users mailing list archive at Nabble.com.


buttercream
In reply to this post by buttercream
I went into the slaves files and verified that I am using actual IP addresses. The only file I had used localhost in was the monitors file, so I updated that, although I still see the error.

John Vines-3
How did you start the tservers and which version are you using?


On Sat, Sep 21, 2013 at 6:24 PM, buttercream <[hidden email]> wrote:
I went into the slaves files and verified that I am using actual IP
addresses. The only file I had used localhost in was the monitors file, so I
updated that, although I still see the error.




buttercream
I'm running Accumulo 1.5, and I start it with bin/start-all.sh, run from the master.

buttercream
Interesting behavior I just observed. I changed the port number on the master from 9996 to 9997 and on the one slave to 9998. When I restarted, the errors kept streaming, still reporting failed connection attempts to 127.0.0.1:9996. Thoughts on that one?

John Vines-3
One option is to check the tservers list in zookeeper with zkcli to see what the tservers are registering as. Furthermore, are any errors being reported on the monitor from the tservers?
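Once the tserver entries have been listed in ZooKeeper, they can be compared with the hosts that are supposed to be running tablet servers. The Python sketch below assumes each entry is a host:port string. The function name compare_registrations and the addresses are hypothetical.

```python
def compare_registrations(expected_hosts, registered):
    """Compare expected tserver hosts with registered 'host:port' entries.

    Returns (missing, unexpected): expected hosts with no registration,
    and registered hosts that were not expected.
    """
    seen = set()
    for entry in registered:
        # Split on the last colon so only the port is removed.
        host, _, _port = entry.rpartition(":")
        seen.add(host)
    expected = set(expected_hosts)
    return sorted(expected - seen), sorted(seen - expected)


# Hypothetical values: one expected slave, one loopback registration.
print(compare_registrations(["10.0.0.12"], ["127.0.0.1:9996"]))
# prints (['10.0.0.12'], ['127.0.0.1'])
```

A loopback address in the second list would point to the registration problem described earlier in the thread. Two empty lists mean the live registrations look as intended.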



On Sat, Sep 21, 2013 at 8:44 PM, buttercream <[hidden email]> wrote:
Interesting behavior I just observed. I changed the port number on the master
from 9996 to 9997 and on the one slave to 9998. When I restarted, the errors
kept streaming, still reporting failed connection attempts to
127.0.0.1:9996. Thoughts on that one?




buttercream
I ran zkCli and used the ls command to search around in there. I did an ls on /accumulo and saw 4 hash directories. Looking through each of them while accumulo was running, only one of them had anything in the /accumulo/hash/tservers directory. It had a single entry and it was the expected IP address of the one slave machine running on port 9998, which was the expected value from that machine's accumulo-site.xml. Is there another command I should run as well?

I checked the monitor log of the one slave and no errors are printing out. The tserver log on that machine also shows no errors.

John Vines-3
Are the hdfs-site.xml and core-site.xml files that HADOOP_HOME/HADOOP_CONF_DIR point to in your Accumulo configuration set up properly? If you have a dummy XML file there that uses file:///, you may run into problems in a distributed environment like the one you're dealing with.
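A quick way to check for the dummy-file situation is to look at the default filesystem that core-site.xml declares. The Python sketch below reads either fs.defaultFS or the older fs.default.name and flags a file: scheme. The helper names and the sample XML are illustrative, and the namenode address is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Newer and older Hadoop names for the same setting.
DEFAULT_FS_KEYS = ("fs.defaultFS", "fs.default.name")


def default_fs(core_site_xml):
    """Return the configured default filesystem URI, or None if it is not set."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in DEFAULT_FS_KEYS:
            return (prop.findtext("value") or "").strip()
    return None


def is_local_fs(uri):
    """True when the default filesystem is unset or uses the file: scheme."""
    return uri is None or urlparse(uri).scheme in ("", "file")


# Hypothetical core-site.xml contents.
sample = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example:8020</value>
  </property>
</configuration>"""

uri = default_fs(sample)
print(uri, "-> local filesystem" if is_local_fs(uri) else "-> distributed filesystem")
```

An unset value is treated as local here because Hadoop falls back to file:/// when no default filesystem is configured.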


On Sun, Sep 22, 2013 at 1:52 PM, buttercream <[hidden email]> wrote:
I ran zkCli and used the ls command to search around in there. I did an ls on
/accumulo and saw 4 hash directories. Looking through each of them while
accumulo was running, only one of them had anything in the
/accumulo/hash/tservers directory. It had a single entry and it was the
expected IP address of the one slave machine running on port 9998, which was
the expected value from that machine's accumulo-site.xml. Is there another
command I should run as well?

I checked the monitor log of the one slave and no errors are printing out.
The tserver log on that machine also shows no errors.




buttercream
I double checked the environment variables and they all appeared to be set correctly. I ended up blowing away the accumulo volume and reinitializing. It did get rid of the 127.0.0.1:9996 error; however, I do get an error on the tablet server trying to write to the wal directory. The error states:
Create failed for file: /accumulo/wal/<ipaddress>+9996/<hash>, error: Permission Denied

I have accumulo located at /opt/accumulo and I created a directory /opt/accumulo/wal and have that defined in the accumulo-site.xml. I opened up the /opt/accumulo/wal to 777 and still get the error. Almost like it isn't actually writing to that directory. Any ideas?

Eric Newton
What version of accumulo?

1.5 writes to hdfs, so you would need to make sure that the wal directory is writable in hdfs.
1.4 and earlier write to a local file system on each node.

Your message says that you have configured the wal directory to be "/opt/accumulo/wal" and yet your error message says "/accumulo/wal".
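The mismatch between the two paths can be checked mechanically. The Python sketch below pulls the path out of an error line of the shape quoted above and tests whether it sits under a given directory. The helper names are hypothetical, and the IP address and hash in the sample message are placeholders.

```python
import posixpath
import re


def reported_path(message):
    """Extract the file path from a 'Create failed for file: <path>, error: ...' line."""
    match = re.search(r"Create failed for file:\s*(\S+?),", message)
    return match.group(1) if match else None


def is_under(path, directory):
    """True when path equals directory or lies somewhere beneath it."""
    path = posixpath.normpath(path)
    directory = posixpath.normpath(directory)
    return path == directory or path.startswith(directory + "/")


# Placeholder values standing in for <ipaddress> and <hash>.
message = "Create failed for file: /accumulo/wal/10.0.0.12+9996/0123abcd, error: Permission Denied"
path = reported_path(message)
print(is_under(path, "/opt/accumulo/wal"))  # prints False
print(is_under(path, "/accumulo/wal"))      # prints True
```

Since the failing path is not under /opt/accumulo/wal, changing permissions on that local directory would not be expected to affect the write that is being denied.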

-Eric


On Tue, Sep 24, 2013 at 11:15 AM, buttercream <[hidden email]> wrote:
I double checked the environment variables and they all appeared to be set
correctly. I ended up blowing away the accumulo volume and reinitializing.
It did get rid of the 127.0.0.1:9996 error; however, I do get an error on
the tablet server trying to write to the wal directory. The error states:
Create failed for file: /accumulo/wal/<ipaddress>+9996/<hash>, error:
Permission Denied

I have accumulo located at /opt/accumulo and I created a directory
/opt/accumulo/wal and have that defined in the accumulo-site.xml. I opened
up the /opt/accumulo/wal to 777 and still get the error. Almost like it
isn't actually writing to that directory. Any ideas?




John Vines-3
In reply to this post by buttercream
What version of accumulo?

