Kerberos ticket renewal

James Srinivasan
I'm using Accumulo 1.7.0 and finding that after some period of time
(>8 hours, <3 days - happened over the weekend) my ingest fails with
errors regarding "Failed to find any Kerberos tgt". My guess is that
the ticket from the keytab has expired, and needs to be renewed - from
memory, I had seen a Kerberos tgt renewer thread running in my client,
so assumed it happened automagically. Is that the case? Perhaps I am
hitting this bug? https://issues.apache.org/jira/browse/ACCUMULO-4069

Thanks,

James

Re: Kerberos ticket renewal

Christopher Tubbs-2
It certainly sounds like the same issue. I'd recommend upgrading to 1.7.3 (currently the latest 1.7 release) to pick up the fixes for all the bugs we've found in that release line.

On Mon, Jul 10, 2017 at 5:50 AM James Srinivasan <[hidden email]> wrote:
> I'm using Accumulo 1.7.0 and finding that after some period of time
> (>8 hours, <3 days - happened over the weekend) my ingest fails with
> errors regarding "Failed to find any Kerberos tgt". My guess is that
> the ticket from the keytab has expired, and needs to be renewed - from
> memory, I had seen a Kerberos tgt renewer thread running in my client,
> so assumed it happened automagically. Is that the case? Perhaps I am
> hitting this bug? https://issues.apache.org/jira/browse/ACCUMULO-4069
>
> Thanks,
>
> James

Re: Kerberos ticket renewal

Josh Elser
No, you are (likely) not running into ACCUMULO-4069. What you've
described sounds like your client's ticket expired. Accumulo does not
spawn any ticket-renewal thread on behalf of clients.

Hadoop's UGI code will automatically spawn a renewal thread when you
log in using a ticket cache. This does not happen automatically when
you use a keytab (I have no explanation as to why this is). This is
the most likely cause of your error and something you need to correct
in your application (spawn a thread to renew your application's
ticket).

If you are using MapReduce, you have yet another layer of indirection
with DelegationTokens, but that's probably not what you're seeing (as
DelegationTokens don't actually have a Kerberos TGT).
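
For illustration, a minimal sketch of such a renewal thread follows. It is not code from Accumulo or Hadoop: the class name KeytabReloginSketch and its Relogin interface are hypothetical, and the re-login step is kept abstract so the sketch compiles without Hadoop on the classpath. In a real client the action would presumably wrap Hadoop's UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(), after an initial UserGroupInformation.loginUserFromKeytab(...) call.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: periodically runs a re-login action on a daemon thread. */
class KeytabReloginSketch implements AutoCloseable {

  /** The re-login step, kept abstract so this sketch compiles without Hadoop. */
  interface Relogin {
    void run() throws Exception;
  }

  private final ScheduledExecutorService scheduler;

  KeytabReloginSketch(Relogin relogin, long period, TimeUnit unit) {
    ThreadFactory daemon = r -> {
      Thread t = new Thread(r, "kerberos-relogin");
      t.setDaemon(true); // do not keep the JVM alive just for renewals
      return t;
    };
    scheduler = Executors.newSingleThreadScheduledExecutor(daemon);
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        relogin.run();
      } catch (Exception e) {
        // Report and carry on: an uncaught exception would cancel all future runs.
        System.err.println("Kerberos re-login attempt failed: " + e);
      }
    }, 0, period, unit);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) throws Exception {
    // In a real client the action would be something like (assumption, needs Hadoop):
    //   () -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()
    try (KeytabReloginSketch renewer =
        new KeytabReloginSketch(() -> System.out.println("checking TGT"), 1, TimeUnit.SECONDS)) {
      Thread.sleep(2500); // stand-in for the application's real work
    }
  }
}
```

The period only controls how often the check runs. Whether a re-login actually happens is left to the action itself, so a fairly short period (minutes rather than hours) should be harmless.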

On Mon, Jul 10, 2017 at 5:42 PM, Christopher <[hidden email]> wrote:

> It certainly sounds like the same issue. I'd recommend upgrading to the
> latest 1.7.3 (currently the latest 1.7 version) to include all the bugs
> we've found and fixed in that release line.

Re: Kerberos ticket renewal

James Srinivasan
Thanks both. I can't (easily) upgrade beyond 1.7.0, but have raised a
support case with our Hadoop distribution vendor.

I'm not (yet) worried about expiration with MapReduce - for now I'll
try to keep such jobs to under 24h! Outside MR, sounds like I just
need to periodically call
UserGroupInformation.checkTGTAndReloginFromKeytab like

https://github.com/apache/accumulo/blob/master/server/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java#L121

Or is the TGT associated with an Accumulo KerberosToken separate?

Thanks,

James

On 11 July 2017 at 02:59, Josh Elser <[hidden email]> wrote:

> No, you are (likely) not running into ACCUMULO-4069. What you've
> described sounds like your client's ticket expired. Accumulo does not
> spawn any ticket renewal on the behalf of clients.
>
> Hadoop's UGI code will automatically spawn a renewal thread when you
> log in using a ticket cache. This does not happen automatically when
> you use a keytab (I have no explanation as to why this is). This is
> the most likely cause of your error and something you need to correct
> in your application (spawn a thread to renew your application's
> ticket).
>
> If you are using MapReduce, you have yet another layer of indirection
> with DelegationTokens, but that's probably not what you're seeing (as
> DelegationTokens don't actually have a Kerberos TGT).

Re: Kerberos ticket renewal

Josh Elser
Nope, you've got it exactly right! That's the code I would've pointed
you at to copy :)

If/when you do get to long-running MR jobs, see the
"general.delegation.token.*" configuration properties in this table[1].
I think the docs are citing that one delegation token is valid for 7
days, but it's been a long time since writing/testing that code.

- Josh

[1]
https://accumulo.apache.org/1.8/accumulo_user_manual.html#_server_configuration_2

On 7/11/17 1:25 AM, James Srinivasan wrote:

> Thanks both. I can't (easily) upgrade beyond 1.7.0, but have raised a
> support case with our Hadoop distribution vendor.
>
> I'm not (yet) worried about expiration with MapReduce - for now I'll
> try to keep such jobs to under 24h! Outside MR, sounds like I just
> need to periodically call
> UserGroupInformation.checkTGTAndReloginFromKeytab like
>
> https://github.com/apache/accumulo/blob/master/server/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java#L121
>
> Or is the TGT associated with an Accumulo KerberosToken separate?
>
> Thanks,
>
> James

Re: Kerberos ticket renewal

James Srinivasan
Hi,

So I've fired off a thread to perform the periodic
checkTGTAndReloginFromKeytab call which seems to be running, but the
connection still fails with GSS errors after precisely 10 hours.

While I am running 1.7.0, it seems the vendor included the
ACCUMULO-4069 patch, and immediately after the exception is thrown I
see a log entry "Performing ticket-cache-based Kerberos re-login".
However, it should be using a keytab - have turned up the logging to
11 and will leave running overnight...

James

On 11 July 2017 at 16:17, Josh Elser <[hidden email]> wrote:

> Nope, you've got it exactly right! That's the code I would've pointed you at
> to copy :)
>
> If/when you do get to long-running MR jobs, see the
> "general.delegation.token.*" configuration properties in this table[1]. I
> think the docs are citing that one delegation token is valid for 7 days, but
> it's been a long time since writing/testing that code.
>
> - Josh
>
> [1]
> https://accumulo.apache.org/1.8/accumulo_user_manual.html#_server_configuration_2

Re: Kerberos ticket renewal

Sean Busbey
Hi James!

It sounds like you may need to chase things down with your vendor,
since the precise combination of patches included will make looking at
things hard for the community.

On Wed, Jul 12, 2017 at 11:01 AM, James Srinivasan
<[hidden email]> wrote:

> Hi,
>
> So I've fired off a thread to perform the periodic
> checkTGTAndReloginFromKeytab call which seems to be running, but the
> connection still fails with GSS errors after precisely 10 hours.
>
> While I am running 1.7.0, it seems the vendor included the
> ACCUMULO-4069 patch, and immediately after the exception is thrown I
> see a log entry "Performing ticket-cache-based Kerberos re-login".
> However, it should be using a keytab - have turned up the logging to
> 11 and will leave running overnight...
>
> James

--
busbey

Re: Kerberos ticket renewal

James Srinivasan
Yup, I'm going to spin up a vanilla 1.7.0 (maybe newer) install too to
see if it behaves any differently. There is at least one patch
included in their distro that isn't in the formal documentation, plus
it makes matching line numbers in logs to src code rather difficult.

Thanks,

James

On 12 July 2017 at 20:37, Sean Busbey <[hidden email]> wrote:

> Hi James!
>
> It sounds like you may need to chase things down with your vendor,
> since the precise combination of patches included will make looking at
> things hard for the community.

Re: Kerberos ticket renewal

Sean Busbey
FWIW, most vendors also publish the source code for their patched
releases. That would at least help with reading the source code.

On Wed, Jul 12, 2017 at 2:48 PM, James Srinivasan
<[hidden email]> wrote:

> Yup, I'm going to spin up a vanilla 1.7.0 (maybe newer) install too to
> see if it behaves any differently. There is at least one patch
> included in their distro that isn't in the formal documentation, plus
> it makes matching line numbers in logs to src code rather difficult.
>
> Thanks,
>
> James

--
busbey

Re: Kerberos ticket renewal

Josh Elser-2
If you're using Hortonworks' HDP, you would probably benefit from
https://github.com/hortonworks/accumulo

There is likely a git-tag for the exact version that you're running. The
line numbers would match there.

To be clear, if your services (e.g. TabletServers) aren't failing after
10hrs, you're not running into ACCUMULO-4069. Given my (limited)
understanding, your problem is purely client-side. It's possible that
the client-side RPC implementation isn't correctly handling the ticket
re-login, but I know there is specifically code in there to handle the
re-login case.

The next step would be getting some debug logging from your application
around UserGroupInformation or the JDK itself, or just spin up a trivial
example with a small relogin window to reproduce the problem.
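
As a rough sketch of the "JDK itself" part of that suggestion, the snippet below turns on the JDK's own Kerberos and GSS debug output from code. It assumes an Oracle/OpenJDK runtime, where the sun.security.krb5.debug and sun.security.jgss.debug system properties are honoured; the class name KerberosDebugSketch is hypothetical. For the Hadoop side, raising the log level of the org.apache.hadoop.security.UserGroupInformation logger to DEBUG in the client's logging configuration is the likely counterpart.

```java
/** Hypothetical helper that switches on the JDK's Kerberos/GSS debug output. */
class KerberosDebugSketch {

  static void enableJdkKerberosDebug() {
    // Should run before the JDK's Kerberos classes are first used; equivalent to
    // passing -Dsun.security.krb5.debug=true -Dsun.security.jgss.debug=true to the JVM.
    System.setProperty("sun.security.krb5.debug", "true");
    System.setProperty("sun.security.jgss.debug", "true");
  }

  public static void main(String[] args) {
    enableJdkKerberosDebug();
    System.out.println("krb5 debug = " + System.getProperty("sun.security.krb5.debug"));
  }
}
```

Setting the properties on the JVM command line is usually the safer option, since it does not depend on class-loading order.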

On 7/12/17 3:48 PM, James Srinivasan wrote:

> Yup, I'm going to spin up a vanilla 1.7.0 (maybe newer) install too to
> see if it behaves any differently. There is at least one patch
> included in their distro that isn't in the formal documentation, plus
> it makes matching line numbers in logs to src code rather difficult.
>
> Thanks,
>
> James

Re: Kerberos ticket renewal

James Srinivasan
Yup, I am indeed on HDP - thanks for the link. The services do log GSS
exceptions every ten hours, but seem to sufficiently recover
themselves. Having turned up logging on my client:

1) On client start, I see hadoop login messages
2) After 8 hours (0.8*10 hours) when the renewal is expected to take
place, I don't see any hadoop login messages
3) After 10 hours, I see GSS exceptions
4) After each GSS exception, I see an attempt to renew but using
ticket cache, rather than keytab.

Currently working on shortening the 10 hour expiry time so I can catch
it in a debugger!
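
The timing in step 2 can be written out as a small calculation. This is only a sketch of the arithmetic, assuming the renewal point sits at 80% of the ticket lifetime as described above; the class name RenewWindowSketch and the timestamps are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: when a renewal is expected for a given ticket lifetime. */
class RenewWindowSketch {

  /** Returns the instant 80% of the way from ticket start to ticket end. */
  static Instant refreshTime(Instant start, Instant end) {
    long lifetimeMillis = Duration.between(start, end).toMillis();
    return start.plusMillis(lifetimeMillis * 8 / 10); // integer maths avoids rounding drift
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2017-07-13T08:00:00Z"); // arbitrary example start
    Instant end = start.plus(Duration.ofHours(10));         // 10 hour ticket lifetime
    System.out.println(refreshTime(start, end));            // prints 2017-07-13T16:00:00Z
  }
}
```

With a shortened ticket lifetime, say 10 minutes, the same formula puts the expected renewal at the 8 minute mark, which is a more practical window to watch in a debugger.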

Thanks,

James


On 13 July 2017 at 15:20, Josh Elser <[hidden email]> wrote:

> If you're using Hortonworks' HDP, you would probably benefit from
> https://github.com/hortonworks/accumulo
>
> There is likely a git-tag for the exact version that you're running. The
> line numbers would match there.
>
> To be clear, if your services (e.g. TabletServers) aren't failing after
> 10hrs, you're not running into ACCUMULO-4069. Given my (limited)
> understanding, your problem is purely client-side. It's possible that the
> client-side RPC implementation isn't correctly handling the ticket re-login,
> but I know there is specifically code in there to handle the re-login case.
>
> The next step would be getting some debug logging from your application
> around UserGroupInformation or the JDK itself, or just spin up a trivial
> example with a small relogin window to reproduce the problem.
>

Re: Kerberos ticket renewal

Josh Elser-2
It may also be worth checking the configuration of the principal that
you're using in your client. Depending on which principal it is and
how it was created, it may not actually support renewals.

A quick test is to just `kinit` and then `kinit -R`. You can view the
explicit "configuration" for a principal using the `kadmin` console and
the `getprinc <principal>` command. Be sure to check the krbtgt/<REALM>
principal as well:

e.g.

kadmin.local:  getprinc jelser
Principal: [hidden email]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 7 days 00:00:00

kadmin.local:  getprinc krbtgt/EXAMPLE.COM
Principal: krbtgt/[hidden email]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 7 days 00:00:00

If the krbtgt/<REALM> principal has a renewable lifetime of zero,
tickets for any other principal in that realm cannot be renewed
either. Since you have working "service" principals, you can
cross-check against those.
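
If this check has to be repeated for several principals, the `getprinc` output can be scanned mechanically. The following is only a sketch in Java: the class and method names are hypothetical, the pattern is modelled on the two sample lines shown above, and it assumes a non-renewable principal is reported as `0 days 00:00:00`.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RenewableLifeCheck {
    // Matches lines such as "Maximum renewable life: 7 days 00:00:00".
    private static final Pattern RENEWABLE = Pattern.compile(
        "Maximum renewable life:\\s*(\\d+) days? (\\d{2}):(\\d{2}):(\\d{2})");

    // Returns the parsed renewable lifetime, or empty if the field is absent.
    static Optional<Duration> maxRenewableLife(String getprincOutput) {
        Matcher m = RENEWABLE.matcher(getprincOutput);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(Duration.ofDays(Long.parseLong(m.group(1)))
            .plusHours(Long.parseLong(m.group(2)))
            .plusMinutes(Long.parseLong(m.group(3)))
            .plusSeconds(Long.parseLong(m.group(4))));
    }

    // True only when the field is present and non-zero.
    static boolean isRenewable(String getprincOutput) {
        return maxRenewableLife(getprincOutput).map(d -> !d.isZero()).orElse(false);
    }

    public static void main(String[] args) {
        String sample = "Maximum ticket life: 1 day 00:00:00\n"
            + "Maximum renewable life: 7 days 00:00:00\n";
        System.out.println(isRenewable(sample)); // prints true
    }
}
```

Run it against the output for both the client principal and the krbtgt principal, since either one having a zero renewable lifetime is enough to block renewal.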

On 7/13/17 10:56 AM, James Srinivasan wrote:

> Yup, I am indeed on HDP - thanks for the link. The services do log GSS
> exceptions every ten hours, but seem to sufficiently recover
> themselves. Having turned up logging on my client:
>
> 1) On client start, I see hadoop login messages
> 2) After 8 hours (0.8*10 hours) when the renewal is expected to take
> place, I don't see any hadoop login messages
> 3) After 10 hours, I see GSS exceptions
> 4) After each GSS exception, I see an attempt to renew but using
> ticket cache, rather than keytab.
>
> Currently working on shortening the 10 hour expiry time so I can catch
> it in a debugger!
>
> Thanks,
>
> James
>
>

Re: Kerberos ticket renewal

James Srinivasan
Thanks, just checked that and it does seem renewable (tested using
kinit -R). I'm running my code in two separate scenarios:

1) As part of a NiFi processor, which currently makes multiple
Accumulo connections using the same keytab, each of which currently
has a separate renewer thread
2) As part of a simple command line application - this seems to have
no problem running for > 10 hours (even before I added the periodic
renewal code)

Will add extra logging to #2 and try to shorten the expiry from 10
hours to 1 so I can see any difference in output.
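
As an aside on scenario #1: since every connection shares the same keytab login, one process-wide renewal task could replace the per-connection threads. A minimal sketch in Java, using only the JDK. `SharedRenewer` and `startOnce` are hypothetical names, and the `relogin` action is a placeholder for whatever call the application uses (for example a wrapper around Hadoop's `checkTGTAndReloginFromKeytab`, which throws `IOException` and would need wrapping).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class SharedRenewer {
    private static final AtomicBoolean STARTED = new AtomicBoolean(false);
    // Kept in a static field so the scheduler stays reachable for the life of the process.
    private static volatile ScheduledExecutorService scheduler;

    // Starts the shared renewal task on the first call; later calls do nothing.
    // Returns true only for the call that actually started the task.
    static boolean startOnce(Runnable relogin, long period, TimeUnit unit) {
        if (!STARTED.compareAndSet(false, true)) {
            return false;
        }
        ScheduledExecutorService s = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "kerberos-relogin");
            t.setDaemon(true); // do not keep the JVM alive just for renewals
            return t;
        });
        s.scheduleWithFixedDelay(() -> {
            try {
                relogin.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and carry on.
                System.err.println("relogin attempt failed: " + e);
            }
        }, 0, period, unit);
        scheduler = s;
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        startOnce(() -> System.out.println("relogin check"), 1, TimeUnit.HOURS);
        startOnce(() -> System.out.println("never scheduled"), 1, TimeUnit.HOURS);
        Thread.sleep(200); // give the daemon thread time for its first run
    }
}
```

Each connection would call `startOnce` when it is created. Only the first call schedules anything, so there is a single place to log from when comparing the NiFi and command-line runs.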

James

On 13 July 2017 at 16:05, Josh Elser <[hidden email]> wrote:

> It also may be worth mentioning to check the principal's configuration that
> you're using in your client. Depending on which you're using and how it was
> created, it may not actually support renewals.
>
> A quick test is to just `kinit` and then `kinit -R`. You can view the
> explicit "configuration" for a principal using the `kadmin` console and the
> `getprinc <principal>` command. Be sure to check the krbtgt/<REALM>
> principal as well:
>
> e.g.
>
> kadmin.local:  getprinc jelser
> Principal: [hidden email]
> Maximum ticket life: 1 day 00:00:00
> Maximum renewable life: 7 days 00:00:00
>
> kadmin.local:  getprinc krbtgt/EXAMPLE.COM
> Principal: krbtgt/[hidden email]
> Maximum ticket life: 1 day 00:00:00
> Maximum renewable life: 7 days 00:00:00
>
> If the krbtgt/$REALM principal does not have a non-zero renewable lifetime,
> any other principals created in that realm would also not be allowed to be
> renewed. Since you have the working "service" principals, you can
> cross-check those.
>

Re: Kerberos ticket renewal

Josh Elser-2
Aha! That's an interesting wrinkle :)

I have more experience with NiFi's use of Kerberos than I care to admit
(thanks to some folks who work in the same physical office as I do); I'm
not aware of anything that NiFi does which would cause problems, but that
may be a relevant detail.

After I thought about it some more (to your #2 point): there's a little
failsafe in the Accumulo client implementation that, upon a SASL
authentication failure, attempts a relogin via Kerberos. This
should "catch" the cases where your client application is using a ticket
cache (the ticket cache lives in a conventional location, so the jGSS
client library in Java can do the relogin itself, whereas Java doesn't
know which keytab to use). Still, a thread like the one you describe in
#1 should have an equivalent net effect.
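
The shape of that failsafe can be sketched roughly as follows. This is not Accumulo's actual code: `ReloginRetry` and the `Login` interface are hypothetical stand-ins whose method names mirror the keytab and ticket-cache relogin operations on Hadoop's UGI, and a real client would catch only the SASL failure instead of every `RuntimeException`.

```java
import java.util.function.Supplier;

final class ReloginRetry {
    // Hypothetical stand-in for the login state an application holds.
    interface Login {
        boolean isFromKeytab();
        void reloginFromKeytab();
        void reloginFromTicketCache();
    }

    // Runs the call; on failure, re-logs in from the matching credential
    // source and retries exactly once.
    static <T> T callWithRelogin(Login login, Supplier<T> rpc) {
        try {
            return rpc.get();
        } catch (RuntimeException authFailure) {
            if (login.isFromKeytab()) {
                login.reloginFromKeytab();
            } else {
                login.reloginFromTicketCache();
            }
            return rpc.get(); // a second failure propagates to the caller
        }
    }

    public static void main(String[] args) {
        Login ticketCacheLogin = new Login() {
            public boolean isFromKeytab() { return false; }
            public void reloginFromKeytab() { System.out.println("keytab relogin"); }
            public void reloginFromTicketCache() { System.out.println("ticket cache relogin"); }
        };
        int[] attempts = {0};
        String result = callWithRelogin(ticketCacheLogin, () -> {
            if (attempts[0]++ == 0) {
                throw new IllegalStateException("simulated GSS failure");
            }
            return "ok";
        });
        System.out.println(result);
    }
}
```

The branch on `isFromKeytab()` is the part that matters here: if the login is not recognised as keytab-based, the retry path falls through to the ticket cache, which matches the "ticket-cache-based Kerberos re-login" log line reported earlier in the thread.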

On 7/13/17 11:45 AM, James Srinivasan wrote:

> Thanks, just checked that and it does seem renewable (tested using
> kinit -R). I'm running my code in two separate scenarios:
>
> 1) As part of a NiFi processor, which currently makes multiple
> Accumulo connections using the same keytab, each of which currently
> has a separate renewer thread
> 2) As part of a simple command line application - this seems to have
> no problem running for > 10 hours (even before I added the periodic
> renewal code)
>
> Will add extra logging to #2 and try to shorten the expiry from 10
> hours to 1 so I can see any difference in output.
>
> James
>

Re: Kerberos ticket renewal

James Srinivasan
So when my code runs in a NiFi processor, the initial keytab
authentication works fine, but after that UGI seems to think a keytab
isn't in use (UserGroupInformation.getCurrentUser.isFromKeytab is
false). That explains why the renewal code never actually runs and
why re-login is attempted using the ticket cache after the GSS
exception. Over to the NiFi list I think...
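
One way to make this failure mode visible sooner is to have the periodic task report when it skips the relogin rather than doing nothing silently. A minimal sketch in Java: `KeytabRenewalTask` and its `Outcome` enum are hypothetical, and the two constructor arguments are placeholders for the application's own "is this login keytab-based?" check and keytab relogin call.

```java
import java.util.function.BooleanSupplier;

final class KeytabRenewalTask {
    enum Outcome { RELOGIN_ATTEMPTED, SKIPPED_NOT_KEYTAB }

    private final BooleanSupplier fromKeytab; // placeholder for an isFromKeytab-style check
    private final Runnable keytabRelogin;     // placeholder for the keytab relogin call

    KeytabRenewalTask(BooleanSupplier fromKeytab, Runnable keytabRelogin) {
        this.fromKeytab = fromKeytab;
        this.keytabRelogin = keytabRelogin;
    }

    // One pass of the periodic task: warn instead of silently skipping.
    Outcome runOnce() {
        if (!fromKeytab.getAsBoolean()) {
            System.err.println("WARN: current login is not keytab-based; relogin skipped");
            return Outcome.SKIPPED_NOT_KEYTAB;
        }
        keytabRelogin.run();
        return Outcome.RELOGIN_ATTEMPTED;
    }

    public static void main(String[] args) {
        // Simulates the NiFi case described above: the check reports false.
        KeytabRenewalTask task = new KeytabRenewalTask(() -> false, () -> { });
        System.out.println(task.runOnce()); // prints SKIPPED_NOT_KEYTAB
    }
}
```

With a warning like this in the logs, the problem would show up at the first scheduled pass instead of when the ticket expires 10 hours later.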

Making some progress!

On 13 July 2017 at 18:28, Josh Elser <[hidden email]> wrote:

> Aha! That's an interesting wrinkle :)
>
> I have more experience with NiFi's use of Kerberos than I care to admit (due
> to some folks who work in the physical office I do); I'm not aware of
> anything that NiFi does which would cause problems, but that may be a
> relevant detail.
>
> After I thought about it some more (to your #2 point): there's a little
> failsafe in the Accumulo client implementation that, upon a SASL
> authentication failure, attempts a relogin via Kerberos. This should
> "catch" the cases where your client application is using a ticket cache
> (because the convention on the ticket cache location lets the jGSS client
> library in Java itself do the relogin, whereas Java doesn't know which keytab
> to use). Still, a thread as you describe in #1 should have an
> equivalent net effect.

Re: Kerberos ticket renewal

James Srinivasan
Hmm, so it seems updating the Hadoop version used by my processor from
2.6.0 to 2.7.3 has fixed the problem. Testing a little more just to
make sure...

On 14 July 2017 at 13:39, James Srinivasan <[hidden email]> wrote:

> So when my code runs in a NiFi processor, the initial keytab
> authentication works fine but following that it seems to think keytabs
> aren't in use (UserGroupInformation.getCurrentUser.isFromKeytab is
> false), which explains why the renewal code never actually runs and
> why re-login is attempted using the ticket cache after the GSS
> exception. Over to the NiFi list I think...
>
> Making some progress!

Re: Kerberos ticket renewal

Josh Elser-2
Hah! Maybe a bug in the Hadoop client code then :)

Thanks for taking the time to post all of your findings to the list.
This will be a very good thread for future people to refer to.

On 7/14/17 3:56 PM, James Srinivasan wrote:

> Hmm, so it seems updating the Hadoop version used by my processor from
> 2.6.0 to 2.7.3 has fixed the problem. Testing a little more just to
> make sure...