Like many others, I beat my head against the configurations, looking for the slightest problem. I confirmed that left and right were identical on both nodes, that the secrets were correct, that the iptables rules weren't botched in any way, and I checked rp_filter. Everything was correct, just like on the several hundred other nodes I've configured.
When initiating the tunnel from the other side, I saw an interesting error:
117 "tun2128" #149016: STATE_QUICK_I1: initiate
003 "tun2128" #149016: unable to popen up-client command
032 "tun2128" #149016: STATE_QUICK_I1: internal error
Unfortunately, the only references I could find to this error message concerned memory issues. Since this server was fine on memory (and all other active tunnels were "fine"), I turned my attention to the hordes of messages scrolling across my screen when executing ipsec whack --name tun2128 --debug-all. Eventually, I found a few more errors that meant absolutely nothing to me:
STF_INTERNAL_ERROR, STF_SUSPEND, and then finally this:
May 4 10:08:56 localhost pluto[25544]: | ******parse ISAKMP Oakley attribute:
May 4 10:08:56 localhost pluto[25544]: | af+type: OAKLEY_GROUP_DESCRIPTION
May 4 10:08:56 localhost pluto[25544]: | length/value: 5
May 4 10:08:56 localhost pluto[25544]: | [5 is OAKLEY_GROUP_MODP1536]
May 4 10:08:56 localhost pluto[25544]: | Oakley Transform 0 accepted
May 4 10:08:56 localhost pluto[25544]: | sender checking NAT-t: 0 and 0
May 4 10:08:56 localhost pluto[25544]: | 0: w->pcw_dead: 0 w->pcw_work: 0 cnt: 1
May 4 10:08:56 localhost pluto[25544]: | asking helper 0 to do build_kenonce op on seq: 135267
May 4 10:08:56 localhost pluto[25544]: | inserting event EVENT_CRYPTO_FAILED, timeout in 300 seconds for #148978
May 4 10:08:56 localhost pluto[25544]: | complete state transition with STF_SUSPEND
A lot of crazy messages, sure, but the most important string of all is "asking helper ...". It reminded me of a series of posts that I came across while debugging a ridiculous ISAKMP issue with a Cisco router. The posts (I can no longer remember what problem they were originally about) all suggested setting nhelpers=0 in /etc/ipsec.conf (under 'config setup', of course).
After making the change and calling ipsec setup --restart, all was dandy in magical OpenS/WAN world.
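For reference, the change amounts to one line. This is only a sketch of the relevant fragment: the rest of 'config setup' and all conn sections are omitted, and the comment lines are mine, not something from a shipped config.

```
# /etc/ipsec.conf (fragment; other settings omitted)
config setup
        # Start no helper processes; pluto does all crypto in the main process.
        nhelpers=0
```

Keep the option indented under 'config setup' and avoid blank lines inside the section, since a blank line ends it.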
But what does it actually mean? From the PlutoHelper file from OpenS/WAN 2.6.19:
Pluto helpers are started by pluto to do cryptographic operations.
Pluto will start n-1 of them, where n is the number of CPUs that you have
(including hypher threaded CPUs). If you have fewer than 2 CPUs, you will
always get at least one helper.
You can tell pluto never to start any helpers with the command line option
--nhelpers. A value of 0 forces pluto to do all operations in the main
process. A value of -1 tells pluto to perform the above calculation. Any
other value forces the number to that amount.
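The rule quoted above is easy to misread, so here is a rough Python sketch of it as I understand it. This is not pluto's actual code: the function name helper_count and its parameters are made up for illustration, and rejecting values below -1 is my own assumption, since the excerpt doesn't say what happens there.

```python
import os
from typing import Optional


def helper_count(nhelpers: int = -1, cpus: Optional[int] = None) -> int:
    """Return how many helpers would be started, per the PlutoHelper text.

    Hypothetical illustration only; not taken from the Openswan source.
    """
    if cpus is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        cpus = os.cpu_count() or 1
    if nhelpers < -1:
        # Assumption: the excerpt does not define values below -1.
        raise ValueError("nhelpers must be -1 or greater")
    if nhelpers == -1:
        # Automatic mode: n-1 helpers, but always at least one.
        return max(cpus - 1, 1)
    # 0 means no helpers (all work in the main process);
    # any other value is used as given.
    return nhelpers


if __name__ == "__main__":
    print(helper_count(-1, cpus=4))  # 3
    print(helper_count(-1, cpus=1))  # 1
    print(helper_count(0, cpus=8))   # 0
```

Note that on a single-CPU box the automatic setting still gives you one helper, so the only way to get none at all is to set 0 explicitly.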
Put another way, with nhelpers=0 pluto does all of its own cryptographic work in the main process instead of handing it off to helpers. So if your ipsec tunnels that use pre-shared keys start magically complaining about some obscure error that doesn't give you an answer you like, try setting nhelpers=0 and see what happens...
Another notable fact about openswan is that option defaults tend to change over time and versions. If you want an option to be set to a certain value, it is best to include it in the config rather than take your chances.