SOLVED Cluster issue: Inbound calls to wrong server

Status
Not open for further replies.

SilkBC

Member
Nov 24, 2017
51
3
8
50
Hello,

I am playing around a bit with my two-node cluster and have actually started noticing that my inbound calls are coming in on my secondary PBX. If I reboot or disconnect the networking from the secondary PBX, then incoming calls go to my primary, but within a few minutes inbound calls start going into the secondary again.

I did recently upgrade the cluster from 4.2 to 4.4 and everything seemed to work fine (and I am quite certain incoming calls were coming in to the primary PBX no problem before), but I am puzzled why this would be the case.

I did not change anything with my SRV records; my primary PBX still has the lower priority number ("10", secondary has priority "20").
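One way to double-check what DNS actually publishes is to query the SRV record and see which target has the lowest priority number. This is a minimal sketch, assuming `dig` is installed; the record name `_sip._udp.pbx.example.com` is a placeholder rather than anything from this thread, and `preferred_srv_target` is a hypothetical helper name. It ignores SRV weights, which only matter when priorities are equal.

```shell
#!/bin/sh
# Reads `dig +short SRV` answers on stdin, one per line, in the form
#   <priority> <weight> <port> <target>
# and prints the target with the lowest priority number.
preferred_srv_target() {
    sort -n -k1,1 | head -n 1 | awk '{print $4}'
}

# Example (placeholder record name):
#   dig +short SRV _sip._udp.pbx.example.com | preferred_srv_target
```

This only shows what the zone publishes; it says nothing about whether the sending side honours the priorities.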

Thoughts on why this would be the case?

Thanks! :)
 

DigitalDaz

Administrator
Staff member
Sep 29, 2016
3,070
577
113
I have now switched to Route53 failover of A record instead.

But this should absolutely not be a PBX issue. If the carrier is not sending them to the primary then they need to tell you why. I doubt very much the call is being rejected at the PBX, but if it is, it should be in the logs.
 

SilkBC

Member
Nov 24, 2017
51
3
8
50
If the carrier is not sending them to the primary then they need to tell you why. I doubt very much the call is being rejected at the PBX, but if it is, it should be in the logs.

I spun up a brand new 2-node cluster today and the exact same behaviour is happening. With fs_cli running on both nodes, there is no sign of any attempt to deliver the call to the primary node; it lands on the secondary node instead. On the primary node, if I go into "Status > SIP Status" and restart the external profile, calls start coming in to the primary node again, but as before, shortly thereafter, calls start going to the secondary node instead.

My testing SIP trunk is through VoIP.ms. Registration is via username and password. Since this info is kept in the database, I imagine it is getting synced over and causing the conflict. Is it recommended in the case of a 2-node cluster to register the SIP trunk by IP authentication instead of username/password? I think VoIP.ms has an either/or option.
 

SilkBC

Member
Nov 24, 2017
51
3
8
50
OK, after playing with this a bit more, I *think* I stumbled upon the solution: I need to have the gateway stopped on the secondary node, then it will stop trying to reg. I didn't realise you could have the gateway stopped on one node while still running on the other (I thought maybe the status was inherited via database sync)

Anyway, I have stopped the gateway on node 2 and node 1 gateway is showing "REGD", so I will keep an eye on it and see if the gateway on node 2 starts up on its own.

A manual failover step would be to start the gateway on node 2. Alternatively, there could possibly be a script that runs every minute, pings node 1 (maybe sends 10 pings) and, if node 1 is completely down (100% packet loss), starts the gateway(s) from the CLI; then, when node 1 comes back up, it fires off a CLI command to stop the gateway(s)...?
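The ping idea could be sketched roughly as follows. This is only an illustration under stated assumptions: it runs on node 2 from cron, `PRIMARY_HOST` and `STATE_FILE` are placeholder values, the function names are hypothetical, and instead of starting and stopping individual gateways it toggles `sofia global standby on|off`, the command suggested further down in this thread.

```shell
#!/bin/sh
# Sketch of a watchdog for node 2, meant to be run from cron every minute.
# PRIMARY_HOST and STATE_FILE are placeholders; adjust them for your setup.
PRIMARY_HOST="${PRIMARY_HOST:-node1.example.com}"
PING_COUNT="${PING_COUNT:-10}"
STATE_FILE="${STATE_FILE:-/var/run/fs_standby.state}"
FS_CLI="${FS_CLI:-/usr/bin/fs_cli}"

# Extract the integer part of the packet-loss percentage from the
# summary that ping prints (read from stdin).
parse_loss() {
    sed -n 's/.* \([0-9][0-9]*\)[0-9.]*% packet loss.*/\1/p'
}

# Decide the standby mode for node 2 from the loss percentage.
# Only a total loss (100) makes node 2 go active; anything else,
# including an empty or unparseable value, keeps it in standby.
standby_mode_for_loss() {
    if [ "$1" = "100" ]; then echo off; else echo on; fi
}

check_primary() {
    loss=$(ping -c "$PING_COUNT" -q "$PRIMARY_HOST" 2>/dev/null | parse_loss)
    wanted=$(standby_mode_for_loss "$loss")
    current=$(cat "$STATE_FILE" 2>/dev/null)
    # Only talk to FreeSWITCH when the desired mode has changed.
    if [ "$wanted" != "$current" ]; then
        "$FS_CLI" -x "sofia global standby $wanted" && echo "$wanted" > "$STATE_FILE"
    fi
}

# Run the check only when invoked as: <script> run
if [ "${1:-}" = "run" ]; then
    check_primary
fi
```

A cron entry would call the script with the `run` argument once a minute. Note that a failed ping between the two nodes does not prove that the carrier cannot reach node 1, so a check like this can still leave both nodes active at the same time.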
 

DigitalDaz

Administrator
Staff member
Sep 29, 2016
3,070
577
113
No, registration is no good in these scenarios, and don't mess with IP auth either. Just leave the trunk with username/password but set register to false.

In VoIP.ms you will see a facility to create SIP URIs for your DIDs; use that to send them to did@domain:5080
 

krooney

Member
Jun 18, 2018
172
16
18
Hi @DigitalDaz, I created a SIP URI as per the example did@domain:5080 and changed the gateway to not register, but the call is not hitting my PBX. Is there any other setting that needs to be changed in order to use a SIP URI for inbound calls?
 

DigitalDaz

Administrator
Staff member
Sep 29, 2016
3,070
577
113
No. Did you change the actual DID at VoIP.ms to use the new SIP URI you created? You don't just create the URI; you then have to choose it as the routing method for that particular DID.
 

inform11

New Member
Feb 21, 2017
17
2
3
49
Russia
OK, after playing with this a bit more, I *think* I stumbled upon the solution: I need to have the gateway stopped on the secondary node, then it will stop trying to reg. I didn't realise you could have the gateway stopped on one node while still running on the other (I thought maybe the status was inherited via database sync)

Anyway, I have stopped the gateway on node 2 and node 1 gateway is showing "REGD", so I will keep an eye on it and see if the gateway on node 2 starts up on its own.

A manual failover step would be to start the gateway on node 2. Alternatively, there could possibly be a script that runs every minute, pings node 1 (maybe sends 10 pings) and, if node 1 is completely down (100% packet loss), starts the gateway(s) from the CLI; then, when node 1 comes back up, it fires off a CLI command to stop the gateway(s)...?

Try this command on the secondary node:
sofia global standby on
This puts the node into standby; entering the command sofia global standby off revives it instantly.
My trunks with registration work this way without problems.

I use UCARP

Ucarp script up:
#!/bin/sh
# ucarp passes the interface name as $1
/sbin/ifup "$1:ucarp"
/sbin/ifup eth1:0
# Take FreeSWITCH out of standby, then recover calls
/usr/bin/fs_cli -x 'sofia global standby off'
/usr/bin/fs_cli -x 'sofia recover'


Ucarp script down:
#!/bin/sh
# ucarp passes the interface name as $1
/sbin/ifdown "$1:ucarp"
/sbin/ifdown eth1:0
# Put FreeSWITCH back into standby on this node
/usr/bin/fs_cli -x 'sofia global standby on'


You can use keepalived instead of UCARP.
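For keepalived, the same two FreeSWITCH commands could be driven from a notify script. The following is a rough sketch, not a tested configuration: keepalived calls a `notify` script with the type, the name and the new state as arguments, and the helper names `standby_mode_for_state` and `apply_state` are hypothetical.

```shell
#!/bin/sh
# Sketch of a keepalived notify script. keepalived invokes it as:
#   <script> <GROUP|INSTANCE> <name> <MASTER|BACKUP|FAULT> [priority]
FS_CLI="${FS_CLI:-/usr/bin/fs_cli}"

# Map a VRRP state to the matching "sofia global standby" argument.
standby_mode_for_state() {
    case "$1" in
        MASTER)       echo off ;;
        BACKUP|FAULT) echo on ;;
        *)            echo "" ;;
    esac
}

apply_state() {
    mode=$(standby_mode_for_state "$1")
    # Unknown state: do nothing.
    [ -n "$mode" ] || return 0
    "$FS_CLI" -x "sofia global standby $mode"
    # Mirror the UCARP "up" script: recover calls when becoming master.
    if [ "$mode" = "off" ]; then
        "$FS_CLI" -x 'sofia recover'
    fi
}

# keepalived passes the new state as the third argument.
if [ -n "${3:-}" ]; then
    apply_state "$3"
fi
```

The script would be referenced with a `notify` line inside the `vrrp_instance` block of keepalived.conf. The interface handling from the UCARP scripts is left out here on the assumption that keepalived manages the virtual address itself.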
 

SilkBC

Member
Nov 24, 2017
51
3
8
50
Try this command on the secondary node:
sofia global standby on
This puts the node into standby; entering the command sofia global standby off revives it instantly.
My trunks with registration work this way without problems.

What ramifications does that command have on the slave? Does it just affect the trunk registration? Would the devices that are set to fail over to the slave still be able to register regardless (they just wouldn't be able to make/receive calls until the trunk is registered, obviously)?

I assume all the other synchronisations would still occur from the master, otherwise?

I use UCARP

I haven't got to that level of sophistication yet, but it would definitely be more elegant than my proposed ping script :)
 

SilkBC

Member
Nov 24, 2017
51
3
8
50
I use UCARP

I just realised that UCARP or Keepalived probably wouldn't work in the environment I plan to run the cluster in. Both seem to assume the two servers are in the same datacenter/network segment, but each member of the cluster would in fact be located in a different datacenter.
 

inform11

New Member
Feb 21, 2017
17
2
3
49
Russia
What ramifications does that command have on the slave? Does it just affect the trunk registration? Would the devices that are set to fail over to the slave still be able to register regardless (they just wouldn't be able to make/receive calls until the trunk is registered, obviously)?

I assume all the other synchronisations would still occur from the master, otherwise? - Yes.

The master makes changes to the registration and call status database. The slave will also make changes if you do not put it in standby, and it would then interfere with the operation of the cluster.
 