
DCA/DCS Communications Error on High-End Oracle Sun Servers

Good day to all.

I would like to share an article about a specific configuration problem on Oracle Sun high-end servers. When adding, deleting, or moving a system board in a high-end server (this article focuses on the E-class servers, namely the E25K), one very characteristic error often comes up: DCA/DCS Communications errors. The error indicates that there is no connection between the two domains of the same server. Google and Oracle support each pointed toward a solution, so I decided to combine everything into one article. Interestingly, after installing Solaris 10/11 on the server, the error appears in some cases and not in others. But that is beside the point; what matters is how to fix the problem.

As mentioned above, the problem is caused by a broken relationship between the two domains: one server behaves like two separate servers/domains (which is natural and logical), but it is unaware that it consists of two "parts" that can "communicate" with each other. So we have: an E25K server, SC ALOM (service controller), and Solaris 10. Everything works and everything is patched, but here is the catch: any operations between domains are impossible.

Several components are responsible for this relationship:

1. Domain Configuration Agent (DCA). Must be enabled on the service controller.

2. Domain Configuration Server (DCS). Must be enabled on the domains, in Solaris.

3. The /etc/inetd.conf file must be configured correctly.

4. The IPsec policies must be set up correctly in the /etc/inet/ipsecinit.conf configuration file.

5. The sckmd daemon must be enabled. This daemon is responsible for IPsec cryptography.

6. Internal network devices. On the SC, this is the scman0 interface; in the domain, this is the dman0 interface.

7. Domain X Server (DXS).

If any of the conditions above is violated, the showdevices command fails:

# showdevices -v -d [domain id]
Unable to get device information from domain x

Let's go through these in order and find the culprit. First, check that the appropriate network devices are present:

# ifconfig -a

On the SC:

scman0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 3
        inet 10.1.1.1 netmask ffffffe0 broadcast 10.1.1.31

On the domain:

dman0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 3
        inet 10.10.1.3 netmask ffffffe0 broadcast 10.10.1.31
        ether 0:0:be:a8:17:57
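
ifconfig prints the netmask in hexadecimal (ffffffe0), while the /etc/netmasks file edited further down expects dotted-decimal notation. As a minimal sketch of the conversion, the function below splits the hex string into octets and prints them in decimal. The name hex_to_dotted is a hypothetical helper, not a Solaris utility, and the sketch assumes a POSIX shell whose printf accepts 0x-prefixed arguments:

```shell
# hex_to_dotted: hypothetical helper, converts an 8-digit hex netmask
# (as printed by ifconfig) to dotted-decimal notation.
hex_to_dotted() {
    hex=$1
    # Cut the string into four 2-digit octets.
    o1=$(echo "$hex" | cut -c1-2)
    o2=$(echo "$hex" | cut -c3-4)
    o3=$(echo "$hex" | cut -c5-6)
    o4=$(echo "$hex" | cut -c7-8)
    # printf converts each 0x-prefixed octet to decimal.
    printf '%d.%d.%d.%d\n' "0x$o1" "0x$o2" "0x$o3" "0x$o4"
}

hex_to_dotted ffffffe0   # prints 255.255.255.224
```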

If an interface is missing, you need to create it manually. To do that, proceed as follows:

# ndd /dev/dman man_get_hostinfo
manc_magic = 0x4d414e43
manc_version = 01
manc_csum = 0x0
manc_ip_type = AF_INET
manc_dom_ipaddr = 10.1.1.0
manc_dom_ip_netmask = 255.255.255.224
manc_dom_ip_netnum = 10.1.1.0
manc_sc_ipaddr = 10.1.1.1
manc_dom_eaddr = 0:0:be:a8:48:26
manc_sc_eaddr = 8:0:20:f9:e4:54
manc_iob_bitmap = 0x400 io boards = 10.1,
manc_golden_iob = 10

Correct the /etc/netmasks file:

# vi /etc/netmasks
<manc_dom_ip_netnum> <manc_dom_ip_netmask>

For example:

10.1.1.0 255.255.255.224
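
The /etc/netmasks line can also be derived from the ndd output instead of being copied by hand. The sketch below is one way to do that. It assumes the "key = value" layout shown in the ndd output above, and netmasks_entry is a hypothetical helper name:

```shell
# netmasks_entry: hypothetical helper. Reads the output of
# `ndd /dev/dman man_get_hostinfo` on stdin and prints the
# "<network number> <netmask>" line expected by /etc/netmasks.
# Assumes whitespace-separated "key = value" lines.
netmasks_entry() {
    awk '
        $1 == "manc_dom_ip_netnum"  && $2 == "=" { net = $3 }
        $1 == "manc_dom_ip_netmask" && $2 == "=" { mask = $3 }
        END {
            # Fail if either value was not found in the input.
            if (net == "" || mask == "") exit 1
            print net, mask
        }'
}
```

On the domain it would be used as `ndd /dev/dman man_get_hostinfo | netmasks_entry`. Review the printed line before adding it to /etc/netmasks.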

Edit /etc/hostname.dman0 (create it if it does not exist):

# vi /etc/hostname.dman0
<manc_dom_ipaddr> netmask + broadcast + private up
:wq!

# ifconfig dman0 plumb

Make sure everything is OK:

# cat /etc/syslog.conf
...
*.notice @10.1.1.3

If not, repeat the interface configuration steps:

# ifconfig dman0 plumb
# ifconfig dman0 <manc_dom_ipaddr> netmask + broadcast + private up

In my case:

# ifconfig dman0 plumb
# ifconfig dman0 10.1.1.3 netmask + broadcast + private up

Now we need to check the services and daemons responsible for exchanging information between domains. On the service controller, this is the DCA (Domain Configuration Agent). It exchanges all control information over port 665. If it is not running, the showdevices and rcfgadm commands cannot be executed. Check:

# ps -ef | grep dca
sms-dca 1614 361 0 Feb 26 ? 0:00 dca -d A
sms-dca 1758 431 0 Feb 26 ? 0:00 dca -d B

Next, check the DCS daemon on both domains. It is the "server" side of the exchange. For it to work correctly, the /etc/inetd.conf configuration file must contain the following lines:

# vi /etc/inetd.conf

sun-dr stream tcp wait root /usr/lib/dcs dcs
sun-dr stream tcp6 wait root /usr/lib/dcs dcs
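
To confirm that both entries are present and not commented out, a small check like the following could be used. It is only a sketch: check_sun_dr is a hypothetical helper name, and it assumes an awk that supports -v (on Solaris 10 that may mean nawk rather than the default awk):

```shell
# check_sun_dr: hypothetical helper. Verifies that an inetd.conf-style
# file has active sun-dr entries for both tcp and tcp6.
check_sun_dr() {
    conf=${1:-/etc/inetd.conf}
    missing=""
    for proto in tcp tcp6; do
        # Field 1 is the service name, field 3 the protocol.
        # A commented-out line starts with "#sun-dr" and does not match.
        awk -v p="$proto" '
            $1 == "sun-dr" && $3 == p { found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$conf" || missing="$missing $proto"
    done
    if [ -n "$missing" ]; then
        echo "sun-dr entries missing for:$missing"
        return 1
    fi
    echo "sun-dr entries present"
}
```

Called without arguments, it checks /etc/inetd.conf. A different path can be passed as the first argument.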

Make inetd reread its configuration so that the new entries take effect:
# ps -ef | grep inetd
root 2021 1 0 Feb 11 ? 0:00 /usr/sbin/inetd -s
# kill -HUP 2021

Check whether dcs has started. Since inetd is an SMF (Service Management Facility) service, this is checked with the inetadm command:

# inetadm
ENABLED   STATE          FMRI
...
enabled   online         svc:/application/font/stfsloader:default
...
disabled  disabled       svc:/network/talk:default
...
enabled   online         svc:/platform/sun4u/dcs:default

DCS must be on, i.e. enabled. Here are a couple more commands for a more detailed look at dcs:

# /usr/sbin/svccfg -s svc:/platform/sun4u/dcs:default listprop
# svcs dcs

Everything is OK, so we move on. Now we need to make sure that the domains listen on port 665, the port over which, as mentioned above, dca and dcs communicate. From Solaris on the domain, type:

# netstat -an | grep 665
*.665 *.* 0 0 49152 0 LISTEN
*.665 *.* 0 0 49152 0 LISTEN
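
A plain grep for 665 also matches ports such as 6650 or any address containing those digits. As a stricter sketch, the function below reads netstat -an output on stdin and succeeds only if some socket in the LISTEN state has exactly the given local port. The name listening_on is a hypothetical helper, and the sketch assumes the Solaris layout shown above, with the local address in the first column and the state in the last:

```shell
# listening_on: hypothetical helper. Reads `netstat -an` output on stdin;
# exit status 0 means a LISTEN socket exists on the given local port.
listening_on() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            # The local address looks like "*.665" or "10.1.1.3.665";
            # the port is the part after the last dot.
            n = split($1, parts, "[.]")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

On the domain it would be used as `netstat -an | listening_on 665 && echo "port 665 is listening"`.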

And now the most interesting part: for information to be exchanged between domains, an exchange policy must be defined. The default settings are not always sufficient! Edit the /etc/inet/ipsecinit.conf file and add the following lines:

{dport sun-dr ulp tcp} permit {auth_algs md5}
{sport sun-dr ulp tcp} apply {auth_algs md5 sa unique}
{dport cvc_hostd ulp tcp} permit {auth_algs md5}
{sport cvc_hostd ulp tcp} apply {auth_algs md5 sa unique}
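
Before reloading the policies, it may help to confirm that all four selectors made it into the file. The sketch below is a simple text check, not a validation of IPsec syntax. The name check_dr_policy is a hypothetical helper, and it assumes the selectors are written with single spaces, exactly as in the lines above:

```shell
# check_dr_policy: hypothetical helper. Reports which of the four
# sun-dr / cvc_hostd selectors are missing from an ipsecinit.conf-style file.
check_dr_policy() {
    conf=${1:-/etc/inet/ipsecinit.conf}
    rc=0
    for svc in sun-dr cvc_hostd; do
        for dir in dport sport; do
            # Skip comment lines, then look for the selector text.
            if grep -v '^#' "$conf" | grep "$dir $svc ulp tcp" >/dev/null; then
                :
            else
                echo "missing: $dir $svc"
                rc=1
            fi
        done
    done
    return $rc
}
```

The function prints nothing and returns 0 when all four entries are found.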

Update the policies, then list the active ones to verify:

# ipsecconf -a /etc/inet/ipsecinit.conf

# ipsecconf

Next in line is DXS (Domain X Server). On the SC, check that it is running:

# ps -ef | grep dxs
sms-dxs 1609 361 0 Feb 26 ? 0:57 dxs -d A
sms-dxs 1609 361 0 Feb 26 ? 0:57 dxs -d B
sms-dxs 1609 361 0 Feb 26 ? 0:57 dxs -d C

Usually it is enabled, so there are no problems with it. If it is not, simply enable it with the svcadm command. Next is the sckmd daemon (Sun cryptographic key management daemon), which is responsible for IPsec tunneling. Check that it is running:

# ps -ef | grep sckmd

root 24156 1 0 Apr 02 ? 0:00 /usr/platform/SUNW,Sun-Fire-15000/lib/sckmd

Luckily for me, it was running. If it is not, just activate it with the svcadm command.
Finally, check whether everything works:
# showdevices
# rcfgadm
# addboard -d [id_domain] SBx
# deleteboard SBx
# moveboard -d [id_domain] SBx

Thank you for your attention!

Source: https://habr.com/ru/post/133213/

