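Problem description:
CloudStack 4.0 with its built-in KVM support: the host can be added normally and setup proceeds as far as enabling the zone, but as soon as the system VMs start, errors and exceptions are reported.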
`/mnt/xx': Invalid argument
CloudStack Management:
/var/log/cloud/management/management-server.log
<code>2013-08-14 03:09:09,161 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip capacity scan due to there is no Primary Storage UPintenance mode</code>
<code>2013-08-14 03:09:09,721 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers.</code>
<code>2013-08-14 03:09:25,572 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) VmStatsCollector is running...</code>
<code>2013-08-14 03:09:25,587 DEBUG [cloud.server.StatsCollector] (StatsCollector-3:null) StorageCollector is running...</code>
<code>2013-08-14 03:09:25,589 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) HostStatsCollector is running...</code>
<code>2013-08-14 03:09:39,160 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip capacity scan due to there is no Primary Storage UPintenance mode</code>
<code>2013-08-14 03:09:39,721 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers</code>
<code>2013-08-13 15:28:01,634 WARN [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Exception while trying to start console proxy</code>
<code>com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to get answer that is of class com.cloud.agent.api.StartAnswer</code>
<code> </code><code>at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:847)</code>
<code> </code><code>at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:472)</code>
<code> </code><code>at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:465)</code>
<code> </code><code>at com.cloud.consoleproxy.ConsoleProxyManagerImpl.startProxy(ConsoleProxyManagerImpl.java:627)</code>
<code> </code><code>at com.cloud.consoleproxy.ConsoleProxyManagerImpl.allocCapacity(ConsoleProxyManagerImpl.java:1164)</code>
<code> </code><code>at com.cloud.consoleproxy.ConsoleProxyManagerImpl.expandPool(ConsoleProxyManagerImpl.java:1981)</code>
<code> </code><code>at com.cloud.consoleproxy.ConsoleProxyManagerImpl.expandPool(ConsoleProxyManagerImpl.java:173)</code>
<code> </code><code>at com.cloud.vm.SystemVmLoadScanner.loadScan(SystemVmLoadScanner.java:113)</code>
<code> </code><code>at com.cloud.vm.SystemVmLoadScanner.access$100(SystemVmLoadScanner.java:34)</code>
<code> </code><code>at com.cloud.vm.SystemVmLoadScanner$1.reallyRun(SystemVmLoadScanner.java:83)</code>
<code> </code><code>at com.cloud.vm.SystemVmLoadScanner$1.run(SystemVmLoadScanner.java:73)</code>
<code> </code><code>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)</code>
<code> </code><code>at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)</code>
<code> </code><code>at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)</code>
<code> </code><code>at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)</code>
<code> </code><code>at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)</code>
<code> </code><code>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)</code>
<code> </code><code>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)</code>
<code> </code><code>at java.lang.Thread.run(Thread.java:679)</code>
<code>Caused by: com.cloud.utils.exception.CloudRuntimeException: Unable to get answer that is of class com.cloud.agent.api.StartAnswer</code>
<code> </code><code>at com.cloud.agent.manager.Commands.getAnswer(Commands.java:80)</code>
<code> </code><code>at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:783)</code>
<code> </code><code>... 19 more</code>
KVM Host (CloudStack Agent):
/var/log/cloud/agent/agent.log
<code>com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: cannot create path </code><code>'/mnt/2c65613e-e5a3-3443-96c9-272fd60502ee/v-2-VM-patchdisk'</code><code>: Invalid argument</code>
<code> </code><code>at com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.createPhysicalDisk(LibvirtStorageAdaptor.java:556)</code>
<code> </code><code>at com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.createPhysicalDisk(LibvirtStoragePool.java:101)</code>
<code> </code><code>at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createPatchVbd(LibvirtComputingResource.java:2980)</code>
<code> </code><code>at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createVbd(LibvirtComputingResource.java:2943)</code>
<code> </code><code>at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:2808)</code>
<code> </code><code>at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1035)</code>
<code> </code><code>at com.cloud.agent.Agent.processRequest(Agent.java:518)</code>
<code> </code><code>at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:831)</code>
<code> </code><code>at com.cloud.utils.nio.Task.run(Task.java:83)</code>
<code> </code><code>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)</code>
<code> </code><code>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)</code>
<code> </code><code>at java.lang.Thread.run(Thread.java:679)</code>
<code>2013-08-13 17:41:17,886 WARN [cloud.agent.Agent] (agentRequest-Handler-2:null) Caught:</code>
<code>java.lang.NullPointerException</code>
<code> </code><code>at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.cleanupVMNetworks(LibvirtComputingResource.java:3922)</code>
<code> </code><code>at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.handleVmStartFailure(LibvirtComputingResource.java:2709)</code>
<code> </code><code>at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:2834)</code>
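Problem analysis:
The CloudStack logs alone give little indication of how to fix this. Judging from the logs above, on both the CloudStack management node and the KVM node running the agent, the problem appears to lie with primary storage, yet it is not a permissions problem.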
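To narrow the search, the path that libvirt failed to create can be pulled straight out of the agent log. The following is a minimal sketch, assuming the log keeps the "cannot create path '...'" wording shown above; the function name failed_paths is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every path named in a libvirt
# "cannot create path '<path>'" message found in the given log file.
failed_paths() {
    # -n with the trailing p prints only lines where the substitution matched.
    sed -n "s/.*cannot create path '\([^']*\)'.*/\1/p" "$1"
}

# Example (on the KVM host):
#   failed_paths /var/log/cloud/agent/agent.log
```

Every path printed should sit under the /mnt/<pool-uuid> mount point of the primary storage pool, which is what points the investigation at NFS.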
Check the NFS configuration on the storage node:
<code>[root@storage252 ~]</code><code># cat /etc/exports</code>
<code>/primary</code> <code>*(rw,async,no_root_squash)</code>
<code>/secondary</code> <code>*(rw,async,no_root_squash)</code>
<code>[root@storage252 ~]</code><code># ll /primary/ /secondary/ -d</code>
<code>drwxrwxrwx 3 root root 4096 Aug 14 09:09 </code><code>/primary/</code>
<code>drwxrwxrwx 3 root root 4096 Aug 13 18:33 </code><code>/secondary/</code>
<code>[root@storage252 ~]</code><code># service nfs status</code>
<code>rpc.svcgssd is stopped</code>
<code>rpc.mountd (pid 26157) is running...</code>
<code>nfsd (pid 26222 26221 26220 26219 26218 26217 26216 26215) is running...</code>
<code>rpc.rquotad (pid 26153) is running...</code>
<code>[root@storage252 ~]</code><code># exportfs</code>
<code>/primary</code> <code><world></code>
<code>/secondary</code> <code><world></code>
The NFS server's configuration file and its exported directories both look fine.
Manually mount the NFS exports on the KVM host:
<code>[root@kvm01 ~]</code><code># showmount -e 192.168.150.252</code>
<code>Export list </code><code>for</code> <code>192.168.150.252:</code>
<code>/secondary</code> <code>*</code>
<code>/primary</code> <code>*</code>
<code>[root@kvm01 ~]</code><code># mkdir /mnt/1</code>
<code>[root@kvm01 ~]</code><code># mkdir /mnt/2</code>
<code>[root@kvm01 ~]</code><code># mount -t nfs 192.168.150.252:/primary /mnt/1</code>
<code>[root@kvm01 ~]</code><code># mount -t nfs 192.168.150.252:/secondary /mnt/2</code>
<code>[root@kvm01 ~]</code><code># ll /mnt/</code>
<code>total 8</code>
<code>drwxrwxrwx. 3 nobody nobody 4096 Aug 14 09:09 1</code>
<code>drwxrwxrwx. 3 nobody nobody 4096 Aug 13 18:33 2</code>
Create a file on each mount to check whether write access is restricted:
<code>[root@kvm01 ~]</code><code># touch /mnt/1/test1</code>
<code>[root@kvm01 ~]</code><code># touch /mnt/2/test1</code>
<code>[root@kvm01 ~]</code><code># ll /mnt/1/</code>
<code>total 1</code>
<code>-rw-r--r--. 1 nobody nobody 0 Aug 14 09:35 test1</code>
<code>[root@kvm01 ~]</code><code># ll /mnt/2/</code>
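The KVM host can write to both the primary and the secondary storage directories, and the logs show no "Operation xxx" errors either.
However, the owner and group of the NFS directories mounted on the KVM host are both nobody. The NFS server sets no_root_squash, so files created by root on the client should be owned by root.root, not nobody.nobody.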
Next, check the system log, /var/log/messages, on both nodes:
<code>Aug 13 16:50:25 storage252 rpc.idmapd[19778]: nss_getpwnam: name </code><code>'0'</code> <code>does not map into domain </code><code>'clovem.com'</code>
<code>Aug 13 16:50:25 storage252 rpc.idmapd[19778]: nss_getpwnam: name </code><code>'[email protected]'</code> <code>does not map into domain </code><code>'clovem.com'</code>
<code>Aug 13 16:55:54 storage252 rpc.idmapd[19778]: nss_getpwnam: name </code><code>'[email protected]'</code> <code>does not map into domain </code><code>'clovem.com'</code>
<code>Aug 13 17:00:56 storage252 rpc.idmapd[19778]: nss_getpwnam: name </code><code>'[email protected]'</code> <code>does not map into domain </code><code>'clovem.com'</code>
<code>Aug 13 17:06:24 storage252 rpc.idmapd[19778]: nss_getpwnam: name </code><code>'[email protected]'</code> <code>does not map into domain </code><code>'clovem.com'</code>
<code>Aug 13 17:11:54 storage252 rpc.idmapd[19778]: nss_getpwnam: name </code><code>'[email protected]'</code> <code>does not map into domain </code><code>'clovem.com'</code>
<code>Aug 13 17:17:24 storage252 rpc.idmapd[19778]: nss_getpwnam: name </code><code>'[email protected]'</code> <code>does not map into domain </code><code>'clovem.com'</code>
<code>Aug 13 15:23:35 kvm01 kernel: FS-Cache: Netfs </code><code>'nfs'</code> <code>registered </code><code>for</code> <code>caching</code>
<code>Aug 13 15:23:35 kvm01 nfsidmap[13080]: nss_getpwnam: name '[email protected]' does not map into domain 'sjcloud.cn'</code>
<code>Aug 13 15:26:48 kvm01 kernel: NFS: v4 server 192.168.150.252 does not accept raw uid/gids. Reenabling the idmapper.</code>
<code>Aug 13 15:37:22 kvm01 kernel: lo: Disabled Privacy Extensions</code>
<code>Aug 13 15:40:33 kvm01 gnome-session[17824]: WARNING: GSIdleMonitor: IDLETIME counter not found</code>
<code>Aug 13 15:40:33 kvm01 gnome-session[17824]: WARNING: Unable to determine session: Unable to lookup session information for process '17824'</code>
Solution:
The analysis above shows that the two nodes are in different domains, which breaks NFS ID mapping.
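The mismatch is easier to spot when the idmapper messages from both nodes are reduced to the domains they mention. Below is a minimal sketch, assuming the "does not map into domain '...'" wording seen in the logs above; idmap_domains is a name invented here.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct domains named in idmapper
# "does not map into domain '<domain>'" messages in a log file.
idmap_domains() {
    grep -F 'does not map into domain' "$1" |
        sed "s/.*does not map into domain '\([^']*\)'.*/\1/" |
        sort -u
}

# Example (run on each node):
#   idmap_domains /var/log/messages
```

If the storage node and the KVM host report different domains, as they do here (clovem.com and sjcloud.cn), the two idmappers disagree about which domain names belong to.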
Check the hostnames of the two nodes:
<code>[root@storage252 ~]</code><code># hostname --fqdn</code>
<code>storage252.clovem.com</code>
<code>[root@kvm01 ~]</code><code># hostname --fqdn</code>
<code>kvm01.sjcloud.cn</code>
Putting both nodes in the same domain fixes the problem.
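A quick way to confirm that two nodes share a domain is to compare everything after the first dot of each FQDN. This is an illustrative sketch only; domain_of and same_domain are hypothetical names, and a hostname without a dot is returned unchanged.

```shell
#!/bin/sh
# Print the domain part of an FQDN (the text after the first dot).
domain_of() {
    printf '%s\n' "${1#*.}"
}

# Succeed when both FQDNs have the same domain part.
same_domain() {
    [ "$(domain_of "$1")" = "$(domain_of "$2")" ]
}

# Example with the two hostnames from this article:
#   same_domain storage252.clovem.com kvm01.sjcloud.cn || echo "domains differ"
```

On each node the first argument can come from hostname --fqdn, as in the commands above.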
If you are only using NFS on its own, without CloudStack, you can instead mount with NFSv3:
<code>[root@kvm01 ~]</code><code># mount -t nfs -o vers=3 ip:/dir /localdir</code>
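Today I found the most convenient fix:
In /etc/exports on the server, add an fsid parameter to the options of each exported directory. fsid=0 designates the NFSv4 root export, so when several directories are exported, give each one a distinct value.
For example: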
/export_dir *(rw,fsid=1,async,no_root_squash)
/export_dir *(rw,fsid=2,async,no_root_squash)
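Because each export needs its own fsid value, it can help to check the file for repeats before reloading the exports. A minimal sketch, assuming fsid options are written as fsid=VALUE inside the parenthesised option list; duplicate_fsids is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print any fsid=VALUE setting that appears
# more than once in an exports file. No output means no duplicates.
duplicate_fsids() {
    grep -o 'fsid=[^,)]*' "$1" | sort | uniq -d
}

# Example (on the NFS server):
#   duplicate_fsids /etc/exports
```

After editing /etc/exports, running exportfs -ra on the server re-exports the directories with the new options.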
This article is reposted from 暗黑魔君's 51CTO blog. Original link: http://blog.51cto.com/clovemfong/1272798. To republish it, please contact the original author.