Virtualization
==============
Running virtual machines on compute nodes

Introduction
------------
There are situations when Anselm's environment is not suitable for user
needs:

- The application requires a different operating system (e.g. Windows)
  or is not available for Linux
- The application requires different versions of base system libraries
  and tools
- The application requires a specific setup (installation, configuration)
  of a complex software stack
- The application requires privileged access to the operating system
- ... and combinations of the above cases

We offer a solution for these cases: **virtualization**. Anselm's
environment makes it possible to run virtual machines on compute
nodes. Users can create their own operating system images with a
specific software stack and run instances of these images as virtual
machines on compute nodes. Virtual machines are run through the
standard mechanism of [Resource Allocation and Job
Execution](../../resource-allocation-and-job-execution/introduction.html).

The solution is based on the QEMU-KVM software stack and provides
hardware-assisted x86 virtualization.

Limitations
-----------
Anselm's infrastructure was not designed for virtualization. Compute
nodes, storage and all other infrastructure of Anselm are intended and
optimized for running HPC jobs, which implies a suboptimal
virtualization configuration and several limitations.

Anselm's virtualization does not provide the performance or all the
features of the native environment. There is a significant performance
hit (degradation) in I/O (storage, network). Anselm's virtualization is
not suitable for I/O (disk, network) intensive workloads.

Virtualization also has some drawbacks: it is not easy to set up an
efficient solution.

The solution described in the chapter
[HOWTO](virtualization.html#howto)
is suitable for single node tasks; it does not
introduce virtual machine clustering.

Please consider virtualization a last resort solution for your needs.

Please consult IT4Innovations' support about your use of virtualization.

For running a Windows application (when neither the source code nor a
native Linux application is available), consider using Wine, a Windows
compatibility layer. Many Windows applications can be run using Wine
with less effort and better performance than with virtualization.

Licensing
---------
IT4Innovations does not provide any licenses for operating systems and
software of virtual machines. Users are (in accordance with the
[Acceptable use policy
document](http://www.it4i.cz/acceptable-use-policy.pdf))
fully responsible for licensing all software running in virtual machines
on Anselm. Be aware of the complex conditions of licensing software in
virtual environments.

Users are responsible for licensing the OS (e.g. MS Windows) and all
software running in their virtual machines.

HOWTO
----------
### Virtual Machine Job Workflow
We propose this job workflow:

![Workflow](virtualization-job-workflow "Virtualization Job Workflow")

Our recommended solution is that the job script creates a distinct
shared job directory, which serves as a central point for data exchange
between Anselm's environment, the compute node (host) (e.g. HOME,
SCRATCH, local scratch and other local or cluster filesystems) and the
virtual machine (guest). The job script links or copies the input data
and the instructions on what to do (the run script) for the virtual
machine into the job directory. The virtual machine processes the input
data according to the instructions in the job directory and stores the
output back in the job directory. We recommend running the virtual
machine in so-called [snapshot
mode](virtualization.html#snapshot-mode): the image is
immutable (it does not change), so one image can be used for many
concurrent jobs.

### Procedure
1. Prepare image of your virtual machine
2. Optimize image of your virtual machine for Anselm's virtualization
3. Modify your image for running jobs
4. Create job script for executing virtual machine
5. Run jobs
### Prepare image of your virtual machine
You can either use your existing image or create a new image from
scratch.

QEMU currently supports these image types or formats:

- raw
- cloop
- cow
- qcow
- qcow2
- vmdk - VMware 3 & 4, or 6 image format, for exchanging images with
  that product
- vdi - VirtualBox 1.1 compatible image format, for exchanging images
  with VirtualBox.

You can convert your existing image using the qemu-img convert command.
Supported formats of this command are: blkdebug blkverify bochs cloop
cow dmg file ftp ftps host_cdrom host_device host_floppy http https
nbd parallels qcow qcow2 qed raw sheepdog tftp vdi vhdx vmdk vpc vvfat.

We recommend using the advanced QEMU native image format qcow2.

[More about QEMU
Images](http://en.wikibooks.org/wiki/QEMU/Images)
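
As a minimal sketch of the conversion step, the helper below wraps qemu-img convert to produce a qcow2 image. The function name `to_qcow2`, the `DRY_RUN` switch and the file names in the usage line are hypothetical and only serve the illustration; `-f` (source format) and `-O` (output format) are standard qemu-img options.

```shell
# Convert an existing image to qcow2 (sketch; names are illustrative).
# Usage: to_qcow2 SOURCE_FORMAT SOURCE_IMAGE TARGET_IMAGE
# With DRY_RUN=1 the command is printed instead of executed.
to_qcow2() {
    # Rebuild the argument list as the full qemu-img command line.
    set -- qemu-img convert -f "$1" -O qcow2 "$2" "$3"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

After loading the QEMU module, a call such as `to_qcow2 vmdk win.vmdk win.img` would convert a VMware image; running it first with `DRY_RUN=1` shows the command without touching any files.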
### Optimize image of your virtual machine
Use virtio devices (for disk/drive and network adapter) and install
virtio drivers (paravirtualized drivers) into the virtual machine. There
is a significant performance gain when using virtio drivers. For more
information see [Virtio
Linux](http://www.linux-kvm.org/page/Virtio) and [Virtio
Windows](http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers).

Disable all unnecessary services and tasks. Restrict all unnecessary
operating system operations.

Remove all unnecessary software and files.

Remove all paging space, swap files, partitions, etc.

Shrink your image. (It is recommended to zero all free space and
reconvert the image using qemu-img.)
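
One way to zero the free space of a Linux guest is sketched below. The function name `zero_free_space`, the fill file name and the optional size cap are assumptions made for this illustration; the approach simply writes a file of zeros and deletes it again.

```shell
# Run inside a Linux guest: overwrite free space with zeros so that a
# later qemu-img conversion can leave the zeroed blocks out.
# Usage: zero_free_space MOUNTPOINT [MAX_MIB]
# MAX_MIB caps the amount written; it is mainly useful for a trial run.
zero_free_space() {
    fill="$1/zero.fill"
    if [ -n "${2:-}" ]; then
        dd if=/dev/zero of="$fill" bs=1M count="$2" 2>/dev/null
    else
        # Without a cap dd stops when the filesystem is full; that
        # failure is expected here.
        dd if=/dev/zero of="$fill" bs=1M 2>/dev/null || true
    fi
    sync
    rm -f "$fill"
}
```

After running `zero_free_space /` in the guest and shutting it down, the image can be reconverted on the host, for example with `qemu-img convert -O qcow2 linux.img linux-shrunk.img` (hypothetical file names).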
### Modify your image for running jobs
Your image should run some kind of operating system startup script. The
startup script should run the application and, when the application
exits, shut down or quit the virtual machine.

We recommend that the startup script:

- maps the Job Directory from the host (from the compute node)
- runs a script (we call it the "run script") from the Job Directory and
  waits for the application to exit
    - for management purposes, if the run script does not exist, waits
      for some time period (a few minutes)
- shuts down or quits the OS

For Windows operating systems we suggest using a Local Group Policy
Startup script; for Linux operating systems, rc.local, a runlevel init
script or a similar service.

Example startup script for a Windows virtual machine:

    @echo off
    set LOG=c:\startup.log
    set MAPDRIVE=z:
    set SCRIPT=%MAPDRIVE%\run.bat
    set TIMEOUT=300

    rem Start a fresh log; all later lines append with >>
    echo %DATE% %TIME% Running startup script>%LOG%

    rem Mount share
    echo %DATE% %TIME% Mounting shared drive>>%LOG%
    net use z: \\10.0.2.4\qemu >>%LOG% 2>&1
    dir z:\ >>%LOG% 2>&1
    echo. >>%LOG%

    if exist %MAPDRIVE%\ (
      echo %DATE% %TIME% The drive "%MAPDRIVE%" exists>>%LOG%

      if exist %SCRIPT% (
        echo %DATE% %TIME% The script file "%SCRIPT%" exists>>%LOG%
        echo %DATE% %TIME% Running script %SCRIPT%>>%LOG%
        set TIMEOUT=0
        call %SCRIPT%
      ) else (
        echo %DATE% %TIME% The script file "%SCRIPT%" does not exist>>%LOG%
      )
    ) else (
      echo %DATE% %TIME% The drive "%MAPDRIVE%" does not exist>>%LOG%
    )
    echo. >>%LOG%

    rem Wait only if no run script was executed
    timeout /T %TIMEOUT%

    echo %DATE% %TIME% Shut down>>%LOG%
    shutdown /s /t 0

The example startup script maps the shared job directory as drive z: and
looks for a run script called run.bat. If the run script is found, it is
run; otherwise the startup script waits for 5 minutes. In both cases the
virtual machine is then shut down.
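
For a Linux guest, the same logic could be sketched as below and called from rc.local. This is an illustration under assumptions, not a tested configuration: the mount point `/mnt/job`, the run script name `run.sh`, the log file path and both function names are hypothetical, and mounting the share assumes that the guest has CIFS support installed and that the share accepts guest access.

```shell
# Run the job's run script if it exists; otherwise wait so that the
# machine stays up for management purposes.
# Usage: run_or_wait JOB_DIR WAIT_SECONDS
run_or_wait() {
    script="$1/run.sh"
    if [ -f "$script" ]; then
        echo "$(date) Running script $script"
        sh "$script"
    else
        echo "$(date) The script file $script does not exist"
        sleep "$2"
    fi
}

# Mount the job directory shared by the QEMU user network backend
# (SMB server 10.0.2.4, share "qemu"), process it and power off.
startup_main() {
    mkdir -p /mnt/job
    mount -t cifs //10.0.2.4/qemu /mnt/job -o guest
    run_or_wait /mnt/job 300 >>/var/log/startup.log 2>&1
    poweroff
}
```

Calling `startup_main` at the end of rc.local would then mirror the Windows example: run the job if a run script is present, otherwise wait 5 minutes, and shut down in both cases.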
### Create job script for executing virtual machine
Create a job script according to the recommended
[Virtual Machine Job
Workflow](virtualization.html#virtual-machine-job-workflow).

Example job for a Windows virtual machine:

    #!/bin/sh

    JOB_DIR=/scratch/$USER/win/${PBS_JOBID}

    # Virtual machine settings
    VM_IMAGE=~/work/img/win.img
    VM_MEMORY=49152
    VM_SMP=16

    # Prepare job dir
    mkdir -p ${JOB_DIR} && cd ${JOB_DIR} || exit 1
    ln -s ~/work/win .
    ln -s /scratch/$USER/data .
    ln -s ~/work/win/script/run/run-appl.bat run.bat

    # Run virtual machine
    export TMPDIR=/lscratch/${PBS_JOBID}
    module add qemu
    qemu-system-x86_64 \
      -enable-kvm \
      -cpu host \
      -smp ${VM_SMP} \
      -m ${VM_MEMORY} \
      -vga std \
      -localtime \
      -usb -usbdevice tablet \
      -device virtio-net-pci,netdev=net0 \
      -netdev user,id=net0,smb=${JOB_DIR},hostfwd=tcp::3389-:3389 \
      -drive file=${VM_IMAGE},media=disk,if=virtio \
      -snapshot \
      -nographic

The job script links the application data (win), the input data (data)
and the run script (run.bat) into the job directory and runs the virtual
machine.

Example run script (run.bat) for a Windows virtual machine:

    z:
    cd win\appl
    call application.bat z:\data z:\output

The run script runs the application from the shared job directory
(mapped as drive z:), processes the input data (z:\data) from the job
directory and stores the output in the job directory (z:\output).

### Run jobs
Run jobs as usual, see [Resource Allocation and Job
Execution](../../resource-allocation-and-job-execution/introduction.html).
Use only full node allocation for virtualization jobs.
### Running Virtual Machines
Virtualization is enabled only on compute nodes; it does not work on
login nodes.

Load the QEMU environment module:

    $ module add qemu

Get help:

    $ man qemu

Run a virtual machine (simple):

    $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -vnc :0

    $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -vnc :0

You can access the virtual machine with a VNC viewer (option -vnc) by
connecting to the IP address of the compute node. For VNC you must use
the [VPN
network](../../accessing-the-cluster/vpn-access.html).

Install a virtual machine from an ISO file:

    $ qemu-system-x86_64 -hda linux.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -cdrom linux-install.iso -boot d -vnc :0

    $ qemu-system-x86_64 -hda win.img -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -cdrom win-install.iso -boot d -vnc :0

Run a virtual machine using optimized devices and the user network
backend with sharing and port forwarding, in snapshot mode:

    $ qemu-system-x86_64 -drive file=linux.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22 -vnc :0 -snapshot

    $ qemu-system-x86_64 -drive file=win.img,media=disk,if=virtio -enable-kvm -cpu host -smp 16 -m 32768 -vga std -localtime -usb -usbdevice tablet -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389 -vnc :0 -snapshot

Thanks to port forwarding you can access the virtual machine via SSH
(Linux) or RDP (Windows) by connecting to the IP address of the compute
node (port 2222 for SSH, port 3389 for RDP). You must use the [VPN
network](../../accessing-the-cluster/vpn-access.html).

Keep in mind that if you use virtio devices, you must have virtio
drivers installed in your virtual machine.

### Networking and data sharing
For networking virtual machines we suggest using the (default) user
network backend (sometimes called slirp). This network backend NATs
virtual machines and provides useful services for them, such as DHCP,
DNS, SMB sharing and port forwarding.

In the default configuration the IP network 10.0.2.0/24 is used: the
host has IP address 10.0.2.2, the DNS server 10.0.2.3, the SMB server
10.0.2.4, and virtual machines obtain addresses from the range
10.0.2.15-10.0.2.31. Virtual machines have access to Anselm's network
via NAT on the compute node (host).

Simple network setup

    $ qemu-system-x86_64 ... -net nic -net user

(This is the default when no -net options are given.)

Simple network setup with sharing and port forwarding (obsolete but
simpler syntax, lower performance)

    $ qemu-system-x86_64 ... -net nic -net user,smb=/scratch/$USER/tmp,hostfwd=tcp::3389-:3389

Optimized network setup with sharing and port forwarding

    $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev user,id=net0,smb=/scratch/$USER/tmp,hostfwd=tcp::2222-:22

### Advanced networking
**Internet access**

Sometimes your virtual machine needs access to the internet (to install
software, updates, software activation, etc.). We suggest a solution
using Virtual Distributed Ethernet (VDE) enabled QEMU with SLIRP running
on a login node, tunnelled to the compute node. Be aware that this setup
has very low performance, the worst of all described solutions.

Load the VDE enabled QEMU environment module (unload the standard QEMU
module first if necessary).

    $ module add qemu/2.1.2-vde2

Create a virtual network switch.

    $ vde_switch -sock /tmp/sw0 -mgmt /tmp/sw0.mgmt -daemon

Run the SLIRP daemon over an SSH tunnel on the login node and connect it
to the virtual network switch.

    $ dpipe vde_plug /tmp/sw0 = ssh login1 $VDE2_DIR/bin/slirpvde -s - --dhcp &

Run qemu using the vde network backend, connected to the created virtual
switch.

Basic setup (obsolete syntax)

    $ qemu-system-x86_64 ... -net nic -net vde,sock=/tmp/sw0

Setup using virtio device (obsolete syntax)

    $ qemu-system-x86_64 ... -net nic,model=virtio -net vde,sock=/tmp/sw0

Optimized setup

    $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net0 -netdev vde,id=net0,sock=/tmp/sw0

**TAP interconnect**

Both the user and vde network backends have low performance. For a fast
interconnect (10 Gbps and more) between the compute node (host) and the
virtual machine (guest) we suggest using the Linux kernel TAP device.

The Anselm cluster provides the TAP device tap0 for your job. The TAP
interconnect does not provide any services (like NAT, DHCP, DNS, SMB,
etc.), just raw networking, so you should provide your own services if
you need them.

Run qemu with the TAP network backend:

    $ qemu-system-x86_64 ... -device virtio-net-pci,netdev=net1 \
                             -netdev tap,id=net1,ifname=tap0,script=no,downscript=no

Interface tap0 has IP address 192.168.1.1 and network mask 255.255.255.0
(/24). In the virtual machine use an IP address from the range
192.168.1.2-192.168.1.254. For your convenience some ports on the tap0
interface are redirected to higher numbered ports, so that you, as a
non-privileged user, can provide services on these ports.

Redirected ports:

- DNS udp/53->udp/3053, tcp/53->tcp/3053
- DHCP udp/67->udp/3067
- SMB tcp/139->tcp/3139, tcp/445->tcp/3445

You can configure the IP address of the virtual machine statically or
dynamically. For dynamic addressing provide your DHCP server on port
3067 of the tap0 interface. You can also provide your DNS server on port
3053 of the tap0 interface, for example:

    $ dnsmasq --interface tap0 --bind-interfaces -p 3053 --dhcp-alternate-port=3067,68 --dhcp-range=192.168.1.15,192.168.1.32 --dhcp-leasefile=/tmp/dhcp.leasefile
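
For static addressing, the guest side could be configured as in the following sketch for a Linux guest. The function name `tap_guest_config` and the interface name `eth0` in the usage note are assumptions; the helper only prints the `ip` commands so that they can be reviewed before being run as root inside the guest.

```shell
# Print the commands that give a Linux guest a static address on the
# TAP interconnect. Review the output, then pipe it to "sh" as root.
# Usage: tap_guest_config INTERFACE LAST_OCTET
tap_guest_config() {
    iface=$1
    octet=$2
    # Reject anything that is not a plain number.
    case $octet in
        ''|*[!0-9]*) echo "last octet must be a number" >&2; return 1 ;;
    esac
    # Guest addresses are 192.168.1.2 to 192.168.1.254;
    # 192.168.1.1 belongs to tap0 on the host.
    if [ "$octet" -lt 2 ] || [ "$octet" -gt 254 ]; then
        echo "last octet must be between 2 and 254" >&2
        return 1
    fi
    echo "ip addr add 192.168.1.$octet/24 dev $iface"
    echo "ip link set $iface up"
}
```

For example, `tap_guest_config eth0 2 | sh` would assign 192.168.1.2 to the guest interface, after which the host is reachable at 192.168.1.1.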

You can also provide your own SMB services (on ports 3139 and 3445) to
obtain high performance data sharing.

Example smb.conf (not optimized)

    [global]
    socket address=192.168.1.1
    smb ports = 3445 3139

    private dir=/tmp/qemu-smb
    pid directory=/tmp/qemu-smb
    lock directory=/tmp/qemu-smb
    state directory=/tmp/qemu-smb
    ncalrpc dir=/tmp/qemu-smb/ncalrpc
    log file=/tmp/qemu-smb/log.smbd
    smb passwd file=/tmp/qemu-smb/smbpasswd
    security = user
    map to guest = Bad User
    unix extensions = no
    load printers = no
    printing = bsd
    printcap name = /dev/null
    disable spoolss = yes
    log level = 1
    guest account = USER

    [qemu]
    path=/scratch/USER/tmp
    read only=no
    guest ok=yes
    writable=yes
    follow symlinks=yes
    wide links=yes
    force user=USER

(Replace USER with your login name.)
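
The substitution can be scripted. The sketch below assumes the example configuration has been saved as a template file; the function name `render_smb_conf` and the template file name in the usage note are hypothetical.

```shell
# Print a copy of an smb.conf template with the USER placeholder
# replaced by a login name (defaults to the current $USER).
# Usage: render_smb_conf TEMPLATE [LOGIN] > /tmp/qemu-smb/smb.conf
render_smb_conf() {
    login=${2:-$USER}
    # The match is case-sensitive, so "security = user" and
    # "map to guest = Bad User" are left untouched.
    sed "s/USER/$login/g" "$1"
}
```

A typical call would be `mkdir -p /tmp/qemu-smb && render_smb_conf smb.conf.template > /tmp/qemu-smb/smb.conf`.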

Run SMB services

    $ smbd -s /tmp/qemu-smb/smb.conf

A virtual machine can of course have more than one network interface
controller and can use more than one network backend. So you can, for
example, combine the user network backend and the TAP interconnect.
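
A minimal sketch of such a combination follows. The helper name `vm_net_args` is hypothetical; the options it prints are the ones used in the user backend and TAP examples of this document, with net0 on the user backend and net1 on tap0.

```shell
# Print QEMU network options for a guest with two virtio NICs:
# net0 on the user backend (NAT, DHCP, DNS, SMB share, SSH forward)
# and net1 on the TAP device tap0 (fast host-guest interconnect).
# Usage: vm_net_args SHARED_DIR
vm_net_args() {
    echo "-device virtio-net-pci,netdev=net0" \
         "-netdev user,id=net0,smb=$1,hostfwd=tcp::2222-:22" \
         "-device virtio-net-pci,netdev=net1" \
         "-netdev tap,id=net1,ifname=tap0,script=no,downscript=no"
}
```

The output is meant to be spliced into the command line unquoted, for example `qemu-system-x86_64 ... $(vm_net_args /scratch/$USER/tmp)`, so the shared directory path must not contain whitespace.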
### Snapshot mode
In snapshot mode the image is not written to; changes are written to a
temporary file (and discarded after the virtual machine exits). **This
is the strongly recommended mode for running your jobs.** Set the TMPDIR
environment variable to a local scratch directory to control where the
temporary files are placed.

    $ export TMPDIR=/lscratch/${PBS_JOBID}
    $ qemu-system-x86_64 ... -snapshot

### Windows guests
For Windows guests we recommend these options, which make life easier:

    $ qemu-system-x86_64 ... -localtime -usb -usbdevice tablet