12 November 2015

625. In Progress: Dead node -- no power, unless 8-pin EPS/ATX12V1 unplugged

One of my nodes died last night. It's got a Corsair GS700 PSU, an ASRock 990FX Extreme3 mobo and an AMD FX 8150 CPU. The mobo and PSU are two years old, whereas the CPU is over three.

In its full configuration, it refuses to turn on. There's no indication of any power: no lights, no fan movement, no beeps.

I checked the PSU, and it's fine.

I then discovered that if I plug in only the 24-pin motherboard PSU connector, but don't attach the 8-pin EPS CPU power, the computer behaves as if there's power, i.e. all fans, including the one on the PCI-E VGA card, spin.

There's no output though, and no POST beeps.

Plugging the EPS/ATX12V1 cable back in again leads to a dead system.

Removing the CPU or the RAM still yields a dead system (I'm not sure what a mobo without a CPU is supposed to do, i.e. whether it's supposed to POST, but either way, it's dead).

The CMOS battery tests fine with a DMM. The EPS/ATX12V1 output yields 12 V on all four yellow pins.

The power button is obviously working, given that the system 'turns on' when the EPS cable is unplugged.

So it's down to either the mobo or the CPU being faulty.

I've ordered a POST card and a CPU socket tester, and will update when I've done further diagnostics.

UPDATE 18/11: I put an old Phenom II in the socket. The system still won't light up. This points towards the motherboard being dead. That's not a bad thing, as the mobo might still be under warranty.

15 October 2015

624. Gaussian fails with "traps: l502.exe[12449] general protection ip:d75df7 sp:7f7de40dcce0 error:0 in l502.exe[400000+dc8000]" on i7-5820K

Update:
* The system is rock solid with NWChem and ADF. Only G09 crashes.
* Gaussian has now released G09E, and the release notes say: "A Sandybridge/Haswell binary distribution is also available". It remains to be seen whether the new version solves the issue; I can't check, as I don't have access to that version.

Update 18 Oct 2015:
TL;DR version: both the EM64T and AMD64 versions of G09D crash within the first 30 min to 4 hours. An NWChem job has so far run 6 days without crashing and is still going strong.

Original post:
This isn't much of a post yet. I'm mostly posting this so that people searching online will see that they aren't alone.


I just built a new node:
AU$559 Intel BX80648I75820K 6 Core i7-5820K 3.3Ghz 15MB LGA-2011-V3 (No Heatsink)
AU$407 Gigabyte X99-SLI Intel X99 S2011-3 8xDDR4/4xPCI-E/Intel GBLan/ATX Motherboard
AU$50 DeepCool FrostWin v2.0 CPU cooler
AU$155x2 Patriot 16G Kit (8Gx2) DDR4 2133 Desktop RAM
AU$185 Antec HCG-900 High Current Gamer Gaming PSU
AU$39 Gigabyte N210SL-1GI 1GB GT210 PCI-E VGA Card
AU$68 Seagate 3.5" Barracuda 1TB ST1000DM003 SATA3 7200RPM 64MB HDD (carbon)
AU$76 Antec GX500B-W Dominator Window USB3.0 Gaming Case without PSU

I've got an up-to-date Jessie installation on it, with the following kernel: Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux.


When running G09D rev. 01 EM64T, I keep getting random errors along these lines (these were collected over a couple of days and between restarts):
[100433.566789] traps: l703.exe[11236] general protection ip:df18ca sp:7fc96f595268 error:0 in l703.exe[400000+a46000]
[ 2587.899019] traps: l703.exe[3727] general protection ip:9c9757 sp:7fa6fc436ce0 error:0 in l703.exe[400000+a46000]
[26439.755347] traps: l502.exe[3235] general protection ip:ab8a55 sp:7ffe29504c10 error:0 in l502.exe[400000+dc8000]
[43030.457126] traps: l502.exe[427] general protection ip:11565a7 sp:7f2ec1fff268 error:0 in l502.exe[400000+dc8000]
[37460.207608] traps: l703.exe[14649] general protection ip:a38ae0 sp:7f1a813cf8c0 error:0 in l703.exe[400000+a46000]
[ 8865.403861] traps: l502.exe[12449] general protection ip:d75df7 sp:7f7de40dcce0 error:0 in l502.exe[400000+dc8000]
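If you want to check for these on your own box, the traps end up in the kernel ring buffer, so a simple grep will collect them (the log file name below is just an example):

dmesg | grep "general protection" >> ~/gaussian_traps.log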


Sometimes the crashes happen after 30 minutes, sometimes after 3 hours. Most happen within four hours. I seem to remember that one ran up to 12 hours, but nothing's gone beyond that. Some short (1h 30 min) calculations have managed to run to completion.

I've checked each RAM stick with memtest86+ -- they're fine -- and they're distributed across the slots as recommended in the motherboard manual.

The temperature is running below 40 degrees Celsius.

The harddrive is fine according to SMART.

I log everything every two minutes, and so can go back and look at what happened right before the crash, but there's nothing odd.
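For what it's worth, logging of that sort can be as simple as a crontab entry along these lines -- a sketch, not my exact script; sensors requires the lm-sensors package, and the paths are examples:

# log date, load, temperatures and the last few kernel messages every two minutes
*/2 * * * * { date; uptime; sensors; dmesg | tail -n 5; } >> /home/me/logs/node.log 2>&1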


My three current best hypotheses are:

* There's an issue specifically with G09D EM64T and the new generation of LGA2011-v3 i7 Intel CPUs

* There's an issue with any version of G09D and the new generation of LGA2011-v3 i7 Intel CPUs

* There's an issue with my system which is independent of G09D.

To test, I'll:
* Do runs with G09D rev. 01 AMD64
  Done -- this also crashed.

* Do runs with NWChem 6.5 (ifort, mkl)
  Running -- 6 days so far without a crash!

* Update the BIOS (long shot)

* Remove the CPU and check for bent pins (long, arduous shot)

I'll be posting updates...

28 August 2015

623. Comparing frequency calculations in G09 and NWChem -- the importance of grid density

The vibrational entropy term will differ between calculations done in Gaussian and NWChem unless you use "grid xfine" in NWChem. That the grid density is important is nothing new, but the magnitude of the effect on the entropy surprised me.

The difference can be large -- 30 cal/(mol K) for [PPh4]+ at PBE0/cc-pVDZ -- which becomes quite significant once multiplied by T: at 298.15 K it amounts to ca 9 kcal/mol.

Note that the rotational entropy term may also differ, but that would be due to different uses of symmetry in the calculations: http://molecularmodelingbasics.blogspot.com.au/2012/12/conformational-and-rotational-entropy.html

If you turn off symmetry (noautosym) in NWChem, the rotational entropy will not be corrected. I've noticed that Gaussian, on the other hand, will sneakily apply the correction if it finds an acceptable symmetry even if you request nosymm, so make sure that you scan through the output carefully.
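An easy way to catch that is to grep the Gaussian output for the symmetry number it actually used in the thermochemistry section (the file name is just an example):

grep -i "rotational symmetry number" benzene_freq.log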

Either way, vibrational entropy is not symmetry dependent. Instead, you will have to worry about the grid fineness when comparing outputs.
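For reference, the grid is set in the NWChem dft block -- the sketch below is a trimmed-down version of the full input in post 622 further down, and only the grid line was varied between the table rows:

dft
  xc pbe0
  # fine / xfine / ultrafine -- this is the line that matters here
  grid xfine
  convergence density 1e-08
end
task dft freq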

If your molecule is very small, such as benzene or tetramethylphosphonium, it seems that you don't have to worry about this. However, even fairly small molecules such as [PPh4]+ will be affected.

Conv. Dens. = Convergence Density


Code  Symm  Grid  Conv. Dens.  DFT Energy       ZPE       HCorr     S(tot)   S(trans)  S(rot)   S(vib)
G09   N     F     1E-8         -1266.58424152   0.370516  0.389751  147.076  43.358    35.009   68.708
G09   N     U     1E-8         -1266.58430374   0.370464  0.389691  146.829  43.358    35.009   68.461
NW    N     X     1E-8         -1266.58455223   0.370348  0.389549  146.697  43.339    34.994   68.365
NW    N     X     1E-5         -1266.58455222   0.370348  0.389549  146.704  43.339    34.994   68.371
NW    Y     F     1E-5         -1266.58453684   0.370034  0.385683  118.023  43.339    33.617   41.067
NW    Y     F     1E-8         -1266.58453684   0.370034  0.385683  118.023  43.339    33.617   41.067
NW    N     F     1E-5         -1266.58454928   0.370034  0.385683  119.394  43.339    34.994   41.062
NW    N     F     1E-8         -1266.58454929   0.370034  0.385683  119.394  43.339    34.994   41.061
NW    Y     X     1E-5         -1266.58455274   0.370348  0.389549  145.331  43.339    33.617   68.376
NW    Y     X     1E-8         -1266.58455275   0.370348  0.389549  145.337  43.339    33.617   68.382

F=fine. X=Extrafine. U=Ultrafine.
The lower S(rot) values (33.617, in the Symm = Y rows) are symmetry corrected. That's completely normal.

With "grid fine" NWChem gives a very different result to G09.

You can see the difference in the predicted IR spectra as well.

[Figure: predicted IR spectra, "grid fine" NWChem (blue rings) vs G09 (red circles)]
[Figure: predicted IR spectra, "grid xfine" NWChem (blue rings) vs G09 (red circles)]


21 August 2015

622. Crude comparison of a simple frequency calculation in different computational packages

I've given the input files at the bottom of the post.

The job is a simple benzene frequency calculation at PBE0/def2-TZVP. The jobs were run on an AMD FX 8150 with 32 GB RAM (Debian Jessie, 64-bit Linux).

Orca 3.0.3 and G09 (AMD64, rev. D.01) were supplied as precompiled binaries.

NWChem 6.5 and Gamess US 5 Dec 2014 (R1) were compiled by me, and the poor performance of either code may thus be due solely to something that I've done. Both codes were linked against ACML 5.3.1 and compiled with gfortran.

I'm also not familiar with orca and gamess, and so I don't know how to get the best performance from either code. I defined the basis set explicitly in all codes.

Also note that for other frequency calculations Orca seems orders of magnitude slower than NWChem -- in fact, some jobs are so slow that I haven't let them finish even after waiting for a week, when in NWChem they take a day.

Basically, my empirical but poorly-supported-by-data view is that G09 is by far the fastest, then NWChem, then Orca and finally Gamess. In the case of Gamess this may well be due to how I compiled it.

Nwchem was compiled roughly as was shown here: http://verahill.blogspot.com.au/2014/09/593-nwchem-65-on-debian-jessie-and.html

I'll post the compilation of Gamess US 5/12/2014 R1 later.

Results
Note: Orca used a symmetry number of 3 in the calculation of S(rot). I've 'corrected' it back to the non-symmetry-corrected value so that it can be compared with the output from the other codes. Likewise, I've divided the entropy terms from Orca by 298.15 K, since Orca reports them multiplied by the temperature.
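For the record, the back-correction is just the standard symmetry-number term (assuming Orca applies it in the usual way):

S(rot, sigma=1) = S(rot, sigma) + R ln(sigma)

With sigma = 3, that means adding R ln 3 = 1.987 cal/(mol K) x 1.0986 ≈ 2.18 cal/(mol K) back onto Orca's S(rot).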

Overall the results agree very well across the codes, although the electronic energy in Orca is noticeably different from the others.

Code    Runtime  'DFT' energy (H)  ZPE (H)   Hcorr (H)  S(tot)  S(trans)  S(rot)  S(vib)
G09     12 min   -232.04511372     0.100682  0.106028   69.033  38.979    25.625  4.428
Orca    17 min   -232.04530127     0.100605  0.105960   69.062  38.974    25.628  4.461
Nwchem  41 min   -232.04516624     0.100836  0.106162   68.950  38.962    25.614  4.375
Gamess  76 min   -232.04517728     0.100701  0.10604    69.011  38.979    25.625  4.407

(entropies in cal/(mol K))

G09 input:
%nprocshared=8
%Mem=3500000000
%Chk=benzene_freq.chk
#P rPBE1PBE/GEN 5D 7F Freq=() NoSymm Integral(UltraFine )  Punch=(MO) Pop=() 

benzene freq

0 1 ! charge and multiplicity
 C     1.20188     0.693923     0.00000
 C     9.00000e-06     1.38776     0.00000
 C     -1.20188     0.693842     0.00000
 C     -1.20188     -0.693933     0.00000
 C     -1.60000e-05     -1.38777     0.00000
 C     1.20188     -0.693829     0.00000
 H     2.14078     1.23611     0.00000
 H     5.10000e-05     2.47197     0.00000
 H     -2.14085     1.23591     0.00000
 H     -2.14082     -1.23604     0.00000
 H     -2.60000e-05     -2.47197     0.00000
 H     2.14082     -1.23594     0.00000

 C  0
 S   6  1.00
   13575.34968200      0.00022246
    2035.23336800      0.00172327
     463.22562359      0.00892557
     131.20019598      0.03572798
      42.85301589      0.11076260
      15.58418577      0.24295628
 S   2  1.00
       6.20671385      0.41440263
       2.57648965      0.23744969
 S   1  1.00
       0.57696339      1.00000000
 S   1  1.00
       0.22972831      1.00000000
 S   1  1.00
       0.09516444      1.00000000
 P   4  1.00
      34.69723224      0.00533337
       7.95826228      0.03586411
       2.37808269      0.14215873
       0.81433208      0.34270472
 P   1  1.00
       0.28887547      1.00000000
 P   1  1.00
       0.10056824      1.00000000
 D   1  1.00
       1.09700000      1.00000000
 D   1  1.00
       0.31800000      1.00000000
 F   1  1.00
       0.76100000      1.00000000
 ****
 H  0
 S   3  1.00
      34.06134100      0.00602520
       5.12357460      0.04502109
       1.16466260      0.20189726
 S   1  1.00
       0.32723041      1.00000000
 S   1  1.00
       0.10307241      1.00000000
 P   1  1.00
       0.80000000      1.00000000
 ****



Orca input
%pal nprocs 8 end
! DFT pbe0 def2-tzvp printbasis
! freq
%basis
newgto H
S   3
  1     34.0613410              0.60251978E-02   
  2      5.1235746              0.45021094E-01   
  3      1.1646626              0.20189726       
S   1
  1      0.32723041             1.0000000        
S   1
  1      0.10307241             1.0000000        
P   1
  1      0.8000000              1.0000000        
end
newgto C
S   6
  1  13575.3496820              0.22245814352E-03      
  2   2035.2333680              0.17232738252E-02      
  3    463.22562359             0.89255715314E-02      
  4    131.20019598             0.35727984502E-01      
  5     42.853015891            0.11076259931    
  6     15.584185766            0.24295627626    
S   2
  1      6.2067138508           0.41440263448    
  2      2.5764896527           0.23744968655    
S   1
  1      0.57696339419          1.0000000        
S   1
  1      0.22972831358          1.0000000        
S   1
  1      0.95164440028E-01            1.0000000        
P   4
  1     34.697232244            0.53333657805E-02      
  2      7.9582622826           0.35864109092E-01      
  3      2.3780826883           0.14215873329    
  4      0.81433208183          0.34270471845    
P   1
  1      0.28887547253          1.0000000        
P   1
  1      0.10056823671          1.0000000        
D   1
  1      1.09700000             1.0000000        
D   1
  1      0.31800000             1.0000000        
F   1
  1      0.76100000             1.0000000      
end
end
* xyz 0 1
C             0.03998          -1.38721           0.00000
C             1.22135          -0.65898           0.00000
C             1.18137           0.72823           0.00000
C            -0.03998           1.38721           0.00000
C            -1.22135           0.65898           0.00000
C            -1.18137          -0.72823           0.00000
H             0.07121          -2.47097           0.00000
H             2.17552          -1.17381           0.00000
H             2.10431           1.29715           0.00000
H            -0.07121           2.47097           0.00000
H            -2.17552           1.17381           0.00000
H            -2.10431          -1.29715           0.00000
*

nwchem input
scratch_dir /home/me/scratch
Title "benzene freq"

Start  freq

echo

charge 0

geometry noautosym units angstrom
 C     1.20188     0.693923     0.00000
 C     9.00000e-06     1.38776     0.00000
 C     -1.20188     0.693842     0.00000
 C     -1.20188     -0.693933     0.00000
 C     -1.60000e-05     -1.38777     0.00000
 C     1.20188     -0.693829     0.00000
 H     2.14078     1.23611     0.00000
 H     5.10000e-05     2.47197     0.00000
 H     -2.14085     1.23591     0.00000
 H     -2.14082     -1.23604     0.00000
 H     -2.60000e-05     -2.47197     0.00000
 H     2.14082     -1.23594     0.00000
end

basis "ao basis" spherical print
H    S
    34.061341000000     0.006025197800
     5.123574600000     0.045021094000
     1.164662600000     0.201897260000
H    S
     0.327230410000     1.000000000000
H    S
     0.103072410000     1.000000000000
H    P
     0.800000000000     1.000000000000
C    S
 13575.349682000000     0.000222458144
  2035.233368000000     0.001723273825
   463.225623590000     0.008925571531
   131.200195980000     0.035727984502
    42.853015891000     0.110762599310
    15.584185766000     0.242956276260
C    S
     6.206713850800     0.414402634480
     2.576489652700     0.237449686550
C    S
     0.576963394190     1.000000000000
C    S
     0.229728313580     1.000000000000
C    S
     0.095164440028     1.000000000000
C    P
    34.697232244000     0.005333365781
     7.958262282600     0.035864109092
     2.378082688300     0.142158733290
     0.814332081830     0.342704718450
C    P
     0.288875472530     1.000000000000
C    P
     0.100568236710     1.000000000000
C    D
     1.097000000000     1.000000000000
C    D
     0.318000000000     1.000000000000
C    F
     0.761000000000     1.000000000000
END

dft
  mult 1
  direct
  XC pbe0
  grid xfine
  convergence density 1e-08
  mulliken
end

task dft energy
task dft freq

gamess US input:
 $SYSTEM MWORDS=3500 $END
 $CONTRL RUNTYP=Hessian $END
 $CONTRL SCFTYP=RHF $END
 $DFT  DFTTYP=PBE0 $END
 $CONTRL ICHARG=0  MULT=1 $END
 $CONTRL ISPHER=1 $END
 $SCF DIRSCF=.TRUE. $END
 $BASIS EXTFIL=.TRUE. GBASIS=DEF2TZVP $END
 $DATA
 Benzene
 C1
 C  6.000000 0.039980 -1.387210 0.000000
 C  6.000000 1.221350 -0.658980 0.000000
 C  6.000000 1.181370 0.728230 0.000000
 C  6.000000 -0.039980 1.387210 0.000000
 C  6.000000 -1.221350 0.658980 0.000000
 C  6.000000 -1.181370 -0.728230 0.000000
 H  1.000000 0.071210 -2.470970 0.000000
 H  1.000000 2.175520 -1.173810 0.000000
 H  1.000000 2.104310 1.297150 0.000000
 H  1.000000 -0.071210 2.470970 0.000000
 H  1.000000 -2.175520 1.173810 0.000000
 H  1.000000 -2.104310 -1.297150 0.000000
 $END

17 August 2015

621. Very briefly: comparison of different hardware for a single G09 calculation

And another update:
* I've set up an Amazon EC2/AWS instance and have done some benchmarking there too: http://verahill.blogspot.com.au/2016/01/626-briefly-gaussian-g09d-with-slurm-on.html 
Exciting stuff.

Another update:
* I turned off hyperthreading on the i7-4930K to see whether it improved performance. The geometry optimisation took 2 min (4%) longer but the overall runtime was 2 min (2%) shorter. This is probably within the reproducibility error for the measurement.
* I've also added results for an i7-5820K without hyperthreading


Update: I spotted a few mistakes:
* the L5430 job ran on a dual-socket machine, so I've multiplied its passmark by two and replotted
* the X3480 job used the EM64T version of Gaussian, not AMD64. I don't have a license to test that system using AMD64.

Original post:
There are lots of potential flaws when comparing the performance of a computational package on different hardware, which is probably why examples comparing hardware using computational chemistry packages are hard to find online. That in turn makes it difficult to decide what hardware to budget for.

So here's a simple comparison of a few different types of hardware for a combined geometry optimisation + frequency ('geovib') calculation in Gaussian.

All systems have spinning (7200 rpm) disks and run Debian Jessie (64-bit). The systems haven't been optimised in any way.

All systems used G09D rev. 01 AMD64 unless otherwise indicated. The time the geometry optimisation took is given in [square brackets].

CPU Passmark benchmarks:
i7-4930K 13059
i7-5820K 12999
i7-7700 10817
i7-6700 10011
FX 8350 8943
FX 8150 7626
i5-2400 5910
X3480 5732
PH2 X6 4998

Performance:
2h 15 min. [1h 14 min.] Intel i7-4930K/ 32 Gb ram/ 12 threads
2h 23 min. [1h 25 min.] Intel i7-5820K/ 32 Gb ram/ 6 threads (HT turned off, NOTE)
2h 28 min [1h 15 min] Intel i7-7700/ 16 Gb ram/ 8 threads (w/ HT)
2h 49 min [1h 37 min] Intel i7-6700/ 16Gb ram/ 6 threads (w/ HT)
3h 49 min. [2h 12 min.] AMD FX 8350/ 8 Gb ram/ 8 threads
4h 12 min. [2h 19 min.] Intel i5-2400/ 16 Gb ram/ 4 threads
4h 16 min. [2h 24 min.] AMD Phenom II X6 1055T/ 8 Gb ram/ 6 threads
4h 28 min. [2h 16 min.] dual-socket Intel Xeon L5430/ 16 Gb ram/ 8 threads -- rev A.02
4h 43 min. [2h 47 min.] AMD FX 8150/ 32 Gb ram/ 8 threads
9h 26 min. [5h 18 min.] AMD Athlon II X3 445/ 8 Gb ram/ 3 threads

I also tried the EM64T version of G09D rev 01 and got:
1h 43 min. [57 min.] i7-4930K/ 32 Gb ram/ 12 threads
1h 59 min  [1h    7 min] i7-6700/ 16 Gb ram/6 threads (w/ HT)
2h 12 min  [1h    7 min] i7-7700/ 16 Gb ram/8 threads (w/ HT)
3h 03 min. [1h 44 min.] i5-2400/16 Gb ram/ 4 threads
4h 21 min. [2h 27 min.] Xeon X3480/ 8 Gb ram/ 8 threads -- rev B.01

Just by switching from the AMD64 to the EM64T version, we thus cut the calculation down to about 75% of the original time on the i7-4930K.

I also turned off hyperthreading for the i7-4930K and ran with six threads using the EM64T version:
1h 41 min. [59 min.] i7-4930K/ 32 Gb ram/ 6 threads
1h 35 min. [58 min.] i7-5820K/ 32 Gb ram/ 6 threads (NOTE)
2h 9 min. [1h 11 min.] i7-7700/ 16 Gb ram/ 4 threads


Here's a plot of the run times vs Passmark benchmarks (I've multiplied the Xeon L5430 passmark by 2):
[Figure: total run time vs CPU Passmark score for the systems above]

Note that the Xeon L5430 and Xeon X3480 store the job files on NFS (scratch files are local) and are not using G09 rev. D.01.

It'd be tempting to draw a line from the Athlon II through to the i7-4930K, in which case the Phenom II X6 and the i5-2400 perform much, much better than they should based on the CPU Passmark alone.

Either way, these are my observations. No interpretations or opinion attached.

So what's the benchmarking job that I used? I actually prefer not to reveal it, as it'd eventually point towards my identity (and you're not supposed to publish Gaussian benchmarks...).

Suffice to say that it uses:
rpbe/def2-svp and 459 functions/759 primitives (46 atoms) with opt=(verytight) and integral(ultrafine)




620. Very Short: Setting up ORCA (comp. chem.) on debian jessie

Nothing odd here, but providing instructions is never a bad idea. Note that ORCA is distributed as binaries only, i.e. no source code, which is one reason why I wouldn't want to rely on it as my main code for computational chemistry. However, I am interested in using it to compare select results from NWChem and G09, especially in cases where G09 and NWChem produce differing results.

Either way, to download ORCA you need to join the ORCA forum:
https://orcaforum.cec.mpg.de/portal.php

Once you've joined you can go to Downloads and, well, download. I went for https://orcaforum.cec.mpg.de/downloads.php?view=detail&df_id=43

sudo mkdir /opt/orca
sudo chown $USER /opt/orca
cp ~/Downloads/orca_3_0_3_linux_x86-64.tbz /opt/orca
cd /opt/orca
tar xvf orca_3_0_3_linux_x86-64.tbz
cd orca_3_0_3_linux_x86-64
ln -s orca orca_cc

The symlink is there because Debian already ships a binary called orca (the screen reader).
And that's the installation done. To add /opt/orca/orca_3_0_3_linux_x86-64 to your PATH, do

echo 'export PATH=$PATH:/opt/orca/orca_3_0_3_linux_x86-64' >> ~/.bashrc

To test, try one of the jobs on https://www.sharcnet.ca/help/index.php/ORCA#Parallel_ORCA_job and run using
/opt/orca/orca_3_0_3_linux_x86-64/orca_cc test.inp
                                 *****************
                                 * O R C A *
                                 *****************

        --- An Ab Initio, DFT and Semiempirical electronic structure package ---

                  #######################################################
                  #                        -***-                        #
                  #   Department of molecular theory and spectroscopy   #
                  #              Directorship: Frank Neese              #
                  # Max Planck Institute for Chemical Energy Conversion #
                  #                 D-45470 Muelheim/Ruhr               #
                  #                       Germany                       #
                  #                                                     #
                  #                  All rights reserved                #
                  #                        -***-                        #
                  #######################################################

                         Program Version 3.0.3 - RELEASE -
[..]
           ************************************************************
           *        Program running with 2 parallel MPI-processes     *
           *              working on a common directory               *
           ************************************************************
[..]
------------------------------------------------------------------------------
                       ORCA ELECTRIC PROPERTIES CALCULATION
------------------------------------------------------------------------------

Dipole Moment Calculation        ... on
Quadrupole Moment Calculation    ... off
Polarizability Calculation       ... off
GBWName                          ... test.gbw
Electron density file            ... test.scfp.tmp

-------------
DIPOLE MOMENT
-------------
                                X          Y          Z
Electronic contribution:    0.00000   -0.00000   -0.00000
Nuclear contribution   :   -0.00000    0.00000   -0.00000
                        -----------------------------------
Total Dipole Moment    :   -0.00000    0.00000   -0.00000
                        -----------------------------------
Magnitude (a.u.)       :    0.00000
Magnitude (Debye)      :    0.00000

Timings for individual modules:

Sum of individual times         ...   45.891 sec (=   0.765 min)
GTO integral calculation        ...    5.511 sec (=   0.092 min)  12.0 %
SCF iterations                  ...   26.269 sec (=   0.438 min)  57.2 %
SCF Gradient evaluation         ...   13.957 sec (=   0.233 min)  30.4 %
Geometry relaxation             ...    0.154 sec (=   0.003 min)   0.3 %

                        ****ORCA TERMINATED NORMALLY****

TOTAL RUN TIME: 0 days 0 hours 0 minutes 50 seconds 359 msec
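If you'd rather not fetch one of those test jobs, a minimal ORCA 3 input of your own along these lines should also do (the water geometry below is a rough sketch, not an optimised structure):

! BP86 def2-SVP Opt
%pal nprocs 2 end
* xyz 0 1
O    0.000000    0.000000    0.000000
H    0.000000    0.000000    0.970000
H    0.939000    0.000000   -0.243000
*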

01 August 2015

619. Making libreoffice look a bit nicer

...and by the title I actually mean making it look like MS Office... we can discuss the merits, or lack thereof, of that, but not here and now.

NOTE: you can install a few alternative icon themes directly from the Debian repos by installing the libreoffice-style-* packages.

Anyway, we're committing sacrilege by making our LibreOffice look like MS Office. You can obviously use the instructions here to pick a less controversial look.

Icons:
First go to http://charliecnr.deviantart.com/art/Office-2013-theme-for-LibreOffice-512127527 or use wget:

wget http://orig11.deviantart.net/e10d/f/2015/059/7/3/office_2013_theme_for_libreoffice_by_charliecnr-d8gwo4n.zip
mv office_2013_theme_for_libreoffice_by_charliecnr-d8gwo4n.zip images_office2013.zip
sudo mv images_office2013.zip /usr/lib/libreoffice/share/config/

Then, in LibreOffice, go to Tools/Options/LibreOffice:View and pick your theme.

Menu bar:
This is a weird one. Go to Tools/Options/LibreOffice:Personalisation, select "Own theme" under Firefox themes, and click "Visit Firefox Themes".



Then paste the URL of the theme you want. I used this URL: https://addons.mozilla.org/en-US/firefox/addon/office2010-edit/

And this is how LibreOffice Writer looks now:

28 July 2015

618. Modifying ECCE to work with slurm

UPDATE: ecce initially stopped monitoring the job after 10-20 seconds (the job itself continued fine). It turned out to be due to $q needing to be lowercase (i.e. 'slurm', not 'Slurm') in eccejobmonitor, as reflected in the code below.



Sun Gridengine has been removed from debian jessie (it's in wheezy and sid). This has given me a good excuse to explore setting up SLURM on my debian cluster. So I did: http://verahill.blogspot.com.au/2015/07/617-slurm-on-debian-jessie-and.html

My setup is very simple: each node has its own working directory, which it exports via NFS to the main node. Also, I never run jobs across several nodes. Because of that, each node has its own queue. Not how beowulf clusters were supposed to work, but it's the best solution for me (e.g. ROCKS does the opposite -- it exports the user dir from the main node, but that makes reading and writing slow where it counts, i.e. on the work nodes).
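(For the curious: each node's export boils down to a single /etc/exports line, something like the following -- the path and options here are an example rather than my exact setup:)

/home/me/work 192.168.1.1(rw,sync,no_subtree_check)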

I've currently got this slurm.conf:
ControlMachine=beryllium
ControlAddr=192.168.1.1
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurm/accounting
ClusterName=rupert
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=beryllium NodeAddr=192.168.1.1
NodeName=neon NodeAddr=192.168.1.120 state=unknown
NodeName=tantalum NodeAddr=192.168.1.150 state=unknown
NodeName=magnesium NodeAddr=192.168.1.200 state=unknown
NodeName=carbon NodeAddr=192.168.1.190 state=unknown
NodeName=oxygen NodeAddr=192.168.1.180 state=unknown
PartitionName=All Nodes=neon,beryllium,tantalum,oxygen,magnesium,carbon default=yes maxtime=infinite state=up
PartitionName=mpi4 Nodes=tantalum maxtime=infinite state=up
PartitionName=mpi12 Nodes=carbon maxtime=infinite state=up
PartitionName=mpi8 Nodes=neon maxtime=infinite state=up
PartitionName=mpix8 Nodes=oxygen maxtime=infinite state=up
PartitionName=mpix12 Nodes=magnesium maxtime=infinite state=up
PartitionName=mpi1 Nodes=beryllium maxtime=infinite state=up
and sinfo returns
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
All*         up  infinite     6  idle beryllium,carbon,magnesium,neon,oxygen,tantalum
mpi4         up  infinite     1  idle tantalum
mpi12        up  infinite     1  idle carbon
mpi8         up  infinite     1  idle neon
mpix8        up  infinite     1  idle oxygen
mpix12       up  infinite     1  idle magnesium
mpi1         up  infinite     1  idle beryllium

The first step was to figure out what files to edit:

grep -rs "qsub"
apps/siteconfig/QueueManagers:SGE|submitCommand: qsub ##script##
grep -rs "SGE"
apps/scripts/eccejobmonitor: &MsgSendUp("SGE job id '$id' in state '$state'");
[..]
apps/siteconfig/Queues:magnesium|queueMgrName: SGE

Here are the files I edited:

apps/siteconfig/QueueManagers:
(around line 12)
QueueManagers: LoadLeveler \
               Maui \
               EASY \
               PBS \
               LSF \
               Moab \
               SGE \
               Shell \
               Slurm

(around line 185)
Shell|jobIdParseExpression: \ [0-9]+

###############################################################################
# SLURM
# Simple Linux Utility for Resource Management
#
#
Slurm|submitCommand: sbatch ##script##
Slurm|cancelCommand: scancel ##id##
Slurm|queryJobCommand: squeue
Slurm|queryMachineCommand: sinfo -p ##queue##
Slurm|queryQueueCommand: squeue -a
Slurm|queryDiskUsageCommand: df -k
Slurm|jobIdParseExpression: .*
Slurm|jobIdParseLeadingText: job
apps/scripts/eccejobmonitor:
(around line 2124; note the filehandle in the while loop, which tends to get eaten when pasted as HTML)
    LogMsg "Globus status from eccejobstore: $state";
  }
  elsif ($q eq 'slurm')
  {
    $cmd = "squeue 2>&1";
    if (open(QUERY, "$cmd |"))
    {
      $gotState = 0;
      while (<QUERY>)
      {
        LogMsg "JobCheck: Slurm qstat line: $_";
        if (/^\s*$id/)
        {
          my $state = (split)[5];

          &MsgSendUp("Slurm job id '$id' in state '$state'");

          if (grep {$state eq $_} qw{R t})
          {
            $status = $JOB_STATE_RUNNING;
          }
          elsif (grep {$state eq $_} qw{PD})
          {
            $status = $JOB_STATE_PENDING;
          }
          $gotState = 1;
          last;
        }
      }
      if ($gotState == 0)
      {
        if ($gJobCheckState != $JOB_STATE_NONE)
        {
          $status = $JOB_STATE_DONE;
        }
      }
      close QUERY;

Next, set up a new machine (or queue) using ecce -admin. You won't be able to select Slurm as the queue manager, so select e.g. PBS for now. Then edit the apps/siteconfig/CONFIG.machinename file to something like:
NWChem: /opt/nwchem/Nwchem/bin/LINUX64/nwchem
Gaussian-03: /opt/gaussian/g09d/g09/g09
perlPath: /usr/bin/
qmgrPath: /usr/bin/
xappsPath: /usr/bin/

Slurm {
#!/bin/csh
#SBATCH -p mpi8
#SBATCH --time=$walltime
#SBATCH --output=slurm.out
#SBATCH --job-name=$submitFile
}

NWChemEnvironment {
PYTHONPATH /opt/nwchem/Nwchem/contrib/python
}

NWChemFilesToRemove{ core *.aoints.* *.gridpts.* }

NWChemCommand {
setenv PATH "/bin:/usr/bin:/sbin:/usr/sbin"
setenv LD_LIBRARY_PATH "/usr/lib/openmpi/lib:/opt/openblas/lib:/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib:/opt/intel/mkl/lib/intel64"
hostname
mpirun -n $totalprocs /opt/nwchem/Nwchem/bin/LINUX64/nwchem $infile > $outfile
}

Gaussian-03FilesToRemove{ core *.rwf }

Gaussian-03Command{
set path = ( /opt/nbo6/bin $path )
setenv GAUSS_SCRDIR /home/me/scratch
setenv GAUSS_EXEDIR /opt/gaussian/g09d/g09/bsd:/opt/gaussian/g09d/g09/local:/opt/gaussian/g09d/g09/extras:/opt/gaussian/g09d/g09
/opt/gaussian/g09d/g09/g09 < $infile > $outfile
echo 0
}

Wrapup{
dmesg|tail
find ~/scratch/* -name "*" -user me|xargs -I {} rm {} -rf
}
Next, edit apps/siteconfig/Queues -- in my case the machine I created is called neon-slurm:
neon-slurm|queueMgrName: Slurm
neon-slurm|queueMgrVersion: 2.0~
neon-slurm|prefFile: neon-slurm.Q
And that's all. You should now be able to submit jobs via slurm. There's obviously a lot more that can be done and configured with SLURM, but this was enough to get me up and running, so that I'm now 'future-proofed' in case SGE never comes back into Debian stable.
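Before going through ecce, it's worth checking that plain submission works, e.g. by sbatch'ing a trivial hand-written script (a sketch, using one of the partitions defined above):

#!/bin/bash
#SBATCH -p mpi8
#SBATCH --job-name=hello
#SBATCH --output=hello.out
echo "running on $(hostname)"

Submit it with sbatch hello.sh and watch it with squeue.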

And here's what it looks like when my ecce-submitted jobs are running:

squeue
JOBID PARTITION     NAME USER ST   TIME NODES NODELIST(REASON)
   30      mpi8 In_monom   me PD   0:00     1 (Resources)
   29      mpi8 b_monome   me  R  16:12     1 neon
   31    mpix12 tl_dimer   me  R  34:28     1 magnesium