Showing posts with label cosmo. Show all posts
Showing posts with label cosmo. Show all posts

24 May 2013

432. NWChem 6.3 -- COSMO is now fast(er)!

I probably would've been more excited about this about a year ago when I 'believed' in implicit solvation models (nothing's perfect, and we'll use what is practical so they do fill a strong need. They just aren't very informative for a lot of systems) but it's still a Good Thing.

COSMO has been done using numerical gradients in nwchem 6.1.1 and earlier versions, which has meant that it's been horrendously slow in many cases, in particular if you need to optimise a structure using implicit solvation. COSMO has been -- and still is -- the only implicit solvation model implemented in NWChem, so slow COSMO puts a bit of a spanner in the solvation energy works. Sometimes the calculation even refuses to converge at all.

In contrast, Gaussian has had a number of implicit solvation models implemented, ranging from the quick and dirty PCM, to slower (and better?) C-PCM and I-PCM.

So this is great news.

A quick example:


The test:
Here's a test job (the default cosmo parameters aren't realistic, but this is for testing purposes):
scratch_dir /scratch start benzene geometry units angstroms C 0.100 1.396 0.000 C 1.209 0.698 0.000 C 1.209 -0.698 0.000 C 0.000 -1.396 0.000 C -1.209 -0.698 0.000 C -1.209 0.698 0.000 H 0.000 2.479 0.000 H 2.147 1.240 0.000 H 2.147 -1.240 0.000 H 0.000 -2.479 0.000 H -2.147 -1.240 0.000 H -2.147 1.240 0.000 end basis H library "6-31+g*" c library "6-31+g*" end dft direct end cosmo end scf maxiter 999 end task dft
Note that this is the same test job (plus cosmo, minus optimize) as shown here: http://verahill.blogspot.com.au/2013/05/430-briefly-crude-comparison-of.html

The results:
And here is what I see using nwchem 6.3. (w/ acml 5.3.1, AMD FX 8150/32 gb ram):
6.1.1 19.4 seconds
6.3   14.3 seconds

The difference isn't significant (in the sense that times are too variable so we can't really tell which is faster for such a short job).

But when we change task dft to task dft optimize we get
6.1.1 Fails after 2600 seconds
6.3   128.3 seconds

6.3 churns through the steps pretty efficiently:
@ Step Energy Delta E Gmax Grms Xrms Xmax Walltime @ ---- ---------------- -------- -------- -------- -------- -------- -------- @ 0 -230.09337488 0.0D+00 0.07376 0.01302 0.00000 0.00000 18.1 @ 1 -230.10523734 -1.2D-02 0.00903 0.00231 0.03627 0.10509 45.7 @ 2 -230.10619442 -9.6D-04 0.00491 0.00084 0.01898 0.06082 69.1 @ 3 -230.10628696 -9.3D-05 0.00176 0.00030 0.00737 0.02428 93.3 @ 4 -230.10629787 -1.1D-05 0.00023 0.00005 0.00219 0.00682 115.8 @ 5 -230.10629827 -4.0D-07 0.00004 0.00001 0.00047 0.00136 128.2 @ 5 -230.10629827 -4.0D-07 0.00004 0.00001 0.00047 0.00136 128.2
while 6.1.1 drags itself along for almost an hour:
@ Step Energy Delta E Gmax Grms Xrms Xmax Walltime @ ---- ---------------- -------- -------- -------- -------- -------- -------- @ 0 -230.09389924 0.0D+00 0.07389 0.01306 0.00000 0.00000 691.4 @ 1 -230.10680306 -1.3D-02 0.01081 0.00197 0.03065 0.10438 1378.3 @ 2 -230.10690186 -9.9D-05 0.01000 0.00167 0.00231 0.00803 2092.2
before failing with
6:6:driver: task_gradient failed:: 0 (rank:6 hostname:neon pid:4536):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0 ------------------------------------------------------------------------ There is an error related to the specified geometry ------------------------------------------------------------------------

Sure, the optimization takes 128 seconds instead of ca 44 seconds, but for anyone who's used NWCHEM with COSMO in the past, that's actually not too bad.

I ran another job to get a better feeling for how much longer COSMO vs no COSMO takes for optimization. Optimization of Arecoline (available in ECCE as a fragment) at rb3lyp/6-31+G* takes 2h 5 min with COSMO (33 optimization steps). Without COSMO it takes 37 minutes and uses 14 steps.

31 October 2012

273. NWChem and COSMO: custom radii

There are two approaches described in the nwchem manual for using custom radii in COSMO calculations:

geometry
  H  0 0 0
  H  0 1 0
  O  1 0 0
end
cosmo
       radius 1.1
              1.1
              1.8
end

and
cosmo
end
set cosmo:map custom.par

where custom.par looks like this:
H 1.1
O 1.8
The downside to the first example is that it's a PITA to use -- you need to enter each vdw radius in the order you are listing the atoms in the geometry section. It means that for a 50 atom geometry you need to enter 50 values, even if all 50 atoms are the same element.

The downside with the second example is that you need to first create the run folder, put cosmo.par there and first then can you submit.

An easier approach is to create the custom.par on the fly using task shell:
task shell "echo -e 'H 1.1 \n O 1.8' > mycosmopars.par"
cosmo
end
set cosmo:map mycosmopars.par

14 June 2012

191. Thinking about Molecular volume -- and is cosmo/nwchem yielding the right ones?

Disclaimer:
I'm an neither a theoretical nor computational chemist. I'm an analytical/inorganic chemist who likes computers. Don't trust my conclusions. This is more like thinking aloud.

The problem:
The underlying impetus is that of molecular volume: if we have a set of scatter points in space which define the surface of a molecule, how can we extract the volume? In particular as we're actually given the surface points by in the form of a cosmo.xyz file by COSMO (and yes, nwchem also outputs a volume - more about that later) there's no reason why we won't do the calculations ourselves. Also, there's at least one example of comparing results from a few major software packages, where nwchem was the odd one out.

Because it's good to know how to use Octave and bash, I'll show the commands as well.

The COSMO parameters used were
cosmo
    rsolv 0
end

[come to think of it: why bother with
and nwchem returned

 atomic radii =
 --------------
    1  6.000  2.000
    2  6.000  2.000
    3  6.000  2.000
    4  6.000  2.000
    5  6.000  2.000
    6  6.000  2.000
    7  1.000  1.300
    8  1.000  1.300
    9  1.000  1.300
   10  1.000  1.300
   11  1.000  1.300
   12  1.000  1.300
and a volume of ca 74.5 Å3

Processing:
me@Be:~$ head cosmo.xyz 
                  325
 cosmo charges
 Bq   2.1848085582473193      -0.38055253987610238        1.5251498369435705       -9.3089382062078174E-004
 Bq   1.6134835908159706      -0.59877925881345084        1.8782480854375714       -3.3706153046646758E-003
 Bq  0.43449121346899733      -0.59877925881345084        1.8782480854375714       -3.9739778624157118E-003
 Bq   1.0239874021424840      -0.23823332776127137        1.8683447179254316       -1.6433149723942275E-003


OK, we need to remove the first two lines, and the first column.
me@Be:~$ tail -n +3 cosmo.xyz|gawk '{print $2,$3,$4,$5}'> cos2.xyz
Start octave.
octave:1> bz=load('cos2.xyz');
octave:2> x=bz(:,1);y=bz(:,2);z=bz(:,3);c=bz(:,4);
octave:3> plot3(x,y,z)

Paradoxically, this would be fairly easy to do with a 'normal-size' physical object (e.g. water displacement, or area on a 2D project: draw it, cut it out, weigh it and use the density of the paper)

 Computationally, we need to think about it though. The most logical approach seems to be to take all x,y data points with a small range of values of z=zi±dz, project them on a 2D surface, calculate the area within, and multiply it by dz. Do this for all values of z.
octave:4> plot(y,z,'*')


But how to calculate the area inside an arbitrary two-dimensional figure then? If we can pick a point in the 'centre' of the figure, we can draw repeated triangles with this point as one of the corners. It's easy to calculate the area of a triangle, so we just need to sum the areas of the triangles. All we need to know is how to find such a central point to use as a corner. Also, there are problems when dz is too large and the 'border' becomes fuzzy.
octave:5> plot(y(1:25),z(1:25),'*')

In fact, at this stage there may well be pre-canned algorithms to help us.
octave:6>H=convhull(y(1:25),z(1:25));
octave:7>plot(y(H),z(H))
octave:8>hold
octave:9>plot(y(1:25),z(1:25),'*')

That way we can reduce the number of points to the ones defining the encircling figure.
octave:10>area(y(H),z(H))


That still doesn't give us the area (I think matlab does though). Since it's centred around the x axis we could probably use cumsum(abs(z(H))), but that's not general enough. In fact, there'd be so much  quality analysis required in order to make sure that we include enough, but not too many, points in our slices that it quickly becomes a chore.

So we'll take a step back. Turns out it's even easier:
octave:11>[H V]=convhulln([x y z]);
This probably isn't how you're supposed to plot it, but it works:
octave:13>trisurf(H,x,y,z)

trisurf plot
octave:12>V
gives V=104.07  Å3 (c.f. Nwchem/cosmo ca 74.5 Å3 for rsolv=0.)

Now that doesn't look good, but it has been noted nwchem/cosmo gives volumes which are about half of what every other program gives. See here and here:

">Cosmo produced volumes, which were twice as small
> as those obtained by PCM, while surfaces where by about 15% bigger in
> Cosmo."

I think nwchem actually isn't returning values of the wrong magnitude -- I think the value returned by nwchem is the molecular volume, while the other programmes return the solvent accessible surface-based volume. But what is in cosmo.xyz?

It appears to be a little bit more complex than that though.


We can open the cosmo.xyz file in jmol, but calculating the volume from these would be meaningless due to the way jmol works.

Instead we'll have to use the VdW radii of the xyz coordinates of the (unoptimised) molecule:


$ isosurface sasurface 0.5 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = 141.06999
$ isosurface sasurface 0.225 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume =104.452415
$ isosurface solvent 0 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = 79.09731
$ isosurface solvent volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = [80.26721490808025]
$ isosurface molecular volume
isosurface2 created with cutoff=0.0; isosurface count: 2
isosurfaceVolume = [80.58888982478977]
$ isosurface sasurface 0.2 area
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceArea = 118.730934
Making sense?

sasurface generates a solvent accessible surface. We can generate a value similar to what we saw from the cosmo.xyz points by forcing the sasurface probe radius..

The vdw radii of H and C are 1.2 and 1.7 Å, but COSMO uses 1.3 and 2.0.

Look at this plot again:


The height goes from -2 to 2, which agrees with the large 2.0 Å VDW radius for C that COSMO uses. The volume outputted by Nwchem is the molecular volume (as actually is stated). 
 number of -cosmo- surface points =      176
 molecular surface =    125.008 angstrom**2
 molecular volume  =     74.512 angstrom**3
(electrostatic) solvation energy =         0.0052128678 (    3.27 kcal/mol)
The molecular volume for rsolv=0 is 74.5 Å3 which is fairly close to isosurface sasurface 0 volume. Area is trickier, and requires isosurface sasurface 0.23 volume to yield anything close.

I don't think it's a coincidence that isosurface sasurface 0.225 volume gives a reasonable agreement with the cosmo.xyz, since 1.7+0.225=1.925 which is ca 2 (we only add 0.1 for H).

I'm sure all this is in the manual somewhere. But there's nothing like learning through doing.

The conclusions:
* NWchem returns a volume based on the vdw radii, not the solvent cavity
* cosmo.xyz contains points defining the surface according to the vdw radii that cosmo uses
* These are two different sets of vdw radii
* You can't compare the output of different software packages if they aren't outputting the same data
* The reported NWChem volume does depend on rsolv, the cosmo vol doesn't
* The cosmo.xyz volume is insensitive to rsolv, but sensitive to radius as expected. As far as I understand, the cosmo volumes are based solely on the vdw radii (as supplied to cosmo)
* I haven't quite figured out how, but looking at the dependency of rsolve vs defining vdw radii for cosmo, the radii used to calculate the nwchem volume is is certainly affected.

Increase rsolv=0.0, increase vdw +0.0: 74.51/104.07/3.27
Increase rsolv=0.5, increase vdw +0.0: 58.0/103.96/3.01
Increase rsolv=1.0, increase vdw +0.0: 54 /103.87/2.95
Increase rsolv=0.0, increase vdw +0.1: 84.79/115.10/2.72
Increase rsolv=0.1, increase vdw +0.1: 82.68/115.10/2.63
Increase rsolv=0.27, increase vdw +0.1: 71.84/114.97/2.56
Increase rsolv=0.0, increase vdw +0.2: 96.59/126.83/2.22
Increase rsolv=0.1, increase vdw +0.2: 85.70/126.68/2.09
increase rsolv=0.70, increase vdw +0.2: 74.68/126.56/2.01

My only real conclusion at this point is that you have to be extremely careful about what you do. This is not easy.


A Certain Commercial Programme (ACCP):
Using pcm:

scrf=(pcm,solvent=water) -- this uses vdw radii.
GePol: Cavity volume                                =      134.665 Ang**3
GePol: Cavity surface area                          =    143.132 Ang**2
Let's see if we can do this in jmol:
$ isosurface sasurface 0.5 area
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceArea = 144.25595
$ isosurface sasurface 0.46 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = 135.33589
PCM is less of a mystery now.

ACCP has a few more options though.
Using IPCM with 50 points. This uses the isodensity volume.
Volume of Solute Cavity = 8.026500E+02
Total "Solvent Accessible Surface Area" of Solute = 4.485628E+02
I've been told that the units are in Bohr3 and Bohr2. That translates to 118.94 Å3 and 125.61Å3, respectively, which sounds about right.