Previous topic

2.2.2. Python Examples using h5py

Next topic

h5py example writing a simple NeXus data file with links

This Page

h5py example writing the simplest NeXus data fileΒΆ

In this example, the 1-D scan data will be written into the simplest possible NeXus HDF5 data file, containing only the required NeXus components. NeXus requires at least one NXentry group at the root level of an HDF5 file. The NXentry group contains all the data and associated information that comprise a single measurement. NeXus also requires that each NXentry group must contain at least one NXdata group. NXdata is used to describe the plottable data in the NXentry group. The simplest place to store data in a NeXus file is directly in the NXdata group, as shown in the next figure.

fig.simple-example-h5py

Simple Example

In the above figure, the data file (writer_1_3_h5py.hdf5) contains a hierarchy of items, starting with an NXentry named entry. (The full HDF5 path reference, /entry in this case, is shown to the right of each component in the data structure.) The next h5py code example will show how to build an HDF5 data file with this structure. Starting with the numerical data described above, the only information written to the file is the absolute minimum information NeXus requires. In this example, you can see how the HDF5 file is created, how Data Groups and datasets (Data Fields) are created, and how Data Attributes are assigned. Note particularly the NX_class attribute on each HDF5 group that describes which of the NeXus Base Class Definitions is being used. When the next Python program (writer_1_3_h5py.py) is run from the command line (and there are no problems), the writer_1_3_h5py.hdf5 file is generated.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#!/usr/bin/env python
'''
Writes the simplest NeXus HDF5 file using h5py
according to the example from Figure 1.3 
in the Introduction chapter
'''

import h5py
import numpy

INPUT_FILE = 'input.dat'
HDF5_FILE = 'writer_1_3_h5py.hdf5'

#---------------------------

tthData, countsData = numpy.loadtxt(INPUT_FILE).T

f = h5py.File(HDF5_FILE, "w")  # create the HDF5 NeXus file
# since this is a simple example, no attributes are used at this point

nxentry = f.create_group('Scan')
nxentry.attrs["NX_class"] = 'NXentry'

nxdata = nxentry.create_group('data')
nxdata.attrs["NX_class"] = 'NXdata'

tth = nxdata.create_dataset("two_theta", data=tthData)
tth.attrs['units'] = "degrees"

counts = nxdata.create_dataset("counts", data=countsData)
counts.attrs['units'] = "counts"
counts.attrs['signal'] = 1
counts.attrs['axes'] = "two_theta"

f.close()   # be CERTAIN to close the file

We wish to make things a bit simpler for ourselves when creating the common structures we use in our data files. To help, we gather together some of the common concepts such as create a file, create a NeXus group, create a dataset and start to build a helper library. (See mylib support module for more details.) Here, we call it my_lib. Applying it to the simple example above, our code only becomes a couple lines shorter! (Let’s hope the library starts to help in larger or more complicated projects.) Here’s the revision that replaces direct calls to numpy and h5py with calls to our library. It generates the file writer_1_3.hdf5.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#!/usr/bin/env python
'''
Writes the simplest NeXus HDF5 file using 
a simple helper library with h5py and numpy calls
according to the example from Figure 1.3 
in the Introduction chapter
'''

import my_lib

INPUT_FILE = 'input.dat'
HDF5_FILE = 'writer_1_3.hdf5'

#---------------------------

tthData, countsData = my_lib.get2ColumnData(INPUT_FILE)

f = my_lib.makeFile(HDF5_FILE)
# since this is a simple example, no attributes are used at this point

nxentry = my_lib.makeGroup(f, 'Scan', 'NXentry')
nxdata = my_lib.makeGroup(nxentry, 'data', 'NXdata')

my_lib.makeDataset(nxdata, "two_theta", tthData, units='degrees')
my_lib.makeDataset(nxdata, "counts", countsData, 
                   units='counts', signal=1, axes='two_theta')

f.close()   # be CERTAIN to close the file

One of the tools provided with the HDF5 support libraries is the h5dump command, a command-line tool to print out the contents of an HDF5 data file. With no better tool in place (the output is verbose), this is a good tool to investigate what has been written to the HDF5 file. View this output from the command line using h5dump writer_1_3.hdf5. Compare the data contents with the numbers shown above. Note that the various HDF5 data types have all been decided by the h5py support package.

Note

The only difference between this file and one written using the NAPI is that the NAPI file will have some additional, optional attributes set at the root level of the file that tells the original file name, time it was written, and some version information about the software involved.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
HDF5 "writer_1_3.hdf5" {
GROUP "/" {
   GROUP "Scan" {
      ATTRIBUTE "NX_class" {
         DATATYPE  H5T_STRING {
               STRSIZE 7;
               STRPAD H5T_STR_NULLPAD;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SCALAR
         DATA {
         (0): "NXentry"
         }
      }
      GROUP "data" {
         ATTRIBUTE "NX_class" {
            DATATYPE  H5T_STRING {
                  STRSIZE 6;
                  STRPAD H5T_STR_NULLPAD;
                  CSET H5T_CSET_ASCII;
                  CTYPE H5T_C_S1;
               }
            DATASPACE  SCALAR
            DATA {
            (0): "NXdata"
            }
         }
         DATASET "counts" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 31 ) / ( 31 ) }
            DATA {
            (0): 1037, 1318, 1704, 2857, 4516, 9998, 23819, 31662, 40458,
            (9): 49087, 56514, 63499, 66802, 66863, 66599, 66206, 65747,
            (17): 65250, 64129, 63044, 60796, 56795, 51550, 43710, 29315,
            (25): 19782, 12992, 6622, 4198, 2248, 1321
            }
            ATTRIBUTE "units" {
               DATATYPE  H5T_STRING {
                     STRSIZE 6;
                     STRPAD H5T_STR_NULLPAD;
                     CSET H5T_CSET_ASCII;
                     CTYPE H5T_C_S1;
                  }
               DATASPACE  SCALAR
               DATA {
               (0): "counts"
               }
            }
            ATTRIBUTE "signal" {
               DATATYPE  H5T_STRING {
                     STRSIZE 1;
                     STRPAD H5T_STR_NULLPAD;
                     CSET H5T_CSET_ASCII;
                     CTYPE H5T_C_S1;
                  }
               DATASPACE  SCALAR
               DATA {
               (0): "1"
               }
            }
            ATTRIBUTE "axes" {
               DATATYPE  H5T_STRING {
                     STRSIZE 9;
                     STRPAD H5T_STR_NULLPAD;
                     CSET H5T_CSET_ASCII;
                     CTYPE H5T_C_S1;
                  }
               DATASPACE  SCALAR
               DATA {
               (0): "two_theta"
               }
            }
         }
         DATASET "two_theta" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 31 ) / ( 31 ) }
            DATA {
            (0): 17.9261, 17.9259, 17.9258, 17.9256, 17.9254, 17.9252,
            (6): 17.9251, 17.9249, 17.9247, 17.9246, 17.9244, 17.9243,
            (12): 17.9241, 17.9239, 17.9237, 17.9236, 17.9234, 17.9232,
            (18): 17.9231, 17.9229, 17.9228, 17.9226, 17.9224, 17.9222,
            (24): 17.9221, 17.9219, 17.9217, 17.9216, 17.9214, 17.9213,
            (30): 17.9211
            }
            ATTRIBUTE "units" {
               DATATYPE  H5T_STRING {
                     STRSIZE 7;
                     STRPAD H5T_STR_NULLPAD;
                     CSET H5T_CSET_ASCII;
                     CTYPE H5T_C_S1;
                  }
               DATASPACE  SCALAR
               DATA {
               (0): "degrees"
               }
            }
         }
      }
   }
}
}

Since the output of h5dump is verbose, a tool (see h5toText support module) was created to print out the structure of HDF5 data files. This tool provides a simplified view of the NeXus file. It is run with a command like this: python h5toText.py h5dump writer_1_3.hdf5. Here is the output:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
writer_1_3.hdf5:NeXus data file
  Scan:NXentry
    @NX_class = NXentry
    data:NXdata
      @NX_class = NXdata
      counts:NX_INT32[31] = __array
        @units = counts
        @signal = 1
        @axes = two_theta
        __array = [1037, 1318, 1704, '...', 1321]
      two_theta:NX_FLOAT64[31] = __array
        @units = degrees
        __array = [17.926079999999999, 17.925909999999998, 
                   17.925750000000001, '...', 17.92108]

As the data files in these examples become more complex, you will appreciate the information density provided by the h5toText.py tool.