Where to learn about the model files

(Anders) #1

Is there a place beginners can go to learn more about the PyTorch models we are generating with fastai? I see the .h5 files, but I would like to know more about their contents, structure and usage.
I think it would help us better understand the course.

(Cedric Chee) #2


HDF5 (.h5, .hdf5) is a file format for storing large collections of multidimensional numeric arrays (e.g. model weights, data files). HDF stands for Hierarchical Data Format, and HDF5 is a binary data format.

HDF5 file format documentation.

An HDF5 file can hold groups of datasets, where

  • datasets are multidimensional arrays of a homogeneous type, and
  • groups can hold datasets and other groups.

A single HDF5 file can thus act like a file system, which can be more portable and efficient than an actual folder holding many files.
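To make the file-system analogy concrete, here is a minimal h5py sketch. The file name and the group/dataset names (`weights`, `layer1`, `kernel`, `bias`, `labels`) are made up for illustration, and `list_contents` is a hypothetical helper that walks the file with `visititems` and reports every group and dataset it finds.

```python
import h5py
import numpy as np

# Build a small file whose layout mirrors a directory tree:
# groups play the role of folders, datasets the role of files.
# All names here are illustrative.
with h5py.File("example.h5", "w") as f:
    weights = f.create_group("weights")
    layer1 = weights.create_group("layer1")
    layer1.create_dataset("kernel", data=np.zeros((3, 4), dtype=np.float32))
    layer1.create_dataset("bias", data=np.zeros(4, dtype=np.float32))
    f.create_dataset("labels", data=np.arange(10))


def list_contents(path):
    """Return (name, kind, shape, dtype) for every object below the root group."""
    entries = []

    def visit(name, obj):
        # visititems calls this with the object's path and the object itself
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape, str(obj.dtype)))
        else:
            entries.append((name, "group", None, None))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries


for entry in list_contents("example.h5"):
    print(entry)
```

Paths such as `weights/layer1/kernel` work like file paths, so `f["weights/layer1/kernel"]` would fetch that dataset directly.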

How do we use it?

HDF5 has a Python library, h5py.

With h5py, you can convert HDF5 files to and from numpy arrays, which work nicely with frameworks like TensorFlow and PyTorch.
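A minimal sketch of that round trip, assuming h5py, NumPy and PyTorch are installed; the file name `arrays.h5` and the dataset name `data` are illustrative. Indexing an h5py dataset reads only the requested slice from disk, which is part of what makes HDF5 convenient for data that does not fit in memory.

```python
import h5py
import numpy as np
import torch

# NumPy -> HDF5: store an array as a dataset (names are illustrative).
images = np.random.rand(8, 3, 32, 32).astype(np.float32)
with h5py.File("arrays.h5", "w") as f:
    f.create_dataset("data", data=images)

# HDF5 -> NumPy: slicing an h5py.Dataset returns a NumPy array.
with h5py.File("arrays.h5", "r") as f:
    dset = f["data"]        # a handle; nothing has been read yet
    first_two = dset[:2]    # reads only the first two samples
    everything = dset[()]   # reads the whole dataset into memory

# NumPy -> PyTorch: torch.from_numpy shares memory with the array (no copy).
batch = torch.from_numpy(first_two)
print(type(first_two).__name__, batch.shape, batch.dtype)
# → ndarray torch.Size([2, 3, 32, 32]) torch.float32
```

Going the other way, a tensor can be stored with `f.create_dataset("name", data=tensor.numpy())` for a CPU tensor.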

We can work with HDF5 datasets in PyTorch via NumPy. An example is the following:

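import h5py
import torch
import torch.utils.data as data

class H5Dataset(data.Dataset):
    # Assumes the file holds a 'data' dataset and a 'label' dataset,
    # both indexed by sample along their first axis.

    def __init__(self, file_path):
        super().__init__()
        # Open read-only; the h5py datasets below read from disk on demand.
        h5_file = h5py.File(file_path, "r")
        self.data = h5_file.get("data")
        self.target = h5_file.get("label")

    def __getitem__(self, index):
        # Indexing an h5py dataset returns a NumPy array (or a NumPy scalar
        # for a 1-D dataset such as class labels); convert both to tensors.
        return (torch.from_numpy(self.data[index]).float(),
                torch.as_tensor(self.target[index]))

    def __len__(self):
        return self.data.shape[0]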
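One caveat with the class above: it opens the file in `__init__`, and h5py file handles cannot be pickled, so it may fail or misbehave when a `DataLoader` uses worker processes (`num_workers > 0`). A common workaround, sketched below under the same assumption of `data` and `label` datasets, is to open the file lazily on first access so each process gets its own handle. `LazyH5Dataset` and `toy.h5` are hypothetical names used only for illustration.

```python
import h5py
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class LazyH5Dataset(Dataset):
    """Like H5Dataset, but opens the HDF5 file on first access.

    Assumes datasets named 'data' and 'label', indexed by sample
    along their first axis.
    """

    def __init__(self, file_path):
        super().__init__()
        self.file_path = file_path
        self.h5_file = None  # opened lazily, once per process
        # Read the length up front with a short-lived handle.
        with h5py.File(file_path, "r") as f:
            self.length = f["data"].shape[0]

    def __getitem__(self, index):
        if self.h5_file is None:
            self.h5_file = h5py.File(self.file_path, "r")
        x = torch.as_tensor(self.h5_file["data"][index], dtype=torch.float32)
        y = torch.as_tensor(self.h5_file["label"][index])
        return x, y

    def __len__(self):
        return self.length


# Build a toy file with the layout the dataset expects (random illustrative data).
with h5py.File("toy.h5", "w") as f:
    f.create_dataset("data", data=np.random.rand(10, 3, 8, 8))
    f.create_dataset("label", data=np.arange(10))

loader = DataLoader(LazyH5Dataset("toy.h5"), batch_size=4, shuffle=False)
for x, y in loader:
    print(x.shape, y.tolist())
```

The example keeps `num_workers` at its default of 0 so it runs anywhere; the lazy open is what matters once workers are enabled.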

Keras also uses HDF5 to save and load models.
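A file's extension does not guarantee its format, so it can be worth checking what a given `.h5` file actually contains: a tensor dictionary written with `torch.save`, for instance, is not HDF5 whatever its name. Below is a minimal sketch of a hypothetical helper, `describe_weights`, that uses `h5py.is_hdf5` to tell the two cases apart and lists each stored array with its shape. It assumes the non-HDF5 case is a `torch.save` file holding a dict such as a model's `state_dict`.

```python
import h5py
import torch


def describe_weights(path):
    """Return {name: shape} for the arrays stored in a weights file.

    Hypothetical helper: reads real HDF5 files with h5py and falls back to
    torch.load for files written with torch.save, whatever their extension.
    """
    shapes = {}
    if h5py.is_hdf5(path):
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                shapes[name] = tuple(obj.shape)

        with h5py.File(path, "r") as f:
            f.visititems(visit)
    else:
        state = torch.load(path, map_location="cpu")
        for name, value in state.items():
            if torch.is_tensor(value):
                shapes[name] = tuple(value.shape)
    return shapes


# Example: a small PyTorch layer saved with torch.save under an .h5 name.
model = torch.nn.Linear(4, 2)
torch.save(model.state_dict(), "linear.h5")
print(h5py.is_hdf5("linear.h5"))      # False: not an HDF5 file
print(describe_weights("linear.h5"))  # {'weight': (2, 4), 'bias': (2,)}
```

For a `state_dict`, the keys are per-layer parameter names such as `weight` and `bias`; besides parameters, a `state_dict` also includes persistent buffers such as BatchNorm running statistics.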

(Anders) #3

Interesting, so does fastai use temp.h5 for more than just a weight+bias matrix after fitting?