Why are the Hierarchical Data Format (HDF5/.h5) files so large?

After getting lesson 1 running on some old hardware I had lying around, I started digging into the code, specifically utils.py and vgg16.py, which I had to modify a bit to run on my old general-purpose server. I'm curious why some of the h5 files are so huge; vgg16_bn_conv is nearly half a gig. Are the models so complex that they require that much space to express, even in an efficient format? Or is HDF5 just inefficient? Are there tools for examining these files? I'm interested in improvements to the way trained models are stored, to increase their portability.

The weight arrays really are big: call model.summary() to see how many parameters have to be stored. You can also open the file in an HDF5 viewer (such as HDFView) and look at the arrays yourself.
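For a rough back-of-the-envelope check, here's a minimal sketch, assuming tensorflow.keras and the stock VGG16 from keras.applications (the lesson's vgg16.py builds much the same network by hand, so the parameter counts are comparable):

```python
# A minimal sketch, assuming tensorflow.keras is available. Building
# with weights=None gives the architecture only, so nothing downloads.
from tensorflow.keras.applications import VGG16

model = VGG16(weights=None)
model.summary()            # per-layer parameter counts

n = model.count_params()   # ~138 million for the full VGG16
print(f"{n:,} params x 4 bytes (float32) ~= {n * 4 / 2**20:.0f} MiB")
```

At roughly 138 million float32 parameters, that's about 528 MiB of raw weights, which is why these files land in the half-a-gig range.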

HDF5 does support per-dataset compression, although I don't recall whether Keras enables it when saving weights.
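If you want to poke at a file directly, h5py works well. Here's a hedged sketch (the filenames are placeholders) that lists every weight array in a .h5 file and copies the raw arrays into a new file with gzip compression, so you can compare sizes. Note the copy drops the HDF5 attributes Keras needs to load weights, so it's for size comparison only, not a loadable replacement:

```python
# Requires h5py (pip install h5py). Filenames are placeholders for
# whatever weights file you want to examine.
import h5py

with h5py.File("vgg16_bn_conv.h5", "r") as src, \
     h5py.File("vgg16_bn_conv_gz.h5", "w") as dst:

    def inspect_and_copy(name, obj):
        # Walk every dataset (weight array), print its shape and size,
        # and re-save it with gzip compression enabled.
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: {obj.shape} {obj.dtype} "
                  f"({obj.nbytes / 2**20:.1f} MiB)")
            dst.create_dataset(name, data=obj[...],
                               compression="gzip", compression_opts=4)

    src.visititems(inspect_and_copy)
```

Dense float32 weights typically compress only modestly with generic gzip, though, so the container format isn't the main culprit here; the arrays themselves are just big.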