Lambda Functions and Serialization
After lesson 3, the Deployment Season is officially open. Deployment means exporting a Learner object, which involves object serialization.
Here is a useful tip that comes straight from chapter 6 of the fastbook:
In the example below, both get_x and get_y use lambda functions.
dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]
We can also define them as regular functions like this:
def get_x(r): return r['fname']
def get_y(r): return r['labels']
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
So, which one should we choose?
If we are exporting a Learner object that internally uses a DataBlock object similar to one of those defined above, it is better to use the second approach (def get_x(r) …) because lambda functions are not compatible with serialization, which is what happens when a Learner object is exported. Python's pickle module stores a function by reference to its name, and a lambda has no name that can be looked up again at load time. On the other hand, if we are quickly experimenting, we can use the lambda version.
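
To see the difference without training anything, here is a minimal sketch that uses only Python's standard pickle module. It does not call fastai at all; the helper can_pickle and the name get_x_lambda are hypothetical, introduced just for this illustration, and the assumption is that exporting a Learner hits the same pickling limitation.

```python
import pickle

# Regular function: pickle can store it by its module-level name.
def get_x(r):
    return r['fname']

# Lambda version of the same getter (hypothetical name for this demo).
# Its internal name is '<lambda>', which pickle cannot look up again.
get_x_lambda = lambda r: r['fname']

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

named_ok = can_pickle(get_x)
lambda_ok = can_pickle(get_x_lambda)
print("named function picklable:", named_ok)
print("lambda picklable:", lambda_ok)
```

When this is run as a script, the named function should pickle and the lambda should not, which mirrors why the def version is the safer choice for a DataBlock that ends up inside an exported Learner.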