What do __getstate__ and __setstate__ do?

In the notebook 08_data_block.ipynb in Part 2 of the 2019 Deep Learning course, there is a __setstate__ defined under the Split_func function (which stores the training and validation sets after they have been split), and it is defined as
def __setstate__(self,data:Any): self.__dict__.update(data)

What does __setstate__ mean and do? And thus, what does __getstate__ mean and do?
I tried googling, and it turns out it has something to do with pickle, though I can't understand much from what I found online. Can someone please explain it to me? Which also brings me to the question: what is pickle really? Does it have any really good characteristics?

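Pickle is Python's library for serialization. Serialization means something like "packing into bytes".
So it can turn an object (a variable, an instance of your class, and so on) into bytes, which can then be saved to disk. Functions and classes themselves are pickled by reference, i.e. by their name, not by their code.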
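To make "packing into bytes" concrete, here is a minimal sketch using only the standard library. The dictionary contents and the file name data.pkl are made up for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical data: two lists of file names.
data = {"train": ["a.png", "b.png"], "valid": ["c.png"]}

# In memory: object -> bytes -> object.
blob = pickle.dumps(data)
restored = pickle.loads(blob)

# On disk: the same thing, but through a file opened in binary mode.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(path, "wb") as f:
    pickle.dump(data, f)
with open(path, "rb") as f:
    from_disk = pickle.load(f)

print(type(blob), restored == data, from_disk == data)
```

Built-in types like dicts, lists and strings work like this out of the box. Only unpickle data you trust, since loading a pickle can run arbitrary code.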

But to pickle instances of your own class, Python and the pickle library need to know how to pack it.

If you don't provide any methods, the default behaviour is used: pickling saves the instance's self.__dict__, and unpickling puts it back. So everything in __dict__ is pickled by default. You can change this behaviour by creating these methods:

__getstate__ should return an object (representing the instance's state) which will be pickled and saved.
__setstate__ should take that object as its parameter and use it to restore the instance's state as it was before.
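As a rough sketch of how the two methods work together, here is a made-up class (Counter is not from the notebook) holding a threading.Lock, which pickle cannot serialize. __getstate__ leaves the lock out of the saved state and __setstate__ creates a fresh one on load.

```python
import pickle
import threading

class Counter:
    """Hypothetical example class, used only for illustration."""
    def __init__(self, start=0):
        self.count = start
        self.lock = threading.Lock()  # lock objects cannot be pickled

    def __getstate__(self):
        # Copy __dict__ so the live object keeps its lock, then drop the lock.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and rebuild what was left out.
        self.__dict__.update(state)
        self.lock = threading.Lock()

c = Counter(5)
clone = pickle.loads(pickle.dumps(c))
print(clone.count, c.__getstate__())
```

Without these two methods, pickle.dumps(c) would fail with a TypeError because of the lock in __dict__.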

In the 08_data_block.ipynb notebook you have SplitData, which stores the items (paths split into two sets). And this __setstate__ method (I am assuming from the code) just means that when you load previously split items from disk, the saved state is merged into the object's __dict__ with update() instead of replacing it.

def __setstate__(self,data:Any): self.__dict__.update(data)

There is a comment above it, which suggests that this is some kind of workaround to make saving and loading work:

#This is needed if we want to pickle SplitData and be able to load it back without recursion errors
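Here is a sketch of where such recursion errors can come from. It assumes the class also has a __getattr__ that forwards unknown attributes to self.train. The two classes below are simplified stand-ins with made-up names, not the notebook's exact code. When unpickling, pickle creates the object without calling __init__ and then looks for __setstate__ on it. If that lookup falls through to __getattr__, the method reads self.train, which does not exist yet, so __getattr__ calls itself again and again.

```python
import pickle

class SplitWithoutSetstate:
    """Hypothetical stand-in: forwards unknown attributes to self.train."""
    def __init__(self, train, valid):
        self.train, self.valid = train, valid

    def __getattr__(self, k):
        # Only called when normal attribute lookup fails.
        return getattr(self.train, k)

class SplitWithSetstate(SplitWithoutSetstate):
    def __setstate__(self, data):
        # Found by normal lookup, so __getattr__ is never consulted for it.
        self.__dict__.update(data)

def round_trip(obj):
    return pickle.loads(pickle.dumps(obj))

try:
    round_trip(SplitWithoutSetstate([1, 2], [3]))
    failed = False
except RecursionError:
    failed = True

restored = round_trip(SplitWithSetstate([1, 2], [3]))
print(failed, restored.train, restored.valid)
```

So the body of __setstate__ does the same as the default behaviour. What matters is that the method is defined on the class at all.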