VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences is deprecated

Does anyone know why I might be getting this warning message?

/opt/conda/envs/fastai/lib/python3.8/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order)

The code I ran is below:

seqs_short.shape
(563552, 1)

seqs_short.head()

	Sequence
0	mktllltlvvvtivcldlgytlkchntqlpfiyntcpegknlcfkatlkfplkfpvkrgcaatcprssslvkvvccktdkcn
1	malfrkkdkyirinpnrsriesapqakpevpdelfskcpackvilykndlglektcqhcsynfritaqerraltvdegsfeelftgiettnpldfpnyleklaatrqktgldeavltgkatiggqpvalgimdshfimasmgtvvgekitrlfelaieerlpvvlftasggarmqegimslmqmakisaavkrhsnaglfyltvltdpttggvtasfamegdiilaepqtlvgfagrrviestvrenlpddfqkaeflqehgfvdaivkrqdlpatisrllrmhggvr
2	adnrrpiwnlghmvnalkqiptflxdgana
3	mllgffrllfkglyrvrltgdtqalyqqkvlitpnhvsfldgillalflpvrpvfavytsisqrwfmraltpiidfvpldptkpmsikhlvrlieqgrpvvifpegrisvsgslmkiydgaafvaaksqativplriegaeltpfsrlkglvkrrlfpriqlhllppthlpmpeaprardrrkiagemlhqimmearmavrpretlyesllaaqdrfgarkpcvedinfqpdtyrklltktlfvarilekysqpgekiglmlpnagisaavifgaiargripammnytagvkglssaiaaaelntiftsrtfldkgklwhlpeqltqvrwvfledlkgditladklwifahllaprlaqvkqqpedaamilftsgsegnpkgvvhshksllsnveqiktiadftandrfmsalplfhsfgltvglltplltgaevflypsplhyrvvpelvydrnctvlfgtstflanyarfanpydfyrlryvvagaeklqestkqlwqdkfglrilegygvtecapvvsinvpmaakvgtvgrilpgmdarllampgidqggrlqlkgpnimkgylrvenpgvleapaaenqhgemeagwydtg...
4	mnflahlhlahladsslsgnlladfvrgnpathyppdvvegiymhrridvmtdnlpevrearewfrhetrrvasitldvmwdhflsrhwtqispdfplqafvgyahaqvatilpdspprfvnlndylwsekwleryrdmdfiqnvlngmanrrprldalrdswydldahydaleerfwhfyprmmaqaarkal

dls = DataBlock(blocks=TextBlock.from_df('Sequence', is_lm=True, tok=SubwordTokenizer()),
                    get_x=ColReader('text'),
                    splitter=RandomSplitter(0.1)).dataloaders(seqs_short, bs=64, seq_len=72)

Hi wjs20, hope you are having a wonderful day!

Deprecation warnings usually mean the following: a component of a software standard has been marked as obsolete, to warn against its use in the future so that it may be phased out.

This link may help resolve the issue, though I'm not sure where in your code you would need to make the change.

:thought_balloon: If the sequences were padded to the same length, would the warning stop?

:thought_balloon: Is it something that can be configured in fastai?
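On the padding idea above, here is a minimal sketch in plain NumPy of why equal-length sequences would not trigger the warning. The helper `pad_to_same_length` and the token ids are made up for illustration, and this says nothing about where fastai builds the array or whether padding can be configured there.

```python
import numpy as np

def pad_to_same_length(seqs, pad_value=0):
    """Right-pad each sequence with pad_value up to the longest length.

    Hypothetical helper for illustration only; not part of fastai or NumPy.
    """
    max_len = max(len(s) for s in seqs)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]

# Made-up token ids of unequal length, standing in for tokenized sequences.
token_ids = [[5, 8, 2], [7, 1], [9]]

# After padding, every row has the same length, so NumPy builds a regular
# 2-D array and has nothing ragged to warn about.
padded = np.array(pad_to_same_length(token_ids))
print(padded.shape)  # (3, 3)
```

Whether padding is appropriate depends on the model; for a language model the pad value would also need to be one the model knows to ignore.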

Hope this helps.

Cheers mrfabulous1 :grinning: :grinning:


Good Morning @wjs20 and @mrfabulous1 !

It used to be that np.array(list) would put the elements of list into a rectangular array if they were all the same length. And if the lengths were different, it would return an ndarray of objects, with each object as one element of the list. Without complaint.

Now with a later release of numpy, the latter situation generates a warning unless you explicitly specify 'dtype=object'. In a future update to numpy, not specifying 'dtype=object' will generate an error.
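To make that concrete, here is a small sketch of the two cases using made-up lists (nothing here is taken from fastai). Passing dtype=object explicitly is the form that stays quiet on NumPy versions that warn and keeps working on versions that raise an error instead.

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5]]  # rows of different lengths

# Explicit dtype=object: a 1-D array whose two elements are Python lists.
arr = np.array(ragged, dtype=object)
print(arr.shape)  # (2,)
print(arr.dtype)  # object

# Rows of equal length still produce an ordinary rectangular array.
rect = np.array([[1, 2, 3], [4, 5, 6]])
print(rect.shape)  # (2, 3)
```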

So I imagine that you have done nothing wrong and can safely ignore the warning, especially if everything still works. Instead, deep in the bowels of fastai, someone needs to add np.array(…, dtype=object) (in the case of ragged data) to eliminate the warning and future-proof the code. (I had to do a similar thing in my own code that processes audio samples of different lengths.)

BTW, this speculation could be totally wrong - I have not looked at DataBlock code at all. :upside_down_face:
