Since starting the fastai classes, I must have typed .shape, len(), and type() a thousand times. What a great feature it would be if the Jupyter notebook automatically told you the type, dimensions, and internal types every time it displayed (or assigned) a value. Such a feature would have saved me a lot of time and spared me a lot of bugs!
So I wrote a small class whose type_str method extracts the essential values that we often need to know while writing machine learning code. Here are some examples:
import numpy as np
import pandas as pd
import torch
from showType import ShowType
st = ShowType()
a = [[i, i+6] for i in range(6)]
st.type_str(a)
'list[6]<list[2]<int, int>, list[2]<int, int>, list[2]<int, int>, list[2]<int, int>,...>'
npa = np.array(a, dtype=np.float32)
st.type_str(npa)
'ndarray[6, 2]<float32>'
st.type_str(torch.tensor(npa))
'Tensor(cpu)[6, 2]<float32>'
#from Lesson 1 - Pets...
st.type_str(data.one_batch())
'tuple[2]<Tensor(cpu)[64, 3, 224, 224]<float32>, Tensor(cpu)[64]<int64>>'
rows = [['1/1/2019', 181, 185.6, 187.3, 180], ['1/2/2019', 185.2, 186.6, 188.3, 182]]  # renamed so it does not overwrite the Lesson 1 `data` object
df = pd.DataFrame(rows, columns=['Date', 'Open', 'Close', 'High', 'Low'])
st.type_str(df)
'DataFrame[2, 5]<Date<object> Open<float64> Close<float64> High<float64> Low<int64> >'
st.type_str(df['Open'])
'Series[2]<float64>'
x = (1, 'wsx', 2.3, ['qaz',7])
st.type_str(x)
'tuple[4]<int, str, float, list[2]<str, int>>'
Ideally, I would like this type string to be displayed alongside the output of every notebook cell. I took a look at the docs for Jupyter extensions, and packaging this as an extension looks quite feasible: the value of the current cell is accessible, and the output can be formatted with HTML as desired. But I realized that I don’t have the skills or time to figure out how to do it.
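For anyone picking this up, one possible starting point is IPython's `post_run_cell` event, which in recent IPython versions calls each registered callback with the cell's execution result. The sketch below is only a rough idea and has not been tried as a packaged extension. The names `make_type_hook` and `register_type_hook` are made up for illustration, and `describe` stands for any function that turns a value into a string, such as `ShowType().type_str`.

```python
def make_type_hook(describe, out=print):
    """Build a post_run_cell callback that reports the type of a cell's value.

    `describe` is any callable mapping an object to a string
    (hypothetical parameter; ShowType().type_str is the intended candidate).
    """
    def show_type(result):
        # IPython passes an ExecutionResult; its .result attribute holds the
        # value of the cell's last expression, or None if there was none.
        value = getattr(result, "result", None)
        if value is not None:
            out("type: " + describe(value))
    return show_type


def register_type_hook(describe):
    """Attach the hook to the running IPython shell, if there is one."""
    try:
        from IPython import get_ipython
    except ImportError:
        return None          # IPython is not installed
    ip = get_ipython()
    if ip is None:
        return None          # plain Python, not a notebook or IPython session
    hook = make_type_hook(describe)
    ip.events.register("post_run_cell", hook)
    return hook
```

In a notebook, running `register_type_hook(ShowType().type_str)` once should then print a type line after each cell that returns a value. Formatting the line as HTML, or covering assignments as well as displayed values, would need more work than this sketch attempts.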
If anyone wants to tackle integrating type_str into Jupyter, the code is below for the taking. I, and likely others, would find that it makes code development easier. Thanks!
import numpy as np
import pandas as pd
import torch

class ShowType():
    # Initial version 20190504 Malcolm McLean
    def __init__(self, width=4):
        self._width = width  # Traverse lists only this far

    def type_str(self, o):  # Generate a string that tells us the type and dimensions of o.
        def getLastWord(s):
            ix = s.rfind('.')
            return s if (ix == -1) else s[1 + ix:]

        ts = getLastWord(type(o).__name__)  # The base type name
        rs = ts  # Starts the result string
        if hasattr(o, 'shape'):
            if ts == 'Tensor':
                rs += '(' + str(o.device) + ')'  # Append the device
            rs += str([d for d in o.shape])  # Append the shape dimensions in square brackets
            if ts == 'DataFrame':  # List column names and their types
                rs += '<'
                for col, cte in zip(o.columns, o.dtypes):
                    rs += str(col) + '<' + str(cte) + '> '  # str(col) so non-string column labels also work
                rs += '>'
            elif ts == 'Series':
                rs = rs + '<' + str(o.dtype) + '>'  # Its type
            else:
                rs += '<' + getLastWord(str(o.dtype)) + '>'  # If o has a shape, assume elements are homogeneous, append element type.
        elif hasattr(o, '__len__'):
            if (rs == 'str'): return rs  # String has a length but will not be disassembled further
            # o is likely to be a Python tuple or list.
            rs += '[' + str(len(o)) + ']'  # Append the length
            # Show width members of the contents recursively.
            if len(o) > 0:
                rs += '<'
                for i, m in enumerate(o):
                    if (i >= self._width): break
                    if i != 0: rs += ', '
                    rs += self.type_str(m)
                if i >= self._width:  # Were there more elements?
                    rs += ',...'
                rs += '>'  # Close the list of elements
        return rs
# Python Tuple
# -length
# -anything
#
# Python list
# -length
# -anything
#
# Python function
#
# Numpy
# - shape
# - homogeneous
# - len yields 1st dimension
# - a.dtype
#
# PyTorch
# -shape
# -homogeneous, t.type()
# -len
# -device
#
# Pandas DataFrame
# -shape
# -columns,dtypes
#
# Pandas Series
# -shape
# -dtype
#
Edit: Bug fixed for empty list/tuple.
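To make the container edge cases easy to check without installing numpy, pandas, or torch, here is a stripped-down sketch of just the list/tuple branch. It is not the class above: `container_type_str` is a hypothetical stand-in that handles only plain Python containers and scalars, and it uses `itertools.islice` in place of the explicit loop counter.

```python
from itertools import islice

def container_type_str(o, width=4):
    """Describe nested lists/tuples roughly the way type_str's container branch does."""
    name = type(o).__name__
    if isinstance(o, str) or not hasattr(o, "__len__"):
        return name                      # scalars and strings are not expanded
    rs = f"{name}[{len(o)}]"             # append the length
    if len(o) > 0:                       # empty containers get no <...> part
        shown = [container_type_str(m, width) for m in islice(o, width)]
        rs += "<" + ", ".join(shown)
        if len(o) > width:               # were there more elements than shown?
            rs += ",..."
        rs += ">"
    return rs

print(container_type_str([]))               # list[0]
print(container_type_str([1, 2, 3, 4]))     # list[4]<int, int, int, int>
print(container_type_str([1, 2, 3, 4, 5]))  # list[5]<int, int, int, int,...>
```

The three cases worth keeping an eye on are an empty container, one with exactly `width` members (no ellipsis), and one with more than `width` members (ellipsis appended).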