Sorry about the spam, just an update on this. It seems I have made it work. The issue is that the model expects exactly the same set of categories (i.e. the same cardinality) for the categorical variables at prediction time as it saw during training. If this assessment is correct, it really limits how we can use trained models to predict on unseen data as things stand. However, there is a workaround that provides some utility. The simple steps I took, in case anyone is interested in doing something similar, are:
- Find a structured dataset (duh). I have a set of calculated building-performance data from a parametric design model.
- Hallucinate new, unseen data. The hallucinated data I generated are variations of the designs in the calculated data, but the variation is applied only to the continuous variables. This means the categorical variables have exactly the same unique values in both the training and the hallucinated data.
- Train the model in the typical way. Create a df_train and df_test out of the calculated data using proc_df, and use the same mapper to pre-process the unseen dataframe (a rough sketch of steps 3 and 4 in code follows the prediction code below).
- Predict on the unseen data using the following code (where df_hallu is my processed hallucinated data):
# Assumes the usual fastai (0.7) course imports (which bring in V and to_np), numpy as np, and the trained learner m
# Single record - for some i
test_record = df_hallu.iloc[i,:]
cat_vars = ['in:WWRatio', 'in:GlazVLT', 'in:Orientation']  # my categorical variables in this case
cont_vars = ['in:Depth', 'in:Width', 'in:CeilingHeight']  # my continuous variables in this case
cat = test_record[cat_vars].values.astype(np.int64)[None]  # shape (1, n_cat) - a batch of one
cont = test_record[cont_vars].values.astype(np.float32)[None]  # shape (1, n_cont)
#Prediction
model = m.model
model.eval()
prediction = to_np(model(V(cat), V(cont)))
prediction = np.exp(prediction) # since we used log(y)
print(f"DA prediction: {prediction}")
#All records
hallucination_results = []
cat_vars = ['in:WWRatio', 'in:GlazVLT', 'in:Orientation']
cont_vars = ['in:Depth', 'in:Width', 'in:CeilingHeight']
model = m.model
model.eval()
for i, row in df_hallu.iterrows():
    test_record = row
    cat = test_record[cat_vars].values.astype(np.int64)[None]
    cont = test_record[cont_vars].values.astype(np.float32)[None]
    prediction = np.exp(to_np(model(V(cat), V(cont))))
    hallucination_results.append(prediction)
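For completeness, here is roughly how steps 3 and 4 could look in code. This is a hedged sketch rather than exactly what I ran: df_raw, hallu_raw, n_hallu and the target column 'out:DA' are placeholder names, I'm assuming the same imports as above plus the fastai 0.7 structured helpers (train_cats, apply_cats, proc_df), and I'm assuming the categorical columns are string/category dtype (if yours are numeric levels you would need to cast them to category first):
# Steps 3-4 sketch (placeholder names: df_raw, hallu_raw, n_hallu, 'out:DA')
cat_vars = ['in:WWRatio', 'in:GlazVLT', 'in:Orientation']
cont_vars = ['in:Depth', 'in:Width', 'in:CeilingHeight']
n_hallu = 500
# Step 3: vary only the continuous variables; re-use categorical values that already exist in the calculated data
hallu_raw = pd.DataFrame({c: np.random.choice(df_raw[c].unique(), n_hallu) for c in cat_vars})
for c in cont_vars:
    hallu_raw[c] = np.random.choice(df_raw[c].values, n_hallu) * np.random.uniform(0.9, 1.1, n_hallu)
# Step 4: encode categories and scale with exactly the same codes/mapper as the training data
train_cats(df_raw)  # string columns -> pandas categories
apply_cats(hallu_raw, df_raw)  # re-use the training categories on the unseen data
df_train, y_train, nas, mapper = proc_df(df_raw, 'out:DA', do_scale=True)  # y_train then logged before training
hallu_raw['out:DA'] = 0  # dummy target column so proc_df can split it off
df_hallu, _, nas, mapper = proc_df(hallu_raw, 'out:DA', do_scale=True, mapper=mapper, na_dict=nas)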
Hope this is helpful to someone. Also, if there is something I've missed, or a simpler way to do this, please let me know. Additionally, it would be extremely useful if the (assumed) limitation of equal cardinality for the categorical variables at prediction time could be bypassed (though I guess there is a chance this isn't really possible, or useful, given the inner workings of the model). My current understanding of why the limitation exists is sketched below.
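On that last point, my understanding (an assumption on my part, not something I've verified in the library code) is that the structured model builds one embedding matrix per categorical variable, sized to the cardinality it saw during training, so a category code it has never seen simply has no embedding row to look up. A bare-bones illustration with plain PyTorch, not fastai code:
# Why (I think) the cardinality has to match
import torch
import torch.nn as nn
emb = nn.Embedding(num_embeddings=4, embedding_dim=3)  # "trained" with 4 category codes: 0-3
print(emb(torch.tensor([2])))  # a code seen in training works fine
print(emb(torch.tensor([5])))  # an unseen fifth category -> index out of range error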